WO2021175458A1 - Playback transition from first to second audio track with transition functions of decomposed signals - Google Patents

Info

Publication number
WO2021175458A1
Authority
WO
WIPO (PCT)
Prior art keywords
transition
signal
decomposed
input
function
Prior art date
Application number
PCT/EP2020/065995
Other languages
French (fr)
Inventor
Kariem Morsy
Federico Tessmann
Christoph Teschner
Original Assignee
Algoriddim Gmbh
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Algoriddim Gmbh
Priority to EP20730432.0A (EP4115628A1)
Priority to PCT/EP2020/074034 (WO2021175460A1)
Priority to AU2020433340A (AU2020433340A1)
Priority to EP20792654.4A (EP4115629A1)
Priority to PCT/EP2020/079275 (WO2021175461A1)
Priority to PCT/EP2020/081540 (WO2021175464A1)
Priority to EP20800953.0A (EP4115630A1)
Priority to PCT/EP2021/055795 (WO2021176102A1)
Priority to EP21709063.8A (EP4133748A1)
Priority to US17/905,552 (US20230120140A1)
Priority to US17/343,386 (US20210326102A1)
Priority to US17/343,546 (US11347475B2)
Priority to US17/459,450 (US11462197B2)
Publication of WO2021175458A1
Priority to US17/689,574 (US11488568B2)
Priority to US17/741,678 (US20220269476A1)
Priority to US17/747,473 (US20220284875A1)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R29/00 Monitoring arrangements; Testing arrangements
    • H04R29/008 Visual indication of individual signal levels
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/0482 Interaction with lists of selectable items, e.g. menus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04847 Interaction techniques to control parameter settings, e.g. interaction with sliders or dials
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/04883 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/04886 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures by partitioning the display area of the touch-screen or the surface of the digitising tablet into independently controllable areas, e.g. virtual keyboards or menus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/165 Management of the audio stream, e.g. setting of volume, audio stream path
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0008 Associated control or indicating means
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0091 Means for obtaining special acoustic effects
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/02 Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos
    • G10H1/06 Circuits for establishing the harmonic content of tones, or other arrangements for changing the tone colour
    • G10H1/08 Circuits for establishing the harmonic content of tones, or other arrangements for changing the tone colour by combining tones
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/40 Rhythm
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/46 Volume control
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L21/028 Voice signal separating using properties of sound source
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0324 Details of processing therefor
    • G10L21/034 Automatic adjustment
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/04 Time compression or expansion
    • G10L21/043 Time compression or expansion by changing speed
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10 Digital recording or reproducing
    • G11B20/10527 Audio or video recording; Data buffering arrangements
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/102 Programmed access in sequence to addressed parts of tracks of operating record carriers
    • G11B27/105 Programmed access in sequence to addressed parts of tracks of operating record carriers of operating discs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04H BROADCAST COMMUNICATION
    • H04H60/00 Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/02 Arrangements for generating broadcast information; Arrangements for generating broadcast-related information with a direct linking to broadcast information or to broadcast space-time; Arrangements for simultaneous generation of broadcast information and broadcast-related information
    • H04H60/04 Studio equipment; Interconnection of studios
    • H04H60/05 Mobile studios
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/12 Circuits for transducers, loudspeakers or microphones for distributing signals to two or more loudspeakers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S1/00 Two-channel systems
    • H04S1/007 Two-channel systems in which the audio signals are in digital form
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/056 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; Identification or separation of instrumental parts by their characteristic voices or timbres
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/076 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction of timing, tempo; Beat detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/081 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for automatic key or tonality recognition, e.g. using musical rules or a knowledge base
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101 Music Composition or musical creation; Tools or processes therefor
    • G10H2210/125 Medley, i.e. linking parts of different musical pieces in one single piece, e.g. sound collage, DJ mix
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/155 Musical effects
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/155 Musical effects
    • G10H2210/195 Modulation effects, i.e. smooth non-discontinuous variations over a time interval, e.g. within a note, melody or musical transition, of any sound parameter, e.g. amplitude, pitch, spectral response, playback speed
    • G10H2210/241 Scratch effects, i.e. emulating playback velocity or pitch manipulation effects normally obtained by a disc-jockey manually rotating a LP record forward and backward
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/325 Musical pitch modification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/375 Tempo or beat alterations; Music timing control
    • G10H2210/391 Automatic tempo adjustment, correction or control
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2220/00 Input/output interfacing specifically adapted for electrophonic musical tools or instruments
    • G10H2220/091 Graphical user interface [GUI] specifically adapted for electrophonic musical instruments, e.g. interactive musical displays, musical instrument icons or menus; Details of user interactions therewith
    • G10H2220/101 Graphical user interface [GUI] specifically adapted for electrophonic musical instruments, e.g. interactive musical displays, musical instrument icons or menus; Details of user interactions therewith for graphical creation, edition or control of musical data or parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2220/00 Input/output interfacing specifically adapted for electrophonic musical tools or instruments
    • G10H2220/091 Graphical user interface [GUI] specifically adapted for electrophonic musical instruments, e.g. interactive musical displays, musical instrument icons or menus; Details of user interactions therewith
    • G10H2220/101 Graphical user interface [GUI] specifically adapted for electrophonic musical instruments, e.g. interactive musical displays, musical instrument icons or menus; Details of user interactions therewith for graphical creation, edition or control of musical data or parameters
    • G10H2220/106 Graphical user interface [GUI] specifically adapted for electrophonic musical instruments, e.g. interactive musical displays, musical instrument icons or menus; Details of user interactions therewith for graphical creation, edition or control of musical data or parameters using icons, e.g. selecting, moving or linking icons, on-screen symbols, screen regions or segments representing musical elements or parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2230/00 General physical, ergonomic or hardware implementation of electrophonic musical tools or instruments, e.g. shape or architecture
    • G10H2230/005 Device type or category
    • G10H2230/015 PDA [personal digital assistant] or palmtop computing devices used for musical purposes, e.g. portable music players, tablet computers, e-readers or smart phones in which mobile telephony functions need not be used
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/325 Synchronizing two or more audio tracks or files according to musical features or musical timings
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/025 Envelope processing of music signals in, e.g. time domain, transform domain or cepstrum domain
    • G10H2250/031 Spectrum envelope processing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/025 Envelope processing of music signals in, e.g. time domain, transform domain or cepstrum domain
    • G10H2250/035 Crossfade, i.e. time domain amplitude envelope control of the transition between musical sounds or melodies, obtained for musical purposes, e.g. for ADSR tone generation, articulations, medley, remix
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/311 Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/541 Details of musical waveform synthesis, i.e. audio waveshape processing from individual wavetable samples, independently of their origin or of the sound they represent
    • G10H2250/641 Waveform sampler, i.e. music samplers; Sampled music loop processing, wherein a loop is a sample of a performance that has been edited to repeat seamlessly without clicks or artifacts
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B1/00 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/06 Receivers
    • H04B1/16 Circuits
    • H04B1/1646 Circuits adapted for the reception of stereophonic signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04H BROADCAST COMMUNICATION
    • H04H60/00 Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/02 Arrangements for generating broadcast information; Arrangements for generating broadcast-related information with a direct linking to broadcast information or to broadcast space-time; Arrangements for simultaneous generation of broadcast information and broadcast-related information
    • H04H60/04 Studio equipment; Interconnection of studios
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2227/00 Details of public address [PA] systems covered by H04R27/00 but not provided for in any of its subgroups
    • H04R2227/003 Digital PA systems using, e.g. LAN or internet
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2227/00 Details of public address [PA] systems covered by H04R27/00 but not provided for in any of its subgroups
    • H04R2227/005 Audio distribution systems for home, i.e. multi-room use
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2420/00 Details of connection covered by H04R, not provided for in its groups
    • H04R2420/01 Input selection or mixing for amplifiers or loudspeakers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 Signal processing covered by H04R, not provided for in its groups
    • H04R2430/01 Aspects of volume control, not necessarily automatic, in sound systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 Signal processing covered by H04R, not provided for in its groups
    • H04R2430/03 Synergistic effects of band splitting and sub-band processing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R27/00 Public address systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00 Stereophonic arrangements
    • H04R5/04 Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/13Aspects of volume control, not necessarily automatic, in stereophonic sound systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S2400/00Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/15Aspects of sound capture and related signal processing for recording or reproduction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S2420/00Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/07Synergistic effects of band splitting and sub-band processing

Definitions

  • the present invention relates to methods and devices for processing audio signals, in particular a first input signal of a first input audio track and a second input signal of a second input audio track, which allow playback of a transition from the first input audio track to the second input audio track.
  • Input audio tracks may in particular be songs or other pieces of music which are to be played through a PA system, speakers or headphones.
  • this object is achieved by a method for processing audio signals, comprising the steps of
  • audio tracks in particular a first input audio track and a second input audio track, may include digital audio data such as contained in audio files or digital audio streams.
  • the files or streams may have a specific length or playback duration or alternatively may have an undefined or infinite length or playback duration, for example in the case of a live stream or a continuous data stream received from a content provider via the Internet.
  • digital audio tracks are usually stored in an audio file in association with consecutive time frames, the length of each time frame being dependent on the sampling rate of the audio data as conventionally known. For example, in an audio file sampled at a sampling rate of 44.1 kHz one time frame will have a length of 0.023 ms.
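  • For illustration only, the relation between sampling rate and time frame length stated above can be checked with a short calculation. The function name below is hypothetical and a time frame is assumed to correspond to one sample period.

```python
def frame_length_ms(sampling_rate_hz: float) -> float:
    """Duration of one time frame (assumed: one sample period) in milliseconds."""
    return 1000.0 / sampling_rate_hz


# 44.1 kHz sampling rate, as in the example above
print(round(frame_length_ms(44_100), 3))  # 0.023
```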
  • audio tracks may be embodied by analog audio signals, for example signals played by an analog playback device such as a vinyl player, a tape player etc.
  • audio tracks may be songs or other pieces of music provided in digital or analog format.
  • audio signal refers to an audio track or any part or portion of an audio track at a certain position or time within the audio track.
  • the audio signal may be a digital signal processed, stored or transmitted through an electronic control system, in particular computer hardware, or may be an analog signal processed, stored or transmitted by analog audio hardware such as an analog mixer, a PA system or the like.
  • Methods according to the first aspect of the invention use a step of decomposing at least the first input audio signal to obtain a plurality of different decomposed signals.
  • decomposing algorithms and services are known in the art, which allow decomposing audio signals to separate therefrom one or more signal components of different timbres, such as vocal components, drum components or instrumental components.
  • Such decomposed signals and decomposed tracks have been used in the past to create certain artificial effects such as removing vocals from a song to create a karaoke version of a song.
  • Some AI systems implement a convolutional neural network (CNN) that has been trained on a plurality of data sets, for example including a vocal track, a harmonic/instrumental track and a mix of the vocal track and the harmonic/instrumental track.
  • Examples of such conventional AI systems capable of separating source tracks, such as a singing voice track, from a mixed audio signal include: Pretet, “Singing Voice Separation: A study on training data”, Acoustics, Speech and Signal Processing (ICASSP), 2019, pages 506-510; “spleeter”, an open-source tool provided by the music streaming company Deezer based on the teaching of Pretet above; “PhonicMind” (https://phonicmind.com), a voice and source separator based on deep neural networks; “Open-Unmix”, a music source separator based on deep neural networks in the frequency domain; and “Demucs” by Facebook AI Research, a music source separator based on deep neural networks in the waveform domain.
  • These tools accept music files in standard formats (for example MP3, WAV, AIFF) and decompose the song to provide decomposed/separated tracks of the song, for example a vocal track, a bass track, a drum track, an accompaniment track or any mixture thereof.
  • volume changes of decomposed tracks based on different transition functions are used to realize a smooth transition between playback of a first audio track and playback of a second audio track.
  • the decomposed signals of the first input signal are recombined or mixed to obtain a first output signal in such a manner that the first output signal substantially equals the first input signal.
  • the volume levels of all decomposed tracks are set to the same value, in particular to 100 % (full volume).
  • the set of decomposed signals obtained in the step of decomposing the first input signal is preferably a complete set, which means that they sum up to an output signal substantially equal to the original input signal.
  • a complete set of decomposed signals obtained in the step of decomposing then includes a decomposed vocal signal, a decomposed drum signal and a decomposed instrumental signal, such that, when recombined, they sum up to an output signal substantially equal to the original first input signal.
  • “Substantially equal” in this respect means that at this point in time of the process, a difference between the first output signal and the first input signal is not audible or at least not disturbing to a user.
  • a transition towards playback of the second input track is commenced by reducing the volume levels of the decomposed signals of the first input signal and increasing the volume levels of audio data obtained from the second input signal.
  • the volume levels of decomposed tracks are each changed according to a respective transition function associated to each of the decomposed signals. At least the first transition function associated with the first decomposed signal is different from the second transition function associated with the second decomposed signal such that in a transition time interval the volume change of the first decomposed signal will be different from that of the second decomposed signal.
  • vocal components may be reduced in volume or even muted during a transition in order to avoid clashing of the vocals of two different songs, while at the same time other sound components, such as a drum component, which more easily mix with corresponding components of the second song can be maintained at a higher volume level in order to achieve an acoustic continuity throughout the transition.
  • the method according to the first aspect of the invention allows reducing abrupt changes of the sound or disharmony/dissonances induced by mixing certain sound components of two different audio tracks, while at the same time mixing other sound components which are more suitable to be played together, such as to achieve a smooth and continuous transition between two audio tracks.
  • a method of the first aspect may further comprise the steps of decomposing the second input signal to obtain a plurality of decomposed signals comprising at least a third decomposed signal and a fourth decomposed signal different from the third decomposed signal, assigning a third volume level to the third decomposed signal and a fourth volume level to the fourth decomposed signal, starting playback of the second output signal obtained from recombining at least the third decomposed signal and the fourth decomposed signal, while playing the second output signal, increasing the third volume level according to a time-dependent third transition function and increasing the fourth volume level according to a time-dependent fourth transition function different from said third transition function, until the second output signal substantially equals the second input signal.
  • the volume of a decomposed drum signal may be increased more quickly as it has a lower tendency to clash with the decomposed drum signal of the first input signal, whereas the decomposed vocal signal of the second input signal may be faded in at a later point in time or by using a transition function beginning with a lower slope in order to avoid clashing with the decomposed vocal signal of the first input signal.
  • Each of the transition functions preferably assigns a predetermined volume level or a predetermined change in volume level to each of a plurality of time frames within a transition time interval defined by a transition start time and transition end time or to each of a plurality of controller positions of a user control element.
  • Transition functions may be embodied in digital format by a formula stored in a memory of an electronic device such that for each time or control position an associated volume level or a change in volume level can be calculated using the formula.
  • a lookup table or prestored array can be used which stores predetermined values of volume levels or changes in volume level such that a volume level or change in volume level can be derived for each time frame or controller position by looking up the table or array.
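  • As a minimal sketch of the two digital representations described above, a transition function may be given by a formula and, alternatively, sampled once into a prestored table. The function names, the linear shape and the number of table entries are assumptions chosen for illustration; volume levels are expressed as factors between 0.0 (silence) and 1.0 (full volume).

```python
def linear_fade_out(t, t_start, t_end):
    """Formula-based transition function: full volume before t_start, silence after t_end."""
    if t <= t_start:
        return 1.0
    if t >= t_end:
        return 0.0
    return 1.0 - (t - t_start) / (t_end - t_start)


def build_lookup_table(func, t_start, t_end, num_entries):
    """Sample a transition function at equally spaced times to obtain a prestored array."""
    step = (t_end - t_start) / (num_entries - 1)
    return [func(t_start + i * step, t_start, t_end) for i in range(num_entries)]


# Transition time interval of 8 seconds, stored as 5 table entries
table = build_lookup_table(linear_fade_out, 0.0, 8.0, 5)
print(table)  # [1.0, 0.75, 0.5, 0.25, 0.0]
```

  • The formula needs no storage beyond its parameters, whereas the table can hold arbitrary shapes that have no simple closed form.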
  • transition functions may be represented by analog means such as a controllable resistor.
  • At least two of the transition functions are based on time frames and have the same transition time interval reaching from the same transition start time to the same transition end time such that the transition can be carried out within a predetermined time interval using more than one transition function for more than one decomposed signal.
  • the first transition function and the second transition function are preferably defined such that the volume level is at a maximum at the transition start time and at a minimum at the transition end time, such that the first output signal is continuously faded out and can finally be stopped completely when the transition to the second output signal is completed.
  • a minimum volume level herein preferably refers to a 0 % volume level or substantial silence.
  • a 0 % volume level or substantial silence includes playback of an audio signal at a volume level below an auditory threshold such that it cannot be heard any more by a user during playback, and it further includes a complete stop of the playback of an audio signal.
  • the third transition function and the fourth transition function may be defined such that the volume level is at a minimum, in particular corresponding to substantial silence, at a transition start time and at a maximum at a transition end time in order to allow continuous fade-in of the second output signal from silence to maximum.
  • the shapes of the transition functions can be set in order to achieve certain effects for certain decomposed signals and for controlling the transition.
  • at least one of the transition functions may be a linear function or contain a linear portion. Linear fade-ins or fade-outs are relatively easy to realize technically and correspond to sound developments the user is used to hearing in conventional mixes, for example at the end of songs.
  • At least one of the transition functions may be a continuous function, such that unexpected sudden changes of the volume level can be avoided.
  • at least one of the transition functions may be a monotonic function such that the volume level does not change its direction with regard to increasing or decreasing throughout the transition time interval or throughout the controller range. In this way, the user gets an impression of a seamless, continuous transition from the first output signal towards the second output signal.
  • the first transition function and the second transition function may differ from each other with regard to slope.
  • the third transition function and the fourth transition function may differ from each other with regard to slope. This means that for example the decomposed vocal signal of the first input signal may be faded out more quickly in order to give way for the decomposed vocal signal of the second input signal, whereas the decomposed drum signal of the first input signal remains in the mix more prominently for a longer time and mixes with the decomposed drum signal of the second input track over a considerable portion of the transition time interval or controller range.
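  • The effect of transition functions with different slopes can be sketched as follows. This is an illustrative assumption, not the claimed implementation: the signal names, the normalized progress value p (0.0 at the transition start, 1.0 at the transition end) and the specific ramps are hypothetical. The vocals of the first track are faded out within the first half of the transition, the vocals of the second track are faded in within the second half, and both drum signals are cross-faded over the whole interval.

```python
def ramp(x, x0, x1):
    """Linear ramp rising from 0 at x0 to 1 at x1, clamped outside that range."""
    if x <= x0:
        return 0.0
    if x >= x1:
        return 1.0
    return (x - x0) / (x1 - x0)


def volume_levels(p):
    """Volume level of each decomposed signal at normalized transition progress p."""
    return {
        "vocal_a": 1.0 - ramp(p, 0.0, 0.5),  # first transition function (steep fade-out)
        "drum_a": 1.0 - ramp(p, 0.0, 1.0),   # second transition function (slow fade-out)
        "vocal_b": ramp(p, 0.5, 1.0),        # third transition function (late fade-in)
        "drum_b": ramp(p, 0.0, 1.0),         # fourth transition function (early fade-in)
    }


def mix_sample(samples, levels):
    """Recombine one sample of each decomposed signal using the given volume levels."""
    return sum(levels[name] * value for name, value in samples.items())


levels_mid = volume_levels(0.5)
print(levels_mid)  # {'vocal_a': 0.0, 'drum_a': 0.5, 'vocal_b': 0.0, 'drum_b': 0.5}
```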
  • the step of decomposing includes processing the first audio signal and/or the second audio signal within an AI system comprising a trained neural network.
  • AI systems achieve a high level of quality and in particular allow decomposing different timbres of a mixed audio signal, which in particular may correspond to or resemble certain source tracks that were originally mixed when producing or generating the input audio track, such as certain instrumental tracks, vocal tracks, drum tracks etc.
  • the step of decomposing may include decomposing the first/second audio signal with regard to predetermined timbres such as to obtain decomposed signals of different timbres, preferably being selected from the group consisting of a vocal timbre, a non-vocal timbre, a drum timbre, a non-drum timbre, a harmonic timbre, a non-harmonic timbre, and any combination thereof.
  • the non-vocal timbre, the non-drum timbre and the non-harmonic timbre may in particular be respective complement signals to that of the vocal timbre, the drum timbre and the harmonic timbre.
  • Complement signals may be obtained by excising from the input signal one decomposed signal of a specific timbre.
  • an input signal may be decomposed or separated into two decomposed signals, a decomposed vocal signal of a vocal timbre, and its complement, a decomposed non-vocal signal of a non-vocal timbre, which means that a mixture of the decomposed vocal signal and the decomposed non-vocal signal results in a signal substantially equal to the input signal.
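  • The notion of a complement signal can be illustrated with a short sketch. It assumes, purely for illustration, that signals are lists of sample values and that the complement is obtained by sample-wise subtraction; the function names and sample values are hypothetical.

```python
def complement(input_signal, decomposed_signal):
    """Complement signal: what remains after excising one decomposed signal from the input."""
    return [x - d for x, d in zip(input_signal, decomposed_signal)]


def recombine(signals, volume_levels):
    """Mix decomposed signals sample by sample, each scaled by its volume level."""
    length = len(signals[0])
    return [
        sum(level * signal[i] for signal, level in zip(signals, volume_levels))
        for i in range(length)
    ]


input_signal = [0.5, -0.25, 0.75]
vocal = [0.25, 0.0, 0.5]                     # decomposed vocal signal (made-up values)
non_vocal = complement(input_signal, vocal)  # decomposed non-vocal signal

# At full volume (1.0 each), the recombined signal reproduces the input signal.
print(recombine([vocal, non_vocal], [1.0, 1.0]))  # [0.5, -0.25, 0.75]
```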
  • decomposition can be carried out to obtain a decomposed vocal track and a plurality of decomposed non-vocal tracks such as a decomposed drum track and a decomposed harmonic track (including harmonic instruments such as guitars, piano, synthesizer).
  • the first decomposed signal and the third decomposed signal are different signals of a vocal timbre
  • the second decomposed signal and the fourth decomposed signal are different signals of a non-vocal timbre
  • a sum of the first transition function and the third transition function is smaller than a sum of the second transition function and the fourth transition function.
  • the sum of the decomposed vocal signals is smaller during the transition, in particular at least at a transition reference time or a controller reference position, than a sum of the decomposed non-vocal signals. This reduces the mixture of the vocals of the different input signals (avoiding clashing of vocals of different songs), while keeping continuity of the playback during the transition because of the higher volume of the decomposed non-vocal signals of both input signals.
  • the first decomposed signal and the third decomposed signal are different signals of a drum timbre
  • the second decomposed signal and the fourth decomposed signal are different signals of a non-drum timbre, and/or at least at a transition reference time or at a controller reference position (for example a controller center position), a sum of the first transition function and the third transition function is larger than a sum of the second transition function and the fourth transition function.
  • a mixture of the decomposed drum signals of both input signals is achieved with relatively high volume level throughout the transition time interval or throughout the controller range, such that the drum beat continuously moves on throughout the transition time interval or throughout the controller range to ensure a feeling of continuity and to avoid any undesired breaks of the rhythm.
  • the first decomposed signal and the third decomposed signal are different signals of a non-drum timbre, a vocal timbre or a harmonic timbre, and/or a sum of the first transition function and the third transition function has a minimum, preferably substantially zero volume level, between the transition start time (T1) and the transition end time (T3) or between the controller end positions.
  • decomposed signals which have a tendency to induce disharmony or dissonances when mixed together are controlled in such a manner that at the time they have about the same volume level, i.e.
  • the method further includes a step of analyzing an audio signal, preferably at least one of the decomposed signals, to determine a song part junction between two song parts within the first input audio track or within the second input audio track, wherein a transition time interval of at least one of the transition functions is set such as to include the song part junction.
  • Song parts of a song are usually distinguishable by an analyzing algorithm since they differ in several characteristics such as instrumental density, medium pitch or rhythmic pattern. Song parts may in particular be a verse, a chorus, a bridge, an intro or an outro as conventionally known. Certain instrumental or rhythmic patterns will remain constant within a song part and will change in the next song part.
  • Recognition of song parts may be supported by analyzing not only the entire input signal but instead or in addition thereto at least one of the decomposed signals. For example, by analyzing a decomposed bass signal in isolation from the remaining sound components, it will be easy to derive therefrom a chord progression of the song which is one of the key criteria to differentiate song parts. Furthermore, an analysis of the decomposed drum signals allows a more accurate recognition of a rhythmic pattern and thus a more accurate detection of certain song parts. A song part junction then refers to a junction between one song part and the next song part.
  • transition time intervals may include song part junctions, which allows the transition between two songs to be carried out at the end of a song part and thus further improves the smoothness and likeability of the transition.
  • Song parts may be detected by analyzing at least one of the decomposed signals within an AI system comprising a trained neural network.
  • analyzing includes detecting silence within the decomposed signal, said silence preferably representing an audio signal having a volume level smaller than -30 dB.
  • the step of analyzing decomposed signals may include detecting silence continuously extending over a predetermined time span within the decomposed signal, said silence preferably representing an audio signal having a volume level smaller than -30 dB.
  • start- and/or end points of silence may be taken as song part junctions.
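  • A minimal sketch of such a silence detection is given below. It rests on several assumptions not stated above: the volume level is measured as the RMS level of short windows in dB relative to full scale, the predetermined time span is expressed as a minimum number of consecutive silent windows, and all function and parameter names are hypothetical.

```python
import math


def rms_db(window):
    """RMS level of a window of samples in dB relative to full scale (1.0)."""
    if not window:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in window) / len(window))
    return 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")


def silent_spans(samples, window_size, min_windows, threshold_db=-30.0):
    """Return (start, end) sample indices of silence lasting at least min_windows windows."""
    spans = []
    run_start = None
    num_windows = len(samples) // window_size
    # One extra iteration acts as a sentinel that closes a run reaching the end.
    for w in range(num_windows + 1):
        silent = False
        if w < num_windows:
            window = samples[w * window_size:(w + 1) * window_size]
            silent = rms_db(window) < threshold_db
        if silent and run_start is None:
            run_start = w
        elif not silent and run_start is not None:
            if w - run_start >= min_windows:
                spans.append((run_start * window_size, w * window_size))
            run_start = None
    return spans


# Decomposed vocal signal with a pause in the middle (made-up values)
vocal = [0.5] * 4 + [0.0] * 8 + [0.5] * 4
print(silent_spans(vocal, window_size=2, min_windows=3))  # [(4, 12)]
```

  • The start index and the end index of each returned span are candidates for song part junctions.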
  • the method further includes the steps of receiving a user input referring to a transition command, including at least one transition parameter, and setting at least one of the transition functions according to the transition parameter.
  • the transition parameter may be a transition start time or a transition end time of a transition time interval of at least one of the transition functions, or may be a length of a transition time interval of at least one of the transition functions. This allows a user to control when the transition is to be carried out and how long it takes.
  • a user may also control at which position in the song a transition is to be performed by choosing only one transition parameter such as a transition reference time of at least one of the transition functions.
  • the transition parameter may refer to a slope, shape or offset of at least one of the transition functions which allows a user to control the dynamics of the transition for one of or more decomposed signals.
  • a transition parameter to be controlled by a user input may refer to an assignment or de-assignment of a preset transition function to or from a selected one of the plurality of decomposed signals.
  • a user may select one or more decomposed signals to take part in the transition which are then submitted to the volume changes according to the respective transition functions.
  • the transition function assigned to a certain decomposed signal may be selected from one of a set of preset transition functions (sets of different transition time interval lengths or sets of different transition functions having different slope, shape or offset).
  • the method may comprise the steps of
  • determining at least one tempo parameter of the first and/or second input track, in particular a BPM (beats per minute) and/or a beat grid and/or a beat phase of the first and/or second input track, and
  • a tempo matching processing based on the determined tempo parameter, including a time stretching and/or time shifting and/or resampling of audio data obtained from the first input track and/or the second input track, such that the first output signal and the second output signal have mutually matching BPM and/or mutually matching beat phases.
  • first and second output signals will have matching tempi thus enhancing continuity of the playback during the transition.
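  • The arithmetic behind such a tempo matching can be sketched as follows, assuming that the BPM values and the times of the first beats have already been determined. The function names and example values are hypothetical, and the actual time stretching or resampling of the audio data is not shown.

```python
def time_stretch_ratio(source_bpm, target_bpm):
    """Speed factor to apply to the source track so that it plays at the target BPM."""
    return target_bpm / source_bpm


def beat_phase_shift_seconds(first_beat_a, first_beat_b, bpm):
    """Smallest time shift of track B that aligns its beats with those of track A.

    Both tracks are assumed to already run at the same BPM.
    """
    beat_period = 60.0 / bpm
    offset = (first_beat_a - first_beat_b) % beat_period
    # Shift backwards if that is shorter than shifting forwards.
    return offset if offset <= beat_period / 2 else offset - beat_period


# Track B at 120 BPM is to match track A at 126 BPM
ratio = time_stretch_ratio(120.0, 126.0)
shift = beat_phase_shift_seconds(first_beat_a=0.30, first_beat_b=0.10, bpm=126.0)
print(ratio, round(shift, 3))
```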
  • the method may further comprise the steps of
  • a key matching processing based on the determined key including a pitch shifting of audio data obtained from the first input track and/or the second input track, such that the first output signal and the second output signal have mutually matching keys.
  • the method of the present invention can be applied to any type of input audio track.
  • the input audio track may be stored on a local device such as a storing means of a computer, and may be present as a digital audio file.
  • the first input audio track or the second input track may be received as a continuous stream, for example a data stream received via Internet, a real-time audio stream received from a live audio source or from a playback device in playback mode.
  • the range of applications is basically not limited to a specific medium.
  • playback of the first output signal and/or second output signal may be started while continuing to receive the continuous stream.
  • decomposing first and/or second input signal is carried out segment-wise, wherein decomposing is carried out based on a first segment of the input signal such as to obtain a first segment of the decomposed signal, and wherein decomposing of a second segment of the input signal is carried out while playing the first segment of the decomposed signal.
  • Partitioning the first and/or second input signals into segments (preferably segments of equal lengths) and operating the method of the invention based on these segments allows using the decomposition result for playing the transition at an earlier point in time, i.e. after finishing decomposition of just one segment, without having to wait until the decomposition result of an entire audio file for example is available.
  • decomposition of the second input signal can start at an arbitrary point within the second input audio track. For example, when an optimal transition start point for the second input audio file is determined to be at e.g. 01:20 (one minute, twenty seconds), decomposition can start at the segment closest to 01:20, and the beginning part which is not used does not have to be decomposed. This saves performance and ensures that decomposition results are available much faster.
  • one segment has a playback duration which is smaller than 20 seconds.
  • the method steps in particular the steps of providing the first and second input signals, decomposing the first input signal, starting playback of the first output signal and starting playback of the second output signal, may be carried out in a continuous process, wherein a time shift between receiving the first input audio track or a first portion of a continuous stream of the first input audio track and starting playback of the first output signal is preferably less than 10 seconds, more preferably less than 2 seconds, and/or wherein a time shift between receiving the second input audio track or a first portion of a continuous stream of the second input audio track and starting playback of the second output signal is preferably less than 10 seconds, more preferably less than 2 seconds.
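  • Selecting the segments to be decomposed can be sketched as follows. The segment length of 15 seconds and the function names are assumptions for illustration, and the first segment is taken to be the one containing the transition start point, so that no audio needed for the transition is skipped.

```python
import math


def first_segment_index(start_seconds, segment_seconds):
    """Index of the equal-length segment that contains the transition start point."""
    return int(start_seconds // segment_seconds)


def segments_to_decompose(track_seconds, start_seconds, segment_seconds):
    """Indices of all segments from the transition start point to the end of the track."""
    first = first_segment_index(start_seconds, segment_seconds)
    total = math.ceil(track_seconds / segment_seconds)
    return range(first, total)


# Transition start point at 01:20 (80 s) in a track of 200 s, segments of 15 s
indices = segments_to_decompose(200.0, 80.0, 15.0)
print(indices[0], len(indices))  # 5 9
```

  • In this example, segments 0 to 4 (the first 75 seconds) are never decomposed, and playback can begin as soon as segment 5 is available.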
  • At least one, preferably all of the first and second input signals, the decomposed signals and the first and second output signals represent stereo signals, each comprising a left channel signal portion and a right channel signal portion, respectively.
  • the method is thus suitable for playing music at high quality.
  • a device for processing audio signals comprising:
  • a decomposition unit configured to decompose the first input audio signal to obtain a plurality of decomposed signals, comprising at least a first decomposed signal and a second decomposed signal different from the first decomposed signal
  • a playback unit configured to start playback of a first output signal obtained from recombining at least the first decomposed signal at a first volume level with the second decomposed signal at a second volume level, such that the first output signal substantially equals the first input signal
  • a transition unit for performing a transition between playback of the first output signal and playback of a second output signal obtained from the second input signal, wherein the transition unit has a volume control section adapted for reducing the first volume level according to a first transition function and reducing the second volume level according to a second transition function different from said first transition function.
  • a device of the second aspect of the present invention is preferably embodied as a computer running a suitable software application.
  • the software application may be configured to carry out a method according to the first aspect of the present invention.
  • the computer may be a personal computer, a tablet computer or a smartphone, and may include, in the conventionally known manner, a RAM, a ROM, a microprocessor and suitable input/output means. Included in the computer or connected to the computer may be an audio interface which may be connected, for example wirelessly (e.g. via Bluetooth or similar technology), to speakers, headphones or a PA system in order to output sound when playing the first and second output signals, respectively.
  • the device may be embodied as a standalone DJ device including suitable electronic hardware or computing means.
  • the device preferably has a decomposition unit which includes the AI system comprising a trained neural network.
  • the complete AI system including the trained neural network may be integrated within the device, for example as a software application or software plugin running locally in a memory integrated within the device.
  • the device preferably includes a user interface embodied by either a display such as a touch display or a display to be operated by a pointer device, or as one or more hardware control elements such as a hardware fader or rotatable hardware knobs, or by a voice command or by any other user input/output technology.
  • Fig. 1 shows a device according to an embodiment of the present invention
  • Fig. 2 shows a schematic functional diagram of components of the device of the embodiment shown in Fig. 1
  • Figs. 3a-3c show transition functions for decomposed tracks as used in the device of the embodiment of the invention as shown in Figs. 1 and 2 and according to a method of an embodiment of the invention.
  • a device 10 may be formed by a computer such as a tablet computer or a smartphone, which comprises standard hardware components such as input/output ports, wireless connectivity, a housing, a touchscreen, an internal storage as well as a plurality of microprocessors, RAM and ROM.
  • Essential features of the present invention are implemented in device 10 by means of a suitable software application or a software plugin running on device 10.
  • the display of device 10 preferably has a first section 12a associated to a first song A and a second section 12b associated to a second song B.
  • First section 12a includes a first waveform display region 14a which displays at least one graphical representation of song A, in particular one or more waveform signals associated to song A.
  • the first waveform display region 14a may display a waveform of song A and/or one or more waveforms of decomposed signals obtained from decomposing song A.
  • decomposition of song A may be carried out to obtain a decomposed drum signal, a decomposed vocal signal and a decomposed harmonic signal, which may be displayed within the first waveform display region 14a.
  • a second waveform display region 14b may be included in the second section 12b such as to display a graphical representation related to song B in the same or corresponding manner as described above for song A.
  • the second waveform display region 14b may display one or more waveforms of song B and/or at least one waveform of a decomposed signal obtained from song B.
  • first and second waveform display regions 14a, 14b may each display a play-head 16a, 16b, respectively, which show a current playback position within song A and song B, respectively.
  • Each of the first and second sections 12a and 12b may further include a number of control elements for controlling playback, effects and other features related to song A and song B, respectively.
  • The first section 12a may include a play button 18a which can be pushed by a user to alternately start and stop playback of song A (more precisely, of audio signals obtained from song A, such as decomposed signals).
  • The second section 12b may include a play button 18b which can be pushed by a user to alternately start and stop playback of song B (more precisely, of audio signals obtained from song B, such as decomposed signals).
  • An output signal generated by device 10 in accordance with the settings of device 10 and with a control input received from a user may be output at an output port 20 in digital or analog format, such as to be transmitted to a further audio processing unit or directly to a PA system, speakers or headphones. Alternatively, the output signal may be output through internal speakers of device 10.
  • device 10 can perform a smooth transition from playback of song A to playback of song B by virtue of a transition unit, which will be explained in more detail below.
  • device 10 may comprise a transition button 22 displayed on the display of device 10, which may be pushed by a user to initiate a transition from playback of song A towards playback of song B.
  • By a single operation of transition button 22 (pushing the button 22), device 10 starts changing individual volumes of individual decomposed signals of songs A and B according to respective transition functions such as to smoothly cross-fade from song A to song B within a predetermined transition time interval.
  • device 10 may include a transition controller 24 which can be moved by a user between one controller end point referring to a playback of only song A and a second controller end point referring to playback of only song B.
  • This allows controlling the volumes of individual decomposed signals of songs A and B using transition functions which are based not on time but on the controller position of the transition controller 24. In this manner, in particular the speed and progress of the transition can be controlled manually through the transition controller 24.
  • Fig. 2 shows a schematic illustration of internal components of device 10 and a signal flow within device 10.
  • Audio processing is based on a first input track and a second input track, which may be stored within the device 10, for example in an internal memory of the device, a hard drive or any other storage medium.
  • First and second input tracks are preferably digital audio files of a standard compressed or uncompressed audio file format such as mp3, WAV, AIFF or the like.
  • first and second input tracks may be received as continuous streams, for example via an Internet connection of device 10 or from an external playback device via an input audio interface or via a microphone.
  • First and second input tracks are preferably processed within first and second input units 26a and 26b, respectively, which may be configured to decrypt or decompress the audio data, if necessary, and/or may be configured to extract a segment of the first input track and a segment of the second input track in order to continue processing based on the segments.
  • This has an advantage that time-consuming processing algorithms, such as the decomposition based on a neural network, will not have to analyze the entire first or second input track upfront, but will perform processing based on shorter segments, which allows continuing processing and eventually start playback at an earlier point in time.
  • The outputs of the first and second input units 26a, 26b form first and second input signals, which are input into first and second AI systems 28a, 28b of a decomposition unit 40.
  • Each AI system 28a, 28b includes a neural network trained to decompose the first and second input signals, respectively, with respect to sound components of different timbres.
  • Decomposition unit 40 thus decomposes the first input signal to obtain a first group of decomposed signals and decomposes the second input signal to obtain a second group of decomposed signals.
  • Each group of decomposed signals includes a decomposed drum signal, a decomposed vocal signal and a decomposed harmonic signal, which together form a complete set of decomposed signals or a complete decomposition, which means that a sum of all decomposed signals of the first group will resemble the first input signal, and the sum of all decomposed signals of the second group will resemble the second input signal.
  • Decomposition unit 40 may also include only one AI system and only one neural network, which is trained and configured to determine all decomposed signals of the first input signal as well as all decomposed signals of the second input signal.
  • More than two AI systems may be used, for example a separate AI system and a separate neural network may be used to generate each of the decomposed signals.
  • Playback unit 42 comprises a transition unit 44, which is basically adapted to recombine the decomposed signals of both groups taking into account specific volume levels associated to each of the decomposed signals.
  • Transition unit 44 is configured to recombine the decomposed signals in such a manner as to either play only a first output signal obtained from a sum of all decomposed signals of the first input signal, or a second output signal obtained from a sum of all decomposed signals of the second input signal, or any transition in between the first and the second output signals where decomposed signals of both first and second input signals are played.
  • Transition unit 44 stores individual transition functions DA, VA, HA, DB, VB, HB for each of the decomposed signals, which each define a specific volume level for each time frame within a transition interval or for each controller position of the transition controller within a controller range. Taking into account the respective volume levels according to the respective transition functions DA, VA, HA, DB, VB, HB, all decomposed signals will then be recombined to obtain the output signal.
  • Playback unit 42 may further include a control unit 45, which is adapted to control at least one of the transition functions DA, VA, HA, DB, VB, HB based on a user input.
  • the output signal generated by playback unit 42 may then be routed to an output audio interface 46 for a sound output.
  • one or more sound effects may be inserted into the audio signal by means of one or more effect chains 48.
  • effect chain 48 is located between playback unit 42 and output audio interface 46.
  • Figs. 3a to 3c show examples of transition functions that may be used in transition unit 44 to set specific volume levels of individual decomposed signals depending on time.
  • the example transition functions are based on time (time dependent transition functions), thus the transition is performed within a transition time interval reaching from a transition start time T1 to a transition end time T3.
  • a time T2 is referred to as a transition reference time.
  • a transition function DA of the decomposed drum signal of song A starts at 100 % at T1 and decreases linearly to 0 % at T3, while the transition function DB of the decomposed drum signal of song B starts at 0 % at T1 and increases linearly to reach 100 % at T3.
  • the linear transition functions DA and DB intersect at T2. It can be seen that a sum of DA+DB equals 100 % throughout the transition time interval from T1 to T3. Thus, the overall volume level of all drums remains constant during the transition as well as before and after the transition such as to achieve a high level of audible continuity.
  • Fig. 3b shows transition functions of decomposed vocal signals of songs A and B.
  • the transition function VA of the decomposed vocal signal of song A starts at 100 % at T1 and decreases linearly to reach 0 % in a middle region of the transition time interval, for example at the transition reference time T2. Afterwards, the transition function VA remains constant at 0 % until T3, i.e. in the interval between T2 and T3.
  • the transition function VB of the decomposed vocal signal of song B starts at 0 % at T1 and remains constant at 0 % until a middle region of the transition time interval, in particular until T2, and afterwards increases linearly to reach 100 % at T3.
  • a sum of the transition functions VA+VB reaches the minimum in the middle region of the transition time interval, in particular at T2, and specifically becomes 0 %.
  • the volume level of the decomposed vocal signal of song B starts rising only after the volume level of the decomposed vocal signal of song A has dropped to 0 %. In this way, any clashing of the vocals of songs A and B can be avoided.
  • Fig. 3c shows transition functions of decomposed harmonic signals (for example instrumental components) of songs A and B.
  • the transition function HA of the decomposed harmonic signal of song A starts at 100 % at T1 and reduces in a linear manner, but with a steeper slope as compared to the transition function VA of the decomposed vocal signal of song A, such as to reach 0 % at a time before transition function VA reaches 0 %, specifically before T2.
  • transition function HA remains constant at 0 % until T3.
  • transition function HB of the decomposed harmonic signal of song B rises continuously and monotonically from 0 % at T1 to 100 % at T3, but not in a linear manner but in a curved manner, for example a parabolic or exponentially curved manner.
  • a slope of transition function HB is increasing from T1 to T3.
  • a mixture of the decomposed harmonic signals of songs A and B is again avoided or substantially reduced, because the substantial increase of the volume level of the decomposed harmonic signal of song B starts only after the volume level of the decomposed harmonic signal of song A has reached 0 %.
  • transition functions shown in Figs. 3a to 3c are defined in relation to time within a transition time interval from T1 to T3, corresponding or other transition functions may likewise be defined with respect to the controller position of the transition controller 24 shown in Fig. 1.
  • the horizontal axis of the transition functions may show the controller position reaching over the controller range from left end position to right end position.
  • a user may initiate a transition according to the transition functions shown in Figs. 3a to 3c for example by pushing the transition button 22.
  • T1 may be set to the time at which the user pushes the transition button 22.
  • The transition may be controlled by a user through an appropriate marking or selection within one of the first and second waveform display regions 14a, 14b or through any other user input. For example, by clicking on a certain position in one of the waveforms displayed in one of the waveform display regions 14a, 14b, the timing of a next transition can be set accordingly; for example, any of the time points T1, T2 or T3 may be set at the specified position within the waveform corresponding to a certain future time point.
  • device 10 may have stored a setting, for example a pre-stored setting or a setting that can be manipulated by a user, wherein the setting defines at least one condition for carrying out a transition from song A to song B or vice versa.
  • the setting may specify that at a certain point in time with respect to an end of one of songs A or B, a transition to the respective other song is commenced.
  • a transition from song A to song B may be started at a certain time period (for example 5 seconds) before the end of song A, such as to avoid any interruption of the playback when song A ends.
  • device 10 may include means for determining characteristic song parts of songs A and/or B, such as a verse, a chorus, a bridge, an intro or an outro. A user may then choose to carry out a transition at a junction between two song parts, or device 10 may automatically carry out a transition at certain song part junctions and towards certain song part junctions of the other song, for example a transition from the beginning of an outro section of song A to an end of an intro section of song B.
  • Method for processing audio data comprising the steps of providing a first audio track of mixed input data, said mixed input data representing an audio signal containing a plurality of different timbres, decomposing the mixed input data to obtain decomposed data representing an audio signal containing at least one, but not all, of the plurality of different timbres, providing a second audio track, analyzing audio data, including at least the decomposed data, to determine at least one mixing parameter, generating an output track based on the at least one mixing parameter, said output track comprising first output data obtained from the first audio track and second output data obtained from the second audio track.
  • Method of item 2 wherein the step of analyzing audio data includes analyzing the decomposed data to determine a transition point as the mixing parameter, and wherein the output track is generated using the transition point such that the first portion is arranged before the transition point and the second portion is arranged after the transition point.
  • the output track further includes a transition portion arranged between the first portion and the second portion, and associated to the transition point, wherein in the transition portion a volume level of the first output data is reduced and/or a volume level of the second output data is increased.
  • the step of analyzing audio data includes determining at least one mixing parameter referring to at least one of: a tempo of the first and/or second audio track, a BPM (beats per minute) of the first and/or second audio track, a beat grid of the first and/or second audio track, a beat phase of the first and/or second audio track, a downbeat position within a first and/or second audio track, a beat shift between the first audio track and the second audio track, a key of the first and/or second audio track, a chord progression of the first and/or second audio track, a timbre or group of timbres of the first and/or second audio track, a song part junction of the first and/or second audio track.
  • Method of at least one of the preceding items wherein the step of analyzing audio data includes detecting silence data within the decomposed data, said silence data preferably representing an audio signal having a volume level smaller than -30 dB.
  • step of analyzing audio data includes detecting silence data continuously extending over a predetermined time span within the decomposed data, said silence data preferably representing an audio signal having a volume level smaller than -30 dB.
  • step of analyzing audio data includes determining at least a first mixing parameter based on the decomposed data, and at least a second mixing parameter based on the first mixing parameter, said second mixing parameter preferably being the transition point.
  • the step of analyzing audio data includes determining a tempo of the first and/or second audio track as mixing parameter, and wherein the step of generating the output track comprises a tempo matching processing based on the determined tempo, including a time stretching or resampling of audio data obtained from the first audio track and/or the second audio track, such that the first output data and the second output data have mutually matching tempos.
  • step of analyzing audio data includes determining a key of the first and/or second audio track as mixing parameter, and wherein the step of generating the output track comprises a key matching processing including pitch shifting of audio data obtained from the first audio track and/or the second audio track, such that the first output data and the second output data have mutually matching keys.
  • Method of at least one of the preceding items, wherein the step of decomposing the mixed input data includes processing the mixed input data within an AI system comprising a trained neural network.
  • Method of at least one of the preceding items, wherein at least one of the steps of analyzing the audio data and generating the output track includes processing of audio data within an AI system comprising a trained neural network.
  • Method of at least one of the preceding items further comprising playing the output track.
  • Device for processing audio data, preferably a device adapted to carry out a method according to at least one of the preceding items, said device comprising a first input unit for receiving a first audio track of mixed input data, said mixed input data representing an audio signal containing a plurality of different timbres, a second input unit for receiving a second audio track, a decomposition unit for decomposing the mixed input data to obtain decomposed data representing an audio signal containing at least one, but not all, of the plurality of different timbres, an analyzing unit for analyzing audio data, including at least the decomposed data, to determine at least one mixing parameter, an output generation unit for generating an output track based on the at least one mixing parameter, said output track comprising first output data obtained from the first audio track and second output data obtained from the second audio track.
  • Device of item 12 comprising a tempo matching unit adapted for time stretching or resampling of audio data obtained from the first audio track and/or the second audio track, such as to generate the first output data and the second output data having mutually matching tempos.
  • Device of item 12 or item 13 comprising a key matching unit adapted for pitch shifting of audio data obtained from the first audio track and/or the second audio track, such as to generate the first output data and the second output data having mutually matching keys.
  • Method for processing audio data comprising the steps of providing an audio track of mixed input data, said mixed input data representing an audio signal containing a plurality of different timbres, decomposing the mixed input data to obtain decomposed data representing an audio signal containing at least one, but not all, of the plurality of different timbres, analyzing the decomposed data to determine a transition point between a first song part and a second song part within the audio track.
  • Method for processing audio data comprising the steps of providing a set of audio tracks, each including mixed input data, said mixed input data representing audio signals containing a plurality of different timbres, decomposing each audio track of the set of audio tracks, such as to obtain a decomposed track associated with the respective audio track, wherein the decomposed track represents an audio signal containing at least one, but not all, of the plurality of different timbres of the respective audio track, thereby obtaining a set of decomposed tracks, analyzing each decomposed track of the set of decomposed tracks to determine at least one track parameter of the respective audio track which the decomposed track is associated with, selecting or allowing a user to select at least one selected audio track out of the set of audio tracks, based on at least one of the track parameters, generating an output track based on the at least one selected audio track.
  • Method of at least one of items 18 to 20, comprising notifying a user about at least one track parameter, preferably displaying at least one track parameter as associated to the respective audio track.
  • Method of item 21 comprising displaying a graphical representation of an audio track of the set of audio tracks, preferably a color, which depends on the associated track parameter of the audio track.
  • Method of at least one of items 18 to 21 comprising playing a selected audio track.
  • a transition point as described in the above items may correspond to any of the transition start time, the transition end time and the transition reference time as mentioned in the first and second aspects of the invention and in the claims.
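The silence-based analysis described in the items above can be illustrated with a minimal sketch. It assumes that the decomposed data (for example a decomposed vocal signal) is available as a mono NumPy array of floats in the range -1 to 1. The function name `find_silent_spans`, the RMS window length and the minimum span length are hypothetical choices made for illustration; only the -30 dB threshold is taken from the text. The start or end of a returned span could then serve as a candidate transition point.

```python
import numpy as np

def find_silent_spans(stem, sample_rate, threshold_db=-30.0,
                      window_s=0.05, min_span_s=1.0):
    """Return (start_s, end_s) spans in which the stem stays below threshold_db.

    Hypothetical helper: the level is the RMS per window, in dB relative to
    full scale. Window and span lengths are illustrative assumptions.
    """
    win = max(1, int(round(window_s * sample_rate)))
    n_windows = len(stem) // win
    frames = stem[:n_windows * win].reshape(n_windows, win)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    level_db = 20.0 * np.log10(np.maximum(rms, 1e-12))  # avoid log(0)
    silent = level_db < threshold_db

    # Collect runs of consecutive silent windows as (first, one-past-last).
    spans, start = [], None
    for i, is_silent in enumerate(silent):
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, n_windows))

    # Keep only runs that extend over the required time span.
    min_windows = int(round(min_span_s / window_s))
    return [(s * win / sample_rate, e * win / sample_rate)
            for s, e in spans if e - s >= min_windows]
```

Requiring a minimum span length corresponds to the item that detects silence "continuously extending over a predetermined time span", so that short gaps between sung phrases are not reported.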

Abstract

The present invention provides a device for processing audio signals, preferably based on AI, comprising: - a first input unit providing a first input signal of a first input audio track, - a second input unit providing a second input signal of a second input audio track, - a playback unit configured to start playback of a first output signal obtained from the first input signal, and - a transition unit for performing a transition between playback of the first output signal and playback of a second output signal obtained from the second input signal.

Description

PLAYBACK TRANSITION FROM FIRST TO SECOND AUDIO TRACK WITH TRANSITION FUNCTIONS OF DECOMPOSED SIGNALS
The present invention relates to methods and devices for processing audio signals, in particular a first input signal of a first input audio track and a second input signal of a second input audio track, which allow playback of a transition from the first input audio track to the second input audio track.
Methods and devices of this type are used in all fields of sound reproduction or audio playback, for example in DJ equipment, mixers, music players etc. Input audio tracks may in particular be songs or other pieces of music which are to be played through a PA system, speakers or headphones.
It is a general desire to play transitions between different audio tracks in such a manner as to sound smooth and continuous, thus avoiding any abrupt changes of the sound, any breaks or gaps of silence, any abrupt shifts in tempo or in the general atmosphere of the sound. Therefore, several approaches are known to cross-fade audio tracks such that over a certain transition time interval (usually some seconds) both tracks are played, wherein the volume of the first track is reduced while the volume of the second track is increased. In order to further improve the smoothness of the transition, it is further known to perform a tempo matching and/or a key matching of the two tracks, hence avoiding a sudden change in the beat or tune during the transition.
However, playing smooth transitions between two audio tracks remains difficult, in particular for tracks containing vocal components which, due to their nature and sound structure, cannot easily be mixed without inducing dissonances or timing problems at least at some point in time during the transition. As a result, for example DJs try to run a transition from one audio track to another audio track at such parts of the songs where one of the two tracks has a break/pause in its vocal component, for example during an instrumental solo part or at a song part junction between two parts of a song (e.g. between verse and chorus or the like). This, however, requires a considerable amount of experience of the DJ and cannot always reliably be achieved for all types of music.
It was therefore an object of the present invention to provide methods and devices of the above-mentioned type which allow playing smooth transitions between a first audio track and a second audio track at different desired positions within the audio tracks, and which are easier to operate for a user.
According to a first aspect of the present invention, this object is achieved by a method for processing audio signals, comprising the steps of
- providing a first input signal of a first input audio track and a second input signal of a second input audio track,
- decomposing the first input signal to obtain a plurality of decomposed signals, comprising at least a first decomposed signal and a second decomposed signal different from the first decomposed signal,
- assigning a first volume level to the first decomposed signal and a second volume level to the second decomposed signal,
- starting playback of a first output signal obtained from recombining at least the first decomposed signal at the first volume level with the second decomposed signal at the second volume level, such that the first output signal substantially equals the first input signal,
- while playing the first output signal, reducing the first volume level according to a time-dependent first transition function and reducing the second volume level according to a time-dependent second transition function different from said first transition function,
- starting playback of a second output signal obtained from the second input signal after starting playback of the first output signal but before volume levels of all decomposed signals of the first input signal have reached substantially zero.
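The steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the decomposed signals are taken as already available NumPy arrays (the decomposition itself is not shown), the names `render_transition`, `stems_a`, `fades_a` and `fade_b` are hypothetical, and the toy signals consist of ones so that the volume levels can be read directly from the mix.

```python
import numpy as np

def render_transition(stems_a, signal_b, fades_a, fade_b):
    """Mix the decomposed signals of track A with track B over one transition.

    stems_a : dict name -> 1-D array, decomposed signals of the first input signal
    signal_b: 1-D array, audio obtained from the second input signal
    fades_a : dict name -> 1-D array of volume levels (0..1), one per sample
    fade_b  : 1-D array of volume levels (0..1) for the second output signal
    """
    out = fade_b * signal_b
    for name, stem in stems_a.items():
        out = out + fades_a[name] * stem
    return out

n = 8
pos = np.linspace(0.0, 1.0, n)  # normalized time within the transition interval
stems_a = {"vocals": np.ones(n), "drums": np.ones(n)}  # toy decomposed signals
signal_b = np.ones(n)
fades_a = {
    # First transition function: vocals reach zero at the mid-point.
    "vocals": np.clip(1.0 - 2.0 * pos, 0.0, 1.0),
    # Second, different transition function: drums reach zero only at the end.
    "drums": 1.0 - pos,
}
fade_b = pos  # track B becomes audible while the drums of track A still play
mix = render_transition(stems_a, signal_b, fades_a, fade_b)
```

At the start of the interval both decomposed signals of track A are at full volume and track B is silent; at the end only track B remains.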
In the present disclosure, audio tracks, in particular a first input audio track and a second input audio track, may include digital audio data such as contained in audio files or digital audio streams. The files or streams may have a specific length or playback duration or alternatively may have an undefined or infinite length or playback duration, such as for example in case of a live stream or a continuous data stream received from a content provider via the Internet. Note that digital audio tracks are usually stored in an audio file in association with consecutive time frames, the length of each time frame being dependent on the sampling rate of the audio data as conventionally known. For example, in an audio file sampled at a sampling rate of 44.1 kHz one time frame will have a length of 0.023 ms. Furthermore, audio tracks may be embodied by analog audio signals, for example signals played by an analog playback device such as a vinyl player, a tape player etc. In specific embodiments, audio tracks may be songs or other pieces of music provided in digital or analog format.
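The frame length quoted above follows directly from the sampling rate, assuming that one time frame corresponds to one sample period (an interpretation consistent with the stated figure):

```latex
\Delta t = \frac{1}{f_s} = \frac{1}{44\,100\ \mathrm{Hz}} \approx 2.27 \times 10^{-5}\ \mathrm{s} \approx 0.023\ \mathrm{ms}
```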
Furthermore, the term “audio signal” refers to an audio track or any part or portion of an audio track at a certain position or time within the audio track. The audio signal may be a digital signal processed, stored or transmitted through an electronic control system, in particular computer hardware, or may be an analog signal processed, stored or transmitted by analog audio hardware such as an analog mixer, a PA system or the like.
Methods according to the first aspect of the invention use a step of decomposing at least the first input audio signal to obtain a plurality of different decomposed signals. Several decomposing algorithms and services are known in the art, which allow decomposing audio signals to separate therefrom one or more signal components of different timbres, such as vocal components, drum components or instrumental components. Such decomposed signals and decomposed tracks have been used in the past to create certain artificial effects such as removing vocals from a song to create a karaoke version of a song.
More specifically, with regard to decomposing audio data there have been several approaches based on artificial intelligence and deep neural networks in order to decompose mixed audio signals to separate therefrom signals of certain timbres. Such AI systems usually implement a convolutional neural network (CNN), which has been trained by a plurality of data sets for example including a vocal track, a harmonic/instrumental track and a mix of the vocal track and the harmonic/instrumental track. Examples for such conventional AI systems capable of separating source tracks such as a singing voice track from a mixed audio signal include: Pretet, “Singing Voice Separation: A study on training data”, Acoustics, Speech and Signal Processing (ICASSP), 2019, pages 506-510; “spleeter” - an open-source tool provided by the music streaming company Deezer based on the teaching of Pretet above; “PhonicMind” (https://phonicmind.com) - a voice and source separator based on deep neural networks; “Open-Unmix” - a music source separator based on deep neural networks in the frequency domain; and “Demucs” by Facebook AI Research - a music source separator based on deep neural networks in the waveform domain. These tools accept music files in standard formats (for example MP3, WAV, AIFF) and decompose the song to provide decomposed/separated tracks of the song, for example a vocal track, a bass track, a drum track, an accompaniment track or any mixture thereof.
According to an important aspect of the present invention, volume changes of decomposed tracks based on different transition functions are used to realize a smooth transition between playback of a first audio track and playback of a second audio track. In particular, at a point in time before the transition, the decomposed signals of the first input signal are recombined or mixed to obtain a first output signal in such a manner that the first output signal substantially equals the first input signal. Normally, this means that the volume levels of all decomposed tracks are set to the same value, in particular to 100 % (full volume). Furthermore, the set of decomposed signals obtained in the step of decomposing the first input signal is preferably a complete set, which means that they sum up to an output signal substantially equal to the original input signal. For example, if the input signal consists of a vocal component, a drum component and an instrumental component and substantially no other components, a complete set of decomposed signals obtained in the step of decomposing then includes a decomposed vocal signal, a decomposed drum signal and a decomposed instrumental signal, such that, when recombined, they sum up to an output signal substantially equal to the original first input signal. “Substantially equal” in this respect means that at this point in time of the process, a difference between the first output signal and the first input signal is not audible or at least not disturbing to a user.
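The notion of a complete set can be made concrete with a small check. The text defines "substantially equal" perceptually (a difference that is not audible or not disturbing); the residual level in dB used below is only an assumed numerical stand-in for that criterion, and `residual_db` is a hypothetical helper name.

```python
import numpy as np

def residual_db(input_signal, stems, volumes=None):
    """Level of (recombined stems - input) relative to the input, in dB."""
    if volumes is None:
        # 100 % (full volume) for every decomposed signal.
        volumes = {name: 1.0 for name in stems}
    recombined = sum(volumes[name] * stem for name, stem in stems.items())
    err = np.sqrt(np.mean((recombined - input_signal) ** 2))
    ref = np.sqrt(np.mean(input_signal ** 2))
    return 20.0 * np.log10(max(err, 1e-12) / max(ref, 1e-12))

# Toy data: three synthetic components that sum exactly to the input signal.
rng = np.random.default_rng(0)
vocals, drums, harmonic = rng.standard_normal((3, 1000))
input_signal = vocals + drums + harmonic
stems = {"vocals": vocals, "drums": drums, "harmonic": harmonic}

full = residual_db(input_signal, stems)
muted = residual_db(input_signal, stems,
                    {"vocals": 0.0, "drums": 1.0, "harmonic": 1.0})
```

With all volume levels at 100 % the residual is negligible, whereas muting one decomposed signal leaves a residual only a few dB below the input level, so the recombined signal no longer equals the input.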
Starting from a condition as stated above in which the first output signal is played such as to be substantially equal to the first input signal, according to the present invention, a transition towards playback of the second input track (more precisely audio signals obtained from the second input track) is commenced by reducing the volume levels of the decomposed signals of the first input signal and increasing the volume levels of audio data obtained from the second input signal. The volume levels of decomposed tracks are each changed according to a respective transition function associated to each of the decomposed signals. At least the first transition function associated with the first decomposed signal is different from the second transition function associated with the second decomposed signal such that in a transition time interval the volume change of the first decomposed signal will be different from that of the second decomposed signal.
This allows reducing the proportion of certain sound components, which tend to create mixing problems when mixed with respective sound components of the second input track during the transition. For example, vocal components may be reduced in volume or even muted during a transition in order to avoid clashing of the vocals of two different songs, while at the same time other sound components, such as a drum component, which more easily mix with corresponding components of the second song can be maintained at a higher volume level in order to achieve an acoustic continuity throughout the transition.
As a result, the method according to the first aspect of the invention allows reducing abrupt changes of the sound or disharmony/dissonances induced by mixing certain sound components of two different audio tracks, while at the same time mixing other sound components which are more suitable to be played together, such as to achieve a smooth and continuous transition between two audio tracks.
In a preferred embodiment of the present invention, a method of the first aspect may further comprise the steps of decomposing the second input signal to obtain a plurality of decomposed signals comprising at least a third decomposed signal and a fourth decomposed signal different from the third decomposed signal, assigning a third volume level to the third decomposed signal and a fourth volume level to the fourth decomposed signal, starting playback of the second output signal obtained from recombining at least the third decomposed signal and the fourth decomposed signal, while playing the second output signal, increasing the third volume level according to a time-dependent third transition function and increasing the fourth volume level according to a time-dependent fourth transition function different from said third transition function, until the second output signal substantially equals the second input signal.
In this way, not only the fading out of the first input signal but also the fading in of the second input signal can be controlled on the basis of specific sound components or timbres such as to make the transition even smoother and continuous. For example, the volume of a decomposed drum signal may be increased more quickly as it has a lower tendency to clash with the decomposed drum signal of the first input signal, whereas the decomposed vocal signal of the second input signal may be faded in at a later point in time or by using a transition function beginning with a lower slope in order to avoid clashing with the decomposed vocal signal of the first input signal.
Each of the transition functions preferably assigns a predetermined volume level or a predetermined change in volume level to each of a plurality of time frames within a transition time interval defined by a transition start time and transition end time or to each of a plurality of controller positions of a user control element. Transition functions may be embodied in digital format by a formula stored in a memory of an electronic device such that for each time or control position an associated volume level or a change in volume level can be calculated using the formula.
As an alternative to storing a formula of the transition function, a lookup table or prestored array can be used which stores predetermined values of volume levels or changes in volume level such that a volume level or change in volume level can be derived for each time frame or controller position by looking up the table or array. As a further alternative, transition functions may be represented by analog means such as a controllable resistor.
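The two digital representations described above can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the function names, the linear shape and the nearest-frame lookup are assumptions chosen for brevity.

```python
def linear_fade_out(t, t_start, t_end):
    """Formula-based transition function: 1.0 at t_start, 0.0 at t_end."""
    if t <= t_start:
        return 1.0
    if t >= t_end:
        return 0.0
    return 1.0 - (t - t_start) / (t_end - t_start)


def build_lookup_table(func, t_start, t_end, n_frames):
    """Precompute one volume level per time frame (n_frames must be >= 2)."""
    step = (t_end - t_start) / (n_frames - 1)
    return [func(t_start + i * step, t_start, t_end) for i in range(n_frames)]


def lookup_volume(table, t, t_start, t_end):
    """Return the prestored volume level of the frame nearest to time t."""
    pos = (t - t_start) / (t_end - t_start)
    pos = min(max(pos, 0.0), 1.0)  # clamp to the transition time interval
    return table[round(pos * (len(table) - 1))]


# Hypothetical 8-second transition sampled at 9 frames (one per second).
table = build_lookup_table(linear_fade_out, 0.0, 8.0, 9)
```

The formula variant trades memory for computation, whereas the table variant allows arbitrary hand-drawn curves at the cost of a fixed resolution.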
Preferably, at least two of the transition functions, more preferably all of the first to fourth transition functions, are based on time frames and have the same transition time interval reaching from the same transition start time to the same transition end time such that the transition can be carried out within a predetermined time interval using more than one transition function for more than one decomposed signal. The first transition function and the second transition function are preferably defined such that the volume level is at a maximum at the transition start time and at a minimum at the transition end time, such that the first output signal is continuously faded out and can finally be stopped completely when the transition to the second output signal is completed. A minimum volume level herein preferably refers to a 0 % volume level or substantial silence. Note that in the present disclosure a 0 % volume level or substantial silence includes playback of an audio signal at a volume level below an auditory threshold such that it cannot be heard any more by a user during playback, and it further includes a complete stop of the playback of an audio signal.
Likewise, the third transition function and the fourth transition function may be defined such that the volume level is at a minimum, in particular corresponding to substantial silence, at a transition start time and at a maximum at a transition end time in order to allow continuous fade-in of the second output signal from silence to maximum.
According to embodiments of the present invention, the shapes of the transition functions can be set in order to achieve certain effects for certain decomposed signals and for controlling the transition. In particular, at least one of the transition functions may be a linear function or contain a linear portion. Linear fade-ins or fade-outs are relatively easy to realize technically and correspond to sound developments the user is used to hearing in conventional mixes, for example at the end of songs.
At least one of the transition functions may be a continuous function, such that unexpected sudden changes of the volume level can be avoided. In addition or alternatively, at least one of the transition functions may be a monotonic function such that the volume level does not change its direction with regard to increasing or decreasing throughout the transition time interval or throughout the controller range. In this way, the user gets an impression of a seamless, continuous transition from the first output signal towards the second output signal.
As mentioned above, improved transitions between audio tracks can be achieved according to the present invention by using different volume changes for different sound components of the tracks, i.e. different transition functions for different decomposed signals of the tracks. In one embodiment, the first transition function and the second transition function may differ from each other with regard to slope. Likewise, the third transition function and the fourth transition function may differ from each other with regard to slope. This means that for example the decomposed vocal signal of the first input signal may be faded out more quickly in order to give way for the decomposed vocal signal of the second input signal, whereas the decomposed drum signal of the first input signal remains in the mix more prominently for a longer time and mixes with the decomposed drum signal of the second input track over a considerable portion of the transition time interval or controller range.
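As a hedged illustration of transition functions that differ in slope, the sketch below fades the vocals of the first track out within the first half of the transition while its drums fade out over the whole interval, and mirrors this for the second track. The normalized `progress` variable (0 at the transition start, 1 at the transition end), the helper names and the 0.5 breakpoints are assumptions, not values taken from the disclosure.

```python
def clamp01(x):
    """Restrict x to the range [0.0, 1.0]."""
    return min(max(x, 0.0), 1.0)


def fade_out(progress, end_at=1.0):
    """Linear fade-out that reaches silence at progress == end_at."""
    return clamp01(1.0 - clamp01(progress) / end_at)


def fade_in(progress, start_at=0.0):
    """Linear fade-in that leaves silence at progress == start_at (< 1.0)."""
    return clamp01((clamp01(progress) - start_at) / (1.0 - start_at))


def stem_levels(progress):
    """Volume level of each decomposed signal at a given transition progress."""
    return {
        "vocal_a": fade_out(progress, end_at=0.5),   # steeper slope
        "drum_a": fade_out(progress, end_at=1.0),    # shallower slope
        "vocal_b": fade_in(progress, start_at=0.5),  # enters late
        "drum_b": fade_in(progress, start_at=0.0),   # enters immediately
    }
```

All four functions are continuous and monotonic, so each decomposed signal moves in one direction only during the transition.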
In general, all types of decomposing algorithms can be used for decomposing the first and/or second input signal. Different algorithms, for example algorithms as known in the art and mentioned above, achieve different results with respect to quality of the decomposition and speed of processing. Preferably, in embodiments of the present invention the step of decomposing includes processing the first audio signal and/or the second audio signal within an AI system comprising a trained neural network. AI systems achieve a high level of quality and in particular allow decomposing different timbres of a mixed audio signal, which in particular may correspond to or resemble certain source tracks that were originally mixed when producing or generating the input audio track, such as certain instrumental tracks, vocal tracks, drum tracks etc. More particularly, the step of decomposing may include decomposing the first/second audio signal with regard to predetermined timbres such as to obtain decomposed signals of different timbres, preferably being selected from the group consisting of a vocal timbre, a non-vocal timbre, a drum timbre, a non-drum timbre, a harmonic timbre, a non-harmonic timbre, and any combination thereof. The non-vocal timbre, the non-drum timbre and the non-harmonic timbre may in particular be respective complement signals to that of the vocal timbre, the drum timbre and the harmonic timbre. Complement signals may be obtained by excising from the input signal one decomposed signal of a specific timbre. For example, an input signal may be decomposed or separated into two decomposed signals, a decomposed vocal signal of a vocal timbre, and its complement, a decomposed non-vocal signal of a non-vocal timbre, which means that a mixture of the decomposed vocal signal and the decomposed non-vocal signal results in a signal substantially equal to the input signal.
Alternatively, decomposition can be carried out to obtain a decomposed vocal track and a plurality of decomposed nonvocal tracks such as a decomposed drum track and a decomposed harmonic track (including harmonic instruments such as guitars, piano, synthesizer).
In a preferred embodiment of the present invention, the first decomposed signal and the third decomposed signal are different signals of a vocal timbre, wherein the second decomposed signal and the fourth decomposed signal are different signals of a non-vocal timbre, and/or at least at a transition reference time or a controller reference position a sum of the first transition function and the third transition function is smaller than a sum of the second transition function and the fourth transition function. In this manner, the sum of the decomposed vocal signals is smaller during the transition, in particular at least at a transition reference time or a controller reference position, than a sum of the decomposed non-vocal signals. This reduces the mixture of the vocals of the different input signals (avoiding clashing of vocals of different songs), while keeping continuity of the playback during the transition because of the higher volume of the decomposed non-vocal signals of both input signals.
In a further embodiment of the present invention, the first decomposed signal and the third decomposed signal are different signals of a drum timbre, wherein the second decomposed signal and the fourth decomposed signal are different signals of a non-drum timbre, and/or at least at a transition reference time or at a controller reference position (for example a controller center position), a sum of the first transition function and the third transition function is larger than a sum of the second transition function and the fourth transition function. With this feature, a mixture of the decomposed drum signals of both input signals is achieved with relatively high volume level throughout the transition time interval or throughout the controller range, such that the drum beat continuously moves on throughout the transition time interval or throughout the controller range to ensure a feeling of continuity and to avoid any undesired breaks of the rhythm.
In a further preferred embodiment of the present invention, the first decomposed signal and the third decomposed signal are different signals of a non-drum timbre, a vocal timbre or a harmonic timbre, and/or a sum of the first transition function and the third transition function has a minimum, preferably substantially zero volume level, between the transition start time (T1) and the transition end time (T3) or between the controller end positions. In this way, decomposed signals which have a tendency to induce disharmony or dissonances when mixed together are controlled in such a manner that at the time they have about the same volume level, i.e. somewhere in the middle region of the transition time interval or the controller range, for example at a transition reference time or a controller reference position (for example a controller center position), their overall volume level (the sum of both volume levels) is minimal, such that the contribution of the possibly problematic mixture of the two decomposed signals is reduced to a minimum and the mixture of the remaining decomposed signals which mix more easily will dominate the sound at this point in time.
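The sum conditions of the preceding embodiments can be checked numerically. The sketch below assumes piecewise-linear vocal functions that reach silence at the center of the transition and linear drum functions spanning the whole interval. The curves, the grid search and the choice of 0.5 as reference position are illustrative assumptions only.

```python
def vocal_a(p):
    """First-track vocals: fade out during the first half of the transition."""
    return max(0.0, 1.0 - 2.0 * p)


def vocal_b(p):
    """Second-track vocals: fade in during the second half of the transition."""
    return max(0.0, 2.0 * p - 1.0)


def drum_a(p):
    """First-track drums: linear fade-out over the whole transition."""
    return 1.0 - p


def drum_b(p):
    """Second-track drums: linear fade-in over the whole transition."""
    return p


def argmin_sum(f, g, steps=100):
    """Return (progress, value) where f + g is smallest on a uniform grid."""
    grid = [i / steps for i in range(steps + 1)]
    best = min(grid, key=lambda p: f(p) + g(p))
    return best, f(best) + g(best)


reference = 0.5  # assumed transition reference time / controller center
vocal_sum = vocal_a(reference) + vocal_b(reference)
drum_sum = drum_a(reference) + drum_b(reference)
```

With these curves the vocal sum is zero at the reference position while the drum sum stays at full level, so the drums carry the continuity of the mix at the point where both vocals would otherwise overlap.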
In a further embodiment of the present invention, the method further includes a step of analyzing an audio signal, preferably at least one of the decomposed signals, to determine a song part junction between two song parts within the first input audio track or within the second input audio track, wherein a transition time interval of at least one of the transition functions is set such as to include the song part junction. Song parts of a song are usually distinguishable by an analyzing algorithm since they differ in several characteristics such as instrumental density, medium pitch or rhythmic pattern. Song parts may in particular be a verse, a chorus, a bridge, an intro or an outro as conventionally known. Certain instrumental or rhythmic patterns will remain constant within a song part and will change in the next song part.
Recognition of song parts may be supported by analyzing not only the entire input signal but instead or in addition thereto at least one of the decomposed signals. For example, by analyzing a decomposed bass signal in isolation from the remaining sound components, it will be easy to derive therefrom a chord progression of the song which is one of the key criteria to differentiate song parts. Furthermore, an analysis of the decomposed drum signals allows a more accurate recognition of a rhythmic pattern and thus a more accurate detection of certain song parts. A song part junction then refers to a junction between one song part and the next song part.
According to the embodiment described above, transition time intervals may include song part junctions, which allows the transition between two songs to be carried out at the end of a song part and thus further improves the smoothness and likeability of the transition.
Song parts may be detected by analyzing at least one of the decomposed signals within an AI system comprising a trained neural network. Preferably, such analyzing includes detecting silence within the decomposed signal, said silence preferably representing an audio signal having a volume level smaller than -30 dB. In particular, the step of analyzing decomposed signals may include detecting silence continuously extending over a predetermined time span within the decomposed signal, said silence preferably representing an audio signal having a volume level smaller than -30 dB. Thus, in embodiments of the invention start and/or end points of silence may be taken as song part junctions.
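A simple, non-AI sketch of the silence detection described above is given below. It assumes that the -30 dB threshold is measured as frame-wise RMS level relative to full scale and that samples are floats in the range -1.0 to 1.0. The disclosure does not state the reference level, the frame size or the minimum time span, so these are hypothetical parameters.

```python
import math


def frame_levels_db(samples, frame_size):
    """RMS level in dB (relative to full scale) for each complete frame."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / frame_size)
        levels.append(20.0 * math.log10(rms) if rms > 0.0 else float("-inf"))
    return levels


def find_silences(samples, sample_rate, frame_size=1024,
                  threshold_db=-30.0, min_duration=1.0):
    """Return (start, end) times in seconds of sufficiently long silences."""
    levels = frame_levels_db(samples, frame_size)
    silences, run_start = [], None
    # A trailing loud sentinel frame closes a silence that runs to the end.
    for i, level in enumerate(levels + [0.0]):
        if level < threshold_db:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) * frame_size / sample_rate >= min_duration:
                silences.append((run_start * frame_size / sample_rate,
                                 i * frame_size / sample_rate))
            run_start = None
    return silences
```

The start and end times returned for, say, a decomposed vocal signal could then serve as candidate song part junctions.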
In a further embodiment of the present invention, the method further includes the steps of receiving a user input referring to a transition command, including at least one transition parameter, and setting at least one of the transition functions according to the transition parameter. This allows a user to control when and/or how the transition is played. For example, the transition parameter may be a transition start time or a transition end time of a transition time interval of at least one of the transition functions, or may be a length of a transition time interval of at least one of the transition functions. This allows a user to control when the transition is to be carried out and how long it takes. A user may also control at which position in the song a transition is to be performed by choosing only one transition parameter such as a transition reference time of at least one of the transition functions. In this case, either the location of the transition start time relative to the transition reference time and the length of the transition time interval, or the locations of both the transition start time and the transition end time relative to the transition reference time, should be preset values. Furthermore, the transition parameter may refer to a slope, shape or offset of at least one of the transition functions, which allows a user to control the dynamics of the transition for one or more decomposed signals.
As a further alternative or additional option, a transition parameter to be controlled by a user input may refer to an assignment or de-assignment of a preset transition function to or from a selected one of the plurality of decomposed signals. In this way, a user may select one or more decomposed signals to take part in the transition which are then submitted to the volume changes according to the respective transition functions. The transition function assigned to a certain decomposed signal may be selected from one of a set of preset transition functions (sets of different transition time interval lengths or sets of different transition functions having different slope, shape or offset).
In a further embodiment of the present invention, the method may comprise the steps of
- determining at least one tempo parameter of the first and/or second input track, in particular a BPM (beats per minute) and/or a beat grid and/or a beat phase of the first and/or second input track and
- a tempo matching processing based on the determined tempo parameter, including a time stretching and/or time shifting and/or resampling of audio data obtained from the first input track and/or the second input track, such that the first output signal and the second output signal have mutually matching BPM and/or mutually matching beat phases.
As a result, first and second output signals will have matching tempi thus enhancing continuity of the playback during the transition.
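The arithmetic behind such tempo matching can be sketched as follows, assuming the BPM values and the times of the first beats have already been determined by some beat detector. The function names and the convention of shifting the second track are assumptions for illustration.

```python
def stretch_ratio(source_bpm, target_bpm):
    """Playback-speed factor that makes a source track play at target_bpm."""
    return target_bpm / source_bpm


def phase_offset_seconds(first_beat_a, first_beat_b, bpm):
    """Smallest time shift of track B that aligns its beats with track A.

    Both beat times are in seconds and both tracks are assumed to already
    run at the common tempo `bpm`. A positive result delays track B.
    """
    beat = 60.0 / bpm
    diff = (first_beat_a - first_beat_b) % beat
    return diff if diff <= beat / 2.0 else diff - beat
```

For example, a 120 BPM track matched to a 126 BPM track has to be played 1.05 times faster, after which a time shift of less than half a beat suffices to align the beat phases.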
In a further embodiment of the present invention, the method may further comprise the steps of
- determining a key of the first and/or second input track and
- a key matching processing based on the determined key, including a pitch shifting of audio data obtained from the first input track and/or the second input track, such that the first output signal and the second output signal have mutually matching keys.
Thus, an unexpected pitch shift or change in key during transition can be avoided which enhances continuity and smoothness of the transition.
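A minimal sketch of the key matching arithmetic is shown below. It assumes both keys share the same mode (both major or both minor), are written with sharps only, and that pitch shifting follows twelve-tone equal temperament. Key detection itself is outside the scope of the sketch, and all names are hypothetical.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]


def semitone_shift(source_key, target_key):
    """Smallest shift in semitones (range -5..+6) from source to target key."""
    diff = (PITCH_CLASSES.index(target_key)
            - PITCH_CLASSES.index(source_key)) % 12
    return diff - 12 if diff > 6 else diff


def pitch_ratio(semitones):
    """Frequency ratio of a shift by the given number of semitones."""
    return 2.0 ** (semitones / 12.0)
```

Choosing the smallest shift keeps the pitch-shifted track as close as possible to its original sound.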
In general, the method of the present invention can be applied to any type of input audio track. For example, the input audio track may be stored on a local device such as a storing means of a computer, and may be present as a digital audio file. Furthermore, the first input audio track or the second input track may be received as a continuous stream, for example a data stream received via Internet, a real-time audio stream received from a live audio source or from a playback device in playback mode. Thus, the range of applications is basically not limited to a specific medium. When receiving the first/second input audio track as a continuous stream, playback of the first output signal and/or second output signal may be started while continuing to receive the continuous stream. This has particular advantages in many situations where the audio tracks do not have a certain length or playback duration as the length is either unlimited or undefined, for example in case of processing signals from a live concert or live broadcasting. Furthermore, it is not necessary to wait until a certain audio file is completely downloaded or received or until a certain audio track has completely been played by the playback device, but instead playback of the output signals based on the received input signals can be started earlier.
In another preferred embodiment of the present invention, decomposing the first and/or second input signal is carried out segment-wise, wherein decomposing is carried out based on a first segment of the input signal such as to obtain a first segment of the decomposed signal, and wherein decomposing of a second segment of the input signal is carried out while playing the first segment of the decomposed signal. Partitioning the first and/or second input signals into segments (preferably segments of equal lengths) and operating the method of the invention based on these segments allows using the decomposition result for playing the transition at an earlier point in time, i.e. after finishing decomposition of just one segment, without having to wait until the decomposition result of an entire audio file, for example, is available. Another advantage of the segmentation is that decomposition of the second input signal can start at an arbitrary point within the second input audio track. For example, when an optimal transition start point for the second input audio file is determined to be at 01:20 (one minute, twenty seconds), the decomposition can start at the segment closest to 01:20, and the beginning part which is not used does not have to be decomposed. This saves processing resources and ensures that decomposition results are available much faster. Preferably, one segment has a playback duration which is smaller than 20 seconds.
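The segment bookkeeping described above may be sketched as follows. The sketch assumes equal-length segments and picks the segment that contains the desired start point. The 10-second segment length is an arbitrary value below the preferred 20-second limit, and the function names are hypothetical.

```python
def segment_index(start_seconds, segment_seconds):
    """Index of the equal-length segment that contains start_seconds."""
    return int(start_seconds // segment_seconds)


def segments_to_decompose(track_seconds, start_seconds, segment_seconds):
    """Yield (index, seg_start, seg_end) from the start segment onwards.

    Segments before the start point are skipped entirely, so they never
    have to be decomposed.
    """
    index = segment_index(start_seconds, segment_seconds)
    seg_start = index * segment_seconds
    while seg_start < track_seconds:
        seg_end = min(seg_start + segment_seconds, track_seconds)
        yield index, seg_start, seg_end
        index += 1
        seg_start = seg_end


# Hypothetical 200-second track, transition start at 01:20 (80 seconds).
plan = list(segments_to_decompose(200, 80, 10))
```

A player could decompose the first yielded segment, start playback, and decompose each subsequent segment while the previous one is playing.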
The method steps, in particular the steps of providing the first and second input signals, decomposing the first input signal, starting playback of the first output signal and starting playback of the second output signal, may be carried out in a continuous process, wherein a time shift between receiving the first input audio track or a first portion of a continuous stream of the first input audio track and starting playback of the first output signal is preferably less than 10 seconds, more preferably less than 2 seconds, and/or wherein a time shift between receiving the second input audio track or a first portion of a continuous stream of the second input audio track and starting playback of the second output signal is preferably less than 10 seconds, more preferably less than 2 seconds.
In a further embodiment of the present invention, at least one, preferably all of the first and second input signals, the decomposed signals and the first and second output signals represent stereo signals, each comprising a left channel signal portion and a right channel signal portion, respectively. The method is thus suitable for playing music at high quality.
According to a second aspect of the present invention, the above-mentioned object is solved by a device for processing audio signals, comprising:
- a first input unit providing a first input signal of a first input audio track and a second input unit providing a second input signal of a second input audio track, a decomposition unit configured to decompose the first input audio signal to obtain a plurality of decomposed signals, comprising at least a first decomposed signal and a second decomposed signal different from the first decomposed signal, a playback unit configured to start playback of a first output signal obtained from recombining at least the first decomposed signal at a first volume level with the second decomposed signal at a second volume level, such that the first output signal substantially equals the first input signal,
- a transition unit for performing a transition between playback of the first output signal and playback of a second output signal obtained from the second input signal, wherein the transition unit has a volume control section adapted for reducing the first volume level according to a first transition function and reducing the second volume level according to a second transition function different from said first transition function.
Such a device includes several units carrying out method steps as described above for the first aspect of the present invention. Furthermore, in embodiments of the device of the second aspect of the invention, further units or other device features may be implemented which are configured to carry out methods or method features of any of the above-described embodiments of the first aspect of the present invention. Reference is thus made to the description above of the first aspect of the present invention, as the device of the second aspect of the present invention can achieve the corresponding technical effects and advantages.
A device of the second aspect of the present invention is preferably embodied as a computer running a suitable software application. In particular, the software application may be configured to carry out a method according to the first aspect of the present invention. The computer may be a personal computer, a tablet computer or a smartphone, and may include, in the conventionally known manner, a RAM, a ROM, a microprocessor and suitable input/output means. Included in the computer or connected to the computer may be an audio interface which may be connected, for example wirelessly (e.g. via Bluetooth or similar technology), to speakers, headphones or a PA system in order to output sound when playing the first and second output signals, respectively. As a further alternative, the device may be embodied as a standalone DJ device including suitable electronic hardware or computing means.
If the device uses an AI system for decomposing audio data, the device preferably has a decomposition unit which includes the AI system comprising a trained neural network. This means that the complete AI system including the trained neural network may be integrated within the device, for example as a software application or software plugin running locally in a memory integrated within the device. Furthermore, the device preferably includes a user interface embodied by either a display such as a touch display or a display to be operated by a pointer device, or as one or more hardware control elements such as a hardware fader or rotatable hardware knobs, or by a voice command or by any other user input/output technology.
Preferred embodiments of the present invention will be described in the following on the basis of the attached drawings, wherein
Fig. 1 shows a device according to an embodiment of the present invention,
Fig. 2 shows a schematic functional diagram of components of the device of the embodiment shown in Fig. 1, and
Figs. 3a-3c show transition functions for decomposed tracks as used in the device of the embodiment of the invention as shown in Figs. 1 and 2 and according to a method of an embodiment of the invention.
A device 10 according to an embodiment of the present invention may be formed by a computer such as a tablet computer or a smartphone, which comprises standard hardware components such as input/output ports, wireless connectivity, a housing, a touchscreen, an internal storage as well as a plurality of microprocessors, RAM and ROM. Essential features of the present invention are implemented in device 10 by means of a suitable software application or a software plugin running on device 10.
The display of device 10 preferably has a first section 12a associated to a first song A and a second section 12b associated to a second song B. First section 12a includes a first waveform display region 14a which displays at least one graphical representation of song A, in particular one or more waveform signals associated to song A. For example, the first waveform display region 14a may display a waveform of song A and/or one or more waveforms of decomposed signals obtained from decomposing song A. For example, decomposition of song A may be carried out to obtain a decomposed drum signal, a decomposed vocal signal and a decomposed harmonic signal, which may be displayed within the first waveform display region 14a. Likewise, a second waveform display region 14b may be included in the second section 12b such as to display a graphical representation related to song B in the same or corresponding manner as described above for song A. Thus, the second waveform display region 14b may display one or more waveforms of song B and/or at least one waveform of a decomposed signal obtained from song B.
Furthermore, first and second waveform display regions 14a, 14b may each display a play-head 16a, 16b, respectively, which shows a current playback position within song A and song B, respectively. Each of the first and second sections 12a and 12b may further include a number of control elements for controlling playback, effects and other features related to song A and song B, respectively. For example, the first section 12a may include a play button 18a which can be pushed by a user to alternately start and stop playback of song A (more precisely audio signals obtained from song A, such as decomposed signals). Likewise, the second section 12b may include a play button 18b which may be pushed by a user to alternately start and stop playback of song B (more precisely audio signals obtained from song B, such as decomposed signals).
An output signal generated by device 10 in accordance with the settings of device 10 and with a control input received from a user may be output at an output port 20 in digital or analog format, such as to be transmitted to a further audio processing unit or directly to a PA system, speakers or headphones. Alternatively, the output signal may be output through internal speakers of device 10.
According to the present invention, device 10 can perform a smooth transition from playback of song A to playback of song B by virtue of a transition unit, which will be explained in more detail below. In the present embodiment, device 10 may comprise a transition button 22 displayed on the display of device 10, which may be pushed by a user to initiate a transition from playback of song A towards playback of song B. By a single operation of transition button 22 (pushing the button 22), device 10 starts changing individual volumes of individual decomposed signals of songs A and B according to respective transition functions such as to smoothly cross-fade from song A to song B within a predetermined transition time interval.
In addition or alternatively, device 10 may include a transition controller 24 which can be moved by a user between one controller end point referring to a playback of only song A and a second controller end point referring to playback of only song B. This allows controlling the volumes of individual decomposed signals of songs A and B using transition functions, which are based not on time but on controller position of the transition controller 24. In this manner, in particular the speed and progress of the transition can manually be controlled through the transition controller 24.
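The two ways of driving a transition, by elapsed time after pushing transition button 22 or by the position of transition controller 24, can both be reduced to a normalized progress value that is then fed into the transition functions. The following sketch assumes this normalization. The function names and the 0..127 controller range in the usage example are hypothetical.

```python
def clamp01(x):
    """Restrict x to the range [0.0, 1.0]."""
    return min(max(x, 0.0), 1.0)


def progress_from_time(now, start_time, duration):
    """Progress of a button-triggered transition of fixed duration."""
    return clamp01((now - start_time) / duration)


def progress_from_controller(position, minimum=0.0, maximum=1.0):
    """Progress derived from a controller between its two end points."""
    return clamp01((position - minimum) / (maximum - minimum))
```

For instance, `progress_from_time(12.0, 10.0, 8.0)` and `progress_from_controller(32, 0, 128)` both yield 0.25, so the same set of transition functions can serve the automatic and the manual mode.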
Fig. 2 shows a schematic illustration of internal components of device 10 and a signal flow within device 10. Audio processing is based on a first input track and a second input track, which may be stored within the device 10, for example in an internal memory of the device, a hard drive or any other storage medium. First and second input tracks are preferably digital audio files of a standard compressed or uncompressed audio file format such as mp3, WAV, AIFF or the like. Alternatively, first and second input tracks may be received as continuous streams, for example via an Internet connection of device 10 or from an external playback device via an input audio interface or via a microphone. First and second input tracks are preferably processed within first and second input units 26a and 26b, respectively, which may be configured to decrypt or decompress the audio data, if necessary, and/or may be configured to extract a segment of the first input track and a segment of the second input track in order to continue processing based on the segments. This has an advantage that time-consuming processing algorithms, such as the decomposition based on a neural network, will not have to analyze the entire first or second input track upfront, but will perform processing based on shorter segments, which allows continuing processing and eventually start playback at an earlier point in time. In addition, in case of receiving the first and second input tracks as continuous streams, it would in many cases not be feasible to wait until the complete input tracks are received before starting to process the data.
The output of the first and second input units 26a, 26b, for example the segments of the first and second input tracks, form first and second input signals, and they are input into first and second AI systems 28a, 28b of a decomposition unit 40. Each AI system 28a, 28b includes a neural network trained to decompose the first and second input signals, respectively, with respect to sound components of different timbres. Decomposition unit 40 thus decomposes the first input signal to obtain a first group of decomposed signals and decomposes the second input signal to obtain a second group of decomposed signals. In the present example, each group of decomposed signals includes a decomposed drum signal, a decomposed vocal signal and a decomposed harmonic signal, which each form a complete set of decomposed signals or a complete decomposition, which means that a sum of all decomposed signals of the first group will resemble the first input signal, and the sum of all decomposed signals of the second group will resemble the second input signal.
It should be noted that although in the present embodiment two AI systems 28a, 28b are used, decomposition unit 40 may also include only one AI system and only one neural network, which is trained and configured to determine all decomposed signals of the first input signal as well as all decomposed signals of the second input signal. As a further alternative, more than two AI systems may be used, for example a separate AI system and a separate neural network may be used to generate each of the decomposed signals.
All decomposed signals, in particular both groups of decomposed signals, are then input into a playback unit 42 in order to generate an output signal for playback. Playback unit 42 comprises a transition unit 44, which is basically adapted to recombine the decomposed signals of both groups taking into account specific volume levels associated with each of the decomposed signals. Transition unit 44 is configured to recombine the decomposed signals in such a manner as to either play only a first output signal obtained from a sum of all decomposed signals of the first input signal, or a second output signal obtained from a sum of all decomposed signals of the second input signal, or any transition in between the first and the second output signals where decomposed signals of both first and second input signals are played. In particular, transition unit 44 stores individual transition functions DA, VA, HA, DB, VB, HB for each of the decomposed signals which each define a specific volume level for each time frame within a transition interval or for each controller position of the transition controller within a controller range. Taking into account the respective volume levels according to the respective transition functions DA, VA, HA, DB, VB, HB, all decomposed signals will then be recombined to obtain the output signal.
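By way of illustration only, the recombination performed by transition unit 44 may be understood as a weighted sum of the six decomposed signals. The following sketch assumes that each decomposed signal is a list of samples of equal length and that volume levels are expressed as gains between 0.0 and 1.0; the function name recombine and the sample values are hypothetical.

```python
from typing import Dict, List


def recombine(stems: Dict[str, List[float]], levels: Dict[str, float]) -> List[float]:
    """Sum all decomposed signals sample by sample, each scaled by its volume level."""
    length = len(next(iter(stems.values())))
    out = [0.0] * length
    for name, signal in stems.items():
        if len(signal) != length:
            raise ValueError("all decomposed signals must have the same length")
        gain = levels[name]  # volume level in the range 0.0 to 1.0
        for i, sample in enumerate(signal):
            out[i] += gain * sample
    return out


# Hypothetical two-sample decomposed signals, keyed by the names of their
# transition functions (drum, vocal, harmonic of song A and of song B).
stems = {
    "DA": [0.5, 0.25], "VA": [0.25, 0.25], "HA": [0.125, 0.5],
    "DB": [0.5, 0.5], "VB": [0.25, 0.125], "HB": [0.125, 0.25],
}

# Only song A audible: the output is the sum of the decomposed signals of song A.
only_a = {"DA": 1.0, "VA": 1.0, "HA": 1.0, "DB": 0.0, "VB": 0.0, "HB": 0.0}
output_a = recombine(stems, only_a)

# One possible mid-transition state: drums of both songs at half level,
# vocals silent, harmonic signal of song B at a low level.
mid = {"DA": 0.5, "VA": 0.0, "HA": 0.0, "DB": 0.5, "VB": 0.0, "HB": 0.25}
output_mid = recombine(stems, mid)
```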
Playback unit 42 may further include a control unit 45, which is adapted to control at least one of the transition functions DA, VA, HA, DB, VB, HB based on a user input.
The output signal generated by playback unit 42 may then be routed to an output audio interface 46 for a sound output. At any location within the signal flow, one or more sound effects may be inserted into the audio signal by means of one or more effect chains 48. In the present example, effect chain 48 is located between playback unit 42 and output audio interface 46.
Figs. 3a to 3c show examples of transition functions that may be used in transition unit 44 to set specific volume levels of individual decomposed signals depending on time. The example transition functions are based on time (time dependent transition functions), thus the transition is performed within a transition time interval reaching from a transition start time T1 to a transition end time T3. At an intermediate point in time, for example in the center of the transition time interval, a time T2 is referred to as a transition reference time.
As shown in Fig. 3a, a transition function DA of the decomposed drum signal of song A starts at 100 % at T1 and decreases linearly to 0 % at T3, while the transition function DB of the decomposed drum signal of song B starts at 0 % at T1 and increases linearly to reach 100 % at T3. The linear transition functions DA and DB intersect at T2. It can be seen that a sum of DA+DB equals 100 % throughout the transition time interval from T1 to T3. Thus, the overall volume level of all drums remains constant during the transition as well as before and after the transition such as to achieve a high level of audible continuity.
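By way of illustration only, the linear drum transition functions DA and DB described with reference to Fig. 3a might be written as follows. Volume levels are expressed as fractions (1.0 corresponding to 100 %), and the function names and the example times are assumptions made for this sketch.

```python
def progress(t: float, t1: float, t3: float) -> float:
    """Fraction of the transition time interval elapsed at time t, clamped to [0, 1]."""
    if t3 <= t1:
        raise ValueError("transition end time must be after transition start time")
    return max(0.0, min(1.0, (t - t1) / (t3 - t1)))


def drum_a(t: float, t1: float, t3: float) -> float:
    """DA: 100 % at T1, decreasing linearly to 0 % at T3."""
    return 1.0 - progress(t, t1, t3)


def drum_b(t: float, t1: float, t3: float) -> float:
    """DB: 0 % at T1, increasing linearly to 100 % at T3."""
    return progress(t, t1, t3)


# Hypothetical transition interval of eight seconds with T2 in its center.
T1, T3 = 0.0, 8.0
T2 = (T1 + T3) / 2.0
```

With these definitions DA and DB intersect at T2, and their sum stays at 1.0 before, during and after the transition interval.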
Fig. 3b shows transition functions of decomposed vocal signals of songs A and B. In the present embodiment, the transition function VA of the decomposed vocal signal of song A starts at 100 % at T1 and decreases linearly to reach 0 % in a middle region of the transition time interval, for example at the transition reference time T2. Afterwards, the transition function VA remains constant at 0 % until T3, i.e. in the interval between T2 and T3. On the other hand, the transition function VB of the decomposed vocal signal of song B starts at 0 % at T1 and remains constant at 0 % until a middle region of the transition time interval, in particular until T2, and afterwards increases linearly to reach 100 % at T3. As can be seen in Fig. 3b, a sum of the transition functions VA+VB reaches the minimum in the middle region of the transition time interval, in particular at T2, and specifically becomes 0 %. In other words, the volume level of the decomposed vocal signal of song B starts rising only after the volume level of the decomposed vocal signal of song A has dropped to 0 %. In this way, any clashing of the vocals of songs A and B can be avoided.
As can be seen in Fig. 3c, transition functions of decomposed harmonic signals (for example instrumental components) are again different from the transition functions of the decomposed vocal signals and the decomposed drum signals, respectively. In a specific example, the transition function HA of the decomposed harmonic signal of song A starts at 100 % at T1 and reduces in a linear manner, but with a steeper slope as compared to the transition function VA of the decomposed vocal signal of song A, such as to reach 0 % at a time before transition function VA reaches 0 %, specifically before T2. After reaching 0 %, transition function HA remains constant at 0 % until T3.
Furthermore, transition function HB of the decomposed harmonic signal of song B rises continuously and monotonically from 0 % at T1 to 100 % at T3, but not in a linear manner but in a curved manner, for example a parabolic or exponentially curved manner. Thus, a slope of transition function HB is increasing from T1 to T3.
As can be seen in Fig. 3c, a mixture of the decomposed harmonic signals of songs A and B is again avoided or substantially reduced, because the substantial increase of the volume level of the decomposed harmonic signal of song B starts only after the volume level of the decomposed harmonic signal of song A has reached 0 %.
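By way of illustration only, the vocal and harmonic transition functions described with reference to Figs. 3b and 3c might be sketched as follows. Volume levels are expressed as fractions (1.0 corresponding to 100 %). The time T_HA_ZERO at which HA reaches 0 % and the choice of a parabola for HB are assumptions of this sketch; the description only requires that HA reaches 0 % before T2 and that HB rises in a curved manner with increasing slope.

```python
def ramp(t: float, start: float, end: float) -> float:
    """Linear ramp from 0 at start to 1 at end, clamped to [0, 1]."""
    if end <= start:
        raise ValueError("end must be after start")
    return max(0.0, min(1.0, (t - start) / (end - start)))


def vocal_a(t: float, t1: float, t2: float) -> float:
    """VA: 100 % at T1, linear down to 0 % at T2, then constant at 0 %."""
    return 1.0 - ramp(t, t1, t2)


def vocal_b(t: float, t2: float, t3: float) -> float:
    """VB: constant at 0 % until T2, then linear up to 100 % at T3."""
    return ramp(t, t2, t3)


def harmonic_a(t: float, t1: float, t_zero: float) -> float:
    """HA: 100 % at T1, linear down to 0 % at t_zero (chosen before T2)."""
    return 1.0 - ramp(t, t1, t_zero)


def harmonic_b(t: float, t1: float, t3: float) -> float:
    """HB: parabolic rise from 0 % at T1 to 100 % at T3 (slope increases)."""
    p = ramp(t, t1, t3)
    return p * p


# Hypothetical times in seconds; T_HA_ZERO is an assumed value before T2.
T1, T2, T3 = 0.0, 4.0, 8.0
T_HA_ZERO = 3.0
```

In this sketch the sum VA+VB is zero at T2, so the two vocal signals never sound together, and HB has reached only a small fraction of full level at the time HA arrives at 0 %.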
It should be noted that although the transition functions shown in Figs. 3a to 3c are defined in relation to time within a transition time interval from T1 to T3, corresponding or other transition functions may likewise be defined with respect to the controller position of the transition controller 24 shown in Fig. 1. In particular, instead of reaching from T1 to T3, the horizontal axis of the transition functions may show the controller position reaching over the controller range from left end position to right end position.
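By way of illustration only, a controller position may be normalized to the controller range and then used in place of elapsed time. The numeric range from -1.0 (left end position) to 1.0 (right end position) and the function names are assumptions made for this sketch.

```python
from typing import Tuple


def controller_progress(position: float, left_end: float, right_end: float) -> float:
    """Map a controller position to a fraction of the controller range, clamped to [0, 1]."""
    if right_end == left_end:
        raise ValueError("controller range must not be empty")
    p = (position - left_end) / (right_end - left_end)
    return max(0.0, min(1.0, p))


def drum_levels(p: float) -> Tuple[float, float]:
    """Linear drum levels (DA, DB) as a function of the normalized position p."""
    return (1.0 - p, p)


# Hypothetical controller range from -1.0 (left end) to 1.0 (right end).
center = controller_progress(0.0, -1.0, 1.0)
levels_at_center = drum_levels(center)
```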
With reference again to Fig. 1, it should be noted that a user may initiate a transition according to the transition functions shown in Figs. 3a to 3c, for example by pushing the transition button 22. In particular, T1 may be set to the time at which the user pushes the transition button 22. Alternatively, the transition may be controlled by a user by an appropriate marking or selection within one of the first and second waveform display regions 14a, 14b or any other user input. For example, by clicking on a certain position in one of the waveforms displayed on one of the waveform display regions 14a, 14b, timing of a next transition can be set accordingly, for example any of the time points T1, T2 or T3 may be set at the specified position within the waveform corresponding to a certain future time point. Thus, when the playback reaches the specified point in time, the transition will be carried out using the respective transition functions for the respective decomposed signals. As a further alternative, device 10 may have stored a setting, for example a pre-stored setting or a setting that can be manipulated by a user, wherein the setting defines at least one condition for carrying out a transition from song A to song B or vice versa. For example, the setting may specify that at a certain point in time with respect to an end of one of songs A or B, a transition to the respective other song is commenced. For example, a transition from song A to song B may be started at a certain time period (for example 5 seconds) before the end of song A, such as to avoid any interruption of the playback when song A ends.
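By way of illustration only, a stored setting that starts a transition a fixed time before the end of the current song might be evaluated as follows. The function names and the representation of times in seconds are assumptions made for this sketch; the default of 5 seconds follows the example above.

```python
def auto_transition_start(track_length_s: float, lead_time_s: float = 5.0) -> float:
    """Transition start time T1, placed lead_time_s before the end of the track."""
    return max(0.0, track_length_s - lead_time_s)


def should_start_transition(playback_pos_s: float, track_length_s: float,
                            lead_time_s: float = 5.0) -> bool:
    """True once playback has reached the automatically determined start time."""
    return playback_pos_s >= auto_transition_start(track_length_s, lead_time_s)


# Hypothetical song A with a length of three minutes.
t1_song_a = auto_transition_start(180.0)
```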
In a further embodiment, device 10 may include means for determining characteristic song parts of songs A and/or B, such as a verse, a chorus, a bridge, an intro or an outro. A user may then choose to carry out a transition at a junction between two song parts, or device 10 may automatically carry out a transition at certain song part junctions and towards certain song part junctions of the other song, for example a transition from the beginning of an outro section of song A to an end of an intro section of song B.
A third and further aspects of the present invention are described by the following items:
1. Method for processing audio data, comprising the steps of providing a first audio track of mixed input data, said mixed input data representing an audio signal containing a plurality of different timbres, decomposing the mixed input data to obtain decomposed data representing an audio signal containing at least one, but not all, of the plurality of different timbres, providing a second audio track, analyzing audio data, including at least the decomposed data, to determine at least one mixing parameter, generating an output track based on the at least one mixing parameter, said output track comprising first output data obtained from the first audio track and second output data obtained from the second audio track.
2. Method of item 1, wherein the output track comprises a first portion containing predominantly the first output data, and a second portion arranged after said first portion and containing predominantly the second output data.
3. Method of item 2, wherein the step of analyzing audio data includes analyzing the decomposed data to determine a transition point as the mixing parameter, and wherein the output track is generated using the transition point such that the first portion is arranged before the transition point and the second portion is arranged after the transition point.
4. Method of item 3, wherein the output track further includes a transition portion arranged between the first portion and the second portion, and associated to the transition point, wherein in the transition portion a volume level of the first output data is reduced and/or a volume level of the second output data is increased.
5. Method of at least one of the preceding items, wherein the step of analyzing audio data includes determining at least one mixing parameter referring to at least one of: a tempo of the first and/or second audio track, a BPM (beats per minute) of the first and/or second audio track, a beat grid of the first and/or second audio track, a beat phase of the first and/or second audio track, a downbeat position within a first and/or second audio track, a beat shift between the first audio track and the second audio track, a key of the first and/or second audio track, a chord progression of the first and/or second audio track, a timbre or group of timbres of the first and/or second audio track, a song part junction of the first and/or second audio track.
6. Method of at least one of the preceding items, wherein the step of analyzing audio data includes detecting silence data within the decomposed data, said silence data preferably representing an audio signal having a volume level smaller than -30 dB.
7. Method of at least one of the preceding items, wherein the step of analyzing audio data includes detecting silence data continuously extending over a predetermined time span within the decomposed data, said silence data preferably representing an audio signal having a volume level smaller than -30 dB.
8. Method of at least one of the preceding items, wherein the step of analyzing audio data includes determining at least a first mixing parameter based on the decomposed data, and at least a second mixing parameter based on the first mixing parameter, said second mixing parameter preferably being the transition point.
9. Method of at least one of the preceding items, wherein the step of analyzing audio data includes determining a tempo of the first and/or second audio track as mixing parameter, and wherein the step of generating the output track comprises a tempo matching processing based on the determined tempo, including a time stretching or resampling of audio data obtained from the first audio track and/or the second audio track, such that the first output data and the second output data have mutually matching tempos.
10. Method of at least one of the preceding items wherein the step of analyzing audio data includes determining a key of the first and/or second audio track as mixing parameter, and wherein the step of generating the output track comprises a key matching processing including pitch shifting of audio data obtained from the first audio track and/or the second audio track, such that the first output data and the second output data have mutually matching keys.
11. Method of at least one of the preceding items, wherein the step of decomposing the mixed input data includes processing the mixed input data within an AI system comprising a trained neural network.
12. Method of at least one of the preceding items, wherein at least one of the steps of analyzing the audio data and generating the output track includes processing of audio data within an AI system comprising a trained neural network.
13. Method of at least one of the preceding items, further comprising playing the output track.
14. Device for processing audio data, preferably a device adapted to carry out a method according to at least one of the preceding items, said device comprising a first input unit for receiving a first audio track of mixed input data, said mixed input data representing an audio signal containing a plurality of different timbres, a second input unit for receiving a second audio track, a decomposition unit for decomposing the mixed input data to obtain decomposed data representing an audio signal containing at least one, but not all, of the plurality of different timbres, an analyzing unit for analyzing audio data, including at least the decomposed data, to determine at least one mixing parameter, an output generation unit for generating an output track based on the at least one mixing parameter, said output track comprising first output data obtained from the first audio track and second output data obtained from the second audio track.
15. Device of item 12, comprising a tempo matching unit adapted for time stretching or resampling of audio data obtained from the first audio track and/or the second audio track, such as to generate the first output data and the second output data having mutually matching tempos.
16. Device of item 12 or item 13, comprising a key matching unit adapted for pitch shifting of audio data obtained from the first audio track and/or the second audio track, such as to generate the first output data and the second output data having mutually matching keys.
17. Device of at least one of items 12 to 14, wherein at least one of the decomposition unit, the analyzing unit and the output generation unit includes an AI system comprising a trained neural network.
18. Device of at least one of items 12 to 15, comprising a playback unit for playing the output track.
19. Method for processing audio data, comprising the steps of providing an audio track of mixed input data, said mixed input data representing an audio signal containing a plurality of different timbres, decomposing the mixed input data to obtain decomposed data representing an audio signal containing at least one, but not all, of the plurality of different timbres, analyzing the decomposed data to determine a transition point between a first song part and a second song part within the audio track.
20. Method for processing audio data, comprising the steps of providing a set of audio tracks, each including mixed input data, said mixed input data representing audio signals containing a plurality of different timbres, decomposing each audio track of the set of audio tracks, such as to obtain a decomposed track associated with the respective audio track, wherein the decomposed track represents an audio signal containing at least one, but not all, of the plurality of different timbres of the respective audio track, thereby obtaining a set of decomposed tracks, analyzing each decomposed track of the set of decomposed tracks to determine at least one track parameter of the respective audio track which the decomposed track is associated with, selecting or allowing a user to select at least one selected audio track out of the set of audio tracks, based on at least one of the track parameters, generating an output track based on the at least one selected audio track.
21. Method of item 18, wherein the track parameter refers to at least one timbre of the respective audio track.
22. Method of item 18 or item 19, wherein the track parameter refers to at least one of a tempo, a beat, a BPM value, a beat grid, a beat phase, a key and a chord progression of the respective audio track.
23. Method of at least one of items 18 to 20, comprising notifying a user about at least one track parameter, preferably displaying at least one track parameter as associated to the respective audio track.
24. Method of item 21, comprising displaying a graphical representation of an audio track of the set of audio tracks, preferably a color, which depends on the associated track parameter of the audio track.
25. Method of at least one of items 18 to 21, comprising playing a selected audio track.
26. Method or device of at least one of the preceding claims, wherein the second audio track contains mixed input data, said mixed input data representing an audio signal containing a plurality of different timbres, wherein the mixed input data are decomposed to obtain decomposed data representing an audio signal containing at least one, but not all, of the plurality of different timbres, and wherein analyzing is carried out taking into account the decomposed data obtained from the second audio track.
27. Method or device of at least one of the preceding claims, wherein the mixed input data of the first and/or second audio track are decomposed to obtain at least decomposed data of a vocal timbre, decomposed data of a harmonic timbre and decomposed data of a drum timbre or to obtain exactly three decomposed tracks which are a decomposed track of a vocal timbre, a decomposed track of a harmonic timbre and a decomposed track of a drum timbre, wherein the three tracks preferably sum up to an audio track substantially equal to the first and/or second audio track, respectively.
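By way of illustration only, the tempo matching and key matching mentioned in the items above can be reduced to two ratios: a time stretching factor derived from the BPM values, and a pitch shifting interval derived from the keys. The sketch below assumes that keys are given as pitch classes 0 to 11 (0 = C) and ignores the distinction between major and minor modes; all function names are hypothetical.

```python
def tempo_ratio(source_bpm: float, target_bpm: float) -> float:
    """Playback speed factor that brings source_bpm to target_bpm."""
    if source_bpm <= 0 or target_bpm <= 0:
        raise ValueError("BPM values must be positive")
    return target_bpm / source_bpm


def semitone_shift(source_key: int, target_key: int) -> int:
    """Smallest shift in semitones (range -5 to +6) from source_key to target_key."""
    diff = (target_key - source_key) % 12
    return diff - 12 if diff > 6 else diff


def pitch_ratio(semitones: float) -> float:
    """Frequency ratio of a shift by the given number of equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


# Hypothetical example: second track at 120 BPM in A (pitch class 9),
# matched to a first track at 126 BPM in C (pitch class 0).
stretch = tempo_ratio(120.0, 126.0)
shift = semitone_shift(9, 0)
```

Choosing the smallest shift keeps the pitch change, and thus audible artifacts of the pitch shifting, as small as possible; this is a design choice of the sketch and not a requirement stated in the items.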
It should be noted that methods and devices as described above in items 1 to 27 may be understood as embodiments of methods and devices as described above as first and second aspects of the invention and in the claims. In particular, a transition point as described in the above items may correspond to any of the transition start time, the transition end time and the transition reference time as mentioned in the first and second aspects of the invention and in the claims.
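By way of illustration only, the detection of silence data below -30 dB extending over a predetermined time span, as mentioned in items 6 and 7, might be sketched as follows. The sketch assumes samples normalized to a full scale of 1.0, measures the level as frame-wise RMS in dB relative to that full scale, and expresses the time span as a number of frames; these choices and all names are assumptions and are not specified in the items.

```python
import math
from typing import List, Sequence, Tuple


def rms_db(frame: Sequence[float]) -> float:
    """RMS level of a frame in dB relative to full scale 1.0."""
    if not frame:
        return float("-inf")
    mean_sq = sum(x * x for x in frame) / len(frame)
    if mean_sq == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_sq)  # equals 20 * log10(rms)


def find_silence(samples: Sequence[float], frame_len: int,
                 threshold_db: float = -30.0, min_frames: int = 1) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) runs, end exclusive, of consecutive silent frames.

    Only complete frames are analyzed; a run is reported if it spans at least
    min_frames frames.
    """
    runs: List[Tuple[int, int]] = []
    run_start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        silent = rms_db(frame) < threshold_db
        if silent and run_start is None:
            run_start = i
        elif not silent and run_start is not None:
            if i - run_start >= min_frames:
                runs.append((run_start, i))
            run_start = None
    if run_start is not None and n_frames - run_start >= min_frames:
        runs.append((run_start, n_frames))
    return runs


# Hypothetical decomposed vocal signal: loud, two nearly silent frames, loud.
vocal = [0.5] * 4 + [0.001] * 8 + [0.5] * 4
silent_runs = find_silence(vocal, frame_len=4, min_frames=2)
```

A run found in this way within a decomposed vocal signal could, for example, serve as a candidate for the transition point referred to above.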

Claims

1. Method for processing audio signals, comprising the steps of
- providing a first input signal of a first input audio track and a second input signal of a second input audio track,
- decomposing the first input signal to obtain a plurality of decomposed signals, comprising at least a first decomposed signal and a second decomposed signal different from the first decomposed signal,
- assigning a first volume level to the first decomposed signal and a second volume level to the second decomposed signal,
- starting playback of a first output signal obtained from recombining at least the first decomposed signal at the first volume level with the second decomposed signal at the second volume level, such that the first output signal substantially equals the first input signal,
- while playing the first output signal, reducing the first volume level according to a first transition function and reducing the second volume level according to a second transition function different from said first transition function,
- starting playback of a second output signal obtained from the second input signal after starting playback of the first output signal but before volume levels of all decomposed signals of the first input signal have reached substantially zero.
2. Method of claim 1, further comprising the steps of
- decomposing the second input signal to obtain a plurality of decomposed signals comprising at least a third decomposed signal and a fourth decomposed signal different from the third decomposed signal,
- assigning a third volume level to the third decomposed signal and a fourth volume level to the fourth decomposed signal,
- starting playback of the second output signal obtained from recombining at least the third decomposed signal and the fourth decomposed signal,
- while playing the second output signal, increasing the third volume level according to a third transition function and increasing the fourth volume level according to a fourth transition function different from said third transition function, until the second output signal substantially equals the second input signal.
3. Method of claim 1 or claim 2, wherein each of the transition functions assigns a predetermined volume level or a predetermined change in volume level to each of a plurality of time frames within a transition time interval defined between a transition start time (T1) and a transition end time (T3), and/or to each of a plurality of controller positions within a controller range of a user operated controller defined between a controller first end position and a controller second end position.
4. Method of claim 3, wherein the first transition function and the second transition function are defined such that the volume level is at a maximum at the transition start time (T1) and/or at the controller first end position, and at a minimum, in particular corresponding to substantially silence at the transition end time (T3) and/or at the controller second end position, and/or wherein the third transition function and the fourth transition function are defined such that the volume level is at a minimum, in particular corresponding to substantially silence at the transition start time (T1) and/or at the controller first end position, and at a maximum at the transition end time (T3) and/or at the controller second end position.
5. Method of at least one of the preceding claims, wherein at least one of the transition functions is a linear function or contains a linear portion.
6. Method of at least one of the preceding claims, wherein at least one of the transition functions is a continuous function and/or a monotonic function.
7. Method of at least one of the preceding claims, wherein the first transition function and the second transition function differ from each other with regard to slope and/or wherein the third transition function and the fourth transition function differ from each other with regard to slope.
8. Method of at least one of the preceding claims, wherein the step of decomposing includes processing the first audio signal and/or the second audio signal within an AI system comprising a trained neural network.
9. Method of at least one of the preceding claims, wherein the step of decomposing includes decomposing the first audio signal and/or the second audio signal with regard to predetermined timbres, such as to obtain decomposed signals of different timbres, said timbres preferably being selected from the group consisting of:
- a vocal timbre,
- a non-vocal timbre,
- a drum timbre,
- a non-drum timbre,
- a harmonic timbre,
- a non-harmonic timbre,
- any combination thereof.
10. Method of claim 9, wherein the first decomposed signal and the third decomposed signal are different signals of a vocal timbre, wherein the second decomposed signal and the fourth decomposed signal are different signals of a non-vocal timbre, and/or wherein at least at a transition reference time and/or a controller reference position a sum of the first transition function and the third transition function is smaller than a sum of the second transition function and the fourth transition function.
11. Method of claim 9 or claim 10, wherein the first decomposed signal and the third decomposed signal are different signals of a drum timbre, wherein the second decomposed signal and the fourth decomposed signal are different signals of a non-drum timbre, and/or wherein at least at a transition reference time and/or at a controller reference position a sum of the first transition function and the third transition function is larger than a sum of the second transition function and the fourth transition function.
12. Method of claim 4 and at least one of claims 9 to 11, wherein the first decomposed signal and the third decomposed signal are different signals of a drum timbre, and/or wherein a sum of the first transition function and the third transition function is substantially constant, preferably a maximum volume level, throughout the entire transition time interval and/or the entire controller range.
13. Method of claim 4 and at least one of claims 9 to 12, wherein the first decomposed signal and the third decomposed signal are different signals of a non-drum timbre, a vocal timbre or a harmonic timbre, and/or wherein a sum of the first transition function and the third transition function has a minimum, preferably substantially zero volume level, between the transition start time (T1) and the transition end time (T3) and/or between the controller first end position and the controller second end position.
14. Method of at least one of the preceding claims, further including a step of analyzing an audio signal, preferably at least one of the decomposed signals, to determine a song part junction between two song parts within the first input audio track or within the second input audio track, wherein a transition time interval of at least one of the transition functions is set such as to include the song part junction.
15. Method of at least one of the preceding claims, further including the steps of
- receiving a user input referring to a transition command, including at least one transition parameter,
- setting at least one of the transition functions according to the transition parameter, wherein the transition parameter is preferably selected from the group consisting of: a transition start time (T1) of a transition time interval of at least one of the transition functions, a transition end time (T3) of a transition time interval of at least one of the transition functions, a length (T3-T1) of a transition time interval of at least one of the transition functions, a transition reference time (T2) within the transition time interval of at least one of the transition functions, a slope, shape or offset of at least one of the transition functions, an assignment or deassignment of a preset transition function to or from a selected one of the plurality of decomposed signals.
16. Method of at least one of the preceding claims, further comprising the steps of
- determining at least one tempo parameter of the first and/or second input track, in particular a BPM (beats per minute) and/or a beat grid and/or a beat phase of the first and/or second input track and
- a tempo matching processing based on the determined tempo parameter, including a time stretching and/or time shifting and/or resampling of audio data obtained from the first input track and/or the second input track, such that the first output signal and the second output signal have mutually matching BPM and/or mutually matching beat phases.
17. Method of at least one of the preceding claims, further comprising the steps of
- determining a key of the first and/or second input track and
- a key matching processing based on the determined key, including a pitch shifting of audio data obtained from the first input track and/or the second input track, such that the first output signal and the second output signal have mutually matching keys.
18. Method of at least one of the preceding claims, wherein the first input audio track and/or the second input audio track are received as a continuous stream, for example a data stream received via internet, a real-time audio stream received from a live audio source or from a playback device in playback mode, and wherein playback of the first output signal and/or second output signal is started while continuing to receive the continuous stream.
19. Method of at least one of the preceding claims, wherein decomposing first and/or second input signal is carried out segment-wise, wherein decomposing is carried out based on a first segment of the input signal such as to obtain a first segment of the decomposed signal, and wherein decomposing of a second segment of the input signal is carried out while playing the first segment of the decomposed signal.
20. Method of at least one of the preceding claims, wherein the method steps, in particular the steps of providing the first and second input signals, decomposing the first input signal, starting playback of the first output signal and starting playback of the second output signal, are carried out in a continuous process, wherein a time shift between receiving the first input audio track or a first portion of a continuous stream of the first input audio track and starting playback of the first output signal is preferably less than 10 seconds, more preferably less than 2 seconds, and/or wherein a time shift between receiving the second input audio track or a first portion of a continuous stream of the second input audio track and starting playback of the second output signal is preferably less than 10 seconds, more preferably less than 2 seconds.
21. Method of at least one of the preceding claims, wherein at least one, preferably all of the first and second input signals, the decomposed signals and the first and second output signals represent stereo signals, each comprising a left-channel signal portion and a right-channel signal portion, respectively.
22. Device for processing audio signals, comprising:
- a first input unit providing a first input signal of a first input audio track and a second input unit providing a second input signal of a second input audio track,
- a decomposition unit configured to decompose the first input audio signal to obtain a plurality of decomposed signals, comprising at least a first decomposed signal and a second decomposed signal different from the first decomposed signal,
- a playback unit configured to start playback of a first output signal obtained from recombining at least the first decomposed signal at a first volume level with the second decomposed signal at a second volume level, such that the first output signal substantially equals the first input signal,
- a transition unit for performing a transition between playback of the first output signal and playback of a second output signal obtained from the second input signal, wherein the transition unit has a volume control section adapted for reducing the first volume level according to a first transition function and reducing the second volume level according to a second transition function different from said first transition function.
23. Device of claim 22, wherein the decomposition unit is configured to decompose the second input signal to obtain a plurality of decomposed signals comprising at least a third decomposed signal and a fourth decomposed signal different from the third decomposed signal, wherein the second output signal is obtained from recombining at least the third decomposed signal at a third volume level and the fourth decomposed signal at a fourth volume level, wherein the volume control section is adapted for increasing the third volume level according to a third transition function and increasing the fourth volume level according to a fourth transition function different from said third transition function, until the second output signal substantially equals the second input signal.
24. Device of claim 22 or claim 23, wherein each of the transition functions assigns a predetermined volume level or a predetermined change in volume level to each of a plurality of time frames within a transition time interval defined between a transition start time (T1) and a transition end time (T3), and/or to each of a plurality of controller positions within a controller range of a user operated controller defined between a controller first end position and a controller second end position.
25. Device of claim 24, wherein the first transition function and the second transition function are defined such that the volume level is at a maximum at the transition start time (T1) and/or at the controller first end position, and at a minimum, in particular corresponding to substantially silence at the transition end time (T3) and/or at the controller second end position, and/or wherein the third transition function and the fourth transition function are defined such that the volume level is at a minimum, in particular corresponding to substantially silence at the transition start time (T1) and/or at the controller first end position, and at a maximum at the transition end time (T3) and/or at the controller second end position.
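As an illustration of claims 24 and 25, a transition function can be tabulated over time frames between T1 and T3, or evaluated over a controller range. The sketch below assumes a simple linear shape and a controller range of 0 to 127; the function names and those values are assumptions made for this example only.

```python
from typing import List


def linear_fade_out(start: float, end: float, x: float) -> float:
    """Volume 1.0 at `start`, 0.0 at `end`, linear in between, clamped outside.

    `x` may be a time (between T1 and T3) or a controller position
    (between the first and second end position).
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    progress = (x - start) / (end - start)
    return 1.0 - min(1.0, max(0.0, progress))


def linear_fade_in(start: float, end: float, x: float) -> float:
    """Mirror image: silence at `start`, maximum volume at `end`."""
    return 1.0 - linear_fade_out(start, end, x)


def fade_table(t1: float, t3: float, frames: int) -> List[float]:
    """Assign one predetermined volume level to each of `frames` time frames."""
    if frames < 2:
        raise ValueError("need at least two frames")
    step = (t3 - t1) / (frames - 1)
    return [linear_fade_out(t1, t3, t1 + k * step) for k in range(frames)]
```

For example, `fade_table(0.0, 4.0, 5)` yields five volume levels falling from maximum at T1 to silence at T3, and `linear_fade_out(0, 127, position)` does the same for a controller position.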
26. Device of at least one of claims 22 to 25, wherein at least one of the transition functions is a linear function or contains a linear portion.
27. Device of at least one of claims 22 to 26, wherein at least one of the transition functions is a continuous function and/or a monotonic function.
28. Device of at least one of claims 22 to 27, wherein the first transition function and the second transition function differ from each other with regard to slope and/or wherein the third transition function and the fourth transition function differ from each other with regard to slope.
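The properties named in claims 26 to 28 (linear portions, continuity, monotonicity, differing slopes) can be made concrete with piecewise-linear functions. The following is a sketch under assumptions: the breakpoint representation and the names `piecewise_linear` and `is_monotonic_decreasing` are hypothetical, and monotonicity is only checked on a sampled grid.

```python
from typing import Callable, List, Sequence, Tuple


def piecewise_linear(points: Sequence[Tuple[float, float]]) -> Callable[[float], float]:
    """Return f(x) that interpolates linearly between (x, volume) breakpoints."""
    pts: List[Tuple[float, float]] = sorted(points)

    def f(x: float) -> float:
        if x <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return pts[-1][1]

    return f


def is_monotonic_decreasing(f: Callable[[float], float], samples: int = 101) -> bool:
    """Check on an evenly spaced grid over [0, 1] that f never increases."""
    values = [f(k / (samples - 1)) for k in range(samples)]
    return all(a >= b for a, b in zip(values, values[1:]))


# Two continuous, monotonic fade-outs that differ in slope (-2 versus -1).
steep = piecewise_linear([(0.0, 1.0), (0.5, 0.0), (1.0, 0.0)])
gentle = piecewise_linear([(0.0, 1.0), (1.0, 0.0)])
```

Here `steep` is silent from the midpoint onward while `gentle` is still at half volume there, which is one way two transition functions can differ with regard to slope.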
29. Device of at least one of claims 22 to 28, wherein the decomposition unit includes an AI system comprising a trained neural network.
30. Device of at least one of claims 22 to 29, wherein the decomposition unit is configured to decompose the first audio signal and/or the second audio signal with regard to predetermined timbres, such as to obtain decomposed signals of different timbres, said timbres preferably being selected from the group consisting of:
- a vocal timbre, a non-vocal timbre, a drum timbre, a non-drum timbre, a harmonic timbre, a non-harmonic timbre, any combination thereof.
31. Device of claim 30, wherein the first decomposed signal and the third decomposed signal are different signals of a vocal timbre, wherein the second decomposed signal and the fourth decomposed signal are different signals of a non-vocal timbre, and/or wherein at least at a transition reference time and/or at a controller reference position a sum of the first transition function and the third transition function is smaller than a sum of the second transition function and the fourth transition function.
32. Device of claim 30 or claim 31, wherein the first decomposed signal and the third decomposed signal are different signals of a drum timbre, wherein the second decomposed signal and the fourth decomposed signal are different signals of a non-drum timbre, and/or wherein at least at a transition reference time and/or at a controller reference position a sum of the first transition function and the third transition function is larger than a sum of the second transition function and the fourth transition function.
33. Device of claim 25 and at least one of claims 30 to 32, wherein the first decomposed signal and the third decomposed signal are different signals of a drum timbre, and/or wherein a sum of the first transition function and the third transition function is substantially constant, preferably a maximum volume level, throughout the entire transition time interval and/or the entire controller range.
34. Device of claim 25 and at least one of claims 30 to 33, wherein the first decomposed signal and the third decomposed signal are different signals of a non-drum timbre, a vocal timbre or a harmonic timbre, and/or wherein a sum of the first transition function and the third transition function has a minimum, preferably substantially zero volume level, between the transition start time (T1) and the transition end time (T3) and/or between the controller first end position and the controller second end position.
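The two sum conditions in claims 33 and 34 can be checked numerically. The sketch below uses one possible choice of curves, which is an assumption of this example: a hard switch at a reference point for the drum components (constant sum) and a fade to silence followed by a fade from silence for the vocal components (sum with a minimum of zero). All function names and the reference point 0.5 are hypothetical.

```python
def drum_out(p: float, ref: float = 0.5) -> float:
    """First transition function (drums of track A): full volume until `ref`."""
    return 1.0 if p < ref else 0.0


def drum_in(p: float, ref: float = 0.5) -> float:
    """Third transition function (drums of track B): full volume from `ref`."""
    return 0.0 if p < ref else 1.0


def vocal_out(p: float, ref: float = 0.5) -> float:
    """First transition function (vocals of track A): silent at `ref`."""
    return max(0.0, 1.0 - p / ref)


def vocal_in(p: float, ref: float = 0.5) -> float:
    """Third transition function (vocals of track B): silent until `ref`."""
    return max(0.0, (p - ref) / (1.0 - ref))


grid = [k / 10 for k in range(11)]  # progress from T1 (0.0) to T3 (1.0)
drum_sum = [drum_out(p) + drum_in(p) for p in grid]
vocal_sum = [vocal_out(p) + vocal_in(p) for p in grid]
```

With these curves `drum_sum` stays at the maximum level across the whole interval, while `vocal_sum` starts and ends at the maximum level and drops to zero at the reference point.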
35. Device of at least one of claims 22 to 34, further including an analyzing unit configured to analyze an audio signal, preferably at least one of the decomposed signals, to determine a song part junction between two song parts within the first input audio track or within the second input audio track, wherein a transition time interval of at least one of the transition functions is set such as to include the song part junction.
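Setting a transition time interval so that it includes a detected song part junction, as in claim 35, reduces to simple arithmetic once the junction time is known. The sketch below assumes the junction has already been found by some analysis step; the function name and the `lead` parameter (the fraction of the interval placed before the junction) are hypothetical.

```python
from typing import Tuple


def interval_around_junction(junction: float, length: float,
                             lead: float = 0.5) -> Tuple[float, float]:
    """Return (T1, T3) in seconds such that T1 <= junction <= T3.

    `lead` is the fraction of the interval that lies before the junction:
    0.0 starts the transition at the junction, 1.0 ends it there.
    """
    if length <= 0.0:
        raise ValueError("length must be positive")
    if not 0.0 <= lead <= 1.0:
        raise ValueError("lead must be between 0 and 1")
    t1 = max(0.0, junction - lead * length)  # never start before the track does
    return t1, t1 + length
```

For a junction at 60 s and an 8 s transition, the default places the interval symmetrically at 56 s to 64 s.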
36. Device of at least one of claims 22 to 35, further including a user interface configured to accept a user input referring to a transition command, including at least one transition parameter, wherein the transition unit is configured to set at least one of the transition functions according to the transition parameter, wherein the transition parameter is preferably selected from the group consisting of:
- a transition start time (T1) of a transition time interval of at least one of the transition functions,
- a transition end time (T3) of a transition time interval of at least one of the transition functions,
- a length of a transition time interval of at least one of the transition functions,
- a transition reference time (T2) within the transition time interval of at least one of the transition functions,
- a slope, shape or offset of at least one of the transition functions,
- an assignment or deassignment of a preset transition function to or from a selected one of the plurality of decomposed tracks.
37. Device of claim 36, wherein the device includes a display unit configured to display a graphical representation of the first input audio track and/or the second input audio track, wherein the user interface is configured to receive at least one transition parameter through a selection or marker applied by the user in relation to the graphical representation of the first input audio track and/or the second input audio track.
38. Device of claim 36 or claim 37, wherein the device includes a display unit configured to display a graphical representation of at least one of the decomposed signals, wherein the user interface is configured to allow a user to assign or deassign a preset transition function to or from a selected one of the plurality of decomposed tracks.
39. Device of at least one of claims 22 to 38, further comprising a tempo matching unit configured to determine a tempo of the first and/or second input track, and to carry out a tempo matching processing based on the determined tempo, including a time stretching or resampling of audio data obtained from the first input track and/or the second input track, such that the first output signal and the second output signal have mutually matching tempos.
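The arithmetic behind tempo matching can be sketched briefly. This example only computes the required speed ratio and its side effects; it does not perform time stretching or resampling itself, and the function names are hypothetical. It relies on the general fact that plain resampling changes pitch together with speed, whereas time stretching is meant to leave pitch unchanged.

```python
import math


def tempo_ratio(source_bpm: float, target_bpm: float) -> float:
    """Speed factor that brings a track from `source_bpm` to `target_bpm`."""
    if source_bpm <= 0.0 or target_bpm <= 0.0:
        raise ValueError("tempos must be positive")
    return target_bpm / source_bpm


def stretched_duration(duration_s: float, ratio: float) -> float:
    """Duration of the audio after changing its speed by `ratio`."""
    return duration_s / ratio


def resampling_pitch_shift(ratio: float) -> float:
    """Pitch change in semitones caused by plain resampling at `ratio`."""
    return 12.0 * math.log2(ratio)
```

Matching a 120 BPM track to a 126 BPM track needs a ratio of 1.05; done by resampling alone, this would also raise the pitch by a little under one semitone.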
40. Device of at least one of claims 22 to 39, further comprising a key matching unit configured to determine a key of the first and/or second input track, and to carry out a key matching processing based on the determined key, including a pitch shifting of audio data obtained from the first input track and/or the second input track, such that the first output signal and the second output signal have mutually matching keys.
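For key matching, the amount of pitch shifting can be derived from the two detected keys. The sketch below is a simplification under stated assumptions: it compares only the tonic pitch classes (ignoring major/minor mode), uses sharp-only note names, and picks the smallest shift in semitones. The names `PITCH_CLASSES`, `semitone_shift` and `frequency_factor` are hypothetical.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def semitone_shift(source_key: str, target_key: str) -> int:
    """Smallest shift in semitones (range -6 .. 5) from source to target tonic."""
    diff = PITCH_CLASSES.index(target_key) - PITCH_CLASSES.index(source_key)
    return (diff + 6) % 12 - 6


def frequency_factor(semitones: float) -> float:
    """Frequency ratio of an equal-tempered shift by `semitones`."""
    return 2.0 ** (semitones / 12.0)
```

Shifting a track in C to match a track in B then means going down one semitone rather than up eleven.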
PCT/EP2020/065995 2020-03-06 2020-06-09 Playback transition from first to second audio track with transition functions of decomposed signals WO2021175458A1 (en)

Priority Applications (16)

Application Number Priority Date Filing Date Title
EP20730432.0A EP4115628A1 (en) 2020-03-06 2020-06-09 Playback transition from first to second audio track with transition functions of decomposed signals
PCT/EP2020/074034 WO2021175460A1 (en) 2020-03-06 2020-08-27 Method, device and software for applying an audio effect, in particular pitch shifting
AU2020433340A AU2020433340A1 (en) 2020-03-06 2020-10-16 Method, device and software for applying an audio effect to an audio signal separated from a mixed audio signal
EP20792654.4A EP4115629A1 (en) 2020-03-06 2020-10-16 Method, device and software for applying an audio effect to an audio signal separated from a mixed audio signal
PCT/EP2020/079275 WO2021175461A1 (en) 2020-03-06 2020-10-16 Method, device and software for applying an audio effect to an audio signal separated from a mixed audio signal
PCT/EP2020/081540 WO2021175464A1 (en) 2020-03-06 2020-11-09 Method, device and software for controlling timing of audio data
EP20800953.0A EP4115630A1 (en) 2020-03-06 2020-11-09 Method, device and software for controlling timing of audio data
PCT/EP2021/055795 WO2021176102A1 (en) 2020-03-06 2021-03-08 Ai based remixing of music: timbre transformation and matching of mixed audio data
EP21709063.8A EP4133748A1 (en) 2020-03-06 2021-03-08 Ai based remixing of music: timbre transformation and matching of mixed audio data
US17/905,552 US20230120140A1 (en) 2020-03-06 2021-03-08 Ai based remixing of music: timbre transformation and matching of mixed audio data
US17/343,386 US20210326102A1 (en) 2020-03-06 2021-06-09 Method and device for determining mixing parameters based on decomposed audio data
US17/343,546 US11347475B2 (en) 2020-03-06 2021-06-09 Transition functions of decomposed signals
US17/459,450 US11462197B2 (en) 2020-03-06 2021-08-27 Method, device and software for applying an audio effect
US17/689,574 US11488568B2 (en) 2020-03-06 2022-03-08 Method, device and software for controlling transport of audio data
US17/741,678 US20220269476A1 (en) 2020-03-06 2022-05-11 Transition functions of decomposed signals
US17/747,473 US20220284875A1 (en) 2020-03-06 2022-05-18 Method, device and software for applying an audio effect

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
PCT/EP2020/056124 WO2021175455A1 (en) 2020-03-06 2020-03-06 Method and device for decomposing and recombining of audio data and/or visualizing audio data
EPPCT/EP2020/056124 2020-03-06
EPPCT/EP2020/057330 2020-03-17
PCT/EP2020/057330 WO2021175456A1 (en) 2020-03-06 2020-03-17 Method and device for decomposing, recombining and playing audio data
PCT/EP2020/062151 WO2021175457A1 (en) 2020-03-06 2020-04-30 Live decomposition of mixed audio data
EPPCT/EP2020/062151 2020-04-30

Related Parent Applications (2)

Application Number Relation Publication Number Priority Date Filing Date Title
PCT/EP2020/062151 Continuation-In-Part WO2021175457A1 (en) 2020-03-06 2020-04-30 Live decomposition of mixed audio data
PCT/EP2020/074034 Continuation-In-Part WO2021175460A1 (en) 2020-03-06 2020-08-27 Method, device and software for applying an audio effect, in particular pitch shifting

Related Child Applications (3)

Application Number Relation Publication Number Priority Date Filing Date Title
PCT/EP2020/062151 Continuation-In-Part WO2021175457A1 (en) 2020-03-06 2020-04-30 Live decomposition of mixed audio data
US17/343,386 Continuation US20210326102A1 (en) 2020-03-06 2021-06-09 Method and device for determining mixing parameters based on decomposed audio data
US17/343,546 Continuation US11347475B2 (en) 2020-03-06 2021-06-09 Transition functions of decomposed signals

Publications (1)

Publication Number Publication Date
WO2021175458A1 (en) 2021-09-10

Family

ID=69846409

Family Applications (5)

Application Number Title Priority Date Filing Date
PCT/EP2020/056124 WO2021175455A1 (en) 2020-03-06 2020-03-06 Method and device for decomposing and recombining of audio data and/or visualizing audio data
PCT/EP2020/057330 WO2021175456A1 (en) 2020-03-06 2020-03-17 Method and device for decomposing, recombining and playing audio data
PCT/EP2020/062151 WO2021175457A1 (en) 2020-03-06 2020-04-30 Live decomposition of mixed audio data
PCT/EP2020/065995 WO2021175458A1 (en) 2020-03-06 2020-06-09 Playback transition from first to second audio track with transition functions of decomposed signals
PCT/EP2020/081540 WO2021175464A1 (en) 2020-03-06 2020-11-09 Method, device and software for controlling timing of audio data

Family Applications Before (3)

Application Number Title Priority Date Filing Date
PCT/EP2020/056124 WO2021175455A1 (en) 2020-03-06 2020-03-06 Method and device for decomposing and recombining of audio data and/or visualizing audio data
PCT/EP2020/057330 WO2021175456A1 (en) 2020-03-06 2020-03-17 Method and device for decomposing, recombining and playing audio data
PCT/EP2020/062151 WO2021175457A1 (en) 2020-03-06 2020-04-30 Live decomposition of mixed audio data

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/081540 WO2021175464A1 (en) 2020-03-06 2020-11-09 Method, device and software for controlling timing of audio data

Country Status (7)

Country Link
US (3) US20230089356A1 (en)
EP (2) EP4005243B1 (en)
CA (1) CA3170462A1 (en)
DE (1) DE202020005830U1 (en)
ES (1) ES2960983T3 (en)
MX (1) MX2022011059A (en)
WO (5) WO2021175455A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11762445B2 (en) * 2017-01-09 2023-09-19 Inmusic Brands, Inc. Systems and methods for generating a graphical representation of audio signal data during time compression or expansion
US11232773B2 (en) * 2019-05-07 2022-01-25 Bellevue Investments Gmbh & Co. Kgaa Method and system for AI controlled loop based song construction
US11475867B2 (en) * 2019-12-27 2022-10-18 Spotify Ab Method, system, and computer-readable medium for creating song mashups
EP4115628A1 (en) * 2020-03-06 2023-01-11 algoriddim GmbH Playback transition from first to second audio track with transition functions of decomposed signals
EP4115630A1 (en) * 2020-03-06 2023-01-11 algoriddim GmbH Method, device and software for controlling timing of audio data
EP4115629A1 (en) * 2020-03-06 2023-01-11 algoriddim GmbH Method, device and software for applying an audio effect to an audio signal separated from a mixed audio signal
US11604622B1 (en) * 2020-06-01 2023-03-14 Meta Platforms, Inc. Selecting audio clips for inclusion in content items
US20220337898A1 (en) * 2021-04-20 2022-10-20 Block, Inc. Live playback streams
CN114299976A (en) * 2022-03-06 2022-04-08 荣耀终端有限公司 Audio data processing method and electronic equipment
WO2023217352A1 (en) 2022-05-09 2023-11-16 Algoriddim Gmbh Reactive dj system for the playback and manipulation of music based on energy levels and musical features
US11740862B1 (en) 2022-11-22 2023-08-29 Algoriddim Gmbh Method and system for accelerated decomposing of audio data using intermediate data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180277076A1 (en) * 2016-06-30 2018-09-27 Nokia Technologies Oy Intelligent Crossfade With Separated Instrument Tracks
US20190246204A1 (en) * 2013-05-30 2019-08-08 Spotify Ab Systems and methods for automatic mixing of media

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6184898B1 (en) 1998-03-26 2001-02-06 Comparisonics Corporation Waveform display utilizing frequency-based coloring and navigation
US8311656B2 (en) * 2006-07-13 2012-11-13 Inmusic Brands, Inc. Music and audio playback system
US7525037B2 (en) * 2007-06-25 2009-04-28 Sony Ericsson Mobile Communications Ab System and method for automatically beat mixing a plurality of songs using an electronic equipment
WO2010137057A1 (en) * 2009-05-25 2010-12-02 パイオニア株式会社 Cross-fader apparatus, mixer apparatus and program
US9323438B2 (en) * 2010-07-15 2016-04-26 Apple Inc. Media-editing application with live dragging and live editing capabilities
US9398390B2 (en) * 2013-03-13 2016-07-19 Beatport, LLC DJ stem systems and methods
WO2015143076A1 (en) * 2014-03-19 2015-09-24 Torrales Jr Hipolito Method and system for selecting tracks on a digital file
US10014002B2 (en) * 2016-02-16 2018-07-03 Red Pill VR, Inc. Real-time audio source separation using deep neural networks
JP6705422B2 (en) * 2017-04-21 2020-06-03 ヤマハ株式会社 Performance support device and program
WO2019229199A1 (en) * 2018-06-01 2019-12-05 Sony Corporation Adaptive remixing of audio content
US10991385B2 (en) * 2018-08-06 2021-04-27 Spotify Ab Singing voice separation with deep U-Net convolutional networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GERARD ROMA ET AL: "MUSIC REMIXING AND UPMIXING USING SOURCE SEPARATION", PROCEEDINGS OF THE 2ND AES WORKSHOP ON INTELLIGENT MUSIC PRODUCTION, 13 September 2016 (2016-09-13), XP055743124 *
PRETET: "Singing Voice Separation: A study on training data", ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, pages 506-510, XP033566106, DOI: 10.1109/ICASSP.2019.8683555

Also Published As

Publication number Publication date
WO2021175456A1 (en) 2021-09-10
WO2021175457A1 (en) 2021-09-10
EP4311268A3 (en) 2024-04-10
US20210326102A1 (en) 2021-10-21
US11216244B2 (en) 2022-01-04
ES2960983T3 (en) 2024-03-07
EP4005243A1 (en) 2022-06-01
US20210279030A1 (en) 2021-09-09
WO2021175464A1 (en) 2021-09-10
EP4311268A2 (en) 2024-01-24
US20230089356A1 (en) 2023-03-23
EP4005243B1 (en) 2023-08-23
MX2022011059A (en) 2022-09-19
WO2021175455A1 (en) 2021-09-10
CA3170462A1 (en) 2021-09-10
DE202020005830U1 (en) 2022-09-26

Similar Documents

Publication Publication Date Title
US11347475B2 (en) Transition functions of decomposed signals
WO2021175458A1 (en) Playback transition from first to second audio track with transition functions of decomposed signals
JP2012103603A (en) Information processing device, musical sequence extracting method and program
US11462197B2 (en) Method, device and software for applying an audio effect
US20230120140A1 (en) Ai based remixing of music: timbre transformation and matching of mixed audio data
AU2022218554B2 (en) Method and device for decomposing, recombining and playing audio data
JP7136979B2 (en) Methods, apparatus and software for applying audio effects
JP5879996B2 (en) Sound signal generating apparatus and program
CN110574107B (en) Data format
JP6926354B1 (en) AI-based DJ systems and methods for audio data decomposition, mixing, and playback
WO2021175461A1 (en) Method, device and software for applying an audio effect to an audio signal separated from a mixed audio signal
US20230343314A1 (en) System for selection and playback of song versions from vinyl type control interfaces
WO2023217352A1 (en) Reactive dj system for the playback and manipulation of music based on energy levels and musical features
NZ791507A (en) Method and device for decomposing, recombining and playing audio data
Eisele Sound Design and Mixing in Reason
JP2008225111A (en) Karaoke machine and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20730432

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020730432

Country of ref document: EP

Effective date: 20221006

NENP Non-entry into the national phase

Ref country code: DE