WO2021175461A1 - Method, device and software for applying an audio effect to an audio signal separated from a mixed audio signal


Info

Publication number: WO2021175461A1
Authority: WIPO (PCT)
Prior art keywords: audio, effect, unit, data, track
Application number: PCT/EP2020/079275
Other languages: English (en)
Inventor: Kariem Morsy
Original Assignee: Algoriddim Gmbh
Priority claimed from PCT/EP2020/056124 external-priority patent/WO2021175455A1/fr
Priority to EP20792654.4A priority Critical patent/EP4115629A1/fr
Priority to AU2020433340A priority patent/AU2020433340A1/en
Application filed by Algoriddim Gmbh filed Critical Algoriddim Gmbh
Priority to EP20800953.0A priority patent/EP4115630A1/fr
Priority to PCT/EP2020/081540 priority patent/WO2021175464A1/fr
Priority to JP2021035838A priority patent/JP6926354B1/ja
Priority to PCT/EP2021/055795 priority patent/WO2021176102A1/fr
Priority to US17/905,552 priority patent/US20230120140A1/en
Priority to EP21709063.8A priority patent/EP4133748A1/fr
Priority to JP2021137938A priority patent/JP7136979B2/ja
Priority to US17/459,450 priority patent/US11462197B2/en
Publication of WO2021175461A1 publication Critical patent/WO2021175461A1/fr
Priority to US17/689,574 priority patent/US11488568B2/en
Priority to US17/747,473 priority patent/US20220284875A1/en


Classifications

    • H04R 3/00: Circuits for transducers, loudspeakers or microphones
    • G06F 3/0482: Interaction with lists of selectable items, e.g. menus
    • G06F 3/04842: Selection of displayed objects or displayed text elements
    • G06F 3/04847: Interaction techniques to control parameter settings, e.g. interaction with sliders or dials
    • G06F 3/04883: Interaction techniques using a touch-screen or digitiser for inputting data by handwriting, e.g. gesture or text
    • G10H 1/40: Rhythm (accompaniment arrangements)
    • G10H 1/46: Volume control
    • G10H 1/057: Special musical effects by additional modulation during execution only, by envelope-forming circuits
    • G10H 2210/076: Musical analysis for extraction of timing, tempo; beat detection
    • G10H 2210/125: Medley, i.e. linking parts of different musical pieces in one single piece, e.g. sound collage, DJ mix
    • G10H 2210/155: Musical effects
    • G10H 2210/235: Flanging or phasing effects, e.g. using swept comb filters or a feedback loop around all-pass filters
    • G10H 2210/325: Musical pitch modification
    • G10H 2210/391: Automatic tempo adjustment, correction or control
    • G10H 2240/325: Synchronizing two or more audio tracks or files according to musical features or musical timings
    • G10H 2250/035: Crossfade, i.e. time domain amplitude envelope control of the transition between musical sounds or melodies
    • G10H 2250/311: Neural networks for electrophonic musical instruments or musical processing
    • G10L 21/028: Voice signal separating using properties of sound source
    • G10L 25/30: Speech or voice analysis using neural networks
    • H04R 2420/01: Input selection or mixing for amplifiers or loudspeakers
    • H04R 2430/01: Aspects of volume control, not necessarily automatic, in sound systems
    • H04R 5/04: Circuit arrangements for stereophonic systems

Definitions

  • The present invention relates to a method for processing music audio data, comprising the steps of providing input audio data representing a piece of music containing a mixture of predetermined musical timbres and applying an audio effect to the input audio data. Furthermore, the present invention relates to a device for processing music audio data and to software suitable to run on a computer and control it to process audio data.
  • Audio effects modify certain sound parameters of the music so as to change the character of the sound without substantially changing the musical composition as such.
  • Examples of known audio effects are reverb effects, delay effects, chorus effects, equalizers, filters, pitch shifting or pitch scaling effects, and tempo shifts (time-stretching / resampling).
  • Another audio processing application is a sound editing environment such as a digital audio workstation (DAW) or similar software, which allows importing a mixed mono or stereo audio file and editing it by applying one or more audio effects.
  • Audio effects include editing effects such as time stretching, resampling, pitch shifting, reverb, delay, chorus, equalizer (EQ), etc.
  • Digital audio workstations are used by producers or mixing/mastering engineers in recording studios, post-production studios or the like. In most audio processing applications, the input audio data are mono or stereo audio files containing one (mono) or two (stereo) mixed audio tracks of a piece of music.
  • The mixed audio tracks may be produced in recording studios by mixing a plurality of source tracks, which are programmed on a computer (for example a drum machine) or obtained by directly recording individual instruments or vocals. In other cases, mixed audio tracks are obtained from live recording of a concert or from recording the output of a playback device, for example a vinyl player. Mixed audio tracks are often distributed by music distributors via streaming or download, or broadcast by radio or TV services.
  • However, the application of audio effects can sometimes distort the character of the sound such that the music sounds less natural and the presence of the audio effect becomes more audible than desired.
  • When the audio effect is applied to correct some acoustic shortfall, or to match the sound of one song to that of another, such as in a DJ environment in which a smooth transition from one song to another is desired, the aim is generally to apply the effect in such a manner that the listener will not recognize its presence, or will at least not perceive a significant change in the character of the piece of music.
  • The audio effect may be a pitch scaling effect, which changes the pitch of audio data while maintaining its playback duration; DJs may use it to match the key of one song to that of another so as to crossfade smoothly between the two songs (without the clashing of different keys).
  • Conventional pitch scaling, however, leads to an unnatural distortion of the music when the pitch is shifted by more than one or two semitones, which limits the creative freedom of the DJ.
  • a method for processing music audio data comprising the steps of (a) providing input audio data representing a first piece of music containing a mixture of predetermined musical timbres, (b) decomposing the input audio data to generate at least a first audio track representing a first musical timbre selected from the predetermined musical timbres, and a second audio track representing a second musical timbre selected from the predetermined musical timbres, (c) applying a predetermined first audio effect to the first audio track, (d) applying no audio effect or a predetermined second audio effect, which is different from the first audio effect, to the second audio track, (e) recombining the first audio track (with the effect applied) with the second audio track to obtain recombined audio data.
  • In other words, the input audio data are decomposed to obtain at least two audio tracks of different musical timbres, the first audio effect is applied to only one of the two tracks, and the tracks are then recombined to obtain recombined audio data.
  • It thereby becomes possible to apply the first audio effect in a more sophisticated and differentiated manner, affecting only selected musical timbres.
  • For example, a reverb effect may be applied to a vocal component but not, or only with reduced intensity, to a drum component of the audio track, providing new options for modifying the character of the sound of a piece of music by means of a reverb effect.
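The decompose/apply/recombine flow of steps (b) to (e) can be sketched as follows. This is a minimal illustration, not the document's method: the moving-average "separator" merely stands in for the AI-based decomposition contemplated here, and the delay effect, kernel size and track roles are assumptions made for the example.

```python
import numpy as np

def decompose(mix: np.ndarray, kernel: int = 32):
    """Step (b) stand-in: split a mono signal into a low-frequency track
    (e.g. 'drums/bass') and the high-frequency residual (e.g. 'vocals').
    A real system would use a trained source-separation model instead."""
    low = np.convolve(mix, np.ones(kernel) / kernel, mode="same")
    return mix - low, low  # (first track, second track)

def apply_delay(track: np.ndarray, delay: int = 441, wet: float = 0.5):
    """Step (c): a simple delay effect mixed back into the dry signal."""
    delayed = np.zeros_like(track)
    delayed[delay:] = track[:-delay]
    return track + wet * delayed

def process(mix: np.ndarray) -> np.ndarray:
    first, second = decompose(mix)   # (b) decompose into two timbre tracks
    first = apply_delay(first)       # (c) effect on the first track only
    #                                  (d) no effect on the second track
    return first + second            # (e) recombine
```

Because the toy separator is exactly complementary (the second track is the input minus the first), recombining without any effect reproduces the input sample for sample.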
  • For example, the method may be used where a PA system for music entertainment is controlled by a DJ.
  • The second audio track may receive no audio effect at all and thus remain unchanged, i.e. the audio data of the second audio track at the time of its generation in step (b) and at the time of its recombination in step (e) are equal.
  • the second audio track may receive a predetermined second audio effect, which is different from the first audio effect.
  • input audio data are preferably mono or stereo audio files containing one (mono) or two (stereo) mixed audio tracks of a piece of music.
  • An audio effect is defined by an effect type, such as reverb, chorus, delay, pitch scaling or tempo shift, and by at least one effect parameter, such as a wet/dry ratio, chorus intensity, delay time/intensity, pitch shift value (e.g. number of semitones or cents up/down) or tempo shift value (e.g. sample rate change ratio).
  • Two audio effects are different if they differ in effect type or in at least one effect parameter.
  • The feature that the second audio effect is different from the first audio effect includes cases in which the second audio effect has an effect type different from that of the first audio effect, as well as cases in which both effects have the same effect type but differ in at least one effect parameter.
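This identity criterion (two effects differ if they differ in type or in at least one parameter) maps naturally onto value equality; the class and field names below are illustrative, not taken from the document.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AudioEffect:
    """An effect is identified by its type plus its parameter values."""
    effect_type: str   # e.g. "reverb", "delay", "pitch_scaling"
    params: tuple      # e.g. (("wet_dry", 0.3),)

a = AudioEffect("reverb", (("wet_dry", 0.3),))
b = AudioEffect("reverb", (("wet_dry", 0.6),))  # same type, other parameter
c = AudioEffect("delay", (("wet_dry", 0.3),))   # other effect type
```

With `@dataclass` equality, `a != b` (parameter differs) and `a != c` (type differs), exactly mirroring the criterion above.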
  • An audio effect is defined as an effect that typically modifies the shape of the waveform of an audio signal contained in music audio data, or at least part of that waveform (in particular a time interval of it).
  • Audio effects are thus distinguished from simple volume changes, which merely scale the amplitude of the waveform without modifying its shape.
  • An audio effect in the sense of the present invention may comprise at least one of: a parametric equalizer (EQ with low, middle and high frequency bands, for example, or with any other frequency bands), a high-pass filter, a low-pass filter, a flanger (a frequency modulation that uses a delay effect introduced into the signal in a feedback loop), a phaser (a frequency-modulated sound mixed back into the original sound, or a sound obtained by phase-shifting part of the signal), a chorus, a vocoder, a harmonizer, a pitch shifter, a gate (a filter attenuating signals below a threshold volume level), a reverb effect, a delay effect, an echo effect, a bit crusher (an audio effect producing distortion by reducing the resolution or bandwidth of the input audio data), a tremolo effect, a loop roll effect, a beat roll effect, a beat masher, a censor effect, a back spin effect, or a scratch effect (a variation of dynamic sample rate conversion and/or forward and reverse playback).
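The gate mentioned in this list can be sketched crudely per sample; a practical gate would use an envelope follower with attack/release times, and the threshold value here is an arbitrary assumption.

```python
import numpy as np

def gate(signal: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Gate sketch: mute samples whose absolute amplitude is below the
    threshold, passing louder samples through unchanged."""
    out = signal.copy()
    out[np.abs(out) < threshold] = 0.0
    return out
```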
  • Audio effects may have effect parameters, for example beat or timing parameters, where a beat or timing parameter may be chosen depending on the beat of the music contained in the audio signal; the beat may be determined by known beat detection algorithms or taken from metadata of the audio data.
  • the timing effect parameter may represent the beat or a fraction or a multiple of the beat.
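A beat-relative timing parameter as described above can be converted into an absolute effect parameter (here, a delay time) once the tempo is known; the function name and defaults are assumptions for illustration.

```python
def beat_synced_delay(bpm: float, fraction: float = 0.5) -> float:
    """Delay time in seconds for a beat-relative timing parameter:
    one beat lasts 60/bpm seconds; `fraction` may be a fraction or a
    multiple of the beat."""
    return 60.0 / bpm * fraction
```

For example, at 120 BPM a full beat is 0.5 s, so an eighth-note delay would use `fraction=0.5`.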
  • The first audio effect, or any audio effect according to the invention, may be applied to the entire audio track or only to a time interval of it. Effect automations are also possible, in which effect parameters are changed over the playing time.
  • The method according to the first aspect of the invention may be used in DJ equipment (such as DJ software, a DJ device, etc.) in order to allow audio effects to be applied to only selected musical timbres of a song, or different audio effects to be applied to different musical timbres of a song.
  • the method according to the first aspect of the invention may be used in a sound editing environment such as a digital audio workstation (DAW) or similar software, which has a functionality to import a mixed mono or stereo audio file as input audio data and to edit the input audio data by application of one or more audio effects.
  • the decomposed first and second audio tracks may then be edited differently and separately from one another, by applying (or not applying) audio effects such as time stretching, resampling, pitch shifting, reverb, delay, chorus, equalizer (EQ) etc.
  • Such a digital audio workstation may be used by producers or mixing/mastering engineers in recording studios, post-production studios or the like, and it allows processing mixed audio files (for example mixed songs obtained from music distribution services or record labels, or from live recording a mixture of different instruments or other sound sources).
  • the user may obtain access to individual audio tracks of specific musical timbres for the purpose of applying desired audio effects in a more targeted and sophisticated manner.
  • the first audio track (with the first audio effect applied) and the second audio track (with no audio effect applied or a different audio effect applied) are recombined again to form a single audio track, which may be stored to a storage medium or further processed or played back.
  • The method could include a first playback mode, in which the original input audio data are played back, or in which recombined audio data, obtained by recombining all decomposed audio tracks (in particular the first and second audio tracks) without any audio effects and preferably without any volume changes applied to the individual tracks, are played back; and a second playback mode, in which the at least one first audio effect is applied to at least one of the decomposed audio tracks while the other decomposed tracks remain unmodified.
  • The method could be made to switch, at any desired point in time during playback, from the first playback mode to the second playback mode and/or back again.
  • In this way, the at least one audio effect can be applied to the desired timbre within a desired time interval while ensuring continuous, uninterrupted playback of the piece of music.
  • Such first and second playback modes are particularly advantageous for DJ applications of the method where audio effects can be seamlessly turned on and off on the fly.
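The two playback modes can be sketched as a per-block render switch; the `tracks` list, the mode numbering and the effect callable are illustrative assumptions, not elements named by the document.

```python
import numpy as np

def render(tracks, effect, mode: int) -> np.ndarray:
    """Mode 1: plain recombination of all decomposed tracks (sounds like
    the original input). Mode 2: the effect is applied to the first
    track only before recombining. Calling this per audio block lets a
    DJ toggle modes on the fly without interrupting playback."""
    if mode == 2:
        first, *rest = tracks
        return effect(first) + sum(rest)
    return sum(tracks)
```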
  • The method may include a step of receiving a user input (for example a user selection) representing a time interval within the piece of music in which the first audio effect is to be applied to the first (decomposed) audio track. The method may then be adapted to create and store in a storage unit output data (in particular a destination audio file) representing a modified version of the piece of music, wherein, at playback positions outside the time interval, the output data correspond to the input audio data, while, at playback positions within the time interval, the output data correspond to a recombination of the first audio track, to which the first audio effect has been applied, with at least the second and/or all remaining decomposed audio tracks, to which the first audio effect has not been applied.
  • Outside the time interval, the output data may be substantially equal to the input audio data, or the audio signal of the output data may be substantially acoustically equal to the audio signal of the input audio data, specifically if the output data are obtained by recombining all decomposed audio tracks (in particular the first and second audio tracks) obtained from decomposing the input audio data of the piece of music, without any audio effects and preferably without any volume changes applied to the individual decomposed tracks.
  • Preferably, the first audio effect is a pitch scaling effect, which changes the pitch of the audio data of the first audio track while maintaining its playback duration/rate.
  • a pitch scaling effect achieves a much more natural result, when applied only to some of the musical timbres of the piece of music. For example, drum timbres do not have a musical pitch and thus do not need to be pitch shifted, which avoids distortion of the drums, in particular when shifting the pitch by more than one or two semitones up or down.
  • timbres having melodic components or contain actual notes of different pitches according to the key/harmonies of the music may be pitch shifted such as to shift the key of the piece of music to the desired key, while other timbres, such as drums or maybe spoken, non-melodic vocals, such as in Rap music, may remain unchanged with regard to their pitch.
  • pitch scaling becomes particularly prominent if, in a preferred embodiment, the pitch is shifted by more than 2 semitones, more preferably by more than 5 semitones, even more preferably by more than 11 semitones.
  • pitch shifts by more than 5 semitones or even more than 11 semitones allow great freedom for matching the keys of two different songs.
  • the pitch scaling effect may shift the pitch of the audio data of the first audio track up or down by a predetermined number of semitones. This allows pitch shifts for musical purposes, such as to transpose a song to a different key, which might be useful for a DJ for matching the key of one song to the key of another song, in order to allow simultaneous playback of both songs for several artistic reasons, such as smooth crossfades between the two songs (without clashing of different harmonies).
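In twelve-tone equal temperament, shifting by n semitones corresponds to a frequency ratio of 2^(n/12) while the playback duration is kept unchanged by the pitch scaler. A small illustrative helper (not part of the disclosure):

```python
def semitone_ratio(n):
    """Frequency ratio for a pitch shift of n semitones (12-TET);
    the playback duration is preserved separately by the pitch scaler."""
    return 2.0 ** (n / 12.0)

# Transposing A4 (440 Hz) up 3 semitones lands on C5 (~523.25 Hz):
c5 = 440.0 * semitone_ratio(3)
```

A full octave (12 semitones) doubles the frequency, and a fifth (7 semitones) gives a ratio of about 1.498.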
  • the first audio effect may be a time shifting effect, in particular a quantization effect, which is adapted to insert time stretchings or time compressions, or to cut out time intervals of the audio track at selected positions, in order to shift certain portions of the audio track so as to match a beat of the piece of music (timing corrections). For example, if one of the musical timbres is found to have incorrect timing, or if the timing of one of the timbres is to be modified for any other purpose, the user may make such timing changes on the desired audio track, for example the first audio track, without affecting the timing of the audio tracks of the other musical timbres. This feature is particularly relevant when the method is implemented in a digital audio workstation.
  • such a method allows correcting or modifying the timing of a vocal part of a song without changing the timing of the accompaniment part (the remaining, non-vocal timbres of the song).
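A minimal sketch of such a timing correction, under the simplifying assumption that onsets of the selected decomposed track are known as times in seconds and are snapped to the nearest beat of a fixed grid (a real quantization effect would stretch, compress, or cut the audio itself rather than move markers):

```python
def quantize_onsets(onsets, bpm):
    """Snap onset times (in seconds) of one decomposed track to the
    nearest beat at the given tempo; the other decomposed tracks of the
    same piece of music stay untouched."""
    beat = 60.0 / bpm
    return [round(t / beat) * beat for t in onsets]

# At 120 BPM (0.5 s per beat), slightly early/late onsets snap to the grid:
snapped = quantize_onsets([0.48, 1.02, 1.49], 120)
```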
  • the present invention allows post production of mixed songs by granting access to the original (or near original) audio tracks representing the individual musical timbres (instruments, vocals, etc.) that make up the mixed song, even if, in a post-production situation, such original audio tracks are no longer available to the user.
  • step b of decomposing the audio data generates a first audio track and a second audio track which are complements, such that their sum substantially equals the input audio data.
  • this complement property allows step (e) of recombining the first and second audio tracks to easily return to the audio signal of the original input audio data by removing the audio effect applied to the first or second audio track, respectively.
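The complement property can be illustrated on per-sample lists (names are illustrative): if the decomposed tracks sum to the mix, the second track is simply the per-sample difference between the mix and the first track, and recombining the unmodified tracks reproduces the original signal exactly:

```python
def complement_track(mix, first_track):
    """Second (complementary) track as per-sample difference, so that
    first + second reproduces the original mix exactly."""
    return [m - f for m, f in zip(mix, first_track)]

mix = [3, 4, 5]
first = [1, 2, 2]
second = complement_track(mix, first)
recombined = [f + s for f, s in zip(first, second)]
```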
  • the first musical timbre is a harmonic vocal timbre (a vocal timbre having melodic components or containing actual notes of different pitches according to the key/harmonies of the music) or a harmonic instrumental timbre (an instrumental timbre having melodic components or containing actual notes of different pitches according to the key/harmonies of the music, for example a timbre including at least one of bass, guitars, piano, strings, etc.), or a combination of a harmonic vocal timbre and a harmonic instrumental timbre (denoted as a melodic timbre, for example a timbre which includes all signal components except drums and bass), and/or the second musical timbre is a non-harmonic vocal timbre or a non-harmonic instrumental timbre, preferably a drum timbre.
  • in step b of decomposing the audio data, the first audio track, the second audio track, and a third audio track representing a third musical timbre are generated, wherein the first audio track, the second audio track and the third audio track are complements, such that their sum substantially equals the input audio data, wherein in step c, the predetermined first audio effect is applied to the first audio track, but neither to the second audio track nor to the third audio track, and wherein in step d, the first audio track with the first audio effect applied, the second audio track and the third audio track are recombined to obtain the recombined audio data.
  • the input audio data are separated into three audio tracks of different musical timbres, which allows different effect settings to be applied to three different components of the music.
  • Methods according to the first aspect of the invention use a step of decomposing input audio data to obtain first and second audio tracks containing different musical timbres.
  • decomposition algorithms and services which allow decomposing audio signals to separate therefrom one or more signal components of different timbres, such as vocal components, drum components or instrumental components, are known in the art as such.
  • Such decomposed signals and decomposed tracks have been used in the past to create certain artificial effects such as removing vocals from a song to create a karaoke version of a song, and they could be used in step (b) of the method of the present invention.
  • step b of decomposing the input audio data may include processing the input audio data by an AI system containing a trained neural network.
  • An AI system may implement a convolutional neural network (CNN) which has been trained on a plurality of data sets, each for example including a vocal track, a harmonic/instrumental track and a mix of the vocal track and the harmonic/instrumental track.
  • Examples of conventional AI systems capable of separating source tracks, such as a singing voice track, from a mixed audio signal include: Pretet, "Singing Voice Separation: A study on training data", Acoustics, Speech and Signal Processing (ICASSP), 2019, pages 506-510; "spleeter", an open-source tool provided by the music streaming company Deezer based on the teaching of Pretet above; "PhonicMind" (https://phonicmind.com), a voice and source separator based on deep neural networks; "Open-Unmix", a music source separator based on deep neural networks in the frequency domain; and "Demucs" by Facebook AI Research, a music source separator based on deep neural networks in the waveform domain.
  • These tools accept music files in standard formats (for example MP3, WAV, AIFF) and decompose the song to provide decomposed/separated tracks of the song, for example a vocal track, a bass track, a drum track, an accompaniment track or any mixture thereof.
  • output data obtained from the recombined audio data are further processed, preferably stored in a storage unit, and/or played back by a playback unit and/or mixed with second-song output data, wherein obtaining the recombined audio data and/or further processing the output data is preferably performed within a time smaller than 5 seconds, preferably smaller than 200 milliseconds, after the start of decomposing the input audio data.
  • the method may further comprise the steps of determining a first key of the first piece of music of the input audio data, providing second-song input data representing a second piece of music, determining a second key of the second piece of music of the second-song audio data, and determining a pitch shift value based on the first key and the second key, wherein in step (c), the pitch of the first audio track is shifted by the pitch shift value, while maintaining the pitch of the second track, wherein the method preferably further comprises a step of mixing output data obtained from the recombined audio data with second-song output data obtained from the second-song input data, such as to obtain mixed output data, and wherein the method preferably further comprises a step of playing back playback data obtained from the mixed output data.
  • the method is specifically suited for an application by a DJ, for example in a DJ equipment, when the keys of two songs are to be matched automatically in order to allow for smooth transitions between the two songs.
  • sound artefacts or distortions can be avoided or substantially reduced even when the key of a song is shifted by more than one or two semitones.
  • a device for processing music audio data comprising an input unit for receiving input audio data representing a first piece of music containing a mixture of predetermined musical timbres, a decomposition unit for decomposing the input audio data received from the input unit to generate at least a first audio track representing a first musical timbre selected from the predetermined musical timbres, and a second audio track representing a second musical timbre selected from the predetermined musical timbres, an effect unit for applying a predetermined first audio effect to the first audio track and for applying no audio effect or a predetermined second audio effect, which is different from the first audio effect, to the second audio track, and a recombination unit for recombining the first audio track with the second audio track to obtain recombined audio data.
  • a device of the second aspect can be formed by a computer having a microprocessor, a storage unit, an input interface and an output interface, wherein at least the input unit, the decomposition unit, the effect unit and the recombination unit are formed by a software program running on the computer.
  • the computer is preferably adapted to carry out a method according to the first aspect of the invention.
  • the effect unit may be a pitch scaling unit for changing the pitch of audio data of the first audio track while maintaining its playback duration or playback rate.
  • Such a device may show particular advantages when forming part of DJ equipment in which transposition of a song from one key to another is desired. It has been found that sound distortions caused by pitch scaling can be reduced or avoided if the pitch scaling effect is applied only to some of the musical timbres included in a piece of music.
  • the decomposition unit preferably includes an AI system containing a trained neural network, wherein the neural network is trained to separate audio data of a predetermined musical timbre from audio data containing a mixture of different musical timbres.
  • AI systems are able to separate different musical timbres of a song with high quality.
  • a device of the second aspect of the invention may further comprise a storage unit adapted to store the output data, which allows further processing of the output data, for example at any later point in time.
  • the device may have a playback unit adapted to play back the output data, such that the device can be used as a music player or for the public performance of music through connection to a PA system.
  • the device may have a mixing unit adapted to mix the output data with second-song output data, which allows the use of the device as DJ equipment.
  • the device may further comprise a first key detection unit for determining a first key of the first piece of music of the input audio data, a second-song input unit for providing second-song input data representing a second piece of music, a second key detection unit for determining a second key of the second piece of music of the second-song audio data, and a pitch shift calculation unit for determining a pitch shift value based on the first key and the second key, wherein the effect unit is a pitch scaling unit adapted to shift the pitch of the first audio track by the pitch shift value, while maintaining the pitch of the second track.
  • the device is a DJ device.
  • the device may then further comprise a mixing unit adapted to mix output data obtained from the recombined audio data with second-song output data obtained from the second-song input data, such as to obtain mixed output data, and preferably a playback unit adapted to play back playback data obtained from the mixed output data.
  • the device may further comprise a second-song input unit for providing second-song input data representing a second piece of music, a mixing unit adapted to mix output data obtained from the recombined audio data with second-song output data obtained from the second-song input data, such as to obtain mixed output data, and a crossfading unit having a crossfading controller that can be manipulated by a user to assume a control position within a control range, wherein the crossfading unit sets a first volume level of the output data and a second volume level of the second-song output data depending on the control position of the crossfading controller, such that the first volume level is maximum and the second volume level is minimum when the crossfading controller is at one end point of the control range, and the first volume level is minimum and the second volume level is maximum when the crossfading controller is at the other end point of the control range.
  • the device may comprise an effect control unit adapted to allow a user to control an operation of the effect unit, in particular to control the application of at least the first audio effect and/or for controlling an effect type and/or an effect parameter of at least the first audio effect.
  • the effect unit may have a first operational mode in which it applies the first audio effect to the first audio track, but not to the second audio track, and may have a second operational mode in which it applies the first audio effect to the second audio track, but not to the first audio track.
  • the effect unit controls a plurality of audio effects, and the effect control unit comprises an effect control element which is adapted to allow a user to select at least one audio effect from the plurality of audio effects as the first audio effect to be applied to the first audio track.
  • the effect control unit may comprise a parameter control element, which is adapted to allow a user to control at least one effect parameter of the first audio effect. This allows a user not only to choose a suitable audio effect but also to adjust the selected audio effect to his/her needs.
  • the decomposition unit is adapted to decompose the input audio data to generate a plurality of decomposed audio tracks, each representing a different timbre selected from the predetermined musical timbres, and the effect control unit comprises a routing control element which is adapted to allow a user to select at least one of the plurality of decomposed audio tracks as the selected decomposed audio track, wherein the effect unit applies an audio effect (the selected audio effect or the first audio effect) to the at least one selected decomposed audio track.
  • a routing control element allows the application of individual audio effects to individual decomposed audio tracks, although it is not necessary to provide a separate effect unit for each decomposed audio track. This reduces costs and provides more flexibility for the user.
  • the effect unit is configured to apply a plurality of different audio effects simultaneously to either one single decomposed audio track or to a plurality of different decomposed audio tracks of the same input audio data (of the same piece of music), i.e. a first audio effect to a first decomposed audio track, and a second audio effect different from the first audio effect to a second decomposed audio track different from the first decomposed audio track, wherein the routing control element may be configured to allow a user to control which audio effect is applied to which decomposed audio track.
  • the decomposition unit is adapted to decompose the input audio data to generate a plurality of decomposed audio tracks, including at least a first decomposed audio track and a second decomposed audio track, wherein each of the plurality of decomposed audio tracks represents a different timbre selected from the predetermined musical timbres of the same piece of music, and the effect control unit comprises a combo effect control element which is adapted to control, preferably by a single control operation of a user, application of at least a first audio effect to the first decomposed audio track and a second audio effect, different from the first audio effect, to the second decomposed audio track.
  • the combo effect control element of this embodiment accelerates the control of the effect unit for predetermined sets of effects applied to predetermined decomposed audio tracks.
  • a user may apply or remove a plurality of different effects to or from different decomposed audio tracks.
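A combo effect control of this kind might be modelled as a single toggle over a preset mapping of effects to decomposed tracks, using the echo/gate/reverb example described for Fig. 3; the class and names below are illustrative, not the claimed unit itself:

```python
# Illustrative preset: one effect per decomposed track (per the Fig. 3 example).
COMBO_PRESET = {"vocals": "echo", "harmonic": "gate", "drums": "reverb"}

class ComboEffectControl:
    """One push-button control applying/removing several per-track effects."""
    def __init__(self, preset):
        self.preset = dict(preset)
        self.active = {}  # currently applied: track name -> effect name

    def toggle(self):
        """Single control operation: apply the whole preset, or remove it again."""
        self.active = {} if self.active else dict(self.preset)
        return self.active

control = ComboEffectControl(COMBO_PRESET)
```

One toggle applies all per-track effects simultaneously; the next toggle removes them, matching the alternating push-button behaviour described for the first embodiment.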
  • An effect control unit of the present invention may comprise two or more effect control sections, each effect control section comprising one or more control elements to control at least one audio effect. This allows controlling two or more audio effects to be applied to the input audio data at the same time.
  • the audio effects controlled by the effect control sections may be applied to different decomposed audio tracks or they may alternatively be applied to the same decomposed audio track as an effect chain, i.e. sequentially (one of the audio effects is applied to a specific decomposed audio track, and the modified decomposed audio track is then submitted to a second audio effect from the plurality of audio effects and, optionally, afterwards is submitted to one or more additional audio effects).
  • the control elements included in the effect control sections may be effect control elements and/or parameter control elements and/or routing control elements and/or combo effect control elements as described above, or any combination thereof.
  • the device of the second aspect may be a computer running a digital audio workstation (DAW).
  • the above-mentioned object of the invention is achieved by software adapted to run on a computer and to control the computer so as to carry out a method of the first aspect of the invention.
  • Such software may be executed/run on known operating systems and platforms, in particular iOS, macOS, Android or Windows running on computers, tablets, and smartphones.
  • the software may be a digital audio workstation (DAW) or a DJ software.
  • Fig. 1 shows a function diagram of a device according to a first specific embodiment of the invention
  • Fig. 2 shows a layout of an effect control unit of the device according to the specific embodiment
  • Fig. 3 shows a layout of a combo effect control element of the device according to the specific embodiment
  • Fig. 4 shows a layout of a DJ control unit that may be used in the specific embodiment of the invention
  • Fig. 5 shows a function diagram of a device according to a second specific embodiment of the invention.
  • components of a device according to a first embodiment are shown, which may all be integrated as hardware or software modules installed on a computer, for example a tablet computer or a smartphone.
  • these hardware or software modules may be parts of a stand-alone DJ device, which includes a housing on which control elements such as control knobs or sliders are mounted to control functions of the device.
  • the device may include an input interface 12 for receiving input audio data or audio signals.
  • the input interface may be adapted to receive digital audio data as audio files via a network or from a storage medium.
  • the input interface 12 may be configured to decode or decompress audio data, when they are received as encoded or compressed data files.
  • the input interface 12 may comprise an analog-digital converter to sample analog data received from an analog audio input (for example a vinyl player or a microphone) and to obtain digital audio data as input audio data.
  • the input audio data provided by input interface 12 are then routed to an input section 14 which contains a first-song input unit 16 and a second-song input unit 18, which are adapted to provide audio data of two different songs according to a user selection.
  • the device may have a user input interface, for example a touch screen, to allow a user to choose songs from a song database and to load them into the first-song input unit 16 or the second-song input unit 18.
  • the audio file of the selected song may be completely loaded into a local memory of the device or portions of the audio file may be continuously streamed (for example via internet from a remote music distribution platform) and further processed before receiving the entire file.
  • the first-song input unit 16 provides first-song audio input data according to a first song selected by a user
  • the second-song input unit 18 provides second-song audio input data according to a second song selected by a user.
  • the first-song audio input data may then be routed to a first key detection unit 20 to detect a first key of the first song, while the second-song audio input data are routed to a second key detection unit 22 to detect a second key of the second song.
  • First and second key detection units 20, 22 are preferably arranged to detect a key or root or fundamental tone of the piece of music according to the 12 semitones of the chromatic scale (e.g. one of C, C sharp, D, D sharp, E, F, F sharp, G, G sharp, A, A sharp, B), including the mode (major or minor).
  • a conventional key detection module may be used as first and second key detection unit, respectively.
  • first and second keys may be detected one after another by one and the same key detection unit.
  • First and second keys may be input into a pitch shift calculation unit 24, which calculates a pitch shift value based on a difference between the two keys.
  • the pitch shift value may be a number of semitones by which the first key needs to be shifted up or down in order to match the second key.
  • the pitch shift value may be a number of semitones by which the first key needs to be shifted up or down in order to assume a key that differs from the second key by a fifth. It has been found that two songs may be mixed and played simultaneously without audible harmonic interference, for example during a crossfading between the two songs, if both songs are at the same key or if their keys differ by a fifth.
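Ignoring the mode (major/minor) for simplicity, the pitch shift value could be computed from the two detected root notes as the smallest semitone shift that makes the keys equal, or optionally equal up to a fifth. This is a hypothetical sketch of what pitch shift calculation unit 24 might do, not the claimed unit itself:

```python
PITCH_CLASS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
               "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

def pitch_shift_value(first_key, second_key, allow_fifth=False):
    """Smallest shift (in semitones, wrapped to -6..+6) moving first_key
    onto second_key; with allow_fifth, a key a fifth apart also counts."""
    diff = (PITCH_CLASS[second_key] - PITCH_CLASS[first_key]) % 12
    targets = [diff] + ([(diff + 7) % 12, (diff - 7) % 12] if allow_fifth else [])
    return min((d - 12 if d > 6 else d for d in targets), key=abs)
```

For example, A to C is +3 semitones, C to A is -3 (the shorter direction), and C to G needs no shift at all when keys a fifth apart are allowed.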
  • the first-song audio input data are routed to a decomposition unit 26 which contains an AI system having a trained neural network adapted to decompose the first song audio input data to generate at least a first audio track representing a first musical timbre, a second audio track representing a second musical timbre, and a third audio track representing a third musical timbre.
  • the first musical timbre may be a harmonic timbre, the second musical timbre may be a non-harmonic timbre, such as a percussion timbre, and the third musical timbre may be another non-harmonic timbre, such as a drum timbre.
  • Only the first audio track representing the first musical timbre is then routed into a pitch shifting unit 28, which shifts the pitch of the audio data by a predetermined number of semitones up or down, based on the pitch shift value received from the pitch shift calculation unit 24.
  • the second audio track and the third audio track are not routed to the pitch shifting unit 28 but rather bypass the pitch shifting unit 28.
  • only the first audio track including the harmonic timbres is submitted to the pitch shifting, whereas the second and third tracks, which include the non-harmonic timbres, maintain their pitch.
  • The pitch-shifted first audio track, the second audio track and the third audio track are then routed into a recombination unit 30 in which they are recombined into a single audio track (mono or stereo track). Recombination may be performed by simply mixing the audio data.
  • the recombined audio data obtained from recombination unit 30 may then be passed through a first-song effect unit 32 in order to apply some other audio effect, such as a high pass or low pass filter, or an EQ filter, if desired, and to output the result as first-song output data.
  • the second-song audio input data obtained from the second-song input unit 18 may be passed to any desired effect units as well, similar to those described for the first song.
  • the second-song audio input data are passed through a second-song effect unit 34 in order to apply an audio effect, such as a high pass or low pass filter, or an EQ filter, and to output the result as second-song output data.
  • First-song and second-song output data may then be passed through a tempo matching unit 36 which detects a tempo (BPM value) of both songs and changes the tempo of at least one of the two songs (without changing its pitch) such that both songs have matching tempi.
  • Matching tempi means that the BPM value of one of the two songs equals the BPM value or a multiple of the BPM value of the other song.
  • Such tempo matching units are known in the art as such.
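One plausible way such a unit could derive the time-stretch factor, assuming "matching" allows the other song's BPM or its half/double as targets, as described above (an illustrative sketch, with pitch preservation handled separately by the time stretcher):

```python
def tempo_stretch_factor(bpm_a, bpm_b):
    """Stretch factor for song A so that its tempo equals song B's BPM
    or the closest of its half/double multiples."""
    target = min((bpm_b * m for m in (0.5, 1.0, 2.0)),
                 key=lambda c: abs(c - bpm_a))
    return target / bpm_a
```

For example, a 100 BPM song is slowed slightly to match 98 BPM, while a 60 BPM song is sped up to 63 BPM to match half of 126 BPM rather than being doubled.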
  • first-song and second-song output data may be routed into a mixing unit 38, in which they are mixed with one another to obtain mixed output data (mono or stereo) that contain a sum of both signals.
  • Mixing unit 38 may contain or may be connected to a crossfader, which can be manipulated by a user to assume a control position within a control range, wherein the crossfader sets a first volume level of the first-song output data and a second volume level of the second-song output data depending on the control position, such that the first volume level is maximum and the second volume level is minimum when the crossfader is at one end point of the control range, and the first volume level is minimum and the second volume level is maximum when the crossfader is at the other end point of the control range.
  • Mixing unit 38 then mixes (sums) the first-song and second-song output data according to the first volume level and the second volume level, respectively, to obtain mixed output data (mono or stereo).
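The crossfader law and the summing in mixing unit 38 can be sketched as follows (a linear fade for simplicity; real mixers often use an equal-power curve instead, and all names here are illustrative):

```python
def crossfade_levels(position):
    """Map a crossfader position in [0.0, 1.0] to (first, second) volume
    levels: full song A at one end, full song B at the other."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position out of control range")
    return 1.0 - position, position

def mix_output(a_samples, b_samples, position):
    """Sum the two songs' output data at the crossfader-defined levels."""
    va, vb = crossfade_levels(position)
    return [va * a + vb * b for a, b in zip(a_samples, b_samples)]
```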
  • the mixed output data may then be passed through a sum effect unit 40 to apply any further audio effect, if desired.
  • the output of the sum effect unit 40 may be denoted as playback data and may be played back by an output audio interface 42.
  • Output audio interface 42 may include an audio buffer and a digital-to-analog converter to generate a sound signal.
  • the playback data may be transmitted to another device for playback, storage or further processing.
  • FIGS 2 to 4 show layouts of control units of the device according to the first embodiment of the invention, which may be operated by a user to control the device. Elements shown in the layouts and described in the following may be displayed by a suitable display of the device controlled by a software running on the device. Alternatively or in addition, these layouts or parts thereof may be realized by hardware design, for example of a DJ device, and the control elements may be realized by control knobs, sliders, switches and so on.
  • an effect control unit 50 may comprise a plurality of effect control sections, for example three effect control sections 52-1, 52-2 and 52-3.
  • Each effect control section may comprise one or more control elements for controlling type, parameter and routing of audio effects.
  • the first effect control section 52-1 may comprise an on/off control element 54 which may be operated by a user to alternately switch on or switch off the effect control section 52-1, in particular to switch on or off the audio effect associated with this effect control section 52-1.
  • First effect control section 52-1 may also include an effect control element 56, which is adapted to allow a user to select one of a plurality of audio effects.
  • effect control element 56 may be implemented by a drop-down element or a list selection element or the like, or may open an effect browser or similar dialogue, which allows choosing a particular audio effect (effect type), or may be realized by a previous/next control button to step through the list of available audio effects and select an effect with each step.
  • an echo effect is selected as the audio effect of the first effect control section 52-1.
  • First effect control section 52-1 may further comprise a parameter control element 58, which is adapted to allow a user to set or modify or otherwise control at least one effect parameter of the audio effect that is selected by effect control element 56.
  • parameter control element 58 may control a timing of the echo, i.e. a time interval between the original sound and the echo sound.
  • the device of the present embodiment may contain a beat detection unit that detects the beat of the first-song audio input data.
  • a timing of the selected effect for example a timing of the echo effect, may then be set as particular fractions or multiples of the duration of a beat. This allows reducing the time required for the user to find an appropriate timing for the audio effect.
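Deriving the effect timing from the detected beat, e.g. as a fraction or multiple of one beat, could look as follows (the helper name is illustrative):

```python
def echo_delay_seconds(bpm, beat_fraction):
    """Echo delay time as a fraction/multiple of one beat at the detected
    tempo, e.g. beat_fraction=0.5 gives an eighth-note echo."""
    return (60.0 / bpm) * beat_fraction
```

At 120 BPM one beat lasts 0.5 s, so a full-beat echo repeats after 0.5 s and a half-beat echo after 0.25 s, which keeps the effect rhythmically locked to the song.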
  • First effect control section 52-1 may further comprise a routing control element 60, which allows selecting one of the plurality of decomposed audio tracks obtained from decomposition unit 26.
  • routing control element 60 may allow a selection between the first audio track, the second audio track and the third audio track obtained from the decomposition unit 26 (e.g. vocal track, harmonic track and drums track).
  • the audio effect selected by effect control element 56, optionally influenced by the setting of parameter control element 58, will be routed to the selected decomposed audio track (only), for example to either the vocal track or the harmonic track or the drums track.
  • routing control element 60 may have another option “combined” which may be selected in order to route the selected audio effect to all decomposed tracks at the same time.
  • the second effect control section 52-2 and/or the third effect control section 52-3 and/or any further effect control section may contain control elements similar to those described above for the first effect control section 52-1, i.e. an on/off control element, an effect control element, a parameter control element and/or a routing control element.
  • multiple audio effects may be applied to the audio input data at the same time and may be controlled easily by a user.
  • Fig. 3 shows a combo effect control element 62 that may be used in the effect control unit 50 in addition to or as an alternative to the at least one effect control section 52-1, 52-2, or 52-3.
  • Combo effect control element 62 allows the control of multiple audio effects by a single control operation.
  • combo effect control element 62 is a push button that may be pushed by a user to alternately activate and deactivate it. When activated, combo effect control element 62 applies two or more audio effects to two or more different decomposed audio tracks at the same time.
  • pushing the combo effect control element 62 applies an echo effect to the vocal track, a gate effect to the harmonic track and a reverb effect to the drums track, wherein all effects are applied simultaneously and will be removed upon the next operation of the push button.
  • the effects may be applied simultaneously upon operation of the push button and may remain activated as long as the user presses the push button, while they will be removed when the push button is released.
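The two button behaviours just described, toggling on each press versus momentary engagement while held, can be sketched as follows; the class name and effect names are illustrative only:

```python
class ComboEffect:
    """One control that switches several effects on several tracks at once.

    assignments: list of (track_name, effect_name) pairs, e.g. the
    echo->vocals, gate->harmonic, reverb->drums example above.
    """
    def __init__(self, assignments):
        self.assignments = assignments
        self.active = {}  # track_name -> effect_name while engaged

    def press(self):
        """Toggle mode: each press flips all assigned effects together."""
        self.active = {} if self.active else dict(self.assignments)

    def hold(self, pressed):
        """Momentary mode: effects stay on only while the button is held."""
        self.active = dict(self.assignments) if pressed else {}

combo = ComboEffect([("vocals", "echo"), ("harmonic", "gate"), ("drums", "reverb")])
combo.press()
print(combo.active)  # all three effects engaged simultaneously
combo.press()
print(combo.active)  # {} -- removed on the next press
```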
  • Fig. 4 shows the layout of a device control unit suitable to control a device according to the first embodiment of the invention, in particular a device as schematically illustrated in Fig. 1.
  • First-song input unit 16 and second-song input unit 18 are shown in Fig. 4 as graphical representations of a song A and a song B, respectively.
  • waveforms of songs A and B are displayed.
  • Song-selection control elements 62A and 62B may be operated by a user to select song A as first-song audio input data and song B as second-song audio input data, respectively.
  • Songs A and B may be selected from an external audio source or from an online music distribution service for streaming via the Internet or from a local data storage device.
  • Device control unit 61 may further comprise play/stop control elements 64A, 64B for starting or stopping playback of song A and song B, respectively.
  • Device control unit 61 may have at least one volume control element for controlling the volume of song A and/or song B.
  • the volume control element may be configured as a cross-fader, which allows controlling the volumes of both songs A and B with only one single control element (not illustrated in Fig. 4).
  • Device control unit 61 may have individual cross-faders for the individual decomposed tracks, for example a vocal cross-fader 66V and/or a harmonic cross-fader 66H and/or a drums cross-fader 66D (and/or, as a further option, a bass cross-fader, not illustrated).
  • Each decomposed-track cross-fader 66V, 66H, 66D is adapted to be controlled between two end points, wherein at the first end point the volume of the decomposed track of song A is at maximum and the volume of the corresponding decomposed track of song B is at minimum, whereas at the second end point the volume of the decomposed track of song A is at minimum and the volume of the corresponding decomposed track of song B is at maximum.
  • the volumes of the decomposed tracks of songs A and B are each modified according to a predetermined transition function or a predetermined transition curve.
  • the function or curve may be fixed, or it may be modified or selected from a plurality of predetermined functions or curves by operation of curve control elements 68V, 68H, 68D associated with the individual decomposed-track cross-faders 66V, 66H, 66D, respectively.
  • Typical examples of DJ style crossfader curves are: intermediate, dipped, cut, constant power, etc.
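Two of the curve shapes named above can be written out explicitly. A sketch assuming a normalized fader position x in [0, 1], where 0 plays only song A and 1 plays only song B; exact curve definitions vary between products:

```python
import math

def constant_power(x):
    """Constant-power curve: gain_a**2 + gain_b**2 == 1 at every position,
    keeping perceived loudness roughly even through the transition."""
    return math.cos(x * math.pi / 2), math.sin(x * math.pi / 2)

def cut(x, threshold=0.05):
    """Cut curve: both decks at full volume except near the end points,
    as typically preferred for scratch-style mixing."""
    gain_a = 0.0 if x > 1 - threshold else 1.0
    gain_b = 0.0 if x < threshold else 1.0
    return gain_a, gain_b

a, b = constant_power(0.5)
print(round(a, 3), round(b, 3))  # 0.707 0.707: equal power at the midpoint
print(cut(0.5))                  # (1.0, 1.0): both songs fully audible
```

The same functions apply unchanged to a per-stem cross-fader such as 66V, since each fader just yields a gain pair for one decomposed track of each song.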
  • Effect control unit 50 and/or device control unit 61 may be configured to control one or more of the units described above with reference to Fig. 1, in particular the pitch shifting unit 28, the recombination unit 30 or the sum effect unit 40.
  • recombination unit 30 may comprise an effect unit which is adapted to apply one or more audio effects to the incoming first to third audio tracks according to the settings of effect control unit 50, before recombining the audio tracks.
  • recombination within recombination unit 30 may be performed based on the settings controlled by a user through device control unit 61, in particular based on the settings of decomposed track cross-faders 66V, 66H, 66D, respectively.
  • the device according to the second embodiment is a modification of the device of the first embodiment in such a way that the device of the second embodiment allows for even more flexibility and control options for a user as regards the application of different audio effects to different decomposed audio tracks.
  • the functions of the device of the first embodiment as shown in Fig. 1 may be realized as one possible operational mode of the device of the second embodiment, while the device of the second embodiment offers additional operational modes, as will be described in the following. Only the differences with respect to the first embodiment will be explained in more detail, while reference is to be made to the description above of the first embodiment with regard to all other features and functions.
  • an input interface 112 receives input audio data or audio signals, which are transferred to an input section 114.
  • Input section 114 is adapted to receive first-song audio input data through a first-song input unit 116 and second-song audio input data through a second-song input unit 118.
  • At least the first-song audio input data are further transferred to a decomposition unit 126 which is adapted to decompose the input data based on a trained neural network integrated within decomposition unit 126 such as to obtain a plurality of decomposed audio tracks of different timbres, for example a first audio track, a second audio track and a third audio track (for example a vocal track, a harmonic track and a drum track).
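The patent leaves the internals of the trained network open. One common approach in neural source separation is to have the network predict a soft spectrogram mask per timbre; a toy sketch with a hand-written mask standing in for the network output:

```python
# Toy mask-based separation: a trained network (not shown here) would
# predict one soft mask per timbre; each mask scales the mixture
# spectrogram bin by bin, and the masks of all stems sum to one.

mixture = [0.8, 0.5, 0.2, 0.9]      # magnitudes of a few spectrogram bins
vocal_mask = [0.9, 0.1, 0.0, 0.5]   # stand-in for a network prediction
other_mask = [1.0 - m for m in vocal_mask]

vocals = [m * x for m, x in zip(vocal_mask, mixture)]
other = [m * x for m, x in zip(other_mask, mixture)]

# Because the masks sum to one, the stems add back up to the mixture.
recon = [v + o for v, o in zip(vocals, other)]
print(all(abs(r - x) < 1e-9 for r, x in zip(recon, mixture)))  # True
```

Splitting into three or four stems (vocals, harmonic, drums, bass) works the same way, with one mask per stem.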
  • each of the decomposed tracks is input into an effect unit 128, which is configured to apply one or more audio effects to selected decomposed tracks among the received decomposed tracks, depending on the settings made by a user within effect control unit 50.
  • each of the decomposed tracks may receive either no audio effect, one audio effect or a plurality of different audio effects, which differ in either effect type or effect parameter.
  • the decomposed tracks that have passed effect unit 128 are then routed into recombination unit 130, in which they will be mixed together to obtain a single mixed audio signal.
  • the volume levels at which the individual decomposed tracks are mixed within recombination unit 130 may be set through user control using control elements such as solo/mute, faders, etc. Specifically, the volume levels may be set through the decomposed-track cross-faders 66V, 66H, 66D described above with reference to Fig. 4.
  • the audio signal output by recombination unit 130 may pass a first-song effect unit 132 for application of at least one additional audio effect. Afterwards, the audio signal will be routed towards a cross-fader/mixing unit 138 for mixing with the second-song audio input data.
  • Second-song audio input data may be received directly from input section 114 or they may be passed through a second-song effect unit 134 for application of at least one audio effect before mixing with the first-song output data.
  • the first-song output data and the second-song output data may be input into a tempo-matching unit 136 for synchronizing or matching the tempo/beat of the two songs, which allows for a smooth mixing of the two songs.
  • Mixed output data obtained from the cross-fader/mixing unit 138 may further be passed through a sum effect unit 140 for application of an additional audio effect, if desired, or they may be forwarded directly to output audio interface 142 for output.
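The signal path just traced (decomposition, per-track effects, recombination, per-song effects, tempo matching, cross-fading, sum effect) can be summarized as a chain of stages. A schematic sketch in which every stage is a trivial placeholder; the fixed 0.5/0.3/0.2 stem split and all function names are invented for illustration and carry no DSP:

```python
# Schematic of the second embodiment's chain; each stage stands in for
# the correspondingly numbered unit and would hold real DSP in practice.

def decompose(song):                    # decomposition unit 126 (toy split)
    return {"vocals": [0.5 * s for s in song],
            "harmonic": [0.3 * s for s in song],
            "drums": [0.2 * s for s in song]}

def track_effects(stems): return stems  # effect unit 128 (pass-through)

def recombine(stems):                   # recombination unit 130
    return [sum(bins) for bins in zip(*stems.values())]

def song_effect(signal): return signal  # effect units 132 / 134

def tempo_match(a, b): return a, b      # tempo-matching unit 136

def crossfade(a, b, x):                 # cross-fader/mixing unit 138
    return [(1.0 - x) * s + x * t for s, t in zip(a, b)]

def sum_effect(signal): return signal   # sum effect unit 140

def process(song_a, song_b, fader=0.5):
    first = song_effect(recombine(track_effects(decompose(song_a))))
    first, second = tempo_match(first, song_effect(song_b))
    return sum_effect(crossfade(first, second, fader))

out = process([0.2, 0.2], [0.4, 0.4], fader=0.5)
print(out)  # roughly [0.3, 0.3]: the stems rebuild song A, then mix with B
```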
  • device control unit 61 may be used to control the units of the device.
  • the device of the first embodiment as well as the device of the second embodiment may be implemented as a DJ device or as a DJ software, which can run on a computer, including a tablet or a smartphone, or on a standalone hardware device.
  • one or more of the elements and functions described above, in particular one or more of the above-described units, may be implemented as a software module such as a software plug-in for integration into another audio processing software, such as a DJ software or a digital audio workstation (DAW) software.
  • Device for processing music audio data, comprising an input unit for receiving input audio data representing a first piece of music containing a mixture of predetermined musical timbres, a decomposition unit for decomposing the input audio data received from the input unit to generate at least a first audio track representing a first musical timbre selected from the predetermined musical timbres, and a second audio track representing a second musical timbre selected from the predetermined musical timbres, a first effect unit for applying a predetermined first audio effect to the first audio track, but not to the second audio track, and a recombination unit for recombining the first audio track with the second audio track to obtain recombined audio data.
  • the first effect unit is a pitch scaling unit for changing the pitch of audio data of the first audio track while maintaining its playback duration.
  • the decomposition unit includes an AI system containing a trained neural network, wherein the neural network is trained to separate audio data of a predetermined musical timbre from audio data containing a mixture of different musical timbres.
  • Device of at least one of items 1 to 3, further comprising a storage unit adapted to store the output data, and/or a playback unit adapted to play back the output data, and/or a mixing unit adapted to mix the output data with second-song output data.
  • Device of at least one of items 1 to 4, further comprising a first key detection unit for determining a first key of the first piece of music of the input audio data, a second-song input unit for providing second-song input data representing a second piece of music, a second key detection unit for determining a second key of the second piece of music of the second-song audio data, and a pitch shift calculation unit for determining a pitch shift value based on the first key and the second key, wherein the first effect unit is a pitch scaling unit adapted to shift the pitch of the first audio track by the pitch shift value, while maintaining the pitch of the second track.
  • Device of item 5 further comprising a mixing unit adapted to mix output data obtained from the recombined audio data with second-song output data obtained from the second-song input data, such as to obtain mixed output data, and preferably a playback unit adapted to play back playback data obtained from the mixed output data.
  • Device of at least one of items 1 to 7, comprising a computer having a microprocessor, a storage unit, an input interface and an output interface, wherein at least the input unit, the decomposition unit, the first effect unit and the recombination unit are formed by a software program running on the computer, wherein the software is preferably adapted to control the computer such as to carry out a method according to the first aspect of the invention.
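The pitch-shift calculation of item 5 can be illustrated with pitch classes. A sketch under the assumption that the shift is chosen as the smallest semitone interval between the two detected keys; the patent only requires some shift value derived from both keys, and a real implementation would also account for major/minor modes:

```python
# Illustrative pitch-shift calculation between two detected keys.
# Keys are reduced to pitch classes; the shift is the smallest interval
# in semitones that moves the first key onto the second.

PITCH_CLASSES = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
                 "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

def pitch_shift_semitones(first_key, second_key):
    diff = (PITCH_CLASSES[second_key] - PITCH_CLASSES[first_key]) % 12
    return diff - 12 if diff > 6 else diff  # prefer the shorter direction

print(pitch_shift_semitones("C", "G"))  # -5: down a fourth beats up a fifth
print(pitch_shift_semitones("A", "C"))  # 3
```

The resulting value would be handed to the pitch scaling unit, which shifts the first audio track by that many semitones while leaving its playback duration and the other track's pitch unchanged.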

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The present invention relates to a method for processing music audio data, comprising the steps of providing input audio data representing a first piece of music containing a mixture of predetermined musical timbres, decomposing the input audio data to generate one or more first audio tracks representing a first musical timbre selected from the predetermined musical timbres and a second audio track representing a second musical timbre selected from the predetermined musical timbres, applying a predetermined first audio effect to the first audio track, applying no audio effect or applying a predetermined second audio effect different from the first audio effect to the second audio track, and recombining the first audio track with the second audio track to obtain recombined audio data.
PCT/EP2020/079275 2020-03-06 2020-10-16 Procédé, dispositif et logiciel permettant d'appliquer un effet audio à un signal audio séparé d'un signal audio mixte WO2021175461A1 (fr)

Priority Applications (12)

Application Number Priority Date Filing Date Title
EP20792654.4A EP4115629A1 (fr) 2020-03-06 2020-10-16 Procédé, dispositif et logiciel permettant d'appliquer un effet audio à un signal audio séparé d'un signal audio mixte
AU2020433340A AU2020433340A1 (en) 2020-03-06 2020-10-16 Method, device and software for applying an audio effect to an audio signal separated from a mixed audio signal
EP20800953.0A EP4115630A1 (fr) 2020-03-06 2020-11-09 Procédé, dispositif et logiciel pour commander la synchronisation de données audio
PCT/EP2020/081540 WO2021175464A1 (fr) 2020-03-06 2020-11-09 Procédé, dispositif et logiciel pour commander la synchronisation de données audio
JP2021035838A JP6926354B1 (ja) 2020-03-06 2021-03-05 オーディオデータの分解、ミキシング、再生のためのaiベースのdjシステムおよび方法
PCT/EP2021/055795 WO2021176102A1 (fr) 2020-03-06 2021-03-08 Remixage de musique à base d'ia : transformation de timbre et mise en correspondance de données audio mixées
EP21709063.8A EP4133748A1 (fr) 2020-03-06 2021-03-08 Remixage de musique à base d'ia: transformation de timbre et mise en correspondance de données audio mixées
US17/905,552 US20230120140A1 (en) 2020-03-06 2021-03-08 Ai based remixing of music: timbre transformation and matching of mixed audio data
JP2021137938A JP7136979B2 (ja) 2020-08-27 2021-08-26 オーディオエフェクトを適用するための方法、装置、およびソフトウェア
US17/459,450 US11462197B2 (en) 2020-03-06 2021-08-27 Method, device and software for applying an audio effect
US17/689,574 US11488568B2 (en) 2020-03-06 2022-03-08 Method, device and software for controlling transport of audio data
US17/747,473 US20220284875A1 (en) 2020-03-06 2022-05-18 Method, device and software for applying an audio effect

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
PCT/EP2020/056124 WO2021175455A1 (fr) 2020-03-06 2020-03-06 Procédé et dispositif de décomposition et de recombinaison de données audio et/ou de visualisation de données audio
EPPCT/EP2020/056124 2020-03-06
PCT/EP2020/057330 WO2021175456A1 (fr) 2020-03-06 2020-03-17 Procédé et dispositif de décomposition, de recombinaison et de lecture de données audio
EPPCT/EP2020/057330 2020-03-17
PCT/EP2020/062151 WO2021175457A1 (fr) 2020-03-06 2020-04-30 Décomposition en direct de données audio mixtes
EPPCT/EP2020/062151 2020-04-30
PCT/EP2020/065995 WO2021175458A1 (fr) 2020-03-06 2020-06-09 Transition de lecture d'une première à une seconde piste audio avec des fonctions de transition de signaux décomposés
EPPCT/EP2020/065995 2020-06-09
PCT/EP2020/074034 WO2021175460A1 (fr) 2020-03-06 2020-08-27 Procédé, dispositif et logiciel pour appliquer un effet audio, en particulier un changement de tonalité
EPPCT/EP2020/074034 2020-08-27

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/056124 Continuation-In-Part WO2021175455A1 (fr) 2020-03-06 2020-03-06 Procédé et dispositif de décomposition et de recombinaison de données audio et/ou de visualisation de données audio

Related Child Applications (2)

Application Number Title Priority Date Filing Date
PCT/EP2020/074034 Continuation-In-Part WO2021175460A1 (fr) 2020-03-06 2020-08-27 Procédé, dispositif et logiciel pour appliquer un effet audio, en particulier un changement de tonalité
US17/459,450 Continuation US11462197B2 (en) 2020-03-06 2021-08-27 Method, device and software for applying an audio effect

Publications (1)

Publication Number Publication Date
WO2021175461A1 true WO2021175461A1 (fr) 2021-09-10

Family

ID=77613920

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/EP2020/074034 WO2021175460A1 (fr) 2020-03-06 2020-08-27 Procédé, dispositif et logiciel pour appliquer un effet audio, en particulier un changement de tonalité
PCT/EP2020/079275 WO2021175461A1 (fr) 2020-03-06 2020-10-16 Procédé, dispositif et logiciel permettant d'appliquer un effet audio à un signal audio séparé d'un signal audio mixte

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/074034 WO2021175460A1 (fr) 2020-03-06 2020-08-27 Procédé, dispositif et logiciel pour appliquer un effet audio, en particulier un changement de tonalité

Country Status (1)

Country Link
WO (2) WO2021175460A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11488568B2 (en) * 2020-03-06 2022-11-01 Algoriddim Gmbh Method, device and software for controlling transport of audio data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015066204A1 (fr) * 2013-10-30 2015-05-07 Music Mastermind, Inc. Système et procédé d'amélioration d'une entrée audio, adaptation d'une entrée audio à une clé musicale et création de pistes d'harmonisation destinées à une entrée audio
US20180122403A1 (en) * 2016-02-16 2018-05-03 Red Pill VR, Inc. Real-time audio source separation using deep neural networks
WO2019229199A1 (fr) * 2018-06-01 2019-12-05 Sony Corporation Remixage adaptatif de contenu audio

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140018947A1 (en) * 2012-07-16 2014-01-16 SongFlutter, Inc. System and Method for Combining Two or More Songs in a Queue
US8847056B2 (en) * 2012-10-19 2014-09-30 Sing Trix Llc Vocal processing with accompaniment music input

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015066204A1 (fr) * 2013-10-30 2015-05-07 Music Mastermind, Inc. Système et procédé d'amélioration d'une entrée audio, adaptation d'une entrée audio à une clé musicale et création de pistes d'harmonisation destinées à une entrée audio
US20180122403A1 (en) * 2016-02-16 2018-05-03 Red Pill VR, Inc. Real-time audio source separation using deep neural networks
WO2019229199A1 (fr) * 2018-06-01 2019-12-05 Sony Corporation Remixage adaptatif de contenu audio

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CANO ESTEFANIA ET AL: "Musical Source Separation: An Introduction", IEEE SIGNAL PROCESSING MAGAZINE, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 36, no. 1, 24 December 2018 (2018-12-24), pages 31 - 40, XP011694891, ISSN: 1053-5888, [retrieved on 20181224], DOI: 10.1109/MSP.2018.2874719 *
GERARD ROMA ET AL: "MUSIC REMIXING AND UPMIXING USING SOURCE SEPARATION", PROCEEDINGS OF THE 2ND AES WORKSHOP ON INTELLIGENT MUSIC PRODUCTION, 13 September 2016 (2016-09-13), XP055743124 *
JOHN F WOODRUFF ET AL: "Remixing Stereo Music With Score-Informed Source Separation", PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON MUSIC INFORMATION RETRIEVAL, 8 October 2006 (2006-10-08), XP055761326, DOI: 10.5281/zenodo.1414898 *
LEN VANDE VEIRE ET AL: "From raw audio to a seamless mix: creating an automated DJ system for Drum and Bass", EURASIP JOURNAL ON AUDIO, SPEECH, AND MUSIC PROCESSING, BIOMED CENTRAL LTD, LONDON, UK, vol. 2018, no. 1, 24 September 2018 (2018-09-24), pages 1 - 21, XP021260918, DOI: 10.1186/S13636-018-0134-8 *
PRETET: "Singing Voice Separation: A study on training data", ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP, 2019, pages 506 - 510, XP033566106, DOI: 10.1109/ICASSP.2019.8683555

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11488568B2 (en) * 2020-03-06 2022-11-01 Algoriddim Gmbh Method, device and software for controlling transport of audio data

Also Published As

Publication number Publication date
WO2021175460A1 (fr) 2021-09-10

Similar Documents

Publication Publication Date Title
US20210326102A1 (en) Method and device for determining mixing parameters based on decomposed audio data
US11347475B2 (en) Transition functions of decomposed signals
US11462197B2 (en) Method, device and software for applying an audio effect
US7952012B2 (en) Adjusting a variable tempo of an audio file independent of a global tempo using a digital audio workstation
JP3365354B2 (ja) 音声信号または楽音信号の処理装置
KR100270434B1 (ko) 생음성의 음역을 검출하여 하모니음성을 조절하는 가라오케 장치
US8198525B2 (en) Collectively adjusting tracks using a digital audio workstation
US6816833B1 (en) Audio signal processor with pitch and effect control
US11488568B2 (en) Method, device and software for controlling transport of audio data
JP7136979B2 (ja) オーディオエフェクトを適用するための方法、装置、およびソフトウェア
US20230335091A1 (en) Method and device for decomposing, recombining and playing audio data
US11875763B2 (en) Computer-implemented method of digital music composition
WO2021175461A1 (fr) Procédé, dispositif et logiciel permettant d'appliquer un effet audio à un signal audio séparé d'un signal audio mixte
JP6926354B1 (ja) オーディオデータの分解、ミキシング、再生のためのaiベースのdjシステムおよび方法
Özer et al. Piano concerto dataset (PCD): A multitrack dataset of piano concertos
Moralis Live popular Electronic music ‘performable recordings’
WO2023217352A1 (fr) Système de dj réactif pour la lecture et la manipulation de musique sur la base de niveaux d'énergie et de caractéristiques musicales
Ransom Use of the Program Ableton Live to Learn, Practice, and Perform Electroacoustic Drumset Works
Riordan Live Processing Into the 21st Century: Delay-Based Performance and Temporal Manipulation in the Music of Joel Ryan, Radiohead, and Sam Pluta
Rudi The just intonation automat–a musically adaptive interface
Logožar et al. The Music production of a rockabilly composition with addition of the big band brass sound
JPH10171475A (ja) カラオケ装置
Lucas Triple Synthesis
Laurello Cope for chamber ensemble and fixed electronics
Eisele Sound Design and Mixing in Reason

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20792654

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020792654

Country of ref document: EP

Effective date: 20221006

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020433340

Country of ref document: AU

Date of ref document: 20201016

Kind code of ref document: A