EP3389028A1 - Automatic music production from voice recording


Info

Publication number
EP3389028A1
Authority
EP
European Patent Office
Prior art keywords
voice recording
arrangement
tempo
voice
aligned
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP17165762.0A
Other languages
German (de)
French (fr)
Inventor
Filippo SUGAR
Jordi Janer
Roberto VERNETTI
Oscar Mayor
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sugarmusic SpA
Original Assignee
Sugarmusic SpA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sugarmusic SpA filed Critical Sugarmusic SpA
Priority to EP17165762.0A
Priority to US16/500,262 (US11087727B2)
Priority to EP18714793.9A (EP3610477A1)
Priority to PCT/EP2018/058983 (WO2018189082A1)
Publication of EP3389028A1
Status: Withdrawn

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0008 Associated control or indicating means
    • G10H1/0025 Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G10H1/18 Selecting circuits
    • G10H1/26 Selecting circuits for automatically producing a series of tones
    • G10H1/36 Accompaniment arrangements
    • G10H1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H1/366 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems, with means for modifying or correcting the external signal, e.g. pitch correction, reverberation, changing a singer's voice
    • G10H1/40 Rhythm
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/005 Musical accompaniment, i.e. complete instrumental rhythm synthesis added to a performed melody, e.g. as output by drum machines
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/051 Musical analysis for extraction or detection of onsets of musical sounds or notes, i.e. note attack timings
    • G10H2210/066 Musical analysis for pitch analysis as part of wider processing for musical purposes, e.g. transcription, musical performance evaluation; Pitch recognition, e.g. in polyphonic sounds; Estimation or use of missing fundamental
    • G10H2210/071 Musical analysis for rhythm pattern analysis or rhythm style recognition
    • G10H2210/076 Musical analysis for extraction of timing, tempo; Beat detection
    • G10H2210/081 Musical analysis for automatic key or tonality recognition, e.g. using musical rules or a knowledge base
    • G10H2210/086 Musical analysis for transcription of raw audio or music data to a displayed or printed staff representation or to displayable MIDI-like note-oriented data, e.g. in pianoroll format
    • G10H2210/101 Music Composition or musical creation; Tools or processes therefor
    • G10H2210/111 Automatic composing, i.e. using predefined musical rules
    • G10H2210/155 Musical effects
    • G10H2210/245 Ensemble, i.e. adding one or more voices, also instrumental voices
    • G10H2210/251 Chorus, i.e. automatic generation of two or more extra voices added to the melody, e.g. by a chorus effect processor or multiple voice harmonizer, to produce a chorus or unison effect, wherein individual sounds from multiple sources with roughly the same timbre converge and are perceived as one
    • G10H2210/325 Musical pitch modification
    • G10H2210/331 Note pitch correction, i.e. modifying a note pitch or replacing it by the closest one in a given scale
    • G10H2210/341 Rhythm pattern selection, synthesis or composition
    • G10H2210/375 Tempo or beat alterations; Music timing control
    • G10H2210/391 Automatic tempo adjustment, correction or control
    • G10H2210/555 Tonality processing, involving the key in which a musical piece or melody is played
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/121 Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
    • G10H2240/131 Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
    • G10H2240/141 Library retrieval matching, i.e. any of the steps of matching an inputted segment or phrase with musical database contents, e.g. query by humming, singing or playing; the steps may include, e.g. musical analysis of the input, musical feature extraction, query formulation, or details of the retrieval process
    • G10H2240/171 Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments
    • G10H2240/201 Physical layer or hardware aspects of transmission to or from an electrophonic musical instrument, e.g. voltage levels, bit streams, code words or symbols over a physical link connecting network nodes or instruments
    • G10H2240/241 Telephone transmission, i.e. using twisted pair telephone lines or any type of telephone network
    • G10H2240/251 Mobile telephone transmission, i.e. transmitting, accessing or controlling music data wirelessly via a wireless or mobile telephone receiver, analog or digital, e.g. DECT, GSM, UMTS

Definitions

  • The effects applied can be selected automatically (for example as a function of the selected accompaniment score) or selected by the user.
  • A user records a Vocal Recording (a sung voice melody) using a device with a microphone (for example a smartphone) or provides an audio file containing a voice recording.
  • The user can record, play back the recording, and discard and repeat the recording if it is not satisfactory.
  • The recording can be short (for example, 10-20 seconds).
  • The user can also optionally select a "Music Theme" (e.g. a musical genre/style) for the output Produced Song.
  • The Music Theme can be selected from a list of candidates, which can be made available in an app GUI (for example, a combo box).
  • The Voice Recording is automatically processed and mixed on top of a Music Accompaniment (an instrumental music track) according to the selected Music Theme, as disclosed above.
  • The user listens to the generated song and can repeat the process, going back to the initial step to try a different voice recording or select a different music theme.
  • The method according to the invention can be implemented on a suitable device, as can now be easily imagined from the above disclosure of the invention.
  • The device can be purpose-built, or it can be a suitable known device of the type programmable for various applications, properly programmed to implement the invention, as will be readily understood by the skilled person on reading the present description.
  • The device may be a tablet PC, a smartphone, a laptop, a notebook, a desktop computer, etc.
  • The device (for example the device 70 in figures 8 and 9) can comprise a user interface 72, voice input means 71, 77 to input the singing voice recording 11 and player means 75, 76 to play the song 12.
  • The device can also comprise known processing means (a microprocessor, memory, etc.) programmed to process the audio signal as disclosed above so that the inventive method is carried out.
  • The input means can comprise a microphone and/or a memory in which a pre-recorded singing voice recording is stored.
  • Processing may also be divided up, being performed partly on a remote computer and partly on the device.
  • The device can be used as a voice input device, with the processing of the voice performed on a remote unit of the system.
  • By assigning all or part of the data processing to a remote unit, it is possible to obtain a portable device which may be, for example, less powerful and therefore less costly.
  • The method according to the invention can be implemented, at least in part, with an app or software installed on a portable device.
  • A client-server architecture can also be used, so that other parts of the method can be implemented with software installed on a remote server.
  • A client-server architecture can be particularly useful in the case of a device having relatively low computing power and/or local memory too small for the arrangement database.
  • In this case, the server carries out the complete analysis, transformation, arrangement generation and mixing, and sends the generated song back to the smartphone (or other device), which reproduces it.
  • A client-server architecture can also be useful to centralize the voice processing and/or the accompaniment generation for many client devices.
  • Figure 8 depicts an example of a client-server architecture according to the present invention.
  • The client software on the client device 70 (for example a smartphone or the like) records the voice (for example the user's voice, by means of a microphone 71 and internal recorder or storage circuits 77) and acquires the user's preferences 81 by means of a user interface 72 (for example a display and a keyboard, or a touchscreen).
  • The client device 70 uploads the recording and the user's preferences to a server 73 using a standard internet connection 74.
  • The server software on the server 73 carries out the voice processing, arrangement generation and mixing by means of the voice processing block 13 and the arranger block 18, as disclosed above, and sends the result, for example as an audio file, back to the client via the standard internet connection 74.
  • The generated song can also be published on the web, for example as a public URL 78 and/or an audio file (for example an mp3 file).
  • The client uses its play means (for example well-known internal audio circuits 75) to reproduce the generated song via a speaker or headphones 76.
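  • A minimal sketch of the client side of this exchange (the endpoint URL and field names are hypothetical; the patent only requires that the recording and the preferences travel over a standard internet connection):

```python
import requests

SERVER_URL = "https://example.com/api/produce"   # hypothetical endpoint

def produce_song(recording_path, music_theme):
    """Upload the voice recording and the user's preferences; the reply
    carries the generated song, e.g. as an mp3 file."""
    with open(recording_path, "rb") as f:
        reply = requests.post(
            SERVER_URL,
            files={"voice_recording": f},
            data={"music_theme": music_theme},
            timeout=300,
        )
    reply.raise_for_status()
    return reply.content
```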
  • The method implementation can be partially transferred to the client, if the client device is sufficiently powerful. This would transfer part of the computational load from the server to the client, for example reducing the need for powerful servers, the internet traffic and the associated costs.
  • Figure 9 depicts another example of a client-server architecture according to the present invention, in which part of the computational load is transferred to the client device 70.
  • The voice processing block 13 can be divided into two parts, a first part 13a in the client and a second part 13b in the server.
  • The two parts 13a and 13b communicate via the internet connection 74, exchanging corresponding data and messages 79.
  • The first part 13a can comprise the voice analysis and the second part 13b the voice transformation; the data and messages 79 then comprise the tuning and timing controls.
  • The arranger block 18 can also be divided into two parts, a first part 18a in the client and a second part 18b in the server.
  • The two parts 18a and 18b communicate via the internet connection 74, exchanging corresponding data and messages 80.
  • The first part 18a can comprise the automatic mixing and the second part 18b the arrangement generator block, connected to the arrangement database in the server.
  • The data and messages 80 can comprise the music accompaniment.
  • Alternatively, the entire voice processing block 13 could be transferred to the client, with only the arrangement generator block in the server, so that the music accompaniment is sent to the client and the client mixes the received music accompaniment with the locally produced aligned voice recording.
  • A client-server architecture is also useful for maintaining a large, high-quality arrangement database.
  • An arrangement database 22 on a server can easily be kept up to date, and a very large number of instruments of high musical quality can be stored in the database.
  • Many devices 70 can be connected to one server, and the server can advantageously serve many devices 70.
  • A web site connected to the server 73 could be used to publish the generated songs so that the musical productions of the users can be shared among the users.
  • The web site can also be a web interface permitting users to create professional-like musical tracks using the method according to the invention and to publish the generated songs.
  • The system according to the invention can be self-contained in a mobile device or distributed in a client-server architecture.
  • Multiple users, each with their own device, can interact with each other.
  • The method according to the present invention can be carried out with other devices and elements that are per se well known and easily imaginable by the skilled person, and which can be appropriately programmed or adapted to perform the method of the invention.
  • Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. Other embodiments may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure.
  • The method and the system or device may include other facilities for the user, as is now easily understandable to the skilled person on the basis of the present description of the principles of the invention.
  • For example, the system can process the data according to a locally stored set of user preferences and/or show in a visual, graphical manner the voice analysis, accompaniment generation, mixing, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

A method for processing a voice signal by an electronic system to create a song is disclosed. The method comprises the steps, in the electronic system, of acquiring an input singing voice recording (11); extracting a musical key (15b) and a Tempo (15a) from the singing voice recording (11); defining a tuning control (16) and a timing control (17) able to align the singing voice recording (11) with the extracted musical key (15b) and Tempo (15a); and applying the tuning control (16) and the timing control (17) to the singing voice recording (11) so that an aligned voice recording (20) is obtained. Next, the method comprises the steps of generating a music accompaniment (23) as a function of the extracted musical key (15b) and Tempo (15a) and an arrangement database (22), and mixing the aligned voice recording (20) and the music accompaniment (23) to obtain the song (12). A system, a server and a device are also disclosed.

Description

  • The present invention generally relates to a system, device and method to process a voice and create a complete musical track. In particular, a system and method to create a musical track from a raw input singing voice is disclosed. According to the invention, the voice is aligned and tuned, and synthetic musical instruments are added, so that a complete song is created with voice and accompaniment.
  • In particular, a method for automatic music production from voice recording is disclosed.
  • Advantageously, the system comprises a client in the form of a mobile device (for example a smartphone) for inputting the voice and the preferences of the user, and a remote server device for processing the voice and creating the complete song to be sent back to the client device. In this manner, at least part of the voice processing is performed on a server and the resulting data is streamed to the mobile device, so that even mobile devices having relatively low computing power allow a satisfying experience for the user.
  • Music signals are typically a combination of multiple sound sources, instruments (e.g. in an orchestra) or voices (e.g. in a choir), that play simultaneously following certain musical rules, e.g. harmony and rhythm. The combination will be pleasant to a listener when the different sound sources follow these rules and are musically coherent (same tempo and musical key), avoiding issues such as timing desynchronization or dissonances among the different sound sources.
  • Usually, when a non-professional singer tries to sing a song, the tune and beat come out wrong. Singing over an existing musical base may help. In the prior art, much software exists in which a musical base is played automatically and the user can try to sing over the base, so that the user's voice is recorded and mixed with the base. However, if the singer sings badly, the result is very poor.
  • Other software tries to derive the base directly from the melody sung by the user, but the user's pitch and tempo errors often prevent a satisfactory result. Software has been proposed that attempts to remedy these defects, but it still does not achieve a really satisfactory result, and the mix of voice and music derived from the sung melody does not provide a complete musical track really usable as a professional-like musical track.
  • It is a general aim of the present invention to provide a method for processing a singing voice so that appropriate music is created starting from the voice, the problems in the singing voice are corrected, and the voice and music are mixed to obtain a complete musical track or song.
  • In view of the above aim, solutions are proposed according to the claims of the present invention.
  • According to the invention, a method is claimed for processing a voice signal by a programmed system to create a song, the method comprising the steps, in the programmed system, of acquiring an input singing voice recording; extracting a musical key and a Tempo from the singing voice recording; defining a tuning control and a timing control able to align the singing voice recording with the extracted musical key and Tempo; applying the tuning control and the timing control to the singing voice recording so that an aligned voice recording is obtained; generating a music accompaniment as a function of the extracted musical key and Tempo and an arrangement database; and mixing the aligned voice recording and the music accompaniment to obtain the song.
  • Moreover, according to another aspect of the invention, a server is claimed that carries out at least part of the method and can be connected to voice input devices via an internet connection.
  • Moreover, according to another aspect of the invention, a system is claimed that carries out the method and comprises a device having a user interface, voice input means to input the singing voice recording and play means to play the song.
  • To better clarify the innovative principles of the present invention and the advantages it offers as compared with the known art, a possible embodiment applying said principles will be described hereinafter by way of non-limiting example, with the aid of the accompanying drawings.
  • In the drawings:
    • Figure 1 is a block diagram of a system or method in accordance with the present invention;
    • Figure 2 is a block diagram of a part of voice analysis and alignment according to an aspect of the present invention;
    • Figure 3 is a graphic of a first processing of the voice according to an aspect of the present invention;
    • Figure 4 is a block diagram of a part of voice transformation according to an aspect of the present invention;
    • Figure 5 is a block diagram of an arrangement generator according to an aspect of the present invention;
    • Figure 6 is an example of a possible arrangement score of the present invention;
    • Figure 7 is a block diagram of an automatic mixing according to an aspect of the present invention;
    • Figure 8 is a block diagram of a possible client-server architecture of the system according to the invention;
    • Figure 9 is a block diagram of a possible other client-server architecture of the system according to the invention.
  • With reference to the figures, figure 1 shows schematically an electronic system 10 carrying out a method for automatic music production from voice in accordance with the present invention. In substance, the system 10 has to process and mix an input voice source (Voice Recording) 11 with a generated instrumental source (Musical Accompaniment) so that a complete song 12 is produced, avoiding timing desynchronization and/or dissonance. Voice and accompaniment need to be musically coherent to be mixed; otherwise the mix will sound unpleasantly out of tune and out of synchronization.
  • For example, the input voice can be acquired by a microphone or provided as an audio file containing a voice recording.
  • However, the musical quality of the input singing voice recording 11 may not be sufficient for a good result. Therefore, the system has to deal with input recordings that can be very badly sung, e.g. out of tune, with unstable rhythm, etc.
  • According to the method, a first voice processing block 13 processes the input voice to align it so that it is coherent with a specific tempo and key, in order to mix it with a music accompaniment which is generated on the basis of this specific tempo and key.
  • The voice processing block 13 comprises a voice analysis block 14 to provide estimated musical key and tempo data 15, a tuning control 16 and a timing control 17.
  • As will be further described below, the tuning control 16 and timing control 17 are used to transform the input voice 11 by means of a voice transformation block 19 so that the voice becomes coherent in tuning and timing with the estimated musical Key scale and Tempo. Specifically, this ensures that the start times of the voice phrases are synchronized to beat locations and that an auto-tune pitch effect is applied to fit the Musical Key.
  • In this manner, the voice exits the processing block 13 as an aligned voice recording 20 which is right in tune and tempo. Therefore, the musical quality of the singing voice recording is guaranteed.
  • The estimated key and tempo data 15 (or Tempo 15a and Key 15b) are also used to produce a suitable music accompaniment by an arranger block 18. The arranger block 18 comprises, for example, an arrangement generator block 40.
  • The arrangement generator block 40 receives the estimated key and tempo data 15 from the block 13 and the arrangement score and audio stems data 21 from an arrangement database 22, and produces a corresponding music accompaniment 23. For example, the user selects a Music Theme and the arranger block renders a music accompaniment track 23 following the instructions in the arrangement score taken from the arrangement database 22.
  • Thereafter, the aligned voice recording 20 and the music accompaniment 23 are mixed in an automatic mixing block 24 so that the final generated song 12 is produced. For example, the mixing process follows the instructions in the arrangement score, which establishes the time position of the aligned voice recording 20, the corresponding mixing levels and the audio FX to be applied to obtain the final output mix. If necessary, the aligned voice recording 20 can also be repeated in time so as to convert a recording of, for example, a few seconds into a full-length track of, for example, around 2-3 minutes, as in a commercial pop song production.
  • A possible voice analysis process according to the invention is shown in more detail in figure 2.
  • As will be further described below, this analysis consists in analyzing the input user voice recording 11, extracting a number of 'musical descriptors' about its content, and also estimating the necessary alignment modifications in terms of tuning and timing.
  • In substance, the process can advantageously be divided into separate tuning and timing processes, marked as a separate tuning block 25 and timing block 26.
  • For example, for the tuning correction, the tuning block 25 extracts the pitch curve 30 of the voice input and also estimates a symbolic note transcription (a sequence of musical notes as in a musical score or MIDI file), producing a symbolic notation 28 by means of an automatic melody transcription block 27. Then, from the symbolic notation 28, an automatic key estimation block 29 estimates a musical key (e.g. A major) 31 of the input recording based on the note occurrences for a given musical scale.
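  • By way of illustration only (the patent does not fix a particular algorithm for the block 29), the following minimal Python sketch scores each candidate key by how much of the transcribed material falls on that key's scale notes; all names are hypothetical:

```python
import numpy as np

# Binary scale templates (pitch classes relative to the tonic; A = class 0 here).
# A weighted key profile could be substituted: the text only requires that the
# estimate be based on the note occurrences for a given musical scale.
MAJOR = np.array([1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1], dtype=float)
MINOR = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0], dtype=float)
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def estimate_key(midi_notes, durations):
    """Return e.g. 'A major' from a transcribed melody (MIDI note numbers)."""
    hist = np.zeros(12)
    for note, dur in zip(midi_notes, durations):
        hist[(note - 21) % 12] += dur        # fold notes onto pitch classes (A = 0)
    best, best_score = None, -np.inf
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            score = float(np.dot(hist, np.roll(profile, tonic)))
            if score > best_score:           # ties broken by iteration order
                best, best_score = f"{NOTE_NAMES[tonic]} {mode}", score
    return best

# A short phrase centred on A major:
print(estimate_key([69, 71, 73, 74, 76, 69], [1, 1, 0.5, 0.5, 2, 1]))  # A major
```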
  • An auto-tune block 32 receives three inputs: the pitch curve 30 over time, the note transcription in the form of the symbolic notation 28, and the estimated musical key 31. Based on these data, the auto-tune block 32 computes the necessary pitch correction to be applied to the input voice in order for it to sound in tune in the given musical key. The output (Tuning Control 16) of the auto-tune module is a time series containing, for example, the transposition values in semitone cents (1/100 semitone). The estimated key 31 can advantageously be used as the Key 15b for the arrangement generation.
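  • A minimal sketch of one way to derive such a cents-valued tuning control, assuming an already-extracted pitch curve in Hz and an A major key (the nearest-scale-note rule is illustrative, not the patent's exact procedure):

```python
import numpy as np

A_MAJOR_CLASSES = {0, 2, 4, 5, 7, 9, 11}     # pitch classes relative to A

def tuning_control(pitch_hz, key_classes=A_MAJOR_CLASSES):
    """For each voiced pitch value, the correction in semitone cents that
    moves it onto the nearest note of the estimated key."""
    corrections = []
    for f in pitch_hz:
        m = 69 + 12 * np.log2(f / 440.0)     # Hz -> fractional MIDI note
        in_key = [n for n in range(int(m) - 2, int(m) + 3)
                  if (n - 69) % 12 in key_classes]
        target = min(in_key, key=lambda n: abs(n - m))
        corrections.append(round((target - m) * 100))   # 1/100 semitone units
    return corrections

print(tuning_control([440.0, 452.0, 495.0]))  # [0, -47, -4]
```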
  • For the timing correction, the first step can be the estimation of the vowel onsets 33. The vowel onsets are a list of time values indicating the beginnings of syllables. This step is performed in an onset detection block 37.
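  • The detector itself is not specified; as a rough stand-in, the sketch below flags an onset wherever the short-term energy of the signal jumps sharply, which approximates syllable beginnings on a clean vocal take:

```python
import numpy as np

def vowel_onsets(y, sr, hop=512, jump=2.0):
    """Naive energy-flux onset picker: return the times (in seconds) at which
    the frame energy exceeds `jump` times the previous frame's energy."""
    frames = y[: len(y) // hop * hop].reshape(-1, hop)
    energy = (frames ** 2).mean(axis=1)
    return [i * hop / sr
            for i in range(1, len(energy))
            if energy[i] > jump * energy[i - 1] + 1e-9]
```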
  • The tempo value in beats-per-minute is estimated in an automatic tempo estimation block 34. The estimation in the block 34 is based on the autocorrelation of an onset function time series, shown for example in figure 3 by vertical broken lines on the input voice recording waveform, as known per se to the skilled person. In substance, the pauses and levels of the audio signal can be used to define the Tempo value of the input voice recording.
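  • A compact sketch of the autocorrelation idea (the frame rate and BPM bounds are illustrative assumptions):

```python
import numpy as np

def estimate_tempo(onset_env, frame_rate, bpm_min=60.0, bpm_max=180.0):
    """Pick the BPM whose beat period maximizes the autocorrelation of the
    onset-strength envelope (one value per frame, `frame_rate` frames/s)."""
    env = onset_env - onset_env.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]   # lags 0..N-1
    lags = np.arange(1, len(ac))
    bpms = 60.0 * frame_rate / lags
    mask = (bpms >= bpm_min) & (bpms <= bpm_max)
    best_lag = lags[mask][np.argmax(ac[1:][mask])]
    return 60.0 * frame_rate / best_lag

env = np.zeros(1000)
env[::50] = 1.0                               # an onset every 0.5 s at 100 frames/s
print(estimate_tempo(env, frame_rate=100.0))  # 120.0
```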
  • The estimated Tempo 35 can be advantageously used as Tempo 15a for the arrangement generation.
  • Starting from the estimated tempo 35 and the list of onsets 33, a time-alignment block 36 computes the time-alignment correction to be applied as the timing control 17, using the common time-series analysis method named Dynamic Time Warping (DTW). This method can be used to align two sequences of values. The onset function is aligned to a function containing values spaced at sub-multiples of the beat locations (sixteenth note, eighth note and quarter note). This allows the peaks of the onsets to be aligned to a tempo-quantized grid. The output is a time mapping function (Timing Control 17) as a sequence of pairs, where each input time has a corresponding output time value <time_in, time_out>.
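  • A full DTW implementation is beyond a short example, but the simplified sketch below, a monotonic nearest-grid quantization standing in for the DTW step, produces the same kind of mapping by snapping each onset to the nearest sixteenth-note grid point of the estimated tempo:

```python
def timing_control(onset_times, tempo_bpm):
    """Map each vowel onset to the nearest sixteenth-note grid point,
    returning the timing control as (time_in, time_out) pairs."""
    sixteenth = 60.0 / tempo_bpm / 4.0
    return [(t, round(round(t / sixteenth) * sixteenth, 3)) for t in onset_times]

print(timing_control([0.02, 0.63, 1.19, 1.84], tempo_bpm=100))
# [(0.02, 0.0), (0.63, 0.6), (1.19, 1.2), (1.84, 1.8)]
```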
  • Once the tuning control 16 and the timing control 17 have been obtained from the voice analysis block 14, it is possible to transform the input voice recording to match the target tempo and key by means of the voice transformation block 19, in which a per se known algorithm manipulates the voice recording driven by the tuning control 16 and the timing control 17. In substance, the algorithm modifies the fundamental frequency and the duration of the voice signal elements in fine detail. The result is an output audio signal, for example stored as a WAV file.
  • For example, with reference to figure 4, first a pitch-shifting block 38 is commanded by the tuning control 16 so that the algorithm transposes the frequencies of the input voice recording, and then a time-scaling block 39 is commanded by the timing control 17 so that the algorithm scales in time the duration of each element of the input voice recording; the aligned voice recording is thus finally produced and can be mixed with the corresponding music accompaniment 23.
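  • Assuming a single global correction (rather than the patent's fine-grained, time-varying manipulation), the two blocks can be approximated with off-the-shelf routines, for example librosa's:

```python
import librosa

def transform_voice(y, sr, semitones, stretch_rate):
    """Pitch-shift, then time-scale, the voice recording `y`."""
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=semitones)  # tuning control
    return librosa.effects.time_stretch(y, rate=stretch_rate)     # timing control
```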
  • A more detailed example of the arrangement generator block 40 creating the music arrangement 23 is shown in figure 5.
  • As disclosed above, the arrangement generator block 40 produces an instrumental musical accompaniment 23 that can be mixed with the aligned voice recording 20. For example, the musical accompaniment 23, or arrangement, can be generated based on instructions (arrangement score 21a) and audio or arrangement stems 21b (e.g. audio loops) stored in an arrangement database 22. Each arrangement (score and stems) has an original tempo and key (e.g. 90 bpm and A major). The arrangement score 21a and arrangement stems 21b form the above-mentioned arrangement score and audio stems data 21.
  • For example, the arrangement scores can be score text files (i.e. sequences of instructions in text files) and the audio stems can be audio files.
  • As shown in figure 5, the arrangement generator block 40 receives the values of the Tempo and Key data 15 and, in a load score block 50, loads one arrangement score 21a (for example an instruction text file) from the database 22 that is appropriate to the specified Tempo and Key. For example, if the estimated user voice tempo is 88 bpm, the Arrangement Score with the closest original tempo (e.g. 90 bpm) is loaded. The Arrangement Score 21a contains detailed instructions about the tracks to be rendered from audio stems, which are specified with unique IDs (for example electricguitar01, drums05, brass06), and the exact begin and end times, given for example in bars:beats. The next step is to load, in the load stem block 51, the necessary audio stems 21b from the database 22. Starting from the loaded score and audio stems, a render arrangement block 52 renders the arrangement.
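  • For the score-selection step, a minimal sketch (with hypothetical score records) reduces to a nearest-tempo lookup:

```python
def pick_score(target_bpm, scores):
    """Choose the arrangement score whose original tempo is closest to the
    estimated voice tempo (e.g. 88 BPM picks the 90 BPM score)."""
    return min(scores, key=lambda s: abs(s["bpm"] - target_bpm))

scores = [{"id": "pop01", "bpm": 90}, {"id": "rock01", "bpm": 120}]
print(pick_score(88, scores)["id"])   # pop01
```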
  • The rendering step can be seen as a virtual multitrack session in a typical Digital Audio Workstation (e.g. ProTools, Cubase), where the audio stems are located in different tracks over time. The block 52 mixes the different stems over time, generating an output audio signal, for example a stereo audio signal. The last step in the arrangement generator block 40 is to time-scale the music accompaniment to exactly match the voice recording tempo 15a (for example 88 bpm). The time-scaling block 53 receives as input the tempo 15a as a target tempo, the arrangement tempo 54 (from the arrangement score) and the music arrangement 55 from the render arrangement block 52, and outputs the music accompaniment 23 matched to the aligned voice recording 20. It is possible to use an existing polyphonic audio time-scaling algorithm and to store the output music accompaniment 23 as, for example, a music accompaniment file.
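  • The following sketch mimics blocks 52 and 53 under strong simplifications: mono stems, a fixed 4/4 meter, and librosa's generic time_stretch standing in for a dedicated polyphonic time-scaling algorithm; all file names and values are illustrative.

        import numpy as np
        import librosa

        def bars_beats_to_seconds(bars, beats, tempo_bpm, beats_per_bar=4):
            return (bars * beats_per_bar + beats) * 60.0 / tempo_bpm

        def render_arrangement(entries, sr, tempo_bpm, duration_s):
            # Block 52: place each stem at its bars:beats start time and sum.
            mix = np.zeros(int(duration_s * sr))
            for stem_file, bars, beats in entries:
                y, _ = librosa.load(stem_file, sr=sr)
                start = int(bars_beats_to_seconds(bars, beats, tempo_bpm) * sr)
                end = min(start + len(y), len(mix))
                mix[start:end] += y[: end - start]
            return mix

        arrangement = render_arrangement(
            [("drums05.wav", 0, 0), ("electricguitar01.wav", 4, 0)],
            sr=44100, tempo_bpm=90, duration_s=30.0)           # music arrangement 55
        # Block 53: time-scale from the arrangement tempo 54 (90 bpm) to the
        # voice tempo 15a (88 bpm); rate < 1 slows the audio down.
        accompaniment = librosa.effects.time_stretch(arrangement, rate=88.0 / 90.0)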
  • An arrangement score can be designed in many different manners, as is clear to the skilled person from the above explanation.
  • In figure 6 an example of a possible arrangement score file is shown. The arrangement file in the example is a score with mixing instructions, similar to a multitrack view in a DAW (Digital Audio Workstation), where the multiple instrumental stems and vocal track excerpts are combined. For example, it can be stored in XLSX (MS Excel) format or in CSV (comma-separated values) format. In substance, the arrangement score can be a table in which each row gives an instrument with a sequence of notes and durations (as multiples of a basic length), as is clear from figure 6. The arrangement score can also comprise some variations.
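  • As one conceivable rendering of such a CSV score, combining stem IDs with begin/end times in bars:beats as described above (the column names and all values are invented for illustration and are not the format of figure 6):

        track_id,stem_id,start_bars:beats,end_bars:beats,gain_db
        1,drums05,0:0,16:0,-3.0
        2,electricguitar01,4:0,16:0,-6.0
        3,brass06,8:0,12:0,-9.0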
  • The final mixing block 24 combines the aligned voice recording 20 with the music accompaniment track 23. Advantageously, this block 24 automatically estimates the mixing levels of the music and voice inputs to produce a well-balanced downmix as a result.
  • As shown in figure 7, the mixing block 24 can comprise, for example, two steps or blocks 60 and 64 in sequence. However, if preferred, only one step or block 60 or 64 can be present or used.
  • The first, optional step is to generate additional effects on the aligned voice recording 20. For example, additional vocal tracks (VoiceFX tracks) can be generated in a block 60 starting from the aligned voice recording 20. The purpose is to build a more complex downmix with harmonies and other audio effects as typically used in commercial music production.
  • Advantageously, the first block 60 is a vocal FX track generation block. This block 60 can comprise an FX track block 61 and a vocal mix and FX block 62. The FX track block 61 creates FX tracks, for example using adapted instructions found in the Arrangement Score 21a. For example, the FX track block 61 can create effects such as harmonization, delay, editing, etc.
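  • As a toy illustration of one FX track that block 61 could create, here is a simple feedback delay (a feedback comb filter) applied to the aligned voice; the function name and parameters are invented, and real effects such as harmonization would be more elaborate.

        import numpy as np

        def delay_fx(y, sr, delay_s=0.25, feedback=0.4, wet=0.5):
            # Feed the delayed output back on itself to produce decaying echoes.
            d = int(delay_s * sr)
            out = y.copy()
            for i in range(d, len(out)):
                out[i] += feedback * out[i - d]
            return (1.0 - wet) * y + wet * out   # blend dry and effected signals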
  • Next, these FX tracks are mixed together with the input aligned voice recording, optionally using other effects such as compression and reverb, in the vocal mix and FX block 62. The processed voice recording 63 produced by the vocal FX track generation block 60 is applied to the final block 64, or downmix block 64. This block 64 comprises a level adjustment block 65 in which the levels (loudness) of both inputs, the vocal track 63 and the music accompaniment 23, are estimated. Based on these values, the block 64 applies gains to obtain the desired balance, advantageously specified in the Arrangement Score 21a, and the balanced signals are mixed in the mixing block 66. The output from this mixing block 66 is the final full-length song 12.
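  • A minimal sketch of the level adjustment block 65 and mixing block 66, assuming a simple RMS measure as the loudness estimate (the patent leaves the estimator open) and a desired voice-over-accompaniment balance; the names and the 3 dB default are assumptions.

        import numpy as np

        def rms_db(x):
            return 20.0 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

        def downmix(voice, accomp, voice_over_accomp_db=3.0):
            # Block 65: gain the accompaniment so the voice sits the desired
            # number of dB above it, then mix the balanced signals (block 66).
            gain_db = rms_db(voice) - rms_db(accomp) - voice_over_accomp_db
            accomp = accomp * 10.0 ** (gain_db / 20.0)
            mix = voice + accomp
            return mix / max(1.0, np.max(np.abs(mix)))   # guard against clipping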
  • The effects applied can be automatically selected (for example, as a function of the selected accompaniment score) or selected by the user.
  • At this point it is apparent how the intended purposes are achieved and how a method according to the present invention can be implemented.
  • Using the method as disclosed above with reference to the electronic system 10 of the present invention, a user records a vocal recording (singing voice melody) using a device with a microphone (for example a smartphone), or provides an audio file with a voice recording. The user can record, play back the recording, and discard and repeat the recording if the user finds it unsatisfactory. The recording can be short (for example, 10-20 seconds). Moreover, the user can also optionally select a "Music Theme" (e.g. musical genre/style) for the output produced song. For example, the Music Theme can be selected from a list of candidates, which can be available on an app GUI (for example, a combo box). When the recording and the optional selection of the theme are completed, the user starts the audio processing.
  • The voice recording is automatically processed and mixed on top of a music accompaniment (instrumental music track) according to the selected Music Theme, as disclosed above.
  • Finally, the user listens to the generated song and can repeat the process, going back to the initial step to try a different voice recording or to select a different music theme.
  • The method according to the invention can be implemented on a suitable device, as can now easily be imagined from the above disclosure of the invention. The device can be a device specifically made for the purpose, or it can be a suitable known device of the type programmable for various applications and properly programmed to implement the invention, as will easily be understood by the skilled person reading the present description of the invention. For example, the device may be a tablet PC, a smartphone, a laptop, a notebook, a desktop computer, etc. In substance, the device (for example a device 70 in figures 8 and 9) can comprise a user interface 72, voice input means 71, 77 to input the singing voice recording 11 and player means 75, 76 to play the song 12. The device can also comprise known processing means (a microprocessor, memory, etc.) programmed to process the audio signal as disclosed above so that the inventive method is operated. The input means can comprise a microphone and/or a memory in which a pre-recorded singing voice recording is stored.
  • The general architecture of such a device is per se well known and easily imaginable by the skilled person. Therefore, it is not further described or shown herein.
  • Processing may also be divided up, being performed partly in a remote computer and partly in the device. If preferred, the device can be used as the voice input device and the processing of the voice can be carried out on a remote unit of the system. By assigning all or part of the data processing to a remote unit, it is possible to obtain a portable device which may be, for example, less powerful and therefore less costly.
  • In any case, the method according to the invention can be implemented at least in part with an app or software installed in a portable device.
  • A client-server architecture can also be used, so that other parts of the method can be implemented with software installed in a remote server.
  • For example, a client-server architecture can be particularly useful in the case of a device having relatively low computing power and/or a local memory too small for the arrangement database. For example, a smartphone (or other device) can execute a client program in which the user interface and an audio pre-processing are implemented, for sending the user's preferences and the voice recording to a server. The server carries out the complete analysis, transformation, arrangement generation and mixing, and sends the generated song back to the smartphone (or other device) so that the smartphone (or other device) reproduces the song. Moreover, a client-server architecture can be useful to centralize the voice processing and/or the accompaniment generation for many client devices.
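  • A hypothetical client-side sketch of this round trip in Python; the endpoint URL, field names and file names are invented, as the patent does not specify a protocol.

        import requests

        with open("voice_recording.wav", "rb") as f:
            resp = requests.post(
                "https://example.com/api/produce-song",   # server 73 (assumed endpoint)
                files={"voice": f},                       # the voice recording
                data={"music_theme": "pop"},              # the user's preferences 81
                timeout=120,
            )
        resp.raise_for_status()
        with open("produced_song.mp3", "wb") as out:
            out.write(resp.content)                       # the generated song 12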
  • Figure 8 depicts an example of a client-server architecture according to the present invention.
  • The client software on the client device 70 (for example a smartphone or the like) records the voice (for example the user's voice, by means of a microphone 71 and the internal recorder or storage circuits 77) and acquires the user's preferences 81 by means of a user interface 72 (for example a display and a keyboard, or a touchscreen). In a possible client-server architecture, the client device 70 uploads the recording and the user's preferences to a server 73 using a standard internet connection 74.
  • The server software on the server 73 carries out the voice processing, arrangement generation and mixing by means of the voice processing block 13 and the arranger block 18 as disclosed above, and sends the result, for example as an audio file, back to the client via the standard internet connection 74. The generated song can also be published on the web, for example as a public URL 78 and/or an audio file (for example, an mp3 file).
  • The client uses its play means (for example the well-known internal audio circuits 75) to reproduce the generated song via a speaker or headphones 76.
  • In order to make the system more efficient and scalable when used by hundreds of users simultaneously, the method implementation can be partially transferred to the client, if the client device is sufficiently powerful. This transfers part of the computational load from the server to the client, for example reducing the need for powerful servers, internet traffic and the associated costs.
  • Figure 9 depicts another example of a client-server architecture according to the present invention, in which part of the computational load is transferred into the client device 70.
  • In substance, the voice processing block 13 can be divided into two parts, a first part 13a in the client and a second part 13b in the server. The two parts 13a and 13b communicate via the internet connection 74, exchanging corresponding data and messages 79. For example, the first part 13a can comprise the voice analysis and the second part 13b can comprise the voice transformation, with the data and messages 79 comprising the tuning and timing controls.
  • The arranger block 18 can be divided into two parts too, a first part 18a in the client and a second part 18b in the server. The two parts 18a and 18b communicate via the internet connection 74, exchanging corresponding data and messages 80. For example, the first part 18a can comprise the automatic mixing and the second part 18b can comprise the arrangement generator block connected to the arrangement database in the server. The data and messages 80 can comprise the music accompaniment.
  • However, other distributions of the computational load can be used if preferable, for example to minimize the computational load in mobile devices or to minimize data exchange.
  • For example, the entire voice processing block 13 could be transferred into the client, with only the arrangement generator block in the server, so that the music accompaniment is sent to the client and the client mixes the received music accompaniment with the locally produced aligned voice recording.
  • A client-server architecture is also useful for having a large, high-quality arrangement database. In fact, an arrangement database 22 on a server can easily be kept up to date, and a very large number of instruments of high musical quality can be stored in the database. Moreover, many devices 70 can be connected to one server, and the server can advantageously attend to many devices 70.
  • In any case, a web site connected to the server 73 could be used to publish the generated songs so that the musical productions of the users can be shared among the users.
  • In this case, the web site can also be a web interface permitting the users to create professional-like musical tracks using the method according to the invention and to publish the generated songs.
  • Obviously, the above description of an embodiment applying the innovative principles of the present invention is provided by way of example of these innovative principles and must therefore not be regarded as limiting the scope of the rights claimed herein. For example, it should be noted that the system according to the invention can be self-contained in a mobile device or arranged in a client-server architecture. Moreover, thanks to a system connected to a network, multiple users, each with their own device, can interact with each other. The method according to the present invention can be carried out with other devices and elements per se well known and easily imaginable by the skilled person, which can be appropriately programmed or adapted to perform the method of the invention. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. Other embodiments may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure.
  • Obviously, the method and the system or device may include other facilities for the user, as is now easily understandable to the skilled person on the basis of the present description of the principles of the invention. For example, the system can process the data according to a locally stored set of user preferences and/or show in a visual, graphical manner the voice analysis, accompaniment generation, mixing, etc.

Claims (16)

  1. Method for processing a voice signal by an electronic system to create a song, the method comprising the steps, in the electronic system, of:
    - acquiring an input singing voice recording (11);
    - extracting a musical key (15b) and a Tempo (15a) from the singing voice recording (11);
    - defining a tuning control (16) and a timing control (17) able to align the singing voice recording (11) with the extracted musical key (15b) and Tempo (15a);
    - applying the tuning control (16) and the timing control (17) to the singing voice recording (11) so that an aligned voice recording (20) is obtained;
    - generating a music accompaniment (23) as a function of the extracted musical key (15b) and Tempo (15a) and an arrangement database (22);
    - mixing the aligned voice recording (20) and the music accompaniment (23) to obtain the song (12).
  2. Method according to claim 1, wherein the step of defining a tuning control (16) comprises the further steps of:
    - estimating a symbolic note transcription from the singing voice recording (11) to produce a symbolic notation (28) correlated to a melody contained in the singing voice recording (11);
    - estimating a pitch curve (30) over time and a musical key (31) from the symbolic notation (28);
    - producing the tuning control (16) as a function of the estimated pitch curve (30), the estimated musical key (31) and the symbolic notation (28).
  3. Method according to claim 1, wherein the step of defining a timing control (17) comprises the further steps of:
    - estimating vowel onsets (33) from the singing voice recording (11);
    - estimating a Tempo (35) from the estimated vowel onsets (33);
    - producing the timing control (17) as a function of the estimated vowel onsets (33) and the estimated Tempo (35).
  4. Method according to claim 2, wherein the estimated musical key (31) is used as the extracted musical key (15b).
  5. Method according to claim 3, wherein the estimated Tempo (35) is used as the extracted Tempo (15a).
  6. Method according to claim 1, wherein a pitch-shifting is applied to the singing voice recording (11) as a function of the tuning control (16) and a time-scaling is applied to the singing voice recording (11) as a function of the timing control (17) to obtain the aligned voice recording (20).
  7. Method according to claim 1, wherein the step of generating the music accompaniment (23) comprises the steps of:
    - loading an arrangement score (21a) and arrangement stems (21b) from the arrangement database (22);
    - rendering a musical arrangement (55) based on the loaded arrangement score (21a) and arrangement stems (21b);
    - time-scaling the musical arrangement (55) to match the extracted Tempo (15a) so that the music accompaniment (23) is obtained.
  8. Method according to claim 1, wherein the step of mixing the aligned voice recording (20) and the music accompaniment (23) comprises in sequence the steps of:
    - adjusting the levels of the aligned voice recording (20) and the music accompaniment (23);
    - mixing the aligned voice recording (20) and music accompaniment (23) with adjusted levels.
  9. Method according to claim 1, wherein before the step of mixing the aligned voice recording (20) and the music accompaniment (23) there is a further step of applying effects to the aligned voice recording (20).
  10. System carrying out the method according to any one of the preceding claims and comprising a device (70) having a user interface (72), voice input means (71, 77) to input the singing voice recording (11) and play means (75, 76) to play the song (12).
  11. System according to claim 10, characterized in that the input means comprise a microphone (71) and/or the play means comprise a speaker or headphone (76).
  12. System according to claim 10, characterized in that the device (70) is a tablet, smart phone or computer.
  13. System according to claim 10, characterized by a client-server architecture having the client based on at least one said device (70) and a server connected to the device by an internet connection.
  14. System according to claim 13, characterized in that the server (73) comprises at least part of a voice processing block (13), at least part of an arrangement generator block (40) and the arrangement database (22).
  15. System according to claim 13, characterized in that the client-server architecture comprises a web site to publish the songs (12).
  16. Server (73) carrying out at least part of the method according to any one of the preceding method claims and able to be connected to voice input devices (70) via an internet connection.
EP17165762.0A 2017-04-10 2017-04-10 Automatic music production from voice recording. Withdrawn EP3389028A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP17165762.0A EP3389028A1 (en) 2017-04-10 2017-04-10 Automatic music production from voice recording.
US16/500,262 US11087727B2 (en) 2017-04-10 2018-04-09 Auto-generated accompaniment from singing a melody
EP18714793.9A EP3610477A1 (en) 2017-04-10 2018-04-09 Auto-generated accompaniment from singing a melody
PCT/EP2018/058983 WO2018189082A1 (en) 2017-04-10 2018-04-09 Auto-generated accompaniment from singing a melody

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP17165762.0A EP3389028A1 (en) 2017-04-10 2017-04-10 Automatic music production from voice recording.

Publications (1)

Publication Number Publication Date
EP3389028A1 true EP3389028A1 (en) 2018-10-17

Family

ID=58530456

Family Applications (2)

Application Number Title Priority Date Filing Date
EP17165762.0A Withdrawn EP3389028A1 (en) 2017-04-10 2017-04-10 Automatic music production from voice recording.
EP18714793.9A Pending EP3610477A1 (en) 2017-04-10 2018-04-09 Auto-generated accompaniment from singing a melody

Family Applications After (1)

Application Number Title Priority Date Filing Date
EP18714793.9A Pending EP3610477A1 (en) 2017-04-10 2018-04-09 Auto-generated accompaniment from singing a melody

Country Status (3)

Country Link
US (1) US11087727B2 (en)
EP (2) EP3389028A1 (en)
WO (1) WO2018189082A1 (en)

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3598598B2 (en) * 1995-07-31 2004-12-08 ヤマハ株式会社 Karaoke equipment
US7309826B2 (en) * 2004-09-03 2007-12-18 Morley Curtis J Browser-based music rendering apparatus method and system
JP2008537180A (en) * 2005-04-18 2008-09-11 エルジー エレクトロニクス インコーポレーテッド Operation method of music composer
US7705231B2 (en) 2007-09-07 2010-04-27 Microsoft Corporation Automatic accompaniment for vocal melodies
US7838755B2 (en) * 2007-02-14 2010-11-23 Museami, Inc. Music-based search engine
JP5046211B2 (en) * 2008-02-05 2012-10-10 独立行政法人産業技術総合研究所 System and method for automatically associating music acoustic signal and lyrics with time
GB2538994B (en) * 2015-06-02 2021-09-15 Sublime Binary Ltd Music generation tool
US9818396B2 (en) * 2015-07-24 2017-11-14 Yamaha Corporation Method and device for editing singing voice synthesis data, and method for analyzing singing
US20170092246A1 (en) * 2015-09-30 2017-03-30 Apple Inc. Automatic music recording and authoring tool
KR101942814B1 (en) * 2017-08-10 2019-01-29 주식회사 쿨잼컴퍼니 Method for providing accompaniment based on user humming melody and apparatus for the same
KR101931087B1 (en) * 2017-09-07 2018-12-20 주식회사 쿨잼컴퍼니 Method for providing a melody recording based on user humming melody and apparatus for the same

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2422755A (en) * 2005-01-27 2006-08-02 Synchro Arts Ltd Audio signal processing
US20140074459A1 (en) * 2012-03-29 2014-03-13 Smule, Inc. Automatic conversion of speech into song, rap or other audible expression having target meter or rhythm
US20140229831A1 (en) * 2012-12-12 2014-08-14 Smule, Inc. Audiovisual capture and sharing framework with coordinated user-selectable audio and video effects filters
US20160210947A1 (en) * 2015-01-20 2016-07-21 Harman International Industries, Inc. Automatic transcription of musical content and real-time musical accompaniment

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11277216B2 (en) 2013-04-09 2022-03-15 Xhail Ireland Limited System and method for generating an audio file
US11277215B2 (en) 2013-04-09 2022-03-15 Xhail Ireland Limited System and method for generating an audio file
US11483083B2 (en) 2013-04-09 2022-10-25 Xhail Ireland Limited System and method for generating an audio file
US11569922B2 (en) 2013-04-09 2023-01-31 Xhail Ireland Limited System and method for generating an audio file
US11393439B2 (en) 2018-03-15 2022-07-19 Xhail Iph Limited Method and system for generating an audio or MIDI output file using a harmonic chord map
US11393438B2 (en) 2018-03-15 2022-07-19 Xhail Iph Limited Method and system for generating an audio or MIDI output file using a harmonic chord map
US11393440B2 (en) 2018-03-15 2022-07-19 Xhail Iph Limited Method and system for generating an audio or MIDI output file using a harmonic chord map
US11837207B2 (en) 2018-03-15 2023-12-05 Xhail Iph Limited Method and system for generating an audio or MIDI output file using a harmonic chord map
CN109741724A (en) * 2018-12-27 2019-05-10 歌尔股份有限公司 Make the method, apparatus and intelligent sound of song
CN112420003A (en) * 2019-08-22 2021-02-26 北京峰趣互联网信息服务有限公司 Method and device for generating accompaniment, electronic equipment and computer-readable storage medium
CN110660376A (en) * 2019-09-30 2020-01-07 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method, device and storage medium
CN110660376B (en) * 2019-09-30 2022-11-29 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method, device and storage medium

Also Published As

Publication number Publication date
US11087727B2 (en) 2021-08-10
WO2018189082A1 (en) 2018-10-18
US20200074966A1 (en) 2020-03-05
EP3610477A1 (en) 2020-02-19

Similar Documents

Publication Publication Date Title
US11087727B2 (en) Auto-generated accompaniment from singing a melody
KR100270434B1 (en) Karaoke apparatus detecting register of live vocal to tune harmony vocal
JP5007563B2 (en) Music editing apparatus and method, and program
US8290769B2 (en) Vocal and instrumental audio effects
EP1849154A1 (en) Methods and apparatus for use in sound modification
US6740804B2 (en) Waveform generating method, performance data processing method, waveform selection apparatus, waveform data recording apparatus, and waveform data recording and reproducing apparatus
US11462197B2 (en) Method, device and software for applying an audio effect
US20230120140A1 (en) Ai based remixing of music: timbre transformation and matching of mixed audio data
Arzt et al. Artificial intelligence in the concertgebouw
US20220238088A1 (en) Electronic musical instrument, control method for electronic musical instrument, and storage medium
Daffern Blend in singing ensemble performance: Vibrato production in a vocal quartet
JP2022040079A (en) Method, device, and software for applying audio effect
JP2008286946A (en) Data reproduction device, data reproduction method, and program
JP3750533B2 (en) Waveform data recording device and recorded waveform data reproducing device
CN112825244B (en) Music audio generation method and device
WO2021175460A1 (en) Method, device and software for applying an audio effect, in particular pitch shifting
JPH11338480A (en) Karaoke (prerecorded backing music) device
JP2009244790A (en) Karaoke system with singing teaching function
JP3834963B2 (en) Voice input device and method, and storage medium
JP4033146B2 (en) Karaoke equipment
JPH10116070A (en) Musical playing device
JP2011197564A (en) Electronic music device and program
WO2022208627A1 (en) Song note output system and method
Pluta et al. An automatic synthesis of musical phrases from multi-pitch samples
JP3551000B2 (en) Automatic performance device, automatic performance method, and medium recording program

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20190418