WO2011087460A1 - A method and a device for generating at least one audio file, and a method and a device for playing at least one audio file - Google Patents

A method and a device for generating at least one audio file, and a method and a device for playing at least one audio file

Info

Publication number
WO2011087460A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
equalization
file
audio signal
equalization information
Prior art date
Application number
PCT/SG2011/000021
Other languages
French (fr)
Inventor
Yongwei Zhu
Susanto Rahardja
Original Assignee
Agency For Science, Technology And Research
Priority date
Filing date
Publication date
Application filed by Agency For Science, Technology And Research
Publication of WO2011087460A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 - Details of electrophonic musical instruments
    • G10H 1/0008 - Associated control or indicating means
    • G10H 1/0025 - Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G10H 1/02 - Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos
    • G10H 1/06 - Circuits for establishing the harmonic content of tones, or other arrangements for changing the tone colour
    • G10H 1/12 - Circuits for establishing the harmonic content of tones, or other arrangements for changing the tone colour by filtering complex waveforms
    • G10H 1/46 - Volume control
    • G10H 2240/00 - Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H 2240/011 - Files or data streams containing coded musical information, e.g. for transmission
    • G10H 2240/046 - File format, i.e. specific or non-standard musical file format used in or adapted for electrophonic musical instruments, e.g. in wavetables

Definitions

  • Embodiments relate generally to methods and devices for generating at least one audio file and methods and devices for playing at least one audio file.
  • Interactive music content consists of multiple audio tracks of a music composition.
  • Each music track represents a sound signal identified as a part of a song, and may correspond to an individual music instrument, a voice, or a group of music instruments or voices mixed together.
  • a user is able to change the settings of the music tracks, such as the volumes. Hence it is called interactive music.
  • the main advantage of interactive music is that a user can play a music piece in different settings, for instance rhythmic style, harmonic style, etc., by emphasizing different groups of instruments.
  • a user could mute the vocal track in the music and enjoy a karaoke session.
  • IMAF Interactive Music Application Format
  • the current Interactive Music Application Format (IMAF, MPEG-A Part 12), proposed in ISO/IEC JTC1/SC29/WG11 (coding of moving pictures and audio), has provided the mechanism of interactivity on track (or group) selection and interactivity on track (or group) volume.
  • the format supports dynamic track volume presets and dynamic object volume presets.
  • since the time duration of one sample is about 20 ms, a volume fading (fade in or fade out) lasting 3 seconds would cause about 150 updates of volume.
  • An alternative approach for representing the volume change is by specifying the volume of a number of consecutive samples at a time. For instance, a triplet (a, b, c) could be used, where a represents the updated sample number, b represents the number of samples (duration) that the volume change takes place, and c represents the new volume level. It may be assumed that the volume change is linear for the specific duration. In this way, a linear volume fading could be represented with a single volume update.
  • Audio equalization is an important aspect of music mixing, and it can provide huge space for creativity of producers as well as users.
  • the frequency content of tracks or timbre of instruments can be manipulated to achieve various musical effects, such as a better instrument definition, eliminating masking effects between clashing instruments, and so on.
  • Audio equalization is an important tool or process in music production. It is typically conducted by a professional mixing engineer with specialized studio-based tools called digital audio workstations (DAWs), which provide audio editing and processing tools for the mixing and mastering of the final audio release. With the DAW software, the audio equalization is performed once, and it is not meant for the end user to manipulate.
  • DAW digital audio workstations
  • Various embodiments provide a method and a device for generating at least one audio file and a method and a device for playing at least one audio file which may solve at least partially the above mentioned problems and may provide a richer set of user interactivity that may result in more appealing interactive music services and higher commercial impact.
  • a method for generating at least one audio file may include including at least one encoded audio signal in the at least one audio file.
  • the method may further include including equalization information for the audio signal in the at least one audio file.
  • a method for playing at least one audio file may include decoding at least one encoded audio signal from the at least one audio file.
  • the method may further include determining equalization information for the audio signal from the at least one audio file.
  • the method may further include outputting the decoded audio signal according to the determined equalization information.
  • FIG. 1 shows a method for generating at least one audio file according to one embodiment
  • FIG. 2 shows a device for generating at least one audio file in one embodiment
  • FIG. 3 shows a method for playing at least one audio file in one embodiment
  • FIG. 4 shows a device for playing at least one audio file in one embodiment
  • FIG. 5 illustrates a generated audio file in one exemplary embodiment
  • FIG. 6 illustrates the types of equalization parameters in one embodiment
  • FIG. 7 illustrates a generated audio file in one exemplary embodiment
  • FIG. 8 illustrates the flowchart of the generating of at least one audio file and the playing of the at least one audio file according to one exemplary embodiment
  • FIG. 9 illustrates how a player may play an audio file according to one exemplary embodiment.
  • FIG. 1 illustrates a method 100 for generating at least one audio file.
  • the method 100 may include 101 including at least one encoded audio signal in the at least one audio file.
  • the method may further include 102 including equalization information for the audio signal in the at least one audio file.
  • the equalization information may define how the strength of certain frequencies within the at least one audio signal may be adjusted.
  • the at least one audio file may include both the encoded audio signal and equalization information for the audio signal.
  • Equalization information for an audio signal may for example refer to information specifying how different frequency components, i.e. components of the audio signal of different frequencies or comprising frequencies of different frequency regions, are to be amplified or attenuated before outputting, e.g. at a receiver and/or player side.
  • Equalization information may thus comprise amplification/attenuation information for a plurality of different frequency regions of the audio signal, i.e. of a plurality of different frequency components (corresponding to different frequency regions) of the audio signal.
  • Amplification/attenuation of a component corresponding to a frequency region may also be seen as weighting of the component with a weighting factor (e.g. higher than 1 for amplification and lower than 1 for attenuation).
  • the method 100 may provide the mechanism of equalization interactivity by including the equalization information for the audio signal in the audio file.
  • the method 100 may provide rich interactivity to producers as well as users.
  • the producer may predefine equalization information in the at least one audio file.
  • the producer may predefine several equalization information sets wherein each equalization information set corresponds to a specific music style or a type of personal flavor.
  • the end user may choose one equalization information set among the predefined equalization information sets according to the preference of the end user.
  • the end user may create or edit equalization information in the at least one audio file according to the preference of the end user.
  • the equalization information includes, for each audio frequency of a plurality of audio frequencies, a specification of at least one frequency weighting factor by which a frequency component of the audio signal having the audio frequency is to be weighted when the frequency component is output.
  • the weighting factors for at least two of the at least two audio frequencies may be pairwise different.
  • the weighting of the plurality of audio frequencies may be realized by an equalizer, for example.
  • the at least one encoded audio signal may include a plurality of tracks. Each track may include an encoded component of the audio signal.
  • the audio signal may represent a piece of music and each component of the audio signal may represent a part of the piece of music played by at least one respective instrument.
  • Each instrument may be a musical instrument or vocals, for example.
  • the equalization information may include, for each audio track or object and for each audio frequency of a plurality of audio frequencies, a specification of at least one frequency weighting factor by which a frequency component of the audio signal having the audio frequency is to be weighted when the frequency component is output.
  • the weighting factors for at least two of the at least two audio frequencies may be pairwise different.
  • the equalization information included in the at least one audio file may be used in audio equalization.
  • Audio equalization such as cutting or boosting the energy of particular frequencies, may be applied to individual tracks or groups of tracks, in order to achieve aesthetic goals in the mixing.
  • Audio equalization may for example be used for manipulating the timbre of an instrument for characteristics that are related to music styles or personal flavors. Audio equalization may also be practically used for making two or more instruments fit to each other in the frequency domain, when a new combination of tracks in an IMAF file is specified by the user.
  • the equalization information comprises at least one equalization parameter for at least one track.
  • the at least one equalization parameter for a track may define, for each audio frequency of a plurality of audio frequencies, a specification of at least one frequency weighting factor by which a frequency component of the track having the audio frequency is to be weighted when the frequency component is output.
  • the weighting factors for at least two of the at least two audio frequencies are pairwise different.
  • the equalization setting for at least one track or object may be specified in the preset.
  • a track may be encoded using spatial audio object coding (SAOC).
  • SAOC spatial audio object coding
  • the individual object may be separated after decoding and be manipulated before final mixing and playback.
  • the SAOC audio track may be better played back using multiple speakers (e.g. 5.1). Audio equalization may be done for each object.
  • equalization setting may be optional for any track or object.
  • the equalization information may include a plurality of equalization information sets. Each set may comprise at least one equalization parameter for at least one track. Each equalization information set may correspond to specific music style or a type of personal flavor, for example. This embodiment is further illustrated with reference to FIG. 5.
  • the equalization information includes filter parameters.
  • the weighting of the plurality of audio frequencies may be realized by an equalizer.
  • the major types of audio equalizers are parametric equalizers and graphical equalizers.
  • with a parametric equalizer, the frequency response of the equalizer can be changed by specifying three main parameters: center frequency, gain, and bandwidth.
  • a parametric equalizer normally consists of a number of filters, with each filter dealing with a particular frequency range.
  • with a graphical equalizer, a fixed number of frequency bands is provided, such as 10, 20 or 30 bands. Only the gain needs to be changed for each band.
  • An equalizer may be realized by a number of filters, including low pass filter (LPF), high pass filter (HPF), low shelving filter (LSF), high shelving filter (HSF), and peaking filters.
  • LPF low pass filter
  • HPF high pass filter
  • LSF low shelving filter
  • HSF high shelving filter
  • Peaking filters
  • One particular equalization information setting may be specified by the tunable parameters of a number of such filters.
  • A parametric equalizer, such as mentioned above, may be more favorable than a graphical equalizer in practice.
  • the frequency response of the equalizer may be changed by specifying the three main parameters: center frequency, gain, and bandwidth.
  • a parametric equalizer normally consists of a number of filters, with each filter dealing with a particular frequency range. Accordingly, when a parametric equalizer is used to realize the weighting of the plurality of audio frequencies, the filter parameters included in the equalization information may include the number of filters, wherein each filter deals with a particular frequency range, and the center frequency, gain, and bandwidth for each filter.
  • with a graphical equalizer, a fixed number of frequency bands is provided, such as 10, 20 or 30 bands, and only the gain needs to be changed for each band. Accordingly, when a graphical equalizer is used to realize the weighting of the plurality of audio frequencies, the filter parameters included in the equalization information may include the number of frequency bands and the gain to be changed for each band. This embodiment is further illustrated with reference to FIG. 6.
  • the filter coefficients of the filters may be encoded directly into the at least one audio file, e.g. an IMAF file, so that the desired equalization results may be guaranteed on any IMAF player.
  • the equalization information specifies the weighting factors by rules based on which allowed weighting factors or allowed combinations of weighting factors may be determined.
  • the producer of the audio file may preset the rules for the encoded audio signal in the audio file such that when an end user later creates or edits equalization information for the audio signal, the created or edited equalization information is required to satisfy the rules.
  • the rules may define boundaries for the weighting factors. For example, the rules may define that the weighting factor for a certain or each track must not be lower or higher than a certain value.
  • the rules may define dependencies of the weighting factors that must be fulfilled. For example, the rules may define that the difference between two weighting factors must not be higher than a threshold. For another example, the rules may define that a weighting factor for a first track must always be higher than a weighting factor for a second track. This embodiment is further illustrated with reference to FIG. 7.
  • with equalization interactivity, a user may be able to change the equalization information settings in the preset or create a new preset with equalization information settings. Interactivity rules or constraints on equalization may be imposed by the producer.
  • the at least one file is one file.
  • at least one encoded audio signal and the equalization information may be both included in a single file.
  • the encoded audio signal, the equalization information, and the rules based on which the allowed weighting factors or allowed combinations of weighting factors may be determined may be included in a single file.
  • the at least one file is a plurality of files.
  • at least one encoded audio signal and the equalization information may be included in separate files.
  • the audio signal may be included in a single file, and the equalization information and the rules based on which the allowed weighting factors or allowed combinations of weighting factors may be determined may be included in another file.
  • FIG. 2 illustrates a device 200 for generating at least one audio file which corresponds to the method 100.
  • the device 200 may include a first unit 201 which is configured to include at least one encoded audio signal in the at least one audio file.
  • the device 200 may further include a second unit 202 which is configured to include equalization information for the audio signal in the at least one audio file.
  • FIG. 3 illustrates a method 300 for playing at least one audio file.
  • the method 300 may include 301 decoding at least one encoded audio signal from the at least one audio file.
  • the method 300 may further include 302 determining equalization information for the audio signal in the at least one audio file.
  • the equalization information may include, for each audio frequency of a plurality of audio frequencies, a specification of at least one frequency weighting factor by which a frequency component of the audio signal having the audio frequency is to be weighted when the frequency component is output.
  • the weighting factors for at least two of the at least two audio frequencies may be pairwise different.
  • the method 300 may further include 303 outputting the decoded audio signal according to the determined equalization information.
  • the audio signal in the at least one audio file may represent a piece of music.
  • the device for playing the at least one audio file may decode the encoded audio signal from the at least one audio file.
  • the device may further determine equalization information for the audio signal from the at least one audio file.
  • Such equalization information may specify for each audio frequency of a plurality of audio frequencies a frequency weighting factor by which the frequency component of the audio signal having the audio frequency is to be weighted when the frequency component is output.
  • the combination of specification of frequency weighting factors for the plurality of audio frequencies may correspond to a particular music style or a personal flavor. At least two weighting factors among the plurality of weighting factors are pairwise different.
  • the determination of equalization information may be according to a selection by an end user among a plurality of predefined equalization information sets, for example.
  • the end user may create or edit equalization information and the determination may be according to the created or edited equalization information by the end user.
  • the determination may be also subjected to predefined rules or constraints based on which allowed weighting factors or combination of weighting factors may be determined.
  • the decoded audio signal is output according to the determined equalization information.
  • the weighting factors are determined based on user input or user selection of an equalization information set from a plurality of equalization information sets.
  • FIG. 4 illustrates a device 400 for playing at least one audio file which corresponds to the method 300.
  • the device 400 may include a decoding unit 401 which is configured to decode at least one encoded audio signal from the at least one audio file.
  • the device 400 may further include a determining unit 402 which is configured to determine equalization information for the audio signal in the at least one audio file.
  • the equalization information may include, for each audio frequency of a plurality of audio frequencies, a specification of at least one frequency weighting factor by which a frequency component of the audio signal having the audio frequency is to be weighted when the frequency component is output.
  • the weighting factors for at least two of the at least two audio frequencies may be pairwise different.
  • the device 400 may further include an outputting unit 403 configured to output the decoded audio signal according to the determined equalization information.
  • FIG. 5 illustrates the structure of data representation of equalization settings in an audio file 500 according to one exemplary embodiment.
  • the audio file 500 may be an interactive music file, e.g. an IMAF file.
  • the audio file 500 may contain 5 audio tracks 501, 502, 503, 504, and 505 (Tracks 1, 2, ..., 5). It is appreciated that the number of 5 tracks is for illustration purposes only and is not limiting.
  • Each audio track may be individually encoded audio in either a compressed (AAC) or an uncompressed (PCM) format.
  • Each track may for example correspond to one instrument, and all the audio tracks may have a same duration.
  • the instrument may be a musical instrument such as drum or piano. The instrument may also include vocals.
  • the audio file 500 also includes two equalization information sets 510 and 520, wherein each equalization information set comprises at least one equalization parameter for at least one track. That is, for example, the equalization information set 510 comprises equalization parameters 511 for the track 501, equalization parameters 512 for the track 502, equalization parameters 513 for the track 503, equalization parameters 514 for the track 504, and equalization parameters 515 for the track 505. Similarly, the equalization information set 520 comprises equalization parameters 521 for the track 501, equalization parameters 522 for the track 502, equalization parameters 523 for the track 503, equalization parameters 524 for the track 504, and equalization parameters 525 for the track 505. When being played back, the tracks 501 to 505 may be played simultaneously and mixed instantly.
  • Each track may have one or multiple equalization (EQ) parameter sets.
  • the track 501 has two equalization parameter sets 511 and 521
  • the track 502 has two equalization parameter sets 512 and 522, and so on.
  • Only one EQ parameter set may be active for a track at a given time during playback.
  • only one of the equalization parameter sets 511 and 521 may be active for the track 501 at a given time during playback.
  • An equalization information set or mixing setting may be a collection of the sets of EQ parameters (one set for each track). In the illustrated case, there are two equalization information sets 510 and 520. Each equalization information set may correspond to one music style or a type of personal flavor. The end user may select one equalization information set between the sets 510 and 520 according to the preference of the end user. It may also be possible that a different mixing setting is used for different temporal segments of the music, i.e. mixing settings may be dynamically changed according to one embodiment.
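As an assumed illustration only of such dynamically changing mixing settings, segment boundaries and set identifiers below are invented for the example and are not defined by the IMAF specification:

```python
# Hypothetical sketch: each temporal segment of the music points at one
# equalization information set; the player looks up the active set for the
# current playback time.
segments = [
    # (start_seconds, end_seconds, equalization_information_set_id)
    (0.0, 60.0, "set_510"),
    (60.0, 180.0, "set_520"),
]

def active_set_id(t_seconds):
    for start, end, set_id in segments:
        if start <= t_seconds < end:
            return set_id
    return segments[-1][2]   # fall back to the last defined setting

print(active_set_id(75.0))   # "set_520"
```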
  • FIG. 6 illustrates the types of EQ parameters.
  • the EQ parameters may be the parameters of a parametric equalizer or a graphical equalizer.
  • the parameters may correspond to the center (or cutoff/corner) frequency, gain, and bandwidth of a number of peaking, shelving and pass filters, for example.
  • the parameters may correspond to the number of frequency bands, and the gain for each band.
  • coding of EQ parameters in IMAF may define a new preset type to indicate that the preset contains equalization information, and define a box for equalization parameters, which contains the parameters of a series of filters for a particular track or object.
  • the number of filters may be indicated.
  • the type of each filter may also be specified, i.e. LPF, HPF, LSF, HSF, Peaking filter.
  • Table 1 below illustrates the parameters of each filter that may be specified according to one exemplary embodiment.
  • G represents the gain value of the respective filter
  • Nil means not available
  • Slope represents the slope of the respective filter
  • BW represents the bandwidth of the filter
  • Q represents the quality factor of the respective filter.
  • the filters in the audio equalizer may be designed with 2nd-order Infinite Impulse Response (IIR) filters, which may be efficiently implemented.
  • IIR Infinite Impulse Response
  • the filter coefficients may be derived from the equalizer parameters.
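As one possible illustration of deriving 2nd-order IIR coefficients from equalizer parameters, the widely used Audio EQ Cookbook formulas for a peaking section are sketched below; the application does not mandate these particular formulas, and the function and variable names are assumptions for the example.

```python
import numpy as np
from scipy.signal import sosfilt

def peaking_biquad(fs, f0, gain_db, q):
    """2nd-order IIR (biquad) peaking-filter coefficients derived from
    the center frequency f0 (Hz), gain (dB) and quality factor q."""
    a = 10.0 ** (gain_db / 40.0)             # amplitude from the dB gain
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b0, b1, b2 = 1.0 + alpha * a, -2.0 * np.cos(w0), 1.0 - alpha * a
    a0, a1, a2 = 1.0 + alpha / a, -2.0 * np.cos(w0), 1.0 - alpha / a
    # one second-order section, normalised by a0
    return np.array([[b0 / a0, b1 / a0, b2 / a0, 1.0, a1 / a0, a2 / a0]])

# Boost 1 kHz by 6 dB (Q = 1) on a test signal sampled at 44.1 kHz.
fs = 44100
sos = peaking_biquad(fs, 1000.0, 6.0, 1.0)
y = sosfilt(sos, np.random.randn(fs))
```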
  • FIG. 7 illustrates an embodiment wherein rules on EQ parameters may be included in an audio file, e.g. an interactive music file.
  • the audio file 700 may be an interactive music file, e.g. an IMAF file.
  • the audio file 700 may be the same as the audio file 500 as shown in FIG. 5 except that the audio file 700 further includes rules or constraints 701 based on which allowed equalization parameters for at least one track or combination of equalization parameters for the tracks may be determined.
  • the rules or constraints 701 may define the conditions based on which allowed weighting factors may be determined, wherein the weighting factors may be equalization factors comprised in the equalization information by which a frequency component of the audio signal having a given audio frequency is to be weighted when the frequency component is output.
  • the rules may be imposed by the producer of the interactive music content, and may be followed by the player, when the end user is interacting with the music.
  • the rules may either put constraints on the value range of the EQ parameters, or enforce a particular relationship among a number of EQ parameters. For example, a rule may impose that the cut-off frequency of a low pass filter should not be lower than 200 Hz. As another example, a rule may specify that the cut-off frequency of the low pass filter on track 1 should always be lower than the cut-off frequency of the high pass filter on track 2.
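Purely as an illustration of checking the two example rules above (the rule encoding inside the file is not shown in this excerpt, so the function below is a hypothetical sketch):

```python
# Check a user's EQ parameters against the two example rules: the LPF cut-off
# must not be below 200 Hz, and the LPF cut-off on track 1 must stay below the
# HPF cut-off on track 2.
def check_example_rules(track1_lpf_cutoff_hz, track2_hpf_cutoff_hz):
    if track1_lpf_cutoff_hz < 200.0:
        return False, "LPF cut-off below the 200 Hz limit"
    if not track1_lpf_cutoff_hz < track2_hpf_cutoff_hz:
        return False, "track 1 LPF cut-off must stay below track 2 HPF cut-off"
    return True, "ok"

print(check_example_rules(250.0, 400.0))   # (True, 'ok')
print(check_example_rules(150.0, 400.0))   # rejected by the first rule
```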
  • the user may interact with the equalization settings in several ways.
  • a number of mixing settings may be predefined by the producer when the music piece in interactive music format is released.
  • the end user may simply choose the mixing settings to enjoy the music in different styles according to one embodiment.
  • the end user may, with appropriate tools, create their own mixing settings or equalization information sets by defining new equalization parameters.
  • the new mixing setting may be added to the file or saved in a separate file.
  • Equalization parameters and combination of equalization parameters for different tracks may be defined in order to add effects to a particular instrument, or to make two or more instruments altogether sound better. For example, when a bass guitar track is mixed with a kick drum track, a certain frequency range of the two instruments might overlap. With interaction with equalization settings, the conflict may be eased by cutting any dispensable frequencies from the instruments. For another example, interaction with equalization settings may be used to convey feelings and mood by cutting or boosting certain frequencies of instruments, such as make vocals sweeter, a snare drum more aggressive, a trumpet mellower, etc.
  • Interaction with equalization is in nature a creative process, and software tools may be built to assist in analyzing the audio signal in the audio tracks and help the user adjust the equalization parameters to achieve certain mixing flavors.
  • interactivity on equalization may be similar to volume interactivity, as long as an appropriate graphical user interface (GUI) is provided by the IMAF player.
  • GUI graphical user interface
  • the limits rule, equivalence rule and upper/lower rule may be defined for the equalization parameters (in the Mixing Rule Box).
  • the Mixing Rule Box is a data structure inside the IMAF file, which may contain the information on the rules.
  • the rules are basically the constraints on the user's interaction, which may be defined by the producer and obeyed by the IMAF player.
  • a music file may be arranged to store EQ parameters for individual music tracks, which are grouped into mixing settings or equalization information sets.
  • the mixing settings are intended for different music styles or flavors.
  • a music file is arranged to store rules on how EQ parameters can be modified by a user.
  • a music player may provide a selection of EQ parameter mixing settings or equalization information sets to play back music in different styles or with created personal mixing settings.
  • FIG. 8 illustrates the flowchart of the generating of at least one audio file and the playing of the at least one audio file according to one exemplary embodiment.
  • the producer may compose a piece of music and use interactive music composing tools 801 to generate an audio file 802, e.g. an interactive music file.
  • the generated audio file 802 may include a plurality of audio tracks and equalization information which comprises at least one equalization parameter for at least one track as illustrated in FIG. 5.
  • the generated audio file 802 may further include rules based on which allowed equalization parameters for at least one track or combination of equalization parameters for at least two tracks may be determined as illustrated in FIG. 7.
  • the end user may use an interactive music player 803 to play the audio file 802.
  • the end user may for example select an equalization information set among a plurality of predefined equalization information sets contained in the audio file 802. Alternatively, the end user may create or edit an equalization information set for the audio signal in the audio file 802. The created or edited equalization information set may be subjected to rules contained in the audio file 802.
  • Different extents of support of equalization may be provided by different brands of IMAF file, due to concerns of computational complexity.
  • a brand stands for the brand identifier plus all the requirements that the IMAF file needs to conform to, and is intended for the media player to play back the media content properly.
  • requirements may for example define what audio coding format may be used (e.g. AAC or PCM).
  • a brand identifier may be a short field in the media file format (a string of four characters, e.g. 'im01' or 'mp4a'), which is included in the IMAF file and indicates a particular brand, and which may also indicate what media with what configuration can be expected in the file.
  • a brand identifier may be seen as a top level handler which indicates the type of file format that the structure uses.
  • filtering computation based on 2nd-order Infinite Impulse Response filters may generally not be a concern; however, it might still be a concern, especially for mobile applications.
  • Concerning computing complexity, the maximal number of filters simultaneously activated may be limited for different brand identifiers, i.e. 'im01', 'im02', 'im03', 'im04', 'im11', 'im12', 'im21', or a new brand identifier may be specified for the equalization support.
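An illustrative check of such a per-brand limit; the numeric limits below are invented placeholders, not values taken from the specification:

```python
# Hypothetical cap on the number of simultaneously active filters per brand.
MAX_ACTIVE_FILTERS_PER_BRAND = {"im01": 8, "im02": 8, "im11": 16, "im21": 32}

def preset_supported(brand, active_filter_count):
    limit = MAX_ACTIVE_FILTERS_PER_BRAND.get(brand)
    if limit is None:
        return False                     # unknown brand: assume no equalization support
    return active_filter_count <= limit

print(preset_supported("im01", 5))       # True under the assumed limit
```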
  • FIG. 9 illustrates how a player 920 may play an audio file or data file 901 according to an exemplary embodiment.
  • the player 920 may be an interactive music player, for example.
  • the audio file or the data file 901 may be an IM AF file.
  • the audio file 901 may include preset data 903.
  • the audio file 901 may further include audio data 905.
  • the audio data 905 may include a plurality of audio tracks.
  • the audio file 901 may further include grouping data 907.
  • the audio file 901 may further include rule data 909.
  • the audio file may further include 3GPP timed text 910, MPEG-7 metadata 913, and JPEG image 915.
  • the audio or data file 901 may be played by the player 920.
  • the player 920 may be an IM AF player.
  • the player 920 may include audio decoders 921.
  • the player 920 may further include a mixer 923.
  • the player 920 may further include a rule analyzer 925.
  • the player 920 may further include a text decoder 927.
  • the player 920 may further include a JPEG decoder 929.
  • the preset data 903 may include predefined equalization information for the audio signal contained in the audio data 905.
  • the equalization information may include, for each audio frequency of a plurality of audio frequencies, a specification of at least one frequency weighting factor by which a frequency component of the audio signal having the audio frequency is to be weighted when the frequency component is output, wherein the weighting factors for at least two of the at least two audio frequencies are pairwise different.
  • the equalization information may include at least one equalization parameter for at least one track.
  • the equalization information may include a plurality of equalization information sets, wherein each set comprises at least one equalization parameter for at least one track. In operation, for example, the end user may select an equalization parameter set among the plurality of equalization information sets such that the decoded audio signal is played by the player 920 according to the selected equalization parameter set.
  • the grouping data 907 may include the information representing the user's selection of the tracks.
  • the grouping data 907 may be created when the user creates or edits the selection of tracks.
  • the rule data 909 may include rules or constraints based on which allowed weighting factors (or allowed combinations of weighting factors) may be determined.
  • the end user may create equalization information for the audio signal contained in the audio or data file 901, and the rule analyzer 925 may determine whether the created equalization information satisfies the rules or constraints in the rule data 909. If yes, the player 920 may play the decoded audio signal according to the created equalization information by the end user.
  • the audio decoders 921 may decode the encoded audio signals in the audio data 905, and the mixer 923 may mix all the decoded audio tracks such that all the decoded audio tracks may be played simultaneously according to the determined equalization information.
  • the equalization information may be selected from a plurality of predefined equalization sets contained in the preset data 903. Alternatively, the equalization information may be created by an end user. The equalization information created by the end user may be subjected to predefined rules or constraints contained in the rule data 909.
  • the text decoder 927 may be configured to decode the 3GPP timed text in the audio or data file 901 and may be further configured to output the decoded display text.
  • the JPEG decoder 929 may be configured to decode the JPEG image contained in the audio or data file 901 , and may be further configured to output the decoded JPEG image for display.
  • various embodiments provide a method of generating an audio file which has interactive music application format.
  • the various embodiments further provide a mechanism to support audio equalization under the current IMAF framework.
  • the various embodiments also provide a file structure based on the IMAF format, to support audio equalization under the current IMAF framework.
  • Various embodiments provide another dimension of multiple-track-based interactivity, equalization interactivity, where the listener may modify the frequency band level of at least one track or element of the hierarchy (under interactivity requirements), and where each preset in an IMAF file may be able to describe equalization information (under preset requirements).
  • Various embodiments also provide the support for audio equalization and interactivity.
  • Various embodiments further provide a music player that may allow an end user to manipulate and store audio equalization settings of multiple-track-music.

Abstract

Embodiments provide a method for generating at least one audio file. The method comprises including at least one encoded audio signal in the at least one audio file. The method further comprises including equalization information for the audio signal in the at least one audio file.

Description

A METHOD AND A DEVICE FOR GENERATING AT LEAST ONE AUDIO FILE, AND A METHOD AND A DEVICE FOR PLAYING AT LEAST ONE
AUDIO FILE
[0001] The present application claims the benefit of the Singapore patent application 201000317-6 (filed on 15 January 2010), the entire contents of which are incorporated herein by reference for all purposes.
Technical Field
[0002] Embodiments relate generally to methods and devices for generating at least one audio file and methods and devices for playing at least one audio file.
Background
[0003] Recently interactive music services have been available in the market, such as MUSIC2.0. Interactive music content consists of multiple audio tracks of a music composition. Each music track represents a sound signal identified as a part of a song, and may correspond to an individual music instrument, a voice, or a group of music instruments or voices mixed together. With the music players that support interactive music format, a user is able to change the settings of the music tracks, such as the volumes. Hence it is called interactive music.
[0004] The main advantage of interactive music is that a user can play a music piece in different settings, for instance rhythmic style, harmonic style, etc., by emphasizing different groups of instruments. In addition, a user could mute the vocal track in the music and enjoy a karaoke session.
[0005] The current Interactive Music Application Format (IMAF) (MPEG-A Part 12) proposed in ISO/IEC JTC1/SC29/WG11 coding of moving pictures and audio has provided the mechanism of interactivity on track (or group) selection and interactivity on track (or group) volume. The format supports dynamic track volume presets and dynamic object volume presets.
[0006] The change of volume is realized by updating the volume from a particular single sample. However, this approach is not efficient for continuous volume change, such as fading. The reason is that the time duration of one sample is about 20ms, thus a volume fading (fade in or fade out) lasting 3 seconds would cause about 150 updates of volume.
[0007] An alternative approach for representing the volume change is by specifying the volume of a number of consecutive samples at a time. For instance, a triplet (a, b, c) could be used, where a represents the updated sample number, b represents the number of samples (duration) that the volume change takes place, and c represents the new volume level. It may be assumed that the volume change is linear for the specific duration. In this way, a linear volume fading could be represented with a single volume update.
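As an illustration only (not part of the application), the triplet representation above can be sketched as follows; the code assumes the triplet is applied at PCM-sample granularity, whereas the application's "sample" may refer to a coarser update granule.

```python
import numpy as np

def apply_volume_triplet(samples, current_volume, a, b, c):
    """Ramp the volume linearly from current_volume to c over b samples,
    starting at index a; samples after the ramp keep the new level c."""
    out = samples.astype(float).copy()
    ramp = np.linspace(current_volume, c, num=b)
    end = min(a + b, len(out))
    out[a:end] *= ramp[:end - a]        # the linear fade itself
    out[end:] *= c                      # hold the new volume afterwards
    return out

# A 3-second fade-out starting at the beginning of a 5-second track (44.1 kHz),
# represented by the single triplet (a=0, b=3*fs, c=0.0).
fs = 44100
track = np.ones(5 * fs)
faded = apply_volume_triplet(track, current_volume=1.0, a=0, b=3 * fs, c=0.0)
```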
[0008] Audio equalization is an important aspect of music mixing, and it can provide huge space for creativity of producers as well as users. With the support of equalization, the frequency content of tracks or timbre of instruments can be manipulated to achieve various musical effects, such as a better instrument definition, eliminating masking effects between clashing instruments, and so on.
[0009] Audio equalization is an important tool or process in music production. It is typically conducted by a professional mixing engineer with specialized studio-based tools called digital audio workstations (DAWs), which provide audio editing and processing tools for the mixing and mastering of the final audio release.
[0010] With the DAW software, the audio equalization is performed once, and it is not meant for the end user to manipulate.
[0011] In the DAW setup, the technique of making equalization settings for music mixing would typically be considered as a specialized skill, which requires professional training.
Summary of the Invention
[0012] Various embodiments provide a method and a device for generating at least one audio file and a method and a device for playing at least one audio file which may solve at least partially the above mentioned problems and may provide a richer set of user interactivity that may result in more appealing interactive music services and higher commercial impact.
[0013] In one embodiment, a method for generating at least one audio file is provided. The method may include including at least one encoded audio signal in the at least one audio file. The method may further include including equalization information for the audio signal in the at least one audio file.
[0014] In one embodiment, a method for playing at least one audio file is provided. The method may include decoding at least one encoded audio signal from the at least one audio file. The method may further include determining equalization information for the audio signal from the at least one audio file. The method may further include outputting the decoded audio signal according to the determined equalization information.
[0015] According to other embodiments, devices according to the methods described above are provided.
[0016] It should be noted that the embodiments described in the dependent claims of the independent method claim are analogously valid for the corresponding device claim where applicable.
Brief Description of the Drawings
[0017] In the drawings, like reference characters generally refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the invention. In the following description, various embodiments of the invention are described with reference to the following drawings, in which:
FIG. 1 shows a method for generating at least one audio file according to one embodiment;
FIG. 2 shows a device for generating at least one audio file in one embodiment;
FIG. 3 shows a method for playing at least one audio file in one embodiment;
FIG. 4 shows a device for playing at least one audio file in one embodiment;
FIG. 5 illustrates a generated audio file in one exemplary embodiment;
FIG. 6 illustrates the types of equalization parameters in one embodiment;
FIG. 7 illustrates a generated audio file in one exemplary embodiment;
FIG. 8 illustrates the flowchart of the generating of at least one audio file and the playing of the at least one audio file according to one exemplary embodiment; and
FIG. 9 illustrates how a player may play an audio file according to one exemplary embodiment.
Description
[0018] The following detailed description refers to the accompanying drawings that show, by way of illustration, specific details and embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized and structural, logical, and electrical changes may be made without departing from the scope of the invention. The various embodiments are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments. The following detailed description therefore, is not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
[0019] The word "exemplary" is used herein to mean "serving as an example, instance, or illustration". Any embodiment or design described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments or designs.
[0020] FIG. 1 illustrates a method 100 for generating at least one audio file. The method 100 may include 101 including at least one encoded audio signal in the at least one audio file. The method may further include 102 including equalization information for the audio signal in the at least one audio file. For example, the equalization information may define how the strength of certain frequencies within the at least one audio signal may be adjusted.
[0021] In other words, in one embodiment, the at least one audio file may include both the encoded audio signal and equalization information for the audio signal.
[0022] Equalization information for an audio signal may for example refer to information specifying how different frequency components, i.e. components of the audio signal of different frequencies or comprising frequencies of different frequency regions, are to be amplified or attenuated before outputting, e.g. at a receiver and/or player side. Equalization information may thus comprise amplification/attenuation information for a plurality of different frequency regions of the audio signal, i.e. of a plurality of different frequency components (corresponding to different frequency regions) of the audio signal. Amplification/attenuation of a component corresponding to a frequency region may also be seen as weighting of the component with a weighting factor (e.g. higher than 1 for amplification and lower than 1 for attenuation).
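A minimal sketch of the weighting idea in [0022], assuming a simple block-based spectral multiplication; the equalization described later in this application is realized by filters, so this is an illustrative simplification rather than the application's method.

```python
import numpy as np

def weight_frequency_regions(x, fs, regions):
    """regions: list of (f_low_hz, f_high_hz, weighting_factor);
    factors > 1 amplify the region, factors < 1 attenuate it."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    for f_lo, f_hi, weight in regions:
        band = (freqs >= f_lo) & (freqs < f_hi)
        spectrum[band] *= weight
    return np.fft.irfft(spectrum, n=len(x))

# Attenuate 200 Hz - 2 kHz and amplify everything above 8 kHz.
fs = 44100
x = np.random.randn(fs)
y = weight_frequency_regions(x, fs, [(200, 2000, 0.5), (8000, fs / 2, 1.5)])
```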
[0023] As mentioned earlier, the current specification of the Interactive Music Application Format (IMAF) (MPEG-A Part 12) has provided the mechanism of interactivity on track (or group) selection and interactivity on track (or group) volume. The IMAF however has not provided the mechanism of equalization interactivity. The method 100 may provide the mechanism of equalization interactivity by including the equalization information for the audio signal in the audio file. The method 100 may provide rich interactivity to producers as well as users. For example, the producer may predefine equalization information in the at least one audio file. For a more concrete example, the producer may predefine several equalization information sets wherein each equalization information set corresponds to a specific music style or a type of personal flavor. The end user may choose one equalization information set among the predefined equalization information sets according to the preference of the end user. In one embodiment, the end user may create or edit equalization information in the at least one audio file according to the preference of the end user.
[0024] In one embodiment, the equalization information includes, for each audio frequency of a plurality of audio frequencies, a specification of at least one frequency weighting factor by which a frequency component of the audio signal having the audio frequency is to be weighted when the frequency component is output. The weighting factors for at least two of the at least two audio frequencies may be pairwise different. The weighting of the plurality of audio frequencies may be realized by an equalizer, for example.
[0025] In one embodiment, the at least one encoded audio signal may include a plurality of tracks. Each track may include an encoded component of the audio signal. The audio signal may represent a piece of music and each component of the audio signal may represent a part of the piece of music played by at least one respective instrument. Each instrument may be a musical instrument or vocals, for example. In this embodiment, for example, the equalization information may include, for each audio track or object and for each audio frequency of a plurality of audio frequencies, a specification of at least one frequency weighting factor by which a frequency component of the audio signal having the audio frequency is to be weighted when the frequency component is output. The weighting factors for at least two of the at least two audio frequencies may be pairwise different.
[0026] In one embodiment, the equalization information included in the at least one audio file may be used in audio equalization. Audio equalization, such as cutting or boosting the energy of particular frequencies, may be applied to individual tracks or groups of tracks, in order to achieve aesthetic goals in the mixing. Audio equalization may for example be used for manipulating the timbre of an instrument for characteristics that are related to music styles or personal flavors. Audio equalization may also be practically used for making two or more instruments fit to each other in the frequency domain, when a new combination of tracks in an IMAF file is specified by the user.
[0027] In one embodiment, the equalization information comprises at least one equalization parameter for at least one track. For example, the at least one equalization parameter for a track may define, for each audio frequency of a plurality of audio frequencies, a specification of at least one frequency weighting factor by which a frequency component of the track having the audio frequency is to be weighted when the frequency component is output. The weighting factors for at least two of the at least two audio frequencies are pairwise different. Similarly to volume settings, the equalization setting for at least one track or object may be specified in the preset. A track may be encoded using spatial audio object coding (SAOC). For example, there may be a number of spatial audio objects that are encoded in the track, and the individual object may be separated after decoding and be manipulated before final mixing and playback. The SAOC audio track may be better played back using multiple speakers (e.g. 5.1). Audio equalization may be done for each object. In the current standard specification, there are maximally 2 tracks in the IMAF file if the track(s) is encoded using SAOC (e.g. brands 'im04' and 'im12'). Equalization settings may be optional for any track or object.
[0028] In a further embodiment, the equalization information may include a plurality of equalization information sets. Each set may comprise at least one equalization parameter for at least one track. Each equalization information set may correspond to specific music style or a type of personal flavor, for example. This embodiment is further illustrated with reference to FIG. 5.
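The grouping described in [0027] and [0028] could be modelled, purely as an assumed illustration (the type and field names below are not defined by the IMAF specification), along these lines:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FilterParams:
    filter_type: str          # e.g. "LPF", "HPF", "LSF", "HSF", "peaking"
    center_freq_hz: float     # center or cutoff/corner frequency
    gain_db: float            # gain (unused by pure pass filters)
    q_or_bandwidth: float     # quality factor or bandwidth

@dataclass
class EQParameterSet:
    filters: List[FilterParams] = field(default_factory=list)

@dataclass
class EqualizationInformationSet:
    name: str                                           # e.g. a music style label
    per_track: Dict[int, EQParameterSet] = field(default_factory=dict)

# Two information sets for a multi-track file, in the spirit of FIG. 5.
set_a = EqualizationInformationSet("style A", {
    1: EQParameterSet([FilterParams("peaking", 1000.0, +3.0, 1.0)]),
    2: EQParameterSet([FilterParams("HPF", 80.0, 0.0, 0.707)]),
})
set_b = EqualizationInformationSet("style B", {
    1: EQParameterSet([FilterParams("LSF", 200.0, -2.0, 0.9)]),
})
```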
[0029] In one embodiment, the equalization information includes filter parameters. As mentioned earlier, the weighting of the plurality of audio frequencies may be realized by an equalizer.
[0030] The major types of audio equalizers are parametric equalizers and graphical equalizers. With a parametric equalizer, the frequency response of the equalizer can be changed by specifying three main parameters: center frequency, gain, and bandwidth. A parametric equalizer normally consists of a number of filters, with each filter dealing with a particular frequency range. With a graphical equalizer, a fixed number of frequency bands is provided, such as 10, 20 or 30 bands. Only the gain needs to be changed for each band.
[0031] An equalizer may be realized by a number of filters, including a low pass filter (LPF), a high pass filter (HPF), a low shelving filter (LSF), a high shelving filter (HSF), and peaking filters. One particular equalization information setting may be specified by the tunable parameters of a number of such filters. A parametric equalizer, such as mentioned above, may be more favorable than a graphical equalizer in practice.
[0032] For example, with a parametric equalizer, the frequency response of the equalizer may be changed by specifying the three main parameters: center frequency, gain, and bandwidth. A parametric equalizer normally consists of a number of filters, with each filter dealing with a particular frequency range. Accordingly, when a parametric equalizer is used to realize the weighting of the plurality of audio frequencies, the filter parameters included in the equalization information may include the number of filters, wherein each filter deals with a particular frequency range, and the center frequency, gain, and bandwidth for each filter. For another example, with a graphical equalizer, a fixed number of frequency bands is provided, such as 10, 20 or 30 bands. Only the gain needs to be changed for each band. Accordingly, when a graphical equalizer is used to realize the weighting of the plurality of audio frequencies, the filter parameters included in the equalization information may include the number of frequency bands and the gain to be changed for each band. This embodiment is further illustrated with reference to FIG. 6.
[0033] In one embodiment, the filter coefficients of the filters may be encoded directly into the at least one audio file, e.g. an IMAF file, so that the desired equalization results may be guaranteed on any IMAF player.
[0034] In one embodiment, the equalization information specifies the weighting factors by rules based on which allowed weighting factors or allowed combinations of weighting factors may be determined. For example, the producer of the audio file may preset the rules for the encoded audio signal in the audio file such that when an end user later creates or edits equalization information for the audio signal, the created or edited equalization information is required to satisfy the rules. In one embodiment, the rules may define boundaries for the weighting factors. For example, the rules may define that the weighting factor for a certain or each track must not be lower or higher than a certain value. In one embodiment, the rules may define dependencies of the weighting factors that must be fulfilled. For example, the rules may define that the difference between two weighting factors must not be higher than a threshold. For another example, the rules may define that a weighting factor for a first track must always be higher than a weighting factor for a second track. This embodiment is further illustrated with reference to FIG. 7.
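A small sketch, under the assumption that rules are expressed as per-track bounds and simple dependencies as in the examples of [0034]; the concrete rule encoding in an IMAF file may differ, and the function below is illustrative only.

```python
def check_weighting_rules(weights, bounds, max_difference=None, ordered_pairs=()):
    """
    weights: dict track_id -> weighting factor
    bounds: dict track_id -> (min_allowed, max_allowed)
    max_difference: optional cap on the spread between any two weighting factors
    ordered_pairs: iterable of (i, j) pairs requiring weights[i] > weights[j]
    """
    for track, w in weights.items():
        lo, hi = bounds.get(track, (float("-inf"), float("inf")))
        if not (lo <= w <= hi):
            return False
    if max_difference is not None:
        vals = list(weights.values())
        if max(vals) - min(vals) > max_difference:
            return False
    for i, j in ordered_pairs:
        if not weights[i] > weights[j]:
            return False
    return True

# Example: track 1 must stay above track 2, and no factor may exceed 2.0.
ok = check_weighting_rules({1: 1.4, 2: 0.8},
                           bounds={1: (0.0, 2.0), 2: (0.0, 2.0)},
                           ordered_pairs=[(1, 2)])
```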
[0035] With equalization interactivity, a user may be able to change the equalization information settings in the preset or create a new preset with equalization information settings. Interactivity rules or constraints on equalization may be imposed by the producer.
[0036] In one embodiment, the at least one file is one file. For example, at least one encoded audio signal and the equalization information may be both included in a single file. For another example, the encoded audio signal, the equalization information, and the rules based on which the allowed weighting factors or allowed combinations of weighting factors may be determined may be included in a single file.
[0037] In an alternative embodiment, the at least one file is a plurality of files. For example, at least one encoded audio signal and the equalization information may be included in separate files. For another example, the audio signal may be included in a single file, and the equalization information and the rules based on which the allowed weighting factors or allowed combinations of weighting factors may be determined may be included in another file.
[0038] FIG. 2 illustrates a device 200 for generating at least one audio file which corresponds to the method 100. The device 200 may include a first unit 201 which is configured to include at least one encoded audio signal in the at least one audio file. The device 200 may further include a second unit 202 which is configured to include equalization information for the audio signal in the at least one audio file.
[0039] FIG. 3 illustrates a method 300 for playing at least one audio file. The method 300 may include 301 decoding at least one encoded audio signal from the at least one audio file. The method 300 may further include 302 determining equalization information for the audio signal in the at least one audio file. The equalization information may include, for each audio frequency of a plurality of audio frequencies, a specification of at least one frequency weighting factor by which a frequency component of the audio signal having the audio frequency is to be weighted when the frequency component is output. The weighting factors for at least two of the at least two audio frequencies may be pairwise different. The method 300 may further include 303 outputting the decoded audio signal according to the determined equalization information.
[0040] In other words, in one embodiment, for example, the audio signal in the at least one audio file may represent a piece of music. The device for playing the at least one audio file may decode the encoded audio signal from the at least one audio file. The device may further determine equalization information for the audio signal from the at least one audio file. Such equalization information may specify, for each audio frequency of a plurality of audio frequencies, a frequency weighting factor by which the frequency component of the audio signal having the audio frequency is to be weighted when the frequency component is output. The combination of the specified frequency weighting factors for the plurality of audio frequencies may correspond to a particular music style or a personal flavor. At least two weighting factors among the plurality of weighting factors are pairwise different. The determination of the equalization information may be according to a selection by an end user among a plurality of predefined equalization information sets, for example. For another example, the end user may create or edit equalization information, and the determination may be according to the equalization information created or edited by the end user. In that case, the determination may also be subject to predefined rules or constraints based on which allowed weighting factors or combinations of weighting factors may be determined. After the determination of the equalization information, the decoded audio signal is output according to the determined equalization information.
[0041] In one embodiment, the weighting factors are determined based on user input or user selection of an equalization information set from a plurality of equalization information sets.
[0042] FIG. 4 illustrates a device 400 for playing at least one audio file which corresponds to the method 300. The device 400 may include a decoding unit 401 which is configured to decode at least one encoded audio signal from the at least one audio file. The device 400 may further include a determining unit 402 which is configured to determine equalization information for the audio signal in the at least one audio file. The equalization information may include, for each audio frequency of a plurality of audio frequencies, a specification of at least one frequency weighting factor by which a frequency component of the audio signal having the audio frequency is to be weighted when the frequency component is output. The weighting factors for at least two of the at least two audio frequencies may be pairwise different. The device 400 may further include an outputting unit 403 configured to output the decoded audio signal according to the determined equalization information.
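The three units of the playing device can be sketched as three steps: decode, determine the equalization information (here by user selection of a preset), and output. The sketch below assumes the hypothetical container written in the earlier generator sketch and uses a stand-in for a real audio decoder and output stage; none of the names are part of an IMAF player API.

    # Illustrative player-side flow: decode, determine equalization information
    # (here by user selection of a preset name), then output. Assumes the file
    # "example.imx" written by the earlier generator sketch; all names are assumptions.
    import json, struct

    def play_audio_file(path, selected_preset):
        with open(path, "rb") as f:
            header_len = struct.unpack(">I", f.read(4))[0]
            header = json.loads(f.read(header_len).decode("utf-8"))
            tracks = [f.read(size) for size in header["track_sizes"]]  # "decoding unit"
        equalization = header["equalization"][selected_preset]          # "determining unit"
        output(tracks, equalization)                                    # "outputting unit"

    def output(decoded_tracks, equalization):
        print(f"outputting {len(decoded_tracks)} tracks with EQ {equalization}")

    play_audio_file("example.imx", selected_preset="preset1")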
[0043] FIG. 5 illustrates the structure of the data representation of equalization settings in an audio file 500 according to one exemplary embodiment.

[0044] According to the exemplary embodiment, the audio file 500 may be an interactive music file, e.g. an IMAF file. In the illustration, the audio file 500 may contain 5 audio tracks 501, 502, 503, 504, and 505 (Tracks 1, 2, ..., 5). It is appreciated that the number of 5 tracks is for illustration purposes only and is not limiting. Each audio track may be individually encoded audio in either a compressed (AAC) or an uncompressed (PCM) format. Each track may for example correspond to one instrument, and all the audio tracks may have the same duration. The instrument may be a musical instrument such as a drum or a piano. The instrument may also include vocals. The audio file 500 also includes two equalization information sets 510 and 520, wherein each equalization information set comprises at least one equalization parameter for at least one track. That is, for example, the equalization information set 510 comprises equalization parameters 511 for the track 501, equalization parameters 512 for the track 502, equalization parameters 513 for the track 503, equalization parameters 514 for the track 504, and equalization parameters 515 for the track 505. Similarly, the equalization information set 520 comprises equalization parameters 521 for the track 501, equalization parameters 522 for the track 502, equalization parameters 523 for the track 503, equalization parameters 524 for the track 504, and equalization parameters 525 for the track 505. When being played back, the tracks 501 to 505 may be played simultaneously and mixed instantly. Each track may have one or multiple equalization (EQ) parameter sets. For example, the track 501 has two equalization parameter sets 511 and 521, the track 502 has two equalization parameter sets 512 and 522, and so on. Only one EQ parameter set may be active at a given time of playback. For example, only one of the equalization parameter sets 511 and 521 may be active for the track 501 at a given time of playback.
[0045] An equalization information set or mixing setting may be a collection of the sets of EQ parameters (one set for each track). In the illustrated case, there are two equalization information sets 510 and 520. Each equalization information set may correspond to one music style or a type of personal flavor. The end user may select one equalization information set between the sets 510 and 520 according to the preference of the end user. It is also possible that a different mixing setting is used for different temporal segments of the music, i.e. mixing settings may be dynamically changed according to one embodiment.
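A minimal sketch of dynamically changing the mixing setting per temporal segment is given below; the schedule of (start time, equalization set identifier) pairs is an assumed representation used only for illustration and is not taken from the file format.

    # Illustrative lookup of the active equalization information set for a given
    # playback time; the (start_time, set_id) schedule is an assumed representation.
    import bisect

    def active_equalization_set(schedule, time_s):
        """schedule: list of (start_time_s, equalization_set_id), sorted by start time."""
        starts = [start for start, _ in schedule]
        index = bisect.bisect_right(starts, time_s) - 1
        return schedule[max(index, 0)][1]

    schedule = [(0.0, "set_510"), (60.0, "set_520"), (120.0, "set_510")]
    print(active_equalization_set(schedule, 75.0))   # set_520
    print(active_equalization_set(schedule, 10.0))   # set_510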
[0046] FIG. 6 illustrates the types of EQ parameters. The EQ parameters may be the parameters of a parametric equalizer or a graphical equalizer. In the case of a parametric equalizer, the parameters may correspond to the center (or cutoff/corner) frequency, gain, and bandwidth of a number of peaking, shelving, and pass filters, for example. In the case of a graphical equalizer, the parameters may correspond to the number of frequency bands and the gain for each band.
[0047] In various embodiments, the coding of EQ parameters in IMAF may define a new preset type to indicate that the preset contains equalization information, and may define a box for equalization parameters, which contains the parameters of a series of filters for a particular track or object.
[0048] In the equalization box, the number of filters may be indicated. The type of each filter may also be specified, i.e. LPF, HPF, LSF, HSF, or peaking filter. Table 1 below illustrates the parameters of each filter that may be specified according to one exemplary embodiment.
Table 1: Filter parameters for different types of filters
[Table 1 appears as an image in the original document; it lists, for each filter type (LPF, HPF, LSF, HSF, peaking filter), the filter parameters that may be specified, using the abbreviations explained in paragraph [0049] below.]
[0049] In Table 1, "G" represents the gain value of the respective filter; "Nil" means not available; "Slope" represents the slope of the respective filter; "BW" represents the bandwidth of the filter; and "Q" represents the quality factor of the respective filter.
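As an illustration of the kind of information such an equalization box carries, the sketch below spells out one track's box as a plain dictionary: the number of filters, the type of each filter, and a set of parameters per filter. The dictionary layout is an assumption, not the actual box syntax, and the particular parameters attached to each filter type here are plausible choices that may differ from the authoritative assignment in Table 1.

    # Illustrative content of an equalization box for one track: the number of
    # filters, the type of each filter, and per-filter parameters. This layout is
    # an assumption, not the actual box syntax defined for IMAF.
    equalization_box = {
        "filter_count": 3,
        "filters": [
            {"type": "HPF", "frequency_hz": 80.0, "slope": 1.0},
            {"type": "Peaking", "frequency_hz": 1000.0, "gain_db": -2.0, "q": 1.4},
            {"type": "HSF", "frequency_hz": 8000.0, "gain_db": 3.0, "slope": 0.8},
        ],
    }

    # The indicated number of filters should match the number of filter entries.
    assert equalization_box["filter_count"] == len(equalization_box["filters"])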
[0050] The filters in the audio equalizer may be designed with 2nd order Infinite Impulse Response (IIR) filters, which may be efficiently implemented. The filter coefficients may be derived from the equalizer parameters. A straightforward implementation in "Direct Form 1" is:

y[n] = A*x[n] + B*x[n-1] + C*x[n-2] - D*y[n-1] - E*y[n-2] (Eq. 1)

where A, B, C, D, and E are the filter coefficients of each filter, and may be stored together with the EQ parameters in the interactive music file; x[n] represents the input audio signal before the filtering; y[n] represents the output audio signal after the filtering; and n is the time index. "Direct Form 1" is one of the network structures for filter systems. Some of the other forms include "Cascade Form", "Parallel Form", and "Transposed Forms".
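The difference equation (Eq. 1) can be implemented directly. The sketch below applies one 2nd order IIR filter in "Direct Form 1" to a block of samples, with the coefficient names A through E matching Eq. 1; the function name and the Python realization are illustrative only.

    # Direct Form 1 implementation of Eq. 1:
    #   y[n] = A*x[n] + B*x[n-1] + C*x[n-2] - D*y[n-1] - E*y[n-2]
    def direct_form_1(x, A, B, C, D, E):
        y = [0.0] * len(x)
        x1 = x2 = y1 = y2 = 0.0      # x[n-1], x[n-2], y[n-1], y[n-2]
        for n, xn in enumerate(x):
            yn = A * xn + B * x1 + C * x2 - D * y1 - E * y2
            x2, x1 = x1, xn          # shift the input delay line
            y2, y1 = y1, yn          # shift the output delay line
            y[n] = yn
        return y

    # Example: a unity pass-through filter (A=1, all other coefficients 0)
    # leaves the signal unchanged.
    signal = [1.0, 0.5, -0.25, 0.0]
    print(direct_form_1(signal, A=1.0, B=0.0, C=0.0, D=0.0, E=0.0))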
[0051] FIG. 7 illustrates an embodiment wherein rules on the EQ parameters may be included in an audio file, e.g. an interactive music file. The audio file 700 may be an interactive music file, e.g. an IMAF file. The audio file 700 may be the same as the audio file 500 shown in FIG. 5 except that the audio file 700 further includes rules or constraints 701 based on which allowed equalization parameters for at least one track, or allowed combinations of equalization parameters for the tracks, may be determined. In other words, the rules or constraints 701 may define the conditions under which allowed weighting factors may be determined, wherein the weighting factors may be equalization factors comprised in the equalization information by which a frequency component of the audio signal having the respective audio frequency is to be weighted when the frequency component is output. The rules may be imposed by the producer of the interactive music content and may be followed by the player when the end user is interacting with the music. The rules may either put constraints on the value range of the EQ parameters, or enforce a particular relationship among a number of EQ parameters. For example, a rule may impose that the cut-off frequency of a low pass filter should not be lower than 200 Hz. As another example, a rule may specify that the cut-off frequency of the low pass filter on track 1 should always be lower than the cut-off frequency of the high pass filter on track 2.
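The two example rules above can be sketched as simple predicates over the equalization parameters of the tracks. The per-track dictionary layout and the function names below are illustrative assumptions, not the rule syntax of the file format.

    # Illustrative checks for the two example rules on EQ parameters above.
    # eq_params: dict mapping track id -> dict of filter settings (assumed layout).
    def check_lpf_cutoff_rule(eq_params, track, minimum_hz=200.0):
        """Rule: the low pass filter cut-off frequency must not be lower than 200 Hz."""
        return eq_params[track]["lpf_cutoff_hz"] >= minimum_hz

    def check_cross_track_rule(eq_params):
        """Rule: the LPF cut-off on track 1 stays below the HPF cut-off on track 2."""
        return eq_params[1]["lpf_cutoff_hz"] < eq_params[2]["hpf_cutoff_hz"]

    eq_params = {1: {"lpf_cutoff_hz": 250.0}, 2: {"hpf_cutoff_hz": 400.0}}
    print(check_lpf_cutoff_rule(eq_params, track=1))  # True
    print(check_cross_track_rule(eq_params))          # True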
[0052] The user may interact with the equalization settings in several ways. A number of mixing settings may be predefined by the producer when the music piece in interactive music format is released. The end user may simply choose the mixing settings to enjoy the music in different styles according to one embodiment.
[0053] In one embodiment, the end user may, with appropriate tools, create his or her own mixing settings or equalization information sets by defining new equalization parameters. In one embodiment, the new mixing setting may be added to the file or saved in a separate file.
[0054] Equalization parameters and combinations of equalization parameters for different tracks (equalization information sets) may be defined in order to add effects to a particular instrument, or to make two or more instruments sound better together. For example, when a bass guitar track is mixed with a kick drum track, a certain frequency range of the two instruments might overlap. With interaction with the equalization settings, the conflict may be eased by cutting any dispensable frequencies from the instruments. For another example, interaction with the equalization settings may be used to convey feelings and mood by cutting or boosting certain frequencies of instruments, such as making vocals sweeter, a snare drum more aggressive, or a trumpet mellower.
[0055] Interaction with equalization is by nature a creative process, and software tools may be built to assist in analyzing the audio signal in the audio tracks and to help the user adjust the equalization parameters to achieve certain mixing flavors.
[0056] In various embodiments, interactivity on equalization may be similar to volume interactivity, as long as an appropriate interface (GUI) is provided by the IMAF player.
[0057] In one embodiment, the limits rule, equivalence rule, and upper/lower rule may be defined for the equalization parameters (in the Mixing Rule Box). The Mixing Rule Box is a data structure inside the IMAF file which may contain the information on the rules. The rules are basically the constraints on the user's interaction, which may be defined by the producer and obeyed by the IMAF player.
[0058] In one embodiment, a music file may be arranged to store EQ parameters for individual music tracks, which are grouped into mixing settings or equalization information sets. The mixing settings are intended for different music styles or flavors.
[0059] In one embodiment, a music file is arranged to store rules on how EQ parameters can be modified by a user.
[0060] In one embodiment, a music player may provide a selection of EQ parameter mixing settings or equalization information sets to play back music in different styles, or may play back created personal mixing settings.
[0061] FIG. 8 illustrates the flowchart of the generating of at least one audio file and the playing of the at least one audio file according to one exemplary embodiment.
[0062] As can be seen from FIG. 8, the producer may compose a piece of music and use interactive music composing tools 801 to generate an audio file 802, e.g. an interactive music file. The generated audio file 802 may include a plurality of audio tracks and equalization information which comprises at least one equalization parameter for at least one track, as illustrated in FIG. 5. The generated audio file 802 may further include rules based on which allowed equalization parameters for at least one track, or allowed combinations of equalization parameters for at least two tracks, may be determined, as illustrated in FIG. 7. The end user may use an interactive music player 803 to play the audio file 802. The end user may for example select an equalization information set among a plurality of predefined equalization information sets contained in the audio file 802. Alternatively, the end user may create or edit an equalization information set for the audio signal in the audio file 802. The created or edited equalization information set may be subject to the rules contained in the audio file 802.
[0063] Different extents of support for equalization may be provided by different brands of IMAF files, due to concerns about computational complexity. A brand stands for the brand identifier plus all the requirements that the IMAF file needs to conform to, and is intended for the media player to play back the media content properly. The requirements may for example define what audio coding format may be used (e.g. MP3/AAC/SAOC/PCM), how many simultaneously decoded audio tracks may be included, etc. Different brands basically confine the complexity of the IMAF players for different applications. In the current standard specification, there are 7 brands defined, i.e. 'im01', 'im02', 'im03', 'im04', 'im11', 'im12', and 'im21'. A brand identifier may be a short field in the media file format (a string of four characters, e.g. 'im01' or 'mp4a') which is included in the IMAF file and indicates a particular brand, and which may also indicate what media with what configuration can be expected in the file. A brand identifier may be seen as a top-level handler which indicates the type of file format that the structure uses. In various embodiments, the filtering computation based on 2nd order Infinite Impulse Response filters may not be a concern. However, it might still be a concern, especially for mobile applications. Concerning computing complexity, the maximal number of simultaneously activated filters may be limited for the different brand identifiers, i.e. 'im01', 'im02', 'im03', 'im04', 'im11', 'im12', and 'im21', or a new brand identifier may be specified for the equalization support.

[0064] FIG. 9 illustrates how a player 920 may play an audio file or data file 901 according to an exemplary embodiment. The player 920 may be an interactive music player, for example.
[0065] In this exemplary embodiment, the audio file or the data file 901 may be an IMAF file. The audio file 901 may include preset data 903. The audio file 901 may further include audio data 905. For example, the audio data 905 may include a plurality of audio tracks. The audio file 901 may further include grouping data 907. The audio file 901 may further include rule data 909. In addition, the audio file may further include 3GPP timed text 910, MPEG-7 metadata 913, and a JPEG image 915.
[0066] The audio or data file 901 may be played by the player 920. The player 920 may be an IMAF player. The player 920 may include audio decoders 921. The player 920 may further include a mixer 923. The player 920 may further include a rule analyzer 925. The player 920 may further include a text decoder 927. The player 920 may further include a JPEG decoder 929.
[0067] In one embodiment, the preset data 903 may include predefined equalization information for the audio signal contained in the audio data 905. The equalization information may include, for each audio frequency of a plurality of audio frequencies, a specification of at least one frequency weighting factor by which a frequency component of the audio signal having the audio frequency is to be weighted when the frequency component is output, wherein the weighting factors for at least two of the at least two audio frequencies are pairwise different. The equalization information may include at least one equalization parameter for at least one track. In a further embodiment, the equalization information may include a plurality of equalization information sets, wherein each set comprises at least one equalization parameter for at least one track. In operation, for example, the end user may select an equalization parameter set among the plurality of equalization information sets such that the decoded audio signal is played by the player 920 according to the selected equalization parameter set.
[0068] In one embodiment, the grouping data 907 may include the information representing the user's selection of the tracks. The grouping data 907 may be created when the user creates or edits the selection of tracks.
[0069] In one embodiment, the rule data 909 may include rules or constraints based on which allowed weighting factors (or allowed combinations of weighting factors) may be determined. In operation, for example, the end user may create equalization information for the audio signal contained in the audio or data file 901, and the rule analyzer 925 may determine whether the created equalization information satisfies the rules or constraints in the rule data 909. If so, the player 920 may play the decoded audio signal according to the equalization information created by the end user.
[0070] The audio decoders 921 may decode the encoded audio signals in the audio data 905, and the mixer 923 may mix all the decoded audio tracks such that all the decoded audio tracks may be played simultaneously according to the determined equalization information. The equalization information may be selected from a plurality of predefined equalization sets contained in the preset data 903. Alternatively, the equalization information may be created by an end user. The equalization information created by the end user may be subject to predefined rules or constraints contained in the rule data 909.

[0071] The text decoder 927 may be configured to decode the 3GPP timed text in the audio or data file 901 and may be further configured to output the decoded display text.
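Referring back to the mixing step of paragraph [0070], a minimal sketch of the mixer is given below. It assumes each decoded track is a list of samples and each track's active equalization is reduced to a chain of Direct Form 1 filters, reusing the direct_form_1 routine sketched after Eq. 1 above; the names here are illustrative and do not represent the interface of the mixer 923.

    # Illustrative mixer: filter each decoded track with its active EQ filter
    # chain, then sum the filtered tracks sample by sample. All names are assumptions.
    def apply_filter_chain(samples, coefficient_sets):
        for A, B, C, D, E in coefficient_sets:
            samples = direct_form_1(samples, A, B, C, D, E)  # sketched after Eq. 1
        return samples

    def mix_tracks(decoded_tracks, eq_chains):
        """decoded_tracks: list of equal-length sample lists.
        eq_chains: one list of (A, B, C, D, E) coefficient tuples per track."""
        filtered = [apply_filter_chain(t, c)
                    for t, c in zip(decoded_tracks, eq_chains)]
        return [sum(samples) for samples in zip(*filtered)]

    tracks = [[0.5, 0.25, 0.0], [0.1, 0.2, 0.3]]
    chains = [[(1.0, 0.0, 0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0, 0.0, 0.0)]]
    print(mix_tracks(tracks, chains))  # [0.6, 0.45, 0.3]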
[0072] The JPEG decoder 929 may be configured to decode the JPEG image contained in the audio or data file 901, and may be further configured to output the decoded JPEG image for display.
[0073] In summary, various embodiments provide a method of generating an audio file which has an interactive music application format.
[0074] The various embodiments further provide a mechanism to support audio equalization under the current IMAF framework. The various embodiments also provide a file structure, based on the IMAF format, to support audio equalization under the current IMAF framework.
[0075] Various embodiments provide another dimension of multiple-track-based interactivity, namely equalization interactivity, where the listener may modify the frequency band level of at least one track or element of the hierarchy (under the interactivity requirement), and where each preset in an IMAF file may be able to describe equalization information (under the preset requirements). Various embodiments also provide the support for audio equalization and interactivity.
[0076] Various embodiments further provide a music player that may allow an end user to manipulate and store audio equalization settings of multiple-track-music.
[0077] While the invention has been particularly shown and described with reference to specific embodiments, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. The scope of the invention is thus indicated by the appended claims and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced.

Claims

What is claimed is:
1. A method for generating at least one audio file comprising:
including at least one encoded audio signal in the at least one audio file;
including equalization information for the audio signal in the at least one audio file.
2. The method as claimed in claim 1, wherein the equalization information includes, for each audio frequency of a plurality of audio frequencies, a specification of at least one frequency weighting factor by which a frequency component of the audio signal having the audio frequency is to be weighted when the frequency component is output, wherein the weighting factors for at least two of the at least two audio frequencies are pairwise different.
3. The method as claimed in any of claims 1-2, wherein the at least one encoded audio signal comprises a plurality of tracks, each track comprising an encoded component of the audio signal.
4. The method as claimed in claim 3, wherein the audio signal represents a piece of music and each component of the audio signal represents the part of the piece of music played by at least one respective instrument.
5. The method as claimed in claim 4, wherein instruments are different for different tracks.
6. The method as claimed in any of claims 3-5, wherein the equalization information comprises at least one equalization parameter for at least one track.
7. The method as claimed in claim 6, wherein the equalization information includes a plurality of equalization information sets, each set comprising at least one equalization parameter for at least one track.
8. The method as claimed in any of claims 1-7, wherein the equalization information includes filter parameters.
9. The method as claimed in claim 2, wherein the equalization information specifies the weighting factors by rules based on which allowed weighting factors or allowed combinations of weighting factors may be determined.
10. The method as claimed in any of claims 1-9, wherein the at least one file is one file.
11. The method as claimed in any of claims 1-9, wherein the at least one file is a plurality of files.
12. A device for generating at least one audio file comprising:
a first unit configured to include at least one encoded audio signal in the at least one audio file;
a second unit configured to include equalization information for the audio signal in the at least one audio file.
13. A method for playing at least one audio file comprising:
decoding at least one encoded audio signal from the at least one audio file;
determining equalization information for the audio signal from the at least one audio file;
outputting the decoded audio signal according to the determined equalization information.
14. The method as claimed in claim 13, wherein the equalization information includes, for each audio frequency of a plurality of audio frequencies, a specification of at least one frequency weighting factor by which a frequency component of the audio signal having the audio frequency is to be weighted when the frequency component is output, wherein the weighting factors for at least two of the at least two audio frequencies are pairwise different.
15. The method as claimed in claim 13, wherein the weighting factors are determined based on user input of an equalization information set from a plurality of equalization information sets.
16. A device for playing at least one audio file, comprising: a decoding unit configured to decode at least one encoded audio signal from the at least one audio file;
a determining unit configured to determine equalization information for the audio signal from the at least one audio file;
an outputting unit configured to output the decoded audio signal according to the determined equalization information.
PCT/SG2011/000021 2010-01-15 2011-01-17 A method and a device for generating at least one audio file, and a method and a device for playing at least one audio file WO2011087460A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SG201000317 2010-01-15
SG201000317-6 2010-01-15

Publications (1)

Publication Number Publication Date
WO2011087460A1 true WO2011087460A1 (en) 2011-07-21

Family

ID=44304519

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2011/000021 WO2011087460A1 (en) 2010-01-15 2011-01-17 A method and a device for generating at least one audio file, and a method and a device for playing at least one audio file

Country Status (1)

Country Link
WO (1) WO2011087460A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070044643A1 (en) * 2005-08-29 2007-03-01 Huffman Eric C Method and Apparatus for Automating the Mixing of Multi-Track Digital Audio
US20070127739A1 (en) * 2005-12-02 2007-06-07 Samsung Electronics Co., Ltd. Method of setting equalizer for audio file and method of reproducing audio file
WO2007137232A2 (en) * 2006-05-20 2007-11-29 Personics Holdings Inc. Method of modifying audio content
EP2112651A1 (en) * 2008-04-24 2009-10-28 LG Electronics Inc. A method and an apparatus for processing an audio signal

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Information Technology - Multimedia Application format (MPEG-A) - Part 12: Interactive Music Application format", ISO/IEC FCD 23000-12; ISO/IEC JTC1/SC29/WG11/W10970, 3 July 2009 (2009-07-03) *
KIM, K ET AL.: "MAF Overview", ISO/IEC JTC1/SC29/WG11/N10233, October 2008 (2008-10-01), BUSAN, KOREA *
ZHU, Y ET AL.: "A proposal for the support of audio equalization and interactivity of IMAF", ISO/IEC JTC1/SC29/WG11/M17341, 15 January 2010 (2010-01-15), KYOTO, JAPAN *
ZHU, Y ET AL.: "Comments on Study of ISO/IEC FCD 23000-12 Interactive Music AF", (N10970); ISO/IEC JTC1/SC29/WG11/M17340, 15 January 2010 (2010-01-15), KYOTO, JAPAN *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9934790B2 (en) 2015-07-31 2018-04-03 Apple Inc. Encoded audio metadata-based equalization
US10699726B2 (en) 2015-07-31 2020-06-30 Apple Inc. Encoded audio metadata-based equalization
US10341770B2 (en) 2015-09-30 2019-07-02 Apple Inc. Encoded audio metadata-based loudness equalization and dynamic equalization during DRC
WO2018065664A1 (en) * 2016-10-03 2018-04-12 Nokia Technologies Oy Method of editing audio signals using separated objects and associated apparatus
CN109844859A (en) * 2016-10-03 2019-06-04 诺基亚技术有限公司 Method and associated device using separated object editing audio signal
US10349196B2 (en) 2016-10-03 2019-07-09 Nokia Technologies Oy Method of editing audio signals using separated objects and associated apparatus
JP2019533195A (en) * 2016-10-03 2019-11-14 ノキア テクノロジーズ オーユー Method and related apparatus for editing audio signals using isolated objects
US10623879B2 (en) 2016-10-03 2020-04-14 Nokia Technologies Oy Method of editing audio signals using separated objects and associated apparatus
WO2019054992A1 (en) * 2017-09-12 2019-03-21 Rovi Guides, Inc. Systems and methods for determining whether to adjust volumes of individual audio components in a media asset based on a type of a segment of the media asset
US11503379B2 (en) 2017-09-12 2022-11-15 Rovi Guides, Inc. Systems and methods for determining whether to adjust volumes of individual audio components in a media asset based on a type of a segment of the media asset
US20210165628A1 (en) * 2019-12-03 2021-06-03 Audible Reality Inc. Systems and methods for selecting and sharing audio presets

Similar Documents

Publication Publication Date Title
US11501789B2 (en) Encoded audio metadata-based equalization
JP5467105B2 (en) Apparatus and method for generating an audio output signal using object-based metadata
EP2974010B1 (en) Automatic multi-channel music mix from multiple audio stems
KR101118922B1 (en) Acoustical virtual reality engine and advanced techniques for enhancing delivered sound
JP2010511189A (en) Method and apparatus for encoding and decoding object-based audio signal
WO2011087460A1 (en) A method and a device for generating at least one audio file, and a method and a device for playing at least one audio file
d'Escrivan Music technology
US20070297624A1 (en) Digital audio encoding
US8670577B2 (en) Electronically-simulated live music
CN111445914B (en) Processing method and device for detachable and re-editable audio signals
KR100584571B1 (en) Audio stream mixing method
Bhalani et al. Karaoke machine implementation and validation using out of phase stereo method
KR101193152B1 (en) Method and apparatus for mixing objects between object based sound source
KR101562041B1 (en) Method for Producing Media Content of Duet Mode, Media Content Producing Device Used Therein
KR100775188B1 (en) Method for mixing music file and terminal using the same
Cancino et al. On Stockhausen’s Solo (s): Beyond Interpretation
WO2007088490A1 (en) Device for and method of processing audio data
Exarchos et al. Audio processing
Matsakis Mastering Object-Based Music with an Emphasis on Philosophy and Proper Techniques for Streaming Platforms
Malyshev Sound production for 360 videos: in a live music performance case study
WO2018092286A1 (en) Sound processing device, sound processing method and program
US20080195925A1 (en) Compressed Media Files with Intrinsic Supplementary Content
Rincón Music technology
Roche Sound and User Feedback
KR20080096611A (en) Method and appratus for providing multi channel music file

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11733168

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11733168

Country of ref document: EP

Kind code of ref document: A1