WO2005106843A1 - Reproduction control of an audio signal based on musical genre classification - Google Patents

Reproduction control of an audio signal based on musical genre classification

Info

Publication number
WO2005106843A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
signal
output
genre
processing module
Prior art date
Application number
PCT/GB2005/001637
Other languages
French (fr)
Inventor
Westley Dowdles
Stewart Chalmers
Christopher Kirkham
Original Assignee
Axeon Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Axeon Limited
Publication of WO2005106843A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/307 Frequency adjustment, e.g. tone control

Definitions

  • the present invention relates to the field of audio processing and playback. More particularly, the present invention in one of its aspects relates to a method and apparatus for classifying audio content. In one of its aspects, the invention relates to the control of audio processing subsystems. Optimum sound quality or audibility of audio output is dependent to an extent on the audio content being played back. For example, an audio system configured for high quality playback of dance music may sound unsatisfactory when delivering an acoustic vocal track. Similarly, an audio system configured appropriately for a speech recording may not deliver high quality output of a jazz or classical track. It is therefore desirable in some applications to configure an audio system in a manner particular to specified audio content.
  • audio systems are available that offer pre-set configurations selectable by a user, with each configuration designed to play back a particular style or genre of music in a favourable manner.
  • These systems are inadequate in the sense that a user must select a different setting if the musical style or genre being played back changes, or make do with a setting that may be unsuitable.
  • Such systems are particularly unsuitable when the music source is a radio tuner, television tuner, digital library music (e.g. mp3), or a compilation recording, all of which may play tracks of different styles over relatively short periods.
  • Other applications for classifying audio content include the monitoring of listening habits or radio playlists.
  • US Patent No. 6,542,869 in the name of Foote relates to an automated system for differentiating between musical audio content and speech. This allows the identification of changes in the audio content, and has applications in indexing, summarising, beat tracking and retrieving.
  • the system of US 6,542,869 looks for local self-similarity within the audio signal.
  • US Patent No. 6,647,366 in the name of Wang relates to a method of controlling the digital coding rate of an audio system.
  • the technique determines which of a variety of coding rates is appropriate for the class of audio content being delivered.
  • the system recognises five classes of audio, being voiced speech, unvoiced speech, silence, transient music and stationary music.
  • US Patent No. 6,658,383 in the name of Koishida concerns another system for differentiating between music and speech audio signals in order to identify the appropriate linear predictive coding/decoding strategies.
  • the above-referenced documents allow classification of audio content to a limited extent only. For particular applications, for example control of an audio processing subsystem or monitoring of musical tastes, it is desirable to classify audio content with greater precision. It is therefore one object of an aspect of the invention to enable improved differentiation and classification of audio content. It is a related aim of an aspect of the invention to provide classification of audio content by musical style or genre.
  • Automatic control of the output of an audio system is desirable in a number of applications. For example, where a listening space is subject to background noise, automated control systems for audio devices can be used to compensate for ambient noise variations.
  • US Patent Numbers 4,628,526, 5,550,922, and 5,666,426 in the names of Germer, Becker and Helmes respectively propose systems in which ambient noise levels are monitored using microphone sensors, and the overall volume of the audio system is adjusted in response to the signal received from the microphone sensor.
  • US Patent No. 4,868,881 in the name of Zwicker discloses a system for automatically adjusting an audio equaliser of an automobile sound system in response to noise signals derived from the extraneous noise in the passenger compartment of the automobile. This prior art system attempts to mask the ambient noise by boosting and/or attenuating appropriate bands of the equaliser.
  • US Patent No. 5,208,866 in the name of Kato discloses a system in which an audio compressor is controlled in response to a signal from an in-vehicle microphone sensor, which monitors the background noise level within the vehicle.
  • a method of classifying audio content comprising the steps of: - Receiving an audio signal from an audio source into a processing module; - Classifying, using a processing module, the audio content by musical style or genre; - Generating an identification signal indicative of the musical style or genre of the audio content of the audio signal.
  • the musical style or genre may be selected from the group consisting of: Voice; Easy Listening/Jazz; Rock/Blues; Orchestral/Classical; Beat/Dance; Urban/Hip-hop.
  • the method may include the step of: - Extracting a plurality of audio frames from the audio signal, and; - Computing one or more metrics characterising at least some of the extracted audio frames.
  • the metrics are selected from the group consisting of: centre point of the frequency spectrum (centroid); end point of the spectrum; low energy sections, including estimation of beats per minute; flux; and zero-crossing.
  • the metrics may be other parameters characteristic of the audio content, but it has been recognised by the inventors that selecting a subset of the above-listed parameters is particularly useful for classifying audio content by musical style or genre.
  • the method includes the additional steps of: - Producing feature vectors based on the metrics computed; - Presenting the feature vectors to the inputs of a neural network device.
  • the method may include the additional step of: - applying a smoothing algorithm to responses of the neural network device to provide a smoothed identification signal for the audio signal.
  • a method of controlling the output of an audio system comprising the steps of: - Receiving an audio signal; - Classifying, using a neural network, the audio signal by musical style or genre; - Generating an identification signal indicative of the musical style or genre of the audio content of the audio signal; - Adjusting the output of an audio system in response to the identification signal.
  • the musical style or genre may be selected from the group consisting of: Voice; Easy Listening/Jazz; Rock/Blues; Orchestral/Classical; Beat/Dance; Urban/Hip-hop.
  • the method may include the step of: - Extracting a plurality of audio frames from the audio signal, and; - Computing one or more metrics characterising at least some of the extracted audio frames.
  • the metrics are selected from the group consisting of: centre point of the frequency spectrum (centroid); end point of the spectrum; low energy sections, including estimation of beats per minute; flux; and zero-crossing.
  • the metrics may be other parameters characteristic of the audio content, but it has been recognised by the inventors that selecting a subset of the above-listed parameters is particularly useful for classifying audio content by musical style or genre.
  • the method includes the additional steps of: - Producing feature vectors based on the metrics computed; - Presenting the feature vectors to the inputs of a neural network device.
  • the method may include the additional step of: - applying a smoothing algorithm to responses of the neural network device to provide a smoothed identification signal for the audio signal.
  • the step of adjusting the output of an audio system includes the step of adjusting one or more of the following components of an audio control subsystem: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
  • a method of classifying audio content comprising the steps of: - Receiving an audio signal from an audio source into a processing module; - Extracting a plurality of audio frames from the audio signal, and; - Computing one or more metrics characterising at least some of the extracted audio frames, wherein the metrics are selected from the group consisting of: centre point of the frequency spectrum (centroid); end point of the spectrum; low energy sections, including estimation of beats per minute; flux; and zero-crossing; - Classifying, using a processing module, the audio content by musical style or genre, based on the metrics computed; - Generating an identification signal indicative of the musical style or genre of the audio content of the audio signal.
  • the musical style or genre may be selected from the group consisting of: Voice; Easy Listening/Jazz; Rock/Blues; Orchestral/Classical; Beat/Dance; Urban/Hip-hop.
  • a method for controlling the output of an audio system comprising the steps of: - Receiving an audio input signal; - Receiving, from an automated audio content classification system, an identification signal corresponding to classification of the audio input signal by audio content; - Adjusting the output of the audio system in response to the identification signal.
  • the step of adjusting the output of an audio system includes the step of adjusting one or more of the following components of an audio control subsystem: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
  • a method for controlling the output of an audio system comprising the steps of: - Receiving an audio input signal; - Receiving a noise signal indicative of ambient noise levels; - Receiving, from an automated audio content classification system, an identification signal corresponding to classification of the audio input signal by audio content; - Adjusting the output of the audio system in response to the identification signal and the noise signal.
  • the method includes the additional steps of: - Receiving, from an auxiliary audio input device, an ambient noise level signal; - Scaling the audio input signal; - Computing a transformed domain subtraction between the ambient noise level signal and the scaled audio input signal to provide a noise level estimation signal.
  • the method includes the additional steps of: - Quantising the noise level estimation signal into a plurality of levels by applying at least one threshold point, and; - Adjusting the output of the audio system in response to the quantised noise signal.
  • the step of adjusting the output of an audio system includes the step of adjusting one or more of the following components of an audio control subsystem: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
  • a system for classifying audio content comprising a processing module adapted to receive an audio signal from an audio source, the processing module including a neural network device adapted to classify the audio content by musical style or genre and generate an identification signal indicative of the musical style or genre of the audio content of the audio signal.
  • a system for controlling the output of an audio system comprising - a processing module adapted to receive an audio signal from an audio source, the processing module including a neural network device adapted to classify the audio content by musical style or genre and generate an identification signal indicative of the musical style or genre of the audio content of the audio signal, and; - an audio control subsystem adapted to adjust the output of the audio system in response to the identification signal.
  • a system for classifying audio content comprising a processing module adapted to receive an audio signal from an audio source, wherein the processing module is further adapted to classify the audio content by musical style or genre and generate an identification signal indicative of the musical style or genre of the audio content of the audio signal.
  • the processing module comprises a neural network device.
  • a system for controlling the output of an audio system comprising: a processing module adapted to receive an audio signal from an audio source, wherein the processing module is further adapted to classify the audio content by musical style or genre and generate an identification signal indicative of the musical style or genre of the audio content of the audio signal, and; an audio control subsystem adapted to adjust the output of the audio system in response to the identification signal.
  • the processing module comprises a neural network device.
  • the audio control subsystem may comprise at least one of the following components: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
  • a system for controlling the output of an audio system comprising an audio control subsystem adapted to receive an audio input signal and an identification signal from an automated audio content classification system, the identification signal corresponding to classification of the audio input signal by audio content, wherein the audio control subsystem is adapted to adjust the output of the audio system in response to the identification signal.
  • the audio control subsystem comprises at least one of the following components: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
  • the audio control subsystem is further adapted to receive a noise signal indicative of ambient noise levels from an auxiliary audio input device.
  • the system may be adapted to adjust the output of the audio system in response to the identification signal and the noise signal.
  • a system for classifying audio content, the system being adapted to implement the method of the first or third aspects of the invention.
  • a system for controlling the output of an audio system, the system being adapted to implement the method of any of the second, fourth or fifth aspects of the invention.
  • Figure 1 is a schematic overview of the system in accordance with an example embodiment of the invention
  • Figure 2 is a block diagram representing the functionality of the system of Figure 1 as a series of method steps
  • Figure 3 is a schematic overview of a system in accordance with an alternative embodiment of the invention
  • Figure 4 is a block diagram representing the functionality of the system of Figure 3 as a series of method steps.
  • Figure 1 shows, generally depicted at 10, a schematic representation of a system according to an example embodiment of the invention.
  • the system comprises an audio source 11, which in this example is a Compact Disc Digital Audio (CDDA) source.
  • the source 11 provides an audio input signal to a processing module 12, adapted to classify the audio content of the audio input signal and output an identification signal.
  • the processing module 12 may be implemented in hardware, software, or a combination of hardware and software.
  • processing module 12 includes a trained neural network device 16, which may be for example a device of the type described in International Patent Publication Number WO 00/45333 A1 in the name of AXEON Limited.
  • This type of neural processor, marketed under the VindAX® brand, has been recognised as being particularly useful for classification-type tasks.
  • the VindAX® device has a continuous learning capability that offers additional functionality in the context of this invention, including user configuration functions.
  • the processing module includes a pre-processing module 14, and a post-processing module 17.
  • the source 11 also inputs the audio input signal to audio control subsystem 18, which controls the output signal provided to audio output device 20.
  • Typical components of the audio control subsystem 18 are an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor, although this list should not be considered as exhaustive.
  • the audio output device 20 is, for example, a loudspeaker device or amplification circuitry.
  • Figure 2 is a block diagram representing the functioning of the system of Figure 1 as a series of steps of a method, generally depicted at 30.
  • the audio input signal is received in the processing module.
  • the audio input signal undergoes pre-processing 34, to provide suitable inputs for later analysis of the audio signal.
  • Pre-processing 34 is conducted on a series of audio frames extracted from the audio signal, each audio frame consisting of a fixed number of samples of the audio signal.
  • metrics characteristic of the audio signal are computed.
  • the metrics may be centre point of the frequency spectrum (centroid), end point of the spectrum, low energy sections, including estimation of beats per minute, flux, or zero-crossing.
  • Other parameters characteristic of the audio content may also be used, but it has been recognised by the inventors that selecting a subset of the above-listed parameters is particularly useful for classifying audio content by musical style or genre. Indeed, one aspect of the invention
  • Feature vectors are produced 35 based on the metrics computed for the audio frames, which are then appropriately scaled for presentation to the inputs of the neural network device 16.
  • the neural network device 16 classifies 36 the audio signal by audio content, processing the feature vectors and outputting appropriate responses indicative of the class to which the audio content has been assigned. By selecting an appropriate neural network device and metrics, the audio frames are classified by musical style or genre of the content within that audio frame.
  • the classes of musical style or genre are selected from, for example: Voice; Easy Listening/Jazz; Rock/Blues; Orchestral/Classical; Beat/Dance; Urban/Hip-hop.
  • the identification signals are received 37 in the post-processing module 17, which generates an output function received 39 in the audio control subsystem 18.
  • a smoothing algorithm is applied 38 in the post-processing module to provide a smoothed output function to the audio control subsystem 18.
  • the audio control subsystem receives an input from the audio source 11, and adjusts 40 the settings of the audio control subsystem in accordance with parameters determined by the output function.
  • the audio control subsystem 18 therefore acts in response to the output function, and the ultimate audio output is adjusted 41 in a manner dependent on the musical style or genre identified by the neural network 16.
  • the output function from the post-processing module 17 is capable of adjusting the settings of the audio equaliser and/or the non-linear gain.
  • the output function may adjust the settings of linear volume, a surround sound processor or spatial sound processor.
  • the smoothing 38 of the output function by the post-processing module 17 allows the system to avoid stepwise changes in the settings of the audio components, and thereby avoid harsh changes to the audio output itself.
  • the linear volume may also be adjusted when implementing new settings of the other audio components in order to smooth the audio output effect.
  • the system and method described above provide a means for automatically classifying audio content and adjusting audio output in real time, according to the musical style or genre of the audio signal being played back.
  • FIG. 3 is a schematic representation of a system in accordance with an alternative embodiment of the invention.
  • This system, generally depicted at 50, includes components accounting for environmental and ambient noise levels.
  • the system 50 comprises the components of Figure 1, with like components signified with like reference numerals. These like components function in the manner described with reference to Figure 2, with various differences described below.
  • the system includes an auxiliary audio input 21, in the form of an environment microphone sensor.
  • the microphone sensor 21 functions to monitor background or ambient noise levels present within the listening space.
  • the signal from the microphone sensor 21 is received in a noise level estimation module 23, along with the audio input signal from the main audio source 11.
  • a noise quantisation module 26 is also provided, receiving an input from the noise level estimation module 23 and outputting to the post-processing module 17.
  • the post-processing module 17 provides an output function to the audio control subsystem 18.
  • Figure 4 is a block diagram representing the functioning of the system of Figure 3 as a series of steps of a method, generally depicted at 70.
  • the noise level estimation module 23 receives the inputs from the auxiliary audio input and the main audio source respectively. An appropriate scaling factor is applied 53 to the main audio signal. Subsequently, the noise level estimation module 23 computes 54 a transformed domain subtraction between the two signals.
  • the resulting signal is received by the noise quantisation module 26.
  • the signal is quantised 56 into three levels by applying two threshold points and an appropriate guard space. It will be appreciated by one skilled in the art that the invention is not limited to a particular number of quantisation levels or threshold points.
  • the quantised noise level signal is received 57a in the post-processing module 17.
  • This embodiment of the invention also implements the method steps described with reference to Figure 2, and outputs an audio content identification signal to the post-processing module 17.
  • the post-processing module 17 receives 57a the quantised noise level signal and receives 57b the audio content identification signal from the neural network device 16.
  • the audio output device 20 is adjusted 61 in a manner dependent on the musical style or genre identified by the neural network 16 and the estimated background noise levels.
  • the output function from the post-processing module 17 is capable of adjusting the settings of the audio equaliser and/or the non-linear gain. In alternative systems, the output function may adjust the settings of linear volume, a surround sound processor, or a spatial sound processor.
  • the smoothing 58 of the output function by the post-processing module 17 allows the system to avoid stepwise changes in the settings of the audio components, and thereby avoid harsh changes to the audio output itself.
  • the linear volume may also be adjusted when implementing new settings of the other audio components in order to smooth the audio output effect.
  • the description refers to CDDA sources, although it will be evident that other audio sources can be used including: digital library music such as mp3, radio tuners, television tuners, DVD-Audio, SACD, HDCC etc.
  • the signals from the sources may be digital or analogue.
  • the signals may be single-channel and/or multi-channel.
  • the individual channels can be summed into a single channel and processed in the manner described above.
  • each channel of the multi-channel signal can be processed individually.
  • the embodiments described include a neural processor of the type described in International Patent Publication Number WO 00/45333 A1. While it is considered to be beneficial to use such a system for the purposes of the present invention, alternative systems may achieve similar results.
  • Alternative processing modules may be implemented in software or hardware, and may include alternative neural network devices, look-up tables, algorithms, or genetic algorithms.
  • the post-processing module may, in alternative embodiments, include threshold and history functions to add stability to the system.
  • the threshold function uses the low-energy metrics to prevent the system from classifying periods of "silence" within the audio signal.
  • the history function provides context for the audio control during start-up periods of the audio signal, and can prevent reclassification for slight variations in the audio signal.
  • the present invention in one of its aspects provides an improved audio system by classifying, in real time, the audio content by musical style or genre. This has application, for example, in indexing or cataloguing of music, and monitoring listening habits.
  • the classification by musical style or genre is applied to control of an audio system.
  • the settings of an audio control subsystem can be changed to suit the musical style or genre being played back.
  • ambient noise levels are also estimated and contribute to the control of the audio control subsystem.
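
The ambient-noise handling summarised in the list above (scaling the audio input signal, a transformed domain subtraction, and quantisation into three levels using two threshold points and a guard space) can be illustrated with a minimal Python sketch. Several details are assumptions that the text does not specify: the "transformed domain" is taken to be per-bin magnitude spectra, the guard space is interpreted as a hysteresis band around each threshold, and the names `estimate_noise` and `NoiseQuantiser` together with the threshold values are hypothetical.

```python
def estimate_noise(mic_mags, audio_mags, scale):
    """Subtract the scaled programme spectrum from the microphone spectrum.

    Assumption: both inputs are magnitude spectra of equal length; negative
    differences are clipped to zero before averaging.
    """
    residual = [max(m - scale * a, 0.0) for m, a in zip(mic_mags, audio_mags)]
    return sum(residual) / len(residual)


class NoiseQuantiser:
    """Quantise a noise estimate into len(thresholds) + 1 levels.

    Assumption: the guard space acts as hysteresis, so the level only changes
    once the estimate clears a threshold by more than `guard`.
    """

    def __init__(self, thresholds=(1.0, 3.0), guard=0.25):
        self.thresholds = thresholds  # placeholder values, not from the source
        self.guard = guard
        self.level = 0

    def update(self, estimate):
        # Step up while the estimate exceeds the next threshold plus the guard.
        while (self.level < len(self.thresholds)
               and estimate > self.thresholds[self.level] + self.guard):
            self.level += 1
        # Step down while the estimate is below the current threshold minus the guard.
        while (self.level > 0
               and estimate < self.thresholds[self.level - 1] - self.guard):
            self.level -= 1
        return self.level
```

With the placeholder thresholds, an estimate hovering just around 1.0 does not toggle the level, which is the kind of stability a guard space is presumably intended to provide.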

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

A method and system for classifying audio content according to musical style or genre is described. In one aspect of the invention, a method and system for adjusting parameters of an audio system according to the classified musical style or genre is provided. Embodiments of the invention use a neural network device in the classification stage, and automatically adjust one or more of an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor. A further embodiment of the invention adjusts parameters of the audio system according to an ambient noise level.

Description

REPRODUCTION CONTROL OF AN AUDIO SIGNAL BASED ON MUSICAL GENRE CLASSIFICATION
The present invention relates to the field of audio processing and playback. More particularly, the present invention in one of its aspects relates to a method and apparatus for classifying audio content. In one of its aspects, the invention relates to the control of audio processing subsystems. Optimum sound quality or audibility of audio output is dependent to an extent on the audio content being played back. For example, an audio system configured for high quality playback of dance music may sound unsatisfactory when delivering an acoustic vocal track. Similarly, an audio system configured appropriately for a speech recording may not deliver high quality output of a jazz or classical track. It is therefore desirable in some applications to configure an audio system in a manner particular to specified audio content. For example, audio systems are available that offer pre-set configurations selectable by a user, with each configuration designed to play back a particular style or genre of music in a favourable manner. These systems are inadequate in the sense that a user must select a different setting if the musical style or genre being played back changes, or make do with a setting that may be unsuitable. Such systems are particularly unsuitable when the music source is a radio tuner, television tuner, digital library music (e.g. mp3), or a compilation recording, all of which may play tracks of different styles over relatively short periods.
Other applications for classifying audio content include the monitoring of listening habits or radio playlists.
US Patent No. 6,542,869 in the name of Foote relates to an automated system for differentiating between musical audio content and speech. This allows the identification of changes in the audio content, and has applications in indexing, summarising, beat tracking and retrieving. The system of US 6,542,869 looks for local self-similarity within the audio signal.
US Patent No. 6,647,366 in the name of Wang relates to a method of controlling the digital coding rate of an audio system. The technique determines which of a variety of coding rates is appropriate for the class of audio content being delivered. The system recognises five classes of audio, being voiced speech, unvoiced speech, silence, transient music and stationary music.
US Patent No. 6,658,383 in the name of Koishida concerns another system for differentiating between music and speech audio signals in order to identify the appropriate linear predictive coding/decoding strategies. The above-referenced documents allow classification of audio content to a limited extent only. For particular applications, for example control of an audio processing subsystem or monitoring of musical tastes, it is desirable to classify audio content with greater precision. It is therefore one object of an aspect of the invention to enable improved differentiation and classification of audio content. It is a related aim of an aspect of the invention to provide classification of audio content by musical style or genre.
Automatic control of the output of an audio system is desirable in a number of applications. For example, where a listening space is subject to background noise, automated control systems for audio devices can be used to compensate for ambient noise variations.
US Patent Numbers 4,628,526, 5,550,922, and 5,666,426 in the names of Germer, Becker and Helmes respectively propose systems in which ambient noise levels are monitored using microphone sensors, and the overall volume of the audio system is adjusted in response to the signal received from the microphone sensor.
US Patent No. 4,868,881 in the name of Zwicker discloses a system for automatically adjusting an audio equaliser of an automobile sound system in response to noise signals derived from the extraneous noise in the passenger compartment of the automobile. This prior art system attempts to mask the ambient noise by boosting and/or attenuating appropriate bands of the equaliser. US Patent No. 5,208,866 in the name of Kato discloses a system in which an audio compressor is controlled in response to a signal from an in-vehicle microphone sensor, which monitors the background noise level within the vehicle.
The above referenced prior art documents disclose systems capable of adjusting individual audio components in response to background noise signals received from microphone sensors. However, each is limited in its ability to control audio output.
It is one aim of an aspect of the invention to provide an improved system for automatically adjusting the output of an audio system, capable of controlling a variety of audio components.
It is a further aim of an aspect of the invention to provide an automated system for automatically adjusting an audio system in a manner dependent on the classification of audio content.
Further aims and objects of the invention will become apparent from reading the following description.
According to the first aspect of the invention, there is provided a method of classifying audio content, the method comprising the steps of: - Receiving an audio signal from an audio source into a processing module; - Classifying, using a processing module, the audio content by musical style or genre; - Generating an identification signal indicative of the musical style or genre of the audio content of the audio signal.
The musical style or genre may be selected from the group consisting of: Voice; Easy Listening/Jazz; Rock/Blues; Orchestral/Classical; Beat/Dance; Urban/Hip-hop.
The method may include the step of: - Extracting a plurality of audio frames from the audio signal, and; - Computing one or more metrics characterising at least some of the extracted audio frames.
Preferably, the metrics are selected from the group consisting of: centre point of the frequency spectrum (centroid); end point of the spectrum; low energy sections, including estimation of beats per minute; flux; and zero-crossing.
The metrics may be other parameters characteristic of the audio content, but it has been recognised by the inventors that selecting a subset of the above-listed parameters is particularly useful for classifying audio content by musical style or genre.
Preferably, the method includes the additional steps of: - Producing feature vectors based on the metrics computed; - Presenting the feature vectors to the inputs of a neural network device.
The method may include the additional step of: - applying a smoothing algorithm to responses of the neural network device to provide a smoothed identification signal for the audio signal.
According to a second aspect of the invention, there is provided a method of controlling the output of an audio system, the method comprising the steps of: - Receiving an audio signal; - Classifying, using a neural network, the audio signal by musical style or genre; - Generating an identification signal indicative of the musical style or genre of the audio content of the audio signal; - Adjusting the output of an audio system in response to the identification signal.
The musical style or genre may be selected from the group consisting of: Voice; Easy Listening/Jazz; Rock/Blues; Orchestral/Classical; Beat/Dance; Urban/Hip-hop.
The method may include the step of: - Extracting a plurality of audio frames from the audio signal, and; - Computing one or more metrics characterising at least some of the extracted audio frames.
Preferably, the metrics are selected from the group consisting of: centre point of the frequency spectrum (centroid); end point of the spectrum; low energy sections, including estimation of beats per minute; flux; and zero-crossing.
The metrics may be other parameters characteristic of the audio content, but it has been recognised by the inventors that selecting a subset of the above-listed parameters is particularly useful for classifying audio content by musical style or genre.
Preferably, the method includes the additional steps of: - Producing feature vectors based on the metrics computed; - Presenting the feature vectors to the inputs of a neural network device.
The method may include the additional step of: - applying a smoothing algorithm to responses of the neural network device to provide a smoothed identification signal for the audio signal.
Preferably, the step of adjusting the output of an audio system includes the step of adjusting one or more of the following components of an audio control subsystem: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
According to a third aspect of the invention, there is provided a method of classifying audio content, the method comprising the steps of: - Receiving an audio signal from an audio source into a processing module; - Extracting a plurality of audio frames from the audio signal, and; - Computing one or more metrics characterising at least some of the extracted audio frames, wherein the metrics are selected from the group consisting of: centre point of the frequency spectrum (centroid); end point of the spectrum; low energy sections, including estimation of beats per minute; flux; and zero-crossing; - Classifying, using a processing module, the audio content by musical style or genre, based on the metrics computed; - Generating an identification signal indicative of the musical style or genre of the audio content of the audio signal.
The musical style or genre may be selected from the group consisting of: Voice; Easy Listening/Jazz; Rock/Blues; Orchestral/Classical; Beat/Dance; Urban/Hip-hop.
According to a fourth aspect of the invention, there is provided a method for controlling the output of an audio system, the method comprising the steps of: - Receiving an audio input signal; - Receiving, from an automated audio content classification system, an identification signal corresponding to classification of the audio input signal by audio content; - Adjusting the output of the audio system in response to the identification signal.
Preferably, the step of adjusting the output of an audio system includes the step of adjusting one or more of the following components of an audio control subsystem: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
According to a fifth aspect of the invention, there is provided a method for controlling the output of an audio system, the method comprising the steps of: - Receiving an audio input signal; - Receiving a noise signal indicative of ambient noise levels; - Receiving, from an automated audio content classification system, an identification signal corresponding to classification of the audio input signal by audio content; - Adjusting the output of the audio system in response to the identification signal and the noise signal.
Preferably, the method includes the additional steps of: - Receiving, from an auxiliary audio input device, an ambient noise level signal; - Scaling the audio input signal; - Computing a transformed domain subtraction between the ambient noise level signal and the scaled audio input signal to provide a noise level estimation signal.
Preferably, the method includes the additional steps of: - Quantising the noise level estimation signal into a plurality of levels by applying at least one threshold point, and; - Adjusting the output of the audio system in response to the quantised noise signal.
Preferably, the step of adjusting the output of an audio system includes the step of adjusting one or more of the following components of an audio control subsystem: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
According to a sixth aspect of the invention there is provided a system for classifying audio content, the system comprising a processing module adapted to receive an audio signal from an audio source, the processing module including a neural network device adapted to classify the audio content by musical style or genre and generate an identification signal indicative of the musical style or genre of the audio content of the audio signal.
According to a seventh aspect of the invention there is provided a system for controlling the output of an audio system, the system comprising - a processing module adapted to receive an audio signal from an audio source, the processing module including a neural network device adapted to classify the audio content by musical style or genre and generate an identification signal indicative of the musical style or genre of the audio content of the audio signal, and; - an audio control subsystem adapted to adjust the output of the audio system in response to the identification signal.
According to an eighth aspect of the invention there is provided a system for classifying audio content, the system comprising a processing module adapted to receive an audio signal from an audio source, wherein the processing module is further adapted to classify the audio content by musical style or genre and generate an identification signal indicative of the musical style or genre of the audio content of the audio signal.
Preferably, the processing module comprises a neural network device.
According to an eighth aspect of the invention there is provided a system for controlling the output of an audio system, the system comprising: a processing module adapted to receive an audio signal from an audio source, wherein the processing module is further adapted to classify the audio content by musical style or genre and generate an identification signal indicative of the musical style or genre of the audio content of the audio signal, and; an audio control subsystem adapted to adjust the output of the audio system in response to the identification signal.
Preferably, the processing module comprises a neural network device.
The audio control subsystem may comprise at least one of the following components: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
According to a ninth aspect of the invention there is provided a system for controlling the output of an audio system, the system comprising an audio control subsystem adapted to receive an audio input signal and an identification signal from an automated audio content classification system, the identification signal corresponding to classification of the audio input signal by audio content, wherein the audio control subsystem is adapted to adjust the output of the audio system in response to the identification signal.
Preferably, the audio control subsystem comprises at least one of the following components: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
In one embodiment, the audio control subsystem is further adapted to receive a noise signal indicative of ambient noise levels from an auxiliary audio input device.
The system may be adapted to adjust the output of the audio system in response to the identification signal and the noise signal.
According to a tenth aspect of the invention there is provided a system for classifying audio content, the system being adapted to implement the method of the first or third aspects of the invention.
According to an eleventh aspect of the invention there is provided a system for controlling the output of an audio system, the system being adapted to implement the method of any of the second, fourth or fifth aspects of the invention.
There will now be described, by way of example only, an embodiment of the invention with reference to the following drawings, of which:
Figure 1 is a schematic overview of the system in accordance with an example embodiment of the invention;
Figure 2 is a block diagram representing the functionality of the system of Figure 1 as a series of method steps;
Figure 3 is a schematic overview of a system in accordance with an alternative embodiment of the invention, and;
Figure 4 is a block diagram representing the functionality of the system of Figure 3 as a series of method steps.
Referring firstly to Figure 1 of the drawings, there is shown, generally depicted at 10, a schematic representation of a system according to an example embodiment of the invention. The system comprises an audio source 11, which in this example is a Compact Disc Digital Audio (CDDA) source. The source 11 provides an audio input signal to a processing module 12, adapted to classify the audio content of the audio input signal and output an identification signal. The processing module 12 may be implemented in hardware, software, or a combination of hardware and software.
In this example, processing module 12 includes a trained neural network device 16, which may be for example a device of the type described in International Patent Publication Number WO 00/45333 A1 in the name of AXEON Limited. This type of neural processor, marketed under the VindAX® brand, has been recognised as being particularly useful for classification-type tasks. The VindAX® device has a continuous learning capability that offers additional functionality in the context of this invention, including user configuration functions. The processing module includes a pre-processing module 14, and a post-processing module 17.
The source 11 also inputs the audio input signal to audio control subsystem 18, which controls the output signal provided to audio output device 20. Typical components of the audio control subsystem 18 are an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor, although this list should not be considered as exhaustive. The audio output device 20 is, for example, a loudspeaker device or amplification circuitry.
Figure 2 is a block diagram representing the functioning of the system of Figure 1 as a series of steps of a method, generally depicted at 30.
At 33, the audio input signal is received in the processing module.
The audio input signal undergoes pre-processing 34, to provide suitable inputs for later analysis of the audio signal. Pre-processing 34 is conducted on a series of audio frames extracted from the audio signal, each audio frame consisting of a fixed number of samples of the audio signal. For each audio frame analysed, metrics characteristic of the audio signal are computed. The metrics may be centre point of the frequency spectrum (centroid), end point of the spectrum, low energy sections, including estimation of beats per minute, flux, or zero-crossing. Other parameters characteristic of the audio content may also be used, but it has been recognised by the inventors that selecting a subset of the above-listed parameters is particularly useful for classifying audio content by musical style or genre. Indeed, one aspect of the invention
Feature vectors are produced 35 based on the metrics computed for the audio frames, which are then appropriately scaled for presentation to the inputs of the neural network device 16.
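By way of illustration only, the pre-processing and feature vector steps might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the described embodiment: the function names, the 64-sample frame length, the 85% roll-off fraction used for the "end point of the spectrum", and the min-max scaling are all assumptions. The low-energy and beats-per-minute metrics are omitted for brevity, and a naive DFT stands in for the FFT that a practical system would use.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (a sketch; use an FFT in practice)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def centroid(mags):
    """Magnitude-weighted mean bin index: the centre point of the spectrum."""
    total = sum(mags)
    return sum(k * m for k, m in enumerate(mags)) / total if total else 0.0

def rolloff(mags, fraction=0.85):
    """Lowest bin below which `fraction` of the magnitude lies (assumed 'end point')."""
    target = fraction * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= target:
            return k
    return len(mags) - 1

def flux(mags, prev_mags):
    """Sum of squared differences between successive magnitude spectra."""
    return sum((a - b) ** 2 for a, b in zip(mags, prev_mags))

def zero_crossings(frame):
    """Count sign changes between adjacent samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def feature_vectors(signal, frame_len=64):
    """Split the signal into fixed-length frames and compute one vector per frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    vectors, prev = [], None
    for frame in frames:
        mags = magnitude_spectrum(frame)
        reference = prev if prev is not None else mags  # first frame has zero flux
        vectors.append([centroid(mags), rolloff(mags),
                        flux(mags, reference), zero_crossings(frame)])
        prev = mags
    return vectors

def scale(vectors):
    """Min-max scale each feature column to the range [0, 1]."""
    cols = list(zip(*vectors))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(vec, lo, hi)]
            for vec in vectors]
```

The scaled vectors would then be presented to the inputs of the classifier, one vector per audio frame.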
The neural network device 16 classifies 36 the audio signal by audio content, processing the feature vectors and outputting appropriate responses indicative of the class to which the audio content has been assigned. By selecting an appropriate neural network device and metrics, the audio frames are classified by musical style or genre of the content within that audio frame.
The classes of musical style or genre are selected from, for example:
• Voice
• Easy Listening/Jazz
• Rock/Blues
• Orchestral/Classical
• Beat/Dance
• Urban/Hip-hop
However, this list of musical styles or genres is not exhaustive.
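As a minimal sketch of how a set of classifier responses could be turned into an identification signal, the example below assumes that the classifier produces one numeric response per class and that the largest response wins. Neither assumption is stated in the description; the names `GENRES` and `identify` are illustrative only.

```python
GENRES = ["Voice", "Easy Listening/Jazz", "Rock/Blues",
          "Orchestral/Classical", "Beat/Dance", "Urban/Hip-hop"]

def identify(responses):
    """Return the genre whose response is largest (assumed one response per class)."""
    if len(responses) != len(GENRES):
        raise ValueError("expected one response per genre")
    best = max(range(len(GENRES)), key=lambda i: responses[i])
    return GENRES[best]
```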
The identification signals are received 37 in the post-processing module 17, which generates an output function received 39 in the audio control subsystem 18. A smoothing algorithm is applied 38 in the post-processing module to provide a smoothed output function to the audio control subsystem 18.
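The description does not specify the smoothing algorithm. One possibility, shown here purely as an assumption, is a sliding-window majority vote over the per-frame identification signals, so that an isolated misclassified frame does not change the output. The function name and window length are hypothetical.

```python
from collections import Counter, deque

def smooth_labels(labels, window=5):
    """Yield, for each frame, the most common label among the last `window` frames."""
    recent = deque(maxlen=window)
    for label in labels:
        recent.append(label)
        yield Counter(recent).most_common(1)[0][0]
```

A longer window gives a steadier output at the cost of reacting more slowly when the genre genuinely changes.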
The audio control subsystem receives an input from the audio source 11, and adjusts 40 the settings of the audio control subsystem in accordance with parameters determined by the output function. The audio control subsystem 18 therefore acts in response to the output function, and the ultimate audio output is adjusted 41 in a manner dependent on the musical style or genre identified by the neural network 16.
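The mapping from identified genre to audio settings could be as simple as a look-up table. The sketch below is an assumption for illustration: the three-band equaliser model, the gain values in decibels, and the names `PRESETS`, `FLAT` and `settings_for` do not come from the description.

```python
# Hypothetical three-band equaliser gains in dB; the values are illustrative only.
FLAT = {"bass_db": 0.0, "mid_db": 0.0, "treble_db": 0.0}
PRESETS = {
    "Voice": {"bass_db": -2.0, "mid_db": 3.0, "treble_db": 0.0},
    "Rock/Blues": {"bass_db": 3.0, "mid_db": -1.0, "treble_db": 2.0},
    "Beat/Dance": {"bass_db": 5.0, "mid_db": 0.0, "treble_db": 3.0},
    "Orchestral/Classical": {"bass_db": 0.0, "mid_db": 0.0, "treble_db": 1.0},
}

def settings_for(genre):
    """Return a copy of the preset for `genre`, or flat settings if none is defined."""
    return dict(PRESETS.get(genre, FLAT))
```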
In particular, the output function from the post-processing module 17 is capable of adjusting the settings of the audio equaliser and/or the non-linear gain. In alternative systems, the output function may adjust the settings of linear volume, a surround sound processor or spatial sound processor.
The smoothing 38 of the output function by the post-processing module 17 allows the system to avoid stepwise changes in the settings of the audio components, and thereby avoid harsh changes to the audio output itself. Optionally, the linear volume may also be adjusted when implementing new settings of the other audio components in order to smooth the audio output effect.
The system and method described above provide a means for automatically classifying audio content and adjusting audio output in real time, according to the musical style or genre of the audio signal being played back.
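To illustrate how stepwise changes in component settings might be avoided, the sketch below moves each setting a fixed fraction of the way toward its target on every update (exponential smoothing). This is one assumed technique, not necessarily the one used in the embodiment; `ramp` and `alpha` are hypothetical names.

```python
def ramp(current, target, alpha=0.2):
    """Move each setting a fraction `alpha` of the way toward its target value."""
    return {key: current[key] + alpha * (target[key] - current[key])
            for key in target}

# Example: approach a new bass gain gradually rather than in one step.
settings = {"bass_db": 0.0}
for _ in range(20):
    settings = ramp(settings, {"bass_db": 6.0})
```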
Figure 3 is a schematic representation of a system in accordance with an alternative embodiment of the invention. This system, generally depicted at 50, includes components accounting for environmental and ambient noise levels.
The system 50 comprises the components of Figure 1, with like components signified with like reference numerals. These like components function in the manner described with reference to Figure 2, with various differences described below.
The system includes an auxiliary audio input 21, in the form of an environment microphone sensor. The microphone sensor 21 functions to monitor background or ambient noise levels present within the listening space. The signal from the microphone sensor 21 is received in a noise level estimation module 23, along with the audio input signal from the main audio source 11. A noise quantisation module 26 is also provided, receiving an input from the noise level estimation module 23 and outputting to the post-processing module 17. As before, the post-processing module 17 provides an output function to the audio control subsystem 18.
Figure 4 is a block diagram representing the functioning of the system of Figure 3 as a series of steps of a method, generally depicted at 70.
At 51 and 52 the noise level estimation module 23 receives the inputs from the auxiliary audio input and the main audio source respectively. An appropriate scaling factor is applied 53 to the main audio signal. Subsequently, the noise level estimation module 23 computes 54 a transformed domain subtraction between the two signals.
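A minimal sketch of the scaling and subtraction steps follows, assuming that the "transformed domain" is the DFT magnitude spectrum and that negative differences are clipped to zero. Both are assumptions, as are the function names and the use of the mean residual as the noise level estimate.

```python
import cmath
import math

def magnitudes(frame):
    """Naive DFT magnitudes for bins 0..N/2 (a sketch; use an FFT in practice)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def noise_level(mic_frame, source_frame, scale=1.0):
    """Subtract the scaled source spectrum from the microphone spectrum.

    What remains is attributed to ambient noise; the mean residual
    magnitude is returned as a single noise level estimate.
    """
    mic = magnitudes(mic_frame)
    src = magnitudes(source_frame)
    residual = [max(m - scale * s, 0.0) for m, s in zip(mic, src)]
    return sum(residual) / len(residual)
```

Here `scale` plays the role of the scaling factor applied to the main audio signal, compensating for the difference in level between the source and what the microphone picks up from the loudspeakers.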
The resulting signal is received by the noise quantisation module 26. The signal is quantised 56 into three levels by applying two threshold points and an appropriate guard space. It will be appreciated by one skilled in the art that the invention is not limited to a particular number of quantisation levels or threshold points. The quantised noise level signal is received 57a in the post-processing module 17.
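The three-level quantisation can be sketched as below. The example interprets the "guard space" as a hysteresis band around each threshold, so that an estimate hovering near a threshold does not cause the level to toggle; that interpretation, the threshold values and the function name are assumptions.

```python
def quantise(levels, low=0.3, high=0.7, guard=0.05):
    """Quantise noise estimates into levels 0, 1 or 2 using two thresholds.

    The level rises only when the estimate exceeds a threshold by `guard`
    and falls only when it drops below a threshold by `guard`.
    """
    thresholds = (low, high)
    state, out = 0, []
    for x in levels:
        while state < 2 and x > thresholds[state] + guard:
            state += 1
        while state > 0 and x < thresholds[state - 1] - guard:
            state -= 1
        out.append(state)
    return out
```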
This embodiment of the invention also implements the method steps described with reference to Figure 2, and outputs an audio content identification signal to the post-processing module 17. The post-processing module 17 receives 57a the noise level signal and receives 57b the audio content identification signal. The audio output device 20 is adjusted 61 in a manner dependent on the musical style or genre identified by the neural network 16 and the estimated background noise levels. As in the embodiment of Figure 2, the output function from the post-processing module 17 is capable of adjusting the settings of the audio equaliser and/or the non-linear gain. In alternative systems, the output function may adjust the settings of linear volume, a surround sound processor, or a spatial sound processor.
Also as before, the smoothing 58 of the output function by the post-processing module 17 allows the system to avoid stepwise changes in the settings of the audio components, and thereby avoid harsh changes to the audio output itself. Optionally, the linear volume may also be adjusted when implementing new settings of the other audio components in order to smooth the audio output effect.
It will be appreciated by one skilled in the art that variations to the above-described embodiments can be made within the scope of the invention herein intended.
For example, the description refers to CDDA sources, although it will be evident that other audio sources can be used including: digital library music such as mp3, radio tuners, television tuners, DVD-Audio, SACD, HDCC etc. The signals from the sources may be digital or analogue.
Furthermore, the signals may be single-channel and/or multi-channel. For a multi-channel input signal, the individual channels can be summed into a single channel and processed in the manner described above. Alternatively, or in addition, each channel of the multi-channel signal can be processed individually.
The embodiments described include a neural processor of the type described in International Patent Publication Number WO 00/45333 A1. While it is considered to be beneficial to use such a system for the purposes of the present invention, alternative systems may achieve similar results. Alternative processing modules may be implemented in software or hardware, and may include alternative neural network devices, look-up tables, algorithms, or genetic algorithms.
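For the multi-channel case mentioned above, summing the channels into a single channel might be sketched as follows. The division by the number of channels is an added assumption, intended to keep the result within the original sample range; `downmix` is a hypothetical name.

```python
def downmix(channels):
    """Combine equal-length channels into one by averaging sample by sample."""
    if not channels:
        raise ValueError("at least one channel is required")
    count = len(channels)
    return [sum(samples) / count for samples in zip(*channels)]
```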
The post-processing module may, in alternative embodiments, include threshold and history functions to add stability to the system. The threshold function uses the low-energy metrics to prevent the system from classifying periods of "silence" within the audio signal. The history function provides context for the audio control during start-up periods of the audio signal, and can prevent reclassification for slight variations in the audio signal.
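A minimal sketch of how threshold and history functions could work together is given below. It assumes each frame supplies an energy value and a candidate label, ignores frames below an energy floor, and changes the label in force only after the same new label has been seen on several consecutive non-silent frames. The energy floor, the hold count and the function name are assumptions.

```python
def stable_labels(frames, energy_floor=0.01, hold=3):
    """Return the label in force after each (energy, label) frame.

    Threshold: frames with energy below `energy_floor` are not classified.
    History: a new label replaces the current one only after `hold`
    consecutive non-silent frames agree on it.
    """
    current, candidate, count, out = None, None, 0, []
    for energy, label in frames:
        if energy >= energy_floor:
            if current is None:
                current = label                  # first audible frame sets the label
            elif label == current:
                candidate, count = None, 0       # agreement resets any pending change
            else:
                count = count + 1 if label == candidate else 1
                candidate = label
                if count >= hold:
                    current, candidate, count = label, None, 0
        out.append(current)
    return out
```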
The present invention in one of its aspects provides an improved audio system by classifying, in real time, the audio content by musical style or genre. This has application, for example, in indexing or cataloguing of music, and monitoring listening habits.
In another aspect, the classification by musical style or genre is applied to control of an audio system. The settings of an audio control subsystem can be changed to suit the musical style or genre being played back. In one embodiment, ambient noise levels are also estimated and contribute to the control of the audio control subsystem.
These aspects of the invention have particular application to in-car audio systems, in which road noise and engine noise can be compensated for and the audio settings are configured optimally for the music being played back. This enhances the driver's listening experience and reduces the need for the driver to manually adjust audio controls. Applications to other audio systems are also envisaged.

Claims

1. A method of classifying audio content, the method comprising the steps of: - Receiving an audio signal from an audio source into a processing module; - Classifying, using a processing module, the audio content by musical style or genre; - Generating an identification signal indicative of the musical style or genre of the audio content of the audio signal.
2. The method as claimed in Claim 1 wherein the step of classifying the audio content is carried out in a neural network device.
3. The method as claimed in Claim 1 or Claim 2 wherein the musical style or genre is selected from the group consisting of: Voice; Easy Listening/Jazz; Rock/Blues; Orchestral/Classical; Beat/Dance; Urban/Hip-hop.
4. The method as claimed in any of Claims 1 to 3 including the additional steps of: - Extracting a plurality of audio frames from the audio signal, and; - Computing one or more metrics characterising at least some of the extracted audio frames.
5. The method as claimed in Claim 4 wherein the metrics are selected from the group consisting of: centre point of the frequency spectrum (centroid); end point of the spectrum; low energy sections, including estimation of beats per minute; flux; and zero-crossing.
6. The method as claimed in Claim 4 or Claim 5 including the additional steps of: - Producing feature vectors based on the metrics computed; - Presenting the feature vectors to the inputs of a neural network device comprised within the processing module.
7. A method of controlling the output of an audio system, the method comprising the steps of: - Receiving an audio signal; - Classifying, using a processing module, the audio signal by musical style or genre; - Generating an identification signal indicative of the musical style or genre of the audio content of the audio signal; - Adjusting the output of an audio system in response to the identification signal.
8. The method as claimed in Claim 7 wherein the step of adjusting the output of an audio system is automated.
9. The method as claimed in Claim 7 or Claim 8 wherein the step of classifying the audio content is carried out in a neural network device.
10. The method as claimed in any of Claims 7 to 9 wherein the musical style or genre is selected from the group consisting of: Voice; Easy Listening/Jazz; Rock/Blues; Orchestral/Classical; Beat/Dance; Urban/Hip-hop.
11. The method as claimed in any of Claims 7 to 10 including the steps of: - Extracting a plurality of audio frames from the audio signal, and; - Computing one or more metrics characterising at least some of the extracted audio frames.
12. The method as claimed in Claim 11 wherein the metrics are selected from the group consisting of: centre point of the frequency spectrum (centroid); end point of the spectrum; low energy sections, including estimation of beats per minute; flux; and zero-crossing.
13. The method as claimed in Claim 12 including the additional steps of: - Producing feature vectors based on the metrics computed; - Presenting the feature vectors to the inputs of a neural network device.
14. The method as claimed in any of Claims 7 to 13, comprising the additional steps of: - Generating an output function from the identification signal; - applying a smoothing algorithm to the output function to provide a smoothed output function, and; - adjusting the settings of an audio control subsystem according to the smoothed output function.
15. The method as claimed in any of Claims 7 to 14, wherein the step of adjusting the output of an audio system includes the step of adjusting one or more of the following components of an audio control subsystem: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
16. A method of classifying audio content, the method comprising the steps of: - Receiving an audio signal from an audio source into a processing module; - Extracting a plurality of audio frames from the audio signal, and; - Computing one or more metrics characterising at least some of the extracted audio frames, wherein the metrics are selected from the group consisting of: centre point of the frequency spectrum (centroid); end point of the spectrum; low energy sections, including estimation of beats per minute; flux; and zero-crossing; - Classifying, using a processing module, the audio content by musical style or genre, based on the metrics computed; - Generating an identification signal indicative of the musical style or genre of the audio content of the audio signal.
17. The method as claimed in Claim 16 wherein the musical style or genre is selected from the group consisting of: Voice; Easy Listening/Jazz; Rock/Blues; Orchestral/Classical; Beat/Dance; Urban/Hip-hop.
18. A method for controlling the output of an audio system, the method comprising the steps of: - Receiving an audio input signal; - Receiving, from an automated audio content classification system, an identification signal corresponding to classification of the audio input signal by audio content; - Adjusting the output of the audio system in response to the identification signal.
19. The method as claimed in Claim 18 wherein the step of adjusting the output of an audio system is automated.
20. The method as claimed in Claim 18 or Claim 19, wherein the step of adjusting the output of an audio system includes the step of adjusting one or more of the following components of an audio control subsystem: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
21. The method as claimed in any of Claims 18 to 20, comprising the additional steps of: - Generating an output function from the identification signal; - applying a smoothing algorithm to the output function to provide a smoothed output function, and; - adjusting the settings of an audio control subsystem according to the smoothed output function.
22. The method as claimed in any of Claims 18 to 21, comprising the additional steps of: - Receiving a noise signal indicative of ambient noise levels; - Adjusting the output of the audio system in response to the noise signal.
23. The method as claimed in Claim 22 including the additional steps of: - Receiving, from an auxiliary audio input device, an ambient noise level signal; - Scaling the audio input signal; - Computing a transformed domain subtraction between the ambient noise level signal and the scaled audio input signal to provide a noise level estimation signal.
24. The method as claimed in Claim 22 or Claim 23 including the additional steps of: - Quantising the noise level estimation signal into a plurality of levels by applying at least one threshold point, and; - adjusting the output of the audio system in response to the quantised noise signal.
25. The method as claimed in any of Claims 22 to 24, comprising the additional steps of: - Generating an output function from the identification signal and the noise signal; - applying a smoothing algorithm to the output function to provide a smoothed output function, and; - adjusting the settings of an audio control subsystem according to the smoothed output function.
26. A system for classifying audio content, the system comprising a processing module adapted to receive an audio signal from an audio source, wherein the processing module is further adapted to classify the audio content by musical style or genre and generate an identification signal indicative of the musical style or genre of the audio content of the audio signal.
27. The system as claimed in Claim 26 wherein the processing module comprises a neural network device.
28. A system for controlling the output of an audio system, the system comprising - a processing module adapted to receive an audio signal from an audio source, wherein the processing module is further adapted to classify the audio content by musical style or genre and generate an identification signal indicative of the musical style or genre of the audio content of the audio signal, and; - an audio control subsystem adapted to adjust the output of the audio system in response to the identification signal.
29. The system as claimed in Claim 28 wherein the processing module comprises a neural network device.
30. The system as claimed in Claim 28 or Claim 29 wherein the audio control subsystem comprises at least one of the following components: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
31. A system for controlling the output of an audio system, the system comprising an audio control subsystem adapted to receive an audio input signal and an identification signal from an automated audio content classification system, the identification signal corresponding to classification of the audio input signal by audio content, wherein the audio control subsystem is adapted to adjust the output of the audio system in response to the identification signal.
32. The system as claimed in Claim 31 wherein the audio control subsystem comprises at least one of the following components: an audio equaliser, a non-linear gain control, a surround sound processor and a spatial sound processor.
33. The system as claimed in Claim 31 or Claim 32 wherein the audio control subsystem is further adapted to receive a noise signal indicative of ambient noise levels from an auxiliary audio input device.
34. The system as claimed in Claim 33 wherein the audio control subsystem is adapted to adjust the output of the audio system in response to the identification signal and the noise signal.
35. A system for classifying audio content, the system being adapted to implement the method of any of Claims 1 to 6.
36. A system for controlling the output of an audio system, the system being adapted to implement the method of any of Claims 7 to 25.
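The arrangement recited in Claims 28 to 34 can be illustrated with a minimal sketch, given here only as an aid to reading and not as the claimed implementation. It assumes a nearest-centroid classifier in place of the neural network device of Claims 27 and 29, and every name, genre label, feature value, preset gain and noise threshold below is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

# Hypothetical feature centroids per genre (illustrative values only).
CENTROIDS: Dict[str, Tuple[float, float, float]] = {
    "dance": (0.9, 0.8, 0.3),
    "speech": (0.2, 0.1, 0.6),
    "classical": (0.4, 0.3, 0.1),
}

# Hypothetical equaliser presets: gain in dB for (bass, mid, treble).
EQ_PRESETS: Dict[str, Tuple[float, float, float]] = {
    "dance": (6.0, 0.0, 3.0),
    "speech": (-3.0, 4.0, 1.0),
    "classical": (0.0, 0.0, 2.0),
}


def classify(features: Sequence[float]) -> str:
    """Stand-in for the processing module: return the genre label,
    which plays the role of the identification signal.
    Uses nearest-centroid matching, not a neural network."""
    def squared_distance(centroid: Sequence[float]) -> float:
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(CENTROIDS, key=lambda genre: squared_distance(CENTROIDS[genre]))


@dataclass
class OutputSettings:
    eq_gains_db: Tuple[float, float, float]
    volume_offset_db: float


def adjust_output(identification_signal: str,
                  noise_level_db: float = 0.0,
                  noise_floor_db: float = 50.0) -> OutputSettings:
    """Stand-in for the audio control subsystem: select an equaliser
    preset from the identification signal and raise the volume when the
    ambient noise signal exceeds an assumed floor."""
    eq = EQ_PRESETS[identification_signal]
    # Assumed rule: 0.5 dB of boost per dB of noise above the floor, capped at 10 dB.
    boost = min(10.0, max(0.0, 0.5 * (noise_level_db - noise_floor_db)))
    return OutputSettings(eq_gains_db=eq, volume_offset_db=boost)


if __name__ == "__main__":
    genre = classify((0.85, 0.75, 0.35))
    print(genre, adjust_output(genre, noise_level_db=60.0))
```

In this sketch the identification signal is simply a string label passed from `classify` to `adjust_output`, which keeps the classifier and the control subsystem separable in the manner of Claim 31.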
PCT/GB2005/001637 2004-04-30 2005-04-28 Reproduction control of an audio signal based on musical genre classification WO2005106843A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0409663A GB2413745A (en) 2004-04-30 2004-04-30 Classifying audio content by musical style/genre and generating an identification signal accordingly to adjust parameters of an audio system
GB0409663.2 2004-04-30

Publications (1)

Publication Number Publication Date
WO2005106843A1 (en) 2005-11-10

Family

ID=32408308

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2005/001637 WO2005106843A1 (en) 2004-04-30 2005-04-28 Reproduction control of an audio signal based on musical genre classification

Country Status (2)

Country Link
GB (1) GB2413745A (en)
WO (1) WO2005106843A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007096792A1 (en) * 2006-02-22 2007-08-30 Koninklijke Philips Electronics N.V. Device for and a method of processing audio data
EP1976114A1 (en) * 2007-03-13 2008-10-01 Vestel Elektronik Sanayi ve Ticaret A.S. Automatic equalizer adjustment method
CN104079247A (en) * 2013-03-26 2014-10-01 杜比实验室特许公司 Equalizer controller and control method
CN105074822A (en) * 2013-03-26 2015-11-18 杜比实验室特许公司 Device and method for audio classification and audio processing
US9548713B2 (en) 2013-03-26 2017-01-17 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
EP3142111A1 (en) * 2015-09-09 2017-03-15 Samsung Electronics Co., Ltd. Apparatus and method for controlling playback of audio signals, and apparatus and method for training a genre recognition model
WO2017097321A1 (en) * 2015-12-07 2017-06-15 Arcelik Anonim Sirketi Image display device with automatic audio and video mode configuration
US9928025B2 (en) 2016-06-01 2018-03-27 Ford Global Technologies, Llc Dynamically equalizing receiver
US10014841B2 (en) 2016-09-19 2018-07-03 Nokia Technologies Oy Method and apparatus for controlling audio playback based upon the instrument
WO2019063736A1 (en) * 2017-09-28 2019-04-04 Sony Europe Limited Method and electronic device
US10855241B2 (en) 2018-11-29 2020-12-01 Sony Corporation Adjusting an equalizer based on audio characteristics
CN113614684A (en) * 2018-09-07 2021-11-05 格雷斯诺特有限公司 Method and apparatus for dynamic volume adjustment via audio classification

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11775250B2 (en) 2018-09-07 2023-10-03 Gracenote, Inc. Methods and apparatus for dynamic volume adjustment via audio classification

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0343691A2 (en) * 1988-05-27 1989-11-29 Matsushita Electric Industrial Co., Ltd. An apparatus for changing a sound field
JPH0837700A (en) * 1994-07-21 1996-02-06 Kenwood Corp Sound field correction circuit
US5615270A (en) * 1993-04-08 1997-03-25 International Jensen Incorporated Method and apparatus for dynamic sound optimization
WO1998027543A2 (en) * 1996-12-18 1998-06-25 Interval Research Corporation Multi-feature speech/music discrimination system
WO2003023786A2 (en) * 2001-09-11 2003-03-20 Thomson Licensing S.A. Method and apparatus for automatic equalization mode activation
FR2842014A1 (en) * 2002-07-08 2004-01-09 Lyon Ecole Centrale METHOD AND APPARATUS FOR ASSIGNING A SOUND CLASS TO A SOUND SIGNAL

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10161654A (en) * 1996-11-27 1998-06-19 Sanyo Electric Co Ltd Musical classification determining device
JP2000066669A (en) * 1998-08-25 2000-03-03 Victor Co Of Japan Ltd Music creating device
US7295977B2 (en) * 2001-08-27 2007-11-13 Nec Laboratories America, Inc. Extracting classifying data in music from an audio bitstream

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAN K-P ET AL: "GENRE CLASSIFICATION SYSTEM OF TV SOUND SIGNALS BASED ON A SPECTROGRAM ANALYSIS", IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, IEEE INC. NEW YORK, US, vol. 44, no. 1, February 1998 (1998-02-01), pages 33 - 42, XP000779248, ISSN: 0098-3063 *
PATENT ABSTRACTS OF JAPAN vol. 1996, no. 06 28 June 1996 (1996-06-28) *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007096792A1 (en) * 2006-02-22 2007-08-30 Koninklijke Philips Electronics N.V. Device for and a method of processing audio data
EP1976114A1 (en) * 2007-03-13 2008-10-01 Vestel Elektronik Sanayi ve Ticaret A.S. Automatic equalizer adjustment method
US10044337B2 (en) 2013-03-26 2018-08-07 Dolby Laboratories Licensing Corporation Equalizer controller and controlling method
CN105074822A (en) * 2013-03-26 2015-11-18 杜比实验室特许公司 Device and method for audio classification and audio processing
CN104079247A (en) * 2013-03-26 2014-10-01 杜比实验室特许公司 Equalizer controller and control method
US10411669B2 (en) 2013-03-26 2019-09-10 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
US11711062B2 (en) 2013-03-26 2023-07-25 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
US9621124B2 (en) 2013-03-26 2017-04-11 Dolby Laboratories Licensing Corporation Equalizer controller and controlling method
US11218126B2 (en) 2013-03-26 2022-01-04 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
EP3232567A1 (en) * 2013-03-26 2017-10-18 Dolby Laboratories Licensing Corporation Equalizer controller and controlling method
US9842605B2 (en) 2013-03-26 2017-12-12 Dolby Laboratories Licensing Corporation Apparatuses and methods for audio classifying and processing
US9923536B2 (en) 2013-03-26 2018-03-20 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
US10803879B2 (en) 2013-03-26 2020-10-13 Dolby Laboratories Licensing Corporation Apparatuses and methods for audio classifying and processing
EP3598448B1 (en) 2013-03-26 2020-08-26 Dolby Laboratories Licensing Corporation Apparatuses and methods for audio classifying and processing
US10707824B2 (en) 2013-03-26 2020-07-07 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
WO2014160548A1 (en) * 2013-03-26 2014-10-02 Dolby Laboratories Licensing Corporation Equalizer controller and controlling method
US9548713B2 (en) 2013-03-26 2017-01-17 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
EP2979267B1 (en) 2013-03-26 2019-12-18 Dolby Laboratories Licensing Corporation Apparatuses and methods for audio classifying and processing
EP3142111A1 (en) * 2015-09-09 2017-03-15 Samsung Electronics Co., Ltd. Apparatus and method for controlling playback of audio signals, and apparatus and method for training a genre recognition model
WO2017097321A1 (en) * 2015-12-07 2017-06-15 Arcelik Anonim Sirketi Image display device with automatic audio and video mode configuration
US9928025B2 (en) 2016-06-01 2018-03-27 Ford Global Technologies, Llc Dynamically equalizing receiver
US10014841B2 (en) 2016-09-19 2018-07-03 Nokia Technologies Oy Method and apparatus for controlling audio playback based upon the instrument
US11069369B2 (en) 2017-09-28 2021-07-20 Sony Europe B.V. Method and electronic device
WO2019063736A1 (en) * 2017-09-28 2019-04-04 Sony Europe Limited Method and electronic device
JP7397066B2 (en) 2018-09-07 2023-12-12 グレースノート インコーポレイテッド Method, computer readable storage medium and apparatus for dynamic volume adjustment via audio classification
CN113614684A (en) * 2018-09-07 2021-11-05 格雷斯诺特有限公司 Method and apparatus for dynamic volume adjustment via audio classification
JP2021536705A (en) * 2018-09-07 2021-12-27 グレースノート インコーポレイテッド Methods and devices for dynamic volume control via audio classification
US10855241B2 (en) 2018-11-29 2020-12-01 Sony Corporation Adjusting an equalizer based on audio characteristics

Also Published As

Publication number Publication date
GB2413745A (en) 2005-11-02
GB0409663D0 (en) 2004-06-02

Similar Documents

Publication Publication Date Title
WO2005106843A1 (en) Reproduction control of an audio signal based on musical genre classification
US9960743B2 (en) Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
CN1830141B (en) Audio conditioning apparatus and the method thereof
US8649531B2 (en) Method and system for approximating graphic equalizers using dynamic filter order reduction
EP1629463B1 (en) Method, apparatus and computer program for calculating and adjusting the perceived loudness of an audio signal
US6341166B1 (en) Automatic correction of power spectral balance in audio source material
US9282417B2 (en) Spatial sound reproduction
EP2278707B1 (en) Dynamic enhancement of audio signals
US20090161883A1 (en) System for adjusting perceived loudness of audio signals
US8027487B2 (en) Method of setting equalizer for audio file and method of reproducing audio file
CN102077464B (en) Acoustic processing device
JP5702666B2 (en) Acoustic device and volume correction method
EP1826900A1 (en) Vehicle-mounted sound control system
WO2015035492A1 (en) System and method for performing automatic multi-track audio mixing
KR20110103339A (en) Automatic correction of loudness in audio signals
US20120155658A1 (en) Content reproduction device and method, and program
KR0129429B1 (en) Audio sgnal processing unit
CN101120412A (en) A system for and a method of mixing first audio data with second audio data, a program element and a computer-readable medium
JP3069535B2 (en) Sound reproduction device
JP4145507B2 (en) Sound quality volume control device
US8718298B2 (en) NVH dependent parallel compression processing for automotive audio systems
US20130195279A1 (en) Peak detection when adapting a signal gain based on signal loudness
CN112511966B (en) Self-adaptive active frequency division method for vehicle-mounted stereo playback
WO2018066383A1 (en) Information processing device and method, and program
CN108768330B (en) Automatic loudness control

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DPEN Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

122 Ep: pct application non-entry in european phase