WO2024061436A1 - Adaptive audio enhancement system - Google Patents
Adaptive audio enhancement system
- Publication number
- WO2024061436A1 (PCT/EP2022/075904)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio signal
- audio
- enhancement system
- signal
- adaptive
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/04—Circuits for transducers, loudspeakers or microphones for correcting frequency response
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups
- H04R2430/01—Aspects of volume control, not necessarily automatic, in sound systems
Abstract
An adaptive audio enhancement system (1) comprising an audio signal generator (3) generating a first audio signal (A1) that is simultaneously input into a signal processing arrangement (2) and output as a first human perceivable audio signal (A11). A sensor arrangement (4) detects a second audio signal (A2) and/or environmental data (ED) and transmits said second audio signal (A2) and/or environmental data (ED) as raw data (RD) to said signal processing arrangement (2). The signal processing arrangement (2) determines whether a perceived audio signal (A3) is different from said first audio signal (A1) by means of said raw data (RD). If said perceived audio signal (A3) is different from said first audio signal (A1), said signal processing arrangement (2) processes said first audio signal (A1) based on said raw data (RD) and outputs said processed first audio signal (A1) as a second human perceivable audio signal (A12).
Description
ADAPTIVE AUDIO ENHANCEMENT SYSTEM
TECHNICAL FIELD
The disclosure relates to an adaptive audio enhancement system comprising a signal processing arrangement, an audio signal generator configured to generate a first audio signal, and at least one sensor arrangement.
BACKGROUND
Low-frequency performance is usually understood to be a feature of a specific device. However, the performance also depends on the use-case environment, several examples of which are provided below.
Intra-aural (“in-ear”) headsets that are improperly fitted in the ear canal cause acoustic leakage, which is perceived as poor low-frequency audio quality. The leak reduces the low frequencies considerably. The low frequencies can be equalized, i.e. the volume of the different frequency bands within the audio signal adjusted, but since the occlusion effect is large, and such small headphone speakers have their limitations, equalization is usually not sufficient to solve the problem.
Bone conduction headphones use the bones of the skull to transmit sound to the inner ear. The fit and the pressure exerted by the bone conduction headphones on the head will influence the perception of the emitted sound. In particular, the low-frequency response is prone to change with the fit of the bone conduction headphones. In practice, low-frequency resonances change from user to user and from fit to fit.
The room acoustics can affect the low-frequency performance of a low-frequency loudspeaker. Due to the long wavelengths (comparable to the room dimensions), standing waves are generated, causing peaks and dips in the frequency response. This typically means a “boomy” sound, with some frequencies having high sound pressure levels while other frequencies are not heard at all. The low frequencies can be improved by equalization, but room null modes, i.e. dips, cannot be improved by equalization at all.
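As an illustration of the standing-wave effect described above, the axial room-mode frequencies can be estimated from the speed of sound and a room dimension using f_n = n·c/(2L); the sketch below is purely illustrative and not part of the disclosed system.

```python
# Minimal sketch: estimate axial room-mode frequencies (illustrative only,
# not part of the disclosed system). Peaks and dips in the low-frequency
# response tend to cluster around these frequencies.

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C

def axial_modes(room_dimension_m: float, max_freq_hz: float = 200.0) -> list:
    """Return axial mode frequencies f_n = n * c / (2 * L) below max_freq_hz."""
    modes = []
    n = 1
    while True:
        f = n * SPEED_OF_SOUND / (2.0 * room_dimension_m)
        if f > max_freq_hz:
            return modes
        modes.append(round(f, 1))
        n += 1

# Example: a 5 m long room has axial modes near 34, 69, 103, 137 and 172 Hz.
print(axial_modes(5.0))
```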
The low-frequency performance of a singing display (a display panel used as a loudspeaker) changes when the user touches the display, and environmental factors such as indoor heating, use in sub-zero weather, or the device being placed in a holder can also have an impact.
Furthermore, when listening to audio in an environment with a high background noise level, the audio content will be masked. Masking is a phenomenon where two sounds are played simultaneously but only the louder one is heard. This occurs when the loudness of the louder sound source is high enough compared to the quieter sound. In such cases, higher overall listening levels might help, though higher listening levels are not good for the listener. Moreover, higher listening levels may be impossible, since small portable audio playback devices (such as mobile phones, tablets, hearables, or wearables) might not have enough sound output power; frequently, the sound output power is limited in the lower frequencies since there is more headroom in the higher frequencies. Additionally, the noise content is usually higher in the lower frequencies.
Virtual Bass Enhancement (VBE), also known as Psychoacoustic Bass Enhancement (PBE), algorithms may be used as a bass improvement scheme based on the Missing Fundamental phenomenon. The Missing Fundamental is a psychoacoustic effect in which a series of harmonic frequencies (integer multiples 2, 3, ..., n) of a certain fundamental frequency (f0) is perceived as producing the same pitch as the fundamental frequency itself. Pitch is a subjective attribute of a sound that can be ordered on a scale from low to high, and low pitch is commonly perceived as bass sound. The Missing Fundamental phenomenon can therefore be used to improve the perceived bass response of a loudspeaker or headphones.
There are different methods for generating harmonic frequencies. The two most common are time-domain and frequency-domain methods. In time-domain methods, the filtered bass signal is fed to a non-linear device which applies a non-linear function to the bass signal. Non-linear functions generate non-linear distortion, which comprises harmonically related frequencies. In frequency-domain methods, the harmonic frequencies are added either by applying a Fourier transform to the input signal and modifying the magnitudes of selected frequencies, or by pitch shifting the original signal to the higher frequency regions of the harmonic frequencies.
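For illustration, a minimal time-domain sketch of the harmonic-generation approach described above is given below; the filter design, the non-linearity, and the gain values are assumptions chosen for the example, not a specific implementation disclosed here.

```python
# Minimal time-domain VBE sketch (illustrative assumptions: Butterworth filters,
# a rectify-and-soft-clip non-linearity, fixed mixing gain). Requires numpy and scipy.
import numpy as np
from scipy.signal import butter, sosfilt

def vbe_time_domain(x, fs, cutoff_hz=120.0, harmonic_gain=0.5):
    """Generate harmonics of the bass band and mix them in above the cut-off."""
    # 1. Isolate the bass band that the transducer cannot reproduce well.
    lp = butter(4, cutoff_hz, btype="low", fs=fs, output="sos")
    bass = sosfilt(lp, x)

    # 2. Non-linear device: half-wave rectification followed by soft clipping
    #    produces frequencies harmonically related to the bass content.
    harmonics = np.tanh(4.0 * np.maximum(bass, 0.0))

    # 3. Remove DC and the original fundamental region, keeping the harmonics.
    hp = butter(4, cutoff_hz, btype="high", fs=fs, output="sos")
    harmonics = sosfilt(hp, harmonics)

    # 4. Mix the generated harmonics back with the original signal.
    return x + harmonic_gain * harmonics

# Example: a 60 Hz tone at 48 kHz gains added energy at multiples of 60 Hz.
fs = 48000
t = np.arange(fs) / fs
y = vbe_time_domain(0.5 * np.sin(2 * np.pi * 60.0 * t), fs)
```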
However, prior art solutions do not take into account that bass performance varies due to changing external and listening conditions. Hence, there is a need for an improved adaptive audio enhancement system.
SUMMARY
It is an object to provide an improved adaptive audio enhancement system. The foregoing and other objects are achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description, and the figures.
According to a first aspect, there is provided an adaptive audio enhancement system comprising a signal processing arrangement, an audio signal generator configured to generate a first audio signal, the first audio signal being simultaneously input into the signal processing arrangement and output as a first human perceivable audio signal, and at least one sensor arrangement configured to detect a second audio signal and/or environmental data and to transmit the second audio signal and/or environmental data as raw data to the signal processing arrangement, the signal processing arrangement being configured to determine whether a perceived audio signal is different from the first audio signal, by means of the raw data, and, if the perceived audio signal is different from the first audio signal, to process the first audio signal based on the raw data, and to output the processed first audio signal as a second human perceivable audio signal.
Such a solution allows a constantly varying bass response by adaptively changing the parameters. The adaptive system will improve the accuracy of the generated harmonics, and improve the performance of the system in a high-noise environment as well as in situations where the performance of the apparatus is altered. By measuring the external impact and/or the real response of the system, the system can be configured to support only the frequencies that have improper playback. Furthermore, the system can be used to mitigate the impact of a noisy environment.
In a possible implementation form of the first aspect, only the first human perceivable audio signal is output if the perceived audio signal is determined to be equal to the first audio signal, initiating signal processing only when needed.
In a further possible implementation form of the first aspect, the raw data is not processed by means of active noise cancellation, reducing the required number of processes and components for improving the quality of audio.
In a further possible implementation form of the first aspect, the first audio signal has a first frequency, the first audio signal is processed by means of a virtual bass enhancement algorithm configured to generate at least one harmonic frequency of the first frequency, and the second human perceivable audio signal comprises the first frequency and the at least one harmonic frequency, generating a psychoacoustic effect improving the perceived pitch.
In a further possible implementation form of the first aspect, the virtual bass enhancement algorithm comprises at least one parameter independent of the raw data, allowing the system to adapt based on additional parameters such as a predetermined cut-off frequency.
In a further possible implementation form of the first aspect, the virtual bass enhancement algorithm comprises at least one parameter estimated by means of the raw data, compensating for any effects on the low frequencies due to the specific use-case.
In a further possible implementation form of the first aspect, the parameter is a cut-off frequency estimated for the first audio signal, providing a reliable limit value for when the system cannot produce sound properly.
In a further possible implementation form of the first aspect, the parameter is a harmonic amplitude ratio estimated for the first audio signal and the at least one harmonic frequency, allowing the perceived bass of the audio to be stronger.
In a further possible implementation form of the first aspect, the virtual bass enhancement algorithm comprises at least one parameter estimated by means of input signal classification, the first audio signal being classified after being input into the signal processing arrangement. This simplifies the processing of the input signal by automatically applying predetermined parameters that apply to particular situations.
In a further possible implementation form of the first aspect, the sensor arrangement comprises a microphone and/or a force sensor, allowing sounds as well as other environmental factors to be taken into consideration.
In a further possible implementation form of the first aspect, the force sensor is configured to detect a force applied by an actuating part of the audio signal generator, allowing the system to be used in bone conduction devices and/or to determine whether the positioning of an in-ear apparatus is optimal.
In a further possible implementation form of the first aspect, the sensor arrangement is configured to detect a change in environmental data and/or a discrepancy between predetermined environmental data and current environmental data, allowing the system to adapt to for example noise or a change in external sound levels.
In a further possible implementation form of the first aspect, the environmental data comprises at least one of environmental temperature, location relative to external objects, and weather conditions, allowing the system to adapt to not only audio but other environmental factors.
In a further possible implementation form of the first aspect, the location relative to external objects is detected by means of a frequency response of the second audio signal, allowing the perceived room response to be improved by compensating for standing waves.
In a further possible implementation form of the first aspect, the second audio signal is a signal generated externally of the adaptive audio enhancement system, allowing external noise to be taken into account.
In a further possible implementation form of the first aspect, the second audio signal is the first audio signal, allowing the first audio signal to be detected by the sensor arrangement.
According to a second aspect, there is provided an electronic apparatus for generating audio, the apparatus comprising the adaptive audio enhancement system according to the above. Such an apparatus allows a constantly varying bass response by adaptively changing the parameters. The adaptive system of the apparatus will improve the accuracy of the generated harmonics, and improve the performance of the apparatus in a high-noise environment as well as in situations where the performance of the apparatus is altered.
These and other aspects will be apparent from the embodiments described below.
BRIEF DESCRIPTION OF THE DRAWINGS
In the following detailed portion of the present disclosure, the aspects, embodiments, and implementations will be explained in more detail with reference to the example embodiments shown in the drawings, in which:
Fig. 1 is a schematic illustration of an adaptive audio enhancement system in accordance with an example of the embodiments of the disclosure.
DETAILED DESCRIPTION
As illustrated in Fig. 1, the present invention relates to an adaptive audio enhancement system 1 comprising a signal processing arrangement 2, an audio signal generator 3 configured to generate a first audio signal A1, the first audio signal A1 being simultaneously input into the signal processing arrangement 2 and output as a first human perceivable audio signal A11, and at least one sensor arrangement 4 configured to detect a second audio signal A2 and/or environmental data ED and to transmit the second audio signal A2 and/or environmental data ED as raw data RD to the signal processing arrangement 2, the signal processing arrangement 2 being configured to determine whether a perceived audio signal A3 is different from the first audio signal A1, by means of the raw data RD, and, if the perceived audio signal A3 is different from the first audio signal A1, to process the first audio signal A1 based on the raw data RD, and to output the processed first audio signal A1 as a second human perceivable audio signal A12.
The adaptive audio enhancement system 1 comprises a signal processing arrangement 2 and an audio signal generator 3 configured to generate a first audio signal A1. The adaptive audio enhancement system 1 also comprises at least one sensor arrangement 4 configured to detect a second audio signal A2 and/or environmental data ED. This allows contextual awareness to be considered by connecting the adaptation to the detection of certain circumstances or a specific sound environment such as traffic noise, a cocktail party, etc.
The second audio signal A2 may be a signal generated externally of the adaptive audio enhancement system 1. Alternatively, the second audio signal A2 may be the first audio signal A1.
The sensor arrangement 4 may be configured to detect a change in environmental data ED and/or a discrepancy between predetermined environmental data ED1 and current environmental data ED2. The environmental data ED may comprise at least one of environmental temperature, location relative to external objects, and weather conditions. The location relative to external objects may be detected by means of a frequency response of the second audio signal A2.
The sensor arrangement 4 may comprise a microphone and/or a force sensor. The force sensor may be configured to detect a force applied by an actuating part of the audio signal generator 3.
The first audio signal A1 is simultaneously input into the signal processing arrangement 2 and output as a first human perceivable audio signal A11.
The detected second audio signal A2 and/or environmental data ED is transmitted, from the sensor arrangement 4 to the signal processing arrangement 2, as raw data RD.
By raw data RD is meant data that is not processed by means of active noise cancellation. Typically, an active noise cancellation output signal is a pre-processed cancellation signal of the ambient noise, which contains an inverted noise signal. In the present invention, the sensor arrangement 4 is directly connected to the signal processing arrangement 2 such that the signal processing arrangement 2 receives raw data RD that has not been analyzed, modified, or synthesized, i.e. its components have not been transformed from one format to another. However, the raw data RD may have been filtered, i.e. some components may have been removed from the raw data.
The signal processing arrangement 2 is configured to determine whether a perceived audio signal A3 is different from the first audio signal A1, by means of the raw data RD. If the perceived audio signal A3 is different from the first audio signal A1, the signal processing arrangement 2 processes the first audio signal A1 based on the raw data RD and outputs the processed first audio signal A1 as a second human perceivable audio signal A12.
If, instead, the perceived audio signal A3 is determined to be equal to the first audio signal A1, only the first human perceivable audio signal A11 is output.
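A minimal sketch of this decision step is shown below; the band-level comparison, the tolerance value, and the function names are illustrative assumptions rather than the claimed implementation.

```python
# Minimal sketch of the decision logic: output A11 unchanged, or a processed A12
# (the comparison metric and the 3 dB tolerance are illustrative assumptions).

def select_output(intended_levels_db, perceived_levels_db,
                  first_audio_block, raw_sensor_data, apply_vbe,
                  tolerance_db=3.0):
    """Return the first signal as-is (A11) or a VBE-processed version (A12)."""
    deviation = max(abs(p - i) for p, i in
                    zip(perceived_levels_db, intended_levels_db))
    if deviation <= tolerance_db:
        return first_audio_block                        # perceived == first signal
    return apply_vbe(first_audio_block, raw_sensor_data)  # perceived differs

# Example: the lowest band is perceived 8 dB too low, so VBE processing is applied.
out = select_output([0, 0, 0], [-8, -1, 0], "A1-block", {"mic": "raw"},
                    apply_vbe=lambda block, rd: f"VBE({block})")
print(out)   # -> VBE(A1-block)
```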
The first audio signal A1 has a first frequency, and the first audio signal A1 is processed by means of a virtual bass enhancement algorithm configured to generate at least one harmonic frequency of the first frequency.
The second human perceivable audio signal A12 comprises the first frequency and the at least one harmonic frequency.
The virtual bass enhancement algorithm may comprise at least one parameter independent of the raw data RD.
Furthermore, the virtual bass enhancement algorithm may comprise at least one parameter estimated by means of the raw data RD. The estimated parameter may be a cut-off frequency estimated for the first audio signal A1. Additionally, the estimated parameter may be a harmonic amplitude ratio estimated for the first audio signal A1 and the at least one harmonic frequency.
The virtual bass enhancement algorithm may also comprise at least one parameter estimated by means of input signal classification, the first audio signal A1 being classified after having been input into the signal processing arrangement 2.
The first audio signal A1, also referred to as the input signal, may be classified as speech, audio, or a mixture thereof. If the input signal is classified as speech, VBE processing can be disabled or adjusted to apply as little boost as possible. If the input signal is classified as audio, the VBE algorithm can have a different mode, for example bass boost for certain types of music and neutral boost for other types of music. If the input signal is a mixture of audio and speech (movies etc.), the VBE algorithm can have yet another mode wherein the target boost is different from that for music only. Furthermore, the input signal class could determine how steep the slope of the frequency-amplitude curve should be, e.g., as smooth as possible for music and as steep as possible for speech.
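For illustration, a minimal sketch of classification-driven parameter selection is given below; the class names, preset structure, and numeric values are assumptions, not values prescribed by this disclosure.

```python
# Minimal sketch of classification-driven VBE parameter selection
# (class names and numeric values are illustrative assumptions).

VBE_PRESETS = {
    "speech": {"enabled": False, "harmonic_gain": 0.0, "slope": "steep"},
    "music":  {"enabled": True,  "harmonic_gain": 0.6, "slope": "smooth"},
    "mixed":  {"enabled": True,  "harmonic_gain": 0.4, "slope": "moderate"},
}

def vbe_parameters_for(signal_class: str) -> dict:
    """Return the VBE parameter set for a classified input signal."""
    return VBE_PRESETS.get(signal_class, VBE_PRESETS["mixed"])

print(vbe_parameters_for("speech"))   # VBE disabled, minimal boost for speech
```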
The sensor(s) gather(s) data about the use-case and the environment. If any changes in the output response of the system, or in the environment, are sensed, the VBE parameters will be adjusted accordingly.
For example, in the case of in-ear headphones, a microphone can measure the amount of leakage or the real frequency response of the output in the ear canal. If the microphone senses that the apparatus does not produce low frequencies properly, the system utilizes the VBE algorithm. The cut-off frequency of the VBE is estimated from the measured data. If the data indicates that the device works properly, or that there is no leakage, VBE processing can be disabled or a default cut-off value can be set.
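A minimal sketch of such a cut-off estimation from a measured response is given below; the -6 dB criterion, the default cut-off, and the example values are illustrative assumptions.

```python
# Minimal sketch: estimate the VBE cut-off frequency from a measured in-ear
# response (the -6 dB drop criterion and all values are illustrative assumptions).

DEFAULT_CUTOFF_HZ = 100.0

def estimate_cutoff(freqs_hz, measured_db, reference_db, drop_db=6.0):
    """Return the highest frequency at which the measured response falls more
    than drop_db below the reference; below this, VBE generates harmonics."""
    cutoff = None
    for f, m, r in zip(freqs_hz, measured_db, reference_db):
        if r - m > drop_db:
            cutoff = f
    return cutoff if cutoff is not None else DEFAULT_CUTOFF_HZ

# Example: leakage attenuates everything below roughly 80 Hz.
freqs = [40, 60, 80, 100, 150]
measured = [-14, -10, -7, -1, 0]
reference = [0, 0, 0, 0, 0]
print(estimate_cutoff(freqs, measured, reference))   # -> 80
```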
For a bone-conduction apparatus, the force applied by the actuating part of the apparatus will have an impact on the low-frequency performance. If the applied force is higher than a certain pre-defined threshold, the cut-off frequency should be increased. Furthermore, VBE processing can be disabled if the measured force is below the threshold.
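A minimal sketch of the force-dependent control described above is given below; the force threshold and cut-off values are illustrative assumptions.

```python
# Minimal sketch of force-dependent VBE control for a bone-conduction device
# (threshold and cut-off values are illustrative assumptions).

def vbe_cutoff_from_force(force_newton, force_threshold_n=1.5,
                          raised_cutoff_hz=150.0):
    """Below the threshold VBE is disabled; above it the cut-off is increased."""
    if force_newton < force_threshold_n:
        return None                 # VBE processing disabled
    return raised_cutoff_hz         # increase the VBE cut-off frequency

print(vbe_cutoff_from_force(0.5))   # None  -> VBE disabled
print(vbe_cutoff_from_force(2.0))   # 150.0 -> cut-off increased
```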
Improving the perceived room response is a use-case scenario wherein the position of the listener will have an impact on the perceived frequency response of the loudspeaker. If the wavelength of a frequency is longer than the room dimensions, a standing wave can occur. The impact of the listener’s position can be measured and supporting harmonic frequencies can be generated.
In a high-noise environment, the gain ratios of the harmonic frequencies should be adjusted in order to maintain proper low-frequency performance even if the device’s output power range is limited. A microphone is utilized to capture the noise level of the environment, and the amplitudes of the generated harmonic frequency components are adjusted according to this level. When the noise level increases, the amplitude of the higher harmonics should be increased, and thus the difference between the amplitudes of adjacent harmonic components should decrease.
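A minimal sketch of noise-dependent adjustment of the harmonic amplitude ratios is given below; the dB values and the linear interpolation between a quiet and a noisy roll-off are illustrative assumptions.

```python
# Minimal sketch: flatten the per-harmonic roll-off as ambient noise rises
# (all dB values and the linear mapping are illustrative assumptions).

def harmonic_gains(noise_level_db, num_harmonics=4,
                   quiet_rolloff_db=6.0, noisy_rolloff_db=2.0,
                   quiet_db=40.0, noisy_db=80.0):
    """Return per-harmonic gains in dB relative to the lowest harmonic.

    In quiet conditions each successive harmonic is attenuated more; as the
    measured noise level rises, the roll-off flattens so that higher harmonics
    are boosted relative to the lower ones."""
    t = min(max((noise_level_db - quiet_db) / (noisy_db - quiet_db), 0.0), 1.0)
    rolloff = quiet_rolloff_db + t * (noisy_rolloff_db - quiet_rolloff_db)
    return [-rolloff * k for k in range(num_harmonics)]

print(harmonic_gains(40.0))  # [0.0, -6.0, -12.0, -18.0] : quiet environment
print(harmonic_gains(80.0))  # [0.0, -2.0, -4.0, -6.0]   : noisy, flatter ratios
```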
The present invention also relates to an electronic apparatus for generating audio, such as an intra-aural or bone conduction headset, a display, or a built-in loudspeaker, the apparatus comprising the adaptive audio enhancement system 1 described above.
The various aspects and implementations have been described in conjunction with various embodiments herein. However, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed subject-matter, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. The reference signs used in the claims shall not be construed as limiting the scope. Unless otherwise indicated, the drawings are intended to be read (e.g., cross-hatching, arrangement of parts, proportion, degree, etc.) together with the specification, and are to be considered a portion of the entire written description of this disclosure. As used in the description, the terms “horizontal”, “vertical”, “left”, “right”, “up” and “down”, as well as adjectival and adverbial derivatives thereof (e.g., “horizontally”, “rightwardly”, “upwardly”, etc.), simply refer to the orientation of the illustrated structure as the particular drawing figure faces the reader. Similarly, the terms “inwardly” and “outwardly” generally refer to the orientation of a surface relative to its axis of elongation, or axis of rotation, as appropriate.
Claims
1. An adaptive audio enhancement system (1) comprising:
-a signal processing arrangement (2);
-an audio signal generator (3) configured to generate a first audio signal (A1); said first audio signal (A1) being simultaneously input into said signal processing arrangement (2) and output as a first human perceivable audio signal (A11); and
-at least one sensor arrangement (4) configured to detect a second audio signal (A2) and/or environmental data (ED) and to transmit said second audio signal (A2) and/or environmental data (ED) as raw data (RD) to said signal processing arrangement (2); said signal processing arrangement (2) being configured to determine whether a perceived audio signal (A3) is different from said first audio signal (A1), by means of said raw data (RD), and, if said perceived audio signal (A3) is different from said first audio signal (A1), to process said first audio signal (A1) based on said raw data (RD), and to output said processed first audio signal (A1) as a second human perceivable audio signal (A12).
2. The adaptive audio enhancement system (1) according to claim 1, wherein, if said perceived audio signal (A3) is determined to be equal to said first audio signal (A1), only said first human perceivable audio signal (A11) is output.
3. The adaptive audio enhancement system (1) according to claim 1 or 2, wherein said raw data (RD) is not processed by means of active noise cancellation.
4. The adaptive audio enhancement system (1) according to any one of the previous claims, wherein said first audio signal (A1) has a first frequency, said first audio signal (A1) is processed by means of a virtual bass enhancement algorithm configured to generate at least one harmonic frequency of said first frequency, and wherein said second human perceivable audio signal (A12) comprises said first frequency and said at least one harmonic frequency.
5. The adaptive audio enhancement system (1) according to claim 4, wherein said virtual bass enhancement algorithm comprises at least one parameter independent of said raw data (RD).
6. The adaptive audio enhancement system (1) according to claim 4 or 5, wherein said virtual bass enhancement algorithm comprises at least one parameter estimated by means of said raw data (RD).
7. The adaptive audio enhancement system (1) according to claim 5, wherein said parameter is a cut-off frequency estimated for said first audio signal (A1).
8. The adaptive audio enhancement system (1) according to claim 5, wherein said parameter is a harmonic amplitude ratio estimated for said first audio signal (A1) and said at least one harmonic frequency.
9. The adaptive audio enhancement system (1) according to any one of claims 4 to 8, wherein said virtual bass enhancement algorithm comprises at least one parameter estimated by means of input signal classification, said first audio signal (A1) being classified after being input into said signal processing arrangement (2).
10. The adaptive audio enhancement system (1) according to any one of the previous claims, wherein said sensor arrangement (4) comprises a microphone and/or a force sensor.
11. The adaptive audio enhancement system (1) according to claim 10, wherein said force sensor is configured to detect a force applied by an actuating part of said audio signal generator (3).
12. The adaptive audio enhancement system (1) according to any one of the previous claims, wherein said sensor arrangement (4) is configured to detect a change in environmental data (ED) and/or a discrepancy between predetermined environmental data (EDI) and current environmental data (ED2).
13. The adaptive audio enhancement system (1) according to any one of the previous claims, wherein said environmental data (ED) comprises at least one of environmental temperature, location relative to external objects, and weather conditions.
14. The adaptive audio enhancement system (1) according to claim 13, wherein said location relative to external objects is detected by means of a frequency response of said second audio signal (A2).
15. The adaptive audio enhancement system (1) according to any one of the previous claims, wherein said second audio signal (A2) is a signal generated externally of said adaptive audio enhancement system (1).
16. The adaptive audio enhancement system (1) according to any one of claims 1 to 15, wherein said second audio signal (A2) is said first audio signal (A1).
17. An electronic apparatus for generating audio, said apparatus comprising the adaptive audio enhancement system (1) according to any one of claims 1 to 16.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2022/075904 WO2024061436A1 (en) | 2022-09-19 | 2022-09-19 | Adaptive audio enhancement system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2022/075904 WO2024061436A1 (en) | 2022-09-19 | 2022-09-19 | Adaptive audio enhancement system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024061436A1 (en) | 2024-03-28 |
Family
ID=83692790
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2022/075904 WO2024061436A1 (en) | 2022-09-19 | 2022-09-19 | Adaptive audio enhancement system |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024061436A1 (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060188104A1 (en) * | 2003-07-28 | 2006-08-24 | Koninklijke Philips Electronics N.V. | Audio conditioning apparatus, method and computer program product |
US20120259626A1 (en) * | 2011-04-08 | 2012-10-11 | Qualcomm Incorporated | Integrated psychoacoustic bass enhancement (pbe) for improved audio |
WO2021119177A1 (en) * | 2019-12-09 | 2021-06-17 | Dolby Laboratories Licensing Corporation | Multiband limiter modes and noise compensation methods |
Non-Patent Citations (1)
Title |
---|
COKER KENNETH ET AL: "A Survey on Virtual Bass Enhancement for Active Noise Cancelling Headphones", 2019 INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND INFORMATION SCIENCES (ICCAIS), IEEE, 23 October 2019 (2019-10-23), pages 1 - 5, XP033761021, DOI: 10.1109/ICCAIS46528.2019.9074630 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22789895 Country of ref document: EP Kind code of ref document: A1 |