WO2024061436A1 - Adaptive audio enhancement system - Google Patents
Adaptive audio enhancement system
- Publication number
- WO2024061436A1 (PCT/EP2022/075904)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio signal
- audio
- enhancement system
- signal
- adaptive
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/04—Circuits for transducers, loudspeakers or microphones for correcting frequency response
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups
- H04R2430/01—Aspects of volume control, not necessarily automatic, in sound systems
Abstract
An adaptive audio enhancement system (1) comprising an audio signal generator (3) generating a first audio signal (A1) that is simultaneously input into a signal processing arrangement (2) and output as a first human perceivable audio signal (A11). A sensor arrangement (4) detects a second audio signal (A2) and/or environmental data (ED) and transmits said second audio signal (A2) and/or environmental data (ED) as raw data (RD) to said signal processing arrangement (2). The signal processing arrangement (2) determines whether a perceived audio signal (A3) is different from said first audio signal (A1) by means of said raw data (RD). If said perceived audio signal (A3) is different from said first audio signal (A1), said signal processing arrangement (2) processes said first audio signal (A1) based on said raw data (RD) and outputs said processed first audio signal (A1) as a second human perceivable audio signal (A12).
Description
ADAPTIVE AUDIO ENHANCEMENT SYSTEM
TECHNICAL FIELD
The disclosure relates to an adaptive audio enhancement system comprising a signal processing arrangement, an audio signal generator configured to generate a first audio signal, and at least one sensor arrangement.
BACKGROUND
Low-frequency performance is usually understood to be a feature of a specific device. However, the performance also depends on the use-case environment, several examples of which are provided below.
Intra-aural (“in-ear”) headsets that are improperly fitted in the ear canal cause acoustic leakage, which is perceived as poor low-frequency audio quality. The leak reduces the low frequencies considerably. The low frequencies can be equalized, i.e. the volume of the different frequency bands within the audio signal adjusted, but since the occlusion effect is large, and such small headphone speakers have their limitations, equalization is usually not sufficient to solve the problem.
Bone conduction headphones use the bones of the skull to transmit sound to the inner ear. The fit and the pressure exerted by the bone conduction headphones on the head will influence the perception of the emitted sound. In particular, the low-frequency response is prone to change with the fit of the bone conduction headphones. In practice, low-frequency resonances change from user to user and from fit to fit.
The room acoustics can affect the low-frequency performance of a low-frequency loudspeaker. Due to the long wavelengths (comparable to the room dimensions), standing waves are generated, causing peaks and dips in the frequency response. This typically means a “boomy” sound, with some frequencies having high sound pressure levels while other frequencies are not heard at all. The low frequencies can be improved by equalization, but room null modes, i.e. dips, cannot be improved by equalization at all.
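As an illustration of the standing-wave effect described above, the axial room-mode frequencies can be estimated from the speed of sound and a room dimension using f_n = n·c/(2L); the sketch below is purely illustrative and not part of the disclosed system.

```python
# Minimal sketch: estimate axial room-mode frequencies (illustrative only,
# not part of the disclosed system). Peaks and dips in the low-frequency
# response tend to cluster around these frequencies.

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C

def axial_modes(room_dimension_m: float, max_freq_hz: float = 200.0) -> list:
    """Return axial mode frequencies f_n = n * c / (2 * L) below max_freq_hz."""
    modes = []
    n = 1
    while True:
        f = n * SPEED_OF_SOUND / (2.0 * room_dimension_m)
        if f > max_freq_hz:
            return modes
        modes.append(round(f, 1))
        n += 1

# Example: a 5 m long room has axial modes near 34, 69, 103, 137 and 172 Hz.
print(axial_modes(5.0))
```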
The low-frequency performance of a singing display (a display panel used as a loudspeaker) changes when the user touches the display, and environmental factors such as indoor heating, use in sub-zero weather, or the device being placed in a holder can also have an impact.
Furthermore, when listening to audio in an environment with a high background noise level, the audio content will be masked. Masking is a phenomenon where two sounds are played simultaneously but only the louder one is heard. This occurs when the loudness of the louder sound source is high enough compared to the quieter sound. In such cases, higher overall listening levels might help, though higher listening levels are not good for the listener. Moreover, higher listening levels may be impossible, since small portable audio playback devices (such as mobile phones, tablets, hearables, or wearables) might not have enough sound output power; frequently, the sound output power is limited in the lower frequencies since there is more headroom in the higher frequencies. Additionally, the noise content is usually higher in the lower frequencies.
Virtual Bass Enhancement (VBE), also known as Psychoacoustic Bass Enhancement (PBE), algorithms may be used as a bass improvement scheme based on the Missing Fundamental phenomenon. The Missing Fundamental is a psychoacoustic effect in which a series of harmonic frequencies (integer multiples 2, 3, ..., n) of a certain fundamental frequency (f0) is perceived as producing the same pitch as the fundamental frequency itself. Pitch is a subjective attribute of a sound that can be ordered on a scale from low to high, and low pitch is commonly perceived as bass sound. The Missing Fundamental phenomenon can therefore be used to improve the perceived bass response of a loudspeaker or headphones.
There are different methods for generating harmonic frequencies. The two most common are time-domain and frequency-domain methods. In time-domain methods, the filtered bass signal is fed to a non-linear device which applies a non-linear function to the bass signal. Non-linear functions generate non-linear distortion, which comprises harmonically related frequencies. In frequency-domain methods, the harmonic frequencies are added either by applying a Fourier transform to the input signal and modifying the magnitudes of selected frequencies, or by pitch shifting the original signal to the higher frequency regions of the harmonic frequencies.
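For illustration, a minimal time-domain sketch of the harmonic-generation approach described above is given below; the filter design, the non-linearity, and the gain values are assumptions chosen for the example, not a specific implementation disclosed here.

```python
# Minimal time-domain VBE sketch (illustrative assumptions: Butterworth filters,
# a rectify-and-soft-clip non-linearity, fixed mixing gain). Requires numpy and scipy.
import numpy as np
from scipy.signal import butter, sosfilt

def vbe_time_domain(x, fs, cutoff_hz=120.0, harmonic_gain=0.5):
    """Generate harmonics of the bass band and mix them in above the cut-off."""
    # 1. Isolate the bass band that the transducer cannot reproduce well.
    lp = butter(4, cutoff_hz, btype="low", fs=fs, output="sos")
    bass = sosfilt(lp, x)

    # 2. Non-linear device: half-wave rectification followed by soft clipping
    #    produces frequencies harmonically related to the bass content.
    harmonics = np.tanh(4.0 * np.maximum(bass, 0.0))

    # 3. Remove DC and the original fundamental region, keeping the harmonics.
    hp = butter(4, cutoff_hz, btype="high", fs=fs, output="sos")
    harmonics = sosfilt(hp, harmonics)

    # 4. Mix the generated harmonics back with the original signal.
    return x + harmonic_gain * harmonics

# Example: a 60 Hz tone at 48 kHz gains added energy at multiples of 60 Hz.
fs = 48000
t = np.arange(fs) / fs
y = vbe_time_domain(0.5 * np.sin(2 * np.pi * 60.0 * t), fs)
```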
However, prior art solutions do not take into account that bass performance varies due to changing external and listening conditions. Hence, there is a need for an improved adaptive audio enhancement system.
SUMMARY
It is an object to provide an improved adaptive audio enhancement system. The foregoing and other objects are achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description, and the figures.
According to a first aspect, there is provided an adaptive audio enhancement system comprising a signal processing arrangement, an audio signal generator configured to generate a first audio signal, the first audio signal being simultaneously input into the signal processing arrangement and output as a first human perceivable audio signal, and at least one sensor arrangement configured to detect a second audio signal and/or environmental data and to transmit the second audio signal and/or environmental data as raw data to the signal processing arrangement, the signal processing arrangement being configured to determine whether a perceived audio signal is different from the first audio signal, by means of the raw data, and, if the perceived audio signal is different from the first audio signal, to process the first audio signal based on the raw data, and to output the processed first audio signal as a second human perceivable audio signal.
Such a solution allows a constantly varying bass response by adaptively changing the parameters. The adaptive system will improve the accuracy of the generated harmonics, and improve the performance of the system in a high-noise environment as well as in situations where the performance of the apparatus is altered. By measuring the external impact and/or the real response of the system, the system can be configured to support only the frequencies that have improper playback. Furthermore, the system can be used to mitigate the impact of a noisy environment.
In a possible implementation form of the first aspect, only the first human perceivable audio signal is output if the perceived audio signal is determined to be equal to the first audio signal, initiating signal processing only when needed.
In a further possible implementation form of the first aspect, the raw data is not processed by means of active noise cancellation, reducing the required number of processes and components for improving the quality of audio.
In a further possible implementation form of the first aspect, the first audio signal has a first frequency, the first audio signal is processed by means of a virtual bass enhancement algorithm configured to generate at least one harmonic frequency of the first frequency, and the second human perceivable audio signal comprises the first frequency and the at least one harmonic frequency, generating a psychoacoustic effect improving the perceived pitch.
In a further possible implementation form of the first aspect, the virtual bass enhancement algorithm comprises at least one parameter independent of the raw data, allowing the system to adapt based on additional parameters such as a predetermined cut-off frequency.
In a further possible implementation form of the first aspect, the virtual bass enhancement algorithm comprises at least one parameter estimated by means of the raw data, compensating for any effects on the low frequencies due to the specific use-case.
In a further possible implementation form of the first aspect, the parameter is a cut-off frequency estimated for the first audio signal, providing a reliable limit value for when the system cannot produce sound properly.
In a further possible implementation form of the first aspect, the parameter is a harmonic amplitude ratio estimated for the first audio signal and the at least one harmonic frequency, allowing the perceived bass of the audio to be stronger.
In a further possible implementation form of the first aspect, the virtual bass enhancement algorithm comprises at least one parameter estimated by means of input signal classification, the first audio signal being classified after being input into the signal processing arrangement. This simplifies the processing of the input signal by automatically applying predetermined parameters that apply to particular situations.
In a further possible implementation form of the first aspect, the sensor arrangement comprises a microphone and/or a force sensor, allowing sounds as well as other environmental factors to be taken into consideration.
In a further possible implementation form of the first aspect, the force sensor is configured to detect a force applied by an actuating part of the audio signal generator, allowing the system to be used in bone conduction devices and/or to determine whether the positioning of an in-ear apparatus is optimal.
In a further possible implementation form of the first aspect, the sensor arrangement is configured to detect a change in environmental data and/or a discrepancy between predetermined environmental data and current environmental data, allowing the system to adapt to for example noise or a change in external sound levels.
In a further possible implementation form of the first aspect, the environmental data comprises at least one of environmental temperature, location relative to external objects, and weather conditions, allowing the system to adapt to not only audio but other environmental factors.
In a further possible implementation form of the first aspect, the location relative to external objects is detected by means of a frequency response of the second audio signal, allowing the perceived room response to be improved by compensating for standing waves.
In a further possible implementation form of the first aspect, the second audio signal is a signal generated externally of the adaptive audio enhancement system, allowing external noise to be taken into account.
In a further possible implementation form of the first aspect, the second audio signal is the first audio signal, allowing the first audio signal to be detected by the sensor arrangement.
According to a second aspect, there is provided an electronic apparatus for generating audio, the apparatus comprising the adaptive audio enhancement system according to the above. Such an apparatus allows a constantly varying bass response by adaptively changing the parameters. The adaptive system of the apparatus will improve the accuracy of the generated harmonics, and improve the performance of the apparatus in a high-noise environment as well as in situations where the performance of the apparatus is altered.
These and other aspects will be apparent from the embodiments described below.
BRIEF DESCRIPTION OF THE DRAWINGS
In the following detailed portion of the present disclosure, the aspects, embodiments, and implementations will be explained in more detail with reference to the example embodiments shown in the drawings, in which:
Fig. 1 is a schematic illustration of an adaptive audio enhancement system in accordance with an example of the embodiments of the disclosure.
DETAILED DESCRIPTION
As illustrated in Fig. 1, the present invention relates to an adaptive audio enhancement system 1 comprising a signal processing arrangement 2, an audio signal generator 3 configured to generate a first audio signal A1, the first audio signal A1 being simultaneously input into the signal processing arrangement 2 and output as a first human perceivable audio signal A11, and at least one sensor arrangement 4 configured to detect a second audio signal A2 and/or environmental data ED and to transmit the second audio signal A2 and/or environmental data ED as raw data RD to the signal processing arrangement 2, the signal processing arrangement 2 being configured to determine whether a perceived audio signal A3 is different from the first audio signal A1, by means of the raw data RD, and, if the perceived audio signal A3 is different from the first audio signal A1, to process the first audio signal A1 based on the raw data RD, and to output the processed first audio signal A1 as a second human perceivable audio signal A12.
The adaptive audio enhancement system 1 comprises a signal processing arrangement 2 and an audio signal generator 3 configured to generate a first audio signal A1. The adaptive audio enhancement system 1 also comprises at least one sensor arrangement 4 configured to detect a second audio signal A2 and/or environmental data ED. This allows contextual awareness to be considered by connecting the adaptation to the detection of certain circumstances or a specific sound environment such as traffic noise, a cocktail party, etc.
The second audio signal A2 may be a signal generated externally of the adaptive audio enhancement system 1. Alternatively, the second audio signal A2 may be the first audio signal A1.
The sensor arrangement 4 may be configured to detect a change in environmental data ED and/or a discrepancy between predetermined environmental data ED1 and current environmental data ED2. The environmental data ED may comprise at least one of environmental temperature, location relative to external objects, and weather conditions. The location relative to external objects may be detected by means of a frequency response of the second audio signal A2.
The sensor arrangement 4 may comprise a microphone and/or a force sensor. The force sensor may be configured to detect a force applied by an actuating part of the audio signal generator 3.
The first audio signal A1 is simultaneously input into the signal processing arrangement 2 and output as a first human perceivable audio signal A11.
The detected second audio signal A2 and/or environmental data ED is transmitted, from the sensor arrangement 4 to the signal processing arrangement 2, as raw data RD.
By raw data RD is meant data that is not processed by means of active noise cancellation. Typically, an active noise cancellation output signal is a pre-processed cancellation signal of the ambient noise, which contains an inverted noise signal. In the present invention, the sensor arrangement 4 is directly connected to the signal processing arrangement 2 such that the signal processing arrangement 2 receives raw data RD that has not been analyzed, modified, or synthesized, i.e. its components have not been transformed from one format to another. However, the raw data RD may have been filtered, i.e. some components may have been removed from the raw data.
The signal processing arrangement 2 is configured to determine whether a perceived audio signal A3 is different from the first audio signal A1, by means of the raw data RD. If the perceived audio signal A3 is different from the first audio signal A1, the signal processing arrangement 2 processes the first audio signal A1 based on the raw data RD and outputs the processed first audio signal A1 as a second human perceivable audio signal A12.
If, instead, the perceived audio signal A3 is determined to be equal to the first audio signal A1, only the first human perceivable audio signal A11 is output.
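A minimal sketch of this decision step is shown below; the band-level comparison, the tolerance value, and the function names are illustrative assumptions rather than the claimed implementation.

```python
# Minimal sketch of the decision logic: output A11 unchanged, or a processed A12
# (the comparison metric and the 3 dB tolerance are illustrative assumptions).

def select_output(intended_levels_db, perceived_levels_db,
                  first_audio_block, raw_sensor_data, apply_vbe,
                  tolerance_db=3.0):
    """Return the first signal as-is (A11) or a VBE-processed version (A12)."""
    deviation = max(abs(p - i) for p, i in
                    zip(perceived_levels_db, intended_levels_db))
    if deviation <= tolerance_db:
        return first_audio_block                        # perceived == first signal
    return apply_vbe(first_audio_block, raw_sensor_data)  # perceived differs

# Example: the lowest band is perceived 8 dB too low, so VBE processing is applied.
out = select_output([0, 0, 0], [-8, -1, 0], "A1-block", {"mic": "raw"},
                    apply_vbe=lambda block, rd: f"VBE({block})")
print(out)   # -> VBE(A1-block)
```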
The first audio signal A1 has a first frequency, and the first audio signal A1 is processed by means of a virtual bass enhancement algorithm configured to generate at least one harmonic frequency of the first frequency.
The second human perceivable audio signal A12 comprises the first frequency and the at least one harmonic frequency.
The virtual bass enhancement algorithm may comprise at least one parameter independent of the raw data RD.
Furthermore, the virtual bass enhancement algorithm may comprise at least one parameter estimated by means of the raw data RD. The estimated parameter may be a cut-off frequency estimated for the first audio signal A1. Additionally, the estimated parameter may be a harmonic amplitude ratio estimated for the first audio signal A1 and the at least one harmonic frequency.
The virtual bass enhancement algorithm may also comprise at least one parameter estimated by means of input signal classification, the first audio signal A1 being classified after having been input into the signal processing arrangement 2.
The first audio signal A1, also referred to as the input signal, may be classified as speech, audio, or a mixture thereof. If the input signal is classified as speech, VBE processing can be disabled or adjusted to apply as little boost as possible. If the input signal is classified as audio, the VBE algorithm can have a different mode, for example bass boost for certain types of music and neutral boost for other types of music. If the input signal is a mixture of audio and speech (movies etc.), the VBE algorithm can have yet another mode wherein the target boost is different from that for music only. Furthermore, the input signal class could determine how steep the slope of the frequency-amplitude curve should be, e.g., as smooth as possible for music and as steep as possible for speech.
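For illustration, a minimal sketch of classification-driven parameter selection is given below; the class names, preset structure, and numeric values are assumptions, not values prescribed by this disclosure.

```python
# Minimal sketch of classification-driven VBE parameter selection
# (class names and numeric values are illustrative assumptions).

VBE_PRESETS = {
    "speech": {"enabled": False, "harmonic_gain": 0.0, "slope": "steep"},
    "music":  {"enabled": True,  "harmonic_gain": 0.6, "slope": "smooth"},
    "mixed":  {"enabled": True,  "harmonic_gain": 0.4, "slope": "moderate"},
}

def vbe_parameters_for(signal_class: str) -> dict:
    """Return the VBE parameter set for a classified input signal."""
    return VBE_PRESETS.get(signal_class, VBE_PRESETS["mixed"])

print(vbe_parameters_for("speech"))   # VBE disabled, minimal boost for speech
```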
The sensor(s) gather(s) data about the use-case and the environment. If any changes in the output response of the system, or in the environment, are sensed, the VBE parameters will be adjusted accordingly.
For example, in the case of in-ear headphones, a microphone can measure the amount of leakage or the real frequency response of the output in the ear canal. If the microphone senses that the apparatus does not produce low frequencies properly, the system utilizes the VBE algorithm. The cut-off frequency of the VBE is estimated from the measured data. If the data indicates that the device works properly, or that there is no leakage, VBE processing can be disabled or a default cut-off value can be set.
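A minimal sketch of such a cut-off estimation from a measured response is given below; the -6 dB criterion, the default cut-off, and the example values are illustrative assumptions.

```python
# Minimal sketch: estimate the VBE cut-off frequency from a measured in-ear
# response (the -6 dB drop criterion and all values are illustrative assumptions).

DEFAULT_CUTOFF_HZ = 100.0

def estimate_cutoff(freqs_hz, measured_db, reference_db, drop_db=6.0):
    """Return the highest frequency at which the measured response falls more
    than drop_db below the reference; below this, VBE generates harmonics."""
    cutoff = None
    for f, m, r in zip(freqs_hz, measured_db, reference_db):
        if r - m > drop_db:
            cutoff = f
    return cutoff if cutoff is not None else DEFAULT_CUTOFF_HZ

# Example: leakage attenuates everything below roughly 80 Hz.
freqs = [40, 60, 80, 100, 150]
measured = [-14, -10, -7, -1, 0]
reference = [0, 0, 0, 0, 0]
print(estimate_cutoff(freqs, measured, reference))   # -> 80
```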
For a bone-conduction apparatus, the force applied by the actuating part of the apparatus will have an impact on the low-frequency performance. If the applied force is higher than a certain pre-defined threshold, the cut-off frequency should be increased. Furthermore, VBE processing can be disabled if the measured force is below the threshold.
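A minimal sketch of the force-dependent control described above is given below; the force threshold and cut-off values are illustrative assumptions.

```python
# Minimal sketch of force-dependent VBE control for a bone-conduction device
# (threshold and cut-off values are illustrative assumptions).

def vbe_cutoff_from_force(force_newton, force_threshold_n=1.5,
                          raised_cutoff_hz=150.0):
    """Below the threshold VBE is disabled; above it the cut-off is increased."""
    if force_newton < force_threshold_n:
        return None                 # VBE processing disabled
    return raised_cutoff_hz         # increase the VBE cut-off frequency

print(vbe_cutoff_from_force(0.5))   # None  -> VBE disabled
print(vbe_cutoff_from_force(2.0))   # 150.0 -> cut-off increased
```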
Improving the perceived room response is a use-case scenario wherein the position of the listener will have an impact on the perceived frequency response of the loudspeaker. If the wavelength of a frequency is longer than the room dimensions, a standing wave can occur. The impact of the listener’s position can be measured and supporting harmonic frequencies can be generated.
In a high-noise environment, the gain ratios of the harmonic frequencies should be adjusted in order to maintain proper low-frequency performance even if the device’s output power range is limited. A microphone is utilized to capture the noise level of the environment, and the amplitudes of the generated harmonic frequency components are adjusted according to this level. When the noise level increases, the amplitude of the higher harmonics should be increased, and thus the difference between the amplitudes of adjacent harmonic components should decrease.
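A minimal sketch of noise-dependent adjustment of the harmonic amplitude ratios is given below; the dB values and the linear interpolation between a quiet and a noisy roll-off are illustrative assumptions.

```python
# Minimal sketch: flatten the per-harmonic roll-off as ambient noise rises
# (all dB values and the linear mapping are illustrative assumptions).

def harmonic_gains(noise_level_db, num_harmonics=4,
                   quiet_rolloff_db=6.0, noisy_rolloff_db=2.0,
                   quiet_db=40.0, noisy_db=80.0):
    """Return per-harmonic gains in dB relative to the lowest harmonic.

    In quiet conditions each successive harmonic is attenuated more; as the
    measured noise level rises, the roll-off flattens so that higher harmonics
    are boosted relative to the lower ones."""
    t = min(max((noise_level_db - quiet_db) / (noisy_db - quiet_db), 0.0), 1.0)
    rolloff = quiet_rolloff_db + t * (noisy_rolloff_db - quiet_rolloff_db)
    return [-rolloff * k for k in range(num_harmonics)]

print(harmonic_gains(40.0))  # [0.0, -6.0, -12.0, -18.0] : quiet environment
print(harmonic_gains(80.0))  # [0.0, -2.0, -4.0, -6.0]   : noisy, flatter ratios
```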
The present invention also relates to an electronic apparatus for generating audio, such as an intra-aural or bone conduction headset, a display, or a built-in loudspeaker, the apparatus comprising the adaptive audio enhancement system 1 described above.
The various aspects and implementations have been described in conjunction with various embodiments herein. However, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed subject-matter, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. The reference signs used in the claims shall not be construed as limiting the scope. Unless otherwise indicated, the drawings are intended to be read (e.g., cross-hatching, arrangement of parts, proportion, degree, etc.) together with the specification, and are to be considered a portion of the entire written description of this disclosure. As used in the description, the terms “horizontal”, “vertical”, “left”, “right”, “up” and “down”, as well as adjectival and adverbial derivatives thereof (e.g., “horizontally”, “rightwardly”, “upwardly”, etc.), simply refer to the orientation of the illustrated structure as the particular drawing figure faces the reader. Similarly, the terms “inwardly” and “outwardly” generally refer to the orientation of a surface relative to its axis of elongation, or axis of rotation, as appropriate.
Claims
1. An adaptive audio enhancement system (1) comprising:
-a signal processing arrangement (2);
-an audio signal generator (3) configured to generate a first audio signal (A1); said first audio signal (A1) being simultaneously input into said signal processing arrangement (2) and output as a first human perceivable audio signal (A11); and
-at least one sensor arrangement (4) configured to detect a second audio signal (A2) and/or environmental data (ED) and to transmit said second audio signal (A2) and/or environmental data (ED) as raw data (RD) to said signal processing arrangement (2); said signal processing arrangement (2) being configured to determine whether a perceived audio signal (A3) is different from said first audio signal (A1), by means of said raw data (RD), and, if said perceived audio signal (A3) is different from said first audio signal (A1), to process said first audio signal (A1) based on said raw data (RD), and to output said processed first audio signal (A1) as a second human perceivable audio signal (A12).
2. The adaptive audio enhancement system (1) according to claim 1, wherein, if said perceived audio signal (A3) is determined to be equal to said first audio signal (A1), only said first human perceivable audio signal (A11) is output.
3. The adaptive audio enhancement system (1) according to claim 1 or 2, wherein said raw data (RD) is not processed by means of active noise cancellation.
4. The adaptive audio enhancement system (1) according to any one of the previous claims, wherein said first audio signal (A1) has a first frequency, said first audio signal (A1) is processed by means of a virtual bass enhancement algorithm configured to generate at least one harmonic frequency of said first frequency, and wherein said second human perceivable audio signal (A12) comprises said first frequency and said at least one harmonic frequency.
5. The adaptive audio enhancement system (1) according to claim 4, wherein said virtual bass enhancement algorithm comprises at least one parameter independent of said raw data (RD).
6. The adaptive audio enhancement system (1) according to claim 4 or 5, wherein said virtual bass enhancement algorithm comprises at least one parameter estimated by means of said raw data (RD).
7. The adaptive audio enhancement system (1) according to claim 5, wherein said parameter is a cut-off frequency estimated for said first audio signal (A1).
8. The adaptive audio enhancement system (1) according to claim 5, wherein said parameter is a harmonic amplitude ratio estimated for said first audio signal (A1) and said at least one harmonic frequency.
9. The adaptive audio enhancement system (1) according to any one of claims 4 to 8, wherein said virtual bass enhancement algorithm comprises at least one parameter estimated by means of input signal classification, said first audio signal (A1) being classified after being input into said signal processing arrangement (2).
10. The adaptive audio enhancement system (1) according to any one of the previous claims, wherein said sensor arrangement (4) comprises a microphone and/or a force sensor.
11. The adaptive audio enhancement system (1) according to claim 10, wherein said force sensor is configured to detect a force applied by an actuating part of said audio signal generator (3).
12. The adaptive audio enhancement system (1) according to any one of the previous claims, wherein said sensor arrangement (4) is configured to detect a change in environmental data (ED) and/or a discrepancy between predetermined environmental data (EDI) and current environmental data (ED2).
13. The adaptive audio enhancement system (1) according to any one of the previous claims, wherein said environmental data (ED) comprises at least one of environmental temperature, location relative to external objects, and weather conditions.
14. The adaptive audio enhancement system (1) according to claim 13, wherein said location relative to external objects is detected by means of a frequency response of said second audio signal (A2).
15. The adaptive audio enhancement system (1) according to any one of the previous claims, wherein said second audio signal (A2) is a signal generated externally of said adaptive audio enhancement system (1).
16. The adaptive audio enhancement system (1) according to any one of claims 1 to 15, wherein said second audio signal (A2) is said first audio signal (A1).
17. An electronic apparatus for generating audio, said apparatus comprising the adaptive audio enhancement system (1) according to any one of claims 1 to 16.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2022/075904 WO2024061436A1 (en) | 2022-09-19 | 2022-09-19 | Adaptive audio enhancement system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2022/075904 WO2024061436A1 (en) | 2022-09-19 | 2022-09-19 | Adaptive audio enhancement system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024061436A1 (en) | 2024-03-28 |
Family
ID=83692790
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2022/075904 WO2024061436A1 (en) | 2022-09-19 | 2022-09-19 | Adaptive audio enhancement system |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024061436A1 (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060188104A1 (en) * | 2003-07-28 | 2006-08-24 | Koninklijke Philips Electronics N.V. | Audio conditioning apparatus, method and computer program product |
US20120259626A1 (en) * | 2011-04-08 | 2012-10-11 | Qualcomm Incorporated | Integrated psychoacoustic bass enhancement (pbe) for improved audio |
WO2021119177A1 (en) * | 2019-12-09 | 2021-06-17 | Dolby Laboratories Licensing Corporation | Multiband limiter modes and noise compensation methods |
Non-Patent Citations (1)
Title |
---|
COKER KENNETH ET AL: "A Survey on Virtual Bass Enhancement for Active Noise Cancelling Headphones", 2019 INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND INFORMATION SCIENCES (ICCAIS), IEEE, 23 October 2019 (2019-10-23), pages 1 - 5, XP033761021, DOI: 10.1109/ICCAIS46528.2019.9074630 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22789895 Country of ref document: EP Kind code of ref document: A1 |