CN117652161A - Audio processing method for playback of immersive audio - Google Patents

Audio processing method for playback of immersive audio

Info

Publication number: CN117652161A
Application number: CN202280050234.XA
Authority: CN (China)
Prior art keywords: audio, phase, loudspeakers, height, audio signals
Legal status: Pending
Other languages: Chinese (zh)
Inventors: C. P. Brown, M. J. Smithers
Assignee (original and current): Dolby Laboratories Licensing Corp
Application filed by Dolby Laboratories Licensing Corp
Priority claimed from PCT/US2022/037809 (WO2023009377A1)

Abstract

A method (200) of processing audio in an immersive audio format including at least one height audio channel, the method comprising: obtaining (250) two height audio signals from at least a portion of the at least one height audio channel; modifying (270) the relative phase between the two height audio signals in frequency bands in which the phase differences are predominantly out of phase, to obtain two phase-modified height audio signals; and playing back (290) the processed audio, comprising the two phase-modified height audio signals, with at least two audio loudspeakers. The phase differences occur at one or more listening positions that are symmetrically off-centered with respect to the at least two loudspeakers when a monaural signal emanates from the two audio loudspeakers, the at least two loudspeakers being laterally spaced apart with respect to each of said one or more listening positions. The method allows perception of sound height/elevation without using overhead loudspeakers.

Description

Audio processing method for playback of immersive audio
Cross Reference to Related Applications
The present application claims priority from U.S. provisional application No. 63/226,529, filed on 28 July 2021, and from European patent application No. 21188202.2, filed on 28 July 2021, each of which is incorporated herein by reference in its entirety.
Technical Field
The present disclosure relates to the field of audio processing. In particular, the present disclosure relates to a method of processing audio in an immersive audio format for playback of the processed audio with a non-immersive loudspeaker system. The disclosure further relates to an apparatus comprising a processor configured to perform the method, a vehicle comprising the apparatus, a program and a computer readable storage medium.
Background
Vehicles typically contain a loudspeaker system for audio playback. A loudspeaker system in a vehicle may be used to play back audio from, for example, an audio streaming service or application, a tape or a CD in the automotive entertainment system of the vehicle, or a device connected to the vehicle. The device may be a portable device connected to the vehicle, for example, wirelessly or by a cable. For example, streaming services such as Spotify and Tidal have recently been integrated into car entertainment systems, either directly into the hardware of the vehicle (commonly referred to as the "head unit") or via smartphones using Bluetooth, Apple CarPlay or Android Auto. A loudspeaker system in the vehicle may also be used to play terrestrial and/or satellite broadcasts. A conventional loudspeaker system for a vehicle is a stereo loudspeaker system. A stereo loudspeaker system may comprise a total of four loudspeakers: a pair of front loudspeakers and a pair of rear loudspeakers for front passengers and rear passengers, respectively. In recent years, however, with the introduction of DVD players in vehicles, surround sound loudspeaker systems have been introduced in vehicles to support playback of DVD audio formats. Fig. 1 shows an interior view of a vehicle 100. The vehicle 100 includes a surround sound loudspeaker system that includes loudspeakers 10, 11, 30, 31, 41, 42, and 43. Only the loudspeakers on the left side of the vehicle 100 are shown; corresponding loudspeakers may be symmetrically arranged on the right side of the vehicle 100. In particular, the surround sound loudspeaker system of fig. 1 comprises: pairs of tweeters 41, 42 and 43, a pair of full-range front and rear loudspeakers 30 and 31, a center loudspeaker 10, and a low-frequency-effects loudspeaker or subwoofer 11. The tweeter 41 is placed close to the vehicle dashboard. The tweeter 42 is placed at the lower part of the front pillar of the vehicle 100. However, the tweeters 41, 42, 43 and the full-range front and rear loudspeakers 30, 31 may be placed in any location suitable for the particular implementation.
Immersive audio is becoming mainstream in cinema and home listening environments, and it is natural to expect that immersive audio will also be played back within the vehicle. Dolby Atmos music has been available through various streaming services. Immersive audio is typically distinguished from surround sound audio formats by the inclusion of overhead or height audio channels. Thus, in order to play back immersive audio, overhead or height loudspeakers are used. While high-end vehicles may contain such overhead or height loudspeakers, most conventional vehicles still use stereo loudspeaker systems or more advanced surround sound loudspeaker systems, as shown in fig. 1. In fact, height loudspeakers greatly increase the complexity of the loudspeaker system in the vehicle. Height loudspeakers need to be placed on the roof of the vehicle, which is generally not well suited for this purpose. For example, vehicles typically have a low roof, which limits the available height at which to place a height loudspeaker. In addition, vehicles are often sold with the option of installing a sunroof, which makes integration or placement of height loudspeakers on the roof a difficult industrial design challenge. Such height loudspeakers may also require additional audio cables. For all these reasons, integrating height loudspeakers in a vehicle can be expensive due to space and industrial design constraints.
Disclosure of Invention
It would be advantageous to play back immersive audio content in a non-immersive loudspeaker system (e.g., a stereo loudspeaker system or a surround sound loudspeaker system). In the context of the present disclosure, a "non-immersive loudspeaker system" is a loudspeaker/speaker system comprising at least two loudspeakers but no overhead loudspeakers (i.e. no height speakers).
It would be advantageous to create a perception of sound height when playing back immersive audio content over a non-immersive loudspeaker system, thereby enhancing the audio experience of a user even without the use of overhead loudspeakers.
Aspects of the present disclosure provide a method of processing audio in an immersive audio format including at least one height audio channel, for playback of the processed audio with a non-immersive loudspeaker system of at least two audio loudspeakers in a listening environment including one or more listening positions. Each of the one or more listening positions is symmetrically off-centered with respect to the at least two loudspeakers. Each of the at least two loudspeakers is laterally spaced apart relative to each of the one or more listening positions such that, when two monaural audio signals emanate from the at least two loudspeakers, a phase difference (e.g., an inter-loudspeaker differential phase, IDP) occurs at the one or more listening positions due to the acoustic properties of the listening environment. The method comprises obtaining two (monaural/identical) height audio signals from at least a portion of the at least one height audio channel; modifying the relative phase between the two height audio signals in frequency bands in which the phase differences are (predominantly) out of phase (e.g., the IDP occurring at the one or more listening positions when the two height audio signals emanate from the at least two loudspeakers) to obtain two phase-modified height audio signals for which the phase differences are (predominantly) in phase; and playing back the processed audio at the at least two audio loudspeakers, wherein the processed audio comprises the two phase-modified height audio signals.
For a listening position that is symmetrically off-centered with respect to at least two loudspeakers, two monaural audio signals emanating from the at least two loudspeakers are perceived at the listening position with a delay in the time domain. The delay corresponds to a phase difference of the two monaural signals in the frequency domain as a function of frequency at the listening position.
According to the psychoacoustic phenomenon studied by the inventors, when the listening position is centered with respect to two loudspeakers and when the two loudspeakers are laterally spaced apart with respect to the listening position, the audio emitted by the two loudspeakers may be perceived as having a sound height. The larger the lateral spacing of the two loudspeakers relative to the centered listening position, the greater the perceived sound height at the listening position, i.e., the greater the elevation angle of the sound.
Advantageously, for two loudspeakers laterally spaced apart with respect to each of the one or more listening positions, the sound height is created by centering the height channel with respect to the two loudspeakers. Centering the height channel is performed by obtaining two height audio signals from at least a portion of the at least one height audio channel and modifying the relative phase between the two height audio signals in frequency bands where the phase difference is (predominantly) out of phase, to obtain two phase-modified height audio signals where the phase difference is (predominantly) in phase. The processed audio signal played back at the two loudspeakers comprises the two phase-modified height audio signals, which provide a "centered" height audio channel. Since the processed audio signal comprises a "centered" height audio signal, the listener(s) located at the one or more listening positions may perceive sound height. Advantageously, the perception of sound height is created by playback of the processed audio over a non-immersive loudspeaker system (i.e., without using overhead loudspeakers).
In an embodiment, the audio in the immersive audio format further comprises at least two audio channels, and the method further comprises mixing each of the two phase-modified height audio signals with a respective one of the two audio channels.
In an embodiment, the audio in the immersive audio format further comprises a center channel, and the method further comprises mixing each of the two phase-modified height audio signals with the center channel.
In an embodiment, the audio in the immersive audio format has a single height audio channel and obtaining the two height audio signals comprises obtaining two identical height audio signals each corresponding to the single height audio channel.
In an embodiment, the audio in the immersive audio format comprises at least two height audio channels, and obtaining the two height audio signals comprises obtaining two identical height audio signals from the at least two height audio channels.
In an embodiment, the method further comprises applying mid/side processing to the at least two height audio channels to obtain a mid signal and a side signal. Each of the two height audio signals corresponds to the mid signal.
In an embodiment, the method further comprises mixing the side signal, and a signal corresponding to the side signal but having an opposite phase to the side signal, with the phase-modified height audio signals.
Another aspect of the present disclosure provides an apparatus comprising a processor and a memory coupled to the processor, wherein the processor is configured to perform any of the methods described in the present disclosure.
Another aspect of the present disclosure provides a vehicle including such an apparatus.
Other aspects of the present disclosure provide a program comprising instructions that when executed by a processor cause the processor to perform a method of processing audio, and further provide a computer readable storage medium storing such a program.
Drawings
Embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
figure 1 schematically illustrates an interior view of a vehicle having a loudspeaker system arranged in accordance with an embodiment of the present disclosure,
figure 2 is a flowchart illustrating an example of a method of processing audio in an immersive format according to an embodiment of the present disclosure,
figure 2a is a flowchart illustrating an example of a method of obtaining two height audio signals according to some embodiments of the present disclosure,
figure 2b is a flowchart illustrating an example of a method of modifying the relative phase between two height audio signals,
figure 3 schematically shows a vehicle,
fig. 4a schematically shows the spatial relationship of a listening position to two loudspeakers, wherein the listening position is equidistant from the loudspeakers,
figure 4b schematically shows an idealized inter-loudspeaker differential phase (IDP) response for all frequencies at the equidistant listening position of figure 4a,
figure 5a schematically shows the spatial relationship of the listening position offset with respect to two loudspeakers,
figure 5b schematically shows an idealized inter-loudspeaker differential phase (IDP) response for all frequencies at the listening position of figure 5a,
figure 6 schematically shows how the perception of height at listening positions equidistant from two loudspeakers varies according to the degree of lateral spacing of the loudspeakers,
fig. 7a schematically shows the spatial relationship of two listening positions, each symmetrically offset with respect to two loudspeakers,
figures 7b and 7c schematically show how the IDP of each of the two listening positions shown in figure 7a varies with frequency,
figure 8 schematically illustrates an example of a method of processing audio in an immersive format according to an embodiment of the disclosure,
figure 9 schematically illustrates an example of a method of processing audio in an immersive format according to an embodiment of the disclosure,
figure 10 schematically shows an example of a method of obtaining height audio signals from two height audio channels,
figure 11 schematically shows another example of a method of obtaining height audio signals from two height audio channels,
figure 12a shows a functional schematic block diagram of a possible prior art FIR-based implementation applied to one of the two height channels (in this case the left height channel),
figure 12b shows a functional schematic block diagram of a possible prior art FIR-based implementation applied to one of the two height channels (in this case the right height channel),
figure 13a shows an idealized amplitude response of the signal output 703 of the filter or filter function 702 of figure 12a,
figure 13b shows an idealized amplitude response of the signal output 709 of the subtractor or subtractor function 708 of figure 12a,
figure 13c shows an idealized phase response of the output signal 715 of figure 12a,
figure 13d shows an idealized phase response of the output signal 735 of figure 12b,
figure 13e shows an idealized phase response representing the relative phase difference between the output signal 715 of figure 12a and the output signal 735 of figure 12b,
fig. 13f and 13g schematically show, for each of the two listening positions shown in fig. 7a, how the corrected IDP varies with frequency,
fig. 14 is a schematic diagram of an example of an apparatus for performing a method according to an embodiment of the disclosure.
Detailed Description
Numerous specific details are set forth below to provide a thorough understanding of the present disclosure. However, the present disclosure may be practiced without these specific details. In addition, well-known portions may be described in less detail. The figures are schematic and include the parts relevant to understanding the present disclosure, while other parts may be omitted or merely hinted at.
Fig. 2 shows a flowchart illustrating an example of a method 200 of processing audio in an immersive audio format in accordance with an embodiment of the present disclosure. The method 200 may be used to play back processed audio with a non-immersive loudspeaker system of at least two audio loudspeakers in a listening environment. The listening environment may be an interior of a vehicle (e.g., an automobile). The listening environment may be the interior of any type of passenger or non-passenger vehicle (e.g., for commercial purposes or for transporting cargo). However, the listening environment is not limited to the inside of the vehicle. In general, as will be shown in more detail below, the present disclosure relates to any listening environment in which two loudspeakers of a non-immersive loudspeaker system are laterally spaced apart relative to one or more listening positions, and in which the one or more listening positions are symmetrically off-centered relative to the two loudspeakers. In particular, it has been found that in a vehicle, loudspeakers are arranged in such a way that these conditions are generally met.
For example, referring to fig. 3, a vehicle 100 (in this example, a four-seat car) is schematically depicted. For simplicity, the arrangement of loudspeakers is not shown in fig. 3, but is shown in the more detailed interior view of the vehicle 100 of fig. 1. Passenger car 100 has four seats 110, 120, 130, and 140. In the loudspeaker system shown in fig. 1, the loudspeakers 30, 31, 41, 42, 43 have corresponding loudspeakers (not shown in the figures) arranged on the right-hand side of the vehicle 100. Referring to fig. 3, the loudspeakers located on the left-hand side of the vehicle 100 and the corresponding loudspeakers located on the right-hand side of the vehicle 100 are arranged mirror-symmetrically with respect to a central axis 150 (through the center of the vehicle along the length of the vehicle 100). It should be appreciated that each of the seats 110, 120, 130 and 140, and thus a potential listener located thereon, is symmetrically off-centered with respect to any pair of loudspeakers comprising one of the loudspeakers 30, 31, 41, 42, 43 and its corresponding loudspeaker on the right-hand side of the vehicle. For example, a driver sitting in the driver's seat 110 will be symmetrically off-centered between the loudspeakers 30, 41, 42 and the corresponding right-hand-side loudspeakers (not shown in the figures). The driver will be closer to the loudspeakers 30, 41 and 42 than to the corresponding loudspeakers on the right-hand side of the vehicle 100. In figs. 1 and 3, the driver's seat is shown on the left side of the vehicle 100 (left with respect to the forward direction of driving). However, it should be understood that the location of the driver's seat in the vehicle may vary from region to region. For example, in the United Kingdom, Australia or Japan, the driver's seat is located on the right side of the vehicle with respect to the forward direction of driving.
The non-immersive loudspeaker system may be, for example, a stereo loudspeaker system or a surround sound loudspeaker system (as shown with reference to fig. 1).
In an embodiment, the audio in the immersive audio format may be audio rendered in the immersive audio format.
The immersive audio format of the (e.g., rendered) audio may include at least one height channel. In an embodiment, the immersive audio format may be a Dolby Atmos format. In another embodiment, the immersive audio format may be an X.Y.Z audio format, where X ≥ 2 is the number of front or surround sound audio channels, Y ≥ 0 is the number of low-frequency-effects or subwoofer audio channels, if present, and Z ≥ 1 is the number of height audio channels. The loudspeaker system shown in fig. 1 is a typical 5.1 loudspeaker system for playback of 5.1 audio, having 5 front or surround sound loudspeakers: two left audio loudspeakers (e.g., a left loudspeaker and a left surround loudspeaker), two right audio loudspeakers (e.g., a right loudspeaker and a right surround loudspeaker), one center loudspeaker, and one LFE loudspeaker. The two left audio loudspeakers correspond to loudspeakers 30, 31 (for mid or full range) and 41, 42 and 43 (for high frequencies). The center loudspeaker corresponds to the loudspeaker 10.
Referring to fig. 2, a method 200 includes obtaining 250 two height audio signals from at least a portion of at least one height audio channel. As explained above with reference to figs. 1 and 3, in a vehicle, each of one or more listening positions is symmetrically off-centered with respect to at least one pair of two loudspeakers. Each of the pair of two loudspeakers is laterally spaced apart relative to each of the one or more listening positions. When two monaural signals are emitted from the two loudspeakers and the listening positions are symmetrically off-centered with respect to the two loudspeakers, a phase difference occurs at one or more of the listening positions due to the acoustic properties of the listening environment. The phase difference typically occurs in a plurality of frequency bands in which the phase difference alternates between being predominantly in phase and predominantly out of phase.
The method 200 further comprises modifying 270 the relative phase between the two height audio signals in frequency bands in which the phase differences are predominantly out of phase, to obtain two phase-modified height audio signals for which the phase differences are predominantly in phase. The method 200 further includes playing back 290 the processed audio at the at least two audio loudspeakers. The processed audio comprises the two phase-modified height audio signals.
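For orientation, the following minimal sketch (illustrative only; the function and parameter names are hypothetical and not part of the claimed method) shows how the three steps of method 200 fit together for a single monaural height channel:

```python
import numpy as np

def process_immersive_audio(height_channel: np.ndarray, phase_modifier):
    """Minimal sketch of method 200 for a single (monaural) height channel.

    Step 250: obtain two height audio signals (here, two copies).
    Step 270: modify their relative phase in the predominantly
              out-of-phase bands (delegated to `phase_modifier`).
    Step 290: the returned pair forms part of the processed audio that is
              played back over the two loudspeakers.
    """
    left_height = height_channel.copy()      # step 250
    right_height = height_channel.copy()
    left_mod, right_mod = phase_modifier(left_height, right_height)  # step 270
    return left_mod, right_mod               # step 290: route to loudspeakers
```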
For further explanation, we will refer to figs. 4a and 4b. A time difference at the listening position is equivalent to a phase difference that varies with frequency. For the following discussion, the term "inter-loudspeaker differential phase" (IDP) is defined as the phase difference of sound arriving at a listening position from a pair of stereo loudspeakers.
We assume a stereo loudspeaker system in which there is a left loudspeaker and a right loudspeaker (see fig. 4a). The IDP is substantially imperceptible to a listener positioned equidistant from the left and right loudspeakers, because the time required for the sound presented by both loudspeakers to reach the listener's ears is the same (see fig. 4b).
Fig. 5a schematically shows the spatial relationship of a listening position offset with respect to two loudspeakers. In the example of fig. 5a, the listener is offset (not equidistant) from a pair of stereo loudspeakers, that is, the listener is closer to one of the loudspeakers. When the listener is not equidistant from a pair of loudspeakers (as in fig. 5a), the sound common to both loudspeakers reaches the listener at different times, with the sound from the nearest loudspeaker arriving first. The relative time delay causes the IDP to vary across frequency as shown in fig. 5b. The IDP has a periodic behavior in that it increases and decreases linearly with frequency, periodically. Frequencies with values closer to 0 degrees than to -180 or 180 degrees (i.e., between -90 and 90 degrees) are considered "in phase" or reinforcing, while frequencies closer to -180 or 180 degrees than to 0 degrees (i.e., between 90 and 180 degrees or between -90 and -180 degrees) are considered "out of phase" or cancelling (see figs. 4b and 5b; dashed lines indicate -90 and +90 degrees). For audio common to both loudspeakers (monaural audio), the resulting change in sound level across frequency at the listening position can result in degraded timbre perception. The change in phase results in degraded spatial or directional perception. In a typical vehicular environment, i.e., with the delay resulting from the typical distances of a listener to the two loudspeakers, the IDP of each listener is as follows. Frequencies between 0 and about 250 Hz are predominantly in phase, i.e., the IDP is between -90 and 90 degrees. Frequencies between about 250 Hz and 750 Hz are predominantly out of phase, i.e., the IDP is between 90 and 180 degrees or between -90 and -180 degrees. Frequencies between about 750 Hz and 1250 Hz are predominantly in phase. This alternating sequence of predominantly in-phase and predominantly out-of-phase bands continues with increasing frequency up to the human hearing limit of about 20 kHz. In this example, the period repeats every 1 kHz. The exact starting and ending frequencies of the bands are a function of the internal dimensions of the vehicle and the position of the listener (listening position).
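As a numerical illustration of this behavior (a sketch under an idealized free-field assumption, with example distances chosen to reproduce the ~1 kHz period above; not taken from the patent):

```python
import numpy as np

C = 343.0  # speed of sound in m/s

def idp_degrees(freqs_hz, d_left_m, d_right_m):
    """Idealized IDP at a listening position with path lengths d_left_m and
    d_right_m to the two loudspeakers, wrapped to [-180, 180) degrees."""
    delay_s = (d_left_m - d_right_m) / C
    return (360.0 * freqs_hz * delay_s + 180.0) % 360.0 - 180.0

freqs = np.array([100.0, 500.0, 1000.0, 1500.0])
# A path-length difference of 0.343 m gives a 1 ms delay, i.e. a 1 kHz period:
idp = idp_degrees(freqs, d_left_m=1.0, d_right_m=1.343)
in_phase = np.abs(idp) <= 90.0
print(np.round(idp), in_phase)
# 100 Hz is in phase, 500 Hz is out of phase, 1000 Hz is in phase again,
# matching the alternating bands described above.
```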
Fig. 6 schematically shows how the perception of sound height at a listening position 6 equidistant from two loudspeakers varies depending on the extent of the lateral spacing of the two loudspeakers from the listening position 6.
When a listener at the listening position 6 is equidistant from two stereo loudspeakers located in front of and very close to the listener (e.g., spaced relatively narrowly in front of the listener), and the same audio signal (monaural) is played from both, the sound appears to originate from between the two loudspeakers without any perceived increase in elevation; hence terms such as "phantom image" are used. For narrowly spaced loudspeakers, e.g., at positions narrower than positions 15 and 17 in the example of fig. 6, the sound appears to originate from a position near 7, i.e., with little or no perception of sound height or elevation.
As the lateral or angular spacing of the loudspeakers with respect to the front viewing direction of the listener 6 increases, the perceived sound height of the phantom image tends to increase in elevation.
In the example shown in fig. 6, for loudspeakers at positions 15 and 17, this corresponds to sound perceived at a position closer to 16. For loudspeakers at positions 18 and 20, this corresponds to sound perceived at a position closer to 19. For loudspeakers at positions 21 and 23, it corresponds to sound perceived at a position closer to 22, and for loudspeakers at positions 24 and 26, to sound perceived at a position closer to 25. In other words, as the angle between the loudspeakers increases, the perceived sound height of the phantom sound image increases. This psychoacoustic phenomenon tends to work best at lower frequencies (e.g., frequencies below 5 kHz).
The paper "Elevation localization and head-related transfer function analysis at low frequencies" by V. Ralph Algazi, Carlos Avendano and Richard O. Duda (Journal of the Acoustical Society of America 109, 1110 (2001)) indicates that torso reflections may be a major cue for the perception of sound height/elevation at low frequencies. When the crosstalk interaural delay (i.e., the delay with which a loudspeaker's audio signal reaches the ear on the opposite side of the head) matches the shoulder-reflection delay of a real overhead audio source, the resulting phantom sound image may be perceived as lifted to a position in the median plane similar to that of the real overhead audio source. When a loudspeaker is placed above the listener's head, the listener's ears hear the direct sound from the loudspeaker and, later, the sound reflected from the torso/shoulders. It has been found that this delay from direct sound to reflected sound is about the same as the delay introduced by the interaural crosstalk of the head when the loudspeakers are laterally, and particularly widely, spaced relative to the listening position (e.g., positions 24 and 26 in fig. 6). In the case of crosstalk, since the source is monaural (the same for both loudspeakers), it appears to the ear as the same source, except that the delay resembles a torso reflection.
As the angular spacing of the loudspeakers becomes larger (up to +/-90 degrees), the crosstalk delay of the listener's head increases and thus the perceived elevation of sound increases.
Theoretically, this crosstalk delay of the listener's head is responsible for the increase in phantom center height.
The inventors have realized that this psychoacoustic phenomenon may be exploited in loudspeaker systems, such as those of vehicles, where the angular separation between the loudspeakers is typically large (e.g., greater than a minimum angular value, such as greater than 10, 15 or 20 degrees). However, this phenomenon is reproduced only when the listening position or listener is symmetrically positioned with respect to the angularly spaced loudspeakers. This is typically not the case in vehicles, as the seats occupied by the passengers (see fig. 3) are symmetrically off-centered with respect to the loudspeakers of the loudspeaker system (see figs. 1 and 3).
Accordingly, the inventors have recognized that in order to provide a perception of sound height in a vehicle, or in any listening environment having a pair of suitably spaced loudspeakers, the sound image at the listening position should be perceived as if the listening position were symmetrically positioned with respect to the pair of loudspeakers. In other words, the sound image should be "virtually centered". In the case of a single listening position as shown in fig. 5a, this problem can be solved simply by introducing a delay into the audio signal played back by the nearer loudspeaker, thereby compensating for the different times at which the audio signals emitted from the loudspeakers arrive at the listening position. Introducing a delay has the effect of reducing the relative phase between the two audio signals in the frequency bands where the phase differences are predominantly out of phase (see fig. 5b). This reduction in phase ideally achieves a flat IDP as shown in fig. 4b, or at least an IDP in the range between -90 and 90 degrees for all frequencies. However, introducing a delay will only flatten the IDP of one of the two off-center listening positions shown in fig. 7a. In contrast, as shown below, the virtual centering process of the present disclosure can correct the IDPs of two off-center listening positions simultaneously, thereby increasing the height perception at both listening positions.
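The following sketch (illustrative geometry assumed, not from the patent) makes this concrete: adding a delay to the nearer loudspeaker's feed flattens the IDP for one off-center listener, but doubles the IDP slope for the mirror-image listener on the other side:

```python
import numpy as np

C = 343.0  # speed of sound in m/s

def idp_with_feed_delay(freqs_hz, d_left_m, d_right_m, left_feed_delay_s):
    """IDP (degrees, wrapped) when left_feed_delay_s is added to the
    left-loudspeaker feed before playback."""
    total_delay = (d_left_m - d_right_m) / C + left_feed_delay_s
    return (360.0 * freqs_hz * total_delay + 180.0) % 360.0 - 180.0

freqs = np.array([250.0, 500.0, 750.0])
dt = 0.343 / C  # compensates a 0.343 m path difference (about 1 ms)
# Listener A sits closer to the left loudspeaker: the delay centers them.
print(idp_with_feed_delay(freqs, 1.0, 1.343, dt))    # -> [0. 0. 0.]
# Listener B mirrors listener A: the same delay doubles their IDP slope.
print(idp_with_feed_delay(freqs, 1.343, 1.0, dt))    # -> worse than before
```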
In contrast to the prior art, a different use of virtual centering is contemplated in the present disclosure. Instead of virtually centering the entire (monaural) audio signal, only the portion of the audio signal that is meant to be perceived at an elevation is "virtually centered". For audio in an immersive format, this portion of the audio signal corresponds to a height channel. In this disclosure, only the height channel or a portion thereof (or its audio signal) is "virtually centered", so that only the height channel is perceived at a sound height/elevation, as described with reference to fig. 6. A frequency-dependent phase difference between two monaural audio signals emitted (simultaneously) by the two loudspeakers, with respect to which the listening positions are symmetrically off-centered, is obtained for each of the plurality of frequency bands. Once the phase difference for each frequency band is obtained, the height channel may be "virtually centered" by modifying the phase between the two (e.g., monaural) height audio signals in the frequency bands where the phase differences are found to be predominantly out of phase.
This is readily achieved with a single (inherently monaural) height channel: two audio signals corresponding to the same height channel are used as the monaural audio signals, and are processed by modifying the relative phase between the two equal monaural audio signals so obtained.
The phase-modified height audio signals are then played back, within the processed audio, through the two audio loudspeakers of the non-immersive loudspeaker system, so that the sound is perceived with elevation/height due to the virtually centered height channel.
In embodiments, the audio in the immersive audio format may include one or more height audio channels, but may also include one or more additional audio channels that are different from the one or more height audio channels. In an embodiment, audio channels other than the one or more height channels are not virtually centered. Additionally or alternatively, some or all of the additional audio channels are also virtually centered in a separate "virtual centering" process or algorithm.
In the discussion above, we assumed a single listening position that is symmetrically off-centered with respect to a pair of (e.g., stereo) loudspeakers.
However, for example in a vehicle, there may be two listeners (e.g., located at different listening positions), for example, in each row of the vehicle as shown in fig. 3.
Fig. 7a schematically shows the spatial relationship of two listening positions, each position being symmetrically off-centered with respect to the two loudspeakers (left and right loudspeakers).
Figs. 7b and 7c schematically show how the IDP of each of the two listening positions shown in fig. 7a varies with frequency. Furthermore, in this example it can be seen that each period of the IDP contains a frequency band that is predominantly in phase and a band that is predominantly out of phase, i.e., frequencies for which the IDP is between -90 and 90 degrees, and frequencies for which the IDP is between -180 and -90 degrees or between 90 and 180 degrees.
The frequencies at which the IDPs are predominantly out of phase lead to undesirable audible effects, including imaging blur of the audio signal presented by the two loudspeakers. A solution to this problem is found in EP1994795B1, which is incorporated herein by reference in its entirety. In EP1994795B1 it is shown that two listening positions that are symmetrically off-centered from the same pair of (stereo) loudspeakers can be "virtually centered" simultaneously. This follows the same principle as reducing the phase difference of the IDP of a single listening position. In the case of two listening positions, the phase differences of the IDPs obtained for each of the two listening positions are reduced simultaneously, such that the value of each IDP at each listening position is between -90 and 90 degrees in the desired frequency range.
In the present disclosure, however, the simultaneous "virtual centering" of two listening positions symmetrically off-centered from the same pair of (stereo) loudspeakers is used not to reduce undesired audible effects such as imaging blur, but rather to provide a perception of height in the sound emanating from the loudspeakers. This is achieved by using only the one or more height channels of the audio in the immersive audio format as input to a "virtual centering algorithm", as described for example in EP1994795B1. The virtual centering algorithm virtually centers only (a portion of) the one or more height channels. In accordance with the psychoacoustic phenomenon described with reference to fig. 6, the inherently large angular (lateral) arrangement of loudspeakers (e.g., in a vehicle's loudspeaker system) is exploited to provide a perception of height in the sound emitted by the pair of loudspeakers.
In an embodiment, the (e.g., rendered) audio comprises not only at least one height channel, but also at least two other audio channels. In this embodiment, referring to fig. 2, the method 200 may further comprise mixing 280 each of the two phase-modified height audio signals with a respective one of the two other audio channels.
This embodiment will be explained with reference to figs. 2, 2a and 8, wherein it is assumed that the audio in the immersive audio format has a single height audio channel and two additional audio channels.
Fig. 8 schematically illustrates an example of a method of processing audio in an immersive format according to an embodiment of the disclosure. The immersive audio format may include a single height audio channel 80 and two other audio channels 81 and 82. In block 90, two height audio signals 92 and 94 are obtained from at least a portion of height audio channel 80.
Fig. 2a is a flowchart illustrating an example of a method of obtaining two height audio signals according to some embodiments of the present disclosure.
In an embodiment, referring to fig. 2a, obtaining 250 the two height audio signals comprises obtaining 255 two identical height audio signals each corresponding to the single height audio channel. Block 90 of fig. 8 may take the input height audio channel 80 and input this same signal as height audio signals 92 and 94 to a "virtual centering algorithm" block 300. In the context of the present disclosure, block 300 is configured to perform a "virtual centering algorithm". A "virtual centering algorithm" takes as input two audio signals to be emitted from two loudspeakers that are symmetrically off-centered and laterally spaced apart relative to one or more listening positions, and provides as output two phase-modified audio signals, the relative phase between the two input signals being modified such that a listener located at the one or more listening positions perceives the output audio signals substantially at the center of the two laterally spaced loudspeakers. This may be achieved by reducing the inter-loudspeaker differential phase (IDP) between the two audio channels corresponding to the two loudspeakers used for playback. In the context of the present disclosure, a "virtual centering algorithm" is advantageously and inventively applied to input audio signals derived from one or more height channels of audio in an immersive audio format, thereby providing a perception of audio height/elevation to a listener located at the one or more listening positions when the audio is played back by the loudspeakers.
In an embodiment, the non-immersive loudspeaker system for playback of the processed audio may be a stereo loudspeaker system with a left loudspeaker 1 and a right loudspeaker 2, as shown in fig. 8.
In an embodiment, more than one height channel may be input to block 90. For example, two height audio channels may be input to block 90; that is, the immersive audio format may include two height audio channels. In this embodiment, obtaining 250 the two height audio signals may comprise obtaining 240 the two height audio signals from the two height audio channels (see step 240 in fig. 2a). When the immersive audio includes two height audio channels, block 90 may be configured to pass the two height audio channels through as signals 92 and 94, respectively (i.e., without performing any particular function) to block 300. For example, assume that the non-immersive loudspeaker system is a front (or rear) stereo loudspeaker system of a vehicle having a front (or rear) left loudspeaker 1 and a front (or rear) right loudspeaker 2. Assume also that we want to play back audio in an immersive format with a front (or rear) left height channel 92 and a front (or rear) right height channel 94; then both channels 92 and 94 can be input directly to the virtual centering algorithm of block 300. Alternatively, if the audio has only one height channel, the same channel may be input twice as the height audio signals 92 and 94, as described above.
Block 300 may perform steps 250 and/or 270 of method 200 of fig. 2. Block 300 may be configured to modify the relative phase difference between signal 92 and signal 94 to obtain phase-modified signals 302 and 304, respectively. The two other audio channels 81 and 82 may be mixed with the phase-modified signals 302 and 304, respectively. For example, the front (or rear) left phase-modified height audio signal 302 is mixed with the front (or rear) left channel 81 by the mixer 310 and input to the left loudspeaker 1 for playback. Similarly, the front (or rear) right phase-modified height audio signal 304 is mixed with the front (or rear) right channel 82 by a mixer 320 and input to the right loudspeaker 2 for playback. Block 300 may be implemented with a set of filters, such as finite impulse response (FIR) filters or infinite impulse response (IIR) all-pass filters. The design of the IIR all-pass filters can be accomplished by an eigenfilter method. Examples of such embodiments are described further below.
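Before those filter-based examples, a simplified frequency-domain sketch may help illustrate what block 300 is meant to do: flip the relative phase of the two height signals inside the predominantly out-of-phase bands. This is not the FIR/all-pass design of the figures, and the band layout assumes the idealized IDP model discussed above:

```python
import numpy as np

def virtually_center(left, right, fs, f_d):
    """Rotate the two height signals by +90/-90 degrees inside the
    predominantly out-of-phase bands, shifting their relative phase there
    by 180 degrees. A product implementation would use FIR or IIR all-pass
    filters instead of this block FFT."""
    n = len(left)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    L, R = np.fft.rfft(left), np.fft.rfft(right)
    # For a pure delay of 1/f_d, the out-of-phase bands span
    # (0.25..0.75)*f_d, (1.25..1.75)*f_d, ... (e.g. 250-750 Hz for f_d = 1 kHz).
    frac = np.mod(freqs / f_d, 1.0)
    out_of_phase = (frac > 0.25) & (frac < 0.75)
    L[out_of_phase] *= np.exp(+1j * np.pi / 2)
    R[out_of_phase] *= np.exp(-1j * np.pi / 2)
    return np.fft.irfft(L, n), np.fft.irfft(R, n)
```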
Block 300 may be differently configured for front and rear loudspeaker pairs to account for different distances between a listener located at one or more listening positions and either the front or rear loudspeaker pairs that are symmetrically off-centered relative to the listener's position. For example, block 300 may be configured for a front passenger and/or driver based on a distance between the front passenger and/or driver and the front loudspeakers. Alternatively, the block 300 may be configured for rear passenger(s) according to the distance between one and/or both rear passengers and the rear loudspeakers.
Referring to fig. 2b, in an embodiment, the step of modifying 270 the relative phase between the two height audio signals may comprise measuring 272 (e.g., actively) the phase difference between two monaural audio signals emitted from the at least two loudspeakers, as a function of frequency, at one or more of the listening positions. For example, the measurement of the phase difference may be performed during an initial calibration phase of the method. Examples of how such measurements at one or more listening positions may be used to modify the relative phase difference between two audio channels are provided in U.S. patent No. 10,284,995 B2, which is incorporated herein by reference in its entirety. In the context of the present disclosure, the relative phase difference that is modified (e.g., reduced) is the phase difference between the two height audio signals (e.g., signals 92 and 94 in fig. 8). For example, in one embodiment, one or more sensors may be located at or near the listening position to measure such phase differences. For example, in an embodiment, such sensors may be embedded in the headrest of each seat of the vehicle at approximately the same height as the listener's head. The measurement may be performed during an initial calibration phase of the method or, alternatively, substantially in real time with the playback of audio.
Still referring to fig. 2b (step 274), additionally or alternatively, modifying 270 the relative phase between the two height audio signals may be based on a predetermined absolute distance between the one or more listening positions and each of the at least two loudspeakers. For example, the distance between one or more listening positions (e.g., any of the positions at seats 110, 120, 130, or 140 of fig. 3) and a pair of stereo loudspeakers may be predetermined by the characteristics of the environment (e.g., the interior design of the vehicle) and the loudspeaker mounting. The method of the present disclosure may use this predetermined information to obtain the phase difference. For example, in an embodiment, the step of modifying 270 the relative phase between the two height audio signals may involve accessing a predetermined phase difference. For example, the phase difference as a function of frequency may have been measured for one vehicle of a certain type and then stored in the memory of the on-board computing system of vehicles of the same type. The advantage of such offline calibration is that the vehicle does not need to be equipped with sensors for measuring the phase difference online. The predetermined phase difference may be stored as an analytical function or a look-up table (LUT), for example.
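As an illustration, a per-seat look-up table might be stored and queried as follows (all names and values here are hypothetical placeholders, not calibration data from the patent):

```python
import numpy as np

# Hypothetical offline calibration: IDP in degrees versus frequency in Hz,
# measured once per seat for this vehicle model and stored onboard.
PHASE_LUT = {
    "driver":          {"freq_hz": [250.0, 500.0, 750.0, 1000.0],
                        "idp_deg": [90.0, 180.0, 90.0, 0.0]},
    "front_passenger": {"freq_hz": [250.0, 500.0, 750.0, 1000.0],
                        "idp_deg": [-90.0, -180.0, -90.0, 0.0]},
}

def lookup_idp(seat: str, freqs_hz: np.ndarray) -> np.ndarray:
    """Interpolate the stored phase difference for the detected seat."""
    entry = PHASE_LUT[seat]
    return np.interp(freqs_hz, entry["freq_hz"], entry["idp_deg"])
```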
It can be seen that there is a relationship between the listener's distances to the left and right loudspeakers and the desired frequency response of block 300, which modifies the relative phase between signals 92 and 94 to obtain the phase-modified signals 302 and 304, respectively. As shown in EP1994795B1, the desired frequency response of block 300 is characterized by the frequency f_d whose wavelength equals the path-length difference between the left and right loudspeakers for an off-center listening position:

f_d = c / (d_L - d_R)

where d_L is the listener's distance to the left loudspeaker, d_R is the listener's distance to the right loudspeaker, and c is the speed of sound (all distances in meters). It can be seen that the sequence of alternating frequency bands that are predominantly in phase and predominantly out of phase repeats with a period of f_d, and thus the desired phase response of block 300 can be designed to have the same periodicity.
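For example, with assumed distances d_L = 1.343 m and d_R = 1.000 m (values chosen only to match the ~1 kHz period of the earlier example):

```python
C = 343.0                        # speed of sound, m/s
d_left, d_right = 1.343, 1.000   # assumed listener-to-loudspeaker distances, m
f_d = C / (d_left - d_right)
print(round(f_d))                # ~1000 Hz, i.e. the 1 kHz IDP period above
```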
In an embodiment, still referring to fig. 2b (step 276), the step of modifying 270 the relative phase between the two height audio signals (based on predetermined listener-to-loudspeaker distance information or based on actual measurements) may be triggered upon detection of movement of a listener located at one or more of the listening positions. For example, one or more sensors may be employed to detect movements of a listener. Such sensors may be located, for example, at the corresponding seats of the vehicle when used in the interior of a vehicle. The one or more sensors may be configured to detect the presence of a passenger or driver in the vehicle, enabling the processing method to use the correct distance information to obtain the phase difference.
In embodiments, the one or more seat sensors or a different set of sensors may be used to detect a new listening position, e.g., a new position of a listener's head (or a position of a listener's ear). For example, the driver or passenger may adjust his own seat horizontally and/or vertically to achieve a more comfortable seating position in the vehicle. In this embodiment, the method may acquire/obtain the phase difference according to the newly detected listening position. In this way, the correct distance information may be used depending on the new listening position, whether based on a set of correct predetermined listener-to-loudspeaker distance information or based on actual measurements. For example, if/when the predetermined phase difference is stored as an analysis function or a look-up table (LUT), different analysis functions or different LUTs may correspond to different (e.g., detected) seats or listening positions.
Fig. 9 schematically illustrates an example of a method of processing audio in an immersive format according to an embodiment of the disclosure. Fig. 9 differs from the example shown in fig. 8 in that the audio in the immersive audio format is assumed to include one height channel 85, two audio channels (e.g., a left audio channel 86 and a right audio channel 87), and an additional center audio channel 88. Two height audio signals 93 and 95 are obtained from the height channel 85 via a block 91. Block 91 may be identical to block 90 described with reference to fig. 8. Block 91 may be configured to derive the height audio signals 93 and 95 as copies of the height channel 85. However, if the immersive audio format has more than one height channel (e.g., two height audio channels), block 91 may be configured to derive the height audio signals 93 and 95 by passing the two height channels through (step 257 in fig. 2a). The height audio signals 93 and 95 are input to a block 301, which is functionally identical to block 300 described with reference to fig. 8, and phase-modified height audio signals 306 and 308 are derived therefrom.
In this example, each of the two phase-modified height audio signals 306 and 308 is mixed with a respective one of the two audio channels 86 and 87 (see the mixing 280 of fig. 2) to generate mixed audio signals 312 and 314, respectively. The mixed audio signals 312 and 314 are further mixed with the center audio channel 88 at, for example, mixers 330 and 340, respectively. The signals generated by the mixers 330 and 340 are output to the loudspeakers 3 and 4 for playback. This enables playback of the center channel of the immersive audio with a loudspeaker system that does not include a center loudspeaker (e.g., a stereo loudspeaker system).
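A minimal sketch of this downmix follows (the summation and the -3 dB center gain are illustrative assumptions, not gains specified by the patent):

```python
import numpy as np

def downmix_front(height_306, height_308, ch_86, ch_87, center_88,
                  center_gain=0.7071):
    """Sketch of the fig. 9 mixing chain: each phase-modified height signal
    is mixed with its corresponding audio channel, then the center channel
    is added to both sides (mixers 330/340) before output to loudspeakers
    3 and 4."""
    out_3 = height_306 + ch_86 + center_gain * center_88  # signal 312 + center
    out_4 = height_308 + ch_87 + center_gain * center_88  # signal 314 + center
    return out_3, out_4
```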
More generally, in an embodiment, the center audio channel of the audio may be mixed directly with each phase-modified height audio signal 306 and 308, for example before these are mixed with the audio channels 86 and 87 (see step 285 in fig. 2).
It should be appreciated that the examples of figs. 8 and 9 may be used interchangeably for the front and rear loudspeakers of a vehicle interior, to provide a perception of sound height for passengers and/or a driver located in the front or rear rows of the vehicle. It should also be appreciated that the examples of figs. 8 and 9 may be used interchangeably with front and rear loudspeakers in any listening environment other than a vehicle interior, as suitable for a particular implementation.
The example of fig. 8 may be used for a pair of (stereo) loudspeakers 1 and 2 located in the rear row of a vehicle to create a perception of sound height for a passenger located in the rear row of the vehicle. In this example, the height channel 80 may be a rear height channel and the channels 81 and 82 correspond to a rear left channel and a rear right channel, respectively. The height audio signals 92 and 94 derived from the rear height channel are used to virtually center the rear height channel 80, thereby creating the perception of sound height for passengers located in the rear row of the vehicle. Block 300 may be configured according to the distance between one or more rear passengers and the rear pair of loudspeakers 1 and 2.
Meanwhile, alternatively or additionally, the example of fig. 9 may be used for a pair of (stereo) loudspeakers 3 and 4 located in the front row of the same vehicle to create a perception of sound height for an occupant located in the front row of the vehicle. In this example, the height channel 85 may be a front height channel and the channels 86 and 87 correspond to a front left channel and a front right channel, respectively. The height audio signals 93 and 95 derived from the front height channel 85 are used to virtually center the front height channel 85, thereby creating the perception of sound height for the front passenger and/or driver. Block 301 may be configured according to the distance between the front passenger and/or driver and the front pair of loudspeakers 3 and 4. Thus, block 301 may be identical to block 300, but configured to operate with a different set of predetermined distances (e.g., a different set of analysis functions or LUTs) between the front passenger and/or driver and the front right and left loudspeakers 3, 4.
Alternatively, as explained above, block 301 may be configured to use actual measurements, at the front driver and/or front passenger positions, of the sound emitted from the front left and right loudspeakers 3, 4.
Alternatively, a single block similar to block 300 or 301 may be differently configured to operate with a different set of predetermined distances and/or actual measurements (e.g., a different set of analysis functions or LUTs) between the front and/or rear passengers and/or drivers and the respective front and/or rear left and right loudspeakers.
Furthermore, combining the audio processing methods of figs. 8 and 9 in the same vehicle (as in the examples explained above) enables playback of 5.1.2 audio. Using the method/system of fig. 8, the rear left and right channels and the rear height channel are played back with the rear left and right loudspeakers. The front left channel, front right channel, center channel and front height channel are played back using the method/system of fig. 9.
However, the examples of combining the methods/systems described above with reference to figs. 8 and 9 in a vehicle are not limiting. For example, the exemplary methods/systems of fig. 8 or 9 may be used to play back audio in different types of immersive audio formats to create a perception of sound height for any front driver and/or front/rear passengers in a vehicle.
Fig. 10 schematically shows an example of a method of obtaining two height audio signals from two height audio channels. In this example, it is assumed that the (e.g., rendered) audio includes two (instead of one) height channels 83 and 84, and two height audio signals 96 and 97 are obtained from the height channels 83 and 84.
However, the audio may include any number (e.g., more than two) of height channels, as appropriate for the particular implementation.
When there is more than one height channel, the height channels may differ from each other, in which case the perception of sound height may be diminished even when the height channels are "virtually centered", as explained above. So that a listener in the vehicle can still perceive the sound height/elevation of the height channels (e.g., to the extent appropriate for a particular implementation), the height channels may be processed such that two more similar, or even identical, signals can be used as inputs to the "virtual centering algorithm". Fig. 10 shows an example of such a procedure.
Block 98 includes units 102 and 104 and optional units 103 and 105. Each unit is configured to change the audio level of the audio signal to which the corresponding unit is applied. For example, a unit may be configured to apply a gain or an attenuation to the audio signal to which it is applied.
For further explanation, the audio level of the height channel 83 may be changed by unit 102. The signal with the correspondingly changed audio level at the output of unit 102 may be mixed with height channel 84. The audio level of the mixed signal may optionally be changed by unit 105 to generate the height audio signal 97.
Similarly, the audio level of height channel 84 may be changed by unit 104 and mixed with height channel 83. The audio level of the mixed signal is optionally changed by unit 103 to generate the height audio signal 96. The similarity between the height audio signals 96 and 97 (e.g., similarity in audio level) is adjusted by units 102 and 104. Optionally, units 103 and 105 are applied after mixing to maintain a constant signal power level before and after the mixing. The use of optional units 103 and 105 may prevent the resulting height audio signals 96 and 97 from being louder than intended. In particular, the use of optional units 103 and 105 may prevent the resulting height audio signals 96 and 97 from being louder than the other channels of the audio (e.g., the surround sound channels).
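A sketch of block 98 under these assumptions follows (the cross-mix and normalization gains are illustrative choices, not values from the patent):

```python
import numpy as np

def blend_height_channels(h_83, h_84, cross_gain=0.5):
    """Make the two height signals more similar before virtual centering.

    Units 102/104 scale each height channel by cross_gain before mixing it
    into the other channel; the optional units 103/105 are modeled here as
    a simple normalization that keeps the level roughly constant."""
    norm = 1.0 / (1.0 + cross_gain)                 # optional units 103/105
    sig_96 = norm * (h_83 + cross_gain * h_84)      # unit 104, then unit 103
    sig_97 = norm * (h_84 + cross_gain * h_83)      # unit 102, then unit 105
    return sig_96, sig_97
```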
It should be appreciated that block 98 may be used in place of block 90 or block 91 in figs. 8 and 9 to handle more than one height channel. It should also be appreciated that, in the example of a vehicle, the two height channels may be front or rear height channels, and audio having four height channels may thus be played back with a pair of front stereo loudspeakers and a pair of rear stereo loudspeakers. Thus, audio in, for example, a 5.1.4 immersive audio format can be played back with a simple stereo loudspeaker system. For example, the method/system of fig. 8 may be used to process the two rear height channels for the rear loudspeakers and the rear passengers. Similarly, the method/system of fig. 9 may be used to process the two front height channels for the front loudspeakers and the driver and/or front passenger.
It should also be appreciated that the two height channels (when present) may be input directly to the "virtual centering algorithm" without additional processing. For example, the two height channels may be substantially similar to each other (monaural), in which case no additional processing may be required.
Fig. 11 schematically shows another example of a method of obtaining height audio signals from two height audio channels. As in the example described with reference to fig. 10, it is also assumed here that the (e.g., rendered) audio includes two (instead of one) height channels 83 and 84. The height channels 83 and 84 are processed by a mid/side processing block 99 to obtain height audio signals 101 and 102 (see step 242 in fig. 2a). The height audio signal 101 is the mid/center signal of the height channels 83 and 84. The height audio signal 102 is the side signal of the height channels 83 and 84. The mid/side processing block 99 may be implemented in any manner suitable for the particular implementation. In the example of fig. 11, the mid/side processing block 99 includes attenuation units 106 and 108 configured to attenuate the height channels 83 and 84 by half. The mid/side processing block 99 further includes a negation element 107, configured to apply a gain equal to -1. The height channels 83 and 84 processed by the attenuation units 106 and 108 are mixed at a mixer 350 to obtain the mid signal 101, namely:

S101 = (S83 + S84) / 2,

where S83 and S84 are the signals of the height channels 83 and 84, and S101 is the height audio signal (mid signal) input to the "virtual centering algorithm" block 302.
The mid signal of the mid/side processing typically contains the sounds that are the same in the processed height channels. In this way, the sounds common to the height channels 83 and 84 can be input to the "virtual centering algorithm" block 302.
The sounds that differ between channels 83 and 84 are represented by the side signal 102:

S102 = (S83 - S84) / 2,

where S83 and S84 are the signals of the height channels 83 and 84, and S102 is the height audio signal (side signal) that is not input to the "virtual centering algorithm" block 302.
The side signals of the height channels 83 and 84 are mixed with the phase-modified signals 305 and 307 and the audio channels 81 and 82, and are then output to the loudspeakers 1 and 2. The method of fig. 11 further comprises a negation element 109 for inverting the side signal S102 to obtain the side signal 111 (which is equal to the side signal S102 but opposite in phase) prior to mixing with the audio channel 82 and the phase-modified signal 307 (see step 244 of fig. 2a). Thus, the side signal S102 is mixed back with the "virtually centered" mid signal S101 to recover the original height channel signals while providing an enhanced perceived sound height. A minimal sketch of this mid/side processing is given below.
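A minimal Python sketch of the mid/side processing of block 99 and the subsequent recombination follows. The function names are assumptions for illustration, and the phase-modified mid signals s305/s307 are assumed to come from the "virtual centering algorithm" block 302, which is not reproduced here.

    import numpy as np

    def mid_side(s83, s84):
        """Block 99: attenuation units 106/108 scale by 0.5; negation
        element 107 supplies the -1 gain used for the side signal."""
        s101 = 0.5 * (s83 + s84)   # mid signal (mixer 350): common sound
        s102 = 0.5 * (s83 - s84)   # side signal: differing sound
        return s101, s102

    def loudspeaker_feeds(ch81, ch82, s305, s307, s102):
        """Mix the side signal back with the phase-modified mid signals
        and the audio channels 81/82 to form the feeds for loudspeakers
        1 and 2; negation element 109 produces side signal 111."""
        s111 = -s102                  # equal to s102 but opposite in phase
        feed_1 = ch81 + s305 + s102   # loudspeaker 1
        feed_2 = ch82 + s307 + s111   # loudspeaker 2
        return feed_1, feed_2

Because mid + side = S83 and mid - side = S84, mixing the side signals back in this way recovers the original height channel content while the mid signal carries the virtual-centering phase modification.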
It should be appreciated that the height channels 83 and 84 may be left and right height channels, respectively, as explained with reference to fig. 10. More specifically, in a vehicle, the height channels 83 and 84 may be the front-left and front-right height channels or the rear-left and rear-right height channels, respectively. Similarly, loudspeakers 1 and 2 may be left and right stereo loudspeakers. More specifically, in a vehicle, the loudspeakers 1 and 2 may be the front-left and front-right stereo loudspeakers or the rear-left and rear-right stereo loudspeakers. Although not shown in fig. 11, when present, the center channel may be mixed with the phase-modified height audio signal 305, the side signal 102 and the audio channel 81, as shown in fig. 9. The center channel may likewise be mixed with the phase-modified height audio signal 307, the phase-inverted side signal 111 and the audio channel 82.
In the examples of figs. 8, 9 and 11, the number of channels used for playback is smaller than the number of channels of the input audio in the immersive audio format. The channels of the input audio in the immersive audio format are thus downmixed into the channels used for playback (the loudspeaker feeds).
Fig. 12a shows a functional schematic block diagram of a possible prior-art FIR-based implementation applied to one of the two height channels (in this case the left height channel).
Fig. 12b shows a functional schematic block diagram of a possible prior-art FIR-based implementation applied to one of the two height channels (in this case the right height channel).
As explained above, the IDP phase compensation of an arrangement as in the example of fig. 7a may be implemented using Finite Impulse Response (FIR), linear-phase digital filters or filter functions. Such filters or filter functions can be designed to achieve predictable and controlled phase and amplitude responses. Figs. 12a and 12b show block diagrams of possible FIR-based implementations, each applied to one of the two height audio signals. Both of these FIR-based implementations are described in EP1994795B1, which is incorporated herein by reference in its entirety.
In the example of fig. 12a, two complementary comb-filtered signals are created (at 703 and 709) which, if added together, would have a substantially flat amplitude response. Fig. 13a shows the comb filter response of a band-pass filter or filter function ("BP filter") 702. Such a response may be obtained with one or more filters or filter functions.
Fig. 13b shows the effective comb filter response resulting from the arrangement, shown in fig. 12a, of the BP filter 702, a time delay or delay function ("delay") 704 and a subtraction combiner 708. The BP filter 702 and the delay 704 may have substantially the same delay characteristics, so that the two comb filter responses are substantially complementary (see figs. 13a and 13b). One of the comb-filtered signals is subjected to a 90-degree phase shift to impart the desired phase adjustment in the desired frequency bands. Although either of the two comb-filtered signals may be shifted by 90 degrees, in the example of fig. 12a the signal at 709 is phase shifted. The choice of which signal to shift affects the corresponding choice in the related processing shown in the example of fig. 12b, so that the desired total shift between the channels is obtained. The use of linear-phase FIR filters allows the two comb-filtered signals (703 and 709) to be created economically, using one or more filters that select only one set of frequency bands, as in the example of fig. 13a. The delay through the BP filter 702 may be constant with frequency. This allows the complementary signal to be created by delaying the original signal by the same amount of time as the group delay of the FIR BP filter 702 and subtracting the filtered signal from the delayed original signal (in the subtraction combiner 708, as shown in fig. 12a). Any frequency-invariant delay imparted by the 90-degree phase-shift process should be applied to the non-phase-adjusted signal before the two signals are added together, again to ensure a flat response. A minimal sketch of the complementary comb filtering is given below.
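The following Python sketch shows one way the complementary comb filtering (BP filter 702, delay 704, subtraction combiner 708) could be realized with SciPy. The band edges and tap count are illustrative assumptions; a production design would choose the pass bands according to the measured IDP.

    import numpy as np
    from scipy.signal import firwin2, lfilter

    def complementary_comb(x, fs, taps=1023):
        """Create two complementary comb-filtered signals (703 and 709).
        A linear-phase FIR bandpass 'comb' selects one set of frequency
        bands; delaying the input by the FIR group delay and subtracting
        the filtered signal yields the complement."""
        # Illustrative comb passing 250-750 Hz and 1250-1750 Hz only.
        freqs = [0, 250, 250, 750, 750, 1250, 1250, 1750, 1750, fs / 2]
        gains = [0,   0,   1,   1,   0,    0,    1,    1,    0,      0]
        bp = firwin2(taps, freqs, gains, fs=fs)    # BP filter 702 (linear phase)
        group_delay = (taps - 1) // 2              # constant with frequency
        filtered = lfilter(bp, [1.0], x)           # signal 709
        # Delay 704: delay the original by the FIR group delay.
        delayed = np.concatenate([np.zeros(group_delay), x])[: len(x)]
        complement = delayed - filtered            # subtraction combiner 708 -> 703
        return complement, filtered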
The filtered signal 709 is passed through a wideband 90-degree phase shifter or phase-shift process ("90-degree phase shift") 710 to produce a signal 711. The signal 703 is delayed by a delay or delay function 712 having substantially the same delay characteristics as the 90-degree phase shift 710 to produce a signal 713. The 90-degree phase-shifted signal 711 and the delayed signal 713 are input to an additive summer or summing function 714 to produce an output signal 715. The 90-degree phase shift may be implemented using any of various known methods, such as a Hilbert transform. The output signal 715 has essentially unity gain, with only a very narrow -3 dB dip at the frequencies corresponding to the transition points between the unmodified frequency bands and the phase-shifted frequency bands, but has a frequency-varying phase response, as shown in fig. 13c. A minimal sketch of this branch is given below.
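The 90-degree branch can be sketched with an FFT-based Hilbert transform. Such a transform is zero-delay, so the explicit matching delay 712 of an FIR phase shifter is not needed in this simplified form; the sign argument is an assumption used to obtain the +90 and -90 degree variants.

    import numpy as np
    from scipy.signal import hilbert

    def shift_90_and_sum(comb_complement, comb_filtered, sign=+1):
        """90-degree phase shift 710 applied to the comb-filtered branch,
        then additive summer 714. sign=+1 yields a +90 degree shift in the
        pass bands (fig. 13c); sign=-1 yields -90 degrees (fig. 13d)."""
        analytic = hilbert(comb_filtered)
        # imag(analytic) is the input shifted by -90 degrees, so it is
        # negated to obtain a +90 degree shift.
        shifted = -sign * np.imag(analytic)   # signal 711 (left) / right-channel analog
        return comb_complement + shifted      # output signal 715 / 735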
Fig. 12b shows a possible prior-art FIR-based implementation applied to the right height channel. The block diagram is similar to that for the left height channel in fig. 12a, except that the delayed signal (in this case signal 727) is subtracted from the filtered signal (in this case signal 723), rather than vice versa. The final output signal 735 has substantially unity gain but has a negative 90-degree phase shift in the phase-shifted frequency bands, as shown in fig. 13d (as compared to the positive 90 degrees for the left channel, shown in fig. 13c).
The relative phase difference between the two output signals 715 and 735 (the phase-modified height audio signals) is shown in fig. 13e. The phase difference shows the combined 180-degree phase shift for each frequency band that is mainly out of phase at each listening position. Thus, the out-of-phase frequency bands become predominantly in phase at the listening positions. Fig. 13e shows that the relative phase of the two height audio signals has been modified by adding a 180-degree shift to the relative phase between the two height audio signals for each frequency band in which the phase differences are mainly out of phase (e.g., in the frequency bands 250-750 Hz, 1250-1750 Hz, etc.). This corresponds to shifting the phase of one of the two height audio signals by +90 degrees and the phase of the other height audio signal by -90 degrees in the frequency bands in which the phase differences are mainly out of phase (see figs. 13c and 13d). The resulting corrected IDPs for the left and right listening positions (shown in fig. 7a) are shown in figs. 13f and 13g. The resulting IDP is, as desired, within plus/minus 90 degrees for the left and right listening positions.
Thus, once the FIR processing of figs. 12a and 12b is applied to the two height audio signals, the resulting IDP observed at the listening positions is ideally within plus/minus 90 degrees for both listeners at the respective listening positions (e.g., in the same row of the vehicle, as shown in fig. 7a). The sketch below illustrates the resulting relative phase.
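Using the two sketches above, the 180-degree relative shift in an out-of-phase band can be checked numerically. The 500 Hz probe tone and the modelling of the right channel as a simple sign flip of the 90-degree branch are simplifying assumptions.

    import numpy as np

    fs = 48000
    t = np.arange(fs) / fs
    probe = np.sin(2 * np.pi * 500 * t)   # 500 Hz lies in the 250-750 Hz band

    comp, filt = complementary_comb(probe, fs)
    left_out = shift_90_and_sum(comp, filt, sign=+1)    # fig. 12a branch (+90 deg)
    right_out = shift_90_and_sum(comp, filt, sign=-1)   # fig. 12b branch (-90 deg)

    # Relative phase at the 500 Hz FFT bin (1 Hz resolution for 1 s at fs).
    L, R = np.fft.rfft(left_out), np.fft.rfft(right_out)
    rel = np.degrees(np.angle(L[500] * np.conj(R[500])))
    print(f"relative phase at 500 Hz: {abs(rel):.0f} degrees")  # approximately 180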
Example computing device
A method of processing audio in an immersive audio format comprising at least one height audio channel, for playback of the audio with a non-immersive loudspeaker system of at least two audio loudspeakers in a listening environment comprising one or more listening positions, has been described. Additionally, the present disclosure also relates to an apparatus for performing these methods. Furthermore, the present disclosure relates to a vehicle that may include such an apparatus. An example of such an apparatus 1400 is schematically illustrated in fig. 14. The apparatus 1400 may include a processor 1410 (e.g., a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), one or more Application Specific Integrated Circuits (ASICs), one or more Radio Frequency Integrated Circuits (RFICs), or any combination of these) and a memory 1420 coupled to the processor 1410. Memory 1420 may, for example, store an analysis function (or set of analysis functions) or a look-up table (or set of look-up tables) representing, for example, the phase differences of the two height audio signals for different listening positions and/or listening environments. The processor may be configured to perform some or all of the steps of the methods described throughout this disclosure, for example by retrieving the set of analysis functions and/or look-up tables (LUTs) from memory 1420. To perform the method of processing audio, the apparatus 1400 may receive as input the channels of (e.g., rendered) audio in an immersive audio format (e.g., a height channel and one or more front or surround sound audio channels 1425). The apparatus 1400 may then output two or more phase-modified audio signals 1430 for playback of the audio on a non-immersive loudspeaker system. A minimal sketch of such a look-up table is given below.
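As one illustration of what memory 1420 might hold, the sketch below models a look-up table of out-of-phase frequency bands keyed by listening environment and position. All names and band values are hypothetical.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class PhaseLUT:
        """Hypothetical look-up table entry stored in memory 1420: the
        frequency bands that are mainly out of phase for one listening
        position in one listening environment."""
        environment: str
        listening_position: str
        out_of_phase_bands_hz: tuple   # ((low, high), ...)

    # Illustrative table; real values would be measured or derived from
    # the loudspeaker-to-listener distances (see fig. 7a).
    LUTS = {
        ("sedan", "front-left"): PhaseLUT(
            "sedan", "front-left", ((250, 750), (1250, 1750))),
    }

    def bands_for(environment: str, position: str):
        return LUTS[(environment, position)].out_of_phase_bands_hz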
Apparatus 1400 may be a server computer, a client computer, a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a smart phone, a web appliance, a network router, switch or bridge, or any machine capable of executing instructions (sequential or otherwise) that specify actions to be taken by that apparatus. Further, while only a single apparatus 1400 is illustrated in fig. 14, the present disclosure also encompasses any collection of apparatuses that individually or jointly execute instructions to perform any one or more of the methods discussed herein.
The present disclosure further relates to a program (e.g., a computer program) comprising instructions which, when executed by a processor, cause the processor to perform some or all of the steps of the methods described herein.
Still further, the present disclosure relates to a computer-readable (or machine-readable) storage medium storing the foregoing program. The term "computer readable storage medium" as used herein includes, but is not limited to, data repositories in the form of, for example, solid state memory, optical media, and magnetic media.
The embodiments described herein may be implemented in hardware, software, firmware, and combinations thereof. For example, embodiments may be implemented on a system (e.g., a computer system) that includes electronic circuitry and components. Examples of computer systems include desktop computer systems, portable computer systems (e.g., laptop computers), handheld devices (e.g., smart phones or tablet computers), and networking devices. A system for implementing the embodiments may include, for example, at least one of an Integrated Circuit (IC), a Programmable Logic Device (PLD) (such as a Field Programmable Gate Array (FPGA)), a Digital Signal Processor (DSP), an Application Specific IC (ASIC), a Central Processing Unit (CPU), and a Graphics Processing Unit (GPU).
Certain implementations of the embodiments described herein may include a computer program product comprising instructions that, when executed by a data processing system, cause the data processing system to perform the method of any of the embodiments described herein. The computer program product may comprise a non-transitory medium in which the instructions are stored, such as a physical medium: a magnetic data storage medium (including floppy disks and hard disk drives), an optical data storage medium (including CD-ROMs and DVDs), or an electronic data storage medium (including ROMs and flash memory, such as flash RAM or USB flash drives). In another example, the computer program product comprises a data stream containing the instructions, or a file containing the instructions stored in a distributed computing system (e.g., in one or more data centers).
The present disclosure is not limited to the above embodiments and examples. Many modifications and variations may be made without departing from the scope of the present disclosure, as defined by the following claims.
Aspects of the invention may be understood from the enumerated example embodiments (EEEs) below:
EEE 1. A method (200) of processing audio in an immersive audio format including at least one height audio channel for playback of the processed audio with a non-immersive loudspeaker system having at least two audio loudspeakers in a listening environment including one or more listening positions, wherein each of the one or more listening positions is symmetrically off-centered with respect to the at least two loudspeakers and each of the at least two loudspeakers is laterally spaced apart with respect to each of the one or more listening positions such that, when two monaural audio signals emanate from the at least two loudspeakers, a phase difference occurs at the one or more listening positions due to acoustic characteristics of the listening environment, the method comprising:
obtaining (250) two height audio signals from at least a portion of the at least one height audio channel;
modifying (270) the relative phase between the two height audio signals in frequency bands in which the phase differences are predominantly out of phase, to obtain two phase-modified height audio signals for which the phase differences are predominantly in phase; and
playing back (290) the processed audio with the at least two audio loudspeakers, wherein the processed audio comprises the two phase-modified height audio signals.
EEE 2. The method (200) of EEE 1, wherein the audio in the immersive audio format further comprises at least two audio channels, and wherein the method further comprises mixing (280) each of the two phase-modified height audio signals with each of the two audio channels.
EEE 3. The method of EEE 1 or EEE 2, wherein the audio in the immersive audio format further comprises a center channel, and wherein the method further comprises mixing (285) each of the two phase-modified height audio signals with the center channel.
EEE 4. The method of any of the preceding EEEs, wherein the audio in the immersive audio format has a single height audio channel, and wherein obtaining (250) the two height audio signals comprises obtaining (255) two identical height audio signals each corresponding to the single height audio channel.
EEE 5. The method of any of the preceding EEEs, wherein the audio in the immersive audio format comprises at least two height audio channels, and wherein obtaining (250) the two height audio signals comprises obtaining (240) two identical height audio signals from the at least two height audio channels.
EEE 6. The method of EEE 5, further comprising applying (242) a mid/side process to the at least two height audio channels to obtain a mid signal and a side signal, wherein each of the two height audio signals corresponds to the mid signal.
EEE 7. The method of EEE 6, further comprising mixing (244) the side signal, and a signal corresponding to the side signal but having an opposite phase to the side signal, with the phase-modified height audio signals.
EEE 8. The method of any of the preceding EEEs, wherein modifying (270) the relative phase between the two height audio signals comprises measuring (275) the phase differences at the one or more listening positions.
EEE 9. The method of any of the preceding EEEs, wherein modifying (270) the relative phase between the two height audio signals is based on a predetermined absolute distance between the one or more listening positions and each of the at least two loudspeakers.
EEE 10. The method of any of the preceding EEEs, wherein the step of modifying (270) the relative phase between the two height audio signals is triggered upon detection of movement of a listener at the one or more listening positions.
EEE 11. The method of any of the preceding EEEs, wherein the listening environment is a vehicle interior.
EEE 12. The method of any of the preceding EEEs, wherein the non-immersive loudspeaker system is a stereo or surround sound loudspeaker system.
EEE 13. The method of any of the preceding EEEs, wherein the audio in the immersive audio format is audio rendered in an immersive audio format.
EEE 14. The method of any of the preceding EEEs, wherein the immersive audio format is Dolby Atmos or any X.Y.Z audio format, wherein X ≥ 2 is the number of front or surround sound audio channels, Y ≥ 0, if present, is a low frequency effects or subwoofer audio channel, and Z ≥ 1 is the at least one height audio channel.
EEE 15. The method of any of the preceding EEEs, wherein the modifying (270) adds a 180-degree phase shift to the relative phase between the two height audio signals for each frequency band whose phase differences are mainly out of phase.
EEE 16. The method of EEE 15, wherein the phase of one of the two height audio signals is shifted by +90 degrees and the phase of the other of the two height audio signals is shifted by -90 degrees.
EEE 17. An apparatus comprising a processor and a memory coupled to the processor, wherein the processor is configured to perform the method according to any one of the preceding EEEs.
EEE 18. A vehicle comprising an apparatus according to EEE 17.
EEE 19. A program comprising instructions which, when executed by a processor, cause the processor to perform the method according to any one of EEEs 1 to 16.
EEE 20. A computer-readable storage medium storing a program according to EEE 19.

Claims (20)

1. A method (200) of processing audio in an immersive audio format including at least one height audio channel for playback of processed audio without an overhead loudspeaker with a non-immersive loudspeaker system having at least two audio loudspeakers in a listening environment including one or more listening positions, wherein each of the one or more listening positions is symmetrically off-centered with respect to the at least two loudspeakers and each of the at least two loudspeakers is laterally spaced with respect to each of the one or more listening positions such that, when two monaural audio signals emanate from the at least two loudspeakers, an inter-loudspeaker differential phase, IDP, occurs at the one or more listening positions due to acoustic characteristics of the listening environment, the method comprising:
obtaining (250) two monaural height audio signals from at least a portion of the at least one height audio channel;
modifying (270) the relative phase between the two monaural height audio signals in a frequency band in which an out-of-phase IDP occurs at the one or more listening positions when the two monaural height audio signals emanate from the at least two loudspeakers, to obtain two phase-modified height audio signals for which the IDP is in phase; and
playing back (290) the processed audio with the at least two audio loudspeakers, wherein the processed audio comprises the two phase-modified height audio signals.
2. The method (200) of claim 1, wherein the audio in the immersive audio format further comprises at least two audio channels, and wherein the method further comprises mixing (280) each of the two phase-modified height audio signals with one of the two audio channels.
3. The method of claim 1 or 2, wherein the audio in the immersive audio format further comprises a center channel, and wherein the method further comprises mixing (285) each of the two phase-modified height audio signals with the center channel.
4. The method of any of the preceding claims, wherein the audio in the immersive audio format has a single height audio channel, and wherein obtaining (250) the two monaural height audio signals comprises obtaining (255) the two monaural height audio signals that each correspond to the single height audio channel.
5. The method of any of the preceding claims, wherein the audio in the immersive audio format comprises at least two height audio channels, and wherein obtaining (250) the two monaural height audio signals comprises obtaining (240) the two monaural height audio signals from the at least two height audio channels.
6. The method of claim 5, further comprising applying (242) a mid/side process to the at least two height audio channels to obtain a mid signal and a side signal, wherein each of the two height audio signals corresponds to the mid signal.
7. The method of claim 6, further comprising mixing (244) the side signal, and a signal corresponding to the side signal but opposite in phase to the side signal, with the phase-modified height audio signals.
8. The method of any of the preceding claims, wherein modifying (270) the relative phase between the two height audio signals comprises measuring (275) the IDP at the one or more listening positions.
9. The method of any of the preceding claims, wherein modifying (270) the relative phase between the two height audio signals is based on a predetermined absolute distance between the one or more listening positions and each of the at least two loudspeakers.
10. The method of any of the preceding claims, wherein the step of modifying (270) the relative phase between the two height audio signals is triggered upon detecting a movement of a listener at the one or more listening positions.
11. The method of any of the preceding claims, wherein the listening environment is a vehicle interior.
12. The method of any of the preceding claims, wherein the non-immersive loudspeaker system is a stereo or surround sound loudspeaker system.
13. The method of any of the preceding claims, wherein the audio in the immersive audio format is audio rendered in the immersive audio format, and/or wherein the immersive audio format is Dolby Atmos or any X.Y.Z audio format, wherein X ≥ 2 is the number of front or surround sound audio channels, Y ≥ 0, if present, is a low frequency effects or subwoofer audio channel, and Z ≥ 1 is the at least one height audio channel.
14. The method of any of the preceding claims, wherein the modifying (270) adds a 180-degree phase shift to the relative phase between the two height audio signals for each frequency band in which the IDP is out of phase.
15. The method of claim 14, wherein the phase of one of the two height audio signals is shifted by +90 degrees and the phase of the other of the two height audio signals is shifted by -90 degrees.
16. The method of claim 15, wherein the phase of one of the two height audio signals is shifted by +90 degrees and the phase of the other of the two height audio signals is shifted by -90 degrees.
17. An apparatus comprising a processor and a memory coupled to the processor, wherein the processor is configured to perform the method of any of the preceding claims.
18. A vehicle comprising the apparatus of claim 17.
19. A program comprising instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 16.
20. A computer-readable storage medium storing the program according to claim 19.