CN116866817A - Device and method for presenting spatial audio content

Publication number: CN116866817A
Application number: CN202310634369.7A
Authority: CN (China)
Prior art keywords: sound, visual, spatial audio, focus position, visual focus
Legal status: Pending (an assumption, not a legal conclusion)
Original language: Chinese (zh)
Inventors: 张胜 (Zhang Sheng), 卢明辉 (Lu Minghui)
Applicant and assignee (original and current): Suzhou Acoustic Industrial Technology Research Institute Co., Ltd.
Filed: 2023-05-31; Published: 2023-10-10

Classifications

    • H (Electricity) > H04 (Electric communication technique) > H04S (Stereophonic systems)
    • H04S 7/00 Indicating arrangements; control arrangements, e.g. balance control
    • H04S 7/30 Control circuits for electronic adaptation of the sound field
    • H04S 7/305 Electronic adaptation of stereophonic audio signals to reverberation of the listening space
    • H04S 3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S 2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2400/01 Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved

Abstract

The present invention relates to a presentation device for spatial audio content, for presenting sound material, comprising: a visual attention point determining unit for obtaining the visual focus position at which the listener gazes; and a processing unit in signal connection with the visual attention point determining unit to receive the visual focus position information, the processing unit applying a gain to sound sources near the visual focus position or executing a spatial audio algorithm with the visual focus position as the sound source position. There is also provided a method of presenting spatial audio content, for presenting sound material, comprising: S1, acquiring the visual focus position of the listener; and S2, applying a gain to the sound source near the visual focus position or executing a spatial audio algorithm with the visual focus position as the sound source position. Using the visual attention point as sound source position data to boost the sound source or to drive the spatial audio algorithm increases the sense of immersion.

Description

Device and method for presenting spatial audio content
Technical Field
The invention belongs to the field of acoustics and relates to audio algorithms, in particular to the presentation of spatial audio content.
Background
Spatial audio (or 3D audio) technology has long been a popular research topic in the multimedia field. Since the late 1980s, the implementation and development of microphone-array acquisition technology has driven the development of physical sound field reconstruction, the most notable results being two major techniques: sound reproduction based on spherical harmonic decomposition, and wave field synthesis (WFS). Both realize sound field reconstruction by reconstructing the physical sound field, i.e., they pursue sound pressure in the reconstruction space consistent with that of the original field.
In the late 1990s, researchers found that if the perceptual characteristics of the auditory system are taken into account, strict consistency between the reconstructed and original sound fields need not be pursued during reconstruction for people to obtain a good sense of immersion and localization; on this basis, research on perception-based virtual sound scene reconstruction began. Compared with physical sound field reconstruction, perception-based virtual sound scene reconstruction places far lower demands on the playback environment and equipment and is easier to apply in practice, so it has received wide attention from academia and industry. Its most widely applied techniques are amplitude panning (AP) and binaural reconstruction based on head-related transfer functions (HRTF). Amplitude panning, which dates to 1961, is a virtual sound image control technique that controls the position of the sound image perceived by the human ear by adjusting the amplitudes of the signals distributed to the loudspeakers. With amplitude panning, a virtual sound image within the sector in front of the listening point can be reconstructed by amplitude modulation of the left and right channel signals of stereo equipment, providing a degree of sound image positioning and environment rendering capability. HRTF-based binaural reconstruction uses experimentally measured HRTFs to simulate the transmission of a spatial sound image to the two ears; on playback it can achieve perceptual reconstruction of a virtual sound image with only a two-channel headphone, providing technical support for binaural 3D audio in mobile environments.
Amplitude panning is an efficient sound image control technique that controls the position of the sound image perceived by the human ear by adjusting the amplitudes of the signals distributed to the loudspeakers. In a stereo system, the distribution of the two loudspeaker signals is modulated according to the angle between the sound source and the direction straight ahead of the center point.
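As a concrete illustration of this panning law, the following minimal Python/NumPy sketch computes constant-power gains for a symmetric stereo pair using the classical tangent law; the function name, the +/-30 degree default speaker angle and the normalization are illustrative assumptions of ours, not details taken from the patent:

    import numpy as np

    def stereo_pan_gains(source_deg, speaker_deg=30.0):
        """Tangent-law gains for a stereo pair at +/-speaker_deg.

        source_deg: source azimuth relative to straight ahead
        (positive toward the left loudspeaker).
        Returns (g_left, g_right), constant-power normalized.
        """
        ratio = np.tan(np.radians(source_deg)) / np.tan(np.radians(speaker_deg))
        ratio = np.clip(ratio, -1.0, 1.0)   # keep the image between the speakers
        g_left, g_right = (1.0 + ratio) / 2.0, (1.0 - ratio) / 2.0
        norm = np.hypot(g_left, g_right)
        return g_left / norm, g_right / norm

A source at 0 degrees yields equal gains (a centered image), while +/-30 degrees routes the signal entirely to one loudspeaker.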
For restoring the sound source direction in three-dimensional space, Ville Pulkki of Helsinki University of Technology in Finland proposed vector base amplitude panning (VBAP) in 1997. VBAP synthesizes the unit vector of the virtual sound source from the unit vectors of the positions of two or three loudspeakers, thereby reconstructing the source direction. As shown in Fig. 1, assuming the sound source and three loudspeakers lie on the same spherical surface, the unit vectors of the three loudspeakers are taken as basis vectors, and the direction of the virtual sound source is obtained as their linear combination.
Let the unit vectors of the three loudspeakers be $l_1, l_2, l_3$, the signal gain of each loudspeaker be $g_1, g_2, g_3$, and the sound source unit vector be $l_0$. Then $l_0$ can be expressed as:

$$l_0 = g_1 l_1 + g_2 l_2 + g_3 l_3$$

Writing $L = [l_1\; l_2\; l_3]$ for the matrix whose columns are the loudspeaker vectors and $g = [g_1, g_2, g_3]^T$, the gains can then be solved as:

$$g = L^{-1} l_0$$
when the loudspeakers are on the same spherical surface, the VBAP model is simple and efficient in calculation, and the direction recovery of sound below 500-600 Hz is accurate. The VBAP technique can be further extended to aspheric situations in combination with the sound energy decay law, or control of sound time delay.
Perceptual reconstruction techniques and systems for virtual sound images have achieved great commercial success. A two-channel stereo system can control the sound image position by adjusting the gains of the two loudspeaker signals, creating for the listener the auditory perception of a sound image moving within the sector in front of the listening point. Multi-channel surround systems, typified by Dolby surround and DTS surround, extend the range of motion of the sound image from a sector to a full circle. Among them, Dolby surround systems have been widely used in cinemas and home environments thanks to their excellent sound rendering. The 5.1 multi-channel surround system developed by Dolby Laboratories has become a standard for home theater systems; Dolby Surround 7.1 strengthens the sense of envelopment by adding left-rear and right-rear surround channels. At the 2009 International Consumer Electronics Show, Dolby demonstrated its new Dolby Pro Logic IIz technology, which upgrades 7.1 channels to 9.1 by adding two front height channels, giving the system a degree of sound image height perception.
In a multi-channel surround system the loudspeakers are placed in the same horizontal plane and cannot produce elevated sound images outside that plane. To extend the listener's perception of sound images from the horizontal plane to real three-dimensional space, audio research moved to the three-dimensional level, giving listeners three-dimensional audio through multi-layer loudspeaker arrangements. In 2005 the Japanese NHK laboratory formally proposed a 22.2 multichannel prototype system, which Japan adopted as the three-dimensional audio standard for next-generation ultra-high-definition television. As shown in Fig. 2, the loudspeaker array is divided into three layers to realize a three-dimensional immersive effect; under the combined action of the three layers, three-dimensional sound images are reconstructed around the listener, overcoming the weakness of surround sound in height perception. In the arrangement shown, the middle layer has 10 loudspeakers, the upper layer 9, and the lower layer, in front of the screen, 5 (2 of which are low-frequency loudspeakers).
The foregoing outlines the development of three-dimensional audio. Current three-dimensional audio based on amplitude panning is usually produced in a recording studio according to the experience of the sound engineer. Because the channel information is produced in advance (as with ordinary two-channel stereo), the playback side is prone to inconsistency or insufficient spatial perception:
for example, for stereo, the distance between the left and right speakers differs between products, and the number of speakers differs;
for example, some products have only one speaker, without the multiple speakers required for amplitude panning.
In addition, because of this produced-in-advance characteristic, the number of channels also determines the best achievable effect on a loudspeaker system:
a 7.1-channel source obtains its best playback effect on a 7.1-channel system;
if a stereo source is played back over 7.1 channels, no more convincing virtual sound image can be formed beyond the stronger sense of envelopment that comes with more channels. Essentially, although a multichannel system is connected, the signals of the many speakers are still derived from two signals (put simply, multichannel audio de-spread from two channels cannot control each channel precisely; mathematically, this amounts to trying to increase the rank of the system).
The playback mode described above, in which the number of channels is fixed during production and distribution, is called channel-based playback. Its obvious disadvantage is that the channel count and the playback environment must be matched for the best playback; for example, the soundtrack for Dolby theatres must be mixed specially.
With the demand for immersive experience, and especially with the rise of car audio, the number of loudspeakers in a car has increased dramatically, and channel-based systems face great challenges in tuning and sound image localization (for the reasons above, the music producer and the playback end cannot agree: every car's loudspeakers differ, and the car itself is an asymmetric sound field). Companies such as Dolby and DTS therefore turned to object-based audio systems, which transmit the source audio of each sound object together with its coordinates. Rendering is then computed locally in real time: the artist need not consider the loudspeaker layout when creating the audio, and the presenter adapts locally to the actual loudspeaker arrangement using each object's sound and coordinates, giving better playback consistency than channel-based systems. Object-based audio is widely used in live concerts and (re)broadcasting; during a live broadcast artists cannot author multichannel audio in real time, and object-based transmission lets each listening environment perform its own uniform replay and obtain a better playback effect.
In object-based playback all the information of the audio objects is recorded, so as the number of objects grows the demand on transmission bandwidth grows with it, whereas stereo and other channel-based techniques transmit a roughly constant amount of data, since only the final channel signals are sent. Moreover, most current source material still remains channel-based: the vast majority of television and music programs are channel-based sources. Object-based programs are still scarce; although music companies promoting object-based immersive sound are actively remastering albums, the total amount of such material remains lacking.
Current object-based sound reproduction can form better three-dimensional sound images and provide a more immersive experience, but it still has the following drawbacks:
it depends heavily on the source material: without a corresponding object-based source, it is simply unavailable;
the attention effect is not reproduced. In general, judging a sound image requires focused attention, which is why listeners do not pay particular attention to the positions of different sound images when enjoying music, especially when it is treated as background music; the quality of sound image reproduction matters only to audiophiles deliberately evaluating the playback system. When a picture is present and the viewer's attention is on the picture, the localization of the sound image is often blurred and does not require particularly precise reconstruction. Current sound restoration algorithms cannot reproduce this attention mechanism.
Disclosure of Invention
The invention aims to provide a presentation device for spatial audio content that solves the problem of how to present sound material as spatial audio so as to improve immersion.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the present invention provides a presentation device of spatial audio content for presenting sound material, comprising:
a visual attention point determining unit for obtaining the visual focus position at which the listener gazes;
and a processing unit in signal connection with the visual attention point determining unit to receive the visual focus position information, the processing unit applying a gain to sound sources near the visual focus position or executing a spatial audio algorithm with the visual focus position as the sound source position.
Preferably, the sound material is object-based audio, two-channel audio or mono audio, and the processing unit applies the gain to sound sources near the visual focus position.
Further, an activation function of the Euclidean distance is stored in the processing unit; the activation function is continuous to high order, and its output serves as a global mask on the output of the whole spatial audio algorithm, producing a fade-in/fade-out of the spatial effect.
Further, when the processing unit does not detect the visual focus position, it applies no gain to any sound source.
Further, the visual attention point determining unit is further configured to obtain visual direction information, and the processing unit applies the gain to the sound source closest to the listener.
Preferably, the sound material is single-channel audio, and the processing unit executes the spatial audio algorithm with the visual focus position as the sound source position.
There is also provided a method of presenting spatial audio content, for presenting sound material, comprising:
S1, acquiring the visual focus position of the listener;
and S2, applying a gain to the sound source near the visual focus position or executing a spatial audio algorithm with the visual focus position as the sound source position.
Preferably, the sound material is object-based audio or two-channel audio, and S11 is further included between S1 and S2: calculating the Euclidean distance between the visual focus position and the sound source and applying an activation function of that distance, the output of which serves as a global mask on the output of the whole spatial audio algorithm to achieve a fade-in/fade-out of the spatial effect, the activation function being continuous to high order.
Preferably, S1 further includes obtaining visual direction information, and S2 includes applying the gain to the sound source nearest the listener in the visual direction when a plurality of sound sources are near the visual focus.
Preferably, the sound material is single-channel audio, and S2 executes the spatial audio algorithm with the visual focus position as the sound source position.
Owing to the above technical scheme, the invention has the following advantages over the prior art:
The device and method for presenting spatial audio content use the visual attention point determining unit to obtain the visual focus position at which the listener gazes, and either apply a gain to the sound source near the visual focus position to match the listener's attention, or execute a spatial audio algorithm with the visual focus position as the sound source position so that the presented sound image is emitted from the visual focus position; both increase the listener's immersion.
Drawings
Some specific embodiments of the invention are described in detail below, by way of example and not of limitation, with reference to the accompanying drawings. The same reference numbers denote the same or similar parts or portions throughout the drawings. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the accompanying drawings:
FIG. 1 is a schematic diagram of the restoration of a sound source using three speakers;
FIG. 2 is a schematic diagram of a three-dimensional sound image reconstruction around a listener using three-layer speakers;
FIG. 3 is a typical audio transmission flow for object-based playback;
FIG. 4 is a schematic diagram of selecting, for gain, the target sound source closer to the observer in the present invention;
FIG. 5 is a schematic diagram of a spatial audio content rendering device according to the present invention;
wherein reference numerals are as follows:
1. a visual attention point determining unit;
2. a processing unit;
3. and a speaker.
Detailed Description
The following describes the technical solutions in the embodiments of the present invention clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
In the description of the present invention, it should be noted that the orientations or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner" and "outer" are based on the orientations or positional relationships shown in the drawings; they are used merely for convenience and simplicity of description and do not indicate or imply that the devices or elements referred to must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the invention. Furthermore, the terms "first", "second" and "third" are used for descriptive purposes only and should not be construed as indicating or implying relative importance.
In addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
As shown in Fig. 5, the spatial audio content presentation device of the present invention includes a visual attention point determining unit 1, a processing unit 2 and a speaker 3; the processing unit 2 supplies the processed driving signal to the speaker 3 for spatial audio content presentation.
The following describes the use of the visual focus position of the listener's gaze once obtained, divided broadly into object-based and channel-based sound material. The visual attention point determining unit 1 employs an eye-tracking device such as an eye tracker, but other devices for obtaining the visual attention point may be used.
The first case: the sound material is object-based, as in game scenes and the like.
In object-based playback, the general audio transmission flow is shown in Fig. 3.
The flow is as follows:
1. in object-based transmission, because transmission bandwidth must be saved as far as possible, current object-based audio transmission is generally performed with each company's proprietary compression algorithm;
2. the compressed data are decoded with the decoding algorithm corresponding to the encoding, yielding the actual sound source waveform and the corresponding position information;
3. a spatial audio algorithm such as VBAP or WFS computes the driving signal required for each loudspeaker, as sketched below.
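For the amplitude-panning branch of step 3, the driving signals reduce to the decoded object waveform weighted by the per-speaker gains (WFS would additionally apply per-speaker delays and filters, which this sketch omits). The layout-indexing scheme below is our own illustrative assumption:

    import numpy as np

    def render_object(waveform, gains, speaker_ids, num_speakers):
        """Weight one decoded object into per-speaker driving signals.

        waveform: 1-D array of decoded object samples.
        gains: gains of the active loudspeakers (e.g. a VBAP triplet).
        speaker_ids: indices of those loudspeakers in the full layout.
        Returns an array of shape (num_speakers, n_samples).
        """
        drive = np.zeros((num_speakers, len(waveform)))
        for g, i in zip(gains, speaker_ids):
            drive[i] = g * waveform             # amplitude panning: scale only
        return drive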
In this case both the sound source position information and the source data exist, so the listening environment is ideal and only the attention effect needs to be supplemented through visual capture, by tracking the visual focus. One implementation is shown in Fig. 4: the object closer to the observer is selected, because vision is a projection along the viewing direction, while the position information of a sound is generally three-dimensional, and only the projected direction is used when the focus is evaluated.
The procedure is as follows:
1. obtain the visual focus position with an eye tracker;
2. compute the Euclidean distance between the visual focus position and the projection of each audio object;
a. when no visual focus is found, no gain is applied to any sound source object;
b. when the distance between the visual focus and a sound source is smaller than a threshold, a certain gain is applied to that audio signal;
i. when several objects lie in the visual focus, the gain is applied to the sound source nearest the listener along the visual direction (a sketch of this selection procedure follows the list).
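A minimal Python/NumPy sketch of this selection procedure; the distance threshold, the 6 dB boost and the choice of projection (dropping the depth axis) are illustrative assumptions of ours, since the patent specifies neither the threshold value nor the gain amount:

    import numpy as np

    def focus_gain_weights(focus_xy, obj_positions, listener_pos,
                           threshold=0.1, boost_db=6.0):
        """Per-object gains from an eye-tracked visual focus.

        focus_xy: 2-D focus point in the projection plane, or None
        when the eye tracker reports no fixation.
        obj_positions: (N, 3) array of sound object positions.
        listener_pos: length-3 listener position, used to pick the
        object nearest the listener when several fall in the focus.
        Returns N linear gains (1.0 = unchanged).
        """
        obj_positions = np.asarray(obj_positions, dtype=float)
        gains = np.ones(len(obj_positions))
        if focus_xy is None:                    # case a: no focus, no gain
            return gains
        proj = obj_positions[:, :2]             # assumed projection: drop depth axis
        dist = np.linalg.norm(proj - np.asarray(focus_xy, dtype=float), axis=1)
        hits = np.flatnonzero(dist < threshold) # case b: within the threshold
        if hits.size:                           # case i: boost the nearest candidate
            depth = np.linalg.norm(obj_positions[hits] - listener_pos, axis=1)
            gains[hits[np.argmin(depth)]] = 10.0 ** (boost_db / 20.0)
        return gains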
The processing unit 2 also stores an activation function of the Euclidean distance (between the visual focus and the sound source positions). The activation function is continuous to high order, and its output is used as a global mask on the output of the whole spatial audio algorithm, producing a fade-in/fade-out of the spatial effect.
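The patent does not name a specific activation function; a Gaussian of the Euclidean distance is one choice that satisfies the stated high-order continuity, sketched here with an assumed radius parameter:

    import numpy as np

    def focus_mask(distance, radius=0.2):
        """Smooth mask over the focus-to-source Euclidean distance.

        A Gaussian is infinitely differentiable, equals 1 at the focus
        and decays smoothly, so multiplying the spatial-audio output by
        it fades the effect in and out instead of switching abruptly.
        """
        return np.exp(-(np.asarray(distance, dtype=float) / radius) ** 2)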
The second case: the sound material is channel-based, as with a home television.
When the material is channel-based, the position of the sound focus can be obtained through the visual focus, and that position generally characterizes the position of the actual object. In this way, in single-channel audio scenes such as games, the position of the object can be obtained equivalently through the visual focus, and the visual focus information can be combined with the audio as sound source position information for a spatial audio algorithm.
When the audio is two-channel, the left and right channel signals can be enhanced by roughly following the object-based approach above.
When the audio is mono, multiple channels can be formed by the spatial audio algorithm, as in the sketch below.
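A minimal sketch of this mono case, reusing the vbap_gains and render_object sketches from earlier; the eye-tracked focus direction, taken here as a unit vector, stands in for the source-position metadata that mono material lacks, and selecting which loudspeaker triplet contains the focus direction is omitted for brevity:

    def spatialize_mono(waveform, focus_dir, l1, l2, l3, speaker_ids, num_speakers):
        """Render mono audio as if emitted from the visual focus direction."""
        g = vbap_gains(focus_dir, l1, l2, l3)   # focus direction as source direction
        return render_object(waveform, g, speaker_ids, num_speakers)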
In the present invention, the two drawbacks mentioned in the background are compensated by introducing capture of the visual focus. Vision and hearing interact (audio-visual interaction): hearing and vision do not necessarily interfere with each other; they complement each other, the final purpose being to make the listener's perception more accurate. The invention obtains the position of the sound image by introducing the visual attention point determining unit to perform visual capture, overcoming the lack of position information in the audio. Visual capture also observes attention directly: when the user has no visual focus, rendering of the multidimensional acoustic signals can be stopped, reducing the computational load and energy consumption of the system.
Introducing visual focus information increases the interactivity of the audio and makes interactive scenes such as games and VR more immersive.
For channel-based stereo signals or object-based signals, control of the sound source by the visual focus can be added, increasing the immersion of games or VR.
If the channel-based signal is single channel, the visual focus can be used as the sound source position information with a three-dimensional audio algorithm to increase immersion.
In summary, the invention obtains the listener's visual attention position through visual capture and uses this position information to improve the presentation of various sound materials, enhancing the sense of immersion.
The above embodiments only illustrate the technical concept and features of the invention and are intended to enable those skilled in the art to understand and implement it; they do not limit the scope of protection. All equivalent changes or modifications made according to the spirit of the invention shall fall within the scope of the invention.

Claims (10)

1. A presentation device of spatial audio content, for presenting sound material, comprising:
a visual attention point determining unit (1), the visual attention point determining unit (1) being configured to obtain the visual focus position at which a listener gazes;
and a processing unit (2), the processing unit (2) being in signal connection with the visual attention point determining unit (1) to receive the visual focus position information, the processing unit (2) applying a gain to sound sources near the visual focus position or executing a spatial audio algorithm with the visual focus position as the sound source position.
2. The presentation device of spatial audio content as claimed in claim 1, wherein: the sound material is object-based audio, two-channel audio or mono audio, and the processing unit (2) applies the gain to sound sources near the visual focus position.
3. The presentation device of spatial audio content as claimed in claim 2, wherein: an activation function of the Euclidean distance is further stored in the processing unit (2); the activation function is continuous to high order, and its output serves as a global mask on the output of the whole spatial audio algorithm to achieve a fade-in/fade-out of the spatial effect.
4. The presentation device of spatial audio content as claimed in claim 2, wherein: when the processing unit (2) does not detect the visual focus position, the processing unit (2) applies no gain to any sound source.
5. The presentation device of spatial audio content as claimed in claim 2, wherein: the visual attention point determining unit (1) is further configured to obtain visual direction information, and the processing unit (2) applies the gain to the sound source closest to the listener.
6. The presentation device of spatial audio content as claimed in claim 1, wherein: the sound material is single-channel audio, and the processing unit (2) executes the spatial audio algorithm with the visual focus position as the sound source position.
7. A method of presenting spatial audio content, for presenting sound material, comprising:
S1, acquiring the visual focus position of the listener;
and S2, applying a gain to the sound source near the visual focus position or executing a spatial audio algorithm with the visual focus position as the sound source position.
8. The method of presenting spatial audio content of claim 7, wherein: the sound material is object-based audio or two-channel audio, and S11 is further included between S1 and S2: calculating the Euclidean distance between the visual focus position and the sound source and applying an activation function of that distance, the output of which serves as a global mask on the output of the whole spatial audio algorithm to achieve a fade-in/fade-out of the spatial effect, the activation function being continuous to high order.
9. The method of presenting spatial audio content of claim 7, wherein S1 further comprises obtaining visual direction information, and S2 comprises applying the gain to the sound source nearest the listener in the visual direction when a plurality of objects are near the visual focus.
10. The method of presenting spatial audio content of claim 7, wherein: the sound material is single-channel audio, and in S2 the spatial audio algorithm is executed with the visual focus position as the sound source position.
Application CN202310634369.7A, priority date 2023-05-31, filing date 2023-05-31; publication CN116866817A (pending): Device and method for presenting spatial audio content

Priority Application (1)

CN202310634369.7A (priority and filing date 2023-05-31): Device and method for presenting spatial audio content

Publication (1)

CN116866817A, published 2023-10-10

Family ID: 88234747; family application: CN202310634369.7A (CN)

Patent Citations (3)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN103036691A * | 2011-12-17 | 2013-04-10 | Microsoft Corp. (微软公司) | Selective spatial audio communication
US10506362B1 * | 2018-10-05 | 2019-12-10 | Bose Corporation | Dynamic focus for audio augmented reality (AR)
US11546692B1 * | 2020-08-19 | 2023-01-03 | Apple Inc. | Audio renderer based on audiovisual information

Legal Events

    • PB01 - Publication
    • SE01 - Entry into force of request for substantive examination