WO2016180493A1 - Method and apparatus for driving an array of loudspeakers with drive signals - Google Patents

Method and apparatus for driving an array of loudspeakers with drive signals

Info

Publication number
WO2016180493A1
Authority
WO
WIPO (PCT)
Prior art keywords
listener
pose
unit
drive signals
audio zone
Prior art date
Application number
PCT/EP2015/060628
Other languages
English (en)
Inventor
Michael BÜRGER
Thomas Richter
Mengqiu ZHANG
Heinrich LÖLLMANN
Walter Kellermann
Andre Kaup
Yue Lang
Peter GROSCHE
Karim Helwani
Giovanni Cordara
Original Assignee
Huawei Technologies Co., Ltd.
Friedrich-Alexander-Universität Erlangen-Nürnberg
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd., Friedrich-Alexander-Universität Erlangen-Nürnberg filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/EP2015/060628 priority Critical patent/WO2016180493A1/fr
Priority to EP15725269.3A priority patent/EP3275213B1/fr
Publication of WO2016180493A1 publication Critical patent/WO2016180493A1/fr


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30: Control circuits for electronic adaptation of the sound field
    • H04S7/302: Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303: Tracking of listener position or orientation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S2420/00: Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/13: Application of wave-field synthesis in stereophonic audio systems

Definitions

  • the present invention relates to a wave field synthesis apparatus and a method for driving an array of loudspeakers with drive signals.
  • the present invention also relates to a computer- readable storage medium storing program code, the program code comprising instructions for carrying out a method for driving an array of loudspeakers with drive signals.
  • a first group comprises local sound field synthesis (SFS) approaches, such as (higher order) ambisonics, wave field synthesis and techniques related to it, and a multitude of least squares approaches (pressure matching, acoustic contrast maximization, ...). These techniques aim at reproducing a desired sound field in multiple spatially extended areas.
  • a second group comprises binaural rendering (BR) or point-to-point rendering approaches, e.g., binaural beamforming or crosstalk cancellation.
  • BR binaural rendering
  • ITDs interaural time differences
  • ILDs interaural level differences
  • the objective of the present invention is to provide an apparatus and a method for driving an array of loudspeakers with drive signals, wherein the apparatus and the method provide high-quality personalized spatial sound to possibly moving listeners.
  • a first aspect of the invention provides a wave field synthesis apparatus for driving an array of loudspeakers with drive signals, the apparatus comprising:
  • a listener pose identifying unit for identifying a pose of a listener, wherein the pose comprises a location and an orientation of the listener,
  • a sound reproduction unit for generating the drive signals, the sound reproduction unit comprising a sound field synthesizer for generating sound field drive signals for causing the array of loudspeakers to generate a sound field at at least one audio zone and/or a binaural renderer for generating binaural drive signals for causing the array of loudspeakers to generate specified sound pressures at at least two locations, and
  • an adaptation unit for adapting one or more parameters of the sound reproduction unit based on the identified pose of the listener.
  • the apparatus of the first aspect combines sound reproduction with listener pose identifying for identifying the pose (position and orientation) of one or more listeners, wherein the identified pose is used to adapt parameters of the sound reproduction unit.
  • This can involve a modification of the loudspeaker prefilters such that the hearing impression can be maintained in case of a moving listener or a listener located off the sweet spot(s).
  • only a certain subset of loudspeakers is utilized for reproduction in order to reduce cross-talk.
  • a modification of the binaural input signals can be performed in order to avoid the virtual acoustic scene to rotate.
  • the wave field synthesis apparatus of the first aspect can be part of a system for untethered personalized multi-zone sound reproduction, which adapts to the poses (positions and orientations) of multiple possibly moving listeners.
  • the desired auditory impression can be preserved if listeners are not located precisely at a sweet spot or even move, and in embodiments cross-talk can be reduced by selecting a suitable set of active loudspeakers for reproduction depending on the listeners' positions.
  • Information about a number and poses of the listeners can be obtained, for example, with the help of a video-based pose detection and tracking system.
  • the listener pose identifying unit is configured to identify a number of listeners in the audio zone, and wherein the adaptation unit is configured to adapt a size parameter, which indicates a size of the audio zone, based on the identified number of listeners in the audio zone.
  • the one or more parameters of the sound reproduction unit can comprise one or more size parameters of one or more audio zones generated by the sound reproduction unit. Therefore, the size of an audio zone can be adapted if multiple persons or a group are present in a zone. This has the advantage that the wave field synthesis apparatus can adapt to different numbers of listeners who want to listen to the same audio content. Updating of the size of the audio zone can be performed periodically, e.g. in fixed predetermined time intervals. In other embodiments, the size of the audio zone can be updated in irregular intervals, whenever new information about the number of listeners is available at the listener pose identifying unit.
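The count-dependent sizing described above can be sketched as follows; the function name, the linear growth rule, and all numeric defaults are illustrative assumptions, not values from the description:

```python
def zone_radius(num_listeners, base_radius=0.5, per_listener=0.25, max_radius=2.0):
    """Hypothetical rule: grow the audio-zone radius (in metres) with the
    number of identified listeners, capped at max_radius."""
    if num_listeners <= 0:
        return 0.0  # empty zone: generation can be paused entirely
    return min(base_radius + (num_listeners - 1) * per_listener, max_radius)
```

Calling such a function periodically, or whenever the listener pose identifying unit reports a new count, matches the two update strategies mentioned above.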
  • the at least one audio zone comprises a dark audio zone and a bright audio zone and the adaptation unit is configured to set an output parameter, which corresponds to a strength of a specific drive signal for a specific loudspeaker, to zero if the adaptation unit determines that there is at least one connection line between a location of the specific loudspeaker and a point in the bright audio zone that intersects with the dark audio zone and/or with a surrounding of the dark audio zone, wherein in particular the surrounding is defined by a fixed perimeter around the dark audio zone.
  • loudspeakers are used for the task of generating one or more audio zones. Discarding loudspeakers for reducing cross-talk may seem counterintuitive, since more loudspeakers should theoretically provide a higher suppression of cross-talk as well as a smaller error in the bright zone. In real-world scenarios, however, loudspeaker imperfections, positioning errors, and reflections deteriorate the reproduction performance and especially introduce a significant amount of cross-talk. Therefore, discarding loudspeakers in the proximity of the dark-zone can reduce the amount of cross-talk and, thus, reduce the residual sound pressure level.
  • the adaptation unit is configured to control the sound reproduction unit to start and/or resume generating the drive signals if at least one listener is identified in the audio zone and/or the adaptation unit is configured to control the sound reproduction unit to stop and/or pause generating the drive signals for the audio zone if the listener pose identifying unit determines that there are no listeners in the audio zone. Pausing to generate drive signals for an audio zone where no listeners are located has the advantage that cross talk to other audio zones can be avoided. For example, if there are three audio zones, and the wave field synthesis apparatus pauses generation of drive signals for a first of the three audio zones, cross talk to the second and third audio zone is avoided.
  • the listener pose identifying unit comprises an uncertainty determining unit for determining an uncertainty level, wherein the uncertainty level comprises a location uncertainty level, which reflects an estimated uncertainty in an identified location, and/or an orientation uncertainty level, which reflects an estimated uncertainty in an identified orientation, and wherein the adaptation unit is configured to adapt a parameter of the sound reproduction unit based on the determined uncertainty level.
  • the adaptation unit is configured to adapt parameters based on the determined pose. However, if an uncertainty in the pose determination is high, it may be preferable to avoid any significant parameter adaptations based on the determined pose. For example, if the uncertainty in the determined location of a listener is high, it is preferable to avoid setting parameters of the sound reproduction unit such that a sharply delimited audio zone is generated with high sound volume only at the determined location. The determined location might be inaccurate and at the true location, the sound output might be insufficient.
  • the adaptation unit is configured to adapt a parameter indicating a size of the audio zone based on the determined uncertainty level, wherein a higher uncertainty corresponds to a larger size.
  • the size of the audio zone can be set as a linear function of the uncertainty of the location of the listener.
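A minimal sketch of such a linear mapping (the function name and the constants are assumptions):

```python
def zone_radius_from_uncertainty(sigma_loc, base_radius=0.5, gain=2.0):
    """Linear rule: a higher location-uncertainty level (e.g. a standard
    deviation in metres) yields a proportionally larger audio zone."""
    return base_radius + gain * max(sigma_loc, 0.0)
```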
  • the adaptation unit is configured to adjust a weighting parameter, which indicates a weighting between the sound field drive signals and the binaural drive signals, based on the determined uncertainty level, wherein in particular the apparatus is configured such that the drive signals are generated using only sound field synthesis if the determined uncertainty level is higher than a predetermined threshold. Therefore, in cases of uncertain location determination, a higher emphasis can be placed on sound field synthesis, which compared to binaural rendering can be less reliant on precise knowledge of the location of a listener.
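One way to realize such a weighting is sketched below. The hard switch to pure sound field synthesis above the threshold follows the implementation described; the linear cross-fade below the threshold, and every name, is an assumption:

```python
def mix_drive_signals(sfs_signals, br_signals, uncertainty, threshold=0.5):
    """Blend per-loudspeaker sound-field-synthesis (SFS) and binaural-rendering
    (BR) drive signals depending on the pose-uncertainty level."""
    if uncertainty >= threshold:
        w = 1.0                       # pure SFS when the pose is too uncertain
    else:
        w = uncertainty / threshold   # hypothetical linear cross-fade
    return [w * s + (1.0 - w) * b for s, b in zip(sfs_signals, br_signals)]
```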
  • the apparatus comprises a camera input unit for obtaining image frames from one or more cameras and the listener pose identifying unit comprises:
  • a listener detection unit for detecting a location of a listener in one or more first image frames acquired by the one or more cameras
  • a listener histogram determining unit for determining a first histogram of the listener in the one or more first image frames based on the detected location
  • a listener tracking unit for tracking the listener in one or more subsequent image frames that are acquired by the one or more cameras after the one or more first image frames, wherein the listener tracking unit is configured to track the listener based on the first histogram of the listener.
  • the one or more cameras can be two cameras, which are located at different positions, such that a 3D image can be derived from images acquired from the two cameras, and a 3D location of a listener be determined.
  • the listener tracking unit can be configured to use a tracking algorithm which requires a lower computational effort to track a location compared to the computational effort of the listener detection unit to detect a location.
  • a first histogram of the listener can be determined with high accuracy.
  • the location detection and the histogram determining involve significant computational effort and therefore, according to the seventh implementation, a tracking unit can be used for subsequent image frames.
  • the tracking unit can assume that the first histogram of the listener does not change between the first image frame and the subsequent image frames.
  • the tracking unit can be configured to assume that the location of the listener changes only within certain limits between an image frame and the next image frame. Based on one or more of these assumptions, the tracking unit can use simpler algorithms than the detection unit to determine a location of the listener in the subsequent image frames.
  • the uncertainty determining unit is configured to determine the uncertainty level based on a difference between a first histogram that is determined based on a detected location of the listener and a subsequent histogram that is determined based on a tracked location of the listener.
  • the difference between the first histogram and the subsequent histogram can be an indication for an error of the determined or tracked location of the listener.
  • the difference between the first histogram and the subsequent histogram can be adjusted to account for changes of a first global histogram and a subsequent global histogram, wherein the first global histogram is computed based on an entire first image frame and wherein the subsequent global histogram is computed based on an entire subsequent image frame.
  • Global histograms of image frames can change e.g. because of changes in the lighting of the room. For example, if an artificial light is switched on in the room, all pixels in the image frames can be affected. Therefore, it can be preferable to adjust the difference computation based on a change of a global histogram.
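A sketch of this uncertainty measure, using a normalized L1 histogram distance and subtracting the global drift (the description only states that the difference "can be adjusted"; the subtraction rule and all names are assumptions):

```python
def hist_distance(h1, h2):
    """L1 distance between two histograms after normalizing each to sum 1."""
    s1, s2 = float(sum(h1)), float(sum(h2))
    return sum(abs(a / s1 - b / s2) for a, b in zip(h1, h2))

def uncertainty_level(first_hist, tracked_hist, first_global, current_global):
    """Drift of the listener histogram, discounted by the drift of the
    whole-frame histogram (e.g. a light switched on changes all pixels,
    not just the tracked face region)."""
    local_drift = hist_distance(first_hist, tracked_hist)
    global_drift = hist_distance(first_global, current_global)
    return max(local_drift - global_drift, 0.0)
```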
  • the apparatus further comprises a distance detection unit which is configured to determine a distance of the listener from a reference point based on a size of a face region of the listener in the one or more image frames.
  • the reference point can be located at the location of the one or more cameras. When a listener is closer to the cameras, his face appears larger in the acquired image frames. Therefore, a distance of the listener can be determined based on the size of the listener's face in the one or more image frames.
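This face-size cue corresponds to the pinhole-camera relation distance = focal length × real width / image width. A sketch (the focal length in pixels and the assumed average face width are hypothetical calibration constants):

```python
def distance_from_face_size(face_width_px, focal_px=800.0, face_width_m=0.16):
    """Estimate listener distance (metres) from the width of the detected
    face region in pixels: the closer the listener, the larger the face."""
    if face_width_px <= 0:
        raise ValueError("face width in pixels must be positive")
    return focal_px * face_width_m / face_width_px
```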
  • a second aspect of the invention refers to a method for driving an array of loudspeakers with drive signals, comprising the steps:
  • identifying a pose of a listener wherein the pose comprises a location and an orientation of the listener
  • the methods according to the second aspect of the invention can be performed by the system according to the first aspect of the invention. Further features or implementations of the method according to the second aspect of the invention can perform the functionality of the apparatus according to the first aspect of the invention and its different implementation forms.
  • the sound reproduction parameters can be parameters of a sound reproduction unit.
  • identifying the pose of the listener comprises the steps:
  • the method of the first implementation further comprises the steps:
  • the method further comprises a step of detecting the location of the listener in the one or more subsequent image frames if the determined uncertainty level is higher than a predetermined threshold.
  • the method of the third implementation has the advantage that a detection of the listener location is performed only if the uncertainty level is so high that it is no longer sensible to rely on the result of the tracking unit.
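The detection/tracking interplay of these implementations can be condensed into a small control loop; `detect`, `track`, and `uncertainty` stand in for the detection, tracking, and uncertainty-estimation routines, and all names are assumptions:

```python
def pose_pipeline(frames, detect, track, uncertainty, threshold=0.5):
    """Run cheap tracking on every frame and fall back to the expensive
    detector only when the tracked pose becomes too uncertain."""
    pose = detect(frames[0])              # full detection on the first frame
    poses = [pose]
    for frame in frames[1:]:
        pose = track(frame, pose)         # low-cost tracking step
        if uncertainty(frame, pose) > threshold:
            pose = detect(frame)          # re-initialize by detection
        poses.append(pose)
    return poses
```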
  • a third aspect of the invention refers to a computer-readable storage medium storing program code, the program code comprising instructions for carrying out the method of the second aspect or one of the implementations of the second aspect.
  • FIG. 1 shows a schematic illustration of a wave field synthesis apparatus according to an embodiment of the invention
  • FIG. 2 shows a schematic illustration of a system comprising a wave field synthesis apparatus according to another embodiment of the invention
  • FIG. 3 is a flow chart of a method in accordance with the present invention
  • FIG. 4 is a flow chart which illustrates in more detail the step of identifying a pose of a listener
  • FIG. 5 shows a schematic illustration of a loudspeaker selection scheme for an exemplary scenario with a first listener in a first audio zone and a second listener in a second audio zone
  • FIG. 6 shows a schematic illustration of the definition of the quantities required to determine the minimum angle of the loudspeaker selection scheme of FIG. 5, and
  • FIG. 7 shows a schematic illustration of a listener pose identifying unit in accordance with the present invention.
  • FIG. 1 shows a schematic illustration of a wave field synthesis apparatus 100.
  • the wave field synthesis apparatus 100 comprises a listener pose identifying unit 110, a sound reproduction unit 120, and an adaptation unit 130.
  • the sound reproduction unit 120 comprises a sound field synthesizer 122 and a binaural renderer 124.
  • FIG. 2 shows an overview block diagram of a system 202 in accordance with the present invention. In the scenario shown in FIG. 2, a first listener 250 and a second listener 252 are provided with personalized sound.
  • the system 202 comprises a wave field synthesis apparatus 200, a camera system 214a, 214b, and an array of loudspeakers 240.
  • the array of loudspeakers 240 is driven by drive signals that are generated by the personalized sound reproduction system, which is the sound reproduction unit 220 of the wave field synthesis apparatus 200.
  • the wave field synthesis apparatus 200 further comprises a first and second camera input unit 212a, 212b for connecting external cameras 214a, 214b, a first and second video-based pose estimation system 210a, 210b, which are listener pose identifying units.
  • the video-based pose estimation systems estimate poses of the listeners and pass estimated pose data (UDP) to the adaptation stage 230.
  • the drive signals generated by the sound reproduction unit 220 cause the array of loudspeakers to generate sound waves that generate a first audio zone 260 at a location of the first listener 250 and a second audio zone 262 at a location of the second listener 252.
  • the location of the first audio zone 260 corresponds to an updated location of the first listener 250 that is different from a previous location 251 of the first listener.
  • the change in location of the first listener corresponds also to a change in orientation, i.e., the pose of the first listener has changed.
  • the wave field synthesis apparatus 200 is configured to carry out a method, wherein the listeners 250, 252 are detected and their poses estimated. In the illustrated example, this is done with the help of a camera system 214a, 214b, such as a stereo camera setup or dedicated devices providing the required depth information.
  • FIG. 3 is a flow chart of a method in accordance with the present invention.
  • a pose of a listener is identified, wherein the pose comprises a location and an orientation of the listener.
  • a head detection and tracking algorithm can be used, wherein the head detection is based on the so-called Viola-Jones approach, published by P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features", Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. I-511 to I-518, 2001.
  • the algorithm may not be able to estimate the listeners' positions for arbitrary head orientations.
  • the listeners' positions can be tracked over time using a listener tracking unit. For tracking, different features such as color, depth, and dominant characteristics within each facial region can be used, making the listener tracking unit more robust against potential illumination inconsistencies or complex background regions.
  • the depth information can be used for a rough background/foreground segmentation of the facial regions, to detect outliers in the optical flow and/or to infer a 3D position for detected faces.
  • In step S20, sound field drive signals are generated for causing the array of loudspeakers to generate at least one sound field at at least one audio zone, and/or in step S30, binaural drive signals are generated for causing the array of loudspeakers to generate specified sound pressures at at least two locations.
  • In step S40, one or more sound reproduction parameters are adapted based on the identified pose.
  • the one or more sound reproduction parameters relate to the generation of sound field drive signals and/or the generation of binaural drive signals.
  • FIG. 4 is a flow chart that illustrates step S10 of identifying a listener's pose in more detail.
  • In a first step S11, one or more first image frames are acquired.
  • a pose of the listener is detected in the one or more first image frames.
  • the facial regions can be transformed into the HSV color space and first histograms of the hue values are determined in step S13.
  • Dominant edges are defined as feature points and searched within each facial region.
  • a segmentation mask can be created. Depth information can also be used in determining the segmentation mask.
  • In step S14, one or more subsequent image frames are acquired.
  • In step S15, the pose of the listener is tracked in the one or more subsequent image frames.
  • the feature sets from one or more previous image frames can be tracked into a current frame by using an optical flow approach, e.g. a pyramidal implementation of the Lucas-Kanade feature tracker.
  • the probability map can be used to detect the region where the face of a listener will most likely move in following frames.
  • If there is no measurement of confidence for the tracking of the pose, the tracking can be stopped and face detection re-initialized every N frames (wherein N can range from 5 to 50).
  • RANSAC-based pose tracking can be used as a criterion for identifying when the tracking is lost.
  • a subsequent histogram is computed based on the tracked pose of the listener in the one or more subsequent image frames.
  • In step S17, an uncertainty level of the tracked pose is determined.
  • the uncertainty level can be determined based on how far the subsequent histogram differs from the first histogram.
  • a sound reproduction parameter is adapted based on the determined and/or the tracked pose of the listener.
  • the pose information serves as input for an adaptation stage 230, which is an adaptation unit which controls the sound reproduction unit 220 accordingly, i.e., the individual acoustic scenes are adapted to the poses of the listeners.
  • This adaptation may comprise different steps, depending on the scenario and algorithm (binaural rendering vs. sound field synthesis) used for reproduction.
  • SFS sound field synthesis
  • the local sound fields can be shifted according to the listeners' positions such that all listeners are provided with the desired, personalized virtual acoustic scenes at all times.
  • Local SFS techniques aim to reproduce a desired sound field in multiple spatially extended areas (e.g., audio zones). Such audio zones may be referred to as bright zones or dark zones. In bright zones, the sound field can be perceived by a listener, in dark zones (quiet zones), the sound field is attenuated (e.g., corresponds to silence or is otherwise not perceivable). In case of sound field synthesis and a varying number of listeners within a single local listening area, the size of this area can be adapted such that each listener within the area can be provided with the same desired hearing impression.
  • the sound reproduction unit can be adapted such that the positions at which the sound field can be controlled always coincide with the ear positions such that the desired ILDs and ITDs (provided by the binaural input signals) can be evoked at all times even for moving listeners.
  • the binaural input signals provided to the binaural rendering system can be adapted in order to avoid rotations of the virtual acoustic scenes.
  • the adaptation of the sound reproduction parameter can also be based on the determined uncertainty level. For example, if the uncertainty level increases compared to a previously determined uncertainty level, the size of an audio zone can be increased.
  • the loudspeaker prefilters which provide a certain listener i with personalized sound are also adapted if another listener j is moving. This is necessary since, in addition to the generation of the desired sound field in zone i, one or more quiet zones (spatial zeros) need to be generated and adapted to the positions of all other listeners j.
  • the sound reproduction unit can be triggered by the listener pose identifying unit, i.e., sound reproduction for a particular zone will only start if at least one listener is present in that zone, and sound reproduction for a particular zone will stop if no listener is present in that zone anymore.
  • In step S19a, the determined uncertainty level is compared with a predetermined threshold. If it is determined that the uncertainty level is higher than the predetermined threshold, in step S19b the pose of the listener is detected in the one or more subsequent image frames. If the uncertainty level is not too high, the method can proceed with acquiring further subsequent image frames.
  • the present orientation estimation algorithm combines absolute orientation detection and relative orientation tracking. Based on the detection of well-known facial features, such as eyes, nose, or mouth, and their corresponding depth values, the three rotation angles roll, pitch, and yaw can be calculated for each listener. Again, the Viola-Jones approach can be used as detector.
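For intuition, roll and yaw can be roughly estimated from 2D landmark positions alone; the sketch below (all names and formulas are illustrative assumptions) derives roll from the tilt of the eye line and yaw from the horizontal nose offset, while pitch, as noted, additionally needs the depth values and is omitted:

```python
import math

def head_angles(left_eye, right_eye, nose):
    """Rough roll/yaw estimates (degrees) from 2D facial landmarks."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    roll = math.degrees(math.atan2(dy, dx))        # tilt of the eye line
    eye_dist = math.hypot(dx, dy)
    mid_x = (left_eye[0] + right_eye[0]) / 2.0
    # nose offset relative to half the eye distance -> crude yaw angle
    ratio = (nose[0] - mid_x) / (eye_dist / 2.0)
    yaw = math.degrees(math.asin(max(-1.0, min(1.0, ratio))))
    return roll, yaw
```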
  • the approach of relative orientation tracking comprises three steps. First, features are detected within the facial region in the previous image frame. Second, the features are tracked into the current image frame. The algorithms presented in J. Shi and C. Tomasi, "Good features to track", Proc. IEEE Conference on Computer Vision and Pattern Recognition, 593-600, 1994 and J. Bouguet, "Pyramidal implementation of the Lucas Kanade feature tracker", Intel Corporation, Microprocessor Research Labs, 2000 can be used for feature detection and tracking. Third, an iterative RANSAC algorithm can be used to detect the geometric transformation among the features' positions. It can be assumed that, when RANSAC does not converge (because too few features are matched, for example), the tracking is lost.
  • the Viola-Jones face detector is used to detect a face at an initial image frame and to initialize the listener tracking unit by computing the hue histogram in the detected region.
  • the histogram defines the target color distribution to be tracked.
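This initialization step can be sketched as follows, assuming the face region has already been cropped to a list of RGB pixels (function name and bin count are assumptions):

```python
import colorsys

def hue_histogram(rgb_pixels, bins=16):
    """Hue histogram of a (cropped) face region; this defines the target
    color distribution that the listener tracking unit follows."""
    hist = [0] * bins
    for r, g, b in rgb_pixels:
        h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hist[min(int(h * bins), bins - 1)] += 1
    return hist
```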
  • features are searched within the initially detected region. Since the final listener tracking unit should also be able to track other objects than faces, no facial features, like eyes, nose, or mouth, are chosen. Instead, a feature is defined to be a dominant edge within the detected region.
  • the minimal eigenvalue is computed at each pixel position of the input image frame, resulting in a map of eigenvalues M_eig(m, n, t). The minimal eigenvalue is used as "corner quality measurement".
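Such a corner quality map can be sketched in pure Python; the description leaves the window open, so this version sums the gradient structure tensor over a fixed 3x3 window (names and window size are assumptions):

```python
def min_eigenvalue_map(img):
    """Map of Shi-Tomasi 'corner quality' values: the smaller eigenvalue of
    the local gradient structure tensor, summed over a 3x3 window, at each
    interior pixel of a 2D list of grey values."""
    H, W = len(img), len(img[0])
    gx = [[0.0] * W for _ in range(H)]
    gy = [[0.0] * W for _ in range(H)]
    for m in range(1, H - 1):
        for n in range(1, W - 1):
            gx[m][n] = (img[m][n + 1] - img[m][n - 1]) / 2.0  # central differences
            gy[m][n] = (img[m + 1][n] - img[m - 1][n]) / 2.0
    out = [[0.0] * W for _ in range(H)]
    for m in range(1, H - 1):
        for n in range(1, W - 1):
            a = b = c = 0.0
            for dm in (-1, 0, 1):              # accumulate the structure tensor
                for dn in (-1, 0, 1):
                    x, y = gx[m + dm][n + dn], gy[m + dm][n + dn]
                    a += x * x
                    b += x * y
                    c += y * y
            tr, det = a + c, a * c - b * b     # eigenvalues of [[a, b], [b, c]]
            disc = max(tr * tr / 4.0 - det, 0.0) ** 0.5
            out[m][n] = tr / 2.0 - disc        # smaller eigenvalue
    return out
```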
  • the image to be tracked at a subsequent image frame is segmented first, using color information.
  • This step is similar to the color segmentation in the simple tracking mode.
  • the segmentation is done in three color spaces, namely RGB, HSV and YCbCr.
  • YCbCr color space
  • a color is represented with the luminance component Y, the blue- difference chroma component Cb and the red-difference chroma component Cr.
  • the segmentation in HSV is done, as in the simple mode, using fixed values for the upper and lower bounds of hue, saturation and value, respectively.
  • the segmentation in RGB and YCbCr is done adaptively according to the color distribution within the initially detected region of interest.
  • a histogram for each color channel in RGB and YCbCr is computed within the face rectangle, defining the upper and the lower bound of the corresponding color channel. Since the detection region is usually larger than the target object, the rectangular region can be shrunk by a factor p in height and width.
  • a pixel in the segmentation mask M(m, n, t) is then only marked, if its color is within the computed ranges in RGB, HSV and YCbCr.
  • the additional segmentation in RGB and YCbCr can lead to a more accurate segmentation result and thus a better result for the initial mask M(m, n, t).
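A compact sketch of the combined segmentation; the BT.601 YCbCr conversion is one common convention, the fixed hue bounds are placeholder values, and all names are assumptions:

```python
import colorsys

def rgb_to_ycbcr(r, g, b):
    """RGB -> YCbCr, full-range BT.601 coefficients (one common convention)."""
    return (0.299 * r + 0.587 * g + 0.114 * b,
            128 - 0.168736 * r - 0.331264 * g + 0.5 * b,
            128 + 0.5 * r - 0.418688 * g - 0.081312 * b)

def learn_bounds(face_pixels):
    """Per-channel (min, max) ranges in RGB and YCbCr, learned from the
    (already shrunken) detected face rectangle."""
    rgb_channels = list(zip(*face_pixels))
    ycc_channels = list(zip(*(rgb_to_ycbcr(*p) for p in face_pixels)))
    return ([(min(c), max(c)) for c in rgb_channels],
            [(min(c), max(c)) for c in ycc_channels])

def in_ranges(values, bounds):
    return all(lo <= v <= hi for v, (lo, hi) in zip(values, bounds))

def segmentation_mask(pixels, face_pixels, hue_bounds=(0.0, 0.15)):
    """Mark a pixel (1) only if it passes the adaptive RGB and YCbCr ranges
    AND the fixed hue bounds; otherwise 0."""
    rgb_bounds, ycc_bounds = learn_bounds(face_pixels)
    mask = []
    for r, g, b in pixels:
        h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        ok = (in_ranges((r, g, b), rgb_bounds)
              and in_ranges(rgb_to_ycbcr(r, g, b), ycc_bounds)
              and hue_bounds[0] <= h <= hue_bounds[1])
        mask.append(1 if ok else 0)
    return mask
```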
  • the initially detected sparse feature set, which comprises the image coordinates of the calculated feature points, is tracked using an optical flow approach, namely a pyramidal implementation of the Lucas-Kanade feature tracker.
  • the basic idea of this approach is to subdivide images into different resolution levels. Then, the motion of the input feature set is estimated, beginning on the lowest resolution level up to the original image resolution. Thereby, the result at a specific resolution level is used as initial guess for the next resolution level.
  • the motion is estimated by comparing the local neighborhood within a specific window of size w_m x w_n around the feature point to be tracked.
  • the unknown position (m̂_i, n̂_i) of a feature in the current image frame can therefore be described as the previous position plus the displacement that minimizes the sum of squared intensity differences over this window.
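The window comparison can be illustrated with a brute-force search that evaluates the sum of squared differences (SSD) for every candidate displacement; real pyramidal Lucas-Kanade solves the same criterion incrementally per resolution level instead of exhaustively (names and parameters are assumptions):

```python
def track_feature(prev, curr, pos, win=1, search=2):
    """Find the position in `curr` whose (2*win+1)^2 neighborhood best matches
    the neighborhood of `pos` in `prev`, by exhaustive SSD minimization over
    displacements of up to `search` pixels."""
    m0, n0 = pos

    def window(img, m, n):
        return [img[m + dm][n + dn]
                for dm in range(-win, win + 1)
                for dn in range(-win, win + 1)]

    ref = window(prev, m0, n0)
    best_ssd, best_pos = None, pos
    for dm in range(-search, search + 1):
        for dn in range(-search, search + 1):
            m, n = m0 + dm, n0 + dn
            if not (win <= m < len(curr) - win and win <= n < len(curr[0]) - win):
                continue                      # window would leave the image
            cand = window(curr, m, n)
            ssd = sum((a - b) ** 2 for a, b in zip(ref, cand))
            if best_ssd is None or ssd < best_ssd:
                best_ssd, best_pos = ssd, (m, n)
    return best_pos
```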
  • FIG. 5 illustrates the loudspeaker selection scheme for an exemplary scenario with a first listener 350 in a first audio zone 360 and a second listener 352 in a second audio zone 362.
  • the first audio zone 360 is a bright zone and the second audio zone 362 is a dark zone, i.e., a desired acoustic scene should be synthesized for the first listener 350, while the acoustic energy leaking to the position of the second listener 352 (cross-talk) should be minimized.
  • the angular direction of a particular loudspeaker l is denoted as α_l and defined with respect to the point x_tan, which denotes the point at which the connection line 370 from loudspeaker l forms a tangent to the circular contour around the first audio zone 360.
  • Those loudspeakers 344 of the array of loudspeakers 340 for which the angular direction α_l is smaller than a minimum angle α_min are deactivated. This minimum angle is chosen such that the connection lines between any point in the bright zone 360 and the loudspeaker l do not intersect with the dark zone 362. Since the connection line 370 does not intersect with the dark zone 362 and since there is also no other connection line between a point in the bright zone and the loudspeaker 342 that does, the loudspeaker 342 is not deactivated. For the further loudspeaker 344, on the other hand, there would be a connection line between a point in the bright zone 360 and the further loudspeaker 344 that intersects with the dark zone 362. Therefore, the further loudspeaker 344 is deactivated.
  • using the notation x = [x_1, x_2]^T for a point in the horizontal plane, the point x_tan needs to be determined according to the tangent condition illustrated in FIG. 6.
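The selection rule can equivalently be checked per loudspeaker with a segment-circle intersection test; checking a sampled set of bright-zone points replaces the closed-form minimum-angle criterion of FIG. 6, and every name below is an assumption:

```python
import math

def segment_intersects_circle(p, q, center, radius):
    """True if the segment p-q passes within `radius` of `center`."""
    px, py = p
    qx, qy = q
    cx, cy = center
    dx, dy = qx - px, qy - py
    if dx == 0 and dy == 0:
        return math.hypot(px - cx, py - cy) <= radius
    t = ((cx - px) * dx + (cy - py) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))          # closest point, clamped to the segment
    nx, ny = px + t * dx, py + t * dy
    return math.hypot(nx - cx, ny - cy) <= radius

def active_loudspeakers(speakers, bright_points, dark_center, dark_radius):
    """Keep (True) only loudspeakers with no connection line to any sampled
    bright-zone point crossing the dark zone; dark_radius may already include
    a safety perimeter around the dark zone."""
    return [not any(segment_intersects_circle(s, b, dark_center, dark_radius)
                    for b in bright_points)
            for s in speakers]
```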
  • FIG. 7 shows a schematic illustration of a listener pose identifying unit 410 in accordance with the present invention.
  • the listener pose identifying unit 410 comprises an uncertainty determining unit 412, a listener detection unit 414, a listener histogram determining unit 416 and a listener tracking unit 418.
  • the listener pose identifying unit 410 can also act as a distance detection unit, i.e., it can be configured to detect a distance of a listener to a reference point.
  • Untethered personalized information systems e.g., in a museum:
  • the video-based pose detection and tracking system triggers the sound reproduction system, which acoustically provides information about the respective exhibit if a visitor is detected in a predefined area in front of it while keeping the acoustic energy in the other areas low.
  • Personalized TV sound for multiple viewers: Combining state-of-the-art 3D imaging systems and multi-zone sound reproduction allows two (or more) users to watch their individual content, each with individual audio. For example, two listeners can watch different movies with a single system. Again, sound reproduction is adapted to the actual position of the possibly moving users.
  • Dialogue multiplex in teleconferencing: The system described above allows individual participants to be provided, e.g., with speech from a remote site in different languages, or with different speech signals originating from different conversation partners at a remote site.
  • an adaptive system for personalized, multi-zone sound reproduction where an unknown number of possibly moving users can be provided with individual audio content and cross-talk between the individual users can be reduced by choosing a suited subset of loudspeakers for reproduction.
  • the poses (positions and orientations) of the users' heads can be tracked, e.g., with the help of a video-based system, and the obtained information is exploited to adapt the sound reproduction algorithm such that the desired hearing impression is maintained even if listeners move or rotate their heads.
  • the number of loudspeakers utilized for reproducing sound in another zone i is adapted in order to reduce the cross-talk leaking into zone j.
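Adapting reproduction to tracked head poses implies the pose estimates should not jitter, or the re-steered zones would jump audibly. A minimal sketch of such a smoothing stage between tracker and sound reproduction unit; the first-order filter and the `alpha` value are assumptions for this sketch, not taken from the patent.

```python
class PoseSmoother:
    """First-order (exponential) smoothing of tracked head poses
    (x, y, yaw) before they are fed to the sound reproduction unit."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha   # 0 < alpha <= 1; higher reacts faster, jitters more
        self.state = None

    def update(self, pose):
        """Blend a new pose estimate into the running state and return it."""
        if self.state is None:
            self.state = list(pose)
        else:
            self.state = [(1.0 - self.alpha) * s + self.alpha * p
                          for s, p in zip(self.state, pose)]
        return tuple(self.state)
```

On each video frame, the smoothed pose would then be used to re-centre the listener's audio zone and to re-select the active loudspeaker subset.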
  • Embodiments of the invention may be implemented in a computer program for running on a computer system, at least including code portions for performing steps of a method according to the invention when run on a programmable apparatus, such as a computer system, or enabling a programmable apparatus to perform functions of a device or system according to the invention.
  • a computer program is a list of instructions such as a particular application program and/or an operating system.
  • the computer program may for instance include one or more of: a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system.
  • the computer program may be stored internally on a computer readable storage medium or transmitted to the computer system via a computer readable transmission medium. All or some of the computer program may be provided on transitory or non-transitory computer readable media permanently, removably or remotely coupled to an information processing system.
  • the computer readable media may include, for example and without limitation, any number of the following: magnetic storage media including disk and tape storage media; optical storage media such as compact disk media (e.g., CD-ROM, CD-R, etc.) and digital video disk storage media; non-volatile memory storage media including semiconductor-based memory units such as FLASH memory, EEPROM, EPROM, ROM; ferromagnetic digital memories; MRAM; volatile storage media including registers, buffers or caches, main memory, RAM, etc.; and data transmission media including computer networks, point-to-point telecommunication equipment, and carrier wave transmission media, just to name a few.
  • a computer process typically includes an executing (running) program or portion of a program, current program values and state information, and the resources used by the operating system to manage the execution of the process.
  • An operating system is the software that manages the sharing of the resources of a computer and provides programmers with an interface used to access those resources.
  • An operating system processes system data and user input, and responds by allocating and managing tasks and internal system resources as a service to users and programs of the system.
  • the computer system may for instance include at least one processing unit, associated memory and a number of input/output (I/O) devices.
  • I/O input/output
  • the computer system processes information according to the computer program and produces resultant output information via I/O devices.
  • connections as discussed herein may be any type of connection suitable to transfer signals from or to the respective nodes, units or devices, for example via intermediate devices. Accordingly, unless implied or stated otherwise, the connections may for example be direct connections or indirect connections.
  • the connections may be illustrated or described in reference to being a single connection, a plurality of connections, unidirectional connections, or bidirectional connections. However, different embodiments may vary the implementation of the connections. For example, separate unidirectional connections may be used rather than bidirectional connections and vice versa.
  • a plurality of connections may be replaced with a single connection that transfers multiple signals serially or in a time multiplexed manner. Likewise, single connections carrying multiple signals may be separated out into various different connections carrying subsets of these signals.
  • the examples, or portions thereof, may be implemented as software or code representations of physical circuitry or of logical representations convertible into physical circuitry, such as in a hardware description language of any appropriate type.
  • the invention is not limited to physical devices or units implemented in nonprogrammable hardware but can also be applied in programmable devices or units able to perform the desired device functions by operating in accordance with suitable program code, such as mainframes, minicomputers, servers, workstations, personal computers, notepads, personal digital assistants, electronic games, automotive and other embedded systems, cell phones and various other wireless devices, commonly denoted in this application as "computer systems".

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a wave field synthesis apparatus for driving an array of loudspeakers with drive signals. The apparatus comprises: a listener pose identifying unit for identifying a pose of a listener, the pose comprising a position and an orientation of a listener; a sound reproduction unit for generating sound field drive signals, the sound reproduction unit comprising a sound field synthesizer for generating sound field drive signals to cause the array of loudspeakers to generate a sound field towards at least one audio zone, and/or a binaural renderer for generating binaural drive signals to cause the array of loudspeakers to generate determined sound pressures at at least two locations; and an adaptation unit for adapting one or more parameters of the sound reproduction unit depending on the identified pose of the listener.
PCT/EP2015/060628 2015-05-13 2015-05-13 Procédé et appareil pour la commande d'un réseau de haut-parleurs avec des signaux de commande WO2016180493A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/EP2015/060628 WO2016180493A1 (fr) 2015-05-13 2015-05-13 Procédé et appareil pour la commande d'un réseau de haut-parleurs avec des signaux de commande
EP15725269.3A EP3275213B1 (fr) 2015-05-13 2015-05-13 Procédé et dispositif à commander un réseau de haut-parleurs avec des signaux de commande

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2015/060628 WO2016180493A1 (fr) 2015-05-13 2015-05-13 Procédé et appareil pour la commande d'un réseau de haut-parleurs avec des signaux de commande

Publications (1)

Publication Number Publication Date
WO2016180493A1 true WO2016180493A1 (fr) 2016-11-17

Family

ID=53269455

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2015/060628 WO2016180493A1 (fr) 2015-05-13 2015-05-13 Procédé et appareil pour la commande d'un réseau de haut-parleurs avec des signaux de commande

Country Status (2)

Country Link
EP (1) EP3275213B1 (fr)
WO (1) WO2016180493A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9980076B1 (en) 2017-02-21 2018-05-22 At&T Intellectual Property I, L.P. Audio adjustment and profile system
WO2019046706A1 (fr) * 2017-09-01 2019-03-07 Dts, Inc. Adaptation de point idéal pour audio virtualisé
FR3081662A1 (fr) * 2018-06-28 2019-11-29 Orange Procede pour une restitution sonore spatialisee d'un champ sonore audible selectivement dans une sous-zone d'une zone
WO2020066644A1 (fr) * 2018-09-26 2020-04-02 Sony Corporation Dispositif de traitement d'informations, procédé de traitement d'informations, programme, et système de traitement d'informations
US11310617B2 (en) 2016-07-05 2022-04-19 Sony Corporation Sound field forming apparatus and method

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3900401A1 (fr) 2018-12-19 2021-10-27 FRAUNHOFER-GESELLSCHAFT zur Förderung der angewandten Forschung e.V. Appareil et procédé de reproduction d'une source sonore étendue spatialement ou appareil et procédé de génération d'un flux binaire à partir d'une source sonore étendue spatialement
US20240114308A1 (en) * 2020-12-03 2024-04-04 Dolby Laboratories Licensing Corporation Frequency domain multiplexing of spatial audio for multiple listener sweet spots
US20240107255A1 (en) * 2020-12-03 2024-03-28 Dolby Laboratories Licensing Corporation Frequency domain multiplexing of spatial audio for multiple listener sweet spots

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080273713A1 (en) * 2007-05-04 2008-11-06 Klaus Hartung System and method for directionally radiating sound
US20110103620A1 (en) * 2008-04-09 2011-05-05 Michael Strauss Apparatus and Method for Generating Filter Characteristics
US20120014525A1 (en) * 2010-07-13 2012-01-19 Samsung Electronics Co., Ltd. Method and apparatus for simultaneously controlling near sound field and far sound field

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080273713A1 (en) * 2007-05-04 2008-11-06 Klaus Hartung System and method for directionally radiating sound
US20110103620A1 (en) * 2008-04-09 2011-05-05 Michael Strauss Apparatus and Method for Generating Filter Characteristics
US20120014525A1 (en) * 2010-07-13 2012-01-19 Samsung Electronics Co., Ltd. Method and apparatus for simultaneously controlling near sound field and far sound field

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11310617B2 (en) 2016-07-05 2022-04-19 Sony Corporation Sound field forming apparatus and method
US10313821B2 (en) 2017-02-21 2019-06-04 At&T Intellectual Property I, L.P. Audio adjustment and profile system
US9980076B1 (en) 2017-02-21 2018-05-22 At&T Intellectual Property I, L.P. Audio adjustment and profile system
US10728683B2 (en) 2017-09-01 2020-07-28 Dts, Inc. Sweet spot adaptation for virtualized audio
WO2019046706A1 (fr) * 2017-09-01 2019-03-07 Dts, Inc. Adaptation de point idéal pour audio virtualisé
FR3081662A1 (fr) * 2018-06-28 2019-11-29 Orange Procede pour une restitution sonore spatialisee d'un champ sonore audible selectivement dans une sous-zone d'une zone
CN112369047A (zh) * 2018-06-28 2021-02-12 奥兰治 在区域的子区域中选择性可听见的声场的空间声音再现的方法
CN112369047B (zh) * 2018-06-28 2022-01-25 奥兰治 在区域的子区域中选择性可听见的声场的空间声音再现的方法
WO2020002829A1 (fr) * 2018-06-28 2020-01-02 Orange Procédé pour une restitution sonore spatialisée d'un champ sonore audible sélectivement dans une sous-zone d'une zone
US11317234B2 (en) 2018-06-28 2022-04-26 Orange Method for the spatialized sound reproduction of a sound field which is selectively audible in a sub-area of an area
WO2020066644A1 (fr) * 2018-09-26 2020-04-02 Sony Corporation Dispositif de traitement d'informations, procédé de traitement d'informations, programme, et système de traitement d'informations
CN112771891A (zh) * 2018-09-26 2021-05-07 索尼公司 信息处理设备、信息处理方法、程序和信息处理系统
US11546713B2 (en) 2018-09-26 2023-01-03 Sony Corporation Information processing device, information processing method, program, and information processing system
CN112771891B (zh) * 2018-09-26 2023-05-02 索尼公司 信息处理设备、信息处理方法、程序和信息处理系统

Also Published As

Publication number Publication date
EP3275213A1 (fr) 2018-01-31
EP3275213B1 (fr) 2019-12-04

Similar Documents

Publication Publication Date Title
US10074012B2 (en) Sound and video object tracking
EP3275213B1 (fr) Procédé et dispositif à commander un réseau de haut-parleurs avec des signaux de commande
RU2743732C2 (ru) Способ и устройство для обработки видео- и аудиосигналов и программа
US9338420B2 (en) Video analysis assisted generation of multi-channel audio data
US20180338213A1 (en) VR Audio Superzoom
US11523219B2 (en) Audio apparatus and method of operation therefor
US10542368B2 (en) Audio content modification for playback audio
US10887719B2 (en) Apparatus and associated methods for presentation of spatial audio
WO2021243633A1 (fr) Sélection de vue optimale dans un système de téléconférence à caméras en cascade
US10728689B2 (en) Soundfield modeling for efficient encoding and/or retrieval
JP2021090208A (ja) プレノプティック・カメラによりキャプチャされた画像をリフォーカシングする方法及びオーディオに基づくリフォーカシング画像システム
US11348288B2 (en) Multimedia content
US20220225050A1 (en) Head tracked spatial audio and/or video rendering
US12010490B1 (en) Audio renderer based on audiovisual information
WO2021243631A1 (fr) Estimation de pose de tête dans un système de téléconférence à caméras multiples
US20230283976A1 (en) Device and rendering environment tracking
EP4221263A1 (fr) Suivi de tête et prédiction hrtf
US11109151B2 (en) Recording and rendering sound spaces
EP4075794A1 (fr) Ajustement des paramètres d'une caméra basé sur une région d'intérêt dans un environnement de téléconférence
Li et al. Multiple active speaker localization based on audio-visual fusion in two stages
WO2023164814A1 (fr) Appareil multimédia et procédé et dispositif de commande associés et procédé et dispositif de suivi de cible
Wilson et al. Audio-video array source separation for perceptual user interfaces
CN114631332A (zh) 比特流中音频效果元数据的信令
WO2020167528A1 (fr) Formation de collage pour joindre des images
WO2023150486A1 (fr) Rendu audio et/ou visuel commandé par geste

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15725269

Country of ref document: EP

Kind code of ref document: A1

REEP Request for entry into the european phase

Ref document number: 2015725269

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE