EP1576847A1

EP1576847A1 - Audio playback system and method for playing back an audio signal

Info

Publication number: EP1576847A1
Application number: EP03782222A
Authority: EP
Inventors: Frank Melchior; Thomas Röder; Michael Beckinger; Sandra Brix; Thomas Sporer; Haymo Kutschbach; Berthold Schlenker; Carsten Land
Original assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Current assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date: 2002-11-21
Filing date: 2003-11-21
Publication date: 2005-09-21
Anticipated expiration: 2023-11-21
Also published as: EP1576847B1; JP4620468B2; JP2006507727A; DE10254404B4; DE50303069D1; WO2004047485A1; ATE324021T1; DE10254404A1

Abstract

An audio playback system is divided into a central wavefield synthesis module (10) and a multitude of decentrally arranged loudspeaker modules (12a-12e). Synthesis signals for the individual loudspeakers and corresponding items of channel information, which are assigned to the synthesis signals, are calculated in the central wavefield synthesis module. The synthesis signals for a loudspeaker together with associated items of channel information are then transmitted to corresponding loudspeaker modules via a transmission link (16a-16e). Each loudspeaker module receives the synthesis signals and associated items of channel information that are intended for the loudspeaker assigned to the loudspeaker module. A decentralized audio rendering and digital-to-analog conversion takes place inside the loudspeaker modules in order to decentrally generate the actual analog loudspeaker signals in spatial proximity to each loudspeaker. The division into a central wavefield synthesis module and a multitude of decentralized loudspeaker modules enables the production of audio playback systems that can be scaled with regard to price in order to offer different size systems, which can be scaled in terms of price, for, in particular, cinema playback spaces that vary greatly in size.

Description

Audio playback system and method for playing an audio signal


The present invention relates to audio playback systems and, more particularly, to practical audio playback systems for variable size playback rooms such as cinemas, the audio playback systems being based on wave field synthesis.

There is an increasing need for new technologies and innovative products in the field of consumer electronics. It is an important prerequisite for the success of new multimedia systems to offer optimal functionalities and capabilities. This is achieved through the use of digital technologies and especially computer technology. Examples of this are the applications that offer an improved realistic audiovisual impression. With previous audio systems, a major weakness lies in the quality of the spatial sound reproduction of natural, but also of virtual environments.

Methods for multi-channel loudspeaker reproduction of audio signals have been known and standardized for many years. All common techniques have the disadvantage that both the location of the speakers and the position of the listener are already imprinted on the transmission format. If the speakers are arranged incorrectly in relation to the listener, the audio quality suffers significantly. Optimal sound is only possible in a small area of the playback room, the so-called sweet spot. A more natural spatial impression and a stronger sense of envelopment in audio playback can be achieved with the help of a new technology. The basics of this technology, so-called wave field synthesis (WFS), were researched at TU Delft and first introduced in the late 1980s (Berkhout, A. J.; de Vries, D.; Vogel, P.: Acoustic control by Wavefield Synthesis. JASA 93, 1993).

Due to the enormous demands of this method on computer performance and transmission rates, wave field synthesis has so far only rarely been used in practice. It is only the advances in the areas of microprocessor technology and audio coding that allow this technology to be used in concrete applications. The first products in the professional sector are expected next year. In a few years, the first wave field synthesis applications for the consumer sector will also be launched.

The basic idea of WFS is based on the application of Huygens' principle of wave theory:

Every point that is captured by a wave is the starting point of an elementary wave that propagates in a spherical or circular manner.

Applied to acoustics, a large number of loudspeakers arranged next to each other (a so-called loudspeaker array) can be used to simulate any shape of an incoming wavefront. In the simplest case, a single point source to be reproduced and a linear arrangement of loudspeakers, the audio signal of each loudspeaker must be fed with a time delay and an amplitude scaling such that the sound fields radiated by the individual loudspeakers superimpose correctly. If there are several sound sources, the contribution of each source to each loudspeaker is calculated separately and the resulting signals are added. If the sources to be reproduced are in a room with reflecting walls, the reflections must also be reproduced as additional sources via the loudspeaker array. The computational effort therefore depends heavily on the number of sound sources, the reflection properties of the recording room and the number of loudspeakers.
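The delay and amplitude scaling for the simplest case described above can be illustrated with a minimal sketch. It is not the driving function used by the invention: the speed of sound of 343 m/s, the simplified 1/r gain law and the names `SPEED_OF_SOUND` and `driving_parameters` are assumptions made only for this illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def driving_parameters(source_xy, speaker_xs, speaker_y=0.0):
    """Return one (delay_seconds, gain) pair per loudspeaker of a linear array.

    The delay is the travel time from the virtual point source to the
    loudspeaker; the gain uses a simplified 1/r law (an assumption, real
    WFS driving functions contain further correction factors).
    """
    params = []
    for x in speaker_xs:
        distance = math.hypot(x - source_xy[0], speaker_y - source_xy[1])
        delay = distance / SPEED_OF_SOUND
        gain = 1.0 / max(distance, 1e-3)  # guard against division by zero
        params.append((delay, gain))
    return params


# Five loudspeakers on the x-axis, virtual source 2 m behind the array centre.
speakers = [-1.0, -0.5, 0.0, 0.5, 1.0]
params = driving_parameters((0.0, -2.0), speakers)
```

In this example the centre loudspeaker is closest to the virtual source, so it receives the shortest delay and the largest gain, while the outer loudspeakers are fed later and more quietly.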

The advantage of this technique lies in the fact that a natural spatial sound impression is possible over a large area of the playback room. In contrast to the known techniques, the direction and distance of sound sources are reproduced very precisely. To a limited extent, virtual sound sources can even be positioned between the real speaker array and the listener.

Although wave field synthesis works well for environments whose properties are known, irregularities occur when those properties change or when the wave field synthesis is carried out on the basis of an environmental condition that does not match the actual properties of the environment.

An environmental condition can be described by the impulse response of the environment.

This is explained in more detail using the following example. It is assumed that a loudspeaker emits a sound signal against a wall whose reflection is undesirable. For this simple example, room compensation using wave field synthesis would consist of first determining the reflection of this wall in order to establish when a sound signal reflected from the wall arrives back at the loudspeaker and what amplitude this reflected sound signal has. If the reflection from this wall is undesirable, wave field synthesis offers the possibility of eliminating it by impressing on the loudspeaker, in addition to the original audio signal, a signal in phase opposition to the reflection signal and with a corresponding amplitude, so that the compensation wave cancels the reflection wave and the reflection from this wall is eliminated in the environment under consideration. This can be done by first calculating the impulse response of the environment and determining the nature and position of the wall on the basis of this impulse response, the wall being interpreted as a mirror source, that is to say as a sound source that reflects incident sound.

If the impulse response of this environment is measured first and the compensation signal, which must be impressed on the loudspeaker superimposed on the audio signal, is then calculated, the reflection from this wall is canceled in such a way that a listener in this environment has the impression that this wall does not exist at all.

However, it is crucial for an optimal compensation of the reflected wave that the impulse response of the room is exactly determined, so that no over- or under-compensation occurs.

Wave field synthesis thus enables correct mapping of virtual sound sources over a large reproduction area. At the same time, it offers sound engineers new technical and creative potential when creating complex soundscapes. Wave field synthesis (WFS, or sound field synthesis), as developed at TU Delft in the late 1980s, represents a holographic approach to sound reproduction. The Kirchhoff-Helmholtz integral serves as the basis for this. It states that any sound field can be generated within a closed volume by means of a distribution of monopole and dipole sound sources (loudspeaker arrays) on the surface of this volume. Details can be found in M. M. Boone, E. N. G. Verheijen, P. F. v. Tol, "Spatial Sound-Field Reproduction by Wave-Field Synthesis", Delft University of Technology Laboratory of Seismics and Acoustics, J. Audio Eng. Soc., Vol. 43, No. 12, December 1995, and Diemer de Vries, "Sound Reinforcement by Wavefield Synthesis: Adaption of the Synthesis Operator to the Loudspeaker Directivity Characteristics", Delft University of Technology Laboratory of Seismics and Acoustics, J. Audio Eng. Soc., Vol. 44, No. 12, December 1996.

In wave field synthesis, a synthesis signal for each loudspeaker of the loudspeaker array is calculated from an audio signal emitted by a virtual source at a virtual position. The synthesis signals are designed in terms of amplitude and phase such that the wave resulting from the superposition of the individual sound waves output by the loudspeakers in the array corresponds to the wave that would originate from the virtual source at the virtual position if this virtual source were a real source with a real position.

Typically, there are multiple virtual sources at different virtual positions. The synthesis signals are calculated for each virtual source at each virtual position, so that one virtual source typically results in synthesis signals for several loudspeakers. Seen from a loudspeaker, this loudspeaker thus receives several synthesis signals that go back to different virtual sources. A superposition of these synthesis signals, which is possible due to the linear superposition principle, then yields the playback signal actually emitted by the loudspeaker.
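The superposition step for a single loudspeaker can be sketched as a sample-wise sum of the synthesis signals that go back to the different virtual sources. The function name `superimpose` and the sample values are hypothetical and serve only as an illustration.

```python
def superimpose(synthesis_signals):
    """Sum several synthesis signals sample by sample (linear superposition).

    Shorter signals are treated as if padded with zeros at the end.
    """
    length = max(len(signal) for signal in synthesis_signals)
    playback = [0.0] * length
    for signal in synthesis_signals:
        for n, value in enumerate(signal):
            playback[n] += value
    return playback


# Two synthesis signals for the same loudspeaker, from two virtual sources.
from_source_a = [1.0, 0.5, 0.0, 0.0]
from_source_b = [0.0, 0.25, 0.25, 0.0]
playback_signal = superimpose([from_source_a, from_source_b])
```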

The larger the loudspeaker arrays, i.e. the more individual loudspeakers are provided, the better the possibilities of wave field synthesis can be exploited. However, this also increases the computing power that a wave field synthesis unit has to provide, since channel information typically has to be taken into account as well. Specifically, this means that there is in principle a separate transmission channel from each virtual source to each loudspeaker, and that in principle each virtual source can lead to a synthesis signal for each loudspeaker, or that each loudspeaker can receive a number of synthesis signals equal to the number of virtual sources.

If, in particular in cinema applications, the possibilities of wave field synthesis are to be exploited to the extent that the virtual sources can also move, it becomes apparent that considerable computing power is required for the calculation of the synthesis signals, the calculation of the channel information, and the generation of the playback signals by combining the channel information and the synthesis signals.

In addition, it should be noted at this point that the quality of the audio reproduction increases with the number of loudspeakers provided. This means that the audio reproduction becomes better and more realistic the more loudspeakers are present in the loudspeaker array(s).

In the scenario above, the fully rendered and digital-to-analog converted playback signals for the individual loudspeakers could, for example, be transmitted via two-wire lines from the wave field synthesis central unit to the individual loudspeakers. This would have the advantage that all loudspeakers are almost guaranteed to work synchronously, so that no further synchronization measures would be necessary. On the other hand, the wave field synthesis central unit could only ever be manufactured for a specific reproduction room or for reproduction with a fixed number of loudspeakers. This means that a separate wave field synthesis central unit would have to be manufactured for each playback room, and each such unit would have to provide considerable computing power, since the calculation of the audio playback signals, in particular for many loudspeakers or many virtual sources, has to be carried out at least partially in parallel and in real time.

Particularly with regard to audio playback systems intended for cinemas, however, there is the problem that playback rooms in cinemas vary considerably in size. A cinema sometimes has one very large hall and, at the same time, several small halls for films that draw smaller audiences than the films shown in the large hall. Different cinemas also have playback rooms of different sizes, which, particularly when audio playback not only in cinemas but also, for example, in concert halls is considered, may vary by up to a factor of 100.

In order to equip such different audio playback rooms with an audio playback system based on wave field synthesis, a separate wave field synthesis central unit could, for example, be built for each playback room, which is not acceptable in terms of price because of the one-off production. Alternatively, a maximally equipped wave field synthesis central unit could be built which is configurable with regard to the connectable loudspeakers, i.e. with regard to the number of analog signal outputs, but whose internal processors are designed for the maximum number of analog outputs, i.e. connectable loudspeakers.

Such a system would result in audio playback systems for smaller playback rooms costing almost as much as audio playback systems for very large playback rooms, which is unlikely to be acceptable to operators of small playback rooms. In particular, the medium to small playback rooms are of interest to providers of audio playback systems, the "smallest" playback rooms here including, for example, domestic living rooms or smaller restaurants.

The possibilities described above are therefore disadvantageous in that a thorough market acceptance is not to be expected immediately.

The object of the present invention is to provide an audio reproduction concept which has a higher market acceptance.

This object is achieved by an audio playback system according to claim 1, a method for playing back an audio signal according to claim 19 or a computer program according to claim 20.

The present invention is based on the finding that audio playback systems which are to achieve market acceptance must be scalable. However, scalability must not only apply to the computing power provided, but must also be reflected in the price of the audio playback system. This means that an audio playback system for a large playback room may cost more than an audio playback system for a small playback room. In other words, an audio playback system for a small playback room must cost significantly less than an audio playback system for a large playback room.

In the conceivable concepts described above, the price differences were insignificant, since they were due only to the number of individual loudspeakers, which, however, can be offered at low cost because a large number of loudspeakers are provided and because of novel concepts for integrating them into the structure that encloses the reproduction space.

According to the invention, the audio playback system is divided into a central wave field synthesis module and into many individual loudspeaker modules that are decentrally connected to the central wave field synthesis module. The central wave field synthesis module receives an audio signal with a plurality of audio tracks and on the one hand calculates the synthesis signals and on the other hand the channel information for the channels from the virtual positions to the real speaker positions.

The central wave field synthesis module is further configured to deliver to each loudspeaker one or more synthesis signals that are to be reproduced by that loudspeaker, together with channel information for the audio channels from the virtual positions of the virtual sources, from which the one or more synthesis signals originate, to the loudspeaker concerned. A considerable reduction in the transmitted data rate can already be achieved here, since experience shows that it is very rare for every loudspeaker to receive synthesis signals whose energy content is greater than a certain threshold. The central wave field synthesis module according to the invention thus already has the option of supplying to a decentralized loudspeaker module only those synthesis signals, and only the channel information for those synthesis signals, that are significant for the individual loudspeaker.

The loudspeaker modules according to the invention are decentralized and directly coupled to the loudspeaker or preferably arranged in close proximity to the loudspeaker. Each loudspeaker module comprises a receiver for receiving the one or more synthesis signals for the respective loudspeaker, as well as the channel information associated with the synthesis signals. Furthermore, each loudspeaker module comprises a rendering device for calculating a reproduction signal for the loudspeaker using the synthesis signals and the channel information for the supplied synthesis signals. Finally, each loudspeaker module also comprises a signal processing device with possibly a digital amplifier, a further digital signal processing device and finally a digital-to-analog converter for generating, on the basis of the reproduction signal, an analog loudspeaker signal which is to be supplied to the loudspeaker concerned. A plurality of transmission links are provided for connecting the central wave field synthesis module and the decentralized loudspeaker modules, one transmission link each extending from the central wave field synthesis module to the individual loudspeaker.

The rendering operation is computationally very expensive, which, with regard to the necessary circuit hardware in the form of, for example, a DSP or a hard-wired circuit, contributes considerably to the costs, in particular when considering the multiplier that arises because this hardware is provided for each individual loudspeaker. Preferably, the rendering device works using channel impulse responses as channel information and thus performs computationally intensive convolutions, which can be carried out either directly in the time domain or in the frequency domain; the latter requires transformations into the frequency domain and back from the frequency domain, which, together with the actual multiplication operation in the frequency domain, lead to considerable effort. In particular, it should be borne in mind that a rendering unit does not have to render only a single synthesis signal, but always a large number of synthesis signals, which normally corresponds to the number of virtual sources.
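The rendering step described here can be sketched as follows: each synthesis signal is convolved with its channel impulse response directly in the time domain, and the results are summed to give the playback signal of one loudspeaker. This is only an illustration under simplifying assumptions; the names `convolve` and `render` and the sample values are hypothetical, and a practical implementation would more likely use fast convolution in the frequency domain.

```python
def convolve(signal, impulse_response):
    """Direct time-domain convolution of a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out


def render(synthesis_signals, impulse_responses):
    """Compute the playback signal for one loudspeaker.

    synthesis_signals[i] is convolved with impulse_responses[i], the
    channel information for the channel from virtual source i to this
    loudspeaker; the convolved signals are then summed.
    """
    rendered = [convolve(s, h)
                for s, h in zip(synthesis_signals, impulse_responses)]
    playback = [0.0] * max(len(r) for r in rendered)
    for r in rendered:
        for n, value in enumerate(r):
            playback[n] += value
    return playback


# Two virtual sources contributing to one loudspeaker.
playback = render(
    synthesis_signals=[[1.0, 0.0, 0.0], [0.0, 2.0]],
    impulse_responses=[[0.5, 0.25], [1.0]],
)
```

The nested loops make the cost visible: the work grows with the product of signal length and impulse response length, and it is incurred once per virtual source and per loudspeaker.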

The concept according to the invention means that operations which can be carried out in a decentralized manner are moved out of the central wave field synthesis module into the decentralized loudspeaker modules in such a way that, in the best case, only those operations which are equally relevant for all loudspeakers are carried out in the central wave field synthesis module, while all operations that affect only one loudspeaker, or several loudspeakers connected to one loudspeaker module, are implemented decentrally in the loudspeaker module.

The costs for the central wave field synthesis module can thus be considerably reduced, but at the expense of the loudspeaker modules, whose price can no longer be neglected because the audio rendering is mainly carried out in the loudspeaker modules.

However, the audio playback system according to the invention is now scalable both in terms of performance and in terms of price. This opens up the possibility of offering a central wave field synthesis module for a large number of playback rooms at a reduced price, such that the costs for the overall system, which result from the costs for the central unit and the decentralized loudspeaker modules, now correlate strongly with the number of loudspeakers installed and thus with the size of the playback room.

In other words, an operator of a large playback room will still have to pay a certain price for a playback system for his large playback room. On the other hand, an operator of a smaller playback room will be able to purchase an audio playback system at a significantly lower price, since the number of loudspeakers, and thus the number of complex and costly loudspeaker modules, is considerably smaller than for the large playback room.

The audio playback system according to the invention thus makes it possible to offer audio playback systems for smaller playback rooms at considerably reduced prices compared to large playback rooms, so that market acceptance is hoped for due to the reduced price in the very competitive market of audio / video components.

In a preferred exemplary embodiment of the present invention, the central wave field synthesis unit is designed to process cinema films recorded in a conventional audio format for cinema films, conventional recording formats being, for example, the 5.1 surround format, the 7.1 format, or the 10.2 format. Using the 5.1 format as an example, such a film includes six audio tracks, i.e. audio tracks for the "left rear", "right rear", "front left", "front right" and "front center" channels, as well as the bass channel (subwoofer channel). Playback of such a film with conventional audio in the audio playback system according to the invention can be achieved by placing the audio tracks as virtual sources at virtual positions, which can be selected according to the needs of the sound engineer or the operator of the playback room. The possibility of compatible playback on an audio playback system with a scalable price therefore contributes to audio playback systems based on wave field synthesis spreading already at a time when there are still few cinema or video films available with audio tracks that are fully suitable for wave field synthesis, together with the necessary meta information about the recording setting.

Preferred embodiments of the present invention are explained in detail below with reference to the accompanying drawings, in which:

Fig. 1 shows a conceptual diagram of the audio reproduction system according to the invention;

Fig. 2 shows a block diagram of the central wave field synthesis module according to the invention;

Fig. 3 shows a block diagram of a decentralized loudspeaker module according to the invention;

Fig. 4 shows a block diagram of a preferred design of the audio rendering unit in a decentralized loudspeaker module;

Fig. 5 shows a schematic diagram of a compatible reproduction with a large sweet spot;

Fig. 6 shows a schematic diagram for the formation of a plurality of synthesis signals for a loudspeaker, each of which is to be supplied with channel information in order to obtain the playback signal for the loudspeaker LSi; and

Fig. 7 shows a schematic diagram of a channel from a virtual source to a real loudspeaker, showing the quantities that can have an influence on the channel.

The audio playback system according to the invention is basically divided into two parts, as shown in FIG. 1. One part is the central wave field synthesis module 10. The other part is composed of individual loudspeaker modules 12a, 12b, 12c, 12d, 12e, which are connected to actual physical loudspeakers 14a, 14b, 14c, 14d, 14e in the manner shown in FIG. 1. It should be noted that the number of loudspeakers 14a-14e is above 50 and even well above 100 in typical applications. If each loudspeaker is assigned its own loudspeaker module, the corresponding number of loudspeaker modules is also required. Depending on the application, however, it is preferred to drive a small group of adjacent loudspeakers from one loudspeaker module. In this context, it is immaterial whether a loudspeaker module that is connected to, for example, four loudspeakers feeds the four loudspeakers with the same playback signal, or whether different synthesis signals are calculated for the four loudspeakers, so that such a loudspeaker module actually consists of several individual loudspeaker modules that are physically combined in one unit.

There is a separate transmission path 16a-16e between the wave field synthesis module 10 and each individual loudspeaker module 12a-12e, each transmission path being coupled to the central wave field synthesis module and a separate loudspeaker module.

As the data transmission mode for transmitting data from the wave field synthesis module to a loudspeaker module, a serial transmission format that delivers a high data rate is preferred, such as a so-called FireWire transmission format or a USB data format. Data transfer rates of over 100 megabits per second are advantageous.

The data stream that is transmitted from the wave field synthesis module 10 to a loudspeaker module is accordingly formatted in the wave field synthesis module in accordance with the selected data format and provided with synchronization information, as is available in conventional serial data formats. This synchronization information is extracted from the data stream by the individual loudspeaker modules and used to synchronize the individual loudspeaker modules with regard to their reproduction, i.e. ultimately with regard to the digital-to-analog conversion for obtaining the analog loudspeaker signal and the sampling (resampling) provided for this purpose. It is preferred that the central wave field synthesis module operate as a master and that all loudspeaker modules operate as clients, with the individual data streams via the various transmission links 16a-16e all receiving the same synchronization information from the central module 10. This ensures that all loudspeaker modules work synchronously, synchronized by the master 10, which is important for the present audio reproduction system in order to avoid any loss of audio quality, so that the synthesis signals calculated by the wave field synthesis module are not emitted by the individual loudspeakers with a time offset after the corresponding audio rendering. This concept has the advantage that the individual loudspeaker modules do not have to be synchronized with one another. They are automatically synchronized with each other since they all run synchronously with the master. Connecting the individual loudspeaker modules to each other would be unfavorable for the present invention, because the modular concept of scaling the number of loudspeaker modules with the size of the reproduction space requires that modules can simply be added without corresponding wiring between the modules.

FIG. 2 shows a block diagram of a central wave field synthesis module according to a preferred exemplary embodiment of the present invention. The central wave field synthesis module initially comprises an input device 20 which is basically designed to receive an audio signal at an input, the audio signal having a plurality of audio tracks, each audio track being assigned an audio source position.

Depending on the application, the audio source position is an indication of the position of a loudspeaker with respect to a listener in the playback room in accordance with a standardized audio format, such as 5.1, in order to achieve compatible playback. In this case the audio signal would have 5 + 1 = 6 audio tracks. Alternatively, the audio signal can have a larger number of audio tracks, which are already available as signals suitable for wave field synthesis and represent audio sources or audio objects at a real recording position, which are reproduced as virtual sources in the playback room using wave field synthesis.

In a preferred exemplary embodiment of the present invention, the input device 20 is also used as the main control unit, which advantageously has further functionalities. In particular, it has the functionality of a decoding module, as is usually used in cinemas. As an alternative or in addition, the input device 20 is also designed as a DVD decoder which supplies the separate audio channels or audio tracks. Alternatively, the input device 20 is also designed as an MPEG-4 decoding module, which already provides audio tracks 21 and corresponding audio source information 22 intended for wave field synthesis. In particular, the audio tracks 21 each relate to audio signals from audio objects in a recording setting, to the position of the audio objects in the recording setting, and to properties of audio objects, in particular with regard to the size of the audio object or the density with regard to the acoustic properties of the audio object.

Furthermore, it is preferred to also transmit properties of the recording room or the recording environment in addition to the audio tracks 21 in order to be able to take these into account in the wave field synthesis, if necessary. The information about the recording room or the recording environment should serve to give the listener not only a visual but also an acoustic impression of the recording situation. Thus, the visitor should also notice from the reproduced sound whether, for example, a scene of a film is taking place outdoors or in a small space, such as a submarine. While a recording scenario in the open air delivers relatively "dry" audio signals because the recording environment shows hardly any or no reflections, the situation will be completely different in a submarine, for example. Here the recording setting is characterized by a very reflective room. In this case, it is preferred to keep the audio tracks as dry as possible, that is, without including the room acoustics of the recording room, and to describe the properties of the room acoustics using additional meta information, such as can be transmitted in the standardized data stream in accordance with the MPEG-4 standard.

The central wave field synthesis module further comprises a device 24 for determining channel information on the one hand and wave field synthesis signals on the other hand for the individual loudspeakers. For this purpose, a device 25 for converting the audio source positions 22 into virtual positions for the wave field synthesis is also provided.

In particular, the device 24 is designed to determine audio channel information for each audio channel from a virtual position to a loudspeaker position, the virtual position depending on the audio source position associated with the audio track (device 25), so that audio channel information is available for each channel from each virtual position to each loudspeaker. Furthermore, the device 24 is configured to calculate synthesis signals from the virtual positions for the loudspeakers using the principles of wave field synthesis, as presented at the beginning.

The central wave field synthesis module in FIG. 2 further comprises a device 26 for supplying synthesis signals to one or more loudspeakers. The device 26 is also designed to transmit channel information for the transmitted synthesis signals from the central wave field synthesis module via the corresponding transmission links to the individual loudspeaker modules, so that audio rendering can take place there. Depending on the embodiment, it is preferred to transmit, for each synthesis signal which relates to a channel from a virtual position to a specific loudspeaker, further channel information for this channel. This means that, in a preferred exemplary embodiment of the present invention, the device 24 also supplies channel information for each synthesis signal, or interpolates it from calculated channel information, and provides it to the device 26 so that the latter can initiate transmission to the individual loudspeaker modules. The device 26 is preferably designed to filter out insignificant synthesis signals and thus to transmit neither the insignificant synthesis signals nor the associated channel information, in order to save data transmission capacity. It is often the case that a virtual source leads to significant synthesis signals only for some loudspeakers, while for all other loudspeakers in the array synthesis signals can also be calculated based on the theory of wave field synthesis, but these are relatively small, for example in terms of their power in a certain period of time, and can therefore be neglected in favor of a reduced amount of data to be transferred.
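The filtering performed by the device 26 can be illustrated with a minimal sketch that keeps, for one loudspeaker, only the synthesis signals whose energy in the current block exceeds a threshold, together with their channel information. The function names, the dictionary layout, the source identifiers and the threshold value are assumptions chosen for this illustration and are not taken from the patent.

```python
def signal_energy(samples):
    """Energy of one block of samples (sum of squared sample values)."""
    return sum(value * value for value in samples)


def select_for_transmission(synthesis_signals, channel_info, threshold):
    """Pick the significant synthesis signals for one loudspeaker.

    synthesis_signals and channel_info are dicts keyed by a (hypothetical)
    virtual-source identifier. A signal and its channel information are
    kept only if the block energy exceeds the threshold.
    """
    selected = {}
    for source_id, samples in synthesis_signals.items():
        if signal_energy(samples) > threshold:
            selected[source_id] = (samples, channel_info[source_id])
    return selected


signals = {
    "source_1": [0.5, -0.5, 0.5],       # energy 0.75, significant
    "source_2": [0.001, 0.0, -0.001],   # energy 2e-6, negligible
}
info = {"source_1": [1.0, 0.2], "source_2": [0.9]}
to_send = select_for_transmission(signals, info, threshold=1e-3)
```

Only `source_1` is transmitted to the loudspeaker module in this example, so neither the samples nor the channel information of `source_2` occupy capacity on the transmission link.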

In particular, the device 24 includes functionalities to be used to preprocess the audio signals. In addition, the device 24 controls the individual loudspeaker modules in particular in such a way that it either directly or in conjunction with the device 26 introduces synchronization information into the data streams transmitted to the individual loudspeaker modules and thus achieves central synchronization of all loudspeaker modules with the central wave field synthesis module.

In particular, the central wave field synthesis module is designed to carry out all processing operations that are the same for all reproduction channels, while according to the inventive concept, the processing operations that are different for the individual loudspeakers or the individual reproduction channels are carried out decentrally.

The device 24 is also designed to simulate wave field synthesis information for stereo signals, 5.1 signals, 7.2 signals, 10.2 signals, etc. with a view to compatible playback. For this purpose, the standard positions of loudspeakers with respect to a playback space for the standardized audio format are used as audio source positions.

In this regard, reference is made to FIG. 5 below. FIG. 5 shows a playback room 50, a speaker array 52 that extends around the playback room, and a plurality of virtual sources 53a-53e that, as can be seen in FIG. 5, are located at virtual positions outside of the playback room 50. The device 24 is designed, in connection with the device 25 from FIG. 1, to calculate virtual positions, which can be controlled manually, from the audio source information, that is to say, for example, from the standard position information for such a 5.1 signal. Depending on the embodiment, it is preferred to place the virtual positions at infinity, for example, so that the speaker array 52 fills the playback room 50 with plane waves. As a result, the so-called sweet spot, i.e. the area in a reproduction room in which an optimal sound impression is obtained, is considerably enlarged compared to a common situation in which real 5.1 speakers are placed in the reproduction room.

Alternatively, the virtual sources can also be placed at finite virtual positions and modeled as point sources, this option having the advantage that the sound impression has a more pleasant effect on the cinema viewer / listener. Plane waves have the property that the listener has the impression of sitting in a very large room, which leads to an unpleasant sensation in particular when, for example, a submarine scene is currently taking place on the screen. In this context, it should be pointed out that conventional films with, for example, 5.1 audio tracks contain no information about acoustic features of the recording setting. In such a case, it is therefore preferred to find a compromise between plane waves, that is to say virtual sources at an infinite position, and virtual sources at a finite position. In this context, the audio playback system according to the invention also provides the possibility of varying the virtual positions of the virtual loudspeakers 53a-53e depending on the film scene. For example, if a scene is taking place outdoors, the virtual loudspeakers can be positioned at infinity. On the other hand, if a scene takes place in a small room, the virtual loudspeakers can be positioned closer to the playback room 50.
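The difference between a virtual source at a finite position and one at infinity can be made concrete by looking only at the arrival delays at the loudspeakers. The following is a simplified sketch, not the full wave field synthesis driving function: it ignores amplitude weighting and filtering, and the speed of sound, the array geometry and all function names are assumptions chosen for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def point_source_delays(source_xy, speaker_xy):
    """Propagation delay in seconds from a virtual point source to each loudspeaker."""
    diff = np.asarray(speaker_xy, dtype=float) - np.asarray(source_xy, dtype=float)
    return np.linalg.norm(diff, axis=1) / SPEED_OF_SOUND

def plane_wave_delays(direction_xy, speaker_xy):
    """Relative delay in seconds of a plane wave travelling along direction_xy."""
    n = np.asarray(direction_xy, dtype=float)
    n = n / np.linalg.norm(n)
    proj = np.asarray(speaker_xy, dtype=float) @ n   # position along the travel direction
    return (proj - proj.min()) / SPEED_OF_SOUND

# Hypothetical straight array of five loudspeakers along the x axis.
speakers = np.array([[x, 0.0] for x in (-2.0, -1.0, 0.0, 1.0, 2.0)])

near = point_source_delays([0.0, -1.0], speakers)      # source 1 m behind the array
far = point_source_delays([0.0, -1000.0], speakers)    # source very far away
plane = plane_wave_delays([0.0, 1.0], speakers)        # source "at infinity"

near_spread = near.max() - near.min()   # curved wavefront: clearly different delays
far_spread = far.max() - far.min()      # almost flat wavefront
```

As the virtual source moves away, the delay spread across the array shrinks toward the plane-wave case, which is the limit the description refers to when it places the virtual positions at infinity.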

In the context of compatible playback, in a preferred embodiment of the present invention, the input device 20 is configured to sample the audio tracks associated with the video signal a certain time (a negative "delay") ahead of the video signals, such that, after processing in the wave field synthesis module and in the individual loudspeaker modules, the sound belonging to a point in time is reproduced simultaneously with the video signal belonging to that point in time. The negative "delay" must at least be dimensioned in such a way that sound and image are emitted by the audio reproduction system according to the invention in correct association with one another. If the negative delay is dimensioned somewhat larger, the signals can be calculated in advance and then output from the loudspeaker modules to the loudspeakers in response to a corresponding synchronization signal, which ensures the synchronism of image and sound.

Both in the case of compatible playback and in the case where the input audio signal already includes prepared wave field synthesis information about sound sources in the recording setting, it is preferred to supply information about the playback space to the channel information calculator 24 via a line 27, so that the synthesis signals can be prepared using the information about the playback space, for example in order to eliminate the acoustic properties of the playback room.

Information about the playback room can either be determined on the basis of the geometrical nature of the playback room, or can be measured in the playback room using the loudspeakers and special microphone arrays, with control and evaluation for this being possible via an adaptation module 28 for the playback room. Thus, in one embodiment of the present invention, it is preferred to determine the acoustic properties of the playback room during playback and to re-adjust the information about the playback room accordingly, so that optimum suppression of the cinema acoustics also takes place for a filled cinema, for example. At this point, it should be pointed out that, particularly in the case of smaller, fully occupied playback rooms, the acoustic properties of the playback room differ significantly from those when there are no people in the playback room.

The adaptation module 28 for the reproduction room further comprises a microphone array which can be used to measure the properties of the reproduction. Furthermore, the adaptation module 28 for the reproduction space comprises algorithms to find the position of speaker arrays in the reproduction space. In addition, preprocessing of measurement results is carried out here in order to carry out an optimal inversion of the room and loudspeaker properties, the adaptation module 28 preferably being controlled by the device 24 for this purpose. Depending on the embodiment, the adaptation module 28 for the playback room is only required for the system structure. However, if continuous adaptation to a changed situation in the playback room is desired, the adaptation module 28 can also be used continuously during operation.

If the channel information calculation device 24 is used to process WFS-specific signals input into the device 20, the additional WFS information, that is to say, for example, the properties of the audio objects and the properties of the recording space, is extracted from the input audio signal and fed to the device 24 via a WFS information line 29, so that this information can be taken into account in the channel information calculation.

In this case, the central WFS module is also designed to carry out preprocessing of the WFS-prepared audio signals. Furthermore, the device 24 and / or the device 26 is intended to achieve the synchronization between image and sound, for which, as has been explained, time codes are introduced into the preferably serial data streams for the individual loudspeaker modules. Finally, as has already been stated, the channel information calculation device 24 is also responsible for driving the adaptation module 28 in order to control the measurement of the acoustic properties of the reproduction space, if desired, either before playback or during playback.

The multiplexer / transmission stage 26 is designed to insert synchronization information, which is generated either by the device 24, by the control device 20 or in the device 26 itself, into the data streams to the loudspeaker modules, which carry the synthesis signals and the channel information required for the individual loudspeakers.

At this point it should also be pointed out that the device 24 for calculating the channel information and for calculating the synthesis signals must also be provided with the speaker locations in the special reproduction room in order to calculate the individual synthesis signals and the individual channel information for the individual speakers. This is symbolically represented in FIG. 2 by a line 30.

A preferred exemplary embodiment of a loudspeaker module is discussed below with reference to FIG. 3. The loudspeaker module first comprises a receiver / decoder block 31 in order to receive the data stream from the selection device and to extract from it synthesis signals 31a, associated channel information 31b and synchronization information 31c. The loudspeaker module shown in Fig. 3 further includes, as a central unit, an audio rendering device 32 for calculating a playback signal for the speaker using the one or more synthesis signals and using the channel information associated with the synthesis signals. Finally, a loudspeaker module comprises a signal processing device 33 with a digital / analog converter for generating an analog loudspeaker signal which is fed to the loudspeaker LSi 34 concerned in order to generate a sound signal. The signal processing device 33, and in particular the resampler that cooperates with the digital / analog converter, is supplied with the synchronization information (31c) extracted from the data stream by the receiver 31, in order to output, synchronously with the central wave field synthesis module and thus synchronously with all other loudspeaker modules, the synthesis signals calculated by the device 24 of Fig. 1, acted upon by the channel information and superimposed for the loudspeaker.
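To make the role of the receiver / decoder block more tangible, the following sketch shows one conceivable way of multiplexing synthesis signals, channel information and synchronization information into a serial data stream and recovering them again. The description does not specify any frame layout; the `Frame` class, the field order, the 64-bit timecode and the 32-bit float samples are all hypothetical choices made purely for illustration.

```python
import struct
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    timecode: int                      # synchronization information (hypothetical sample counter)
    synthesis: List[List[float]]       # one block of samples per synthesis signal
    channel_info: List[List[float]]    # one impulse response per synthesis signal

def _pack_vectors(vectors):
    """Serialize a list of float vectors as: count, then (length, samples) per vector."""
    out = struct.pack("<H", len(vectors))
    for v in vectors:
        out += struct.pack("<I", len(v)) + struct.pack(f"<{len(v)}f", *v)
    return out

def _unpack_vectors(buf, offset):
    """Inverse of _pack_vectors; returns the vectors and the offset after them."""
    (count,) = struct.unpack_from("<H", buf, offset)
    offset += 2
    vectors = []
    for _ in range(count):
        (n,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        vectors.append(list(struct.unpack_from(f"<{n}f", buf, offset)))
        offset += 4 * n
    return vectors, offset

def encode(frame):
    """Central module side: multiplex everything for one loudspeaker into bytes."""
    return (struct.pack("<Q", frame.timecode)
            + _pack_vectors(frame.synthesis)
            + _pack_vectors(frame.channel_info))

def decode(buf):
    """Loudspeaker module side: extract sync info, synthesis signals and channel info."""
    (timecode,) = struct.unpack_from("<Q", buf, 0)
    synthesis, offset = _unpack_vectors(buf, 8)
    channel_info, _ = _unpack_vectors(buf, offset)
    return Frame(timecode, synthesis, channel_info)

# Sample values are chosen to be exactly representable as 32-bit floats.
frame = Frame(48000, [[0.5, -0.25], [1.0]], [[1.0, 0.5], [0.25]])
payload = encode(frame)
restored = decode(payload)
```

A real system would add framing, error handling and a defined sample format; the point here is only that each loudspeaker module receives exactly the three kinds of information named in the text.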

The loudspeaker module shown in FIG. 3 is thus characterized by the combination of a digital receiver, a further signal processing device and a digital-to-analog converter, wherein in particular a digital amplifier can also be provided in the signal processing device 33. Alternatively, however, the signal can also be amplified after the digital / analog conversion, although digital amplification is preferred due to the more precise possibility of synchronization. It is further preferred to couple the loudspeaker 34 to the signal processing device 33 via a short analog line. However, if it is not possible for the line from the signal processing device 33 to the loudspeaker 34 to be short, it is preferred that the corresponding lines of all loudspeakers have the same length or have length differences which are within a predetermined tolerance limit. The reason is that the synchronization is preferably performed on the digital side, so that very different line lengths between the loudspeaker modules and the loudspeakers could cause a desynchronization, which could lead to audible artifacts or to a loss of the sound impression that is to be created by the wave field synthesis.

In a preferred embodiment of the present invention, channel impulse responses in the time domain or in the frequency domain are transmitted as channel information. In this case, the audio rendering device 32 is designed to carry out a convolution of the individual synthesis signals with the channel information assigned to the synthesis signals. This convolution can actually be implemented as a convolution in the time domain, or can be performed in the frequency domain by multiplying the synthesis signal in the frequency domain by the channel transfer function. A configuration which is optimized with regard to the processing outlay is shown in FIG. 4. Fig. 4 shows a preferred embodiment of the audio rendering device 32, which comprises, for each synthesis signal s_ji(t), a time-frequency conversion block 34a, 34b, 34c and, for each branch, a multiplier 35a, 35b, 35c for multiplying the transformed synthesis signal by the transform H_ji(f) of a channel impulse response, as well as a summer 36 and a final frequency-time conversion device 37, which are connected as shown in FIG. 4. The arrangement shown in Fig. 4 is characterized in that the processing effort is reduced because the summation of the synthesis signals, which have already been acted upon by the corresponding channel transfer functions, takes place in the frequency domain, so that each loudspeaker module requires only a single frequency-time converter regardless of the number of synthesis signals. Depending on the embodiment, the time-frequency transformation of the synthesis signals s_ji can be carried out completely in parallel or, if there is sufficient time, also in serial / parallel or completely serial fashion.

As has been explained, the preferred audio rendering device 32 shown in FIG. 4 is characterized in that, regardless of the number of synthesis signals that are fed to a loudspeaker module, it has only a single frequency-time conversion device 37, which is preferably implemented as an inverse FFT, in which case the devices 34a, 34b, 34c are implemented as FFT (FFT = fast Fourier transform).
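The structure of FIG. 4 can be sketched in a few lines. This is a minimal illustration under stated assumptions: whole signals are processed at once rather than in streaming blocks, NumPy's real FFT stands in for the conversion blocks, and the function name `render_frequency_domain` is hypothetical. It shows one FFT per synthesis signal, one multiplication per branch, a sum in the frequency domain and a single inverse FFT.

```python
import numpy as np

def render_frequency_domain(synthesis_signals, impulse_responses):
    """Playback signal for one loudspeaker, computed with a single inverse FFT."""
    # Length of the longest linear convolution result; the FFT size must cover it.
    out_len = max(len(s) + len(h) - 1
                  for s, h in zip(synthesis_signals, impulse_responses))
    n_fft = 1 << (out_len - 1).bit_length()        # next power of two >= out_len
    acc = np.zeros(n_fft // 2 + 1, dtype=complex)  # frequency-domain summer
    for s, h in zip(synthesis_signals, impulse_responses):
        # Time-frequency conversion, then multiplication by the channel transfer function.
        acc += np.fft.rfft(s, n_fft) * np.fft.rfft(h, n_fft)
    return np.fft.irfft(acc, n_fft)[:out_len]      # the only frequency-time conversion

# Example with three synthesis signals and three channel impulse responses.
rng = np.random.default_rng(0)
signals = [rng.standard_normal(64) for _ in range(3)]
responses = [rng.standard_normal(16) for _ in range(3)]

out = render_frequency_domain(signals, responses)
# Reference: convolve each pair in the time domain and add the results.
reference = sum(np.convolve(s, h) for s, h in zip(signals, responses))
```

Because the Fourier transform is linear, summing before the inverse transform gives the same result as transforming each branch back separately, which is why one inverse FFT per loudspeaker module suffices.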

The audio rendering device 32 shown in FIG. 3 is also designed to receive special program information from the central wave field synthesis module shown in FIG. 2. For this purpose, the multiplexer / transmitter stage 26 has a special output in order to supply the program information to the loudspeaker modules. Depending on the application, the program information can also be multiplexed into the data stream with synthesis signals and channel information, although this is not absolutely necessary.

An example of the transmission of program information to a loudspeaker module is given below. If the channel information is described as channel impulse responses and transmitted to the individual loudspeaker modules, it is preferred, in the interest of saving data rate, not to transmit the entire impulse response, but only those samples of the impulse response that lie in a front region of the impulse response, in which the magnitude of its envelope is still above a threshold. At this point it should be pointed out that impulse responses typically have large values at early points in time, gradually take on smaller values and finally have a so-called "reverberation tail", which is important for the sound impression but whose samples are no longer particularly large. In this case, it is preferred not to transmit the reverberation tail, whose envelope is below the threshold value, on the basis of its sample values, but only to transmit support values for its envelope. The samples of the reverberation tail that are required by the audio rendering device 32 are then generated according to the invention in that the audio rendering device generates a random sequence of zeros and ones, the amplitude of which is weighted with the envelope given by the transmitted support values. It is preferred to transmit only a few support values and to interpolate between them, and then to use the interpolated envelope to weight the random 0/1 sequence.

It should be noted that the random 0/1 sequence is preferably implemented by positive voltage values for a "1" and negative voltage values for a "0". The information that the audio rendering device receives channel information consisting of actual samples up to a certain point and thereafter only of support values for the envelope is transmitted via the program information input shown in FIG. 3 or is agreed upon in advance.
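The reconstruction of the reverberation tail in the loudspeaker module can be sketched as follows. The sketch assumes linear interpolation between the envelope values and a seeded pseudo-random generator; the function name, the parameter names and the example numbers are hypothetical and only illustrate the principle of weighting a random 0/1 sequence, mapped to positive and negative values, with the interpolated envelope.

```python
import numpy as np

def reconstruct_impulse_response(head, support_positions, support_values,
                                 total_len, seed=0):
    """Rebuild a channel impulse response from its transmitted parts.

    head: samples of the front region that were transmitted as actual samples.
    support_positions / support_values: sparse envelope values for the tail.
    total_len: assumed total length of the impulse response in samples.
    """
    head = np.asarray(head, dtype=float)
    tail_idx = np.arange(len(head), total_len)
    # Interpolate the envelope between the few transmitted values.
    envelope = np.interp(tail_idx, support_positions, support_values)
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, size=tail_idx.size)   # random 0/1 sequence
    signs = 2.0 * bits - 1.0                        # "1" -> positive, "0" -> negative
    return np.concatenate([head, envelope * signs])

head = [1.0, 0.6, -0.4, 0.2]
positions = [4, 12, 20]
values = [0.1, 0.05, 0.0]
h = reconstruct_impulse_response(head, positions, values, total_len=20)
```

Only four samples and three envelope values are transmitted in this example, yet the module obtains a 20-sample response whose tail follows the intended decay.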

The wave field synthesis module according to the invention further comprises a WFS mixing console, not shown in the figure, which includes an authoring system for generating WFS sound descriptions.

The procedure on which the generation of synthesis signals is based is discussed below with reference to FIG. 6. A system with three virtual sources at three virtual positions 60, 61, 62 and a speaker LSi 63 at a real speaker position, which is known to the central WFS module, is considered. Furthermore, the virtual positions of the virtual sources 60, 61, 62 are known to the central wave field synthesis module either from the fact that they are supplied in a WFS-processed input signal or from the fact that they are derived from audio source positions by the means 25 for calculating the virtual positions. The synthesis signals s_1i, s_2i and s_3i are the signals which the loudspeaker 63 must emit and which originate from the respective virtual positions 60, 61, 62. From this it can be seen that, as has been stated, each loudspeaker will emit the superposition of several synthesis signals.

A channel ji is also defined between each virtual position and each loudspeaker, which can be described by an impulse response, a transfer function or any other channel information, as shown with reference to FIG. 7. All desired properties can be packaged in the channel description, in order then to apply the channel information for the channel assigned to a synthesis signal to the synthesis signals which are calculated by the wave field synthesis module. If the channel information is given in the form of an impulse response that describes the channel, the application is a convolution. If the signals are in the frequency domain, the application is a multiplication. Alternative channel information can also be used depending on the embodiment.
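Written out, the two equivalent ways of applying the channel information for loudspeaker i can be stated as follows. The symbols y_i(t) and Y_i(f) for the resulting playback signal and its transform, and h_ji(t) for the impulse response of channel ji, are introduced here for illustration only; the sum runs over all virtual positions j that contribute a synthesis signal to loudspeaker i.

```latex
y_i(t) = \sum_{j} \left( s_{ji} * h_{ji} \right)(t)
\qquad \Longleftrightarrow \qquad
Y_i(f) = \sum_{j} S_{ji}(f) \, H_{ji}(f)
```

The left-hand form is the convolution in the time domain, and the right-hand form is the multiplication in the frequency domain, with S_ji(f) and H_ji(f) denoting the transforms of s_ji(t) and h_ji(t).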

In the following, FIG. 7 shows which information can be used to influence a channel 70 from a virtual source 71 to a real loudspeaker 72. First of all, the virtual position of the virtual source 71 is included in the channel information, for example the channel impulse response. Properties of the virtual source, such as size, density, etc., are also included. For example, a small triangle must be described and modeled differently than a large timpani. Furthermore, as shown in FIG. 7, the properties of the recording space are included in the channel transfer function. Further influencing components are a system distortion of the entire audio reproduction system, which contains, for example, loudspeaker distortions or non-idealities of the loudspeakers. The channel information also includes information about the playback space in order to compensate for the acoustic properties of the playback space. If, for example, the reproduction room is known to have a reflecting wall opposite a loudspeaker, and its reflection is to be suppressed, the corresponding loudspeaker is controlled, taking this information into account, in such a way that it emits a signal which is 180 degrees out of phase with the reflected signal and has a corresponding amplitude, so that the reflection is cancelled and the wall becomes acoustically transparent, i.e. can no longer be identified by a listener on the basis of its reflections.
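The cancellation of a wall reflection can be illustrated with a heavily idealized sketch. It assumes a single reflection that is simply a delayed and attenuated copy of the direct signal, with delay and gain known exactly; real rooms are far more complex, and the function name and the example values are hypothetical.

```python
import numpy as np

def reflection_and_cancellation(direct, delay_samples, reflection_gain):
    """Return the modelled reflection and the signal that cancels it.

    The cancelling signal has the same amplitude as the reflection and is
    180 degrees out of phase with it, i.e. it is its sign-inverted copy.
    """
    direct = np.asarray(direct, dtype=float)
    reflected = np.zeros(len(direct) + delay_samples)
    reflected[delay_samples:] = reflection_gain * direct   # delayed, attenuated copy
    return reflected, -reflected

t = np.arange(128)
direct = np.sin(2 * np.pi * t / 16)
reflected, cancel = reflection_and_cancellation(direct, delay_samples=10,
                                                reflection_gain=0.3)
residual = reflected + cancel   # what remains of the reflection at the listener
```

In this idealized case the residual is zero everywhere; in practice the achievable suppression depends on how accurately the reflection is known.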

Finally, the channel information can also be used to set a specific target reproduction acoustics. For this purpose, it is preferred to first suppress the acoustics of the playback room by means of a playback room compensation, and then to generate channel information and feed it to the wave field synthesis module, so that the acoustics of any other playback room can be simulated in a playback room.

Depending on the circumstances, the method according to the invention for reproducing an audio signal can be implemented in hardware or in software. The implementation can take place on a digital storage medium, in particular a floppy disk or CD with electronically readable control signals, which can cooperate with a programmable computer system such that the method is carried out. In general, the invention thus also consists in a computer program product with program code stored on a machine-readable carrier for carrying out the method according to the invention when the computer program product runs on a computer. In other words, the invention can thus be implemented as a computer program with a program code for carrying out the method if the computer program runs on a computer.

Claims


1. Audio reproduction system for a reproduction room, in which a plurality of loudspeakers (14a-14e) are arranged at defined loudspeaker locations, using an audio signal with a plurality of audio tracks, each audio track being assigned an audio source position, with the following features:

a central wave field synthesis module (10) which is designed

to determine audio channel information for each audio channel from a virtual position to a speaker position, the virtual position depending on the audio source position associated with the audio track so that there is audio channel information for each channel from each virtual position to each speaker,

to calculate synthesis signals from the virtual positions for the speakers (24), and

to supply (26) to each speaker one or more synthesis signals to be reproduced by the speaker concerned, and channel information for the one or more synthesis signals;

a plurality of loudspeaker modules (12a-12e), one loudspeaker module being assigned to one loudspeaker, and each loudspeaker module having the following features:

a receiver (31) for receiving the one or more synthesis signals for the affected speaker and the channel information;

rendering means (32) for calculating a playback signal for the speaker using the one or more synthesis signals and the channel information for the affected speaker; and

a signal processing device (33) for generating an analog loudspeaker signal, which can be fed to the loudspeaker concerned, on the basis of the playback signal; and

a plurality of transmission links (16a-16e) from the central wave field synthesis module to each loudspeaker, each transmission link being coupled to the central wave field synthesis module on the one hand and to its own loudspeaker module on the other.

2. Audio playback system according to claim 1, wherein each speaker module is combined with the speaker with which it is associated, so that a spatial distance between the speaker and the speaker module is smaller than a spatial distance between the speaker module and the central wave field synthesis module.

3. The audio playback system of claim 1 or 2, wherein the audio channel information is impulse responses for the audio channels.

4. The audio playback system according to claim 3, wherein the rendering device for calculating a playback signal comprises a convolution device for convolving the one or more synthesis signals with the corresponding impulse responses.

5. The audio playback system as claimed in claim 4, in which the rendering device (32) has the following features:

time domain-frequency domain converting means (34a, 34b, 34c) for each synthesis signal;

a multiplier (35a, 35b, 35c) for each synthesis signal;

a summation device (36) for summing synthesis signals that are present in the frequency domain and have been acted upon by the corresponding channel impulse responses; and

a single frequency domain-time domain converting means (37) for converting the sum signal into the time domain to obtain the reproduction signal.

6. Audio playback system according to claim 1, wherein the signal processing device (33) in the loudspeaker module comprises a digital amplifier.

7. The audio playback system as claimed in claim 4, in which the central wave field synthesis module is designed to transmit a first part of the channel impulse response sample by sample and a second part only using envelope support values, and

in which the rendering device (32) is designed to reconstruct the second part of the channel impulse response using the support values.

8. The audio playback system according to claim 7, wherein the rendering device (32) is designed to generate the second part of the channel impulse response by means of a noise generator or pseudo-noise generator, wherein noise values or pseudo-noise values are amplitude-weighted with the support values and / or with auxiliary values interpolated from the support values.

9. Audio playback system according to one of the preceding claims, in which the audio tracks are standardized multi-channel tracks and the audio source positions are standard positions which relate to a positioning of playback speakers in a playback room, the number of standard positions being equal to the number of standardized multi-channel tracks.

10. Audio reproduction system according to claim 9, in which the wave field synthesis module is designed to calculate (25) the virtual positions for calculating the audio channel information from the standard positions (22).

11. The audio playback system according to claim 10, wherein the wave field synthesis module is designed to place (25) the virtual positions at infinity, so that the multitude of loudspeakers together emit plane sound waves.

12. The audio playback system according to claim 10, wherein the wave field synthesis module is designed to simulate virtual playback speakers at defined virtual positions as point-shaped sound sources that are so far away from the plurality of speakers that an optimal playback area essentially encompasses the entire playback space.

13. Audio playback system according to one of claims 9 to 12, wherein the audio tracks are part of a video or cinema film, and wherein the wave field synthesis module is designed to sample the audio tracks of the video or cinema film offset by a time period before video playback, the time period being selected in order to obtain a simultaneous reproduction of image and sound taking into account a processing time in the wave field synthesis module and the loudspeaker module.

14. Audio reproduction system according to one of claims 1 to 13, wherein the audio signal comprises, for each audio object in a recording environment, an audio signal of the object as an audio track and a position of the audio object in the recording environment, one or more properties of the audio object such as size or density, and / or information about acoustic properties of a recording environment.

15. The audio playback system according to claim 14, wherein the wave field synthesis module is designed to determine the virtual positions from positions of the audio objects in the recording environment.

16. Audio reproduction system according to one of the preceding claims, in which the wave field synthesis module is designed to receive information about acoustic properties of the reproduction space and to take it into account when determining the channel information, so that the sound waves reproduced by the plurality of loudspeakers are designed in such a way that acoustic Influences of the playback space are reduced.

17. Audio reproduction system according to one of the preceding claims, in which the wave field synthesis module is designed to adapt to an acoustics of the reproduction space before or during a reproduction of the audio signal in that

a plurality of room impulse responses between the speakers and microphones positioned in the playback room is calculated,

an overall impulse response of the playback space is interpolated from the plurality of room impulse responses, and

the overall impulse response is taken into account when calculating the channel information in order to reduce acoustic properties of the reproduction space.

18. Audio playback system according to one of the preceding claims, wherein the central wave field synthesis module is designed to generate synchronization information and to embed it in data streams to the loudspeaker modules, and in which the plurality of loudspeaker modules is designed to receive the synchronization information from the central wave field synthesis module and to use it for synchronization, so that the loudspeaker modules are synchronized with the central wave field synthesis module.

19. A method for reproducing an audio signal in a playback room in which a plurality of loudspeakers are arranged at defined loudspeaker locations, the audio signal having a plurality of audio tracks, each audio track being assigned an audio source position, comprising the following steps:

centrally determining audio channel information for each audio channel from a virtual position to a speaker position, the virtual position depending on the audio source position associated with the audio track, so that for each channel from each virtual position to each speaker there is audio channel information;

centrally determining synthesis signals from the virtual positions for the loudspeakers;

transmitting one or more synthesis signals and associated channel information to a plurality of loudspeaker modules;

decentrally calculating a playback signal for the speaker using the one or more synthesis signals and the associated channel information for an affected speaker;

performing signal processing using digital to analog conversion to produce an analog speaker signal; and

collectively reproducing the analog speaker signals through the plurality of speakers.

20. Computer program with a program code for performing the method according to claim 19, when the program runs on a computer.