EP3002960A1

EP3002960A1 - System and method for generating surround sound

Info

Publication number: EP3002960A1
Application number: EP14461574.7A
Authority: EP
Inventors: Jacek Paczkowski; Tomasz Nalewa; Krzysztof Kramek
Original assignee: Patents Factory Ltd Sp zoo
Current assignee: Patents Factory Ltd Sp zoo
Priority date: 2014-10-04
Filing date: 2014-10-04
Publication date: 2016-04-06

Abstract

A signal comprising at least one sound event data, wherein the sound event comprises: time of event information; information regarding location in space with respect to a reference location point; a movement trajectory in space; orientation information; spatial characteristic of the source of the event; information on sampling frequency; information on signal resolution; and a set of acoustic samples at the sampling frequency and at the signal resolution. A respective method and system are also disclosed.

Description

The present invention relates to a system and method for generating surround sound. In particular, the present invention relates to a surround sound environment that is independent of the number of loudspeakers and of the configuration/placement of the respective loudspeakers.
Prior art surround sound systems, such as Dolby Digital and DTS, are based on multichannel transmission and presentation of sound. A disadvantage of these solutions is that the obtained effect depends on loudspeaker placement and room acoustics. Both technologies suggest an optimal loudspeaker placement, which, however, is often infeasible due to the shape and arrangement of the room.
There are known sound correction systems which, however, most often are based on a suitable delay of signals destined to each loudspeaker. The problem of sound reflections off the walls is similarly corrected.
Reflections may be used to generate virtual surround sound. This is the case in so-called sound projectors (an array of loudspeakers in a single casing - a so called sound bar).
Problems with surround sound arise from the fact that the data in the acoustic stream assume specific locations of each loudspeaker relative to the listener. Even the names of the channels define particular arrangements, e.g. central, left front, right front, left rear, right rear. In these prior art surround systems the same sound data stream is sent to the speakers of each listener, regardless of the actual position of the speakers in the presentation room.
It would be advantageous to provide a surround sound solution that is independent of the number of loudspeakers and of the configuration/placement of the respective loudspeakers.
Prior art discloses the Ambisonics system, which is a full-sphere surround sound technique: in addition to the horizontal plane, it covers sound sources above and below the listener.
Unlike other multichannel surround formats, its transmission channels do not carry speaker signals. Instead, they contain a speaker-independent representation of a sound field called B-format, which is then decoded to the listener's speaker setup. This extra step allows the producer to think in terms of source directions rather than loudspeaker positions, and offers the listener a considerable degree of flexibility as to the layout and number of speakers used for playback (source: Wikipedia).
The aim of the present invention is a surround sound system and method that are independent of the number of loudspeakers and of the configuration/placement of the respective loudspeakers.

SUMMARY AND OBJECTS OF THE PRESENT INVENTION

An object of the present invention is a signal comprising at least one sound event data, wherein the sound event comprises: time of event information; information regarding location in space with respect to a reference location point; a movement trajectory in space; orientation information; spatial characteristic of the source of the event; information on sampling frequency; information on signal resolution; and a set of acoustic samples at the sampling frequency and at the signal resolution.
Another object of the present invention is a method for generating surround sound, the method comprising the steps of: receiving a sound data signal stream according to the present invention; obtaining access to a database of loudspeaker records, the records comprising information regarding location and characteristics of the available loudspeakers; calculating which of the available loudspeakers may be used so as to achieve the effect closest to a perfect arrangement according to the at least one sound event data; calculating an angular difference between the sound source location and the positions of the candidate loudspeakers in spherical coordinates; selecting a set of loudspeakers that have the lowest distance from the sound event location; in case of an insufficient number of physical loudspeakers, creating one or more virtual loudspeakers by utilizing reflection of sound; and generating data streams that are to be sent to the physical loudspeakers.
Preferably, the step of calculating is executed for each sound event.
Preferably, in case the sound source is located between physical loudspeakers, the closest loudspeakers will be used in order to simulate a virtual loudspeaker, located where the sound source is located, by applying a superposition principle.
Preferably, the selected loudspeakers are located at opposite sides, when facing the reference location of a user, with respect to the sound event location.
Preferably, the information regarding location and characteristics of loudspeakers is described with s points, where u describes the shape of the sound beam in the horizontal plane and v the respective shape in the vertical plane, and wherein such characteristics are determined using an array of microphones.
Another object of the present invention is a computer program comprising program code means for performing all the steps of the computer-implemented method according to the present invention when said program is run on a computer.
Another object of the present invention is a computer readable medium storing computer-executable instructions performing all the steps of the computer-implemented method according to the present invention when executed on a computer.
These and other objects of the invention presented herein, are accomplished by providing a system and method for generating surround sound. Further details and features of the present invention, its nature and various advantages will become more apparent from the following detailed description of the preferred embodiments shown in a drawing, in which:

Fig. 1 presents a diagram of a sound event;
Fig. 2 presents a diagram of the method according to the present invention;
Fig. 3 presents a diagram of the system according to the present invention.

NOTATION AND NOMENCLATURE

Some portions of the detailed description which follows are presented in terms of data processing procedures, steps or other symbolic representations of operations on data bits that can be performed on computer memory. Therefore, a computer executes such logical steps thus requiring physical manipulations of physical quantities.
Usually these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. For reasons of common usage, these signals are referred to as bits, packets, messages, values, elements, symbols, characters, terms, numbers, or the like.
Additionally, all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Terms such as "processing" or "creating" or "transferring" or "executing" or "determining" or "detecting" or "obtaining" or "selecting" or "calculating" or "generating" or the like, refer to the action and processes of a computer system that manipulates and transforms data represented as physical (electronic) quantities within the computer's registers and memories into other data similarly represented as physical quantities within the memories or registers or other such information storage.
A computer-readable (storage) medium, such as referred to herein, typically may be non-transitory and/or comprise a non-transitory device. In this context, a non-transitory storage medium may include a device that may be tangible, meaning that the device has a concrete physical form, although the device may change its physical state. Thus, for example, non-transitory refers to a device remaining tangible despite a change in state.

DESCRIPTION OF EMBODIMENTS

The present invention is independent of loudspeaker placement due to the fact that the acoustic stream is not divided into channels but rather into sound events present in a three-dimensional space.
Fig. 1 presents a diagram of a sound event according to the present invention. The sound event 101 represents the fact of presence of a sound source in an acoustic space. Each such event has an associated set of parameters such as: time of event 102, location in space with respect to a reference location point 103. The location may be given as x,y,z coordinates (alternatively spherical coordinates r,α,β may be used).
The sound event 101 further comprises a movement trajectory in space 104 (for example in case of a vehicle changing its location). The movement trajectory may be defined as n, Δt1, x1, y1, z1, γ1, δ1, Δt2, x2, y2, z2, γ2, δ2, ..., Δtn, xn, yn, zn, γn, δn, which is a definition of a curve along which the sound source moves. Here n is the number of points of the curve, xi, yi, zi are points in space, γi, δi is the instantaneous orientation of the sound source (azimuth and elevation) and Δti is a time increment.
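The trajectory encoding can be illustrated with a minimal sketch that evaluates the source position at a given time. The names `TrajectoryPoint` and `position_at` are hypothetical. The sketch assumes that each Δt is measured from the previous point, that the curve starts at the event location, and that the source moves linearly between points; the description only states that the points define a curve.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TrajectoryPoint:
    dt: float         # Δt_i: time increment since the previous point (seconds, assumed unit)
    x: float
    y: float
    z: float
    azimuth: float    # γ_i: orientation azimuth at this point (degrees, assumed unit)
    elevation: float  # δ_i: orientation elevation at this point (degrees, assumed unit)


def position_at(start: Tuple[float, float, float],
                points: List[TrajectoryPoint],
                t: float) -> Tuple[float, float, float]:
    """Return the source position t seconds after the start of the event.

    Linear interpolation between consecutive points is an assumption of
    this sketch, not something the description prescribes.
    """
    prev = start
    elapsed = 0.0
    for p in points:
        if p.dt > 0 and t <= elapsed + p.dt:
            # Fraction of the current segment already travelled.
            f = max(0.0, (t - elapsed) / p.dt)
            return tuple(a + f * (b - a) for a, b in zip(prev, (p.x, p.y, p.z)))
        elapsed += p.dt
        prev = (p.x, p.y, p.z)
    # After the last point the source is assumed to stay where it ended.
    return prev
```

For example, with a start at the origin and two points reached after 2 s each, `position_at` returns the midpoint of the first segment at t = 1 s and the final point for any t beyond 4 s.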
The sound event 101 further comprises orientation 105 (γ, δ - the direction in which the highest sound amplitude is generated; azimuth and elevation are defined relative to the orientation of the coordinate system).
Additionally, the sound event 101 comprises a spatial characteristic of the source of the event 106 (the shape of a curve of the sound amplitude with respect to the emission angle; zero angle means emission in the direction of the highest amplitude). This parameter may be provided as s, λ1, u1, v1, λ2, u2, v2, λ3, u3, v3, ..., λs, us, vs, where the characteristic is symmetrical and described with s points, u_i describe the shape of the sound beam in the horizontal plane and v_i the respective shape in the vertical plane.
The sound event 101 further comprises information on sampling frequency 107 (in case it is different from the base sampling frequency of the sound stream), signal resolution 108 (the number of bits per sample; this parameter is present if a given source has a different span than the standard span of the sound stream) and a set of acoustic samples 109 at the given sampling frequency and resolution.
A plurality of sound events will typically be encoded into an output audio data stream.
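The parameters 102 to 109 listed above can be grouped into a single record. The following is a minimal sketch of such a record, not a definition of the actual stream format: all field names, the units and the two stream defaults are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Stream-wide defaults; the values are assumed, the description gives none.
BASE_SAMPLE_RATE_HZ = 48_000
BASE_BITS_PER_SAMPLE = 16


@dataclass
class SoundEvent:
    time: float                            # 102: time of event
    location: Tuple[float, float, float]   # 103: x, y, z relative to the reference point
    # 104: trajectory entries (Δt, x, y, z, γ, δ)
    trajectory: List[Tuple[float, ...]] = field(default_factory=list)
    orientation: Tuple[float, float] = (0.0, 0.0)  # 105: (γ, δ)
    # 106: spatial characteristic entries (λ, u, v)
    characteristic: List[Tuple[float, float, float]] = field(default_factory=list)
    sample_rate_hz: Optional[int] = None   # 107: present only if it differs from the base rate
    bits_per_sample: Optional[int] = None  # 108: present only if it differs from the base resolution
    samples: List[int] = field(default_factory=list)  # 109: monophonic samples

    def effective_sample_rate(self) -> int:
        """Fall back to the stream default when the event carries no rate."""
        if self.sample_rate_hz is not None:
            return self.sample_rate_hz
        return BASE_SAMPLE_RATE_HZ

    def duration_seconds(self) -> float:
        """Length of the event implied by its sample count."""
        return len(self.samples) / self.effective_sample_rate()
```

Making 107 and 108 optional mirrors the statement that these parameters are present only when they differ from the values of the stream.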
The samples are always monophonic and are present as long as a given sound source emits a sound. In case of speech this means that a sound source appears and disappears in the sound stream. This is the reason for naming such an event a sound event. In case of a recording of an orchestra, there will be appear/disappear events of the respective instruments. Such an approach to the sound data stream results in a variable bitrate, wherein the changes may be substantial. When there are no sound events the bitrate will be close to zero, while in case of multiple sound events the bitrate may be high (even higher than in prior art surround systems).
The loudspeakers may be located in an arbitrary way; however, preferably they should not all be placed in a single place, for example on a single wall. According to the present invention the plurality of loudspeakers may be considered a cloud of loudspeakers. The more loudspeakers there are, the better the spatial effect that may be achieved. Preferably the loudspeakers are scattered in the presentation location, preferably on different walls of a room.
The loudspeakers may be either wired or wireless and are communicatively coupled to a sound decoder according to the present invention. The decoder may use loudspeakers of other electronic devices as long as communication may be established with controllers of such speakers (e.g. Bluetooth or Wi-Fi communication with loudspeakers of a TV set or mobile device).
The sound decoder according to the present invention may obtain information on location and characteristic of a given loudspeaker by sending to its controller a test sound stream and subsequently recording the played back test sound stream and analyzing the relevant acoustic response.
For the purpose of obtaining information on location and characteristic of a given loudspeaker, an array of omnidirectional microphones may be used, for example spaced from each other by 10 cm and positioned on the vertices of a cube or a tetrahedron. By measuring delays in a signal reaching the respective microphones, one may estimate the location of the sound source. The characteristics of a given loudspeaker may be obtained by analyzing the recorded sound at different frequencies.
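As an illustration of estimating a location from delays, the sketch below recovers only the direction towards a loudspeaker from arrival times at four microphones. It assumes a far-field (plane wave) model, exactly four non-coplanar microphones and a speed of sound of 343 m/s; none of these is stated in the description, and the function name `direction_from_delays` is hypothetical. Estimating the distance as well would need a near-field model, which is not shown.

```python
import math
from typing import Sequence, Tuple

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

Vec = Tuple[float, float, float]


def _det3(m) -> float:
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def direction_from_delays(mics: Sequence[Vec],
                          arrival_times: Sequence[float]) -> Vec:
    """Unit vector pointing from the array towards the source.

    Plane-wave model: t_i - t_0 = -((m_i - m_0) . u) / c, which gives
    three linear equations in u for four microphones.
    """
    m0, t0 = mics[0], arrival_times[0]
    a = [[mics[i][k] - m0[k] for k in range(3)] for i in (1, 2, 3)]
    b = [-SPEED_OF_SOUND * (arrival_times[i] - t0) for i in (1, 2, 3)]
    det = _det3(a)
    if abs(det) < 1e-12:
        raise ValueError("microphones must not be coplanar")
    u = []
    for col in range(3):
        # Cramer's rule: replace one column of the matrix with b.
        replaced = [[b[r] if c == col else a[r][c] for c in range(3)]
                    for r in range(3)]
        u.append(_det3(replaced) / det)
    norm = math.sqrt(sum(v * v for v in u))
    if norm == 0.0:
        raise ValueError("delays carry no direction information")
    return tuple(v / norm for v in u)
```

With measured rather than ideal delays the three equations are only approximately consistent, so a practical system would likely use more microphones and a least-squares fit.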
Other methods for obtaining information on location and characteristic of a given loudspeaker include solutions presented in US20140112484 or in "Analysis of Loudspeaker Placement and Loudspeaker-Room Interaction, and Correction of Associated Effects" by Michael Hlatky of University of Applied Sciences Offenburg, Media and Information Technology, Bang & Olufsen a/s, Department of Acoustics, August 2007.
According to the present invention, sound reflections are used in order to generate sounds from directions where no loudspeaker is present. To this end the sound decoder executes sound location analysis aimed at using reflective surfaces (such as walls) to generate reflected sounds. All sound reflecting surfaces are divided into triangles and each of the triangles is treated by the decoder as a virtual sound source. Each triangle has an associated function defining the dependence of a sound virtually emitted by this triangle on sounds emitted by physical loudspeakers. This function defines the amplitude as well as the spatial characteristics of emission, which may be different for each physical loudspeaker. In order for the system to operate properly it is necessary to place, at the sound presentation location, microphones used by the sound decoder for constant measurement of compliance of the emitted sounds with the expected sounds and for fine tuning the system.
Such a function is a sum of reflected signals emitted by all loudspeakers in a room, wherein a signal reflected from a given triangle depends on the triangle location, the loudspeaker location(s), the loudspeaker emission characteristics and the acoustic pressure emitted by the loudspeaker(s). The signal virtually emitted by the triangle will be a sum of the reflections generated by all loudspeakers. The spatial acoustic emission characteristic of such a triangle will depend on the physical loudspeakers, with each physical loudspeaker influencing it partially. Such a characteristic may be discrete, comprising narrow beams generated by different loudspeakers. Therefore, in order to eliminate sound reflected at a given location, an appropriate loudspeaker or a linear combination of loudspeakers has to be selected (appropriate meaning in line with the acoustic target, e.g. generating, from a given plane, a reflection in the direction of the listener such that other reflections do not ruin the effect).
The most important module of the system is a local sound renderer. This means that the renderer receives separate sound events and composes from them acoustic output streams that are subsequently sent to loudspeakers.
Due to the fact that the sound events comprise information on location of sound sources with respect to a reference location (for example the listener), the renderer shall select a speaker or speakers, which is/are closest to the location in space where the sound was emitted from. In case a speaker is not present in that location, speakers adjacent to this location shall be used, preferably speakers located at opposite sides of the location so that they may be configured in order to create an impression for the listener that the sound is emitted from its original location in space.
More than two loudspeakers may be used for one sound event in particular when a virtual sound source is to be positioned between them.
In case there are no physical loudspeakers in the vicinity of the location (direction) of the sound of a sound event, reflections from adjacent planes (such as walls) may be used to position the sound. Knowing the sound reflection function for a given reflective section, the optimal physical loudspeakers need to be chosen for generating the reflection effect.
The reference point location may be selected differently for a given sound rendering location or room. For example, one may listen to music in an armchair and watch television sitting on a sofa. Therefore, there are two different reference locations depending on circumstances. Consequently, the coordinate system changes. The reference location may be obtained automatically by different sensors, such as an infrared camera, or input manually by the listener. Such a solution is possible only because of local sound rendering.
An exemplary normalized characteristic of a physical loudspeaker is shown in Fig. 1B. The characteristic is usually symmetrical and described with s points, where u describes the shape of the sound beam in the horizontal plane and v the respective shape in the vertical plane. Such characteristics may be determined using an array of microphones as previously described.
In case of reflection, the characteristic can be asymmetrical and discontinuous.
Fig. 2 presents a diagram of the method according to the present invention. The method starts, after receiving a sound data stream according to Fig. 1, at step 201 with accessing a database of loudspeakers present at the sound presentation location. Subsequently, at step 202, it is calculated which of the available loudspeakers may be used so as to achieve the effect closest to a perfect arrangement. This may be effected by location thresholding based on the database of loudspeaker records.
Such a calculation needs to be executed for each sound event because sound events may run in parallel and the same loudspeaker(s) may be needed to emit them. The data for each loudspeaker have to be added by applying a superposition approach (covering all sound events at a given moment of time that affect the selected loudspeaker).
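The superposition over parallel sound events can be sketched as a plain sample-wise sum per loudspeaker. The function name `mix_for_loudspeakers` and the tuple layout of a contribution (loudspeaker identifier, start sample, gain, samples) are hypothetical; the sketch also assumes that all contributions have already been converted to one common sampling frequency.

```python
from typing import Dict, List, Tuple

# (loudspeaker id, index of the first output sample, gain, monophonic samples)
Contribution = Tuple[str, int, float, List[float]]


def mix_for_loudspeakers(contributions: List[Contribution]) -> Dict[str, List[float]]:
    """Sum all sound-event contributions that target the same loudspeaker."""
    out: Dict[str, List[float]] = {}
    for speaker_id, start, gain, samples in contributions:
        buf = out.setdefault(speaker_id, [])
        end = start + len(samples)
        if len(buf) < end:
            # Grow the output buffer with silence up to the end of this event.
            buf.extend([0.0] * (end - len(buf)))
        for n, s in enumerate(samples):
            buf[start + n] += gain * s
    return out
```

Two events that overlap in time on the same loudspeaker are thus simply added where they overlap, while each keeps its own gain.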
In case a loudspeaker is close to a location in which a sound source is located, this loudspeaker will be used. In case the sound source is located between physical loudspeakers then the closest loudspeakers will be used in order to simulate a virtual loudspeaker, located where the sound source is located. A superposition principle may be applied for this purpose. It is necessary to take into account, during this process, the emission characteristics of the loudspeakers.
The physical loudspeakers selected for simulating a virtual loudspeaker will emit sound in the direction of the listener at predefined angles of azimuth and elevation. For these angles, the attenuation level is to be read from the emission characteristic of the loudspeaker (the characteristic is normalized, therefore it will be a number in the range 0 ... 1) and multiplied by the emission strength of the loudspeaker (acoustic pressure). Only after that may superposition be executed. The signals are to be added by assigning weights to the loudspeakers, the weights arising from the location of the virtual loudspeaker with respect to the loudspeakers used to generate it (based on a proportionality rule).
The calculations shall include not only the direction from which a sound event is emitted but also a distance from the listener (i.e. a delay of the signal in such a way so as to simulate the correct distance from the listener to the sound event). The properly selected loudspeakers surround the sound event location. There may be more than two selected loudspeakers that will emit a particular sound event data.
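The weighting and delay described above can be sketched for the simplest case of two loudspeakers in the horizontal plane. This is one possible reading of the proportionality rule, namely linear weights in azimuth; the function names, the speed of sound and the choice to clamp the delay at zero are assumptions of the sketch.

```python
from typing import Tuple

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def proportional_weights(az_a: float, az_b: float,
                         az_virtual: float) -> Tuple[float, float]:
    """Weights for loudspeakers A and B so that they sum to 1.

    The closer the virtual source is to a loudspeaker, the larger that
    loudspeaker's weight (linear in azimuth, an assumed interpretation).
    """
    span = az_b - az_a
    if span == 0:
        raise ValueError("loudspeakers must have different azimuths")
    w_b = (az_virtual - az_a) / span
    if not 0.0 <= w_b <= 1.0:
        raise ValueError("virtual source must lie between the two loudspeakers")
    return 1.0 - w_b, w_b


def level_toward_listener(attenuation: float, pressure: float,
                          weight: float) -> float:
    """Normalized attenuation (0..1) times acoustic pressure, then weighted."""
    if not 0.0 <= attenuation <= 1.0:
        raise ValueError("attenuation is normalized to the range 0..1")
    return weight * attenuation * pressure


def extra_delay_seconds(event_distance: float, speaker_distance: float) -> float:
    """Delay that makes a loudspeaker sound as far away as the sound event.

    A source closer than the loudspeaker cannot be simulated by delay
    alone, so the result is clamped at zero (an assumption).
    """
    return max(0.0, (event_distance - speaker_distance) / SPEED_OF_SOUND)
```

For loudspeakers at -30 and 30 degrees and a virtual source at -15 degrees, the weights come out as 0.75 and 0.25.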
At step 203 there is calculated an angular difference between sound source location and positions of the candidate loudspeakers in spherical coordinates. The sound event location is:

r_ssi - a distance of the i-th sound event location from the listener;
γ_i - azimuth of the i-th sound event location;
δ_i - elevation angle of the i-th sound event;

and the loudspeaker location is:

r_sj - a distance of the j-th loudspeaker location from the listener;
γ_j - azimuth of the j-th loudspeaker location;
δ_j - elevation angle of the j-th loudspeaker.

Thus the angular difference is as follows:

$\Delta\gamma = \gamma_{i} - \gamma_{j}$
$\Delta\delta = \delta_{i} - \delta_{j}$
A set of loudspeakers that have the lowest distance from the sound event location are selected at step 204. The loudspeakers are to be located at opposite sides (when facing the reference location of a user) with respect to the sound event location so that the listener has an impression that the sound arrives from the sound event location.
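Steps 203 and 204 can be sketched as follows. The sketch computes Δγ and Δδ as defined above and then picks the nearest loudspeaker on each side of the sound event in azimuth. Wrapping the azimuth difference to [-180, 180) degrees, ranking candidates by the Euclidean norm of (Δγ, Δδ) and the names `angular_difference` and `select_loudspeakers` are assumptions, since the description does not fix a distance measure.

```python
import math
from typing import Dict, List, Tuple

Angles = Tuple[float, float]  # (azimuth γ, elevation δ) in degrees


def angular_difference(event: Angles, speaker: Angles) -> Tuple[float, float]:
    """Return (Δγ, Δδ) = (γ_i - γ_j, δ_i - δ_j), with Δγ wrapped to [-180, 180)."""
    d_az = (event[0] - speaker[0] + 180.0) % 360.0 - 180.0
    d_el = event[1] - speaker[1]
    return d_az, d_el


def select_loudspeakers(event: Angles, speakers: Dict[str, Angles]) -> List[str]:
    """Nearest loudspeaker on each azimuth side of the event, closest first.

    If a loudspeaker coincides with the event direction, it is used alone.
    """
    nearest: Dict[str, Tuple[float, str]] = {}
    for name, pos in speakers.items():
        d_az, d_el = angular_difference(event, pos)
        dist = math.hypot(d_az, d_el)
        if dist < 1e-9:
            return [name]
        side = "low" if d_az > 0 else "high"  # loudspeaker azimuth below / above the event
        if side not in nearest or dist < nearest[side][0]:
            nearest[side] = (dist, name)
    return [name for _, name in sorted(nearest.values())]
```

With loudspeakers at azimuths -110, -30, 30 and 110 degrees and an event at 10 degrees, the sketch selects the loudspeakers at 30 and -30 degrees, which lie on opposite sides of the event.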
Subsequently, at step 205, in case of insufficient number of physical loudspeakers there may be created one or more virtual loudspeaker(s). Reflection of sound is utilized for this purpose. The reflections are generated by physical loudspeakers so that they imitate a physical loudspeaker in a given location of the sound presentation location. The generated sound will reflect from a selected surface and be directed towards the listener.
Knowing the location of the virtual loudspeaker, a straight line is to be virtually drawn from the listener to this location and further to a reflective plane (such as a wall). The point at which this line intersects the reflective plane indicates the triangle on the reflective plane which is to be used in order to generate a reflected sound. From the emission characteristic of that triangle it needs to be read which physical loudspeakers are to be used. Subsequently, the function defining the dependency of the emission of the triangle on particular loudspeakers needs to be used in order to generate data streams 206 that are to be sent to the physical loudspeakers in order to achieve a reflected sound from that particular triangle. These data streams are to be added to other data emitted by the respective loudspeakers 207.
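Finding the triangle hit by the line from the listener through the virtual loudspeaker location can be sketched with a standard ray-triangle intersection test. The Moller-Trumbore algorithm is used here as one common choice; the description does not name any particular method, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]
Triangle = Tuple[Vec, Vec, Vec]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_hits_triangle(origin: Vec, through: Vec, tri: Triangle,
                      eps: float = 1e-9) -> Optional[Vec]:
    """Intersection of the ray origin -> through with a triangle, or None."""
    d = _sub(through, origin)
    e1, e2 = _sub(tri[1], tri[0]), _sub(tri[2], tri[0])
    h = _cross(d, e2)
    a = _dot(e1, h)
    if abs(a) < eps:
        return None  # ray is parallel to the triangle plane
    f = 1.0 / a
    s = _sub(origin, tri[0])
    u = f * _dot(s, h)
    if u < 0.0 or u > 1.0:
        return None
    q = _cross(s, e1)
    v = f * _dot(d, q)
    if v < 0.0 or u + v > 1.0:
        return None
    t = f * _dot(e2, q)
    if t <= eps:
        return None  # intersection lies behind the listener
    return (origin[0] + t * d[0], origin[1] + t * d[1], origin[2] + t * d[2])


def find_reflecting_triangle(listener: Vec, virtual_speaker: Vec,
                             triangles: List[Triangle]) -> Optional[Tuple[int, Vec]]:
    """Index and hit point of the first triangle crossed by the line."""
    for index, tri in enumerate(triangles):
        hit = ray_hits_triangle(listener, virtual_speaker, tri)
        if hit is not None:
            return index, hit
    return None
```

The sketch returns the first triangle in list order that is hit; if several reflective surfaces lie along the same line, a fuller implementation would probably keep the nearest one.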
Fig. 3 presents a diagram of the system according to the present invention. The system may be realized using dedicated components or custom made FPGA or ASIC circuits. The system comprises a data bus 301 communicatively coupled to a memory 304. Additionally, other components of the system are communicatively coupled to the system bus 301 so that they may be managed by a controller 305.
The memory 304 may store computer program or programs executed by the controller 305 in order to execute steps of the method according to the present invention.
The system comprises a sound input interface 303, such as an audio/video communication connector, e.g. HDMI, or a communication connector such as Ethernet. The received sound data is processed by a sound renderer 302 managing the presentation of sounds using the loudspeaker setup of the listener's premises. The management of the presentation of sounds includes virtual loudspeaker management that is effected by a virtual loudspeakers module 307 operating according to the method described above.
The present invention relates to recording, encoding and decoding of sound in order to provide for surround playback independent of the loudspeaker setup at the sound presentation location. Therefore, the invention provides a useful, concrete and tangible result.
The aforementioned recording, encoding and decoding of sound takes place in special systems and processes sound data. Therefore, the machine or transformation test is fulfilled and the idea is not abstract.
It can be easily recognized, by one skilled in the art, that the aforementioned method for generating surround sound may be performed and/or controlled by one or more computer programs. Such computer programs are typically executed by utilizing the computing resources in a computing device. Applications are stored on a non-transitory medium. An example of a non-transitory medium is a non-volatile memory, for example a flash memory, or a volatile memory, for example RAM. The computer instructions are executed by a processor. These memories are exemplary recording media for storing computer programs comprising computer-executable instructions performing all the steps of the computer-implemented method according to the technical concept presented herein.
While the invention presented herein has been depicted, described, and has been defined with reference to particular preferred embodiments, such references and examples of implementation in the foregoing specification do not imply any limitation on the invention. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader scope of the technical concept. The presented preferred embodiments are exemplary only, and are not exhaustive of the scope of the technical concept presented herein.
Accordingly, the scope of protection is not limited to the preferred embodiments described in the specification, but is only limited by the claims that follow.

Claims

1. A signal comprising at least one sound event (101) data, characterized in that the sound event (101) comprises:
• time of event (102) information;
• information regarding location in space with respect to a reference location point (103);
• a movement trajectory in space (104);
• orientation information (105);
• spatial characteristic of the source of the event (106);
• information on sampling frequency (107);
• information on signal resolution (108); and
• a set of acoustic samples (109) at the sampling frequency (107) and at the signal resolution (108).

2. A method for generating surround sound, the method being characterized in that it comprises the steps of:
• receiving (201) a sound data signal stream according to claim 1;
• obtaining access to a database of loudspeakers records, the records comprising information regarding location and characteristics of loudspeakers available;
• executing calculating (202), which loudspeakers may be used from the available loudspeakers so as to achieve the effect closest to a perfect arrangement according to the at least one sound event (101) data;
• calculating (203) an angular difference between sound source location and positions of the candidate loudspeakers in spherical coordinates;
• selecting a set of loudspeakers that have the lowest distance from the sound event location (204);
• in case of insufficient number of physical loudspeakers, creating (205) one or more virtual loudspeakers by utilizing reflection of sound; and
• generating data streams (206) that are to be sent to physical loudspeakers.

3. The method according to claim 2 characterized in that the step of calculating (202) is executed for each sound event.

4. The method according to claim 3 characterized in that in case the sound source is located between physical loudspeakers, the closest loudspeakers will be used in order to simulate a virtual loudspeaker, located where the sound source is located, by applying a superposition principle.

5. The method according to claim 2 characterized in that the selected loudspeakers are located at opposite sides, when facing the reference location of a user, with respect to the sound event location.

6. The method according to claim 2 characterized in that the information regarding location and characteristics of loudspeakers is described with s points whereas u describes a shape of the sound beam in the horizontal plane while v respective shape in the vertical plane wherein such characteristics is determined using an array of microphones.

7. A computer program comprising program code means for performing all the steps of the computer-implemented method according to claim 1 when said program is run on a computer.

8. A computer readable medium storing computer-executable instructions performing all the steps of the computer-implemented method according to claim 1 when executed on a computer.

9. A system for generating surround sound, the system comprising:
• a data bus (301) communicatively coupling components of the system;
• a memory (304) for storing data;
• a controller (305);
• a sound input interface (303);
the system being characterized in that the controller (305) is configured to control a sound renderer module (302) to execute all steps of the method according to claim 2.