WO2013121136A1

WO2013121136A1 - Transaural synthesis method for sound spatialization

Info

Publication number: WO2013121136A1
Application number: PCT/FR2013/050278
Authority: WO
Inventors: Franck Rosset; Jean-Luc HAURAIS
Original assignee: Franck Rosset; Haurais Jean-Luc
Priority date: 2012-02-13
Filing date: 2013-02-11
Publication date: 2013-08-22
Also published as: FR2986932B1; CN104160722B; BR112014019926A2; RU2014133066A; CN104160722A; IN2014DN06776A; HK1204188A1; KR20140128412A; EP2815589B1; FR2986932A1; JP2015510348A; RU2639955C2; EP2815589A1; JP6421385B2

Abstract

The present invention relates to a method for producing a digital spatialized stereo audio file from an original multichannel audio file, characterized in that it comprises: a step of performing a processing on each of the channels for cross-talk cancelation; a step of merging the channels in order to produce a stereo signal; and a dynamic filtering and specific equalization step for increasing the sound dynamics.

Description

TRANSAURAL SYNTHESIS METHOD FOR SOUND SPATIALIZATION

FIELD OF THE INVENTION

The present invention relates to the field of the spatialization of sound signals, in particular with the integration of a room effect, and more particularly to transaural techniques. The term "binaural" refers to the reproduction of a sound signal over stereophonic headphones, a pair of earphones or a pair of speakers, while nevertheless preserving spatialization effects. The invention is however not limited to this technique and applies in particular to techniques derived from binaural, such as "transaural" (commercial name) reproduction, that is to say reproduction over remote speakers installed, for example, in a concert hall or a cinema with a multi-point sound system.

A specific application of the invention is, for example, the enrichment of audio content broadcast over a pair of speakers in order to immerse a listener in a spatialized sound scene, including in particular the effect of a room or of an outdoor space.

State of the art

For the implementation of "binaural" techniques on headphones or speakers, the state of the art defines a transfer function, or filter, of a sound signal between a position of a sound source in space and the two ears of a listener. This acoustic transfer function of the head is called HRTF (Head-Related Transfer Function) in its frequency-domain form and HRIR (Head-Related Impulse Response) in its time-domain form. For each direction in space, two HRTFs are thus obtained: one for the right ear and one for the left ear.

In particular, the binaural technique consists in applying such head acoustic transfer functions to monophonic audio signals in order to obtain a stereophonic signal which, when listened to over headphones, gives the impression that the sound sources come from a particular direction in space. The right-ear signal is obtained by filtering the monophonic signal with the right-ear HRTF, and the left-ear signal is obtained by filtering the same monophonic signal with the left-ear HRTF.
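The binaural filtering just described amounts, in the time domain, to convolving the same mono signal with the left-ear and right-ear HRIRs. The sketch below assumes NumPy, two HRIRs of equal length, and hypothetical toy HRIR values rather than measured ones; `binaural_render` is an illustrative name.

```python
import numpy as np


def binaural_render(mono, hrir_left, hrir_right):
    """Filter one mono signal by the left- and right-ear HRIRs.

    Both HRIRs are assumed to have the same length, so both ear signals have
    length len(mono) + len(hrir) - 1. Row 0 is the left ear, row 1 the right ear.
    """
    mono = np.asarray(mono, dtype=float)
    left = np.convolve(mono, hrir_left)    # left-ear signal
    right = np.convolve(mono, hrir_right)  # right-ear signal
    return np.stack([left, right])


# Hypothetical toy HRIRs: the right ear receives the sound 3 samples later and
# at half amplitude, roughly as for a source located on the listener's left.
hrir_l = np.array([1.0, 0.0, 0.0, 0.0])
hrir_r = np.array([0.0, 0.0, 0.0, 0.5])
ears = binaural_render([1.0, 2.0], hrir_l, hrir_r)
```

Measured HRIRs are typically a few hundred samples long; for long filters an FFT-based convolution would be faster than `np.convolve`, with identical results.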

When spatial rendering takes into account the listener's perception of sound sources as being more or less distant from the head, a phenomenon known as externalization, independently of the direction from which the sound sources arrive, it frequently happens in binaural 3D rendering that the sources are perceived by the listener as being inside the head. A source perceived in this way is said to be not externalized.

Various studies have shown that adding a room effect to binaural 3D rendering methods considerably increases the externalization of sound sources.

The prior art includes US patent application 2007/011025A, which describes a sound spatialization method comprising a step of determining an acoustic matrix for a real set of sound sources at real locations, and a step of calculating an acoustic matrix for transmitting an acoustic signal to the listener from a set of apparent sound sources at locations different from the real ones. The method further includes a step of solving a transfer function matrix in order to present to the listener an audio signal that creates an audio image of sound emanating from the apparent sources.

Disadvantages of prior art

The solutions of the prior art are fixed and do not allow a spatial atmosphere to be chosen among several possible environments. They are usually based on a transformation matrix calculated from a virtual head.

The solutions of the prior art generally do not give an impression of externalization of the sound environment.

Solution provided by the invention

Real rooms and real speakers are used to calculate the filters that serve to generate the multichannel signals.

DETAILED DESCRIPTION OF A NON-LIMITATIVE EXEMPLARY EMBODIMENT

The present invention will be better understood on reading the following description, with reference to the appended drawings in which:

FIG. 1 represents a general block diagram of the installation used in the phase of constructing the impulse response database.

FIG. 2 represents a schematic view of the installation for acquiring the impulse responses.

FIG. 3 represents a block diagram of the listening installation.

The method according to the invention comprises a first processing (1) of producing a database of impulse responses from the acquisition of acoustic signals in a plurality of physical spaces, by recording the signals produced acoustically by speakers in response to a reference multifrequency signal.

Then, for each audio sequence to be spatialized, the method applies a succession of processing steps:

- when the signal to be spatialized is a stereo signal, a preliminary step (2) of constructing an N.1 signal from the stereo signal;

- a step (3) of transforming the signal of each of the N.1 channels using one of the impulse response files selected from the abovementioned database;

- a step (4) of recombining the signals of the N.1 channels thus transformed in order to construct a spatialized stereo signal.

This stereo signal can then be broadcast by a pair of standard speakers to restore a spatialized sound environment corresponding to the space used to produce the impulse responses, or to a combination of such spaces.

Initial stage of construction of the impulse response database

This step is replicated a plurality of times.

It is illustrated in Figure 2.

It consists, for each series of impulse responses, in arranging in a physical space, such as a concert hall, an open or closed venue or a given room, a series of known speakers (5 to 11; 17) associated with an amplifier (14), preferably of recognized quality, as well as a pair of microphones (12, 13) whose position relative to the speaker series (5 to 11; 17) is fixed for the series being acquired.

An original multifrequency signal is then applied successively to each of the speakers (5 to 11) by means of the amplifier (14). This original signal is for example a sequence lasting between 10 and 90 seconds, with a frequency variation across the sound spectrum, such as a linear sweep from 20 Hz to 20 kHz, or any signal covering the entire spectrum of the speaker.
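A linear sweep of this kind can be generated as in the sketch below. The text does not specify amplitude, fades or the sampling rate of the emitted signal, so this assumes unit amplitude, no fades, and the 96 kHz rate that the text mentions for the recording; `linear_sweep` is a hypothetical name.

```python
import numpy as np


def linear_sweep(f0=20.0, f1=20_000.0, duration=10.0, fs=96_000):
    """Sine sweep whose instantaneous frequency rises linearly from f0 to f1 Hz.

    duration is in seconds and fs in Hz; the 10 s default sits at the low end
    of the 10-90 s range mentioned in the text.
    """
    t = np.arange(int(round(duration * fs))) / fs
    rate = (f1 - f0) / duration  # frequency increase in Hz per second
    # Phase is the integral of 2*pi*f(t), with f(t) = f0 + rate * t.
    phase = 2.0 * np.pi * (f0 * t + 0.5 * rate * t ** 2)
    return np.sin(phase)


sweep = linear_sweep(duration=10.0)  # 960 000 samples at 96 kHz
```

A linear sweep spends equal time in every frequency band; an exponential sweep is a common alternative, but the text names the linear variant, so that is what is shown.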

The sound signal produced by the active speaker is picked up by the microphone pair (12, 13), producing a recorded stereo signal. This signal is sampled at 96 kHz in a known manner, and a deconvolution by fast Fourier transform is performed between the original signal and the recorded signal to construct an impulse response for the speaker under consideration in the physical space under consideration.
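The FFT deconvolution can be sketched as a regularized spectral division of the recorded signal by the original one, applied separately to each microphone channel of the stereo recording. The regularization constant `eps` and the name `deconvolve_ir` are assumptions; the text only states that an FFT-based deconvolution is used. Random noise stands in for the sweep in the self-check.

```python
import numpy as np


def deconvolve_ir(original, recorded, eps=1e-8):
    """Estimate h such that recorded ~= np.convolve(original, h).

    Both signals are zero-padded to a common FFT length at least as long as
    the linear convolution, so the spectral division Y / X involves no
    circular wrap-around. eps is an assumed regularization constant that keeps
    bins where the original signal has almost no energy (e.g. above 20 kHz for
    a 20 Hz-20 kHz sweep sampled at 96 kHz) from blowing up.
    """
    original = np.asarray(original, dtype=float)
    recorded = np.asarray(recorded, dtype=float)
    n = len(original) + len(recorded) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= n
    X = np.fft.rfft(original, nfft)
    Y = np.fft.rfft(recorded, nfft)
    H = Y * np.conj(X) / (np.abs(X) ** 2 + eps)  # regularized Y / X
    return np.fft.irfft(H, nfft)[: len(recorded)]


# Check on synthetic data: recover a known 5-tap response.
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
h_true = np.array([1.0, 0.0, 0.5, 0.0, -0.25])
h_est = deconvolve_ir(x, np.convolve(x, h_true))
```

For a stereo recording, calling the function once per microphone channel yields the two rows of the stereo impulse response.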

This step is repeated for each of the speakers (5 to 11) of the series, then for different physical spaces in which a series of identical or different speakers is installed again, with the same or a different amplifier and identical microphones.

This first step leads to the construction of a database of stereo impulse responses.

Preparation stage of a spatialized signal

This step constructs a spatialized stereo audio signal from an N.1 multichannel signal corresponding to a traditional digital recording.

This step consists in selecting N + 1 impulse responses from the database built during the initial step. The selection associates each of the N + 1 signals with one of the impulse responses of the database, ensuring that the position in space at which the impulse response was acquired corresponds to the position in space of the channel with which it is associated.

For each "mono signal / stereo impulse response" pair, a convolution is applied to calculate a pair of spatialized stereo signals $S_{sG}$ and $S_{sD}$ (G and D standing for the left and right channels).

N + 1 pairs of spatialized signals $S^{j}_{sG}$ and $S^{j}_{sD}$ are thus produced, with $j$ between 1 and N + 1.

For example, if the original recording is in 5.1 format, 6 pairs of spatialized signals are constructed.
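The selection and convolution steps can be sketched as follows for the 5.1 case. This assumes the database is a Python dict keyed by a position label and that each stereo impulse response is an array of shape (2, L) with row 0 for the left microphone; the labels, `spatialize_channels`, and the random stand-in data are hypothetical.

```python
import numpy as np


def spatialize_channels(channels, ir_database):
    """Convolve each mono channel with the stereo IR measured at its position.

    channels: dict mapping a position label to a mono signal (the N + 1 channels).
    ir_database: dict mapping the same labels to stereo impulse responses of
    shape (2, L), row 0 = left microphone, row 1 = right microphone.
    Returns a dict mapping each label to the pair (S_sG, S_sD).
    """
    pairs = {}
    for position, mono in channels.items():
        ir = np.asarray(ir_database[position], dtype=float)  # matching acquisition position
        mono = np.asarray(mono, dtype=float)
        pairs[position] = (np.convolve(mono, ir[0]), np.convolve(mono, ir[1]))
    return pairs


# 5.1 example (N = 5): six channels, hence six spatialized pairs.
labels = ["L", "R", "C", "LFE", "Ls", "Rs"]  # hypothetical position labels
rng = np.random.default_rng(1)
channels = {p: rng.standard_normal(1000) for p in labels}
database = {p: rng.standard_normal((2, 64)) for p in labels}  # stand-in IRs
pairs = spatialize_channels(channels, database)
```

Keying the database by position makes the "acquisition position matches channel position" rule an explicit lookup; a missing position raises a `KeyError` instead of silently using the wrong response.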

Optionally, channel equalization is performed to improve the dynamics of the signals.

Construction of the spatial stereo signal

The final step recombines the signals to build a spatialized pair of left and right signals.

To do so, the signals $S^{j}_{sG}$ corresponding to the space on the left are added together to build the left channel of the spatialized stereo signal. The same procedure is followed for the signals $S^{j}_{sD}$ corresponding to the space on the right, to construct the right channel of the spatialized stereo signal.
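Assuming every pair contributes its left component to the left channel and its right component to the right channel, the recombination is $S_G = \sum_j S^{j}_{sG}$ and $S_D = \sum_j S^{j}_{sD}$. A minimal sketch, with `recombine` as a hypothetical name and no normalization (the text leaves level control to the optional equalization):

```python
import numpy as np


def recombine(pairs):
    """Sum every S^j_sG into the left output and every S^j_sD into the right.

    pairs: iterable of (S_sG, S_sD) arrays; shorter signals are zero-padded.
    No normalization is applied, so the caller must guard against clipping.
    """
    pairs = [(np.asarray(g, float), np.asarray(d, float)) for g, d in pairs]
    n = max(max(len(g), len(d)) for g, d in pairs)
    left, right = np.zeros(n), np.zeros(n)
    for g, d in pairs:
        left[: len(g)] += g
        right[: len(d)] += d
    return left, right


left, right = recombine([([1.0, 2.0], [0.0, 1.0]), ([1.0], [3.0, 3.0, 3.0])])
```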

Optionally, the channels are equalized to improve the dynamics of the two channels.

Case of a stereo source signal: increasing the number of channels and creating intermediate channels

When the signal to be spatialized is not of the N.1 type but simply a stereo signal, an intermediate step is performed that builds an N.1 signal by phase-extraction processing between the left and right tracks, producing several new signals.

This phase extraction consists in producing a signal corresponding to a reconstructed central channel, by adding the signal of the left channel to a phase-shifted signal of the right channel, for example in phase opposition.

To create the other "reconstructed" channels, the left and right tracks are phase-shifted by different phase-shift angles, and the phase-shifted signal pairs are added with weights determined empirically so as to restore a spatialized soundscape.
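The phase-extraction step can be sketched as a weighted sum of phase-shifted left and right tracks. The text does not say how the phase shift is implemented; this sketch swaps in a broadband phase shift based on the analytic signal (an FFT-computed Hilbert transform). The central channel follows the text literally (left plus right in phase opposition); all other angles and weights, and the names `phase_shift` and `reconstruct_channel`, are hypothetical placeholders for the empirically determined values.

```python
import numpy as np


def phase_shift(x, theta):
    """Shift the phase of every frequency component of real signal x by theta.

    Uses the analytic signal x + j*H{x} (H = Hilbert transform) built with the
    FFT, then keeps the real part of analytic * exp(j*theta). The DC and
    Nyquist bins are simply scaled by cos(theta).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1 : n // 2] = 2.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    analytic = np.fft.ifft(np.fft.fft(x) * weights)
    return np.real(analytic * np.exp(1j * theta))


def reconstruct_channel(left, right, theta_left, theta_right, w_left, w_right):
    """Weighted sum of the phase-shifted left and right tracks."""
    return w_left * phase_shift(left, theta_left) + w_right * phase_shift(right, theta_right)


L = np.array([1.0, 2.0, 3.0, 4.0])
R = np.array([0.5, -1.0, 2.0, 0.0])
# Central channel as literally described: left plus right in phase opposition.
center = reconstruct_channel(L, R, 0.0, np.pi, 1.0, 1.0)  # equals L - R
# Another reconstructed channel: angles and weights are hypothetical
# placeholders for the empirically determined values.
side = reconstruct_channel(L, R, np.pi / 4, -np.pi / 4, 0.7, 0.3)
```

A whole-signal FFT is used here for brevity; a real-time implementation would process the tracks block by block.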

In addition, frequency filters are applied to the right and left signals when creating the "reconstructed" channels, in order to increase signal dynamics and maintain high-fidelity sound quality.

Signal reproduction

FIG. 3 represents a schematic view of the reproduction installation, based on a pair of real speakers (17, 18).

This pair of speakers (17, 18) receives a signal that simulates the calculated speakers (20 to 27 and 30 to 37).

The effective number of calculated speakers (20 to 27) corresponds to the number of physical speakers (5 to 11; 17) used to produce the impulse response database, or to the number of virtual speakers reconstructed by the method described above.

Virtual speakers (30 to 37) are also created, each perceived in the sound space as a combination of neighboring real speakers, in order to fill gaps in the sound field.

These virtual speakers are created by a modification of the signal supplying the neighboring real speakers.

Fifteen sound files are thus produced: 8 (7.1) resulting from the processing with the impulse responses, and 7 calculated by combining these 8 files.
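The creation of virtual speakers from neighboring real ones can be sketched as a weighted sum of two neighboring feeds. The text does not give the neighbor layout or the weights, so the equal 0.5 weighting, the consecutive-numeral neighbor mapping and the name `make_virtual_speakers` are all hypothetical.

```python
import numpy as np


def make_virtual_speakers(calculated, neighbors, weight=0.5):
    """Build each virtual speaker feed from two neighboring real feeds.

    calculated: dict reference numeral -> signal of a calculated speaker.
    neighbors: dict virtual numeral -> (numeral_a, numeral_b), hypothetical.
    weight: assumed equal amplitude weight applied to both neighbors.
    """
    return {
        v: weight * (np.asarray(calculated[a], float) + np.asarray(calculated[b], float))
        for v, (a, b) in neighbors.items()
    }


# 7.1 case: eight calculated feeds (20 to 27) and seven virtual ones (30 to 36).
calculated = {k: np.full(4, float(k)) for k in range(20, 28)}
# Hypothetical layout: virtual speaker 30 + i sits between 20 + i and 21 + i.
neighbors = {30 + i: (20 + i, 21 + i) for i in range(7)}
virtual = make_virtual_speakers(calculated, neighbors)
files = {**calculated, **virtual}  # fifteen sound files in total
```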

The signals are distributed according to their right, left or central component to produce a left signal for the left speaker (17) and a right signal for the right speaker (18):

- the "right" signal is the sum of the calculated "right" signals (21, 22, 23) and the virtual "right" signals (30, 31, 32), plus the calculated (20, 27) and virtual (33) "central" signals with an amplitude weighting of 50%;

- the "left" signal is the sum of the calculated "left" signals (24, 25, 26) and the virtual "left" signals (34, 35, 36), plus the calculated (20, 27) and virtual (33) "central" signals with an amplitude weighting of 50%.
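The distribution rule above can be sketched as a fixed downmix over the reference numerals. This assumes each of the fifteen files is a single mono signal and all have the same length; `downmix` and the constant names are hypothetical, while the numeral groupings and the 50% central weighting come from the text.

```python
import numpy as np

# Reference numerals as grouped in the description.
RIGHT = (21, 22, 23, 30, 31, 32)  # calculated and virtual "right" signals
LEFT = (24, 25, 26, 34, 35, 36)   # calculated and virtual "left" signals
CENTRAL = (20, 27, 33)            # calculated (20, 27) and virtual (33) "central"


def downmix(signals, central_weight=0.5):
    """Fold the calculated and virtual feeds into a left/right pair.

    signals: dict reference numeral -> mono array, all of equal length.
    Central feeds go to both sides with a 50 % amplitude weighting.
    """
    def total(keys):
        return sum(np.asarray(signals[k], dtype=float) for k in keys)

    central = central_weight * total(CENTRAL)
    return total(LEFT) + central, total(RIGHT) + central


signals = {k: np.ones(3) for k in (*range(20, 28), *range(30, 37))}
signals[21] = np.full(3, 2.0)
left, right = downmix(signals)
```

Sending each central feed to both sides at half amplitude keeps its summed amplitude at unity when the two speakers combine acoustically in phase.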

This stereo signal is then fed to conventional audio equipment connected to a pair of speakers (18, 19), which reproduces a spatialized sound environment corresponding to the sound environment of the installation used to build the impulse response database, or a virtual sound environment corresponding to the combination of several original atmospheres, where appropriate enriched with virtual atmospheres.

Claims


1 - Process for producing a spatial stereo audio digital file from an original multichannel audio file, characterized in that it comprises:

• a processing step, performed on each of the channels, for cross-talk cancellation;

• a channel merging step to build a stereo signal;

• a step of dynamic filtering and specific equalization to increase the dynamics of the sound.

2 - Process for the production of a spatial stereo audio digital file according to the main claim, characterized in that the cross-talk cancellation step consists in adding to the signal of each of the channels a signal corresponding to the phase-shifted and weighted signal of the other channels.

3 - Process for the production of a digital spatial stereo audio file according to the main claim, characterized in that the original signal is a native 5.n multichannel signal.

4 - Process for the production of a spatial stereo audio digital file according to the main claim, characterized in that the original signal is a native 5.n multichannel signal calculated from a stereo signal.