WO2017211448A1 - Method for generating a two-channel signal from a single-channel signal of a sound source - Google Patents


Info

Publication number
WO2017211448A1
WO2017211448A1 (PCT/EP2017/000649)
Authority
WO
WIPO (PCT)
Prior art keywords
sound source
signal
channels
channel
virtual
Prior art date
Application number
PCT/EP2017/000649
Other languages
French (fr)
Inventor
Carlos Valenzuela
Original Assignee
Valenzuela Holding Gmbh
Priority date
Filing date
Publication date
Application filed by Valenzuela Holding Gmbh filed Critical Valenzuela Holding Gmbh
Publication of WO2017211448A1


Classifications

    • H04S1/00: Two-channel systems
    • H04N7/141: Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147: Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • H04N7/15: Conference systems
    • H04S7/30: Control circuits for electronic adaptation of the sound field
    • H04M3/568: Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities; audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
    • H04S2400/11: Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H04S2420/01: Enhancing the perception of the sound image or of the spatial distribution using head-related transfer functions [HRTF] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H04S2420/03: Application of parametric coding in stereophonic audio systems

Definitions

  • the present invention relates to a method for generating a two-channel signal from a single-channel signal of a sound source.
  • Telephone or video conferences with several participants are often conducted via personal computers or smartphones. Such devices are usually equipped with two speakers, so that spatial sound effects could in principle be generated.
  • the recording of the individual participants is made with simple microphones such as those integrated into computers or smartphones, so that only a monophonic single-channel signal is obtained.
  • No spatial or directivity information is available for the reproduction of the signals.
  • a sophisticated spatialized acoustical impression cannot be achieved when the signals are reproduced.
  • the object of the invention is to provide a method for transformation of a monophonic single-channel signal into a two-channel signal so that a reproduction of the signal with a virtual spatial character including a virtual principal radiation direction can be achieved.
  • the virtual principal radiation direction of an emitted signal is defined herein as the main direction of emission of the reproduced sound source, i.e. the principal radiation direction of the reproduced sound source that has a directional radiation characteristic.
  • Many sound sources do not have an omnidirectional radiation pattern, but have a directional radiation characteristic, i.e. a radiation pattern that has a distinctive principal radiation direction.
  • a human talker has a directional characteristic with a distinctive principal radiation direction which corresponds to the facing direction of the talker.
  • the object of the invention is achieved with a method for generating a two-channel signal from a single-channel signal of a sound source for simulating the position and the directional radiation characteristic of a reproduced sound source and for simulating the principal radiation direction of an emitted signal.
  • the method is characterized in that the single-channel signal is split into two first channels and two second channels, wherein splitting into the two first channels is conducted using stereophonic techniques so that a virtual position of the reproduced sound source is achieved, and wherein splitting into the two second channels is conducted by delaying the single-channel signal in both second channels with a time delay Δt to generate a virtual directional radiation characteristic of the reproduced sound source, i.e. to generate a directivity, and wherein the signal of one of the second channels is processed with a different gain compared to the gain in the other second channel to generate direction information.
  • the invention provides a method that allows generating a signal, which provides virtual spatial information including a virtual principal radiation direction, from a single-channel signal that does not provide such information.
  • the method can e.g. be used for a telephone or video conference where the sound signals of the participants are recorded with a single microphone per participant. Such a situation is typical, e.g., if ordinary smartphones are used for a conference.
  • the first channels are a first left channel and a first right channel
  • the second channels are a second left channel and a second right channel.
  • the first left channel is added to the second left channel to create a left output channel
  • the first right channel is added to the second right channel to create a right output channel.
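The summation of the first and second channels into the two output channels, as described in the items above, might be sketched as follows. This is a minimal illustration only; the function and variable names are hypothetical, and the signals are represented as plain lists of samples.

```python
def combine_channels(lpts, rpts, ldts, rdts):
    """Sum the first (position) channels LPTS/RPTS and the second
    (directivity) channels LDTS/RDTS sample by sample to form the
    left and right output channels."""
    left_out = [p + d for p, d in zip(lpts, ldts)]
    right_out = [p + d for p, d in zip(rpts, rdts)]
    return left_out, right_out

# Example with trivial one-sample "signals":
left, right = combine_channels([1.0], [0.25], [0.5], [0.75])
```

Each output transducer thus receives the superposition of a position-encoded and a directivity-encoded version of the same mono source.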
  • the monophonic single-channel signal is split into two first channels by using well-known stereophonic techniques.
  • the resulting two-channel signal contains virtual sound source position information.
  • the method may be used in particular in case of a conference with several participants.
  • a virtual position may be allotted to each of the participants whose sound signals are recorded in a monophonic signal per participant.
  • the virtual position information is added to the signals at the reproduction site, and a participant who is using e.g. a smartphone, a headset, loudspeakers integrated into a computer, a laptop or a monitor with speakers is provided with a two-channel signal where each of the participants has a virtual position, preferably different spatially separated positions.
  • the sound quality and the identification of different participants can thus be enhanced significantly.
  • the generated virtual sound source position is created artificially and in most cases does not correspond to the physical position of a real sound source such as a loudspeaker.
  • As stereophonic techniques, in particular the following can be used: different stereophonic phantom source techniques, HRTFs for headphones or loudspeakers, or positioning with specific HRTF approximations as disclosed e.g. in EP 0 357 402.
  • Stereo widening techniques may be used to virtually spread the spatial position range, in particular in case of several reproduced sound sources.
  • If the loudspeakers of an ordinary laptop or a headset are used, it is advantageous to provide a larger range of virtual sound source positions than the range between the two loudspeakers.
  • Well-known stereo widening techniques provide a solution in such cases.
  • Different position creation techniques may be used for different spatial regions, e.g. for middle and outer regions.
  • stereo widening techniques may be used for outer regions.
  • a directional radiation characteristic of the sound source which is reproduced at the virtual position may be generated by creating two further second channels.
  • the two second channels are generated by delaying the single-channel signal in both second channels with a time delay Δt to generate a virtual directional radiation characteristic of the reproduced sound source.
  • Typical delays Δt are in the range between 2 ms and 100 ms at the listener. They may be constant in time.
  • the time delay Δt generates directivity, i.e. a directional radiation characteristic for the reproduced sound source.
  • the signal of one of the second channels is processed with a different gain compared to the gain in the other second channel to generate a virtual principal radiation direction of the emitted signal. For example, in order to simulate a virtual principal radiation direction of the emitted signal which is directed towards the right side, the gain of the left second channel may be reduced to a value around 0 and the gain of the right second channel may be set to a value around 1.
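The gain assignment described above can be sketched with a hypothetical directivity parameter d in [-1, 1], where -1 means radiation fully towards the left and +1 fully towards the right; this parameterization is an assumption for illustration and is not prescribed by the source.

```python
def directivity_gains(d):
    """Map a directivity value d in [-1, 1] to the gains (gL, gR)
    of the left and right second (delayed) channels.

    d = +1: radiation towards the right -> gL around 0, gR around 1
    d = -1: radiation towards the left  -> gL around 1, gR around 0
    d =  0: symmetric radiation         -> gL = gR = 0.5
    """
    d = max(-1.0, min(1.0, d))        # clamp to the valid range
    g_right = (1.0 + d) / 2.0         # linear crossfade between the paths
    g_left = 1.0 - g_right
    return g_left, g_right
```

A linear crossfade is used here purely for simplicity; other gain laws would equally satisfy the description in the text.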
  • Figure 1 shows a first embodiment of a system to implement the method in accordance with the invention
  • Figure 2 shows the system of figure 1 implemented for several sound sources
  • Figure 3 shows details of the first embodiment
  • Figures 4 to 8 illustrate aspects of several embodiments of the present invention by way of example.
  • Figure 1 shows a system to implement the method in accordance with the invention.
  • a sound source has been recorded, and a monophonic single-channel signal TS is supplied.
  • the signal shall be reproduced by means of the two speakers LSpk and RSpk.
  • Additional virtual position information P and directivity information D are provided. This information is not related to the actual position or directivity of the sound source.
  • Directivity information D which may be time-variable, comprises information concerning the principal radiation direction into which the signal shall be emitted.
  • the sound signal TS together with the position information P is fed into a sound source position generator PG and is split into a signal for the right and a signal for the left first channels LPTS, RPTS using well-known stereophonic techniques, taking account of the position information P.
  • the sound signal TS is further fed into a sound source directivity generator DG where it is split into a signal for the right and a signal for the left second channels LDTS and RDTS.
  • the single-channel signal TS is delayed by a time Δt to create the second channel signals LDTS and RDTS.
  • the signal of one of the second channels is processed with a different gain compared to the gain in the other second channel.
  • the gain gs of the sound source at the virtual position, which is generated by the position generator PG, may be adjusted by the directivity information D.
  • the gain gs, which is adjusted in such a way that the perceivable loudness differences due to varying principal radiation directions of the sound source are appropriately approximated, is fed into the position generator PG.
  • Figure 2 shows a system which is in principle similar to the system of figure 1 but for use with a multitude of sound sources. For each sound source, a sound signal TS1, TS2, ..., TSN is provided, together with individual position information P1, P2, ..., PN and directivity information D1, D2, ..., DN.
  • FIG. 3 shows the details of the sound source directivity generator DG of the first embodiment of a system in accordance with the invention.
  • the incoming sound signal TS is fed into position generator PG, and a signal providing a virtual position information as described above is generated and forwarded to the speakers
  • the incoming sound signal TS is also fed into the directivity generator DG.
  • In the directivity generator, the signal is split into two signals, one for each of the speakers.
  • the direction information D is applied to the directivity generator DG and provides for different gains gL, gR for the two second channels LDTS and RDTS with which different time-variable virtual principal radiation directions are generated.
  • the general directivity characteristic is achieved by applying a time delay Δt to both channels.
  • the signal of one of the second channels is in addition inverted in order to eliminate comb-filter effects.
  • the optional adjustment of the gain gs of the sound source at the virtual position, with which the perceived loudness of the virtual sound source according to the current principal radiation direction may be adjusted, is not considered in figure 3.
  • Further embodiments of the present invention are described in the following sections by means of example. The embodiments shall in no way limit the scope of the invention as described in the whole description and as claimed in the claims.
  • Figure 4A shows a basic concept for a single sound source.
  • Figure 4B shows a basic concept for multiple sound sources that might be emitting sound simultaneously or sequentially.
  • Figure 5A shows a signal processing method for a sound source TS performed by the Sound Source Directivity Generator.
  • Figure 5B shows a signal processing method for a sound source TS performed by the Sound Source Directivity Generator with additional control of the gain gS of the virtual sound source.
  • Figure 6A - 6C show gain specifications for the virtual sound source and for the delayed left and right directivity signal paths, wherein Figure 6A shows the gain for a virtual sound source, figure 6B shows the gain for the delayed left directivity signal path, and figure 6C shows the gain for the delayed right directivity signal path.
  • Figures 7A and 7B show the specific signal processing method performed by the Sound Source Directivity Generator depending on the virtual sound source position, wherein Figure 7A shows the far-left or leftmost sound source and Figure 7B shows the far-right or rightmost sound source.
  • Figures 8A and 8B show an alternative signal processing method performed by the Sound Source Directivity Generator depending on the virtual sound source position, wherein Figure 8A shows the far-left or leftmost sound source and Figure 8B shows the far-right or rightmost sound source.
  • the main purpose of the following embodiments of the invention is to provide an audio processing apparatus that creates from a monophonic audio input signal, on a two-channel stereo playback system, a sound source with an adjustable virtual location in space and an adjustable virtual sound source directivity.
  • the sound source directivity is specified herein as the main direction of emission of the sound source, i.e. the principal radiation direction of a sound source that has a directional radiation pattern.
  • Many sound sources do not have an omni- directional radiation pattern, but have a directional radiation pattern, i.e. a radiation pattern that has a distinctive principal radiation direction.
  • a human talker has a directional characteristic with a distinctive principal radiation direction which corresponds to the facing direction of the talker.
  • a trumpet has a distinctive principal radiation direction which corresponds to the orientation of the trumpet.
  • One purpose of the present invention is to provide an audio processing apparatus that is capable of providing this important acoustic cue, i.e. the principal radiation direction of a sound source, by way of conventional two-channel stereo playback equipment, such as two-channel stereo loudspeaker systems, stereo headphones or stereo headsets.
  • the audio processing apparatus creates a virtual sound source directivity which simulates the principal radiation direction of the sound source thus allowing a listener to perceive the sound source orientation, i.e. in the case of a person talking to perceive the facing direction of the talker.
  • the audio processing apparatus positions the monophonic audio signal of the sound source in a virtual location in space, allowing the listener to localize the talker in space.
  • the audio processing apparatus is not only capable of providing, on a conventional two-channel stereo playback system, an adjustable virtual location in space and an adjustable virtual sound source directivity for one sound source, but also for multiple sound sources, which may be emitting sound simultaneously or sequentially.
  • the quality and efficiency of communication can be significantly improved for the following reasons: by providing an adjustable virtual location for each remote talker, the listener is enabled to better separate and identify different talkers.
  • the listener is enabled to distinguish who is speaking to whom, and is thus enabled to follow the conversation in a much more efficient manner.
  • speech intelligibility is significantly improved by providing the listener with information about the sound source directivity of each talker.
  • a Sound Source Position Generator that generates an adjustable virtual sound source location based on a position signal P.
  • the position signal P can be either provided separately or can be encoded as metadata in the tone signal.
  • a Sound Source Directivity Generator that generates an adjustable virtual sound source directivity based on a directivity signal D.
  • the directivity signal D can be either provided separately or can be encoded as metadata in the tone signal.
  • two stereo output transducers, e.g. stereo loudspeakers that are spaced apart, stereo headphones or stereo headsets, which reproduce the output signals of the two combiners.
  • the audio processing method of fig. 5A comprises the following steps:
  • the method targets any conventional two-channel stereo playback system, such as two-channel stereo loudspeaker systems, stereo headphones or stereo headsets.
  • the four common basic principles of such stereophonic methods are (a) introducing a delay between the tone signal on the left and the right channel in order to position the virtual sound source further to the left or to the right of the stereo playback system, (b) introducing an amplitude difference between the tone signal on the left and the right channel for the same purpose, (c) introducing both delay and amplitude differences between the tone signal on the left and the right channel, or (d) applying head-related transfer functions (HRTFs) or approximations of HRTFs (e.g. EP 0357402, or an ear canal resonance model with a bandpass filter which models a sound source at a specific location) to the tone signal on the left and the right channel in order to position the virtual sound source in the desired position.
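Principle (b), amplitude panning, might be sketched as follows using the common constant-power (sine/cosine) pan law. This is one standard textbook realization offered for illustration, not the specific method claimed by the patent.

```python
import math

def constant_power_pan(sample, pan):
    """Amplitude-pan a mono sample onto two channels.

    pan in [-1, 1]: -1 = hard left, 0 = centre, +1 = hard right.
    The sine/cosine law keeps gL^2 + gR^2 = 1, so the perceived
    loudness stays roughly constant across positions.
    """
    theta = (pan + 1.0) * math.pi / 4.0   # map [-1, 1] -> [0, pi/2]
    return sample * math.cos(theta), sample * math.sin(theta)
```

At centre (pan = 0) both channels receive the sample scaled by about 0.707, i.e. -3 dB per channel, which is the usual centre attenuation of this pan law.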
  • the present invention employs any one of the well-known stereophonic methods to place the sound source in a virtual spatial position, whereby the desired virtual spatial position is given by the position signal P. There is, therefore, no difference between the present invention and the prior art with respect to the method of how the virtual sound source position is generated on a two-channel stereo system.
  • the present application also employs commonly known stereo-widening techniques to enlarge the perceived spatial extent of the possible sound source positions.
  • One objective of the present invention is to provide an audio processing method that is capable of creating an adjustable virtual sound source directivity on a conventional two-channel stereo playback system, while at the same time also generating an adjustable virtual sound source position.
  • None of the known prior art audio systems is capable of generating a sound source directivity, which can be adjusted to any principal radiation direction of 180° around the desired sound source position, with only two audio channels.
  • the audio systems mentioned in (a) and (b) above always require more than two audio channels, i.e. more than two stereo output transducers (i.e. multiple loudspeakers).
  • the audio system mentioned in (c) requires two second reproduction units WE2 in addition to a first reproduction unit WE1 in order to reproduce any principal radiation direction of 180° around the desired sound source position which is reproduced by the first reproduction unit WE1, whereby one of the second reproduction units WE2 is positioned on one side and the other on the other side of the first reproduction unit WE1.
  • the system of the present invention differs from the state of the art in that only two audio channels, i.e. two stereo output transducers such as two stereo loudspeakers, binaural headphones or binaural headsets, are required to generate an adjustable virtual sound source directivity, while also creating an adjustable virtual sound source position.
  • two stereo output transducers such as two stereo loudspeakers, binaural headphones or binaural headsets
  • the advantage of the present invention is that a simple, conventional two-channel stereo playback system can be used to provide a listener with an adjustable virtual sound source position and an adjustable virtual sound source directivity from a monophonic input tone signal of a sound source.
  • the term "virtual" used in the expressions "virtual sound source position" and "virtual sound source directivity" has the following meaning:
  • the sound source position created by the audio processing apparatus of the present invention is a virtual sound source position. This means, that there is no physical sound source, as for example a loudspeaker, at the perceived position of the sound source. The perceived position of the sound source in space is not related to the position of a real physical sound source in this space.
  • the sound source directivity created by the audio processing apparatus of the present invention is a virtual sound source directivity. This means, that the sound source directivity, i.e. the principal radiation direction of the sound source, is only simulated to provide a listener with a perceivable principal radiation direction (for example a speaking direction of a human talker), without it being actually physically directed in the conventional sense.
  • Fig.4A shows a basic concept of the present invention for a single sound source.
  • a time-variable tone signal TS is provided as audio input signal.
  • the tone signal TS is the monophonic audio signal which corresponds to a sound source, such as for example the audio signal of a speaking person.
  • the monophonic audio signal could be the transmitted audio signal of a remote talker in an audioconference or a web- or videoconference.
  • a time-variable position input signal P is provided which specifies in a time-variable manner the desired virtual position of the sound source in space
  • a time-variable directivity input signal D is provided which specifies the virtual sound source directivity in a time-variable manner.
  • the position signal P and/or the directivity signal D can be provided in many different ways, such as for example embedded in the tone signal TS, encoded as metadata in the tone signal TS, combined in a data signal, combined in a separate signal, or simply provided as separate signals.
  • the audio processing apparatus of the present invention creates a stereophonic sound source, with an adjustable virtual location in space (according to the position signal P) and an adjustable virtual sound source directivity (according to the directivity signal D), which can be reproduced on any conventional stereophonic playback system (i.e. any system using two or more independent audio channels through a configuration of two or more loudspeakers), in particular on any conventional two-channel stereo playback system comprising two stereo output transducers (e.g. systems with stereo loudspeakers, stereo headphones or stereo headsets).
  • any conventional stereophonic playback system i.e. any system using two or more independent audio channels through a configuration of two or more loudspeakers
  • any conventional two-channel stereo playback system comprising two stereo output transducers (e.g. systems with stereo loudspeakers, stereo headphones or stereo headsets).
  • the audio processing apparatus comprises the following two important processing units with which the stereophonic sound source is created based on the audio input signal TS, the position input signal P and the directivity input signal D: (a) a Sound Source Position Generator, and (b) a Sound Source Directivity Generator.
  • the Sound Source Position Generator generates an adjustable virtual sound source location based on the position signal P.
  • Common stereophonic methods as mentioned earlier, are employed to place the sound source in a virtual spatial position, whereby the desired virtual spatial position is given by the position signal P.
  • the Sound Source Position Generator also employs commonly known stereo-widening techniques to enlarge the perceived spatial extent of the possible sound source positions.
  • the Sound Source Directivity Generator generates an adjustable virtual sound source directivity based on a directivity signal D.
  • the details of the signal processing performed by the Sound Source Directivity Generator are described in Fig.5A.
  • the Sound Source Directivity Generator may optionally provide a directivity-specific gain gS to the Sound Source Position Generator with which the sound level of the virtual sound source is processed.
  • Both the Sound Source Position Generator and the Sound Source Directivity Generator each generate a left channel output and a right channel output.
  • the respective left and right output signals from the Sound Source Position Generator (LPTS, RPTS) and the Sound Source Directivity Generator (LDTS, RDTS) are added by two separate combiners (adders in Fig. 4A).
  • the output signals of the two combiners are then reproduced by two stereo output transducers LSpk and RSpk which may be, for example, a system with stereo loudspeakers, a stereo headphone, or a stereo headset.
  • Systems with stereo loudspeakers may include, but are not limited to, high-fidelity two-channel stereo playback equipment, surround sound systems, and mobile devices such as phones, tablets, PCs and MP3 players that have stereo loudspeakers.
  • the basic concept of the audio processing apparatus described for a single sound source can be adapted in order to create multiple stereophonic sound sources with adjustable positions and directivities, whereby the multiple sound sources may be emitting sound simultaneously or sequentially.
  • the present invention is used to enhance the communication quality and efficiency of an audio-, web- or videoconferencing, then multiple speaking persons with different spatial positions and speaking directions, which may be speaking sequentially or simultaneously, have to be generated by the audio processing apparatus.
  • the audio processing apparatus comprises N time-variable monophonic audio input signals TS1 to TSN, N corresponding time-variable position input signals P1 to PN, N corresponding time-variable directivity input signals D1 to DN, N corresponding Sound Source Position Generators, and N corresponding Sound Source Directivity Generators which provide N corresponding directivity-specific gains gSN to the corresponding Sound Source Position Generators.
  • the N Sound Source Position Generators and the N Sound Source Directivity Generators can be implemented as only one Sound Source Position Generator, which creates the virtual position of the sound sources separately for each sound source based on the corresponding audio input signal TSN and the corresponding position signal PN, and only one Sound Source Directivity Generator, which generates the virtual sound source directivity separately for each sound source based on the corresponding audio input signal TSN and the corresponding directivity signal DN.
  • In a conferencing setup it is often sufficient to provide a limited number of different possible azimuths and/or elevations for the virtual sound source positions of different talkers. For example, it may be enough to provide 3 to 5 distinctly perceivable virtual sound source positions along the azimuth. If more than 3 to 5 remote participants, who are potential talkers, are participating in the conferencing setup, then a dynamic mapping of the current talker or talkers to the limited number of possible perceivable virtual sound source positions might be employed. The same applies to the distribution of virtual sound source positions along the elevation, where it is often sufficient to provide only 1 to 3 distinctly perceivable virtual sound source positions.
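The dynamic mapping of talkers to a limited set of positions mentioned above could be realized in many ways; the sketch below assumes a simple least-recently-active eviction policy, which is an illustrative choice and not specified in the source. All names are hypothetical.

```python
class TalkerPositionMapper:
    """Map an unbounded set of talker IDs onto a fixed set of
    virtual azimuth positions, reassigning the least recently
    active slot when all positions are taken (illustrative policy)."""

    def __init__(self, azimuths):
        self.azimuths = list(azimuths)   # e.g. degrees, left to right
        self.slots = []                  # (talker_id, azimuth), most recent last

    def position_for(self, talker_id):
        for i, (tid, az) in enumerate(self.slots):
            if tid == talker_id:         # already mapped: refresh recency
                self.slots.append(self.slots.pop(i))
                return az
        if len(self.slots) < len(self.azimuths):
            used = {az for _, az in self.slots}
            az = next(a for a in self.azimuths if a not in used)
        else:                            # evict the least recently active talker
            _, az = self.slots.pop(0)
        self.slots.append((talker_id, az))
        return az

# Three perceivable azimuth slots, as suggested in the text above:
mapper = TalkerPositionMapper([-60, 0, 60])
```

The position returned here would then drive the position input signal P for the corresponding talker.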
  • Fig.5A shows the basic concept of the audio processing apparatus for a single sound source and highlights the details of the signal processing method that is performed by the Sound Source Directivity Generator.
  • the Sound Source Directivity Generator uses the monophonic audio input signal TS of the sound source to generate a left and a right delayed and attenuated version of the tone signal TS (hereinafter referred to as the left and right delayed directivity signal path), whereby the attenuation is based on the directivity input signal D.
  • the delayed versions of the tone signal TS are not perceived as separate sound events but serve only to simulate the sound source directivity, i.e. the principal radiation direction of the sound source.
  • the predetermined delay Δt may be the same or approximately the same (within ±3 ms) in both signal paths, and is chosen to be between 2 ms and 100 ms, preferably between 5 ms and 80 ms, and in particular between 7 ms and 25 ms.
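Applying such a predetermined delay to a digital signal amounts to shifting it by a whole number of samples. The sketch below converts a millisecond delay to samples; the 48 kHz sample rate and the zero-prepending implementation are assumptions for illustration.

```python
def delay_signal(signal, delay_ms, sample_rate=48000):
    """Delay a mono signal by delay_ms milliseconds by prepending
    zeros. For example, the preferred 7-25 ms range corresponds to
    336-1200 samples at 48 kHz."""
    n = round(delay_ms * sample_rate / 1000.0)
    return [0.0] * n + list(signal)

# 10 ms at 48 kHz -> 480 zero samples before the signal starts:
delayed = delay_signal([1.0, 0.5], 10)
```

A real-time implementation would use a circular delay line instead of prepending zeros, but the sample arithmetic is the same.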
  • the attenuation, or in other words the gain gL of the left signal path and the gain gR of the right signal path of the delayed versions, is controlled by the time-variable directivity input signal D which specifies the desired virtual principal radiation direction of the sound source.
  • if, for example, the desired principal radiation direction points towards the right side, the gain gL of the left delayed directivity signal path will be reduced to a value around 0 and the gain gR of the right delayed directivity signal path will be set to a value around 1.
  • An additional feature of the present embodiment of the invention is that one of the delayed signal paths, i.e. one of the left and right delayed directivity signal paths generated by the Sound Source Directivity Generator, is inverted, that is, the sound signal of that path is multiplied by -1, and therefore the polarity of the amplitude is changed (depicted by the item INV in Fig. 5A).
  • the purpose of this feature is to improve the sound quality by reducing perceivable comb-filter effects which arise due to the reduction of the number of audio channels to only two audio channels, which have to carry multiple correlated versions of the same sound in order to create a virtual sound source position and a virtual sound source directivity.
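The delayed, attenuated and partially inverted signal paths described in the bullets above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the 10 ms default delay and the example gain values are assumptions chosen within the ranges stated above.

```python
import numpy as np

def directivity_paths(ts, fs, tau=0.010, g_left=0.0, g_right=1.0, invert="right"):
    """Sketch of the Sound Source Directivity Generator paths (cf. Fig. 5A).

    ts              : monophonic tone signal TS as a 1-D array
    fs              : sample rate in Hz
    tau             : predetermined delay (10 ms lies in the 7-25 ms range)
    g_left, g_right : gains gL and gR derived from the directivity input D
    invert          : which delayed path is multiplied by -1 to reduce
                      comb-filter effects ("left" or "right")
    """
    n = int(round(tau * fs))                     # delay in samples
    delayed = np.concatenate([np.zeros(n), ts])  # same delay tau in both paths
    left = g_left * delayed
    right = g_right * delayed
    if invert == "left":
        left = -left                             # polarity inversion of one path
    else:
        right = -right
    return left, right
```

With the default gains the sketch simulates a principal radiation direction towards the right (gL near 0, gR near 1), as in the example above.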
  • Fig.5B shows one embodiment in which the time-variable directivity input signal D also controls the gain gS of the virtual sound source that is generated by the Sound Source Position Generator.
  • the virtual sound source generated by the Sound Source Position Generator at a desired virtual spatial location given by the time-variable position input signal P is perceived as a clearly localizable sound event.
  • the gain gS of the virtual sound source, and thus the perceived sound level of the virtual sound source is controlled by the time-variable directivity input signal D in such a way that the perceived sound level corresponds to the desired virtual principal radiation direction of the sound source.
  • the position signal P and/or the directivity signal D in Fig.5A and Fig.5B can be provided in many different ways, such as for example embedded in the tone signal TS, encoded as metadata in the tone signal TS, combined in a data signal, combined in a separate signal, or simply provided as separate signals. This is depicted by the dashed lines for the position input signal P and the directivity input signal D.
  • the gain of the virtual sound source gS, and the gains gL and gR of the delayed left and right directivity signal paths of the Sound Source Directivity Generator may be adapted according to the desired directivity D.
  • the gains are adjusted in such a way that the perceivable loudness differences due to varying principal radiation directions of the sound source can be appropriately approximated.
  • the gains thus determined can be stored for different directivity input signals D.
  • Fig. 6 shows examples for gain functions gS, gL and gR in dependence of the directivity input signal D, i.e. the input signal that specifies the desired principal radiation direction of the virtual sound source.
  • Such gain functions can be stored within the Sound Source Directivity Generator for controlling the generation of the adjustable virtual sound source directivity.
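Stored gain functions of the kind shown in Fig. 6 could, for instance, be represented as simple closed-form curves over D. The shapes below are purely hypothetical placeholders for the curves of Fig. 6A-6C; they only reproduce the qualitative behaviour described above (gL near 0 and gR near 1 when the source radiates to the right, gS loudest when the source faces the listener).

```python
import numpy as np

def gain_functions(d_deg):
    """Hypothetical gain curves gS, gL, gR over the directivity input D,
    taken here as an angle in degrees: -90 (radiating far left),
    0 (facing the listener), +90 (radiating far right)."""
    d = np.clip(d_deg, -90.0, 90.0) / 90.0       # normalise D to [-1, 1]
    g_s = 0.5 * (1.0 + np.cos(0.5 * np.pi * d))  # gS: loudest facing the listener
    g_l = 0.5 * (1.0 - d)                        # gL: -> 0 when radiating right
    g_r = 0.5 * (1.0 + d)                        # gR: -> 1 when radiating right
    return g_s, g_l, g_r
```

In practice such curves would be sampled and stored within the Sound Source Directivity Generator, as the paragraph above describes.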
  • the gains of the respective signal paths are adjusted according to the stored default value.
  • the basic signal processing methods shown in Fig.5A and Fig.5B can be supplemented with the following extension in order to further improve the realism of the simulated principal radiation direction of the sound source:
  • the audio signals in the left and right delayed directivity signal paths of the Sound Source Directivity Generator may be additionally processed by a frequency filter in each path, such as a high-pass, low-pass or band-pass filter, whereby in most cases the same filter characteristics will be applied to both paths.
  • the parameters of both frequency filters can be either fixed in advance or be controlled by the directivity input signal D.
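One possible frequency filter for the two delayed directivity signal paths is a plain first-order low-pass, sketched below. The filter type and cutoff are illustrative assumptions, since the text leaves the characteristics open (fixed in advance or controlled by D); the same filter would normally be applied to both paths.

```python
import numpy as np

def one_pole_lowpass(x, fs, fc):
    """First-order (one-pole) low-pass filter applied to one directivity
    signal path. fs is the sample rate in Hz, fc the cutoff frequency."""
    a = np.exp(-2.0 * np.pi * fc / fs)  # pole location from the cutoff
    y = np.empty_like(x, dtype=float)
    state = 0.0
    for i, v in enumerate(x):
        state = (1.0 - a) * v + a * state  # leaky integration
        y[i] = state
    return y
```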
  • the features described with respect to Fig.5A, Fig.5B and Fig.6 as well as the described supplemental extension can of course be applied to multiple sound sources as shown in Fig.5B, i.e. to multiple audio input signals TSN with their corresponding position input signals PN and directivity input signals DN.
  • the time delay τ applied by the Sound Source Directivity Generator to one sound source may be the same or approximately the same (within ±3 ms) in the left and right delayed directivity signal path.
  • the delay τ employed for the different monophonic sound sources TS1 to TSN may vary between the different sound sources.
  • it is beneficial to employ different time delays τ when using different techniques for generating the virtual sound source positions. For example, if a stereophonic technique is employed that uses inter-channel level or time differences, then a shorter time delay τ may be chosen than if a stereophonic technique is employed that uses HRTFs or stereo-widening techniques.
  • the time delay τ may also be chosen differently depending on which stereophonic playback system is used for the reproduction. If a stereophonic system is used for the reproduction which comprises two stereo loudspeakers that are widely spaced apart, such as high-fidelity stereo systems or surround sound systems, then a larger time delay τ may be employed than for the reproduction on a small stereo system incorporated in a mobile device. An even smaller time delay than the one used for small stereo systems might be used for reproduction on stereo headphones or stereo headsets.
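The device-dependent choice of τ described above can be captured in a simple lookup. The concrete millisecond values and labels here are assumptions that merely respect the stated ordering (widely spaced loudspeakers, then a small mobile stereo, then headphones) and the 2-100 ms overall range.

```python
def choose_delay(playback: str) -> float:
    """Return an illustrative time delay tau in seconds for a given
    stereophonic playback system (hypothetical values and labels)."""
    delays_ms = {
        "wide_loudspeakers": 20.0,  # hi-fi stereo or surround, widely spaced
        "mobile_stereo": 10.0,      # small stereo built into a mobile device
        "headphones": 7.0,          # stereo headphones or headsets
    }
    return delays_ms[playback] / 1000.0
```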
  • the Sound Source Directivity Generator may employ the signal processing method described above in the following specific way, as shown in Fig.7A and B:
  • this specific signal processing method of the Sound Source Directivity Generator as depicted in Fig.7A and B may be applied to the leftmost and to the rightmost sound source.
  • Fig.8A and B show an alternative signal processing method performed by the Sound Source Directivity Generator which is dependent on where the virtual position of the sound source is located with respect to the two-channel reproduction system: [0101] (A) If the virtual sound source position in space, given by the corresponding position input signal P, is located to the far left side of the stereo playback system, then the Sound Source Directivity Generator will remove the left delayed directivity signal path and generate only the inverted audio signal on the right delayed directivity signal path (see Fig.8A).
  • this alternative signal processing method of the Sound Source Directivity Generator as depicted in Fig.8A and B may be applied to the leftmost and to the rightmost sound source.
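The alternative method of Fig.8A and B can be sketched as a simple branch. The position labels are hypothetical, and the far-right case is assumed to be symmetric to the quoted far-left case (A).

```python
import numpy as np

def alternative_directivity_paths(delayed, position):
    """Alternative Sound Source Directivity Generator behaviour (Fig. 8A/B):
    for a far-left virtual source only the inverted right delayed path is
    generated; for a far-right source, presumably only the inverted left one.
    'delayed' is the tau-delayed version of the tone signal TS."""
    if position == "far_left":
        return None, -delayed   # left path removed, right path inverted (Fig. 8A)
    if position == "far_right":
        return -delayed, None   # right path removed (assumed mirror case, Fig. 8B)
    raise ValueError("only the leftmost/rightmost sources use this method")
```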

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Stereophonic System (AREA)

Abstract

The present invention relates to a method for generating a two-channel signal from a single-channel signal of a sound source. The object of the invention is to provide a method for transformation of a monophonic single-channel signal into a two-channel signal so that a reproduction of the signal with a virtual spatial character including a virtual principal radiation direction can be achieved. The object of the invention is achieved with a method for generating a two-channel signal from a single-channel signal of a sound source for simulating the position and the directional radiation characteristic of a reproduced sound source and for simulating the principal radiation direction of an emitted signal. The method is characterized in that the single-channel signal is split into two first channels and two second channels, wherein splitting into the two first channels is conducted using stereophonic techniques so that a virtual position of the reproduced sound source is achieved, and wherein splitting into the two second channels is conducted by delaying the single-channel signal in both second channels with a time delay τ to generate a virtual directional radiation characteristic of the reproduced sound source, i.e. to generate a directivity, and wherein the signal of one of the second channels is processed with a different gain compared to the gain in the other second channel to generate direction information.

Description

Method for generating a two-channel signal from a single-channel signal of a sound source
[0001] The present invention relates to a method for generating a two-channel signal from a single-channel signal of a sound source. [0002] Telephone or video conferences with several participants are oftentimes conducted via personal computers or smartphones. Such devices are usually equipped with two loudspeakers, so that spatial sound effects could in principle be generated. However, as long as no complex and expensive conferencing systems are used, the recording of the individual participants is made by means of simple microphones such as those integrated into computers or smartphones, so that only a monophonic single-channel signal is obtained. No spatial or directivity information is available for the reproduction of the signals. A sophisticated spatialized acoustical impression cannot be achieved when the signals are reproduced. [0003] The object of the invention is to provide a method for transformation of a monophonic single-channel signal into a two-channel signal so that a reproduction of the signal with a virtual spatial character including a virtual principal radiation direction can be achieved. The virtual principal radiation direction of an emitted signal is defined herein as the main direction of emission of the reproduced sound source, i.e. the principal radiation direction of the reproduced sound source that has a directional radiation characteristic. Many sound sources do not have an omnidirectional radiation pattern, but have a directional radiation characteristic, i.e. a radiation pattern that has a distinctive principal radiation direction. For example, a human talker has a directional characteristic with a distinctive principal radiation direction which corresponds to the facing direction of the talker.
[0004] The object of the invention is achieved with a method for generating a two- channel signal from a single-channel signal of a sound source for simulating the position and the directional radiation characteristic of a reproduced sound source and for simulating the principal radiation direction of an emitted signal. The method is characterized in that the single-channel signal is split into two first channels and two second channels, wherein splitting into the two first channels is conducted using stereophonic techniques so that a virtual position of the reproduced sound source is achieved, and wherein splitting into the two second channels is conducted by delaying the single-channel signal in both second channels with a time delay τ to generate a virtual directional radiation characteristic of the reproduced sound source, i.e. to generate a directivity, and wherein the signal of one of the second channels is processed with a different gain compared to the gain in the other second channel to generate direction information.
[0005] One of the first channels is then added to one of the second channels to create a first output channel, and the other of the first channels is added to the other of the second channels to create a second output channel. [0006] The invention provides a method that allows generating a signal, which provides virtual spatial information including a virtual principal radiation direction, from a single-channel signal that does not provide such information. The method can e.g. be used for a telephone or video conference where the sound signals of the participants are recorded with a single microphone per participant. Such a situation is typical e.g. if usual smartphones are used for a conference.
[0007] Preferably, the first channels are a first left channel and a first right channel, and the second channels are a second left channel and a second right channel. The first left channel is added to the second left channel to create a left output channel, and the first right channel is added to the second right channel to create a right output channel.
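The splitting into first and second channels and their pairwise combination, as described in the paragraphs above, can be sketched end to end. This is a sketch under assumptions: the constant-power panning law standing in for "stereophonic techniques", the 10 ms default delay, the gain arguments and the function name are all illustrative, not taken from the claims.

```python
import numpy as np

def two_channel_from_mono(ts, fs, pan, g_l, g_r, tau=0.010):
    """Sketch of the claimed method: split the mono signal TS into two first
    channels (virtual position) and two second channels (virtual directivity),
    then sum them pairwise into the left and right output channels.

    pan in [-1, 1] stands in for the position input P; g_l and g_r stand in
    for the gains derived from the directivity input D.
    """
    # first channels: virtual sound source position via constant-power panning
    theta = (pan + 1.0) * np.pi / 4.0            # map [-1, 1] -> [0, pi/2]
    first_l = np.cos(theta) * ts
    first_r = np.sin(theta) * ts
    # second channels: virtual directivity via common delay tau and gains
    n = int(round(tau * fs))
    delayed = np.concatenate([np.zeros(n), ts])
    second_l = g_l * delayed
    second_r = -g_r * delayed                    # one second channel inverted
    # pairwise combination into the two output channels
    pad = np.zeros(n)
    out_l = np.concatenate([first_l, pad]) + second_l
    out_r = np.concatenate([first_r, pad]) + second_r
    return out_l, out_r
```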
[0008] The monophonic single-channel signal is split into two first channels by using well-known stereophonic techniques. The resulting two-channel signal contains virtual sound source position information.
[0009] The method may be used in particular in case of a conference with several participants. A virtual position may be allotted to each of the participants whose sound signals are recorded in a monophonic signal per participant. The virtual position information is added to the signals at the reproduction site, and a participant who is using e.g. a smartphone, a headset, loudspeakers integrated into a computer, a laptop or a monitor with speakers is provided with a two-channel signal where each of the participants has a virtual position, preferably different spatially separated positions. The sound quality and the identification of different participants can thus be enhanced significantly.
[0010] The generated virtual sound source position is created artificially and does in most cases not correspond with a physical position of a real sound source such as a loudspeaker.
[0011] As stereophonic techniques, in particular the following techniques can be used: different stereophonic phantom source techniques, HRTF for headphones or loudspeakers, or positioning with specific HRTF approximations as e.g. disclosed in EP 0 357 402.
[0012] Stereo widening techniques may be used to virtually spread the spatial position range, in particular in case of several reproduced sound sources. E.g. in case the loudspeakers of an ordinary laptop or a headset are used, it is advantageous to provide a larger range of virtual sound source positions than the range between the two loudspeakers. Well-known stereo widening techniques provide a solution in such cases.
[0013] Different position creation techniques may be used for different spatial regions, e.g. for middle and outer regions. In particular, stereo widening techniques may be used for outer regions.
[0014] A directional radiation characteristic of the sound source which is reproduced at the virtual position may be generated by creating two further second channels. The two second channels are generated by delaying the single-channel signal in both second channels with a time delay τ to generate a virtual directional radiation characteristic of the reproduced sound source. [0015] Typical delays τ are in the range between 2 ms and 100 ms at the listener. They may be constant in time. The time delay τ generates directivity, i.e. a directional radiation characteristic for the reproduced sound source. [0016] Furthermore, the signal of one of the second channels is processed with a different gain compared to the gain in the other second channel to generate a virtual principal radiation direction of the emitted signal. For example, in order to simulate a virtual principal radiation direction of the emitted signal which is directed towards the right side, the gain of the left second channel may be reduced to a value around 0 and the gain of the right second channel may be set to a value around 1.
[0017] In order to avoid comb-filter effects it may be advisable to invert the signal of one of the two second channels. [0018] An embodiment of the invention is described in more detail with regard to the attached figures which show:
[0019] Figure 1 shows a first embodiment of a system to implement the method in accordance with the invention;
[0020] Figure 2 shows the system of figure 1 implemented for several sound sources;
[0021] Figure 3 shows details of the first embodiment;
[0022] Figures 4 - 8 illustrate aspects of several embodiments of the present invention by way of example.
[0023] Figure 1 shows a system to implement the method in accordance with the invention.
[0024] A sound source has been recorded, and a monophonic single-channel signal TS is supplied. The signal shall be reproduced by means of the two speakers LSpk and RSpk. Additional virtual position information P and directivity information D are provided. This information is not related to the actual position or directivity of the sound source. Directivity information D, which may be time-variable, comprises information concerning the principal radiation direction into which the signal shall be emitted.
[0025] The sound signal TS, together with the position information P is fed into a sound source position generator PG and is split into a signal for the right and a signal for the left first channels LPTS, RPTS using well-known stereophonic techniques, taking account of the position information P.
[0026] The sound signal TS is further fed into a sound source directivity generator DG where it is split into a signal for the right and a signal for the left second channels LDTS and RDTS. In order to generate a directional radiation characteristic for the sound source which is reproduced at the virtual position, the single-channel signal TS is delayed by a time τ to create the second channel signals LDTS and RDTS. Depending on the virtual directivity to be generated, i.e. depending on the virtual principal radiation direction into which the signal shall be emitted (given by D), the signal of one of the second channels is processed with a different gain compared to the gain in the other second channel.
[0027] Optionally, in order to provide a more realistic simulation of the virtual principal radiation direction of the emitted signal, the gain gS of the sound source at the virtual position, which is generated by the position generator PG, may also be adjusted by the directivity information D. For this purpose, the gain gS, which is adjusted in such a way that the perceivable loudness differences due to varying principal radiation directions of the sound source can be appropriately approximated, is fed into the position generator PG.
[0028] The left and right output channels created by the sound source directivity generator DG (i.e. second channels LDTS and RDTS) are added by two separate combiners to the respective left and right output channels of the position generator PG (i.e. first channels LPTS and RPTS). [0029] Figure 2 shows a system which is in principle similar to the system of figure 1 but for use with a multitude of sound sources. For each sound source, a sound signal TS1, TS2, ..., TSN is provided, together with individual position information P1, P2, ..., PN and directivity information D1, D2, ..., DN.
[0030] The resulting signals of speakers LSpk and RSpk provide virtual spatial information that includes different virtual positions for the different sound sources and different, time-variable virtual principal radiation directions. [0031] Figure 3 shows the details of the sound source directivity generator DG of the first embodiment of a system in accordance with the invention. The incoming sound signal TS is fed into the position generator PG, and a signal providing the virtual position information as described above is generated and forwarded to the speakers LSpk and RSpk.
[0032] The incoming sound signal TS is also fed into the directivity generator DG. In the directivity generator, the signal is split into two signals, one for each of the speakers. The direction information D is applied to the directivity generator DG and provides for different gains gL, gR for the two second channels LDTS and RDTS with which different time-variable virtual principal radiation directions are generated. The general directivity characteristic is achieved by application of a time delay τ applied to both channels. The signal of one of the second channels is in addition inverted in order to eliminate comb-filter effects. [0033] The optional adjustment of the gain gS of the sound source at the virtual position, with which the perceived loudness of the virtual sound source according to the current principal radiation direction may be adjusted, is not considered in figure 3. [0034] Further embodiments of the present invention are described in the following sections by means of example. The embodiments shall in no way limit the scope of the invention as described in the whole description and as claimed in the claims.
[0035] Figure 4A shows a basic concept for a single sound source. [0036] Figure 4B shows a basic concept for multiple sound sources that might be emitting sound simultaneously or sequentially. [0037] Figure 5A shows a signal processing method for a sound source TS performed by the Sound Source Directivity Generator.
[0038] Figure 5B shows a signal processing method for a sound source TS performed by the Sound Source Directivity Generator with additional control of the gain gS of the virtual sound source.
[0039] Figures 6A-6C show gain specifications for the virtual sound source and for the delayed left and right directivity signal paths, wherein Figure 6A shows the gain for a virtual sound source, Figure 6B shows the gain for the delayed left directivity signal path, and Figure 6C shows the gain for the delayed right directivity signal path.
[0040] Figures 7A and 7B show the specific signal processing method performed by the Sound Source Directivity Generator depending on the virtual sound source position, wherein Figure 7A shows the far-left or leftmost sound source and Figure 7B shows the far-right or rightmost sound source.
[0041] Figures 8A and 8B show an alternative signal processing method performed by the Sound Source Directivity Generator depending on the virtual sound source position, wherein Figure 8A shows the far-left or leftmost sound source and Figure 8B shows the far-right or rightmost sound source.
[0042] The main purpose of the following embodiments of the invention is to provide an audio processing apparatus that creates from a monophonic audio input signal, on a two-channel stereo playback system, a sound source with an adjustable virtual location in space and an adjustable virtual sound source directivity.
[0043] The sound source directivity is specified herein as the main direction of emission of the sound source, i.e. the principal radiation direction of a sound source that has a directional radiation pattern. Many sound sources do not have an omni- directional radiation pattern, but have a directional radiation pattern, i.e. a radiation pattern that has a distinctive principal radiation direction. For example, a human talker has a directional characteristic with a distinctive principal radiation direction which corresponds to the facing direction of the talker. Also a trumpet has a distinctive principal radiation direction which corresponds to the orientation of the trumpet.
[0044] Research has shown that human listeners are surprisingly skilled at determining the orientation of a directed sound source. Especially in communication the facing direction of a talking person is important for the interaction in face-to-face dialogues. In discussions with multiple participants, the information about who is talking to whom, which is encoded in the principal radiation direction of the sound source, is of utmost importance for efficient communication. It is, therefore, important to replicate this important acoustic cue in audioconferencing and/or videoconferencing applications.
[0045] One purpose of the present invention is to provide an audio processing apparatus that is capable of providing this important acoustic cue, i.e. the principal radiation direction of a sound source, by way of conventional two-channel stereo playback equipment, such as two-channel stereo loudspeaker systems, stereo headphones or stereo headsets. For this purpose, the audio processing apparatus creates a virtual sound source directivity which simulates the principal radiation direction of the sound source thus allowing a listener to perceive the sound source orientation, i.e. in the case of a person talking to perceive the facing direction of the talker. Furthermore, the audio processing apparatus positions the monophonic audio signal of the sound source in a virtual location in space, allowing the listener to localize the talker in space.
[0046] The audio processing apparatus is not only capable of providing, on a conventional two-channel stereo playback system, an adjustable virtual location in space and an adjustable virtual sound source directivity for one sound source, but also for multiple sound sources, which may be emitting sound simultaneously or sequentially. [0047] By using the invention in audioconferencing and videoconferencing systems, the quality and efficiency of communication can be significantly improved due to the following reasons: [0048] By providing an adjustable virtual location for each remote talker, the listener is enabled to better separate and identify different talkers.
[0049] By providing an adjustable virtual sound source directivity, the listener is enabled to distinguish who is speaking to whom, and is thus enabled to follow the conversation in a much more efficient manner. In addition, speech intelligibility is significantly improved by providing the listener with information about the sound source directivity of each talker.
[0050] An input for a monophonic tone signal TS of a sound source.
[0051] A Sound Source Position Generator that generates an adjustable virtual sound source location based on a position signal P. The position signal P can be either provided separately or can be encoded as metadata in the tone signal. [0052] A Sound Source Directivity Generator that generates an adjustable virtual sound source directivity based on a directivity signal D. The directivity signal D can be either provided separately or can be encoded as metadata in the tone signal.
[0053] Two combiners that add the respective left and right output signals from the Sound Source Position Generator (LPTS, RPTS) and the Sound Source Directivity Generator (LDTS, RDTS).
[0054] Two stereo output transducers (either stereo loudspeakers, stereo headphones, or stereo headsets) that are spaced apart, which reproduce the output signals of the two combiners.
[0055] The audio processing method of fig. 5A comprises the following steps:
• Creating from a monophonic audio input signal TS, i.e. the monophonic tone signal of a sound source, a stereophonic sound source with an adjustable, time-variable virtual location P in space and an adjustable, time-variable virtual sound source directivity D, which can be reproduced on a conventional two-channel stereo playback system, such as two-channel stereo loudspeaker systems, stereo headphones or stereo headsets.
• Generating a virtual sound source position with common stereophonic methods based on the position input signal P.
• Generating a virtual sound source directivity by creating a left and a right delayed version of the audio input signal TS (hereinafter referred to as the left and right delayed directivity signal path), whereby the gains of the left and the right delayed directivity signal path are adjusted separately according to the desired principal radiation direction of the sound source given by the directivity input signal D.
• Inverting the audio signal of either the left or the right delayed version of the audio input signal TS, i.e. inverting either the left or the right delayed directivity signal path. [0056] Many stereophonic methods are known for creating adjustable virtual sound source positions from a monophonic tone signal of a sound source on a two-channel stereo system. The four common basic principles of such stereophonic methods are (a) introducing a delay between the tone signal on the left and the right channel in order to position the virtual sound source further to the left or to the right of the stereo playback system, (b) introducing an amplitude difference between the tone signal on the left and the right channel in order to position the virtual sound source further to the left or to the right of the stereo playback system, (c) introducing both delay and amplitude differences between the tone signal on the left and the right channel, or (d) applying head-related transfer functions (HRTF) or approximations of HRTFs (e.g. EP 0357402, or an ear canal resonance model with bandpass filter which models a sound source at a specific location) to the tone signal on the left and the right channel in order to position the virtual sound source in the desired position. [0057] The present invention employs any one of the well-known stereophonic methods to place the sound source in a virtual spatial position, whereby the desired virtual spatial position is given by the position signal P. There is, therefore, no difference between the present invention and the prior art with respect to the method of how the virtual sound source position is generated on a two-channel stereo system.
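Principle (a) above, the inter-channel time difference, can be sketched as follows. The sign convention and the delay value are illustrative assumptions; this is one of the admissible stereophonic techniques, not the specific method of the invention.

```python
import numpy as np

def pan_by_time_difference(ts, fs, itd_s):
    """Shift a phantom source by an inter-channel time difference: the
    source is perceived towards the earlier channel. A positive itd_s
    delays the left channel, pulling the phantom source to the right
    (illustrative sign convention)."""
    n = abs(int(round(itd_s * fs)))  # inter-channel delay in samples
    z = np.zeros(n)
    if itd_s >= 0:
        left = np.concatenate([z, ts])   # left arrives later
        right = np.concatenate([ts, z])
    else:
        left = np.concatenate([ts, z])
        right = np.concatenate([z, ts])  # right arrives later
    return left, right
```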
[0058] In addition to the methods used to create a virtual sound source position, the present application also employs commonly known stereo-widening techniques to enlarge the perceived spatial extent of the possible sound source positions.
[0059] However, there are two important differences with the prior art which refer to the following features: [0060] It is particularly advantageous for the present invention to use different methods for generating the virtual sound source position depending on where the desired virtual position is located with respect to the two-channel reproduction system. For example, it is particularly advantageous to use a different method for virtual positions located within a region which extends around the center-plane of the stereo system, i.e. in an area around the middle-plane between the two stereo output transducers, and for virtual positions located outside this region around the center-plane of the stereo system. This means that if multiple sound sources with different virtual positions are present, different methods are preferably used for generating the respective virtual sound source positions depending on where the respective sound sources are positioned.
[0061] Furthermore, it is particularly advantageous for the present invention to employ stereo-widening techniques only for such virtual sound source positions which are located outside the region around the center-plane of the stereo system. This is particularly beneficial when multiple sound sources with different virtual positions are present.
[0062] There are only a few known audio systems that are capable of reproducing the sound source directivity of a given sound source. Such systems, however, require complicated signal processing and multi-channel, i.e. multi-loudspeaker playback systems. Some examples for such complex systems that are capable of reproducing the directivity of a sound source are (a) 2D or 3D-loudspeaker cluster or loudspeaker arrays (e.g. multi-speaker display systems), (b) wave-field synthesis which either employs monopole synthesis or appropriate directivity filters, or (c) directivity reproduction as described in WO2007/062840.
[0063] One objective of the present invention is to provide an audio processing method that is capable of creating an adjustable virtual sound source directivity on a conventional two-channel stereo playback system, while at the same time also generating an adjustable virtual sound source position.
[0064] None of the known prior art audio systems is capable of generating a sound source directivity, which can be adjusted to any principal radiation direction of 180° around the desired sound source position, with only two audio channels. The audio systems mentioned in (a) and (b) above always require more than two audio channels, i.e. more than two stereo output transducers (i.e. multiple loudspeakers). The audio system mentioned in (c) requires two second reproduction units WE2 in addition to a first reproduction unit WE1 in order to reproduce any principal radiation direction of 180° around the desired sound source position which is reproduced by the first reproduction unit WE1 , whereby one of the second reproduction units WE2 is positioned on one side and the other on the other side of the first reproduction unit WE1. [0065] The system of the present invention differs from the state of the art in that only two audio channels, i.e. two stereo output transducers such as two stereo loudspeakers, binaural headphones or binaural headsets, are required to generate an adjustable virtual sound source directivity, while also creating an adjustable virtual sound source position.
[0066] The advantage of the present invention is that a simple, conventional two- channel stereo playback system can be used to provide a listener with an adjustable virtual sound source position and an adjustable virtual sound source directivity from a monophonic input tone signal of a sound source. [0067] The term "virtual" used in the expressions "virtual sound source position" and "virtual sound source directivity" has the following meaning: [0068] The sound source position created by the audio processing apparatus of the present invention is a virtual sound source position. This means, that there is no physical sound source, as for example a loudspeaker, at the perceived position of the sound source. The perceived position of the sound source in space is not related to the position of a real physical sound source in this space.
[0069] The sound source directivity created by the audio processing apparatus of the present invention is a virtual sound source directivity. This means that the sound source directivity, i.e. the principal radiation direction of the sound source, is only simulated to provide a listener with a perceivable principal radiation direction (for example a speaking direction of a human talker), without it actually being physically directed in the conventional sense.
[0070] Fig.4A shows a basic concept of the present invention for a single sound source. A time-variable tone signal TS is provided as audio input signal. The tone signal TS is the monophonic audio signal which corresponds to a sound source, such as for example the audio signal of a speaking person. For example, the monophonic audio signal could be the transmitted audio signal of a remote talker in an audioconference or a web- or videoconference. Furthermore, a time-variable position input signal P is provided which specifies in a time-variable manner the desired virtual position of the sound source in space, and a time-variable directivity input signal D is provided which specifies the virtual sound source directivity in a time-variable manner. The position signal P and/or the directivity signal D can be provided in many different ways, such as for example embedded in the tone signal TS, encoded as metadata in the tone signal TS, combined in a data signal, combined in a separate signal, or simply provided as separate signals.
[0071] From this monophonic audio input signal TS, the audio processing apparatus of the present invention creates a stereophonic sound source, with an adjustable virtual location in space (according to the position signal P) and an adjustable virtual sound source directivity (according to the directivity signal D), which can be reproduced on any conventional stereophonic playback system (i.e. any system using two or more independent audio channels through a configuration of two or more loudspeakers), in particular on any conventional two-channel stereo playback system comprising two stereo output transducers (e.g. systems with stereo loudspeakers, stereo headphones or stereo headsets).
[0072] The audio processing apparatus comprises the following two important processing units with which the stereophonic sound source is created based on the audio input signal TS, the position input signal P and the directivity input signal D: (a) a Sound Source Position Generator, and (b) a Sound Source Directivity Generator.
[0073] The Sound Source Position Generator generates an adjustable virtual sound source location based on the position signal P. Common stereophonic methods, as mentioned earlier, are employed to place the sound source in a virtual spatial position, whereby the desired virtual spatial position is given by the position signal P. In addition to the methods used to create a virtual sound source position, the Sound Source Position Generator also employs commonly known stereo-widening techniques to enlarge the perceived spatial extent of the possible sound source positions.
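Paragraph [0073] leaves the concrete stereophonic method open. As one minimal illustration, not the patent's prescribed implementation, constant-power amplitude panning places a mono signal at a virtual azimuth; the function name `pan_stereo` and the azimuth convention used here are assumptions:

```python
import numpy as np

def pan_stereo(ts, azimuth):
    """Place a mono signal ts at a virtual azimuth in [-1 (far left), +1 (far right)]
    using constant-power amplitude panning, one common stereophonic technique."""
    theta = (azimuth + 1.0) * np.pi / 4.0   # map [-1, 1] onto [0, pi/2]
    l_pts = np.cos(theta) * ts              # left channel LPTS
    r_pts = np.sin(theta) * ts              # right channel RPTS
    return l_pts, r_pts

# A source panned to the centre carries equal power on both channels.
left, right = pan_stereo(np.ones(4), 0.0)
```

Stereo-widening (e.g. cross-channel decorrelation) would be layered on top of such panning; it is omitted in this sketch.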
[0074] The Sound Source Directivity Generator generates an adjustable virtual sound source directivity based on a directivity signal D. The details of the signal processing performed by the Sound Source Directivity Generator are described in Fig.5A. As shown in Fig.4A, and explained in Fig.5B, the Sound Source Directivity Generator may optionally provide a directivity-specific gain gS to the Sound Source Position Generator with which the sound level of the virtual sound source is processed. [0075] Both the Sound Source Position Generator and the Sound Source Directivity Generator each generate a left channel output and a right channel output. The respective left and right output signals from the Sound Source Position Generator (LPTS, RPTS) and the Sound Source Directivity Generator (LDTS, RDTS) are added by two separate combiners (adders in Fig.4A). [0076] The output signals of the two combiners are then reproduced by two stereo output transducers LSpk and RSpk, which may be, for example, a system with stereo loudspeakers, a stereo headphone, or a stereo headset. Systems with stereo loudspeakers may include, but are not limited to, high-fidelity two-channel stereo playback equipment, surround sound systems, and mobile devices such as phones, tablets, PCs and MP3 players that have stereo loudspeakers.
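The summing stage of Fig.4A described in [0075] and [0076] amounts to two per-side additions. A sketch with placeholder sample values (the arrays below are invented purely for illustration):

```python
import numpy as np

# The position generator's outputs (LPTS, RPTS) and the directivity
# generator's outputs (LDTS, RDTS) are simply summed per side before
# reaching the transducers LSpk/RSpk. Signal names follow Fig.4A.
lpts, rpts = np.array([0.5, 0.5]), np.array([0.2, 0.2])   # position paths
ldts, rdts = np.array([0.0, -0.3]), np.array([0.0, 0.3])  # delayed directivity paths
l_spk = lpts + ldts   # left combiner output
r_spk = rpts + rdts   # right combiner output
```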
[0077] The basic concept of the audio processing apparatus described for a single sound source can be adapted in order to create multiple stereophonic sound sources with adjustable positions and directivities, whereby the multiple sound sources may emit sound simultaneously or sequentially. For example, if the present invention is used to enhance the communication quality and efficiency of an audio-, web- or videoconference, then multiple speaking persons with different spatial positions and speaking directions, who may be speaking sequentially or simultaneously, have to be generated by the audio processing apparatus.
[0078] For this purpose, as shown in Fig.4B, the basic concept for one single sound source is multiplied in the following way: For N sound sources, the audio processing apparatus comprises N time-variable monophonic audio input signals TS1 to TSN, N corresponding time-variable position input signals P1 to PN, N corresponding time-variable directivity input signals D1 to DN, N corresponding Sound Source Position Generators, and N corresponding Sound Source Directivity Generators which provide N corresponding directivity-specific gains gSN to the corresponding Sound Source Position Generators. The N Sound Source Position Generators and the N Sound Source Directivity Generators can be implemented as only one Sound Source Position Generator, which creates the virtual position of the sound sources separately for each sound source based on the corresponding audio input signal TSN and the corresponding position signal PN, and only one Sound Source Directivity Generator, which generates the virtual sound source directivity separately for each sound source based on the corresponding audio input signal TSN and the corresponding directivity signal DN. [0079] The respective left and right output signals (LPTSN, RPTSN, LDTSN, RDTSN) created by either the one combined Sound Source Position Generator and the one combined Sound Source Directivity Generator, or the N Sound Source Position Generators and the N Sound Source Directivity Generators, are added by the two separate combiners (adders in Fig.4B) and reproduced by the two stereo output transducers LSpk and RSpk.
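For N sources ([0078] and [0079]), the two combiners simply accumulate each source's left and right contributions. A minimal numpy sketch; the function name and data layout are assumptions:

```python
import numpy as np

def mix_sources(per_source_outputs):
    """Sum the per-source left/right contributions (LPTSn+LDTSn, RPTSn+RDTSn)
    of N sources into the two transducer feeds, as done by the two combiners
    in Fig.4B. Each entry is a (left, right) pair of equal-length arrays."""
    n = len(per_source_outputs[0][0])
    l_spk = np.zeros(n)
    r_spk = np.zeros(n)
    for left, right in per_source_outputs:
        l_spk += left
        r_spk += right
    return l_spk, r_spk

outs = [(np.array([1.0, 0.0]), np.array([0.0, 1.0])),
        (np.array([0.5, 0.5]), np.array([0.5, 0.5]))]
l_spk, r_spk = mix_sources(outs)
```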
[0080] In the case of multiple sound sources, e.g. multiple remote talkers of a conferencing setup, it is advantageous to include a level normalization across all individual monophonic audio signals TS1 to TSN in order to adjust the perceived loudness across all the reproduced virtual sound sources. Especially in the case of monophonic sound sources TSN in a conferencing setup, it might be advantageous to include a preprocessing that extracts the target voice TSN of a remote talker from any other background noise by employing commonly known noise cancellation, echo cancellation, speech- and/or speaker recognition techniques.
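The level normalization suggested in [0080] could, for instance, equalize the RMS level of all talker signals. RMS equalization is one simple choice, not mandated by the text, and the function name and target level are assumptions:

```python
import numpy as np

def normalize_levels(signals, target_rms=0.1):
    """Scale each mono talker signal TS1..TSN to a common RMS level so that
    the reproduced virtual sound sources are perceived at comparable loudness."""
    out = []
    for ts in signals:
        rms = np.sqrt(np.mean(ts ** 2))
        out.append(ts * (target_rms / rms) if rms > 0 else ts)
    return out

# A loud and a quiet talker end up at the same RMS level.
sigs = normalize_levels([np.array([0.5, -0.5]), np.array([0.01, -0.01])])
```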
[0081] Furthermore, in a conferencing setup it is often sufficient to provide a limited number of different possible azimuths and/or elevations for the virtual sound source positions of different talkers. For example, it may be enough to provide 3 to 5 distinctly perceivable virtual sound source positions along the azimuth. If more than 3 to 5 remote participants, who are potential talkers, are participating in the conferencing setup, then a dynamic mapping of the current talker or talkers to the limited number of possible perceivable virtual sound source positions might be employed. The same applies to the distribution of virtual sound source positions along the elevation, where, for example, it is often sufficient to provide only 1 to 3 distinctly perceivable virtual sound source positions.
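The dynamic mapping of talkers to a limited set of perceivable positions ([0081]) might look like the following sketch; the round-robin policy and the slot names are illustrative assumptions, as the patent does not prescribe a mapping strategy:

```python
def assign_position_slots(active_talkers, slots=('left', 'center', 'right')):
    """Map currently active talkers to a small fixed set of distinctly
    perceivable azimuth slots (here 3, within the 3-5 range suggested by
    the text). Talkers beyond the number of slots are reassigned
    round-robin."""
    return {talker: slots[i % len(slots)] for i, talker in enumerate(active_talkers)}

# With four participants and three slots, the fourth reuses the first slot.
mapping = assign_position_slots(['alice', 'bob', 'carol', 'dave'])
```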
[0082] Fig.5A shows the basic concept of the audio processing apparatus for a single sound source and highlights the details of the signal processing method that is performed by the Sound Source Directivity Generator.
[0083] Experiments have shown that the information on the principal radiation direction of a sound source can be communicated to the human ear by a delayed and optionally attenuated version of the sound source signal, whenever the delay at the human ear has values between 2 ms and 100 ms so that the delayed version is not processed as a separate sound event. According to this finding, the Sound Source Directivity Generator uses the monophonic audio input signal TS of the sound source to generate a left and a right delayed and attenuated version of the tone signal TS (hereinafter referred to as the left and right delayed directivity signal paths), whereby the attenuation is based on the directivity input signal D. The delayed versions of the tone signal TS are not perceived as separate sound events but serve only to simulate the sound source directivity, i.e. the principal radiation direction of the sound source.
[0084] The predetermined delay τ may be the same or approximately the same (within ±3 ms) in both signal paths, and is chosen to be between 2 ms and 100 ms, preferably between 5 ms and 80 ms, and in particular between 7 ms and 25 ms. [0085] The attenuation, or in other words the gain gL of the left signal path and the gain gR of the right signal path of the delayed versions, is controlled by the time-variable directivity input signal D which specifies the desired virtual principal radiation direction of the sound source. For example, in order to simulate a virtual principal radiation direction of the sound source which is directed towards the right side of the room, the gain gL of the left delayed directivity signal path will be reduced to a value around 0 and the gain gR of the right delayed directivity signal path will be set to a value around 1.
[0086] An additional feature of the present embodiment of the invention is that one of the delayed signal paths, i.e. one of the left and right delayed directivity signal paths generated by the Sound Source Directivity Generator, is inverted, that is, the sound signal of that path is multiplied by -1 and the polarity of the amplitude is therefore changed (depicted by the item INV in Fig.5A). The purpose of this feature is to improve the sound quality by reducing perceivable comb-filter effects which arise because the number of audio channels is reduced to only two audio channels, which have to carry multiple correlated versions of the same sound in order to create a virtual sound source position and a virtual sound source directivity. [0087] The left and right output channels created by the Sound Source Directivity Generator are added by two separate combiners to the respective left and right output channels of the Sound Source Position Generator. [0088] Fig.5B shows one embodiment in which the time-variable directivity input signal D also controls the gain gS of the virtual sound source that is generated by the Sound Source Position Generator. In contrast to the delayed versions of the tone signal TS which are generated by the Sound Source Directivity Generator and which are not perceived as separate sound events, the virtual sound source generated by the Sound Source Position Generator at a desired virtual spatial location given by the time-variable position input signal P is perceived as a clearly localizable sound event. The gain gS of the virtual sound source, and thus the perceived sound level of the virtual sound source, is controlled by the time-variable directivity input signal D in such a way that the perceived sound level corresponds to the desired virtual principal radiation direction of the sound source.
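The core of the Sound Source Directivity Generator described in [0083] to [0086], namely a common delay τ in both paths, direction-dependent gains gL/gR, and polarity inversion of one path, can be sketched as follows (the function name and parameter defaults are assumptions, not taken from the patent):

```python
import numpy as np

def directivity_paths(ts, g_l, g_r, delay_samples, invert='left'):
    """Build the left/right delayed directivity paths LDTS/RDTS: the mono
    signal ts is delayed by tau (the same delay in both paths), weighted by
    the direction-dependent gains gL and gR, and one path is polarity-
    inverted (multiplied by -1) to reduce audible comb filtering."""
    delayed = np.concatenate([np.zeros(delay_samples), ts])
    l_dts = g_l * delayed
    r_dts = g_r * delayed
    if invert == 'left':
        l_dts = -l_dts          # the INV block of Fig.5A
    else:
        r_dts = -r_dts
    return l_dts, r_dts

# Radiation towards the right side of the room: gL ~ 0, gR ~ 1 (see [0085]).
fs = 16000
tau_samples = int(0.010 * fs)   # 10 ms, within the preferred 7-25 ms range
ts = np.ones(8)
l_dts, r_dts = directivity_paths(ts, g_l=0.0, g_r=1.0,
                                 delay_samples=tau_samples, invert='left')
```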
[0089] As mentioned before, the position signal P and/or the directivity signal D in Fig.5A and Fig.5B can be provided in many different ways, such as for example embedded in the tone signal TS, encoded as metadata in the tone signal TS, combined in a data signal, combined in a separate signal, or simply provided as separate signals. This is depicted by the dashed lines for the position input signal P and the directivity input signal D.
[0090] For the best possible true-to-life simulation of the desired principal radiation direction of the virtual sound source, the gain gS of the virtual sound source and the gains gL and gR of the delayed left and right directivity signal paths of the Sound Source Directivity Generator may be adapted according to the desired directivity D. For this purpose, the gains are adjusted in such a way that the perceivable loudness differences due to varying principal radiation directions of the sound source are appropriately approximated. The gains thus determined can be stored for different directivity input signals D.
[0091] Fig. 6 shows examples of gain functions gS, gL and gR in dependence on the directivity input signal D, i.e. the input signal that specifies the desired principal radiation direction of the virtual sound source. Such gain functions can be stored within the Sound Source Directivity Generator for controlling the generation of the adjustable virtual sound source directivity. Depending on the directivity input signal D, the gains of the respective signal paths are adjusted according to the stored default values. These gains may be employed in both the method shown in Fig.5A and the method shown in Fig.5B.
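Stored gain functions such as those of Fig.6 can be realized as lookup tables with interpolation. The breakpoint values below are invented placeholders chosen only to illustrate the mechanism; the actual curves are those shown in Fig.6:

```python
import numpy as np

# Illustrative gain curves gS(D), gL(D), gR(D) over a directivity input D
# in [-1 (radiating left), +1 (radiating right)].
D_GRID = np.array([-1.0, 0.0, 1.0])
GS_TAB = np.array([0.7, 1.0, 0.7])   # source slightly quieter when facing sideways
GL_TAB = np.array([1.0, 0.5, 0.0])   # left delayed path strong when radiating left
GR_TAB = np.array([0.0, 0.5, 1.0])   # right delayed path strong when radiating right

def gains_for_directivity(d):
    """Interpolate the stored gain functions for a directivity input d."""
    return (np.interp(d, D_GRID, GS_TAB),
            np.interp(d, D_GRID, GL_TAB),
            np.interp(d, D_GRID, GR_TAB))

g_s, g_l, g_r = gains_for_directivity(1.0)   # fully towards the right
```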
[0092] The basic signal processing methods shown in Fig.5A and Fig.5B can be supplemented with the following extension in order to further improve the realism of the simulated principal radiation direction of the sound source: The audio signals in the left and right delayed directivity signal paths of the Sound Source Directivity Generator may be additionally processed by a frequency filter in each path, such as a high-pass, low-pass or band-pass filter, whereby in most cases the same filter characteristics will be applied to both paths. The parameters of both frequency filters can be either fixed in advance or be controlled by the directivity input signal D.
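The optional frequency filter in the delayed paths ([0092]) may be any high-, low- or band-pass filter. As one plausible example (sound radiated away from the listener tends to lose high frequencies, so a low-pass is a natural choice, though the patent does not single it out), a first-order low-pass applied identically to both paths:

```python
import numpy as np

def one_pole_lowpass(x, alpha=0.3):
    """First-order (one-pole) low-pass: y[i] = y[i-1] + alpha*(x[i] - y[i-1]).
    In the directivity generator, the same filter would be applied to both
    delayed directivity signal paths; alpha here is an illustrative value."""
    y = np.zeros_like(x, dtype=float)
    acc = 0.0
    for i, xi in enumerate(x):
        acc += alpha * (xi - acc)
        y[i] = acc
    return y

# Step response: rises monotonically towards 1 (DC gain of 1).
filtered = one_pole_lowpass(np.ones(50))
```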
[0093] The features described with respect to Fig.5A, Fig.5B and Fig.6 as well as the described supplemental extension can of course be applied to multiple sound sources as shown in Fig.4B, i.e. to multiple audio input signals TSN with their corresponding position input signals PN and directivity input signals DN. The time delay τ applied by the Sound Source Directivity Generator to one sound source may be the same or approximately the same (within ±3 ms) in the left and right delayed directivity signal path. The delay τ employed for the different monophonic sound sources TS1 to TSN may vary between the different sound sources. In particular, it is beneficial to employ different time delays τ when using different techniques for generating the virtual sound source positions. For example, if a stereophonic technique is employed that uses inter-channel level or time differences, then a shorter time delay τ may be chosen than if a stereophonic technique is employed that uses HRTFs or stereo-widening techniques.
[0094] The time delay τ may also be chosen differently depending on which stereophonic playback system is used for the reproduction. If a stereophonic system is used for the reproduction which comprises two stereo loudspeakers that are widely spaced apart, such as high-fidelity stereo systems or surround sound systems, then a larger time delay τ may be employed than for the reproduction on a small stereo system incorporated in a mobile device. An even smaller time delay than the one used for small stereo systems might be used for reproduction on stereo headphones or stereo headsets.
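The playback-dependent choice of τ described in [0094] can be captured in a small table. The concrete millisecond values below are illustrative assumptions that merely respect the ordering stated in the text and the 2 ms to 100 ms bound from [0084]:

```python
def choose_delay_ms(playback):
    """Pick the directivity-path delay tau depending on the playback system:
    widely spaced loudspeakers get the largest delay, mobile-device stereo
    speakers a smaller one, and headphones/headsets the smallest."""
    table = {
        'hifi_or_surround': 20.0,  # widely spaced stereo loudspeakers
        'mobile_stereo': 12.0,     # small stereo system in a mobile device
        'headphones': 8.0,         # stereo headphones or headsets
    }
    return table[playback]

tau = choose_delay_ms('headphones')
```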
[0095] Depending on where in space the virtual position of the sound source is located with respect to the two-channel reproduction system, the Sound Source Directivity Generator may employ the signal processing method described above in the following specific way, as shown in Fig.7A and B:
[0096] (A) If the virtual sound source position in space, given by the corresponding position input signal P, is located to the far left side of the stereo playback system, then the Sound Source Directivity Generator will invert the audio signal on the right delayed directivity signal path (see Fig.7A).
[0097] (B) If the virtual sound source position in space, given by the corresponding position input signal P, is located to the far right side of the stereo playback system, then the Sound Source Directivity Generator will invert the audio signal on the left delayed directivity signal path (see Fig.7B).
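The position-dependent inversion rule of Fig.7A/B ([0096] and [0097]) reduces to a two-entry mapping; the position labels used here are illustrative:

```python
def invert_side(virtual_position):
    """Select which delayed directivity path to polarity-invert, following
    Fig.7A/B: a far-left virtual source inverts the right path, and a
    far-right virtual source inverts the left path."""
    return {'far_left': 'right', 'far_right': 'left'}[virtual_position]
```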
[0098] In case of multiple sound sources TS1 to TSN (as shown in Fig.4B), this specific signal processing method of the Sound Source Directivity Generator as depicted in Fig.7A and B may be employed for the leftmost and the rightmost sound source.
[0099] Note that for purposes of simplification the Sound Source Position Generator is not shown in Fig.7A and B, but is obviously included in the same manner as shown in Fig.5A or Fig.5B. [0100] Fig.8A and B show an alternative signal processing method performed by the Sound Source Directivity Generator which is dependent on where the virtual position of the sound source is located with respect to the two-channel reproduction system: [0101] (A) If the virtual sound source position in space, given by the corresponding position input signal P, is located to the far left side of the stereo playback system, then the Sound Source Directivity Generator will remove the left delayed directivity signal path and generate only the inverted audio signal on the right delayed directivity signal path (see Fig.8A).
[0102] (B) If the virtual sound source position in space, given by the corresponding position input signal P, is located to the far right side of the stereo playback system, then the Sound Source Directivity Generator will remove the right delayed directivity signal path and generate only the inverted audio signal on the left delayed directivity signal path (see Fig.8B).
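The alternative method of Fig.8A/B keeps only one, inverted, delayed path and removes the other entirely. A minimal sketch (the gains are omitted for brevity, and the function name and position labels are assumptions):

```python
import numpy as np

def single_path_directivity(ts, delay_samples, virtual_position):
    """Alternative method of Fig.8A/B: for a far-left source only the
    inverted right delayed path is generated, and for a far-right source
    only the inverted left delayed path; the other path is removed."""
    delayed = -np.concatenate([np.zeros(delay_samples), ts])  # inverted delayed copy
    zero = np.zeros_like(delayed)
    if virtual_position == 'far_left':
        return zero, delayed        # (LDTS, RDTS): left path removed
    return delayed, zero            # right path removed

l_dts, r_dts = single_path_directivity(np.ones(4), 3, 'far_left')
```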
[0103] In case of multiple sound sources TS1 to TSN (as shown in Fig.4B), this alternative signal processing method of the Sound Source Directivity Generator as depicted in Fig.8A and B may be employed for the leftmost and the rightmost sound source.
[0104] Note that for purposes of simplification the Sound Source Position Generator is not shown in Fig.8A and B, but is obviously included in the same manner as shown in Fig.5A or Fig.5B.
[0105] Remark: The following expressions are used synonymously: voice direction, speaking direction, talker's or speaker's orientation, talker's facing or gazing direction, talker's facing or gazing angle, sound source orientation, voice directivity, principal radiation direction of the emitted sound signal, principal radiation direction of the reproduced sound source, main direction of emission, directivity.

Claims

1. Method for generating a two-channel signal from a single-channel signal (TS) of a sound source for simulating a position and a directional radiation characteristic of a reproduced sound source and a principal radiation direction of an emitted signal, characterized in that the single-channel signal (TS) is split into two first channels (LPTS, RPTS) and two second channels (LDTS, RDTS), wherein splitting into the two first channels (LPTS, RPTS) is conducted using stereophonic techniques so that a virtual position of the reproduced sound source is generated, and wherein splitting into the two second channels (LDTS, RDTS) is conducted by delaying the single-channel signal (TS) in both second channels (LDTS, RDTS) with a time delay τ to generate a virtual directional radiation characteristic of the reproduced sound source, and wherein the signal of one of the second channels (LDTS, RDTS) is processed with a different gain compared to the gain in the other second channel to generate a virtual principal radiation direction of the emitted signal, and in that one of the first channels (LPTS) is added to one of the second channels (LDTS) to create a first output channel (LSPK), and in that the other of the first channels (RPTS) is added to the other of the second channels (RDTS) to create a second output channel (RSPK).
2. Method according to claim 1, characterized in that the first channels are a first left channel (LPTS) and a first right channel (RPTS), and the second channels are a second left channel (LDTS) and a second right channel (RDTS), and the first left channel (LPTS) is added to the second left channel (LDTS) to create a left output channel (LSPK), and the first right channel (RPTS) is added to the second right channel (RDTS) to create a right output channel (RSPK).
3. Method according to claim 1 or 2, characterized in that stereophonic phantom source techniques, HRTFs, HRTF approximations or stereo-widening techniques are used as the stereophonic technique.
4. Method according to one of claims 1 to 3, characterized in that different stereophonic techniques are used for generating virtual positions in different spatial regions.
5. Method according to one of claims 1 to 4, characterized in that time delays τ are in the range between 2 ms and 100 ms at the listener.
6. Method according to claim 5, characterized in that time delays are constant in time.
7. Method according to one of claims 1 to 6, characterized in that the gains are variable in time.
8. Method according to one of claims 1 to 7, characterized in that the signal of one of the second channels (LDTS, RDTS) is inverted.
PCT/EP2017/000649 2016-06-06 2017-06-06 Method for generating a two-channel signal from a single-channel signal of a sound source WO2017211448A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102016006732 2016-06-06
DE102016006732.2 2016-06-06

Publications (1)

Publication Number Publication Date
WO2017211448A1 true WO2017211448A1 (en) 2017-12-14

Family

ID=59152811

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2017/000649 WO2017211448A1 (en) 2016-06-06 2017-06-06 Method for generating a two-channel signal from a single-channel signal of a sound source

Country Status (1)

Country Link
WO (1) WO2017211448A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927701A (en) * 2021-02-05 2021-06-08 商汤集团有限公司 Sample generation method, neural network generation method, audio signal generation method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0357402A2 (en) 1988-09-02 1990-03-07 Q Sound Ltd Sound imaging method and apparatus
EP1746863A2 (en) * 2005-07-20 2007-01-24 Samsung Electronics Co., Ltd. Method and apparatus to reproduce wide mono sound
WO2007062840A1 (en) 2005-11-30 2007-06-07 Miriam Noemi Valenzuela Method for recording and reproducing a sound source with time-variable directional characteristics
US20130294605A1 (en) * 2012-05-01 2013-11-07 Sony Mobile Communications, Inc. Sound image localization apparatus




Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17732301

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17732301

Country of ref document: EP

Kind code of ref document: A1