EP3197182B1 - Method and apparatus for generating and reproducing audio signals

Publication number: EP3197182B1
Application number: EP15832603.3A
Authority: EP (European Patent Office)
Prior art keywords: channel, audio signal, audio, signal, signals
Legal status: Active
Other languages: English (en), French (fr)
Other versions: EP3197182A1, EP3197182A4
Inventors: Hyun Jo, Sun-Min Kim, Jae-Ha Park, Sang-Mo Son
Original/Current Assignee: Samsung Electronics Co Ltd

Classifications

    • H04S5/005 Pseudo-stereo systems of the pseudo five- or more-channel type, e.g. virtual surround
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation
    • H04S7/305 Electronic adaptation of stereophonic audio signals to reverberation of the listening space
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H04S2400/01 Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
    • H04S2400/05 Generation or adaptation of centre channel in multi-channel audio systems
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H04S2400/15 Aspects of sound capture and related signal processing for recording or reproduction
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H04S2420/11 Application of ambisonics in stereophonic audio systems

Definitions

  • the present invention relates to a method of generating and reproducing an audio signal and an apparatus therefor, and more specifically, to a method and apparatus with improved rendering performance achieved by collecting audio signals and reducing the coherence of the collected audio signals.
  • the present invention also relates to a method and an apparatus for reducing the processing load by reducing the computational amount, while improving the rendering performance by performing rendering based on real-time information of an audio signal.
  • the miniaturization of capturing devices leads to a gradual reduction of the distance between microphones, thereby increasing the coherence between input channels.
  • as a result, the degree of sound externalization for reproduction in headphones deteriorates, and the sound image positioning performance also deteriorates.
  • Document US 2010/328419 A1 discloses a method and apparatus for enabling certain experience by matching of the auditory space to the visual space in video viewing applications such as those that may be used in video teleconferencing systems and in the viewing of videos with associated audio (e.g., movies).
  • an audio generation method using a miniaturized capturing device thus has a problem in that the reproduction performance deteriorates due to high coherence between input signals.
  • in addition, a long-tap filter should be used to simulate an echo, and thus the computational amount increases.
  • head position information of a user is required to position a sound image.
  • the objective of the present invention is to solve the above-described problems of the prior art, to decrease signal coherence, and to improve the rendering performance by reflecting real-time head position information of a user.
  • the rendering performance may be improved by lowering signal coherence and reflecting real-time head position information of a user regardless of form factors and the like of a capturing device and a rendering device.
  • FIG. 1 is an outline diagram of a system for generating and reproducing an audio signal, according to an embodiment of the present invention.
  • the system for generating and reproducing an audio signal includes an audio generation apparatus 100, an audio reproduction apparatus 300, and a network 500.
  • when a sound constituting the audio signal is generated, the audio signal is transferred to a mixer through a microphone and is output to a speaker through a power amplifier.
  • a process of modulating the audio signal through an effector or a process of storing the generated audio signal in a storage or reproducing the audio signal stored in the storage may be added.
  • types of sound are largely classified into acoustic sounds and electrical sounds according to their sources.
  • an acoustic sound, such as a human voice or an acoustic instrument sound, needs a process of converting its sound source into an electrical signal, and the acoustic sound is converted into an electrical signal through a microphone.
  • the audio generation apparatus 100 of FIG. 1 is a device for performing all the processes of generating an audio signal from a predetermined sound source.
  • a representative example of the sound source of the audio signal is an audio signal recorded by using a microphone.
  • the basic principle of the microphone corresponds to a transducer for converting a form of energy from sound energy to electrical energy.
  • the microphone generates a voltage by converting a physical, mechanical motion of air into an electrical signal and is classified into a carbon microphone, a crystal microphone, a dynamic microphone, a capacitor microphone, or the like according to a conversion scheme.
  • a capacitor microphone is mainly used for recording a sound.
  • An omnidirectional microphone has the same sensitivity for all incident angles, but a directional microphone has a difference in sensitivity according to an incident angle of an input audio signal, and this difference in sensitivity is determined depending on a unique polar pattern of the microphone.
  • a unidirectional microphone responds most sensitively to a sound input from the front (0°) at the same distance and hardly detects a sound input from the rear.
  • a bidirectional microphone is most sensitive to signals input from the front (0°) and the rear (180°) and hardly detects sounds input from the sides (90° and 270°).
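  • For illustration, the sensitivity of these first-order polar patterns can be sketched with the standard formula g(θ) = α + (1 − α)·cos θ, where α = 1 gives an omnidirectional pattern, α = 0.5 a cardioid, and α = 0 a bidirectional (figure-of-eight) pattern. This is a minimal sketch for clarity, not taken from the patent itself:

```python
import numpy as np

def polar_gain(theta_deg: float, alpha: float) -> float:
    """First-order microphone sensitivity at incidence angle theta_deg.

    alpha = 1.0 -> omnidirectional, 0.5 -> cardioid, 0.0 -> bidirectional.
    Returns a linear gain relative to the on-axis (0 degree) response.
    """
    theta = np.deg2rad(theta_deg)
    return alpha + (1.0 - alpha) * np.cos(theta)

# Cardioid: full sensitivity at the front, about -6 dB at the sides,
# and a null at the rear (matching the description of FIG. 4).
for angle in (0, 90, 180):
    g = polar_gain(angle, alpha=0.5)
    print(f"{angle:3d} deg: gain = {g:.2f} ({20 * np.log10(max(g, 1e-9)):.1f} dB)")
```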
  • an audio signal having a two-dimensional (2D) or 3D spatial characteristic may be recorded.
  • another sound source of the audio signal is an audio signal generated by using a digital sound source generation device such as a musical instrument digital interface (MIDI).
  • the MIDI interface is equipped in a computing device and connects the computing device and an instrument. That is, when the computing device transmits a signal to be generated to the MIDI interface, the MIDI interface transmits signals aligned according to a predefined rule to an electronic instrument to generate an audio signal. This process of collecting a sound source is called capturing.
  • An audio signal collected through the capturing process is encoded to a bitstream by an audio encoder.
  • An MPEG-H audio codec standard defines an object audio signal and a higher order ambisonics (HOA) signal besides a general channel audio signal.
  • an object indicates each sound source constituting a sound scene, for example, each instrument forming music, or each of the dialog, effects, and background music (BGM) constituting the audio sound of a movie.
  • a channel audio signal includes information about a sound scene including all objects, and thus the sound scene including all the objects is reproduced through an output channel (speaker).
  • an object signal stores, transmits, and reproduces a signal on an object unit basis, and thus a reproducer may independently reproduce each object through object rendering.
  • each of objects constituting a sound scene may be extracted and reconfigured according to circumstances.
  • the audio sound of general music content is obtained by individually recording each instrument forming the music and appropriately mixing the tracks of the respective instruments through a mixing process. If the track of each instrument is configured as an object, a user may control each object (instrument) independently, and thus the user may adjust the sound magnitude of a specific object (instrument) and change the spatial location of the object (instrument).
  • dialog audio sounds dubbed into the languages of various countries, such as Korean, Japanese, and English, may be processed as objects and included in the audio signal.
  • when Korean is selected, for example, an object corresponding to Korean is selected and included in the audio signal, such that Korean dialog is reproduced.
  • the MPEG-H standard defines HOA as a new input signal. According to HOA, a sound scene may be represented in a form different from an existing channel or object audio signal by using a specially produced microphone and a special storage method for the microphone signals, in a series of processes of acquiring an audio signal through the microphone and reproducing the audio signal again.
  • An audio signal captured as described above is encoded by an audio signal encoder and transmitted in a form of bitstream.
  • a form of final output data of an encoder is a bitstream, and thus an input of a decoder is also a bitstream.
  • the audio reproduction apparatus 300 receives a bitstream transmitted via the network 500 and restores a channel audio signal, an object audio signal, and HOA by decoding the received bitstream.
  • the restored audio signals may be output as a multi-channel audio signal mixed with a plurality of output channels by which a plurality of input channels are to be reproduced through rendering.
  • the input channels are down-mixed to meet the number of the output channels.
  • stereophonic audio indicates audio that additionally has spatial information which allows a user to feel presence, by reproducing not only the pitch and tone of a sound but also its direction and sense of distance, and which allows a user who is not located in the space from which the sound is generated to recognize a sense of direction, a sense of distance, and a sense of space.
  • the output channels of an audio signal may indicate the number of speakers through which audio is output. The greater the number of output channels, the greater the number of speakers through which audio is output.
  • the audio reproduction apparatus 300 reproducing stereophonic audio may render and mix a multi-channel audio input signal to the output channels to be reproduced, such that a multi-channel audio input signal having a larger number of input channels is output and reproduced in an environment with a smaller number of output channels.
  • the multi-channel audio input signal may include a channel capable of outputting an elevated sound.
  • the channel capable of outputting the elevated sound may indicate a channel capable of outputting an audio signal through a speaker located above the head of the user such that the user can feel a sense of elevation.
  • a horizontal channel may indicate a channel capable of outputting an audio signal through a speaker located on a plane horizontal to the user.
  • the above-described environment with a small number of output channels may indicate an environment in which audio can be output through speakers arranged on a horizontal plane without including an output channel capable of outputting an elevated sound.
  • a horizontal channel may indicate a channel including an audio signal which can be output through a speaker arranged on a horizontal plane.
  • An overhead channel may indicate a channel including an audio signal which can be output through a speaker arranged at an elevated place instead of the horizontal plane and capable of outputting an elevated sound.
  • the network 500 functions to connect the audio generation apparatus 100 and the audio reproduction apparatus 300. That is, the network 500 indicates a communication network for providing a connection path through which data can be transmitted and received.
  • the network 500 may be configured regardless of communication aspects such as wired communication and wireless communication and may be configured by a local area network (LAN), a metropolitan area network (MAN), and a wide area network (WAN), taken alone or in combination.
  • the network 500 is a comprehensive data communication network enabling the network component entities shown in FIG. 1 to communicate with each other smoothly and may include at least some of the wired Internet, the wireless Internet, a mobile wireless communication network, a telephone network, and a wired/wireless television communication network.
  • the first step of a process of generating an audio signal is to capture the audio signal.
  • the capturing of the audio signal includes collecting audio signals having spatial location information in the entire azimuth range of 360° in a 2D or 3D space.
  • An audio signal capturing environment can be largely divided into a studio environment and an environment using a capturing device having a relatively small-sized form factor.
  • An example of audio content produced in the studio environment is as follows.
  • a most general audio signal capture system is a system for recording sound sources through microphones in the studio environment and mixing the recorded sound sources to generate audio content.
  • sound sources captured by using microphones installed in various places in an indoor environment such as a stage may be mixed in a studio to generate content.
  • this method is usually applied to classic music recording.
  • in the past, a two-track stereo recording method without post-mixing production was used, but recently, a multi-track (channel) recording method is used to perform post-mixing production or multi-channel (5.1-channel or the like) surround mixing.
  • the audio content captured in the studio environment is the best in terms of sound quality, but the studio can be used only in a limited environment and at limited times and requires high installation and maintenance costs.
  • accordingly, audio capturing form factors having a size of tens of centimeters have been used, and audio capturing form factors having a size of several centimeters have also been developed.
  • a 20-cm form factor is usually used for audio content that is binaural-rendered and reproduced through headphones or the like.
  • a capturing device having a smaller-sized form factor may be implemented by using a directional microphone.
  • an operation of capturing an audio signal and then linking to a portable device to mix, edit, and reproduce the captured audio signal may also be possible.
  • FIG. 2 illustrates the phenomenon of increasing coherence between input channels in an audio generation apparatus according to an embodiment of the present invention and its influence on the rendering performance.
  • FIG. 2A illustrates a phenomenon of increasing coherence between input channel signals in an audio generation apparatus according to an embodiment of the present invention.
  • FIG. 2A assumes a case of two microphones, that is, two input channels.
  • An audio signal received through a microphone has a unique signal characteristic according to a relationship between a location of a sound image and a location of the microphone for receiving the sound image. Therefore, when audio signals are received through a plurality of microphones, locations (distances, azimuth angles, and elevation angles) of sound images may be detected by analyzing a time delay, a phase, and a frequency characteristic of an audio signal received through each of the microphones.
  • this phenomenon becomes more severe as the distance between the microphones decreases, further increasing the coherence between the input channel signals.
  • when the coherence between the input channel signals is high, the rendering performance deteriorates, thereby affecting the reproduction performance.
  • FIG. 2B illustrates a phenomenon of deteriorating the rendering performance when coherence between input channel signals is high in an audio reproduction apparatus according to an embodiment of the present invention.
  • FIG. 3 is a block diagram of a system for generating and reproducing an audio signal, according to an embodiment of the present invention.
  • a system 300 for generating and reproducing an audio signal includes a virtual input channel audio signal generator 310, a channel separator 330, and a renderer 350.
  • the virtual input channel audio signal generator 310 generates N virtual input channel audio signals by using N input channel audio signals input through N microphones.
  • a virtual input channel layout which can be generated may vary according to a form factor of an audio signal capturer.
  • a virtual input channel layout to be generated may be manually set by a user.
  • a virtual input channel layout to be generated may be determined based on a virtual input channel layout according to a form factor of a capturing device and may refer to a database stored in a storage.
  • a virtual channel signal may be replaced by an actual input channel signal.
  • Signals output from the virtual input channel audio signal generator 310 are M input channel audio signals including virtual input channel audio signals, wherein M is an integer greater than N.
  • the channel separator 330 channel-separates the M input channel audio signals transmitted from the virtual input channel audio signal generator. For the channel separation, coherences are calculated through signal processing for each frequency band, and the coherence of signals having high coherence is reduced. The channel separation will be described in more detail below.
  • the renderer 350 includes a filtering unit (not shown) and a panning unit (not shown).
  • the panning unit calculates and applies a panning coefficient to be applied for each frequency band and each channel in order to pan an input audio signal with respect to each output channel.
  • the panning on an audio signal indicates controlling a magnitude of a signal to be applied to each output channel in order to render a sound source to a specific location between two output channels.
  • the panning coefficient may be replaced by the term "panning gain".
  • the panning unit may render a low frequency signal of an overhead channel signal according to an add-to-the-closest-channel method and render a high frequency signal according to a multi-channel panning method.
  • a gain value differently set for a channel to be rendered to each channel signal is applied to a signal of each channel of a multi-channel audio signal, and thus the signal of each channel of the multi-channel audio signal may be rendered to at least one horizontal channel.
  • Signals of channels to which gain values have been applied may be added through mixing, thereby outputting a final signal.
  • the audio reproduction apparatus 300 reproducing stereophonic audio may prevent the sound quality deterioration which may occur when several channels are mixed to one output channel, by rendering a low frequency signal according to the add-to-the-closest-channel method. That is, when several channels are mixed to one channel, sound quality may deteriorate due to amplification or cancellation caused by interference between the channel signals, and thus the deterioration of sound quality may be prevented by mixing one channel to one output channel.
  • each channel of a multi-channel audio signal may be rendered to the closest channel among channels to be reproduced instead of separately rendered to several channels.
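  • As an illustration of computing a panning gain between two output channels, the following sketch uses a constant-power (sine/cosine) panning law; the patent does not fix the exact law, so the function name and the law itself are assumptions chosen for clarity:

```python
import numpy as np

def pan_gains(source_az: float, left_az: float, right_az: float) -> tuple[float, float]:
    """Constant-power panning gains for a source placed between two channels.

    The source azimuth is normalized to [0, 1] between the two speaker
    azimuths; cos/sin gains keep the total power constant while moving
    the sound image (g_left**2 + g_right**2 == 1).
    """
    p = (source_az - left_az) / (right_az - left_az)
    p = min(max(p, 0.0), 1.0)
    return float(np.cos(p * np.pi / 2)), float(np.sin(p * np.pi / 2))

# A source exactly between speakers at -30 and +30 degrees gets ~0.707 on each:
g_left, g_right = pan_gains(0.0, -30.0, 30.0)
```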
  • the filtering unit may correct the tone and the like of a decoded audio signal according to a location and filter an input audio signal by using an HRTF filter.
  • the filtering unit may render an overhead channel which has passed through the HRTF filter for 3D rendering of the overhead channel, by a different method according to a frequency.
  • the HRTF filter enables the user to recognize stereophonic audio by not only simple path differences such as a level difference between two ears (inter-aural level difference (ILD)) and an audio arrival time difference between two ears (inter-aural time difference (ITD)) but also a phenomenon in which complicated path characteristics such as diffraction on a head surface and reflection from an auricle vary according to a sound arrival direction.
  • the HRTF filter may process audio signals included in an overhead channel by changing sound quality of the audio signals such that stereophonic audio can be recognized.
  • FIG. 4 illustrates an operation of a virtual input channel audio signal generator according to an embodiment of the present invention.
  • an audio generation apparatus captures audio signals by using four microphones placed at the same distance from the center with an angle of 90° between them. Therefore, in the embodiment disclosed in FIG. 4A, the number N of input channels is 4.
  • the used microphones are directional microphones having a cardioid pattern, and a cardioid microphone has the characteristic that its side sensitivity is 6 dB lower than its front sensitivity and its rear sensitivity is almost zero.
  • a beam pattern of four channel input audio signals captured in this environment is as shown in FIG. 4A .
  • FIG. 4B illustrates five input channel audio signals including a virtual microphone signal, i.e., a virtual input channel audio signal, generated based on the four captured input channel audio signals of FIG. 4A. That is, in the embodiment disclosed in FIG. 4B, the number M of virtual input channels is 5.
  • the virtual microphone signal is generated by weighted-summing the four channel input signals captured by the four microphones.
  • weights to be applied to the weighted sum are determined based on a layout of input channels and a reproduction layout.
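  • A minimal sketch of this weighted summing is shown below; the weight matrix here is a hypothetical example, since the patent derives the actual weights from the input channel layout (capture-device form factor) and the reproduction layout:

```python
import numpy as np

def generate_virtual_channels(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Map N captured channel signals to M virtual input channel signals.

    x: (N, num_samples) captured audio; weights: (M, N) mixing matrix
    derived from the input and reproduction layouts; returns (M, num_samples).
    """
    return weights @ x

# Example: four capsules at 0/90/180/270 degrees (N = 4) mapped to a
# five-channel virtual layout (M = 5). The assignment of capsules to
# channels and the 0.5/0.5 center blend are illustrative assumptions.
N, num_samples = 4, 48000
x = np.random.randn(N, num_samples)           # stand-in for captured audio
W = np.array([
    [1.0, 0.0, 0.0, 0.0],   # front-left
    [0.0, 1.0, 0.0, 0.0],   # front-right
    [0.5, 0.5, 0.0, 0.0],   # virtual center: weighted sum of the front pair
    [0.0, 0.0, 1.0, 0.0],   # surround-left
    [0.0, 0.0, 0.0, 1.0],   # surround-right
])
virtual = generate_virtual_channels(x, W)     # shape (5, 48000)
```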
  • FIG. 5 is a detailed block diagram of a channel separator according to an embodiment of the present invention.
  • a channel separator 500 includes a normalized energy acquirer 510, an energy index (EI) acquirer 520, an EI application unit 530, and gain application units 540 and 550.
  • the normalized energy acquirer 510 receives M input channel signals $X_1(f), X_2(f), \dots, X_M(f)$ and acquires normalized energies $E\{X_1(f)\}, E\{X_2(f)\}, \dots, E\{X_M(f)\}$ for each frequency band of each input channel signal.
  • the normalized energy $E\{X_i(f)\}$ of each input channel signal is determined by Equation 1:
$$E\{X_i(f)\} = \frac{|X_i(f)|^2}{|X_1(f)|^2 + |X_2(f)|^2 + \dots + |X_M(f)|^2} \tag{1}$$
  • the normalized energy $E\{X_i(f)\}$ of each input channel signal corresponds to the ratio of the energy occupied by the $i$-th input channel signal in a corresponding frequency band to the energy of all the input channel signals.
  • the EI acquirer 520 acquires an index of a channel having the greatest energy among all the channels by calculating energy for each frequency band for each channel.
  • the energy index EI is determined by Equation 2:
$$EI(f) = \frac{N}{N-1}\left(1 - \max\left(E\{X_1(f)\}, E\{X_2(f)\}, \dots, E\{X_M(f)\}\right)\right) \tag{2}$$
  • the EI application unit 530 generates M highly correlated channel signals and M un-correlated signals based on a predetermined threshold.
  • the gain application unit 540 multiplies the highly correlated signals received from the EI application unit 530 by a gain EI, and the gain application unit 550 multiplies the un-correlated signals received from the EI application unit by a gain (1 − EI), respectively.
  • the M highly correlated channel signals and the M un-correlated signals to which the gains have been reflected are added to reduce channel coherence, thereby improving the rendering performance.
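  • A minimal sketch of Equations 1 and 2 and the gain application follows. How the highly correlated and un-correlated branches are derived is not detailed in this excerpt, so the recombination function simply assumes they are supplied by an upstream stage; the Equation 2 scale factor is written with M standing in for N:

```python
import numpy as np

def energy_index(X: np.ndarray) -> np.ndarray:
    """Per-band energy index EI(f) from Equations 1 and 2.

    X: (M, num_bins) complex spectra of the M input channel signals.
    EI approaches 0 when a single channel dominates a band (well separated)
    and grows as the band's energy spreads across channels (high coherence).
    """
    M = X.shape[0]
    power = np.abs(X) ** 2                                  # |X_i(f)|^2
    norm_energy = power / (power.sum(axis=0) + 1e-12)       # Equation 1
    ei = (M / (M - 1)) * (1.0 - norm_energy.max(axis=0))    # Equation 2
    return np.clip(ei, 0.0, 1.0)

def recombine(correlated: np.ndarray, uncorrelated: np.ndarray, ei: np.ndarray) -> np.ndarray:
    """Gain application units 540/550: EI weights the correlated part,
    (1 - EI) the un-correlated part, and the two are summed per band."""
    return ei * correlated + (1.0 - ei) * uncorrelated
```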
  • FIG. 6 is a block diagram of a configuration in which the virtual input channel signal generator and the channel separator are integrated, according to an embodiment of the present invention.
  • FIG. 6 is a block diagram for describing a method of using a center signal separation technique to separate sound images of three locations for two different input signals.
  • a sound image separator 600 includes domain converters 610 and 620, a correlation coefficient acquirer 630, a center signal acquirer 640, an inverse domain converter 650, and signal subtractors 660 and 661.
  • a collected signal may vary according to a location of a microphone.
  • when a sound source generating a voice signal, such as a singer or an announcer, is located at the center of the stage, the stereo signals generated based on the voice signal include the same left and right signals.
  • signals collected by the microphones differ from each other, and thus left and right stereo signals also differ from each other.
  • a signal commonly included in both stereo signals, such as a voice signal, is defined as a center signal, and the signals obtained by subtracting the center signal from the stereo signals are referred to as ambient stereo signals (ambient left and ambient right signals).
  • the domain converters 610 and 620 receive stereo signals L and R.
  • the domain converters 610 and 620 convert a domain of the received stereo signals.
  • the domain converters 610 and 620 convert the stereo signals to stereo signals in a time-frequency domain by using an algorithm such as fast Fourier transform (FFT).
  • the time-frequency domain is used to represent both changes in time and frequency.
  • a signal may be divided into a plurality of frames according to time and frequency values, and a signal in each frame may be represented by a frequency sub-band value in each time slot.
  • the correlation coefficient acquirer 630 calculates a correlation coefficient by using the stereo signals converted to the time-frequency domain by the domain converters 610 and 620.
  • the correlation coefficient acquirer 630 calculates a first coefficient indicating coherence between the stereo signals and a second coefficient indicating similarity between the two signals and calculates the correlation coefficient by using the first coefficient and the second coefficient.
  • the coherence between two signals indicates the degree to which the two signals are correlated.
  • the first coefficient in the time-frequency domain may be represented by Equation 3:
$$\phi(n,k) = \frac{\left|\phi_{12}(n,k)\right|}{\sqrt{\phi_{11}(n,k)\,\phi_{22}(n,k)}} \tag{3}$$
  • here, n denotes a time value, that is, a time slot value, and k denotes a frequency band value.
  • the denominator of Equation 3 is a factor for normalizing the first coefficient.
  • the first coefficient has a real number value greater than or equal to 0 and less than or equal to 1.
  • in Equation 3, $\phi_{ij}(n,k)$ may be obtained as in Equation 4 by using an expectation function:
$$\phi_{ij}(n,k) = E\left[X_i X_j^*\right] \tag{4}$$
  • here, $X_i$ and $X_j$ denote stereo signals represented by complex numbers in the time-frequency domain, and $X_j^*$ denotes the complex conjugate of $X_j$.
  • the expectation function is a probability-statistics function used to obtain an average value of current signals by taking past values of the signals into account. Therefore, when the product of $X_i$ and $X_j^*$ is applied to the expectation function, the coherence between the two current signals $X_i$ and $X_j$ is obtained by taking into account the statistics of the coherence between the two past signals. Since Equation 4 requires a large amount of computation, an approximate value of Equation 4 may be obtained by using Equation 5:
$$\phi_{ij}(n,k) = (1-\lambda)\,\phi_{ij}(n-1,k) + \lambda\,X_i(n,k)\,X_j^*(n,k) \tag{5}$$
  • in Equation 5, the first term indicates the coherence of the stereo signals in the frame immediately before the current frame, i.e., the frame having the (n−1)-th time slot value and the k-th frequency band value. That is, Equation 5 takes the coherence of signals in past frames into account when the coherence of signals in the current frame is computed; the coherence between the current stereo signals is predicted, as a probability based on statistics, from the coherence between past stereo signals.
  • in Equation 5, the constants (1−λ) and λ multiply the two terms, respectively, and these constants grant constant weights to the past average value and the current value.
  • a large value of the constant (1−λ) applied to the first term indicates that the current signal is largely affected by the past.
  • the correlation coefficient acquirer 630 obtains Equation 3 by using Equation 4 or 5.
  • the correlation coefficient acquirer 630 calculates the first coefficient indicating coherence between two signals by using Equation 3.
  • the correlation coefficient acquirer 630 calculates the second coefficient indicating similarity between two signals.
  • the second coefficient indicates a degree of similarity between two signals, and the second coefficient in the time-frequency domain may be represented by Equation 6:
$$\psi(n,k) = \frac{2\left|\phi_{12}(n,k)\right|}{\phi_{11}(n,k) + \phi_{22}(n,k)} \tag{6}$$
  • here, n denotes a time value, that is, a time slot value, and k denotes a frequency band value.
  • the denominator of Equation 6 is a factor for normalizing the second coefficient.
  • the second coefficient has a real number value greater than or equal to 0 and less than or equal to 1.
  • in Equation 6, $\phi_{ij}(n,k)$ may be represented by Equation 7:
$$\phi_{ij}(n,k) = X_i(n,k)\,X_j^*(n,k) \tag{7}$$
  • here, $X_i$ and $X_j$ denote stereo signals represented by complex numbers in the time-frequency domain, and $X_j^*$ denotes the complex conjugate of $X_j$.
  • unlike Equations 4 and 5, in which past signal values are considered by using a probability-statistics function when the first coefficient is obtained, Equation 7 does not consider past signal values when $\phi_{ij}(n,k)$ is obtained. That is, the correlation coefficient acquirer 630 considers only the similarity between the two signals in the current frame.
  • the correlation coefficient acquirer 630 obtains Equation 6 by using Equation 7 and obtains the second coefficient by using Equation 6.
  • the correlation coefficient acquirer 630 obtains a correlation coefficient Δ by using the first coefficient and the second coefficient.
  • the correlation coefficient in the present invention is a value obtained by considering both similarity and coherence between two signals. Since both the first coefficient and the second coefficient are real numbers greater than or equal to 0 and less than or equal to 1, the correlation coefficient also has a real number value greater than or equal to 0 and less than or equal to 1.
  • the correlation coefficient acquirer 630 obtains a correlation coefficient and transmits the obtained correlation coefficient to the center signal acquirer 640.
  • the center signal acquirer 640 extracts a center signal from the stereo signals by using the correlation coefficient and the stereo signals.
  • the center signal acquirer 640 generates the center signal by obtaining an arithmetic average of the stereo signals and multiplying the arithmetic average by the correlation coefficient.
  • the center signal obtained by the center signal acquirer 640 may be represented by Equation 9:
$$C(n,k) = \Delta(n,k)\cdot\frac{X_1(n,k) + X_2(n,k)}{2} \tag{9}$$
  • here, Δ(n,k) denotes the correlation coefficient, and $X_1(n,k)$ and $X_2(n,k)$ denote the left signal and the right signal in the frame having a time value of n and a frequency value of k, respectively.
  • the center signal acquirer 640 transmits the center signal generated as in Equation 9 to the inverse domain converter 650.
  • the inverse domain converter 650 converts the center signal generated in the time-frequency domain into a center signal in the time domain by using an algorithm such as inverse FFT (IFFT).
  • the inverse domain converter 650 transmits the center signal converted into the time domain to the signal subtractors 660 and 661.
  • the signal subtractors 660 and 661 obtain the differences between the stereo signals and the center signal in the time domain.
  • the signal subtractors 660 and 661 generate an ambient left signal by subtracting the center signal from the left signal and an ambient right signal by subtracting the center signal from the right signal.
  • the correlation coefficient acquirer 630 obtains a first coefficient indicating coherence between a left signal and a right signal at a current time point in consideration of past coherence between the two signals and obtains a second coefficient indicating similarity between the left signal and the right signal at the current time point.
  • the correlation coefficient acquirer 630 generates a correlation coefficient by using both the first coefficient and the second coefficient and extracts a center signal from stereo signals by using the correlation coefficient.
  • since the correlation coefficient is obtained in the time-frequency domain instead of the time domain, it may be obtained more precisely, in consideration of both time and frequency rather than time only.
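  • The center extraction chain above (Equations 3 to 7 and 9) can be sketched as follows. The combination of the first and second coefficients into the correlation coefficient is not shown in this excerpt, so the sketch assumes a simple product; the frame handling is a bare block FFT rather than a proper overlapped STFT:

```python
import numpy as np

def extract_center(left: np.ndarray, right: np.ndarray, n_fft: int = 1024, lam: float = 0.1):
    """Center/ambient separation sketch following Equations 3-7 and 9.

    left, right: equal-length time-domain stereo signals.
    lam: smoothing constant (the lambda of Equation 5).
    Returns (center, ambient_left, ambient_right) time-domain signals.
    """
    frames = len(left) // n_fft
    center = np.zeros(len(left))
    phi11 = np.zeros(n_fft)
    phi22 = np.zeros(n_fft)
    phi12 = np.zeros(n_fft, dtype=complex)
    for n in range(frames):
        sl = slice(n * n_fft, (n + 1) * n_fft)
        X1, X2 = np.fft.fft(left[sl]), np.fft.fft(right[sl])
        # Equation 5: recursive smoothing over past frames.
        phi11 = (1 - lam) * phi11 + lam * np.abs(X1) ** 2
        phi22 = (1 - lam) * phi22 + lam * np.abs(X2) ** 2
        phi12 = (1 - lam) * phi12 + lam * X1 * np.conj(X2)
        # Equation 3: first coefficient (coherence, smoothed statistics).
        coh = np.abs(phi12) / (np.sqrt(phi11 * phi22) + 1e-12)
        # Equations 6-7: second coefficient (similarity, current frame only).
        sim = 2 * np.abs(X1 * np.conj(X2)) / (np.abs(X1) ** 2 + np.abs(X2) ** 2 + 1e-12)
        corr = coh * sim  # assumed combination of the two coefficients
        # Equation 9: correlation coefficient times the arithmetic mean.
        center[sl] = np.real(np.fft.ifft(corr * (X1 + X2) / 2))
    return center, left - center, right - center
```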
  • input channel signals may be bound on a two-channel basis and the center channel signal separation technique applied to them a plurality of times, or input channels may be down-mixed and the center channel separation technique then applied to the down-mixed input channels, to perform channel separation to a plurality of locations.
  • FIG. 7 is a block diagram of a configuration in which the virtual input channel signal generator and the channel separator are integrated, according to another embodiment of the present invention.
  • a sound image separator 700 includes domain converters 710 and 720, a correlation coefficient acquirer 730, a center signal acquirer 740, an inverse domain converter 750, signal subtractors 760 and 761, a panning index acquirer 770, a gain index acquirer 780, and an ambient signal separator 790.
  • here too, input channel signals may be bound on a two-channel basis and the center channel signal separation technique applied to them a plurality of times, or input channels may be down-mixed and the center channel separation technique then applied to the down-mixed input channels, to perform channel separation to a plurality of locations.
  • the process of acquiring a center signal from stereo signals L and R is the same as that in the embodiment disclosed in FIG. 6.
  • the panning index acquirer 770 acquires a panning index $\mathrm{Pan\_Index}_{ij}(n,k)$ for separating the two-channel ambient signal, which remains after the center signal is extracted, into a 2×N-channel ambient signal.
  • the panning index is determined by Equation 10:
$$\mathrm{Pan\_Index}_{ij}(n,k) = \frac{\phi_{ii}(n,k) - \phi_{jj}(n,k)}{\phi_{ii}(n,k) + \phi_{jj}(n,k)} \tag{10}$$
  • here, $\phi_{ij}(n,k)$ is determined by Equations 3 and 4, and $\mathrm{Pan\_Index}_{ij}(n,k)$ has a range between −1 and 1.
  • the gain index acquirer 780 acquires each gain index to be applied to the sound image at the l-th location by substituting the panning index into a predetermined gain table.
  • the ambient signal separator 790 acquires the ambient signal at the l-th location based on the frequency domain signals of the L and R ambient signals and the gain index.
  • the gain to be applied to the ambient signal and the acquired L and R ambient signals at the l-th location are determined by Equations 12 and 13, in which a forgetting factor having a value between 0 and 1 is used.
  • $X_{lL}(n,k)$ and $X_{lR}(n,k)$ denote the frequency domain L and R ambient signals at the l-th location, which have been sound-image-separated and finally acquired from the L and R ambient signals, respectively.
  • the 2×N ambient signals acquired in the manner described above are transmitted to the inverse domain converter 750, and the inverse domain converter 750 converts the center signal and the 2×N ambient signals into a center signal and 2×N ambient signals in the time domain by using an algorithm such as IFFT.
  • a time domain signal separated into 2×N+1 channels may thereby be acquired.
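  • A small sketch of the panning-index step is given below. The predetermined gain table of the gain index acquirer 780 (and Equations 12 and 13) is not reproduced in this excerpt, so location_gain is a hypothetical stand-in that simply emphasizes bins whose panning index lies near a target location:

```python
import numpy as np

def panning_index(phi_ii: np.ndarray, phi_jj: np.ndarray) -> np.ndarray:
    """Equation 10: per-band panning index in [-1, 1] from channel auto-powers."""
    return (phi_ii - phi_jj) / (phi_ii + phi_jj + 1e-12)

def location_gain(pan_index: np.ndarray, target: float, width: float = 0.5) -> np.ndarray:
    """Hypothetical gain table: a triangular window around the target location
    (e.g. target = -1 for left, 0 for middle, +1 for right)."""
    return np.clip(1.0 - np.abs(pan_index - target) / width, 0.0, 1.0)

# Applying each location's gain to the L and R ambient spectra and inverse
# transforming per location yields the 2 x N ambient channels (here N = 3):
# for target in (-1.0, 0.0, 1.0): X_lL = location_gain(pi, target) * X_ambL; ...
```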
  • FIG. 8 shows a flowchart of a method of generating audio and a flowchart of a method of reproducing audio, according to an embodiment of the present invention.
  • the embodiment disclosed in FIG. 8 assumes that the above-described process of generating a virtual channel and channel-separating a sound image is performed by an audio reproduction apparatus.
  • FIG. 8A is a flowchart of a method of generating audio, according to an embodiment of the present invention.
  • the audio generation apparatus 100 receives input audio signals from N microphones in operation 810a and generates N input channel signals corresponding to the signals received from the respective microphones in operation 820a.
  • the audio generation apparatus 100 transmits the generated N channel audio signals and the information about the N channel audio signals to the audio reproduction apparatus 300 in operation 830a.
  • the audio signals and the information about the audio signals are encoded to a bitstream based on an appropriate codec and transmitted, and the information about the audio signals may be configured as metadata defined by the codec and encoded to a bitstream.
  • the audio signal may include an object audio signal.
  • the information about the N channel audio signals may include information about a location at which each channel signal is to be reproduced, and in this case, the information about a location at which each channel signal is to be reproduced may vary along time.
  • for example, in the case of a birdsong, the location at which the birdsong is to be reproduced varies along the path through which the bird moves, and thus the location at which the channel signal is to be reproduced varies along time.
  • FIG. 8B is a flowchart of a method of reproducing audio, according to an embodiment of the present invention.
  • the audio reproduction apparatus 300 receives a bitstream in which the N channel audio signals and the information about the N channel audio signals are encoded, in operation 840b, and decodes the corresponding bitstream by using the codec used in the encoding.
  • the audio reproduction apparatus 300 generates M virtual channel signals based on the decoded N channel audio signals and an object audio signal in operation 850b.
  • M is an integer greater than N
  • the M virtual channel signals may be generated by weighted-summing the N channel signals.
  • weights to be applied to the weighted sum are determined based on a layout of input channels and a reproduction layout.
  • the reproduction apparatus 300 performs channel separation to reduce coherence between signals in operation 860b.
  • the reproduction apparatus 300 performs rendering by using a signal in which a sound image has been channel-separated, in operation 870b.
  • Audio rendering is a process of converting an input audio signal into an output audio signal such that the input audio signal can be reproduced according to an output system, and includes an up-mixing or down-mixing process if the number of input channels differs from the number of output channels. A rendering method is described below with reference to FIG. 12 and others.
  • FIG. 9 shows a flowchart of a method of generating audio and a flowchart of a method of reproducing audio, according to another embodiment of the present invention.
  • the embodiment disclosed in FIG. 9 assumes that the above-described process of generating a virtual channel and channel-separating a sound image is performed by an audio generation apparatus.
  • FIG. 9A is a flowchart of a method of generating audio, according to another embodiment of the present invention.
  • the audio generation apparatus 100 receives input audio signals from N microphones in operation 910a and generates N input channel signals corresponding to the signals received from the respective microphones in operation 920a.
  • the audio generation apparatus 100 generates M virtual channel audio signals based on the N channel audio signals and an object audio signal in operation 930a.
  • M is an integer greater than N
  • the M virtual channel audio signals may be generated by weighted-summing the N channel audio signals.
  • weights to be applied to the weighted sum are determined based on a layout of input channels and a reproduction layout.
  • the generation apparatus 100 performs channel separation to reduce coherence between signals in operation 940a.
  • the audio generation apparatus 100 transmits the generated M channel audio signals and the information about the M channel audio signals to the audio reproduction apparatus 300 in operation 950a.
  • the audio signals and the information about the audio signals are encoded to a bitstream based on an appropriate codec and transmitted, and the information about the audio signals may be configured as metadata defined by the codec and encoded to a bitstream.
  • the audio signal may include an object audio signal.
  • the information about the M channel audio signals may include information about a location at which each channel signal is to be reproduced, and in this case, the information about a location at which each channel signal is to be reproduced may vary along time.
  • for example, in the case of a birdsong, the location at which the birdsong is to be reproduced varies along the path through which the bird moves, and thus the location at which the channel signal is to be reproduced varies along time.
  • FIG. 9B is a flowchart of a method of reproducing audio, according to another embodiment of the present invention.
  • the audio reproduction apparatus 300 receives a bitstream in which the M channel audio signals and the information about the M channel audio signals are encoded, in operation 960b, and decodes the corresponding bitstream by using the codec used in the encoding.
  • the reproduction apparatus 300 performs rendering by using the decoded M channel signals in operation 970b.
  • Audio rendering is a process of converting an input audio signal into an output audio signal such that the input audio signal can be reproduced according to an output system, and includes an up-mixing or down-mixing process if the number of input channels differs from the number of output channels. A rendering method is described below with reference to FIG. 12 and others.
  • FIG. 10 shows a flowchart of a method of generating audio and a flowchart of a method of reproducing audio, according to another embodiment of the present invention.
  • the embodiment disclosed in FIG. 10 assumes that a process of generating a virtual channel is performed by an audio generation apparatus and a process of channel-separating a sound image is performed by an audio reproduction apparatus.
  • FIG. 10A is a flowchart of a method of generating audio, according to another embodiment of the present invention.
  • the audio generation apparatus 100 receives input audio signals from N microphones in operation 1010a and generates N input channel signals corresponding to the signals received from the respective microphones in operation 1020a.
  • the audio generation apparatus 100 generates M virtual channel signals based on the N channel audio signals and an object signal in operation 1030a.
  • M is an integer greater than N, and the M virtual channel signals may be generated by weighted-summing the N channel audio signals.
  • weights to be applied to the weighted sum are determined based on a layout of input channels and a reproduction layout.
  • the audio generation apparatus 100 transmits the generated M channel audio signals and the information about the M channel audio signals to the audio reproduction apparatus 300 in operation 1040a.
  • the audio signals and the information about the audio signals are encoded to a bitstream based on an appropriate codec and transmitted, and the information about the audio signals may be configured as metadata defined by the codec and encoded to a bitstream.
  • the audio signal may include an object audio signal.
  • the information about the M channel audio signals may include information about a location at which each channel signal is to be reproduced, and in this case, the information about a location at which each channel signal is to be reproduced may vary along time.
  • for example, in the case of a birdsong, the location at which the birdsong is to be reproduced varies along the path through which the bird moves, and thus the location at which the channel signal is to be reproduced varies along time.
  • FIG. 10B is a flowchart of a method of reproducing audio, according to another embodiment of the present invention.
  • the audio reproduction apparatus 300 receives a bitstream in which the M channel audio signals and the information about the M channel audio signals are encoded, in operation 1050b, and decodes the corresponding bitstream by using the codec used in the encoding.
  • the audio reproduction apparatus 300 performs channel separation to reduce coherence between signals in operation 1060b.
  • the reproduction apparatus 300 performs rendering by using a signal in which a sound image has been channel-separated, in operation 1070b.
  • Audio rendering is a process of converting an input audio signal into an output audio signal such that the input audio signal can be reproduced according to an output system, and includes an up-mixing or down-mixing process if the number of input channels differs from the number of output channels. A rendering method is described below with reference to FIG. 13 and others.
  • FIG. 11 illustrates an audio reproduction system capable of reproducing an audio signal in a range of 360° horizontally.
  • 3D content may include all information about a 3D space.
  • the range in which a user can recognize a sense of space in the vertical direction is limited, but the user can recognize a sense of space in the horizontal direction over the entire 360° range with the same sensitivity.
  • 3D content reproduction systems have an environment in which a 3D image and audio content produced in a range of 360° horizontally can be reproduced.
  • FIG. 11A illustrates a head mounted display (HMD).
  • the HMD indicates a display device worn on the head.
  • the HMD is usually used to implement virtual reality (VR) or augmented reality (AR).
  • VR is a technology of artificially generating a specific environment or situation such that a user interacts with an actual surrounding situation and environment.
  • AR is a technology of overlaying a virtual object on the reality recognized by a user with the naked eye, such that the user views the virtual object and the reality together. Since AR mixes a virtual world having additional information with the real world in real time such that a user views a single image, AR is also called mixed reality (MR).
  • the HMD has a display located close to the eyes of the user, and thus when an image is displayed by using the HMD, the user may feel a relatively high sense of immersion.
  • a large screen may be implemented with a small-sized device, and 3D or 4D content may be reproduced.
  • an image signal is reproduced through the HMD worn on the head, and an audio signal may be reproduced through headphones built into the HMD or through separate headphones.
  • the image signal is reproduced through the HMD, and the audio signal may be reproduced through a general audio reproduction system.
  • the HMD may be configured in an integrated type including a controller and a display therein or configured with a separate mobile terminal such as a smartphone such that the mobile terminal operates as a display, a controller, and the like.
  • FIG. 11B illustrates a home theater system (HTS).
  • the HTS is a system for implementing high-quality images and high-quality audio at home such that a user can enjoy movies with a sense of reality. Since the HTS includes an image display for implementing a large screen and a surround audio system for high sound quality, the HTS corresponds to the most common multi-channel audio output system installed at home.
  • a typical configuration is 5.1 channels or 5.0 channels, including a center channel, a left channel, a right channel, a surround left channel, and a surround right channel, and additionally including a woofer channel according to circumstances.
  • a technique of controlling a reproduction distance and direction may be applied.
  • when a content reproduction distance is short, content of a relatively narrow region is displayed at a wide angle, and when the content reproduction distance is long, content of a relatively wide region is displayed.
  • when a content reproduction direction is changed, content of a region corresponding to the changed direction may be displayed.
  • an audio signal can be controlled according to the reproduction distance and direction of the image content to be displayed: when the content reproduction distance is shorter than before, the volume (gain) of the audio content is increased, and when the content reproduction distance is longer than before, the volume (gain) of the audio content is decreased.
  • audio may be rendered based on the changed direction to reproduce audio content corresponding to a changed reproduction angle.
  • the content reproduction distance and reproduction direction may be determined based on a user input or determined based on a motion of a user, particularly, movement and rotation of a head.
  • FIG. 12 illustrates a schematic configuration of a 3D audio renderer 1200 in a 3D audio reproduction apparatus, according to an embodiment of the present invention.
  • stereophonic audio rendering includes filtering and panning operations.
  • the panning operation includes calculating and applying a panning coefficient to be applied for each frequency band and each channel in order to pan an input audio signal with respect to each output channel.
  • the panning on an audio signal indicates controlling a magnitude of a signal to be applied to each output channel in order to render a sound source to a specific location between two output channels.
  • the filtering includes correcting the tone and the like of a decoded audio signal according to a location and filtering an input audio signal by using an HRTF filter or a BRTF filter.
  • the 3D audio renderer 1200 receives an input audio signal 1210 including at least one of a channel audio signal and an object audio signal and transmits an output audio signal 1230 including at least one of a rendered channel audio signal and object audio signal to an output unit.
  • additional information may be additionally received as an input, and the additional information may include per-time reproduction location information of the input audio signal, language information of each object, or the like.
  • a head position, a rotation angle of the head, and the like based on the head motion of the user may be additionally included in the additional information.
  • per-time reproduction location information of a corrected input audio signal, to which the head position, the rotation angle of the head, and the like based on the head motion of the user have been applied, may be additionally included in the additional information.
  • FIG. 13 is a block diagram for describing a rendering method for sound externalization with a low computation amount, according to an embodiment of the present invention.
  • an echo component is simulated through signal processing by using the BRTF, which is an extended concept of the HRTF.
  • the BRIR used for the sound externalization simulates an echo in the form of a finite impulse response (FIR) filter, and thus a large number of filter taps are generally used.
  • a long-tap BRIR filter coefficient corresponding to the left ear/right ear is used for each input channel. Therefore, for real-time sound externalization, filter coefficients corresponding to "number of channels × binaural room filter coefficient × 2" are needed, and in this case, the computation amount is generally proportional to the number of channels and to the binaural room filter coefficient.
  • An input of the renderer 1300 may be at least one of a decoded object audio signal and channel audio signal, and an output may be at least one of a rendered object audio signal and channel audio signal.
  • the renderer 1300 includes a domain converter 1310, an HRTF selector 1320, transfer function application units 1330 and 1340, and inverse domain converters 1350 and 1360.
  • The embodiment of the present invention disclosed in FIG. 13 assumes that an object audio signal is rendered by applying a low-computation-amount BRTF.
  • The domain converter 1310 performs an operation similar to that of the domain converters of FIGS. 6 and 7 and converts the domain of an input first object signal.
  • The domain converter 1310 converts the input signal into a signal in the time-frequency domain by using an algorithm such as the fast Fourier transform (FFT).
  • The time-frequency domain is used to represent changes in both time and frequency.
  • A signal may be divided into a plurality of frames according to time and frequency values, and the signal in each frame may be represented by a frequency sub-band value in each time slot; an STFT sketch follows.
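This framing can be sketched as a short-time Fourier transform. The frame length, hop size, and Hann window below are illustrative choices, not values taken from the embodiment:

```python
import numpy as np

def stft_frames(x: np.ndarray, frame_len: int = 1024, hop: int = 512):
    # One FFT per time slot: rows are time slots, columns are frequency
    # sub-bands, giving a time-frequency representation of the signal.
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

x = np.random.randn(48000)   # 1 s at 48 kHz
X = stft_frames(x)
print(X.shape)               # (92, 513): 92 time slots, 513 sub-bands
```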
  • The HRTF selector 1320 transmits a real-time HRTF, selected from an HRTF database based on the head motion of the user received through the additional information, to the transfer function application units 1330 and 1340.
  • An HRTF of a direction corresponding to the head motion and location of the user at a specific time point, i.e., a "real-time HRTF," is selected.
  • Table 1 illustrates an HRTF index table according to real-time head motions.
  • A location at which a sound image is to be rendered may be compensated for the head motion of the user and then externalized.
  • Head motion location information of the user may be received as additional information, or both the head motion location information of the user and a location at which a sound image is to be rendered may be received as additional information.
  • Table 1 shows the HRTF corrected for a rotation of the user's head when sound externalization rendering is to be performed such that a sound image is reproduced at a location having a horizontal left azimuth angle of 90° and an elevation angle of 0°.
  • Since HRTFs corresponding to the input additional information are stored in advance as an indexed table, real-time head motion correction is possible; an indexed-lookup sketch follows.
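The sketch below shows such an indexed lookup. Since Table 1 is not reproduced here, the 5° yaw grid and the random filter data are hypothetical stand-ins; only the indexing idea, selecting the HRTF whose direction compensates the measured head rotation, is taken from the text:

```python
import numpy as np

YAW_GRID = np.arange(-180, 181, 5)   # hypothetical 5-degree index grid
HRTF_DB = {int(yaw): np.random.randn(2, 512) for yaw in YAW_GRID}  # fake data

def select_hrtf(target_azimuth_deg: float, head_yaw_deg: float) -> np.ndarray:
    # If the head turns right, render the source further left by the same
    # amount so that the externalized image stays fixed in the room.
    relative = target_azimuth_deg - head_yaw_deg
    nearest = int(YAW_GRID[np.argmin(np.abs(YAW_GRID - relative))])
    return HRTF_DB[nearest]   # (2, 512): left-ear and right-ear filters

hrtf_lr = select_hrtf(target_azimuth_deg=-90.0, head_yaw_deg=10.0)
print(hrtf_lr.shape)          # (2, 512)
```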
  • An HRTF further adjusted for tone correction may be used, according to circumstances, for stereophonic audio rendering.
  • The HRTF database may store, in advance, values obtained by domain-converting the HRIR for each reproduction location into the frequency domain, or the database may be modeled and acquired by a method such as principal component analysis (PCA) or pole-zero modeling in order to reduce the data size; a PCA sketch follows.
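A minimal PCA sketch of the database-size reduction mentioned above: stack the responses, keep the top components, and store only per-direction weights plus one shared basis. The synthetic data, the direction count, and the number of retained components are assumptions; the SVD is used here as the standard way of computing principal components:

```python
import numpy as np

rng = np.random.default_rng(0)
hrtfs = rng.standard_normal((360, 256))   # synthetic: 360 directions x 256 bins

mean = hrtfs.mean(axis=0)
u, s, vt = np.linalg.svd(hrtfs - mean, full_matrices=False)
k = 16                                    # number of components kept (assumed)
weights = u[:, :k] * s[:k]                # (360, k) stored per direction
basis = vt[:k]                            # (k, 256) stored once

approx = mean + weights @ basis           # reconstruct a response on demand
err = np.linalg.norm(hrtfs - approx) / np.linalg.norm(hrtfs)
print(f"relative reconstruction error with {k} of 256 components: {err:.3f}")
```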
  • The transfer function application units 1330 and 1340 apply a transfer function to the audio signal received from the domain converter 1310 and include HRTF application units 1331 and 1341 and BRTF application units 1332 and 1342, respectively.
  • The HRTF application unit 1331 of the transfer function application unit 1330 applies the real-time HRTF of the left output channel, which has been transmitted from the HRTF selector 1320, to the audio signal received from the domain converter 1310.
  • The BRTF application unit 1332 of the transfer function application unit 1330 applies a BRTF of the left output channel.
  • The BRTF is used as a fixed value rather than a value that varies in real time. Since the BRTF, which corresponds to an echo component, captures a characteristic of the space, the echo length and the number of filter taps affect rendering performance more than changes over time do.
  • The real-time HRTF of the left output channel, which is applied by the HRTF application unit 1331, corresponds to a value (early HRTF) obtained by domain-converting, into the frequency domain, the time response before a predetermined reference time (early HRIR) of the original impulse response.
  • The BRTF of the left output channel, which is applied by the BRTF application unit 1332, corresponds to a value (late BRTF) obtained by domain-converting, into the frequency domain, the time response after the predetermined reference time (late BRIR) of the original impulse response.
  • In other words, the transfer function applied by the transfer function application unit 1330 is obtained by domain-converting, into the frequency domain, an impulse response in which the HRIR is used before the predetermined reference time and the BRIR is used after it.
  • The audio signal to which the real-time HRTF has been applied by the HRTF application unit 1331 and the audio signal to which the BRTF has been applied by the BRTF application unit 1332 are added by a signal adder 1333 and transmitted to the inverse domain converter 1350.
  • The inverse domain converter 1350 generates a left channel output signal by converting the frequency-domain signal back into a signal in the time domain; a time-domain sketch of this early/late split follows.
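The early/late split described in the preceding paragraphs can be sketched in the time domain as follows (the frequency-domain version of FIG. 13 multiplies the corresponding spectra instead). All responses and lengths here are synthetic placeholders; the point is that only the early, direction-dependent part must be updated as the head moves:

```python
import numpy as np

def render_one_ear(x, hrir, brir, split):
    # Early part (taps before `split`): real-time HRIR, swapped per head pose.
    early = np.zeros(len(brir))
    early[:split] = hrir[:split]
    # Late part (taps from `split` on): fixed room echo, never updated.
    late = np.zeros(len(brir))
    late[split:] = brir[split:]
    # Equivalent to convolving with (early + late); kept separate to show
    # which half a head turn actually invalidates.
    return np.convolve(x, early) + np.convolve(x, late)

x = np.random.randn(4800)
hrir = np.random.randn(256)    # synthetic early response
brir = np.random.randn(4096)   # synthetic long room response
y_left = render_one_ear(x, hrir, brir, split=256)
print(y_left.shape)            # (4800 + 4096 - 1,)
```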
  • FIG. 14 illustrates formulae representing a specific operation of a transfer function application unit according to an embodiment of the present invention.
  • An impulse response obtained by combining an HRIR and a BRIR corresponds to a long-tap filter. Using block convolution, in which the long-tap filter coefficients are divided into a plurality of blocks before convolution, a sound externalization scheme that reflects location changes over time by updating only the real-time HRTF data before the predetermined reference time can be performed, as shown in FIG. 14.
  • Block convolution is a method for efficiently convolving a signal having a long sequence and corresponds to, for example, the overlap-add (OLA) method.
  • FIG. 14 illustrates a detailed operation method of BRIR-HRIR rendering for low-computation-amount sound externalization in a transfer function application unit 1400, according to an embodiment of the present invention.
  • The first column 1411 (F(1), F(2), ..., F(N)) of the filter coefficient 1410 corresponds to filter coefficients to which the real-time HRTF has been applied, and the second column 1412 (F(N+1), F(N+2), ..., F(2N)) and subsequent columns correspond to filter coefficients to which the BRTF for rendering an echo has been applied.
  • The first column 1421 (X(1), X(2), ..., X(N)) of the input signal 1420 corresponds to the frequency input samples at the current time, and the second column 1422 (X(N+1), X(N+2), ..., X(2N)) and subsequent columns correspond to data input before the current time.
  • The filter coefficient 1410 and the input 1420 configured as described above are multiplied column by column (1430). That is, the first column 1411 of the filter coefficient is multiplied by the first column 1421 of the input (1431; F(1)X(1), F(2)X(2), ..., F(N)X(N)), and the second column 1412 of the filter coefficient is multiplied by the second column 1422 of the input (1432; F(N+1)X(N+1), F(N+2)X(N+2), ..., F(2N)X(2N)).
  • The products in each row are then summed to generate the N output samples 1440 in the frequency domain. That is, the n-th sample value of the N output samples is Σ_i F(iN + n)·X(iN + n); a runnable block-convolution sketch follows.
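A runnable sketch of this column-wise multiply-and-add, implemented as uniformly partitioned block convolution with a frequency-domain delay line. The block size B and the test lengths are arbitrary; in the scheme of FIG. 14 only the first filter block (the real-time HRTF) would be updated per head pose:

```python
import numpy as np

def partitioned_convolve(x, h, B=64):
    # Split the long filter h into P blocks of B taps, FFT each block
    # (columns F of FIG. 14), and keep a history of input spectra
    # (columns X). Per output block: multiply per bin, sum over columns.
    P = -(-len(h) // B)   # ceiling division
    H = np.stack([np.fft.rfft(h[p * B:(p + 1) * B], n=2 * B) for p in range(P)])
    fdl = np.zeros((P, B + 1), dtype=complex)   # frequency-domain delay line
    x = np.pad(x, (0, (-len(x)) % B))
    y = np.zeros(len(x))
    buf = np.zeros(2 * B)
    for i in range(0, len(x), B):
        buf = np.concatenate([buf[B:], x[i:i + B]])   # slide in one new block
        fdl = np.roll(fdl, 1, axis=0)
        fdl[0] = np.fft.rfft(buf)
        Y = (H * fdl).sum(axis=0)                     # sum_i F(iN+n)X(iN+n)
        y[i:i + B] = np.fft.irfft(Y)[B:]              # overlap-save: keep tail
    return y

x, h = np.random.randn(1000), np.random.randn(300)
print(np.allclose(np.convolve(x, h)[:1000], partitioned_convolve(x, h)[:1000]))
```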
  • FIG. 15 is a block diagram of a device 1500 for rendering a plurality of channel inputs and a plurality of object inputs, according to an embodiment of the present invention.
  • In FIG. 13, a case in which one object input is rendered was assumed. If N channel audio signals and M object audio signals are input, FIG. 13 can be extended to FIG. 15. However, since the processing for the left output channel in FIG. 15 is the same as that for the right output channel, the description is given only for the rendering device of the left output channel.
  • Each input signal is converted into a signal in the time-frequency domain by using an algorithm such as the FFT.
  • The time-frequency domain is used to represent changes in both time and frequency.
  • A signal may be divided into a plurality of frames according to time and frequency values, and the signal in each frame may be represented by a frequency sub-band value in each time slot.
  • An HRTF is selected based on the input additional information: for a channel audio signal, the HRTF may be selected based on the head motion and location of the user, and for an object audio signal, the reproduction location of the object audio signal may be considered in addition to the head motion and location of the user.
  • A transfer function application unit 1530 applies a corresponding transfer function to each of the (N + M) domain-converted input signals.
  • A distinct HRTF (early HRTF) may be applied to each input before the predetermined reference time, and the same BRTF (late BRTF) may be applied to all inputs after the predetermined reference time.
  • The (N + M) input signals to which the respective transfer functions have been applied by the transfer function application unit 1530 are added by a signal adder and transmitted to an inverse domain converter 1550.
  • The inverse domain converter 1550 generates a left channel output signal by converting the frequency-domain signal back into a signal in the time domain; a sketch of this (N + M)-input structure follows.
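A frequency-domain sketch of this (N + M)-input structure: every source gets its own early HRTF spectrum, one late BRTF spectrum is shared by all sources, and the sum is inverse-transformed once per output channel. The spectra here are single FFT frames of random data, so the per-bin multiplication is circular; a real implementation would use the block convolution sketched earlier:

```python
import numpy as np

def render_left(source_specs, early_hrtfs, late_brtf):
    # Per source: early (direction-specific) plus late (shared room) part,
    # accumulated over all N + M inputs before one inverse FFT.
    acc = np.zeros_like(late_brtf)
    for spec, hrtf in zip(source_specs, early_hrtfs):
        acc += spec * (hrtf + late_brtf)
    return np.fft.irfft(acc)

n_sources = 7   # e.g. N = 5 channel signals and M = 2 object signals
specs = [np.fft.rfft(np.random.randn(1024)) for _ in range(n_sources)]
hrtfs = [np.fft.rfft(np.random.randn(1024)) for _ in range(n_sources)]
brtf = np.fft.rfft(np.random.randn(1024))
print(render_left(specs, hrtfs, brtf).shape)   # (1024,)
```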
  • FIG. 16 is a block diagram of a configuration in which a channel separator and a renderer are integrated, according to an embodiment of the present invention.
  • FIG. 17 is a block diagram of a configuration in which a channel separator and a renderer are integrated, according to another embodiment of the present invention.
  • A panning gain is determined based on the layouts of the input channels and the output channels.
  • Tone correction filtering may additionally be performed by using an HRTF (not shown).
  • An up-mixer or a down-mixer may additionally be included.
  • FIG. 18 is a block diagram of a renderer including a layout converter, according to an embodiment of the present invention.
  • The renderer according to the embodiment disclosed in FIG. 18 includes, besides an input-output signal converter 1810 for converting an input channel signal into an output channel signal, a layout converter 1830.
  • The layout converter 1830 receives output speaker layout information, such as the installation locations of the L output speakers, and head position information of the user.
  • The layout converter 1830 converts the layout of the output speakers based on the head position information of the user.
  • The input-output signal converter 1810 receives the converted output channel layout information from the layout converter and converts (renders) the input signals to output signals based on the received output channel layout information.
  • The input-output signal conversion includes a down-mixing process.
  • FIG. 19 illustrates a change in an output channel layout based on user head position information, according to an embodiment of the present invention.
  • In FIG. 19, it is assumed, following the embodiment disclosed in FIG. 18, that the number M of input channels is 5, the number L of output channels is 2, the installation locations of the two output speakers are left and right 15°, i.e., +15° and -15°, and the user turns the head by 10° to the right, i.e., +10°.
  • FIG. 19A illustrates the input and output channel locations before the head position information of the user is reflected.
  • The number M of input channels is 5, and the input channels include a center channel (0°), a right channel (+30°), a left channel (-30°), a surround right channel (+110°), and a surround left channel (-110°).
  • The number L of output channels is 2, and the output speakers are located at left and right 15°, i.e., +15° and -15°.
  • FIG. 19B illustrates the input and output channel locations after the locations of the output channels are changed by reflecting the head position information of the user.
  • The locations of the input channels are not changed, and the changed locations of the output channels are +25° and -5°.
  • The left and right output channel signals are determined by Equation 13:

    y_L = a · x_(-30) + (1 - a) · x_0
    y_R = b · x_0 + (1 - b) · x_(+30)     (Equation 13)

    where a and b are scaling constants determined based on the distance between an input channel and an output channel, or on their azimuth angle difference; a worked numeric example follows.
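The FIG. 19 example can be worked through numerically as below. The layout shift follows the figure (+15° and -15° becoming +25° and -5° for a +10° head turn), while the linear interpolation over azimuth used for the constants a and b is an assumption: the text only says they depend on the distance or azimuth angle difference between input and output channels:

```python
import numpy as np

def eq13_weight(out_deg, left_in_deg, right_in_deg):
    # Assumed linear-by-azimuth choice for the Equation 13 constant:
    # the weight on the left input grows as the output nears that input.
    return (right_in_deg - out_deg) / (right_in_deg - left_in_deg)

head_yaw = 10.0
y_l_deg, y_r_deg = -15.0 + head_yaw, 15.0 + head_yaw   # -5 and +25, as in FIG. 19

a = eq13_weight(y_l_deg, -30.0, 0.0)   # y_L = a*x_-30 + (1 - a)*x_0
b = eq13_weight(y_r_deg, 0.0, 30.0)    # y_R = b*x_0 + (1 - b)*x_+30

x_m30, x_0, x_p30 = (np.random.randn(1024) for _ in range(3))
y_L = a * x_m30 + (1 - a) * x_0
y_R = b * x_0 + (1 - b) * x_p30
print(round(a, 3), round(b, 3))        # 0.167 0.167
```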
  • FIGS. 20 and 21 illustrate a method of compensating for a delay of a capturing device or a device for tracking the head of a user, according to an embodiment of the present invention.
  • FIG. 20 illustrates a method of compensating for a user head tracking delay.
  • The user head tracking delay is determined by the head motion of the user and the delay of the head tracking sensor.
  • Due to its delay, the head tracking sensor may sense the direction labeled 2 in the figure as the head direction of the user.
  • A head angular velocity is calculated according to the head moving speed of the user, and the compensation angle θ is obtained, or the location is corrected to the direction labeled 1, by multiplying the calculated head angular velocity by the delay dt of the head tracking sensor.
  • An interpolation angle or location may be determined based on the compensated angle or location, and an audio signal may be rendered based on the interpolation angle or location. This is expressed in terms of the compensation angle as Equation 14:

    Compensation angle θ = head angular velocity × head tracking sensor delay dt     (Equation 14)

  • In this way, an angle or location mismatch which may occur due to the sensor delay can be compensated for.
  • When a velocity is needed, a velocity sensor may be used; when an accelerometer is used instead, the velocity may be obtained by integrating the acceleration over time.
  • The angle may include head movement angles (roll, pitch, and yaw) with respect to a virtual speaker location set by the user, or with respect to the 3D axes. A one-line sketch of Equation 14 follows.
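Equation 14 reduces to a one-line prediction of the true head angle from the delayed sensor reading; the rate and delay below are assumed example values:

```python
def compensated_yaw(measured_deg, yaw_rate_deg_s, sensor_delay_s):
    # Equation 14: compensation angle = head angular velocity x sensor
    # delay dt, added to the (stale) measured head angle.
    return measured_deg + yaw_rate_deg_s * sensor_delay_s

# A head turning at 120 deg/s read through a 40 ms tracker lags by 4.8 deg.
print(compensated_yaw(measured_deg=30.0, yaw_rate_deg_s=120.0,
                      sensor_delay_s=0.040))   # 34.8
```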
  • FIG. 21 illustrates a method of compensating for the delays of a capturing device and a user head tracking device when an audio signal captured by a device attached to a moving object is rendered.
  • Real-time location information (location, angle, velocity, angular velocity, and the like) of the capturing device may be configured as metadata and transmitted to a rendering device together with the captured audio signal.
  • The capturing device may receive location information commanded from a separate device equipped with a controller, such as a joystick or a smartphone remote control, and change its location by reflecting the received location information.
  • Metadata of the capturing device may include location information of the separate device.
  • A delay may occur in each of the plurality of devices and sensors.
  • The delay may include the time delay from a command of the controller to the response of a sensor of the capturing device, and the delay of the head tracking sensor.
  • Compensation can be performed by a method similar to that of the embodiment disclosed in FIG. 20:

    Compensation angle θ = capturing device velocity × capturing sensor delay dt_c − head angular velocity × head tracking sensor delay dt_h
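A sketch of the combined compensation under the reconstruction above; the sign between the two terms is an assumption, chosen so that capture-device motion and listener head motion offset each other, and the numbers are examples:

```python
def combined_compensation_deg(capture_rate_deg_s, dt_c_s,
                              head_rate_deg_s, dt_h_s):
    # Capture-device motion accumulated during its sensor delay, minus the
    # listener's head motion during the head-tracker delay (assumed signs).
    return capture_rate_deg_s * dt_c_s - head_rate_deg_s * dt_h_s

print(combined_compensation_deg(90.0, 0.050, 120.0, 0.040))   # ~ -0.3 deg
```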
  • The length of the filter used in the above-described head-motion-linked rendering method affects the delay of the final output signal.
  • If the rendering filter is too long, the sound image of the output audio signal cannot follow the head moving speed; the sound image may then fail to stay pinpointed as the head moves and become blurred, or the location information of the picture and of the sound image may not match, decreasing the sense of reality.
  • To address this, the length of the entire filter may be adjusted, or, when a long-tap filter is used, the length N of the individual blocks used for block convolution may be adjusted.
  • The filter length for sound image rendering should be chosen such that the location of the sound image can be maintained even when the head motion changes after rendering; the maximum delay should therefore be designed in consideration of the head moving direction and speed of the user.
  • The designed maximum delay should be determined so as not to exceed the total input-output delay of the audio signal.
  • The delay to be applied to the sound image rendering filter is determined by Equations 15 through 17:

    Total input-output delay of audio signal = delay after applying the sound image rendering filter + head position estimation delay of the head tracking device + other algorithm delays     (Equation 15)

  • For example, the length of the sound image rendering filter should be determined such that the delay after applying it does not exceed 50 ms; a budget sketch follows.
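Rearranging Equation 15 gives the delay budget left for the rendering filter; the 50 ms total matches the figure cited in the text, and the other values are assumed examples:

```python
def filter_delay_budget_ms(total_ms, tracker_ms, other_ms):
    # Equation 15 rearranged: whatever remains of the end-to-end delay
    # budget after head-position estimation and other algorithm delays.
    return total_ms - tracker_ms - other_ms

print(filter_delay_budget_ms(total_ms=50.0, tracker_ms=15.0, other_ms=10.0))  # 25.0
```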
  • The above-described embodiments according to the present invention may be implemented as computer instructions which may be executed by various computer components, and recorded on a non-transitory computer-readable recording medium.
  • The non-transitory computer-readable recording medium may include program commands, data files, data structures, or a combination thereof.
  • The program commands recorded on the non-transitory computer-readable recording medium may be specially designed and constructed for the present invention, or may be known to and usable by those of ordinary skill in the field of computer software.
  • Examples of the non-transitory computer-readable medium include magnetic media such as hard discs, floppy discs, and magnetic tapes; optical media such as compact disc read-only memories (CD-ROMs) and digital versatile discs (DVDs); magneto-optical media such as floptical discs; and hardware devices specially configured to store and execute program commands (e.g., ROMs, RAMs, and flash memories).
  • Program commands include high-level language code that may be executed by a computer using an interpreter, as well as machine language code produced by a compiler.
  • The hardware devices may be configured to operate as one or more software modules for performing processing according to the present invention, and vice versa.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Claims (8)

  1. Audio reproduction method, comprising:
    receiving a multichannel audio signal and additional information including a reproduction location of the multichannel audio signal;
    acquiring location information of a user;
    channel-separating the received multichannel audio signal based on the received additional information;
    rendering the channel-separated multichannel audio signal based on the received additional information and the acquired location information of the user; and
    reproducing the rendered multichannel audio signal,
    wherein the channel separating comprises separating channels based on the coherence between channel signals contained in the multichannel audio signal and on the additional information.
  2. The method of claim 1, further comprising generating a virtual input channel signal based on the received multichannel audio signal.
  3. The method of claim 1, wherein the receiving further comprises receiving an object audio signal.
  4. The method of claim 1, wherein the rendering of the multichannel audio signal comprises:
    rendering the multichannel audio signal based on a head-related impulse response (HRIR) with respect to a time before a predetermined reference time; and
    rendering the multichannel audio signal based on a binaural room impulse response (BRIR) with respect to a time after the predetermined reference time.
  5. The method of claim 1, wherein the location information of the user is determined based on a user input or a measured position of the user.
  6. Audio reproduction apparatus, comprising:
    a receiver configured to receive a multichannel audio signal and additional information including a reproduction location of the multichannel audio signal;
    a location information acquirer configured to acquire location information of a user;
    a channel separator configured to channel-separate the received multichannel audio signal based on the received additional information;
    a renderer configured to render the channel-separated multichannel audio signal based on the received additional information and the acquired location information of the user; and
    a reproducing device configured to reproduce the rendered multichannel audio signal,
    wherein the channel separating comprises separating channels based on the coherence between channel signals contained in the multichannel audio signal and on the additional information.
  7. The audio reproduction apparatus of claim 6, further comprising:
    a virtual input channel signal generator configured to generate a virtual input channel signal based on the received multichannel audio signal,
    wherein the channel separator is configured to separate channels based on the coherence between channel signals contained in the multichannel audio signal and on the additional information.
  8. Non-transitory computer-readable recording medium on which a computer program for executing the method of claim 1 is recorded.
EP15832603.3A 2014-08-13 2015-08-13 Verfahren und vorrichtung zur erzeugung und wiedergabe von audiosignalen Active EP3197182B1 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201462037088P 2014-08-13 2014-08-13
US201562163041P 2015-05-18 2015-05-18
PCT/KR2015/008529 WO2016024847A1 (ko) 2014-08-13 2015-08-13 음향 신호를 생성하고 재생하는 방법 및 장치

Publications (3)

Publication Number Publication Date
EP3197182A1 EP3197182A1 (de) 2017-07-26
EP3197182A4 EP3197182A4 (de) 2018-04-18
EP3197182B1 true EP3197182B1 (de) 2020-09-30

Family

ID=55304392

Family Applications (1)

Application Number Title Priority Date Filing Date
EP15832603.3A Active EP3197182B1 (de) 2014-08-13 2015-08-13 Verfahren und vorrichtung zur erzeugung und wiedergabe von audiosignalen

Country Status (5)

Country Link
US (1) US10349197B2 (de)
EP (1) EP3197182B1 (de)
KR (1) KR20160020377A (de)
CN (1) CN106797525B (de)
WO (1) WO2016024847A1 (de)

Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107615767B (zh) 2015-06-02 2021-05-25 索尼公司 发送装置、发送方法、媒体处理装置、媒体处理方法以及接收装置
EP3357259B1 (de) * 2015-09-30 2020-09-23 Dolby International AB Verfahren und vorrichtung zur erzeugung von 3d-audio-inhalt aus zweikanaligem stereoinhalt
WO2017085562A2 (en) 2015-11-20 2017-05-26 Dolby International Ab Improved rendering of immersive audio content
US10262665B2 (en) * 2016-08-30 2019-04-16 Gaudio Lab, Inc. Method and apparatus for processing audio signals using ambisonic signals
KR102614577B1 (ko) * 2016-09-23 2023-12-18 삼성전자주식회사 전자 장치 및 그 제어 방법
US10410320B2 (en) 2016-09-30 2019-09-10 Sony Interactive Entertainment Inc. Course profiling and sharing
US10679511B2 (en) 2016-09-30 2020-06-09 Sony Interactive Entertainment Inc. Collision detection and avoidance
US11125561B2 (en) 2016-09-30 2021-09-21 Sony Interactive Entertainment Inc. Steering assist
US10850838B2 (en) 2016-09-30 2020-12-01 Sony Interactive Entertainment Inc. UAV battery form factor and insertion/ejection methodologies
US10416669B2 (en) 2016-09-30 2019-09-17 Sony Interactive Entertainment Inc. Mechanical effects by way of software or real world engagement
US10377484B2 (en) 2016-09-30 2019-08-13 Sony Interactive Entertainment Inc. UAV positional anchors
US10336469B2 (en) 2016-09-30 2019-07-02 Sony Interactive Entertainment Inc. Unmanned aerial vehicle movement via environmental interactions
US10210905B2 (en) 2016-09-30 2019-02-19 Sony Interactive Entertainment Inc. Remote controlled object macro and autopilot system
US10067736B2 (en) * 2016-09-30 2018-09-04 Sony Interactive Entertainment Inc. Proximity based noise and chat
US10357709B2 (en) 2016-09-30 2019-07-23 Sony Interactive Entertainment Inc. Unmanned aerial vehicle movement via environmental airflow
KR20180091319A (ko) * 2017-02-06 2018-08-16 삼성에스디에스 주식회사 사운드 공유 장치 및 방법
CN110786023B (zh) * 2017-06-21 2021-12-28 雅马哈株式会社 信息处理装置、信息处理系统、记录介质及信息处理方法
DE102018216604A1 (de) * 2017-09-29 2019-04-04 Apple Inc. System zur Übertragung von Schall in den und aus dem Kopf eines Hörers unter Verwendung eines virtuellen akustischen Systems
US10880649B2 (en) * 2017-09-29 2020-12-29 Apple Inc. System to move sound into and out of a listener's head using a virtual acoustic system
US10304490B2 (en) * 2017-11-02 2019-05-28 AcoustiX VR Inc. Acoustic holographic recording and reproduction system using meta material layers
CN111406414B (zh) * 2017-12-01 2022-10-04 株式会社索思未来 信号处理装置以及信号处理方法
CN107978328B (zh) * 2017-12-21 2020-07-24 联想(北京)有限公司 信息处理方法及其装置
CN108156575B (zh) 2017-12-26 2019-09-27 广州酷狗计算机科技有限公司 音频信号的处理方法、装置及终端
KR20190083863A (ko) * 2018-01-05 2019-07-15 가우디오랩 주식회사 오디오 신호 처리 방법 및 장치
US10694311B2 (en) * 2018-03-15 2020-06-23 Microsoft Technology Licensing, Llc Synchronized spatial audio presentation
KR102556092B1 (ko) 2018-03-20 2023-07-18 한국전자통신연구원 지향성 마이크를 이용한 음향 이벤트 검출 방법, 그리고 지향성 마이크를 이용한 음향 이벤트 검출 장치
US10848894B2 (en) * 2018-04-09 2020-11-24 Nokia Technologies Oy Controlling audio in multi-viewpoint omnidirectional content
US11375332B2 (en) 2018-04-09 2022-06-28 Dolby International Ab Methods, apparatus and systems for three degrees of freedom (3DoF+) extension of MPEG-H 3D audio
IL309872A (en) 2018-04-09 2024-03-01 Dolby Int Ab Methods, devices and systems for three-degree-of-freedom amplification of MPEG-H 3D audio
US10917735B2 (en) * 2018-05-11 2021-02-09 Facebook Technologies, Llc Head-related transfer function personalization using simulation
US10390170B1 (en) * 2018-05-18 2019-08-20 Nokia Technologies Oy Methods and apparatuses for implementing a head tracking headset
CN109088786B (zh) * 2018-06-26 2022-03-08 中国直升机设计研究所 一种用于测试直升机模拟器网络延迟方法
EP3595336A1 (de) * 2018-07-09 2020-01-15 Koninklijke Philips N.V. Audiovorrichtung und verfahren zum betrieb davon
US10976989B2 (en) * 2018-09-26 2021-04-13 Apple Inc. Spatial management of audio
US11100349B2 (en) 2018-09-28 2021-08-24 Apple Inc. Audio assisted enrollment
KR102602971B1 (ko) * 2018-12-17 2023-11-17 삼성전자주식회사 균일한 음질의 소리를 출력하기 위한 오디오 장치
CN113545109B (zh) * 2019-01-08 2023-11-03 瑞典爱立信有限公司 用于虚拟现实的有效空间异质音频元素
GB2581785B (en) * 2019-02-22 2023-08-02 Sony Interactive Entertainment Inc Transfer function dataset generation system and method
CN110544484B (zh) * 2019-09-23 2021-12-21 中科超影(北京)传媒科技有限公司 高阶Ambisonic音频编解码方法及装置
GB2587371A (en) * 2019-09-25 2021-03-31 Nokia Technologies Oy Presentation of premixed content in 6 degree of freedom scenes
CN113875265A (zh) * 2020-04-20 2021-12-31 深圳市大疆创新科技有限公司 音频信号处理方法、音频处理装置及录音设备
US11729571B2 (en) * 2020-08-04 2023-08-15 Rafael Chinchilla Systems, devices and methods for multi-dimensional audio recording and playback
KR20230119193A (ko) * 2020-12-15 2023-08-16 에스와이엔지, 인크. 오디오 업믹싱을 위한 시스템 및 방법
CN113889125B (zh) * 2021-12-02 2022-03-04 腾讯科技(深圳)有限公司 音频生成方法、装置、计算机设备和存储介质
CN115086861B (zh) * 2022-07-20 2023-07-28 歌尔股份有限公司 音频处理方法、装置、设备及计算机可读存储介质

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1025743B1 (de) 1997-09-16 2013-06-19 Dolby Laboratories Licensing Corporation Verwendung von filter-effekten bei stereo-kopfhörern zur verbesserung der räumlichen wahrnehmung einer schallquelle durch einen hörer
US7333622B2 (en) 2002-10-18 2008-02-19 The Regents Of The University Of California Dynamic binaural sound capture and reproduction
US8374365B2 (en) * 2006-05-17 2013-02-12 Creative Technology Ltd Spatial audio analysis and synthesis for binaural reproduction and format conversion
KR20100062784A (ko) 2008-12-02 2010-06-10 한국전자통신연구원 객체 기반 오디오 컨텐츠 생성/재생 장치
EP2194527A3 (de) 2008-12-02 2013-09-25 Electronics and Telecommunications Research Institute Vorrichtung zur Erzeugung und Wiedergabe von objektbasierten Audioinhalten
KR101485462B1 (ko) 2009-01-16 2015-01-22 삼성전자주식회사 후방향 오디오 채널의 적응적 리마스터링 장치 및 방법
US8705769B2 (en) 2009-05-20 2014-04-22 Stmicroelectronics, Inc. Two-to-three channel upmix for center channel derivation
US20100328419A1 (en) * 2009-06-30 2010-12-30 Walter Etter Method and apparatus for improved matching of auditory space to visual space in video viewing applications
KR101567461B1 (ko) 2009-11-16 2015-11-09 삼성전자주식회사 다채널 사운드 신호 생성 장치
KR101690252B1 (ko) 2009-12-23 2016-12-27 삼성전자주식회사 신호 처리 방법 및 장치
WO2011104418A1 (en) 2010-02-26 2011-09-01 Nokia Corporation Modifying spatial image of a plurality of audio signals
EP2464145A1 (de) 2010-12-10 2012-06-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Vorrichtung und Verfahren zur Dekomposition eines Eingabesignals mit einem Downmixer
RU2595943C2 (ru) * 2011-01-05 2016-08-27 Конинклейке Филипс Электроникс Н.В. Аудиосистема и способ оперирования ею
HUE054452T2 (hu) * 2011-07-01 2021-09-28 Dolby Laboratories Licensing Corp Rendszer és eljárás adaptív hangjel elõállítására, kódolására és renderelésére
KR101901593B1 (ko) 2012-03-28 2018-09-28 삼성전자주식회사 가상 입체 음향 생성 방법 및 장치
WO2013181272A2 (en) 2012-05-31 2013-12-05 Dts Llc Object-based audio system using vector base amplitude panning
WO2014088328A1 (ko) 2012-12-04 2014-06-12 삼성전자 주식회사 오디오 제공 장치 및 오디오 제공 방법
TR201808415T4 (tr) 2013-01-15 2018-07-23 Koninklijke Philips Nv Binoral ses işleme.
BR112015018993B1 (pt) * 2013-03-28 2023-11-28 Dolby International Ab Método e aparelho
TWI530941B (zh) * 2013-04-03 2016-04-21 杜比實驗室特許公司 用於基於物件音頻之互動成像的方法與系統
US9369818B2 (en) 2013-05-29 2016-06-14 Qualcomm Incorporated Filtering with binaural room impulse responses with content analysis and weighting
CN117376809A (zh) * 2013-10-31 2024-01-09 杜比实验室特许公司 使用元数据处理的耳机的双耳呈现
EP3172730A1 (de) 2014-07-23 2017-05-31 PCMS Holdings, Inc. System und verfahren zur bestimmung von audiokontext in anwendungen für erweiterte realität

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None *

Also Published As

Publication number Publication date
EP3197182A4 (de) 2018-04-18
US20170251323A1 (en) 2017-08-31
WO2016024847A1 (ko) 2016-02-18
CN106797525A (zh) 2017-05-31
CN106797525B (zh) 2019-05-28
US10349197B2 (en) 2019-07-09
KR20160020377A (ko) 2016-02-23
EP3197182A1 (de) 2017-07-26

Similar Documents

Publication Publication Date Title
EP3197182B1 (de) Verfahren und vorrichtung zur erzeugung und wiedergabe von audiosignalen
US11950085B2 (en) Concept for generating an enhanced sound field description or a modified sound field description using a multi-point sound field description
KR101627652B1 (ko) 바이노럴 렌더링을 위한 오디오 신호 처리 장치 및 방법
CN107533843B (zh) 用于捕获、编码、分布和解码沉浸式音频的系统和方法
RU2586842C2 (ru) Устройство и способ преобразования первого параметрического пространственного аудиосигнала во второй параметрический пространственный аудиосигнал
KR101627647B1 (ko) 바이노럴 렌더링을 위한 오디오 신호 처리 장치 및 방법
US10531216B2 (en) Synthesis of signals for immersive audio playback
WO2017182714A1 (en) Merging audio signals with spatial metadata
US11863962B2 (en) Concept for generating an enhanced sound-field description or a modified sound field description using a multi-layer description
KR20170106063A (ko) 오디오 신호 처리 방법 및 장치
CN111108555A (zh) 使用深度扩展DirAC技术或其他技术生成经增强的声场描述或经修改的声场描述的概念
US11122384B2 (en) Devices and methods for binaural spatial processing and projection of audio signals
JP2016501472A (ja) 空間オーディオ信号の異なる再生スピーカ設定に対するセグメント毎の調整
WO2013186593A1 (en) Audio capture apparatus
Rafaely et al. Spatial audio signal processing for binaural reproduction of recorded acoustic scenes–review and challenges
CN113170271A (zh) 用于处理立体声信号的方法和装置
Suzuki et al. 3D spatial sound systems compatible with human's active listening to realize rich high-level kansei information
JP2024502732A (ja) バイノーラル信号の後処理
WO2024044113A2 (en) Rendering audio captured with multiple devices
CN116615919A (zh) 双耳信号的后处理
KR20180024612A (ko) 오디오 신호 처리 방법 및 장치

Legal Events

Code  Event description (effective dates in parentheses where published)

STAA  Status: the international publication has been made
PUAI  Public reference made under Article 153(3) EPC to a published international application that has entered the European phase
STAA  Status: request for examination was made
17P   Request for examination filed (20170223)
AK    Designated contracting states (kind code A1): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
AX    Request for extension of the European patent; extension states: BA ME
DAV   Request for validation of the European patent (deleted)
DAX   Request for extension of the European patent (deleted)
A4    Supplementary search report drawn up and despatched (20180321)
RIC1  IPC information provided before grant: H04S 5/00 (2006.01) AFI 20180315; G10L 19/008 (2013.01) ALI 20180315; H04S 7/00 (2006.01) ALI 20180315
STAA  Status: examination is in progress
17Q   First examination report despatched (20181114)
GRAP  Despatch of communication of intention to grant a patent
STAA  Status: grant of patent is intended
INTG  Intention to grant announced (20200619)
GRAS  Grant fee paid
GRAA  (Expected) grant
STAA  Status: the patent has been granted
AK    Designated contracting states (kind code B1): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
REG   Reference to national codes: CH EP; GB FG4D; IE FG4D
REG   AT REF; ref document number 1320044; kind code T (20201015)
REG   DE R096; ref document number 602015059922
PG25  Lapsed in a contracting state (failure to submit a translation of the description or to pay the fee within the prescribed time limit): FI, SE, HR, RS, LV, CZ, LT, NL, SM, RO, EE, AL, AT, ES, PL, SK, DK, IT, SI, MC, CY, MK (20200930); BG, NO (20201230); GR (20201231); IS (20210130); PT (20210201)
PG25  Lapsed in a contracting state (failure to submit a translation or pay the fee; invalid ab initio): HU (20150813)
REG   AT MK05; ref document number 1320044; kind code T (20200930)
REG   NL MP (20200930)
REG   LT MG4D
REG   DE R097; ref document number 602015059922
PLBE  No opposition filed within time limit
STAA  Status: no opposition filed within time limit
26N   No opposition filed (20210701)
REG   DE R119; ref document number 602015059922
REG   CH PL
REG   BE MM (20210831)
GBPC  GB: European patent ceased through non-payment of renewal fee (20210813)
PG25  Lapsed in a contracting state (non-payment of due fees): LI, CH, FR, BE (20210831); LU, IE, GB (20210813); DE (20220301)