WO2018056780A1 - Binaural audio signal processing method and apparatus - Google Patents
- Publication number
- WO2018056780A1 (PCT/KR2017/010564)
- Authority
- WO
- WIPO (PCT)
Classifications
- H—ELECTRICITY
  - H04—ELECTRIC COMMUNICATION TECHNIQUE
    - H04S—STEREOPHONIC SYSTEMS
      - H04S3/00—Systems employing more than two channels, e.g. quadraphonic
        - H04S3/008—Systems employing more than two channels, e.g. quadraphonic, in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
      - H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
        - H04S7/30—Control circuits for electronic adaptation of the sound field
          - H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
            - H04S7/303—Tracking of listener position or orientation
          - H04S7/305—Electronic adaptation of stereophonic audio signals to reverberation of the listening space
      - H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
        - H04S2400/01—Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
        - H04S2400/11—Positioning of individual sound objects, e.g. moving airplane, within a sound field
      - H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
        - H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
        - H04S2420/11—Application of ambisonics in stereophonic audio systems
Definitions
- the present invention relates to an audio signal processing method and apparatus. Specifically, the present invention relates to a binaural audio signal processing method and apparatus.
- 3D audio refers to a family of signal processing, transmission, encoding, and reproduction technologies for providing realistic sound in three-dimensional space by adding an axis corresponding to the height direction to the horizontal (2D) sound scene provided by conventional surround audio.
- a rendering technique is required that forms a sound image at a virtual position where no speaker exists, regardless of whether a larger or smaller number of speakers is used.
- 3D audio is expected to become an audio solution for ultra-high definition television (UHDTV) and to be applied in fields ranging from sound in vehicles evolving into high-quality infotainment spaces to theater sound, personal 3DTVs, tablets, wireless communication terminals, and cloud gaming.
- a channel based signal and an object based signal may exist in the form of a sound source provided to 3D audio.
- a sound source in which a channel-based signal and an object-based signal are mixed may be provided, giving the user a new type of content experience.
- Binaural rendering is the modeling of this 3D audio as a signal delivered to both ears.
- the user can also feel 3D through the binaural rendered 2-channel audio output signal through headphones or earphones.
- the specific principle of binaural rendering is as follows. A person always hears sound through both ears and, from that sound, recognizes the location and direction of the sound source.
- if 3D audio can be modeled in the form of an audio signal delivered to both ears of a person, the stereoscopic sense of 3D audio can be reproduced through a two-channel audio output without a large number of speakers.
- One embodiment of the present invention is to provide an audio signal processing method and apparatus for processing an audio signal.
- an embodiment of the present invention is to provide an audio signal processing method and apparatus for processing a binaural audio signal.
- an embodiment of the present invention is to provide an audio signal processing method and apparatus for processing a binaural audio signal using metadata.
- an embodiment of the present invention is to provide a method and apparatus for processing an audio signal using an audio file format that supports a smaller number of channels than the number of channels of an audio signal.
- an audio signal processing apparatus for rendering an audio signal may include a receiver configured to receive an audio file including an audio signal; A processor that simultaneously renders a first audio signal component included in a first track of the audio file and a second audio signal component included in a second track; And an output unit configured to output the rendered first audio signal component and the rendered second audio signal component.
- the number of channels of an audio signal supported by each of the first track and the second track may be smaller than the sum of the number of channels of the audio signal.
- the first track may be a track at a predetermined position among a plurality of tracks of the audio file.
- the first audio signal component may be an audio signal component that may be rendered without metadata for representing a position of a sound image simulated by the audio signal.
- the first audio signal component may be an audio signal component that may be rendered without metadata for binaural rendering.
- the first track may include metadata.
- the processor may determine a track of the audio file including an audio signal component based on the metadata.
- the processor may render the first audio signal component and the second audio signal component based on the metadata.
- the processor may determine in a predetermined track order whether a plurality of tracks of the audio file includes an audio signal component of the audio signal.
- the processor may select the first audio signal component and the second audio signal component from among a plurality of audio signal components included in the plurality of tracks of the audio file according to the capability of the audio signal processing apparatus.
- the number of channels of an audio signal supported by each of the first track and the second track may be smaller than the sum of the number of channels of the audio signal.
- the first track may be a track at a predetermined position among a plurality of tracks of the audio file.
- the first audio signal component may be an audio signal component that may be rendered without metadata for representing a position of a sound image simulated by the audio signal.
- the first audio signal component may be an audio signal component that may be rendered without metadata for binaural rendering.
- the processor inserts metadata into the first track, and the metadata may indicate which track of the plurality of tracks of the audio file includes an audio signal component of the audio signal.
- the processor may insert a plurality of audio signal components of the audio signal in a specified order in a plurality of tracks of the audio file.
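As a sketch of the packing described above, a transmitting device might distribute a four-channel FoA signal across stereo tracks in a fixed order and record the mapping as metadata. The names (`pack_tracks`) and the per-track channel limit are illustrative assumptions, not taken from the patent:

```python
# Hypothetical sketch: splitting a multi-channel ambisonic signal across
# tracks of a container format that supports at most 2 channels per track.
MAX_CHANNELS_PER_TRACK = 2

def pack_tracks(channels):
    """Distribute channel buffers over tracks in a specified order.

    Returns (tracks, metadata); metadata records which channels each
    track carries, so a renderer can locate every audio signal component.
    """
    tracks = [channels[i:i + MAX_CHANNELS_PER_TRACK]
              for i in range(0, len(channels), MAX_CHANNELS_PER_TRACK)]
    metadata = {idx: list(range(idx * MAX_CHANNELS_PER_TRACK,
                                idx * MAX_CHANNELS_PER_TRACK + len(t)))
                for idx, t in enumerate(tracks)}
    return tracks, metadata

# A 4-channel first-order ambisonic signal (W, X, Y, Z) becomes two stereo tracks.
tracks, meta = pack_tracks(["W", "X", "Y", "Z"])
```

The metadata dictionary plays the role of the track map that the patent says may be inserted into the first track.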
- an audio signal processing apparatus for rendering an audio signal may include a receiver configured to receive an audio signal; A processor that determines whether to render the audio signal by reflecting the position of a sound image simulated by the audio signal based on metadata about the audio signal, and renders the audio signal according to the determination; And an output unit configured to output the rendered audio signal.
- the metadata may include sound level information indicating a sound level corresponding to a time interval indicated by the metadata.
- the processor may determine whether to render the audio signal by reflecting the position of the sound image simulated by the audio signal based on the sound level information.
- the processor may compare the sound level of the audio signal corresponding to a first time interval with the sound level of the audio signal corresponding to a second time interval and, based on the difference, determine whether to render the audio signal corresponding to the second time interval by reflecting the position of the sound image it simulates.
- the first time interval may precede the second time interval.
- the processor may determine whether to render the audio signal by reflecting the position of the sound image simulated by the audio signal based on whether the sound level indicated by the sound level information is smaller than a predetermined value.
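A minimal sketch of the level-based decision described above; the threshold values and function names are our assumptions, not values fixed by the patent:

```python
# Illustrative: decide per time interval whether to apply position-aware
# (binaural) rendering, based on the sound level metadata described above.
LEVEL_FLOOR_DB = -60.0   # assumed absolute threshold: very quiet intervals skip spatialization
LEVEL_DROP_DB = 30.0     # assumed drop relative to the preceding interval

def apply_positional_rendering(prev_level_db, cur_level_db):
    """Return True if this interval should be rendered at its simulated position."""
    if cur_level_db < LEVEL_FLOOR_DB:                   # below the predetermined value
        return False
    if prev_level_db - cur_level_db > LEVEL_DROP_DB:    # much quieter than the first interval
        return False
    return True
```

Skipping spatialization for negligible intervals reduces the renderer's computation, as the patent's motivation suggests.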
- the metadata may include binaural effect intensity information indicating the binaural rendering application intensity.
- the processor may determine a binaural rendering application strength for the audio signal based on the binaural effect intensity information, and binaurally render the audio signal with the determined binaural rendering application strength.
- the processor may change an application intensity of a head related transfer function (HRTF) or a binaural rendering impulse response (BRIR) for binaural rendering according to the determined binaural rendering application intensity.
- the binaural effect intensity information may indicate the binaural rendering intensity for each component of the audio signal.
- the binaural effect intensity information may indicate the binaural rendering intensity in units of frames.
- the metadata may include motion application information indicating whether to render the audio signal by reflecting the movement of the listener.
- the processor may determine whether to render the audio signal by reflecting the movement of the listener based on the motion application information.
- the processor may render the audio signal by applying a fade in / fade out depending on whether the audio signal is rendered by reflecting the position of a simulated sound image.
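The fade in / fade out above can be sketched as a short crossfade when the renderer toggles between the positional and the plain signal path, avoiding audible clicks. The ramp shape and length are assumptions of this sketch:

```python
import numpy as np

def crossfade(old_path, new_path, fade_len):
    """Fade out old_path while fading in new_path over fade_len samples."""
    out = new_path.copy()
    ramp = np.linspace(0.0, 1.0, fade_len)           # linear fade, an assumed choice
    out[:fade_len] = (1.0 - ramp) * old_path[:fade_len] + ramp * new_path[:fade_len]
    return out

# Switching from a rendered path (ones) to silence fades smoothly over 4 samples.
y = crossfade(np.ones(8), np.zeros(8), 4)
```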
- the metadata may include personalization parameter application information indicating whether to allow the application of a personalization parameter, which is a parameter that can be set according to the listener.
- the processor may render the audio signal without applying the personalization parameter according to the personalization parameter application information.
- the processor may insert a sound level corresponding to a time interval indicated by the metadata into the metadata.
- the sound level may be used to determine whether to render the audio signal by reflecting the position of the sound image simulated by the audio signal.
- the processor may insert binaural effect intensity information indicating the binaural rendering intensity applied to the audio signal into the metadata.
- the binaural effect intensity information may be used to change an application strength of a head related transfer function (HRTF) or a binaural rendering impulse response (BRIR) for binaural rendering.
- the binaural effect strength information may indicate the binaural rendering intensity for each audio signal component of the audio signal.
- the binaural effect intensity information may indicate the intensity of the binaural rendering applied on a frame basis.
- the processor may insert motion application information indicating whether to render the audio signal by reflecting the movement of the listener into the metadata.
- the listener's movement may include the listener's head movement.
- an operation method of an audio signal processing apparatus for rendering an audio signal may include receiving an audio signal; Rendering the audio signal by reflecting a position of a sound image simulated by the audio signal based on metadata about the audio signal; And outputting the rendered audio signal.
- One embodiment of the present invention provides an audio signal processing method and apparatus for processing a plurality of audio signals.
- an embodiment of the present invention provides an audio signal processing method and apparatus for processing an audio signal that may be represented by an ambisonic signal.
- FIG. 1 is a block diagram illustrating an audio signal processing apparatus for rendering an audio signal according to an exemplary embodiment.
- FIG. 2 is a block diagram illustrating an operation of processing an ambisonic signal and an object signal together by an audio signal processing apparatus that renders an audio signal according to an exemplary embodiment of the present invention.
- FIG. 3 illustrates syntax of metadata representing a degree of application of binaural rendering according to an exemplary embodiment of the present invention.
- FIG. 4 illustrates syntax of metadata for adjusting rendering conditions according to characteristics of an apparatus in which an audio signal is rendered according to an exemplary embodiment of the present invention.
- FIG. 5 is a view illustrating a classification of additional information according to an embodiment of the present invention.
- FIG. 6 shows a structure of a header parameter according to an embodiment of the present invention.
- FIG. 7 shows a specific format of GAO_HDR according to an embodiment of the present invention.
- FIG. 8 shows a structure of metadata parameters according to an embodiment of the present invention.
- FIG. 9 illustrates an operation of acquiring metadata separately from an audio signal by an audio signal processing apparatus that renders an audio signal according to an embodiment of the present invention.
- FIG. 10 illustrates an operation of acquiring metadata together with an audio signal by an audio signal processing apparatus that renders an audio signal according to an embodiment of the present invention.
- FIG. 11 is a view illustrating an operation of simultaneously acquiring link information for linking an audio signal and metadata by an audio signal processing apparatus that renders an audio signal according to an exemplary embodiment.
- FIGS. 12 and 13 illustrate an operation of acquiring metadata based on an audio bitstream by an audio signal processing apparatus that renders an audio signal according to an embodiment of the present invention.
- FIG. 14 illustrates a method by which an audio signal processing apparatus that renders an audio signal acquires metadata when it receives the audio signal through transport streaming, according to an embodiment of the present invention.
- FIGS. 15 and 16 illustrate syntax of an AAC file according to an embodiment of the present invention.
- FIG. 17 is a view illustrating an audio signal processing method using an audio file format that supports a number of channels smaller than the sum of the number of channels included in an audio signal according to an embodiment of the present invention.
- FIG. 18 is a block diagram illustrating an audio signal processing apparatus that processes an audio signal to deliver an audio signal according to an embodiment of the present invention.
- FIG. 19 is a flowchart illustrating a method of operating an audio signal processing apparatus that processes an audio signal to transmit an audio signal according to an embodiment of the present invention.
- FIG. 20 is a flowchart illustrating a method of operating an audio signal processing apparatus that renders an audio signal according to an exemplary embodiment.
- FIG. 1 is a block diagram illustrating an audio signal processing apparatus for rendering an audio signal according to an exemplary embodiment.
- an audio signal processing apparatus 100 for rendering an audio signal includes a receiver 10, a processor 30, and an output unit 70.
- the receiver 10 receives an input audio signal.
- the input audio signal may be a sound received by the sound collector.
- the sound collection device may be a microphone.
- the sound collecting device may be a microphone array including a plurality of microphones.
- the processor 30 processes the input audio signal received by the receiver 10.
- the processor 30 may include a format converter, a renderer, and a post processing unit.
- the format converter converts the format of the input audio signal into another format.
- the format converter may convert an object signal into an ambisonic signal.
- the ambisonic signal may be a signal recorded through the microphone array.
- the ambisonic signal may be a signal obtained by converting a signal recorded through a microphone array into a coefficient with respect to the basis of spherical harmonics.
- the format converter may convert an ambisonic signal into an object signal.
- the format converter may change the order of the ambisonic signal.
- the format converter may convert a higher-order ambisonics (HoA) signal into a first-order ambisonics (FoA) signal.
- the format converter may acquire position information related to the input audio signal, and convert the format of the input audio signal based on the acquired position information.
- the location information may be information about a microphone array in which a sound corresponding to an audio signal is collected.
- the information on the microphone array may include at least one of array information, number information, location information, frequency characteristic information, and beam pattern information of microphones constituting the microphone array.
- the position information related to the input audio signal may include information indicating the position of the sound source.
- the renderer renders the input audio signal.
- the renderer may render an input audio signal in which the format is converted.
- the input audio signal may include at least one of a loudspeaker channel signal, an object signal, and an ambisonic signal.
- the renderer may render the input audio signal into an audio signal such that the input audio signal is represented by a virtual sound object positioned in three dimensions using information represented by the format of the audio signal.
- the renderer may render the input audio signal by matching the plurality of speakers.
- the renderer may binaurally render the input audio signal.
- the renderer may include a time synchronizer for synchronizing the time between the object signal and the ambisonic signal.
- the renderer may include a 6DOF controller that controls 6 degrees of freedom (6DOF) of the ambisonic signal.
- the 6DOF controller may include a direction changing unit for changing the size of a specific direction component of the ambisonic signal.
- the 6DOF controller may change the size of a specific direction component of the ambisonic signal according to the position of the listener in the virtual space simulated by the audio signal.
- the direction changing unit may include a direction modification matrix generator for generating a matrix for changing the size of a specific direction component of the ambisonic signal.
- the 6DOF control unit may include a conversion unit for converting the ambisonic signal into a channel signal.
- the 6DOF control unit may include a relative position calculation unit for calculating the relative position between the virtual speaker corresponding to the channel signal and the listener of the audio signal.
- the output unit 70 outputs the rendered audio signal.
- the output unit 70 may output an audio signal through two or more loudspeakers.
- the output unit 70 may output an audio signal through two-channel stereo headphones.
- the audio signal processing apparatus 100 may process an ambisonic signal and an object signal together. In this case, a specific operation of the audio signal processing apparatus 100 will be described with reference to FIG. 2.
- FIG. 2 is a block diagram illustrating an operation of processing an ambisonic signal and an object signal together by an audio signal processing apparatus that renders an audio signal according to an exemplary embodiment of the present invention.
- the aforementioned ambisonics is one of the methods by which an audio signal processing apparatus obtains information about a sound field and reproduces sound using the obtained information.
- in ambisonics, the audio signal processing apparatus may process the audio signal as follows.
- For ideal ambisonic signal processing, an audio signal processing device must obtain information about sound sources from sound in all directions incident on a point in space. However, since there is a limit to how small a microphone can be made, the audio signal processing apparatus may instead compute the signal incident on an infinitely small point from the sound collected on the surface of a sphere, and use the resulting information about the sound sources.
- the position of each microphone of the microphone array on the spherical coordinate system may be expressed as a distance from the center of the coordinate system, an azimuth (or horizontal angle), and an elevation angle (or vertical angle).
- the audio signal processing apparatus may acquire the basis of the spherical harmonic function through the coordinate values of each microphone in the spherical coordinate system. In this case, the audio signal processing apparatus may project the microphone array signal into the spherical harmonic function domain based on each basis of the spherical harmonic function.
- the microphone array signal can be recorded via a spherical microphone array. If the center of the spherical coordinate system coincides with the center of the microphone array, the distances from the center of the microphone array to each microphone are all equal. Therefore, the position of each microphone can be expressed only by the azimuth angle θ and the elevation angle φ.
- the signal p_a recorded through the microphones may be expressed in the spherical harmonic function domain by the following equation.
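The referenced equation appears to have been lost in extraction. Reconstructed in standard ambisonics notation, consistent with the symbol definitions that follow (an assumption of this edit, not verified against the original filing), it reads:

```latex
p_a(\theta_q, \phi_q) \;=\; \sum_{n=0}^{N} \sum_{m=-n}^{n} B_{nm}\, Y_{nm}(\theta_q, \phi_q)
```

where N is the highest degree of the expansion (N = 1 for a FoA signal).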
- p_a represents the signal recorded through the microphone.
- (θ_q, φ_q) represent the azimuth and elevation angles of the q-th microphone.
- Y represents a spherical harmonic function having azimuth and elevation angles as factors.
- m represents the order of the spherical harmonic function, and n represents the degree.
- B represents an ambisonic coefficient corresponding to the spherical harmonic function.
- Ambisonic coefficients may be referred to herein as an ambisonic signal.
- the ambisonic signal may represent any one of a FoA signal and a HoA signal.
- the audio signal processing apparatus may obtain an ambisonic signal using a pseudo inverse matrix of a spherical harmonic function.
- the audio signal processing apparatus may obtain an ambisonic signal by using the following equation.
- p_a denotes the signal recorded through the microphone, as described above, and B denotes the ambisonic coefficient corresponding to the spherical harmonic function.
- pinv(Y) represents the pseudo-inverse matrix of Y.
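The relation described above is B = pinv(Y) · p_a. A minimal numpy sketch of that recovery, using a simplified real first-order basis of our own choosing (the patent does not fix a normalization or channel ordering):

```python
import numpy as np

def foa_basis(azimuths, elevations):
    """One row per microphone direction; columns are 4 first-order harmonics.

    Simplified real basis (W, Y, Z, X); the gain convention is an assumption.
    """
    az, el = np.asarray(azimuths), np.asarray(elevations)
    return np.column_stack([
        np.ones_like(az),          # W (degree 0): omnidirectional
        np.sin(az) * np.cos(el),   # Y: left-right
        np.sin(el),                # Z: up-down
        np.cos(az) * np.cos(el),   # X: front-back
    ])

# 6 microphone directions on a sphere (more microphones than coefficients).
az = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
el = np.array([0.3, -0.3, 0.6, -0.6, 0.0, 0.1])
Y_mat = foa_basis(az, el)

B_true = np.array([1.0, 0.2, -0.5, 0.7])    # synthetic ambisonic coefficients
p_a = Y_mat @ B_true                        # pressures "recorded" at the mics
B_est = np.linalg.pinv(Y_mat) @ p_a         # B = pinv(Y) p_a, as in the text
```

With more microphones than coefficients, the pseudo-inverse gives the least-squares estimate of the ambisonic signal.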
- the aforementioned object signal represents an audio signal corresponding to one sound object.
- the object signal may be a signal obtained from a sound collecting device proximate to a specific sound object.
- the object signal is used to express that the sound output by any one sound object is transmitted to a specific point, unlike an ambisonic signal that represents all sounds that can be collected at a specific point in space.
- the audio signal processing apparatus may represent the object signal in the format of an ambisonic signal using the position of the sound object corresponding to the object signal.
- the audio signal processing apparatus may measure the position of the sound object using an external sensor installed in a microphone that collects sound corresponding to the sound object and an external sensor installed at a reference point of position measurement.
- the audio signal processing apparatus may estimate the location of a sound object by analyzing the audio signal collected by the microphone.
- the audio signal processing apparatus may represent the object signal as an ambisonic signal using the following equation.
- Each of ⁇ s and ⁇ s represents an azimuth and an elevation angle representing the position of a sound object corresponding to the object.
- Y represents a spherical harmonic function having azimuth and elevation angles as factors.
- B^S_nm represents an ambisonic signal converted from an object signal.
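The equation referenced above is also missing from the extracted text. A reconstruction consistent with the surrounding symbol definitions (our assumption, following the same notation as the earlier expansion) is:

```latex
B^{S}_{nm}(t) \;=\; Y_{nm}(\theta_s, \phi_s)\, s(t)
```

where s(t) denotes the object signal.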
- the audio signal processing apparatus may use at least one of the following methods.
- the audio signal processing apparatus may separately output an object signal and an ambisonic signal.
- the audio signal processing apparatus may convert the object signal into an ambisonic signal format and output the object signal and the ambisonic signal converted into the ambisonic signal format.
- the object signal and the ambisonic signal converted into the ambisonic signal format may be HoA signals.
- the object signal and the ambisonic signal converted into the ambisonic signal format may be FoA signals.
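The second method above, converting the object signal into ambisonic format so a single stream can be output, can be sketched as follows. The FoA gain convention (Furse-Malham-style W weighting) is an assumption; the patent does not specify a normalization:

```python
import numpy as np

def encode_foa(mono, az, el):
    """Encode a mono object signal into first-order B-format at (az, el)."""
    w = mono * (1.0 / np.sqrt(2.0))      # W: omnidirectional (FuMa-style weight)
    x = mono * np.cos(az) * np.cos(el)   # X: front-back
    y = mono * np.sin(az) * np.cos(el)   # Y: left-right
    z = mono * np.sin(el)                # Z: up-down
    return np.stack([w, x, y, z])

obj = np.ones(4)               # trivial mono object signal
bed = np.zeros((4, 4))         # silent FoA ambisonic bed (W, X, Y, Z)
mixed = bed + encode_foa(obj, az=0.0, el=0.0)   # object placed straight ahead
```

After mixing, the renderer only needs to process one ambisonic stream, matching the output method described above.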
- the audio signal processing apparatus may output only an ambisonic signal without an object signal.
- the ambisonic signal may be a FoA signal. Since the ambisonic signal is assumed to include all sounds collected at one point in space, the ambisonic signal may be assumed to include a signal component corresponding to the object signal. Accordingly, the audio signal processing apparatus may reproduce the sound object corresponding to the object signal even if the audio signal processing apparatus processes only the ambisonic signal without separately processing the object signal.
- the audio signal processing apparatus may process the ambisonic signal and the object signal as in the embodiment of FIG. 2.
- the ambisonic converter 31 converts the ambient sound into an ambisonic signal.
- the format converter 33 changes the format of the object signal and the ambisonic signal.
- the format converter 33 may convert the object signal into a format of an ambisonic signal.
- the format converter 33 may convert the object signal into a HoA signal.
- the format converter 33 may convert the object signal into a FoA signal.
- the format converter 33 may convert the HoA signal into a FoA signal.
- the post processor 35 post-processes the converted audio signal.
- the renderer 37 renders the post processed audio signal.
- the renderer 37 may be a binaural renderer.
- the renderer 37 may binaurally render the post processed audio signal.
- the audio signal processing apparatus may render an audio signal to simulate a sound source located in a virtual space.
- the audio signal processing apparatus needs information for rendering the audio signal.
- Information for rendering the audio signal may be delivered in the form of metadata, and the audio signal processing apparatus may render the audio signal based on the metadata.
- the metadata may include information about a rendering method intended by a content producer and information about a rendering environment. Accordingly, the audio signal processing apparatus may render the audio signal by reflecting the intention of the content producer.
- the metadata type and format will be described with reference to FIGS. 3 to 16.
- FIG. 3 illustrates syntax of metadata representing a degree of application of binaural rendering according to an exemplary embodiment of the present invention.
- the metadata may include head movement application information indicating whether to render the audio signal by reflecting the listener's head movement when rendering the audio signal.
- the audio signal processing apparatus for rendering the audio signal may obtain the head motion application information from the metadata.
- the audio signal processing apparatus may determine whether to render the object signal by reflecting the head movement of the listener based on the head motion application information. Head movement may also indicate head rotation.
- the audio signal processing apparatus may render the object signal without reflecting the listener's head movement according to the head movement application information.
- the audio signal processing apparatus may render the object signal by reflecting the head movement of the listener according to the head motion application information.
- like a bee sitting on the listener's head, there may be objects that move together as the listener's head moves.
- the audio signal processing apparatus may render the audio signal simulating the corresponding object without reflecting the movement of the listener's head. Through this embodiment, the amount of calculation of the audio signal processing apparatus can be reduced.
- the metadata may include binaural effect intensity information indicating the binaural rendering application intensity.
- the audio signal processing apparatus that renders the audio signal may obtain the binaural effect strength from the metadata.
- the audio signal processing apparatus may determine a level at which binaural rendering is applied to the object signal based on the binaural effect intensity information.
- the audio signal processing apparatus may determine whether to apply binaural rendering to the audio signal based on the binaural effect intensity information. As described above, when the audio signal processing apparatus binaurally renders the audio signal, the audio signal processing apparatus may simulate the sound image represented by the audio signal in a three-dimensional space.
- the tone of the audio signal may be transformed by the binaural rendering.
- the tone may be more important than the sense of space depending on the type of sound image represented by the audio signal.
- the producer of the content included in the audio signal may set the binaural effect intensity information to determine the degree of application of the binaural rendering of the audio signal.
- the binaural effect intensity information may represent that binaural rendering is not applied.
- the audio signal processing apparatus may render the audio signal according to the binaural effect intensity information without using binaural rendering.
- the binaural effect intensity information may indicate an application strength of HRTF or BRIR for binaural rendering when binaural rendering is applied.
- the binaural effect intensity information may be divided into quantized levels.
- the binaural effect intensity information may be divided into three stages such as Mild, Normal, and Strong.
- the binaural effect intensity information may be divided into five steps as in the embodiment of FIG.
- the binaural effect strength information may be expressed as a value of any one of consecutive real numbers between 0 and 1.
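One simple way to realize a continuous application strength g in [0, 1], a formulation of our own rather than the patent's prescribed HRTF/BRIR modification, is to crossfade between the dry signal and the fully binauralized signal:

```python
import numpy as np

def apply_intensity(dry, hrir, g):
    """Blend dry and binauralized signals by intensity g in [0, 1].

    hrir is a toy single-ear impulse response standing in for HRTF/BRIR
    filtering; g = 0 disables binaural rendering, g = 1 applies it fully.
    """
    wet = np.convolve(dry, hrir)[:len(dry)]   # stand-in for HRTF/BRIR convolution
    return (1.0 - g) * dry + g * wet

sig = np.array([1.0, 0.0, 0.0, 0.0])
hrir = np.array([0.0, 1.0])                   # toy response: one-sample delay
```

Intermediate values of g trade spatial impression against preservation of the original tone, matching the motivation given above.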
- the audio signal processing apparatus for rendering the audio signal may apply the binaural effect intensity information for each audio track included in the audio signal.
- the audio signal processing apparatus may apply the binaural effect strength information for each audio source included in the audio signal.
- the audio signal processing apparatus for rendering the audio signal may apply the binaural effect strength information for each signal characteristic.
- the audio signal processing apparatus may apply the binaural effect strength information for each object included in the audio signal.
- the audio signal processing apparatus for rendering the audio signal may apply the binaural effect strength information for each time interval of each audio track. In this case, the time interval may be a frame of the audio signal.
- the metadata may classify binaural effect intensity information for each track and frame.
- the metadata may include binaural effect intensity forced information indicating whether application of the binaural effect intensity information is enforced.
- the audio signal processing apparatus that renders the audio signal may obtain binaural effect intensity forced information from metadata, and selectively apply binaural effect intensity information according to the binaural effect intensity forced information. Also, the audio signal processing apparatus may forcibly apply the binaural effect strength information according to the binaural effect intensity forced information.
- the audio signal processing apparatus that renders the audio signal may apply the binaural effect intensity forced information for each audio track included in the audio signal.
- the audio signal processing apparatus that renders the audio signal may apply the binaural effect intensity forced information for each audio source included in the audio signal.
- the audio signal processing apparatus may apply the binaural effect strength forced information for each signal characteristic.
- the audio signal processing apparatus that renders the audio signal may apply the binaural effect intensity forced information for each object included in the audio signal.
- the audio signal processing apparatus that renders the audio signal may apply the binaural effect intensity forced information for each time interval of each audio track.
- the specific format of the binaural effect intensity forced information may be as shown in FIG. 3 (c).
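As a sketch, the interaction between the intensity value carried in the metadata and a listener-side preference might be resolved as follows; the function and argument names are hypothetical.

```python
def effective_intensity(metadata_intensity, listener_intensity, forced):
    """Resolve which binaural effect intensity to apply: when the forced
    flag from the metadata is set, the producer's value overrides any
    listener or device preference; otherwise the listener value is used."""
    return metadata_intensity if forced else listener_intensity
```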
- the audio signal processing apparatus for rendering the audio signal may use the binaural effect intensity information to determine whether to apply not only binaural rendering but also other stereoscopic sound processing.
- the audio signal processing apparatus may render the audio signal without reflecting the position of the sound image simulated by the audio signal, according to the binaural effect intensity information.
- the computational efficiency of the audio signal processing apparatus that renders the audio signal may be increased.
- the content experience intended by the producer of the content included in the audio signal may be precisely delivered to the listener.
- the same audio signal can be rendered through various devices.
- the rendering environment of the audio signal is also diversified.
- the same audio signal may be rendered as a head mounted display (HMD) in the form of a VR device, or may be rendered by a mobile phone or a TV. Therefore, even the same audio signal needs to be rendered differently depending on the device in which the audio signal is rendered. This will be described with reference to FIG. 4.
- FIG. 4 illustrates syntax of metadata for adjusting rendering conditions according to characteristics of an apparatus in which an audio signal is rendered according to an exemplary embodiment of the present invention.
- the metadata may include a reference device characteristic parameter indicating a characteristic of the audio signal processing apparatus that is a reference when the corresponding metadata is generated.
- the reference device characteristic parameter may indicate a characteristic of the audio signal processing apparatus that the producer of the content included in the audio signal intends to render the audio signal.
- the reference device characteristic parameter may include a characteristic of the image display apparatus used when the audio signal is rendered.
- the reference device characteristic parameter may include a screen characteristic of the image display device.
- the screen characteristic may include at least one of a screen type, a screen resolution, a screen size, and an aspect ratio of the screen.
- the screen type may include at least one of a TV, a PC monitor, a mobile phone, and an HMD.
- the screen type can be classified in combination with the resolution of the screen.
- the device characteristic parameter may distinguish and represent an HMD supporting HD and an HMD supporting UHD.
- the aspect ratio of the screen may include at least one of 1:1, 4:3, 15:9, and 16:9.
- the reference device characteristic parameter may include a specific model name.
- the reference device characteristic parameter may include a positional relationship between the listener and the image display device.
- the positional relationship between the listener and the image display device may include a distance between the listener and the screen of the image display device.
- the positional relationship between the listener and the image display apparatus may include a viewing angle at which the listener views the image display apparatus.
- the distance between the listener and the screen of the video display device may vary depending on the production environment when the audio content is produced.
- the device characteristic parameter may classify the viewing angle as 90 degrees or less, 90 to 110 degrees, 110 to 130 degrees, or 130 degrees or more.
- the reference device characteristic parameter may include an audio signal output characteristic.
- the audio signal output characteristic may include at least one of a loudness level, a type of output device, and an EQ used for output.
- the reference device characteristic parameter may represent a loudness level as a sound pressure level (SPL) value.
- the reference device characteristic parameter may indicate a range of loudness levels intended by the metadata.
- the reference device characteristic parameter may indicate a loudness level value intended by the metadata.
- the output device type may include at least one of a headphone and a speaker.
- the output device type may be subdivided according to the output characteristics of the headphones and speakers.
- the EQ used for the output may be the EQ used by the creator when producing the content.
- the reference device characteristic parameter may have a syntax as illustrated in FIG. 4.
- the audio signal processing apparatus may render the audio signal based on the difference between the reference device characteristic parameter and the characteristics of the audio signal processing apparatus.
- the audio signal processing apparatus may adjust the magnitude of the audio signal based on the difference between the listener-to-screen distance indicated by the reference device characteristic parameter and the listener-to-screen distance indicated by the actual device characteristic parameter.
- the audio signal processing apparatus may render the audio signal by correcting the position of the sound image represented by the metadata based on the difference between the viewing angle indicated by the reference device characteristic parameter and the viewing angle indicated by the actual device characteristic parameter.
- the audio signal processing apparatus may adjust the output level of the audio signal processing apparatus based on the loudness level indicated by the reference device characteristic parameter.
- the audio signal processing apparatus may adjust the output level of the audio signal processing apparatus to the loudness level indicated by the reference device characteristic parameter.
- the audio signal processing apparatus may display the loudness level indicated by the reference device characteristic parameter to the user.
- the audio signal processing apparatus may adjust the output level of the audio signal processing apparatus based on the loudness level indicated by the reference device characteristic parameter and the equal loudness curve.
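One plausible way to realize the distance-based level adjustment described above is an inverse-distance gain correction. The 6 dB-per-doubling model and sign convention below are assumptions for illustration, not rules stated in the document.

```python
import math

def distance_gain_db(ref_distance_m, actual_distance_m):
    """Gain in dB applied when the actual listener-to-screen distance
    differs from the one assumed by the reference device characteristic
    parameter. A listener closer than the reference gets attenuation,
    assuming a simple 1/r (6 dB per distance doubling) loudness model."""
    return 20.0 * math.log10(actual_distance_m / ref_distance_m)
```

For example, a listener sitting at half the reference distance would receive roughly a -6 dB correction under this assumed model.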
- the audio signal processing apparatus may select any one of a plurality of reference device characteristic parameter sets and render an audio signal using metadata corresponding to the selected reference device characteristic parameter set.
- the audio signal processing apparatus may select any one of a plurality of reference apparatus characteristic parameter sets based on the characteristics of the audio signal processing apparatus.
- the reference device characteristic parameter set may include at least one of the device characteristic parameters described above.
- the audio signal processing apparatus may receive a metadata set including metadata corresponding to each of a plurality of reference device characteristic parameter sets and a plurality of reference device characteristic parameter sets.
- the metadata set may include number-of-screen-optimized information (numScreenOptimizedInfo) indicating the number of reference device characteristic parameter sets. This field may be expressed in 5 bits, and may therefore represent up to 32 sets.
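A renderer might choose among the signaled reference device characteristic parameter sets by matching its own characteristics. The matching criteria below (screen type first, then nearest viewing angle) and all field names are illustrative assumptions.

```python
def select_parameter_set(device, parameter_sets):
    """Pick the reference device characteristic parameter set closest to
    the actual device: prefer sets with a matching screen type, then take
    the one with the nearest viewing angle."""
    same_type = [s for s in parameter_sets
                 if s["screen_type"] == device["screen_type"]]
    candidates = same_type or parameter_sets  # fall back to all sets
    return min(candidates,
               key=lambda s: abs(s["viewing_angle"] - device["viewing_angle"]))
```

The metadata corresponding to the selected set would then be used to render the audio signal, as described above.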
- the audio signal processing apparatus may binaurally render the audio signal using a personalization parameter.
- the personalization parameter may represent a parameter that may be set according to the listener.
- the personalization parameter may include at least one of an HRTF, body information, and a 3D model.
- Personalization parameters affect the rendering of the audio signal. Therefore, when a personalization parameter set by the listener is applied, the intention of the producer of the content included in the audio signal may not be reflected in the rendered audio. As a result, the content experience that the producer intends to deliver through the content may not be delivered. Therefore, the metadata may include personalization application information indicating whether the personalization parameter may be applied.
- the audio signal processing apparatus may determine whether to binaurally render the audio signal by applying a personalization parameter based on the personalization application information. When the personalization application information indicates that the personalization parameter is not allowed to be applied, the audio signal processing apparatus may binaurally render the audio signal without applying the personalization parameter.
- the creator of the content included in the audio signal may use metadata to help the audio signal processing apparatus optimize its amount of computation.
- the metadata may include sound level information indicating a sound level of an audio signal.
- the audio signal processing apparatus may render the audio signal based on the sound level information without reflecting the position of the sound image simulated by the corresponding audio signal. Rendering without reflecting the location of the sound image that the audio signal simulates may include rendering the audio signal without applying binaural rendering.
- the metadata may include mute information indicating that the sound level is zero.
- the audio signal processing apparatus may render the audio signal based on the mute information without reflecting the position of the sound image simulated by the corresponding audio signal.
- the audio signal processing apparatus may render an audio signal whose mute information indicates that the sound level is 0 without reflecting the position of the sound image simulated by that audio signal.
- the audio signal processing apparatus may render an audio signal whose sound level is equal to or less than a predetermined level without reflecting the position of the sound image simulated by that audio signal.
- the audio signal processing apparatus may render the audio signal corresponding to a second time interval without reflecting the position of the sound image it simulates, based on the sound level of the audio signal corresponding to a first time interval and the sound level of the audio signal corresponding to the second time interval.
- the first time interval is a time interval located before the second time interval.
- the first time interval and the second time interval may be consecutive time intervals.
- the audio signal processing apparatus may compare the sound level of the audio signal corresponding to the first time interval with the sound level of the audio signal corresponding to the second time interval, and render the audio signal corresponding to the second time interval without reflecting the position of the sound image it simulates. For example, when the sound level of the audio signal corresponding to the first time interval exceeds the sound level of the audio signal corresponding to the second time interval by a specified value or more, the audio signal processing apparatus may render the audio signal corresponding to the second time interval without reflecting the position of the simulated sound image. When the listener hears a relatively small sound after a loud sound, the listener may not perceive the relatively small sound well, owing to the temporal masking effect.
- When the listener hears a relatively small sound after a loud sound, the listener may also be unable to recognize the location of the sound source producing the relatively small sound, owing to the spatial masking effect. Therefore, even if rendering for stereoscopic sound reproduction is applied to a small sound coming after a relatively loud sound, the effect on the listener may be insignificant. The audio signal processing apparatus may therefore skip rendering for stereoscopic sound reproduction for a small sound coming after a loud sound to increase computational efficiency.
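The masking-based optimization described above could be sketched as a per-frame decision; the threshold values and names below are assumptions for illustration, not values given in the document.

```python
def skip_spatial_rendering(prev_level_db, cur_level_db,
                           mute_floor_db=-60.0, masking_drop_db=30.0):
    """Decide whether position-dependent (e.g. binaural) rendering may be
    skipped for the current frame: skip when the frame is effectively
    silent, or when it is much quieter than the preceding frame, where
    temporal/spatial masking makes the spatial cue insignificant."""
    if cur_level_db <= mute_floor_db:  # mute or near-mute frame
        return True
    return (prev_level_db - cur_level_db) >= masking_drop_db
```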
- the metadata may be classified by at least one of audio track, audio source, object, and time interval.
- the above-described time period may be a frame of the audio signal.
- the audio signal processing apparatus may apply a fade-in/fade-out when switching between rendering the audio signal with and without reflecting the position of the simulated sound image. According to this embodiment, the audio signal processing apparatus may prevent the rendered sound from being heard unnaturally when stereoscopic rendering is applied selectively.
- the metadata may include motion application information indicating whether to render the audio signal by reflecting the listener's movement in the position of the simulated sound image.
- the audio signal processing apparatus may obtain motion application information from metadata.
- the audio signal processing apparatus may determine whether to render the object signal by reflecting the movement of the listener based on the motion application information.
- the metadata may include information on whether head tracking is applied, which indicates whether to render an audio signal by reflecting a listener's head movement.
- the audio signal processing apparatus may obtain information on whether head tracking is applied from the metadata.
- the audio signal processing apparatus may determine whether to render the object signal by reflecting the head movement of the listener based on the head tracking application information.
- the audio signal processing apparatus may render the object signal without reflecting the head movement of the listener based on the head tracking application information.
- the audio signal processing apparatus may render the audio signal simulating the object without reflecting the movement of the listener's head with respect to the audio signal representing the object.
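A minimal sketch of how the head tracking application information might be honored for an object's azimuth follows; the names and the yaw-only rotation are simplifying assumptions.

```python
def apply_head_tracking(object_azimuth_deg, head_yaw_deg, head_tracking_enabled):
    """When head tracking is applied, counter-rotate the object's azimuth
    by the listener's head yaw so the object stays fixed in the world;
    otherwise leave it fixed relative to the listener's head."""
    if not head_tracking_enabled:
        return object_azimuth_deg
    return (object_azimuth_deg - head_yaw_deg) % 360.0
```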
- the audio signal processing apparatus may optimize computational efficiency by using metadata according to the above-described embodiments.
- FIG. 5 is a view illustrating a classification of additional information according to an embodiment of the present invention.
- the additional information may include metadata.
- the additional information may be classified according to the relative length of the time interval of the audio signal signaled by the additional information.
- the additional information may be classified into a header parameter and a metadata parameter according to a relative length of a time interval of an audio signal signaled by the additional information.
- the header parameter may include a parameter that is less likely to change frequently when rendering the audio signal.
- the parameter included in the header parameter may be information that remains the same until the content included in the audio signal is terminated or the rendering configuration is changed.
- the header parameter may include the order of the ambisonic signal.
- Metadata parameters may include parameters that are likely to change frequently when rendering the audio signal.
- the metadata parameter may include information about the position of the object that the audio signal simulates. In more detail, the information regarding the position of the object may be at least one of azimuth, elevation, and distance.
- the type of the additional information may be divided into an element parameter, which includes information for rendering the audio signal, and a general parameter, which includes information other than the information for rendering.
- the general parameter may include information about the audio signal itself.
- FIG. 6 shows a structure of a header parameter according to an embodiment of the present invention.
- the header parameter may include information for each type of component included in the audio signal.
- the header parameter may include information for the entire audio signal, the ambisonic signal, the object signal, and the channel signal.
- the header parameter indicating the entire audio signal may be referred to as GAO_HDR.
- GAO_HDR may include information about a sampling rate of an audio signal.
- the audio signal processing apparatus may calculate a filter coefficient from a head related transfer function (HRTF) or a binaural room impulse response (BRIR) based on the information about the sampling rate.
- the audio signal processing apparatus may resample the audio signal to calculate the filter coefficient.
- when the audio signal includes information about the sampling rate, as in a WAV file or an AAC file, GAO_HDR may not include the information about the sampling rate.
- the GAO_HDR may include information indicating the length of each frame indicated by the element metadata.
- the length of each frame may be set based on various constraints such as sound quality, binaural rendering algorithm, memory, and computation amount.
- the frame-by-frame length may be set during post-production or encoding. It allows the producer to adjust the time-resolution density with which the audio signal is binaurally rendered.
- the GAO_HDR may include the number of components according to the type of components included in the audio signal.
- GAO_HDR may include the number of ambisonic signals, the number of channel signals, and the number of object audio signals included in the audio signal.
- the GAO_HDR may include at least one of the information included in the following table.
- GEN represents a general parameter
- ELE represents an element parameter.
- header parameters corresponding to each component may be delivered to the audio signal processing apparatus together with the GAO_HDR.
- GAO_HDR may include a header parameter corresponding to each component.
- GAO_HDR may include link information connecting header parameters corresponding to each component.
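The kinds of fields GAO_HDR carries, as described above, could be represented by a simple structure. The field names below are illustrative assumptions, not the actual bitstream syntax.

```python
from dataclasses import dataclass

@dataclass
class GaoHdr:
    """Illustrative container for the GAO_HDR information described above
    (field names are assumptions, not the patent's syntax)."""
    sampling_rate_hz: int      # may be omitted when the file (WAV/AAC) carries it
    frame_length_samples: int  # length of each frame indicated by element metadata
    num_hoa_signals: int       # number of ambisonic signals
    num_channel_signals: int   # number of channel signals
    num_object_signals: int    # number of object audio signals
```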
- FIG. 7 shows a specific format of GAO_HDR according to an embodiment of the present invention.
- the header parameter indicating the ambisonic signal may be referred to as GAO_HOA_HDR.
- GAO_HOA_HDR may include information about a speaker layout to be used when rendering an ambisonic signal.
- the audio signal processing apparatus may convert the ambisonic signal into a channel signal, and binaurally render the converted ambisonic signal.
- the audio signal processing apparatus may convert the ambisonic signal into a channel signal based on the information on the speaker layout.
- the information about the speaker layout may be a coding-independent code points (CICP) index.
- the GAO_HOA_HDR may include information about a binaural rendering mode to be used when the audio signal processing apparatus binaurally renders the corresponding ambisonic signal.
- the audio signal processing apparatus may binaurally render the corresponding ambisonic signal based on the binaural rendering mode.
- the binaural rendering mode may represent either a mode in which the user's head movement is applied after channel rendering, or a mode in which channel rendering is applied after the user's head movement is applied.
- the head movement may indicate head rotation.
- the audio signal processing apparatus may apply the rotation matrix corresponding to the head movement to the first ambisonic signal to generate the second ambisonic signal, and channel-render the second ambisonic signal.
- the audio signal processing apparatus may maintain the timbre of the ambisonic signal through this rendering mode. Also, the audio signal processing apparatus may convert the first ambisonic signal into a channel signal, change the speaker layout of the first channel signal according to head movement, and then binaurally render the channel signal. The audio signal processing apparatus may precisely represent the position of the sound image simulated by the ambisonic signal through this rendering mode.
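The "apply head rotation, then channel-render" mode can be illustrated for a first-order ambisonic frame. The ACN component ordering (W, Y, Z, X) and the yaw-only rotation below are simplifying assumptions for the sketch.

```python
import math

def rotate_foa_yaw(w, y, z, x, yaw_rad):
    """Rotate a first-order ambisonic frame about the vertical axis.
    A yaw rotation only mixes the X and Y components; the omnidirectional
    W and the vertical Z components are unchanged."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return w, s * x + c * y, z, c * x - s * y
```

For example, rotating a purely frontal sound field (x = 1) by 90 degrees moves its energy into the Y component; the rotated signal can then be converted to a channel layout and binaurally rendered, as in the rendering mode described above.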
- when GAO_HOA_HDR includes information about the binaural rendering mode, the producer may select the binaural rendering mode according to the content characteristics. For example, for a sound such as broadband noise (e.g., a car sound), the producer may channel-render the ambisonic signal first and then apply the head movement to the channel-rendered ambisonic signal.
- the producer may apply a head movement to the ambisonic signal and then channel render the ambisonic signal to which the head movement is applied.
- GAO_HOA_HDR may include information indicating whether the position of the sound image simulated by the ambisonic signal rotates over time. This information may be signaled in the form of a flag. If the position of the sound image simulated by the audio signal does not rotate over time, the audio signal processing apparatus may continue to use the sound image position information obtained when the ambisonic signal was first acquired.
- the GAO_HOA_HDR may include information indicating the language of content included in the ambisonic signal.
- the audio signal processing apparatus may selectively render the ambisonic signal based on the information representing the language of the content included in the audio signal.
- GAO_HOA_HDR may include at least one of information included in the following table.
- the header parameter indicating the channel signal may be referred to as GAO_CHN_HDR.
- the GAO_CHN_HDR may include information indicating the speaker layout of the channel signal.
- GAO_CHN_HDR may include at least one of the information included in GAO_HOA_HDR.
- GAO_CHN_HDR may include at least one of information included in the following table.
- the header parameter indicating the object signal may be referred to as GAO_OBJ_HDR.
- the GAO_OBJ_HDR may include at least one of the information included in the GAO_HOA_HDR.
- GAO_OBJ_HDR may include at least one of information included in the following table.
- FIG. 8 shows a structure of metadata parameters according to an embodiment of the present invention.
- the metadata parameter may include information for each type of component included in the audio signal.
- the metadata parameter may include information for the entire audio signal, the ambisonic signal, the object signal, and the channel signal.
- the metadata parameter representing the entire audio signal may be referred to as GAO_META.
- Metadata parameters corresponding to each component may be transmitted to the audio signal processing apparatus together with the GAO_META.
- GAO_META may include metadata parameters corresponding to each component.
- the GAO_META may include link information connecting metadata parameters corresponding to each component.
- the metadata parameter representing the object signal may be referred to as GAO_META_OBJ.
- GAO_META_OBJ may include the above-described information on whether head tracking is applied.
- the audio signal processing apparatus may obtain the head tracking application information from GAO_META_OBJ.
- the audio signal processing apparatus may determine whether to render the object signal by reflecting the head movement of the listener based on the head tracking application information.
- GAO_META_OBJ may include the binaural effect strength information described above.
- the audio signal processing apparatus may obtain the binaural effect intensity information from GAO_META_OBJ.
- the audio signal processing apparatus may determine the binaural rendering application strength to be applied to the object signal based on the binaural effect intensity information.
- the audio signal processing apparatus may determine whether to binaurally render the object signal based on the binaural effect intensity information.
- GAO_META_OBJ may include the sound level information described above.
- the audio signal processing apparatus may obtain sound level information from GAO_META_OBJ.
- the audio signal processing apparatus may determine whether to render by reflecting the position of the sound image simulated by the object signal based on the sound level information.
- the audio signal processing apparatus may determine whether to binaurally render the object signal based on the sound level information.
- GAO_META_OBJ may include at least one of the information shown in the following table.
- GAO_META_CHN and GAO_META_HOA may include the binaural effect strength information described above.
- the audio signal processing apparatus may obtain the binaural effect intensity information from GAO_META_CHN or GAO_META_HOA.
- the audio signal processing apparatus may determine the binaural rendering application strength to be applied to the channel signal based on the binaural effect intensity information. In more detail, the audio signal processing apparatus may determine whether to binaurally render a channel signal based on the binaural effect intensity information.
- the audio signal processing apparatus may determine the binaural rendering application strength to be applied to the ambisonic signal based on the binaural effect intensity information. In more detail, the audio signal processing apparatus may determine whether to binaurally render an ambisonic signal based on the binaural effect intensity information.
- GAO_META_CHN and GAO_META_HOA may include the sound level information described above.
- the audio signal processing apparatus may obtain sound level information from GAO_META_CHN or GAO_META_HOA.
- the audio signal processing apparatus may determine whether to render by reflecting the position of the sound image simulated by the channel signal based on the sound level information.
- the audio signal processing apparatus may determine whether to binaurally render the channel signal based on the sound level information.
- the audio signal processing apparatus may determine whether to render by reflecting the position of the sound image simulated by the ambisonic signal based on the sound level information.
- the audio signal processing apparatus may determine whether to binaurally render the ambisonic signal based on the sound level information.
- GAO_META_CHN and GAO_META_OBJ may include the same kind of parameters.
- GAO_META_CHN and GAO_META_OBJ may include different types of parameters.
- GAO_META_CHN and GAO_META_OBJ may include at least one of the information shown in the following table.
- the audio signal may be transmitted to the audio signal processing apparatus in the form of a file.
- the audio signal may be delivered to the audio signal processing apparatus through streaming.
- the audio signal may be transmitted to the audio signal processing apparatus through a broadcast signal.
- the transmission method of the metadata may also vary according to the transmission type of the audio signal. This will be described with reference to FIGS. 9 to 12.
- FIG. 9 illustrates an operation of acquiring metadata separately from an audio signal by an audio signal processing apparatus according to an embodiment of the present invention.
- An audio signal processing apparatus that processes an audio signal to deliver an audio signal may transmit metadata to the audio signal processing apparatus separately from the audio bitstream encoding the audio signal. Therefore, the audio signal processing apparatus that renders the audio signal may acquire metadata separately from the audio signal.
- an audio signal processing apparatus that renders an audio signal may obtain metadata from a transport file or another transport stream different from the audio signal.
- an audio signal processing apparatus that renders an audio signal may receive a transport stream or a transport file through a first link and receive metadata through a second link.
- the transport file or transport stream may include an audio bitstream encoding the audio signal or both an audio bitstream encoding the audio signal and a video bitstream encoding the video signal.
- FIG. 9 illustrates an image signal processing apparatus including an audio signal processing apparatus.
- the video signal processing apparatus receives a transport stream including an audio signal and a video signal through a first link URL1.
- the image signal processing apparatus receives metadata through the second link URL2.
- the video signal processing apparatus demuxes a transport stream and extracts an audio bitstream A and a video bitstream V.
- A decoder of the audio signal processing apparatus decodes the audio bitstream A to obtain an audio signal.
- An audio renderer of the audio signal processing apparatus receives an audio signal and metadata. In this case, the renderer of the audio signal processing apparatus may receive metadata by using a metadata interface. Also, an audio renderer of the audio signal processing apparatus renders an audio signal based on metadata.
- the audio renderer may include a module (G-format) for processing metadata and a module (G-core) for processing an audio signal. Also, the audio renderer may render an audio signal based on the head movement of the user of the image signal processing apparatus.
- the image signal processing apparatus outputs the rendered audio and the rendered video together.
- the video renderer also renders a video signal. In this case, the video renderer may render a video signal based on the head movement of the user of the image signal processing apparatus.
- the image signal processing apparatus may receive a user input using a controller.
- the controller may control operations of the demux and the metadata interface.
- FIG. 9 shows the modules included in the audio signal processing apparatus according to this embodiment. In addition, the portion indicated by the dotted line may be omitted or replaced by a module included in the image signal processing apparatus.
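- As an illustrative sketch (not part of the disclosed embodiment; all function names and data shapes are hypothetical), the FIG. 9 flow can be modeled as follows: the transport stream arrives over a first link (URL1), the metadata over a second link (URL2), and the renderer combines the demuxed audio with the metadata and the listener's head movement.

```python
# Hypothetical sketch of the FIG. 9 flow. The transport stream arrives over
# URL1 and the metadata over URL2; both inputs are simulated here.

def demux(transport_stream):
    """Split a transport stream into its audio and video bitstreams."""
    return transport_stream["audio"], transport_stream["video"]

def fetch_metadata(url):
    """Stand-in for the metadata interface reading from the second link."""
    return {"source": url, "azimuth": 30.0}

def render_audio(audio_bitstream, metadata, head_yaw=0.0):
    """Render audio using the metadata and the listener's head movement."""
    # Counter-rotate the sound image against the listener's head yaw.
    relative_azimuth = metadata["azimuth"] - head_yaw
    return {"samples": audio_bitstream, "azimuth": relative_azimuth}

stream = {"audio": "A-bitstream", "video": "V-bitstream"}  # via URL1
meta = fetch_metadata("https://example.com/meta")          # via URL2

audio_bs, video_bs = demux(stream)
rendered = render_audio(audio_bs, meta, head_yaw=10.0)
print(rendered["azimuth"])  # 20.0
```

The sketch shows why the metadata interface is separate from the demux path: the renderer only needs the two inputs to arrive, regardless of which link carried them.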
- FIG. 10 illustrates an operation of acquiring metadata together with an audio signal by an audio signal processing apparatus that renders an audio signal according to an embodiment of the present invention.
- An audio signal processing apparatus that processes an audio signal to deliver an audio signal may transmit metadata along with an audio bitstream encoding the audio signal.
- An audio signal processing apparatus that renders an audio signal may acquire metadata along with the audio signal.
- an audio signal processing apparatus that renders an audio signal may acquire metadata and an audio signal together from the same transport file or transport stream.
- the transport file or transport stream may include an audio bitstream encoding the audio signal together with the metadata, or may include an audio bitstream encoding the audio signal, a video bitstream encoding the video signal, and the metadata.
- the user data field of the transfer file may include metadata.
- the udta box, which is the user data field of MP4, may include metadata.
- an individual box or element of mp4 may include metadata.
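- As a minimal sketch of reading metadata out of an MP4 user data field (the box layout follows the ISO base media file format: a 4-byte big-endian size including the 8-byte header, then a 4-byte type; the metadata payload and file contents here are fabricated for illustration):

```python
import struct

def iter_boxes(data):
    """Iterate (type, payload) over top-level ISO BMFF boxes."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        yield box_type.decode("ascii"), data[offset + 8 : offset + size]
        offset += size

def find_udta_metadata(mp4_bytes):
    """Return the payload of the user data ('udta') box, if present."""
    for box_type, payload in iter_boxes(mp4_bytes):
        if box_type == "udta":
            return payload
    return None

def make_box(box_type, payload):
    """Build a box: size (incl. 8-byte header), type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode()) + payload

# Synthetic file: an empty 'mdat' followed by a 'udta' carrying metadata.
fake_mp4 = make_box("mdat", b"") + make_box("udta", b'{"azimuth":30}')
print(find_udta_metadata(fake_mp4))  # b'{"azimuth":30}'
```

A real implementation would additionally recurse into container boxes (udta normally sits inside moov), but the size/type walk shown is the core of extracting metadata from the user data field.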
- the video signal processing apparatus receives a transport stream including an audio signal, a video signal, and metadata through the first link URL1.
- the video signal processing apparatus parses the transport stream and extracts metadata.
- the image signal processing apparatus may parse the transport stream using a parser.
- the video signal processing apparatus demuxes a transport stream and extracts an audio signal and a video signal.
- a decoder (Audio Decoder) of the audio signal processing apparatus decodes the demuxed audio signal (A).
- An audio renderer of the audio signal processing apparatus receives a decoded audio signal and metadata.
- the renderer of the audio signal processing apparatus may receive metadata by using a metadata interface.
- an audio renderer of the audio signal processing apparatus renders the decoded audio signal based on metadata.
- Other operations of the audio signal processing apparatus and the image signal processing apparatus may be the same as those described with reference to FIG. 9.
- FIG. 11 is a view illustrating an operation of acquiring link information for linking metadata together with an audio signal by an audio signal processing apparatus that renders an audio signal according to an exemplary embodiment.
- An audio signal processing apparatus that processes an audio signal to transmit an audio signal may transmit link information for linking metadata through a transport stream or a transport file. Therefore, the audio signal processing apparatus that renders the audio signal may acquire link information for linking metadata from the transport stream or the transport file, and obtain the metadata using the link information.
- the transport file or transport stream may include a bitstream encoding the audio signal, or may include both the bitstream encoding the audio signal and the bitstream encoding the video signal.
- the user data field of the transfer file may include link information that links the metadata.
- the udta box, which is the user data field of MP4, may include link information for linking the metadata.
- an individual box or element of mp4 may include link information for linking metadata.
- An audio signal processing apparatus that renders an audio signal may receive metadata obtained using the link information.
- the video signal processing apparatus receives, through a first link URL1, a transport stream including an audio signal, a video signal, and link information for linking metadata.
- the video signal processing apparatus demuxes the transport stream and extracts the audio bitstream A, the video bitstream V, and the link information for linking metadata.
- a decoder of the audio signal processing apparatus decodes the audio bitstream A to obtain an audio signal.
- An audio renderer of an audio signal processing apparatus receives metadata from a second link URL2 indicated by the link information, using a metadata interface.
- An audio renderer of the audio signal processing apparatus receives an audio signal and metadata.
- an audio renderer of the audio signal processing apparatus renders an audio signal based on metadata.
- Other operations of the audio signal processing apparatus and the image signal processing apparatus may be the same as those described with reference to FIG. 9.
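- The two-step resolution of FIG. 11 can be sketched as follows (function names and the transport-file layout are hypothetical): the transport file carries only a link, and the renderer's metadata interface dereferences it to obtain the metadata itself.

```python
# Hypothetical sketch of FIG. 11: metadata is fetched via a link carried
# in the transport file rather than being embedded in it.

def extract_link_info(transport_file):
    """Pull the metadata link (e.g. from a user data field) out of the file."""
    return transport_file.get("metadata_link")

def fetch(url):
    """Stand-in for fetching the metadata the link points to."""
    return {"from": url, "gain": 1.0}

transport_file = {
    "audio": "A-bitstream",
    "video": "V-bitstream",
    "metadata_link": "https://example.com/track/meta",  # second link (URL2)
}

link = extract_link_info(transport_file)
metadata = fetch(link)  # the metadata interface resolves URL2
print(metadata["from"])
```

Carrying a link instead of the metadata keeps the transport file compatible with receivers that ignore the user data field, while letting capable receivers fetch the full metadata.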
- FIGS. 12 to 13 illustrate an operation of acquiring metadata based on an audio bitstream by an audio signal processing apparatus that renders an audio signal according to an embodiment of the present invention.
- An audio signal processing apparatus that processes an audio signal to deliver an audio signal may insert metadata into an audio bitstream. Therefore, the audio signal processing apparatus that renders the audio signal may obtain metadata from the audio bitstream.
- the user data field of the audio bitstream may include metadata.
- the audio signal processing apparatus for rendering the audio signal may include a parser for parsing metadata from the audio bitstream.
- the decoder of the audio signal processing apparatus may obtain metadata from the audio bitstream.
- a parser of an audio signal processing apparatus obtains metadata from an audio bitstream.
- An audio renderer of the audio signal processing apparatus receives metadata from a parser.
- an audio decoder of the audio signal processing apparatus obtains metadata from an audio bitstream.
- An audio renderer of an audio signal processing apparatus receives metadata from a decoder of the audio signal processing apparatus. In FIGS. 12 to 13, other operations of the audio signal processing apparatus and the image signal processing apparatus may be the same as those described with reference to FIG. 9.
- When the audio signal processing apparatus receives the audio signal through streaming, it may begin receiving the audio signal from the middle of the stream. Therefore, information necessary to render the audio signal should be transmitted periodically. This will be described with reference to FIGS. 14 to 16.
- FIG. 14 illustrates a method in which an audio signal processing apparatus acquires metadata when an audio signal processing apparatus receives an audio signal through transport streaming according to an embodiment of the present invention.
- An audio signal processing apparatus that processes an audio signal to deliver an audio signal may periodically insert metadata into a multimedia stream.
- the audio signal processing apparatus which processes the audio signal to deliver the audio signal may insert metadata in the frame unit in the multimedia stream.
- an audio signal processing apparatus that processes an audio signal to deliver an audio signal may periodically insert the header parameter and the metadata parameter described above in the multimedia stream.
- the audio signal processing apparatus which processes the audio signal to transmit the audio signal may insert the header parameter into the multimedia stream at a larger period than the metadata parameter.
- the audio signal processing apparatus that processes the audio signal to deliver the audio signal may insert a header parameter into the corresponding frame.
- the audio signal processing apparatus that renders the audio signal may periodically acquire metadata from the multimedia stream.
- an audio signal processing apparatus that renders an audio signal may obtain metadata on a frame basis from a multimedia stream.
- when the audio signal processing apparatus that renders the audio signal acquires the metadata on a frame basis, it does not have to repack the audio signal and the metadata to synchronize the metadata with the audio signal.
- an audio signal processing apparatus that renders an audio signal may efficiently manage metadata and an audio signal. Specific syntax of the metadata will be described with reference to FIGS. 15 to 16.
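- The periodic insertion scheme of FIG. 14 can be sketched as follows (the packet labels and the period value are illustrative, not values from the disclosure): the metadata parameter accompanies every frame, while the header parameter is repeated at a larger period so that a receiver joining mid-stream recovers both.

```python
def packetize(frames, header, metadata_per_frame, header_period=8):
    """Interleave header/metadata parameters with audio frames.

    The header parameter is inserted at a larger period than the per-frame
    metadata parameter, as described for FIG. 14.
    """
    stream = []
    for i, (frame, meta) in enumerate(zip(frames, metadata_per_frame)):
        if i % header_period == 0:
            stream.append(("HDR", header))   # header at the larger period
        stream.append(("META", meta))        # metadata in frame units
        stream.append(("AU", frame))         # the audio access unit itself
    return stream

frames = [f"frame{i}" for i in range(16)]
metas = [{"frame": i} for i in range(16)]
stream = packetize(frames, {"version": 1}, metas, header_period=8)
headers = [p for p in stream if p[0] == "HDR"]
print(len(headers))  # 2 headers across 16 frames
```

Because every frame already carries its metadata, a receiver never needs to repack the stream to re-synchronize metadata with audio.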
- FIG. 15A illustrates a syntax for determining an ID of an element included in an AAC file by an audio signal processing apparatus according to an exemplary embodiment of the present invention.
- FIGS. 15(b) and 15(c) show the syntax of a data stream element parsing operation of an audio signal processing apparatus according to an embodiment of the present invention.
- the multimedia stream may include metadata in units of frames.
- when the AAC file is transmitted through streaming, it may have the syntax shown in FIGS. 15 to 16.
- the audio signal processing apparatus may determine whether an ID of an element included in the AAC file represents the data stream element ID_DSE. When the ID of an element included in the AAC file indicates a data stream element ID_DSE, the audio signal processing apparatus performs a data stream element parsing operation GaoReadDSE.
- FIG. 16(a) shows the syntax of the header parameter described above.
- FIG. 16(b) shows the syntax of the metadata parameter described above.
- the audio signal processing apparatus parses the header parameter (GaoReadDSEHDR) and parses the metadata parameter (GaoReadDSEMeta).
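- The dispatch described above can be sketched as follows. ID_DSE is the standard AAC syntactic element ID (4) for a data stream element; the payload structure and return values are hypothetical stand-ins for the GaoReadDSE / GaoReadDSEHDR / GaoReadDSEMeta operations named in FIGS. 15 to 16.

```python
ID_DSE = 4  # AAC syntactic element ID of a data stream element

def parse_element(element_id, payload):
    """Dispatch on the element ID; DSE payloads carry the metadata."""
    if element_id == ID_DSE:
        return read_dse(payload)  # corresponds to GaoReadDSE
    return None                   # other elements carry ordinary audio data

def read_dse(payload):
    """Parse the header parameter, then the metadata parameter."""
    header = payload.get("header")  # corresponds to GaoReadDSEHDR
    meta = payload.get("meta")      # corresponds to GaoReadDSEMeta
    return header, meta

result = parse_element(ID_DSE, {"header": {"v": 1}, "meta": {"az": 30}})
print(result)
```

Hiding the metadata in a DSE keeps the stream decodable by legacy AAC decoders, which are required to skip data stream elements they do not understand.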
- the number of channels that can be decoded / rendered by a legacy audio signal processing apparatus that does not support an embodiment of the present invention may be smaller than the number of channels that can be decoded / rendered by the audio signal processing apparatus according to an embodiment of the present invention.
- the legacy audio file format may also include only audio signals having a channel number smaller than the number of channels that the audio signal processing apparatus can decode / render. Therefore, it may be difficult to transmit an audio signal for an audio signal processing apparatus according to an embodiment of the present invention through a legacy audio file format.
- compatibility with legacy audio signal processing apparatus may be a problem. Therefore, an audio signal processing method using a legacy audio file format will be described with reference to FIG. 17.
- FIG. 17 is a view illustrating an audio signal processing method using an audio file format that supports a number of channels smaller than the sum of the number of channels included in an audio signal according to an embodiment of the present invention.
- the audio file may include a plurality of tracks.
- one audio file may include a plurality of tracks in which the dialogue of the same movie is recorded in different languages.
- the audio file may include a plurality of tracks containing different music.
- An audio signal processing apparatus that processes an audio signal to deliver an audio signal may encode an audio signal having more channels than the number of channels supported by the audio file using the track of the audio file into the audio file.
- an audio signal processing apparatus that processes an audio signal to deliver an audio signal may divide and insert a plurality of audio signal components of the audio signal into a plurality of tracks included in the audio file.
- the plurality of signal components may be at least one of an object signal, a channel signal, and an ambisonic signal.
- each track of an audio file can support only a number of channels smaller than the sum of the number of channels of the plurality of signal components.
- the number of channels of signal components included in each track of the audio file may be smaller than the number of channels supported by each track of the audio file.
- the audio signal processing apparatus that processes the audio signal to deliver the audio signal may insert a first signal component into a first track of the audio file format and insert a second signal component into a second track of the audio file.
- the first track may be a predetermined track.
- the first signal component may be an audio signal component that can be rendered without metadata for representing the position of the sound image simulated by the audio signal.
- the first signal component may be an audio signal component that may be rendered without metadata for binaural rendering.
- an audio signal processing apparatus that processes an audio signal to deliver an audio signal may insert signal components other than the first signal component according to a predetermined track order.
- an audio signal processing apparatus that processes an audio signal to transmit an audio signal may insert metadata into a first track.
- the metadata may indicate a track including signal components other than the first signal component. Metadata can also be used to render the audio signal.
- the metadata may be metadata described with reference to FIGS. 3 to 8.
- An audio signal processing apparatus for rendering an audio signal may simultaneously render audio signal components included in a plurality of tracks included in an audio file.
- the plurality of audio signal components may be at least one of an object signal, a channel signal, and an ambisonic signal.
- each track of the audio file may support a number of channels smaller than the sum of the number of channels of the plurality of audio signal components.
- the audio signal processing apparatus that renders the audio signal may render the first audio signal component included in the first track of the audio file and the second audio component included in the second track together.
- the first track may be a track at a predetermined position among the plurality of tracks as described above.
- the first track may be the first track of the plurality of tracks of the audio file.
- the audio signal processing apparatus that renders the audio signal may check whether the plurality of tracks of the audio file include audio signal components in a predetermined track order.
- an audio signal processing apparatus that renders an audio signal may acquire metadata from a first track and obtain an audio component based on the obtained metadata.
- the audio signal processing apparatus that renders the audio signal may determine a track including the audio signal component based on the obtained metadata.
- the audio signal processing apparatus that renders the audio signal may acquire metadata from the first track and render the audio signal component based on the metadata.
- the metadata may be metadata described with reference to FIGS. 3 to 8.
- the audio signal processing apparatus for rendering the audio signal may select a plurality of tracks included in the audio file according to the capability of the audio signal processing apparatus and render the selected plurality of tracks.
- the audio signal processing apparatus for rendering the audio signal may select the plurality of tracks according to the characteristics of the audio component included in each of the plurality of tracks and the capability of the audio signal processing apparatus.
- the audio signal processing apparatus for rendering the audio signal may select the first audio signal component and the second audio signal component according to the capabilities of the audio signal processing apparatus.
- an audio signal processing apparatus that processes an audio signal to deliver an audio signal encodes the FOA signal and metadata into one track as shown in FIG. 17 (a).
- an audio signal processing apparatus that processes an audio signal to deliver an audio signal may generate an AAC file included in an MP4 file as shown in FIG. 17(b).
- the audio signal processing apparatus for processing the audio signal to transmit the audio signal inserts the first ambisonic signal (FOA) and metadata into the first track (TRK0) of the AAC file.
- An audio signal processing apparatus which processes an audio signal to transmit an audio signal inserts a first object signal OBJ0 and a second object signal OBJ1 into a second track TRK1 of an AAC file.
- the audio signal processing apparatus which processes the audio signal to transmit the audio signal inserts the third object signal OBJ2 and the fourth object signal OBJ3 into the third track TRK2 of the AAC file. In addition, it inserts the fifth object signal OBJ4 and the sixth object signal OBJ5 into the fourth track TRK3 of the AAC file, and inserts the seventh object signal OBJ6 and the eighth object signal OBJ7 into the fifth track TRK4 of the AAC file. In addition, it inserts the second ambisonic signal FOA1 into the sixth track TRK5 of the AAC file.
- the second ambisonic signal FOA1 is a first-order ambisonic signal including four channels.
- the audio signal processing apparatus which processes the audio signal to transmit the audio signal inserts the third ambisonic signal HOA2 into the seventh track TRK6 of the AAC file.
- The third ambisonic signal HOA2 includes five channels, and the second ambisonic signal FOA1 and the third ambisonic signal HOA2 together constitute a second-order ambisonic signal.
- the audio signal processing apparatus that processes the audio signal to transmit the audio signal inserts the fourth ambisonic signal HOA3 into the eighth track TRK7 of the AAC file.
- The fourth ambisonic signal HOA3 includes seven channels, and the second ambisonic signal FOA1, the third ambisonic signal HOA2, and the fourth ambisonic signal HOA3 together constitute a third-order ambisonic signal.
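- The channel counts above follow from the ambisonic channel formula: an order-n signal has (n+1)² channels, so raising the order by one adds 2n+1 channels, which is exactly what each extra track carries (FOA1: 4, HOA2: 5, HOA3: 7). A worked check:

```python
def ambisonic_channels(order):
    """Total channel count of an ambisonic signal of the given order."""
    return (order + 1) ** 2

# Channels each additional track must carry to raise the order by one:
# (n+1)^2 - n^2 = 2n + 1.
foa = ambisonic_channels(1)                            # 4 (FOA1, TRK5)
extra_2nd = ambisonic_channels(2) - ambisonic_channels(1)  # 5 (HOA2, TRK6)
extra_3rd = ambisonic_channels(3) - ambisonic_channels(2)  # 7 (HOA3, TRK7)
print(foa, extra_2nd, extra_3rd)  # 4 5 7
```

This is why the split is backward-compatible: a receiver that stops after TRK5 still has a complete first-order signal, and each further track upgrades the order without duplicating channels.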
- a decoder of an audio signal processing apparatus that renders an audio signal decodes an audio signal included in a track of an AAC file.
- the decoder of the audio signal processing apparatus that renders the audio signal does not decode the metadata Meta included in the first track TRK0 of the AAC file.
- the audio signal processing apparatus that renders the audio signal may determine the track of the AAC file including the audio signal component based on the metadata Meta, and decode the audio signal included in that track of the AAC file.
- As shown in FIG. 17(c), a renderer of an audio signal processing apparatus that renders an audio signal may render the audio signal components (OBJ/HOA/CHN Audio) included in the tracks of the AAC file based on the metadata (OBJ/HOA/CHN Metadata).
- the audio signal processing apparatus that renders the audio signal may selectively render a plurality of tracks according to the capability of the audio signal processing apparatus. For example, an audio signal processing apparatus capable of rendering a signal including four channels may render the second ambisonic signal FOA1. Also, an audio signal processing apparatus capable of rendering a signal including nine channels may simultaneously render the second ambisonic signal FOA1 and the third ambisonic signal HOA2.
- the audio signal processing apparatus capable of rendering a signal including 16 channels may simultaneously render the second ambisonic signal FOA1, the third ambisonic signal HOA2, and the fourth ambisonic signal HOA3.
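- The capability-based selection can be sketched as follows (the track list mirrors the FIG. 17 example; the selection policy — take the longest prefix of order-increasing tracks that fits — is an illustrative reading of the disclosure, not a mandated algorithm):

```python
def select_ambisonic_tracks(tracks, channel_capability):
    """Pick the longest prefix of ambisonic tracks the device can render.

    `tracks` lists (name, channel_count) in order-increasing sequence,
    as in FIG. 17: FOA1 (4ch), HOA2 (5ch), HOA3 (7ch).
    """
    selected, used = [], 0
    for name, channels in tracks:
        if used + channels > channel_capability:
            break  # this track would exceed the device's channel budget
        selected.append(name)
        used += channels
    return selected

tracks = [("FOA1", 4), ("HOA2", 5), ("HOA3", 7)]
print(select_ambisonic_tracks(tracks, 4))    # ['FOA1']
print(select_ambisonic_tracks(tracks, 9))    # ['FOA1', 'HOA2']
print(select_ambisonic_tracks(tracks, 16))   # ['FOA1', 'HOA2', 'HOA3']
```

Because each prefix of tracks forms a complete lower-order signal, a 4-, 9-, or 16-channel renderer each obtains a valid ambisonic signal from the same file.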
- the audio signal processing apparatus that renders the audio signal may render an audio signal whose total number of channels is larger than the number of channels supported by an individual track of the audio file format.
- compatibility between audio signal processing apparatuses that support decoding/rendering of different numbers of channels may be ensured.
- FIG. 18 is a block diagram illustrating an audio signal processing apparatus that processes an audio signal to deliver an audio signal according to an embodiment of the present invention.
- an audio signal processing apparatus 300 for processing an audio signal to deliver an audio signal includes a receiver 310, a processor 330, and an output unit 370.
- the receiver 310 receives an input audio signal.
- the audio signal may be a sound received by the sound collector.
- the sound collection device may be a microphone.
- the sound collecting device may be a microphone array including a plurality of microphones.
- the processor 330 encodes the input audio signal received by the receiver 310 to generate a bitstream, and generates metadata about the audio signal.
- the processor 30 may include a format converter and a metadata generator.
- the format converter converts the format of the input audio signal into another format.
- the format converter may convert an object signal into an ambisonic signal.
- the ambisonic signal may be a signal recorded through the microphone array.
- the ambisonic signal may be a signal obtained by converting a signal recorded through a microphone array into a coefficient with respect to the basis of spherical harmonics.
- the format converter may convert an ambisonic signal into an object signal.
- the format converter may change the order of the ambisonic signal.
- the format converter may convert a higher-order ambisonics (HOA) signal into a first-order ambisonics (FOA) signal.
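- Lowering the ambisonic order is a simple truncation when the channels use a standard ordering; the sketch below assumes ACN channel ordering (an external convention, not specified in the disclosure), in which an order-n signal occupies the first (n+1)² channels.

```python
def truncate_to_order(hoa_channels, target_order):
    """Keep only the spherical-harmonic channels up to the target order.

    Assumes ACN ordering: an order-n signal is the first (n+1)^2 channels,
    so order reduction is a prefix slice.
    """
    return hoa_channels[: (target_order + 1) ** 2]

# A third-order signal has 16 channels; FOA keeps the first 4 (W, Y, Z, X).
hoa3 = [f"ch{i}" for i in range(16)]
foa = truncate_to_order(hoa3, 1)
print(len(foa))  # 4
```

The reverse conversion (FOA to HOA) is not a truncation and would require re-encoding or spatial upsampling, which is why the format converter treats the two directions separately.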
- the format converter may acquire position information related to the input audio signal, and convert the format of the input audio signal based on the acquired position information.
- the location information may be information about a microphone array in which a sound corresponding to an audio signal is collected.
- the information on the microphone array may include at least one of array information, number information, location information, frequency characteristic information, and beam pattern information of microphones constituting the microphone array.
- the position information related to the input audio signal may include information indicating the position of the sound source.
- the metadata generator generates metadata corresponding to the input audio signal.
- the metadata generator may generate metadata used to render the input audio signal.
- the metadata may be metadata in the embodiments described with reference to FIGS. 3 to 17.
- the metadata may be delivered to the audio signal processing apparatus according to the embodiments described with reference to FIGS. 9 to 17.
- the processor 330 may insert a plurality of audio signal components of the audio signal into a plurality of tracks included in the audio file format.
- the plurality of signal components may be at least one of an object signal, a channel signal, and an ambisonic signal.
- the processor 330 may operate as in the embodiment described with reference to FIG. 17.
- the output unit 370 outputs a bitstream and metadata.
- FIG. 19 is a flowchart illustrating a method of operating an audio signal processing apparatus that transmits an audio signal according to an embodiment of the present invention.
- the audio signal processing apparatus which processes the audio signal to transmit the audio signal receives the audio signal (S1901).
- the audio signal processing apparatus encodes the received audio signal (S1903).
- the audio signal processing apparatus may generate metadata about the audio signal.
- the metadata can be used to render the audio signal.
- the rendering may be binaural rendering.
- the audio signal processing apparatus may generate metadata about the audio signal, including information for reflecting the position of the sound image simulated by the audio signal.
- the audio signal processing apparatus may insert a sound level corresponding to the time interval indicated by the metadata into the metadata. In this case, the sound level may be used to determine whether to render the audio signal by reflecting the position of the sound image simulated by the audio signal.
- the audio signal processing apparatus may insert binaural effect intensity information indicating the binaural rendering intensity applied to the audio signal, into the metadata.
- the binaural effect intensity information may be used to change the relative size of the HRTF or the BRIR.
- the binaural effect intensity information may indicate the binaural rendering intensity for each audio signal component of the audio signal.
- the binaural effect intensity information may indicate the intensity of the binaural rendering applied on a frame basis.
- the audio signal processing apparatus may insert the motion application information indicating whether to render the audio signal by reflecting the movement of the listener in the metadata.
- the movement of the listener may include the movement of the head of the listener.
- the audio signal processing apparatus may insert personalization parameter application information indicating whether to allow the application of the personalization parameter, which is a parameter that may be set according to the listener, in the metadata.
- the personalization parameter application information may represent that personalization parameter application is not allowed.
- the format of specific metadata may be the same as the embodiments described with reference to FIGS. 3 to 16.
- the audio signal processing apparatus may generate an audio file including a plurality of audio signal components of the received audio signal in the plurality of tracks.
- the audio signal processing apparatus may generate an audio file including a first audio signal component of the audio signal in a first track and a second audio signal component of the audio signal in a second track.
- the number of channels of the audio signal supported by each of the first track and the second track may be smaller than the sum of the number of channels of the audio signal.
- the first track may be a track at a predetermined position among the plurality of tracks of the audio file.
- the first track may be the foremost track among the plurality of tracks.
- the audio signal encoding apparatus may insert metadata into the first track.
- the metadata may indicate which track of the plurality of tracks of the audio file includes an audio signal component of the audio signal.
- the audio signal processing apparatus may insert the plurality of audio signal components of the audio signal into the plurality of tracks in a specified order.
- an audio signal processing apparatus that processes an audio signal to transmit an audio signal may operate as in the embodiments described with reference to FIGS. 17 to 18.
- the audio signal processing apparatus outputs the encoded audio signal (S1905).
- the audio signal processing apparatus may output the generated metadata.
- the audio signal encoding apparatus may output the generated audio file.
- FIG. 20 is a flowchart illustrating a method of operating an audio signal processing apparatus that renders an audio signal according to an exemplary embodiment.
- the audio signal processing apparatus for rendering the audio signal receives the audio signal (S2001).
- the audio signal processing apparatus may receive an audio file including the audio signal.
- the audio signal processing apparatus renders the received audio signal (S2003).
- the audio signal processing apparatus may binaurally render the received audio signal.
- the audio signal processing apparatus may render the audio signal by reflecting the position of the sound image simulated by the audio signal based on metadata about the received audio signal.
- the audio signal processing apparatus may determine whether to render the audio signal by reflecting the position of the sound image simulated by the audio signal. In this case, the audio signal processing apparatus may render the audio signal according to the determination.
- the metadata may include sound level information indicating a sound level corresponding to a time interval indicated by the metadata.
- the audio signal processing apparatus may determine whether to render the audio signal by reflecting the position of the sound image simulated by the audio signal based on the sound level information. For example, the audio signal processing apparatus may compare the sound level of the audio signal corresponding to a first time interval with the sound level of the audio signal corresponding to a second time interval. In this case, the audio signal processing apparatus may determine whether to render the audio signal corresponding to the second time interval by reflecting the position of the sound image simulated by the audio signal, based on the comparison result. Here, the first time interval may be a time interval preceding the second time interval.
- the first time interval and the second time interval may be continuous time intervals.
- the audio signal processing apparatus may determine whether to render the audio signal by reflecting the position of the sound image simulated by the audio signal based on whether the sound level indicated by the sound level information is smaller than a predetermined value. In more detail, when the sound level information indicates mute, the audio signal processing apparatus may render the audio signal without reflecting the position of the sound image simulated by the audio signal.
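- The mute-based decision can be sketched as follows; the dB threshold is an illustrative value (the disclosure only says "smaller than a predetermined value"), and the preceding interval's level is passed in because the comparison-based policy described above could also use it.

```python
def should_spatialize(prev_level_db, cur_level_db, mute_floor_db=-60.0):
    """Decide whether to apply position-dependent (sound image) rendering.

    Skips spatialization for effectively mute intervals; prev_level_db is
    available for comparison-based policies between consecutive intervals.
    The -60 dB floor is an illustrative threshold, not from the patent.
    """
    return cur_level_db > mute_floor_db

print(should_spatialize(-20.0, -80.0))  # False: below the mute floor
print(should_spatialize(-20.0, -18.0))  # True: audible, spatialize
```

Skipping spatialization for mute intervals saves the cost of binaural filtering when there is no audible sound image to place.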
- the metadata may include binaural effect intensity information indicating the binaural rendering application intensity.
- the audio signal processing apparatus may determine the binaural rendering application strength of the audio signal based on the binaural effect intensity information. Also, the audio signal processing apparatus may binaurally render the audio signal at the determined binaural rendering application intensity. In detail, the audio signal processing apparatus may change a relative size of a head related transfer function (HRTF) or a binaural rendering impulse response (BRIR) for binaural rendering according to the determined binaural rendering application intensity.
- the binaural effect intensity information may indicate the binaural rendering intensity for each component of the audio signal.
- the binaural effect intensity information may indicate the binaural rendering intensity in units of frames.
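- One way to realize an adjustable binaural rendering intensity is a blend between the unprocessed signal and the HRIR-convolved signal; the linear crossfade below is an illustrative choice standing in for "changing the relative size of the HRTF or BRIR", not a method mandated by the disclosure.

```python
def apply_binaural_intensity(dry, wet, intensity):
    """Blend unprocessed and binaurally rendered samples by intensity.

    intensity=0.0 bypasses binaural rendering; 1.0 applies it at full
    strength. A linear blend is used here for illustration.
    """
    return [(1.0 - intensity) * d + intensity * w for d, w in zip(dry, wet)]

dry = [1.0, 0.5, -0.5]   # unprocessed samples
wet = [0.2, 0.1, -0.1]   # stand-in for HRIR-convolved samples
print(apply_binaural_intensity(dry, wet, 0.5))  # ~[0.6, 0.3, -0.3]
```

Because the intensity is a single scalar per component or per frame, the same mechanism serves both granularities the metadata can signal.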
- the audio signal processing apparatus may apply a fade-in/fade-out to the audio signal when switching between rendering with and without reflecting the position of the simulated sound image.
- the metadata may include motion application information indicating whether to render the audio signal by reflecting the movement of the listener.
- the audio signal processing apparatus may determine whether to render the audio signal by reflecting the movement of the listener based on the motion application information.
- the audio signal processing apparatus may render the audio signal without reflecting the listener's movement according to the motion application information.
- the movement of the listener may include the movement of the head of the listener.
- the metadata may include personalization parameter application information indicating whether to allow the application of the personalization parameter, which is a parameter that can be set according to the listener.
- the audio signal processing apparatus may render the audio signal based on the personalization parameter application information.
- the audio signal processing apparatus may render the audio signal without applying the personalization parameter according to the personalization parameter application information.
- the specific format of the metadata may be the same as the embodiment described with reference to FIGS. 3 to 16.
- the metadata may be delivered according to the embodiments described with reference to FIGS. 9 to 14.
- the audio signal processing apparatus may simultaneously render a plurality of audio signal components included in each of the plurality of tracks of the audio file including the audio signal.
- the audio signal processing apparatus may simultaneously render the first audio signal component included in the first track of the audio file including the audio signal and the second audio signal component included in the second track.
- the number of audio signal channels supported by each of the first track and the second track may be smaller than the total number of channels of the audio signal.
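The point of this constraint is that a signal with more channels than any single track supports can be split across several tracks and recombined at render time. A sketch, assuming each track carries at most `per_track` channels (the helper names are hypothetical):

```python
def split_into_tracks(channels, per_track):
    """Writer side: distribute the signal's channels over tracks that
    each support at most `per_track` channels."""
    return [channels[i:i + per_track]
            for i in range(0, len(channels), per_track)]

def merge_tracks(tracks):
    """Renderer side: gather the components from every track so they
    can be rendered simultaneously as one multi-channel signal."""
    return [ch for track in tracks for ch in track]
```

For example, an 8-channel signal stored in a format whose tracks carry at most 6 channels each ends up on two tracks, which the renderer reads and renders together.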
- the first track may be a track of a predetermined position among the plurality of tracks of the audio file.
- the first track may include metadata.
- the audio signal processing apparatus may determine a track of the audio file including the audio signal component based on the metadata.
- the audio signal processing apparatus may render the first audio signal component and the second audio signal component based on the metadata.
- the audio signal processing apparatus may binaurally render the first audio signal component and the second audio signal component based on the metadata.
- the audio signal processing apparatus may check in a predetermined track order whether the plurality of tracks of the audio file include audio signal components of the audio signal.
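The track check can be sketched as a walk over the file's tracks in their stored (predetermined) order, keeping those flagged as components of the signal. The `is_component` flag below is a hypothetical stand-in for whatever the actual metadata encodes:

```python
def find_component_tracks(tracks):
    """Check the tracks in their predetermined (stored) order and
    return the indices of those carrying an audio signal component."""
    return [i for i, track in enumerate(tracks)
            if track.get("is_component")]
```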
- the audio signal processing apparatus outputs the rendered audio signal (S2005). As described above, the audio signal processing apparatus may output the rendered audio signal through two or more loudspeakers. In another specific embodiment, the audio signal processing apparatus may output a rendered audio signal through two-channel stereo headphones.
Claims (16)
- An audio signal processing apparatus for rendering an audio signal, the apparatus comprising: a receiver that receives an audio file including an audio signal; a processor that simultaneously renders a first audio signal component included in a first track of the audio file and a second audio signal component included in a second track; and an output unit that outputs the rendered first audio signal component and the rendered second audio signal component.
- The audio signal processing apparatus of claim 1, wherein the number of audio signal channels supported by each of the first track and the second track is smaller than the total number of channels of the audio signal.
- The audio signal processing apparatus of claim 2, wherein the first track is a track at a predetermined position among a plurality of tracks of the audio file.
- The audio signal processing apparatus of claim 3, wherein the first audio signal component is an audio signal component that can be rendered without metadata expressing the position of the sound image simulated by the audio signal.
- The audio signal processing apparatus of claim 4, wherein the first audio signal component is an audio signal component that can be rendered without metadata for binaural rendering.
- The audio signal processing apparatus of claim 3, wherein the first track includes metadata, and the processor determines, based on the metadata, a track of the audio file that includes an audio signal component.
- The audio signal processing apparatus of claim 5, wherein the processor renders the first audio signal component and the second audio signal component based on the metadata.
- The audio signal processing apparatus of claim 3, wherein the processor checks, in a predetermined track order, whether a plurality of tracks of the audio file include audio signal components of the audio signal.
- The audio signal processing apparatus of claim 1, wherein the processor selects the first audio signal component and the second audio signal component from among a plurality of audio signal components included in a plurality of tracks of the audio file according to a capability of the audio signal processing apparatus.
- An audio signal processing apparatus for processing an audio signal for audio signal delivery, the apparatus comprising: a receiver that receives an audio signal; a processor that generates an audio file including a first audio signal component of the audio signal in a first track and a second audio signal component of the audio signal in a second track; and an output unit that outputs the audio file.
- The audio signal processing apparatus of claim 10, wherein the number of audio signal channels supported by each of the first track and the second track is smaller than the total number of channels of the audio signal.
- The audio signal processing apparatus of claim 10, wherein the first track is a track at a predetermined position among a plurality of tracks of the audio file.
- The audio signal processing apparatus of claim 12, wherein the first audio signal component is an audio signal component that can be rendered without metadata expressing the position of the sound image simulated by the audio signal.
- The audio signal processing apparatus of claim 13, wherein the first audio signal component is an audio signal component that can be rendered without metadata for binaural rendering.
- The audio signal processing apparatus of claim 12, wherein the processor inserts metadata into the first track, and the metadata indicates which of a plurality of tracks of the audio file includes an audio signal component of the audio signal.
- The audio signal processing apparatus of claim 12, wherein the processor inserts a plurality of audio signal components of the audio signal into a plurality of tracks of the audio file in a specified order.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019537729A JP2019533404A (ja) | 2016-09-23 | 2017-09-25 | バイノーラルオーディオ信号処理方法及び装置 |
US15/826,485 US10659904B2 (en) | 2016-09-23 | 2017-11-29 | Method and device for processing binaural audio signal |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2016-0122515 | 2016-09-23 | ||
KR20160122515 | 2016-09-23 | ||
KR20170018515 | 2017-02-10 | ||
KR10-2017-0018515 | 2017-02-10 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/826,485 Continuation US10659904B2 (en) | 2016-09-23 | 2017-11-29 | Method and device for processing binaural audio signal |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018056780A1 true WO2018056780A1 (ko) | 2018-03-29 |
Family
ID=61686917
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2017/010564 WO2018056780A1 (ko) | 2016-09-23 | 2017-09-25 | 바이노럴 오디오 신호 처리 방법 및 장치 |
Country Status (3)
Country | Link |
---|---|
US (1) | US10356545B2 (ko) |
JP (1) | JP2019533404A (ko) |
WO (1) | WO2018056780A1 (ko) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10659904B2 (en) | 2016-09-23 | 2020-05-19 | Gaudio Lab, Inc. | Method and device for processing binaural audio signal |
CN113170274A (zh) * | 2018-11-21 | 2021-07-23 | 诺基亚技术有限公司 | 环境音频表示和相关联的渲染 |
JP2022528837A (ja) * | 2019-03-27 | 2022-06-16 | ノキア テクノロジーズ オサケユイチア | 音場関連のレンダリング |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2563635A (en) | 2017-06-21 | 2018-12-26 | Nokia Technologies Oy | Recording and rendering audio signals |
GB2566992A (en) * | 2017-09-29 | 2019-04-03 | Nokia Technologies Oy | Recording and rendering spatial audio signals |
CN115334444A (zh) * | 2018-04-11 | 2022-11-11 | 杜比国际公司 | 用于音频渲染的预渲染信号的方法、设备和系统 |
TWI698132B (zh) | 2018-07-16 | 2020-07-01 | 宏碁股份有限公司 | 音效輸出裝置、運算裝置及其音效控制方法 |
CN110740415B (zh) * | 2018-07-20 | 2022-04-26 | 宏碁股份有限公司 | 音效输出装置、运算装置及其音效控制方法 |
EP3617871A1 (en) * | 2018-08-28 | 2020-03-04 | Koninklijke Philips N.V. | Audio apparatus and method of audio processing |
US11798569B2 (en) * | 2018-10-02 | 2023-10-24 | Qualcomm Incorporated | Flexible rendering of audio data |
US11019449B2 (en) * | 2018-10-06 | 2021-05-25 | Qualcomm Incorporated | Six degrees of freedom and three degrees of freedom backward compatibility |
WO2020080099A1 (ja) * | 2018-10-16 | 2020-04-23 | ソニー株式会社 | 信号処理装置および方法、並びにプログラム |
JP7285967B2 (ja) * | 2019-05-31 | 2023-06-02 | ディーティーエス・インコーポレイテッド | フォービエイテッドオーディオレンダリング |
JP7432225B2 (ja) * | 2020-01-22 | 2024-02-16 | クレプシードラ株式会社 | 音再生記録装置、及びプログラム |
US11381209B2 (en) | 2020-03-12 | 2022-07-05 | Gaudio Lab, Inc. | Audio signal processing method and apparatus for controlling loudness level and dynamic range |
US12010505B2 (en) * | 2021-05-19 | 2024-06-11 | Snap Inc. | Low latency, low power multi-channel audio processing |
EP4392970A1 (en) * | 2021-08-26 | 2024-07-03 | Dolby Laboratories Licensing Corporation | Method and apparatus for metadata-based dynamic processing of audio data |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20080053875A (ko) * | 2006-12-11 | 2008-06-16 | 한국전자통신연구원 | 가상현실을 위한 오디오 음상 제어 장치 및 그 방법 |
US20110264456A1 (en) * | 2008-10-07 | 2011-10-27 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Binaural rendering of a multi-channel audio signal |
KR20140027954A (ko) * | 2011-03-16 | 2014-03-07 | 디티에스, 인코포레이티드 | 3차원 오디오 사운드트랙의 인코딩 및 재현 |
KR20140125745A (ko) * | 2013-04-19 | 2014-10-29 | 한국전자통신연구원 | 다채널 오디오 신호 처리 장치 및 방법 |
US20150199973A1 (en) * | 2012-09-12 | 2015-07-16 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for providing enhanced guided downmix capabilities for 3d audio |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2898885C (en) * | 2013-03-28 | 2016-05-10 | Dolby Laboratories Licensing Corporation | Rendering of audio objects with apparent size to arbitrary loudspeaker layouts |
TWI530941B (zh) * | 2013-04-03 | 2016-04-21 | 杜比實驗室特許公司 | 用於基於物件音頻之互動成像的方法與系統 |
CN108712711B (zh) * | 2013-10-31 | 2021-06-15 | 杜比实验室特许公司 | 使用元数据处理的耳机的双耳呈现 |
US10375439B2 (en) * | 2014-05-30 | 2019-08-06 | Sony Corporation | Information processing apparatus and information processing method |
US20180165358A1 (en) * | 2014-06-30 | 2018-06-14 | Sony Corporation | Information processing apparatus and information processing method |
EP3198594B1 (en) * | 2014-09-25 | 2018-11-28 | Dolby Laboratories Licensing Corporation | Insertion of sound objects into a downmixed audio signal |
JP6729382B2 (ja) * | 2014-10-16 | 2020-07-22 | ソニー株式会社 | 送信装置、送信方法、受信装置および受信方法 |
KR101627652B1 (ko) * | 2015-01-30 | 2016-06-07 | 가우디오디오랩 주식회사 | 바이노럴 렌더링을 위한 오디오 신호 처리 장치 및 방법 |
US10136240B2 (en) * | 2015-04-20 | 2018-11-20 | Dolby Laboratories Licensing Corporation | Processing audio data to compensate for partial hearing loss or an adverse hearing environment |
2017
- 2017-09-25 JP JP2019537729A patent/JP2019533404A/ja active Pending
- 2017-09-25 WO PCT/KR2017/010564 patent/WO2018056780A1/ko active Application Filing
- 2017-09-25 US US15/715,062 patent/US10356545B2/en active Active
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10659904B2 (en) | 2016-09-23 | 2020-05-19 | Gaudio Lab, Inc. | Method and device for processing binaural audio signal |
CN113170274A (zh) * | 2018-11-21 | 2021-07-23 | 诺基亚技术有限公司 | 环境音频表示和相关联的渲染 |
CN113170274B (zh) * | 2018-11-21 | 2023-12-15 | 诺基亚技术有限公司 | 环境音频表示和相关联的渲染 |
US11924627B2 (en) | 2018-11-21 | 2024-03-05 | Nokia Technologies Oy | Ambience audio representation and associated rendering |
JP2022528837A (ja) * | 2019-03-27 | 2022-06-16 | ノキア テクノロジーズ オサケユイチア | 音場関連のレンダリング |
US12058511B2 (en) | 2019-03-27 | 2024-08-06 | Nokia Technologies Oy | Sound field related rendering |
Also Published As
Publication number | Publication date |
---|---|
US20180091917A1 (en) | 2018-03-29 |
US10356545B2 (en) | 2019-07-16 |
JP2019533404A (ja) | 2019-11-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17853494 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2019537729 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 26/07/2019) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17853494 Country of ref document: EP Kind code of ref document: A1 |