WO2015152666A1

WO2015152666A1 - Method and device for decoding audio signal comprising HOA signal

Info

Publication number: WO2015152666A1
Application number: PCT/KR2015/003334
Authority: WO
Inventors: 전상배; 김선민
Original assignee: 삼성전자 주식회사
Priority date: 2014-04-02
Filing date: 2015-04-02
Publication date: 2015-10-08

Abstract

Disclosed is a device for decoding an audio signal comprising an HOA signal, comprising: an audio core codec for decoding a bit stream comprising an audio signal to output the HOA signal of a frequency domain or a time domain; and an HOA decoder for rendering and outputting an HOA signal of the frequency domain in the frequency domain.

Description

Method and apparatus for decoding an audio signal comprising a HOA signal

The present invention relates to a method and apparatus for decoding an audio signal comprising a higher order ambisonics (HOA) signal.

As the quality of multimedia contents increases, high quality multichannel audio signals such as 7.1 channels, 10.2 channels, 13.2 channels, and 22.2 channels are used, which have more channels than the 5.1 audio signals. However, high-quality multi-channel audio signals are often heard through two-channel stereo speakers or headphones through a personal terminal such as a smartphone or a PC.

Accordingly, binaural rendering may be used, which downmixes the multichannel audio signal into the stereo audio signal so that a high quality multichannel audio signal can be listened to in two channels of stereo speakers or headphones.

However, there is a problem in that the amount of computation increases during binaural rendering as the number of channels of the input audio signal increases.

The present invention relates to a method and apparatus for decoding an audio signal including a HOA signal for reducing the amount of computation during binaural rendering.

According to an embodiment, as the conversion between the time domain and the frequency domain is not performed during binaural rendering, the complexity in the audio decoding stage may be reduced.

FIG. 1 is a block diagram illustrating an internal structure of an audio decoder including a HOA decoder according to an embodiment.

FIG. 2 is a block diagram illustrating an internal structure of an audio decoder according to an embodiment.

FIG. 3 is a flowchart illustrating a method of decoding an audio signal including a HOA signal according to an embodiment.

FIG. 4 is a flowchart illustrating a method of decoding an audio signal including a HOA signal according to a processing domain of a HOA decoder according to an embodiment.

FIG. 5 is a block diagram illustrating an internal structure of an audio decoder according to an embodiment.

An apparatus for decoding an audio signal including a HOA signal, according to an embodiment, comprises: an audio core codec for decoding a bitstream including an audio signal and outputting the HOA signal in a frequency domain or a time domain; and a HOA decoder for rendering, in the frequency domain, the HOA signal of the frequency domain and outputting the rendered HOA signal.

The apparatus may further comprise: a mixer for mixing, in the frequency domain, the rendered frequency-domain HOA signal with other audio signals; and a binaural renderer for binaurally rendering, in the frequency domain, the signal mixed by the mixer.

The audio core codec outputs a HOA signal in the frequency domain when the processing domain of the HOA decoder is neutral or the frequency domain, and the HOA decoder renders the frequency-domain HOA signal in the frequency domain and outputs the rendered signal.

When the domain conversion methods of the audio core codec and the binaural renderer are the same, the processing domain of the HOA decoder is determined to be neutral or the frequency domain.

According to an embodiment, a method of decoding an audio signal including a HOA signal comprises: decoding a bitstream including an audio signal and outputting the HOA signal in a frequency domain or a time domain; and rendering, in the frequency domain, the HOA signal of the frequency domain and outputting the rendered HOA signal.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. However, in the following description and the accompanying drawings, detailed descriptions of well-known functions or configurations that may obscure the subject matter of the present invention will be omitted. In addition, it should be noted that like elements are denoted by the same reference numerals as much as possible throughout the drawings.

The terms and words used in the specification and claims below should not be construed as limited to their ordinary or dictionary meanings. Based on the principle that inventors may appropriately define terms in order to explain their own invention in the best way, they should be interpreted with meanings and concepts that correspond to the technical idea of the present invention. Therefore, the embodiments described in the present specification and the configurations shown in the drawings are only the most preferred embodiments of the present invention and do not represent all of its technical ideas, and it should be understood that various equivalents and variations that could replace them may exist at the time of the present application.

In the accompanying drawings, some components are exaggerated, omitted, or schematically illustrated, and the size of each component does not entirely reflect the actual size. The invention is not limited by the relative size or spacing drawn in the accompanying drawings.

When any part of the specification is said to "include" a component, this means that it may further include other components rather than excluding them, unless otherwise stated. In addition, when a part is "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element between them.

In addition, the term "part" (or "unit") as used herein refers to software or a hardware component such as an FPGA or ASIC, and a "part" performs certain roles. However, "part" is not meant to be limited to software or hardware. A "part" may be configured to reside in an addressable storage medium and may be configured to run on one or more processors. Thus, as an example, a "part" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, properties, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functionality provided within the components and "parts" may be combined into a smaller number of components and "parts" or further separated into additional components and "parts".

Also, in the present specification, an audio object refers to each of sound components included in an audio signal. One audio signal may include various audio objects. For example, the audio signal generated by recording the performance of an orchestra includes a plurality of audio objects generated from a plurality of musical instruments such as guitar, violin, and oboe.

In addition, in the present specification, the HOA signal refers to a signal in which the audio signal is represented by coefficients representing a three-dimensional sound field. The HOA signal is one of content types for representing an audio signal such as an object and a channel. The HOA signal may be included in the bitstream in addition to the information about the channel and the object, and may be rendered as a channel through which the audio signal is output by the HOA decoder.

In addition, in the present specification, for convenience of description, the method of rendering the HOA signal is described as a reference, but it is not limited thereto, and the exemplary embodiments described herein may be applied to a method of rendering various types of audio signals.

In addition, in the present specification, the processing domain refers to a domain in which a corresponding component is operated. The processing domain can be set to one of time domain, frequency domain and neutral. The processing domain of components that can operate in either the time domain or the frequency domain may be set to neutral.
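The three processing-domain values described above can be sketched as a small enumeration. This is only an illustrative sketch: the names `ProcessingDomain` and `can_operate_in` are hypothetical and do not come from the specification or from any codec API.

```python
from enum import Enum


class ProcessingDomain(Enum):
    """Processing-domain values named in the description (identifiers are hypothetical)."""
    TIME = "time"
    FREQUENCY = "frequency"
    NEUTRAL = "neutral"


def can_operate_in(component_domain: ProcessingDomain,
                   signal_domain: ProcessingDomain) -> bool:
    """Return True if a component with the given processing domain accepts the signal.

    A neutral component can operate in either domain; otherwise the
    component's domain must equal the signal's domain.
    """
    if signal_domain is ProcessingDomain.NEUTRAL:
        raise ValueError("a signal is in either the time or the frequency domain")
    return (component_domain is ProcessingDomain.NEUTRAL
            or component_domain is signal_domain)
```

For instance, a neutral HOA decoder accepts a frequency-domain HOA signal, whereas a time-domain HOA decoder does not.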

DETAILED DESCRIPTION

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art may easily implement the present invention. As those skilled in the art would realize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. In the drawings, parts irrelevant to the description are omitted in order to clearly describe the present invention, and like reference numerals designate like parts throughout the specification.

In one embodiment, the described technique is described based on the Moving Picture Experts Group-H (MPEG-H) standard, but is not limited thereto and may be applied to other audio coding techniques.

Hereinafter, exemplary embodiments of the present invention will be described with reference to the accompanying drawings.

The audio decoder 100 of FIG. 1 may include an audio core codec 110, a format converter 120, an object renderer 130, a HOA decoder 140, a mixer 150, and a binaural renderer 160. The audio decoder 100 is not limited to the components shown in FIG. 1, and may further include other components as necessary.

The audio core codec 110 may extract a plurality of channels, objects, and a HOA signal by decoding a bitstream including an audio signal. The audio core codec 110 according to an embodiment may be a unified speech and audio coding (USAC) core decoder. The audio core codec 110 may be various types of decoders for decoding a bitstream including an audio signal.

The audio core codec 110 according to an embodiment may decode the audio signal using a spectral band replication (SBR) technique that obtains a high band signal from a low band signal and a parameter. When decoding data using the SBR technology, the audio core codec 110 may output a decoded audio signal in the frequency domain. The audio core codec 110 may output the HOA signal in the frequency domain by decoding the bitstream.

The audio core codec 110 may demultiplex the bitstream, perform time-to-frequency (T/F) conversion on the demultiplexed data, and perform a main process that extracts channels, objects, HOA signals, and the like from the bitstream in the frequency domain. The audio core codec 110 may either convert the HOA signal extracted by the main process into the time domain or output the frequency-domain HOA signal without domain conversion. Specifically, the audio core codec 110 may convert the frequency-domain HOA signal into a time-domain value and output it, depending on the processing domain of the HOA decoder 140. The processing domain of the HOA decoder 140 may be determined according to whether the domain conversion methods of the audio core codec 110 and the binaural renderer 160 are the same.

Whether the methods are the same may be determined by comparing the domain conversion method by which the audio core codec 110 converts the HOA signal to the time domain for output with the domain conversion method by which the binaural renderer 160 converts the input time-domain HOA signal to the frequency domain.

For example, when the audio core codec 110 converts the domain of the HOA signal by QMF synthesis and the binaural renderer 160 converts the domain of the audio signal by the QMF analysis corresponding to that QMF synthesis, it may be determined that domain conversion is performed by the same method. Likewise, conversion by inverse fast Fourier transform (IFFT) and by fast Fourier transform (FFT) may be determined to be the same domain conversion method.

On the other hand, when the audio core codec 110 converts the domain of the HOA signal by QMF synthesis and the binaural renderer 160 converts the domain of the HOA signal by FFT, it may be determined that domain conversion is performed by different methods.
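The comparison in the examples above can be sketched as a lookup that pairs each synthesis (F/T) method with its corresponding analysis (T/F) method. The table and the string labels below are hypothetical and cover only the two pairs named in the text.

```python
# Hypothetical table: each F/T method of the core codec mapped to the
# T/F method of the binaural renderer that counts as "the same method".
MATCHING_ANALYSIS = {
    "QMF_SYNTHESIS": "QMF_ANALYSIS",
    "IFFT": "FFT",
}


def same_conversion_method(codec_synthesis: str, renderer_analysis: str) -> bool:
    """Return True if the renderer's T/F method corresponds to the codec's F/T method."""
    return MATCHING_ANALYSIS.get(codec_synthesis) == renderer_analysis
```

With this table, QMF synthesis paired with QMF analysis (or IFFT with FFT) is judged the same, while QMF synthesis paired with FFT is judged different.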

When it is determined that domain conversion is performed by the same method, the domain conversion process of the audio core codec 110 and the domain conversion process of the binaural renderer 160 may be omitted, and the HOA decoder 140 may render the HOA signal in the frequency domain. When the processing domain of the HOA decoder 140 is the time domain, the audio core codec 110 may convert the decoded frequency-domain HOA signal into a time-domain value and output the converted signal. If the domain conversion method of the audio core codec 110 and the domain conversion method of the binaural renderer 160 do not correspond to each other, the HOA decoder 140 may operate in the time domain.

In addition, when the processing domain of the HOA decoder 140 is a neutral or frequency domain, the audio core codec 110 may output a decoded HOA signal of a frequency domain without domain conversion.
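The decision described above can be sketched as two small functions. The function names and string labels are hypothetical; the text allows either "frequency" or "neutral" when the methods match, and this sketch arbitrarily returns "frequency" in that case.

```python
def hoa_processing_domain(methods_match: bool) -> str:
    """Pick the HOA decoder's processing domain from the comparison result.

    Assumption: 'frequency' is returned when the methods match, although the
    description also allows 'neutral' in that case.
    """
    return "frequency" if methods_match else "time"


def core_codec_output_domain(processing_domain: str) -> str:
    """Domain in which the core codec outputs the decoded HOA signal."""
    if processing_domain in ("frequency", "neutral"):
        return "frequency"  # output without domain conversion
    if processing_domain == "time":
        return "time"       # F/T conversion is applied before output
    raise ValueError(f"unknown processing domain: {processing_domain!r}")
```

Chaining the two, matching conversion methods lead to a frequency-domain HOA output, and mismatched methods lead to a time-domain output.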

The format converter 120 may convert the audio signal to be output to each channel according to the output environment in which the audio signal is to be output. Among the data output by the audio core codec 110, channel information and pre-rendered object information may be input to the format converter 120 as the audio signal to be output through each channel. The output environment may include layout information, performance information, and the like of the speakers through which the audio signal is output. Since the actual output environment may differ from the output environment assumed at the time of encoding, the format converter 120 may convert the audio signal based on information about the environment in which the audio signal is actually output.

The object renderer 130 may render the audio object at a predetermined spatial position based on metadata regarding the audio object.

The HOA decoder 140 may render a HOA signal including the HOA coefficients and the HOA side information output by the audio core codec 110 in multiple channels. HOA coefficients are values representing an audio signal in a three-dimensional sound field space. Based on the HOA side information, the HOA signal can be rendered in multiple channels. The HOA decoder 140 may render the HOA signal and output the rendered HOA signal. The HOA decoder may be referred to as a renderer, a HOA renderer, or the like.
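One reason rendering can be moved between domains is that a linear renderer commutes with a linear transform. The sketch below assumes, purely for illustration, that rendering is a fixed matrix applied to the HOA coefficient signals and uses a plain DFT as a stand-in for the codec's transform; the specification prescribes neither, and all numeric values are invented toy data.

```python
import cmath


def dft(x):
    """Plain discrete Fourier transform of a list of samples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def render(matrix, signals):
    """Apply a (channels x coefficients) matrix sample by sample (or bin by bin)."""
    length = len(signals[0])
    return [[sum(row[c] * signals[c][i] for c in range(len(signals)))
             for i in range(length)]
            for row in matrix]


# Toy data: 3 HOA coefficient signals of 4 samples each, 2 output channels.
hoa_time = [[1.0, 0.0, -1.0, 0.5],
            [0.5, 0.25, 0.0, -0.5],
            [0.0, 1.0, 0.0, -1.0]]
render_matrix = [[0.5, 0.5, 0.0],
                 [0.5, -0.5, 0.25]]

# Path A: render in the time domain, then transform each channel.
time_then_transform = [dft(ch) for ch in render(render_matrix, hoa_time)]
# Path B: transform each coefficient signal, then render in the frequency domain.
transform_then_render = render(render_matrix, [dft(c) for c in hoa_time])

max_error = max(abs(a - b)
                for row_a, row_b in zip(time_then_transform, transform_then_render)
                for a, b in zip(row_a, row_b))
```

Under these assumptions both paths give the same frequency-domain channels up to floating-point rounding, which is why `max_error` stays near zero.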

The processing domain of HOA decoder 140 may be time domain, frequency domain or neutral. The processing domain of the HOA decoder 140 may be determined according to whether the domain conversion methods of the audio core codec 110 and the binaural renderer 160 are the same.

When the processing domain of the HOA decoder 140 is the time domain, the HOA decoder 140 may receive the HOA signal of the time domain. As the domain conversion method of the audio core codec 110 and the binaural renderer 160 is determined to be different, the processing domain of the HOA decoder 140 may be determined as the time domain. In addition, the HOA decoder 140 may render the HOA signal in the time domain.

On the other hand, the HOA decoder 140 in which the processing domain is the neutral or frequency domain can be operated in the frequency domain. Therefore, the audio core codec 110 may output the HOA signal in the frequency domain to the HOA decoder 140 without domain conversion. As the domain conversion method of the audio core codec 110 and the binaural renderer 160 is determined to be the same, the processing domain of the HOA decoder 140 may be determined as the frequency domain or the neutral. The HOA decoder 140 may render the HOA signal in the frequency domain in the frequency domain and output the rendered HOA signal in the frequency domain.

The mixer 150 may mix a plurality of audio signals such as a rendered object, a rendered HOA signal, and channel information. The mixer 150 is not limited to the example described above, and may mix various types of audio signals. The mixer 150 may appropriately convert and mix the volume, tone, and the like of the rendered audio signals. The mixer 150 may output a mixed audio signal that may be output to each channel. The mixer 150 may output the mixed audio signal of the frequency domain or the time domain according to the input signal. When the mixer 150 mixes the audio signal of the frequency domain, the mixer 150 may output the mixed audio signal of the frequency domain. When the mixer 150 mixes audio signals in the time domain, the mixer 150 may output the mixed audio signals in the time domain.
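The mixer behaviour described above, where the output domain follows the input domain, can be sketched as follows. This is a simplified illustration with one channel per input and a hypothetical `(domain, samples)` tuple representation; per-input gains stand in for the volume adjustment mentioned in the text.

```python
def mix(signals, gains=None):
    """Mix same-domain signals and return (domain, mixed_samples).

    signals: list of (domain, samples) tuples, all in one domain and of equal length.
    gains:   optional per-signal gain factors (defaults to 1.0 for every signal).
    """
    domains = {domain for domain, _ in signals}
    if len(domains) != 1:
        raise ValueError("all inputs to the mixer must share one domain")
    if gains is None:
        gains = [1.0] * len(signals)
    length = len(signals[0][1])
    mixed = [sum(g * samples[i] for g, (_, samples) in zip(gains, signals))
             for i in range(length)]
    return domains.pop(), mixed
```

For example, mixing a rendered HOA signal and a channel signal that are both labelled "frequency" yields a mixed signal labelled "frequency", while mixing a "time" input with a "frequency" input is rejected.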

The binaural renderer 160 may downmix the audio signal mixed by the mixer 150 and binaurally render it into a 2-channel signal. The binaural renderer 160 may binaurally render the multi-channel mixed audio signal so that it can be output to two-channel stereo speakers or headphones through a terminal device such as a smartphone or a PC.

When the multi-channel audio signal input to the binaural renderer 160 is a time-domain signal, the binaural renderer 160 may convert it into a frequency-domain audio signal. When the domain conversion methods of the audio core codec 110 and the binaural renderer 160 are determined to be different, the processing domain of the HOA decoder 140 may be determined to be the time domain, and a time-domain audio signal may be input to the binaural renderer 160. The binaural renderer 160 may then perform binaural rendering on the audio signal converted to the frequency domain.

On the other hand, when the multi-channel audio signal input to the binaural renderer 160 is a frequency-domain signal, the binaural renderer 160 may perform binaural rendering on it without domain conversion. When the domain conversion methods of the audio core codec 110 and the binaural renderer 160 are determined to be the same, the processing domain of the HOA decoder 140 may be determined to be the frequency domain or neutral, and a frequency-domain audio signal may be input to the binaural renderer 160. The binaural renderer 160 may generate two channels of binaural signals in the frequency domain by performing binaural rendering. The binaural renderer 160 may then convert the binaural signal into the time domain and output a time-domain audio signal.
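The two input cases of the binaural renderer can be sketched as a list of processing stages. The function name and stage labels are hypothetical and only mirror the order of operations stated in the text.

```python
def binaural_renderer_stages(input_domain: str) -> list:
    """List the stages the binaural renderer runs for a given input domain."""
    stages = []
    if input_domain == "time":
        stages.append("T/F conversion")  # only needed for time-domain input
    elif input_domain != "frequency":
        raise ValueError(f"unknown input domain: {input_domain!r}")
    stages.append("binaural rendering in the frequency domain")
    stages.append("F/T conversion")      # output is always a time-domain signal
    return stages
```

A frequency-domain input therefore skips one stage compared with a time-domain input.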

According to an embodiment, when the HOA decoder 140 operates in the frequency domain, the rendered HOA signal output from the HOA decoder 140 is a signal of the frequency domain. Accordingly, the mixer 150 may output the mixed audio signal in the frequency domain, and the audio signal in the frequency domain may be input to the binaural renderer 160. Accordingly, since the binaural renderer 160 may perform binaural rendering without converting the audio signal of the time domain into the frequency domain, the complexity of the audio decoding stage may be reduced.

When the method by which the audio core codec 110 would convert the decoded frequency-domain HOA signal into a time-domain value is the same as the method by which the binaural renderer 160 would convert the time-domain audio signal into the frequency domain, the HOA decoder 140 may render the HOA signal in the frequency domain. For example, when the audio core codec 110 converts the domain of the HOA signal by QMF synthesis and the binaural renderer 160 converts the domain of the audio signal by the QMF analysis corresponding to that QMF synthesis, the domain conversion processes in the audio core codec 110 and the binaural renderer 160 may be omitted, and the HOA decoder 140 may render the HOA signal in the frequency domain.

In addition, when the HOA decoder 140 operates in the frequency domain, the HOA signal rendered by the HOA decoder 140 is a signal in the frequency domain, and the rendered channels and objects are also signals in the frequency domain. Thus, a unified interface in the frequency domain may be provided for the post-rendering process for signals such as HOA signals, channels, objects, and the like.

The binaural renderer 160 may binaurally render not only the mixed HOA signal but also the signals of the mixed channel and the object. The mixed channel and object signals that may be input to the binaural renderer 160 may be signals in a frequency domain. Thus, the binaural renderer 160 may binaurally render the signals of the mixed channels and objects in the frequency domain.

When a mixed HOA signal in the time domain is input to the binaural renderer 160, the binaural renderer 160 may, unlike for the channel and object signals, first convert the mixed HOA signal into the frequency domain and then perform binaural rendering, which is its main process. However, when the HOA decoder 140 operates in the frequency domain, the HOA decoder 140 outputs the rendered HOA signal in the frequency domain, so the binaural renderer 160 may receive the mixed HOA signal in the frequency domain. Accordingly, the binaural renderer 160 may perform binaural rendering in the frequency domain on the mixed frequency-domain HOA signal without domain conversion, in the same way as for the other frequency-domain audio signals, for example the channel and object signals.

When a binaural signal of two channels in the frequency domain is generated by the binaural renderer 160, the binaural signal in the frequency domain may be converted into a value in the time domain according to F / T conversion. The binaural signal in the frequency domain may be converted into the time domain such that the binaural signal is sequentially output through the audio output device over time. The converted binaural signal of the time domain may be finally output through a two-channel audio output device such as a speaker or a headphone.
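The final F/T conversion can be illustrated with an inverse transform that recovers time-domain samples from a two-channel frequency-domain signal. A plain DFT/inverse DFT pair is used here only as a stand-in for whatever transform the decoder actually uses, and the sample values are invented.

```python
import cmath


def dft(x):
    """Plain discrete Fourier transform (T/F conversion stand-in)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT (F/T conversion stand-in); returns real time-domain samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


# Toy two-channel binaural signal, built in the frequency domain from known samples.
left = [0.0, 0.5, 1.0, 0.5]
right = [0.25, -0.25, 0.0, 0.75]
binaural_freq = [dft(left), dft(right)]

# F/T conversion of each channel before output to speakers or headphones.
binaural_time = [idft(channel) for channel in binaural_freq]
```

Each recovered channel matches its original samples up to floating-point rounding, so it can be played out sample by sample over time.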

The audio decoder 200 according to an embodiment may be a terminal device that can be used by a user. For example, the audio decoder 200 may be a smart television (TV), an ultra high definition (UHD) TV, a monitor, a personal computer (PC), a notebook computer, a mobile phone, a tablet PC, a navigation terminal, a smartphone, a personal digital assistant (PDA), a portable multimedia player (PMP), or a digital broadcast receiver.

Referring to FIG. 2, the audio decoder 200 may include an audio core codec 210 and a HOA decoder 220. The audio core codec 210 and the HOA decoder 220 of FIG. 2 may correspond to the audio core codec 110 and the HOA decoder 140 of FIG. 1, respectively.

The audio core codec 210 may obtain a HOA signal by decoding a bitstream including an audio signal. When decoding data using the SBR technology, the audio core codec 210 may obtain a decoded audio signal in the frequency domain. The audio core codec 210 may convert the frequency-domain HOA signal into a time-domain value and output it, depending on the processing domain of the HOA decoder 220. The processing domain of the HOA decoder 220 may be determined depending on whether the domain conversion methods of the audio core codec 210 and the binaural renderer 160 are the same.

When the processing domain of the HOA decoder 220 is the time domain, the audio core codec 210 may convert the decoded frequency-domain HOA signal into a time-domain value and output it. In addition, when the processing domain of the HOA decoder 220 is neutral or the frequency domain, the audio core codec 210 may output the decoded frequency-domain HOA signal without domain conversion.

The HOA decoder 220 may render the HOA signal output by the audio core codec 210.

Meanwhile, when the processing domain of the HOA decoder 220 is neutral or the frequency domain, the HOA decoder 220 may render the frequency-domain HOA signal input from the audio core codec 210. The HOA decoder 220 may render the HOA signal in the frequency domain and output the rendered HOA signal in the frequency domain.

The HOA signal rendered by the HOA decoder 220 may be mixed with other audio signals in the frequency domain and then binaurally rendered and finally output.

Referring to FIG. 3, in operation S310, the audio core codec 210 may decode a bitstream and output a HOA signal in a frequency domain or a time domain. The HOA signal may include HOA coefficients and HOA side information. When decoding data using the SBR technique, the audio core codec 210 may obtain a decoded audio signal in the frequency domain and output the decoded HOA signal in the time domain or the frequency domain according to the processing domain of the HOA decoder 220.

When the processing domain of the HOA decoder 220 is the time domain, the audio core codec 210 may output the time domain HOA signal by converting the decoded HOA signal into a value of the time domain. In addition, when the processing domain of the HOA decoder 220 is a neutral or frequency domain, the audio core codec 210 may output the decoded HOA signal of the frequency domain without domain conversion. The processing domain of the HOA decoder 220 may be determined according to whether the domain conversion methods of the audio core codec 110 and the binaural renderer are the same.

In operation S320, the HOA decoder 220 may render the HOA signal in a plurality of channels in the frequency domain or the time domain according to the processing domain of the HOA decoder 220.

When the processing domain of the HOA decoder 220 is the time domain, the HOA decoder 220 may receive a time domain HOA signal from the audio core codec 210. The HOA decoder 220 may output the rendered HOA signal in the time domain.

Meanwhile, when the processing domain of the HOA decoder 220 is a neutral or frequency domain, the HOA decoder 220 may receive a HOA signal in the frequency domain from the audio core codec 210. The HOA decoder 220 may render the HOA signal in the frequency domain and output the HOA signal in the rendered frequency domain. Therefore, unlike the HOA decoder 220 in the time domain, the HOA decoder 220 may directly output the rendered HOA signal in the frequency domain without performing a domain conversion process.

Referring to FIG. 4, in step S410, the audio core codec 210 may obtain a HOA signal in a frequency domain by decoding a bit stream including an audio signal. When decoding data using the SBR technology, the audio core codec 210 may obtain a decoded audio signal in the frequency domain.

In operation S420, the processing domain of the HOA decoder 220 may be determined depending on whether the audio core codec and the binaural renderer have the same domain conversion method. The processing domain of HOA decoder 220 may be set to one of time domain, frequency domain, and neutral.

In operation S420, when it is determined that the domain conversion method of the audio core codec and the binaural renderer is the same, the processing domain of the HOA decoder 220 may be determined as a frequency domain or a neutral. Therefore, the audio core codec 210 may output the decoded HOA signal of the frequency domain without domain conversion.

In operation S430, the HOA decoder 220 may render, in the frequency domain, the frequency-domain HOA signal output by the audio core codec 210 to a plurality of channels.

On the other hand, when it is determined in operation S420 that the domain conversion methods of the audio core codec and the binaural renderer are different, the processing domain of the HOA decoder 220 may be determined to be the time domain. In this case, the HOA decoder 220 may render the decoded HOA signal in the time domain.

In operation S440, since the processing domain of the HOA decoder 220 is the time domain, the audio core codec 210 may convert the frequency-domain HOA signal into a time-domain HOA signal and output it.

In operation S450, the HOA decoder 220 may render, in the time domain, the time-domain HOA signal input from the audio core codec 210 and output the rendered HOA signal in the time domain.

In operation S460, the audio decoder 200 may mix the HOA signal of the frequency domain or time domain rendered by the HOA decoder 220 with another audio signal. The audio decoder 200 may further include a mixer 150 for mixing the plurality of audio signals.

In operation S470, the audio decoder 200 may binaurally render, in the frequency domain, the mixed signal of the frequency domain or the time domain. The audio decoder 200 may further include a binaural renderer 160 for performing binaural rendering. Since the binaural renderer 160 performs binaural rendering in the frequency domain, when it receives a mixed signal in the time domain, it may additionally convert the time-domain signal into the frequency domain. However, when the binaural renderer 160 receives a mixed signal in the frequency domain, it may perform binaural rendering without a domain conversion process. The binaural renderer 160 may receive a mixed signal in the frequency domain when the HOA decoder 140 operates in the frequency domain.

In addition, when the HOA decoder 220 operates in the frequency domain, the HOA signal rendered by the HOA decoder 220 is a signal in the frequency domain, and the rendered channels and objects are also signals in the frequency domain. Thus, a uniform interface in the frequency domain can be provided for tasks performed by mixer 150 and binaural renderer 160 using audio signals, such as rendered HOA signals, channels, objects, and the like. In addition, since the computation amount of the task performed in the frequency domain is less than the computation amount of the task performed in the time domain, the computation amount of the task after rendering may be reduced.

In addition, the binaural rendered binaural signal may be converted to the time domain and output to a device capable of outputting an audio signal such as a speaker or a headphone.
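The flow of FIG. 4 can be summarized as a sketch that lists the operations executed on each branch. The function name is hypothetical, and the step labels paraphrase operations S410 to S470 as described above; the final conversion of the binaural signal to the time domain is omitted because the text does not assign it an operation number.

```python
def decode_flow(methods_match: bool) -> list:
    """Return the operations of FIG. 4 taken for a given comparison result."""
    steps = [
        "S410: decode bitstream and obtain frequency-domain HOA signal",
        "S420: determine HOA decoder processing domain",
    ]
    if methods_match:
        domain = "frequency"
        steps.append("S430: render HOA signal in the frequency domain")
    else:
        domain = "time"
        steps.append("S440: convert HOA signal to the time domain")
        steps.append("S450: render HOA signal in the time domain")
    steps.append(f"S460: mix with other audio signals ({domain} domain)")
    if domain == "time":
        steps.append("S470: convert to the frequency domain, then binaural rendering")
    else:
        steps.append("S470: binaural rendering without domain conversion")
    return steps
```

The matching branch runs one operation fewer and contains no domain conversion between decoding and binaural rendering.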

Referring to FIG. 5, the audio decoder 500 may include an audio core codec 510, a HOA decoder 520, and a binaural renderer 530.

The audio core codec 510 may demultiplex (511) the bitstream, perform time-to-frequency (T/F) conversion (512) on the demultiplexed data, and perform a main process (513) that extracts channels, objects, the HOA signal, and the like. The audio core codec 510 may output the frequency-domain HOA signal extracted as a result of the main process.

The HOA decoder 520 may render, in the frequency domain, the HOA signal output by the audio core codec 510. The HOA decoder 520 may output the rendered HOA signal in the frequency domain.

The binaural renderer 530 may perform binaural rendering in the frequency domain, which is a main process (531) for re-rendering the rendered frequency-domain HOA signal into two channels. After performing binaural rendering, the binaural renderer 530 may perform F/T conversion (532) to convert the frequency-domain signal to the time domain and output the signal to a two-channel output device such as a speaker or a headphone.

When the domain conversion method by which the audio core codec 510 would convert the HOA signal to the time domain for output and the domain conversion method by which the binaural renderer 530 would convert the input time-domain HOA signal to the frequency domain are the same, the F/T conversion process of the audio core codec 510 and the T/F conversion process of the binaural renderer 530 may be omitted, as shown in FIG. 5. In addition, the HOA decoder 520 may render in the frequency domain instead of rendering in the time domain. Therefore, according to an embodiment, since some processes of the audio core codec 510 and the binaural renderer 530 may be omitted, the amount of computation at the decoding stage may be reduced.
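The saving described for FIG. 5 can be made concrete by listing the domain conversion stages along the HOA path in each configuration. The function name and stage labels are hypothetical; only the reference numerals 512 and 532 come from the description.

```python
def conversion_stages(hoa_decoder_domain: str) -> list:
    """List the domain conversions along the HOA path for a given processing domain."""
    if hoa_decoder_domain not in ("time", "frequency", "neutral"):
        raise ValueError(f"unknown processing domain: {hoa_decoder_domain!r}")
    stages = ["core codec T/F conversion (512)"]
    if hoa_decoder_domain == "time":
        # These two conversions are the ones omitted when the methods match.
        stages.append("core codec F/T conversion")
        stages.append("binaural renderer T/F conversion")
    stages.append("binaural renderer F/T conversion (532)")
    return stages
```

With a time-domain HOA decoder the path contains four conversions, whereas with a frequency-domain or neutral HOA decoder only two remain.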

The method according to some embodiments may be embodied in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be those specially designed and constructed for the purposes of the present invention, or they may be of a kind well known and available to those having skill in the computer software arts. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, flash memory, and the like. Examples of program instructions include not only machine code generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the foregoing description has focused on the novel features of the invention as applied to various embodiments, those skilled in the art will understand that various deletions, substitutions, and changes in the form and detail of the apparatus and method described above are possible without departing from the scope of the invention. Accordingly, the scope of the invention is defined by the appended claims rather than by the foregoing description. All modifications within the scope of equivalents of the claims are to be embraced within the scope of the present invention.

Claims

1. An apparatus for decoding an audio signal comprising a HOA signal, the apparatus comprising:

an audio core codec for decoding a bitstream including an audio signal and outputting the HOA signal in a frequency domain or a time domain; and

a HOA decoder for rendering, in the frequency domain, the HOA signal of the frequency domain and outputting the rendered HOA signal.
2. The apparatus of claim 1, further comprising:

a mixer for mixing, in the frequency domain, the rendered HOA signal of the frequency domain with other audio signals; and

a binaural renderer for binaurally rendering, in the frequency domain, a signal mixed by the mixer.
3. The apparatus of claim 1, wherein the audio core codec

outputs the HOA signal of the frequency domain if the processing domain of the HOA decoder is neutral or the frequency domain, and

the HOA decoder renders, in the frequency domain, the HOA signal of the frequency domain and outputs the rendered HOA signal.
4. The apparatus of claim 1,

wherein, if the domain conversion methods of the audio core codec and a binaural renderer are the same, the processing domain of the HOA decoder is determined to be neutral or the frequency domain.
5. A method of decoding an audio signal comprising a HOA signal, the method comprising:

decoding a bitstream including an audio signal to output the HOA signal in a frequency domain or a time domain; and

rendering, in the frequency domain, the HOA signal of the frequency domain and outputting the rendered HOA signal.
6. The method of claim 5, further comprising:

mixing, in the frequency domain, the rendered HOA signal with another audio signal; and

binaurally rendering, in the frequency domain, the mixed signal.
7. The method of claim 5, wherein the outputting of the HOA signal comprises

outputting the HOA signal of the frequency domain if the processing domain of a HOA decoder is neutral or the frequency domain, and

the rendering and outputting of the HOA signal comprises rendering, in the frequency domain, the HOA signal of the frequency domain and outputting the rendered HOA signal.
8. The method of claim 5,

wherein, if the domain conversion methods of an audio core codec and a binaural renderer are the same, the processing domain of a HOA decoder is determined to be neutral or the frequency domain.
9. A computer-readable recording medium having recorded thereon a program for implementing the method of any one of claims 5 to 8.