KR20090004778A - Method for processing an audio signal and apparatus for implementing the same - Google Patents
- Publication number: KR20090004778A (Application KR1020080065478A)
- Authority
- KR
- South Korea
- Prior art keywords
- information
- signal
- decoding
- core
- stereo
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/167—Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
Abstract
Description
The present invention relates to an audio signal processing method and apparatus, and more particularly, to an audio signal processing method and apparatus capable of encoding or decoding an audio signal.
In general, a digital broadcast signal carries both a video signal and an audio signal. The audio signal may correspond to a mono channel or a stereo channel, and decoding information for upmixing the audio signal to a stereo channel or a multichannel may be transmitted together with it. In this case, transmitting the decoding information increases the bit amount of the audio signal, and the complexity of the decoder increases in the process of upmixing to the multichannel.
The present invention is devised to solve the above problems, and an object of the present invention is to provide an audio signal processing method and apparatus capable of generating both a stereo output and a multichannel output with a single codec when upmixing a core signal.
Another object of the present invention is to determine whether the output mode is a stereo output or a multichannel output, before parsing the decoding information for channel extension, by using channel extension environment information included in an audio signal bitstream.
In order to achieve the above objects, an audio signal processing method according to the present invention includes: extracting channel extension environment information from an audio signal bitstream; extracting at least one of first decoding information and second decoding information from the audio signal bitstream based on the channel extension environment information; and upmixing a core signal using at least one of the first decoding information and the second decoding information, wherein the first decoding information is information for upmixing the core signal into a stereo signal using a first decoding scheme, and the second decoding information is information for upmixing the core signal into a stereo signal or a multichannel signal using a second decoding scheme.
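The claimed method boils down to a three-step pipeline. The following sketch illustrates it in Python; the dictionary keys and return values are hypothetical, chosen only to make the control flow concrete, and are not claim language. The three-bit environment codes follow the ch_extension_config values described later in the specification.

```python
def process_audio_frame(bitstream):
    """Illustrative sketch of the claimed method (names are hypothetical).

    1. Extract channel extension environment information.
    2. Based on it, extract the first (scheme-1) and/or second
       (scheme-2) decoding information.
    3. Upmix the core signal with whichever information is present.
    """
    env = bitstream["ch_extension_config"]          # channel extension environment info
    first = bitstream.get("first_decoding_info")    # upmixes core -> stereo (first scheme)
    second = bitstream.get("second_decoding_info")  # upmixes core -> stereo or multichannel

    if first is not None and env in ("001", "011"):
        return ("stereo", first)
    if second is not None and env in ("010", "011", "100"):
        mode = "stereo" if env == "010" else "multichannel"
        return (mode, second)
    return ("core", None)  # env == "000": no extension, core signal passes through
```

The point of the sketch is that the output mode is decided from `env` alone, before either piece of decoding information is parsed.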
According to the present invention, the second decoding information may include first spatial information, which is spatial information for upmixing the core signal into the multichannel signal, and the method may further include generating, using the first spatial information, second spatial information for upmixing the core signal into a stereo signal.
According to the present invention, when the core signal is a stereo signal, the step of upmixing the core signal may be performed using the first spatial information.
According to the present invention, the first decoding scheme may correspond to a scheme of generating a stereo channel using a mono signal and a decorrelator, and the second decoding scheme may correspond to a scheme of generating the stereo channel or the multichannel signal using spatial information including inter-channel level differences.
According to the present invention, the method may further include extracting core channel information from the audio signal bitstream, and determining whether the core signal is a mono signal or a stereo signal based on the core channel information.
According to the present invention, the method may further include extracting spectral band replication (SBR) flag information from the audio signal bitstream, and determining whether to use the SBR tool based on the SBR flag information.
According to the present invention, when the SBR tool is not used according to the SBR flag information, the step of upmixing the core signal may be performed using only the second decoding information, not the first decoding information.
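In HE-AAC v2, the parametric-stereo data that the first decoding scheme resembles is carried inside the SBR extension, which is presumably why disabling SBR rules the first scheme out. A minimal sketch of that selection rule, with hypothetical names:

```python
def select_decoding_info(sbr_flag, has_first, has_second):
    """Pick which decoding information upmixes the core signal.

    When the SBR tool is disabled, only the second decoding
    information may be used (hypothetical helper illustrating the
    rule stated in the specification).
    """
    if not sbr_flag:
        # First decoding information is unavailable without SBR.
        return "second" if has_second else None
    if has_first:
        return "first"
    return "second" if has_second else None
```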
According to the present invention, the method may further include determining an output mode according to the channel extension environment information; when the output mode is a stereo output mode, the step of upmixing the core signal may be performed in a normal mode rather than a binaural mode.
According to still another aspect of the present invention, an audio signal processing apparatus includes: an extractor configured to extract channel extension environment information from an audio signal bitstream and to extract at least one of first decoding information and second decoding information from the audio signal bitstream based on the channel extension environment information; and an extended decoding unit configured to upmix a core signal using at least one of the first decoding information and the second decoding information, wherein the first decoding information is information for upmixing the core signal into a stereo signal using a first decoding scheme, and the second decoding information is information for upmixing the core signal into a stereo signal or a multichannel signal using a second decoding scheme.
According to the present invention, the second decoding information may include first spatial information, which is spatial information for upmixing the core signal into the multichannel signal, and the extended decoding unit may further include a spatial information generator configured to generate, using the first spatial information, second spatial information for upmixing the core signal into the stereo signal.
According to the present invention, when the core signal is a stereo signal, the extended decoding unit may upmix the core signal using the first spatial information.
According to the present invention, the first decoding scheme may correspond to a scheme of generating a stereo channel using a mono signal and a decorrelator, and the second decoding scheme may correspond to a scheme of generating the stereo channel or the multichannel signal using spatial information including inter-channel level differences.
According to the present invention, the extractor may further extract core channel information from the audio signal bitstream, and determine whether the core signal is a mono signal or a stereo signal based on the core channel information.
According to the present invention, the extractor may extract SBR (Spectral Band Replication) flag information from the audio signal bitstream and determine whether to use the SBR tool based on the SBR flag information.
According to the present invention, when the SBR tool is not used according to the SBR flag information, the extended decoding unit may upmix the core signal using only the second decoding information, not the first decoding information.
According to the present invention, the extractor may determine an output mode according to the channel extension environment information, and when the output mode is a stereo output mode, the extended decoding unit may upmix the core signal in a normal mode instead of a binaural mode.
According to another aspect of the present invention, an audio signal processing method includes: extracting channel extension environment information from an audio signal bitstream; determining an output mode based on the channel extension environment information; generating second spatial information using first spatial information included in the audio signal bitstream when the output mode is a stereo output mode; and upmixing a core signal using one of the first spatial information and the second spatial information, wherein the first spatial information is information for upmixing the core signal into a stereo signal or a multichannel signal, and the second spatial information is information for upmixing the core signal into a stereo signal.
According to the present invention, when the core signal is a stereo signal, the step of upmixing the core signal may be performed using the first spatial information.
According to the present invention, when the output mode is a stereo output mode, the step of upmixing the core signal may be performed in a normal mode rather than a binaural mode.
According to another aspect of the present invention, an audio signal processing apparatus includes: an extractor configured to extract channel extension environment information from an audio signal bitstream and to determine an output mode based on the channel extension environment information; a spatial information generator configured to generate second spatial information using first spatial information included in the audio signal bitstream when the output mode is a stereo output mode; and an extended decoding unit configured to upmix a core signal using one of the first spatial information and the second spatial information, wherein the first spatial information is information for upmixing the core signal into a stereo signal or a multichannel signal, and the second spatial information is information for upmixing the core signal into a stereo signal.
According to the present invention, when the core signal is a stereo signal, the extended decoding unit may upmix the core signal using the first spatial information.
According to the present invention, when the output mode is a stereo output mode, the extended decoding unit may upmix the core signal in a normal mode instead of a binaural mode.
According to an aspect of the present invention, when upmixing a core signal, both a stereo signal and a multichannel signal can be obtained using a single codec scheme, so a bitstream corresponding to a separate codec scheme is unnecessary; therefore, the amount of bits required for the upmixing information can be significantly reduced.
According to another aspect of the present invention, it can be determined whether the output mode is a stereo output mode or a multichannel output mode based on the channel extension environment information before parsing the decoding information (e.g., an MPEG Surround bitstream) for upmixing; in the stereo output mode, only two synthesis filter banks need be used, which can significantly reduce the complexity required for upmixing.
Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. The terms and words used in this specification and claims should not be construed as limited to their conventional or dictionary meanings; rather, on the principle that the inventor may properly define terms to best explain his or her own invention, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments and do not represent the entire technical idea of the present invention, and it should be understood that various equivalents and modifications may exist at the time of filing.
1 is a diagram illustrating a configuration of an audio signal processing apparatus according to an embodiment of the present invention. Referring to the drawing, an audio signal processing apparatus according to an exemplary embodiment of the present invention includes an extractor and an extended decoding unit.
The extractor extracts channel extension environment information from the audio signal bitstream, and extracts at least one of first decoding information and second decoding information from the audio signal bitstream based on the channel extension environment information.
The extended decoding unit upmixes the core signal using at least one of the first decoding information and the second decoding information.
2 is an example conceptually illustrating an audio signal bitstream. Referring to FIG. 2A, the audio signal bitstream includes a header containing first extension channel environment information, a core signal, and at least one of first decoding information and second decoding information.
Referring to FIG. 2B, the audio signal bitstream includes a header containing second extension channel environment information, a core signal, and second decoding information.
3 is an example of a structure of an audio signal bitstream. Referring to FIG. 3A, the structure of mh_audio_frame () is shown. The syntax of mh_audio_frame () may be configured as follows.
[Table 1: Syntax of mh_audio_frame ()]
First, referring to the syntax, audio signal version information (mh_audio_version) is extracted, and when the audio signal version is HE_AAC_V2_MPS ("MPEG-4 HE AAC v2 with Baseline MPEG Surround"), mh_audio_ham_frame() is read.
An example of the structure of mh_audio_ham_frame() is shown in FIG. 3B, and an example of its syntax is as follows.
[Table 2: Syntax of mh_audio_ham_frame()]
Here, n_aus is the number of access units in the corresponding audio frame.
au[n]: the audio samples for the duration of one audio_ham_frame, according to the core sampling rate and n_aus.
au_crc: Each access unit is protected by a 16-bit cyclic redundancy check (CRC), generated by the following polynomial:
An example of the syntax of mh_audio_ham_header () is as follows.
[Table 3: Syntax of mh_audio_ham_header()]
As shown in the syntax, mh_audio_ham_header() may include a sync word (sync_word), a refresh flag (refresh_flag), core channel information (aac_channel_mode), SBR flag information (sbr_flag), first extension channel environment information (ch_extension_config), and the like. Second extension channel environment information (mps_extension_config) may be included instead of the first extension channel environment information. Specifically, when the audio signal bitstream is configured as shown in FIG. 2A, the first extension channel environment information is included, and when configured as shown in FIG. 2B, the second extension channel environment information is included. Hereinafter, each element included in the syntax will be described in order.
A sync word (sync_word) is present to allow a Reed-Solomon (RS) decoder to identify the beginning of an audio frame by a 12-bit synchronization sequence. The value of sync_word is 0xFF5 in hexadecimal.
The refresh flag (refresh_flag) is a 1-bit flag indicating whether the audio parameters of the next audio frame are the same as the audio parameters of the current audio frame, as shown in the table below.
[Table 4: Meaning of refresh_flag]
The core channel information (aac_channel_mode) is a 1-bit flag indicating whether the core signal is mono or stereo as shown in the following table.
[Table 5: Meaning of core channel information (aac_channel_mode)]
SBR flag information (sbr_flag) is a 1-bit flag indicating whether SBR is used as shown in the following table.
[Table 6: Meaning of SBR flag information (sbr_flag)]
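A minimal parser for the header fields just described can be sketched as follows. The field order and the 3-bit width of ch_extension_config are assumptions inferred from the text; only the sync word value (0xFF5, 12 bits) and the 1-bit flags are stated explicitly in the specification.

```python
class BitReader:
    """Reads big-endian (MSB-first) bit fields from a byte buffer."""
    def __init__(self, data):
        self.data, self.pos = data, 0

    def read(self, n):
        val = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            val = (val << 1) | bit
            self.pos += 1
        return val

def parse_header(data):
    """Sketch of mh_audio_ham_header() parsing (field order assumed)."""
    r = BitReader(data)
    hdr = {}
    hdr["sync_word"] = r.read(12)
    assert hdr["sync_word"] == 0xFF5, "lost synchronization"
    hdr["refresh_flag"] = r.read(1)       # 1: parameters unchanged in next frame
    hdr["aac_channel_mode"] = r.read(1)   # core signal mono/stereo (coding assumed)
    hdr["sbr_flag"] = r.read(1)           # whether the SBR tool is used
    hdr["ch_extension_config"] = r.read(3)
    return hdr
```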
Meanwhile, au_start[n] is an unsigned integer, transmitted most significant bit first in a 12-bit field, which defines the starting point of each access unit within the audio frame by giving the byte number of the first byte of the access unit. The au_start value of the first access unit is not transmitted but is derived by calculating the header size. The decoder derives au_size[n] from au_start[n] and au_start[n+1] as follows.
[Equation 1]
au_size[n] = au_start[n+1] - au_start[n] - 2
au_start[n_aus] = audio_frame_size
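Equation 1 can be applied directly, treating au_start[n_aus] as audio_frame_size. The 2 bytes subtracted per unit are taken here to be the 16-bit CRC appended to each access unit, an assumption consistent with the au_crc description above.

```python
def derive_au_sizes(au_start, audio_frame_size):
    """Compute access-unit sizes from their start offsets (Equation 1).

    au_start lists the byte offsets of the n_aus access units;
    au_start[n_aus] is taken to be audio_frame_size, and 2 bytes
    (the per-unit 16-bit CRC) are subtracted from each size.
    """
    starts = list(au_start) + [audio_frame_size]
    return [starts[n + 1] - starts[n] - 2 for n in range(len(au_start))]
```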
The first extension channel environment information (ch_extension_config) indicates, as shown in the following table, whether one of the first decoding information (PS) and the second decoding information (MPS) is used and, when the second decoding information (MPS) is used, whether the output mode is a stereo output or a multichannel (e.g., 5.1) output. Meanwhile, when the output mode is stereo, the stereo signal (s) is generated directly from the mono signal (m), which can be referred to as a 2-1-2 configuration; the 2-1-2 configuration is not included in the tree configurations (bsTreeConfig) specified in the MPEG Surround standard (e.g., 5-1-5₁, 5-1-5₂, 5-2-5, etc.).
[Table 7: Meaning of first extension channel environment information (ch_extension_config)]
When the first extension channel environment information is 'ch_extension_config = 000', neither the first decoding information nor the second decoding information exists, and the core signal is output without upmixing.
When the first extension channel environment information is 'ch_extension_config = 001', only the first decoding information exists, and the core signal is upmixed into a stereo signal using the first decoding information.
When the first extension channel environment information is 'ch_extension_config = 010', only the second decoding information exists, and the output mode is a stereo output mode. In this case, second spatial information is generated using the first spatial information included in the second decoding information, and the core signal is upmixed into a stereo signal.
When the first extension channel environment information is 'ch_extension_config = 011', both the first decoding information and the second decoding information exist. When the second decoding information is used, the output mode is a multichannel output mode; when the first decoding information is used, the core signal is upmixed into a stereo signal.
When the first extension channel environment information is 'ch_extension_config = 100', only the second decoding information is used, and the output mode is a multichannel output mode. In this case, the core signal is upmixed into a multichannel signal using the first spatial information.
As such, referring to the first extended channel environment information, it is possible to know not only whether the second decoding information is present, but also whether the output channel is stereo or multichannel before parsing the second decoding information.
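The five ch_extension_config cases just described can be summarized as a lookup table, which is exactly what lets a decoder pick the output mode before parsing any decoding information. The table below reconstructs Table 7 from the prose; the field names are illustrative.

```python
# Reconstruction of Table 7 from the surrounding description; codes
# other than these five values are not described in the text.
CH_EXTENSION_CONFIG = {
    0b000: {"first": False, "second": False, "output": "core only"},
    0b001: {"first": True,  "second": False, "output": "stereo"},
    0b010: {"first": False, "second": True,  "output": "stereo"},  # 2-1-2
    0b011: {"first": True,  "second": True,  "output": "stereo or multichannel"},
    0b100: {"first": False, "second": True,  "output": "multichannel"},
}

def output_mode(config):
    """Determine the output mode without parsing any decoding information."""
    return CH_EXTENSION_CONFIG[config]["output"]
```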
Each case of the first extension channel environment information will be described in detail with reference to FIGS. 4 to 6 and FIGS. 7 to 8.
Meanwhile, as described above, the second extension channel environment information (mps_extension_config) may be included instead of the first extension channel environment information; this is the case where only the second decoding information may be included in the audio signal bitstream. The meaning of the second extension channel environment information is shown in the following table.
[Table 8: Meaning of second extension channel environment information (mps_extension_config)]
When the second extension channel environment information is 'mps_extension_config = 000', the second decoding information does not exist, and the core signal is output without upmixing.
When the second extension channel environment information is 'mps_extension_config = 001', the second decoding information is used, and the output mode is a stereo output mode. This case is allowed only when the core signal is a mono signal. In other words, when the core signal is a stereo signal, the stereo output mode is not allowed and the multichannel output mode is forced.
When the second extension channel environment information is 'mps_extension_config = 010', second decoding information is used, and the output mode is a multichannel output mode. In this case, the case where the core signal is a mono signal as well as a stereo signal is allowed.
As such, referring to the second extended channel environment information, it is possible to know whether the output channel is stereo or multichannel even before parsing the second decoding information.
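Likewise, the mps_extension_config cases reconstruct to a small table, and the mono-core restriction of the stereo output mode becomes a one-line check. The names are illustrative.

```python
# Reconstruction of Table 8 from the description: only the second
# decoding information can be present in this bitstream variant.
MPS_EXTENSION_CONFIG = {
    0b000: {"second": False, "output": "core only",    "mono_core_only": False},
    0b001: {"second": True,  "output": "stereo",       "mono_core_only": True},
    0b010: {"second": True,  "output": "multichannel", "mono_core_only": False},
}

def allowed(config, core_is_stereo):
    """The stereo output configuration is allowed only for a mono core;
    a stereo core forces the multichannel output mode."""
    entry = MPS_EXTENSION_CONFIG[config]
    return not (entry["mono_core_only"] and core_is_stereo)
```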
4 to 6 are first, second, and third examples showing detailed configurations of the extended decoding unit. As described above, the first to third examples correspond to the case in which the first channel extension environment information (ch_extension_config) is included in the audio signal bitstream, where both the first decoding information and the second decoding information may exist.
First, referring to FIG. 4, the
Referring to FIG. 5, the
Referring to FIG. 6, the
7 and 8 illustrate a procedure of an audio signal processing method according to an embodiment of the present invention. FIGS. 7 to 8 also correspond to the case where, as described above, at least one of the first decoding information and the second decoding information exists in the audio signal bitstream. Referring to FIG. 7, first, an audio signal bitstream is received (step S110). Core channel information, SBR information, first extension channel environment information, and the like are extracted from the audio signal bitstream (step S120). Based on the first extension channel environment information, it is determined whether the first decoding information and the second decoding information exist and, if the second decoding information exists, which output mode is used.
If only the first decoding information exists ('Yes' in step S122), for example, when the first extension channel environment information is '001', the first decoding information is extracted (step S210), and the core signal is upmixed using the first decoding information (step S212).
If both the first decoding information and the second decoding information exist ('Yes' in step S124), for example, when the first extension channel environment information is '011', both the first decoding information and the second decoding information are extracted (step S220). When the first decoding information is used among the extracted information ('Yes' in step S222), the core signal is upmixed using the first decoding information (step S212). When the second decoding information is used ('No' in step S222), the output mode corresponds to a multichannel output mode, which will be described later with reference to FIG. 8.
Referring to FIG. 8, when the result corresponds to 'No' in steps S122 and S124 (step A) and the second decoding information does not exist ('No' in step S126), for example, when the first extension channel environment information is '000', the procedure ends without upmixing the core signal. If only the second decoding information exists ('Yes' in step S126), the second decoding information is extracted (step S230). Then, it is determined whether the output mode is a stereo output mode or a multichannel output mode based on the first extension channel environment information. In the stereo output mode ('Yes' in step S240), for example, when the first extension channel environment information is '010', second spatial information is generated using the first spatial information included in the audio signal bitstream (step S242), and the core signal is upmixed into a stereo signal using the second spatial information (step S244). Since it can be seen from the first extension channel environment information that the output mode is a stereo output (2-1-2 configuration), upmixing can be performed using only the two synthesis filter banks corresponding to the stereo output, which can significantly reduce the complexity of the decoder. Meanwhile, the upmixing of the core signal in step S244 is performed in the normal mode, not the binaural mode. On the contrary, when the output mode is not the stereo output mode ('No' in step S240), for example, when the first extension channel environment information is '100', the core signal is upmixed into a multichannel signal using the first spatial information (step S250).
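The branching of FIGS. 7 and 8 can be collapsed into a single dispatch on the first extension channel environment information; the step numbers from the figures appear as comments. This is a sketch of the decision flow only, not of the actual upmixing arithmetic.

```python
def extended_decode(ch_extension_config, use_first=None):
    """Return which information upmixes the core signal and into what
    (illustrative dispatch over the flow of FIGS. 7 and 8)."""
    if ch_extension_config == "001":             # S122: only first info exists
        return ("first", "stereo")               # S210-S212
    if ch_extension_config == "011":             # S124: both exist, S220
        if use_first:                            # S222: first info selected
            return ("first", "stereo")           # S212
        return ("second", "multichannel")        # continue in FIG. 8
    if ch_extension_config == "000":             # S126 'No': nothing to upmix
        return (None, "core")
    if ch_extension_config == "010":             # S240 'Yes': stereo output mode
        # S242: derive second spatial info from first spatial info,
        # S244: upmix in normal (non-binaural) mode with two filter banks
        return ("second spatial info", "stereo")
    return ("first spatial info", "multichannel")  # '100': S250
```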
9 and 10 are fourth and fifth examples showing detailed configurations of the extended decoding unit. FIGS. 9 and 10 correspond to the case in which only the second decoding information may be included in the audio signal bitstream and the second extension channel environment information (mps_extension_config), indicating whether the second decoding information exists and what the output mode is, is included. The fourth example shown in FIG. 9 corresponds to the case of 'mps_extension_config = 001' in Table 8.
11 is a flowchart illustrating an audio signal processing method according to another embodiment of the present invention. As described above, the embodiment illustrated in FIG. 11 corresponds to the case in which only the second decoding information may exist in the audio signal bitstream. Referring to FIG. 11, first, an audio signal bitstream is received (step S310). Core channel information, SBR information, second extension channel environment information, and the like are extracted from the audio signal bitstream (step S320). Whether the second decoding information exists and which output mode is used are determined based on the second extension channel environment information. If the second decoding information does not exist ('No' in step S330), for example, when the second extension channel environment information is '000', the extended decoding step is omitted and the procedure ends.
When the second decoding information is present ('Yes' in step S330), for example, when the second extension channel environment information is '001' or '010', steps equivalent to steps S240 to S250 of FIG. 8 described above are performed (steps S350 to S380).
As described above, although the present invention has been described with reference to limited embodiments and drawings, the present invention is not limited thereto, and it will be apparent to those skilled in the art that various modifications and variations are possible within the scope of the technical idea of the present invention and the equivalents of the claims.
The invention can be applied to the encoding and decoding of audio signals.
1 is a block diagram of an audio signal processing apparatus according to an embodiment of the present invention.
2 is a first example and a second example conceptually illustrating an audio signal bitstream.
3 is an example of the structure of an audio signal bitstream.
4 is a first example showing a detailed configuration of an extended decoding unit.
5 is a second example showing the detailed configuration of an extended decoding unit.
6 is a third example showing the detailed configuration of an extended decoding unit.
7 and 8 are flowcharts of an audio signal processing method according to an embodiment of the present invention.
9 is a fourth example showing the detailed configuration of an extended decoding unit.
10 is a fifth example showing the detailed configuration of an extended decoding unit.
11 is a flowchart of an audio signal processing method according to another embodiment of the present invention.
Claims (22)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US94818207P | 2007-07-05 | 2007-07-05 | |
US60/948,182 | 2007-07-05 |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20090004778A true KR20090004778A (en) | 2009-01-12 |
Family
ID=40652250
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020080065478A KR20090004778A (en) | 2007-07-05 | 2008-07-07 | Method for processing an audio signal and apparatus for implementing the same |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20090004778A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20120029494A (en) * | 2010-09-16 | 2012-03-27 | 삼성전자주식회사 | Apparatus and method for bandwidth extension for multi-channel audio |
KR20140018929A (en) * | 2011-03-18 | 2014-02-13 | 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. | Audio encoder and decoder having a flexible configuration functionality |
KR20180023941A (en) * | 2009-10-23 | 2018-03-07 | 삼성전자주식회사 | Apparatus and method for encoding/decoding using phase information and residual signal |
- 2008-07-07: Filed as KR1020080065478A; published as KR20090004778A (application discontinued — no request for examination)
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20180023941A (en) * | 2009-10-23 | 2018-03-07 | 삼성전자주식회사 | Apparatus and method for encoding/decoding using phase information and residual signal |
KR20120029494A (en) * | 2010-09-16 | 2012-03-27 | 삼성전자주식회사 | Apparatus and method for bandwidth extension for multi-channel audio |
KR20140018929A (en) * | 2011-03-18 | 2014-02-13 | 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. | Audio encoder and decoder having a flexible configuration functionality |
US9524722B2 (en) | 2011-03-18 | 2016-12-20 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Frame element length transmission in audio coding |
US9773503B2 (en) | 2011-03-18 | 2017-09-26 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder and decoder having a flexible configuration functionality |
US9779737B2 (en) | 2011-03-18 | 2017-10-03 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Frame element positioning in frames of a bitstream representing audio content |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WITN | Withdrawal due to no request for examination |