WO2014030928A1

WO2014030928A1 - Audio signal encoding method, audio signal decoding method, and apparatus using same

Info

Publication number: WO2014030928A1
Application number: PCT/KR2013/007505
Authority: WO
Inventors: 정규혁; 전혜정; 강인규
Original assignee: 엘지전자 주식회사
Priority date: 2012-08-21
Filing date: 2013-08-21
Publication date: 2014-02-27

Abstract

The present invention relates to a method and apparatus for processing audio signals. The method for encoding audio signals according to the present invention may comprise the steps of: determining the importance of tracks constituting track pairs on the basis of the energy for each track of the audio signals, and searching for pulses starting from the track with the highest importance to determine a pulse to be encoded; and quantizing information of the determined pulse to be encoded.

Description

Audio signal encoding method and audio signal decoding method and apparatus using same

The present invention relates to encoding and decoding of an audio signal, and more particularly, to a method and apparatus for searching an encoding / decoding target of an audio signal.

In general, audio signals include signals of various frequencies. The human audible frequency range is about 20 Hz to 20 kHz, whereas the average human voice lies in the range of about 200 Hz to 3 kHz. The input audio signal may include not only the band in which a human voice exists but also components in a high frequency region of 7 kHz or more, where a human voice rarely exists.

Recently, with the development of networks and growing user demand for high-quality services, audio signals are transmitted over various bands such as narrow band (hereinafter 'NB'), wide band (hereinafter 'WB'), and super wide band (hereinafter 'SWB').

In this regard, when a coding method suitable for NB (sampling rate of about 8 kHz) is applied to a signal having a sampling rate of about 16 kHz, sound quality deteriorates.

Likewise, when a coding scheme suitable for NB (sampling rate of about 8 kHz) or a coding scheme suitable for WB (sampling rate of about 16 kHz) is applied to a SWB signal (sampling rate of about 32 kHz), sound quality deteriorates.

Accordingly, developments are being made on speech and audio encoding devices / decoding devices that can be used in various bands from NB to WB or SWB, or in various environments including communication environments between various bands.

An object of the present invention is to provide a method and apparatus for band extension of a voice and audio encoder in a digital communication environment.

It is an object of the present invention to provide a method and apparatus for efficiently searching for dense pulses in sinusoidal mode without the use of additional bits.

An object of the present invention is to provide a method and apparatus for encoding / decoding voice and audio signals that maintain backward compatibility in decoding a bitstream encoded in a sine mode.

An object of the present invention is to provide a pulse search method and apparatus applicable to both CELP mode and sine mode.

It is an object of the present invention to provide a method and apparatus for performing an efficient search by using correlation between tracks, rather than performing independent search for each audio track.

The present invention relates to encoding and decoding of an audio signal. The encoding method according to the present invention may include determining the importance of the tracks constituting a track pair according to the track-specific energy of an audio signal, determining an encoding target pulse by searching for pulses starting from the track of high importance, and quantizing information of the determined encoding target pulse.

The encoder according to the present invention may encode the audio signal by determining the importance of the tracks constituting the track pair according to the track-specific energy of the audio signal, determining the encoding target pulse by searching for pulses starting from the track of high importance, and quantizing information of the determined encoding target pulse.

Decoding according to the present invention also includes generating pulses for the audio signal from the tracks constituting the track pair based on inverse quantization and reconstructing the audio signal based on the pulses. At this time, the generation of the pulse may be performed for each track in a predetermined order.

The decoder according to the present invention may generate, based on inverse quantization, pulses for the audio signal from the tracks constituting the track pair, and may restore the audio signal based on the pulses. The decoder may generate a pulse for each track.

According to the present invention, the dense pulses can be efficiently retrieved and transmitted / stored without using additional bits.

According to the present invention, since dense pulses can be searched without using additional bits, coding efficiency can be greatly improved.

According to the present invention, since the bitstream encoded according to the existing sine wave mode can be decoded, backward compatibility is guaranteed.

According to the present invention, even in the CELP mode in which adjacent tracks composed of pulses of positions adjacent to each other exist, the dense pulses can be effectively searched without using additional bits.

According to the present invention, it is possible to perform an efficient search by using correlation between tracks instead of performing independent search for each track.

FIG. 1 schematically illustrates an example of an encoder configuration that may be used when an ultra-wideband signal is processed by a band extension method.

FIG. 2 is a diagram for explaining an example of a configuration of an encoder based on the configuration of a core encoder.

FIG. 3 schematically illustrates an example of a decoder configuration that may be used when an ultra-wideband signal is processed by a band extension method.

FIG. 4 is a diagram illustrating an example of a decoder configuration based on the configuration of a core decoder.

FIG. 5 is a diagram schematically illustrating a method of encoding a sine wave in a sine mode.

FIG. 6 is a diagram schematically illustrating track information encoded / decoded in a layer 6 to which a sine mode is applied.

FIG. 7 schematically illustrates an example of track information regarding a sine wave mode in layer 6, which is a first SWB layer.

FIG. 8 is a diagram schematically illustrating an example in which two tracks are paired in the case of two steps.

FIG. 9 is a diagram schematically illustrating an example in which three tracks are paired in the case of three steps.

FIG. 10 is a flowchart schematically illustrating a sine wave search method applied to each layer according to an embodiment of the present invention.

FIG. 11 is a diagram schematically illustrating a case where an independent search is performed for each track without considering the characteristics of track pairs.

FIG. 12 is a diagram schematically illustrating an example of a method of performing a search in consideration of a search result of another track among tracks of a track pair according to the present invention.

FIG. 13 is a view schematically illustrating another example of a method of performing a search in consideration of a search result of another track among tracks of a track pair according to the present invention.

FIG. 14 is a view schematically illustrating another example of a method of performing a search in consideration of a search result of another track among tracks of a track pair according to the present invention.

In the example of FIG. 15, it is assumed that two or more tracks constitute a track pair, and that each track constituting the track pair is adjacent.

FIG. 16 is a block diagram schematically illustrating an example of an encoder to which the methods of FIGS. 14 and 15 are applied.

FIG. 17 is a flowchart schematically illustrating an example of a method of searching for a pulse of a track according to frame energy or tonality according to the present invention.

FIG. 18 is a flowchart schematically illustrating a method for searching / selecting a pulse based on a CELP mode in the present invention.

FIG. 19 is a flowchart schematically illustrating an example of an audio signal encoding method according to the present invention.

FIG. 20 is a flowchart schematically illustrating an example of an audio signal decoding method according to the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the drawings. In describing the embodiments of the present specification, when it is determined that a detailed description of a related well-known configuration or function may obscure the subject matter of the present disclosure, the description may be omitted.

When a component is said to be “connected” or “coupled” to another component, it may be directly connected or coupled to that other component, but it should be understood that another component may exist in between.

Terms such as first and second may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another.

Components shown in the embodiments of the present invention are shown independently to represent different characteristic functions, and do not mean that each component is made of separate hardware or one software component unit. Each component is included in a list of components for convenience of description, and at least two of the components may be combined to form one component, or one component may be divided into a plurality of components to perform a function.

In response to the development of networks and the demand for high-quality services, audio signal processing methods have been studied for various bands from narrow bands (NB) to wide bands (WB) or super wide bands (SWBs). For example, as a speech and audio encoding / decoding technique, a Code Excited Linear Prediction (CELP) mode, a sinusoidal mode, or the like may be used.

The coder may be divided into a baseline coder and an enhancement layer. The enhancement layer may be further divided into a lower band enhancement (LBE) layer, a bandwidth extension (BWE) layer, and a higher band enhancement (HBE) layer.

The LBE layer improves low-band sound quality by encoding / decoding a difference signal, that is, an excitation signal, between a sound source processed by a core encoder / core decoder and an original sound. Since the high band signal has similarity with the low band signal, it is possible to recover the high band signal at a low bit rate through the high band extension method using the low band.

As a method of encoding a high band signal by band extension and restoring it in the decoding process, a scalable method of processing the SWB signal may be considered. The method of band extending the SWB signal may operate in the Modified Discrete Cosine Transform (MDCT) domain.

The enhancement layers may be handled by being divided into a generic mode and a sinusoidal mode. For example, when three enhancement layers are used, the first enhancement layer may be processed in the generic mode and the sine mode, and the second and third enhancement layers may be processed in the sine mode.

In the present specification, a sinusoid includes both a sine wave and a cosine wave, the latter being a sine wave shifted in phase by π/2. Therefore, in the present invention, a sinusoid may mean a sine wave or a cosine wave. If the input sinusoid is a cosine wave, it may be converted into a sine wave or a cosine wave in the encoding / decoding process, and the conversion depends on the transform method applied to the input signal. Likewise, when the input sinusoid is a sine wave, it may be converted into a cosine wave or a sine wave in the encoding / decoding process, and the conversion depends on the transform method applied to the input signal.

In generic mode, coding is based on adaptive replication of the coded wideband signal subbands. In sine mode coding, a sine wave is added to high frequency contents.

The sine mode is an efficient encoding technique for a signal having a strong periodicity or a signal having a tone component, and may encode sign, amplitude, and position information for each sine wave component. A predetermined number, for example, 10 MDCT coefficients may be encoded for each layer.

FIG. 1 schematically illustrates an example of an encoder configuration that may be used when an ultra-wideband signal is processed by a band extension method. In FIG. 1, an encoder structure of a G.718 Annex B scalable extension to which a sine mode is applied will be described as an example.

The encoder of FIG. 1 is composed of a generic mode and a sine mode for SWB extension, and when additional bits are allocated, the encoder can be used by extending the sine mode.

Referring to FIG. 1, the encoder 100 includes a down sampling unit 105, a core encoder 110, an MDCT unit 115, a tonality estimation unit 120, a tonality determination unit 125, and a SWB (Super Wide Band) encoding unit 130. The SWB encoder 130 includes a generic mode unit 135, a sine wave mode unit 140, and additional sine wave units 145 and 150.

When the SWB signal is input, the down sampling unit 105 down-samples the input signal to generate a WB signal that can be processed by a core encoder.

SWB encoding is performed in the MDCT domain. The core encoder 110 encodes the WB signal, applies the MDCT to the synthesized WB signal, and outputs the MDCT coefficients.

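The Modified Discrete Cosine Transform (MDCT) is a transform that converts a signal in the time domain into a signal in the frequency domain, and uses an overlap-add method so that the original signal before the transform can be perfectly reconstructed. Equation 1 shows an example of MDCT.

<Equation 1>

$$\alpha_r = \sum_{k=0}^{2N-1} \tilde{a}_k \cos\left[\frac{\pi}{N}\left(k + \frac{N+1}{2}\right)\left(r + \frac{1}{2}\right)\right], \quad r = 0, \ldots, N-1$$

$$\hat{a}_k = \frac{1}{N}\sum_{r=0}^{N-1} \alpha_r \cos\left[\frac{\pi}{N}\left(k + \frac{N+1}{2}\right)\left(r + \frac{1}{2}\right)\right], \quad k = 0, \ldots, 2N-1$$

Here, $\tilde{a}_k = w_k a_k$ is the windowed input signal in the time domain, $w_k$ is a symmetric window function, $\alpha_r$ are the N MDCT coefficients, and $\hat{a}_k$ is the reconstructed time-domain input signal with 2N samples.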
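As a rough illustration of the overlap-add reconstruction property of the MDCT, the following Python sketch implements a textbook MDCT / inverse MDCT pair and checks that overlapping frames sum back to the input. The function names, the frame size, the 1/N inverse scaling, and the sine window scaled so that the squared window values N samples apart sum to 2 are assumptions made for this example, not details taken from the codec.

```python
import numpy as np

def mdct(frame):
    """Forward MDCT: 2N windowed time samples -> N coefficients."""
    n = len(frame) // 2
    k = np.arange(2 * n)[None, :]
    r = np.arange(n)[:, None]
    basis = np.cos(np.pi / n * (k + (n + 1) / 2) * (r + 0.5))
    return basis @ frame

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (1/N scaling)."""
    n = len(coeffs)
    k = np.arange(2 * n)[:, None]
    r = np.arange(n)[None, :]
    basis = np.cos(np.pi / n * (k + (n + 1) / 2) * (r + 0.5))
    return (basis @ coeffs) / n

def scaled_sine_window(n):
    """Symmetric window with w[k]^2 + w[k+N]^2 = 2 (assumed normalization)."""
    k = np.arange(2 * n)
    return np.sqrt(2.0) * np.sin(np.pi * (k + 0.5) / (2 * n))

def analysis_synthesis(signal, n):
    """Window, transform, inverse transform, window again, and overlap-add."""
    signal = np.asarray(signal, dtype=float)
    window = scaled_sine_window(n)
    out = np.zeros(len(signal))
    for start in range(0, len(signal) - 2 * n + 1, n):
        frame = signal[start:start + 2 * n] * window
        out[start:start + 2 * n] += imdct(mdct(frame)) * window
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
y = analysis_synthesis(x, n=8)
# Samples covered by two overlapping frames are reconstructed;
# the first and last N samples lack an overlap partner.
reconstructed_ok = np.allclose(x[8:-8], y[8:-8])
```

Each frame on its own contains time-domain aliasing; only the sum of two overlapping frames cancels it, which is why the first and last N samples are excluded from the check.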

The MDCT unit 115 applies the MDCT to the SWB signal, and the tonality estimator 120 estimates the tonality of the MDCT-transformed signal. The choice between the generic mode and the sine mode is based on tonality. For example, when three layers are used in the scalable SWB band extension method, the mode of the first layer, that is, layer 6mo (layer 7mo), may be selected based on the tonality estimate. The generic mode and / or the sine mode may be used in layer 6mo of the three layers, and the sine mode may be used in the upper layers (layer 7mo, layer 8mo).

The tonality estimation may be performed based on correlation analysis between spectral peaks in a current frame and a past frame.

The tonality estimator 120 outputs the tonality estimate to the tonality determiner 125.

The tonality determiner 125 determines whether the MDCT-converted signal is tonal based on the degree of tonality, and transmits it to the SWB encoder 130. For example, the tonality determination unit 125 compares the tonality estimation value input from the tonality estimator 120 with a predetermined reference value to determine whether the MDCT-converted signal is a tonal signal or a non-tonal signal.

As shown, the SWB encoder 130 processes the MDCT coefficients of the MDCT-transformed SWB signal. In this case, the SWB encoder 130 may process the MDCT coefficients of the SWB signal by using the MDCT coefficients of the synthesized WB signal input through the core encoder 110.

When the tonality determination unit 125 determines that the MDCT-converted signal is not tonal, the signal is transmitted to the generic mode unit 135, and when it is determined to be tonal, the signal is transmitted to the sine wave mode unit 140.
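The mode decision described above can be sketched as follows. This is a minimal illustration only: the normalized correlation between the magnitude spectra of the current and previous frames is a simple stand-in for the tonality estimate, and the function names, the reference value, and the assumption that a larger estimate means a more tonal frame are all hypothetical.

```python
import numpy as np

def tonality_estimate(current_mdct, previous_mdct):
    """Hypothetical tonality measure: normalized correlation between the
    magnitude spectra of the current and the previous frame."""
    cur = np.abs(np.asarray(current_mdct, dtype=float))
    prev = np.abs(np.asarray(previous_mdct, dtype=float))
    denom = np.linalg.norm(cur) * np.linalg.norm(prev)
    return float(cur @ prev / denom) if denom > 0 else 0.0

def select_swb_mode(estimate, reference_value):
    """Route tonal frames to the sine mode and other frames to the generic mode
    (assumes a larger estimate means a more tonal frame)."""
    return "sine" if estimate > reference_value else "generic"

# A frame whose spectral peaks stay in place from frame to frame is treated as tonal.
stable_peaks = np.array([0.0, 0.0, 9.0, 0.0, 0.0, -7.0, 0.0, 0.0])
mode_for_stable = select_swb_mode(tonality_estimate(stable_peaks, stable_peaks), 0.8)

# Frames whose peaks do not coincide are treated as non-tonal.
moved_a = np.array([1.0, 0.0, 0.0, 0.0])
moved_b = np.array([0.0, 1.0, 0.0, 0.0])
mode_for_moved = select_swb_mode(tonality_estimate(moved_a, moved_b), 0.8)
```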

The generic mode may be used when it is determined that the input frame is not tonal. The low frequency spectrum is directly transposed to high frequencies and parameterized to follow the envelope of the original high frequency. At this time, the parameterization can be made more coarsely than the case of the original high frequency. By applying the generic mode, high frequency content can be coded at a low bit rate.

For example, in the generic mode, the high frequency band is divided into sub-bands, and according to a predetermined similarity criterion, the one that is most similarly matched among coded and block normalized broadband contents is selected. The selected contents are scaled and output as synthesized high frequency content.
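The sub-band matching in the generic mode can be illustrated with a small sketch. It assumes that normalized correlation serves as the similarity criterion and that a single energy-matching gain is used for scaling; the function name and both choices are hypothetical, since the criterion is only described here as predetermined.

```python
import numpy as np

def best_match(hf_band, coded_lowband):
    """Find the low-band segment most similar to one high-frequency sub-band
    and scale it to the sub-band's energy (assumed similarity and gain rules)."""
    hf = np.asarray(hf_band, dtype=float)
    low = np.asarray(coded_lowband, dtype=float)
    width = len(hf)
    best_offset, best_score = 0, -np.inf
    for offset in range(len(low) - width + 1):
        segment = low[offset:offset + width]
        denom = np.linalg.norm(segment) * np.linalg.norm(hf)
        if denom == 0.0:
            continue  # skip all-zero candidates
        score = float(segment @ hf) / denom
        if score > best_score:
            best_offset, best_score = offset, score
    segment = low[best_offset:best_offset + width]
    seg_energy = float(segment @ segment)
    gain = float(np.sqrt(float(hf @ hf) / seg_energy)) if seg_energy > 0 else 0.0
    return best_offset, gain, gain * segment

lowband = np.array([0.0, 1.0, 2.0, -1.0, 0.5, 3.0, 0.0, 0.0])
hf_subband = np.array([4.0, -2.0, 1.0])  # twice the segment starting at index 2
offset, gain, synthesized = best_match(hf_subband, lowband)
```

Only the offset and the gain would need to be transmitted for the sub-band in such a scheme, which is what keeps the bit rate low.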

The sine wave mode unit 140 may be used when the input frame is tonal. In the sine mode, a finite set of sinusoidal components is added to the high frequency (HF) spectrum to generate a SWB signal. At this time, the HF spectrum is generated using the MDCT coefficients of the synthesized WB signal.

The additional sine wave units 145 and 150 add additional sine waves to the signal output in the generic mode and the signal output in the sine mode to improve the generated signal. For example, when additional bits are allocated, the additional sine wave units 145 and 150 extend the sine mode by determining and quantizing additional sine waves (pulses) to transmit, thereby improving the signal.

Meanwhile, as illustrated, the outputs of the core encoder 110, the tonality determination unit 125, the generic mode unit 135, the sine wave mode unit 140, and the additional sine wave units 145 and 150 may be converted into a bitstream and transmitted to the decoder.

FIG. 2 is a diagram for explaining an example of a configuration of an encoder based on the configuration of a core encoder. Referring to FIG. 2, the encoder 200 includes a bandwidth checker 205, a sampling converter 210, an MDCT converter 215, a core encoder 220, and an important MDCT coefficient extraction and quantization unit 265.

The bandwidth checking unit 205 may determine whether the input signal (audio signal) is a narrow band (NB) signal, a wide band (WB) signal, or a super wide band (SWB) signal. The NB signal may have a sampling rate of 8 kHz, the WB signal may have a sampling rate of 16 kHz, and the SWB signal may have a sampling rate of 32 kHz.

The bandwidth checking unit 205 may convert an input signal into a frequency domain to determine a component and a zone of upper band bins of the spectrum.

The encoder 200 may not include the bandwidth checking unit 205 when the bandwidth of the input signal is fixed, for example, when the input signal is fixed to NB.

The bandwidth checking unit 205 determines the input signal and outputs the NB or WB signal to the sampling converter 210, and outputs the SWB signal to the sampling converter 210 or the MDCT converter 215.

The sampling converter 210 performs sampling-rate conversion to turn the input signal into the signal input to the core encoder 220. For example, when the input signal is an NB signal, the sampling converter 210 up-samples it to a sampling rate of 12.8 kHz, and when the input signal is a WB signal, it down-samples it to a sampling rate of 12.8 kHz, producing a 12.8 kHz lower band signal. When the input signal is a SWB signal, the sampling converter 210 down-samples it to a sampling rate of 12.8 kHz to generate the input signal of the core encoder 220.
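The conversion ratios implied by these sampling rates can be worked out directly. The sketch below only does the arithmetic; the names are hypothetical and no actual resampling filter is shown.

```python
from fractions import Fraction

CORE_RATE_HZ = 12_800  # sampling rate of the core encoder input
INPUT_RATES_HZ = {"NB": 8_000, "WB": 16_000, "SWB": 32_000}

def core_resampling_ratio(band):
    """Output/input sampling-rate ratio needed to reach 12.8 kHz."""
    return Fraction(CORE_RATE_HZ, INPUT_RATES_HZ[band])

ratios = {band: core_resampling_ratio(band) for band in INPUT_RATES_HZ}
# A ratio above 1 means up-sampling (NB); a ratio below 1 means down-sampling (WB, SWB).
```

For NB the ratio is 8/5, for WB it is 4/5, and for SWB it is 2/5, so a rational resampler with small integer factors is sufficient in all three cases.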

The core encoder 220 includes a preprocessor 225, a linear prediction analyzer 230, a quantizer 235, a CELP mode performer 240, a quantizer 245, an inverse quantizer 250, a synthesis and post-processing unit 255, and an MDCT conversion unit 260.

The preprocessor 225 may filter low frequency components among the lower band signals input to the core encoder 220 and transmit only a signal of a desired band to the linear prediction analyzer.

The linear prediction analyzer 230 may extract linear prediction coefficients (LPC) from the signal processed by the preprocessor 225. For example, the linear prediction analyzer 230 may extract 16th-order linear prediction coefficients from the input signal and transfer the extracted coefficients to the quantization unit 235.

The quantization unit 235 quantizes the linear prediction coefficients transmitted from the linear prediction analyzer 230. The linear prediction residual signal is generated by filtering the original lower band signal using the quantized linear prediction coefficients in the lower band.
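To make the relationship between the linear prediction coefficients and the residual concrete, the following sketch estimates the coefficients with the autocorrelation method and filters the signal to obtain the residual. It is a simplification under stated assumptions: the normal equations are solved directly rather than with a recursion such as Levinson-Durbin, quantization of the coefficients is omitted, and the function names and the synthetic test signal are hypothetical.

```python
import numpy as np

def lpc_coefficients(signal, order=16):
    """Autocorrelation-method LPC: solve R a = r for the predictor a."""
    x = np.asarray(signal, dtype=float)
    full = np.correlate(x, x, mode="full")
    r = full[len(x) - 1:len(x) + order]  # autocorrelation at lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def lp_residual(signal, predictor):
    """Filter with A(z) = 1 - sum_i a_i z^-i to obtain the prediction residual."""
    analysis_filter = np.concatenate(([1.0], -np.asarray(predictor, dtype=float)))
    return np.convolve(np.asarray(signal, dtype=float), analysis_filter)

# Synthetic, strongly correlated test signal (second-order autoregressive process).
rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
x = np.zeros(2000)
for n in range(2, 2000):
    x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + noise[n]

a = lpc_coefficients(x, order=2)
residual = lp_residual(x, a)
signal_energy = float(x @ x)
residual_energy = float(residual @ residual)
```

The residual carries far less energy than the signal itself, which is why the subsequent CELP stages code the residual rather than the waveform.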

The linear prediction residual signal generated by the quantization unit 235 is input to the CELP mode performing unit 240.

The CELP mode performing unit 240 detects the pitch of the input linear prediction residual signal by using an autocorrelation function. In this case, an open-loop pitch search method, a closed-loop pitch search method, and analysis-by-synthesis (AbS) may be used.
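An open-loop pitch estimate based on the autocorrelation function can be sketched as below. The lag range, the function name, and the use of an unnormalized autocorrelation are assumptions for illustration; closed-loop refinement and analysis-by-synthesis are not shown.

```python
import numpy as np

def open_loop_pitch(residual, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] that maximizes the autocorrelation."""
    x = np.asarray(residual, dtype=float)
    scores = [float(x[:-lag] @ x[lag:]) for lag in range(min_lag, max_lag + 1)]
    return min_lag + int(np.argmax(scores))

# A signal that repeats every 40 samples should yield a pitch lag of 40.
n = np.arange(400)
periodic = np.sin(2 * np.pi * n / 40)
pitch_lag = open_loop_pitch(periodic, min_lag=20, max_lag=100)
```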

The CELP mode performing unit 240 may extract the adaptive codebook index and the gain information based on the detected pitch information. The CELP mode performing unit 240 may extract the index and the gain of the fixed codebook based on the components that remain after excluding the contribution of the adaptive codebook from the linear prediction residual signal.

The CELP mode performing unit 240 passes the parameters (pitch, adaptive codebook index and gain, fixed codebook index and gain) related to the linear prediction residual signal, extracted through the pitch search, the adaptive codebook search, and the fixed codebook search, to the quantizer 245.

The quantizer 245 quantizes the parameters transmitted from the CELP mode performer 240.

Parameters related to the quantized linear prediction residual signal in the quantization unit 245 may be output as a bit stream and transmitted to the decoder. In addition, the parameters related to the quantized linear prediction residual signal may be transferred to the inverse quantizer 250.

The inverse quantization unit 250 generates an excitation signal reconstructed using the extracted and quantized parameters through the CELP mode. The generated excitation signal is transmitted to the synthesis and post processor 255.

The synthesis and post-processing unit 255 synthesizes the reconstructed excitation signal and the quantized linear prediction coefficient, generates a synthesized signal of 12.8 kHz, and restores the 16 kHz WB signal through upsampling.

The MDCT converter 260 transforms the restored WB signal by the modified discrete cosine transform (MDCT). The MDCT-transformed WB signal is output to the important MDCT coefficient extraction and quantization unit 265.

The important MDCT coefficient extraction and quantization unit 265 corresponds to the SWB encoding unit shown in FIG. 1. The important MDCT coefficient extraction and quantization unit 265 receives the MDCT transform coefficients for the SWB from the MDCT transform unit 215 and the MDCT transform coefficients for the synthesized WB from the MDCT transform unit 260.

The important MDCT coefficient extraction and quantization unit 265 extracts a transform coefficient to be quantized by using the input MDCT transform coefficients. The details of the important MDCT coefficient extraction and quantization unit 265 extracting MDCT coefficients are the same as those of the SWB encoder of FIG. 1.

The important MDCT coefficient extraction and quantization unit 265 quantizes the extracted MDCT coefficients, outputs them as a bitstream, and transmits them to the decoder.

Referring to FIG. 3, the decoder 300 includes a core decoder 305, a first post processor 310, an up sampling unit 315, a SWB decoder 320, an IMDCT unit 350, a second post processor 355, and an adder 360. The SWB decoder 320 includes a generic mode unit 325, a sine wave mode unit 330, and additional sine wave units 335 and 340.

As shown, the core decoder 305, the generic mode unit 325, the sine wave mode unit 330, and the additional sine wave unit 335 may receive, from the bitstream, target information to be processed and / or auxiliary information for processing.

The core decoder 305 decodes the wideband signal to synthesize the WB signal. The synthesized WB signal is input to the first post processor 310, and the MDCT transform coefficients of the synthesized WB signal are input to the SWB decoder 320.

The first post processor 310 improves the synthesized WB signal in the time domain.

The up sampling unit 315 upsamples the WB signal to form a SWB signal.

The SWB decoder 320 decodes the MDCT coefficients of the SWB signal input from the bitstream. In this case, the MDCT coefficients of the synthesized WB signal input from the core decoder 305 may be used. The decoding of the SWB signal is mainly performed in the MDCT domain.

The generic mode unit 325 and the sine wave mode unit 330 decode the first layer of the enhancement layer, and the upper layers may be decoded by the additional sine wave units 335 and 340.

The SWB decoder 320 performs a decoding process in the reverse order of the encoding process described for the SWB encoder. In this case, the SWB decoder 320 determines from the bitstream whether the input information is tonal. If it is tonal, the decoding process may be performed by the sine wave mode unit 330, or by the sine wave mode unit 330 and the additional sine wave unit 340. If it is not tonal, the decoding process may be performed by the generic mode unit 325, or by the generic mode unit 325 and the additional sine wave unit 335.

For example, the generic mode unit 325 constructs the HF signal by adaptive sub-band replication. Two sinusoidal components are then added to the spectrum of the first SWB enhancement layer. Both the generic mode and the sine mode use similar enhancement layers based on sine mode coding.

The sine wave mode unit 330 generates a high frequency (HF) signal based on a finite set of sine wave components. The additional sine wave units 335 and 340 add sine waves to the upper SWB layers and improve the quality of the high band content.

The IMDCT unit 350 performs an inverse MDCT to output a signal in the time domain, and the second post-processing unit 355 improves the inverse MDCT processed signal in the time domain.

The adder 360 adds the SWB signal decoded and upsampled by the core decoder and the SWB signal output from the SWB decoder 320 and outputs a reconstructed signal.

FIG. 4 is a diagram illustrating an example of a decoder configuration based on the configuration of a core decoder. Referring to FIG. 4, the decoder 400 includes a core decoder 410, a post-processing / sampling transformer 450, an inverse quantizer 460, an upper MDCT coefficient generator 470, an MDCT inverse transformer 480, and a post-processing filtering unit 490.

The bitstream including the NB signal or WB signal transmitted from the encoder is input to the core decoder 410.

The core decoder 410 includes an inverse transformer 420, a linear prediction synthesizer 430, and an MDCT transformer 440.

The inverse transform unit 420 may inverse transform the audio information encoded in the CELP mode and restore the excitation signal based on a parameter received from the encoder. The inverse transform unit 420 may transmit the reconstructed excitation signal to the linear prediction synthesis unit 430.

The linear prediction synthesizer 430 may reconstruct a lower band signal (NB signal, WB signal, etc.) using the excitation signal transmitted from the inverse transformer 420 and the linear prediction coefficient transmitted from the encoder.

The lower band signal (12.8 kHz) reconstructed by the linear prediction synthesis unit 430 may be downsampled to NB or upsampled to WB. The WB signal is output to the post-processing / sampling converter 450 or to the MDCT converter 440.

The post-processing / sampling converter 450 may up-sample the NB signal or the WB signal to generate a synthesized signal for use in restoring the SWB signal.

The MDCT converter 440 applies the MDCT to the restored lower band signal and transmits the result to the upper MDCT coefficient generator 470.

The inverse quantizer 460 and the upper MDCT coefficient generator 470 correspond to the SWB decoder of the decoder illustrated in FIG. 3.

The dequantizer 460 receives the SWB signal and the parameter quantized through the bitstream from the encoder and dequantizes the received information.

The dequantized SWB signal and the parameter are transmitted to the upper MDCT coefficient generator 470.

The upper MDCT coefficient generator 470 receives the MDCT coefficients for the synthesized NB signal or WB signal from the core decoder 410, receives the necessary dequantized parameters for the SWB signal from the bitstream, and generates the MDCT coefficients for the SWB signal. As described with reference to FIG. 3, the upper MDCT coefficient generator 470 may apply the generic mode or the sine mode according to whether the signal is tonal, and may apply an additional sine wave to the signal of the enhancement layer.

The MDCT inverse transform unit 480 restores a signal through an inverse transform on the generated MDCT coefficients.

The post processing filter 490 may apply filtering to the restored signal. Filtering allows for post-processing such as reducing quantization errors, emphasizing spectral peaks, and suppressing valleys.

The SWB signal may be restored by synthesizing the signal restored by the post-processing filter 490 and the signal restored by the post-processing / sampling converter 450.

As described with reference to FIGS. 1 to 4, the band extension method passes through a core encoder and an enhancement layer processor (SWB encoder) to encode a SWB input signal. To decode the SWB signal, a core decoder and an enhancement layer processor (SWB decoder) are used.

In order to encode the signal information corresponding to the WB among the SWB input signals, the SWB signal is downsampled at a sampling rate corresponding to the WB and encoded by a WB encoder (core encoder).

In order to be used for encoding the SWB signal, the encoded WB signal is synthesized and then MDCT transformed, and the MDCT coefficients for the WB may be input to the SWB encoder. The SWB input signal is encoded by being divided into a generic mode and a sine mode according to the degree of tonality in the MDCT coefficient domain after MDCT conversion. In order to increase encoding efficiency, encoding for an enhancement layer may be further performed using an additional sine wave.

Signal information corresponding to WB among SWB signals is decoded by a WB decoder (core decoder). The decoded WB signal is synthesized and then MDCT-transformed so that the MDCT coefficients for the WB can be input to the SWB decoder. The encoded SWB signal is decoded in the generic mode or the sine mode, corresponding to the mode in which it was encoded, and decoding of an enhancement layer may further be performed using an additional sine wave. The inverse-transformed SWB signal and the WB signal may be synthesized after additional post-processing such as upsampling and then restored as the SWB signal.

Hereinafter, the sine mode will be described in relation to the present invention.

The sine mode does not encode all sine waves constituting the audio signal (also called sine wave components constituting the audio signal), but encodes only the sine waves having high energy among them. Therefore, unlike when encoding all sine waves, in the sine mode the encoder transmits to the decoder not only the amplitude information and the sign information of each selected sine wave but also its position information.

In this case, the sine waves constituting the audio signal mean the MDCT coefficients X(k) obtained by MDCT transforming the respective sine wave components constituting the audio signal. Therefore, when the characteristics of a sine wave in the sine mode are described in the present specification, they are the magnitude (C) of the MDCT coefficient obtained by MDCT transforming the sine wave component, the sign of that coefficient, and its position (pos). The position of the sine wave is a position in the frequency domain, and may be the wave number k specifying each sine wave constituting the audio signal, or an index corresponding to the wave number k.

In the present specification, for convenience of description, it is noted that the MDCT coefficient of each sine wave component constituting the audio signal is simply displayed as 'sine wave' or 'pulse'. Therefore, in the present specification, unless otherwise specified, 'sine wave' or 'pulse' may mean an MDCT coefficient of each sine wave component constituting the input audio signal.

In addition, in the present specification, for convenience of description, the position of the sine wave is described by specifying the wave number of the sine wave. However, this is for convenience of description and the present invention is not limited thereto, and the contents of the present invention may be equally applied even when using separate information for specifying the positions of the sine waves in the frequency domain as the position of the sine wave.

The sine mode is not suitable for encoding all sine waves because it needs to transmit position information of the sine waves, but it is effective when sound quality must be ensured with a small number of sine waves or when transmission must be at a low bit rate. Therefore, it can be used for a band extension technique or a low bit rate audio codec.

Referring to FIG. 5, sine waves constituting the input audio signal are located corresponding to the wave number k of each sine wave.

An upward sine wave represents a positive MDCT coefficient, and a downward sine wave represents a negative MDCT coefficient. The magnitude of the sine wave (MDCT coefficient) corresponds to the length of the sine wave.

FIG. 5 illustrates, as an example, a case where a positive sine wave having a size 126 is positioned at position 4 and a negative sine wave having a size 18 is positioned at position 74. In the sine mode, as described above, magnitude information, sign information, and position information of a sine wave are transmitted.

Assuming a case where the two largest sine waves are retrieved and the corresponding information is encoded, in the example of FIG. 5, the information [size: 126, sign: +, position: 4] of the first sine wave located at position 4 is encoded, and the information [size: 74, sign: -, position: 18] of the second sine wave can be encoded.

As described above, in the sinusoidal mode, not all components are encoded, but only a sinusoid having a large energy. Therefore, unlike the case of encoding all components, in the sine mode, not only the size information and the sign information of the sine waves to be encoded, but also the position information of the sine waves to be encoded should be transmitted.

For example, in consideration of the layer structure, in the layer 6 which is the first layer of the SWB signal, a predetermined number of sine waves may be searched and quantized for each track.

In the example of FIG. 6, among six tracks (tracks 0 to 5), two sine waves are searched and quantized in each of tracks 0 to 3, and one sine wave is searched and quantized in each of track 4 and track 5. The search can be performed for each track.

In the example of FIG. 6, S_i_j (i, j is an integer of 0 or more) means a j-th sine wave of the i-th track segment. In the example of FIG. 6, the search may proceed in the order of track 0 → track 1 → track 2 → track 3 → track 4 → track 5.

As described above, suppose that two sine waves are searched for each track for tracks 0 to 3 and one sine wave is searched for each track for tracks 4 and 5. In this case, referring to FIG. 6, a total of 10 sine waves in layer 6 may be searched in the order of S_0_1 → S_0_2 → S_1_1 → S_1_2 → S_2_1 → S_2_2 → S_3_1 → S_3_2 → S_4_1 → S_5_1.

Meanwhile, the G.718-based SWB scalable extension encoder shown in FIG. 1 may include a baseline coder and an enhancement layer in order to encode a 32 kHz SWB input signal.

For example, in order to encode the signal having the WB bandwidth among the SWB input signal, downsampling to 16 kHz may be performed, and the downsampled signal may be encoded by a WB encoder. Since the encoded WB signal is used for encoding in the enhancement layer, it is MDCT-transformed after synthesis, and the WB MDCT coefficients are input to the enhancement layer. For the encoding of the enhancement layer, the SWB input signal may be MDCT-transformed and then assigned to one of two modes (generic mode and sine mode) according to the degree of tonality in the MDCT coefficient domain.

The sinusoidal mode may be processed in parallel with the generic mode in the case of layer 6.

A frame determined to be tonal may be encoded in the sine mode. In the case of layer 6, in the sine mode, 10 pulses may be extracted from the HF (High Frequency) signal and encoded. For example, the first four pulses are extracted from the band corresponding to 7000-8600 Hz, the next four pulses are extracted from the 8600-10200 Hz band, and the last two pulses are extracted from the 10200-11800 Hz band and the 11800-12600 Hz band, respectively.

As described above, in the sine mode, the position of each pulse is quantized, coded, and transmitted. The positions of the extracted pulses can be determined using the difference between the original signal $M_{32}(k)$ and the HF synthesized signal $\tilde{M}_{32}(k)$. Here, M is the magnitude of the MDCT coefficient, and k is the wave number indicating the position of the pulse (sine wave). Thus, $M_{32}(k)$ represents the pulse magnitude at position k for the SWB signal up to 32 kHz.

The ten pulses (pulse positions) to be encoded may be determined through Equation 2. Specifically, the ten pulses with the largest difference between the original signal $M_{32}(k)$ and the HF synthesized signal $\tilde{M}_{32}(k)$ can be determined.

<Equation 2>

$$D(k) = \left| M_{32}(k) - \tilde{M}_{32}(k) \right|$$

Since the HF synthesized signal is the signal synthesized in the previous layer, a signal with a large difference between the original signal and the HF synthesized signal may be determined as a signal to be encoded in the current layer.

Since no HF synthesized signal exists in the sine mode of layer 6, which is the base layer for the SWB signal, the value of the HF synthesized signal $\tilde{M}_{32}(k)$ in Equation 2 can be set to 0. Therefore, in the case of layer 6, the pulse search using Equation 2 is a process of finding the largest absolute values of the original signal $M_{32}(k)$.

For D(k), the entire band may be divided into five subbands to form D_j(k) for each subband. In this case, the number of pulses N_j for each subband may be predetermined. D_j(k) is the difference between the original signal and the HF synthesized signal at position k of subband j, and N_j is the number of pulses searched in subband j.

Table 1 schematically illustrates the process of finding the N_j largest D values (the largest original-signal values in the case of layer 6) for each subband.

<Table 1>

Using the sorting method shown in the example of Table 1, the N largest values can be retrieved, and the retrieved N values can be stored in an array called input_data.
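The search described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the reference implementation of the codec: the function names `difference_signal` and `largest_n` are hypothetical, the coefficient values are invented for the example, and the sort-based selection merely stands in for the procedure of Table 1, which is not reproduced here.

```python
def difference_signal(original, synthesized=None):
    """Return D(k) = |original(k) - synthesized(k)| for every position k.

    For layer 6 no HF synthesized signal exists, so it is treated as 0.
    """
    if synthesized is None:
        synthesized = [0.0] * len(original)
    return [abs(m - s) for m, s in zip(original, synthesized)]


def largest_n(d_sub, n):
    """Return (position, value) pairs of the n largest values of a subband.

    Positions are counted from the start of the subband.
    """
    order = sorted(range(len(d_sub)), key=lambda k: d_sub[k], reverse=True)
    return [(k, d_sub[k]) for k in order[:n]]


# Hypothetical MDCT coefficients of one subband (illustrative values only).
m32 = [0.5, -7.0, 2.0, 9.5, -1.0, 3.0]
d = difference_signal(m32)      # layer 6: D(k) = |M32(k)|
input_data = largest_n(d, 2)    # the two largest values and their positions
```

In an upper layer, the synthesized signal of the previous layer would be passed as the second argument, so that only the part of the signal not yet represented is searched.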

In the example of FIG. 7, each sine wave (MDCT coefficient) constituting the audio signal in the frequency domain is displayed at a position corresponding to the wave number of each sine wave.

Track 0 is located in the frequency range of 280 to 342 and consists of sine waves at an interval of two position units (2 steps), where the position unit is, for example, a wave number or frequency. Track 1 is located in the frequency range of 281 to 343 and consists of sine waves at an interval of two. Track 2 is located in the frequency range of 344 to 406 and consists of sine waves at an interval of two. Track 3 is located in the frequency range of 345 to 407 and consists of sine waves at an interval of two. Track 4 is located in the frequency range of 408 to 471 and consists of sine waves at an interval of one (1 step). Track 5 is located in the frequency range of 472 to 503 and consists of sine waves at an interval of one.
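The track layout above can be written down as a small data structure. The following Python sketch is illustrative only: the names `Track`, `track_positions`, and `LAYER6_TRACKS` are hypothetical, and the number of positions per track is derived here from the stated ranges and intervals rather than taken from a table.

```python
from typing import List, NamedTuple


class Track(NamedTuple):
    start: int  # first position of the track
    step: int   # interval between consecutive positions
    count: int  # number of positions in the track (derived from the range)


def track_positions(track: Track) -> List[int]:
    """List every position that belongs to the given track."""
    return [track.start + track.step * i for i in range(track.count)]


# Layer 6 layout as described in the text.
LAYER6_TRACKS = [
    Track(280, 2, 32),  # track 0: 280 .. 342
    Track(281, 2, 32),  # track 1: 281 .. 343
    Track(344, 2, 32),  # track 2: 344 .. 406
    Track(345, 2, 32),  # track 3: 345 .. 407
    Track(408, 1, 64),  # track 4: 408 .. 471
    Track(472, 1, 32),  # track 5: 472 .. 503
]

last_positions = [track_positions(t)[-1] for t in LAYER6_TRACKS]
```

Under this layout, the positions of track 0 and track 1 interleave and together cover every position from 280 to 343, which is why the two tracks can be regarded as adjacent.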

In the sine wave mode, a predetermined number of sine waves (pulses) satisfying a predetermined condition are searched for in each track according to the track order, and quantized. In the present specification, quantizing a sine wave (pulse) when the sine wave mode is applied may include quantizing an MDCT coefficient of the sine wave (pulse). The MDCT coefficient may mean the magnitude of a sine wave at a specific frequency.

In this specification, quantizing a sinusoid includes (1) quantizing the magnitude of the sine wave (absolute value of the MDCT coefficient), (2) quantizing the frequency of the sine wave (position of the MDCT coefficient), and (3) quantizing the phase of the sine wave (sign of the MDCT coefficient).

In addition, in the present specification, a pulse may mean an MDCT coefficient that is the magnitude of a sine wave. Since a pulse may mean a sinusoidal peak at a particular frequency, it may also be referred to as a maximum sinusoid or sinusoidal maximum. In the present specification, notation such as 'maximum sine wave (pulse)' or 'pulse (maximum sine wave)' indicates that the maximum sine wave and the pulse may have the same meaning; it does not indicate that they mean different things. Thus, quantizing a pulse herein includes (1) quantizing the magnitude of the pulse (MDCT coefficient) and (2) quantizing the position of the pulse. In this case, quantizing the magnitude of the pulse may include quantizing the absolute value of the pulse and quantizing the sign of the pulse.

In addition, in the present specification, 'quantizing a sine wave' may be used to mean that a specific pulse among the pulses (MDCT coefficients) constituting the sine wave is selected and quantized, in the sense that the sine wave is thereby encoded so that it can be recovered.

Meanwhile, the sine wave may mean a signal of each frequency in the sine wave mode, and the pulse may mean a signal at a specific position in the CELP mode.

However, in the present specification, 'sine wave' does not exclude 'pulse', nor does 'pulse' exclude 'sine wave'.

In layer 6, two pulses are searched and quantized in each of four tracks from track 0 to track 3 according to bit allocation, and one pulse is searched and quantized in track 4 and track 5, respectively.

The search in each track finds the largest pulses in the track, as many as the number allocated to that track.

<Table 2>

Table 2 shows the number of sine waves (pulses) extracted by the search for each track in the sine mode, the starting position of the track (starting position of the search), the interval size of each pulse position, and the number of pulses for each track.

The N_j pulses extracted for each track have position information pos_j(l) (l = 0, ..., N_j - 1), and the position information is related to the start position of each track.

The magnitude c_j(l) of the extracted pulse may be encoded as shown in Equation 3.

<Equation 3>

$$c_j(l) = \log\left( \left| D_j\big(\mathrm{pos}_j(l)\big) \right| \right)$$

According to Equation 3, the magnitude value is encoded, but the sign information is lost. Therefore, the sign value of the pulse may be separately encoded by the following Equation 4.

<Equation 4>

In this case, when N_j = 2, only the sign value of the first pulse is transmitted for each track, rather than the sign values of both searched pulses. The sign value of the other pulse can be derived using Table 3 from the encoded sign value of the first pulse.

<Table 3>

In Table 3, pos_j(0), Sign_sin_j(0), and c_j(0) denote the position, sign, and magnitude of the large pulse, and pos_j(1), Sign_sin_j(1), and c_j(1) denote the position, sign, and magnitude of the small pulse.

According to the method of Table 3, if the larger pulse is positioned ahead of the smaller pulse on the frequency axis, the signs of the two pulses are derived to be the same, and if the larger pulse is positioned behind the smaller pulse on the frequency axis, the signs of the two pulses are derived to be different. Therefore, when the decoder receives the information arranged by the encoder according to the scheme of Table 3, it can derive the signs of both pulses.
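The sign derivation can be illustrated with a short Python sketch. It is a simplified reading of the scheme and makes several assumptions: the rule is applied to the order in which the two pulses are arranged (larger pulse first means equal signs, smaller pulse first means different signs), the two magnitudes are distinct, and the function names and pulse values are hypothetical. The contents of Table 3 itself are not reproduced.

```python
def order_for_sign_coding(pulse_a, pulse_b):
    """Encoder side: arrange two (position, sign, magnitude) pulses.

    Same signs      -> the larger pulse is placed first.
    Different signs -> the smaller pulse is placed first.
    Returns (first, second, sign_to_transmit); only the sign of the
    first pulse needs to be transmitted.
    """
    large, small = sorted((pulse_a, pulse_b), key=lambda p: p[2], reverse=True)
    if large[1] == small[1]:
        first, second = large, small
    else:
        first, second = small, large
    return first, second, first[1]


def derive_second_sign(first_sign, first_mag, second_mag):
    """Decoder side: recover the sign of the second pulse from the order."""
    return first_sign if first_mag > second_mag else -first_sign


# Hypothetical pulses: (position, sign, magnitude).
first, second, tx_sign = order_for_sign_coding((10, +1, 5.0), (12, -1, 9.0))
second_sign = derive_second_sign(tx_sign, first[2], second[2])
```

In this example the signs differ, so the smaller pulse is placed first, and the decoder infers from the ordering that the second pulse has the opposite sign.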

In the case of layer 6, the encoding is performed by using the original signal as the target signal in Equation 2. In the case of a layer above layer 6, for example layer 7 or layer 8, the encoding is performed by using, as the target signal, the difference between the original signal and the synthesized signal of the previous layer, as shown in Equation 2. In the upper layer, a signal that was not encoded in the lower layer may thus be encoded and transmitted.

The encoding method performed on the upper layer of layer 6 is also similar to the encoding method described above with respect to layer 6.

In encoding for the first layer of the SWB enhancement layer, an additional 10 pulses may be extracted from the HF (7 to 14 kHz) signal. In layer 7, a frequency band to be encoded may be set differently according to a generic mode and a sine mode.

In the generic mode, the HF signal is divided into eight subbands and the energy is calculated for each subband. Each subband is composed of 32 MDCT coefficients as shown in Table 2, and the energy calculation method in each subband is shown in Equation 5.

<Equation 5>

In Equation 5, the HF signal is the HF signal resynthesized via the generic mode.

In layer 7, the eight subbands may be sorted in descending order of energy by comparing the energies of the subbands with each other. The five subbands with the highest energy among the sorted subbands are selected, and pulses are extracted from the five selected subbands according to the sine wave coding method described for layer 6. At this time, the positions of the tracks defined in the sine wave coding method depend on the energy characteristic of the HF signal for each frame.
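The subband selection can be sketched as follows in Python. Since Equation 5 is not reproduced here, the sketch assumes that the energy of a subband is the sum of the squared MDCT coefficients in that subband; the function names and the test signal are hypothetical.

```python
def subband_energies(hf, num_subbands=8, width=32):
    """Energy of each subband, assumed here to be the sum of squares."""
    return [
        sum(x * x for x in hf[j * width:(j + 1) * width])
        for j in range(num_subbands)
    ]


def select_subbands(energies, keep=5):
    """Indices of the `keep` subbands with the highest energy, highest first."""
    order = sorted(range(len(energies)), key=lambda j: energies[j], reverse=True)
    return order[:keep]


# Hypothetical HF signal: 8 subbands of 32 coefficients with constant amplitude.
amplitudes = [1.0, 5.0, 2.0, 8.0, 3.0, 7.0, 4.0, 6.0]
hf = [a for a in amplitudes for _ in range(32)]

energies = subband_energies(hf)
selected = select_subbands(energies)
```

Because the selected subbands change with the signal, the track positions used for the pulse search also change from frame to frame.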

In the sine mode, a total of 10 pulses are extracted from the HF signal through two processes, one extracting four pulses and the other extracting six pulses. Four pulses may be extracted at positions corresponding to the 9400 to 11000 Hz band, and six pulses may be extracted at positions corresponding to the 11000 to 13400 Hz band.

Table 4 shows information for each track in the sine mode (sine mode frame) of layer 7.

<Table 4>

Table 4 shows the number of sine waves extracted by the search for each track of the layer 7 as the encoding target, the start position of the track (start position of the search), the interval size of the pulse position of each track, and the number of pulses.

Meanwhile, in layer 8, an additional 20 pulses are extracted, with slight differences depending on the mode, in the same manner as in layer 7.

In generic mode (generic mode frame), two different processes of extracting 10 pulses are performed.

Six of the first 10 pulses can be extracted from three tracks, two per track, and the band from which the pulses are extracted is 9750 to 12150 Hz. The remaining four of the first 10 pulses can be extracted from two tracks, two per track, and the band from which the pulses are extracted is 12150 to 13750 Hz.

The extraction of the remaining 10 pulses out of 20 pulses is similar. The first six of the ten pulses can be extracted from three tracks, two per track, and the band from which the pulses are extracted is 8600 to 11000 Hz. The remaining four pulses can be extracted from two tracks, two per track, and the band from which the pulses are extracted is 11000 to 12600 Hz.

Table 5 describes an example of a sine wave track structure in the generic mode frame of layer 8.

<Table 5>

In the sine mode (sine mode frame), two different processes of extracting 10 pulses are performed.

Six of the first ten pulses can be extracted from three tracks, two per track, and the band from which the pulses are extracted is 7000 to 9400 Hz. The remaining four of the first ten pulses can be extracted from two tracks, two per track, and the band from which the pulses are extracted is 11000 to 12600 Hz.

The extraction of the remaining 10 pulses out of 20 pulses is similar. The first six of the ten pulses can be extracted from three tracks, two per track, and the band from which the pulses are extracted is 9400 to 11000 Hz. The remaining four pulses can be extracted from two tracks, two per track, and the band from which the pulses are extracted is 11000 to 13400 Hz.

Table 6 shows an example of a sinusoidal track structure for the first set, which extracts the first 10 of the 20 pulses, in a sine mode frame of layer 8.

<Table 6>

Table 7 shows an example of a sinusoidal track structure for the second set, which extracts the second 10 of the 20 pulses, in a sine mode frame of layer 8.

<Table 7>

On the other hand, looking at the sine wave tracks of Tables 2, 4, 5, 6, and 7, it can be seen that there are tracks with 2 steps and tracks with 3 steps.

FIG. 8 is a diagram schematically illustrating an example in which two tracks are paired in the case of two steps. Referring to FIG. 8, when track 0 and track 1 of two steps are paired, it can be seen that the pulse positions of both tracks are adjacent to each other.

FIG. 9 is a diagram schematically illustrating an example in which three tracks are paired in the case of three steps. Referring to FIG. 9, when tracks 2, 3, and 4 of three steps are paired, it can be seen that the pulse positions of the tracks are adjacent to each other.

In the process of searching for a sine wave in the SWB based on the conventional G.718, the search is performed independently from each track while sequentially searching from the first track to the last track.

For non-adjacent tracks (non-paired tracks), it is reasonable to search each track independently. With the conventional method, however, even adjacent tracks (paired tracks) are searched independently, without considering the characteristics of each track.

For example, considering a track pair consisting of two steps, the sine waves searched in the first track do not affect the search of the second track paired with the first track. Even in a three-step track pair, the sine waves found in the first track do not affect the sine wave search in the second track, and the sine waves found in the second track do not affect the sine wave search in the third track.

As a more specific example, assume that two adjacent tracks are paired (two 2-step tracks are paired), that most MDCT coefficients in the first track have large absolute values, and that the MDCT coefficients in the second track have relatively small absolute values. In this case, if two sine waves are searched in each track, the third largest MDCT coefficient in the first track may be more important (larger) than the largest MDCT coefficient in the second track, yet it is not selected. Such a case occurs frequently when the pulses belonging to the two tracks are adjacent to each other (when the two tracks are paired), and it may have a large influence on coding performance.
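The situation can be made concrete with a small numerical illustration in Python. The coefficient magnitudes are invented for the example, and the joint selection over both tracks is shown only as a point of comparison; it is not the search method of the present invention.

```python
def top_n(values, n):
    """Return the n largest values, largest first."""
    return sorted(values, reverse=True)[:n]


# Hypothetical absolute MDCT values of two paired tracks.
track0 = [90.0, 80.0, 70.0, 10.0]  # mostly large coefficients
track1 = [20.0, 15.0, 5.0, 3.0]    # relatively small coefficients

# Conventional search: two pulses per track, each track on its own.
independent = top_n(track0, 2) + top_n(track1, 2)

# For comparison only: the four largest pulses of the pair as a whole.
joint = top_n(track0 + track1, 4)

# Pulses that matter for the pair but are missed by the independent search.
missed = [v for v in joint if v not in independent]
```

Here the independent search encodes the pulse of magnitude 15 from the second track while leaving out the pulse of magnitude 70 from the first track.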

Therefore, when encoding a SWB signal, it is necessary to search for a pulse to be encoded in consideration of the characteristics of adjacent tracks. In addition, even when decoding the SWB signal, it is necessary to decode the encoded signal in consideration of the characteristics of adjacent tracks.

Hereinafter, a method of searching for a sine wave or pulse in consideration of characteristics of a track pair in tracks (track pair) having adjacent positions according to the present invention will be described.

The present invention can be applied not only to layer 6 which is a base layer of SWB, but also to layer 7 and layer 8 which are enhancement layers of layer 6. In consideration of characteristics between tracks, that is, by searching for MDCT coefficients (sine waves or pulses) to be encoded in the current track in consideration of MDCT coefficient magnitudes of adjacent tracks, more efficient searching is possible than when searching each track independently.

So far, a search based on the absolute value of Equation 2 has been described as a method of searching for a signal. However, the present invention is not limited thereto; the search may instead be based on, for example, a convolution with the impulse response of a linear prediction coefficient (LPC) synthesis filter, or on the mean square error (MSE). The method based on convolution and the method based on MSE will be described later.

FIG. 10 is a flowchart schematically illustrating a sinusoid search method applied to each layer according to an example of the present invention. The example of FIG. 10 may be performed by the SWB encoder of FIG. 1. Also, some or all of the steps in the example of FIG. 10 may be performed by the SWB decoder of FIG. 3. For example, the operation may be performed in at least one of a sine wave mode unit and an additional sine wave unit of the SWB encoder and/or the SWB decoder. For convenience of explanation, the steps of FIG. 10 are described as being performed by the SWB encoder and/or the SWB decoder.

Referring to FIG. 10, first, a target signal is generated (S1010). In this case, the target signal may be MDCT coefficients to be quantized. The SWB encoder and / or the SWB decoder may generate MDCT coefficients (target signals) to be quantized.

The absolute value of the generated target signal (MDCT coefficient to be quantized) is calculated (S1020). The SWB encoder and / or the SWB decoder calculates an absolute value for the MDCT coefficients to be quantized. The absolute value of the MDCT coefficient can be calculated using Equation 2.

In the case of the sine mode of layer 6, since there is no HF synthesized signal, the absolute value of the MDCT coefficient M₃₂(k) of the original signal can be obtained. For layers other than layer 6 (e.g., layer 7 or layer 8), the absolute value of the difference between the MDCT coefficients M₃₂(k) of the original signal and the MDCT coefficients of the HF synthesized signal can be obtained.

Based on the calculated absolute value, a sine wave (maximum sine wave, maxima sinusoid) having a maximum value may be searched for (S1030).

In the present specification, for convenience of description, a sine wave maximizing D (k) of Equation 2 may be referred to as a maximum sinusoid.

The SWB encoder and / or the SWB decoder may search for at least one maximum sine wave in each track. At this time, the number of searched maximum sine waves (maxima) may be determined for each track.

The SWB encoder and/or the SWB decoder may search, in each track, for a predetermined number of sine waves having the largest absolute values of Equation 2. For example, in the case of layer 6, at least two maximum sine waves may be searched in each of tracks 0 to 3, and at least one maximum sine wave may be searched in each of tracks 4 and 5.

When the maximum sine waves are found, a position change for quantizing the sign of the sine waves may be performed (S1040). The position change may be performed in a track in which two or more sine waves are searched; this step may therefore be skipped when only one sine wave is found. The position change may be performed based on the method described in Table 3. For example, when two sine waves or two pulses are transmitted, the sign (+ or -) of the first sine wave/pulse is encoded. If the magnitude of the first sine wave/pulse is greater than that of the second, the signs of the two sine waves/pulses can be determined to be the same, and if the magnitude of the first sine wave/pulse is smaller than that of the second, the signs of the two sine waves/pulses can be determined to be different.

As described above, the SWB encoder may set the positions of the sine waves/pulses so that the sign can be derived. The SWB decoder decodes the sign of the first sine wave/pulse, determines that the signs of the two sine waves/pulses are the same when the magnitude of the first sine wave/pulse is larger than that of the second, and determines that the signs of the two sine waves/pulses are different when the magnitude of the first sine wave/pulse is smaller than that of the second.

For amplitude quantization, the signal amplitudes of the searched sine waves / pulses are grouped (S1070). The SWB encoder and / or the SWB decoder may group signal amplitudes according to a sine wave / pulse group to be quantized. Grouping may be performed regardless of the track.

For example, in the case of layer 6, the amplitudes of 10 detected signals (sine waves or pulses) are grouped. Sine waves or pulses may be grouped in sequence of three, three, and four. Specifically, three signal magnitudes are grouped in Group 1, where the signal magnitudes of the two signals found in track 0 and one of the two signals found in track 1 may be grouped. In the group 2, three signal sizes may be grouped. The signal size of one of the two signals found in the track 1 and the signal size of the two signals found in the track 2 may be grouped. In the case of group 3, four signal magnitudes may be grouped. The signal magnitudes of two signals retrieved in track 3, the signal magnitude of signals retrieved in track 4, and the signal magnitudes of signals retrieved in track 5 may be grouped.
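The grouping described for layer 6 can be sketched in Python as follows. The function name `group_amplitudes` and the amplitude values are hypothetical; the sketch only shows how ten amplitudes taken in track order fall into groups of three, three, and four.

```python
def group_amplitudes(amps_per_track, group_sizes=(3, 3, 4)):
    """Split the amplitudes, taken in track order, into consecutive groups."""
    flat = [a for track in amps_per_track for a in track]
    if len(flat) != sum(group_sizes):
        raise ValueError("number of amplitudes does not match the group sizes")
    groups, start = [], 0
    for size in group_sizes:
        groups.append(flat[start:start + size])
        start += size
    return groups


# Hypothetical amplitudes: two pulses in tracks 0 to 3, one in tracks 4 and 5.
amps = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0], [10.0]]
groups = group_amplitudes(amps)
```

Group 1 then holds both amplitudes of track 0 and one of track 1, group 2 holds the other amplitude of track 1 and both of track 2, and group 3 holds the amplitudes of tracks 3, 4, and 5, so each group can be passed as one vector to the quantizer.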

The grouped signal magnitudes may be quantized in group units (S1080). The SWB encoder and / or the SWB decoder may perform quantization based on multi-dimension vector quantization (VQ).

In the sine wave search step of FIG. 10, pulses of adjacent positions between two paired tracks may be searched or pulses of mutually separated positions between two paired tracks may be searched.

Hereinafter, each case will be described in detail with reference to the drawings.

How to Search for Pulses in Adjacent Locations

In the sine wave search step of FIG. 10, the SWB encoder may search for and encode adjacent pulses among pulses of paired tracks. In addition, the SWB decoder may restore the SWB signal by decoding adjacent pulses among the pulses of the paired tracks. In this case, the pulse may be a sinusoidal MDCT coefficient as described above.

When the tracks of layer 6 in the sine wave mode are as shown in the example of FIG. 7, it can be seen that there are two track pairs. In the example of FIG. 7, track 0 and track 1 form a 2-step track pair, and track 2 and track 3 form another 2-step track pair.

Such track pairs also exist in layers other than layer 6.

Referring to Table 4, in the case of layer 7, track 0 and track 1 of two steps form a track pair, and track 2, track 3, and track 4 of three steps form another track pair.

Referring to Table 5, in the generic mode frame of layer 8, it can be seen that track 0, track 1, and track 2 of 3 steps form one track pair, and track 3 and track 4 of 2 steps form another track pair. There is no distinction between the generic mode and the sine wave mode in layer 8; generic mode frames of layer 8 refer to those frames, among the frames processed in layer 8, that were processed in the generic mode in layer 7.

Referring to Table 6, for the first set of sine wave mode frames in layer 8, it can be seen that track 0, track 1, and track 2 of 3 steps form one track pair, and track 3 and track 4 of 2 steps form another track pair.

Also, referring to Table 7, for the second set of sine wave mode frames in layer 8, it can be seen that track 0 and track 1 of 2 steps form one track pair, and track 2, track 3, and track 4 of 3 steps form another track pair.

As noted above, there is no distinction between the generic mode and the sine wave mode in layer 8; sine wave mode frames of layer 8 refer to those frames, among the frames processed in layer 8, that were processed in the sine wave mode in layer 7.

In the present invention, the tracks of the track pairs are searched in consideration of characteristics of the tracks. In other words, when a pulse is searched for tracks that make up a track pair, the search is performed in consideration of pulses searched in other tracks.

On the contrary, if sinusoids are independently searched for each track without considering the characteristics of the track pairs, the encoding efficiency may be degraded, and thus, the original signal may not be effectively restored during the decoding process.

Referring to FIG. 11, when an independent sine wave search is performed for each track, pulse 1 and pulse 2 are searched in track 0, and then pulse 3 and pulse 4 are searched in track 1.

In detail, when two pulses having a large absolute value are found in the track 0, the position, the sign, and the amplitude of the pulse are quantized. Amplitude, or gain, may be quantized through grouping with other tracks. This is the same as described in step S1070 of FIG. 10.

After quantization of the pulses searched for in track 0 is completed, the same search may be performed for track 1 to quantize information about the searched pulses. At this time, the search / quantization for track 1 is performed separately and independently from the search result of track 0.

As illustrated in FIG. 11, when the tracks of a track pair are searched independently, without considering the search results of the other tracks, the more important pulses among the pulses of the track pair may fail to be encoded.

Therefore, it is necessary to search for a pulse (sine wave) to be coded in consideration of the characteristics between the tracks constituting the track pair.

According to an embodiment of the present invention, when tracks having adjacent positions form a track pair, a sine wave may be searched for in the second track based on the sine wave found first in the first track.

In this specification, the statement that tracks having adjacent positions form a track pair means that the tracks in the track pair have the same step (pulse interval) and that each pulse position in one track of the pair is adjacent to a pulse position in the neighboring track of the pair.

In other words, the present embodiment is applicable to adjacent tracks, for example, when three tracks having three steps form a track pair, or two tracks having two steps form a track pair.

FIG. 12 is a diagram schematically illustrating an example of a method of performing a search in consideration of a search result of another track among tracks of a track pair according to the present invention. The example of FIG. 12 describes a method in which, when a search is performed on one of the tracks constituting a track pair, a pulse adjacent to a pulse found in another track is selected as a pulse to be encoded.

In the example of FIG. 12, as described above, it is assumed that each track constituting the track pair is adjacent to each other.

In addition, the example of FIG. 12 may be performed by the SWB encoder of FIG. 1. In addition, some or all steps of the example of FIG. 12 may be performed by the SWB decoder of FIG. 3. For example, the operation may be performed in at least one of a sine wave mode unit and an additional sine wave unit of the SWB encoder and/or the SWB decoder. For convenience of description, the steps of FIG. 12 are described as being performed by the SWB encoder and/or the SWB decoder.

Referring to FIG. 12, a target signal is first generated (S1200). In this case, the target signal may be pulses to be quantized, that is, MDCT coefficients. The SWB encoder and / or the SWB decoder may generate MDCT coefficients (target signals) to be quantized.

The absolute value of the generated target signal (MDCT coefficient to be quantized) is calculated (S1205). The SWB encoder and/or the SWB decoder calculates an absolute value for the MDCT coefficients to be quantized. The absolute value of the MDCT coefficient can be calculated using Equation 2, that is, as the absolute value of the difference between the MDCT coefficients of the original signal and the MDCT coefficients of the HF synthesized signal.

Here, the absolute value is calculated immediately after generating the target signal. However, the absolute value of Equation 2 may instead be calculated during the search of each track of the track pair, in the process of searching for the maximum value pulse of each track.

It is determined whether there is a track pair in the generated target signal (S1210). The SWB encoder and / or the SWB decoder determines whether a track pair exists in the MDCT coefficients (target signal) to be encoded.

A track pair may consist of tracks that have the same step (pulse interval) and in which each pulse position in one track is adjacent to a pulse position in the other track.

If there are track pairs in the target signal, the energy of tracks constituting the track pair is calculated (S1215). For example, when track 0 and track 1 constitute a track pair, the SWB encoder and / or SWB decoder may calculate the energy of track 0 and the energy of track 1.

The energies of the tracks constituting the track pair are compared with each other (S1220). The SWB encoder and / or the SWB decoder may search the tracks constituting the track pair in descending order of energy. For example, if track 0 and track 1 constitute a track pair and the energy of track 1 is greater than the energy of track 0, the SWB encoder and / or SWB decoder may first search track 1 and then search track 0.

However, the energy order applies only to the search, and subsequent processes may be performed on the tracks according to the original track order. For example, when the bitstream is formed by quantization, track 0 may be processed first and track 1 may be processed subsequently.
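As an illustration only, the energy comparison of S1215 and S1220 can be sketched as follows. The sketch assumes that the energy of a track is the sum of the squared MDCT coefficients at the positions of the track, which the text does not define, and the function names, the track layout, and the coefficient values are hypothetical.

```python
def track_energy(coeffs, positions):
    """Energy of one track, assumed here to be the sum of squared coefficients."""
    return sum(coeffs[p] ** 2 for p in positions)


def search_order(coeffs, tracks):
    """Return track indices sorted from highest to lowest energy."""
    energies = [track_energy(coeffs, positions) for positions in tracks]
    return sorted(range(len(tracks)), key=lambda i: energies[i], reverse=True)


# Hypothetical target signal of 8 MDCT coefficients.
# Track 0 holds the even positions and track 1 the odd positions (assumed layout).
coeffs = [0.1, -0.9, 0.3, 0.2, -0.2, 1.5, 0.0, -0.4]
tracks = [[0, 2, 4, 6], [1, 3, 5, 7]]

order = search_order(coeffs, tracks)
print(order)  # track 1 has the higher energy, so it is searched first
```

The returned order is used only for the search; quantization may still process the tracks in their original order.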

When the order in which the tracks are searched is determined, candidate pulses are searched for in each track according to the search order (S1225). For example, when a track pair is composed of two tracks, the SWB encoder and / or the SWB decoder may search for candidate pulses in the high energy track and then search for candidate pulses in the low energy track. When the number of pulses to be searched and encoded in each track is determined, the SWB encoder and / or SWB decoder may search for a predetermined number more pulses than that number as candidates for the pulses to be encoded.

Assume that the number of pulses to be searched and encoded in the high energy track is N1 (N1 is an integer greater than 0) and the number of pulses to be searched and encoded in the low energy track is N2 (N2 is an integer greater than 0). N1 and N2 may be the same. The SWB encoder and / or the SWB decoder may search for N1 + n1 pulses (n1 is an integer greater than or equal to 0) in the high energy track and N2 + n2 pulses (n2 is an integer greater than 0) in the low energy track. n2 may be greater than or equal to n1.

N1 + n1 pulses and N2 + n2 pulses may be selected in the order of the largest absolute value according to Equation 2 in each track.

For example, if the number of pulses to be searched in each track of the track pair is 2, 2 + n1 pulses may be selected in the high energy track and 2 + n2 pulses may be selected in the low energy track.
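The candidate search of S1225 can be sketched as below. This is a minimal sketch in which the absolute value of each coefficient stands in for Equation 2; the function name, the track layout, and the coefficient values are assumptions made for illustration.

```python
def search_candidates(coeffs, positions, num_encode, num_extra):
    """Return up to num_encode + num_extra positions of one track,
    ordered from the largest to the smallest absolute coefficient value."""
    ranked = sorted(positions, key=lambda p: abs(coeffs[p]), reverse=True)
    return ranked[:num_encode + num_extra]


# Hypothetical target signal; odd positions form the high energy track.
coeffs = [0.1, -0.9, 0.3, 0.2, -0.2, 1.5, 0.0, -0.4]

high = search_candidates(coeffs, [1, 3, 5, 7], num_encode=2, num_extra=0)  # N1 = 2, n1 = 0
low = search_candidates(coeffs, [0, 2, 4, 6], num_encode=2, num_extra=1)   # N2 = 2, n2 = 1
print(high, low)
```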

In the high energy track, as many pulses as the number to be searched and encoded are selected (S1230). The SWB encoder and / or the SWB decoder may select as many maximum sine waves (pulses) as the number to be searched and encoded in the high energy track among the tracks constituting the track pair.

The pulses (maximum sine waves) found in the low energy track are compared, in descending order of absolute value, with the pulses selected in the high energy track (S1235). The SWB encoder and / or the SWB decoder may compare the N2 + n2 pulses found in the low energy track with the pulses selected in the high energy track. In this case, the N2 + n2 pulses may be compared with the pulses selected in the high energy track in descending order of absolute value.

If, among the pulses found in the low energy track (low importance track), there is a pulse at a position adjacent to a pulse selected in the high energy track (high importance track), the two adjacent pulses are selected as one of the pulses to be encoded in the high energy track and one of the pulses to be encoded in the low energy track, respectively (S1240).

If there are no pulses in the position adjacent to the pulse found in the high energy track among the pulses searched in the low energy track, the pulse with the largest absolute value (maximum sine wave) in the low energy track may be selected.

For example, suppose that the pulses selected in the high energy track of the track pair are P_t1 (t1 = 1 ... N1) and the pulses found in the low energy track are P_t2 (t2 = 1 ... N2 + n2). Then N1 x (N2 + n2) pulse combinations (P_t1, P_t2) can be formed from the N1 pulses selected in the high energy track and the N2 + n2 pulses found in the low energy track. If a combination of P_t1 and P_t2 is an adjacent combination (P_{t1,adj}, P_{t2,adj}), P_{t1,adj} may be determined as a pulse to be encoded in the high energy track and P_{t2,adj} as a pulse to be encoded in the low energy track.

When a plurality of pulses are selected and encoded in each track, adjacent pulse combinations between tracks constituting the track pair can be further selected.

When there are a plurality of adjacent pulse combinations among the tracks constituting the track pair, the adjacent pulse pairs may be selected in descending order of absolute value.

If there are no adjacent pulse combinations between tracks that make up a pair of tracks, N2 pulses (maximum sine waves) can be selected in the order of the greatest absolute value, as in the case of the high energy track, even in the low energy track.
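The selection in the low energy track (S1235 and S1240) can be sketched as follows. This is one possible reading rather than a definitive procedure: adjacency is assumed to mean that two positions differ by exactly 1, and it is assumed that when fewer adjacent candidates exist than pulses to be encoded, the remainder is filled with the largest non-adjacent candidates. The function name and the values are hypothetical.

```python
def select_low_track(coeffs, high_selected, low_candidates, num_encode):
    """Select num_encode pulses in the low energy track, preferring
    candidates adjacent to a pulse selected in the high energy track."""
    by_magnitude = sorted(low_candidates, key=lambda p: abs(coeffs[p]), reverse=True)
    # Candidates whose position differs by 1 from a selected high-track pulse.
    adjacent = [p for p in by_magnitude
                if any(abs(p - q) == 1 for q in high_selected)]
    # Fallback pool: remaining candidates, still ordered by absolute value.
    others = [p for p in by_magnitude if p not in adjacent]
    return (adjacent + others)[:num_encode]


coeffs = [0.1, -0.9, 0.3, 0.2, -0.2, 1.5, 0.0, -0.4]

# The pulse at position 5 was selected in the high energy track (N1 = 1).
# The candidates of the low energy track are positions 2, 4 and 0 (N2 = 1, n2 = 2).
with_neighbour = select_low_track(coeffs, [5], [2, 4, 0], num_encode=1)

# No candidate is adjacent to position 7, so the largest candidate is chosen.
without_neighbour = select_low_track(coeffs, [7], [2, 4, 0], num_encode=1)
print(with_neighbour, without_neighbour)
```

In the first call, position 4 is chosen over the larger coefficient at position 2 because it neighbours the selected pulse at position 5.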

Meanwhile, in S1230, only as many pulses (maximum sine waves) as the number to be encoded are selected in the high energy track. However, the present invention is not limited thereto. For example, without performing step S1230, it is also possible to consider both the N1 + n1 pulses found in the high energy track and the N2 + n2 pulses found in the low energy track of the track pair.

In detail, suppose that the pulses found in the high energy track among the tracks constituting the track pair are P_t1 (t1 = 1 ... N1 + n1) and the pulses found in the low energy track are P_t2 (t2 = 1 ... N2 + n2). Then (N1 + n1) x (N2 + n2) pulse combinations (P_t1, P_t2) can be formed from the N1 + n1 pulses found in the high energy track and the N2 + n2 pulses found in the low energy track. If a combination of P_t1 and P_t2 is an adjacent combination (P_{t1,adj}, P_{t2,adj}), P_{t1,adj} may be determined as a pulse to be encoded in the high energy track and P_{t2,adj} as a pulse to be encoded in the low energy track.

The position may be changed to quantize the sign of the selected pulse (S1245). In steps S1235 and S1240 of selecting an encoding target pulse, the steps are performed in consideration of the pulses found in other tracks, but only the pulses in the same track are considered in the position change step.

The position change is for transmitting only one sign bit per track. If the two selected pulses in the track have the same sign, the pulse with the larger absolute value is placed in the front position. If the two pulses have different signs, the pulse with the smaller absolute value is placed in the front position.

Thus, the position of the pulse may or may not change depending on whether the signs of the two selected pulses within the same track are the same or different.
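The ordering rule stated above can be sketched as follows. Table 3 is not reproduced here, so the sketch only follows the rule as worded in the text; the representation of a pulse as a (position, value) tuple, the treatment of a zero value as positive, and the function name are assumptions.

```python
def order_for_sign(pulse_a, pulse_b):
    """Order two (position, value) pulses of the same track.

    Same sign:      the pulse with the larger absolute value comes first.
    Different sign: the pulse with the smaller absolute value comes first.
    """
    larger, smaller = sorted((pulse_a, pulse_b), key=lambda p: abs(p[1]), reverse=True)
    same_sign = (larger[1] >= 0) == (smaller[1] >= 0)  # zero counted as positive (assumption)
    return (larger, smaller) if same_sign else (smaller, larger)


print(order_for_sign((5, 1.5), (1, 0.9)))   # same sign: larger value in front
print(order_for_sign((5, 1.5), (1, -0.9)))  # different signs: smaller value in front
```

Under this rule, the order of the two pulses itself indicates whether their signs agree, which is why a single sign bit per track can suffice.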

The specific method of changing the position is as described in Table 3.

If the position of the selected pulse is determined, the position of the pulse is quantized (S1250). At this time, in quantizing the information indicating the position of the pulse, the quantization target position is a position determined in consideration of the sign of the pulse in step S1245.

Subsequently, the sign and amplitude of the selected pulse may be encoded (S1265).

Quantizing the information indicating the sign and magnitude of the pulse may include quantizing the sign of the pulse (S1270), grouping magnitudes in order to quantize the pulse magnitude (S1275), and quantizing the magnitude of the pulse (S1280). Quantization of the information indicating the magnitude may be performed based on multi-dimensional vector quantization (VQ), and the grouping of magnitudes may be regarded as a preparatory step for the multi-dimensional VQ.

On the other hand, when no track pair exists in the target signal, as many maximum sine waves (pulses) as the number to be searched may be selected in each track (S1360). The SWB encoder and / or the SWB decoder may search each track for as many maximum sine waves (pulses) as the number of pulses to be encoded and select them as the encoding / quantization target pulses without considering the pulses found in other tracks.

For the selected pulses (maximum sine waves), the quantization step S1365 of position / magnitude / sign may be performed in the same manner as if a track pair exists.

Meanwhile, in operation S1225, a search for a predetermined number more pulses as candidate pulses than the number of encoding target pulses in each track is described, but the present invention is not limited thereto. For example, in a track with a large energy, only a pulse serving as an encoding target (quantization target) may be searched without searching for a larger number of pulses as candidate pulses than the number of encoding target pulses. In other words, N1 pulses may be searched for in a track of high energy. In this case, step S1230 may not be performed.

In FIG. 12, the candidate pulses are searched for in the tracks (the high energy track and the low energy track) constituting the track pair, and then the pulses to be encoded are selected for each track. However, the present invention is not limited thereto. For example, the candidate pulse search and the encoding target pulse selection may be performed track by track. In this case, after the encoding target pulse is selected for the high energy track, the candidate pulses of the low energy track may be searched, and the encoding target pulse of the low energy track may be selected in consideration of the position of the pulse selected in the high energy track.

In FIG. 12, in order to determine the importance of a track, the tracks are divided into a higher importance track and a lower importance track based on the energy of each track. However, the present invention is not limited thereto. As a criterion for determining the track to be searched first, other criteria may be applied in addition to energy. Even in this case, the tracks of the track pair can be searched for pulses in the same manner as described with reference to FIG. 12 to determine the encoding target pulse and quantize the information of the pulse.

As described, the criteria for searching for candidate pulses in each track is the absolute value of the MDCT coefficient (pulse). In this case, other characteristic values may be added as another criterion. In addition, among the searched candidate pulses, the pulses to be finally encoded / quantized may be selected based on whether they are pulses in a position adjacent to the pulses of the higher importance track (the track having higher energy in the example of FIG. 12).

In FIG. 12, a case in which a high energy track and a low energy track form one track pair is described as an example. The present invention described with reference to FIG. 12 may be applied equally to a case in which two tracks constitute a track pair and to a case in which more than two tracks constitute a track pair.

Meanwhile, the steps of FIG. 12 may be applied in order to all tracks so that an encoding target pulse may be determined in all tracks of a target signal.

FIG. 13 is a view schematically illustrating another example of a method of performing a search in consideration of a search result of another track among tracks of a track pair according to the present invention. The example of FIG. 13 describes a method in which, when a search is performed on any one of the tracks constituting a track pair, a pulse adjacent to a pulse found in another track is selected as a pulse to be encoded.

In the example of FIG. 13, it is assumed that each track constituting the track pair is adjacent.

In addition, the example of FIG. 13 may be performed by the SWB encoder of FIG. 1. Some or all of the steps in the example of FIG. 13 may also be performed by the SWB decoder of FIG. 3. For example, the operation may be performed in at least one of a sine wave mode unit and an additional sine wave unit of the SWB encoder and / or the SWB decoder. For convenience of explanation, it is assumed that the steps of FIG. 13 are performed by the SWB encoder and / or the SWB decoder.

Referring to FIG. 13, a target signal is first generated (S1300). In this case, the target signal may be pulses to be quantized, that is, MDCT coefficients. The SWB encoder and / or the SWB decoder may generate MDCT coefficients (target signals) to be quantized.

It is determined whether there is a track pair in the generated target signal (S1305). The SWB encoder and / or the SWB decoder determines whether a track pair exists in the MDCT coefficients (target signal) to be encoded.

If there are track pairs in the target signal, a feature of the tracks constituting the track pair is extracted (S1310). In this case, the same kind of feature is extracted for every track of the track pair, and its value may differ from track to track.

According to the value of the feature extracted for each track, the order of importance of the tracks may be determined (S1315). For example, considering a case where track-specific energy is used as a feature extracted for each track, a track having a high energy may be determined as a track of high importance, that is, a track in which a search for pulse is performed first.

However, the order according to the extracted feature value applies only to the search, and the other, later steps may proceed according to the original track order. For example, when the bitstream is formed by quantization, track 0 may be processed first and track 1 may be processed subsequently.

When the feature value is extracted for each track, an order of searching for pulses may be determined according to the feature value of the tracks constituting the track pair. For example, depending on what the feature is, when a track having a large feature value is a more important track, the pulse search may proceed from the track having a large feature value. Alternatively, in the case where a track having a small feature value is a more important track, a pulse search may be performed from a track having a small feature value.

When the importance order for the tracks constituting the track pair is determined, candidate pulses may be searched for each track according to the importance order.

According to the importance order of the tracks, candidate pulses are searched for in the first priority track (S1320). If the number of pulses to be encoded in the first priority track is M1 (M1 is an integer greater than 0), a predetermined number (m1) more pulses than the number of pulses to be encoded may be searched as candidate pulses in the first priority track.

Among the candidate pulses found in the first priority track, M1 pulses, the number to be encoded in that track, may be selected (S1325). In this case, the number of pulses to be encoded may be equal to the number of pulses that would be searched in the first priority track if the pulses (maximum sine waves) found in the other tracks of the track pair were not considered.

Subsequently, candidate pulses are searched for in the second priority track (S1330). If the number of pulses to be encoded in the second priority track is M2 (M2 is an integer greater than 0), a predetermined number (m2) more pulses than the number of pulses to be encoded may be searched as candidate pulses of the second priority track. At this time, the number m2 of additionally searched pulses in the second priority track may be equal to or greater than the number m1 of additionally searched pulses in the first priority track.

The positions of the candidate pulses searched in the 2nd priority importance track and the position of the selected pulses in the 1st priority importance track are compared (S1335).

The SWB encoder and / or the SWB decoder may compare the positions of the M2 + m2 pulses found in the second priority track with the positions of the pulses selected in the first priority track.

If, among the pulses found in the second priority track, there is a pulse at a position adjacent to a pulse selected in the first priority track, the two adjacent pulses may be selected as one of the pulses to be encoded in the first priority track and one of the pulses to be encoded in the second priority track, respectively (S1340).

If there is no pulse in the position adjacent to the pulse selected in the first priority track among the pulses searched in the second rank importance track, the pulse with the largest absolute value (maximum sine wave) in the second rank importance track may be selected.

For example, suppose that the pulses selected in the first priority track of the track pair are P_tp1 (tp1 = 1 ... M1) and the pulses found in the second priority track are P_tp2 (tp2 = 1 ... M2 + m2). Then M1 x (M2 + m2) pulse combinations (P_tp1, P_tp2) can be formed from the M1 pulses selected in the first priority track and the M2 + m2 pulses found in the second priority track.

If a combination of P_tp1 and P_tp2 is an adjacent combination (P_{tp1,adj}, P_{tp2,adj}), P_{tp1,adj} may be determined as a pulse to be encoded in the first priority track and P_{tp2,adj} as a pulse to be encoded in the second priority track.

In the case where a plurality of pulses are selected and encoded in each track, further adjacent pulse combinations between the tracks forming the track pair can be selected. When there are a plurality of adjacent pulse combinations among the tracks constituting the track pair, the adjacent pulse pairs may be selected in descending order of absolute value. For example, if two pulses are selected and encoded for each track, then among the pulse pairs that are adjacent across the two tracks, the pair whose second priority track pulse has the largest absolute value and the pair whose second priority track pulse has the next largest absolute value can be selected and encoded / quantized.

If there are no adjacent pulse combinations between the tracks of the track pair, M2 pulses (maximum sine waves) may be selected in the second priority track in descending order of absolute value, in the same way as pulses are selected in the first priority track.

In S1325, only as many pulses as the number to be encoded are selected in the first priority track. However, the present invention is not limited thereto. For example, without performing step S1325, it is also possible to consider both the M1 + m1 pulses found in the first priority track and the M2 + m2 pulses found in the second priority track of the track pair.

Specifically, suppose that the pulses found in the first priority track of the track pair are P_tp1 (tp1 = 1 ... M1 + m1) and the pulses found in the second priority track are P_tp2 (tp2 = 1 ... M2 + m2). Then (M1 + m1) x (M2 + m2) pulse combinations (P_tp1, P_tp2) can be formed from the M1 + m1 pulses found in the first priority track and the M2 + m2 pulses found in the second priority track. If a combination of P_tp1 and P_tp2 is an adjacent combination (P_{tp1,adj}, P_{tp2,adj}), P_{tp1,adj} may be determined as a pulse to be encoded in the first priority track and P_{tp2,adj} as a pulse to be encoded in the second priority track. If a plurality of pulses are to be selected for each track, the adjacent pulse pairs may be selected in descending order of the absolute value of the second priority track pulse.

For the third priority track and each subsequent priority track, candidate pulses may likewise be searched and the pulse to be encoded may be selected from the candidate pulses.

Candidate pulses are searched and encoding target pulses are selected for each track in order of importance, until candidate pulses are searched for in the lowest priority track (S1345). Assuming that the track pair consists of k tracks, if the number of pulses to be encoded in the lowest priority track (the k-th priority track) is Mk (Mk is an integer greater than 0), a predetermined number (mk) more pulses than the number of pulses to be encoded may be searched as candidate pulses of the lowest priority track. In this case, the number mk of additionally searched pulses in the lowest priority track may be equal to or greater than the number m_{k-1} of additionally searched pulses in the previous priority track (the (k-1)-th priority track).

The positions of the candidate pulses searched in the lowest priority track and the positions of the selected pulses in the high rank priority track are compared (S1350).

The SWB encoder and / or the SWB decoder may compare the positions of the Mk + mk pulses found in the lowest priority track with the positions of the pulses selected in the previous rank priority track.

If, among the pulses found in the lowest priority track, there is a pulse at a position adjacent to a pulse selected in the previous priority track, the two adjacent pulses may be selected as one of the pulses to be encoded in the lowest priority track (the k-th priority track) and one of the pulses to be encoded in the previous priority track (the (k-1)-th priority track), respectively (S1355).

For example, suppose that the pulses selected in the previous priority track of the track pair are P_{tpk-1} (tpk-1 = 1 ... M_{k-1}) and the pulses found in the lowest priority track are P_tpk (tpk = 1 ... Mk + mk). Then M_{k-1} x (Mk + mk) pulse combinations (P_{tpk-1}, P_tpk) can be formed from the M_{k-1} pulses selected in the previous priority track and the Mk + mk pulses found in the lowest priority track.

If a combination of P_{tpk-1} and P_tpk is an adjacent combination (P_{tpk-1,adj}, P_{tpk,adj}), P_{tpk-1,adj} may be determined as a pulse to be encoded in the previous priority track and P_{tpk,adj} as a pulse to be encoded in the lowest priority track.

In the case where a plurality of pulses are selected and encoded in each track, further adjacent pulse combinations between the tracks forming the track pair can be selected. When there are a plurality of adjacent pulse combinations among the tracks constituting the track pair, the adjacent pulse pairs may be selected in descending order of absolute value.

If there are no adjacent pulse combinations between the track pairing tracks, then Mk pulses (maximum sine waves) can be selected in the order of largest absolute value in the lowest priority track.
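The track-by-track procedure of FIG. 13 for k tracks can be sketched as below, as a sketch under stated assumptions. The tracks are assumed to be supplied already sorted by importance, adjacency is assumed to mean a position difference of exactly 1, and any shortfall of adjacent candidates is assumed to be filled with the largest remaining candidates. All names and values are hypothetical.

```python
def cascade_select(coeffs, tracks, num_encode, num_extra):
    """Select encoding target pulses track by track.

    tracks:     position lists ordered from highest to lowest importance
    num_encode: pulses to be encoded per track (M1 ... Mk)
    num_extra:  additional candidates per track (m1 ... mk)
    """
    selected = []
    for k, positions in enumerate(tracks):
        ranked = sorted(positions, key=lambda p: abs(coeffs[p]), reverse=True)
        candidates = ranked[:num_encode[k] + num_extra[k]]
        if k == 0:
            # First priority track: simply keep the largest pulses.
            chosen = candidates[:num_encode[k]]
        else:
            # Lower priority track: prefer neighbours of the previous track's pulses.
            previous = selected[k - 1]
            adjacent = [p for p in candidates
                        if any(abs(p - q) == 1 for q in previous)]
            others = [p for p in candidates if p not in adjacent]
            chosen = (adjacent + others)[:num_encode[k]]
        selected.append(chosen)
    return selected


# Hypothetical signal with three interleaved tracks, listed in importance order.
coeffs = [0.2, 0.1, 0.05, 0.1, 0.6, 0.3, 2.0, 0.5, 0.1, 0.3, 0.9, 0.4]
tracks = [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]

result = cascade_select(coeffs, tracks, num_encode=[1, 1, 1], num_extra=[0, 2, 2])
print(result)
```

In this example the selected pulses follow the strong coefficient at position 6, so positions 7 and 8 are chosen in the lower priority tracks even though each of those tracks contains larger coefficients elsewhere.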

The position may be changed to quantize the sign of the selected pulse (S1345). In the step of selecting the encoding target pulse (S1340, etc.), the steps are performed in consideration of the pulses found in other tracks, but in the position change step, only the pulses in the same track are considered.

The specific method of changing the position is as described in Table 3.

When the position of the selected pulse is determined, the position, magnitude and / or sign of the pulse is quantized (S1365). At this time, in quantizing the information indicating the position of the pulse, the quantization target position is a position determined in consideration of the sign of the pulse in step S1245.

On the other hand, when no track pair exists in the target signal, as many maximum sine waves (pulses) as the number to be searched may be selected in each track (S1360). The SWB encoder and / or the SWB decoder may search each track for as many maximum sine waves as the number of pulses to be encoded and select them as the encoding / quantization target pulses without considering the pulses found in other tracks.

Meanwhile, in operation S1320, a search for a predetermined number more pulses as candidate pulses than the number of encoding target pulses in each track has been described, but the present invention is not limited thereto. For example, in the priority track, the number of pulses larger than the number of encoding target pulses may not be searched as candidate pulses, but only the pulses that are encoding targets (quantization targets) may be searched. That is, unlike the lower priority track, only the M1 pulses may be searched for in the priority track. In this case, step S1325 may not be performed.

In the example of FIG. 13, the search for and selection of candidate pulses are performed track by track for the tracks constituting the track pair, but the present invention is not limited thereto. For example, after a predetermined number more candidate pulses than the number of encoding target pulses are searched for in all tracks constituting the track pair, the pulses adjacent to the pulses selected in the higher priority track may be selected, among the candidate pulses of each track, as the encoding target pulses. In this case, when candidate pulses are searched for in all tracks constituting the track pair, only as many candidate pulses as the number of encoding target pulses may be selected in the highest priority track (that is, the encoding target pulses may be searched for directly rather than candidate pulses).

In FIG. 13, in order to determine the importance of a track, the track is divided into an upper importance track and a lower importance track based on the energy of the track. However, the present invention is not limited thereto. As a criterion for determining a track to be searched first, other criteria may be applied in addition to energy. Even in this case, the tracks of the track pairs may be searched for pulses in the same manner as described with reference to FIG. 13 to determine an encoding target pulse and quantize the information of the pulses.

Meanwhile, the steps of FIG. 13 may be applied in order to all tracks so that an encoding target pulse may be determined in all tracks of a target signal.

The methods described with reference to FIGS. 12 and 13 can be applied to a case where the target signal includes all original signal components, such as the case of processing the higher band of the G.718 SWB. In addition, the methods of FIG. 12 and FIG. 13 serve to concentrate the selected MDCT coefficients in a band having a strong tonal component.

Selecting a pulse adjacent to the pulse position of the higher importance track is effective for signals with tonality, and is efficient when there are different modes depending on the tonal information, as in G.718 SWB (for example, a generic mode when no tonal component is present).

The methods of FIGS. 12 and / or 13 may be performed in the MDCT-based enhancement layer. The MDCT-based enhancement layer may correspond to the SWB encoder of FIG. 1. In addition, the MDCT-based enhancement layer may correspond to the SWB decoder of FIG. 3.

Method of searching for a pulse at a separated position

Unlike the examples of FIGS. 12 and 13, instead of selecting adjacent pulses between tracks of the track pairs as encoding target pulses, pulses separated from pulses selected from other tracks may be selected as encoding target pulses.

The method of selecting a pulse at a position away from the pulse selected in another track as the encoding target pulse can be used effectively when energy is uniformly distributed in one frame of the target signal. In this case, in the step of comparing positions with the pulses of the other tracks constituting the track pair in order to select the pulse to be encoded in the current track, a pulse at a position relatively far from the pulse position of the higher priority track may be selected.

In addition, the present method can be effectively used even when there are no modes depending on the tonality.

For convenience of explanation, it is assumed that the tracks are adjacent, but the present invention can be equally applied even when the tracks are separated.

FIG. 14 is a view schematically illustrating another example of a method of performing a search in consideration of a search result of another track among tracks of a track pair according to the present invention. The example of FIG. 14 describes a method in which, when a search is performed on any one of the tracks constituting a track pair, a pulse away from a pulse found in another track is selected as a pulse to be encoded.

In the example of FIG. 14, it is assumed that two tracks constitute a track pair, and each track constituting the track pair is adjacent to each other.

In addition, the example of FIG. 14 may be performed by the SWB encoder of FIG. 1. Some or all of the steps in the example of FIG. 14 may also be performed by the SWB decoder of FIG. 3. For example, the operation may be performed in at least one of a sine wave mode unit and an additional sine wave unit of the SWB encoder and / or the SWB decoder. For convenience of explanation, it is assumed that the steps of FIG. 14 are performed by the SWB encoder and / or the SWB decoder.

Referring to FIG. 14, first, a target signal is generated (S1400). In this case, the target signal may be pulses to be quantized, that is, MDCT coefficients. The SWB encoder and / or the SWB decoder may generate MDCT coefficients (target signals) to be quantized.

It is determined whether there is a track pair in the generated target signal (S1405). The SWB encoder and / or the SWB decoder determines whether a track pair exists in the MDCT coefficients (target signal) to be encoded.

If there are track pairs in the target signal, the features of the tracks constituting the track pair are extracted (S1410). In this case, the extracted feature is the same feature for the track pair, and may have different values, for example, different values for each track constituting the track pair. For example, the feature to be extracted can be the energy of the track.

According to the value of the feature extracted for each track, the order of importance of the tracks may be determined (S1415). For example, considering a case where track-specific energy is used as a feature extracted for each track, a track having a high energy may be determined as a track of high importance, that is, a track in which a search for pulse is performed first.

When the importance order for the tracks constituting the track pair is determined, candidate pulses may be searched for each track according to the importance order. However, processing in the order according to the feature values is a case of searching, and each process may proceed according to the original track order in other steps later. For example, when the bitstream is formed by quantization, track 0 may be processed first and track 1 may be processed subsequently.

According to the importance order of the tracks, candidate pulses are searched for in the first priority track (S1420). The SWB encoder and / or the SWB decoder may search the first priority track to find the pulses having the largest absolute values.

Among the candidate pulses found in the first priority track, a predetermined number l1 more pulses than the number L1 of pulses to be encoded in the first priority track may be selected (S1425). In this case, the number of pulses to be encoded may be the same as the number of pulses that would be searched in the first priority track if the pulses (maximum sine waves) selected in the other tracks of the track pair were not considered.

If the number of pulses to be encoded in the first priority track is L1 (L1 is an integer greater than 0), a predetermined number l1 (l1 is an integer greater than or equal to 0) more pulses than the number of pulses to be encoded may be selected as candidate pulses of the first priority track. For example, if the number of pulses to be encoded in the first priority track is two (that is, two pulses would be searched if the pulses selected in other tracks were not considered), 2 + l1 pulses can be selected.

Subsequently, candidate pulses are searched for in the second most important track, which is the next important track (S1430). If the number of pulses to be encoded in the 2nd priority importance track is L2 (L2 is an integer greater than 0), a certain number of pulses (l2 and l2 are greater than or equal to 0) are greater than the number of pulses to be encoded. It can be retrieved as a candidate pulse of the track. For example, if the number of pulses to be coded in the second priority track is two (the number of pulses to be searched when searching by the conventional method is two), 2 + l 2 pulses can be searched.

In this case, the number l2 of additionally searched pulses in the second priority importance track may be equal to or greater than the number l1 of additionally searched pulses in the first priority importance track.
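As an illustration of the candidate search described above, the following Python sketch returns the L + l largest-magnitude pulses of one track. The dictionary representation of a track (position mapped to MDCT coefficient) and the name `search_candidates` are assumptions made for this example and do not appear in the specification.

```python
def search_candidates(track, num_to_encode, num_extra):
    """Return the (num_to_encode + num_extra) largest-magnitude pulses.

    track: dict mapping pulse position -> coefficient value (assumed layout).
    The result is a list of (position, value) tuples, largest magnitude first.
    """
    if num_to_encode <= 0 or num_extra < 0:
        raise ValueError("need num_to_encode > 0 and num_extra >= 0")
    ranked = sorted(track.items(), key=lambda pv: abs(pv[1]), reverse=True)
    return ranked[:num_to_encode + num_extra]


# Hypothetical second priority track with L2 = 2 and l2 = 1.
second_track = {0: 0.2, 2: -0.9, 4: 0.5, 6: 0.1, 8: -0.7}
candidates = search_candidates(second_track, num_to_encode=2, num_extra=1)
```

With `num_extra=0` the same function returns only the encoding target pulses, which corresponds to the case where no additional candidates are searched.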

The positions of candidate pulses searched in the 2nd priority importance track and the position of the selected pulses in the 1st priority importance track are compared (S1435).

The SWB encoder and / or the SWB decoder may compare the positions of the L2 + l2 pulses found in the second priority track with the positions of the pulses selected in the first priority track.

If, among the pulses searched in the second priority track, there is a pulse at a position apart from a pulse selected in the first priority track, the two separated pulses may be selected as one of the pulses to be encoded in the first priority track and one of the pulses to be encoded in the second priority track, respectively (S1440).

If there is no pulse in the position that is separated from the pulse selected in the first priority track among the pulses searched in the second priority track, the pulse having the largest absolute value (maximum sine wave) may be selected in the second priority track.

For example, let a pulse selected in the first priority track of the track pair be P_tp1 (tp1 = 1, ..., L1 + l1), and let a pulse searched in the second priority track be P_tp2 (tp2 = 1, ..., L2 + l2). Then (L1 + l1) x (L2 + l2) pulse combinations (P_tp1, P_tp2) can be formed from the L1 + l1 pulses selected in the first priority track and the L2 + l2 pulses searched in the second priority track.

Among the (L1 + l1) x (L2 + l2) pulse combinations (P_tp1, P_tp2), let the combination in which P_tp1 and P_tp2 are farthest apart be (P_{tp1,away}, P_{tp2,away}). Then P_{tp1,away} may be determined as the pulse to be encoded in the first priority importance track and P_{tp2,away} may be determined as the pulse to be encoded in the second priority importance track.
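The choice of the farthest combination can be sketched as follows. The sketch assumes that each pulse is a (position, value) tuple and that the distance between two pulses is the absolute difference of their positions; the name `most_separated_pair` is hypothetical.

```python
from itertools import product


def most_separated_pair(first_pulses, second_pulses):
    """Return the combination (P_tp1, P_tp2) whose positions are farthest apart.

    Every pulse of the first priority track is combined with every pulse of
    the second priority track, giving len(first) x len(second) combinations.
    """
    return max(product(first_pulses, second_pulses),
               key=lambda pair: abs(pair[0][0] - pair[1][0]))


# L1 + l1 = 3 pulses in the first priority track, L2 + l2 = 3 in the second.
first = [(2, -0.9), (8, -0.7), (4, 0.5)]
second = [(3, 0.6), (9, 0.4), (15, -0.3)]
p1_away, p2_away = most_separated_pair(first, second)
```

In this example the positions 2 and 15 are 13 apart, which is the largest distance among the nine combinations.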

When a plurality of pulses are selected and encoded in each track, further combinations of pulses that are separated between the tracks forming the track pair may be selected. When there are a plurality of such separated pulse combinations, the pulse pairs may be selected in descending order of the distance between their two pulses. For example, when two pulses are selected and encoded for each track, the pulse pair with the longest distance between its two pulses is selected first, then the pulse pair with the next longest distance, and the selected pulses may be encoded (quantized) for each track.

If no such pulse combination exists between the tracks of the track pair, L2 pulses (maximum sine waves) may be selected in the second priority track in descending order of magnitude, in the same way as pulses are selected in the first priority track.
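The selection of several separated pairs, together with the fallback to the largest pulses, might look like the sketch below. Several points are assumptions not stated in the text: two pulses count as "separated" when their positions differ by more than a hypothetical `adjacent_gap`, a pulse is not reused in a second pair, and the fallback applies only when no separated pair exists at all.

```python
from itertools import product


def select_separated_pairs(first, second, count, adjacent_gap=1):
    """Pick up to `count` separated pairs in descending order of distance."""
    def dist(pair):
        return abs(pair[0][0] - pair[1][0])

    pairs = sorted((p for p in product(first, second) if dist(p) > adjacent_gap),
                   key=dist, reverse=True)
    chosen, used_first, used_second = [], set(), set()
    for a, b in pairs:
        if len(chosen) == count:
            break
        if a[0] in used_first or b[0] in used_second:
            continue  # assumed rule: each pulse belongs to at most one pair
        chosen.append((a, b))
        used_first.add(a[0])
        used_second.add(b[0])
    return chosen


def select_second_track(first, second, count, adjacent_gap=1):
    """Pulses to encode in the second priority track, with magnitude fallback."""
    pairs = select_separated_pairs(first, second, count, adjacent_gap)
    if pairs:
        return [b for _, b in pairs]
    # No separated combination: take the largest-magnitude pulses instead.
    return sorted(second, key=lambda pv: abs(pv[1]), reverse=True)[:count]


first = [(2, -0.9), (8, -0.7)]
second = [(3, 0.6), (9, 0.4), (15, -0.3)]
pairs = select_separated_pairs(first, second, count=2)
fallback = select_second_track([(4, 1.0)], [(5, 0.2), (3, -0.8)], count=1)
```

The first call selects the pair at distance 13 and then the best remaining pair that reuses neither pulse; the second call finds only adjacent pulses and therefore falls back to the largest pulse of the second track.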

Meanwhile, S1425 is described as selecting a predetermined number more pulses than the number to be encoded in the first priority track. However, the present invention is not limited thereto. For example, in the first priority track of the track pair, only as many pulses as the encoding target pulses (that is, the encoding target pulses themselves) may be searched / selected. In that case, L1 pulses are searched and selected in the first priority track of the track pair.

In this case, let a pulse searched in the first priority track of the track pair be P_tp1 (tp1 = 1, ..., L1), and let a pulse searched in the second priority track be P_tp2 (tp2 = 1, ..., L2 + l2). Then L1 x (L2 + l2) pulse combinations (P_tp1, P_tp2) can be formed from the L1 pulses selected in the first priority track and the L2 + l2 pulses searched in the second priority track. If the combination in which P_tp1 and P_tp2 are farthest apart is (P_{tp1,away}, P_{tp2,away}), then P_{tp1,away} may be determined as the pulse to be encoded in the first priority track and P_{tp2,away} may be determined as the pulse to be encoded in the second priority importance track. Also, if a plurality of (e.g., two) pulses are to be selected for each track, the pulse pair whose two pulses are farthest apart and the pulse pair whose two pulses are second farthest apart may both be selected.

On the other hand, when no track pair exists in the target signal, the maximum sine waves (pulses) found in each track may be selected, as many as the number of pulses to be encoded (S1445). The SWB encoder and / or the SWB decoder may search each track for as many maximum sine waves (pulses) as the number of pulses to be encoded and select them as encoding / quantization target pulses without considering the pulses searched for in other tracks.

The position / size / sign is quantized with respect to the selected pulses (maximum sine wave) (S1455).

Within a track, the positions of the pulses may be rearranged in order to quantize the information indicating the signs of the selected pulses. The rearrangement allows only one sign bit to be transmitted per track. If the two pulses selected in the track have the same sign, the pulse with the larger absolute value is placed in the front position; if the two pulses have different signs, the pulse with the smaller absolute value is placed in the front position.
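The rearrangement rule can be sketched as follows. The ordering function follows the rule stated above. The decoder-side function is only an assumption about how the rule could be used: it supposes that the single transmitted bit is the sign of the front pulse and that the two magnitudes are distinct, neither of which is stated in the text. Both function names are hypothetical.

```python
def order_for_sign_bit(pulse_a, pulse_b):
    """Order two (position, value) pulses of one track as (front, back).

    Same sign      -> the larger-magnitude pulse goes in front.
    Different sign -> the smaller-magnitude pulse goes in front.
    """
    big, small = sorted((pulse_a, pulse_b), key=lambda p: abs(p[1]), reverse=True)
    same_sign = (big[1] >= 0) == (small[1] >= 0)
    return (big, small) if same_sign else (small, big)


def recover_signs(front_mag, back_mag, front_negative):
    """Assumed decoder step: rebuild both signed values from one sign bit.

    If the front magnitude is the larger one, the signs are equal;
    otherwise they differ. Equal magnitudes are not handled here.
    """
    same_sign = front_mag > back_mag
    back_negative = front_negative if same_sign else not front_negative
    front = -front_mag if front_negative else front_mag
    back = -back_mag if back_negative else back_mag
    return front, back


front, back = order_for_sign_bit((4, 0.5), (10, -0.9))   # different signs
decoded = recover_signs(abs(front[1]), abs(back[1]), front[1] < 0)
```

Because the order of the two magnitudes already tells whether the signs agree, one sign bit is enough to recover both signs in this sketch.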

Once the position of the selected pulse is determined, the position, magnitude and / or sign of the pulse is quantized. Quantizing the information indicating the sign and magnitude of the pulse may include quantizing the sign of the pulse, magnitude grouping to quantize the pulse amplitude, and quantizing the magnitude of the pulse. Quantization of the size indicating information may be performed based on multi-dimensional vector quantization (VQ), and grouping of sizes may be referred to as a prerequisite for multi-dimensional VQ.

Meanwhile, operation S1425 is described as selecting, as candidate pulses, a predetermined number more pulses than the number of encoding target pulses in the first priority track, but the present invention is not limited thereto. For example, in the first priority track, instead of searching for more pulses than the number of encoding target pulses as candidate pulses, only the pulses that are encoding targets (quantization targets) may be searched. That is, unlike in the lower priority track, only the L1 pulses may be searched for in the first priority track. In this case, step S1425 may not be performed.

In the example of FIG. 14, the searching and selection of candidate pulses are performed for each track constituting the track pair, but the present invention is not limited thereto. For example, after a predetermined number more candidate pulses than the number of encoding target pulses have been searched for in all tracks constituting the track pair, the pulses adjacent to the pulses selected in the higher priority track may be selected from the candidate pulses of each track as the encoding target pulses. In this case, when candidate pulses are searched for in all tracks constituting the track pair, only as many candidate pulses as the number of encoding target pulses may be selected in the highest priority track (that is, the encoding target pulses are searched for rather than candidate pulses).

In the example of FIG. 14, in order to determine the importance of a track, the track is classified into a higher importance track and a lower importance track based on the energy of the track. However, the present invention is not limited thereto. As a criterion for determining a track to be searched first, other criteria may be applied in addition to energy. Even in this case, the tracks of the track pairs may be searched for pulses in the same manner as described with reference to FIG. 14 to determine an encoding target pulse, and quantize the information of the pulses.

Meanwhile, the steps of FIG. 14 may be applied in order to all tracks so that an encoding target pulse may be determined in all tracks of a target signal.

In addition, the example of FIG. 15 may be performed by the SWB encoder of FIG. 1. In addition, some or all of the steps in the example of FIG. 15 may be performed by the SWB decoder of FIG. 3. For example, the operation may be performed in at least one of a sine wave mode unit and an additional sine wave unit of the SWB encoder and / or the SWB decoder. For convenience of explanation, the steps of FIG. 15 will be described in the SWB encoder and / or the SWB decoder.

Referring to FIG. 15, a target signal is first generated (S1500). In this case, the target signal may be pulses to be quantized, that is, MDCT coefficients. The SWB encoder and / or the SWB decoder may generate MDCT coefficients (target signals) to be quantized.

It is determined whether there is a track pair in the generated target signal (S1505). The SWB encoder and / or the SWB decoder determines whether a track pair exists in the MDCT coefficients (target signal) to be encoded.

If there are track pairs in the target signal, the features of the tracks constituting the track pair are extracted (S1510). In this case, the same kind of feature is extracted from every track of the track pair, but its value may differ from track to track. For example, the feature to be extracted may be the energy of the track.

According to the value of the feature extracted for each track, the order of importance of the tracks may be determined (S1515). For example, considering a case where track-specific energy is used as a feature extracted for each track, a track having a high energy may be determined as a track of high importance, that is, a track in which a search for pulse is performed first.

According to the importance order for the tracks, candidate pulses are searched for in the first priority importance track (S1520). If the number of pulses to be encoded in the first priority track is P1 (P1 is an integer greater than 0), a predetermined number p1 (p1 is an integer greater than or equal to 0) more pulses than the number of pulses to be encoded may be searched as candidate pulses of the track. For example, if the number of pulses to be encoded in the first priority track is two (that is, two pulses would be searched if the pulses selected in other tracks were not taken into consideration), 2 + p1 pulses may be searched.

Among the candidate pulses searched in the priority track, the number of pulses corresponding to the number P1 of the pulses to be coded in the priority track may be selected (S1525). In this case, the number of pulses to be encoded may be the same as the number of pulses searched in the priority tracks when the pulses (maximum sine waves) selected in other tracks of the track pair are not considered.

Subsequently, candidate pulses are searched for in the second priority importance track, which is the next most important track (S1530). If the number of pulses to be encoded in the second priority importance track is P2 (P2 is an integer greater than 0), a predetermined number p2 (p2 is an integer greater than or equal to 0) more pulses than the number of pulses to be encoded may be searched as candidate pulses of the track. For example, if the number of pulses to be encoded in the second priority importance track is two (that is, two pulses would be searched by the conventional method), 2 + p2 pulses may be searched.

In this case, the number p2 of additionally searched pulses in the second priority track may be equal to or greater than the number p1 of additionally searched pulses in the first priority track.

The positions of the candidate pulses searched in the 2nd priority importance track and the position of the selected pulses in the 1st priority importance track are compared (S1535).

The SWB encoder and / or the SWB decoder may compare the positions of the P2 + p2 pulses found in the second priority track with the positions of the pulses selected in the first priority track.

If, among the pulses searched in the second priority track, there is a pulse at a position apart from a pulse selected in the first priority track, the two separated pulses may be selected as one of the pulses to be encoded in the first priority track and one of the pulses to be encoded in the second priority track, respectively (S1540).

If, among the pulses searched in the second priority importance track, there is no pulse at a position apart from the pulse selected in the first priority importance track, the pulse having the largest absolute value (the sine wave having the maximum value) may be selected.

For example, suppose that a pulse selected in the first priority track of the track pair is P_tp1 (tp1 = 1, ..., P1), and a pulse searched in the second priority track is P_tp2 (tp2 = 1, ..., P2 + p2). Then P1 x (P2 + p2) pulse combinations (P_tp1, P_tp2) can be formed from the P1 pulses selected in the first priority track and the P2 + p2 pulses searched in the second priority track.

Among the P1 x (P2 + p2) pulse combinations (P_tp1, P_tp2), let the combination in which P_tp1 and P_tp2 are farthest apart be (P_{tp1,away}, P_{tp2,away}). Then P_{tp1,away} may be determined as a pulse to be encoded in the first priority track and P_{tp2,away} may be determined as a pulse to be encoded in the second priority track.

When a plurality of pulses are selected and encoded in each track, further combinations of pulses that are separated between the tracks forming the track pair may be selected. When there are a plurality of such separated pulse combinations, the pulse pairs may be selected in descending order of the distance between their two pulses. For example, when two pulses are selected and encoded for each track, the pulse pair with the longest distance between its two pulses is selected first, then the pulse pair with the next longest distance, and the selected pulses may be encoded (quantized) for each track.

If no such pulse combination exists between the tracks of the track pair, P2 pulses (maximum sine waves) may be selected in the second priority track in descending order of absolute value, in the same way as pulses are selected in the first priority track.

In S1525, only as many pulses as are to be encoded in the first priority track are selected. However, the present invention is not limited thereto. For example, without performing step S1525, both the P1 + p1 pulses searched in the first priority track and the P2 + p2 pulses searched in the second priority track of the track pair may be considered.

Specifically, let a pulse searched in the first priority track of the track pair be P_tp1 (tp1 = 1, ..., P1 + p1), and let a pulse searched in the second priority track be P_tp2 (tp2 = 1, ..., P2 + p2). Then (P1 + p1) x (P2 + p2) pulse combinations (P_tp1, P_tp2) can be formed from the P1 + p1 pulses searched in the first priority track and the P2 + p2 pulses searched in the second priority track. If the combination in which P_tp1 and P_tp2 are farthest apart is (P_{tp1,away}, P_{tp2,away}), then P_{tp1,away} may be determined as the pulse to be encoded in the first priority track and P_{tp2,away} may be determined as the pulse to be encoded in the second priority importance track. Also, if a plurality of (e.g., two) pulses are to be selected for each track, the pulse pair whose two pulses are farthest apart and the pulse pair whose two pulses are second farthest apart may both be selected.

Candidate pulses are searched and encoding target pulses are selected track by track in order of importance, and at the end of this process candidate pulses are searched for in the lowest priority track (S1545). Assuming that a track pair consists of k tracks, if the number of pulses to be encoded in the lowest priority track (the k-th rank importance track) is Pk (Pk is an integer greater than 0), a predetermined number pk more pulses than the number of pulses to be encoded may be searched as candidate pulses of the lowest priority track. In this case, the number pk of pulses additionally searched in the lowest priority track may be equal to or greater than the number p_(k-1) of pulses additionally searched in the previous rank importance track (the (k-1)-th rank importance track).

The positions of candidate pulses searched in the lowest priority track and the positions of the selected pulses in the high rank priority track are compared (S1550).

The SWB encoder and / or the SWB decoder may compare the positions of the Pk + pk pulses found in the lowest priority track with the positions of the pulses selected in the previous priority track.

If, among the pulses searched in the lowest priority track, there is a pulse at a position apart from a pulse selected in the previous rank importance track, the two separated pulses may be selected as one of the pulses to be encoded in the lowest priority track (the k-th rank importance track) and one of the pulses to be encoded in the previous rank importance track (the (k-1)-th rank importance track), respectively (S1555).

For example, let a pulse selected in the previous rank importance track of the track pair be P_tp(k-1) (tp(k-1) = 1, ..., P_(k-1)), and let a pulse searched in the lowest priority track be P_tpk (tpk = 1, ..., Pk + pk). Then P_(k-1) x (Pk + pk) pulse combinations (P_tp(k-1), P_tpk) can be formed from the P_(k-1) pulses selected in the previous rank importance track and the Pk + pk pulses searched in the lowest priority track.

If the combination of pulses in which P_tp(k-1) and P_tpk are farthest from each other is (P_{tp(k-1),away}, P_{tpk,away}), then P_{tp(k-1),away} may be determined as the pulse to be encoded in the previous rank importance track, and P_{tpk,away} may be determined as the pulse to be encoded in the lowest priority track.

In the case where a plurality of pulses are selected and encoded in each track, further combinations of pulses separated between the tracks constituting the track pair may be selected. For example, when there are a plurality of combinations of pulses located apart between the tracks that make up the track pair, the combination with the longest distance between the pulses of the two tracks and then the combination with the next longest distance may be selected in sequence.

If there are no adjacent pulse combinations between the track pairing tracks, then Pk pulses (maximum sine waves) can be selected in the order of largest absolute value in the least significant importance track.
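The rank-by-rank procedure for a track pair of k tracks can be sketched as follows for the simplest case of one encoding target pulse per track (Pk = 1 for every rank). The function names, the dictionary representation of a track and the omission of the magnitude fallback are assumptions made for this illustration only.

```python
def top_pulses(track, n):
    """The n largest-magnitude pulses of a track as (position, value) tuples."""
    return sorted(track.items(), key=lambda pv: abs(pv[1]), reverse=True)[:n]


def select_along_ranks(ranked_tracks, extras):
    """Select one pulse per track, from the highest to the lowest importance.

    ranked_tracks: list of dicts (position -> value), highest importance first.
    extras: additional candidates p_1 .. p_k searched in each track.
    Each lower-rank track takes the candidate farthest from the pulse
    selected in the previous rank track.
    """
    if len(ranked_tracks) != len(extras):
        raise ValueError("one extras entry is needed per track")
    selected = top_pulses(ranked_tracks[0], 1)
    for track, extra in zip(ranked_tracks[1:], extras[1:]):
        candidates = top_pulses(track, 1 + extra)
        prev_pos = selected[-1][0]
        selected.append(max(candidates, key=lambda pv: abs(pv[0] - prev_pos)))
    return selected


ranked = [
    {0: 0.9, 3: 0.2, 6: -0.4},   # rank 1 (highest importance)
    {1: 0.8, 4: -0.1, 7: 0.5},   # rank 2
    {2: -0.6, 5: 0.3, 8: 0.7},   # rank k = 3 (lowest importance)
]
chosen = select_along_ranks(ranked, extras=[0, 1, 2])
```

The `extras` values grow with the rank, in line with the condition that the number of additionally searched pulses in a track may be equal to or greater than that of the previous rank track.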

On the other hand, when no track pair exists in the target signal, the maximum sine waves (pulses) found in each track may be selected, as many as the number of pulses to be encoded (S1560). The SWB encoder and / or the SWB decoder may search each track for as many maximum sine waves (pulses) as the number of pulses to be encoded and select them as encoding / quantization target pulses without considering the pulses searched for in other tracks.

The position / size / sign is quantized with respect to the selected pulses (maximum sine wave) (S1565).

Once the position of the selected pulse is determined, the position, magnitude and / or sign of the pulse is quantized.

Meanwhile, in operation S1520, a search for a predetermined number more pulses as the candidate pulses than the number of encoding target pulses in the priority track is described, but the present invention is not limited thereto. For example, in the priority track, the number of pulses larger than the number of encoding target pulses may not be searched as candidate pulses, but only the pulses that are encoding targets (quantization targets) may be searched. In other words, unlike the lower priority track, only the P1 pulses may be searched for in the priority track. In this case, step S1525 may not be performed.

In the example of FIG. 15, the searching and selection of candidate pulses are performed for each track constituting the track pair, but the present invention is not limited thereto. For example, after a predetermined number more candidate pulses than the number of pulses to be encoded have been searched for in all tracks constituting the track pair, the pulses far from the pulse selected in the higher priority track may be selected from the candidate pulses of each track as the encoding target pulses. In this case, when candidate pulses are searched for in all tracks constituting the track pair, only as many candidate pulses as the number of encoding target pulses may be selected in the highest priority track (that is, the encoding target pulses are searched for rather than candidate pulses).

In FIG. 15, in order to determine the importance of a track, the track is classified into an upper importance track and a lower importance track based on the energy of the track. However, the present invention is not limited thereto. As a criterion for determining a track to be searched first, other criteria may be applied in addition to energy. Even in this case, the tracks of the track pairs may be searched for pulses in the same manner as described with reference to FIG. 15 to determine an encoding target pulse, and quantize the information of the pulses.

Meanwhile, the steps of FIG. 15 may be applied to all tracks in order so that the encoding target pulse may be determined in all tracks of the target signal.

The methods of FIGS. 14 and 15 may be effective when the encoding for the G.718 SWB upper band targets not the original signal but the difference (residual) signal that remains after encoding. In addition, a basic core such as G.718 WB may be applied, and the signal left uncoded by the core may then be encoded.

After the input signal is filtered by the HP filter 1610, the WB signal is input to a basic core 1620. The signal output from the basic core 1620 may be encoded and transmitted in a bitstream.

The difference between the signal decoded in the basic core 1620 and the original HP filtered signal may be processed in the MDCT-based enhancement layer 1630 and then output as a bitstream. In this case, the enhancement layer 1630 may correspond to the super wide band (SWB) encoder of FIG. 1.

Method of searching for pulses based on the energy of the frame

When the target signal is generated, it is possible to determine whether to select a pulse adjacent to a pulse of another track in a pair of tracks as an encoding target pulse or to select pulses apart from pulses of another track in the track pair as encoding target pulses.

In this case, an energy distribution may be used as a feature of the target signal as a reference for determining how to select an encoding target pulse. In this case, the tonality determination unit of FIG. 1 may determine how to select an encoding target pulse.

For example, when a tonal component is present in the target signal or when the energy distribution of the target signal is concentrated in one band, a method of selecting a pulse adjacent to a pulse selected from another track of a track pair as an encoding target pulse may be used. If the target signal has no tonal component or the energy distribution of the target signal is uniform, a method of selecting a pulse that is separated from a pulse selected from other tracks of the track pair as the encoding target pulse may be used.

Therefore, in the present embodiment, when the target signal is generated, information indicating whether to select a pulse adjacent to, or a pulse apart from, the pulse selected in another track of the track pair may be input from a module that extracts a feature of the target signal (for example, the tonality determination unit of FIG. 1) to a module that selects a pulse for each track (for example, the SWB encoder of FIG. 1).

In the example of FIG. 17, it is assumed that two or more tracks constitute a track pair, and that each track of the track pair is adjacent to each other.

In addition, the example of FIG. 17 may be performed by the SWB encoder of FIG. 1 and / or the SWB decoder of FIG. 3. For example, the operation may be performed in at least one of a sine wave mode unit and an additional sine wave unit of the SWB encoder and / or the SWB decoder. In addition, the determination and the indication of whether to select the adjacent pulse in FIG. 17 may be performed by the tonality determination unit of FIG. 1 and the SWB decoding unit of FIG. 3. For convenience of explanation, the steps of FIG. 17 are described by the SWB encoder and / or the SWB decoder.

Referring to FIG. 17, first, a target signal is generated (S1700). In this case, the target signal may be pulses to be quantized, that is, MDCT coefficients.

Next, the feature of the target signal is extracted (S1705). Depending on the features of the extracted target signal, when searching / selecting a pulse for a pair of tracks, it may be determined whether to select a pulse adjacent to a pulse of another track or a pulse apart. In this case, the feature of the extracted target signal may be tonality or may be a distribution of energy.

When determining the tonality as a characteristic of the target signal, if the target signal is tonal, it may be instructed to select a pulse adjacent to the selected pulse in another track of the track pair. In addition, if the target signal is not tonal, it may be instructed to select a pulse away from the selected pulse in another track of the track pair.

When the energy distribution is determined as a characteristic of the target signal and the energy of the target signal is concentrated in a specific band, it may be instructed to select a pulse adjacent to a selected pulse in another track of the track pair. In addition, when the energy of the target signal is evenly distributed, it may be instructed to select a pulse away from the selected pulse in another track of the track pair.
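The two decision rules above can be sketched as follows. The text does not define how concentration of energy is measured, so the ratio of the strongest band to the total energy and the `threshold` value of 0.5 are purely assumptions made for this example, as are the function names and the mode labels "adjacent" and "apart".

```python
def mode_from_tonality(is_tonal):
    """Tonal target signal -> adjacent pulses; otherwise pulses apart."""
    return "adjacent" if is_tonal else "apart"


def mode_from_energy(band_energies, threshold=0.5):
    """Decide the selection mode from the energy distribution over bands.

    The signal is treated as concentrated when one band holds at least
    `threshold` of the total energy (assumed measure, not from the text).
    """
    total = sum(band_energies)
    if total <= 0:
        return "apart"  # no energy: nothing is concentrated
    concentrated = max(band_energies) / total >= threshold
    return "adjacent" if concentrated else "apart"


peaky = mode_from_energy([0.1, 3.0, 0.2, 0.1])   # one dominant band
flat = mode_from_energy([1.0, 1.0, 1.0, 1.0])    # evenly distributed energy
```

The returned label would then be passed to the step that selects a pulse in each track of the track pair.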

In FIG. 17, the following steps are the same as described with reference to FIGS. 13 and 15. However, which track to select from the tracks constituting the track pair may be adaptively determined at the step of selecting each track.

For example, when it is determined in step S1705, for a track pair, whether to select a pulse adjacent to a pulse selected in another track or a pulse away from a pulse selected in another track, and the determined information is delivered to the step of selecting a pulse in the most significant track or to a step preceding it, the pulses may be selected by the same method in every track of the track pair.

If it is determined to select adjacent pulses, the pulses may be retrieved / selected by the method according to the example of FIG. If it is determined to select the distant pulses, the pulses can be retrieved by the method according to the example of FIG.

Alternatively, when it is determined in step S1705, for a track pair, whether to select a pulse adjacent to a pulse selected in another track or a pulse away from a pulse selected in another track, the method of selecting a pulse may be determined for each track. In this case, when information on how to select a pulse is transferred to the step of selecting a pulse for each track, the pulse may be selected according to the method indicated by the transferred information.

For example, when the current track is instructed to select a pulse adjacent to a selected pulse in another track, the pulse may be searched / selected by the method according to the example of FIG. 13. In addition, in the current track, when it is instructed to select a pulse at a position away from the selected pulse in another track, the pulse may be searched / selected by the method according to the example of FIG. 15.

In the example of FIG. 17, a case where each track is instructed as to whether adjacent pulses or separated pulses should be selected will be described as an example.

Returning to the example of FIG. 17, it is determined whether a track pair exists (S1710). A track pair may consist of tracks that have the same step (pulse interval) and in which the pulse positions of one track are adjacent to the pulse positions of the neighboring track.

If there are track pairs in the target signal, the features of the tracks constituting the track pair are extracted (S1715). For example, the feature to be extracted may be the energy of the track.

According to the value of the feature extracted for each track, the order of importance of the tracks may be determined (S1720). For example, when the feature extracted for each track is energy for each track, a track having a high energy may be determined as a track having a high importance, that is, a track in which a search for pulse is performed first.

According to the importance order for the tracks, candidate pulses are searched for in the first priority importance track (S1725). If the number of pulses to be encoded in the first priority track is Q1 (Q1 is an integer greater than 0), a predetermined number q1 (q1 is an integer greater than or equal to 0) more pulses than the number of pulses to be encoded may be searched as candidate pulses of the track.

Among the candidate pulses searched in the priority ranking track, as many pulses as the number Q1 of pulses to be coded in the priority ranking track may be selected (S1730). In this case, the selection of the pulse to be encoded may be performed according to the method determined in S1705 based on the feature of the target signal. In addition, for the most significant track, the encoding target pulse may be selected without considering the relationship with the pulse selected in the other track.

In the priority ranking track, an encoding target pulse may be selected based on an absolute value of the pulse.

Subsequently, candidate pulses are searched for in the second rank importance track (S1735). If the number of pulses to be encoded in the second rank importance track is Q2 (Q2 is an integer greater than 0), a predetermined number q2 (q2 is an integer greater than or equal to 0) more pulses than the number of pulses to be encoded may be searched as candidate pulses of the track.

In this case, the number q2 of pulses additionally searched in the second priority track may be equal to or greater than the number q1 of pulses additionally searched in the first priority track.

The positions of the candidate pulses searched in the second priority importance track and the positions of the pulses selected in the first priority importance track are compared (S1740).

Pulses to be encoded are selected from among the pulses searched in the second priority importance track based on the positional relationship with the pulse selected in the first priority importance track (S1745). In this case, the selection of the pulse to be encoded in the second priority track may be performed according to the method determined in S1705 based on the feature of the target signal. For example, if it is instructed to select a pulse adjacent to the pulse selected in the first priority track, the pulse may be selected by the method according to the example of FIG. 13. Further, when it is instructed to select a pulse at a position away from the pulse selected in the first priority track, the pulse may be selected by the method according to the example of FIG. 15.
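The choice between the two methods for one track can be sketched as follows. The sketch reduces each method to its core idea for a single pulse: the "adjacent" mode takes the candidate nearest to the pulse selected in the higher priority track, and the "apart" mode takes the farthest one. Treating the method of FIG. 13 as a nearest-position choice is an assumption, and the name `select_by_mode` is hypothetical.

```python
def select_by_mode(candidates, reference_position, mode):
    """Pick one candidate relative to the pulse selected in the previous track.

    candidates: list of (position, value) tuples searched in the current track.
    reference_position: position of the pulse selected in the higher priority track.
    mode: "adjacent" (nearest candidate) or "apart" (farthest candidate).
    """
    def distance(pulse):
        return abs(pulse[0] - reference_position)

    if mode == "adjacent":
        return min(candidates, key=distance)
    if mode == "apart":
        return max(candidates, key=distance)
    raise ValueError("mode must be 'adjacent' or 'apart'")


candidates = [(3, 0.6), (9, 0.4), (15, -0.3)]
near = select_by_mode(candidates, reference_position=2, mode="adjacent")
far = select_by_mode(candidates, reference_position=2, mode="apart")
```

The `mode` argument stands for the information determined in S1705 and transferred to the pulse selection step of each track.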

The candidate pulses for each track are sequentially searched according to importance and encoding target pulses are selected, and finally candidate pulses are searched for in the lowest priority track (S1750). Assuming that a track pair consists of k tracks, when the number of pulses to be encoded in the lowest priority track (k rank importance track) is Qk (Qk is an integer greater than 0), a predetermined number qk more pulses than the number of pulses to be encoded may be searched as candidate pulses of the lowest priority track. In this case, the number qk of pulses additionally searched in the lowest priority track may be equal to or greater than the number qk-1 of pulses additionally searched in the previous rank importance track (k-1 rank priority track).

The positions of the candidate pulses searched in the lowest priority track and the positions of the selected pulses in the high priority track are compared (S1755).

Pulses to be encoded in the lowest priority track (k rank priority track) may be selected based on the positional relationship with the pulse selected in the previous rank priority track (k-1 rank priority track) (S1760).

In this case, the selection of the pulse to be encoded in the k rank importance track may be performed according to the method determined in S1705 based on the feature of the target signal. For example, if it is instructed to select a pulse adjacent to the pulse selected in the k-1 rank importance track, the pulse may be selected by the method according to the example of FIG. In addition, if it is instructed to select a pulse at a position away from the pulse selected in the k-1 rank importance track, the pulse may be selected by the method according to the example of FIG. 15.

On the other hand, when no track pair exists in the target signal, as many maximum sine waves (pulses) as the number of pulses to be encoded may be searched for and selected in each track (S1765). The SWB encoder and/or the SWB decoder may search each track for as many maximum sine waves (pulses) as the number of pulses to be encoded and select them as encoding/quantization target pulses, without considering the pulses searched in other tracks.

The position / size / sign is quantized with respect to the selected pulses (maximum sine wave) (S1770).

Meanwhile, operation S1725 is described as searching the first priority track for a predetermined number more candidate pulses than the number of encoding target pulses, but the present invention is not limited thereto. For example, in the first priority track, instead of searching for more candidate pulses than the number of encoding target pulses, only the pulses that are encoding targets (quantization targets) may be searched. In other words, unlike the lower priority tracks, only the Q1 pulses to be encoded may be searched for in the first priority track. In this case, step S1725 may not be performed.

In the example of FIG. 17, the search and selection of candidate pulses is performed track by track for the tracks constituting the track pair, but the present invention is not limited thereto. For example, after candidate pulses, a predetermined number more than the number of encoding target pulses, have been searched for in all tracks constituting the track pair, the encoding target pulses may be selected for each track based on the positional relationship with the pulses selected in the higher priority track. At this time, the selection of the pulse in each track may be performed according to the method determined in S1705 based on the characteristics of the target signal. In this case, the same method may be applied to each track, or different methods may be applied.

Further, in the most significant importance track, a candidate pulse equal to the number of encoding target pulses may be selected (for example, the encoding target pulse is searched instead of the candidate pulse search).

In FIG. 17, in order to determine the importance of a track, the track is divided into an upper importance track and a lower importance track based on the energy of the track. However, the present invention is not limited thereto. As a criterion for determining a track to be searched first, other criteria may be applied in addition to energy. Even in this case, the tracks of the track pairs may be searched for pulses in the same manner as described with reference to FIG. 17 to determine an encoding target pulse and quantize the information of the pulses.

Meanwhile, the steps of FIG. 17 may be applied to all tracks in order so that an encoding target pulse may be determined in all tracks of a target signal.

How to Search for Pulses in CELP Mode

In addition to the MDCT-based sine wave mode, even when the CELP (Code Excited Linear Prediction) mode is applied, the encoding target pulse can be searched according to the present invention.

Encoding and decoding methods performed by the CELP mode are the same as described with reference to FIGS. 2 and 4.

In the sine wave mode, candidate pulses are searched using the absolute values of pulses based on Equation 2. In contrast, in the CELP mode, candidate pulses may be selected based on a convolution value with the impulse response of the LPC synthesis filter. For example, a pulse that minimizes the mean square error (MSE) between the target signal and the value obtained by convolving the pulse with the impulse response may be searched as a candidate pulse in the current track.

In the example of FIG. 18, it is assumed that two or more tracks constitute a track pair, and that each track constituting the track pair is adjacent.

In addition, the example of FIG. 18 may be performed by the core encoder in the encoder of FIG. 2 and / or the core decoder in the decoder of FIG. 4. For convenience of description, the encoder and / or the decoder will be described as performing each step of FIG. 18.

Referring to FIG. 18, a target signal is first generated (S1800). In this case, the generated target signal may be a signal passed through a weighting filter or a signal after an adaptive codebook search in the CELP mode, that is, a new signal obtained by removing the influence of the adaptive codebook from the audio signal. In other words, when the CELP mode is applied, the target signal may be a signal obtained by excluding (2) the signal synthesized from the coded adaptive codebook from (1) the audio signal.

Subsequently, the track-specific energy of the tracks constituting the track pair is calculated for the target signal (S1805). A track pair may consist of tracks that have the same step (pulse interval) and whose pulse positions are adjacent to the pulse positions of the neighboring track. The track-specific energy can be used as a criterion for determining in what order the tracks are to be searched. Here, the energy of each track is used as the criterion, but characteristics other than energy may also be calculated and used as a criterion for determining the search order.

The calculated track-specific energies are compared (S1810). By comparing the energy of the tracks, a track with higher energy may be determined as a track of high importance. Therefore, the track with the highest energy among the tracks constituting the track pair can be searched first as the first track. The track with the second highest energy is then determined as the second rank track to be searched for the second time, and can be determined up to the lowest rank track according to the energy magnitude.
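The energy-based ordering of steps S1805 and S1810 can be illustrated with a short sketch. The sum of squared sample values is assumed as the energy measure, and the names `track_energy` and `rank_tracks_by_energy` are hypothetical.

```python
def track_energy(samples):
    """Energy of one track, taken here as the sum of squared values (assumption)."""
    return sum(x * x for x in samples)


def rank_tracks_by_energy(tracks):
    """Return track indices ordered from highest to lowest energy.

    The first index is the first rank track, the last is the lowest rank track.
    """
    energies = [track_energy(t) for t in tracks]
    return sorted(range(len(tracks)), key=lambda i: energies[i], reverse=True)


# Two tracks forming a track pair; track 1 carries more energy than track 0.
pair = [[0.1, -0.2, 0.05], [0.8, -0.3, 0.4]]
search_order = rank_tracks_by_energy(pair)
print(search_order)
```

The returned order governs only the pulse search; the original track order can still be used when the bitstream is constructed.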

The determined rank is a rank for the pulse search; when the searched pulses are quantized and the bitstream is constructed, processing may proceed in the original track order.

The MSE is calculated for each pulse with respect to the first priority track having the highest importance level (S1815). For each pulse position of the first track, a pulse whose MSE is minimum for the target signal is selected as a candidate pulse of the first track using a convolution value with the impulse response. In this case, the MSE for the target signal may be an MSE (Mean Square Error) between a value of the target signal and a value obtained by convolving a candidate pulse with an impulse response.

The codebook can be used in the process of obtaining the MSE. The codebook specifies where in the track there may be pulses.

In the first track, only the pulse that is the current MSE calculation target is set (an amplitude signal, e.g., a signal of magnitude 1, is placed only at the position of the target pulse, and the pulse size is set to 0 at the positions of the other pulses). By performing convolution with the impulse response, the MSE between the convolution value and the target signal in the first track can be calculated for each pulse.

In the highest priority track, a predetermined number of pulses that minimize the MSE for the target signal are selected (S1820). Unlike the MDCT-based case, for every pulse in the track, the MSE between the target signal and the convolution of the pulse with the impulse response is obtained, and a predetermined number of pulses may be selected for the first track in ascending order of MSE. That is, the predetermined number of pulses may be selected in order of the smallest difference from the target signal.

If the number of candidate pulses selected in the first rank track is C1 (C1 is an integer greater than 0), C1 pulses may be selected, from the pulse with the smallest MSE for the target signal to the pulse with the C1-th smallest MSE.
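A minimal sketch of the first-track search of S1815 and S1820 follows. It assumes a unit pulse of magnitude 1, a convolution truncated to the frame length, and hypothetical helper names (`convolve`, `mse`, `search_by_mse`); the sign and gain of the pulse are ignored for simplicity.

```python
def convolve(x, h):
    """Linear convolution of x with h, truncated to len(x) samples."""
    n = len(x)
    y = [0.0] * n
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue
        for j, hj in enumerate(h):
            if i + j < n:
                y[i + j] += xi * hj
    return y


def mse(a, b):
    """Mean square error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def search_by_mse(target, h, positions, count):
    """Return `count` track positions whose unit pulse, convolved with the
    impulse response h, gives the smallest MSE against the target signal."""
    scored = []
    for pos in positions:
        excitation = [0.0] * len(target)
        excitation[pos] = 1.0  # pulse only at the position under test
        scored.append((mse(convolve(excitation, h), target), pos))
    scored.sort()  # ascending MSE, smallest first
    return [pos for _, pos in scored[:count]]


h = [1.0, 0.5]                                        # toy impulse response
target = [0.5, 0.25, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]    # toy target signal
first_track_positions = [0, 2, 4, 6]
selected = search_by_mse(target, h, first_track_positions, count=2)
print(selected)
```

Here C1 = 2, so the two positions with the smallest MSE are kept for the first rank track.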

The position of the selected pulses in the first rank track is fixed, and the MSE for the target signal is calculated at the positions of the pulses in the second rank track (S1825).

By setting only the pulse that is the MSE calculation target in the second rank track and performing convolution with the impulse response, the MSE between the convolution value and the target signal in the second rank track can be calculated for each pulse. In ascending order of the calculated MSE, a predetermined number more pulses than the number of pulses to be encoded may be selected as candidate pulses of the second rank track.

Subsequently, it is set that the pulses selected in the first rank track exist at their respective positions and that, among the pulses in the second rank track, only the pulse that is the current MSE calculation target exists (within the second rank track, a unit-size pulse is set only at the position of the MSE calculation target pulse, and the pulse size at the other positions is set to 0), and convolution with the impulse response is performed. In this way, the MSE between the convolution value and the target signal in the second rank track may be calculated for each pulse in consideration of the pulses selected in the first rank track.

In ascending order of the MSE values in the second rank track calculated in consideration of the first rank track, as many pulses as the number of pulses to be encoded in the second rank track are selected in the second rank track.

For example, assume that the number of pulses to be encoded in the second rank track is C2 (C2 is an integer greater than 0) and additionally c2 (c2 is an integer of 0 or more) pulses are searched. In this case, the C2 + c2 second rank track pulses are convolved with the impulse response, respectively, along with the C1 first rank track pulses. C2 pulses may be selected as encoding target pulses of the second rank track in the order in which the MSE between each pulse of the second rank track convolved with the pulses of the first rank track and the target signal is small.
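The second-track selection with the first-track pulses held fixed might look like the following sketch. It assumes unit-size pulses at the fixed positions and at the candidate position, a convolution truncated to the frame length, and hypothetical names (`select_with_fixed` and its parameters).

```python
def convolve(x, h):
    """Linear convolution of x with h, truncated to len(x) samples."""
    n = len(x)
    y = [0.0] * n
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            if xi != 0.0 and i + j < n:
                y[i + j] += xi * hj
    return y


def mse(a, b):
    """Mean square error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def select_with_fixed(target, h, fixed_positions, candidates, count):
    """Select `count` candidates of the current track, each evaluated jointly
    with the pulses already fixed in the higher rank track(s)."""
    scored = []
    for pos in candidates:
        excitation = [0.0] * len(target)
        for f in fixed_positions:
            excitation[f] = 1.0   # pulses selected in the higher rank track
        excitation[pos] = 1.0     # the one candidate currently under test
        scored.append((mse(convolve(excitation, h), target), pos))
    scored.sort()                 # ascending MSE, smallest first
    return [pos for _, pos in scored[:count]]


h = [1.0, 0.5]
target = [0.0, 0.0, 0.0, 0.0, 1.0, 1.5, 0.5, 0.0]
fixed = [4]                # C1 = 1 pulse fixed in the first rank track
candidates = [1, 5, 7]     # C2 + c2 = 3 candidates in the second rank track
chosen = select_with_fixed(target, h, fixed, candidates, count=1)
print(chosen)
```

The same function covers the third and lower rank tracks if `fixed_positions` holds the pulses selected in all previously searched tracks.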

For the third rank track and the following rank tracks, candidate pulses may likewise be searched and the pulses to be encoded may be selected from the candidate pulses. For example, after a predetermined number more candidate pulses than the number of pulses to be encoded have been searched in the third rank track, the MSE between the target signal and the value of the convolution that includes the pulses selected in the first rank track and the pulses selected in the second rank track can be calculated. In the third rank track, the encoding target pulse may be selected based on the MSE value calculated in consideration of the pulses selected in the first rank track and the second rank track.

The candidate pulses for each track are sequentially searched according to the importance, the pulses to be encoded are selected, and the candidate pulses are searched up to the lowest priority track.

The pulses of the higher rank tracks are fixed, and the MSE is calculated at the pulse positions of the lowest rank track (S1835). Assuming that a track pair consists of k tracks, when the number of pulses to be encoded in the lowest priority track (k rank importance track) is Ck (Ck is an integer greater than 0), a predetermined number ck (ck is an integer greater than or equal to 0) more pulses may be searched as candidate pulses of the lowest priority track. For example, by setting that only the pulse that is the MSE calculation target exists in the lowest rank track and performing convolution with the impulse response, the MSE between the convolution value and the target signal in the lowest rank track can be calculated for each pulse. For the lowest rank track, Ck + ck candidate pulses can be searched in ascending order of MSE.

Subsequently, it is set that the pulses selected in the first rank to k-1 rank tracks exist at their respective positions and that, among the pulses in the k rank (lowest rank) track, only the pulse that is the current MSE calculation target exists (unit pulses are set at the positions of the pulses selected in the first rank to k-1 rank tracks and at the position of the current MSE calculation target pulse in the k rank track, and the pulse size at the other positions is set to 0), and convolution with the impulse response is performed. In this way, the MSE between the convolution value and the target signal in the lowest rank track may be calculated for each pulse in consideration of the pulses selected in the previous rank tracks.

In the lowest rank track, which has the smallest energy, encoding target pulses may be selected from among the candidate pulses by comparing the MSEs calculated in consideration of the pulses of the higher-energy tracks that have already been searched (S1840). That is, as many pulses as the number of pulses to be encoded in the lowest rank track are selected in the lowest rank track, in ascending order of the MSE value calculated in consideration of the previous rank tracks.

Information of the pulses selected in the entire track is quantized (S1845). The information of the quantized pulses may include at least one of a position of the pulse, a magnitude of the pulse, and a sign of the pulse.

In the example of FIG. 18, the search and selection of candidate pulses is performed track by track for the tracks constituting the track pair, but the present invention is not limited thereto. For example, after candidate pulses, a predetermined number more than the number of pulses to be encoded, have been searched for in all tracks constituting the track pair, a convolution with the impulse response that includes the pulses selected in the higher importance tracks may be obtained for each track.

In the case of the base layer (layer 6) to which the CELP mode is applied, at least two tracks have a structure in which track pairs can be configured. Therefore, in this example, the step of determining whether a track pair exists need not be performed.

In contrast, when the CELP mode is applied to the enhancement layer (layer 7 or layer 8), a track pair may not exist, and thus the existence of a track pair may be determined before the energy of each track is compared. If no track pair exists, the encoding target pulses may be selected for each track in ascending order of the MSE between the convolution of each pulse and the original signal.

The method of FIG. 18 may also be applied to the encoders of FIGS. 1 and 16. However, when the CELP-based embodiment of FIG. 18 is applied, the SWB encoder 130 of FIG. 1 may be replaced with a CELP-based enhancement layer unit, and the MDCT-based enhancement layer unit 1630 of FIG. 16 may likewise be replaced with a CELP-based enhancement layer unit.

The enhancement layer unit may process layer 6 and higher layers, which process the SWB signal. In the MDCT-based enhancement layer unit, layer 6 and higher layers may be processed based on MDCT, and in the CELP-based enhancement layer unit, layer 6 and higher layers may be processed based on CELP.

Meanwhile, the steps of FIG. 18 may be applied to all tracks in order so that the encoding target pulse may be determined in all tracks of the target signal.

In the above-described embodiments, the determination of the candidate pulse is referred to as 'search', and the determination of the encoding target pulse is referred to as 'selection'. However, the present invention is not limited thereto, and 'search' and 'selection' may be used interchangeably. For example, candidate pulses may be searched or selected.

Referring to FIG. 19, the encoder determines an encoding target pulse (S1910). The encoder may determine the importance of the tracks constituting the track pair according to the track-specific energy of the audio signal, and determine the encoding target pulse by searching for the pulses from the track of high importance.

In this case, the encoder may select (1) a pulse at a position adjacent to a pulse selected as an encoding target pulse in a track of higher importance among tracks constituting the track pair as an encoding target pulse of the current track. Also, the encoder may select (2) the pulse of the position furthest from the pulse selected as the encoding target pulse in the track of the higher importance among the tracks constituting the track pair as the encoding target pulse of the current track.

The encoder may search for only the pulses to be encoded in the track of the highest importance (that is, search for as many pulses as the number of pulses to be encoded), and may search for a predetermined number more pulses than the number of pulses to be encoded in the tracks below the highest importance. In this case, the pulses may be searched in descending order of absolute value. The magnitude of the absolute value may be determined based on Equation 2.

In the case of using the method of (1), the encoder may select, as the encoding target pulses of the current track, pulses adjacent to the pulse selected as the encoding target pulse in the track of the higher importance among the retrieved pulses as described above. In the case of using the method (2), as described above, the encoder may select, as the encoding target pulse of the current track, a pulse that is farthest from the pulse selected as the encoding target pulse in the track of the higher importance among the retrieved pulses.

In the methods of (1) and (2), when there are a plurality of combinations of selectable pulses, the encoding target pulses can be selected in the order of the largest absolute value among the pulses found in the current track.

When using the method of (1), even when there are no adjacent pulses, the encoding target pulses may be selected in the order of the largest absolute value among the pulses searched in the current track.

The method of (1) is as described in detail in the examples of FIGS. 12 and 13. The method of (2) is as described in detail in the examples of FIGS. 14 and 15.
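Methods (1) and (2), including the fallback to absolute value, can be sketched as below. The sketch assumes that "adjacent" means a position difference of 1 between interleaved tracks and that distance in method (2) is measured to the nearest pulse selected in the higher-importance track; both are illustrative assumptions, as are the function names.

```python
def select_adjacent(candidates, higher_positions, count):
    """Method (1): prefer candidates adjacent to a pulse selected in the
    higher-importance track; fill any remainder by largest absolute value.

    `candidates` maps pulse position -> signed pulse value.
    """
    by_abs = sorted(candidates, key=lambda p: abs(candidates[p]), reverse=True)
    adjacent = [p for p in by_abs
                if any(abs(p - q) == 1 for q in higher_positions)]
    others = [p for p in by_abs if p not in adjacent]
    return (adjacent + others)[:count]


def select_farthest(candidates, higher_positions, count):
    """Method (2): prefer candidates farthest from the pulses selected in the
    higher-importance track; ties are broken by largest absolute value."""
    def key(p):
        distance = min(abs(p - q) for q in higher_positions)
        return (distance, abs(candidates[p]))
    return sorted(candidates, key=key, reverse=True)[:count]


candidates = {3: 0.9, 9: -0.5, 15: 0.4}   # pulses searched in the current track
higher = [8, 20]                          # pulses selected in the higher track
print(select_adjacent(candidates, higher, 1))
print(select_farthest(candidates, higher, 1))
```

When no candidate is adjacent to a higher-track pulse, `select_adjacent` simply returns the candidates with the largest absolute values, which matches the fallback described for method (1).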

Meanwhile, as described above, the encoder may select the encoding target pulse based on the positional relationship with the pulse selected as the encoding target pulse in the track of the higher importance among the tracks constituting the track pair. In this case, the selection criteria of the encoding target pulse may be adaptively determined.

For example, when the energy of the tracks constituting the track pair is evenly distributed, the encoder may select, as the encoding target pulses of the current track, pulses adjacent to the pulse selected as the encoding target pulse in the track of higher importance among the tracks constituting the track pair. When the energy of the tracks constituting the track pair is concentrated in a specific band, the encoder may select, as the encoding target pulses of the current track, pulses that are apart from the pulse selected as the encoding target pulse in the track of higher importance among the tracks constituting the track pair.

In addition, when the tracks constituting the track pair are tonal, the encoder may select, as the encoding target pulses of the current track, pulses that are apart from the pulse selected as the encoding target pulse in the track of higher importance among the tracks constituting the track pair. If the tracks constituting the track pair are not tonal, the encoder may select, as the encoding target pulses of the current track, pulses adjacent to the pulse selected as the encoding target pulse in the track of higher importance among the tracks constituting the track pair.

The related method is as described in detail in the example of FIG. 17.
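As a rough illustration of the adaptive choice, the sketch below switches between the two selection criteria using a simple energy-concentration measure. The measure (peak energy divided by total energy), the threshold of 0.5, and the name `choose_selection_rule` are all assumptions made for this example; the description does not specify how even distribution or tonality is detected.

```python
def choose_selection_rule(samples, threshold=0.5):
    """Return 'apart' when the energy is concentrated, otherwise 'adjacent'.

    The peak-to-total energy ratio and the threshold are illustrative
    assumptions, not values taken from the description.
    """
    energies = [x * x for x in samples]
    total = sum(energies)
    if total == 0.0:
        return "adjacent"
    concentration = max(energies) / total
    return "apart" if concentration > threshold else "adjacent"


print(choose_selection_rule([0.1, 0.1, 2.0, 0.1]))   # energy in one place
print(choose_selection_rule([1.0, 1.0, 1.0, 1.0]))   # energy spread evenly
```

The returned label would then decide whether the adjacent-pulse method or the far-apart-pulse method is applied to the lower-importance tracks.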

In addition, the encoder may select the encoding target pulses of the current track in ascending order of the mean square error (MSE) between the 'audio signal' and the 'convolution' with the impulse response based on the pulses selected as encoding target pulses in the track of higher importance.

In this case, the convolution may be a convolution of a pulse selected as a pulse to be encoded in a track of higher importance and one of the pulses searched in the current track with an impulse response.

Convolution may also be used in the process of searching for pulses in each track. In this case, the convolution may be a convolution of the impulse response and a specific pulse in the current track. Pulses may be searched as candidate pulses of the current track in ascending order of the MSE between this convolution and the audio signal.

The related method is as described in detail in the example of FIG. 18.

On the other hand, the position of the pulse in the track can be changed in consideration of the sign of the pulse. The content is as described above.

The encoder quantizes the selected encoding target pulse (S1920). Quantized pulses may be encoded and transmitted or stored in a bitstream.

Referring to FIG. 20, the decoder generates a pulse for an audio signal (S2010). The pulse for the audio signal may be derived from the received audio data based on dequantization.

The pulses may be pulses that were searched or selected starting from the track of highest importance among the tracks constituting the track pair in the audio signal. The pulses may be (1) pulses selected based on the positional relationship with the pulse selected as the encoding target pulse in the track of higher importance among the tracks constituting the track pair, or (2) pulses selected in ascending order of the mean square error (MSE) between the audio signal and the convolution with the impulse response based on the pulses selected as encoding target pulses in the track of higher importance.

In the case of pulses corresponding to (1), the pulses at positions adjacent to the pulse selected as the encoding target pulse in the track of higher importance may be pulses for the current track. The related contents are the same as those described in detail with reference to FIGS. 12 and 13.

In addition, in the case of the pulses corresponding to (1), the pulses farthest from the pulse selected as the encoding target pulse in the track of the higher importance may be pulses for the current track. The related contents are the same as those described with reference to FIGS. 14 and 15.

Whether the pulse corresponds to (1) or the pulse corresponding to (2) may be adaptively determined according to the characteristics of the audio signal. In this regard, the contents are the same as those described in the example of FIG. 17.

The pulses corresponding to (2) may be pulses selected in ascending order of the mean square error (MSE) between the 'audio signal' and the 'convolution' with the impulse response based on the pulses selected as encoding target pulses in the track of higher importance. In this case, the convolution may be a convolution, with the impulse response, of the pulses selected as encoding target pulses in the track of higher importance and one of the pulses searched in the current track. In addition, the pulses searched in the current track may be pulses selected in ascending order of the MSE between the audio signal and the convolution with the impulse response. (2) has been described with reference to the example of FIG

On the other hand, the position of the pulse in the track may be a position changed in consideration of the sign of the pulse. The content is as described above.

The decoder may reconstruct the audio signal based on the generated pulses (S2020).

In the above examples, the methods are described based on flowcharts as a series of steps or blocks, but the present invention is not limited to the order of the steps, and any step may occur in a different order from, or simultaneously with, other steps as described above. In addition, the above-described embodiments include examples of various aspects. For example, the above-described embodiments may be implemented in combination with each other, which also belongs to the embodiments according to the present invention. The invention includes various modifications and changes in accordance with the spirit of the invention within the scope of the claims below.

Claims

An audio signal encoding method comprising: determining the importance of tracks constituting a track pair according to track-specific energy of an audio signal, and searching for pulses starting from the track of highest importance to determine an encoding target pulse; and
quantizing information of the determined encoding target pulse.
The method of claim 1, wherein the determining of the encoding target pulse comprises:
selecting a pulse at a position adjacent to a pulse selected as an encoding target pulse in a track of higher importance among the tracks constituting the track pair as an encoding target pulse of the current track.
The method of claim 2, wherein as many pulses as the number of encoding target pulses are searched for in the track of the highest importance, and a predetermined number more pulses than the number of encoding target pulses are searched for in the tracks below the highest importance, and
among the searched pulses, pulses adjacent to the pulse selected as the encoding target pulse in the track of higher importance are selected as encoding target pulses of the current track.
The audio signal encoding method of claim 3, wherein the pulses are searched in descending order of absolute value.
The method of claim 3, wherein, when there are a plurality of pulses adjacent to the pulse selected as the encoding target pulse in the track of higher importance, the pulses are selected as encoding target pulses of the current track in descending order of absolute value.
The method of claim 3, wherein, when none of the pulses searched in the current track is adjacent to a pulse selected as an encoding target pulse in the track of higher importance,
pulses are selected as encoding target pulses in descending order of absolute value among the pulses searched in the current track.
The method of claim 1, wherein the determining of the encoding target pulse comprises:
selecting a pulse at a position farthest from a pulse selected as an encoding target pulse in a track of higher importance among the tracks constituting the track pair as an encoding target pulse of the current track.
The method of claim 7, wherein as many pulses as the number of encoding target pulses are searched for in the track of the highest importance, and a predetermined number more pulses than the number of encoding target pulses are searched for in the tracks below the highest importance, and
among the searched pulses, a pulse at a position farthest from the pulse selected as the encoding target pulse in the track of higher importance is selected as the encoding target pulse of the current track.
The audio signal encoding method of claim 8, wherein the pulses are searched in descending order of absolute value.
The method of claim 8, wherein, when a plurality of pulses are selected as encoding target pulses in the current track, the encoding target pulses of the current track are selected in order of distance, farthest first, from the pulses selected as encoding target pulses in the track of higher importance.
The method of claim 1, wherein the determining of the encoding target pulse comprises:
selecting an encoding target pulse based on a positional relationship with a pulse selected as an encoding target pulse in a track of higher importance among the tracks constituting the track pair.
The method of claim 11, wherein, when the energy of the tracks constituting the track pair is evenly distributed, pulses adjacent to a pulse selected as an encoding target pulse in a track of higher importance among the tracks constituting the track pair are selected as encoding target pulses of the current track, and
when the energy of the tracks constituting the track pair is concentrated in a specific band, pulses that are apart from the pulse selected as the encoding target pulse in the track of higher importance among the tracks constituting the track pair are selected as encoding target pulses of the current track.
The method of claim 11, wherein, when the tracks constituting the track pair are tonal, pulses that are apart from a pulse selected as an encoding target pulse in a track of higher importance among the tracks constituting the track pair are selected as encoding target pulses of the current track, and
when the tracks constituting the track pair are not tonal, pulses adjacent to a pulse selected as an encoding target pulse in a track of higher importance among the tracks constituting the track pair are selected as encoding target pulses of the current track.
The method of claim 1, wherein the determining of the encoding target pulse comprises:
selecting encoding target pulses of the current track in ascending order of the mean square error (MSE) between the audio signal and a convolution with an impulse response based on pulses selected as encoding target pulses in a track of higher importance.
The audio signal encoding method of claim 14, wherein the convolution is a convolution, with the impulse response, of a pulse selected as an encoding target pulse in the track of higher importance and one of the pulses searched in the current track.
The method of claim 15, wherein the pulses searched in the current track are
pulses selected in ascending order of the MSE between the audio signal and the convolution with the impulse response.
An audio signal decoding method comprising: generating, based on inverse quantization, pulses that have been selected starting from the track of highest importance among tracks constituting a track pair of an audio signal; and
restoring the audio signal based on the pulses,
wherein the higher the energy of a track, the higher the importance of the track.
The method of claim 17, wherein the pulses are
pulses selected based on a positional relationship with a pulse selected as an encoding target pulse in a track of higher importance among the tracks constituting the track pair.
The method of claim 17, wherein the pulses are
pulses selected in ascending order of the mean square error (MSE) between the audio signal and a convolution with an impulse response based on pulses selected as encoding target pulses in a track of higher importance.