WO2018073486A1 - Low-delay audio coding - Google Patents

Low-delay audio coding

Info

Publication number
WO2018073486A1
Authority
WO
WIPO (PCT)
Prior art keywords
samples
vector
quantized
source
zero
Prior art date
Application number
PCT/FI2016/050744
Other languages
French (fr)
Inventor
Adriana Vasilache
Anssi Sakari RÄMÖ
Original Assignee
Nokia Technologies Oy
Priority date
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Priority to PCT/FI2016/050744 priority Critical patent/WO2018073486A1/en
Publication of WO2018073486A1 publication Critical patent/WO2018073486A1/en

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032Quantisation or dequantisation of spectral components
    • G10L19/038Vector quantisation, e.g. TwinVQ audio
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/10Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
    • G10L19/107Sparse pulse excitation, e.g. by using algebraic codebook
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3082Vector coding
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques

Definitions

  • the example and non-limiting embodiments of the present invention relate to low-delay coding of audio signals at high sound quality.
  • some embodiments of the present invention relate to lattice vector quantization of a signal that represents a segment of an audio signal.
  • When such an audio coding technique is applied in an audio processing system that involves e.g. capturing an audio signal and related processing, encoding the captured/processed audio signal, transmitting the encoded audio signal from one entity to another, decoding the received encoded audio signal and reproducing the decoded audio signal, the overall processing delay typically increases clearly beyond the mere coding delay, thereby rendering such audio coding techniques unsuitable for applications that cannot tolerate long latency, such as telephony, wireless microphones or audio co-creation systems.
  • Speech coding techniques such as adaptive multi-rate (AMR), adaptive multi-rate wideband (AMR-WB) and 3GPP enhanced voice services (EVS) employ coding delay in the range of 25 to 32 ms, which makes them somewhat better suited for some latency-critical applications, including conversational applications such as mobile telephony and/or voice over internet protocol (VoIP).
  • these coding techniques are speech coding techniques that make use of some characteristics of human voice and that operate on bandwidth-limited audio signals at relatively low bit-rates, thereby providing an audio quality that is not well suited for applications that require high-quality full-band audio and/or carry audio content different from human voice.
  • There are also speech coding techniques, such as ITU-T G.726, G.728 and G.722, that enable very low coding delay, even in a range below 1 ms, but these coding techniques too operate on voice band (e.g. at 8 or 16 kHz sampling frequency) and provide a rather modest compression ratio.
  • Some recently introduced audio coding techniques such as Opus (in a low-delay mode) and AAC-ULD enable relatively low coding delay in a range from 2.5 to 20 ms for full-band audio at a relatively good sound quality.
  • the AAC-ULD coding technique enables good sound quality using a coding delay of approximately 8 ms at bit-rates around 72 to 96 kilobits per second (kbps) or using a coding delay of approximately 2 ms at bit-rates around 128 to 192 kbps. While such coding delays make these audio coding techniques feasible candidates for many low-latency applications and usage scenarios, there is still a need for a high-quality full-band audio coding technique that enables extremely low coding delay, e.g. one that is around 2.5 ms or below at bit-rates at or close to 128 kbps and below.
  • a method for encoding a source vector of a predefined number of source samples that represent a frame of an input audio signal comprising quantizing the source samples of the source vector into respective quantized samples of an initial quantized vector using at most a predefined number of bits by employing a lattice quantizer restricted to a predefined maximum norm, detecting a sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector, determining, in response to detecting a sequence of non-zero length, a modified maximum norm that is greater than or equal to the predefined maximum norm and determining a shortened source vector by excluding those source samples that are represented by said zero-valued quantized samples of said sequence, and quantizing the source samples of the shortened source vector into respective re-quantized samples of a re-quantized vector using at most the predefined number of bits by employing said lattice quantizer restricted to the modified maximum norm.
  • an apparatus for encoding a source vector of a predefined number of source samples that represent a frame of an input audio signal configured to quantize the source samples of the source vector into respective quantized samples of an initial quantized vector using at most a predefined number of bits by employing a lattice quantizer restricted to a predefined maximum norm, detect a sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector, determine, in response to detecting a sequence of non-zero length, a modified maximum norm that is greater than or equal to the predefined maximum norm and determine a shortened source vector by excluding those source samples that are represented by said zero-valued quantized samples of said sequence, and quantize the source samples of the shortened source vector into respective re-quantized samples of a re-quantized vector using at most the predefined number of bits by employing said lattice quantizer restricted to the modified maximum norm.
  • an apparatus for encoding a source vector of a predefined number of source samples that represent a frame of an input audio signal comprising means for quantizing the source samples of the source vector into respective quantized samples of an initial quantized vector using at most a predefined number of bits by employing a lattice quantizer restricted to a predefined maximum norm, means for detecting a sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector, means for determining, in response to detecting a sequence of non-zero length, a modified maximum norm that is greater than or equal to the predefined maximum norm and determining a shortened source vector by excluding those source samples that are represented by said zero-valued quantized samples of said sequence, and means for quantizing the source samples of the shortened source vector into respective re-quantized samples of a re-quantized vector using at most the predefined number of bits by employing said lattice quantizer restricted to the modified maximum norm.
  • an apparatus for encoding a source vector of a predefined number of source samples that represent a frame of an input audio signal comprises at least one processor; and at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to: quantize the source samples of the source vector into respective quantized samples of an initial quantized vector using at most a predefined number of bits by employing a lattice quantizer restricted to a predefined maximum norm, detect a sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector, determine, in response to detecting a sequence of non-zero length, a modified maximum norm that is greater than or equal to the predefined maximum norm and determine a shortened source vector by excluding those source samples that are represented by said zero-valued quantized samples of said sequence, and quantize the source samples of the shortened source vector into respective re-quantized samples of a re-quantized vector using at most the predefined number of bits by employing said lattice quantizer restricted to the modified maximum norm.
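The quantize, detect-trailing-zeros, re-quantize flow described in the embodiments above can be sketched in Python. This is only an illustrative sketch: the norm-restricted lattice quantizer is stood in for by rounding with clipping, the rule for choosing the modified maximum norm (grow it by one per excluded sample) is an assumption, and the at-most-N-bits budget is not modeled.

```python
import numpy as np

def quantize_max_norm(src, max_norm):
    """Toy Z-lattice quantizer: round to the nearest integer and clip to
    +/-max_norm. A stand-in for the norm-restricted lattice quantizer."""
    q = np.rint(src).astype(int)
    return np.clip(q, -max_norm, max_norm)

def encode_with_requantization(src, max_norm):
    """Quantize the source vector, detect consecutive zero-valued quantized
    samples at its end and, if any are found, re-quantize the shortened
    source vector with a relaxed (modified) maximum norm."""
    q = quantize_max_norm(src, max_norm)
    # count consecutive zero-valued quantized samples at the end of the vector
    n_zeros = 0
    for v in q[::-1]:
        if v != 0:
            break
        n_zeros += 1
    if n_zeros == 0:
        return q, max_norm
    shortened = src[:len(src) - n_zeros]
    # modified maximum norm >= predefined maximum norm; growing it by one
    # per excluded sample is an illustrative rule, not from the application
    modified_norm = max_norm + n_zeros
    return quantize_max_norm(shortened, modified_norm), modified_norm
```

For example, with a maximum norm of 2, the vector [3.7, -2.2, 0.9, 0.1, -0.2] quantizes to [2, -2, 1, 0, 0]; the two trailing zeros are excluded and the first three samples are re-quantized with the relaxed norm.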
  • a computer program comprising computer readable program code configured to cause performing at least a method according to the example embodiment described in the foregoing when said program code is executed on a computing apparatus.
  • the computer program according to an example embodiment may be embodied on a volatile or a non-volatile computer-readable record medium, for example as a computer program product comprising at least one computer readable non-transitory medium having program code stored thereon, which program, when executed by an apparatus, causes the apparatus at least to perform the operations described hereinbefore for the computer program according to an example embodiment of the invention.
  • Figure 1 illustrates a block diagram of some components and/or entities of an audio processing system within which one or more example embodiments may be implemented.
  • Figure 2 illustrates a block diagram of some components and/or entities of an audio encoder according to an example embodiment.
  • Figure 3 illustrates a method according to an example embodiment.
  • Figure 4 illustrates a method according to an example embodiment.
  • Figure 5 illustrates a mapping table according to an example embodiment.
  • Figure 6 illustrates a block diagram of some components and/or entities of an audio decoder according to an example embodiment.
  • Figure 7 illustrates a block diagram of some components and/or entities of an apparatus for implementing an audio encoder and/or an audio decoder according to an example embodiment.
  • Figure 1 schematically illustrates a block diagram of some components and/or entities of an audio processing system 100.
  • the audio processing system comprises an audio capturing entity 110 for capturing an input audio signal 115 that represents at least one sound, an audio encoding entity 120 for encoding the input audio signal 115 into an encoded audio signal 125, an audio decoding entity 130 for decoding the encoded audio signal 125 obtained from the audio encoding entity into a reconstructed audio signal 135, and an audio reproduction entity 140 for playing back the reconstructed audio signal 135.
  • the audio capturing entity 110 may comprise e.g. a microphone, an arrangement of two or more microphones or a microphone array, each operable for capturing a respective sound signal.
  • the audio capturing entity 110 serves to process one or more sound signals that each represent an aspect of the captured sound into the input audio signal 115 for provision to the audio encoding entity 120 and/or for storage in a storage means for subsequent use.
  • the audio encoding entity 120 employs an audio coding algorithm, referred to herein as an audio encoder, to process the input audio signal 115 into the encoded audio signal 125.
  • the audio encoder may be considered to implement a transform from a signal domain (the input audio signal 115) to the compressed domain (the encoded audio signal 125).
  • the audio encoding entity 120 may further include a pre-processing entity for processing the input audio signal 115 from a format in which it is received from the audio capturing entity 110 into a format suited for the audio encoder.
  • This pre-processing may involve, for example, level control of the input audio signal 115 and/or modification of frequency characteristics of the input audio signal 115 (e.g. low-pass, high-pass or bandpass filtering).
  • the pre-processing may be provided as a pre-processing entity that is separate from the audio encoder, as a sub-entity of the audio encoder or as a processing entity whose functionality is shared between a separate pre-processing entity and the audio encoder.
  • the audio decoding entity 130 employs an audio decoding algorithm, referred to herein as an audio decoder, to process the encoded audio signal 125 into the reconstructed audio signal 135.
  • the audio decoder may be considered to implement a transform from the encoded domain (the encoded audio signal 125) back to the signal domain (the reconstructed audio signal 135).
  • the audio decoding entity 130 may further include a post-processing entity for processing the reconstructed audio signal 135 from a format in which it is received from the audio decoder into a format suited for the audio reproduction entity 140. This post-processing may involve, for example, level control of the reconstructed audio signal 135 and/or modification of frequency characteristics of the reconstructed audio signal 135 (e.g. low-pass, high-pass or bandpass filtering).
  • the post-processing may be provided as a post-processing entity that is separate from the audio decoder, as a sub-entity of the audio decoder or as a processing entity whose functionality is shared between a separate post-processing entity and the audio decoder.
  • the audio reproduction entity 140 may comprise, for example, headphones, a headset, a loudspeaker or an arrangement of one or more loudspeakers.
  • the audio processing system 100 may include a storage means for storing pre-captured or pre-created audio signals, among which the audio input signal for provision to the audio encoding entity 120 can be selected.
  • the audio processing system 100 may comprise a storage means for storing the reconstructed audio signal 135 for subsequent analysis, processing, playback and/or transmission to a further entity.
  • the dotted vertical line in Figure 1 serves to denote that, typically, the audio encoding entity 120 and the audio decoding entity 130 are provided in separate devices that may be connected to each other via a network or via a transmission channel.
  • the network/channel may enable a wireless connection, a wired connection or a combination of the two between the audio encoding entity 120 and the audio decoding entity 130.
  • the audio encoding entity 120 may further comprise a (first) network interface for encapsulating the encoded audio signal 125 into a sequence of protocol data units (PDUs) for transfer to the decoding entity 130 over a network/channel, whereas the audio decoding entity 130 may further comprise a (second) network interface for decapsulating the encoded audio signal 125 from the sequence of PDUs received from the audio encoding entity 120 over the network/channel.
  • the input audio signal 115 may comprise a multi-channel signal (e.g. a stereo signal) that comprises two or more separate audio channels.
  • the following examples outline a few possibilities for applying the processing described in the following for a single-channel input audio signal 115 to an input audio signal 115 provided as a multi-channel signal:
  • the audio encoding entity 120 may separately process each channel of the input audio signal 115 into a respective channel of the encoded audio signal 125, while the channels of the encoded audio signal 125 are processed in the audio decoding entity 130 into respective channels of the reconstructed audio signal 135.
  • the processing of a single channel in the audio encoding means 120 and the audio decoding means 130 may follow the approach according to the respective examples provided in the following for a single-channel input audio signal 115.
  • the audio encoding entity 120 may jointly process one or more channels of the input audio signal 115 into a channel of the encoded audio signal 125, while channels of the encoded audio signal 125 are processed in the audio decoding entity 130 into a desired number of reconstructed audio channels for provision as the reconstructed audio signal 135.
  • the audio encoding means 120 may process one or more derived audio signals that are derived from channels of the input audio signal 115 into respective encoded derived audio signals for provision as the encoded audio signal 125 or as part thereof, whereas the decoding means 130 may process one or more encoded derived audio signals received in the encoded audio signal 125 into one or more channels of the reconstructed audio signal 135.
  • a derived audio signal in the encoding means 120 comprises a downmix signal derived e.g. as a sum or as an average of two or more channels of the input audio signal 115, and the encoding means 120 further derives, for two or more channels, a respective set of (one or more) audio parameters that are descriptive of the difference between the downmix signal and a respective channel of the input audio signal 115 for inclusion in the encoded audio signal 125.
  • the audio decoding means 130 decodes the encoded downmix signal and applies, for the two or more channels, the respective set of audio parameters to reconstruct the respective channel of the reconstructed audio signal 135.
  • Figure 2 illustrates a block diagram of some components and/or entities of an audio encoder 121 that may be provided as part of the audio encoding entity 120 according to an example.
  • the audio encoding entity 120 may include further components or entities in addition to the audio encoder 121, e.g. the pre-processing entity referred to in the foregoing, which pre-processing entity may be arranged to process the input audio signal 115 before passing it to the audio encoder 121.
  • the audio encoder 121 carries out encoding of the input audio signal 115 into the encoded audio signal 125; in other words, the audio encoder 121 implements a transform from the signal domain to the encoded domain.
  • the audio encoder 121 may be arranged to process the input audio signal 115 as a sequence of input frames, each input frame including a digital audio signal at a predefined sampling frequency and comprising a time series of input samples.
  • the audio encoder 121 employs a fixed predefined frame length.
  • the frame length may be a selectable frame length that may be selected from a plurality of predefined frame lengths, or the frame length may be an adjustable frame length that may be selected from a predefined range of frame lengths.
  • a frame length may be defined as the number of samples L included in the frame, which at the predefined sampling frequency maps to a corresponding duration in time.
  • the audio encoder 121 processes the input audio signal 115 through a linear predictive coding (LPC) encoder 122, a long-term prediction (LTP) encoder 124 and a residual encoder 126.
  • The LPC encoder 122 carries out an LPC encoding procedure to process the input audio signal 115 into a first residual signal 123, which is provided as input to the LTP encoder 124.
  • the LTP encoder 124 carries out LTP encoding to process the first residual signal 123 into a second residual signal 127, which is provided as input to the residual encoder 126.
  • the residual encoder 126 carries out a residual encoding procedure to process the second residual signal 127 into the encoded audio signal 125 for provision to the decoding means (and/or for storage by a storage means).
  • LPC encoding in general is a coding technique well known in the art and it makes use of short-term redundancies in the input audio signal 115.
  • LTP encoding in general is a technique known in the art, and it makes use of long(er) term redundancies (e.g. in a range above approximately 2 ms) in the input audio signal 115: while the LPC encoder 122 is typically successful in modeling any short-term redundancies, possible long-term redundancies may still be present in the first residual signal 123, and hence the LTP encoder 124 may provide an improvement for encoding input audio signals 115 that include a periodic or a quasi-periodic signal component whose periodicity falls into the range of long(er) term redundancies.
  • A typical example of an audio signal that includes such a periodic or quasi-periodic signal component is human voice (especially during time periods of voiced sound that typically represent vowel sounds of human speech).
  • the input audio signal 115 is processed into the encoded audio signal 125 frame by frame.
  • the LPC encoder 122 carries out the LPC encoding for a frame of the input audio signal 115 and produces a corresponding frame of the first residual signal 123, which is processed by the LTP encoder 124 into a corresponding frame of the second residual signal 127, which in turn is processed by the residual encoder 126 into a corresponding frame of the encoded audio signal 125.
  • Respective non-limiting examples of operation of the LPC encoder 122, the LTP encoder 124 and the residual encoder 126 outlined above are provided in the following.
  • the LPC encoder 122 carries out an LPC analysis based on past values of the reconstructed audio signal 135 using a backward prediction technique known in the art.
  • a 'local' copy of the reconstructed audio signal 135 may be stored in a past audio buffer, which may be provided e.g. in a memory in the audio encoder 121 or in the LPC encoder 122, thereby making the reconstructed audio signal 135 available for the LPC analysis in the LPC encoder 122.
  • the references to the reconstructed audio signal 135 in the context of the audio encoder 121 refer to the local copy available therein. This aspect will be described in more detail later below.
  • the LPC encoder 122 may determine the LPC filter coefficients e.g. by minimizing the error term

    e = Σ_{t = t′+1}^{t′+N_lpc} ( Σ_{i=0}^{K_LPC} a_i · x(t − i) )²,

    where a_i, i = 0 … K_LPC, denote the LPC filter coefficients (with a_0 = 1), N_lpc denotes the length of the analysis window (in samples) and x(t) denotes samples of the reconstructed audio signal 135 within the analysis window t = t′ + 1 … t′ + N_lpc.
  • the backward prediction computes LPC filter coefficients on basis of past samples of the reconstructed audio signal 135 and carries out LPC analysis filtering for a frame of the input audio signal 115 using the computed LPC filter coefficients to produce a corresponding frame of the first residual signal 123.
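One common way to realize such backward prediction (the application does not fix the method) is the autocorrelation method combined with the Levinson-Durbin recursion, computed over a window of past reconstructed samples. A sketch with illustrative function names, assuming a Hamming analysis window:

```python
import numpy as np

def lpc_backward(reconstructed, order):
    """Compute LPC coefficients from past reconstructed samples using the
    autocorrelation method and the Levinson-Durbin recursion.
    Returns a with a[0] == 1, so the residual is sum_i a[i] * x(t - i)."""
    w = reconstructed * np.hamming(len(reconstructed))  # windowed analysis buffer
    r = np.array([np.dot(w[:len(w) - k], w[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient from the current prediction error
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a
```

Because only past reconstructed samples are used, the decoder can repeat the identical computation and no coefficients need to be transmitted.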
  • the LPC analysis filtering involves processing a time series of input samples into a corresponding time series of first residual samples.
  • the LPC encoder 122 passes the first residual signal 123 to the LTP encoder 124 for computation of the second residual signal 127 therein.
  • the LPC analysis filtering to compute the first residual signal 123 on basis of the input audio signal 115 may be carried out e.g. according to the following equation:

    r₁(t) = Σ_{i=0}^{K_LPC} a_i · x(t − i), t = t′ + 1 … t′ + L,

    where a_i, i = 0 … K_LPC, denote the LPC filter coefficients (with a_0 = 1), L denotes the frame length (in number of samples), x(t) denotes samples of the input audio signal 115 and r₁(t), t = t′ + 1 … t′ + L, denotes a corresponding frame of the first residual signal 123 (i.e. the time series of first residual samples).
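In code, the analysis filtering is a short FIR run over the frame, with the preceding samples serving as filter memory. An illustrative sketch (function name and array layout are assumptions, not from the application):

```python
import numpy as np

def lpc_analysis_filter(a, past, frame):
    """LPC analysis filtering: r1(t) = sum_{i=0..K} a[i] * x(t - i) with
    a[0] == 1; 'past' supplies the K samples preceding the frame."""
    a = np.asarray(a, dtype=float)
    past = np.asarray(past, dtype=float)
    frame = np.asarray(frame, dtype=float)
    K = len(a) - 1
    # prepend the K most recent past samples as filter memory
    buf = np.concatenate([past[len(past) - K:], frame])
    res = np.empty(len(frame))
    for n in range(len(frame)):
        res[n] = sum(a[i] * buf[K + n - i] for i in range(K + 1))
    return res
```

With a = [1, -1] this degenerates to a first-difference filter, which makes the memory handling easy to check by hand.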
  • the backward prediction in the LPC encoder 122 employs a predefined analysis window length, implying that the backward prediction bases the LPC analysis on a predefined number of the most recent samples of the reconstructed audio signal 135.
  • the analysis window covers the 608 most recent samples of the reconstructed audio signal 135, which at the sampling frequency of 48 kHz corresponds to approx. 12.7 ms. This, however, is a non-limiting example and a shorter or longer window may be employed instead, e.g. a window having a duration of 16 ms or a duration selected from the range 12 to 30 ms.
  • a suitable length/duration of the analysis window depends also on the existence and/or characteristics of other encoding components employed in the first audio encoding mode.
  • the analysis window has a predefined shape, which may be selected in view of desired LPC analysis characteristics.
  • Several analysis windows for the LPC analysis applicable for the LPC encoder 122 are known in the art, e.g. a (modified) Hamming window and a (modified) Hanning window, as well as hybrid windows such as one specified in the ITU-T Recommendation G.728 (section 3.3).
  • the LPC encoder 122 employs a predefined LPC model order, denoted as K_LPC, resulting in a corresponding set of LPC filter coefficients.
  • Since the LPC analysis in the LPC encoder 122 relies on past values of the reconstructed audio signal 135, there is no need to transmit parameters that are descriptive of the computed LPC filter coefficients to the decoding entity 130; the decoding entity 130 is able to compute an identical set of LPC filter coefficients for LPC synthesis filtering therein on basis of the reconstructed audio signal 135 available in the audio decoding entity 130. Consequently, a relatively high LPC model order may be employed since it does not have an effect on the resulting bit-rate of the encoded audio signal 125, thereby enabling accurate modeling of the spectral envelope of the input audio signal 115, especially for input audio signals 115 that include a periodic or a quasi-periodic signal component.
  • The LPC model order K_LPC may be selected e.g. as a value between 30 and 60.
  • the zero-input response of the LPC analysis filter derived in the LPC encoder 122 may be removed from the first residual signal 123 before the first residual signal 123 is encoded further.
  • the zero-input response removal may be provided, for example, as part of the LPC encoder 122 (before passing the first residual signal 123 obtained by the LPC analysis filtering to the LTP encoder 124) or in the LTP encoder 124 (before carrying out an encoding procedure therein).
  • the zero input response may be calculated as

    x_zir(t) = − Σ_{i=1}^{K_LPC} a_i · x(t − i), t = t′ + 1 … t′ + L,

    where a_i, i = 0 … K_LPC, denote the LPC filter coefficients, L denotes the frame length (in number of samples) and x(t), t = t′ − K_LPC + 1 … t′, denotes a signal reconstructed on basis of one or more past frames of the encoded audio signal, i.e. the most recent samples of the reconstructed audio signal 135.
  • the computation of the zero input response is a recursive process: for the first sample of the zero input response all x(t) refer to past samples of the reconstructed audio signal 135, whereas the following samples of the zero input response are computed at least in part using signal samples computed for the zero input response.
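The recursion described above can be sketched as follows; the function name and buffer layout are illustrative. The filter is seeded with the most recent reconstructed samples, and each computed output feeds back into the state for the next one:

```python
import numpy as np

def zero_input_response(a, past, L):
    """Zero-input response of the LPC synthesis filter 1/A(z): run the
    filter for L samples with zero excitation, seeded with the K most
    recent reconstructed samples."""
    K = len(a) - 1
    state = list(np.asarray(past, dtype=float)[len(past) - K:])
    zir = []
    for _ in range(L):
        # x(t) = -sum_{i=1..K} a_i * x(t - i); earlier outputs feed later ones
        x = -sum(a[i] * state[-i] for i in range(1, K + 1))
        zir.append(x)
        state.append(x)
    return np.array(zir)
```

For a first-order filter with a = [1, -0.5] and a last reconstructed sample of 8, the response decays geometrically: 4, 2, 1, …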
  • the calculated zero input response is added back to the reconstructed audio signal 135. Consequently, also in the audio decoding entity 130, after reconstructing a frame of the reconstructed audio signal 135 therein, the zero input response is added to the reconstructed audio signal 135, as will be described in the following.
  • the LTP encoder 124 carries out an LTP analysis based on past values of the reconstructed audio signal 135.
  • LTP analysis may be considered to constitute a backward prediction technique.
  • the local copy of the reconstructed audio signal 135 required also for the backward predictive LTP analysis may be employed for this purpose.
  • the search for the LTP parameters may consider values of the LTP lag d in a predefined range from d_min to d_max, searching for the LTP parameters that minimize the error term.
  • the value of the LTP lag d is expressed as a number of samples, and the values d_min and d_max that define the predefined range may be set, in dependence of the applied sampling frequency, such that they cover e.g. a value range that corresponds to LTP lag values d from approximately 2 ms to approximately 20 ms.
  • the value of d min may be set to a value that excludes LTP lag values d that are shorter than the frame length L from consideration.
  • the LTP lag d typically corresponds to the pitch period of the speech signal carried by the input audio signal 115.
  • the respective values of the LTP lag d and the LTP gain g may be applied in the LTP encoder 124 to carry out LTP analysis filtering of a frame of the first residual signal 123 into a corresponding frame of the second residual signal 127.
  • the LTP analysis filtering involves processing a time series of first residual samples into a corresponding time series of second residual samples.
  • the LTP encoder 124 passes the second residual signal 127 to the residual encoder 126 for derivation of the encoded audio signal 125 therein.
  • the LTP analysis filtering to compute the second residual signal 127 on basis of the first residual signal 123 may be carried out e.g. using the LTP lag d and the LTP gain g derived in the LTP parameter search.
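A minimal sketch of a single-tap LTP analysis, combining an exhaustive lag search with a closed-form gain and the filtering step. The exact search and error criterion are not specified by the text; this sketch assumes the history buffer holds at least d_max past first-residual samples and that d_min is at least the frame length (cf. the remark above on excluding lags shorter than L):

```python
import numpy as np

def ltp_encode(residual1, history, d_min, d_max):
    """Search the LTP lag d and gain g minimizing the energy of
    e2(t) = e1(t) - g * e1(t - d), then apply the LTP analysis filter."""
    residual1 = np.asarray(residual1, dtype=float)
    history = np.asarray(history, dtype=float)
    L = len(residual1)
    best_d, best_g, best_err = d_min, 0.0, np.inf
    for d in range(d_min, d_max + 1):
        # with d >= L, the predictor reads only past (already known) samples
        pred = history[len(history) - d:len(history) - d + L]
        denom = np.dot(pred, pred)
        g = np.dot(residual1, pred) / denom if denom > 0.0 else 0.0
        err = np.sum((residual1 - g * pred) ** 2)
        if err < best_err:
            best_d, best_g, best_err = d, g, err
    pred = history[len(history) - best_d:len(history) - best_d + L]
    return best_d, best_g, residual1 - best_g * pred
```

For a periodic residual the search locks onto the period: the gain approaches 1 and the second residual collapses toward zero, which is exactly the redundancy the LTP stage is meant to remove.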
  • the audio encoder 121 may be provided without the LTP encoder 124.
  • the residual encoder 126 may carry out the residual encoding procedure on basis of the first residual signal 123 instead of the second residual signal 127.
  • such scenario may, at least conceptually, involve copying the first residual signal 123 into the second residual signal 127 for use as basis for the residual encoding procedure in the residual encoder 126.
  • the LTP encoder 124 may be applied to carry out the LTP analysis for each frame of the first residual signal 123, while the basis for the residual encoding in the residual encoder 126 for a given frame is selected in dependence of the performance of the LTP encoder 124.
  • the LTP encoder 124 may select one of the first residual signal 123 and the second residual signal 127 on basis of a selected norm, e.g. a Euclidean norm: the LTP encoder 124 may compute a first norm as a norm of (a frame of) the first residual signal 123 and a second norm as a norm of (the corresponding frame of) the second residual signal 127.
  • the second residual signal 127 is selected as basis for the residual encoding in response to the first norm exceeding the second norm, whereas the first residual signal 123 is selected as basis for the residual encoding otherwise.
  • the second residual signal 127 is selected as basis for the residual encoding in response to the first norm multiplied by a weighting factor that is smaller than unity exceeding the second norm, whereas the first residual signal 123 is selected as basis for the residual encoding otherwise.
  • the selection involves selecting whether to apply the LTP encoding for the given frame of the input signal or not.
  • the encoded parameters that are transmitted to the audio decoding entity 130 include an indication of the selection (i.e. whether the LTP encoding has been applied or not) for the given frame.
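The weighted-norm selection described in the bullets above can be sketched as follows; the helper name and the 0.98 weighting factor are illustrative assumptions rather than values taken from this description:

```python
import math

def select_residual(first_residual, second_residual, weight=0.98):
    """Select which residual frame to pass to the residual encoder.

    Returns (use_ltp, selected_frame). A weighting factor below unity biases
    the decision slightly against LTP; 0.98 is an illustrative assumption.
    """
    # Euclidean norms of the two candidate residual frames
    first_norm = math.sqrt(sum(x * x for x in first_residual))
    second_norm = math.sqrt(sum(x * x for x in second_residual))
    if first_norm * weight > second_norm:
        return True, second_residual   # LTP helped: encode the second residual
    return False, first_residual       # LTP did not help: bypass it
```

The boolean flag corresponds to the per-frame indication transmitted to the decoding entity so that the decoder knows whether to run LTP synthesis for the frame.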
  • the residual encoder 126 carries out a residual encoding procedure that involves deriving encoded residual parameters on basis of the second residual signal 127.
  • the residual encoder 126 may be provided e.g. as a gain-shape encoder.
  • the residual encoder 126 may be arranged to convert a frame of the second residual signal 127 from the time domain into a transform domain by using a predefined transform.
  • the predefined transform may comprise discrete cosine transform (DCT).
  • the predefined transform may comprise another energy-compacting transform known in the art, such as modified discrete cosine transform (MDCT), discrete sine transform (DST), etc.
  • the quantized vector has the dimension L and it may be identified by a codeword Idxv, whereas the quantized gain may be denoted as gr and it may be identified by a codeword Idxg.
  • MAXW may be set to value 2 and f may be set to value 0.98.
  • a pyramidally truncated Z48 lattice quantizer may be applied, e.g. one described in the article by Thomas R. Fischer titled "A Pyramid Vector Quantizer", IEEE Transactions on Information Theory, Vol. 32, Issue 4, pp. 568-583, July 1986, ISSN 0018-9448.
  • the number of bits B is a predefined fixed value.
  • the number of bits B may be selected or defined on a frame-by-frame basis. Non-limiting examples for applicable numbers of bits B are provided in the following.
  • the search procedure may also consider a suitable value for the gain gr.
  • the gain gr is the unquantized value
  • the quantized gain gr and the respective codeword Idxg may be derived separately using the scalar quantizer (as already referred to in the foregoing).
  • the candidate scaling factors gsi may be computed using the following equation:
  • the predefined maximum norm K may be defined with respect to a selected norm, e.g. the L1 norm.
  • Application of the predefined maximum norm K implies quantization that is limited to make use of those shells of the pyramidally truncated Z48 lattice that have a norm that is at most K.
  • the procedure continues with detecting the number of zero-valued elements k at the end of the initial quantized vector v1(j), as indicated in block 304. If k equals zero, i.e. if the last element of the initial quantized vector, i.e. v1(L), is non-zero, the initial quantized vector v1(j) is selected to represent the current frame of the second residual signal 127, as indicated in block 308, and a codeword Idx1 that identifies the initial quantized vector v1(j) is computed and included in the encoded parameters as the codeword Idxv.
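The zero-tail detection of block 304 amounts to a simple backward scan over the initial quantized vector; the sketch below is an illustrative rendering of that step (the function name and the example values are assumptions):

```python
def count_trailing_zeros(v):
    """Number of zero-valued samples at the end of the quantized vector
    (block 304 of flowchart 300)."""
    k = 0
    for x in reversed(v):
        if x != 0:
            break
        k += 1
    return k

# If k == 0 the initial quantized vector already uses its full dimension and
# is kept as-is; if k > 0 the first L-k source samples are re-quantized with
# a (typically larger) modified maximum norm K'.
v1 = [3, -1, 0, 2, 0, 0]          # illustrative initial quantized vector
k = count_trailing_zeros(v1)      # trailing zeros detected at the tail
requantize = k > 0                # whether to enter the re-quantization branch
```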
  • the modified maximum norm K' may likewise be defined with respect to the selected norm, e.g. the L1 norm.
  • the re-quantization commences by determining a value of the modified maximum norm K', as indicated in block 314.
  • the selection of the modified maximum norm K' may be provided e.g. by a predefined mapping function that returns a suitable value of the modified maximum norm K' in dependence of the given values of the number of bits B and the vector dimension L-k.
  • a mapping function may be provided via a mapping table that stores the respective number of bits Bm for a plurality of pairs of a maximum norm Km and a vector dimension Lm, and by searching the mapping table in the following manner:
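The table-search procedure itself is not reproduced in this extract; one plausible sketch, under the assumption that the table is scanned for the largest norm whose bit cost still fits the B-bit budget at the given dimension, is the following (all table values are illustrative, not those of Figure 5):

```python
NORMS = [2, 4, 8]   # candidate maximum norms Km (table rows), ascending
DIMS = [4, 6, 8]    # vector dimensions Lm (table columns)
BITS = [            # illustrative bit counts Bm for each (Km, Lm) pair
    [5, 7, 8],
    [8, 11, 13],
    [11, 15, 18],
]

def modified_max_norm(B, dim, norms=NORMS, dims=DIMS, bits=BITS):
    """Return the largest maximum norm K' whose bit cost at vector
    dimension `dim` still fits within the B-bit budget, or None if even
    the smallest tabulated norm does not fit."""
    j = dims.index(dim)
    best = None
    for i, K in enumerate(norms):
        if bits[i][j] <= B:
            best = K    # keep the largest norm that still fits
    return best
```

Because both encoder and decoder know B, L and the received k, the decoder can repeat the same search and recover K' without any extra side information.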
  • the residual encoding procedure, e.g. the one illustrated by the flowchart 300 depicted in Figure 3, results in providing residual encoding parameters including the codeword Idxg that identifies the quantized gain gr, a codeword Idxv that identifies the selected one of the quantized vectors v1(j) and v2(j), and the value of k.
  • the residual encoding parameters are provided for inclusion in the encoded parameters for transmission to the decoding entity 130 for the audio decoding procedure therein.
  • This aspect will be discussed in more detail in the following as part of the description of the decoding entity 130.
  • a non-limiting example of a mapping table referred to in the foregoing is provided in Figure 5. Each row of the mapping table represents a given maximum norm Km, whereas each column of the mapping table represents a given vector dimension Lm.
  • Each cell of the mapping table indicates the number of bits required for lattice quantization using the respective maximum norm Km and vector dimension Lm.
  • the pyramidal shell of norm k of the lattice Zn contains all lattice points having the L1 norm equal to k.
  • a pyramidal lattice truncation to norm k implies truncation of the lattice Zn such that only those pyramidal shells that have a norm that is smaller than or equal to k are considered.
  • the number of lattice points at the shell of the pyramidal lattice Zn that has norm k may be computed based on the following equations:
  • the number of lattice points in a pyramidal truncation of the lattice Zn to norm k may be expressed as
  • the number of bits required to uniquely indicate a lattice point in a pyramidal truncation of the lattice Zn to norm k may be computed as where the symbol ⌈x⌉ denotes rounding to the smallest integer value that is larger than or equal to x.
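The counting equations themselves did not survive extraction; the quantities they describe follow the well-known recurrence from the cited Fischer article (number of Zn lattice points on the L1 shell of norm k), which can be sketched as:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def shell_points(n, k):
    """Number of points of the integer lattice Z^n whose L1 norm equals k,
    via the standard pyramid-VQ recurrence (Fischer, 1986)."""
    if k == 0:
        return 1          # only the origin has norm 0
    if n == 0:
        return 0          # no point of Z^0 has positive norm
    return (shell_points(n - 1, k)        # last coordinate is 0
            + shell_points(n - 1, k - 1)  # last coordinate is +/-1 (x2 folded in)
            + shell_points(n, k - 1))     # standard recurrence term

def truncation_points(n, K):
    """Points of Z^n in the pyramidal truncation to norm K (shells 0..K)."""
    return sum(shell_points(n, k) for k in range(K + 1))

def bits_needed(n, K):
    """ceil(log2(.)) bits to uniquely index a point of the truncated lattice."""
    return math.ceil(math.log2(truncation_points(n, K)))
```

Tables such as the one in Figure 5 can be precomputed by evaluating `bits_needed` over the relevant (norm, dimension) grid.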
  • the audio encoder 121 stores at least a predefined number of most recent samples of the reconstructed audio signal 135 to enable the backward prediction in the LPC encoder 122. As described in the foregoing, this may be implemented by generating a local copy of the reconstructed audio signal 135 in the audio encoder 121 and storing the local copy of the reconstructed audio signal 135 in the past audio buffer in the LPC encoder 122 or otherwise within the audio encoder 121.
  • the audio encoder 121 may further comprise a local audio synthesis element that is arranged to generate the local copy of the reconstructed audio signal 135 for the current frame and to update the past audio buffer by discarding the L oldest samples therein and inserting the samples that constitute the local copy of the reconstructed audio signal 135 in the past audio buffer to facilitate audio encoder 121 operation for processing of the next frame of the audio input signal 115.
  • the past audio buffer stores at least the most recent samples of the reconstructed audio signal 135 to cover the analysis window applied by the LPC encoder 122.
  • the past audio buffer may store at least the dmax most recent samples of the reconstructed audio signal 135 to enable evaluation of LTP lag values up to dmax.
  • Figure 6 illustrates a block diagram of some components and/or entities of an audio decoder 131 that may be provided as part of the audio decoding entity 130 according to an example.
  • the audio decoder 131 carries out decoding of the encoded audio signal 125 into the reconstructed audio signal 135, thereby serving to implement a transform from the encoded domain (back) to the signal domain and, in a way, reversing the encoding operation carried out in the audio encoder 121.
  • a residual decoder 136 carries out a residual decoding procedure to process the encoded audio signal 125 into a reconstructed second residual signal 137, which is provided as input to a LTP decoder 134.
  • the LTP decoder 134 carries out LTP decoding procedure to generate a reconstructed first residual signal 133 for provision as input to a LPC decoder 132, which in turn carries out LPC synthesis on basis of the reconstructed first residual signal 133 to output the reconstructed audio signal 135.
  • the audio decoder 131 processes the encoded audio signal 125 frame by frame.
  • the residual decoding procedure in the residual decoder 136 involves computing the reconstructed second residual signal 137 on basis of the encoded audio signal 125.
  • a frame of reconstructed second residual signal 137 is provided as a respective time series of reconstructed second residual samples.
  • in order to enable meaningful reconstruction of the residual signal, the residual decoder 136 must employ the same or otherwise matching residual coding technique as employed in the residual encoder 126.
  • the residual decoding procedure involves dequantizing residual encoding parameters received as part of the encoded audio signal 125 and using the dequantized parameters to create the frame of the reconstructed second residual signal 137, i.e. the time series of reconstructed second residual samples.
  • the encoded audio signal 125 includes the residual encoding parameters described in the foregoing, i.e. the codewords Idxg and Idxv and the value of k, where the codeword Idxg identifies the quantized gain gr, the codeword Idxv identifies a vector of the lattice codebook that represents the current frame and k indicates the number of zero-valued elements at the end of the initial quantized vector v1(j) as detected in the audio encoder 121.
  • the residual decoder 136 further has a priori knowledge of the number of bits B available for quantization of a frame of the second residual signal 127 and the length L, as well as access to the predefined mapping function that returns a suitable value of the norm (e.g. the predefined maximum norm K or the modified maximum norm K') in dependence of the given values of the number of bits B and the vector dimension L-k.
  • the residual decoder 136 defines the value of L-k by using the received value of k and may employ the predefined mapping function to derive the modified maximum norm K' employed in the residual encoder 126 in generation of the received codeword Idxv. This can be carried out by using a predefined mapping table as basis for the mapping, for example by using the procedure described in the foregoing in context of the residual encoding procedure.
  • the k zeros are appended at the end of the vector vr(j) before the multiplication by gr.
  • the inverse transform is carried out such that only the first L-k transform domain samples are considered in the procedure (e.g. by considering only the first L-k columns when applying a matrix-based inverse transform).
  • the applied inverse transform is an inverse transform of the transform applied in the residual encoder 126, e.g. inverse DCT, inverse MDCT, inverse DST, etc.
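The decoder-side shape reconstruction described above (append the k signalled zeros, apply the gain, run the inverse transform) can be sketched as follows; the naive orthonormal inverse DCT-II is shown only for illustration, a real decoder would use a fast transform, and the function names are assumptions:

```python
import math

def idct_ortho(X):
    """Naive orthonormal inverse DCT-II (i.e. DCT-III), for illustration."""
    N = len(X)
    out = []
    for n in range(N):
        s = X[0] / math.sqrt(N)
        for k in range(1, N):
            s += (math.sqrt(2.0 / N) * X[k]
                  * math.cos(math.pi * k * (2 * n + 1) / (2 * N)))
        out.append(s)
    return out

def reconstruct_frame(vr, gain, k):
    """Append the k signalled zeros, apply the decoded gain and run the
    inverse transform to obtain the reconstructed residual frame."""
    X = [gain * x for x in list(vr) + [0.0] * k]
    return idct_ortho(X)
```

Because the appended transform-domain samples are zero, applying the full-length inverse transform is equivalent to considering only the first L-k transform-domain samples (or matrix columns), as the description notes.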
  • the reconstructed second residual signal 137 is provided for LTP decoding procedure in the LTP decoder 134, which results in a reconstructed first residual signal 133.
  • a frame of reconstructed first residual signal 133 is provided as a respective time series of reconstructed first residual samples.
  • the LTP decoder 134 carries out LTP analysis to find the LTP lag d and the LTP gain g, for example, by using the procedure described in the foregoing in context of the LTP encoder 124.
  • the LTP decoding procedure involves LTP synthesis filtering to compute the first residual signal 133 on basis of the second residual signal 137 using the derived values of the LTP lag d and the LTP gain g.
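The LTP synthesis filtering can be sketched as the inverse of the analysis filter, i.e. adding back the scaled, lag-delayed past first residual; the function name and buffer convention below are illustrative assumptions:

```python
def ltp_synthesis(second_residual, lag, gain, past_first_residual):
    """LTP synthesis: r1(n) = r2(n) + g * r1(n - d).

    `past_first_residual` holds at least `lag` most recent reconstructed
    first-residual samples, newest last; it is extended as samples are
    reconstructed so the lag can reach back into the current frame.
    """
    buf = list(past_first_residual)
    out = []
    for r2 in second_residual:
        r1 = r2 + gain * buf[-lag]
        out.append(r1)
        buf.append(r1)
    return out
```

The corresponding encoder-side analysis filter would subtract the same term, r2(n) = r1(n) - g * r1(n - d), so that encoder and decoder stay in step.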
  • the audio decoder 131 may be provided without the LTP decoder 134.
  • the residual decoder 136 may provide its output as the reconstructed first residual signal 133 instead of the reconstructed second residual signal 137.
  • such scenario may, at least conceptually, involve copying the reconstructed second residual signal 137 into the reconstructed first residual signal 133 for use as basis for the LPC decoding procedure in the LPC decoder 132.
  • the reconstructed first residual signal 133 is provided for LPC decoding procedure in the LPC decoder 132, which results in the reconstructed audio signal 135.
  • a frame of reconstructed audio signal 135 is provided as a respective time series of reconstructed output samples.
  • the LPC decoding procedure comprises the LPC decoder 132 carrying out the LPC analysis based on past values of the reconstructed audio signal 135 using the same backward prediction technique as applied in the LPC encoder 122. Hence, the backward prediction computes LPC filter coefficients on basis of past samples of the reconstructed audio signal 135.
  • the LPC decoder further carries out LPC synthesis filtering of the reconstructed residual signal 133 by using the LPC filter coefficients derived for the current frame in the LPC decoder 132, thereby generating the reconstructed audio signal 135.
  • the LPC synthesis filtering in the LPC decoder 132 involves processing a time series of reconstructed first residual samples into a corresponding time series of reconstructed output samples that hence constitute a corresponding frame of the reconstructed audio signal 135.
  • the LPC decoder 132 may find the LPC filter coefficients for the LPC synthesis therein, for example, by using the procedure outlined in the foregoing for the LPC encoder 122.
  • the LPC synthesis may be carried out e.g. by using the following equation:
  • L denotes the frame length (in number of samples)
  • the resulting LPC filter coefficients are also the same or similar.
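The LPC synthesis equation referred to above is not reproduced in this extract; a conventional all-pole synthesis filter consistent with the surrounding description can be sketched as follows, where the sign convention of the coefficients is an assumption:

```python
def lpc_synthesis(residual, coeffs, history):
    """All-pole LPC synthesis: each output sample is the residual sample plus
    a prediction from the M previous output samples.

    `coeffs` are the M predictor coefficients a_1..a_M obtained by backward
    prediction from past reconstructed samples; `history` holds the most
    recent past output samples, newest last. Assumed sign convention:
    s(n) = e(n) + sum_i a_i * s(n - i).
    """
    out = []
    past = list(history)
    for e in residual:
        pred = sum(a * past[-i] for i, a in enumerate(coeffs, start=1))
        s = e + pred
        out.append(s)
        past.append(s)     # synthesized samples feed back into the predictor
    return out
```

Since both encoder and decoder derive the coefficients by the same backward prediction from the same past reconstructed samples, no coefficient transmission is needed.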
  • the past values of the reconstructed audio signal 135 required for the LPC analysis in the LPC decoder 132 are stored in a past audio buffer, which may be provided e.g. in a memory in the audio decoder 131 or in the LPC decoder 132.
  • after having derived the reconstructed audio signal 135, the LPC decoder 132 further adds the zero input response of the LPC synthesis filter to the reconstructed audio signal 135 before passing it from the audio decoder 131 for audio playback, storage and/or further processing and before using this signal to update the past audio buffer of the audio decoder 131 (as will be described later in this text).
  • the zero input response may be calculated on basis of the reconstructed audio signal 135, for example, as described in the foregoing for computation of the zero input response in the audio encoder 121.
  • the audio decoder 131 stores at least the most recent samples of the reconstructed audio signal 135 to enable the backward prediction in the LPC decoder 132.
  • the LTP decoder 134 is available in the audio decoder 131
  • at least the dmax most recent samples of the reconstructed audio signal 135 may be stored to enable evaluation of LTP lag values up to dmax. This may be implemented by storing a sufficient number of most recent samples in the past audio buffer of the audio decoder 131.
  • the audio decoder 131 updates the past audio buffer therein by discarding the L oldest samples in the past audio buffer and inserting the samples of the reconstructed audio signal 135 in the past audio buffer to facilitate the audio decoding of the next frame.
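The buffer update described above (discard the L oldest samples, insert the L newest reconstructed samples) maps naturally onto a fixed-length deque; the sizes below are illustrative assumptions, not values from this description:

```python
from collections import deque

D_MAX = 256   # largest supported LTP lag d_max, an illustrative assumption
L = 48        # frame length in samples, an illustrative assumption

# Past audio buffer covering at least the d_max most recent samples.
past_audio = deque([0.0] * D_MAX, maxlen=D_MAX)

def update_past_buffer(reconstructed_frame):
    """Discard the L oldest samples and append the L newest reconstructed
    samples; a deque with maxlen does the discarding implicitly."""
    past_audio.extend(reconstructed_frame)

# One decoded frame arrives: the buffer keeps its size, dropping old samples.
update_past_buffer([float(i) for i in range(L)])
```

The encoder maintains an identical buffer from its local copy of the reconstructed signal, which keeps the backward LPC analysis on both sides in sync.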
  • Figure 7 illustrates a block diagram of some components of an exemplifying apparatus 600.
  • the apparatus 600 may comprise further components, elements or portions that are not depicted in Figure 7.
  • the apparatus 600 may be employed in implementing e.g. the audio encoder 121 and/or the audio decoder 131 .
  • the apparatus 600 comprises a processor 616 and a memory 615 for storing data and computer program code 617.
  • the memory 615 and a portion of the computer program code 617 stored therein may be further arranged to, with the processor 616, implement the function(s) described in the foregoing in context of the audio encoder 121 and/or the audio decoder 131.
  • the apparatus 600 comprises a communication portion 612 for communication with other devices.
  • the communication portion 612 comprises at least one communication apparatus that enables wired or wireless communication with other apparatuses.
  • a communication apparatus of the communication portion 612 may also be referred to as a respective communication means.
  • the apparatus 600 may further comprise user I/O (input/output) components 618 that may be arranged, possibly together with the processor 616 and a portion of the computer program code 617, to provide a user interface for receiving input from a user of the apparatus 600 and/or providing output to the user of the apparatus 600 to control at least some aspects of operation of the audio encoder 121 and/or the audio decoder 131 implemented by the apparatus 600.
  • the user I/O components 618 may comprise hardware components such as a display, a touchscreen, a touchpad, a mouse, a keyboard, and/or an arrangement of one or more keys or buttons, etc.
  • the user I/O components 618 may be also referred to as peripherals.
  • the processor 616 may be arranged to control operation of the apparatus 600 e.g. in accordance with a portion of the computer program code 617 and possibly further in accordance with the user input received via the user I/O components 618 and/or in accordance with information received via the communication portion 612.
  • although the processor 616 is depicted as a single component, it may be implemented as one or more separate processing components.
  • although the memory 615 is depicted as a single component, it may be implemented as one or more separate components, some or all of which may be integrated/removable and/or may provide permanent/semi-permanent/dynamic/cached storage.
  • the computer program code 617 stored in the memory 615 may comprise computer-executable instructions that control one or more aspects of operation of the apparatus 600 when loaded into the processor 616.
  • the computer-executable instructions may be provided as one or more sequences of one or more instructions.
  • the processor 616 is able to load and execute the computer program code 617 by reading the one or more sequences of one or more instructions included therein from the memory 615.
  • the one or more sequences of one or more instructions may be configured to, when executed by the processor 616, cause the apparatus 600 to carry out operations, procedures and/or functions described in the foregoing in context of the audio encoder 121 and/or the audio decoder 131 .
  • the apparatus 600 may comprise at least one processor 616 and at least one memory 615 including the computer program code 617 for one or more programs, the at least one memory 615 and the computer program code 617 configured to, with the at least one processor 616, cause the apparatus 600 to perform operations, procedures and/or functions described in the foregoing in context of the audio encoder 121 and/or the audio decoder 131 .
  • the computer programs stored in the memory 615 may be provided e.g. as a respective computer program product comprising at least one computer-readable non-transitory medium having the computer program code 617 stored thereon, which computer program code, when executed by the apparatus 600, causes the apparatus 600 at least to perform operations, procedures and/or functions described in the foregoing in context of the audio encoder 121 and/or the audio decoder 131.
  • the computer-readable non-transitory medium may comprise a memory device or a record medium such as a CD-ROM, a DVD, a Blu-ray disc or another article of manufacture that tangibly embodies the computer program.
  • the computer program may be provided as a signal configured to reliably transfer the computer program.
  • reference(s) to a processor should not be understood to encompass only programmable processors, but also dedicated circuits such as field-programmable gate arrays (FPGA), application-specific circuits (ASIC), signal processors, etc.

Abstract

According to an example embodiment, a technique for encoding a source vector of a predefined number of source samples that represent a frame of an input audio signal is provided. In an example, the technique comprises quantizing the source samples of the source vector into respective quantized samples of an initial quantized vector using at most a predefined number of bits by employing a lattice quantizer restricted to a predefined maximum norm, detecting a sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector, determining, in response to detecting a sequence of non-zero length, a modified maximum norm that is greater than or equal to the predefined maximum norm and determining a shortened source vector by excluding those source samples that are represented by said zero-valued quantized samples of said sequence, and quantizing the source samples of the shortened source vector into respective re-quantized samples of a re-quantized vector using at most the predefined number of bits by employing said lattice quantizer restricted to the modified maximum norm.

Description

Low-delay audio coding TECHNICAL FIELD
The example and non-limiting embodiments of the present invention relate to low-delay coding of audio signals at high sound quality. In particular, some embodiments of the present invention relate to lattice vector quantization of a signal that represents a segment of an audio signal.
BACKGROUND
Development of speech and audio coding techniques has evolved into solutions that enable a high compression ratio at a good sound quality across input audio signals of various characteristics and across a wide range of encoding bit-rates. Typically, achieving a high compression ratio in an audio coding technique that operates on a full-band audio signal (typically employing a sampling frequency of 48 kHz) requires usage of a relatively long analysis window in a range of 150 milliseconds (ms) or above to ensure sufficient sound quality. Consequently, a coding delay (which is also commonly referred to as an algorithmic delay) of such audio coding techniques is in the range of 150 ms or above. Examples of commonly employed audio coding techniques of this type include e.g. MPEG-1/MPEG-2 audio layer 3 (MP3) and MPEG-2/MPEG-4 advanced audio coding (AAC).
When such an audio coding technique is applied in an audio processing system that involves e.g. capturing an audio signal and related processing, encoding the captured/processed audio signal, transmitting the encoded audio signal from one entity to another, decoding the received encoded audio signal and reproducing the decoded audio signal, the overall processing delay typically increases clearly beyond the mere coding delay, thereby rendering such audio coding techniques unsuitable for applications that cannot tolerate long latency, such as telephony, wireless microphones or audio co-creation systems. Speech coding techniques, such as adaptive multi-rate (AMR), adaptive multi-rate wideband (AMR-WB) and 3GPP enhanced voice services (EVS), employ a coding delay in the range of 25 to 32 ms, which makes them somewhat better suited for some latency-critical applications, including conversational applications such as mobile telephony and/or voice over internet protocol (VoIP). However, although enabling a high compression ratio, these coding techniques are speech coding techniques that make use of some characteristics of human voice and that operate on bandwidth-limited audio signals at relatively low bit-rates, thereby providing an audio quality that is not well-suited for applications that require high-quality full-band audio and/or carry audio content different from human voice. There are also speech coding techniques, such as ITU-T G.726, G.728 and G.722, that enable very low coding delay even in a range below 1 ms, but also these coding techniques operate on voice band (e.g. at 8 or 16 kHz sampling frequency) and provide a rather modest compression ratio. Some recently introduced audio coding techniques such as Opus (in a low-delay mode) and AAC-ULD enable relatively low coding delay in a range from 2.5 to 20 ms for full-band audio at a relatively good sound quality.
As an example, assuming a sampling frequency of 32 kHz, the AAC-ULD coding technique enables good sound quality using a coding delay of approximately 8 ms at bit-rates around 72 to 96 kilobits per second (kbps) or using a coding delay of approximately 2 ms at bit-rates around 128 to 192 kbps. While such coding delays make these audio coding techniques feasible candidates for many low-latency applications and usage scenarios, there is still a need for a high-quality full-band audio coding technique that enables extremely low coding delay, e.g. one that is around 2.5 ms or below at bit rates at or close to 128 kbps and below.
SUMMARY
According to an example embodiment, a method for encoding a source vector of a predefined number of source samples that represent a frame of an input audio signal is provided, the method comprising quantizing the source samples of the source vector into respective quantized samples of an initial quantized vector using at most a predefined number of bits by employing a lattice quantizer restricted to a predefined maximum norm, detecting a sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector, determining, in response to detecting a sequence of non-zero length, a modified maximum norm that is greater than or equal to the predefined maximum norm and determining a shortened source vector by excluding those source samples that are represented by said zero-valued quantized samples of said sequence, and quantizing the source samples of the shortened source vector into respective re-quantized samples of a re-quantized vector using at most the predefined number of bits by employing said lattice quantizer restricted to the modified maximum norm.
According to another example embodiment, an apparatus for encoding a source vector of a predefined number of source samples that represent a frame of an input audio signal is provided, the apparatus configured to quantize the source samples of the source vector into respective quantized samples of an initial quantized vector using at most a predefined number of bits by employing a lattice quantizer restricted to a predefined maximum norm, detect a sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector, determine, in response to detecting a sequence of non-zero length, a modified maximum norm that is greater than or equal to the predefined maximum norm and determining a shortened source vector by excluding those source samples that are represented by said zero- valued quantized samples of said sequence, and quantize the source samples of the shortened source vector into respective re-quantized samples of a re-quantized vector using at most the predefined number of bits by employing said lattice quantizer restricted to the modified maximum norm.
According to another example embodiment, an apparatus for encoding a source vector of a predefined number of source samples that represent a frame of an input audio signal is provided, the apparatus comprising means for quantizing the source samples of the source vector into respective quantized samples of an initial quantized vector using at most a predefined number of bits by employing a lattice quantizer restricted to a predefined maximum norm, means for detecting a sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector, means for determining, in response to detecting a sequence of non-zero length, a modified maximum norm that is greater than or equal to the predefined maximum norm and determining a shortened source vector by excluding those source samples that are represented by said zero-valued quantized samples of said sequence, and means for quantizing the source samples of the shortened source vector into respective re-quantized samples of a re-quantized vector using at most the predefined number of bits by employing said lattice quantizer restricted to the modified maximum norm.
According to another example embodiment, an apparatus for encoding a source vector of a predefined number of source samples that represent a frame of an input audio signal is provided, wherein the apparatus comprises at least one processor; and at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to: quantize the source samples of the source vector into respective quantized samples of an initial quantized vector using at most a predefined number of bits by employing a lattice quantizer restricted to a predefined maximum norm, detect a sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector, determine, in response to detecting a sequence of non-zero length, a modified maximum norm that is greater than or equal to the predefined maximum norm and determining a shortened source vector by excluding those source samples that are represented by said zero- valued quantized samples of said sequence, and quantize the source samples of the shortened source vector into respective re-quantized samples of a re-quantized vector using at most the predefined number of bits by employing said lattice quantizer restricted to the modified maximum norm.
According to another example embodiment, a computer program is provided, the computer program comprising computer readable program code configured to cause performing at least a method according to the example embodiment described in the foregoing when said program code is executed on a computing apparatus.
The computer program according to an example embodiment may be embodied on a volatile or a non-volatile computer-readable record medium, for example as a computer program product comprising at least one computer readable non-transitory medium having program code stored thereon, which program code, when executed by an apparatus, causes the apparatus at least to perform the operations described hereinbefore for the computer program according to an example embodiment of the invention.

The exemplifying embodiments of the invention presented in this patent application are not to be interpreted to pose limitations to the applicability of the appended claims. The verb "to comprise" and its derivatives are used in this patent application as an open limitation that does not exclude the existence of also unrecited features. The features described hereinafter are mutually freely combinable unless explicitly stated otherwise.
Some features of the invention are set forth in the appended claims. Aspects of the invention, however, both as to its construction and its method of operation, together with additional objects and advantages thereof, will be best understood from the following description of some example embodiments when read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF FIGURES
The embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, where
Figure 1 illustrates a block diagram of some components and/or entities of an audio processing system within which one or more example embodiments may be implemented;
Figure 2 illustrates a block diagram of some components and/or entities of an audio encoder according to an example embodiment;
Figure 3 illustrates a method according to an example embodiment;
Figure 4 illustrates a method according to an example embodiment;
Figure 5 illustrates a mapping table according to an example embodiment;
Figure 6 illustrates a block diagram of some components and/or entities of an audio decoder according to an example embodiment; and
Figure 7 illustrates a block diagram of some components and/or entities of an apparatus for implementing an audio encoder and/or an audio decoder according to an example embodiment.
DESCRIPTION OF SOME EMBODIMENTS
Figure 1 schematically illustrates a block diagram of some components and/or entities of an audio processing system 100. The audio processing system comprises an audio capturing entity 110 for capturing an input audio signal 115 that represents at least one sound, an audio encoding entity 120 for encoding the input audio signal 115 into an encoded audio signal 125, an audio decoding entity 130 for decoding the encoded audio signal 125 obtained from the audio encoding entity into a reconstructed audio signal 135, and an audio reproduction entity 140 for playing back the reconstructed audio signal 135. The audio capturing entity 110 may comprise e.g. a microphone, an arrangement of two or more microphones or a microphone array, each operable for capturing a respective sound signal. The audio capturing entity 110 serves to process one or more sound signals that each represent an aspect of the captured sound into the input audio signal 115 for provision to the audio encoding entity 120 and/or for storage in a storage means for subsequent use. The audio encoding entity 120 employs an audio coding algorithm, referred to herein as an audio encoder, to process the input audio signal 115 into the encoded audio signal 125. In this regard, the audio encoder may be considered to implement a transform from a signal domain (the input audio signal 115) to the compressed domain (the encoded audio signal 125). The audio encoding entity 120 may further include a pre-processing entity for processing the input audio signal 115 from a format in which it is received from the audio capturing entity 110 into a format suited for the audio encoder. This pre-processing may involve, for example, level control of the input audio signal 115 and/or modification of frequency characteristics of the input audio signal 115 (e.g. low-pass, high-pass or bandpass filtering).
The preprocessing may be provided as a pre-processing entity that is separate from the audio encoder, as a sub-entity of the audio encoder or as a processing entity whose functionality is shared between a separate pre-processing and the audio encoder.
The audio decoding entity 130 employs an audio decoding algorithm, referred to herein as an audio decoder, to process the encoded audio signal 125 into the reconstructed audio signal 135. The audio decoder may be considered to implement a transform from an encoded domain (the encoded audio signal 125) back to the signal domain (the reconstructed audio signal 135). The audio decoding entity 130 may further include a post-processing entity for processing the reconstructed audio signal 135 from a format in which it is received from the audio decoder into a format suited for the audio reproduction entity 140. This post-processing may involve, for example, level control of the reconstructed audio signal 135 and/or modification of frequency characteristics of the reconstructed audio signal 135 (e.g. low-pass, high-pass or bandpass filtering). The post-processing may be provided as a post-processing entity that is separate from the audio decoder, as a sub-entity of the audio decoder or as a processing entity whose functionality is shared between a separate post-processing and the audio decoder.
The audio reproduction entity 140 may comprise, for example, headphones, a headset, a loudspeaker or an arrangement of one or more loudspeakers. Instead of using the audio capturing entity 110, the audio processing system 100 may include a storage means for storing pre-captured or pre-created audio signals, among which the input audio signal 115 for provision to the audio encoding entity 120 can be selected. Instead of using the audio reproduction entity 140, the audio processing system 100 may comprise a storage means for storing the reconstructed audio signal 135 for subsequent analysis, processing, playback and/or transmission to a further entity.
The dotted vertical line in Figure 1 serves to denote that, typically, the audio encoding entity 120 and the audio decoding entity 130 are provided in separate devices that may be connected to each other via a network or via a transmission channel. The network/channel may enable a wireless connection, a wired connection or a combination of the two between the audio encoding entity 120 and the audio decoding entity 130. As an example in this regard, the audio encoding entity 120 may further comprise a (first) network interface for encapsulating the encoded audio signal 125 into a sequence of protocol data units (PDUs) for transfer to the decoding entity 130 over a network/channel, whereas the audio decoding entity 130 may further comprise a (second) network interface for decapsulating the encoded audio signal 125 from the sequence of PDUs received from the audio encoding entity 120 over the network/channel. In the following, operation of some elements of the audio processing system 100 is described via more detailed examples by assuming that the input audio signal 115 includes a single audio channel. This, however, is a non-limiting example that has been adopted for clarity and brevity of description, and in other examples the input audio signal 115 may comprise a multi-channel signal (e.g. a stereo signal) that comprises two or more separate audio channels. The following examples outline a few possibilities for applying the single-channel processing described in the following to a multi-channel input audio signal 115:
- The audio encoding entity 120 may separately process each channel of the input audio signal 115 into a respective channel of the encoded audio signal 125, while the channels of the encoded audio signal 125 are processed in the audio decoding entity 130 into respective channels of the reconstructed audio signal 135. In this regard, the processing of a single channel in the audio encoding entity 120 and the audio decoding entity 130 may follow the approach according to the respective examples provided in the following for a single-channel input audio signal 115.
- The audio encoding entity 120 may jointly process one or more channels of the input audio signal 115 into a channel of the encoded audio signal 125, while channels of the encoded audio signal 125 are processed in the audio decoding entity 130 into a desired number of reconstructed audio channels for provision as the reconstructed audio signal 135. As a more detailed example in this regard, the audio encoding entity 120 may process one or more derived audio signals that are derived from channels of the input audio signal 115 into respective encoded derived audio signals for provision as the encoded audio signal 125 or as part thereof, whereas the audio decoding entity 130 may process one or more encoded derived audio signals received in the encoded audio signal 125 into one or more channels of the reconstructed audio signal 135. As a particular example, a derived audio signal in the audio encoding entity 120 comprises a downmix signal derived e.g. as a sum or as an average of two or more channels of the input audio signal 115, and the audio encoding entity 120 further derives, for two or more channels, a respective set of (one or more) audio parameters that are descriptive of the difference between the downmix signal and a respective channel of the input audio signal 115 for inclusion in the encoded audio signal 125. The audio decoding entity 130 decodes the encoded downmix signal and applies, for the two or more channels, the respective set of audio parameters to reconstruct the respective channel of the reconstructed audio signal 135.
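The downmix-based approach above can be sketched as follows. This is an illustrative Python sketch only: the function names, the plain-average downmix rule and the projection-based channel gain are assumptions for illustration, not taken from the embodiment.

```python
def downmix(left, right):
    """Derive a mono downmix signal as the sample-wise average of two channels."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

def channel_gain(channel, mix):
    """One possible per-channel side parameter: the least-squares gain that
    best reconstructs the channel from the downmix (hypothetical choice)."""
    num = sum(c * m for c, m in zip(channel, mix))
    den = sum(m * m for m in mix) or 1.0
    return num / den
```

A decoder-side sketch would then reconstruct each channel as the decoded downmix scaled by the transmitted gain.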
Figure 2 illustrates a block diagram of some components and/or entities of an audio encoder 121 that may be provided as part of the audio encoding entity 120 according to an example. The audio encoding entity 120 may include further components or entities in addition to the audio encoder 121, e.g. the pre-processing entity referred to in the foregoing, which pre-processing entity may be arranged to process the input audio signal 115 before passing it to the audio encoder 121. The audio encoder 121 carries out encoding of the input audio signal 115 into the encoded audio signal 125, in other words the audio encoder 121 implements a transform from the signal domain to the encoded domain. The audio encoder 121 may be arranged to process the input audio signal 115 as a sequence of input frames, each input frame including a digital audio signal at a predefined sampling frequency and comprising a time series of input samples. Typically, the audio encoder 121 employs a fixed predefined frame length. In other examples, the frame length may be a selectable frame length that may be selected from a plurality of predefined frame lengths, or the frame length may be an adjustable frame length that may be selected from a predefined range of frame lengths. A frame length may be defined as the number of samples L included in the frame, which at the predefined sampling frequency maps to a corresponding duration in time.
As an example in this regard, the audio encoder 121 may employ a fixed frame length of 1 ms and sampling frequency of 48 kHz, resulting in frames of L=48 samples. These values, however, serve as non-limiting examples and different frame length and/or sampling frequency may be employed instead, depending e.g. on the desired audio bandwidth, on desired framing delay and/or on available processing capacity.
The audio encoder 121 processes the input audio signal 115 through a linear predictive coding (LPC) encoder 122, a long-term prediction (LTP) encoder 124 and a residual encoder 126. The LPC encoder 122 carries out an LPC encoding procedure to process the input audio signal 115 into a first residual signal 123, which is provided as input to the LTP encoder 124. The LTP encoder 124 carries out LTP encoding to process the first residual signal 123 into a second residual signal 127, which is provided as input to the residual encoder 126. The residual encoder 126 carries out a residual encoding procedure to process the second residual signal 127 into the encoded audio signal 125 for provision to the decoding means (and/or for storage by a storage means).
LPC encoding in general is a coding technique well known in the art and it makes use of short-term redundancies in the input audio signal 115. Along similar lines, LTP encoding in general is a technique known in the art, and it makes use of long(er) term redundancies (e.g. in a range above approximately 2 ms) in the input audio signal 115: while the LPC encoder 122 is typically successful in modeling any short-term redundancies, possible long-term redundancies may still be present in the first residual signal 123, and hence the LTP encoder 124 may provide an improvement for encoding input audio signals 115 that include a periodic or a quasi-periodic signal component whose periodicity falls into the range of long(er) term redundancies. A typical example of an audio signal that includes such a periodic or quasi-periodic signal component is human voice (especially during time periods of voiced sound that typically represent vowel sounds of human speech).
In a signal path through the LPC encoder 122, the LTP encoder 124 and the residual encoder 126, the input audio signal 115 is processed into the encoded audio signal 125 frame by frame. In other words, in the signal path the LPC encoder 122 carries out the LPC encoding for a frame of the input audio signal 115 and produces a corresponding frame of the first residual signal 123, which is processed by the LTP encoder 124 into a corresponding frame of the second residual signal 127, which in turn is processed by the residual encoder 126 into a corresponding frame of the encoded audio signal 125. Respective non-limiting examples of operation of the LPC encoder 122, the LTP encoder 124 and the residual encoder 126 outlined above are provided in the following.
The LPC encoder 122 carries out an LPC analysis based on past values of the reconstructed audio signal 135 using a backward prediction technique known in the art. To enable access to the past values of the reconstructed audio signal 135, a 'local' copy of the reconstructed audio signal 135 may be stored in a past audio buffer, which may be provided e.g. in a memory in the audio encoder 121 or in the LPC encoder 122, thereby making the reconstructed audio signal 135 available for the LPC analysis in the LPC encoder 122. Hence, the references to the reconstructed audio signal 135 in context of the audio encoder 121 refer to the local copy available therein. This aspect will be described in more detail later below.
In the LPC analysis, the LPC encoder 122 may determine the LPC filter coefficients e.g. by minimizing the error term

e(t) = ||Σ_{i=0..K_LPC} a_i x̂(t − i)||, t = t̄ − N_LPC + 1 : t̄,

where a_i, i = 0 : K_LPC, with a_0 = 1, denote the LPC filter coefficients, N_LPC denotes the analysis window length (in number of samples), x̂(t), t = t̄ − N_LPC : t̄, denotes a signal reconstructed on basis of one or more past frames of the encoded audio signal, i.e. the most recent samples of the reconstructed audio signal 135, and the symbol ||·|| denotes an applied norm, e.g. the Euclidean norm.
The backward prediction computes LPC filter coefficients on basis of past samples of the reconstructed audio signal 135 and carries out LPC analysis filtering for a frame of the input audio signal 1 15 using the computed LPC filter coefficients to produce a corresponding frame of the first residual signal 123. In other words, the LPC analysis filtering involves processing a time series of input samples into a corresponding time series of first residual samples. The LPC encoder 122 passes the first residual signal 123 to the LTP encoder 124 for computation of the second residual signal 127 therein. The LPC analysis filtering to compute the first residual signal 123 on basis of the input audio signal 1 15 may be carried out e.g. according to the following equation:
r1(t) = Σ_{i=0..K_LPC} a_i x(t − i), t = t̄ + 1 : t̄ + L,

where a_i, i = 0 : K_LPC, with a_0 = 1, denote the LPC filter coefficients, L denotes the frame length (in number of samples), x(t), t = t̄ + 1 : t̄ + L, denotes a frame of the input audio signal 115 (i.e. the time series of input samples), and r1(t), t = t̄ + 1 : t̄ + L, denotes a corresponding frame of the first residual signal 123 (i.e. the time series of first residual samples).
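The LPC analysis filtering that derives the first residual samples can be sketched in Python as follows. This is an illustrative sketch only: the coefficient values in the usage example are placeholders, whereas a real encoder would derive them through the backward prediction described above.

```python
def lpc_analysis_filter(x_hist, frame, a):
    """Compute first-residual samples r1(t) = sum_i a[i] * x(t - i), where
    a[0] == 1 and x(t - i) falls back on past samples (x_hist) for the first
    few samples of the frame.  x_hist must hold at least len(a) - 1 samples."""
    x = list(x_hist) + list(frame)          # past samples followed by current frame
    n0 = len(x_hist)
    r1 = []
    for t in range(n0, n0 + len(frame)):
        r1.append(sum(a[i] * x[t - i] for i in range(len(a))))
    return r1
```

For example, with the placeholder filter a = [1, −1] (a first-difference predictor) a constant-slope input produces a constant residual.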
In an example, the backward prediction in the LPC encoder 122 employs a predefined window length N_LPC, implying that the backward prediction bases the LPC analysis on the N_LPC most recent samples x̂(t), t = t̄ − N_LPC : t̄, of the reconstructed audio signal 135. In an example, the analysis window covers the 608 most recent samples of the reconstructed audio signal 135, which at the sampling frequency of 48 kHz corresponds to approx. 12.7 ms. This, however, is a non-limiting example and a shorter or longer window may be employed instead, e.g. a window having a duration of 16 ms or a duration selected from the range 12 to 30 ms. A suitable length/duration of the analysis window depends also on the existence and/or characteristics of other encoding components employed in the first audio encoding mode. The analysis window has a predefined shape, which may be selected in view of desired LPC analysis characteristics. Several analysis windows for the LPC analysis applicable for the LPC encoder 122 are known in the art, e.g. a (modified) Hamming window and a (modified) Hanning window, as well as hybrid windows such as the one specified in the ITU-T Recommendation G.728 (section 3.3). The LPC encoder 122 employs a predefined LPC model order, denoted as K_LPC, resulting in a set of K_LPC LPC filter coefficients. Since the LPC analysis in the LPC encoder 122 relies on past values of the reconstructed audio signal 135, there is no need to transmit parameters that are descriptive of the computed LPC filter coefficients to the decoding entity 130, but the decoding entity 130 is able to compute an identical set of LPC filter coefficients for LPC synthesis filtering therein on basis of the reconstructed audio signal 135 available in the audio decoding entity 130. Consequently, a relatively high LPC model order K_LPC may be employed since it does not have an effect on the resulting bit-rate of the encoded audio signal 125, thereby enabling accurate modeling of the spectral envelope of the input audio signal 115 especially for input audio signals 115 that include a periodic or a quasi-periodic signal component. On the other hand, the required computing capacity increases with increasing LPC model order K_LPC, and hence selection of the most appropriate LPC model order K_LPC for a given use case may involve a trade-off between the desired accuracy of modeling the spectral envelope of the input audio signal 115 and the available computational resources. As a non-limiting example, the LPC model order K_LPC may be selected as a value between 30 and 60.
In an example, the zero-input response of the LPC filter derived in the LPC encoder 122 may be removed from the first residual signal 123 before encoding the first residual signal 123 further in the LTP encoder 124 and the residual encoder 126. The zero-input response removal may be provided, for example, as part of the LPC encoder 122 (before passing the first residual signal 123 obtained by the LPC analysis filtering to the LTP encoder 124) or in the LTP encoder 124 (before carrying out an encoding procedure therein).
The zero input response may be calculated as

z(t) = −Σ_{i=1..K_LPC} a_i x̃(t − i), t = t̄ + 1 : t̄ + L,

where a_i, i = 1 : K_LPC, denote the LPC filter coefficients, L denotes the frame length (in number of samples), and x̃(t), t = t̄ − K_LPC + 1 : t̄, denotes a signal reconstructed on basis of one or more past frames of the encoded audio signal, i.e. the most recent samples of the reconstructed audio signal 135. The computation of the zero input response is a recursive process: for the first sample of the zero input response all x̃(t) refer to past samples of the reconstructed audio signal 135, whereas the following samples of the zero input response are computed at least in part using signal samples computed for the zero input response, i.e. x̃(t) = z(t) for t > t̄.
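The recursive zero-input-response computation can be sketched as follows. A hedged illustration: the coefficient values in the usage example are placeholders, and the formula follows the reconstruction above (synthesis-filter recursion seeded with past reconstructed samples), which is one plausible reading of the garbled original.

```python
def zero_input_response(a, x_past, L):
    """Zero input response z(t) = -sum_{i=1..K} a[i] * z(t - i), seeded with
    the K most recent reconstructed samples x_past; a[0] is assumed to be 1.
    Later samples reuse earlier zero-input-response samples (recursion)."""
    K = len(a) - 1
    buf = list(x_past[-K:])           # filter memory: most recent K samples
    z = []
    for _ in range(L):
        zt = -sum(a[i] * buf[-i] for i in range(1, K + 1))
        z.append(zt)
        buf.append(zt)                # recursion: reuse computed z samples
    return z
```

With the placeholder filter a = [1, −0.5] the response decays geometrically from the last reconstructed sample.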
After encoding a frame of the input audio signal 115 in the audio encoder 121, the calculated zero input response is added back to the reconstructed audio signal 135. Consequently, also in the audio decoding entity 130, after reconstructing a frame of the reconstructed audio signal 135 therein, the zero input response is added to the reconstructed audio signal 135, as will be described in the following.
The LTP encoder 124 carries out an LTP analysis based on past values of the reconstructed audio signal 135. Various approaches for carrying out the LTP analysis are known in the art. Since the computation in this regard is based on information of signal history, also the LTP analysis may be considered to constitute a backward prediction technique. To enable access to the past values of the first residual signal 123, the local copy of the reconstructed audio signal 135 required also for the backward predictive LTP analysis may be employed for this purpose. In the LTP analysis, the LTP encoder 124 may determine the LTP parameters, LTP lag d and LTP gain g, for example by finding values of the LTP lag d and LTP gain g that minimize the error term

e(t) = ||x̂(t) − g x̂(t − d)||, t = t̄ : t̄ + L − 1,

where L denotes the frame length (in number of samples), x̂(t), t = t̄ − d_max : t̄ + L − 1, denotes a signal reconstructed on basis of one or more past frames of the encoded audio signal, i.e. the most recent samples of the reconstructed audio signal 135, and the symbol ||·|| denotes an applied norm, e.g. the Euclidean norm. The determination of the LTP parameters may consider values of d in a predefined range from d_min to d_max in the procedure of searching the LTP parameters that minimize the above error term. In the equation above, the value of the LTP lag d is expressed as a number of samples, and the values d_min and d_max that define the predefined range may be set, in dependence of the applied sampling frequency, such that they cover e.g. a value range that corresponds to LTP lag values d from approximately 2 ms to approximately 20 ms. In order to ensure matching LTP analysis to be carried out in the audio decoding entity 130, the value of d_min may be set to a value that excludes LTP lag values d that are shorter than the frame length L from consideration.
For speech signals and especially for voiced segments thereof, the LTP lag d typically corresponds to the pitch period of the speech signal carried by the input audio signal 115.
Once found, the respective values of the LTP lag d and LTP gain g may be applied in the LTP encoder 124 to carry out LTP analysis filtering of a frame of the first residual signal 123 into a corresponding frame of the second residual signal 127. In other words, the LTP analysis filtering involves processing a time series of first residual samples into a corresponding time series of second residual samples. The LTP encoder 124 passes the second residual signal 127 to the residual encoder 126 for derivation of the encoded audio signal 125 therein. The LTP analysis filtering to compute the second residual signal 127 on basis of the first residual signal 123 may be carried out e.g. according to the following equation:

r2(t) = r1(t) − g r1(t − d), t = t̄ : t̄ + L − 1,

where L denotes the frame length (in number of samples), r1(t), t = t̄ : t̄ + L − 1, denotes a frame of the first residual signal 123 and r2(t), t = t̄ : t̄ + L − 1, denotes a frame of the second residual signal 127.
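The LTP parameter search and analysis filtering described above can be sketched as a simple exhaustive search. This is a hedged illustration only: a least-squares per-lag gain and a squared-error criterion are assumed here, whereas the embodiment leaves the exact search strategy open.

```python
def ltp_analysis(r1_hist, r1_frame, d_min, d_max):
    """Search the lag d and gain g minimizing ||r1(t) - g*r1(t-d)||^2 over
    the frame, then apply the LTP analysis filter r2(t) = r1(t) - g*r1(t-d)."""
    hist = list(r1_hist) + list(r1_frame)
    n0, L = len(r1_hist), len(r1_frame)
    best = (0.0, d_min, float("inf"))            # (gain, lag, error)
    for d in range(d_min, d_max + 1):
        past = [hist[n0 + t - d] for t in range(L)]
        den = sum(p * p for p in past)
        g = sum(p * c for p, c in zip(past, r1_frame)) / den if den else 0.0
        err = sum((c - g * p) ** 2 for c, p in zip(r1_frame, past))
        if err < best[2]:
            best = (g, d, err)
    g, d, _ = best
    r2 = [r1_frame[t] - g * hist[n0 + t - d] for t in range(L)]
    return d, g, r2
```

For a perfectly periodic residual the search recovers the period as the lag, a gain of one, and an all-zero second residual.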
Although described as part of the exemplifying audio encoder 121 depicted in Figure 2, in other examples the audio encoder 121 may be provided without the LTP encoder 124. In such a scenario the residual encoder 126 may carry out the residual encoding procedure on basis of the first residual signal 123 instead of the second residual signal 127. Alternatively, such scenario may, at least conceptually, involve copying the first residual signal 123 into the second residual signal 127 for use as basis for the residual encoding procedure in the residual encoder 126.
In a further example, the LTP encoder 124 is applied to carry out the LTP analysis for each frame of the first residual signal 123, but the basis for the residual encoding in the residual encoder 126 for a given frame is selected in dependence of the performance of the LTP encoder 124. As an example in this regard, the LTP encoder 124 may select one of the first residual signal 123 and the second residual signal 127 on basis of a selected norm, e.g. the Euclidean norm: the LTP encoder 124 may compute a first norm as a norm of (a frame of) the first residual signal 123 and a second norm as a norm of (the corresponding frame of) the second residual signal 127. The second residual signal 127 is selected as basis for the residual encoding in response to the first norm exceeding the second norm, whereas the first residual signal 123 is selected as basis for the residual encoding otherwise. In a variation of such an example, the second residual signal 127 is selected as basis for the residual encoding in response to the first norm multiplied by a weighting factor that is smaller than unity exceeding the second norm, whereas the first residual signal 123 is selected as basis for the residual encoding otherwise. In other words, the selection involves selecting whether or not to apply the LTP encoding for the given frame of the input signal. In such an approach, the encoded parameters that are transmitted to the audio decoding entity 130 include, for the given frame, an indication of the selection (i.e. whether the LTP encoding has been applied or not).
The residual encoder 126 carries out a residual encoding procedure that involves deriving encoded residual parameters on basis of the second residual signal 127. The residual encoding may employ, for example, a gain-shape coding technique (e.g. a gain-shape encoder), wherein relative amplitudes of samples in a source vector v_r(j), j = 1 : L, are encoded separately from a gain g_r of the source vector v_r(j), j = 1 : L, thereby resulting in encoded parameters that include pieces of information that identify a codevector that represents the source vector v_r(j), j = 1 : L, and the gain value g_r, where a reconstructed version of the second residual signal 127 is formed by multiplying each relative amplitude value in the source vector v_r(j), j = 1 : L, by the gain value g_r.
The residual encoder 126 may be arranged to convert a frame of the second residual signal 127 from the time domain into a transform domain by using a predefined transform. In an example, the predefined transform may comprise the discrete cosine transform (DCT). In other examples, the predefined transform may comprise another energy-compacting transform known in the art, such as the modified discrete cosine transform (MDCT), the discrete sine transform (DST), etc. In the following, we refer to the second residual signal 127 converted into a transform domain by DCT (or by another transform known in the art) as the transformed residual signal C, whereas a frame of the transformed residual signal C is referred to as c(j), j = 1 : L. Herein, L (also) denotes the length of the transform, such that a frame of the second residual signal 127 of length L time-domain samples is transformed into L transform domain samples c(j), j = 1 : L, that constitute the frame of the transformed residual signal.
In an example, the gain-shape coding technique applied for encoding a frame of the transformed residual signal c(j) finds the source vector v_r(j), j = 1 : L, and the gain g_r that represent the frame of the transformed residual signal c(j) and makes use of a suitable vector quantizer in finding a quantized version of the source vector v_r(j), j = 1 : L, whereas a quantized value of the gain g_r may be derived separately e.g. by using a suitable scalar quantizer. The quantized source vector may be denoted as v̂_r(j), j = 1 : L, and it may be identified by a codeword Idx_v, whereas the quantized gain may be denoted as ĝ_r and it may be identified by a codeword Idx_g. The quantized source vector v̂_r(j), j = 1 : L, may be also referred to as a reconstructed vector.
In another example, the frame of the transformed residual signal c(j) is weighted to determine a frame of a weighted transformed residual signal c̃(j), and the encoding procedure involves finding a pair of the source vector v_r(j), j = 1 : L, and the gain g_r that represent the frame of the weighted transformed residual signal c̃(j), wherein the weighting may be applied, for example, in the following manner:

c̃(j) = w(j) c(j), j = 1 : L,
w(1) = MAX_W; w(j) = f w(j − 1), j = 2 : L; 0 < f < 1,

where w(j), j = 1 : L, denotes the weights applied to the respective individual transform domain samples c(j), j = 1 : L, where MAX_W denotes the weight applied to the first transform domain sample c(1) and where f denotes the weighting coefficient. As non-limiting illustrative examples, MAX_W may be set to value 2 and f may be set to value 0.98. If weighting of the transformed residual signal C is applied, the residual encoding procedure described in more detail in the following is based on the vector that represents the frame of the weighted transformed residual signal c̃(j) instead of the vector that represents the frame of the transformed residual signal c(j).
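The weighting rule can be sketched as follows. Note that the geometric decay w(j) = f·w(j−1), giving w(j) = MAX_W·f^(j−1), is one plausible reading of the partly garbled formula above; the parameter values match the illustrative MAX_W = 2, f = 0.98 of the text.

```python
def weight_vector(L, max_w=2.0, f=0.98):
    """Build weights w(1) = MAX_W, w(j) = f * w(j-1), i.e. geometric decay."""
    w = [max_w]
    for _ in range(L - 1):
        w.append(w[-1] * f)
    return w

def apply_weighting(c, max_w=2.0, f=0.98):
    """Weight a frame of transform-domain samples sample by sample."""
    return [wj * cj for wj, cj in zip(weight_vector(len(c), max_w, f), c)]
```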
The vector quantizer referred to in the foregoing serves to quantize the L-dimensional source vector v_r(j), j = 1 : L, derived from the vector c(j), j = 1 : L (or from the vector c̃(j), j = 1 : L), that represents the current frame of the second residual signal 127 into the respective quantized source vector v̂_r(j), j = 1 : L, such that quantization distortion according to a predefined criterion is minimized. As an example, the vector quantizer may employ a pyramidally truncated lattice quantizer. In the example case of the frame length of L=48 samples (i.e. 1 ms if assuming 48 kHz sampling frequency) a pyramidally truncated Z48 lattice quantizer may be applied, e.g. one described in the article by Thomas R. Fischer titled "A Pyramid Vector Quantizer", IEEE Transactions on Information Theory, Vol. 32, Issue 4, pp. 568-583, July 1986, ISSN 0018-9448. The frame length L=48 and the pyramidally truncated Z48 lattice serve as non-limiting and illustrative examples and a different frame length and/or a different lattice quantizer may be applied instead. However, for clarity and brevity of description, the frame length L=48 and the pyramidally truncated Z48 lattice are applied in the following as representative examples to illustrate various details and variations of the residual encoding according to some embodiments of the present invention.
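A greatly simplified stand-in for such a pyramidally truncated Z-lattice quantizer can be sketched as follows. This greedy nearest-point search merely illustrates the idea of restricting the quantized vector to an L1 norm of at most K; it is not the codebook search of the cited article nor the embodiment's actual quantizer.

```python
def quantize_l1_truncated(v, K):
    """Round v to the nearest integer vector; if its L1 norm exceeds K,
    repeatedly shrink the component whose reduction increases the squared
    error the least, until the L1 norm is at most K."""
    q = [round(x) for x in v]
    while sum(abs(t) for t in q) > K:
        def delta(i):
            s = 1 if q[i] > 0 else -1
            # squared-error increase from moving q[i] one step toward zero
            return 2 * s * (v[i] - q[i]) + 1
        j = min((i for i in range(len(q)) if q[i] != 0), key=delta)
        q[j] -= 1 if q[j] > 0 else -1
    return q
```

For instance, with K = 2 the vector (2.4, −1.2, 0.3) first rounds to (2, −1, 0) with L1 norm 3, and the cheapest shrink removes the −1 component.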
The residual encoding procedure encodes the source vector v_r(j), j = 1 : L, that represents the shape of the frame of the second residual signal 127 using at most B bits. In an example, the number of bits B is a predefined fixed value. In another example, the number of bits B may be selected or defined on a frame-by-frame basis. Non-limiting examples for an applicable number of bits B are provided in the following.
In an example, a search procedure is carried out to find the quantized source vector v̂_r(j), j = 1 : L, that minimizes the quantization distortion. The search procedure may also consider a suitable value for the gain g_r. In this regard, the search procedure may involve testing a plurality of candidate values for the source vector v_r(j), j = 1 : L, and the gain g_r and selecting the pair of the quantized source vector v̂_r(j), j = 1 : L, and the gain g_r that minimizes the quantization distortion according to the predefined criterion. In this regard, the search procedure may involve testing a plurality of candidate values for a scaling factor g_s,i and for each candidate value computing the respective candidate source vector as v_r,i(j) = g_s,i c(j), j = 1 : L, using the Z48 lattice quantizer to quantize v_r,i(j) into a corresponding candidate quantized source vector v̂_r,i(j), finding a corresponding candidate gain g_r,i and deriving a resulting candidate reconstructed frame of the residual signal as ĉ_i(j) = g_r,i v̂_r,i(j), j = 1 : L.
Consequently, the pair of the candidate quantized source vector v̂_r,i(j) and the corresponding candidate gain g_r,i that result in minimizing the quantization distortion among the tested candidate values for the scaling factor g_s,i are selected as the pair of the quantized source vector v̂_r(j), j = 1 : L, and the gain g_r that represent the current frame of the transformed residual signal c(j) (or the weighted transformed residual signal c̃(j)). Herein, the gain g_r is the unquantized value, whereas the quantized gain ĝ_r and the respective codeword Idx_g may be derived separately using the scalar quantizer (as already referred to in the foregoing). The predefined criterion employed in computing the quantization distortion may comprise, for example, the Euclidean distance between the candidate reconstructed frame of the residual signal ĉ_i(j), j = 1 : L, and the frame of the transformed residual signal c(j) (or the weighted transformed residual signal c̃(j)).
In the example search procedure outlined in the foregoing, the candidate scaling factors gs,i may be computed using the following equation:

gs,i = (i · R) / (imax · ||c(j)||), i = imin : imax,

where imin and imax denote respective predefined minimum and maximum values for i (e.g. such that imin = 1 and imax = 20), R denotes a predefined maximum L1 norm applied in the lattice quantization by the Z48 lattice quantizer, and ||c(j)|| denotes the L1 norm of the frame of the transformed residual signal c(j).
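As an illustration only, the candidate search outlined above may be sketched as follows. Here `lattice_quantize` is a stand-in for the pyramidally truncated Z48 lattice quantizer (any rounding-style quantizer will do for the sketch), the candidate gain is taken as the least-squares optimal gain for each candidate quantized shape, which is one possible realization, and the Euclidean distance serves as the distortion criterion:

```python
import numpy as np

def search_shape_gain(c, lattice_quantize, scale_candidates):
    """Pick the (quantized shape, gain) pair minimizing Euclidean distortion.

    lattice_quantize is a stand-in for the pyramidally truncated Z48
    lattice quantizer; scale_candidates are the trial scaling factors
    gs,i applied to the transformed residual frame c.
    """
    best = None
    for gs in scale_candidates:
        v_hat = lattice_quantize(gs * c)       # candidate quantized shape
        energy = np.dot(v_hat, v_hat)
        if energy == 0:                        # skip degenerate all-zero shapes
            continue
        gr = np.dot(c, v_hat) / energy         # least-squares gain for this shape
        dist = np.sum((c - gr * v_hat) ** 2)   # Euclidean distortion criterion
        if best is None or dist < best[0]:
            best = (dist, v_hat, gr)
    return best[1], best[2]
```

The least-squares gain is used here because, for a fixed shape, it minimizes the Euclidean distortion directly; the source leaves the exact gain derivation open.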
An example of the residual encoding procedure is illustrated by the flowchart 300 depicted in Figure 3. The residual encoding procedure according to this example commences with initial quantization of the source vector vr(j), j = 1:L using the pyramidally truncated Z48 lattice quantizer by applying a predefined maximum norm K (e.g. an L1 norm), as indicated in block 302. Application of the predefined maximum norm K implies quantization that is limited to make use of those shells of the pyramidally truncated Z48 lattice whose norm is at most K. The initial quantization results in an initial quantized vector v1(j), j = 1:L.
The procedure continues with detecting the number of zero-valued elements k at the end of the initial quantized vector v1(j), as indicated in block 304. If k equals zero, i.e. if the last element v1(L) of the initial quantized vector is non-zero, the initial quantized vector v1(j) is selected to represent the current frame of the second residual signal 127, as indicated in block 308, and a codeword Idx1 that identifies the initial quantized vector v1(j) is computed and included in the encoded parameters as the codeword Idxv.
In contrast, if the k last elements of the initial quantized vector v1(j) are zero (where k > 0), the residual encoding procedure proceeds to block 310 to re-quantize the first L-k elements of the source vector vr(j), j = 1:L-k using the pyramidally truncated Z48 lattice quantizer, this time by applying a modified maximum norm K' (e.g. an L1 norm) that is larger than or equal to the predefined maximum norm K (i.e. K' >= K). The first L-k elements of the source vector vr(j), j = 1:L-k may be referred to as a shortened source vector. The re-quantization results in a re-quantized vector v2(j), j = 1:L-k that will be selected to represent the current frame of the second residual signal 127, as indicated in block 312, and a codeword Idx2 that identifies the re-quantized vector v2(j), j = 1:L-k is computed and included in the encoded parameters as the codeword Idxv.
Using the modified maximum norm K' that is larger than or equal to the predefined maximum norm K while performing the quantization on the shortened source vector vr(j), j = 1:L-k enables more accurate modeling of the vector c(j) subject to quantization, while using the same or substantially the same number of bits B for the quantization as the initial quantization of block 302.
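The selection logic of blocks 302 to 312 may be sketched as below. The functions `lattice_quantize(v, K)` and `modified_norm(dim)` are placeholders for, respectively, the pyramidally truncated Z48 lattice quantizer restricted to maximum norm K and the norm mapping function; neither is defined by the source in code form:

```python
def count_trailing_zeros(v):
    # k: number of consecutive zero-valued elements at the end of the vector
    k = 0
    for x in reversed(v):
        if x != 0:
            break
        k += 1
    return k

def encode_residual(vr, lattice_quantize, norm_K, modified_norm):
    # lattice_quantize(v, K) and modified_norm(dim) are stand-ins for the
    # pyramidally truncated Z48 lattice quantizer and the norm mapping.
    v1 = lattice_quantize(vr, norm_K)               # block 302: initial quantization
    k = count_trailing_zeros(v1)                    # block 304: detect trailing zeros
    if k == 0:
        return v1, k                                # block 308: keep initial vector
    K_mod = modified_norm(len(vr) - k)              # block 314: derive K' >= K
    v2 = lattice_quantize(vr[:len(vr) - k], K_mod)  # blocks 310/316: re-quantize
    return v2, k                                    # block 312: shortened vector
```

The value of k returned alongside the selected vector corresponds to the k transmitted to the decoder in the residual encoding parameters.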
Figure 4 depicts a flowchart 400 that provides an example of the re-quantization of block 310, i.e. quantization of the shortened source vector vr(j), j = 1:L-k by applying the modified maximum norm K' while applying the same maximum number of bits B as in the initial quantization. The re-quantization commences by determining a value of the modified maximum norm K', as indicated in block 314. The modified maximum norm K' may be selected in dependence on the number of bits B and the dimension L-k of the shortened source vector vr(j), j = 1:L-k that is subject to quantization. The selection of the modified maximum norm K' may be provided e.g. by a predefined mapping function that returns a suitable value of the modified maximum norm K' in dependence on the given values of the number of bits B and the vector dimension L-k. As an example, such a mapping function may be provided via a mapping table that stores the respective number of bits Bm for a plurality of pairs of a maximum norm Km and a vector dimension Lm, and searching the mapping table in the following manner:
- Find the number of bits Br defined for the predefined maximum norm K and the vector dimension L;
- Identify the highest maximum norm Kr for which the number of bits Bm for vector dimension L-k does not exceed Br; and
- Select Kr as the modified maximum norm K'.
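The table search above may be sketched as follows; `bits_table` is a hypothetical dictionary encoding of a mapping table such as the one of Figure 5, with (Km, Lm) pairs as keys and the bits Bm as values (the entries used in the test follow the worked example of the description, except for the non-selected values, which are assumed):

```python
def modified_max_norm(bits_table, K, L, k):
    """Derive the modified maximum norm K' from a mapping table.

    bits_table[(Km, Lm)] holds the number of bits Bm needed for lattice
    quantization with maximum norm Km and vector dimension Lm (cf. Figure 5).
    """
    Br = bits_table[(K, L)]                  # bits defined for (K, L)
    candidates = [Km for (Km, Lm), Bm in bits_table.items()
                  if Lm == L - k and Bm <= Br]
    return max(candidates)                   # highest norm whose bits do not exceed Br
```

Because the decoder holds the same table and receives k, it can repeat this search and recover K' without any extra signaling.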
Once the modified maximum norm K' has been determined, the shortened source vector vr(j), j = 1:L-k is quantized using the pyramidally truncated Z48 lattice by applying the modified maximum norm K', as indicated in block 316. This results in the re-quantized vector v2(j), j = 1:L-k, which will be selected to represent the current frame of the second residual signal 127 and which is identified by the codeword Idx2. The residual encoding procedure, e.g. the one illustrated by the flowchart 300 depicted in Figure 3, results in providing residual encoding parameters including the codeword Idxg that identifies the quantized gain, a codeword Idxv that identifies the selected one of the quantized vectors v1(j) and v2(j), and the value of k. The residual encoding parameters are provided for inclusion in the encoded parameters for transmission to the decoding entity 130 for the audio decoding procedure therein. In particular, in case k = 0 the codeword Idxv = Idx1 is provided in the residual encoding parameters, whereas for cases where k > 0 the codeword Idxv = Idx2 is provided in the residual encoding parameters. In the decoding entity 130, the codeword Idxv and the value of k, together with a priori knowledge of the number of bits B, the length L of the source vector vr(j), j = 1:L and the predefined maximum norm K, along with access to the above-mentioned predefined mapping function, provide sufficient information to derive the value of the modified maximum norm K' for the decoding procedure. This aspect will be discussed in more detail in the following as part of the description of the decoding entity 130. A non-limiting example of a mapping table referred to in the foregoing is provided in Figure 5. Each row of the mapping table represents a given maximum norm Km, whereas each column of the mapping table represents a given vector dimension Lm.
Each cell of the mapping table indicates the number of bits required for lattice quantization using the respective maximum norm Km and vector dimension Lm. As an example, if there are 92 bits available for quantization of the source vector vr(j), j = 1:L, one sees from the mapping table of Figure 5 that the predefined maximum norm K for vector dimension Lm = 48 is 30 (cf. the cell with light gray background). Assuming that k = 5, i.e. that the five last elements of the initial quantized vector v1(j), j = 1:L are zero-valued, the vector dimension of the shortened source vector vr(j), j = 1:L-k is 48-5 = 43. From the mapping table of Figure 5 one can see that for vector dimension Lm = 43 the highest maximum norm Km for which the number of bits is at most 92 is 32 (cf. the cell with dark gray background), which in this example would hence serve as the modified maximum norm K' for quantization of the shortened source vector vr(j), j = 1:L-k. In the following, some further considerations concerning the maximum norms Km, vector dimensions Lm and respective numbers of bits Bm stored in the mapping table, e.g. the exemplifying mapping table of Figure 5, are provided. Let S(n, k) denote the number of lattice points on a pyramidal shell of norm k of a lattice Zn. The pyramidal shell of norm k of the lattice Zn contains all lattice points having an L1 norm equal to k. A pyramidal lattice truncation to norm k implies truncation of the lattice Zn such that only those pyramidal shells that have a norm smaller than or equal to k are considered.
The number of lattice points at the shell of the pyramidal lattice Zn that has norm k may be computed based on the following equations:
S(n, k) = S(n-1, k) + S(n-1, k-1) + S(n, k-1)
S(n, 0) = 1;  S(1, k) = 2 for k >= 1
Consequently, the number of lattice points in a pyramidal truncation of the lattice Zn to norm k may be expressed as

N(n, k) = Σi=0..k S(n, i).
Moreover, the number of bits required to uniquely indicate a lattice point in a pyramidal truncation of the lattice Zn to norm k may be computed as
B(n, k) = ⌈log2(N(n, k))⌉,

where the symbol ⌈x⌉ denotes rounding to the smallest integer value that is larger than or equal to x. The audio encoder 121 stores at least a predefined number of most recent samples of the reconstructed audio signal 135 to enable the backward prediction in the LPC encoder 122. As described in the foregoing, this may be implemented by generating a local copy of the reconstructed audio signal 135 in the audio encoder 121 and storing the local copy of the reconstructed audio signal 135 in the past audio buffer in the LPC encoder 122 or otherwise within the audio encoder 121. In this regard, the audio encoder 121 may further comprise a local audio synthesis element that is arranged to generate the local copy of the reconstructed audio signal 135 for the current frame and to update the past audio buffer by discarding the L oldest samples therein and inserting the samples that constitute the local copy of the reconstructed audio signal 135 in the past audio buffer to facilitate audio encoder 121 operation for processing of the next frame of the audio input signal 115.
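The shell-count recursion S(n, k), the truncation count N(n, k) and the resulting bit count can be checked directly with a short sketch:

```python
from functools import lru_cache
from math import ceil, log2

@lru_cache(maxsize=None)
def shell_points(n, k):
    # S(n, k): points of Z^n lying on the pyramidal shell of L1 norm exactly k
    if k == 0:
        return 1
    if n == 1:
        return 2  # the two points +k and -k
    return (shell_points(n - 1, k)
            + shell_points(n - 1, k - 1)
            + shell_points(n, k - 1))

def truncation_points(n, k):
    # N(n, k): points of Z^n with L1 norm <= k (pyramidal truncation to norm k)
    return sum(shell_points(n, i) for i in range(k + 1))

def bits_needed(n, k):
    # Smallest number of bits that uniquely indexes any point in the truncation
    return ceil(log2(truncation_points(n, k)))
```

For instance, in Z2 the shell of norm 1 holds the four points (±1, 0) and (0, ±1), so N(2, 1) = 5 including the origin, and 3 bits suffice to index the truncation.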
The past audio buffer stores at least the number of most recent samples of the reconstructed audio signal 135 needed to cover the analysis window applied by the LPC encoder 122. In case the LTP encoder 124 is available in the audio encoder, the past audio buffer may store at least the dmax most recent samples of the reconstructed audio signal 135 to enable evaluation of LTP lag values up to dmax.
Figure 6 illustrates a block diagram of some components and/or entities of an audio decoder 131 that may be provided as part of the audio decoding entity 130 according to an example. The audio decoder 131 carries out decoding of the encoded audio signal 125 into the reconstructed audio signal 135, thereby serving to implement a transform from the encoded domain (back) to the signal domain and, in a way, reversing the encoding operation carried out in the audio encoder 121. In the audio decoder 131, a residual decoder 136 carries out a residual decoding procedure to process the encoded audio signal 125 into a reconstructed second residual signal 137, which is provided as input to an LTP decoder 134. The LTP decoder 134 carries out an LTP decoding procedure to generate a reconstructed first residual signal 133 for provision as input to an LPC decoder 132, which in turn carries out LPC synthesis on basis of the reconstructed first residual signal 133 to output the reconstructed audio signal 135. The audio decoder 131 processes the encoded audio signal 125 frame by frame.
The residual decoding procedure in the residual decoder 136 involves computing the reconstructed second residual signal 137 on basis of the encoded audio signal 125. A frame of the reconstructed second residual signal 137 is provided as a respective time series of reconstructed second residual samples. In order to enable meaningful reconstruction of the residual signal, the residual decoder 136 must employ the same or otherwise matching residual coding technique as employed in the residual encoder 126. The residual decoding procedure involves dequantizing residual encoding parameters received as part of the encoded audio signal 125 and using the dequantized parameters to create the frame of the reconstructed second residual signal 137, i.e. the time series of reconstructed second residual samples.
In an example, the encoded audio signal 125 includes the residual encoding parameters described in the foregoing, i.e. the codewords Idxg and Idxv and the value of k, where the codeword Idxg identifies the quantized gain, the codeword Idxv identifies a vector of the lattice codebook that represents the current frame, and k indicates the number of zero-valued elements at the end of the initial quantized vector v1(j) as detected in the audio encoder 121. In addition to these received residual encoding parameters, the residual decoder 136 further has a priori knowledge of the number of bits B available for quantization of a frame of the second residual signal 127 and the length L, as well as access to the predefined mapping function that returns a suitable value of the norm (e.g. the predefined maximum norm K or the modified maximum norm K') in dependence on the given values of the number of bits B and the vector dimension L-k. In case the received value k equals zero, the residual decoder 136 may directly dequantize the received codeword Idxv into a reconstructed vector vr(j), j = 1:L by using the pyramidally truncated Z48 lattice (de)quantizer (possibly in view of the predefined maximum norm K), the dequantization thereby resulting in the reconstructed vector vr(j) = v1(j), j = 1:L. In case the received value k is larger than zero, the residual decoder 136 defines the value of L-k by using the received value of k and may employ the predefined mapping function to derive the modified maximum norm K' employed in the residual encoder 126 in generation of the received codeword Idxv. This can be carried out by using a predefined mapping table as basis for the mapping, for example by using the procedure described in the foregoing in context of the residual encoding procedure.
Consequently, the residual decoder 136 may dequantize the received codeword Idxv into a reconstructed vector vr(j), j = 1:L-k by using the pyramidally truncated Z48 lattice (de)quantizer (in view of the modified maximum norm K'), the dequantization thereby resulting in the reconstructed vector vr(j) = v2(j), j = 1:L-k.
Once the reconstructed vector vr(j), j = 1:L-k has been derived, the residual decoder 136 proceeds to generating a frame of a reconstructed transform-domain residual signal ĉ(j), j = 1:L, which may be found by multiplying each element of the reconstructed source vector vr(j), j = 1:L-k by the (de)quantized gain value, e.g. as ĉ(j) = gr·vr(j), j = 1:L-k. In case the received value k is larger than zero, the reconstructed source vector vr(j), j = 1:L-k is shorter than L, which needs to be compensated before or during an inverse transform to be applied to the reconstructed transform-domain residual signal ĉ(j), j = 1:L-k in the residual decoder 136. In an example, such a compensation involves appending k zeros at the end of the vector ĉ(j), thereby resulting in the reconstructed transform-domain residual signal ĉ(j), j = 1:L. In another example, the k zeros are appended at the end of the vector vr(j) before the multiplication by the gain. In a further example, the inverse transform is carried out such that only the first L-k transform-domain samples are considered in the procedure (e.g. by considering only the first L-k columns when applying a matrix-based inverse transform).
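The gain scaling and zero-padding compensation described above may be sketched as:

```python
import numpy as np

def reconstruct_transform_residual(v_hat, gr, L):
    """Scale the dequantized shape by the gain and pad back to full length L.

    When k = L - len(v_hat) > 0, k zeros are appended so that the inverse
    transform can be applied over the full frame length L.
    """
    c_hat = gr * np.asarray(v_hat, dtype=float)  # apply (de)quantized gain
    k = L - len(c_hat)                           # number of missing trailing samples
    return np.concatenate([c_hat, np.zeros(k)])  # append k zeros at the end
```

This corresponds to the first compensation variant (padding after the gain multiplication); padding before the multiplication yields the same result since the appended samples are zero.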
The residual decoder 136 further applies an inverse transform to convert the reconstructed transform-domain residual signal ĉ(j), j = 1:L into a corresponding time-domain signal, which serves as a frame of the reconstructed second residual signal 137, denoted herein as r2(t), t = t+1:t+L. In case the weighting of the transformed residual signal c(j) has been applied in the encoder (as described in the foregoing), the corresponding inverse weighting needs to be applied to the reconstructed transform-domain residual signal ĉ(j), j = 1:L before applying the inverse transform in order to compensate the effect of the weighting. The applied inverse transform is an inverse of the transform applied in the residual encoder 126, e.g. inverse DCT, inverse MDCT, inverse DST, etc.
As described in the foregoing, the reconstructed second residual signal 137 is provided for the LTP decoding procedure in the LTP decoder 134, which results in a reconstructed first residual signal 133. A frame of the reconstructed first residual signal 133 is provided as a respective time series of reconstructed first residual samples. In other words, the LTP decoding procedure processes the frame of the reconstructed second residual signal r2(t), t = t+1:t+L into a corresponding frame of the reconstructed first residual signal r1(t), t = t+1:t+L. In this regard, the LTP decoder 134 carries out LTP analysis to find the LTP lag d and the LTP gain g, for example by using the procedure described in the foregoing in context of the LTP encoder 124. Moreover, the LTP decoding procedure involves LTP synthesis filtering to compute the first residual signal 133 on basis of the second residual signal 137 using the derived values of the LTP lag d and the LTP gain g. In this regard, the following equation may be employed:

r1(t) = r2(t) + g·r1(t-d), t = t+1 : t+L,

where L denotes the frame length (in number of samples), r1(t), t = t+1:t+L denotes the frame of the reconstructed first residual signal 133 and r2(t), t = t+1:t+L denotes the frame of the reconstructed second residual signal 137.
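The LTP synthesis recursion may be sketched as follows; the gain term g reflects the LTP gain referred to above (its exact placement in the filter equation is an assumption, as the source formula is garbled), and `history` is assumed to hold at least d previously reconstructed first-residual samples, most recent last:

```python
def ltp_synthesis(r2, history, d, g):
    """r1(t) = r2(t) + g * r1(t - d): long-term prediction synthesis.

    history holds at least d past reconstructed first-residual samples
    (most recent last); r2 is the current frame of the second residual.
    """
    buf = list(history)
    out = []
    for x in r2:
        y = x + g * buf[-d]   # add the scaled sample d positions in the past
        out.append(y)
        buf.append(y)         # the recursion feeds on its own output
    return out
```

Note that the recursion is on r1 itself, so samples synthesized earlier in the frame can serve as the lagged reference when d < L.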
Although described as part of the exemplifying audio decoder 131 depicted in Figure 6, in other examples the audio decoder 131 may be provided without the LTP decoder 134. In such a scenario the residual decoder 136 may provide its output as the reconstructed first residual signal 133 instead of the reconstructed second residual signal 137. Alternatively, such a scenario may, at least conceptually, involve copying the reconstructed second residual signal 137 into the reconstructed first residual signal 133 for use as basis for the LPC decoding procedure in the LPC decoder 132.
In a further example, the LTP decoder 134 is available in the audio decoder 131 for carrying out the LTP decoding procedure therein in accordance with an indication in this regard received in the encoded parameters: if the encoded parameters include an indication that the LTP encoding was applied in the audio encoder 121 in encoding the respective frame, the LTP decoder 134 is employed to process the frame of the reconstructed second residual signal r2(t), t = t+1:t+L into a corresponding frame of the reconstructed first residual signal r1(t), t = t+1:t+L. In contrast, in case the encoded parameters include an indication that the LTP encoding was not applied in the audio encoder 121 in encoding the respective frame, the LTP decoder 134 operation is omitted for the respective frame and the frame of the reconstructed second residual signal r2(t), t = t+1:t+L is provided instead as the reconstructed first residual signal r1(t), t = t+1:t+L for processing by the LPC decoder 132.
As described in the foregoing, the reconstructed first residual signal 133 is provided for the LPC decoding procedure in the LPC decoder 132, which results in the reconstructed audio signal 135. A frame of the reconstructed audio signal 135 is provided as a respective time series of reconstructed output samples. In other words, the LPC decoding procedure processes the frame of the reconstructed first residual signal r1(t), t = t+1:t+L into a corresponding frame of the reconstructed audio signal x(t), t = t+1:t+L.
The LPC decoding procedure comprises the LPC decoder 132 carrying out the LPC analysis based on past values of the reconstructed audio signal 135 using the same backward prediction technique as applied in the LPC encoder 122. Hence, the backward prediction computes LPC filter coefficients on basis of past samples of the reconstructed audio signal 135. The LPC decoder 132 further carries out LPC synthesis filtering of the reconstructed first residual signal 133 by using the LPC filter coefficients derived for the current frame in the LPC decoder 132, thereby generating the reconstructed audio signal 135.
The LPC synthesis filtering in the LPC decoder 132 involves processing a time series of reconstructed first residual samples into a corresponding time series of reconstructed output samples that hence constitute a corresponding frame of the reconstructed audio signal 135. The LPC decoder 132 may find the LPC filter coefficients for the LPC synthesis therein, for example, by using the procedure outlined in the foregoing for the LPC encoder 122. The LPC synthesis may be carried out e.g. by using the following equation:

x(t) = r1(t) - Σi=1..KLPC ai·x(t-i), t = t+1 : t+L,

where ai, i = 1:KLPC denote the LPC filter coefficients, L denotes the frame length (in number of samples), x(t), t = t+1:t+L denotes a frame of the reconstructed audio signal 135 (i.e. the time series of reconstructed output samples), and r1(t), t = t+1:t+L denotes a corresponding frame of the reconstructed first residual signal 133 (i.e. the time series of reconstructed first residual samples).
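The LPC synthesis filtering may be sketched as follows, assuming the common convention in which the analysis filter is A(z) = 1 + a1·z^-1 + ... + aK·z^-K; this sign convention is an assumption, as the source equation is only partially recoverable:

```python
def lpc_synthesis(r1, a, history):
    """x(t) = r1(t) - sum_i a[i] * x(t - i - 1): LPC synthesis filtering.

    a holds the K_LPC filter coefficients; history holds at least K_LPC
    past reconstructed output samples (most recent last).
    """
    buf = list(history)
    out = []
    for r in r1:
        # subtract the short-term prediction formed from past output samples
        x = r - sum(a[i] * buf[-(i + 1)] for i in range(len(a)))
        out.append(x)
        buf.append(x)         # the filter is recursive in its own output
    return out
```

As with the LTP recursion, each output sample immediately becomes available as history for the next one, which is why the past audio buffer must hold at least K_LPC samples at frame boundaries.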
Since the LPC analyses in the LPC encoder 122 and the LPC decoder 132 are carried out using the same approach and they are further performed on the same or similar audio signals, the resulting LPC filter coefficients are also the same or similar. The past values of the reconstructed audio signal 135 required for the LPC analysis in the LPC decoder 132 are stored in a past audio buffer, which may be provided e.g. in a memory in the audio decoder 131 or in the LPC decoder 132.
After having derived the reconstructed audio signal 135, the LPC decoder 132 further adds the zero input response of the LPC synthesis filter to the reconstructed audio signal 135 before passing it from the audio decoder 131 for audio playback, storage and/or further processing, and before using this signal to update the past audio buffer of the audio decoder 131 (as will be described later in this text). The zero input response may be calculated on basis of the reconstructed audio signal 135, for example, as described in the foregoing for computation of the zero input response in the audio encoder 121.
Along the lines described in the foregoing for the audio encoder 121, the audio decoder 131 also stores at least the number of most recent samples of the reconstructed audio signal 135 needed to enable the backward prediction in the LPC decoder 132. In case the LTP decoder 134 is available in the audio decoder 131, at least the dmax most recent samples of the reconstructed audio signal 135 may be stored to enable evaluation of LTP lag values up to dmax. This may be implemented by storing a sufficient number of most recent samples in the past audio buffer of the audio decoder 131. After having carried out the decoding procedure, the audio decoder 131 updates the past audio buffer therein by discarding the L oldest samples in the past audio buffer and inserting the samples of the reconstructed audio signal 135 in the past audio buffer to facilitate the audio decoding of the next frame.

Figure 7 illustrates a block diagram of some components of an exemplifying apparatus 600. The apparatus 600 may comprise further components, elements or portions that are not depicted in Figure 7. The apparatus 600 may be employed in implementing e.g. the audio encoder 121 and/or the audio decoder 131.
The apparatus 600 comprises a processor 616 and a memory 615 for storing data and computer program code 617. The memory 615 and a portion of the computer program code 617 stored therein may be further arranged to, with the processor 616, implement the function(s) described in the foregoing in context of the audio encoder 121 and/or the audio decoder 131.
The apparatus 600 comprises a communication portion 612 for communication with other devices. The communication portion 612 comprises at least one communication apparatus that enables wired or wireless communication with other apparatuses. A communication apparatus of the communication portion 612 may also be referred to as a respective communication means. The apparatus 600 may further comprise user I/O (input/output) components 618 that may be arranged, possibly together with the processor 616 and a portion of the computer program code 617, to provide a user interface for receiving input from a user of the apparatus 600 and/or providing output to the user of the apparatus 600 to control at least some aspects of operation of the audio encoder 121 and/or the audio decoder 131 implemented by the apparatus 600. The user I/O components 618 may comprise hardware components such as a display, a touchscreen, a touchpad, a mouse, a keyboard, and/or an arrangement of one or more keys or buttons, etc. The user I/O components 618 may be also referred to as peripherals. The processor 616 may be arranged to control operation of the apparatus 600 e.g. in accordance with a portion of the computer program code 617 and possibly further in accordance with the user input received via the user I/O components 618 and/or in accordance with information received via the communication portion 612.
Although the processor 616 is depicted as a single component, it may be implemented as one or more separate processing components. Similarly, although the memory 615 is depicted as a single component, it may be implemented as one or more separate components, some or all of which may be integrated/removable and/or may provide permanent / semi-permanent/ dynamic/cached storage.
The computer program code 617 stored in the memory 615, may comprise computer-executable instructions that control one or more aspects of operation of the apparatus 600 when loaded into the processor 616. As an example, the computer-executable instructions may be provided as one or more sequences of one or more instructions. The processor 616 is able to load and execute the computer program code 617 by reading the one or more sequences of one or more instructions included therein from the memory 615. The one or more sequences of one or more instructions may be configured to, when executed by the processor 616, cause the apparatus 600 to carry out operations, procedures and/or functions described in the foregoing in context of the audio encoder 121 and/or the audio decoder 131 . Hence, the apparatus 600 may comprise at least one processor 616 and at least one memory 615 including the computer program code 617 for one or more programs, the at least one memory 615 and the computer program code 617 configured to, with the at least one processor 616, cause the apparatus 600 to perform operations, procedures and/or functions described in the foregoing in context of the audio encoder 121 and/or the audio decoder 131 .
The computer programs stored in the memory 615 may be provided e.g. as a respective computer program product comprising at least one computer-readable non-transitory medium having the computer program code 617 stored thereon, which computer program code, when executed by the apparatus 600, causes the apparatus 600 at least to perform operations, procedures and/or functions described in the foregoing in context of the audio encoder 121 and/or the audio decoder 131. The computer-readable non-transitory medium may comprise a memory device or a record medium such as a CD-ROM, a DVD, a Blu-ray disc or another article of manufacture that tangibly embodies the computer program. As another example, the computer program may be provided as a signal configured to reliably transfer the computer program.
Reference(s) to a processor should not be understood to encompass only programmable processors, but also dedicated circuits such as field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), signal processors, etc. Features described in the preceding description may be used in combinations other than the combinations explicitly described.
Although functions have been described with reference to certain features, those functions may be performable by other features whether described or not. Although features have been described with reference to certain embodiments, those features may also be present in other embodiments whether described or not.

Claims
1. A method for encoding a source vector of a predefined number of source samples that represent a frame of an input audio signal, the method comprising: quantizing the source samples of the source vector into respective quantized samples of an initial quantized vector using at most a predefined number of bits by employing a lattice quantizer restricted to a predefined maximum norm; detecting a sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector; determining, in response to detecting a sequence of non-zero length, a modified maximum norm that is greater than or equal to the predefined maximum norm and determining a shortened source vector by excluding those source samples that are represented by said zero-valued quantized samples of said sequence; and quantizing the source samples of the shortened source vector into respective re-quantized samples of a re-quantized vector using at most the predefined number of bits by employing said lattice quantizer restricted to the modified maximum norm.
2. A method according to claim 1, wherein said lattice quantizer comprises a pyramidally truncated lattice quantizer.
3. A method according to claim 1 or 2, further comprising using a predefined transform to convert a time series of input samples that represent said frame of the input audio signal in time domain into a series of transform domain samples that represent said frame of the input audio signal in a transform domain, which series of transform domain samples serves as basis for said source vector.
4. A method according to claim 3, wherein said predefined transform comprises discrete cosine transform.
5. A method according to claim 3 or 4, further comprising modeling said series of transform domain samples as a shape vector of relative amplitude values and a scalar gain value such that the shape vector multiplied by the scalar gain value matches or substantially matches said series of transform domain samples; and using relative amplitude values of said shape vector as said source vector.
6. A method according to any of claims 3 to 5, further comprising processing a time series of input samples of said frame of the input audio signal using linear predictive filter coefficients computed using a backward prediction into a residual signal that comprises a respective time series of residual samples; and applying said predefined transform to said time series of residual samples.
7. A method according to any of claims 3 to 5, further comprising processing a time series of input samples of said frame of the input audio signal using linear predictive filter coefficients computed using a backward prediction into a residual signal that comprises a respective first time series of residual samples; applying long-term prediction to said first time series of residual samples to derive a respective second time series of residual samples; and applying said predefined transform to said second time series of residual samples.
8. A method according to any of claims 1 to 7, further comprising outputting, in response to a zero-length sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector, the initial quantized vector; and outputting, in response to a non-zero-length sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector, the re-quantized vector.
9. A method according to any of claims 1 to 8, further comprising outputting an indication of the length of said sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector; outputting, in response to a zero-length sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector, a codeword that identifies the initial quantized vector; and outputting, in response to a non-zero-length sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector, a codeword that identifies the re-quantized vector.
10. An apparatus for encoding a source vector of a predefined number of source samples that represent a frame of an input audio signal, the apparatus configured to: quantize the source samples of the source vector into respective quantized samples of an initial quantized vector using at most a predefined number of bits by employing a lattice quantizer restricted to a predefined maximum norm; detect a sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector; determine, in response to detecting a sequence of non-zero length, a modified maximum norm that is greater than or equal to the predefined maximum norm and determine a shortened source vector by excluding those source samples that are represented by said zero-valued quantized samples of said sequence; and quantize the source samples of the shortened source vector into respective re-quantized samples of a re-quantized vector using at most the predefined number of bits by employing said lattice quantizer restricted to the modified maximum norm.
11. An apparatus according to claim 10, wherein said lattice quantizer comprises a pyramidally truncated lattice quantizer.
12. An apparatus according to claim 10 or 11, wherein the apparatus is further configured to use a predefined transform to convert a time series of input samples that represent said frame of the input audio signal in time domain into a series of transform domain samples that represent said frame of the input audio signal in a transform domain, which series of transform domain samples serves as basis for said source vector.
13. An apparatus according to claim 12, wherein said predefined transform comprises discrete cosine transform.
14. An apparatus according to claim 12 or 13, wherein the apparatus is further configured to model said series of transform domain samples as a shape vector of relative amplitude values and a scalar gain value such that the shape vector multiplied by the scalar gain value matches or substantially matches said series of transform domain samples; and use relative amplitude values of said shape vector as said source vector.
15. An apparatus according to any of claims 12 to 14, wherein the apparatus is further configured to process a time series of input samples of said frame of the input audio signal using linear predictive filter coefficients computed using a backward prediction into a residual signal that comprises a respective time series of residual samples; and apply said predefined transform to said time series of residual samples.
16. An apparatus according to any of claims 12 to 14, wherein the apparatus is further configured to process a time series of input samples of said frame of the input audio signal using linear predictive filter coefficients computed using a backward prediction into a residual signal that comprises a respective first time series of residual samples; apply long-term prediction to said first time series of residual samples to derive a respective second time series of residual samples; and apply said predefined transform to said second time series of residual samples.
17. An apparatus according to any of claims 10 to 16, wherein the apparatus is further configured to output, in response to a zero-length sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector, the initial quantized vector; and output, in response to a non-zero-length sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector, the re-quantized vector.
18. An apparatus according to any of claims 10 to 17, wherein the apparatus is further configured to output an indication of the length of said sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector; output, in response to a zero-length sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector, a codeword that identifies the initial quantized vector; and output, in response to a non-zero-length sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector, a codeword that identifies the re-quantized vector.
19. An apparatus for encoding a source vector of a predefined number of source samples that represent a frame of an input audio signal, the apparatus comprising means for quantizing the source samples of the source vector into respective quantized samples of an initial quantized vector using at most a predefined number of bits by employing a lattice quantizer restricted to a predefined maximum norm; means for detecting a sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector; means for determining, in response to detecting a sequence of non-zero length, a modified maximum norm that is greater than or equal to the predefined maximum norm and determining a shortened source vector by excluding those source samples that are represented by said zero-valued quantized samples of said sequence; and means for quantizing the source samples of the shortened source vector into respective re-quantized samples of a re-quantized vector using at most the predefined number of bits by employing said lattice quantizer restricted to the modified maximum norm.
20. An apparatus for encoding a source vector of a predefined number of source samples that represent a frame of an input audio signal, wherein the apparatus comprises at least one processor; and at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to: quantize the source samples of the source vector into respective quantized samples of an initial quantized vector using at most a predefined number of bits by employing a lattice quantizer restricted to a predefined maximum norm; detect a sequence of consecutive zero-valued quantized samples at the end of the initial quantized vector; determine, in response to detecting a sequence of non-zero length, a modified maximum norm that is greater than or equal to the predefined maximum norm and determine a shortened source vector by excluding those source samples that are represented by said zero-valued quantized samples of said sequence; and quantize the source samples of the shortened source vector into respective re-quantized samples of a re-quantized vector using at most the predefined number of bits by employing said lattice quantizer restricted to the modified maximum norm.
21. A computer program comprising computer readable program code configured to cause performing of the method of any of claims 1 to 9 when said program code is run on a computing apparatus.
22. A computer program product comprising computer readable program code tangibly embodied on a non-transitory computer readable medium, the program code configured to cause performing of the method according to any of claims 1 to 9 when run on a computing apparatus.
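As an illustration of the gain-shape modeling recited in claims 5 and 14, the sketch below splits a transform-domain vector into a scalar gain and a shape vector of relative amplitudes whose product reconstructs the input. This is a hypothetical sketch, not the claimed implementation: the function name and the choice of the Euclidean norm as the gain are assumptions made for the example.

```python
import numpy as np

def gain_shape_split(t):
    """Split a transform-domain vector t into a scalar gain value and a
    shape vector of relative amplitudes, so that gain * shape matches t."""
    gain = float(np.linalg.norm(t))   # scalar gain value
    if gain == 0.0:
        return 0.0, np.zeros_like(t)  # degenerate all-zero frame
    return gain, t / gain             # unit-energy shape vector

t = np.array([3.0, 4.0, 0.0])
g, s = gain_shape_split(t)
# g is 5.0 here, and g * s reconstructs t exactly
```

The shape vector would then serve as the source vector that the lattice quantizer of claims 10 and 19 operates on, with the gain coded separately.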
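The backward prediction of claims 6, 7, 15 and 16 estimates the linear predictive filter coefficients from already-coded past samples rather than from the current frame, which keeps delay low and avoids transmitting the coefficients. The sketch below is an illustrative assumption, not the patented implementation: it uses a plain autocorrelation estimate and the standard Levinson-Durbin recursion, with all function names invented for the example.

```python
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion: return the prediction polynomial
    a = [1, a_1, ..., a_p] for the autocorrelation sequence r."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / e                     # reflection coefficient
        a[1:i + 1] += k * a[i - 1::-1]   # RHS is copied first, so safe
        e *= 1.0 - k * k                 # prediction error update
    return a

def backward_lpc_residual(past, frame, order=8):
    """Backward prediction: coefficients come from already-coded past
    samples only, so a decoder can repeat the same computation."""
    past = np.asarray(past, dtype=float)
    frame = np.asarray(frame, dtype=float)
    # autocorrelation of the past (already available) samples
    r = np.array([np.dot(past[:len(past) - l], past[l:])
                  for l in range(order + 1)])
    if r[0] <= 0.0:
        return frame.copy()              # silent history: nothing to predict
    a = levinson_durbin(r, order)
    # residual res[n] = frame[n] + sum_j a_j * x[n-j], with the tail of
    # `past` serving as filter memory across the frame boundary
    ext = np.concatenate([past[-order:], frame])
    return np.array([np.dot(a, ext[n:n + order + 1][::-1])
                     for n in range(len(frame))])
```

In the claimed pipeline the predefined transform (claim 4's discrete cosine transform) would then be applied to this residual, optionally after the long-term prediction stage of claims 7 and 16.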
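The encoding loop common to the method and apparatus claims can be summarized as: quantize with a lattice quantizer restricted to a maximum (L1) norm, detect trailing zero-valued quantized samples, and, when such a run exists, re-quantize the shortened source vector with a modified, larger maximum norm that still fits the same bit budget. The sketch below is a hypothetical illustration under stated assumptions: the greedy pulse-allocation search, the bit-budget rule, and the N(n, k) counting recurrence (from Fischer's pyramid-VQ paper cited above) are simplified stand-ins, and all names are invented.

```python
from functools import lru_cache
from math import ceil, log2
import numpy as np

@lru_cache(maxsize=None)
def pyramid_count(n, k):
    """N(n, k): number of integer vectors of dimension n with L1 norm
    exactly k (points on the pyramid surface)."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return (pyramid_count(n - 1, k) + pyramid_count(n - 1, k - 1)
            + pyramid_count(n, k - 1))

def bits_needed(n, k_max):
    """Bits to index every vector of dimension n with L1 norm <= k_max
    (a pyramidally truncated lattice, as in claim 11)."""
    size = sum(pyramid_count(n, j) for j in range(k_max + 1))
    return ceil(log2(size))

def pvq_quantize(x, k):
    """Greedy projection of x onto the surface sum(|y_i|) == k.
    A simple heuristic pulse allocation, not a bit-exact search."""
    x = np.asarray(x, dtype=float)
    denom = np.sum(np.abs(x))
    if denom == 0.0:
        y = np.zeros(len(x), dtype=int)
        y[0] = k
        return y
    target = np.abs(x) * (k / denom)     # ideal real-valued allocation
    y = np.floor(target).astype(int)
    while y.sum() < k:                   # hand out remaining pulses
        y[np.argmax(target - y)] += 1
    return np.where(x < 0, -y, y)        # restore signs

def encode(source, k_max, bits):
    """Quantize; if the result ends in zeros, re-quantize the shortened
    vector with a modified (>=) maximum norm in the same bit budget."""
    q = pvq_quantize(source, k_max)
    nz = len(q)
    while nz > 0 and q[nz - 1] == 0:     # detect the trailing zero run
        nz -= 1
    trailing = len(q) - nz
    if trailing == 0:
        return q, trailing               # zero-length run: keep initial vector
    k_mod = k_max                        # modified maximum norm >= k_max
    while bits_needed(nz, k_mod + 1) <= bits:
        k_mod += 1
    return pvq_quantize(source[:nz], k_mod), trailing
```

For a 4-sample source dominated by its first component, initial quantization at maximum norm 4 yields [4, 0, 0, 0]; the three trailing zeros trigger re-quantization of the one-sample shortened vector at a much larger norm, so the unchanged bit budget buys finer amplitude resolution where the signal actually has energy.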
PCT/FI2016/050744 2016-10-21 2016-10-21 Low-delay audio coding WO2018073486A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/FI2016/050744 WO2018073486A1 (en) 2016-10-21 2016-10-21 Low-delay audio coding

Publications (1)

Publication Number Publication Date
WO2018073486A1 (en) 2018-04-26

Family

ID=57286527

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2016/050744 WO2018073486A1 (en) 2016-10-21 2016-10-21 Low-delay audio coding

Country Status (1)

Country Link
WO (1) WO2018073486A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5987407A (en) * 1997-10-28 1999-11-16 America Online, Inc. Soft-clipping postprocessor scaling decoded audio signal frame saturation regions to approximate original waveform shape and maintain continuity
US20080097757A1 (en) * 2006-10-24 2008-04-24 Nokia Corporation Audio coding

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LEFEBVRE R ET AL: "8 kbit/s coding of speech with 6 ms frame-length", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 27 April 1993 (1993-04-27), pages 612-615, vol. 2, XP031984230, DOI: 10.1109/ICASSP.1993.319384 *
M. BLAIN ET AL: "Optimum rate allocation in pyramid vector quantizer transform coding of imagery", ICASSP '87. IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, vol. 12, 6 April 1987 (1987-04-06), pages 729 - 732, XP055336157, DOI: 10.1109/ICASSP.1987.1169591 *
MORIYA T ET AL: "TRANSFORM CODING OF SPEECH USING A WEIGHTED VECTOR QUANTIZER", IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, IEEE SERVICE CENTER, PISCATAWAY, US, vol. 6, no. 2, 1 February 1988 (1988-02-01), pages 425 - 431, XP000616836, ISSN: 0733-8716, DOI: 10.1109/49.617 *
TAKEHIRO MORIYA ET AL: "Progress in LPC-based frequency-domain audio coding", APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, vol. 5, 31 May 2016 (2016-05-31), XP055336101, DOI: 10.1017/ATSIP.2016.11 *
THOMAS R. FISCHER: "A Pyramid Vector Quantizer", IEEE TRANSACTIONS ON INFORMATION THEORY, vol. 32, no. 4, July 1986 (1986-07-01), pages 568-583

Similar Documents

Publication Publication Date Title
JP7244609B2 (en) Method and system for encoding left and right channels of a stereo audio signal that selects between a two-subframe model and a four-subframe model depending on bit budget
JP6692948B2 (en) Method, encoder and decoder for linear predictive coding and decoding of speech signals with transitions between frames having different sampling rates
JP5587501B2 (en) System, method, apparatus, and computer-readable medium for multi-stage shape vector quantization
RU2439718C1 (en) Method and device for sound signal processing
US8392176B2 (en) Processing of excitation in audio coding and decoding
CN106415717B (en) Audio signal classification and coding
CN111968655B (en) Signal encoding method and device and signal decoding method and device
EP3762923A1 (en) Audio coding
JP2009512895A (en) Signal coding and decoding based on spectral dynamics
CN114097028A (en) Method and system for metadata in codec audio streams and for flexible intra-object and inter-object bit rate adaptation
JP5544370B2 (en) Encoding device, decoding device and methods thereof
TW201434033A (en) Systems and methods for determining pitch pulse period signal boundaries
EP2617034B1 (en) Determining pitch cycle energy and scaling an excitation signal
US11176954B2 (en) Encoding and decoding of multichannel or stereo audio signals
US10950251B2 (en) Coding of harmonic signals in transform-based audio codecs
WO2018073486A1 (en) Low-delay audio coding
JP7123911B2 (en) System and method for long-term prediction in audio codecs
EP3252763A1 (en) Low-delay audio coding
JP5774490B2 (en) Encoding device, decoding device and methods thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16794669

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16794669

Country of ref document: EP

Kind code of ref document: A1