EP2045800A1

EP2045800A1 - Method and apparatus for transcoding

Info

Publication number: EP2045800A1
Application number: EP07117956A
Authority: EP
Inventors: Christophe Beaugeant
Original assignee: Nokia Siemens Networks Oy
Current assignee: Nokia Solutions and Networks Oy
Priority date: 2007-10-05
Filing date: 2007-10-05
Publication date: 2009-04-08

Abstract

A method and apparatus for transcoding comprising means (10) for partially decoding a first bitstream of a first codec format by extracting at least linear predictive coding coefficients from the first bitstream; means (20) for mapping the extracted linear predictive coding coefficients into linear predictive coding coefficients of a second codec format; and means (20) for encoding the partially decoded first bitstream into a second bitstream of a second codec format using the mapped linear predictive coding coefficients, wherein the apparatus comprises means (30) for modifying the sampling frequency of the extracted linear predictive coding coefficients before the mapping of the extracted linear predictive coding coefficients.

Description

FIELD OF THE INVENTION

The present invention relates to transcoding.

BACKGROUND OF THE INVENTION

In recent decades, networks of different types have been developed, such as mobile GSM, UMTS, CDMA and IP, providing alternatives to the 'classical' circuit-switched network. The interconnection of all these networks leads to an interoperability problem regarding the transmission of speech. Indeed, incompatible speech standards have been adopted in the different networks, although most of the codecs at medium rate (5-16.5 kbit/s for narrowband codecs, 5-25 kbit/s for wideband codecs) are based on the same model, Code Excited Linear Prediction (CELP). The simplest method to provide inter-connectivity consists of decoding the compressed bitstream A of one codec standard and re-encoding it into bitstream B of the other codec standard. This conventional method is called tandem transcoding. It suffers from several problems such as complexity, delay and degradation of speech quality.
Recently, so-called 'smart transcoding' solutions have been proposed, which exploit the fact that the different standards are based on the CELP principle. They aim at reducing the complexity of the transcoding, as many functions at encoder B can be skipped, at decreasing the delay, and at enhancing the quality or at least achieving the same quality as with tandem transcoding. The basic idea is to use the redundancy between the standards to avoid computing parameters that have already been computed. Reference is made to Figure 1, which shows the principle of smart transcoding. When transcoding from a bitstream format of codec A into a bitstream format of codec B, bitstream A is first decoded in decoder A. The obtained decoded signal is then encoded into target format B (bitstream B) by encoder B. In case both codecs are CELP codecs, bitstreams A and B transmit a similar set of parameters, such as Linear Prediction Coding (LPC) coefficients, pitch delays, fixed codebook indexes and fixed and adaptive gains. The key idea of smart transcoding consists of avoiding the computation of parameters already available. An intelligent mapping and quantization of the parameters available in bitstream A into bitstream B parameters allows many functions to be skipped and hence reduces the computation load of the transcoding. As depicted in Figure 1, only a partial decoding is necessary to extract the parameters from bitstream A. Their mapping, together with a partial encoding, then builds the accurate bitstream B.
Article C. Beaugeant, H. Taddei, "Quality and Computation Load Reduction achieved by applying Smart Transcoding between CELP Speech Codecs", Eusipco, Poland, September 2007, gives several examples of possible mapping of parameters in the case of G.729.A and AMR codecs.
One of the possible parameters mapped between speech codecs in transcoding is the vector of Linear Prediction Coding (LPC) coefficients. The mapping of the LPC coefficients is relatively straightforward when the speech codecs are applied to the signal at the same sampling frequency. A transposition of the LPC coefficients from decoder A to encoder B leads to good quality and a reduction of complexity, as shown in the above-mentioned article. However, such a solution cannot be applied when codecs A and B employ different sampling frequencies. In that case the LPC filters of codecs A and B model signals of different sampling frequencies, which leads to a different number of coefficients and to different meanings of the LPC coefficients. Existing solutions that provide mapping of LPC parameters for smart transcoding purposes are based only on the mapping of the LPC filter at the same sampling frequency (e.g. narrowband signal at 8 kHz).

BRIEF DESCRIPTION OF THE INVENTION

An object of the present invention is thus to provide a method and an apparatus for implementing the method so as to solve the above problem or at least to alleviate it. The objects of the invention are achieved by a method, a computer program product, an apparatus and a module which are characterized by what is stated in the independent claims. The preferred embodiments of the invention are disclosed in the dependent claims.
The invention is based on recognizing the problem and on the realization that in transcoding between two codec formats employing different sampling frequencies, the LPC coefficients of the LPC filter of the target codec format can be estimated by applying a modification on the sampling frequency of the extracted LPC coefficients.
An advantage of the method and apparatus of the invention is that it enables smart transcoding between two codec formats employing different sampling frequencies.

BRIEF DESCRIPTION OF THE DRAWINGS

In the following the invention will be described in greater detail by means of preferred embodiments with reference to the accompanying drawings, in which

Figure 1 is a block diagram showing the principle of smart transcoding; and
Figure 2 is a block diagram of an embodiment.

DETAILED DESCRIPTION OF THE INVENTION

The following embodiments are exemplary. Although the specification may refer to "an", "one", or "some" embodiment(s) in several locations, this does not necessarily mean that each such reference is to the same embodiment(s), or that the feature only applies to a single embodiment. Single features of different embodiments may also be combined to provide other embodiments. The present invention is applicable to any communication system or any combination of different communication systems such as GSM (Global System for Mobile Communications), WCDMA (Wideband Code Division Multiple Access), WLAN (Wireless Local Area Network), UMTS (Universal Mobile Telecommunications System), CDMA and/or IP (Internet Protocol) standard, or any other suitable standard/non-standard communication means. The communication system may be a fixed communication system or a wireless communication system or a communication system utilizing both fixed networks and wireless networks. The protocols used and the specifications of communication systems, especially in wireless communication, develop rapidly. Such a development may require extra changes to an embodiment. Therefore, all terms and expressions should be interpreted broadly and they are intended to illustrate, not to restrict, the embodiment. In the following, different embodiments will be described using, as an example, a system architecture to which the embodiments may be applied, without, however, restricting the embodiments to such an architecture.
Notation and environment: in the following, two codecs A and B based on an LPC analysis at sampling frequencies F_s (A) and F_s (B), respectively, are considered. The CELP codec family is a subset of such codecs. A transcoding scheme where a signal s_A (t) at sampling frequency F_s (A) is encoded by encoder A is considered. In a 'classical' transcoding scheme (i.e. without smart transcoding), the signal is decoded into a PCM signal, which is resampled to the sampling frequency F_s (B) to give a signal s_B (t), and signal s_B (t) is then encoded by encoder B.
The LPC analysis within encoder A provides an autoregressive (AR) model of signal s_A (t), so that an approximation of the signal s_A (t) is given by: ${\hat{s}}_{A} (t) = \sum_{i = 1}^{M} a_{i} s_{A} (t - i) .$
In this example the LPC filter $A (z) = \sum_{i = 0}^{M} a_{i} z^{- i},$ with $a_{0} = 1$, is considered.
Similarly, encoder B provides an AR estimate of s_B (t) through its LPC analysis: ${\hat{s}}_{B} (t) = \sum_{i = 1}^{N} b_{i} s_{B} (t - i) .$
In that case the LPC filter is: $B (z) = \sum_{i = 0}^{N} b_{i} z^{- i},$ with $b_{0} = 1$.
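As a minimal sketch of the AR model above, the following Python function computes the prediction $\hat{s}(t) = \sum_{i=1}^{M} a_i s(t - i)$ for a sequence of samples. The function name `ar_predict` and the treatment of samples before t = 0 as zero are assumptions made for illustration only; they are not taken from any codec specification.

```python
def ar_predict(signal, coeffs):
    """Return the AR prediction s_hat(t) = sum_{i=1..M} a_i * s(t - i).

    `coeffs` holds [a_1, ..., a_M]. Samples before t = 0 are assumed to be
    zero (an assumption of this sketch).
    """
    order = len(coeffs)
    predicted = []
    for t in range(len(signal)):
        acc = 0.0
        for i in range(1, order + 1):
            if t - i >= 0:
                acc += coeffs[i - 1] * signal[t - i]
        predicted.append(acc)
    return predicted


# A first-order model with a_1 = 0.5 predicts half of the previous sample.
print(ar_predict([1.0, 2.0, 4.0], [0.5]))  # [0.0, 0.5, 1.0]
```

The same function applies to encoder B by passing [b_1, ..., b_N] and the signal s_B (t).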
Taking into account these notations, the invention generally deals with the process of finding an estimate of filter B(z) when LPC filter A(z) is known. Let B̂(z) denote the filter obtained from A(z). The coefficients of the constructed filter B̂(z) can be mapped into encoder B in a similar way as in smart transcoding based on LPC mapping, thus avoiding the computation of the LPC coefficients within encoder B and accordingly saving computation load. Filters A(z) and B(z) are AR models of two signals with different sampling frequencies. To obtain the coefficients of filter B̂(z), coefficients a_i need to be extrapolated if N > M or interpolated if N < M. The interpolation/extrapolation can be seen as a modification of the sampling frequency of signal [a_i ], i=0...M (or alternatively [a_i ], i=1...M) from sampling frequency F_s (A) to sampling frequency F_s (B). Accordingly, according to an embodiment, finding the coefficients of B̂(z) that approximate filter B(z) can be done through the following steps:

1. Extracting LPC coefficients [a_i ], i=1...M from bitstream A in decoder A
2. Applying a modification of the sampling frequency on LPC coefficients [a_i ], i=1...M, thus obtaining coefficients b̂_i
3. Mapping coefficients b̂_i in encoder B for quantization and for computation of the rest of the coefficients of encoder B (in CELP codecs e.g. pitch, gains, fixed codebook).

In step 2 above it is alternatively possible to apply the modification of the sampling frequency to LPC coefficients [a_i ], i=0...M. In that case the target LPC filter is preferably forced to have b₀=1. According to an embodiment, modification of the sampling frequency comprises up-sampling the extracted linear predictive coding coefficients when the sampling frequency of the target codec (B) format is higher than the sampling frequency of the source codec (A) format. According to an embodiment, the up-sampling factor is equal to the ratio of the sampling frequency of the target codec format to the sampling frequency of the source codec format. According to an embodiment, modification of the sampling frequency comprises down-sampling the extracted linear predictive coding coefficients when the sampling frequency of the target codec format is lower than the sampling frequency of the source codec format. According to an embodiment, the down-sampling factor is equal to the ratio of the sampling frequency of the target codec format to the sampling frequency of the source codec format. Accordingly, when applying a modification of the sampling frequency to LPC coefficients [a_i ], i=1...M from sampling frequency F_s (A) to sampling frequency F_s (B), M* F_s (B)/F_s (A) coefficients b̂_i are obtained. According to an embodiment, the number of coefficients b̂_i can be further adjusted to the number of coefficients N of target LPC filter B(z) if necessary. For instance, if M* F_s (B)/F_s (A) > N, the number of coefficients b̂_i can be restricted to N, and if M* F_s (B)/F_s (A) < N, N - M* F_s (B)/F_s (A) zeros can be added to the vector [b̂_i ].
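The modification of the sampling frequency and the subsequent adjustment of the number of coefficients can be sketched as follows. This is only an illustration under stated assumptions: the text leaves the resampling method open, so linear interpolation is used here as a stand-in, and the function names `resample_lpc` and `fit_to_order` are hypothetical.

```python
def resample_lpc(a, fs_a_hz, fs_b_hz):
    """Resample coefficients [a_1..a_M] from rate fs_a_hz to fs_b_hz.

    Yields round(M * fs_b_hz / fs_a_hz) values. Linear interpolation is an
    assumption of this sketch, not a method prescribed by the text.
    """
    m = len(a)
    n_out = round(m * fs_b_hz / fs_a_hz)
    out = []
    for k in range(n_out):
        pos = k * fs_a_hz / fs_b_hz      # position on the source index axis
        lo = min(int(pos), m - 1)
        hi = min(lo + 1, m - 1)          # hold the last value past the end
        frac = pos - lo
        out.append((1.0 - frac) * a[lo] + frac * a[hi])
    return out


def fit_to_order(b_hat, n):
    """Restrict b_hat to n coefficients, or append zeros until there are n."""
    if len(b_hat) >= n:
        return b_hat[:n]
    return b_hat + [0.0] * (n - len(b_hat))


# 10 coefficients at 8 kHz become 16 coefficients at 12.8 kHz.
b_hat = fit_to_order(resample_lpc([0.1] * 10, 8000, 12800), 16)
print(len(b_hat))  # 16
```

Passing the sampling frequencies as integers in Hz keeps the ratio arithmetic exact for the common narrowband and wideband rates.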
Figure 2 is a block diagram of an apparatus according to an embodiment. Different modules or units 10, 20 and 30 of the apparatus may be implemented in one or more physical or logical entities. Figure 2 is a simplified diagram that only shows some elements and functional entities relevant to understanding the various embodiments described here and whose implementation may differ from what is shown. The connections shown in Figure 2 are logical connections; the actual physical connections may be different. In the example shown, bitstream A of codec format A enters decoder A 10. Decoder A 10 may be a plain decoder or a codec unit, for example. In decoder A 10, bitstream A is partially decoded by extracting at least LPC coefficients from bitstream A. Other parameters, such as pitch delays, fixed codebook indexes, and fixed and adaptive gains, may also be extracted. The LPC coefficients and possible other extracted parameters as well as the partially decoded bitstream (signal) are further transmitted to a frequency modification unit 30. The frequency modification unit 30 applies a modification of the sampling frequency to the LPC coefficients according to the embodiments described above. According to an embodiment, the partially decoded bitstream (signal) is up-sampled or down-sampled from the sampling frequency employed by source codec format A to the sampling frequency employed by target codec format B. This is also preferably done in the frequency modification unit 30. The modified LPC coefficients and possible other parameters as well as the modified signal are then transmitted to encoder B 20. Encoder B 20 may be a plain encoder or a codec unit, for example. In encoder B 20 the modified LPC coefficients are mapped into LPC coefficients of codec format B and the partially decoded bitstream is encoded into a bitstream of codec format B using the mapped LPC coefficients. It should be noted that the partial decoding and encoding, i.e. the extraction of LPC coefficients in decoder A and the mapping of parameters and encoding in encoder B, can be performed in a similar manner as in existing transcoding solutions. Therefore, they need not be discussed in more detail here. It should be further noted that not only existing mapping schemes but also any future mapping schemes may be utilized.
The modification of the sampling frequency (step 2 above) can be implemented in many different ways. The actual performance of smart transcoding depends on the way the modification of the sampling frequency is done. One possible problem related to up-sampling and down-sampling is the smoothing that may appear in either the low or the high frequencies of the vector [b̂_i ]. Therefore, it is preferable to enhance the obtained [b̂_i ] by properly resynthesizing the lower or higher frequencies. In order to achieve this, a separate step may be used before the mapping step 3 above, in which an appropriate property of filter B̂(z), such as (but not restricted to) its frequency response in the low and high frequencies, is assured.
The following will now describe in more detail an implementation example according to an embodiment. The example presents transcoding between AMR 12.2 kbit/s and AMR-WB 23.05 kbit/s codecs. It should be noted, however, that the use of the invention is not restricted to any particular codec format or standard or a particular mode of a given codec format. For example, the following codec formats could be used in connection with the invention: Full Rate (FR), Half Rate (HR), Enhanced Full Rate (EFR), Adaptive Multi-Rate (AMR), Adaptive Multi Rate WideBand (AMR-WB), Adaptive Multi Rate WideBand plus, G.723.1, G.729, G.729.1, Enhanced Variable Rate Codec (EVRC), Variable-Rate Multi-Mode Wideband (VMR-WB) and Speex.
In the example the source codec format is AMR and the target codec format is AMR-WB. The AMR codec processes signals at a sampling frequency of F_s (A) = 8 kHz and provides an LPC analysis on M = 10 coefficients. The AMR-WB codec operates with a signal of 16 kHz and its LPC analysis is done on a signal of F_s (B) = 12.8 kHz (a down-sampling is applied within the encoding). The LPC filter of the AMR-WB has N = 16 coefficients. In this case M* F_s (B)/F_s (A) = N. Thus, the correct number of LPC coefficients may be obtained directly by the modification of the sampling frequency. The modification of the sampling frequency may be done in two phases such that first an up-sampling of vector [a_i ], i=0...M by a factor of 3 is applied. A low-pass filter is preferably applied to the up-sampled signal to avoid aliasing. A down-sampling by a factor of 2 is then applied. Thus the total up-sampling factor is 3/2. It has to be noted that, since F_s (B)/F_s (A) = 12.8/8 = 8/5, the factors of the up-sampling and down-sampling could have been 8 and 5, respectively. But the factors 3 and 2 can lead to a better performance, as the smoothing applied to the low or high frequencies is less significant. Considering that i=0...M, b₀ is set to be 1 and thus the resulting number of coefficients is 1 + 10*3/2 = 16.
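The two-phase modification described above (up-sampling by 3, low-pass filtering, down-sampling by 2) can be sketched as follows. Several details are assumptions of this sketch rather than part of the described embodiment: the low-pass filter is a short triangular FIR kernel (equivalent to linear interpolation), the first 1 + M*3/2 values are kept, and all function names are hypothetical.

```python
def upsample_zero_stuff(x, factor):
    """Insert factor - 1 zeros after every sample of x."""
    out = [0.0] * (len(x) * factor)
    for i, value in enumerate(x):
        out[i * factor] = value
    return out


def fir_same(x, h):
    """Convolve x with an odd-length kernel h, keeping the output aligned with x."""
    half = len(h) // 2
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            j = n + half - k
            if 0 <= j < len(x):
                acc += hk * x[j]
        y.append(acc)
    return y


def resample_3_2(a_full):
    """Map [a_0..a_M] to 1 + M*3/2 coefficients with b_0 forced to 1."""
    up, down = 3, 2
    stuffed = upsample_zero_stuff(a_full, up)
    # Triangular low-pass kernel for a factor of 3 (assumed; any suitable
    # low-pass filter could be substituted here).
    kernel = [1 / 3, 2 / 3, 1.0, 2 / 3, 1 / 3]
    smoothed = fir_same(stuffed, kernel)
    decimated = smoothed[::down]
    m = len(a_full) - 1
    b_hat = decimated[: 1 + (m * up) // down]
    b_hat[0] = 1.0                       # target filter is forced to b_0 = 1
    return b_hat


a_full = [1.0] + [0.1 * i for i in range(1, 11)]   # a_0 = 1 plus 10 toy coefficients
print(len(resample_3_2(a_full)))  # 16
```

With M = 10 the sketch returns 16 values, matching the count 1 + 10*3/2 given in the text.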
Additionally, in the exemplary embodiment, the obtained [b̂_i ] can be enhanced through the following exemplary processing: the zeros of filter B̂(z) are modified by taking into account the zeros of filter A(z) and of an additional estimated filter B̂ ₁(z). Such an operation makes it possible to avoid the smoothing in the down-sampling phase of the above example, which tends to reduce the number of zeros of the LPC analysis. B̂(z) is preferably designed so that smoothing is applied to the high frequencies and no smoothing to the low frequencies. B̂ ₁(z) is designed the other way round, with strong smoothing in the low frequencies and little smoothing in the high frequencies. A(z) contains information and zeros only in the low frequencies (since LPC filter A(z) models an 8 kHz signal and has no zeros above 4 kHz). Accordingly, A(z) and B̂ ₁(z) serve as two additional filters which apply a correction to the zeros of B̂(z) in the low and high frequencies, respectively. This permits an accurate estimation of B(z), providing good performance of the smart transcoding based on mapping of the LPC coefficients.
In the above-described detailed example, an up-sampling was applied to the LPC coefficients because of transcoding from 8 kHz AMR codec format to 12.8 kHz AMR-WB codec format. Transcoding from e.g. AMR-WB codec format to AMR codec format can be arranged in a similar manner but by applying down-sampling to the LPC coefficients instead of up-sampling.
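For either transcoding direction, the exact up- and down-sampling factors and the resulting number of coefficients M* F_s (B)/F_s (A) follow from the ratio of the two sampling frequencies. The sketch below works this out with exact rational arithmetic; the function names are hypothetical and the sampling frequencies are passed in Hz as integers (an assumption made to keep the ratio exact).

```python
from fractions import Fraction


def resampling_factors(fs_source_hz, fs_target_hz):
    """Return (up, down) such that fs_target / fs_source == up / down in lowest terms."""
    ratio = Fraction(fs_target_hz, fs_source_hz)
    return ratio.numerator, ratio.denominator


def resampled_count(order, fs_source_hz, fs_target_hz):
    """Return the coefficient count M * F_s(B) / F_s(A) as an exact fraction."""
    return Fraction(order * fs_target_hz, fs_source_hz)


# AMR (8 kHz, 10 coefficients) to AMR-WB (12.8 kHz LPC analysis).
print(resampling_factors(8000, 12800))     # (8, 5)
print(resampled_count(10, 8000, 12800))    # 16

# AMR-WB (12.8 kHz, 16 coefficients) back to AMR (8 kHz).
print(resampling_factors(12800, 8000))     # (5, 8)
print(resampled_count(16, 12800, 8000))    # 10
```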
An apparatus according to an embodiment, such as the one shown in Figure 2, may be implemented as one unit (e.g. a transcoding unit) or as two or more separate units that are configured to implement the functionality of the various embodiments described. Here the term 'unit' refers generally to a physical or logical entity, such as a physical device or a part thereof or a software routine. For example, units 10, 20 and 30 may be physically separate units or implemented as one entity.
An apparatus according to any one of the embodiments can be implemented by means of a computer or corresponding digital signal processing equipment with suitable software therein, for example. Such a computer or digital signal processing equipment preferably comprises at least a working memory (RAM) providing a storage area used for arithmetical operations and a central processing unit (CPU), such as a general-purpose digital signal processor (DSP). The CPU may comprise a set of registers, an arithmetic logic unit, and a control unit. The control unit is controlled by a sequence of program instructions transferred to the CPU from the RAM. The control unit may contain a number of microinstructions for basic operations. The implementation of microinstructions may vary depending on the CPU design. The program instructions may be coded in a programming language, which may be a high-level programming language, such as C, Java, etc., or a low-level programming language, such as a machine language or an assembly language. The computer may also have an operating system which may provide system services to a computer program written with the program instructions. It is also possible to use a specific integrated circuit or circuits, or corresponding components and devices, for implementing the functionality according to any one of the embodiments.
The invention can be implemented in existing system elements, such as various communication system elements, or by using separate dedicated elements or devices in a centralized or distributed manner. An example of such a system element is a media gateway or an internet protocol telephony gateway. Present elements for communication systems typically comprise processors and memory that can be utilized in the functions according to the embodiments. Thus, all modifications and configurations required for implementing an embodiment in existing devices may be performed as software routines, which may be implemented as added or updated software routines. If the functionality of the embodiments is implemented by software, such software can be provided as a computer program product comprising computer program code which, when run on a computer, causes the computer or corresponding arrangement to perform the functionality according to the invention as described above. Such a computer program code can be stored on a computer readable medium, such as suitable memory means, e.g. a flash memory or a disc memory, from which it is loadable to the unit or units executing the program code. In addition, such a computer program code implementing the invention can be loaded to the unit or units executing the computer program code via a suitable data network, for example, and it can replace or update a possibly existing program code.
A frequency modification unit 30 may be implemented as a module for interfacing between two codec formats. Such a module may be a physical device, a part of a physical device or a software module, for example. According to an embodiment, such a module is configured to modify the sampling frequency of extracted linear predictive coding coefficients according to the various embodiments described. For this purpose the module may comprise an up/down-sampling unit. Further such a module is configured to receive the linear predictive coding coefficients extracted from a bitstream from a decoder and to send the linear predictive coding coefficients obtained from the modification of the sampling frequency to an encoder. For this purpose the module may comprise e.g. suitable input and output terminals and receiving and sending units in connection thereto.
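As a rough illustration of such a module implemented in software, the sketch below wires a receiving side, a sampling-frequency modification and a sending side together. The class name, its method names and the toy resampling function are all hypothetical; they only show the data flow from decoder to encoder and are not an interface defined by the text.

```python
class FrequencyModificationModule:
    """Hypothetical software module sitting between a decoder and an encoder."""

    def __init__(self, resample, send_to_encoder):
        # resample: callable mapping a list of LPC coefficients to a new list.
        # send_to_encoder: callable that accepts the modified coefficients.
        self._resample = resample
        self._send_to_encoder = send_to_encoder

    def receive(self, lpc_coeffs):
        """Accept coefficients from the decoder, modify them and forward them."""
        modified = self._resample(list(lpc_coeffs))
        self._send_to_encoder(modified)
        return modified


def repeat_each_twice(coeffs):
    """Toy stand-in for an up-sampling unit (assumption: repeats each value)."""
    return [value for value in coeffs for _ in range(2)]


forwarded = []                                   # stands in for the encoder input
module = FrequencyModificationModule(repeat_each_twice, forwarded.append)
module.receive([0.5, -0.25])
print(forwarded)  # [[0.5, 0.5, -0.25, -0.25]]
```

Keeping the resampling function and the encoder hand-off as injected callables lets the same module shape serve both the up-sampling and the down-sampling direction.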
It will be obvious to a person skilled in the art that, as the technology advances, the inventive concept can be implemented in various ways. The invention and its embodiments are not limited to the examples described above but may vary within the scope of the claims.

Claims

1. A method for transcoding, the method comprising:
partially decoding a first bitstream of a first codec format by extracting at least linear predictive coding coefficients from the first bitstream;

mapping the extracted linear predictive coding coefficients into linear predictive coding coefficients of a second codec format; and

encoding the partially decoded first bitstream into a second bitstream of a second codec format using the mapped linear predictive coding coefficients, characterized in that the first and second codec formats employ different sampling frequencies and in that the method comprises:

modifying the sampling frequency of the extracted linear predictive coding coefficients before the mapping of the extracted linear predictive coding coefficients.
2. A method according to claim 1, characterized in that the modifying of the sampling frequency of the extracted linear predictive coding coefficients comprises:
up-sampling the extracted linear predictive coding coefficients when the sampling frequency of the second codec format is higher than the sampling frequency of the first codec format.
3. A method according to claim 2, characterized in that the up-sampling factor is equal to the ratio of the sampling frequency of the second codec format to the sampling frequency of the first codec format.
4. A method according to claim 1, 2 or 3, characterized in that the modifying of the sampling frequency of the extracted linear predictive coding coefficients comprises:
down-sampling the extracted linear predictive coding coefficients when the sampling frequency of the second codec format is lower than the sampling frequency of the first codec format.
5. A method according to claim 4, characterized in that the down-sampling factor is equal to the ratio of the sampling frequency of the second codec format to the sampling frequency of the first codec format.
6. A method according to any one of claims 1 to 5, characterized in that the method comprises up-sampling or down-sampling the partially decoded first bitstream from the sampling frequency employed by the first codec format to the sampling frequency employed by the second codec format before encoding.
7. A method according to any one of claims 1 to 6, characterized in that the method comprises adjusting the number of linear predictive coding coefficients after modifying the sampling frequency of the extracted linear predictive coding coefficients to the number of coefficients required for encoding the partially decoded first bitstream into a second bitstream of a second codec format.
8. A method according to any one of claims 1 to 7, characterized in that the first and/or the second codec format is selected from the following: Full Rate, Half Rate, Enhanced Full Rate, Adaptive Multi-Rate, Adaptive Multi Rate WideBand, Adaptive Multi Rate WideBand plus, G.723.1, G.729, G.729.1, Enhanced Variable Rate Codec, Variable-Rate Multi-Mode Wideband and Speex.
9. A method according to any one of claims 1 to 8, characterized in that the first and second codec formats are Adaptive Multi-Rate employing a sampling frequency of 8 kHz and Adaptive Multi Rate WideBand employing a sampling frequency of 12.8 kHz.
10. A computer program product comprising computer program code, wherein the execution of the program code in a computer causes the computer to carry out the steps of the method according to any one of claims 1 to 9.
11. An apparatus for transcoding comprising:
means for partially decoding a first bitstream of a first codec format by extracting at least linear predictive coding coefficients from the first bitstream;

means for mapping the extracted linear predictive coding coefficients into linear predictive coding coefficients of a second codec format; and

means for encoding the partially decoded first bitstream into a second bitstream of a second codec format using the mapped linear predictive coding coefficients, characterized in that the apparatus comprises means for modifying the sampling frequency of the extracted linear predictive coding coefficients before the mapping of the extracted linear predictive coding coefficients.
12. An apparatus according to claim 11, characterized in that the means for modifying the sampling frequency of the extracted linear predictive coding coefficients is arranged to up-sample the extracted linear predictive coding coefficients when the sampling frequency of the second codec format is higher than the sampling frequency of the first codec format.
13. An apparatus according to claim 12, characterized in that the up-sampling factor is equal to the ratio of the sampling frequency of the second codec format to the sampling frequency of the first codec format.
14. An apparatus according to claim 11, 12 or 13, characterized in that the means for modifying the sampling frequency of the extracted linear predictive coding coefficients is arranged to down-sample the extracted linear predictive coding coefficients when the sampling frequency of the second codec format is lower than the sampling frequency of the first codec format.
15. An apparatus according to claim 14, characterized in that the down-sampling factor is equal to the ratio of the sampling frequency of the second codec format to the sampling frequency of the first codec format.
16. An apparatus according to any one of claims 11 to 15, characterized in that the apparatus comprises means for up-sampling or down-sampling the partially decoded first bitstream from the sampling frequency employed by the first codec format to the sampling frequency employed by the second codec format before encoding.
17. An apparatus according to any one of claims 11 to 16, characterized in that the apparatus comprises means for adjusting the number of linear predictive coding coefficients after the modification of the sampling frequency of the extracted linear predictive coding coefficients to the number of coefficients required for encoding the partially decoded first bitstream into a second bitstream of a second codec format.
18. An apparatus according to any one of claims 11 to 17, characterized in that the first and/or the second codec format is selected from the following: Full Rate, Half Rate, Enhanced Full Rate, Adaptive Multi-Rate, Adaptive Multi Rate WideBand, Adaptive Multi Rate WideBand plus, G.723.1, G.729, G.729.1, Enhanced Variable Rate Codec, Variable-Rate Multi-Mode Wideband and Speex.
19. An apparatus according to any one of claims 11 to 18, characterized in that the first and second codec formats are Adaptive Multi-Rate employing a sampling frequency of 8 kHz and Adaptive Multi Rate WideBand employing a sampling frequency of 12.8 kHz.
20. A module for interfacing between codec formats, characterized in that the module comprises:
means for receiving from a decoder linear predictive coding coefficients extracted from a bitstream;

means for modifying the sampling frequency of the extracted linear predictive coding coefficients; and

means for sending the linear predictive coding coefficients obtained from the modification of the sampling frequency to an encoder.
21. A module according to claim 20, characterized in that the module comprises:
means for receiving from the decoder a partially decoded bitstream from which at least the linear predictive coding coefficients have been extracted;

means for up-sampling or down-sampling the partially decoded bitstream from the sampling frequency employed by the decoder to the sampling frequency employed by the encoder; and

means for sending the up- or down-sampled partially decoded bitstream to the encoder.
22. A module according to claim 20 or 21, characterized in that the module is a module for a gateway.
23. A module according to claim 22, characterized in that the gateway is a media gateway or an internet protocol telephony gateway.