EP2045800A1 - Method and apparatus for transcoding - Google Patents
- Publication number
- EP2045800A1 (EP07117956A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- sampling frequency
- predictive coding
- linear predictive
- codec format
- coding coefficients
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/038—Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/173—Transcoding, i.e. converting between two coded representations avoiding cascaded coding-decoding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/06—Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
Definitions
- The obtained [b̂i ] can be enhanced through the following exemplary processing: the zeros of filter B̂(z) are modified by taking into account the zeros of filter A(z) and of an additive estimated filter B̂1(z).
- B̂(z) is preferably designed so that smoothing is applied to the high frequency and no smoothing to the low frequency.
- B̂1(z) is designed reversely, with high smoothing in the low frequency and low smoothing in the high frequency domain.
- A(z) presents only information and zeros in the low frequency (since LPC filter A(z) models an 8 kHz signal and has no zeros above 4 kHz). Accordingly, with A(z) and B̂1(z), we consider two additional filters which apply a correction to the zeros of B̂(z) in the low and high frequency, respectively. This permits an accurate estimation of B(z), providing good performance of the smart transcoding based on mapping of the LPC.
- An up-sampling was applied to the LPC coefficients because of transcoding from 8 kHz AMR codec format to 12.8 kHz AMR-WB codec format.
- Transcoding from e.g. AMR-WB codec format to AMR codec format can be arranged in a similar manner but by applying down-sampling to the LPC coefficients instead of up-sampling.
- An apparatus may be implemented as one unit (e.g. a transcoding unit) or as two or more separate units that are configured to implement the functionality of the various embodiments described.
- the term 'unit' refers generally to a physical or logical entity, such as a physical device or a part thereof or a software routine.
- units 10, 20 and 30 may be physically separate units or implemented as one entity.
- An apparatus can be implemented by means of a computer or corresponding digital signal processing equipment with suitable software therein, for example.
- a computer or digital signal processing equipment preferably comprises at least a working memory (RAM) providing storage area used for arithmetical operations and a central processing unit (CPU), such as a general-purpose digital signal processor (DSP).
- the CPU may comprise a set of registers, an arithmetic logic unit, and a control unit.
- the control unit is controlled by a sequence of program instructions transferred to the CPU from the RAM.
- the control unit may contain a number of microinstructions for basic operations. The implementation of microinstructions may vary depending on the CPU design.
- the program instructions may be coded by a programming language, which may be a high-level programming language, such as C, Java, etc., or a low-level programming language, such as a machine language, or an assembler.
- the computer may also have an operating system which may provide system services to a computer program written with the program instructions. It is also possible to use a specific integrated circuit or circuits, or corresponding components and devices for implementing the functionality according to any one of the embodiments.
- the invention can be implemented in existing system elements, such as various communication system elements, or by using separate dedicated elements or devices in a centralized or distributed manner.
- An example of such a system element is a media gateway or an internet protocol telephony gateway.
- Present elements for communication systems typically comprise processors and memory that can be utilized in the functions according to the embodiments.
- All modifications and configurations required for implementing an embodiment in existing devices may be performed as software routines, which may be implemented as added or updated software routines.
- software can be provided as a computer program product comprising computer program code which, when run on a computer, causes the computer or corresponding arrangement to perform the functionality according to the invention as described above.
- Such a computer program code can be stored on a computer readable medium, such as suitable memory means, e.g. a flash memory or a disc memory, from which it is loadable to the unit or units executing the program code.
- such a computer program code implementing the invention can be loaded to the unit or units executing the computer program code via a suitable data network, for example, and it can replace or update a possibly existing program code.
- a frequency modification unit 30 may be implemented as a module for interfacing between two codec formats.
- a module may be a physical device, a part of a physical device or a software module, for example.
- a module is configured to modify the sampling frequency of extracted linear predictive coding coefficients according to the various embodiments described.
- the module may comprise an up/down-sampling unit.
- the module is configured to receive the linear predictive coding coefficients extracted from a bitstream from a decoder and to send the linear predictive coding coefficients obtained from the modification of the sampling frequency to an encoder.
- the module may comprise e.g. suitable input and output terminals and receiving and sending units in connection thereto.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
A method and apparatus for transcoding comprising means (10) for partially decoding a first bitstream of a first codec format by extracting at least linear predictive coding coefficients from the first bitstream; means (20) for mapping the extracted linear predictive coding coefficients into linear predictive coding coefficients of a second codec format; and means (20) for encoding the partially decoded first bitstream into a second bitstream of a second codec format using the mapped linear predictive coding coefficients, wherein the apparatus comprises means (30) for modifying the sampling frequency of the extracted linear predictive coding coefficients before the mapping of the extracted linear predictive coding coefficients.
Description
- The present invention relates to transcoding.
- In recent decades, networks of different types have been developed, such as mobile GSM, UMTS, CDMA and IP, providing alternatives to the 'classical' circuit-switched network. The interconnection of all these networks leads to an interoperability problem regarding transmission of speech. Indeed, non-compatible speech standards have been adopted in the different networks, although most of the codecs at medium rate (5-16.5 kbit/s for narrowband codecs, 5-25 kbit/s for wideband codecs) are based on the same model, Code Excited Linear Prediction (CELP). The simplest method to provide inter-connectivity consists of decoding one codec standard compressed bitstream A and re-encoding it into the other codec standard bitstream B. This conventional method is called tandem transcoding. It suffers from several problems such as complexity, delay and degradation of speech.
- Recently, so-called 'smart transcoding' solutions have been proposed, which rely on the fact that the different standards are based on the CELP principle. They aim at reducing the complexity of the transcoding, as many functions at encoder B can be skipped, at decreasing the delay, and at enhancing the quality or at least achieving the same quality as with normal transcoding. The basic idea is to use redundancy between the standards to avoid computing parameters that have already been computed. Reference is made to Figure 1, which shows the principle of smart transcoding. When transcoding from a bitstream format of codec A into a bitstream format of codec B, bitstream A is first decoded in decoder A. The obtained decoded signal is then encoded into target format B (bitstream B) by encoder B. In case both codecs are CELP codecs, bitstreams A and B transmit a similar set of parameters, such as Linear Prediction Coding (LPC) coefficients, pitch delays, fixed codebook indexes and fixed and adaptive gains. The key idea of smart transcoding consists of avoiding the computation of parameters already available. An intelligent mapping and quantization of the parameters available in bitstream A into bitstream B parameters allows many functions to be skipped and hence reduces the computation load of the transcoding. As depicted in Figure 1, only a partial decoding is necessary to extract the parameters from bitstream A. Their mapping as well as a partial encoding then builds the accurate bitstream B.
- The article C. Beaugeant, H. Taddei, "Quality and Computation Load Reduction achieved by applying Smart Transcoding between CELP Speech Codecs", Eusipco, Poland, September 2007, gives several examples of possible mapping of parameters in the case of G.729.A and AMR codecs.
- One of the possible parameters mapped between speech codecs in transcoding is the Linear Prediction Coefficients vector (LPC). The mapping of the LPC coefficients is relatively straightforward when the speech codecs are applied to the signal at the same sampling frequency. A transposition of the LPCs from decoder A to encoder B leads to good quality and reduction of complexity, as shown in the above-mentioned article. However, such a solution cannot be applied when codecs A and B employ different sampling frequencies. In that case the LPC filters of codecs A and B model signals of different sampling frequencies, which leads to a different number of coefficients and different meanings of the LPC coefficients. Existing solutions that provide mapping of LPC parameters for smart transcoding purposes are based only on the mapping of the LPC filter at the same sampling frequency (e.g. narrowband signal at 8 kHz).
- An object of the present invention is thus to provide a method and an apparatus for implementing the method so as to solve the above problem or at least to alleviate it. The objects of the invention are achieved by a method, a computer program product, an apparatus and a module which are characterized by what is stated in the independent claims. The preferred embodiments of the invention are disclosed in the dependent claims.
- The invention is based on recognizing the problem and on the realization that in transcoding between two codec formats employing different sampling frequencies, the LPC coefficients of the LPC filter of the target codec format can be estimated by applying a modification on the sampling frequency of the extracted LPC coefficients.
- An advantage of the method and apparatus of the invention is that it enables smart transcoding between two codec formats employing different sampling frequencies.
- In the following the invention will be described in greater detail by means of preferred embodiments with reference to the accompanying drawings, in which
- Figure 1 is a block diagram showing the principle of smart transcoding; and
- Figure 2 is a block diagram of an embodiment.
- The following embodiments are exemplary. Although the specification may refer to "an", "one", or "some" embodiment(s) in several locations, this does not necessarily mean that each such reference is to the same embodiment(s), or that the feature only applies to a single embodiment. Single features of different embodiments may also be combined to provide other embodiments.
- The present invention is applicable to any communication system or any combination of different communication systems such as GSM (Global System for Mobile Communications), WCDMA (Wideband Code Division Multiple Access), WLAN (Wireless Local Area Network), UMTS (Universal Mobile Telecommunications System), CDMA and/or IP (Internet Protocol) standard, or any other suitable standard/non-standard communication means. The communication system may be a fixed communication system or a wireless communication system or a communication system utilizing both fixed networks and wireless networks. The protocols used and the specifications of communication systems, especially in wireless communication, develop rapidly. Such a development may require extra changes to an embodiment. Therefore, all terms and expressions should be interpreted broadly and they are intended to illustrate, not to restrict, the embodiment. In the following, different embodiments will be described using, as an example, a system architecture to which the embodiments may be applied, without, however, restricting the embodiment to such an architecture.
- Notation and environment: in the following, two codecs A and B based on an LPC analysis at sampling frequencies Fs (A) and Fs (B) respectively are considered. The CELP codecs family is a subset of such codecs. A transcoding scheme where a signal sA (t) at sampling frequency Fs (A) is encoded by encoder A is considered. In a 'classical' transcoding scheme (i.e. without smart transcoding), the signal is decoded into a PCM signal, which is resampled to the sampling frequency Fs (B) to give a signal sB (t), and signal sB (t) is then encoded by encoder B.
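- (The equations defining LPC filter A(z), with coefficients ai , i=0...M, and LPC filter B(z), with coefficients bi , i=0...N, are not reproduced here.)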
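Judging from how the coefficients are indexed in the remainder of the description (vector [ai ], i=0...M, and the constraint b0=1), the two LPC filters and the estimate B̂(z) can plausibly be written as below. This is an assumed notation, not the formulas of the original text; in particular, the sign convention and the normalisation a0=1 are assumptions.

```latex
% Assumed notation: M and N are the LPC orders of codecs A and B.
A(z) = \sum_{i=0}^{M} a_i \, z^{-i}, \qquad a_0 = 1
\\
B(z) = \sum_{i=0}^{N} b_i \, z^{-i}, \qquad b_0 = 1
\\
\hat{B}(z) = \sum_{i=0}^{N} \hat{b}_i \, z^{-i} \approx B(z)
```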
- Taking into account these notations, the invention generally deals with the process of finding an estimate of filter B(z) when LPC filter A(z) is known. Let B̂(z) denote the filter obtained from A(z). The coefficients of the constructed filter B̂(z) can be mapped into encoder B in a similar way as in smart transcoding based on LPC mapping, thus avoiding the computation of the LPC coefficients within encoder B and accordingly saving computation load. Filters A(z) and B(z) are AR models of two signals with different sampling frequencies. To obtain the coefficients of filter B̂(z), coefficients ai need to be extrapolated if N > M or interpolated if N < M. The interpolation/extrapolation can be seen as a modification of the sampling frequency of signal [ai ], i=0...M (or alternatively [ai ], i=1...M) from sampling frequency Fs (A) to sampling frequency Fs (B). Thus, according to an embodiment, finding the coefficients of B̂(z) that approximate filter B(z) can be done through the following steps:
- 1. Extracting LPC coefficients [ai ], i=1...M from bitstream A in decoder A
- 2. Applying a modification of the sampling frequency on LPC coefficients [ai ], i=1...M, thus obtaining coefficients b̂i
- 3. Mapping coefficients b̂i in encoder B for quantization and for computation of the rest of the coefficients of encoder B (in CELP codecs e.g. pitch, gains, fixed codebook).
- In step 2 above it is alternatively possible to apply the modification of the sampling frequency on LPC coefficients [ai ], i=0...M. In that case the target LPC filter is preferably forced to have b0=1. According to an embodiment, modification of the sampling frequency comprises up-sampling the extracted linear predictive coding coefficients when the sampling frequency of the target codec (B) format is higher than the sampling frequency of the source codec (A) format. According to an embodiment, the up-sampling factor is equal to the ratio of the sampling frequency of the target codec format to the sampling frequency of the source codec format. According to an embodiment, modification of the sampling frequency comprises down-sampling the extracted linear predictive coding coefficients when the sampling frequency of the target codec format is lower than the sampling frequency of the source codec format. According to an embodiment, the down-sampling factor is equal to the ratio of the sampling frequency of the second codec format to the sampling frequency of the first codec format. Accordingly, when applying a modification of the sampling frequency to LPC coefficients [ai ], i=1...M from sampling frequency Fs (A) to sampling frequency Fs (B), M* Fs (B)/Fs (A) coefficients b̂i are obtained. According to an embodiment, the number of coefficients b̂i can be further adjusted to the number of coefficients of target LPC filter B(z) if necessary. For instance, if M* Fs (B)/Fs (A) > N, the number of coefficients b̂i can be restricted to N, and if M* Fs (B)/Fs (A) < N, N - M* Fs (B)/Fs (A) zeros can be added to the vector [b̂i ].
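The adjustment of the coefficient count described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names `expected_count` and `fit_to_order` are hypothetical, and appending the zeros at the end of the vector is an assumption, since the text does not say where they are added.

```python
from typing import List, Sequence


def expected_count(m: int, fs_a: float, fs_b: float) -> float:
    """Number of coefficients obtained after resampling: M * Fs(B) / Fs(A)."""
    return m * fs_b / fs_a


def fit_to_order(b_hat: Sequence[float], n: int) -> List[float]:
    """Adjust the resampled vector to exactly n coefficients.

    Too many coefficients: keep the first n.
    Too few coefficients: append zeros (placement at the end is an assumption).
    """
    if len(b_hat) > n:
        return list(b_hat[:n])
    return list(b_hat) + [0.0] * (n - len(b_hat))
```

For the AMR to AMR-WB case discussed later, `expected_count(10, 8000, 12800)` already equals the target order, so `fit_to_order` leaves the vector unchanged.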
- Figure 2 is a block diagram of an apparatus according to an embodiment. Different modules or units 10, 20 and 30 of the apparatus may be implemented in one or more physical or logical entities. Figure 2 is a simplified diagram that only shows some elements and functional entities relevant to understanding the various embodiments described here and whose implementation may differ from what is shown. The connections shown in Figure 2 are logical connections; the actual physical connections may be different.
- In the example shown, bitstream A of codec format A enters decoder A 10. Decoder A 10 may be a plain decoder or a codec unit, for example. In decoder A 10, bitstream A is partially decoded by extracting at least LPC coefficients from bitstream A. Other parameters, such as pitch delays, fixed codebook indexes, and fixed and adaptive gains, may also be extracted. The LPC coefficients and possible other extracted parameters as well as the partially decoded bitstream (signal) are further transmitted to a frequency modification unit 30. The frequency modification unit 30 applies a modification of the sampling frequency to the LPC coefficients according to the embodiments described above. According to an embodiment, the partially decoded bitstream (signal) is up-sampled or down-sampled from the sampling frequency employed by source codec format A to the sampling frequency employed by target codec format B. This is also preferably done in the frequency modification unit 30. The modified LPC coefficients and possible other parameters as well as the modified signal are then transmitted to encoder B 20. Encoder B 20 may be a plain encoder or a codec unit, for example. In encoder B 20, the modified LPC coefficients are mapped into LPC coefficients of codec format B and the partially decoded bitstream is encoded into a bitstream of codec format B using the mapped LPC coefficients. It should be noted that the partial encoding, i.e. the extraction of LPC coefficients in decoder A and the mapping of parameters and encoding in encoder B, can be performed in a similar manner as in existing transcoding solutions. Therefore, they need not be discussed in more detail here. It should be further noted that not only existing mapping schemes can be used but also any future mapping schemes may be utilized.
- The modification of the sampling frequency (step 2 above) can be implemented in many different ways. The concrete performance of smart transcoding depends on the way the modification of the sampling frequency is done. One possible problem related to up-sampling and down-sampling is the smoothing that may appear in either the low-frequency or the high-frequency part of the vector [b̂i ]. Therefore, it is preferable to enhance the obtained [b̂i ] by properly resynthesizing the lower or higher frequency. In order to achieve this, a separate step may be used before the mapping step 3 above, in which an appropriate property of filter B̂(z), such as (but not restricted to) frequency response in the low and high frequency, is assured.
- The following describes in more detail an implementation example according to an embodiment. The example presents transcoding between AMR 12.2 kbit/s and AMR-WB 23.05 kbit/s codecs. It should be noted, however, that the use of the invention is not restricted to any particular codec format or standard or a particular mode of a given codec format. For example, the following codec formats could be used in connection with the invention: Full Rate (FR), Half Rate (HR), Enhanced Full Rate (EFR), Adaptive Multi-Rate (AMR), Adaptive Multi Rate WideBand (AMR-WB), Adaptive Multi Rate WideBand plus, G.723.1, G.729, G.729.1, Enhanced Variable Rate Codec (EVRC), Variable-Rate Multi-Mode Wideband (VMR-WB) and Speex.
- In the example the source codec format is AMR and the target codec format is AMR-WB. The AMR codec processes signals at a sampling frequency of Fs (A) = 8 kHz and performs an LPC analysis with 10 coefficients. The AMR-WB codec operates with a signal of 16 kHz and its LPC analysis is done on a signal of Fs (B) = 12.8 kHz (a down-sampling is applied within the encoding). The LPC filter of the AMR-WB has 16 coefficients. In this case M* Fs (B)/Fs (A) = N. Thus, the correct number of LPC coefficients may be obtained directly by the modification of the sampling frequency. The modification of the sampling frequency may be done in two phases such that first an up-sampling of vector [ai], i=0...M by a factor of 3 is applied. A low-pass filter is preferably applied to the up-sampled signal to avoid aliasing. A down-sampling by a factor of 2 is then applied. Thus the total up-sampling factor is 3/2. It should be noted that, considering Fs (B) and Fs (A), the factors of the up-sampling and down-sampling could have been 8 and 5, respectively. However, the factors 3 and 2 can lead to better performance as the smoothing applied to the low or high frequency is less significant. Considering that i=0...M, b0 is set to 1 and thus the resulting number of coefficients is 1 + 10*3/2 = 16.
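- The two-phase modification in the example (up-sampling by 3, low-pass filtering, down-sampling by 2) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the Hamming-windowed sinc filter is one possible low-pass design that the application does not specify, and it is assumed here that the M = 10 coefficients a1...aM are resampled to 15 values and b0 = 1 is prepended, giving 1 + 10*3/2 = 16 coefficients.

```python
import math


def lowpass_taps(up, down, half_width=8):
    """Hamming-windowed sinc low-pass filter (an assumed design choice).

    The cutoff is placed at the narrower of the input and output Nyquist
    bands, expressed in cycles per sample at the up-sampled rate.
    """
    cutoff = 0.5 / max(up, down)
    n_half = half_width * max(up, down)
    taps = []
    for k in range(-n_half, n_half + 1):
        x = 2.0 * cutoff * k
        sinc = 1.0 if k == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.54 + 0.46 * math.cos(math.pi * k / n_half)
        taps.append(2.0 * cutoff * sinc * window)
    return taps


def resample(values, up, down):
    """Resample a vector by the rational factor up/down."""
    taps = lowpass_taps(up, down)
    centre = len(taps) // 2
    # Phase 1: insert up - 1 zeros after every input value.
    stuffed = [0.0] * (len(values) * up)
    for i, v in enumerate(values):
        stuffed[i * up] = v
    # Phase 2: low-pass filter and keep every down-th filtered sample.
    out = []
    for m in range(len(values) * up // down):
        pos = m * down
        acc = 0.0
        for j, h in enumerate(taps):
            idx = pos + centre - j   # centre removes the filter delay
            if 0 <= idx < len(stuffed):
                acc += h * stuffed[idx]
        out.append(up * acc)         # gain of up compensates the zero insertion
    return out


def resample_lpc(a, up=3, down=2):
    """Resample a_1..a_M and prepend b_0 = 1 (assumed reading of the example)."""
    return [1.0] + resample(list(a[1:]), up, down)
```

Calling `resample_lpc` on a vector of 11 values (a0 followed by 10 coefficients) returns 16 values, which matches the number of coefficients of the AMR-WB LPC filter in the example. Any enhancement of the low or high frequency of the result would still be a separate step.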
- Additionally, in the exemplary embodiment, the obtained [b̂i] can be enhanced through the following exemplary processing: the zeros of filter B̂(z) are modified by taking into account the zeros of filter A(z) and of an additional estimated filter B̂1(z). Such an operation makes it possible to avoid the smoothing in the down-sampling phase of the above example, which tends to reduce the number of zeros of the LPC analysis. B̂(z) is preferably designed so that smoothing is applied to the high frequency and no smoothing to the low frequency. B̂1(z) is designed conversely, with strong smoothing in the low frequency and little smoothing in the high frequency. A(z) contains information and zeros only in the low frequency (since LPC filter A(z) models an 8 kHz signal and has no zeros above 4 kHz). Accordingly, with A(z) and B̂1(z), we consider two additional filters which apply a correction to the zeros of B̂(z) in the low and high frequency, respectively. This permits an accurate estimation of B(z), providing good performance of the smart transcoding based on mapping of the LPC.
- In the above-described detailed example, an up-sampling was applied to the LPC coefficients because of transcoding from 8 kHz AMR codec format to 12.8 kHz AMR-WB codec format. Transcoding from e.g. AMR-WB codec format to AMR codec format can be arranged in a similar manner but by applying down-sampling to the LPC coefficients instead of up-sampling.
- An apparatus according to an embodiment, such as the one shown in Figure 2, may be implemented as one unit (e.g. a transcoding unit) or as two or more separate units that are configured to implement the functionality of the various embodiments described. Here the term 'unit' refers generally to a physical or logical entity, such as a physical device or a part thereof or a software routine. For example, units ...
- An apparatus according to any one of the embodiments can be implemented by means of a computer or corresponding digital signal processing equipment with suitable software therein, for example. Such a computer or digital signal processing equipment preferably comprises at least a working memory (RAM) providing storage area used for arithmetical operations and a central processing unit (CPU), such as a general-purpose digital signal processor (DSP). The CPU may comprise a set of registers, an arithmetic logic unit, and a control unit. The control unit is controlled by a sequence of program instructions transferred to the CPU from the RAM. The control unit may contain a number of microinstructions for basic operations. The implementation of microinstructions may vary depending on the CPU design. The program instructions may be coded by a programming language, which may be a high-level programming language, such as C, Java, etc., or a low-level programming language, such as a machine language, or an assembler. The computer may also have an operating system which may provide system services to a computer program written with the program instructions. It is also possible to use a specific integrated circuit or circuits, or corresponding components and devices for implementing the functionality according to any one of the embodiments.
- The invention can be implemented in existing system elements, such as various communication system elements, or by using separate dedicated elements or devices in a centralized or distributed manner. An example of such a system element is a media gateway or an internet protocol telephony gateway. Present elements for communication systems typically comprise processors and memory that can be utilized in the functions according to the embodiments. Thus, all modifications and configurations required for implementing an embodiment in existing devices may be performed as software routines, which may be implemented as added or updated software routines. If the functionality of the embodiments is implemented by software, such software can be provided as a computer program product comprising computer program code which, when run on a computer, causes the computer or corresponding arrangement to perform the functionality according to the invention as described above. Such a computer program code can be stored on a computer readable medium, such as suitable memory means, e.g. a flash memory or a disc memory, from which it is loadable to the unit or units executing the program code. In addition, such a computer program code implementing the invention can be loaded to the unit or units executing the computer program code via a suitable data network, for example, and it can replace or update a possibly existing program code.
- A frequency modification unit 30 may be implemented as a module for interfacing between two codec formats. Such a module may be a physical device, a part of a physical device or a software module, for example. According to an embodiment, such a module is configured to modify the sampling frequency of extracted linear predictive coding coefficients according to the various embodiments described. For this purpose the module may comprise an up/down-sampling unit. Further, such a module is configured to receive the linear predictive coding coefficients extracted from a bitstream from a decoder and to send the linear predictive coding coefficients obtained from the modification of the sampling frequency to an encoder. For this purpose the module may comprise e.g. suitable input and output terminals and receiving and sending units in connection thereto.
- It will be obvious to a person skilled in the art that, as the technology advances, the inventive concept can be implemented in various ways. The invention and its embodiments are not limited to the examples described above but may vary within the scope of the claims.
Claims (23)
- A method for transcoding, the method comprising: partially decoding a first bitstream of a first codec format by extracting at least linear predictive coding coefficients from the first bitstream; mapping the extracted linear predictive coding coefficients into linear predictive coding coefficients of a second codec format; and encoding the partially decoded first bitstream into a second bitstream of a second codec format using the mapped linear predictive coding coefficients, characterized in that the first and second codec formats employ different sampling frequencies and in that the method comprises: modifying the sampling frequency of the extracted linear predictive coding coefficients before the mapping of the extracted linear predictive coding coefficients.
- A method according to claim 1, characterized in that the modifying of the sampling frequency of the extracted linear predictive coding coefficients comprises: up-sampling the extracted linear predictive coding coefficients when the sampling frequency of the second codec format is higher than the sampling frequency of the first codec format.
- A method according to claim 2, characterized in that the up-sampling factor is equal to the ratio of the sampling frequency of the second codec format to the sampling frequency of the first codec format.
- A method according to claim 1, 2 or 3, characterized in that the modifying of the sampling frequency of the extracted linear predictive coding coefficients comprises: down-sampling the extracted linear predictive coding coefficients when the sampling frequency of the second codec format is lower than the sampling frequency of the first codec format.
- A method according to claim 4, characterized in that the down-sampling factor is equal to the ratio of the sampling frequency of the second codec format to the sampling frequency of the first codec format.
- A method according to any one of claims 1 to 5, characterized in that the method comprises up-sampling or down-sampling the partially decoded first bitstream from the sampling frequency employed by the first codec format to the sampling frequency employed by the second codec format before encoding.
- A method according to any one of claims 1 to 6, characterized in that the method comprises adjusting the number of linear predictive coding coefficients after modifying the sampling frequency of the extracted linear predictive coding coefficients to the number of coefficients required for encoding the partially decoded first bitstream into a second bitstream of a second codec format.
- A method according to any one of claims 1 to 7, characterized in that the first and/or the second codec format is selected from the following: Full Rate, Half Rate, Enhanced Full Rate, Adaptive Multi-Rate, Adaptive Multi Rate WideBand, Adaptive Multi Rate WideBand plus, G.723.1, G.729, G.729.1, Enhanced Variable Rate Codec, Variable-Rate Multi-Mode Wideband and Speex.
- A method according to any one of claims 1 to 8, characterized in that the first and second codec formats are Adaptive Multi-Rate employing a sampling frequency of 8 kHz and Adaptive Multi Rate WideBand employing a sampling frequency of 12.8 kHz.
- A computer program product comprising computer program code, wherein the execution of the program code in a computer causes the computer to carry out the steps of the method according to any one of claims 1 to 9.
- An apparatus for transcoding comprising: means for partially decoding a first bitstream of a first codec format by extracting at least linear predictive coding coefficients from the first bitstream; means for mapping the extracted linear predictive coding coefficients into linear predictive coding coefficients of a second codec format; and means for encoding the partially decoded first bitstream into a second bitstream of a second codec format using the mapped linear predictive coding coefficients, characterized in that the apparatus comprises means for modifying the sampling frequency of the extracted linear predictive coding coefficients before the mapping of the extracted linear predictive coding coefficients.
- An apparatus according to claim 11, characterized in that the means for modifying the sampling frequency of the extracted linear predictive coding coefficients is arranged to up-sample the extracted linear predictive coding coefficients when the sampling frequency of the second codec format is higher than the sampling frequency of the first codec format.
- An apparatus according to claim 12, characterized in that the up-sampling factor is equal to the ratio of the sampling frequency of the second codec format to the sampling frequency of the first codec format.
- An apparatus according to claim 11, 12 or 13, characterized in that the means for modifying the sampling frequency of the extracted linear predictive coding coefficients is arranged to down-sample the extracted linear predictive coding coefficients when the sampling frequency of the second codec format is lower than the sampling frequency of the first codec format.
- An apparatus according to claim 14, characterized in that the down-sampling factor is equal to the ratio of the sampling frequency of the second codec format to the sampling frequency of the first codec format.
- An apparatus according to any one of claims 11 to 15, characterized in that the apparatus comprises means for up-sampling or down-sampling the partially decoded first bitstream from the sampling frequency employed by the first codec format to the sampling frequency employed by the second codec format before encoding.
- An apparatus according to any one of claims 11 to 16, characterized in that the apparatus comprises means for adjusting the number of linear predictive coding coefficients after the modification of the sampling frequency of the extracted linear predictive coding coefficients to the number of coefficients required for encoding the partially decoded first bitstream into a second bitstream of a second codec format.
- An apparatus according to any one of claims 11 to 17, characterized in that the first and/or the second codec format is selected from the following: Full Rate, Half Rate, Enhanced Full Rate, Adaptive Multi-Rate, Adaptive Multi Rate WideBand, Adaptive Multi Rate WideBand plus, G.723.1, G.729, G.729.1, Enhanced Variable Rate Codec, Variable-Rate Multi-Mode Wideband and Speex.
- An apparatus according to any one of claims 11 to 18, characterized in that the first and second codec formats are Adaptive Multi-Rate employing a sampling frequency of 8 kHz and Adaptive Multi Rate WideBand employing a sampling frequency of 12.8 kHz.
- A module for interfacing between codec formats, characterized in that the module comprises: means for receiving from a decoder linear predictive coding coefficients extracted from a bitstream; means for modifying the sampling frequency of the extracted linear predictive coding coefficients; and means for sending the linear predictive coding coefficients obtained from the modification of the sampling frequency to an encoder.
- A module according to claim 20, characterized in that the module comprises: means for receiving from the decoder a partially decoded bitstream from which at least the linear predictive coding coefficients have been extracted; means for up-sampling or down-sampling the partially decoded bitstream from the sampling frequency employed by the decoder to the sampling frequency employed by the encoder; and means for sending the up- or down-sampled partially decoded bitstream to the encoder.
- A module according to claim 20 or 21, characterized in that the module is a module for a gateway.
- A module according to claim 22, characterized in that the gateway is a media gateway or an internet protocol telephony gateway.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP07117956A EP2045800A1 (en) | 2007-10-05 | 2007-10-05 | Method and apparatus for transcoding |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2045800A1 (en) | 2009-04-08 |
Family
ID=39110850
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP07117956A Withdrawn EP2045800A1 (en) | 2007-10-05 | 2007-10-05 | Method and apparatus for transcoding |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP2045800A1 (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003003770A1 (en) * | 2001-06-26 | 2003-01-09 | Nokia Corporation | Method for transcoding audio signals, transcoder, network element, wireless communications network and communications system |
US20040102966A1 (en) * | 2002-11-25 | 2004-05-27 | Jongmo Sung | Apparatus and method for transcoding between CELP type codecs having different bandwidths |
US20040111257A1 (en) * | 2002-12-09 | 2004-06-10 | Sung Jong Mo | Transcoding apparatus and method between CELP-based codecs using bandwidth extension |
US6829579B2 (en) * | 2002-01-08 | 2004-12-07 | Dilithium Networks, Inc. | Transcoding method and system between CELP-based speech codes |
US20050053130A1 (en) * | 2003-09-10 | 2005-03-10 | Dilithium Holdings, Inc. | Method and apparatus for voice transcoding between variable rate coders |
US20070124138A1 (en) * | 2003-12-10 | 2007-05-31 | France Telecom | Transcoding between the indices of multipulse dictionaries used in compressive coding of digital signals |
EP1796084A1 (en) * | 2004-11-04 | 2007-06-13 | Matsushita Electric Industrial Co., Ltd. | Vector conversion device and vector conversion method |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105869653A (en) * | 2016-05-31 | 2016-08-17 | 华为技术有限公司 | Voice signal processing method and related device and system |
WO2017206432A1 (en) * | 2016-05-31 | 2017-12-07 | 华为技术有限公司 | Voice signal processing method, and related device and system |
US10218856B2 (en) | 2016-05-31 | 2019-02-26 | Huawei Technologies Co., Ltd. | Voice signal processing method, related apparatus, and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
 | AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR |
 | AX | Request for extension of the european patent | Extension state: AL BA HR MK RS |
 | AKX | Designation fees paid | |
 | REG | Reference to a national code | Ref country code: DE Ref legal event code: 8566 |
 | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
 | 18D | Application deemed to be withdrawn | Effective date: 20091009 |