KR20040095205A

KR20040095205A - A transcoding scheme between CELP-based speech codes

Info

Publication number: KR20040095205A
Application number: KR10-2004-7010699A
Authority: KR
Inventors: Marwan A. Jabri; Jianwei Wang; Stephen Gould
Original assignee: Dilithium Networks Pty Limited
Priority date: 2002-01-08
Filing date: 2003-01-08
Publication date: 2004-11-12
Also published as: AU2003207498A1; AU2003207498A8; JP2005515486A; WO2003058407A3; EP1464047A4; CN100527225C; WO2003058407A2; CN1701353A; EP1464047A2

Abstract

The present invention relates to a system and method for transcoding a CELP-based compressed speech bitstream from a source codec to a destination codec. The method processes the source codec input bitstream by unpacking (1) the CELP parameters from the input CELP bitstream and interpolating (2) the unpacked CELP parameters when the destination codec parameters differ from the source codec parameters. When mapping (4) the CELP parameters from the source codec format to the destination codec format, a single parameter mapping scheme can be preset, or one can be selected (3). The method further includes encoding the CELP parameters for the destination codec and producing the destination CELP bitstream by packing (7) the CELP parameters for the destination codec.

Description

A TRANSCODING SCHEME BETWEEN CELP-BASED SPEECH CODES

Coding is the process of converting a raw signal (speech, image, video, etc.) into a format suitable for transmission or storage. Coding usually achieves substantial compression, but it generally entails significant signal processing. The result of coding is a bitstream (a sequence of frames) of parameters encoded according to a given compression format. Compression is achieved by removing perceptually and statistically redundant information, using various techniques for modeling the signal. The encoded format is therefore referred to as the "compressed format" or "parameter space". A decoder receives the compressed bitstream and reconstructs the signal. In speech coding, compression is typically lossy.

Converting between different compression formats, and/or reducing the bit rate of a previously encoded signal, is known as transcoding. It may be performed to conserve bandwidth or to connect incompatible client and/or server devices. Transcoding differs from direct compression in that the transcoder has access only to the compressed signal and not to the raw signal.

Transcoding can be implemented with brute-force techniques such as tandem coding, in which the signal is decompressed and then recompressed. Because this often requires a large amount of processing and introduces delay, transcoding in the compressed (parameter) space can be considered instead. Such transcoding aims to map between compression formats while remaining in the parameter space wherever possible. This is where the sophisticated algorithms of "smart" transcoding come into play. Although transcoding techniques have advanced, further improvement is desirable. The limitations of the prior art are described in more detail later in this specification.

CROSS-REFERENCES TO RELATED APPLICATIONS

This application claims priority to the following commonly owned U.S. provisional applications: No. 60/347,270, filed January 8, 2002; No. 60/364,403, filed March 12, 2002; No. 60/421,446, filed October 25, 2002; No. 60/421,449, filed October 25, 2002; and No. 60/421,446, filed October 25, 2002. These provisional applications are incorporated herein by reference as if fully set forth herein.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

- Not applicable

REFERENCE TO A "SEQUENCE LISTING," A TABLE, OR A COMPUTER PROGRAM LISTING APPENDIX SUBMITTED ON A COMPACT DISK

- Not applicable

The present invention relates generally to information processing techniques. More specifically, it provides a method and apparatus for converting CELP frames from one CELP (Code Excited Linear Prediction) based standard to another CELP-based standard, and/or between different modes within a single standard. Details of the invention are described throughout this specification.

The objects, features, and advantages of the present invention, which are believed to be novel, are set forth with particularity in the appended claims. The invention, both as to its organization and manner of operation, together with further objects and advantages, may best be understood from the following detailed description taken in conjunction with the accompanying drawings.

FIG. 1 is a simplified block diagram of the decoder stage of a generic CELP coder.

FIG. 2 is a simplified block diagram of the encoder stage of a generic CELP coder.

FIG. 3 is a simplified block diagram of a mathematical model of a codec.

FIG. 4 is a simplified block diagram of a mathematical model of a tandem transcoder.

FIG. 5 is a simplified block diagram of a mathematical model of a smart transcoder.

FIG. 6 shows one example of a typical apparatus for CELP-based transcoding.

FIG. 7 shows one example of a typical apparatus for CELP-based transcoding.

FIG. 8 is a simplified block diagram of generic transcoding between CELP codecs.

FIG. 9 is a simplified diagram of subframe interpolation for GSM-AMR and G.723.1.

FIG. 10 is a simplified block diagram of a system, configured according to an embodiment of the invention, for transcoding an input CELP bitstream from a source CELP codec into an output CELP bitstream of a destination codec.

FIG. 11 is a simplified block diagram showing the source codec CELP parameter unpacking module in detail.

FIG. 12 is a simplified block diagram showing subframe and sample-by-sample parameter interpolation for G.723.1 to GSM-AMR.

FIG. 13 is a simplified block diagram showing the excitation adjusted by the source codec LPC coefficients and the destination codec coded LPC coefficients.

FIG. 14 is a simplified block diagram showing the parameter mapping and tuning module for CELP parameter mapping in detail.

FIG. 15 is a simplified block diagram showing the destination CELP parameter tuning module in detail.

FIG. 16 is a simplified diagram of an embodiment of destination CELP code packing into frames for GSM-AMR.

FIG. 17 shows an embodiment of a G.723.1 to GSM-AMR transcoder.

FIG. 18 shows an embodiment of a GSM-AMR to G.723.1 transcoder.

The present invention provides techniques for processing information. More specifically, it provides a method and apparatus for converting CELP frames from one CELP (Code Excited Linear Prediction) based standard to another CELP-based standard, and/or between different modes within a single standard. A more detailed description is given throughout this specification.

In a specific embodiment, the invention provides an apparatus for converting CELP frames from one CELP-based standard to another CELP-based standard, and/or between different modes within a single standard. The apparatus has a bitstream unpacking module that extracts one or more CELP parameters from a source codec. It also has an interpolator module coupled to the bitstream unpacking module. The interpolator module interpolates between the differing frame sizes, subframe sizes, and/or sampling rates of the source codec and the destination codec. A mapping module is coupled to the interpolator module and is configured to map one or more CELP parameters of the source codec to one or more CELP parameters of the destination codec. The apparatus has a destination bitstream packing module coupled to the mapping module, which is configured to construct at least one destination output CELP frame based on at least the one or more CELP parameters of the destination codec. A controller is coupled to at least the destination bitstream packing module, the mapping module, the interpolator module, and the bitstream unpacking module. Preferably, the controller monitors the operation of one or more of the modules and receives commands from one or more external applications. The controller also provides status information to one or more external applications.

In another specific embodiment, the invention provides a method for transcoding a CELP-based compressed speech bitstream from a source codec to a destination codec. The method includes processing a source codec input CELP bitstream to unpack at least one CELP parameter from it. If there are one or more differences between a plurality of destination codec parameters (including the frame size, subframe size, and/or sampling rate of the destination codec format) and a plurality of source codec parameters (including the frame size, subframe size, or sampling rate of the source codec format), the method interpolates one or more of the unpacked CELP parameters from the source codec format to the destination codec format. The method also includes encoding one or more CELP parameters for the destination codec and producing a destination CELP bitstream by at least packing the one or more CELP parameters for the destination codec.

In another specific embodiment, the invention provides a method for processing a CELP-based compressed speech bitstream from a source codec format to a destination codec format. The method includes transferring one control signal of a plurality of control signals from an application process, and selecting one CELP mapping scheme from a plurality of different CELP mapping schemes based on at least the control signal from the application. The method includes performing a mapping process, using the selected CELP mapping scheme, to map one or more CELP parameters of the source codec format to one or more CELP parameters of the destination codec format.

The invention also provides a system for processing a CELP-based compressed speech bitstream from a source codec format to a destination codec format. The system includes one or more memories. These memories may include one or more codes for receiving one control signal of a plurality of control signals from an application process. They may also include one or more codes for selecting one CELP mapping scheme from a plurality of different CELP mapping schemes based on at least the control signal from the application. The one or more memories further include one or more codes for performing a mapping process, using the selected CELP mapping scheme, to map one or more CELP parameters of the source codec format to one or more CELP parameters of the destination codec format. Depending on the embodiment, there may be other computer code that carries out the functions described in this specification or elsewhere and that can be used with the invention.

Numerous benefits can be achieved using the present invention. Depending on the embodiment, one or more of the following benefits may be achieved.

• The computational complexity of the transcoding process can be reduced.

• The delay through the transcoding process can be reduced.

• The amount of memory required for transcoding can be reduced.

• Dynamic rate control can be introduced.

• Silence frames can be supported through an embedded voice activity detector.

• A framework that can use a variety of parameter mapping schemes is provided.

• A generic transcoding architecture is provided that can accommodate a variety of current and future CELP-based codecs.

The transcoding technique of the invention can achieve one or more of these benefits. In a specific embodiment, the transcoding apparatus includes:

• a source CELP parameter unpacking module that extracts CELP parameters from the input encoded CELP bitstream;

• a CELP parameter interpolator that converts the input source CELP parameters into destination CELP parameters corresponding to the difference in subframe size between the source and destination codecs (the interpolator is used only when the source and destination subframe sizes differ);

• a destination CELP parameter mapping and tuning engine that converts the CELP parameters from the interpolator module into destination CELP codec parameters;

• a destination CELP code packer that packs the mapped CELP parameters into destination CELP code frames;

• an advanced feature manager that manages optional functions and features of the CELP-to-CELP transcoding;

• a controller that supervises the whole transcoding process;

• a status reporting function that reports the status of the transcoding process.

The source CELP parameter unpacking module is a simplified CELP decoder without the formant filter and post-filter.

The CELP parameter interpolator comprises a set of interpolators, each associated with one or more CELP parameters.

The destination CELP parameter mapping and tuning module includes a parameter mapping scheme switching module and one or more of the following parameter mapping schemes: a CELP parameter direct space mapping module, an analysis in excitation space mapping module, and an analysis in filtered excitation space mapping module.

The invention performs transcoding on a subframe basis. That is, as soon as a frame (of source compressed information) is received by the transcoding system, the transcoder can begin operating on it and producing output subframes. Once enough subframes have been produced, a frame (of compressed information in the destination format) can be generated and, for communication purposes, sent over the communication channel; for storage purposes, the generated frame can be stored as required. If the frame durations defined by the source and destination format standards are the same, a single input frame produces a single output frame. If they differ, input frames must be buffered or multiple output frames must be generated. If the subframes have different durations, interpolation between subframe parameters is required. The transcoding operation therefore consists of four operations: (1) bitstream unpacking, (2) subframe buffering and interpolation of the source CELP parameters, (3) mapping and tuning to the destination CELP parameters, and (4) code packing to produce output frames.
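The buffering behaviour described above can be illustrated with a small bookkeeping sketch. It is not taken from the patent: the function name `frames_ready` is hypothetical, and the frame lengths used in the example (240 samples for G.723.1 and 160 samples for GSM-AMR at 8 kHz) come from the published codec standards rather than from this text.

```python
def frames_ready(src_frame_len, dst_frame_len, n_input_frames):
    """Count destination frames that can be emitted after each input frame.

    Lengths are in samples. Parameters covering `buffered` samples are held
    back until a whole destination frame's worth has accumulated.
    """
    buffered = 0
    emitted = []
    for _ in range(n_input_frames):
        buffered += src_frame_len
        count, buffered = divmod(buffered, dst_frame_len)
        emitted.append(count)
    return emitted


# Assumed example: 30 ms source frames (240 samples) to 20 ms frames (160 samples).
print(frames_ready(240, 160, 4))  # [1, 2, 1, 2]
# Reverse direction: 20 ms source frames to 30 ms destination frames.
print(frames_ready(160, 240, 4))  # [0, 1, 1, 0]
```

In the first direction every second input frame yields two output frames; in the reverse direction some input frames are only buffered, which matches the two cases the paragraph distinguishes.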

On receiving a frame, the transcoder unpacks the bitstream to produce the CELP parameters for each subframe contained in the frame (block (1) of FIG. 10). The relevant parameters are the LPC coefficients, the excitation (generated from the adaptive and fixed codewords), and the pitch lag. For a low-complexity solution that still yields good quality, only the excitation needs to be decoded; full synthesis of the speech waveform is not required. If subframe interpolation is needed, it is carried out by the smart interpolation engine (block (2) of FIG. 10).
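As a rough illustration of decoding only the excitation, the sketch below forms each excitation sample as a gain-scaled sample from one pitch period back (the adaptive contribution) plus a gain-scaled fixed-codebook sample. This is the textbook CELP structure, not the patent's own implementation: the function name and arguments are hypothetical, and the integer pitch lag is a simplification, since real codecs may use fractional lags with interpolation.

```python
def build_excitation(past, fixed_code, pitch_lag, pitch_gain, code_gain):
    """Decode one subframe of excitation without synthesising speech.

    past       -- previously decoded excitation (at least pitch_lag samples)
    fixed_code -- fixed-codebook vector for this subframe
    pitch_lag  -- integer pitch lag in samples (simplifying assumption)
    """
    if pitch_lag < 1 or len(past) < pitch_lag:
        raise ValueError("need at least pitch_lag samples of history")
    history = list(past)
    subframe = []
    for c in fixed_code:
        adaptive = history[-pitch_lag]  # sample one pitch period back
        e = pitch_gain * adaptive + code_gain * c
        history.append(e)  # new samples feed later adaptive lookups
        subframe.append(e)
    return subframe


print(build_excitation([1.0, 0.0, 0.0, 0.0], [0, 0, 0, 0, 1, 0], 4, 0.5, 2.0))
# [0.5, 0.0, 0.0, 0.0, 2.25, 0.0]
```

No LPC synthesis filter or post-filter is run here, which is where the complexity saving relative to a full decoder would come from.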

The subframes are then in a form that can be processed by the destination parameter mapping and tuning module (block (5) of FIG. 10). The short-term LPC filter coefficients are mapped independently of the excitation CELP parameters. A simple linear mapping in the LSP pseudo-frequency space can be used to produce the LSP coefficients for the destination codec. The excitation CELP parameters can be mapped in a number of ways that trade computational complexity against output quality. Three such mapping schemes are described in this document and form part of the parameter mapping and tuning scheme module (block (4) of FIG. 10):

• CELP parameter direct space mapping (DSM);

• analysis in the excitation space domain;

• analysis in the filtered excitation space domain.
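The linear LSP mapping mentioned above is not spelled out in this part of the text. One plausible reading, shown here purely as an assumption, is a weighted average of the LSP vectors of two neighbouring source subframes, with the weight chosen from where the destination subframe falls in time. The function names are hypothetical.

```python
def interpolate_lsp(lsp_prev, lsp_curr, weight):
    """Linearly interpolate two LSP vectors; weight=0 gives lsp_prev, 1 gives lsp_curr."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(lsp_prev) != len(lsp_curr):
        raise ValueError("LSP vectors must have the same order")
    return [(1.0 - weight) * p + weight * c for p, c in zip(lsp_prev, lsp_curr)]


def is_ordered(lsp):
    """LSPs of a stable LPC filter are strictly increasing."""
    return all(a < b for a, b in zip(lsp, lsp[1:]))


mapped = interpolate_lsp([0.25, 0.5, 1.0], [0.75, 1.0, 2.0], 0.5)
print(mapped, is_ordered(mapped))  # [0.5, 0.75, 1.5] True
```

A convex combination of two strictly increasing vectors is itself strictly increasing, so this kind of interpolation preserves the LSP ordering property without any extra correction step.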

The mapping and tuning scheme is selected by the mapping and tuning scheme switching module (block (3) of FIG. 10).

Because the three methods trade quality against computational load, they can be used to provide graceful degradation of quality when the apparatus is overloaded by a large number of simultaneous channels. The performance of the transcoder can thus adapt to the available resources. Alternatively, a transcoding system can be built with only one scheme that provides the desired quality and performance, in which case the mapping and tuning scheme switching module (block (3) of FIG. 10) is omitted.
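A switching policy of this kind might look like the sketch below. Everything in it is an assumption made for illustration: the load metric, the threshold values, and the cost ordering (direct space mapping taken as the cheapest scheme, filtered excitation analysis as the most expensive) are not stated in this part of the text.

```python
# Schemes ordered from assumed highest quality/cost to lowest.
FILTERED_EXCITATION = "filtered_excitation_analysis"
EXCITATION = "excitation_analysis"
DIRECT_SPACE = "direct_space_mapping"


def select_scheme(load, thresholds=(0.6, 0.85)):
    """Pick a mapping scheme from the current processor load (0.0 to 1.0).

    The thresholds are hypothetical tuning values, not figures from the patent.
    """
    low, high = thresholds
    if load < low:
        return FILTERED_EXCITATION
    if load < high:
        return EXCITATION
    return DIRECT_SPACE


for load in (0.1, 0.7, 0.95):
    print(load, select_scheme(load))
```

A system built around a single scheme would simply call that scheme directly and drop this function, as the paragraph above notes.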

At this point, if applicable to the destination standard, a voice activity detector (operating in the parameter space) can also be employed to reduce the outbound bandwidth.

The mapped parameters are packed into frames of the destination bitstream format (block (7) of FIG. 10) and are produced for transmission or storage.
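The packing step can be pictured as concatenating quantized parameter indices, each with a fixed bit width, into a byte string. The sketch below is a generic bit packer with a made-up field layout; the actual bit allocation and bit ordering of any destination codec are defined by its standard and are not reproduced here.

```python
def pack_fields(fields):
    """Pack (value, width_in_bits) pairs MSB first, zero-padding the last byte."""
    acc = 0
    nbits = 0
    for value, width in fields:
        if width <= 0 or not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        acc = (acc << width) | value
        nbits += width
    pad = (-nbits) % 8  # bits needed to reach a byte boundary
    acc <<= pad
    return acc.to_bytes((nbits + pad) // 8, "big")


# Hypothetical layout: a 3-bit index, a 1-bit flag, and a 4-bit gain index.
print(pack_fields([(5, 3), (1, 1), (10, 4)]).hex())  # ba
```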

The invention relates to algorithms and methods for performing smart transcoding between CELP (Code Excited Linear Prediction) based coding methods and standards. It also covers transcoding within a single standard in order to perform rate control (by transcoding to a lower mode or by introducing silence frames through an embedded voice activity detector (VAD)).

The whole transcoding process is supervised by a control module (block (8) of FIG. 10), which issues commands based on the state of the transcoding and on external commands.

To accommodate different transcoding requirements, the apparatus provides the ability to add optional features and functions (block (6) of FIG. 10).

Other features and advantages of the invention will become apparent from the following detailed description taken with the accompanying drawings, in which like reference characters designate the same or similar elements throughout the figures.

According to the present invention, techniques for processing information are provided. Specifically, the invention provides a method and apparatus for converting CELP frames from one CELP-based standard to another CELP-based standard, and/or between different modes within a single standard. A more detailed description follows.

The invention relates to algorithms and methods for performing smart transcoding between CELP (code excited linear prediction) based coding methods and standards. The focus is on CELP coding methods standardized by bodies such as the International Telecommunication Union (ITU) and the European Telecommunications Standards Institute (ETSI). The invention also covers transcoding within a single standard in order to perform rate control (by transcoding to a lower mode or by introducing silence frames through an embedded voice activity detector (VAD)).

Speech coding techniques can generally be classified into waveform coders (for example, the ITU standards G.722 and G.726) and analysis-by-synthesis (AbS) coders (for example, the ITU G.723.1 and G.729 standards, the ETSI GSM-AMR standard, and the Enhanced Variable-Rate Codec (EVRC) and Selectable Mode Vocoder (SMV) standards of the Telecommunications Industry Association (TIA)). Waveform coders operate in the time domain and are based on a sample-by-sample approach that exploits the correlation between speech samples. Analysis-by-synthesis coders operate frame by frame (frames of 10 to 30 ms are typical) and attempt to imitate the human speech production system with a simple model of a source (the glottis) and a filter (the vocal tract) that shapes the output speech spectrum.

Analysis-by-synthesis coders were introduced to provide high-quality speech at low bit rates, at the cost of greater computational requirements. Compression is an effective means of conserving resources on communication interfaces.

Mathematically, every speech codec starts with a one-dimensional analog speech signal $x_\alpha(t)$, which is uniformly sampled and quantized to obtain the digital-domain representation $x(n) = Q(x_\alpha(nT))$. The sampling rate $f = 1/T$ for speech is typically 8 kHz or 16 kHz, and the sampled signal is typically quantized to at most 16 bits.
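A minimal sketch of uniform sampling followed by quantization is given below. The function name is hypothetical, the analog signal is modelled as a Python callable returning values in [-1, 1], and the quantizer (scale, round, clip to signed 16-bit) is one simple choice of $Q$ rather than anything prescribed by the text.

```python
import math


def sample_and_quantize(signal, rate_hz, n_samples, bits=16):
    """Sample signal(t) every T = 1/rate_hz seconds and quantize to signed integers."""
    period = 1.0 / rate_hz
    max_code = (1 << (bits - 1)) - 1
    min_code = -(1 << (bits - 1))
    samples = []
    for n in range(n_samples):
        code = round(signal(n * period) * max_code)
        samples.append(max(min_code, min(max_code, code)))  # clip to range
    return samples


def tone(t):
    """A 1 kHz sine wave standing in for the analog input."""
    return math.sin(2 * math.pi * 1000 * t)


x = sample_and_quantize(tone, 8000, 8)
print(x[0], x[2], x[6])  # 0 32767 -32767
```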

CELP를 기반으로 한 코덱은 음성생성의 모델을 이용하여 샘플링된 음성x(n)과 일부 파라미터 공간θ사이를 매핑시키는 알고리즘으로 생각될 수 있다. 즉, 디지털 음성을 부호화 및 복호화하는 것이다. CELP를 기반으로 한 모든 알고리즘은 음성 프레임 단위(몇개의 서브프레임으로 더 분리할 수도 있다)로 동작하게 된다. 일부 코덱에 있어서, 음성 프레임은 서로 중첩된다. 음성 프레임은 임의의 시간n에서 시작하는 음성 샘플의 벡터로서 규정할 수 있다. 즉,A codec based on CELP can be thought of as an algorithm for mapping between sampled speech x ( n ) and some parameter space θ using a model of speech generation. In other words, the digital voice is encoded and decoded. All algorithms based on CELP operate in speech frame units (which can be further separated into several subframes). In some codecs, voice frames overlap each other. The speech frame may be defined as a vector of speech samples starting at any time n . In other words,

where $L$ is the length of the speech frame (the number of samples). Note that the frame index $i$ is related to the first sample $n$ of the frame by a linear relationship:

$n = iL$ for non-overlapping frames

$n = i(L - K)$ for overlapping frames

where $K$ is the number of samples that overlap between consecutive frames.
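The two indexing rules above can be sketched in a few lines of Python. This is a minimal illustration only; the function names `frame_start` and `get_frame` are hypothetical and do not come from the specification.

```python
def frame_start(i, frame_len, overlap=0):
    """Return n, the index of the first sample of frame i.

    n = i * L for non-overlapping frames (overlap == 0),
    n = i * (L - K) when consecutive frames share K samples.
    """
    if not 0 <= overlap < frame_len:
        raise ValueError("overlap must satisfy 0 <= overlap < frame_len")
    return i * (frame_len - overlap)


def get_frame(x, i, frame_len, overlap=0):
    """Return frame i of the sample sequence x as a list of frame_len samples."""
    n = frame_start(i, frame_len, overlap)
    return list(x[n:n + frame_len])
```

With 160-sample frames, for instance, frame 3 starts at sample 480 without overlap and at sample 360 when 40 samples are shared between neighbouring frames.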

Compression (lossy encoding) is a function that maps a speech frame onto the parameters $\theta_i$, and decoding maps the parameters $\theta_i$ back to an approximation of the original speech frame. The speech frame produced by the decoder is not identical to the frame that was originally encoded. A codec is designed to produce output speech that is perceptually as similar as possible to the input speech. That is, the encoder must produce parameters that maximize some perceptual similarity measure between the input speech frame and the frame the decoder generates from those parameters.

In general, the mapping from input to parameters, and from parameters to output, requires knowledge of all previous inputs or parameters. This can be achieved by maintaining state within the codec, $S$, for example in the construction of the adaptive codebook used by CELP-based methods. The encoder state and the decoder state must remain synchronized. This is achieved by updating the state only from data that both sides (encoder and decoder) possess, namely the parameters. FIG. 3 shows a general model of an encoder, a channel and a decoder.

The frame parameters $\theta$ used in CELP-based models consist of linear predictive coefficients (LPCs), which are used for short-term prediction of the speech signal (and relate physically to the vocal tract, mouth, nasal cavity and lips), together with an excitation signal composed of adaptive and fixed codes. The adaptive code is used to model the long-term pitch information in the speech. The codes (adaptive and fixed) are associated with codebooks that are predefined for a particular CELP codec. FIG. 1 shows a typical CELP decoder, in which the adaptive and fixed codebook vectors are scaled independently by gain factors, then combined and filtered to produce synthesized speech. This speech is passed through a post-filter to remove artifacts introduced by the model.
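The decoder structure just described (gain-scaled codebook vectors, summed and passed through the short-term filter) can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the LPC convention $A(z) = 1 + \sum_k a_k z^{-k}$ is assumed, and the post-filter is omitted.

```python
def build_excitation(adaptive_vec, fixed_vec, gain_pitch, gain_code):
    """Scale the adaptive and fixed codebook vectors by their gains and add them."""
    return [gain_pitch * a + gain_code * c
            for a, c in zip(adaptive_vec, fixed_vec)]


def lp_synthesis(excitation, lpc, memory=None):
    """All-pole synthesis filter 1/A(z), with A(z) = 1 + sum_k lpc[k-1] z^-k.

    memory holds the previous outputs, most recent first (zeros if omitted).
    """
    p = len(lpc)
    mem = list(memory) if memory is not None else [0.0] * p
    out = []
    for e in excitation:
        s = e - sum(a * m for a, m in zip(lpc, mem))
        out.append(s)
        mem = ([s] + mem)[:p]  # shift the filter memory
    return out
```

Feeding a unit impulse through a first-order filter with `lpc = [-0.5]` yields the decaying response 1.0, 0.5, 0.25, which is a quick sanity check of the sign convention assumed here.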

The CELP encoding (analysis) process, shown in FIG. 2, comprises preprocessing of the speech signal to remove undesired frequency components, application of a windowing function, and extraction of the short-term LPC parameters. The LPC extraction is typically implemented with the Levinson-Durbin algorithm. The LPC parameters are converted to line spectral pairs (LSPs) to facilitate quantization and subframe interpolation. The speech is then inverse filtered by the short-term LPC filter to produce a residual excitation signal. This residual is perceptually weighted to improve quality and analyzed to find an estimate of the speech pitch. A closed-loop analysis-by-synthesis technique is used to determine the optimal pitch. Once the pitch has been found, the adaptive codebook component of the excitation is subtracted from the residual and the optimal fixed codeword is found. The internal memory of the encoder is updated to reflect the changes in codec state (the adaptive codebook and so on).
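For readers unfamiliar with the Levinson-Durbin recursion mentioned above, the following is a textbook sketch in Python. It is not taken from any codec's reference code; it assumes the autocorrelation values `r[0..order]` of a windowed speech frame have already been computed, and it uses the convention $A(z) = 1 + \sum_k a_k z^{-k}$.

```python
def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns (a, err) where a[0] == 1.0, a[1..order] are the prediction-error
    filter coefficients, and err is the final prediction error energy.
    """
    if r[0] <= 0.0:
        raise ValueError("r[0] must be positive")
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err                      # reflection coefficient
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m                # updated error energy
    return a, err
```

For the autocorrelation sequence 1, 0.5, 0.25 (a first-order process), the recursion returns `a[1] = -0.5`, `a[2] = 0` and an error energy of 0.75.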

The simplest method of transcoding is the brute-force approach called tandem transcoding (see FIG. 4). This method performs a full decode of the incoming compressed bits to produce synthesized speech, and the synthesized speech is then encoded for the target standard. It suffers from the large amount of computation required to re-encode the signal, from quality degradation introduced by pre- and post-filtering of the speech waveform, and from potential delay introduced by the look-ahead requirements of the encoder.

Methods for "smart" transcoding similar to that shown in FIG. 5 have appeared in the literature. However, these methods still need to reconstruct the speech signal and then perform significant work to extract the various CELP parameters such as the LPCs and pitch. That is, they still operate in the speech signal space. In particular, the excitation signal that was already optimally matched to the original speech by the far-end encoder (the encoder at the far end that produced the compressed speech) is used only to generate the synthesized speech. The synthesized speech is then used to compute a new optimal excitation. Because of the impulse response filtering required in the closed-loop searches, this is a very computation-intensive operation. FIG. 6 shows the method disclosed in U.S. Pat. No. 6,260,009 B1. The reconstructed signal used as the target signal by the searchers is produced from the input excitation parameters and the quantized output formant filter coefficients. Because the quantized formant filter coefficients of the source and destination codecs differ, the target signal for the searchers is degraded, and the output speech quality of the transcoding suffers significantly. Other limitations are described more specifically throughout the present specification.

Another "smart" transcoding method, published as U.S. patent application US2002/0077812 A1, is shown in FIG. 7. This method performs transcoding by mapping each CELP parameter directly, ignoring the interaction between the CELP parameters. It is applicable only in special cases that impose very restrictive conditions on the source and destination CELP codecs. For example, it requires algebraic CELP (ACELP) and the same subframe size in both the source and destination codecs. It does not produce good-quality speech for most CELP-based transcoding. The method is suitable for only one of the GSM-AMR modes and does not cover all GSM-AMR modes.

The method and apparatus of the invention are described in detail below. In the following description, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of the invention. GSM-AMR and G.723.1 are used for illustration. The methods described here are general and apply to transcoding between any pair of CELP codecs. Those skilled in the relevant art will recognize that other steps, structures and arrangements can be used without departing from the scope of the invention.

The invention relates to algorithms and methods used to perform smart transcoding between CELP-based speech coding standards. The invention also relates to transcoding within a single standard in order to perform rate control (by transcoding to a lower mode, or by introducing silence frames through an embedded voice activity detector (VAD)). The invention is described in detail below.

The invention performs transcoding on a subframe-by-subframe basis. That is, as a frame is received by the transcoding system, the transcoder can begin operating on its subframes and producing output subframes. Once a sufficient number of subframes has been produced, a frame can be generated. If the frame durations defined by the source and destination standards are the same, one input frame produces one output frame; otherwise, buffering of input frames or generation of multiple output frames is required. If the subframes have different durations, interpolation between the subframe parameters is required. Transcoding therefore consists of four operations: (1) bitstream unpacking, (2) subframe buffering and interpolation of the source CELP parameters, (3) mapping and tuning to the destination CELP parameters, and (4) code packing to produce the output frames (see FIG. 8).
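The buffering requirement described above follows from simple arithmetic on the frame lengths. The sketch below, with the hypothetical helper `frames_per_cycle`, computes the smallest number of source and destination frames (or subframes) that span the same number of samples; lengths are assumed to be given as integer sample counts.

```python
from math import gcd


def frames_per_cycle(src_len, dst_len):
    """Return (n_src, n_dst): the smallest frame counts covering the same span.

    src_len and dst_len are frame (or subframe) lengths in samples.
    """
    common = src_len * dst_len // gcd(src_len, dst_len)  # least common multiple
    return common // src_len, common // dst_len
```

At 8 kHz, a 30 ms frame is 240 samples and a 20 ms frame is 160 samples, so `frames_per_cycle(240, 160)` gives `(2, 3)`: two 30 ms frames must be buffered to emit three 20 ms frames. When both lengths are equal the result is `(1, 1)` and no buffering is needed.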

FIG. 10 is a block diagram illustrating the principle of the CELP-based codec transcoding apparatus according to the invention. The apparatus comprises a source bitstream unpacking module, a smart interpolation engine, a parameter mapping and tuning module, an optional advanced feature module, a control module, and a destination bitstream packing module.

The parameter mapping and tuning module comprises a mapping and tuning strategy switching module and a parameter mapping and tuning strategies module.

The transcoding operation is overseen by the control module.

When a frame is received, the transcoder unpacks the bitstream to produce the CELP parameters for each subframe contained in the frame. The parameters of interest are the LPC coefficients, the excitation (produced from the adaptive and fixed codewords) and the pitch lag.

Note that only the excitation needs to be decoded; a full synthesis of the speech waveform is not required. This greatly reduces the complexity of unpacking the source codec bitstream. The codebook gains and fixed codewords are also relevant to the CELP parameter Direct Space Mapping (DSM) transcoding strategy. Subframe interpolation is then carried out if it is needed.

The subframes are now in a form that can be processed by the destination parameter mapping and tuning module shown in FIG. 14. The short-term LPC filter coefficients are mapped independently of the excitation CELP parameters. A simple linear mapping in the LSP pseudo-frequency space can be used to produce the LSP coefficients for the destination codec. More sophisticated non-linear interpolation can also be used. The excitation CELP parameters can be mapped in several ways, which offer better output quality at the price of more complex computation. Three such mapping strategies are disclosed here, and they form part of the parameter mapping and tuning strategies module (block (4) of FIG. 10).

The mapping and tuning strategy is selected in the mapping and tuning strategy switching module (block (3) of FIG. 10).

These three methods are described in detail below. Because they trade quality for reduced computational load, they can be used to provide graceful degradation of quality when the apparatus is overloaded by a large number of simultaneous channels. The performance of the transcoder can thus adapt to the available resources. Alternatively, a transcoding system can be built using only the one strategy that provides the desired quality and performance. In that case, the mapping and tuning strategy switching module (block (3) of FIG. 10) is not included.

At this stage, if the destination standard supports it, a voice activity detector (operating in the parameter space) can be employed to reduce the outbound bandwidth.

The output of the parameter mapping and tuning module is the set of destination CELP codec codes. These are packed into destination bitstream frames according to the frame format of the destination CELP codec. The packing process is needed to arrange the output bits in a format that the destination CELP decoder can understand. For storage applications, the destination CELP parameters may be packed or may be stored in an application-specific format. The packing process can also vary when the frames are to be transported according to a multimedia protocol; for example, bit scrambling may be implemented in the packing process.

The apparatus of the invention also provides the capability to add optional signal processing functions or modules in the future.

Subframe Interpolation

Subframe interpolation is needed when the subframes of the different standards represent different time durations in the signal domain, or when different sampling rates are used. For example, G.723.1 uses frames of 30 ms duration (7.5 ms per subframe), while GSM-AMR uses frames of 20 ms duration (5 ms per subframe). This is illustrated in FIG. 9. Subframe interpolation is performed on two different types of parameters: (1) sample-wise parameters (e.g., the excitation and codeword vectors) and (2) subframe-wide parameters (e.g., the LSP coefficients and pitch lag estimates). Sample-wise parameters are mapped by considering their discrete time index and copying them to the appropriate position in the target subframe. Upsampling or downsampling may be required if the CELP standards use different sampling rates. Subframe-wide parameters are interpolated by some interpolation function to produce a smooth estimate of the parameters in the target subframe. A smart interpolation algorithm can improve speech transcoding not only in computational performance but, more importantly, in speech quality. A simple choice of interpolation function is a linear interpolator.
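The sample-wise copy described above can be sketched as a simple re-segmentation, assuming both codecs use the same sampling rate so that no resampling is needed. The helper name `resegment` is hypothetical.

```python
def resegment(src_subframes, dst_len):
    """Copy sample-wise parameters into destination subframes by time index.

    src_subframes is a list of equal-rate sample vectors (e.g. excitation).
    Returns (blocks, leftover): complete dst_len-sample blocks and any
    trailing samples that must be buffered until more input arrives.
    """
    samples = [s for sf in src_subframes for s in sf]
    n_full = len(samples) // dst_len
    blocks = [samples[k * dst_len:(k + 1) * dst_len] for k in range(n_full)]
    leftover = samples[n_full * dst_len:]
    return blocks, leftover
```

Two 60-sample subframes (7.5 ms each at 8 kHz) re-segment into exactly three 40-sample subframes (5 ms each) with nothing left over, whereas a single 60-sample subframe yields one 40-sample block and 20 buffered samples.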

As an example, FIG. 9 shows that three GSM-AMR frames are needed to describe the same duration of speech signal as two G.723.1 frames. Likewise, three GSM-AMR subframes are needed for every two G.723.1 subframes. As noted above, there are two types of parameters: subframe-wide parameters (e.g., the LSP coefficients) and sample-wise parameters (e.g., the adaptive and fixed codewords).

Subframe-wide parameters, denoted $\theta$, are converted linearly by computing a weighted sum of the overlapping subframes, and sample-wise parameters, denoted $v[\cdot]$, are formed by copying the appropriate samples. For interpolation from G.723.1 subframes to GSM-AMR subframes, the analytical formula is expressed as follows.

Here, $i = 0$ is the first subframe of the first GSM-AMR frame, $i = 4$ is the first subframe of the second GSM-AMR frame, and so on. FIG. 12 illustrates this process.
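One plausible reading of the weighted sum over overlapping subframes is to weight each source subframe by the fraction of the destination subframe it covers. The sketch below implements that reading; it is an assumption for illustration and is not claimed to reproduce the formula of the specification. The helper names are hypothetical and lengths are in samples.

```python
def overlap_weights(dst_index, src_len, dst_len):
    """Return {src_index: weight} for one destination subframe.

    Each weight is the fraction of the destination subframe that lies inside
    the corresponding source subframe, so the weights sum to 1.
    """
    start = dst_index * dst_len
    end = start + dst_len
    weights = {}
    j = start // src_len
    while j * src_len < end:
        lo = max(start, j * src_len)
        hi = min(end, (j + 1) * src_len)
        weights[j] = (hi - lo) / dst_len
        j += 1
    return weights


def interpolate_param(src_params, dst_index, src_len, dst_len):
    """Linearly combine scalar subframe-wide parameters by overlap weight."""
    w = overlap_weights(dst_index, src_len, dst_len)
    return sum(weight * src_params[j] for j, weight in w.items())
```

With 60-sample source subframes and 40-sample destination subframes, destination subframe 0 lies entirely in source subframe 0, subframe 1 straddles source subframes 0 and 1 equally, and subframe 2 lies entirely in source subframe 1.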

The LSP parameters, being subframe-wide parameters, should be interpolated in the pseudo-frequency domain, i.e., $f = \cos^{-1}(q)$. This gives better output quality. The other subframe-wide parameters do not need to be transformed before interpolation.
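A minimal sketch of interpolating in the pseudo-frequency domain follows. It assumes the LSPs are supplied in the cosine domain ($q = \cos f$) and that a simple two-point linear weighting is used; the function name `interpolate_lsp` is hypothetical.

```python
import math


def interpolate_lsp(q_a, q_b, w_a):
    """Interpolate two cosine-domain LSP vectors in the pseudo-frequency domain.

    Each q is mapped to f = acos(q), the frequencies are mixed with weights
    w_a and (1 - w_a), and the result is mapped back with cos().
    """
    f_a = [math.acos(q) for q in q_a]
    f_b = [math.acos(q) for q in q_b]
    f = [w_a * x + (1.0 - w_a) * y for x, y in zip(f_a, f_b)]
    return [math.cos(v) for v in f]
```

Mixing in frequency rather than in the cosine domain gives a different result in general; the midpoint of $q = 1$ and $q = -1$ in frequency is $\pi/2$, which maps back to a value of approximately 0.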

The analytical formula above is derived from a simple linear interpolator. It is important to note that it can be replaced by any other suitable interpolation scheme, such as spline or sinusoidal interpolation. Furthermore, each CELP parameter (LSP coefficients, lag, pitch gain, codeword gain and so on) may use a different interpolation scheme in order to obtain the best perceptual quality.

LSP Parameter Mapping and Excitation Vector Calibration by LSP Coefficients

Although almost all CELP-based audio codecs obtain their LPC coefficients in the same way, there are still some subtle differences. These arise from different window sizes and shapes, different LPC interpolation for each subframe, different subframe sizes, different LPC quantization schemes, and different lookup tables.

To further improve the transcoded audio quality obtained through the subframe interpolation described above, the excitation vectors used as the target signal in transcoding are calibrated by applying LPC data from both the source and destination codecs.

The following two methods can be adopted to improve the perceptual quality.

Method 1: Linear Transformation of the LSP Coefficients

A general method for converting between LSP coefficients is the linear transformation

$$q' = Aq + b$$

where $q'$ is the destination LSP vector (in the pseudo-frequency domain), $q$ is the source (original) LSP vector, $A$ is a linear transformation matrix, and $b$ is a bias term. In the simplest case, $A$ reduces to the identity matrix and $b$ reduces to zero. In the embodiment of a GSM-AMR to G.723.1 transcoder, the DC bias term used in the GSM-AMR codec differs from that used in G.723.1, and the $b$ term in the equation above is used to compensate for this difference.
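The linear LSP mapping above amounts to a matrix-vector product plus a bias. The following sketch uses plain Python lists; the name `map_lsp` and the numeric values in the usage note are illustrative assumptions, not values from either codec.

```python
def map_lsp(q, A, b):
    """Apply q' = A q + b to an LSP vector in the pseudo-frequency domain.

    A is given as a list of rows; b is a list with one bias per output.
    """
    if len(A) != len(b) or any(len(row) != len(q) for row in A):
        raise ValueError("dimension mismatch between A, q and b")
    return [sum(a_ij * q_j for a_ij, q_j in zip(row, q)) + b_i
            for row, b_i in zip(A, b)]
```

With `A` set to the identity matrix and `b` to zeros, the source vector passes through unchanged, which corresponds to the simplest case described above. A non-zero `b` such as `[0.01, -0.02]` (arbitrary example values) shifts each coefficient to compensate for a bias difference.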

Method 2: Excitation Vector Calibration by LSP Coefficients

The decoded source excitation vectors are synthesized with the source LPC coefficients of each subframe to convert them to the speech domain, and are then filtered using the quantized LP parameters of the destination codec to form the target signal for transcoding. This calibration is optional, and it can significantly improve the perceptual speech quality when the LPC parameters differ markedly. FIG. 13 illustrates the excitation calibration scheme.
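The calibration step can be sketched as a synthesis filter followed by an analysis filter. This sketch assumes that "filtered using the quantized LP parameters of the destination codec" means inverse (analysis) filtering with $A_{dst}(z)$, that $A(z) = 1 + \sum_k a_k z^{-k}$, and that filter memories start at zero; a practical implementation would carry filter state across subframes. All function names are hypothetical.

```python
def synthesize(excitation, lpc):
    """All-pole filter 1/A(z): excitation -> speech domain (zero initial state)."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, len(lpc) + 1):
            if n - k >= 0:
                acc -= lpc[k - 1] * out[n - k]
        out.append(acc)
    return out


def analyze(speech, lpc):
    """FIR filter A(z): speech -> excitation domain (zero initial state)."""
    out = []
    for n, s in enumerate(speech):
        acc = s
        for k in range(1, len(lpc) + 1):
            if n - k >= 0:
                acc += lpc[k - 1] * speech[n - k]
        out.append(acc)
    return out


def calibrate_excitation(src_exc, src_lpc, dst_lpc):
    """Synthesize with the source LPCs, then re-analyze with the destination LPCs."""
    return analyze(synthesize(src_exc, src_lpc), dst_lpc)
```

When the source and destination LPCs are identical the two filters cancel and the excitation is returned unchanged, which is consistent with the calibration mattering only when the LPC parameters differ.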

Parameter Mapping & Tuning Module

Three strategies for mapping the CELP excitation parameters are described below. They are presented in order of increasing computational complexity and output quality. The core of the invention is that the excitation can be mapped directly, without reconstructing the speech signal. This means that significant computation is saved during the closed-loop codebook searches, because the signals do not need to be filtered by the short-term impulse response as the prior art requires. The mapping works because the incoming bitstream already contains the excitation that is optimal for generating the speech according to the source CELP codec. The invention exploits this fact to perform a rapid search in the excitation domain instead of the speech domain.

As noted above, the three excitation mapping methods give successively better quality, so the transcoder can adapt to the computing resources available.

CELP Parameters Direct Space Mapping

This strategy is the simplest transcoding scheme. The mapping is based on physically meaningful similarities between the source and destination parameters, and the transcoding is performed directly using analytical formulas, without any iteration or searching. The advantage of this scheme is that it requires little memory and consumes almost zero MIPS; although the speech quality is degraded, it still produces intelligible sound. Importantly, the CELP parameter direct space mapping method of the invention differs from the prior-art apparatus shown in FIG. 7. The method of the invention is general and applies to all kinds of CELP-based transcoding, including cases where the source and destination use different CELP codes or different frame or subframe sizes.

Analysis in Excitation Space Domain

This strategy is more advanced than the previous one: the adaptive and fixed codebooks are searched, and the gains estimated, in the usual way defined by the destination CELP standard, except that this is done in the excitation domain rather than the speech domain. The pitch contribution is determined first, by a local search that uses the pitch from the input CELP subframe as the initial estimate. Once it has been found, the pitch contribution is subtracted from the excitation, and the fixed codebook is determined by optimally matching the residual. The advantage over the tandem approach is that the open-loop pitch estimate does not need to be computed by the autocorrelation method used by the CELP standards; it can instead be determined from the pitch lag of the decoded CELP subframe. Moreover, because the search is performed in the excitation domain rather than the speech domain, impulse response filtering during the pitch and codebook searches is not required. This saves a significant amount of computation without compromising output quality.
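The local pitch search in the excitation domain can be sketched as below. This is a simplified illustration under several assumptions that are not in the specification: integer lags only, lags no shorter than the subframe length, a normalized-correlation criterion, and hypothetical names (`local_pitch_search`, `radius`). No impulse response filtering is involved, since the target is the excitation itself.

```python
def local_pitch_search(target, past_exc, initial_lag, radius=2):
    """Search integer lags near initial_lag for the best adaptive-codebook match.

    target   : excitation of the current subframe (list of floats)
    past_exc : previous excitation samples, most recent last
    Returns (lag, gain, residual), where residual is the target minus the
    gain-scaled pitch contribution; the residual is what the fixed codebook
    would then be matched against. Returns (None, 0.0, target) if no lag fits.
    """
    n = len(target)
    best = (None, 0.0, list(target))
    best_score = 0.0
    for lag in range(initial_lag - radius, initial_lag + radius + 1):
        if lag < n or lag > len(past_exc):
            continue  # simplification: skip lags shorter than the subframe
        start = len(past_exc) - lag
        cand = past_exc[start:start + n]
        energy = sum(c * c for c in cand)
        corr = sum(t * c for t, c in zip(target, cand))
        if energy == 0.0 or corr <= 0.0:
            continue
        score = corr * corr / energy
        if score > best_score:
            gain = corr / energy
            residual = [t - gain * c for t, c in zip(target, cand)]
            best, best_score = (lag, gain, residual), score
    return best
```

If the past excitation is exactly periodic with period 50 and the decoded source lag is 52, a search radius of 3 recovers lag 50 with a gain of 1 and a zero residual.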

Analysis in Filtered Excitation Space Domain

In this case, the LP parameters are still mapped directly from the source codec to the destination codec, and the decoded pitch lag is used as the open-loop pitch estimate for the destination codec. The closed-loop pitch search is still performed in the excitation domain. The fixed codebook search, however, is performed in a filtered excitation space domain. The choice of filter type, and of whether the target vector is converted to this domain for one or both searches, depends on the desired quality and complexity requirements.

A variety of filters can be used, including a low-pass filter to smooth irregularities, a filter that compensates for differences between the excitation characteristics of the source and destination codecs, and a filter that enhances perceptually important signal characteristics. The advantage is that, unlike the target signal computation in standard encoding, which uses a weighted LP synthesis filter, the parameters of this filter (order, frequency emphasis/de-emphasis, phase) are fully tunable. This strategy therefore allows tuning to improve the quality of transcoding between a particular codec pair, and to trade quality against reduced complexity.

Silence Frame Transcoding and Generation

Some CELP-based standards implement voice activity detectors (VADs) that enable discontinuous transmission (DTX) and comfort noise generation (CNG) during periods without speech. Transcoding between such frames is required, as is the generation of silence frames for the destination codec when the source codec does not produce them. In general, these frames consist of parameters for generating suitable comfort noise at the decoder. These parameters can be transcoded using simple algebraic methods.

Embodiments of the Invention

Embodiments of the invention for the G.723.1 and GSM-AMR speech coding standards are described below. The invention is not limited to these standards; it applies to all CELP-based coding standards. Those skilled in the art will understand how to apply these methods to transcoding between other CELP-based coding standards. Before the preferred embodiments are described, a brief description of the GSM-AMR and G.723.1 codecs is given.

GSM-AMR Codec

The GSM-AMR codec uses eight source codecs with bit rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s.

The codec is based on the code-excited linear prediction (CELP) coding model. A 10th-order linear prediction (LP), or short-term, synthesis filter is used. The long-term, or pitch, synthesis filter is implemented using the so-called adaptive codebook approach.

In the CELP speech synthesis model, the excitation signal at the input of the short-term LP synthesis filter is constructed by adding two excitation vectors, one from the adaptive codebook and one from the fixed (innovative) codebook. Speech is synthesized by feeding the two properly chosen vectors from these codebooks through the short-term synthesis filter. The optimum excitation sequence in a codebook is chosen using an analysis-by-synthesis search procedure in which the error between the original and the synthesized speech is minimized according to a perceptually weighted distortion measure. The perceptual weighting filter used in the analysis-by-synthesis search uses the unquantized LP parameters.

The coder operates on speech frames of 20 ms, corresponding to 160 samples at a sampling frequency of 8000 samples/s. For every 160 speech samples, the speech signal is analyzed to extract the parameters of the CELP model (LP filter coefficients, and the indices and gains of the adaptive and fixed codebooks). These parameters are encoded and transmitted. At the decoder, the parameters are decoded and speech is synthesized by filtering the reconstructed excitation signal through the LP synthesis filter.

LP analysis is performed twice per frame for the 12.2 kbit/s mode and once per frame for the other modes. For the 12.2 kbit/s mode, the two sets of LP parameters are converted to line spectrum pairs (LSP) and jointly quantized using split matrix quantization (SMQ) with 38 bits. For the other modes, the single set of LP parameters is converted to line spectrum pairs (LSP) and vector quantized using split vector quantization (SVQ).

The speech frame is divided into four subframes of 5 ms each (40 samples). The adaptive and fixed codebook parameters are transmitted every subframe. The quantized and unquantized LP parameters, or their interpolated versions, are used depending on the subframe. An open-loop pitch delay is estimated in every other subframe (except in the 5.15 and 4.75 kbit/s modes, where it is estimated once per frame) based on the perceptually weighted speech signal.

The following operations are repeated for each subframe:

ㆍ The target signal is computed by filtering the LP residual through the weighted synthesis filter, with the initial filter states updated by filtering the error between the LP residual and the excitation (this is equivalent to the common approach of subtracting the zero-input response of the weighted synthesis filter from the weighted speech signal).

ㆍ The impulse response of the weighted synthesis filter is computed.

ㆍ Closed-loop pitch analysis is performed (to find the pitch delay and gain) using the target and the impulse response, by searching around the open-loop pitch delay. A fractional pitch with 1/6th or 1/3rd of a sample resolution (depending on the mode) is used.

ㆍ The target signal is updated by removing the adaptive codebook contribution (the filtered adaptive codevector), and this new target is used in the fixed algebraic codebook search (to find the optimum innovation codeword).

ㆍ The gains of the adaptive and fixed codebooks are scalar quantized with 4 and 5 bits respectively, or vector quantized with 6-7 bits (with moving average (MA) prediction applied to the fixed codebook gain).

ㆍ Finally, the filter memories are updated (using the determined excitation signal) for finding the target signal in the next subframe.

In each 20 ms speech frame, 95, 103, 118, 134, 148, 159, 204 or 244 bits are allocated, corresponding to bit rates of 4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2 or 12.2 kbit/s, respectively.
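The bit allocations listed above follow directly from the bit rate and the 20 ms frame duration. The following is a small arithmetic check, not part of the original disclosure; the function and constant names are illustrative.

```python
# GSM-AMR mode bit rates in bit/s (taken from the text) and the frame duration.
AMR_RATES_BPS = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]
FRAME_MS = 20


def bits_per_frame(rate_bps, frame_ms=FRAME_MS):
    """Number of bits carried by one speech frame at the given bit rate."""
    return rate_bps * frame_ms // 1000


AMR_FRAME_BITS = [bits_per_frame(rate) for rate in AMR_RATES_BPS]
print(AMR_FRAME_BITS)
```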

G.723.1 Codec

The G.723.1 codec has two bit rates associated with it, 5.3 kbps and 6.3 kbps. Both rates are a mandatory part of the encoder and decoder. It is possible to switch between the two rates at any 30 ms frame boundary.

The coder is based on the principles of linear prediction analysis-by-synthesis coding and attempts to minimize a perceptually weighted error signal. The encoder operates on blocks (frames) of 240 samples each, which corresponds to 30 ms at an 8 kHz sampling rate. Each block is first high-pass filtered to remove the DC component and then divided into four subframes of 60 samples each. For every subframe, a 10th-order linear prediction coder (LPC) filter is computed using the unprocessed input signal. The LPC filter for the last subframe is quantized using a predictive split vector quantizer (PSVQ). The unquantized LPC coefficients are used to construct the short-term perceptual weighting filter, which is used to filter the entire frame and obtain the perceptually weighted speech signal.

For every two subframes (120 samples), the open-loop pitch period L_OL is computed using the weighted speech signal. This pitch estimation is performed on blocks of 120 samples. The pitch period is searched in the range from 18 to 142 samples.

From this point on, the speech is processed subframe by subframe, 60 samples at a time.

Using the estimated pitch period computed previously, a harmonic noise shaping filter is constructed. The combination of the LPC synthesis filter, the formant perceptual weighting filter and the harmonic noise shaping filter is used to create an impulse response. The impulse response is then used for further computations.

Using the pitch period estimate L_OL and the impulse response, a closed-loop pitch predictor is computed. A fifth-order pitch predictor is used. The pitch period is computed as a small differential value around the open-loop pitch estimate. The contribution of the pitch predictor is then subtracted from the initial target vector. Both the pitch period and the differential value are transmitted to the decoder.

Finally, the non-periodic component of the excitation is approximated. For the high bit rate, multi-pulse maximum likelihood quantization (MP-MLQ) excitation is used, and for the low bit rate, an algebraic codebook excitation (ACELP) is used.

First Embodiment - GSM-AMR to G.723.1

FIG. 17 is a block diagram showing a transcoder from GSM-AMR to G.723.1 according to the first embodiment of the present invention. The GSM-AMR bitstream consists of 20 ms frames whose length ranges from 244 bits (31 bytes) for the highest-rate mode, 12.2 kbps, down to 95 bits (12 bytes) for the lowest-rate mode, 4.75 kbps. There are eight modes in all, and each of the eight GSM-AMR operating modes produces a different bitstream. Since a G.723.1 frame, with a duration of 30 ms, spans one and a half GSM-AMR frames, two GSM-AMR frames are needed to produce a single G.723.1 frame. The next G.723.1 frame can be produced when the third GSM-AMR frame arrives. Thus, two G.723.1 frames are produced for every three GSM-AMR frames processed.
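The 3:2 frame relationship described above can be illustrated with a short scheduling sketch. It only counts samples (160 per GSM-AMR frame and 240 per G.723.1 frame at 8 kHz); the function name and the simple sample-counting buffer are illustrative assumptions, not the transcoder's actual buffering logic.

```python
AMR_FRAME_SAMPLES = 160    # 20 ms at 8 kHz
G7231_FRAME_SAMPLES = 240  # 30 ms at 8 kHz


def g7231_frames_ready(num_amr_frames):
    """For each arriving GSM-AMR frame, return how many G.723.1 frames can be produced."""
    buffered = 0
    produced = []
    for _ in range(num_amr_frames):
        buffered += AMR_FRAME_SAMPLES
        count = buffered // G7231_FRAME_SAMPLES
        buffered -= count * G7231_FRAME_SAMPLES
        produced.append(count)
    return produced


# No output after the first frame, then one G.723.1 frame after each of the next two.
print(g7231_frames_ready(6))
```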

The ten LSP parameters used by the short-term filter in the GSM-AMR speech production model are encoded using the same technique in every mode, but the bitstream format differs between operating modes. The algorithm for reconstructing the LSP parameters is given in the GSM-AMR standard documentation.

Once the short-term filter parameters have been generated for each subframe, the excitation vector needs to be formed by combining the adaptive codeword and the fixed (algebraic) codeword. The adaptive codeword is constructed using a 60-tap interpolation filter based on the pitch delay parameter at 1/6th or 1/3rd resolution. The fixed codeword is then constructed as defined by the standard, and the excitation is formed as follows.
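A reconstruction of the excitation equation, consistent with the symbol definitions that follow and assuming the usual CELP excitation model (the gain symbols $\hat{g}_p$ and $\hat{g}_c$ are assumed names):

```latex
x[n] = \hat{g}_p \, v[n] + \hat{g}_c \, c[n]
```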

where x is the excitation, v is the interpolated adaptive codeword, c is the fixed codevector, and the two factors that scale v and c are the adaptive and fixed codebook gains, respectively. The excitation is used to update the memory state of the GSM-AMR unpacker and is used by the G.723.1 bitstream packer for mapping.

The adaptive codeword is found for each subframe by forming a linear combination of excitation vectors and finding the best match to the target excitation signal x[] constructed by the GSM-AMR unpacker. The combination is a weighted sum of the previous excitation at five successive delays. This is best explained by the following equation.
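A reconstruction of this equation from the symbol definitions that follow, assuming the five delays are centred on L and that n runs over the 60 samples of a G.723.1 subframe with negative buffer indices denoting past samples (the index convention is an assumption):

```latex
v[n] = \sum_{j=-2}^{2} \beta_j \, u[n - L + j], \qquad 0 \le n \le 59
```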

where v[] is the reconstructed adaptive codeword, u[] is the previous excitation buffer, L is the (integer) pitch delay between 18 and 143 (determined by the GSM-AMR unpacking module), and β_j are the delay weights, which determine the gain and the delay phase. The vector table of β_j values is searched to optimize the match between the adaptive codeword v[] and the excitation vector x[].

Once the adaptive codebook component of the excitation has been found, it is subtracted from the excitation, leaving a residual to be encoded by the fixed codebook. The residual signal for each subframe is calculated as follows.
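A reconstruction of the residual equation from the definitions that follow, assuming a 60-sample G.723.1 subframe; no separate gain appears because v[] is described as already scaled:

```latex
x_2[n] = x[n] - v[n], \qquad 0 \le n \le 59
```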

where x_2[] is the target for the fixed codebook search, x[] is the excitation derived from GSM-AMR unpacking, and v[] is the (interpolated and scaled) adaptive codeword.

The fixed codebook differs between the high-rate and low-rate modes of the G.723.1 codec. The high-rate mode uses an MP-MLQ codebook that allows six pulses per subframe for even subframes and five pulses per subframe for odd subframes, at any position. The low-rate mode uses an algebraic codebook (ACELP) that allows four pulses per subframe at restricted positions. Both codebooks use a grid flag to indicate whether the codeword should be shifted by one position. These codebooks are searched by the methods defined in the standard, except that the impulse response filter is not used, because the search is performed in the excitation domain rather than in the speech domain.

The (persistent) memory for the codec needs to be updated once processing of each subframe is complete. The update is done by shifting the previous excitation buffer u[] by 60 samples, discarding the oldest samples, and copying the excitation from the current subframe into the top (most recent) 60 samples of the buffer.

where the index n is relative to the first sample of the current subframe, and the other parameters are as defined above.
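The buffer update described above can be sketched as a shift-and-append operation. This is a minimal sketch, assuming the history is stored oldest-first in a Python list; the function name and the 145-sample history length in the usage example are illustrative assumptions.

```python
def update_excitation_buffer(u, x):
    """Shift the history buffer by len(x) samples and append the new excitation.

    u: past excitation samples, oldest first (u[-1] is the most recent sample).
    x: excitation of the subframe just processed (60 samples for G.723.1).
    Returns a new list of the same length as u.
    """
    if len(x) > len(u):
        raise ValueError("subframe is longer than the history buffer")
    # Drop the oldest len(x) samples, keep the rest, then append the subframe.
    return u[len(x):] + list(x)


history = list(range(145))                  # assumed history length
subframe = [1000 + i for i in range(60)]
history = update_excitation_buffer(history, subframe)
```

The same function applies to 40-sample GSM-AMR subframes, since the shift length is taken from the subframe itself.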

All mapped parameters are encoded into the outgoing G.723.1 bitstream, and the system is then ready to process the next frame.

Second Embodiment - G.723.1 to GSM-AMR

FIG. 18 is a block diagram showing a transcoder from G.723.1 to GSM-AMR according to the second embodiment of the present invention. The G.723.1 bitstream consists of frames with a length of 192 bits (24 bytes) for the high-rate (6.3 kbps) codec or 160 bits (20 bytes) for the low-rate (5.3 kbps) codec. The frames have a very similar structure and differ only in the representation of the fixed codebook parameters.

The ten LSP parameters used to model the short-term vocal tract filter are encoded in the same way for both the high and low rates, and can be extracted from bits 2 to 25 of the G.723.1 frame. Only the LSPs of the fourth subframe are encoded; interpolation between frames is used to regenerate the LSPs for the other three subframes. The encoding uses three lookup tables, and the LSP vector is reconstructed by joining the three subvectors derived from these tables. Each table has 256 vector entries. The first two tables have 3-element subvectors and the last table has 4-element subvectors; combined, they give a 10-element LSP vector.
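The table lookup described above can be sketched as follows. Bits 2 to 25 hold 24 bits, that is, one 8-bit index per 256-entry table. The bit ordering (least significant bit first within each byte), the order in which the three indices are packed, and the placeholder tables in the usage example are all assumptions made for illustration; the actual layout and codebooks are defined by the G.723.1 standard. The prediction step of the predictive quantizer is omitted.

```python
def read_bits(frame, start, count):
    """Read `count` bits starting at bit `start`, LSB-first within each byte (assumed order)."""
    value = 0
    for i in range(count):
        bit_index = start + i
        bit = (frame[bit_index // 8] >> (bit_index % 8)) & 1
        value |= bit << i
    return value


def unpack_lsp_indices(frame):
    """Split bits 2..25 into three 8-bit table indices (index order is assumed)."""
    packed = read_bits(frame, 2, 24)
    return [(packed >> shift) & 0xFF for shift in (0, 8, 16)]


def join_lsp_subvectors(indices, tables):
    """Concatenate one 3-, one 3- and one 4-element subvector into a 10-element vector."""
    lsp = []
    for index, table in zip(indices, tables):
        lsp.extend(table[index])
    return lsp


# Placeholder tables: entry i of each table simply repeats the value i.
tables = [[[i] * 3 for i in range(256)],
          [[i] * 3 for i in range(256)],
          [[i] * 4 for i in range(256)]]
frame = (((5 | (200 << 8) | (17 << 16)) << 2) | 0b01).to_bytes(4, "little")
print(join_lsp_subvectors(unpack_lsp_indices(frame), tables))
```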

The adaptive codeword is constructed for each subframe by combining previous excitation vectors. The combination is a weighted sum of the previous excitation at five successive delays. This is best explained by the following equation.

where v[] is the reconstructed adaptive codeword, u[] is the previous excitation buffer, L is the (integer) pitch delay between 18 and 143, and β_j are the delay weights determined by the pitch gain parameter.
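The five-tap weighted sum described above can be sketched in code. This is a minimal sketch under stated assumptions: the taps are centred on the delay L, and when L is shorter than the subframe the delayed segment is repeated with period L, which is how the G.723.1 decoder is understood to extend the history. The function name and list-based buffer are illustrative.

```python
def adaptive_codeword(u, lag, beta, subframe_len=60):
    """Five-tap pitch predictor: v[n] = sum over j of beta[j] * e[n + j], j = 0..4.

    u:    past excitation, oldest first (u[-1] is the most recent sample).
    lag:  integer pitch delay L; the buffer must hold at least lag + 2 samples.
    beta: five delay weights, with beta[2] applied at exactly the delay L.
    """
    if len(beta) != 5:
        raise ValueError("expected five delay weights")
    if len(u) < lag + 2:
        raise ValueError("history buffer too short for this delay")
    hist = len(u)
    # Two samples just before the delayed segment, then the segment itself,
    # repeated with period `lag` (assumed periodic extension).
    e = [u[hist - lag - 2], u[hist - lag - 1]]
    e += [u[hist - lag + (k % lag)] for k in range(subframe_len + 2)]
    return [sum(beta[j] * e[n + j] for j in range(5)) for n in range(subframe_len)]


# With only the centre tap set, the codeword is the history delayed by L samples.
v = adaptive_codeword(list(range(145)), lag=20, beta=[0, 0, 1, 0, 0])
```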

The delay parameter L is extracted directly from the bitstream. The first and third subframes use the full dynamic range of the delay, while the second and fourth subframes encode the delay as an offset from the previous subframe. The delay weighting parameters β_j are determined by table lookup. As a result of unpacking the adaptive codeword, an approximation to the fractional pitch delay and its associated gain can be determined by evaluating the following expression.
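The delay decoding at the start of this paragraph can be sketched as follows; the fractional-delay approximation is not covered. The sketch assumes the G.723.1 convention that an absolute index adds to a minimum delay of 18 and that a 2-bit offset index 0..3 maps to an offset of -1..+2 relative to the previous subframe. These constants and all names are assumptions to be checked against the standard.

```python
PITCH_MIN = 18      # assumed minimum pitch delay
OFFSET_BIAS = 1     # assumed: offset index 0..3 maps to -1..+2


def decode_pitch_delays(abs_idx_1, off_idx_2, abs_idx_3, off_idx_4):
    """Decode the four subframe delays from two absolute and two offset indices."""
    delay_1 = PITCH_MIN + abs_idx_1
    delay_2 = delay_1 + off_idx_2 - OFFSET_BIAS
    delay_3 = PITCH_MIN + abs_idx_3
    delay_4 = delay_3 + off_idx_4 - OFFSET_BIAS
    return [delay_1, delay_2, delay_3, delay_4]


print(decode_pitch_delays(40, 1, 42, 3))
```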

The fixed codebook differs between the high-rate and low-rate modes of the G.723.1 codec. The high-rate mode uses an MP-MLQ codebook that allows six pulses per subframe for even subframes and five pulses per subframe for odd subframes, at any position. The low-rate mode uses an algebraic codebook (ACELP) that allows four pulses per subframe at restricted positions. Both codebooks use a grid flag to indicate whether the codeword should be shifted by one position. The algorithms for generating the codewords from the encoded bitstream are given in the GSM-AMR standard documentation.

The (persistent) memory for the codec needs to be updated once processing of each subframe is complete. The update is done by shifting the previous excitation buffer u[] by 60 samples (that is, one subframe), discarding the oldest samples, and copying the excitation from the current subframe into the top (most recent) 60 samples of the buffer.

where the index n is relative to the first sample of the current subframe, and the other parameters are as defined above.

The GSM-AMR parameter mapping part of the transcoder takes the interpolated CELP parameters described above and uses them as the basis for searching the GSM-AMR parameter space. The LSP parameters are simply encoded as received, while the other parameters, namely the excitation and the pitch delay, are used as estimates for a local search in the GSM-AMR space. The main operations that need to be performed on each subframe to complete the transcoding are described below.

The adaptive codeword is formed by searching the vector of previous excitation, up to a maximum delay of 143, for the best match to the target excitation. The target excitation is determined from the interpolated subframes. The previous excitation can be interpolated at 1/6th or 1/3rd intervals, depending on the mode. The optimal delay is found by searching a small range around the pitch delay determined by the G.723.1 unpacking module. This range is searched to find the optimal integer delay and is then refined to determine the fractional part of the delay. The procedure uses a 24-tap interpolation filter to perform the fractional search. The first and third subframes are treated differently from the second and fourth subframes. The interpolated adaptive codeword v[] is formed as follows.
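A reconstruction of this equation from the definitions that follow, assuming the form used for the adaptive codebook vector in the GSM-AMR standard with a 40-sample subframe (for modes with 1/3rd resolution, t would be taken as twice the fraction):

```latex
v[n] = \sum_{i=0}^{9} u[n - L - i] \, b_{60}[t + 6i]
     + \sum_{i=0}^{9} u[n - L + 1 + i] \, b_{60}[6 - t + 6i],
\qquad 0 \le n \le 39, \quad t = 0, \dots, 5
```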

where u[] is the previous excitation buffer, L is the (integer) pitch delay, t is the fractional pitch delay at 1/6th resolution, and b_60 is the 60-tap interpolation filter.
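The integer stage of the local delay search described above can be sketched as follows. This is an illustrative sketch, not the procedure of either standard: it scores each candidate delay near the G.723.1 estimate by the normalized correlation between the target excitation and the delayed history, repeats the delayed segment when the delay is shorter than the subframe, and omits the fractional refinement. The search radius, the minimum delay and all names are assumptions.

```python
def search_integer_delay(u, x, center, radius=3, min_delay=20, max_delay=143):
    """Return the integer delay near `center` that best matches the target x.

    u: past excitation, oldest first (u[-1] is the most recent sample).
    x: target excitation for the current subframe.
    The score for each delay is <x, v>^2 / <v, v>, where v is the history
    delayed by that many samples.
    """
    hist = len(u)
    best_delay, best_score = None, -1.0
    low = max(min_delay, center - radius)
    high = min(max_delay, center + radius)
    for delay in range(low, high + 1):
        if delay > hist:
            continue  # not enough history for this candidate
        v = [u[hist - delay + (n % delay)] for n in range(len(x))]
        energy = sum(s * s for s in v)
        if energy == 0:
            continue
        corr = sum(a * b for a, b in zip(x, v))
        score = corr * corr / energy if corr > 0 else 0.0
        if score > best_score:
            best_delay, best_score = delay, score
    return best_delay


# A history with period 50 and a target that continues it: the search finds 50.
history = [(i % 50) - 25 for i in range(150)]
target = [((150 + n) % 50) - 25 for n in range(40)]
print(search_integer_delay(history, target, center=52))
```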

The pitch gain is calculated and quantized so that it can be encoded and sent to the decoder, and also so that it can be used in computing the fixed codebook target vector. All modes calculate the pitch gain in the same way for each subframe.

where g_p is the unquantized pitch gain, x is the target for the adaptive codebook search, and v is the (interpolated) adaptive codeword vector. The 12.2 kbps and 7.95 kbps modes quantize the adaptive and fixed codebook gains independently, while the other modes use joint quantization of the fixed and adaptive gains.
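Given the definitions above, the pitch gain is presumably the least-squares gain that best scales v onto x, that is, the inner product of x and v divided by the energy of v. The sketch below assumes that form; the clamp to the range 0 to 1.2 follows the bound used in the GSM-AMR standard and is also an assumption here, as are the names.

```python
def pitch_gain(x, v, max_gain=1.2):
    """Unquantized pitch gain g_p = <x, v> / <v, v>, clamped to [0, max_gain]."""
    energy = sum(s * s for s in v)
    if energy == 0:
        return 0.0  # silent codeword: no meaningful gain
    corr = sum(a * b for a, b in zip(x, v))
    return min(max(corr / energy, 0.0), max_gain)


print(pitch_gain([2, 4, 6], [4, 8, 12]))
```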

Once the adaptive codebook component of the excitation has been found, it is subtracted from the excitation, leaving a residual to be encoded by the fixed codebook. The residual signal for each subframe is calculated as follows.
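A reconstruction of the residual equation from the definitions that follow, assuming a 40-sample GSM-AMR subframe (the symbol $\hat{g}_p$ for the quantized pitch gain is an assumed name):

```latex
x_2[n] = x[n] - \hat{g}_p \, v[n], \qquad 0 \le n \le 39
```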

where x_2[] is the target for the fixed codebook search, x[] is the target for the adaptive codebook search, v[] is the (interpolated) adaptive codeword, and the factor that scales v[] is the quantized pitch gain.

The fixed codebook search finds the best match to the residual signal after the adaptive codebook component has been removed. This is important for unvoiced speech and for priming of the adaptive codebook. The codebook search used in transcoding can be simpler than the one used in the codecs, because a great deal of analysis of the original speech has already taken place. In addition, the signal on which the codebook search is performed is the reconstructed excitation signal rather than synthesized speech, so it already has a structure that is more amenable to fixed codebook coding.

The gain for the fixed codebook is quantized using a moving average prediction based on the energy of the previous four subframes. The correction factor between the actual and the predicted gain is quantized (via table lookup) and sent to the decoder. The exact details are given in the GSM-AMR standard documentation.

The (persistent) memory for the codec needs to be updated once processing of each subframe is complete. The update is done by shifting the previous excitation buffer u[] by 40 samples (that is, one subframe), discarding the oldest samples, and copying the excitation from the current subframe into the top (most recent) 40 samples of the buffer.

While what are considered to be embodiments of the present invention have been illustrated and described, those skilled in the art will recognize that various modifications may be made, and equivalents substituted, without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation to the teachings of the present invention without departing from the central inventive concept described herein.

Claims

An apparatus for converting a CELP frame from one CELP-based standard to another CELP-based standard, and/or between different modes within a single standard, the apparatus comprising:

A bitstream unpacking module that extracts one or more CELP parameters from the source codec;

An interpolator module coupled to the bitstream unpacking module to interpolate between different frame sizes, subframe sizes, and/or sampling rates of the source and destination codecs;

A mapping module coupled to the interpolator module for mapping one or more CELP parameters from the source codec to one or more CELP parameters of the destination codec;

A destination bitstream packing module coupled to the mapping module to form at least one destination output CELP frame based on at least one CELP parameter from the destination codec;

And a controller coupled to at least the destination bitstream packing module, the mapping module, the interpolator module, and the bitstream unpacking module to monitor operation of one or more of the modules, receive instructions from one or more external applications, and provide status information to an external application.

The apparatus of claim 1, wherein the controller is a single or multiple controller.

The apparatus of claim 1, wherein the mapping module and the destination bitstream packing module are in the same module.

The apparatus of claim 1, wherein the mapping module is a single or multiple module.

The apparatus of claim 1, wherein the interpolator module is a single or multiple module.

The apparatus of claim 1, wherein the bitstream unpacking module comprises:

A bitstream processor for extracting information from a source CELP codec input frame into a first format of one or more CELP parameters;

An LSP decoding module coupled to the bitstream processor to output one or more LSP coefficients using at least the information from the source CELP codec input frame;

A decoding module coupled to the bitstream processor for decoding the information to output a pitch delay parameter and a pitch gain parameter from the source CELP codec input frame;

A fixed codebook decoding module coupled to the bitstream processor for decoding the information to output a fixed codebook vector;

An adaptive codeword decoding module coupled to the bitstream processor for decoding the information to output an adaptive codebook contribution vector;

And an excitation generator coupled to the fixed codebook decoding module and the adaptive codeword decoding module to output an excitation vector using at least the fixed codebook vector and the adaptive codebook vector.

The apparatus of claim 1, wherein the interpolator module comprises:

An LSP processor for converting one or more LSP coefficients of the source codec into one or more LSP coefficients of the destination codec when the source codec and the destination codec have different subframe sizes;

An adaptive codebook processor for converting a pitch delay and a pitch gain of the source codec into a pitch delay and a pitch gain of the destination codec when the source codec and the destination codec have different subframe sizes;

And a CELP parameter buffer for holding one or more CELP parameters that need to be buffered for interpolation when the source codec and the destination codec have different subframe sizes.

The apparatus of claim 1, wherein the parameter mapping and tuning module comprises:

A parameter mapping and tuning scheme switching module for selecting a CELP parameter mapping scheme based on a plurality of schemes;

And a parameter mapping and tuning method module for outputting the at least one destination CELP parameter.

The apparatus of claim 8, wherein the plurality of schemes comprises:

A CELP parameter direct spatial mapping module;

A filtered excitation space domain analysis module;

And an analysis in excitation space domain module.

The apparatus of claim 8, wherein the parameter mapping and tuning method module comprises:

An LSP coefficient converter for encoding the destination LSP coefficients;

And a CELP excitation mapping unit having CELP excitation parameters including pitch delay, gain and excitation vector from interpolation to obtain coded CELP excitation parameters.

The apparatus of claim 10, wherein the CELP excitation mapping unit comprises:

A CELP parameter direct spatial mapping module for generating coded destination CELP parameters using analysis without any repetition;

An analysis module of excitation space domain mapping for generating a destination CELP parameter encoded by the search in the excitation space domain;

And an analysis module of the filtered excitation space domain for generating an encoded destination CELP parameter by searching for an adaptive closed loop in the excitation space and a fixed codebook in the filtered excitation space.

The apparatus of claim 1, wherein the destination bitstream packing module comprises a plurality of frame packing facilities, each facility being adaptable to a preselected application from a plurality of applications for a selected destination CELP coder, the selected destination CELP coder being one of a plurality of CELP coders.

The apparatus of claim 1, wherein the controller comprises:

A control unit for receiving an external command and controlling each signal processing module;

And a status unit for sending transcoding information, such as frame, count, and error delay, to the outside on request.

The apparatus of claim 1, wherein the interpolator module can be selected to perform linear or nonlinear interpolation.

The apparatus of claim 7, wherein the CELP parameter buffer comprises:

An excitation vector buffer that stores a reconstructed excitation vector awaiting mapping in the next subframe or frame;

An LSP coefficient buffer that stores the LSP coefficients, before or after interpolation, awaiting mapping in the next subframe or frame;

And a CELP parameter buffer that stores the pitch delay, pitch gain, codebook gain, and index, before or after interpolation, awaiting mapping in the next subframe or frame.

A method of transcoding a compressed speech bitstream based on CELP from a source codec to a destination codec, the method comprising:

Processing the source codec input CELP bitstream to unpack at least one CELP parameter from the input CELP bitstream;

Interpolating one or more of a plurality of unpacked CELP parameters from the source codec format to the destination codec format if there is a difference between one or more of a plurality of destination codec parameters, including a frame size, a subframe size, and/or a sampling rate of the destination codec format, and one or more of a plurality of source codec parameters, including a frame size, a subframe size, and/or a sampling rate of the source codec format;

Encoding at least one CELP parameter for the destination codec;

And processing a destination CELP bitstream by at least packing one or more CELP parameters for the destination codec.
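As a rough illustration of the four steps recited above (unpack, conditionally interpolate, encode, pack), the following Python sketch pushes a toy frame through that control flow. Everything in it is an assumption made for illustration and is not taken from the claims: the `Format` fields, the one-byte-per-subframe pitch-lag layout, the nearest-subframe interpolation rule, and the 20 to 143 lag range. It also assumes that the source and destination share a sampling rate.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Format:
    """Hypothetical description of a codec's framing."""
    frame_size: int      # samples per frame
    subframe_size: int   # samples per subframe
    sampling_rate: int   # Hz


def unpack(frame: bytes) -> List[int]:
    # Toy layout (assumption): one byte per source subframe, holding a pitch lag.
    return list(frame)


def interpolate(lags: List[int], src: Format, dst: Format) -> List[int]:
    # Re-grid the per-subframe lags onto destination subframes that span the
    # same source frame. Each destination subframe copies the lag of the
    # source subframe containing its midpoint (nearest-subframe rule).
    n_out = src.frame_size // dst.subframe_size
    return [
        lags[min(((2 * j + 1) * dst.subframe_size) // (2 * src.subframe_size),
                 len(lags) - 1)]
        for j in range(n_out)
    ]


def encode(lags: List[int], lo: int = 20, hi: int = 143) -> List[int]:
    # Stand-in for destination quantization: clamp to an assumed lag range.
    return [min(max(lag, lo), hi) for lag in lags]


def pack(codes: List[int]) -> bytes:
    # Toy packing: one byte per destination subframe.
    return bytes(codes)


def transcode(frame: bytes, src: Format, dst: Format) -> bytes:
    params = unpack(frame)
    if src != dst:  # interpolate only when the two formats differ
        params = interpolate(params, src, dst)
    return pack(encode(params))


src = Format(frame_size=240, subframe_size=60, sampling_rate=8000)
dst = Format(frame_size=80, subframe_size=40, sampling_rate=8000)
out = transcode(bytes([40, 50, 60, 70]), src, dst)
print(list(out))  # [40, 50, 50, 60, 70, 70]
```

In this example one source frame of four 60-sample subframes becomes six 40-sample subframes, that is, three 80-sample destination frames. When the two formats are identical the interpolation step is skipped and only the encode and pack steps run.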

The method of claim 16, wherein processing the source codec input comprises:

Converting an input bitstream frame into information associated with one or more of the CELP parameters;

Decoding the information into one or more CELP parameters;

Reconstructing the excitation vector based on at least one of the CELP parameters;

And transferring the CELP parameters to an interpolator.

The method of claim 16, wherein the interpolating comprises:

Interpolating one or more of the LSP coefficients from the source codec to one or more LSP coefficients for the destination codec;

Interpolating CELP parameters other than the LSP coefficients from the source codec to the corresponding CELP parameters for the destination codec;

And if the excitation vector does not require adjustment, transmitting the excitation vector of the source to an encoding processor.
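The LSP interpolation step above can be pictured with a short Python sketch. It assumes linear interpolation between the centres of neighbouring source frames, with clamping at the edges of the available data; the claims do not fix a particular rule, and the function name and arguments are hypothetical.

```python
from typing import List


def interpolate_lsp(src_lsps: List[List[float]],
                    src_frame_ms: float,
                    dst_frame_ms: float) -> List[List[float]]:
    """Linearly interpolate per-frame LSP vectors onto the destination frame grid."""
    total_ms = len(src_lsps) * src_frame_ms
    n_dst = int(total_ms // dst_frame_ms)
    last = len(src_lsps) - 1
    out = []
    for j in range(n_dst):
        t = (j + 0.5) * dst_frame_ms      # centre of destination frame j, in ms
        pos = t / src_frame_ms - 0.5      # fractional index on the source grid
        pos = min(max(pos, 0.0), float(last))  # clamp: no extrapolation
        i = int(pos)
        i_next = min(i + 1, last)
        w = pos - i                       # weight of the later source frame
        out.append([(1.0 - w) * a + w * b
                    for a, b in zip(src_lsps[i], src_lsps[i_next])])
    return out


# Two 20 ms source frames re-gridded onto four 10 ms destination frames.
dst_lsps = interpolate_lsp([[0.1, 0.2], [0.3, 0.6]], 20.0, 10.0)
print(dst_lsps)
```

Because each output vector is a convex combination of two source vectors, an ascending LSP ordering in the source is preserved in the output, which is one reason interpolation is commonly done on LSPs and not on LPC coefficients directly.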

19. The method of claim 18, further comprising transforming one or more LSP coefficients using a linear transform process.

The method of claim 18, further comprising:

Converting the source codec excitation vector into a synthesized speech vector using at least one decoded source LPC coefficient;

Quantizing the destination LPC coefficients;

Converting the synthesized speech vector back into an adjusted excitation vector using at least the quantized destination LPC coefficients;

And transmitting the adjusted excitation vector to another processor.
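The excitation adjustment recited above amounts to a synthesis filter followed by an inverse (analysis) filter. The Python sketch below illustrates this under stated assumptions: the LPC polynomial is written as A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, the filter memories start at zero, and `dst_lpc_q` stands for destination coefficients that have already been quantized. The function names are hypothetical.

```python
from typing import List


def synthesize(excitation: List[float], lpc: List[float]) -> List[float]:
    # All-pole synthesis 1/A(z): s[n] = e[n] - sum_k a_k * s[n-k]
    speech: List[float] = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * speech[n - k]
        speech.append(acc)
    return speech


def analyze(speech: List[float], lpc: List[float]) -> List[float]:
    # Inverse filter A(z): e[n] = s[n] + sum_k a_k * s[n-k]
    excitation: List[float] = []
    for n, s in enumerate(speech):
        acc = s
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * speech[n - k]
        excitation.append(acc)
    return excitation


def calibrate_excitation(src_exc: List[float],
                         src_lpc: List[float],
                         dst_lpc_q: List[float]) -> List[float]:
    # Source excitation -> synthesized speech -> adjusted excitation.
    return analyze(synthesize(src_exc, src_lpc), dst_lpc_q)


impulse = [1.0, 0.0, 0.0, 0.0]
print(calibrate_excitation(impulse, [-0.5], [-0.25]))  # [1.0, 0.25, 0.125, 0.0625]
```

When the quantized destination coefficients equal the source coefficients, the two filters cancel and the excitation passes through unchanged, which corresponds to the case where no adjustment is required.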

The method of claim 16, wherein the encoding comprises:

Quantizing the destination LPC coefficients;

And performing parameter mapping and tuning by selecting one of the CELP mapping schemes, namely CELP parameter direct spatial mapping, analysis in the excitation space domain, or analysis in the filtered excitation space domain, according to a control signal from the switching module.

The method of claim 21, wherein the CELP parameter direct spatial mapping operation comprises:

Encoding a pitch delay from the interpolated pitch delay parameter;

Encoding a pitch gain from the interpolated pitch gain parameter;

Encoding the index of the fixed codebook using an analytical formula;

And encoding the gain of the fixed codebook from the fixed codebook gain parameter.

The method of claim 21, wherein the analyzing operation in the excitation space domain mapping comprises:

Selecting a pitch delay as an initial value from the interpolated pitch delay parameter;

Searching for a pitch delay in the closed loop of the excitation space;

Searching for a pitch gain in the excitation space;

Constructing a target signal for fixed codebook search;

Searching for a fixed codebook index in the excitation space;

Searching for a fixed codebook gain in the excitation space;

And updating the previous excitation vector.
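A compact Python sketch of the excitation-domain steps above follows. It is only an illustration under several assumptions: the target is an excitation-domain vector (for example the adjusted source excitation), the closed-loop search maximizes the usual normalized correlation over a small window around the interpolated delay, and the fixed codebook is a toy one containing single unit pulses. None of the names come from the source.

```python
from typing import List, Tuple


def adaptive_vector(past_exc: List[float], lag: int, n: int) -> List[float]:
    # Adaptive-codebook vector: the excitation `lag` samples back, repeated
    # periodically when the lag is shorter than the subframe.
    start = len(past_exc) - lag
    return [past_exc[start + (i % lag)] for i in range(n)]


def search_pitch(target: List[float], past_exc: List[float],
                 initial_lag: int, radius: int = 2) -> Tuple[int, float]:
    # Closed-loop search around the interpolated delay; returns (lag, gain).
    best_lag, best_gain, best_score = initial_lag, 0.0, 0.0
    lo = max(1, initial_lag - radius)
    hi = min(len(past_exc), initial_lag + radius)
    for lag in range(lo, hi + 1):
        v = adaptive_vector(past_exc, lag, len(target))
        corr = sum(t * x for t, x in zip(target, v))
        energy = sum(x * x for x in v)
        if corr <= 0.0 or energy == 0.0:
            continue  # skip candidates that would need a non-positive gain
        score = corr * corr / energy
        if score > best_score:
            best_lag, best_gain, best_score = lag, corr / energy, score
    return best_lag, best_gain


def fixed_codebook_target(target: List[float], past_exc: List[float],
                          lag: int, gain: float) -> List[float]:
    # Remove the adaptive-codebook contribution from the target.
    v = adaptive_vector(past_exc, lag, len(target))
    return [t - gain * x for t, x in zip(target, v)]


def search_single_pulse(residual: List[float]) -> Tuple[int, float]:
    # Toy fixed codebook of unit pulses: best index has the largest magnitude,
    # and the matching gain is the residual value at that index.
    idx = max(range(len(residual)), key=lambda i: abs(residual[i]))
    return idx, residual[idx]


def update_past_excitation(past_exc: List[float], new_exc: List[float]) -> List[float]:
    # Slide the excitation history forward by one subframe.
    return (past_exc + new_exc)[-len(past_exc):]


past = [1.0, -2.0, 3.0, 0.0, 4.0, -1.0, 2.0, 5.0, -3.0, 1.0]
target = [-0.5, 1.0, 2.5, -1.5]
lag, gain = search_pitch(target, past, initial_lag=6)
print(lag, gain)  # 5 0.5
```

Here the interpolated delay (6) only seeds the search; the closed loop settles on lag 5 because the target is exactly half of the excitation five samples back. The filtered-excitation variant would run the same kind of search after passing the candidate vectors through a weighted synthesis filter, which this sketch does not attempt.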

The method of claim 21, wherein the analyzing operation in the filtered excitation space domain mapping comprises:

Searching for a pitch delay in the closed loop of the excitation space;

Searching for a pitch gain in the excitation space;

Constructing a target signal for fixed codebook search;

Searching for a fixed codebook index in the filtered excitation space;

Searching for a fixed codebook gain in the filtered excitation space;

And updating the previous excitation vector.

22. The method of claim 21, wherein the selection is not limited to the three schemes, but a combination of the three schemes can be selected as a new mapping scheme.

The apparatus of claim 1, further comprising a silence frame transcoding unit capable of quickly converting silence frames from one speech coding standard to another and of mapping comfort noise parameters.

The apparatus of claim 1, wherein the parameter mapping and tuning module includes a voice activity detector for generating silence frames, the voice activity detector determining voice or silence based on parameters in the CELP space.

The apparatus of claim 1, further comprising a system for changing the excitation mapping scheme in use so as to adapt to the available computing resources and allow graceful degradation under load.

Wherein the excitation mapping is performed without returning to the speech signal domain.

A method of processing a compressed voice bitstream based on CELP from a source codec format to a destination codec format, the method comprising:

Sending a control signal, from among a plurality of control signals, from an application processor;

Selecting one CELP mapping scheme from a plurality of different CELP mapping schemes based on at least the control signal from the application;

And performing a mapping process using the selected CELP mapping scheme to map one or more CELP parameters from a source codec format to one or more CELP parameters of a destination codec format.
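The selection step above is essentially a dispatch on a control signal. A minimal Python sketch, assuming the control signal is a short string and the three schemes are stand-in functions that merely tag their input (all names here are hypothetical), could look like this:

```python
from enum import Enum
from typing import Callable, Dict


class Scheme(Enum):
    DIRECT = "direct"          # CELP parameter direct spatial mapping
    EXCITATION = "excitation"  # analysis in the excitation space domain
    FILTERED = "filtered"      # analysis in the filtered excitation space domain


Params = Dict[str, object]


# Placeholder mappers: real ones would produce destination CELP parameters.
def map_direct(params: Params) -> Params:
    return {**params, "scheme": Scheme.DIRECT.value}


def map_excitation(params: Params) -> Params:
    return {**params, "scheme": Scheme.EXCITATION.value}


def map_filtered(params: Params) -> Params:
    return {**params, "scheme": Scheme.FILTERED.value}


SCHEMES: Dict[Scheme, Callable[[Params], Params]] = {
    Scheme.DIRECT: map_direct,
    Scheme.EXCITATION: map_excitation,
    Scheme.FILTERED: map_filtered,
}


def select_scheme(control_signal: str) -> Callable[[Params], Params]:
    # Enum lookup raises ValueError for an unknown control signal.
    return SCHEMES[Scheme(control_signal)]


def map_parameters(params: Params, control_signal: str) -> Params:
    return select_scheme(control_signal)(params)


print(map_parameters({"pitch_lag": 40}, "direct"))
```

Keeping the schemes behind a single lookup table means the application can switch schemes per call, for instance choosing the cheaper direct mapping when computing resources are scarce, without touching the rest of the pipeline.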

31. The method of claim 30, wherein the plurality of CELP mapping schemes include CELP parameter direct spatial mapping, analysis in an excitation space domain, or analysis in a filtered excitation space domain.

31. The method of claim 30, wherein the step of selecting one CELP mapping scheme is for a predetermined application during setup or configuration processing.

31. The method of claim 30, further comprising receiving the control signal at a switching module coupled to each of the plurality of mapping schemes.

31. The method of claim 30, wherein the control signal is provided based on a computing resource characteristic of the selected CELP mapping scheme.

31. The method of claim 30, wherein at least one of the plurality of mapping schemes is provided to a library of memory.

32. The method of claim 31, further comprising: encoding one or more CELP parameters for the destination codec;

37. The method of claim 36, further comprising transmitting a packed destination CELP bitstream to the destination codec.

A system for processing a compressed voice bitstream based on CELP from a source codec format to a destination codec format, the system comprising:

One or more codes for receiving a control signal, from among a plurality of control signals, from an application processor;

One or more codes for selecting one CELP mapping scheme from a plurality of different CELP mapping schemes based on at least the control signal from the application;

And one or more codes that perform mapping processing using the selected CELP mapping scheme to map one or more CELP parameters from a source codec format to one or more CELP parameters of a destination codec format.

The system of claim 38, wherein the plurality of CELP mapping schemes comprises one or more codes for CELP parameter direct spatial mapping, one or more codes for analysis in an excitation space domain, or one or more codes for analysis in a filtered excitation space domain.

39. The system of claim 38, wherein the selected CELP mapping scheme is for a predetermined application.

39. The system of claim 38, wherein the one or more codes for receiving the control signal are provided to a method switching module, wherein the method switching module is coupled to each of a plurality of mapping methods.

39. The system of claim 38, wherein the control signal is provided based on a computing resource characteristic of the selected CELP mapping scheme.

39. The system of claim 38, wherein one or more codes for the plurality of mapping schemes are provided to a library of memory.

44. The system of claim 43, further comprising: one or more codes for encoding one or more CELP parameters for the destination codec;

And at least one code for processing a destination CELP bitstream by at least packing one or more CELP parameters for the destination codec.

45. The system of claim 44, further comprising one or more codes for transmitting the destination CELP bitstream to the destination codec.

45. The system of claim 44, further comprising one or more codes for transmitting the destination CELP bitstream to a storage location.