KR100651731B1

KR100651731B1 - Apparatus and method for variable frame speech encoding/decoding

Info

Publication number: KR100651731B1
Application number: KR1020040097916A
Authority: KR
Inventors: 이미숙; 김도영; 성종모; 김현우; 강홍구; 정성교; 윤대희; 김홍국
Original assignee: Electronics and Telecommunications Research Institute (한국전자통신연구원); Yonsei University Educational Foundation (학교법인연세대학교)
Priority date: 2003-12-26
Filing date: 2004-11-26
Publication date: 2006-12-01
Also published as: KR20050066996A

Abstract

The present invention relates to a speech encoding/decoding apparatus and method that can do either of the following:

- classify an input speech signal into a class according to its characteristics, and encode it using the frame size, quantizer structure and bit allocation corresponding to that class; or
- adjust the frame size according to the network state or the type of speech coder used by the other party to the call.

Optimally adjusting the frame size, quantizer structure and bit allocation to the characteristics of the input speech improves the performance of the speech encoder. Adjusting the frame size to the type of speech coder used by the other party reduces the overall call delay.

Description

Apparatus and method for variable frame speech encoding/decoding

FIG. 1 shows the configuration of an embodiment of a speech encoding apparatus and a speech decoding apparatus according to the present invention. The apparatuses perform encoding/decoding optimally according to the characteristics of the input speech signal.

FIG. 2 shows an example of how the input speech class determiner according to the present invention determines the class of the input speech, for optimal encoding according to the characteristics of the input speech signal.

FIG. 3 shows the configuration of the variable adjusted speech encoder of the speech encoding apparatus according to the present invention, which encodes optimally according to the characteristics of the input speech signal.

FIG. 4 shows the configuration of the variable adjusted speech decoder of the speech decoding apparatus according to the present invention, which decodes optimally according to the characteristics of the input speech signal.

FIGS. 5A and 5B show the flow of the speech encoding method and the speech decoding method according to the present invention, which encode/decode optimally according to the characteristics of the input speech signal.

FIG. 6 shows the configuration of an embodiment of a speech encoding/decoding apparatus according to the present invention that reduces the delay of a voice call by taking network conditions into account.

FIGS. 7A and 7B are flowcharts of an embodiment of a speech encoding/decoding method according to the present invention that reduces the delay of a voice call by taking network conditions into account.

FIG. 8 shows the configuration of an embodiment of a speech encoding/decoding apparatus that adjusts the frame size according to the type of speech coder on the receiving side.

FIGS. 9A and 9B are flowcharts of an embodiment of a speech encoding/decoding method that adjusts the frame size according to the type of speech coder on the receiving side.

The present invention relates to a speech encoding/decoding apparatus and method. More particularly, it relates to a speech encoding/decoding apparatus and method that can:

- adjust the frame size, quantizer structure and bit-allocation method according to the characteristics of the input speech signal for efficient speech encoding; and
- adjust the frame size according to the network state or the type of speech coder used by the other party to the call.

Various coding methods for digitizing and processing speech signals have been proposed and used. The most widely used fall into two groups:

- waveform coding methods such as pulse code modulation (PCM); and
- hybrid coding methods that combine waveform coding and parametric coding, such as code-excited linear prediction (CELP), which dominates the standards of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T).

Most hybrid coding methods compress speech efficiently by relying on a speech production model. The speech signal is separated into two components:

- spectral information, which represents the vocal tract transfer function; and
- an excitation signal.

Each component is modeled, quantized and transmitted in an appropriate way. Representative hybrid coders include ITU-T G.723.1 and G.729, and the Adaptive Multi-Rate (AMR) coder planned for use in IMT-2000.

ITU-T G.723.1 is a speech coder standardized for compressing multimedia signals at low bit rates. It divides the input speech into 30 ms frames and compresses and reconstructs it at one of two bit rates, 5.3 kbit/s or 6.3 kbit/s, providing toll quality.

ITU-T G.729 divides the input speech into 10 ms frames and compresses and reconstructs it at 8 kbit/s, also providing toll quality. Together with G.723.1, it is widely used in VoIP.

G.729 is computationally demanding. For this reason, G.729A is also widely used: it is a reduced-complexity version that keeps G.729's frame size and bitstream compatibility.

There are also adaptive multi-rate coders for next-generation voice communication:

- the narrowband AMR coder (AMR-NB), which processes telephone-band speech; and
- the wideband AMR coder (AMR-WB), which processes wideband signals.

Both analyze and encode the input speech in 20 ms units.

In wired and wireless voice communication systems, conventional speech coders use a CELP algorithm based on a speech production model. For each frame, they extract and quantize the spectral information and the excitation signal information. However, these coders use the same frame size regardless of the characteristics of the input speech, so their speech quality or coding efficiency suffers.

In particular, a 10 ms analysis frame, as in G.729, is well suited to modeling rapidly changing transition segments. However, it codes stationary segments such as voiced speech inefficiently.

Conversely, the 30 ms frame used in G.723.1 suits voiced segments. In transition segments, however, the spectral information is not transmitted often enough, so the spectral distortion in the subframes becomes large.

In short, a conventional coder uses the same analysis frame size, quantizer structure and bit allocation for all input speech. As a result, its performance varies widely depending on the characteristics of that speech.

Moreover, each conventional speech coder always operates with its own fixed frame size and always processes the speech signal with that frame size. For example:

- G.723.1 uses 30 ms frames;
- G.729 uses 10 ms frames; and
- the AMR-NB coder uses 20 ms frames.

Recently, interest has grown in Voice over IP (VoIP), which carries voice data over IP networks. It is generally known that a voice call needs an end-to-end delay of 150 ms or less to provide good service quality. Longer delays cause echo and prevent natural conversation.

In packet networks, however, the end-to-end delay can change during a call, which makes a constant delay hard to maintain. High-quality voice service requires a delay of 150 ms or less, maintained throughout the call.

When the speech coders of the two parties do not match, the call can be carried through a transcoder (transcodec). In today's packet networks, a call cannot be made unless both parties have the same speech coder. However, calls between IP-network users and wireless-network subscribers who use different speech coders are supported through a transcoder.

CDMA networks currently make heavy use of speech coders such as the Enhanced Variable Rate Coder (EVRC) and Qualcomm Code Excited Linear Prediction (QCELP). VoIP mainly uses coders such as G.729 and G.723.1. For example, an IP-phone user with G.723.1 needs a transcoder in the path to call a wireless subscriber who uses EVRC.

The transcoder performs two conversions:

- it converts a bitstream encoded with G.723.1 into a bitstream that the EVRC decoder can decode; and
- it converts a bitstream encoded with EVRC into one that the G.723.1 decoder can decode.

Transcoding inherently requires a delay equal to the least common multiple of the frame sizes of the two parties' speech coders.

Therefore, a call between subscribers using G.723.1 and EVRC requires a basic transcoding delay of 60 ms. This added delay can degrade the overall quality of service.
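The frame-alignment rule above can be checked with a few lines of code. This is a minimal sketch: it models only the least-common-multiple term and ignores look-ahead and processing time. The 30 ms, 10 ms and 20 ms values for G.723.1, G.729 and AMR-NB come from the text. The 20 ms EVRC frame is the standard EVRC frame length; it is not stated in the text but reproduces the 60 ms figure.

```python
from math import lcm

# Nominal codec frame sizes in milliseconds (EVRC value assumed, see lead-in).
FRAME_MS = {"G.723.1": 30, "G.729": 10, "AMR-NB": 20, "EVRC": 20}


def transcoding_alignment_delay_ms(codec_a: str, codec_b: str) -> int:
    """Delay until the frame boundaries of both codecs coincide again.

    Only after this much speech has been buffered can a whole number of
    frames of one codec be re-encoded as a whole number of frames of the other.
    """
    return lcm(FRAME_MS[codec_a], FRAME_MS[codec_b])


if __name__ == "__main__":
    print(transcoding_alignment_delay_ms("G.723.1", "EVRC"))  # 60
    print(transcoding_alignment_delay_ms("G.729", "EVRC"))    # 20
```

The same rule shows why a coder with a selectable frame size helps. If the G.723.1 side could switch to 10 ms or 20 ms frames, the alignment delay toward EVRC would drop to 20 ms.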

The technical problem addressed by the present invention is to provide a speech encoding/decoding apparatus and method that improve speech encoding/decoding performance by:

- variably adjusting the frame size according to the characteristics of the input speech;
- adaptively choosing the structure of the quantizers used for the spectral information and the excitation signal; and
- adaptively adjusting how the bits are divided between spectral-information quantization and excitation-signal quantization.

Another technical problem addressed by the present invention is to provide a speech encoding/decoding apparatus and method that improve service quality in a packet network. The speech coder's frame size and the number of frames per packet are adjusted according to the network state or the type of coder used by the other party. This controls the overall delay of voice data transmission or reduces the delay required for transcoding.

Yet another technical problem addressed by the present invention is to provide a speech encoding/decoding apparatus and method in which the frame size used for packet transmission differs from the frame size used for encoding.

To address the above technical problems, one embodiment of the speech encoding apparatus according to the present invention includes:

- an input speech class determiner that classifies the input speech into a transition segment class or a stationary segment class;
- a variable adjusted speech encoder that variably performs encoding using a frame size, quantizer structure and bit allocation that differ according to the determined class; and
- a multiplexer that outputs the bitstream of the input speech encoded with the variable frame size.

To address the above technical problems, one embodiment of the speech encoding method according to the present invention includes:

(a) classifying the input speech into a transition segment class or a stationary segment class;
(b) variably performing encoding using a frame size, quantizer structure and bit allocation that differ according to the determined class; and
(c) outputting the bitstream of the input speech encoded with the variable frame size.

To address the above technical problems, one embodiment of the speech decoding apparatus according to the present invention includes:

- a demultiplexer that receives a bitstream encoded with a frame size, quantizer structure and bit allocation that differ according to the class of the input speech, and extracts from it the parameter information needed for decoding;
- a variable adjusted speech decoder, provided for each input speech class, that variably performs the decoding corresponding to the class of the received input speech; and
- a temporary storage unit that temporarily stores the decoded speech so that it can be output continuously.

To address the above technical problems, one embodiment of the speech decoding method according to the present invention includes:

(a) upon receiving a bitstream encoded with a frame size, quantizer structure and bit allocation that differ according to the class of the input speech, extracting from the bitstream the parameter information needed for decoding;
(b) variably performing, with decoding provided for each input speech class, the decoding corresponding to the class of the received input speech; and
(c) temporarily storing the decoded speech so that it can be output continuously.

To address the above technical problems, another embodiment of the speech encoding apparatus according to the present invention includes:

- a frame determiner that determines the frame size for transmitting the input speech and the number of frames per packet, based on network delay information or information about the other party's coder;
- a variable adjusted speech encoder that variably encodes the input speech according to the determined frame size and number of frames; and
- a multiplexer that outputs the bitstream of the input speech encoded with the variable frame size.
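The frame determiner described above can be sketched as a small search over candidate settings. The text does not say how the choice is made, so everything here is a hypothetical model:

- the candidate frame sizes (10, 20 and 30 ms);
- the limit of four frames per packet;
- the additive delay model (packetization + transcoding alignment + network delay); and
- the tie-breaking order.

Only the 150 ms budget and the least-common-multiple transcoding rule come from the text.

```python
from itertools import product
from math import lcm
from typing import Optional, Tuple

CANDIDATE_FRAME_MS = (10, 20, 30)  # hypothetical frame sizes the encoder supports
MAX_FRAMES_PER_PACKET = 4          # hypothetical upper bound
DELAY_BUDGET_MS = 150              # end-to-end target quoted in the text


def choose_frame_config(network_delay_ms: float,
                        peer_frame_ms: Optional[int] = None) -> Optional[Tuple[int, int]]:
    """Return (frame_ms, frames_per_packet), or None if nothing fits the budget.

    Crude additive delay model (an assumption, not the patent's formula):
    packetization delay + transcoding alignment delay + network delay.
    """
    best, best_key = None, None
    for frame_ms, fpp in product(CANDIDATE_FRAME_MS, range(1, MAX_FRAMES_PER_PACKET + 1)):
        packetization = frame_ms * fpp
        # Alignment delay toward the peer's codec (LCM rule); zero if the peer is unknown.
        transcode = 0 if peer_frame_ms is None else lcm(frame_ms, peer_frame_ms)
        if packetization + transcode + network_delay_ms > DELAY_BUDGET_MS:
            continue
        # Prefer the smallest transcoding delay, then the longest packet
        # (fewer packets means less header overhead).
        key = (transcode, -packetization)
        if best_key is None or key < best_key:
            best, best_key = (frame_ms, fpp), key
    return best


if __name__ == "__main__":
    print(choose_frame_config(50))                    # (30, 3)
    print(choose_frame_config(50, peer_frame_ms=20))  # (20, 4)
```

Against a 20 ms peer, the 30 ms frame is pushed aside by its 60 ms alignment delay. This mirrors the G.723.1/EVRC example in the background section.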

To address the above technical problems, another embodiment of the speech encoding method according to the present invention includes:

(a) determining the frame size for coding the speech signal and the number of frames per packet, based on network delay information or information about the other party's coder;
(b) variably encoding the speech signal according to the determined frame size and number of frames; and
(c) outputting the bitstream of the speech signal encoded with the variable frame size.

To address the above technical problems, another embodiment of the speech decoding apparatus according to the present invention includes:

- a demultiplexer that receives a bitstream encoded on the basis of network delay information and extracts from it the parameter information needed for decoding;
- a variable adjusted speech decoder, provided for each frame size, that variably performs the decoding corresponding to the frame size of the received input speech; and
- a temporary storage unit that temporarily stores the decoded speech so that it can be output continuously.

To address the above technical problems, another embodiment of the speech decoding method according to the present invention includes:

(a) upon receiving a bitstream encoded on the basis of network delay information, extracting from it the parameter information needed for decoding;
(b) variably performing, with decoding provided for each frame size, the decoding corresponding to the frame size of the received input speech; and
(c) temporarily storing the decoded speech so that it can be output continuously.

To address the above technical problems, another embodiment of the speech encoding apparatus according to the present invention includes:

- a variable encoder that determines the frame size for encoding and encodes the input speech with that frame size, basing the choice on any one of the characteristics of the input speech, network delay information, or codec information about the other party's coder; and
- a frame transmitter that transmits the encoded frames at a constant transmission interval.

To address the above technical problems, another embodiment of the speech encoding method according to the present invention includes:

- determining the frame size for encoding and encoding the input speech with that frame size, basing the choice on any one of the characteristics of the input speech, network delay information, or codec information about the other party's coder; and
- transmitting the encoded frames at a constant transmission interval.
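The separation between encoding frame size and transmission interval can be illustrated with a simple scheduler. This is a sketch under stated assumptions: the scheduling rule is hypothetical, since the text only says that encoded frames are sent at a constant interval.

- A frame is treated as ready once all of its samples have been captured, that is, at its end time.
- Every `interval_ms`, a packet carries the frames that became ready since the previous packet.

```python
from typing import List, Tuple


def schedule_packets(frame_ms: List[int], interval_ms: int) -> List[Tuple[int, List[int]]]:
    """Group variable-length encoded frames into packets sent every interval_ms.

    Returns (send_time_ms, [frame indices]) per packet. Empty packets are kept
    to show idle intervals; a real sender might suppress them.
    """
    # End time of each frame = time at which it can be encoded and sent.
    ready, t_end = [], 0
    for i, d in enumerate(frame_ms):
        t_end += d
        ready.append((t_end, i))

    packets, send_time, k = [], interval_ms, 0
    while k < len(ready):
        batch = []
        while k < len(ready) and ready[k][0] <= send_time:
            batch.append(ready[k][1])
            k += 1
        packets.append((send_time, batch))
        send_time += interval_ms
    return packets


if __name__ == "__main__":
    # Two 10 ms transition frames, two 30 ms stationary frames, two 10 ms frames.
    print(schedule_packets([10, 10, 30, 30, 10, 10], 20))
    # [(20, [0, 1]), (40, []), (60, [2]), (80, [3]), (100, [4, 5])]
```

The encoder can therefore change its frame size freely while the network sees a steady packet rate.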

Accordingly, the performance of the speech encoding/decoding apparatus can be improved in two ways:

- by optimally adjusting the frame size, quantizer structure and bit allocation according to the characteristics of the input speech; and
- by adjusting the frame size based on the network state and the type of the other party's coder.

The speech encoding/decoding apparatus and method according to the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 shows the configuration of an embodiment of a speech encoding apparatus and a speech decoding apparatus according to the present invention. The apparatuses perform encoding/decoding optimally according to the characteristics of the input speech signal.

Referring to FIG. 1, a voice communication system is shown in which a speech encoding apparatus serves as the transmitter 100 and a speech decoding apparatus serves as the receiver 150.

The speech encoding apparatus (transmitter 100) consists of:

- an input speech class determiner 105;
- a variable adjusted speech encoder 110; and
- a multiplexer 115.

The speech decoding apparatus (receiver 150) consists of a demultiplexer 155 and a variable adjusted speech decoder 160.

The input speech class determiner 105 determines the class of the input speech. Input speech is divided into two classes:

- transition segments, in which the speech signal changes rapidly; and
- stationary segments, such as voiced speech.

Speech in the two kinds of segment has different characteristics, so it is efficient to use G.729 for transition segments and G.723.1 for stationary segments. Because the optimal coding method depends on the characteristics of the input speech, the input speech class determiner 105 classifies the input speech by those characteristics so that the optimal coding method can be selected. Besides the transition and stationary classes, the input speech class determiner 105 may define further classes according to the characteristics of the input speech.
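The patent does not specify which features the class determiner uses, so the following is only an illustration of a direct, open-loop decision. It assumes a stand-in feature: the largest jump in log energy between consecutive 10 ms sub-blocks of a 30 ms segment. It also assumes 8 kHz sampling and a hypothetical 6 dB threshold. The class-to-frame-size mapping follows the text: 10 ms (G.729-like) for transition segments and 30 ms (G.723.1-like) for stationary ones.

```python
import math
from typing import Sequence

SAMPLE_RATE = 8000               # narrowband telephone speech (assumed)
SUBBLOCK = SAMPLE_RATE // 100    # 10 ms = 80 samples
TRANSITION_DB = 6.0              # hypothetical threshold on the energy jump

# Frame size per class, following the G.729 / G.723.1 choice in the text.
FRAME_MS_FOR_CLASS = {"transition": 10, "static": 30}


def log_energy_db(block: Sequence[float]) -> float:
    energy = sum(x * x for x in block) / len(block)
    return 10.0 * math.log10(energy + 1e-12)  # floor avoids log(0) on silence


def classify_segment(samples: Sequence[float]) -> str:
    """Open-loop class decision on one segment (e.g. 30 ms = three 10 ms sub-blocks)."""
    levels = [log_energy_db(samples[i:i + SUBBLOCK])
              for i in range(0, len(samples) - SUBBLOCK + 1, SUBBLOCK)]
    max_jump = max((abs(b - a) for a, b in zip(levels, levels[1:])), default=0.0)
    return "transition" if max_jump > TRANSITION_DB else "static"


if __name__ == "__main__":
    steady = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(240)]
    onset = [0.0] * 80 + steady[80:]
    print(classify_segment(steady), classify_segment(onset))  # static transition
```

A practical classifier would probably also use spectral change (for example, distance between LSF vectors) and voicing measures. Energy alone is used here only to keep the sketch short.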

To classify the input speech by its characteristics, the input speech class determiner 105 uses either an open-loop or a closed-loop class decision method. The open-loop method determines the class directly from the characteristics of the input speech. The closed-loop method determines the class through a feedback process.
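One common way to realize a feedback (closed-loop) decision is analysis by synthesis, sketched here under assumptions that do not come from the patent:

- Encode the segment with every class's coder, decode it, and measure the SNR.
- Pick the cheapest class whose reconstruction meets a quality target.
- If no class meets the target, fall back to the best-sounding class.

The per-class "coders" are toy uniform quantizers standing in for real G.729-like and G.723.1-like coders. The function names and the SNR criterion are illustrative only.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Encoder = Callable[[Sequence[float]], List[int]]
Decoder = Callable[[List[int]], List[float]]


def make_uniform_codec(step: float) -> Tuple[Encoder, Decoder]:
    """Toy stand-in for a per-class coder: a uniform scalar quantizer."""
    def encode(x: Sequence[float]) -> List[int]:
        return [round(v / step) for v in x]

    def decode(q: List[int]) -> List[float]:
        return [i * step for i in q]

    return encode, decode


def snr_db(ref: Sequence[float], out: Sequence[float]) -> float:
    signal = sum(r * r for r in ref)
    noise = sum((r - o) ** 2 for r, o in zip(ref, out))
    return 10.0 * math.log10((signal + 1e-12) / (noise + 1e-12))


def closed_loop_class(samples: Sequence[float],
                      codecs: Dict[str, Tuple[Encoder, Decoder]],
                      bits: Dict[str, int],
                      min_snr_db: float) -> str:
    """Encode and decode with every class; return the cheapest class meeting min_snr_db."""
    scores = {name: snr_db(samples, dec(enc(samples)))
              for name, (enc, dec) in codecs.items()}
    passing = [name for name, s in scores.items() if s >= min_snr_db]
    if passing:
        return min(passing, key=lambda name: bits[name])
    return max(scores, key=scores.get)  # nothing passes: take the best quality
```

The open-loop decision needs one encoding pass per segment. The closed-loop decision runs every candidate coder, trading computation for a decision based on actual coding quality.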

The variable adjusted speech encoder 110 encodes the input speech using the frame size, quantizer structure and bit allocation preset for the class determined by the input speech class determiner 105.

The multiplexer 115 outputs the bitstream of the input speech encoded by the variable adjusted speech encoder 110, taking into account that the encoder uses variable frame sizes.

The demultiplexer 155 of the receiver 150 receives the bitstream output from the multiplexer 115 of the transmitter 100 and extracts from it the parameter information needed for decoding. The demultiplexer 155 then passes the received input speech bitstream to the variable adjusted speech decoder 160, so that the decoding corresponding to the class of the bitstream can be performed.
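For the demultiplexer to route each frame to the right decoder, the class (and hence the frame size) must be recoverable from the bitstream. The patent does not define a bitstream format, so this sketch assumes a hypothetical framing: a 1-byte class ID and a 2-byte big-endian payload length in front of each encoded frame. It uses only the standard `struct` module.

```python
import struct
from typing import Iterator, List, Tuple

HEADER = struct.Struct(">BH")  # hypothetical: 1-byte class ID, 2-byte payload length


def mux(frames: List[Tuple[int, bytes]]) -> bytes:
    """Concatenate (class_id, payload) frames into one bitstream."""
    out = bytearray()
    for class_id, payload in frames:
        out += HEADER.pack(class_id, len(payload))
        out += payload
    return bytes(out)


def demux(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (class_id, payload) so each payload can go to its class's decoder."""
    pos = 0
    while pos < len(stream):
        class_id, length = HEADER.unpack_from(stream, pos)
        pos += HEADER.size
        payload = stream[pos:pos + length]
        if len(payload) != length:
            raise ValueError("truncated frame")
        pos += length
        yield class_id, payload
```

In practice a class could also be signalled implicitly, for example by payload size, to save header bits. An explicit header simply keeps the sketch unambiguous.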

The variable adjusted speech decoder 160 reconstructs the speech signal from the received bitstream using the decoder that corresponds to the speech class.

FIG. 2 shows an example of how the input speech class determiner according to the present invention determines the class of the input speech, for optimal encoding according to the characteristics of the input speech signal.

A speech signal has a variety of characteristics. The input speech class determiner decides which of several predefined classes the input speech belongs to, and a different coding scheme is applied to each class it determines.

FIG. 3 shows the configuration of the variable adjusted speech encoder of the speech encoding apparatus according to the present invention, which encodes optimally according to the characteristics of the input speech signal.

Referring to FIG. 3, the variable adjusted speech encoder 110 consists of an input speech temporary storage unit 300 and one or more variable speech encoders 305 to 315.

The input speech temporary storage unit 300 stores speech samples frame by frame, according to the class of the input speech determined by the input speech class determiner 105. The stored samples are sent to whichever of the variable speech encoders 305 to 315 corresponds to their class.

The variable speech encoders 305 to 315 correspond one-to-one with the classes determined by the input speech class determiner 105. Each encoder performs the encoding for its class.

For example, assume that the input speech class determiner 105 divides the input speech into two classes, transition and stationary. The variable speech encoders 305 to 315 then include one encoder for input speech in the transition class and one for input speech in the stationary class. The input speech class determiner 105 determines which class the input speech belongs to and sends it to the variable speech encoder for that class.

Each of the variable speech encoders 305 to 315 has its own frame size, quantizer structure and bit-allocation scheme. The variable adjusted speech encoder 110 can therefore encode the input speech with the optimal coding method for each class.
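The combination of a shared input buffer (unit 300) and per-class encoder settings (encoders 305 to 315) can be sketched as follows. The per-class numbers are assumptions, loosely modeled on G.729 (80 bits per 10 ms frame, about 18 of them for the spectral parameters) and G.723.1 at 6.3 kbit/s (189 bits per 30 ms frame, about 24 for the spectral parameters). The patent itself only says that each class has its own frame size, quantizer structure and bit allocation. The class names and the 8 kHz sampling rate are also assumed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

SAMPLES_PER_MS = 8  # 8 kHz narrowband (assumed)


@dataclass(frozen=True)
class ClassConfig:
    frame_ms: int         # analysis frame size for this class
    lsf_bits: int         # bits for spectral (LSF) parameters
    excitation_bits: int  # bits for the excitation signal

    @property
    def frame_len(self) -> int:
        return self.frame_ms * SAMPLES_PER_MS

    @property
    def bitrate_kbps(self) -> float:
        # bits per millisecond == kbit/s
        return (self.lsf_bits + self.excitation_bits) / self.frame_ms


# Hypothetical per-class settings (see lead-in).
CONFIGS: Dict[str, ClassConfig] = {
    "transition": ClassConfig(frame_ms=10, lsf_bits=18, excitation_bits=62),
    "static": ClassConfig(frame_ms=30, lsf_bits=24, excitation_bits=165),
}


class InputBuffer:
    """Holds samples until a frame of the size required by the current class is available."""

    def __init__(self) -> None:
        self._samples: List[float] = []

    def push(self, samples: List[float]) -> None:
        self._samples.extend(samples)

    def pop_frame(self, speech_class: str) -> Optional[Tuple[str, List[float]]]:
        n = CONFIGS[speech_class].frame_len
        if len(self._samples) < n:
            return None  # not enough speech buffered yet for this class
        frame, self._samples = self._samples[:n], self._samples[n:]
        return speech_class, frame  # route to the encoder for speech_class
```

With these assumed numbers, the transition class runs at 8.0 kbit/s and the stationary class at 6.3 kbit/s. Stationary segments thus spend fewer bits per second and a smaller share on spectral parameters.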

FIG. 4 shows the configuration of the variable adjusted speech decoder of the speech decoding apparatus according to the present invention, which decodes optimally according to the characteristics of the input speech signal.

Referring to FIG. 4, the variable adjusted speech decoder 160 consists of one or more variable speech decoders 400 to 410 and an output speech temporary storage unit 415.

When the demultiplexer 155 of the receiver 150 receives from the transmitter 100 an input speech bit string that was encoded differently according to its class, it forwards the bit string to the variable speech decoder (400 to 410) corresponding to that class.

The variable speech decoders 400 to 410 perform decoding based on the class of the encoded input speech bit string. The variable speech decoders 400 to 410 of the receiver 150 correspond one-to-one to the variable speech encoders 305 to 315 of the transmitter 100, so both encoding and decoding are performed according to the class of the input speech.

The output speech temporary storage unit 415 temporarily stores the speech decoded by the variable speech decoders 400 to 410 and then outputs it, so that playback is continuous. Because the frame size of the speech decoded by each variable speech decoder 400 to 410 varies, the storage unit 415 buffers the decoded speech and then outputs it as a continuous stream.
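The role of the output speech temporary storage unit can be sketched as a FIFO that accepts decoded frames of any length and releases fixed-size playout blocks. The class name and the 80-sample block size (10 ms at an assumed 8 kHz) are hypothetical.

```python
from collections import deque
from typing import Deque, List


class OutputBuffer:
    """Accepts decoded frames of varying length and emits fixed-size playout blocks."""

    def __init__(self, block_size: int):
        self.block_size = block_size
        self._samples: Deque[float] = deque()

    def push(self, decoded_frame: List[float]) -> None:
        # Append samples from a decoded frame of any size (e.g. 10 ms or 20 ms).
        self._samples.extend(decoded_frame)

    def pop_blocks(self) -> List[List[float]]:
        # Release as many complete playout blocks as are available and keep the remainder.
        blocks = []
        while len(self._samples) >= self.block_size:
            blocks.append([self._samples.popleft() for _ in range(self.block_size)])
        return blocks
```

For example, pushing a 160-sample frame and then an 80-sample frame into `OutputBuffer(80)` yields three 80-sample blocks in order. A trailing partial frame waits until more samples arrive.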

FIGS. 5A and 5B are flowcharts illustrating a speech encoding method and a speech decoding method according to the present invention, which optimally perform encoding/decoding according to the characteristics of the input speech signal.

Referring to FIG. 5A, the input speech class determiner 105 determines the class of the input speech according to its characteristics (S500). The classes are defined in advance, and an encoding scheme optimized for each class is provided.

The variable-adjusted speech encoder 110 encodes the input speech using the frame size, quantizer structure, and bit allocation corresponding to its class, and outputs the result (S510).

Referring to FIG. 5B, the demultiplexer 155 receives the bit string of input speech encoded according to its class. It forwards the bit string to the variable speech decoder (400 to 410) of the variable-adjusted speech decoder 160 that corresponds to that class.

The variable speech decoders 400 to 410 decode the input speech bit string and output it as continuous speech.

FIGS. 1 to 5B concern the structure of a speech encoder that varies the frame size and bit allocation according to the characteristics of the input signal. In other words, they describe a speech encoding/decoding apparatus and method whose frame size can change during a call.

In addition, the encoding/decoding apparatus and method of the present invention can set the frame size at call setup, not only during a voice call, according to the speech encoder used by the other party. This prevents the delay that arises when the two parties use different frame sizes.

For example, when A calls B, A sets its speech encoder frame size to 20 msec if B's speech encoder uses a 20 msec frame, and to 10 msec if B's speech encoder uses a 10 msec frame.

Matching the frame sizes of the speech encoders used by A and B reduces the tandem delay. If A's speech encoder uses a 20 msec frame and B's uses a 30 msec frame, a delay of 60 msec is required for A and B to communicate. If A instead sets its frame size to 30 msec, the required delay drops to 30 msec.
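The 60 msec and 30 msec figures match the least common multiple of the two frame sizes. That is the shortest span after which the 20 msec and 30 msec frame grids line up again. The short check below treats "required delay = LCM of the frame sizes" as a simplifying model that reproduces the numbers in the text. It is not a full delay budget, since look-ahead, processing, and network delay are ignored. The function name is hypothetical.

```python
from math import lcm  # Python 3.9+


def min_tandem_delay_ms(frame_a_ms: int, frame_b_ms: int) -> int:
    """Shortest span (ms) after which both frame grids align again:
    the least common multiple of the two frame sizes (simplified model)."""
    return lcm(frame_a_ms, frame_b_ms)


print(min_tandem_delay_ms(20, 30))  # 60: mismatched frame sizes
print(min_tandem_delay_ms(30, 30))  # 30: A adopts B's frame size
```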

Therefore, a speech encoder whose frame size can be set at call setup to match the frame size of the other party's speech encoder is advantageous in terms of tandem delay.

Hereinafter, an encoding/decoding apparatus and method that reduce the delay required for a voice call are described in detail with reference to FIGS. 6 to 9.

FIG. 6 is a diagram illustrating the configuration of an embodiment of a speech encoding/decoding apparatus according to the present invention that reduces the delay required for a voice call.

Referring to FIG. 6, a voice communication system is shown in which a speech encoding apparatus serves as the transmitter 600 and a speech decoding apparatus serves as the receiver 650.

The speech encoding apparatus (transmitter 600) includes a frame determiner 605, a variable-adjusted speech encoder 610, and a multiplexer 615. The speech decoding apparatus (receiver 650) includes a demultiplexer 655 and a variable-adjusted speech decoder 660.

The frame determiner 605 determines the frame size for speech encoding and the number of frames per packet, both according to the state of the network. For example, when the overall network delay increases and service quality degrades, the frame determiner lowers the overall delay by reducing the frame size and the number of frames per packet. Conversely, when the overall network delay decreases, it increases the frame size and the number of frames per packet.

Because the overall delay can keep changing during a call, the frame size and the number of frames per packet are adjusted continuously according to the network state, keeping the overall delay at a steady level.
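The adjustment rule of the frame determiner can be sketched as a step controller over an ordered list of operating points. It steps down when the measured network delay rises above an upper bound and steps up when the delay falls below a lower bound. The operating points, thresholds, and class name are hypothetical. The patent specifies only the direction of adjustment.

```python
from typing import List, Tuple

# Hypothetical (frame_ms, frames_per_packet) operating points, ordered by
# packetization delay = frame_ms * frames_per_packet: 10, 20, 40, 60, 90 ms.
OPERATING_POINTS: List[Tuple[int, int]] = [(10, 1), (20, 1), (20, 2), (30, 2), (30, 3)]


class FrameDeterminer:
    def __init__(self, low_ms: float, high_ms: float, start_index: int = 1):
        self.low_ms = low_ms    # below this network delay: larger frames / more frames per packet
        self.high_ms = high_ms  # above this network delay: smaller frames / fewer frames per packet
        self.index = start_index

    def update(self, network_delay_ms: float) -> Tuple[int, int]:
        """Re-evaluate mid-call and return the (frame_ms, frames_per_packet) to use next."""
        if network_delay_ms > self.high_ms and self.index > 0:
            self.index -= 1
        elif network_delay_ms < self.low_ms and self.index < len(OPERATING_POINTS) - 1:
            self.index += 1
        return OPERATING_POINTS[self.index]
```

Moving one step per update, with a dead band between `low_ms` and `high_ms`, keeps the encoder from oscillating between frame sizes when the measured delay is noisy.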

The variable-adjusted speech encoder 610 encodes the input speech signal according to the frame size determined by the frame determiner 605. Because the frame size can change during a call, the variable-adjusted speech encoder 610 switches frame sizes mid-call without degrading sound quality.

The multiplexer 615 outputs the bit string of the input speech encoded by the variable-adjusted speech encoder 610, taking into account that this encoder uses a variable frame size.

The frame determiner 605 and the input speech class determiner 105 of FIG. 1 may be implemented as a single component that determines both the class of the input speech and the frame size. Likewise, the variable-adjusted speech encoder 610 may have the same function and configuration as the variable-adjusted speech encoder 110 of FIG. 1. The difference is that the encoder 110 of FIG. 1 encodes according to the speech class, whereas the encoder 610 of FIG. 6 encodes according to the frame size. The multiplexer 615 may have the same function and configuration as the multiplexer 115 of FIG. 1.

Accordingly, the speech encoding apparatus 600 of FIG. 6 can be implemented using the speech encoding apparatus 100 of FIG. 1. The functions of the speech encoding apparatuses of FIGS. 1 and 6 can also be integrated into a single encoding apparatus.

The demultiplexer 655 of the receiver 650 receives the bit string output by the multiplexer 615 of the transmitter 600. It extracts the parameters needed for decoding and passes them to the variable-adjusted speech decoder 660, which decodes the received bit string. A temporary storage unit (not shown) buffers the decoded speech so that it can be output continuously.
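The multiplexer/demultiplexer pair can be illustrated with a toy packet format. The patent does not define a bitstream layout. The 2-byte header (frame size in ms, frame count) and the rule that both ends know a fixed byte count per frame size are hypothetical assumptions made only to show how the receiver recovers the decoding parameters.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical header: frame size in ms (1 byte), number of frames in the packet (1 byte).
HEADER = struct.Struct("!BB")


def mux(frame_ms: int, frames: List[bytes]) -> bytes:
    """Prefix the encoded frames with the parameters the decoder needs."""
    return HEADER.pack(frame_ms, len(frames)) + b"".join(frames)


def demux(packet: bytes, bytes_per_frame: Dict[int, int]) -> Tuple[int, List[bytes]]:
    """Extract (frame_ms, frames) so each frame can go to the decoder for that size."""
    frame_ms, count = HEADER.unpack_from(packet, 0)
    size = bytes_per_frame[frame_ms]
    payload = packet[HEADER.size:]
    if len(payload) != size * count:
        raise ValueError("payload length does not match header")
    frames = [payload[i * size:(i + 1) * size] for i in range(count)]
    return frame_ms, frames
```

Because the frame size travels with every packet, the encoder can change it mid-call. The receiver then simply reads the new value from the next header.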

The receiver 650 of FIG. 6 can be implemented by the receiver 150 of FIG. 1 and vice versa. The functions of the receivers shown in FIGS. 1 and 6 can also be integrated into a single receiver.

FIGS. 7A and 7B are flowcharts illustrating an embodiment of a speech encoding/decoding method according to the present invention that reduces the delay required for a voice call.

Referring to FIG. 7A, the frame determiner 605 determines the frame size and the number of frames per packet according to network delay information (S700, S710). The variable-adjusted speech encoder 610 encodes the input speech signal using the determined frame size and outputs it (S720, S730).

Referring to FIG. 7B, the demultiplexer 655 receives the bit string of the encoded input speech. It then extracts the parameters needed for decoding and transmits the bit string to the variable-adjusted speech decoder 660 (S750). The variable-adjusted speech decoder 660 decodes the received speech according to its frame size and outputs the result (S760). A temporary storage unit (not shown) buffers the decoded speech so that it can be output continuously.

FIG. 8 is a diagram illustrating the configuration of an embodiment of a speech encoding/decoding apparatus that adjusts the frame size according to the type of speech encoder used by the receiving side.

Referring to FIG. 8, the speech encoding apparatus (transmitter 800) includes a frame-size-adaptive speech encoder 805 and a multiplexer 810. The speech decoding apparatus (receiver 850) includes a demultiplexer 855 and a frame-size-adaptive speech decoder 860.

A call between users whose speech encoders differ requires a transcoder (Transcodec). In this case, adjusting the frame size of the speech encoder reduces the delay needed for transcoding. For example, a call between an IP telephone user and a wireless network subscriber who use different speech encoders requires a transcoder. Transcoding requires a delay of at least the least common multiple of the frame sizes of the two encoders.

For example, transcoding for a call between a G.723.1 user and an EVRC user (30 msec and 20 msec frames, respectively) requires a delay of at least 60 msec. When transcoding is needed, giving the two speech encoders the same frame size reduces the transcoding delay accordingly. The transcoding delay can therefore be reduced by setting the speech encoder's frame size equal to that of the other party's speech encoder.

The frame-size-adaptive speech encoder 805 encodes the input speech signal with a frame size determined by the type of the other party's speech encoder. The frame size is fixed at call setup according to the other party's encoder type and does not change during the call. The multiplexer 810 outputs the bit string of the input speech encoded by the frame-size-adaptive speech encoder 805.
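The call-setup decision of the frame-size-adaptive encoder can be sketched as a table lookup done once per call. The frame sizes listed are the nominal frame lengths of those standard codecs. The set of sizes the local encoder supports (10, 20, and 30 ms) and the LCM-minimizing fallback are assumptions for illustration.

```python
from math import lcm  # Python 3.9+

# Nominal frame sizes (ms) of some standard speech codecs.
PEER_FRAME_MS = {"G.723.1": 30, "G.729": 10, "EVRC": 20, "AMR": 20}

# Assumption: frame sizes the local adaptive encoder is able to run at.
LOCAL_SUPPORTED_MS = (10, 20, 30)


def frame_size_at_call_setup(peer_codec: str) -> int:
    """Choose the local frame size once, when the call is connected; it then stays fixed."""
    peer_ms = PEER_FRAME_MS[peer_codec]
    if peer_ms in LOCAL_SUPPORTED_MS:
        return peer_ms  # same frame size on both sides
    # Fallback: the supported size that minimizes the LCM-based transcoding delay.
    return min(LOCAL_SUPPORTED_MS, key=lambda ms: lcm(ms, peer_ms))
```

Against a G.723.1 peer, this picks 30 ms. The LCM-based transcoding delay then drops from 60 ms (a 20 ms local frame) to 30 ms.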

The demultiplexer 855 of the receiver receives the bit string output by the multiplexer 810 of the transmitter. It extracts the parameters needed for decoding and passes them to the frame-size-adaptive speech decoder 860. Once the frame size is set, the frame-size-adaptive speech encoding/decoding apparatuses 800 and 850 encode/decode the speech signal using the speech analysis and quantization tables matched to that frame size.

Referring to FIG. 9A, the frame-size-adaptive speech encoder 805 encodes the speech signal at a frame size determined by the type of speech encoder used by the party to be called through transcoding (S900, S910). The multiplexer 810 outputs the bit string of the input speech encoded with the variable frame size (S920).

Referring to FIG. 9B, the demultiplexer 855 receives the bit string of the encoded input speech (S950) and transfers it to the frame-size-adaptive speech decoder 860 of the speech decoding apparatus 850. The frame-size-adaptive speech decoder 860 decodes the received bit string (S960). A temporary storage unit (not shown) buffers the decoded speech so that it can be output continuously (S970).

FIG. 10 is a diagram illustrating the configuration of an embodiment of a variable-frame-size speech encoding/decoding apparatus with a constant transmission interval.

Referring to FIG. 10, the speech encoding apparatus 1000 according to the present invention operates as a transmitter and includes a variable encoder 1005 and a frame transmitter 1010. The speech decoding apparatus 1050 operates as a receiver and includes a frame receiver 1055 and a variable decoder 1060.

The variable encoder 1005 determines the operating frame size according to the characteristics of the input speech and encodes the input speech with that frame size.

Determining the frame size according to the characteristics of the input speech was described with reference to FIG. 1.

The variable encoder 1005 encodes the speech signal with different frame sizes depending on the characteristics of the input speech. The frame transmitter 1010 sends the speech data, which the variable encoder 1005 outputs in various frame sizes, either once per frame interval or at a constant transmission interval. The corresponding frame structure is shown in FIG. 11C.

The speech decoding apparatus 1050 reverses the process performed by the speech encoding apparatus 1000. The frame receiver 1055 receives frames sent at either irregular or regular intervals, and the variable decoder 1060 decodes each one according to its frame size.

The concept of the encoding/decoding apparatus shown in FIG. 10A can be applied to each of the inventions shown in FIGS. 1, 6, and 8.

FIG. 10B is a flowchart illustrating an embodiment of a variable-frame-size speech encoding method with a constant transmission interval.

Referring to FIG. 10B, the variable encoder 1005 determines the frame size from the characteristics of the input speech, the network delay information, and the type of the other party's encoder. It then encodes the input speech based on that frame size (S1080).

The frame transmitter 1010 then transmits the frames, encoded in various sizes by the variable encoder 1005, at a constant transmission interval (S1090).

FIG. 11 illustrates various frame types according to the present invention.

FIGS. 11A and 11B illustrate structures that encode and transmit the input speech at regular intervals. FIG. 11A shows a conventional speech encoder with a 10 ms frame, which always encodes the input speech signal in 10 ms units and transmits in 10 ms units. FIG. 11B shows a conventional speech encoder with a 20 ms frame, which encodes the input speech signal every 20 ms and transmits in 20 ms units.

FIG. 11C illustrates the feature of the present invention shown in FIGS. 10A and 10B. Transmission intervals are drawn with solid lines and coding frame sizes with dotted lines. Referring to FIG. 11C, the speech encoder encodes the speech signal in 10 ms or 20 ms frames according to the characteristics of the input speech signal, but always transmits in 20 ms units. In other words, the frame size used to analyze the input speech signal depends on its characteristics, while transmission can take place at a constant interval.

FIG. 11D illustrates the feature of the present invention shown in FIGS. 1 to 9B. The speech signal is encoded in 10 ms or 20 ms units according to the characteristics of the input speech, and the transmission interval varies with the variable frame size. In other words, the frame size of the speech encoder is set by the characteristics of the input speech signal, and the encoded data is sent at each of those frame-size intervals.
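The two transmission styles of FIGS. 11C and 11D can be contrasted on the same sequence of coded frame lengths. In this sketch, the 20 ms interval follows FIG. 11C. The requirement that 10 ms frames fill whole 20 ms intervals (no frame straddling a boundary) is an assumption made to keep the grouping simple.

```python
from typing import List


def fixed_interval_packets(frame_ms: List[int], interval_ms: int = 20) -> List[List[int]]:
    """FIG. 11C style: group variable-size coded frames so that each transmission
    carries exactly interval_ms of speech (assumes frames align with the grid)."""
    packets, current, filled = [], [], 0
    for ms in frame_ms:
        current.append(ms)
        filled += ms
        if filled == interval_ms:
            packets.append(current)
            current, filled = [], 0
        elif filled > interval_ms:
            raise ValueError("frame straddles a transmission boundary")
    if current:
        raise ValueError("trailing frames do not fill a whole interval")
    return packets


def per_frame_send_times(frame_ms: List[int]) -> List[int]:
    """FIG. 11D style: each frame is sent as soon as it is coded, so the send
    times (ms from the start) follow the variable frame boundaries."""
    times, t = [], 0
    for ms in frame_ms:
        t += ms
        times.append(t)
    return times
```

For coded frames `[10, 10, 20, 20, 10, 10]`, the fixed-interval scheme sends four packets, at 20, 40, 60, and 80 ms. The per-frame scheme sends six packets, at 10, 20, 40, 60, 70, and 80 ms.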

The present invention can also be embodied as computer-readable code on a computer-readable recording medium. A computer-readable recording medium is any kind of recording device that stores data readable by a computer system. Examples include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The medium may also take the form of a carrier wave (for example, transmission over the Internet). In addition, the computer-readable recording medium can be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.

The present invention has been described above with reference to its preferred embodiments. Those skilled in the art will understand that the present invention can be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in a descriptive rather than a limiting sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the scope of their equivalents should be construed as included in the present invention.

According to the present invention, the frame size, quantizer structure, and bit allocation can be optimally adjusted according to the input speech, which improves the performance of the speech encoding apparatus.

In addition, adjusting the frame size of the speech encoder according to the network state or the type of speech encoder used by the other party controls the delay required to deliver speech data, which improves the quality of the voice service.

Claims

An input speech class determiner that classifies the input speech into a transition section class and a static section class;

A variable adjusted speech encoder that variably performs encoding using a frame size, a quantizer structure, and bit-allocation corresponding to the determined class; And

And a multiplexing unit for outputting a bit string of the input speech encoded with the variable frame size.

The apparatus of claim 1,

wherein the input speech class determiner determines the class of the input speech using an open-loop or a closed-loop classification method.

The apparatus of claim 1,

The variable adjustment speech encoder,

An input speech temporary storage unit configured to store input speech samples amounting to the frame size corresponding to the determined class; and

a variable speech encoder, provided for each class of the input speech, that encodes the input speech samples using the frame size, quantizer structure, and bit allocation corresponding to the determined class.

(a) classifying an input speech into a transition section and a static section;

(b) variably performing encoding using frame size, quantizer structure, and bit-allocation corresponding to the determined class; And

and (c) outputting a bit string of the input speech encoded with the variable frame size.

A demultiplexer that, upon receiving a bit string encoded with a frame size, quantizer structure, and bit allocation that differ according to the class of input speech classified into a transition section and a static section, extracts from the bit string the parameter information needed for decoding;

a variable-adjusted speech decoder, provided for each class of the input speech, that variably performs decoding corresponding to the class of the received input speech; and

And a temporary storage unit for temporarily storing the decoded input speech to be continuously output.

(a) upon receiving a bit string encoded with a frame size, quantizer structure, and bit allocation that differ according to the class of input speech classified into a transition section and a static section, extracting from the bit string the parameter information needed for decoding;

(b) variably performing decoding corresponding to the class of the received input speech, using a decoder provided for each class of the input speech; and

and (c) temporarily storing the decoded input speech to be continuously output.

A frame determiner configured to determine a frame size having the same size as the frame size of the call counterpart encoder and the number of frames per packet based on information on the type of the call counterpart encoder;

A variable adjusted speech encoder for variably encoding an input speech corresponding to the determined frame size and the number of frames; And

And a multiplexer for outputting a bit string of the input speech encoded with the variable frame size.

The apparatus of claim 7, wherein

And the frame determiner reduces the frame size and the number of frames when the network delay increases, and increases the frame size and the number of frames when the network delay decreases.

delete

The apparatus of claim 7, wherein

And the frame determiner determines the frame size and the number of frames based on delay information of a network that changes during a call.

The apparatus of claim 7, wherein

And the frame determiner determines the frame size and the number of frames based on information on the speech encoder obtained when the call is connected to the call counterpart.

The apparatus of claim 7, wherein

The variable adjustment speech encoder,

An input speech temporary storage unit that stores input speech samples amounting to the determined frame size; and

And a variable speech encoder configured to encode the input speech sample by using the speech encoder corresponding to the determined frame size among the speech encoders provided for each frame size.

(a) determining a frame size and the number of frames per packet having the same size as the frame size of the encoder of the call counterpart, based on the information on the type of the encoder of the call counterpart;

(b) variably encoding a speech signal corresponding to the determined frame size and the number of frames; And

and (c) outputting a bit string of the speech signal encoded in the variable frame size.

A demultiplexer configured to extract, from a received bit string of a speech signal encoded according to a frame size and a number of frames per packet determined from network delay information, the parameter information needed for decoding;

a variable-adjusted speech decoder, provided for each frame size, that variably performs decoding corresponding to the frame size of the received speech signal; and

And a temporary storage unit for temporarily storing the decoded voice signal to be continuously output.

(a) extracting parameter information necessary for decoding from the bit string when receiving the bit string of the speech signal encoded according to the frame size and the number of frames per packet determined based on delay information of the network;

(b) variably performing decoding corresponding to the frame size of the received speech signal, using a decoder provided for each frame size; and

and (c) temporarily storing the decoded speech signal to be continuously output.

A variable encoder configured to determine a frame size for encoding based on any one of characteristics of an input voice, delay information of a network, and information about a type of an encoder of a call counterpart, and to encode the input voice based on the determined frame size; And

And a frame transmitter configured to transmit the encoded frame at a constant transmission interval.

The apparatus of claim 16,

The variable encoding unit divides the input speech into a transition section and a static section, and performs a speech encoding optimized according to the speech characteristics of each section.

The apparatus of claim 16,

The variable encoding unit reduces the frame size when the delay of the network increases, and increases the size of the frame when the delay of the network decreases.

The apparatus of claim 16,

And the variable encoding unit encodes the input speech in the same frame size as that of the encoder of the other party.

Determining a frame size for encoding based on any one of characteristics of an input voice, delay information of a network, and information on a type of an encoder of a call counterpart, and encoding the input voice based on the determined frame size; And

And transmitting the encoded frame at a constant transmission interval.