KR0180824B1 - Adaptive differential pulse code modulation

Info

Publication number: KR0180824B1
Application number: KR1019960026558A
Authority: KR
Inventors: 양연대; 안병철; 박용완; 이윤근; 이황수
Original assignee: 서정욱; 에스케이텔레콤
Priority date: 1996-06-29
Filing date: 1996-06-29
Publication date: 1999-05-15
Also published as: KR980007008A

Abstract

1. Technical field of the invention described in the claims

A voice activity detection method based on differential energy and output code range measurement, and a variable-rate adaptive differential pulse code modulation device using the same.

2. Technical problem to be solved by the invention

The 32k adaptive differential pulse code modulation device and the variable-rate adaptive differential pulse code modulation device do not use the channel bandwidth efficiently.

3. Summary of the solution

The invention provides a voice activity detection method, and a variable-rate adaptive differential pulse code modulation device using it, that detect voice activity and transmit voice data at a low rate where voice activity is low and at a high rate where it is high. This uses the channel transmission band efficiently and, because the transmission power is reduced, lowers the amount of interference and thereby increases the capacity of a code division multiple access (CDMA) system.

4. Important uses of the invention

Used to increase the capacity of code division multiple access (CDMA) systems.

Description

Speech activity detection method and variable rate adaptive differential pulse code modulation device using the same

FIG. 1 is a flowchart of a voice activity detection (VAD) method using differential energy according to an embodiment of the present invention.

FIG. 2 is a schematic block diagram of a variable-rate adaptive differential pulse code modulation (ADPCM) device according to an embodiment of the present invention, to which the VAD method using differential energy is applied.

FIG. 3 is a flowchart of a VAD method based on output code range measurement according to another embodiment of the present invention.

FIG. 4 is a schematic block diagram of a variable-rate ADPCM device according to another embodiment of the present invention, to which the VAD method based on output code range measurement is applied.

FIG. 5 is an explanatory diagram of the output code range of the 32k ADPCM device and the transmission rate decision derived from it.

FIG. 6 is an explanatory diagram of the output code range within a frame for each transmission rate and the corresponding transmission code.

FIG. 7 is a flowchart of a VAD method using both differential energy and output code range measurement according to yet another embodiment of the present invention.

FIG. 8 is a schematic block diagram of a variable-rate ADPCM device according to yet another embodiment of the present invention, to which the VAD method using both differential energy and output code range measurement is applied.

* Explanation of reference numerals for the main parts of the drawings

21: buffer

22: voice activity detection unit using differential energy

23: variable-rate adaptive differential pulse code modulation unit

41: voice dividing unit

42: 32k adaptive differential pulse code modulation unit

43: frame storage and decimal conversion unit

44: comparator

45: transmission rate determining unit

The present invention relates to a voice activity detection (VAD) method based on differential energy and output code range measurement, and to a variable-rate adaptive differential pulse code modulation (ADPCM) device using the same.

The 32k adaptive differential pulse code modulation (ADPCM) scheme recommended by the International Telecommunication Union (ITU-T) delivers voice at a fixed transmission rate, so it transmits data at the high rate of 32 kbps even where voice activity is low. It therefore uses the channel bandwidth less efficiently than a scheme that adapts the transmission rate.

Meanwhile, in the voice activity detection (VAD) method of the QCELP (Qualcomm Code Excited Linear Prediction) speech coder used in the North American code division multiple access mobile communication system (IS-95), the data rate for each frame is determined by the voice activity. That is, the data rate is chosen by comparing the energy in the frame with thresholds. The energy R_i(0) of each frame is obtained from the first autocorrelation coefficient of the i-th frame, and the thresholds are obtained from the i-th background noise level B_i. R_i(0) is compared with three thresholds T1(B_i), T2(B_i), and T3(B_i). If R_i(0) exceeds all three thresholds, rate 1 is selected; if it exceeds two of the three, rate 1/2 is selected; if it exceeds only one, rate 1/4 is selected; and if it is below all three, rate 1/8 is selected.

The thresholds used to determine the data rate are updated every frame, before the data rate is determined. First, the background noise B_i is calculated as follows.

$$B_i = \min\left[R_{i-1}(0),\ 160000,\ \max\left(1.00547\,B_{i-1},\ B_{i-1}+1\right)\right]$$

At initialization, the background noise B_1 for the first frame is set to 160,000. The three thresholds are calculated as follows.

$$T_1(B_i) = -(5.544613 \times 10^{-6})\,B_i^2 + 4.047152\,B_i + 362$$

$$T_2(B_i) = -(1.529733 \times 10^{-5})\,B_i^2 + 8.750045\,B_i + 1136$$

$$T_3(B_i) = -(3.957050 \times 10^{-5})\,B_i^2 + 18.89962\,B_i + 3347$$
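The QCELP rate decision described above can be sketched in a few lines of Python. This is only an illustration of the background noise update, the three quadratic thresholds, and the counting rule as quoted in the text, not the IS-95 reference implementation; the function names are hypothetical.

```python
def update_background_noise(prev_energy, prev_noise):
    """B_i = min(R_{i-1}(0), 160000, max(1.00547 * B_{i-1}, B_{i-1} + 1))."""
    return min(prev_energy, 160000.0, max(1.00547 * prev_noise, prev_noise + 1.0))


def thresholds(noise):
    """Return (T1, T2, T3) for a background noise level B_i."""
    t1 = -5.544613e-6 * noise ** 2 + 4.047152 * noise + 362.0
    t2 = -1.529733e-5 * noise ** 2 + 8.750045 * noise + 1136.0
    t3 = -3.957050e-5 * noise ** 2 + 18.89962 * noise + 3347.0
    return t1, t2, t3


def select_rate(frame_energy, noise):
    """Map the number of thresholds exceeded by R_i(0) to a rate label."""
    exceeded = sum(1 for t in thresholds(noise) if frame_energy > t)
    return {3: "1", 2: "1/2", 1: "1/4", 0: "1/8"}[exceeded]
```

Counting the thresholds exceeded, rather than chaining comparisons, mirrors the wording of the text directly and gives the same result when T1 < T2 < T3.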

The conventional 32k ADPCM device (G.721) and variable-rate ADPCM device (G.726) recommended by the ITU-T deliver voice at a fixed transmission rate, and therefore use the channel bandwidth less efficiently than a scheme that adapts the transmission rate to the voice activity.

The present invention was devised to solve this problem. Its object is to provide a voice activity detection method, and a variable-rate ADPCM device using it, that detect voice activity and transmit voice data at a low rate where voice activity is low and at a high rate where it is high. This uses the channel transmission band efficiently and, because the transmission power is reduced, lowers the amount of interference and thereby increases the capacity of a code division multiple access (CDMA) system.

That is, the present invention presents two voice activity detection (VAD) methods (one using differential energy, the other using output code range measurement) and a third VAD method that applies both at once so that each compensates for the other's shortcomings. It also presents a variable-rate ADPCM device that applies each of these VAD methods to the 32k ADPCM device and operates at several transmission rates (32, 24, 16, and 8 kbps).

To achieve the above object, an apparatus of the present invention comprises: voice activity detection means using differential energy, which receives a voice signal from the outside, divides it into frames, calculates the differential energy, sets a reference energy, and compares the two to determine the transmission rate variably; buffering means for buffering the externally input voice signal while the voice activity detection means determines the data rate; and variable-rate adaptive differential pulse code modulation means, which receives from the voice activity detection means information on the rate at which to operate, sets the quantization levels and variables for that rate, encodes the voice signal supplied by the buffering means, and outputs the voice code to the outside at the corresponding rate.

Another apparatus of the present invention for achieving the above object comprises: voice dividing means for dividing an externally input voice signal into frames of a predetermined length; 32k adaptive differential pulse code modulation means for receiving the output of the voice dividing means, setting the quantization levels and variables, and outputting the encoded voice code; frame storage and decimal conversion means for storing the output codes of the 32k adaptive differential pulse code modulation means in a buffer for one frame length and then converting them to decimal numbers; comparison means for receiving the decimal output codes of one frame from the frame storage and decimal conversion means, comparing them, and selecting the maximum and minimum values; and transmission rate determining means for determining the transmission rate range to which the maximum value supplied by the comparison means belongs and transmitting the frame data at that rate.

Yet another apparatus of the present invention for achieving the above object comprises: voice dividing means for dividing an externally input voice signal into frames of a predetermined length; voice activity detection means using differential energy, which receives the voice signal from the voice dividing means, calculates the differential energy of each frame, sets a reference energy, and compares the two to vary the transmission rate; buffering means for buffering the voice signal supplied by the voice dividing means while the voice activity detection means determines the data rate; first comparison means for receiving the output (X) of the voice activity detection means and a predetermined threshold (Y), comparing the two values, and outputting a control signal; switching means for switching the output of the buffering means according to the control signal from the first comparison means; delta modulation means which, on receiving the output of the buffering means through the switching means, modulates it by delta modulation and outputs a voice code (1 or 0) to the outside; 32k adaptive differential pulse code modulation means for receiving the output of the buffering means through the switching means, setting the quantization levels and variables, and outputting the encoded voice code; frame storage and decimal conversion means for storing the output codes of the 32k adaptive differential pulse code modulation means in a buffer for one frame length and then converting them to decimal numbers; second comparison means for receiving the decimal output codes of one frame from the frame storage and decimal conversion means, comparing them, and selecting the maximum and minimum values; and transmission rate determining means for determining the transmission rate range to which the maximum value supplied by the second comparison means belongs and transmitting the frame data at that rate.

A method of the present invention for achieving the above object comprises: a first step of dividing a voice signal into frames of a predetermined length and calculating the differential energy (dENG) of each frame; a second step of selecting the minimum of the current differential energy (dENG), the background noise maximum constant (100000), and the low-pass filtered reference energy (MAX(1.00547B1, B1+1)) and setting it as the reference energy (B); and a third step of comparing the reference energy with the differential energy of the current frame and determining the transmission rate variably according to the ratio of the frame's energy to the background noise.

Another method of the present invention for achieving the above object comprises: a first step of dividing a voice signal into frames of a predetermined length, encoding each frame at the maximum transmission rate (32k ADPCM), and storing the output codes in a buffer; a second step of examining the output codes of each frame; and a third step of determining the transmission rate variably according to the number of bits of the output codes found in the second step.

Yet another method of the present invention for achieving the above object comprises: a first step of dividing a voice signal into frames of a predetermined length and calculating the differential energy (dENG) of each frame; a second step of selecting the minimum of the current differential energy (dENG), the background noise maximum constant (100000), and the low-pass filtered reference energy (MAX(1.00547B1, B1+1)) and setting it as the reference energy (B); a third step of transmitting the frame data at rate 4 (8 kbps) if the product of a predetermined fourth threshold and the reference energy (B) is greater than the differential energy (dENG); and a fourth step of encoding each frame at the maximum transmission rate (32k ADPCM), storing the output codes in a buffer, examining the output codes of each frame, and determining the transmission rate variably according to the number of bits of the output codes.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart of a voice activity detection (VAD) method using differential energy according to an embodiment of the present invention, that is, the VAD method for the variable-rate ADPCM device proposed by the invention.

The method of FIG. 1 measures voice activity using the differential energy of the voice signal. Differential energy is used because the ADPCM device is designed to quantize the differential signal. The steps of the flowchart are described in order below.

First, the voice signal is divided into frames of a fixed length (11). A short frame is preferable so that the characteristics of the voice signal do not change much within it, but the frame must be long enough that the overhead of transmitting the voice activity information does not become excessive. The present invention uses a frame length of 20 msec, which is widely adopted in vocoders. The characteristics of a voice signal are generally known not to change significantly over 20 msec.
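As a rough illustration of the framing step, the sketch below cuts a list of samples into 20 msec frames. The 8 kHz sampling rate is the one the document gives for the 32k ADPCM unit; the function name and the choice to drop a trailing partial frame are assumptions made only for this example.

```python
def split_into_frames(samples, sample_rate_hz=8000, frame_ms=20):
    """Cut a sample sequence into consecutive frames of frame_ms milliseconds.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    frame_len = sample_rate_hz * frame_ms // 1000  # 160 samples at 8 kHz
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]
```

At 8 kHz a 20 msec frame holds 160 samples, so one second of speech yields 50 rate decisions.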

The differential energy (dENG) is calculated for each frame (12). The differential signal is defined as the difference between the current sample and the previous sample of the voice signal, and the differential energy is obtained by squaring the differential signal and summing it over the frame. That is, it is expressed by the following equation.

$$dENG = \sum_{i=1}^{L} \left[S(i) - S(i-1)\right]^2$$

Here, dENG is the differential energy, L is the frame length, S(i) is the i-th sample in the frame, and i is the sample index within the frame.
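The definition above (square the difference between each sample and the one before it, then sum over the frame) can be sketched as follows. The function name is hypothetical, and how the first sample of a frame is differenced is an assumption: here the caller passes the last sample of the previous frame, defaulting to 0.

```python
def differential_energy(frame, prev_sample=0):
    """Sum of squared first differences over one frame (dENG)."""
    energy = 0
    last = prev_sample  # assumed predecessor of the first sample in the frame
    for sample in frame:
        diff = sample - last
        energy += diff * diff
        last = sample
    return energy
```

A constant (silent or DC) frame gives zero differential energy, while rapidly varying speech gives a large value, which is what makes dENG usable as an activity measure.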

Next, the reference energy (B) is obtained from the differential energy. The reference energy is the baseline against which voice activity is measured. It is defined as the minimum of the current differential energy (dENG), the background noise maximum constant (100000), and the low-pass filtered reference energy (MAX(1.00547B1, B1+1)), and it is updated every frame (13). The reference energy obtained in this way roughly represents the differential energy of the background noise and serves as the baseline for measuring the voice activity of each frame.

Of the three elements needed to obtain the reference energy, the background noise maximum constant and the low-pass filtered reference energy are briefly explained below.

◎ Background noise maximum constant

If the background noise is too loud or a voice interval lasts too long, the reference energy can grow large enough to cause problems, so an upper limit on the reference energy is needed. That limit is the background noise maximum constant, and a suitable value of 100000 was chosen by experiment.

◎ Low-pass filtered reference energy (MAX(1.00547×B1, B1+1))

In a quiet environment the differential energy (dENG) is very small, so it is the minimum of the three elements and the reference energy (B) equals dENG. When a voice interval begins, however, dENG rises suddenly, and the element that prevents an abrupt change in the reference energy (B) is the low-pass filtered reference energy. Here, B1 is the reference energy of the immediately preceding frame, so the low-pass filtered reference energy increases a little with each frame but does not change abruptly. The reference energy (B) therefore increases gradually.

In addition, when the environment changes and the ambient noise becomes severe, dENG increases, and in this case the low-pass filtered reference energy is selected as the current reference energy. This prevents a silent interval from being mistaken for a voice interval when the background noise is loud, which would waste bits by allocating a large number of them to the frame.

The larger of 1.00547×B1 and B1+1 is taken so that the update behaves as 1.00547×B1 when B1 is large and as B1+1 when B1 is small. If B1+1 were used when B1 is large, the increment of 1 would be negligible compared with B1 and the growth would be too slow; likewise, if 1.00547×B1 were used when B1 is small, the increment would be too small. When the voice interval ends, dENG becomes small again, so the reference energy (B) also decreases.
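The reference energy update described above can be sketched as follows. The function and constant names are hypothetical; the constants 100000 and 1.00547 are the ones given in the text.

```python
BACKGROUND_NOISE_MAX = 100000.0  # upper limit on the reference energy


def update_reference_energy(d_eng, prev_ref):
    """B = min(dENG, 100000, max(1.00547 * B1, B1 + 1)), with B1 = prev_ref."""
    low_passed = max(1.00547 * prev_ref, prev_ref + 1.0)
    return min(d_eng, BACKGROUND_NOISE_MAX, low_passed)
```

The two growth terms cross where 0.00547×B1 = 1, that is, at B1 ≈ 183: below that value the additive term B1+1 sets the growth, and above it the multiplicative term 1.00547×B1 does. Because the minimum is taken, the reference energy can rise by only this small step per frame but can fall immediately to dENG when the signal goes quiet.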

Once the reference energy has been obtained for a frame, it is compared with the differential energy of the current frame, and the transmission rate is varied according to the ratio of the frame's energy to the background noise. The thresholds are determined by experiment. In detail, the decision proceeds as follows.

That is, if the product of the fourth threshold and the reference energy (B) is greater than the differential energy (dENG), the device operates at rate 4 (8 kbps) (14). If dENG is greater than the product of the fourth threshold and B but less than the product of the third threshold and B, it operates at rate 3 (16 kbps) (15). If dENG is greater than the product of the third threshold and B but less than the product of the second threshold and B, it operates at rate 2 (24 kbps) (16). If none of these conditions is met, it operates at rate 1 (32 kbps). The same operation is then repeated for the next voice signal.
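The decision cascade can be sketched as below. The text says only that the thresholds are found experimentally, so they are left as parameters; the function name is hypothetical, and the sketch assumes the fourth threshold is the smallest and the second the largest. The source does not say what happens when dENG exactly equals a product; here the frame falls through to the next higher rate.

```python
def select_rate_kbps(d_eng, ref_energy, th2, th3, th4):
    """Choose 8, 16, 24 or 32 kbps from dENG relative to the reference energy.

    Assumes th4 < th3 < th2 (hypothetical ordering of the experimental thresholds).
    """
    if d_eng < th4 * ref_energy:
        return 8    # rate 4
    if d_eng < th3 * ref_energy:
        return 16   # rate 3
    if d_eng < th2 * ref_energy:
        return 24   # rate 2
    return 32       # rate 1
```

With purely illustrative thresholds th2 = 8, th3 = 4, th4 = 2 and a reference energy of 1, a frame with dENG = 3 would be sent at 16 kbps.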

FIG. 2 is a schematic block diagram of a variable-rate ADPCM device according to an embodiment of the present invention, to which the VAD method using differential energy is applied. In the figure, 21 denotes a buffer, 22 a voice activity detection unit using differential energy, and 23 a variable-rate adaptive differential pulse code modulation unit.

The buffer (21) buffers the externally input voice signal while the voice activity detection unit (22) determines the data rate.

The voice activity detection unit (22) receives the voice signal from the outside, divides it into frames, calculates the differential energy, sets the reference energy, and compares it with the differential energy to vary the transmission rate.

The variable-rate adaptive differential pulse code modulation unit (23) receives from the voice activity detection unit (22) information on the rate at which to operate, sets the quantization levels and variables for that rate, encodes the voice signal supplied by the buffer (21), and outputs the voice code to the outside at the corresponding rate.

When operating at 32 kbps, the unit quantizes the input voice signal with 15 levels (4 bits), as the 32k ADPCM device does. When the variable-rate adaptive differential pulse code modulation unit (23) is instructed by the voice activity detection unit (22) to operate at 24 kbps, it quantizes the input voice signal with 7 levels (3 bits). Likewise, when instructed to operate at 16 kbps, it quantizes the voice signal with a 3-level (2-bit) quantizer. When instructed to operate at 8 kbps, it uses delta modulation (the input signal is sampled and only the polarity corresponding to the sign of the difference from the preceding sample value is transmitted) and outputs a signal of 1 or 0.
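The rate-to-quantizer mapping, and the 1-bit delta modulation used at 8 kbps, can be illustrated as follows. The table restates the figures in the text; the delta modulator is a textbook fixed-step version that compares each sample with a running approximation, and its names, step size, and starting value are assumptions of this sketch rather than details from the patent.

```python
# kbps -> bits per sample at 8000 samples/s; levels are 2**bits - 1 for the
# ADPCM rates (15, 7, 3), and the 1-bit case is delta modulation.
BITS_PER_SAMPLE = {32: 4, 24: 3, 16: 2, 8: 1}


def delta_modulate(samples, step=1, start=0):
    """Emit 1 when the input is at or above the running approximation, else 0."""
    bits = []
    approx = start
    for sample in samples:
        bit = 1 if sample >= approx else 0
        bits.append(bit)
        approx += step if bit else -step  # track the input by one step per sample
    return bits
```

For example, `delta_modulate([1, 2, 3, 2, 1])` follows the rising input with ones and the falling input with zeros.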

FIG. 3 is a flowchart of a voice activity detection (VAD) method based on output code range measurement according to another embodiment of the present invention.

First, the voice signal is divided into frames of an appropriate length (31), each frame is encoded at the maximum transmission rate (32k ADPCM) (32), the output codes are stored in a buffer (33), and the output codes of each frame are then examined (34).

If the examination finds no output code of 2 bits or more, the frame is transmitted at rate 4 (8 kbps) (35); if it finds no output code of 3 bits or more, the frame is transmitted at rate 3 (16 kbps) (36); if it finds no output code of 4 bits or more, the frame is transmitted at rate 2 (24 kbps) (37); otherwise, when there is an output code of 4 bits, the frame is transmitted at rate 1 (32 kbps). The same operation is then repeated for the next voice signal.
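One way to read the "number of bits" test is sketched below. It assumes each 32k ADPCM output code is a 4-bit value with a sign bit on top and three magnitude bits, and it counts the bits a code needs as one sign bit plus the significant magnitude bits. That counting rule and the function names are assumptions for illustration; the exact ranges are defined in FIG. 5, which is not reproduced in this text.

```python
def bits_needed(code):
    """Sign bit plus the number of significant magnitude bits of a 4-bit code."""
    magnitude = code & 0b0111
    return 1 + magnitude.bit_length()


def select_rate_from_codes(codes):
    """Pick the lowest rate whose bits per sample cover every code in the frame."""
    widest = max(bits_needed(code) for code in codes)
    return {1: 8, 2: 16, 3: 24, 4: 32}[widest]
```

Under this reading a frame whose codes all have zero magnitude goes out at 8 kbps, and a single code with its top magnitude bit set forces the whole frame to 32 kbps.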

FIG. 4 is a schematic block diagram of a variable rate adaptive differential pulse code modulation (ADPCM) device according to another embodiment of the present invention, to which the voice activity detection (VAD) method based on output code range measurement is applied. In the drawing, 41 denotes a voice dividing unit, 42 a 32k adaptive differential pulse code modulation unit, 43 a frame storage and decimal conversion unit, 44 a comparator, and 45 a rate determination unit.

FIG. 5 is an explanatory diagram of the output code range of the 32k adaptive differential pulse code modulation (ADPCM) device and the rate determination based on it, and FIG. 6 is an explanatory diagram of the output code range within a frame for each rate and the corresponding transmitted codes.

When the input voice signal is divided into frames of an appropriate length and passed through the 32k adaptive differential pulse code modulation (ADPCM) device, measuring the range of the encoded output codes makes it possible to operate at a variable rate.

The configuration is described in detail below.

The voice dividing unit (41) divides the externally input voice signal into frames of an appropriate length.

Because the 32k adaptive differential pulse code modulation unit (42) uses a 15-level quantizer, it receives the output of the voice dividing unit (41) and outputs the voice signal in units of 4 bits per sample; since the sampling rate is 8k samples/second, it outputs sampled data at 32 kbps. The most significant of the 4 output bits represents the sign (0 for +, 1 for -), and the remaining 3 bits represent the magnitude of the voice signal. The range of the 4-bit output code output from the 32k adaptive differential pulse code modulation unit (42) is shown in FIG. 5.

The frame storage and decimal conversion unit (43) stores the data output from the 32k adaptive differential pulse code modulation unit (42) in a buffer for one frame length, and then converts the 4-bit output codes to decimal numbers so that the comparator can compare them easily.

The comparator (44) receives the decimal-converted output codes of one frame from the frame storage and decimal conversion unit (43), compares them, and selects the maximum and minimum values.

The rate determination unit (45) determines which of the ranges of rate 1, rate 2, rate 3, and rate 4 shown in FIG. 5 the maximum value input from the comparator (44) falls into, and then transmits the frame at the rate of that range.

For example, if the maximum value is 0 (binary 0000) and the minimum value is -1 (binary 1111), only the least significant bit (0 or 1) needs to be transmitted, so operation at rate 4 (8 kbps) is sufficient. The decoder can restore the codes by sign extension (1 → 1111 (decimal -1), 0 → 0000 (decimal 0)).

Likewise, if the maximum value is +1 (binary 0001) and the minimum value is -2 (binary 1110), only the lower 2 bits need to be transmitted, so the device operates at rate 3 (16 kbps). That is, when the output codes in the frame are +1 (0001), 0 (0000), -1 (1111), and -2 (1110), transmitting 01, 00, 11, and 10 respectively allows the decoder to restore them by sign extension. The output codes within a frame and the bits actually transmitted for each rate are summarized in FIG. 6.
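The truncation at the encoder and the sign extension at the decoder described in these two examples can be sketched as below. The helper names `truncate` and `sign_extend` are illustrative, and the codes are treated as two's-complement integers, as the binary values in the examples imply.

```python
def truncate(code: int, width: int) -> int:
    """Encoder side: keep only the low `width` bits of a two's-complement code."""
    return code & ((1 << width) - 1)


def sign_extend(bits: int, width: int) -> int:
    """Decoder side: recover the signed value from `width` transmitted bits."""
    sign_bit = 1 << (width - 1)
    # Flipping the sign bit and subtracting its weight maps the top half
    # of the unsigned range onto the negative values.
    return (bits ^ sign_bit) - sign_bit


# The rate 3 example from the text: codes +1, 0, -1, -2 sent as 01, 00, 11, 10.
frame = [1, 0, -1, -2]
sent = [truncate(c, 2) for c in frame]
restored = [sign_extend(b, 2) for b in sent]
```

Here `sent` holds the 2-bit patterns 01, 00, 11, 10 and `restored` equals the original frame, so the reduced-rate frame is decoded without loss.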

FIG. 7 is a flowchart of a voice activity detection (VAD) method based on differential energy and output code range measurement according to yet another embodiment of the present invention.

The proposed algorithm operates in two stages.

The first stage detects silent intervals. This is identical to the rate 4 decision in the rate determination algorithm based on differential energy. If rate 4 is selected, the device operates at 8 kbps; otherwise, processing proceeds to the second stage.

In the second stage, one of rate 1 to rate 3 is determined by the rate algorithm based on the output codes.

The specific operation is as follows.

First, the voice signal is divided into frames of a fixed length (71). A short frame is advantageous because the characteristics of the voice signal then change little within it, but the frame must remain long enough that the overhead of transmitting the voice activity information does not become excessive. The present invention uses a frame length of 20 msec, which is widely adopted in vocoders. It is generally known that the characteristics of a voice signal do not change significantly over 20 msec.

The differential energy (dENG) is calculated for each divided frame (72). The differential signal is defined as the difference between the current sample and the previous sample of the voice signal, and the differential energy is obtained by squaring the differential signal and summing it over the frame interval. That is, it is expressed by the following formula.
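The formula itself does not survive in this text. From the definition just given it can be reconstructed as follows, where $s(n)$ (the $n$-th sample of the frame), $d(n)$ (the differential signal), and $N$ (the number of samples per frame) are notation assumed here rather than taken from the patent:

```latex
d(n) = s(n) - s(n-1), \qquad dENG = \sum_{n=1}^{N} d(n)^{2}
```

With the 20 msec frame length and the 8k samples/second sampling rate given in the description, $N = 160$.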

Next, the reference energy (B) is obtained from the differential energy. The reference energy is the baseline used when measuring voice activity. It is defined as the minimum of the current differential energy (dENG), the maximum background-noise constant (100000), and the low-pass-filtered reference energy (MAX(1.00547B1, B1+1)), and it is updated every frame (73). The reference energy obtained in this way roughly represents the differential energy of the background noise and serves as the baseline for measuring the voice activity of each frame.
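A minimal sketch of the per-frame update of step 73 follows. It assumes that B1 denotes the reference energy of the previous frame, which this passage does not state explicitly; the function and parameter names are illustrative.

```python
BACKGROUND_NOISE_MAX = 100_000  # maximum background-noise constant from the text


def update_reference_energy(d_eng: float, prev_b: float) -> float:
    """B = min(dENG, 100000, max(1.00547 * B1, B1 + 1)).

    prev_b plays the role of B1 (assumed: the previous frame's reference energy).
    """
    smoothed = max(1.00547 * prev_b, prev_b + 1)
    return min(d_eng, BACKGROUND_NOISE_MAX, smoothed)
```

Under this reading the reference energy drops at once to the differential energy of any quieter frame, but can rise by only about 0.547% (or by 1, whichever is larger) per frame during louder frames, which is why it tends to follow the background noise rather than the speech.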

Next, if the product of the fourth threshold and the reference energy (B) is greater than the differential energy (dENG), the device operates at rate 4 (8 kbps) (74).

Otherwise, each frame is encoded at the maximum rate (32k ADPCM), the output codes are stored in a buffer (75), and the output codes of each frame are then examined (76).

If the examination finds no output code of 2 bits or more, the frame is transmitted at rate 4 (8 kbps) (77); if there is no output code of 3 bits or more, the frame is transmitted at rate 3 (16 kbps) (78); if there is no output code of 4 bits or more, the frame is transmitted at rate 2 (24 kbps) (79); otherwise, if there is an output code of 4 bits or more, the frame is transmitted at rate 1 (32 kbps). The same operation is then repeated for the next voice signal.
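The two-stage decision of steps 74 to 79 can be sketched as below. This is an illustration under stated assumptions: the value of the fourth threshold is not given in this passage, so `threshold4` is a caller-supplied placeholder, and `encode_frame` is a hypothetical callable standing in for the 32k ADPCM encoder so that encoding happens only when the first stage does not select rate 4. Output codes are treated as two's-complement values in -8..7.

```python
from typing import Callable, Sequence


def two_stage_rate(d_eng: float, ref_energy: float, threshold4: float,
                   encode_frame: Callable[[], Sequence[int]]) -> int:
    """Return the rate index (1 to 4) for one frame."""
    # Stage 1: silence detection from the differential energy (74).
    if threshold4 * ref_energy > d_eng:
        return 4
    # Stage 2: examine the range of the 32k ADPCM output codes (75 to 79).
    codes = encode_frame()
    lowest, highest = min(codes), max(codes)
    if -1 <= lowest and highest <= 0:   # every code fits in 1 bit
        return 4
    if -2 <= lowest and highest <= 1:   # every code fits in 2 bits
        return 3
    if -4 <= lowest and highest <= 3:   # every code fits in 3 bits
        return 2
    return 1
```

For example, `two_stage_rate(10.0, 100.0, 2.0, lambda: [7])` selects rate 4 in the first stage without calling the encoder, while a frame with higher differential energy is rated by its code range.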

FIG. 8 is a schematic block diagram of a variable rate adaptive differential pulse code modulation (ADPCM) device according to yet another embodiment of the present invention, to which the voice activity detection (VAD) method based on differential energy and output code range measurement is applied. In the drawing, 81 denotes a voice dividing unit, 82 a buffer, 83 a voice activity detection unit using differential energy, 84 a comparator, 85 a switch, 86 a delta modulation unit, 87 a 32k adaptive differential pulse code modulation unit, 88 a frame storage and decimal conversion unit, 89 a comparator, and 90 a rate determination unit.

The voice dividing unit (81) divides the externally input voice signal into frames of an appropriate length.

The buffer (82) buffers the voice signal input from the voice dividing unit (81) while the voice activity detection unit using differential energy (83) determines the transmission rate.

The voice activity detection unit using differential energy (83) receives the voice signal from the voice dividing unit (81), divides it into frames, calculates the differential energy, sets the reference energy, and varies the rate by comparing the reference energy with the differential energy.

The comparator (84) receives the output (X) of the voice activity detection unit using differential energy (83) and a predetermined threshold (Y), compares the two values, and outputs a control signal. That is, if X is greater than Y, the comparator (84) outputs a control signal that routes the output of the buffer (82) to the delta modulation unit (86); if Y is greater than X, it outputs a control signal that routes the output of the buffer (82) to the 32k adaptive differential pulse code modulation unit (87).

The switch (85) switches the output of the buffer (82) according to the control signal input from the comparator (84).

When the delta modulation unit (86) receives the output of the buffer (82) through the switch (85), it uses delta modulation (the input signal is sampled, and only the polarity corresponding to the sign of its difference from the preceding sample value is transmitted) and outputs a signal of 1 or 0 to the outside.
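As an illustration of the 1-bit output described for the delta modulation unit (86), the sketch below shows a textbook fixed-step delta modulator. It is not taken from the patent: it compares each sample with a running estimate rather than with the raw previous sample, and the step size, the initial estimate of zero, and the choice of 1 for a non-negative difference are all assumptions.

```python
def delta_modulate(samples, step=1.0):
    """Emit one bit per sample: 1 if the sample is at or above the estimate."""
    bits = []
    estimate = 0.0  # assumed starting point shared by encoder and decoder
    for s in samples:
        bit = 1 if s >= estimate else 0
        # The encoder tracks the decoder by applying the same step.
        estimate += step if bit else -step
        bits.append(bit)
    return bits


def delta_demodulate(bits, step=1.0):
    """Rebuild the staircase approximation from the transmitted bits."""
    out = []
    estimate = 0.0
    for b in bits:
        estimate += step if b else -step
        out.append(estimate)
    return out
```

At 8k samples/second, one bit per sample gives the 8 kbps of rate 4.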

Because the 32k adaptive differential pulse code modulation unit (87) uses a 15-level quantizer, it receives the output of the buffer (82) through the switch (85) and outputs the voice signal in units of 4 bits per sample; since the sampling rate is 8k samples/second, it outputs sampled data at 32 kbps. The most significant of the 4 output bits represents the sign (0 for +, 1 for -), and the remaining 3 bits represent the magnitude of the voice signal.

The frame storage and decimal conversion unit (88) stores the data output from the 32k adaptive differential pulse code modulation unit (87) in a buffer for one frame length, and then converts the 4-bit output codes to decimal numbers so that the comparator can compare them easily.

The comparator (89) receives the decimal-converted output codes of one frame from the frame storage and decimal conversion unit (88), compares them, and selects the maximum and minimum values.

The rate determination unit (90) determines which of the ranges of rate 1, rate 2, rate 3, and rate 4 the maximum value input from the comparator (89) falls into, and then transmits the frame at the rate of that range.

The present invention, configured and operating as described above, has the following effects.

First, in the voice activity detection method using differential energy, the rate is determined by the ratio between the reference energy and the differential energy, so adjusting the reference energy to the level of the background noise allows the method to adapt easily to changes in the background noise.

Second, the voice activity detection method based on the output code range examines the range of the codes output by the encoder and determines the rate from the range of the largest output code among the samples in the frame, so reproduced sound of the same quality as at the maximum rate can be obtained.

Third, the voice activity detection method based on differential energy and output code range normally operates at rate 1 to rate 3 in voiced intervals, so reproduced sound of the same quality as at the maximum rate can be obtained. At the same time, the reference energy is adjusted as the differential energy of the frames increases, so that the device operates at rate 4 in silent intervals and avoids wasting bits on transmitting unnecessary background noise.

Fourth, the variable rate adaptive differential pulse code modulation device to which the voice activity detection method based on differential energy and output code range is applied can reduce the average transmission rate. Because it determines the rate according to voice activity, it can provide nearly the same sound quality as a 32k adaptive differential pulse code modulation (ADPCM) device at a lower rate. In simulation, the average transmission rate was 17 kbps.

Fifth, the variable rate adaptive differential pulse code modulation device to which the voice activity detection method based on differential energy and output code range is applied can increase the capacity of a code division multiple access (CDMA) system. When the variable rate adaptive differential pulse code modulation (ADPCM) device is applied to a CDMA system, frames sent at low rates are transmitted with lower power, which reduces the effect of inter-channel interference and thereby increases capacity.

The present invention described above is not limited to the foregoing embodiments and the accompanying drawings, since those of ordinary skill in the art to which the invention pertains can make various substitutions, modifications, and changes without departing from the technical spirit of the invention.

Claims

1. A variable rate adaptive differential pulse code modulation device to which a voice activity detection method is applied, comprising: voice activity detection means (22) using differential energy, which receives a voice signal from the outside, divides it into frames, calculates the differential energy, sets a reference energy, and variably determines the transmission rate by comparing the reference energy with the differential energy; buffering means (21) for buffering the externally input voice signal while the voice activity detection means (22) using differential energy determines the transmission rate; and variable rate adaptive differential pulse code modulation means (23) for receiving, from the voice activity detection means (22), information on the transmission rate at which to operate, setting the quantization level and variables corresponding to that rate, encoding the voice signal input from the buffering means (21) at that rate, and outputting the encoded voice code to the outside.

2. The variable rate adaptive differential pulse code modulation means (23) according to claim 1, wherein the variable rate adaptive differential pulse code modulation means (23) receives an input voice signal at a quantization level of 15 levels (4 bits) upon receiving information to operate at 32 kbps from the voice activity detection means (22). Quantized and outputted, and upon receiving information to operate at 24 kbps from the voice activity detecting means 22, the input voice signal is quantized and outputted at a quantization level of 7 levels (3 bits), and is output from the voice activity detecting unit 22. When receiving a signal to operate at 16kbps, the input audio signal is quantized and output at a three-level (2-bit) quantization level, and when receiving a signal to operate at 8kbps from the voice activity detector 22, a delta modulation method is used. A variable rate adaptive differential pulse code modulation apparatus using a voice activity detection method, characterized in that configured to output a signal of 1 or 0 by using.

3. A voice activity detection method applied to a variable rate adaptive differential pulse code modulation device, comprising: a first step (11, 12) of dividing a voice signal into frames of a predetermined length and then calculating the differential energy (dENG) for each divided frame; a second step (13) of selecting the minimum among the current differential energy (dENG), a maximum background-noise constant, and a low-pass-filtered reference energy, and setting it as the reference energy (B); and a third step (14 to 16) of comparing the set reference energy with the differential energy of the current frame and variably determining the transmission rate according to the ratio of the energy of the current frame to the background noise.

4. The method of claim 3, wherein the differential energy calculation process of the first step (11, 12) is calculated by summing the difference between the current sample and the past sample of the speech signal over a frame period.

5. The method according to claim 3 or 4, wherein the third step (14 to 16) comprises: a fourth step (14) of transmitting the frame data at rate 4 (8 kbps) if the product of a predetermined fourth threshold and the reference energy (B) is greater than the differential energy (dENG); a fifth step (15) of transmitting the frame data at rate 3 (16 kbps) if the differential energy (dENG) is greater than the product of the predetermined fourth threshold and the reference energy (B) and less than the product of a predetermined third threshold and the reference energy (B); a sixth step (16) of transmitting the frame data at rate 2 (24 kbps) if the differential energy (dENG) is greater than the product of the predetermined third threshold and the reference energy (B) and less than the product of a second threshold and the reference energy (B); and a seventh step of transmitting the frame data at rate 1 (32 kbps) if the differential energy is greater than the predetermined second threshold.

6. A variable rate adaptive differential pulse code modulation device to which a voice activity detection method is applied, comprising: voice dividing means (41) for dividing an externally input voice signal into frames of a predetermined length; 32k adaptive differential pulse code modulation means (42) for receiving the output of the voice dividing means (41), setting the quantization level and variables, and outputting an encoded voice code; frame storage and decimal conversion means (43) for storing the output codes of the 32k adaptive differential pulse code modulation means (42) in a buffer for one frame length and then converting the output codes to decimal numbers; comparison means (44) for receiving the decimal-converted output codes of one frame from the frame storage and decimal conversion means (43), comparing them, and selecting the maximum and minimum values; and rate determination means (45) for determining the rate range to which the maximum value input from the comparison means (44) belongs and then transmitting the frame data at the rate of that range.

7. A voice activity detection method applied to a variable rate adaptive differential pulse code modulation device, comprising: a first step (31 to 33) of dividing a voice signal into frames of a predetermined length, encoding each frame at the maximum rate (32k ADPCM), and storing the output codes in a buffer; a second step (34) of examining the output codes of each frame; and a third step (35 to 37) of variably determining the transmission rate according to the number of bits of the output codes found in the second step (34).

8. The method of claim 7, wherein the third step (35 to 37) comprises: a fourth step (35) of transmitting frame data at an arbitrary transmission rate 4 (8 kbps) if there is no output code of 2 bits or more; A fifth step 36 of transmitting frame data at an arbitrary transmission rate 3 (16 kbps) if there is no output code of 3 bits or more; A sixth step 37 of transmitting frame data at an arbitrary transmission rate 2 (24 kbps) if there is no output code of 4 bits or more; And a seventh step of transmitting frame data at an arbitrary transmission rate 1 (32 kbps) when there is an output code of 4 bits or more.

9. A variable rate adaptive differential pulse code modulation device to which a voice activity detection method is applied, comprising: voice dividing means (81) for dividing an externally input voice signal into frames of a predetermined length; voice activity detection means (83) using differential energy, which receives the voice signal from the voice dividing means (81), divides it into frames, calculates the differential energy, sets a reference energy, and varies the transmission rate by comparing the reference energy with the differential energy; buffering means (82) for buffering the voice signal input from the voice dividing means (81) while the voice activity detection means (83) using differential energy determines the transmission rate; first comparison means (84) for receiving the output (X) of the voice activity detection means (83) using differential energy and a predetermined threshold (Y), comparing the two values, and outputting a control signal; switching means (85) for switching the output of the buffering means (82) according to the control signal input from the first comparison means (84); delta modulation means (86) for receiving the output of the buffering means (82) through the switching means (85), modulating it by delta modulation, and outputting a voice code (1 or 0) to the outside; 32k adaptive differential pulse code modulation means (87) for receiving the output of the buffering means (82) through the switching means (85), setting the quantization level and variables, and outputting an encoded voice code; frame storage and decimal conversion means (88) for storing the output codes of the 32k adaptive differential pulse code modulation means (87) in a buffer for one frame length and then converting the output codes to decimal numbers; second comparison means (89) for receiving the decimal-converted output codes of one frame from the frame storage and decimal conversion means (88), comparing them, and selecting the maximum and minimum values; and rate determination means (90) for determining the rate range to which the maximum value input from the second comparison means (89) belongs and then transmitting the frame data at the rate of that range.

10. A voice activity detection method applied to a variable rate adaptive differential pulse code modulation device, comprising: a first step (71, 72) of dividing a voice signal into frames of a predetermined length and then calculating the differential energy (dENG) for each divided frame; a second step (73) of selecting the minimum among the current differential energy (dENG), a maximum background-noise constant, and a low-pass-filtered reference energy, and setting it as the reference energy (B); a third step (74) of transmitting the frame data at rate 4 (8 kbps) if the product of a predetermined fourth threshold and the reference energy (B) is greater than the differential energy (dENG); and a fourth step (75 to 79) of encoding each frame at the maximum rate (32k ADPCM), storing the output codes in a buffer, examining the output codes of each frame, and variably determining the transmission rate according to the number of bits of the output codes.

11. The method of claim 10, wherein the step of variably determining the transmission rate according to the number of bits of the output code in the fourth step (75 to 79), if there is no output code of more than 2 bits, the frame data at an arbitrary transmission rate 4 (8kbps) A fifth step 77 of transmitting; A sixth step 78 of transmitting frame data at an arbitrary transmission rate 3 (16 kbps) if there is no output code of 3 bits or more; A seventh step 79 of transmitting frame data at an arbitrary transmission rate 2 (24 kbps) if there is no output code of 4 bits or more; And an eighth step of transmitting frame data at an arbitrary transmission rate 1 (32 kbps) when there is an output code of 4 bits or more.