KR19980035867A

KR19980035867A - Speech data encoding / decoding device and method

Info

Publication number: KR19980035867A
Application number: KR1019960054317A
Authority: KR
Inventors: 배성근; 고용철
Original assignee: 김영환; 현대전자산업 주식회사
Priority date: 1996-11-15
Filing date: 1996-11-15
Publication date: 1998-08-05
Also published as: KR100255297B1

Abstract

The present invention relates to a speech data encoding/decoding apparatus and method for speech signal processing in a digital communication system. By encoding and decoding a speech signal using pitch-unit matrixing and nonuniform sampling based on local peaks and valleys, it provides a high data compression ratio through two stages of compression, a simple structure, and fast data processing. The data encoding process comprises: a first step of pitch-unit matrixing, which detects the pitch of the input speech signal and converts the one-dimensional speech signal into a two-dimensional signal; a second step of matrixing decimation, which selects the first and last pitch sets of the two-dimensional signal formed in the first step and removes the sets between them, leaving only the first and last pitch sets; and a fourth step of nonuniform sampling, which detects local peaks and valleys in the first and last pitch sets obtained in the second step and compresses the data once more using the quantization level values and interval values of the detected peaks and valleys. The data decoding process comprises: a first step of matrixing interpolation, which reconstructs the first and last pitch sets from the quantization level values and nonuniform sampling interval values of the local peaks and valleys in order to restore the data compressed in the fourth step of the encoding process; a second step of matrixing interpolation, which reconstructs the remaining pitch sets of the two-dimensional matrix from the first and last pitch sets reconstructed in the first step; and a third step of dematrixing, which converts the two-dimensional matrix signal reconstructed in the second step back into a one-dimensional speech signal to restore the original signal. Because a speech signal changes slowly over short intervals and is quasi-periodic, unnecessary data can be reduced, and the use of a two-dimensional pitch matrix with nonuniform sampling yields excellent naturalness and intelligibility with little computation and a simple structure.

Description

Speech data encoding / decoding device and method

FIG. 1 is a block diagram of the encoder for the first speech data compression according to the present invention.

FIG. 2 is a block diagram of the encoder for the second speech data compression according to the present invention.

FIG. 3 is a block diagram of the decoder for speech data expansion according to the present invention.

FIG. 4(a) is a flowchart of the encoding of speech data according to the present invention.

FIG. 4(b) is a flowchart of the decoding of speech data according to the present invention.

FIG. 5 illustrates the matrixing process for the first compression of the present invention.

FIG. 6 illustrates the nonuniform sampling process using local peaks and valleys for the second compression of the present invention.

* Description of reference numerals for the main parts of the drawings *

11: pitch detector
12: matrixing unit
13: pitch change detector
14: energy change detector
15: matrixing decimation unit
16: representative pitch set selector
21, 22: pitch set storage units
23: peak/valley detector
24: quantization level information storage unit
25: sampling interval information storage unit
26: data compression and transmission unit
32: peak set unit
33: valley set unit
34: matrixing interpolation unit
35: matrixing reconstruction unit
36: dematrixing unit

The present invention relates to a speech data encoding/decoding apparatus and method for speech signal processing in a digital communication system. By encoding and decoding the speech signal using pitch-unit matrixing and nonuniform sampling based on local peaks and valleys, the invention provides a high data compression ratio through two stages of compression, a simple structure, and fast data processing.

In a rapidly changing industrial society, a flood of information is conveyed through images and speech. Speech in particular is the longest-used means of communication and conversation.

Recently, as communication systems have moved from analog to digital, many speech synthesis, encoding, and decoding techniques have emerged for digital communication, in which speech signals are digitized for transmission and reception.

Among these speech encoding/decoding techniques, methods for encoding and storing speech include: the waveform coding method, which removes the repetitive, unnecessary redundancy in the speech signal before storing and encoding it; the source coding method, which is based on the speech production model, treats each source as a filter, and encodes speech as an excitation filter for the excitation source and a filter for the vocal tract; and the hybrid coding method, which combines source coding with waveform coding.

Because waveform coding removes only redundancy in the time domain, the naturalness and intelligibility of the decoded speech are very high, but the large amount of data required for transmission makes it memory-inefficient. Examples of this approach include PCM (pulse code modulation), DM (delta modulation), ADM (adaptive delta modulation), and DPCM (differential pulse code modulation).

Source coding, by contrast, follows the speech production process: it models speech as the output of a sound source passed through a vocal tract filter and encodes the model artificially. It transforms the signal from the time domain to the frequency domain and processes the excitation and formant components separately.

Source coding therefore divides into two parts, one that encodes the formant component and one that encodes the excitation information. Methods for encoding the formant component include LPC (linear predictive coding), LSP (line spectrum pair), and PARCOR (partial autocorrelation). Because these methods transmit only feature parameters, they use memory efficiently.

However, sound transitions, the onset and end of sounds, and alternation between voiced and unvoiced sounds cannot be modeled by feature parameters alone, so sound quality degrades. In particular, modeling nasal and fricative sounds requires a pole-zero model rather than an all-pole model, so naturalness and intelligibility suffer.

Hybrid coding combines the high sound quality of waveform coding with the memory efficiency of source coding. Depending on how the excitation information is encoded, methods such as CELP (code-excited linear prediction) and VSELP (vector sum excited linear prediction) are used, and they rely mainly on analysis-by-synthesis.

To guarantee high sound quality at a low bit rate, hybrid coding requires iterative calculation and comparison, which makes the computation heavy and the structure complex.

To address these drawbacks, pitch synchronous interpolation (hereinafter 'PSI') has recently been used to reduce the amount of data in quasi-periodic speech signals. PSI synchronizes the repeated pitch periods of a sound, or its excitation information, with the pitch period and repeatedly reconstructs the repeated period within the analysis frame. It is applied to coding based on synthesis by rule and to speech synthesis.

However, because PSI synthesizes the signal uniformly from a partial waveform, it has difficulty compensating for energy changes due to stress and detecting the pitch accurately. In particular, in intervals where the sound changes sharply and the pitch varies, and in sounds with strong changes in stress and intonation, it suffers serious degradation and also requires long computation times, which makes encoding speech data very difficult.

The present invention was devised to solve these problems. Its object is to provide a speech data encoding/decoding apparatus and method that encodes speech data through two compressions (a matrixing process using pitch-unit matrixing and a nonuniform sampling process using local peaks and valleys) and decodes it through two expansions (the inverse of the nonuniform sampling, followed by dematrixing), so that the amount of computation is small, the structure is simple, naturalness and intelligibility are excellent, and the scheme is suitable for real-time processing.

To achieve this object, the speech data encoding/decoding apparatus of the present invention encodes the speech signal as follows: it separates the input speech signal into pitch units to form pitch sets, performs a first compression that removes all pitch sets except the first and the last, and then performs a second compression by nonuniform sampling using the local peaks and valleys of those two pitch sets. For decoding, it reconstructs the first and last pitch sets by the inverse of the nonuniform sampling performed during encoding and then reproduces the original speech signal through a dematrixing process, so that the speech data is decoded through two expansions.

The speech data encoding/decoding apparatus and method of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of the encoder for the first speech data compression according to the present invention. As shown, the encoder comprises: a pitch detector (11) that detects the pitch of the input speech signal; a matrixing unit (12) that separates the pitch detected by the pitch detector (11) into units to form pitch sets; a pitch change detector (13) that detects changes in the pitch detected by the pitch detector (11); an energy change detector (14) that detects changes in the energy of the pitch detected by the pitch detector (11); a matrixing decimation unit (15) that, according to the pitch change detected by the pitch change detector (13) and the energy change detected by the energy change detector (14), keeps the first and last of the pitch sets formed by the matrixing unit (12) and removes the rest; and a representative pitch set selector (16) that selects the representative pitch sets from the first and last pitch sets left by the matrixing decimation unit (15).

FIG. 2 is a block diagram of the encoder for the second speech data compression according to the present invention. As shown, the encoder comprises: pitch set storage units (21, 22) that respectively store the first and last pitch sets obtained in the first compression process; a peak/valley detector (23) that detects local peaks and valleys in the first and last pitch sets held in the pitch set storage units (21, 22); a quantization level information storage unit (24) that stores quantization level information for the peaks and valleys detected by the peak/valley detector (23); a sampling interval information storage unit (25) that stores sampling interval information for the peaks and valleys detected by the peak/valley detector (23); and a data compression and transmission unit (26) that compresses the data into the quantization level values of the quantization level information storage unit (24) and the sampling interval values of the sampling interval information storage unit (25) and transmits it.

FIG. 3 is a block diagram of the decoder for speech data expansion according to the present invention. As shown, the decoder comprises: a peak set unit (32) and a valley set unit (33) that store the sets of local peaks and valleys of the compressed speech data transmitted from the encoder (31); a matrixing interpolation unit (34) that reconstructs the first and last pitch sets by straight-line approximation from the quantization level values and interval values of the peaks in the peak set unit (32) and the valleys in the valley set unit (33); a matrixing reconstruction unit (35) that forms the remaining pitch sets of the two-dimensional matrix from the two pitch sets reconstructed by the matrixing interpolation unit (34); and a dematrixing unit (36) that converts the two-dimensional matrix formed by the matrixing reconstruction unit (35) back into a one-dimensional speech signal to reproduce the original speech signal.

The method of encoding and decoding speech data according to the present invention, configured as above, is described in detail below with reference to the remaining drawings.

FIG. 4(a) is a flowchart of the encoding of speech data according to the present invention. It is described in detail below with reference to FIG. 5, which illustrates the matrixing process for the first compression, and FIG. 6, which illustrates the nonuniform sampling process using local peaks and valleys for the second compression.

First, it is determined whether the input speech signal is voiced (S₁). If it is voiced, the pitch of the input speech is detected, and the detected pitch is separated into units to form pitch sets (S₂).

Pitch detection uses the autocorrelation method. The analog speech input is converted to digital form and processed in frames of 256 samples, and in each frame the pitch is detected so that the matrix can be built in pitch units.
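The framing described above can be sketched as follows. This is only an illustration: the function name `split_into_frames` is hypothetical, and dropping a trailing partial frame is an assumption, since the description does not say how leftover samples are handled.

```python
FRAME_SIZE = 256  # samples per frame, as stated in the description


def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Split digitized speech into consecutive full frames.

    Assumption of this sketch: a trailing partial frame is dropped.
    """
    count = len(samples) // frame_size
    return [samples[i * frame_size:(i + 1) * frame_size] for i in range(count)]


# 600 dummy samples yield two full frames; the last 88 samples are dropped.
frames = split_into_frames(list(range(600)))
```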

In a speech signal, the pitch (51) is, as shown in FIG. 5, the fundamental frequency corresponding to the interval between adjacent prominent peaks or between adjacent valleys. It is detected using the highly accurate autocorrelation of Equation 1.

[Equation 1]

Here, s(·) is the input digital speech and k is the delay (lag) factor.
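Equation 1 itself did not survive in this text, so the following minimal sketch assumes the standard short-time autocorrelation, R(k) = Σ s(n)·s(n+k), and takes the lag with the largest R(k) as the pitch period. The function names and the lag search range are illustrative assumptions, not values from the patent.

```python
import math


def autocorrelation(s, k):
    """Standard short-time autocorrelation of samples s at lag k (assumed form)."""
    return sum(s[n] * s[n + k] for n in range(len(s) - k))


def estimate_pitch_period(s, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1), key=lambda k: autocorrelation(s, k))


# Synthetic 256-sample frame whose period is 32 samples.
frame = [math.sin(2 * math.pi * n / 32) for n in range(256)]
period = estimate_pitch_period(frame, min_lag=20, max_lag=100)
```

Restricting the search to a plausible lag range avoids the trivial maximum at lag 0; real speech would also need a voiced/unvoiced decision before this step.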

The process of forming pitch sets from the pitch period (51) detected by Equation 1 is a matrixing process that converts the one-dimensional speech signal into a two-dimensional signal indexed by the pitch period and the number of pitch periods within the frame. The matrix is constructed according to Equation 2.

That is, the matrix P_ij, in which the i-axis is the pitch period (52) within the frame and the j-axis is the number of pitch periods (K), is given by

[Equation 2]

where P is the pitch period and K is the number of pitch periods.
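Since Equation 2 is not reproduced here, the matrixing step can only be sketched under assumptions: the pitch period is taken to be constant within the frame, each pitch set is stored as one row, and samples beyond the last complete period are dropped. The name `matrixing` is illustrative.

```python
def matrixing(samples, pitch_period):
    """Split a 1-D frame into K consecutive pitch sets of length pitch_period.

    Assumptions of this sketch: a constant pitch period within the frame,
    and any samples after the last complete period are discarded.
    """
    count = len(samples) // pitch_period  # K, the number of complete pitch periods
    return [samples[k * pitch_period:(k + 1) * pitch_period] for k in range(count)]


# A 12-sample toy frame with pitch period 4 becomes a 3 x 4 pitch matrix.
pitch_matrix = matrixing(list(range(12)), pitch_period=4)
```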

Next, from the two-dimensional signal of the matrix P_ij formed by the matrixing process, only the first pitch column P_lj (52) and the last pitch column P_ik (53) are selected, as shown in FIG. 5 (S₃), and matrixing decimation removes the sets (54) between them, leaving only the first and last pitch sets (S₄).

The removal of sets by matrixing decimation described so far constitutes the first compression process of the present invention.
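A minimal sketch of this first compression, assuming the pitch matrix is a list of equal-length rows. It ignores the pitch change and energy change detectors (13, 14) that steer decimation in the apparatus, and returning the row count K alongside the two kept sets is an assumption made so that a decoder could rebuild the matrix.

```python
def matrixing_decimation(pitch_matrix):
    """Keep only the first and last pitch sets, plus the number of sets K."""
    if len(pitch_matrix) < 2:
        raise ValueError("need at least two pitch sets")
    return pitch_matrix[0], pitch_matrix[-1], len(pitch_matrix)


# Toy pitch matrix: four slowly growing pitch periods of four samples each.
pitch_matrix = [[0, 4, 0, -4], [0, 5, 0, -5], [0, 6, 0, -6], [0, 7, 0, -7]]
first, last, count = matrixing_decimation(pitch_matrix)
kept = len(first) + len(last)                    # samples that remain
total = sum(len(row) for row in pitch_matrix)    # samples before decimation
```

In this toy case 8 of 16 samples remain; the saving grows with the number of pitch periods in the frame.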

Next, to perform nonuniform sampling using local peaks and valleys, which is the second compression process of the present invention, the local peaks LP(·) and valleys LV(·) are detected in the first pitch set (52) and the last pitch set (53) obtained in the first compression process (S₅).

Nonuniform sampling using peaks and valleys exploits the fact that a speech signal consists of a fundamental frequency and formant components; the idea is to remove frequencies that are perceptually unnecessary.

As shown in FIG. 6, because a speech signal is composed of a fundamental frequency and several harmonics, detecting the peaks (66) and valleys (67) captures only the frequencies that are perceptually important.

Finally, the quantization level values (504) and interval values (505) of the peaks and valleys detected in step S₅ are transmitted.
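The second compression can be sketched as below. The detection rule (a sign change of the slope) and the decision to always keep the first and last samples of the pitch set are assumptions of this sketch; the patent text does not specify either, and flat plateaus are not handled here. Sample values are assumed to be already quantized.

```python
def nonuniform_sample(pitch_set):
    """Return (levels, intervals) for the local peaks and valleys of pitch_set.

    Assumption of this sketch: the first and last samples are always kept so
    that a decoder knows where the pitch set starts and ends.
    """
    last_index = len(pitch_set) - 1
    positions = [0]
    for n in range(1, last_index):
        slope_in = pitch_set[n] - pitch_set[n - 1]
        slope_out = pitch_set[n + 1] - pitch_set[n]
        # A sign change of the slope marks a local peak or valley.
        if slope_in * slope_out < 0:
            positions.append(n)
    if last_index > 0:
        positions.append(last_index)
    levels = [pitch_set[p] for p in positions]
    intervals = [b - a for a, b in zip(positions, positions[1:])]
    return levels, intervals


# One peak (5) and one valley (-4) inside a 9-sample pitch set.
levels, intervals = nonuniform_sample([0, 2, 5, 3, 1, -2, -4, -1, 0])
```

Here nine samples are reduced to four level values and three interval values.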

The decoding of the speech signal compressed and transmitted by the encoder is described below with reference to FIG. 4(b), the flowchart of the decoding of speech data according to the present invention.

Decoding begins by determining whether the speech signal input to the decoder is voiced (S₆). If it is voiced, then, as the inverse of the nonuniform sampling, the removed samples (61) are interpolated by a straight-line approximation (linear interpolation) (63) between peaks and valleys, using the amplitudes and interval values of the local peaks and valleys transmitted from the encoder (S₇).

This matrixing interpolation process reconstructs the first pitch set (52) and the last pitch set (53).
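The straight-line reconstruction between transmitted peaks and valleys can be sketched as follows, assuming the decoder receives a list of level values and the sample intervals between consecutive values. The function name `linear_reconstruct` and the data layout are illustrative.

```python
def linear_reconstruct(levels, intervals):
    """Rebuild a pitch set by joining consecutive extrema with straight lines."""
    samples = [float(levels[0])]
    for start, end, gap in zip(levels, levels[1:], intervals):
        for step in range(1, gap + 1):
            # Evenly spaced points on the line from start to end.
            samples.append(start + (end - start) * step / gap)
    return samples


# Levels 0, 5, -4, 0 separated by 2, 4, and 2 samples give a 9-sample pitch set.
restored = linear_reconstruct([0, 5, -4, 0], [2, 4, 2])
```

The extrema are reproduced exactly; the samples between them are only approximated, which is where the scheme loses information.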

That is, the sample-to-sample difference between the two pitch sets is computed and, following Equation 3, divided by the number of sets removed by matrixing decimation to obtain L_itp, the number of delays to be applied in matrixing interpolation.

[Equation 3]

L_itp(n) = [P_lj(n) - P_ij(n)] / K

where K is the number of pitch periods in the frame, P_lj is the first pitch set column, and P_ij is the last pitch set column.

Next, the two-dimensional matrix P_ij is formed from the reconstructed column sets P_lj and P_ik (S₈), and dematrixing (S₉) then converts the two-dimensional signal back into a one-dimensional signal to reproduce the original signal.
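A minimal sketch of rebuilding the intermediate pitch sets and flattening the matrix. It interpolates each sample position linearly from the first pitch set to the last. Note the assumption: the per-row step here is (last - first) / (K - 1), chosen so that the first and last rows are reproduced exactly, whereas Equation 3 as printed takes first minus last and divides by K. The function names are illustrative.

```python
def matrixing_interpolation(first, last, count):
    """Rebuild count pitch sets by sample-wise linear interpolation.

    Assumption of this sketch: the divisor is count - 1, not K as printed
    in Equation 3, so the end rows match the transmitted pitch sets.
    """
    if count < 2:
        raise ValueError("count must be at least 2")
    steps = [(b - a) / (count - 1) for a, b in zip(first, last)]
    return [[a + k * d for a, d in zip(first, steps)] for k in range(count)]


def dematrixing(pitch_matrix):
    """Concatenate the pitch sets back into a one-dimensional signal."""
    return [sample for row in pitch_matrix for sample in row]


rows = matrixing_interpolation([0, 4, 0, -4], [0, 7, 0, -7], count=4)
signal = dematrixing(rows)
```

With a slowly varying, quasi-periodic input, the interpolated rows track the removed pitch periods closely, which is the property the description relies on.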

As described above, the present invention reduces the amount of data required through two stages of compression and expansion, and it adapts appropriately even where the sound changes sharply.

In addition, because a speech signal changes slowly over short intervals and is quasi-periodic, unnecessary data can be reduced; and because a two-dimensional pitch matrix and nonuniform sampling are used, naturalness and intelligibility are excellent while the amount of computation is small and the structure is simple.

Claims

An apparatus for encoding and decoding speech data in a digital communication system, comprising:

an encoding apparatus comprising a primary speech data compression device that separates the input speech signal into pitch units and performs primary data compression through matrixing, and a secondary speech data compression device that performs secondary compression, through nonuniform sampling using local peaks and valleys, on the data compressed by the primary speech data compression device; and

a decoding apparatus that performs a first expansion of the speech data encoded by the encoding apparatus through the inverse process of the nonuniform sampling, and then restores the original speech signal through a second expansion by dematrixing.

The apparatus of claim 1,

wherein the primary speech data compression device comprises: a pitch detector (11) that detects the pitch of the input speech signal; a matrixing unit (12) that separates the pitch detected by the pitch detector (11) into units to form pitch sets; a pitch change detector (13) that detects changes in the pitch detected by the pitch detector (11); an energy change detector (14) that detects changes in the energy of the pitch detected by the pitch detector (11); a matrixing decimation unit (15) that, according to the pitch change detected by the pitch change detector (13) and the energy change detected by the energy change detector (14), keeps the first and last of the pitch sets formed by the matrixing unit (12) and removes the remaining sets; and a representative pitch set selector (16) that selects the representative pitch sets from the first and last pitch sets left by the matrixing decimation unit (15).

The apparatus of claim 1,

wherein the secondary speech data compression device comprises: pitch set storage units (21, 22) that respectively store the first and last pitch sets obtained in the first compression process; a peak/valley detector (23) that detects local peaks and valleys in the first and last pitch sets held in the pitch set storage units (21, 22); a quantization level information storage unit (24) that stores quantization level information for the peaks and valleys detected by the peak/valley detector (23); a sampling interval information storage unit (25) that stores sampling interval information for the peaks and valleys detected by the peak/valley detector (23); and a data compression and transmission unit (26) that compresses the data into the quantization level values of the quantization level information storage unit (24) and the sampling interval values of the sampling interval information storage unit (25) and transmits it.

The apparatus of claim 1,

wherein the decoding apparatus comprises: a peak set unit (32) and a valley set unit (33) that store the sets of local peaks and valleys of the compressed speech data that is encoded and transmitted; a matrixing interpolation unit (34) that reconstructs the first and last pitch sets by straight-line approximation from the quantization level values and interval values of the peaks in the peak set unit (32) and the valleys in the valley set unit (33); a matrixing reconstruction unit (35) that forms the remaining pitch sets of the two-dimensional matrix from the two pitch sets reconstructed by the matrixing interpolation unit (34); and a dematrixing unit (36) that converts the two-dimensional matrix formed by the matrixing reconstruction unit (35) back into a one-dimensional speech signal to restore the original speech signal.

A method of encoding and decoding speech data in a digital communication system, comprising:

a speech data encoding process comprising: a first step of pitch-unit matrixing, which detects the pitch of the input speech signal and converts the one-dimensional speech signal into a two-dimensional signal; a second step of matrixing decimation, which selects the first and last pitch sets of the two-dimensional signal formed in the first step and removes the sets between them, leaving only the first and last pitch sets; and a fourth step of nonuniform sampling, which detects local peaks and valleys in the first and last pitch sets obtained in the second step and compresses the data once more using the quantization level values and interval values of the detected peaks and valleys; and

a speech data decoding process comprising: a first step of matrixing interpolation, which reconstructs the first and last pitch sets from the quantization level values and nonuniform sampling interval values of the local peaks and valleys in order to restore the data compressed in the fourth step of the encoding process; a second step of matrixing interpolation, which reconstructs the remaining pitch sets of the two-dimensional matrix from the first and last pitch sets reconstructed in the first step; and a third step of dematrixing, which converts the two-dimensional matrix signal reconstructed in the second step back into a one-dimensional speech signal to restore the original signal.