KR19980035868A

KR19980035868A - Speech data encoding / decoding device and method

Info

Publication number: KR19980035868A
Application number: KR1019960054318A
Authority: KR
Inventors: 배성근
Original assignee: 김영환; 현대전자산업 주식회사 (Hyundai Electronics Industries Co., Ltd.)
Priority date: 1996-11-15
Filing date: 1996-11-15
Publication date: 1998-08-05

Abstract

In speech signal processing for a digital communication system, the present invention encodes and decodes a speech signal by pitch-synchronous interpolation that follows changes in pitch. This reduces the amount of data required, which simplifies the structure and reduces computation, making real-time processing easy while improving the naturalness and clarity of the sound. The invention relates to such a speech signal processing system and its control method, which consists of: a matrixing process that detects the pitch of the input speech signal and, using the detected pitch as the unit, constructs a sequence of pitch-period sets; a decimation process that removes the sets lying between the first and last pitch-period sets of that sequence and transmits only the first and last pitch-period sets; and an interpolation process that reconstructs the sets between those two from the difference signal between the first and last transmitted pitch-period sets and the number of decimated pitch-period sets, thereby reproducing the original speech signal.
Speech can thus be applied as an input/output method alongside the mouse, monitor, and keyboard used in human-machine communication, and in the text-to-speech field speech can be synthesized from a small amount of data.

Description

Speech data encoding / decoding device and method

FIG. 1 (a) to (c) show example waveforms for a conventional pitch-synchronous interpolation scheme.

FIG. 2 is a block diagram of an encoder according to the present invention.

FIG. 3 is a block diagram of a decoder according to the present invention.

FIG. 4 illustrates the interpolation process according to the present invention.

FIG. 5 (a) is the waveform of a speech signal within one frame, and (b) is the speech signal waveform obtained according to the present invention.

* Description of reference numerals for the main parts of the drawings *

10: voiced/unvoiced sound separation unit    20: pitch detection unit

30: matrixing unit    40: decimation unit

50: Gaussian noise generator    60: decoder

70: encoder    80: first pitch-period set sequence storage unit

90: last pitch-period set sequence storage unit    100: adder

110: pitch-set difference signal compensation unit    120: interpolation unit

The present invention relates to a speech data encoding/decoding apparatus and method for speech data processing in a digital communication system. By encoding and decoding speech data with pitch-synchronous interpolation (hereinafter 'PSI') that follows changes in pitch, it reduces the amount of data required, which simplifies the structure and reduces computation so that real-time processing is easy, and it improves the naturalness and clarity of the sound.

In general, much of the information produced in a rapidly changing industrial society is conveyed through images or speech. Speech in particular has been used longer than any other medium as a means of communication and conversation.

Recently, as communication has shifted from analog to digital, many speech synthesis, encoding, and decoding techniques have emerged for maintaining high sound quality with few bits in digital communication, where speech signals are digitized, stored, transmitted, and received.

Methods for encoding and storing speech fall into three groups: waveform coding, which removes the repetitive, redundant components present in the speech signal before storing and encoding it; source coding, which treats the speech production model as a filter and encodes it as an excitation source and a vocal-tract filter; and hybrid coding, which combines source coding with waveform coding.

Because waveform coding removes only the redundancy in the time domain, it yields very high naturalness and intelligibility, but it needs a large amount of data for transmission and therefore uses memory inefficiently. Waveform coding methods include PCM (pulse code modulation), DM (delta modulation), ADM (adaptive delta modulation), and ADPCM (adaptive differential pulse code modulation).

Source coding is based on the speech production process, in which speech is generated by a sound source and a vocal-tract filter. It models this process and encodes it artificially: the signal is transformed from the time domain to the frequency domain, and the excitation and formant components are processed separately.

Encoding the formant component and encoding the excitation information are handled by two separate families of methods. Methods for encoding the formant component include LPC (linear predictive coding), LSP (line spectrum pair), and PARCOR; because they transmit only feature parameters, they use memory efficiently.

However, during sound transitions, at the onset and end of sounds, and where voiced and unvoiced sounds alternate, the feature parameters alone cannot model the signal, so sound quality degrades. In particular, modeling nasals and fricatives requires not only an all-pole model but a pole-zero model, so with these methods the sound loses naturalness and clarity.

To encode the excitation information, analysis-by-synthesis is mainly used, and low-bit-rate codecs are implemented with CELP (code-excited linear prediction), VSELP (vector-sum excited linear prediction), RELP (residual-excited linear prediction), and similar methods.

However, to guarantee high sound quality at a low bit rate, these methods also require repeated calculation and comparison, so the amount of computation is large and the structure is complex.

To overcome these drawbacks, the PSI method has recently been used to reduce the amount of data in quasi-periodic speech signals. PSI synchronizes a repeated pitch period, or the excitation information, to the pitch period and synthesizes that cycle repeatedly across the whole frame; it is used in rule-based synthesis and waveform-editing techniques.

Among the example PSI waveforms in FIG. 1, (a) is the waveform of a voiced segment of a speech signal, (b) shows a representative pitch interval of the repeated pitch sets in (a), and (c) is the waveform synthesized by repeating the pitch period of (b) with the PSI method. Comparing (a) and (c) shows that the synthesized speech signal in (c) does not accurately reproduce the original speech signal in (a).

Conventional speech coding and synthesis methods based on PSI also synthesize uniformly from a small portion of the waveform, which creates an energy-compensation problem when stress changes. Moreover, accurate pitch detection is difficult, and in segments where the sound changes sharply or the pitch varies, or in sounds with large changes in stress and intonation, these methods suffer severe degradation and need long computation times, making speech data very difficult to encode.

Accordingly, the present invention was devised to solve the above problems. Its object is to provide a speech data encoding/decoding apparatus and method that combine an encoding process, which uses PSI with matrixing and decimation to transmit only the components within representative pitch periods, with a decoding process based on interpolation. This reduces the amount of data required, allows easy real-time processing with a simple structure and little computation, and improves the naturalness and clarity of the sound.

To achieve this object, the speech data encoding/decoding apparatus of the present invention, when encoding, detects the pitch of the input speech data and performs a matrixing process that forms pitch sets using the detected pitch as the unit, then decimates the sets between the first and last pitch sets and transmits only the first and last pitch sets. When decoding, it restores the decimated pitch sets from the difference signal between the first and last pitch sets and reproduces the original speech data.

Hereinafter, the speech data encoding/decoding apparatus and method of the present invention are described in detail with reference to the accompanying drawings.

FIG. 2 is a block diagram of the encoder for encoding speech data according to the present invention. As shown, it comprises: a voiced/unvoiced sound separation unit 10 that receives the digital speech data output by the digital conversion unit 5 and separates it into voiced and unvoiced sounds; a pitch detection unit 20 that receives only the voiced sounds separated by unit 10 and detects the pitch; a matrixing unit 30 that forms the pitch-period set sequence using the pitch detected by unit 20 as the unit; a decimation unit 40 that decimates (removes) the set sequences lying between the first and last pitch-period set sequences formed by unit 30 and transmits only the first and last pitch-period set sequences to the decoder 60; and a Gaussian noise generator 50 that, when unit 10 separates out an unvoiced sound, generates Gaussian noise to replace it.

FIG. 3 is a block diagram of the decoder for decoding speech data according to the present invention. As shown, it comprises: a first pitch-period set sequence storage unit 80 and a last pitch-period set sequence storage unit 90 that store, respectively, the first and last pitch-period set sequences transmitted from the encoder 70; an adder 100 that computes the difference between the first and last pitch-period set sequences; a pitch-set difference signal compensation unit 110 that uses the difference signal from the adder 100 to compensate the decimated pitch sets in amplitude and number; and an interpolation unit 120 that restores the decimated pitch-period sets from the amplitude and number compensated by unit 110.

The speech data encoding and decoding method of the present invention, configured as above, is described as follows.

First, before encoding, the speech data is separated into voiced and unvoiced sounds, and encoding then begins.

When the input speech signal is voiced, the speech data is processed in frames of 256 samples each. Within a frame, the interval between prominent peaks, or between valleys, is taken as the pitch, and this pitch is detected.

That is, an accurate pitch is detected with the autocorrelation-based pitch detection formula of Equation 1 below.

[Equation 1]
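As a rough illustration of autocorrelation-based pitch detection, the sketch below picks the lag with the largest autocorrelation within one 256-sample frame. Equation 1 is not reproduced in this text, so the unnormalized sum r(τ) = Σ s(n)s(n+τ) used here is an assumption, as are the search range of 20 to 160 samples (roughly 400 Hz down to 50 Hz at an assumed 8 kHz sampling rate) and the function names `autocorr` and `detect_pitch`.

```python
import math

FRAME_SIZE = 256  # samples per frame, as stated in the description


def autocorr(frame, lag):
    """Unnormalized autocorrelation r(lag) = sum over n of s(n) * s(n + lag)."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def detect_pitch(frame, min_lag=20, max_lag=160):
    """Return the lag (in samples) that maximizes r(lag) over [min_lag, max_lag].

    The lag range is an assumption (about 50-400 Hz at an assumed 8 kHz rate).
    """
    max_lag = min(max_lag, len(frame) - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = autocorr(frame, lag)
        if r > best_r:  # strict '>' keeps the shortest lag on ties
            best_lag, best_r = lag, r
    return best_lag


# Usage: a sine wave with a 64-sample period inside one frame.
frame = [math.sin(2 * math.pi * n / 64) for n in range(FRAME_SIZE)]
print(detect_pitch(frame))  # prints 64
```

The unnormalized sum shrinks as the lag grows (fewer overlapping samples), which tends to favor the fundamental period over its multiples.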

Next, the frame is re-divided at the detected pitch into pitch units, forming the pitch-period set sequence as a two-dimensional matrix.

That is, the matrixing process assigns an index p to each pitch period within a frame, producing the pitch-period sets X_p.
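The matrixing step can be sketched as cutting a frame into consecutive rows of one pitch period each, so that row p is the pitch-period set X_p. This assumes a single pitch value per frame. The description does not say what happens to samples left over after the last full period, so the hypothetical `matrix_frame` helper simply returns them separately.

```python
def matrix_frame(frame, pitch):
    """Split a frame into rows X_p of `pitch` samples each (p = 0, 1, ..., K-1).

    Returns (rows, remainder); remainder holds samples after the last full period.
    """
    if pitch <= 0:
        raise ValueError("pitch must be a positive number of samples")
    k = len(frame) // pitch  # K = number of complete pitch periods in the frame
    rows = [list(frame[p * pitch:(p + 1) * pitch]) for p in range(k)]
    remainder = list(frame[k * pitch:])
    return rows, remainder


rows, rest = matrix_frame(list(range(10)), 3)
print(rows)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(rest)  # [9]
```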

Then, in the constructed pitch-period set sequence, every set sequence lying between the first detected pitch-period set (P_F) and the last detected one (P_L) is decimated (removed), leaving only P_F and P_L; this encodes the speech data.
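Decimation then keeps only the first and last rows of the pitch-period matrix. In this sketch the hypothetical `decimate` helper also returns Occ, the number of rows dropped, because the decoder needs it in Equation 6; how Occ reaches the decoder is not stated in the description, so sending it alongside P_F and P_L is an assumption. The hypothetical `kept_fraction` shows the resulting data reduction.

```python
def decimate(rows):
    """Keep only P_F (first row) and P_L (last row) of the pitch-period matrix.

    Returns (p_first, p_last, occ), where occ is the number of rows removed.
    """
    if len(rows) < 2:
        raise ValueError("need at least two pitch periods to decimate")
    occ = len(rows) - 2  # rows strictly between P_F and P_L
    return list(rows[0]), list(rows[-1]), occ


def kept_fraction(rows):
    """Fraction of the matrixed samples that survive decimation."""
    total = sum(len(r) for r in rows)
    return (len(rows[0]) + len(rows[-1])) / total


# A 256-sample frame with a 64-sample pitch gives 4 rows; 2 are kept.
rows = [[float(p)] * 64 for p in range(4)]
p_first, p_last, occ = decimate(rows)
print(occ, kept_fraction(rows))  # 2 0.5
```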

Matrixing is performed along a pitch-period axis (X) and a pitch-count axis (P), so the speech data within a frame is expressed as in Equation 2.

[Equation 2]

$$ s(n) = P_F(k) + P_D(k) + P_L(k), \quad k = 1, 2, \ldots, \text{pitch period} $$

That is, as shown in Equations 3 and 4, the components involved in decimation take the form of a two-dimensional matrix once matrixing has been performed in pitch-period units.

[Equation 3]

[Equation 4]

Here, S is the matrix obtained by dividing the frame into K segments of one pitch period (P) each, and the samples finally transmitted are the sample sets of the first and last pitch periods.

Thus, the only sample sets transmitted to the decoder are the two sets P_F(X) and P_L(X).

In this PSI-based encoding process with matrixing and decimation, only the components within the representative pitch periods, namely the first and last pitch-period set sequences, are transmitted to the decoder.

On the other side, the decoder that receives the two pitch-period set sequences from the encoder computes the difference Δd(k) between the first and last pitch-period set sequences with Equation 5, in order to restore the pitch sets P_D(n) that the encoder decimated.

[Equation 5]

$$ \Delta d(k) = P_F(k) - P_L(k), \quad k = 1, 2, \ldots, \text{pitch period} $$

Here, P_F(k) is the sample sequence within the first detected pitch period, and P_L(k) is the sample sequence within the last detected pitch period.

This difference is used to compensate, in amplitude and number, for the pitch sets that the encoder removed by decimation.

Next, the number Occ of decimated pitch-period set sequences is computed, and the interpolation samples are defined by dividing the difference Δd(k) by Occ, as in Equation 6.

[Equation 6]

$$ P_D(k, t) = \frac{\Delta d(k)}{\mathrm{Occ}}, \quad t = 1, 2, \ldots, \mathrm{Occ}, \quad k = 1, 2, \ldots, \text{pitch period} $$

Here, t indexes the decimated pitch sets and k indexes the samples within a pitch period.

Using the change obtained from Equation 5 and the difference between P_F(X) and P_L(X), each decimated set is interpolated, and the original speech data is reproduced.
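On the decoder side, Equations 5 and 6 amount to a sample-by-sample linear interpolation between P_F and P_L. The sketch below is a minimal version under two assumptions: both periods have the same length, and the step is Δd(k)/(Occ + 1) rather than Δd(k)/Occ as printed in Equation 6, so that the Occ rebuilt rows fall evenly between P_F and P_L without duplicating P_L. The names `interpolate_frame` and `flatten` are hypothetical.

```python
def interpolate_frame(p_first, p_last, occ):
    """Rebuild the pitch-period matrix from P_F, P_L and the decimated count Occ.

    delta[k] = P_F(k) - P_L(k)                      (Equation 5)
    row t    = P_F(k) - t * delta[k] / (Occ + 1),   t = 1..Occ  (assumed step)
    """
    if len(p_first) != len(p_last):
        raise ValueError("P_F and P_L must span the same pitch period")
    delta = [f - l for f, l in zip(p_first, p_last)]
    step = [d / (occ + 1) for d in delta]  # per-row change for each sample k
    middle = [[f - t * s for f, s in zip(p_first, step)] for t in range(1, occ + 1)]
    return [list(p_first)] + middle + [list(p_last)]


def flatten(rows):
    """Concatenate the rows back into a time-domain frame s(n)."""
    return [x for row in rows for x in row]


rows = interpolate_frame([0.0, 3.0], [3.0, 0.0], occ=2)
print(rows)           # [[0.0, 3.0], [1.0, 2.0], [2.0, 1.0], [3.0, 0.0]]
print(flatten(rows))  # [0.0, 3.0, 1.0, 2.0, 2.0, 1.0, 3.0, 0.0]
```

Dividing by Occ + 1 keeps P_F and P_L as the endpoints and spaces the rebuilt rows evenly between them; a decoder that follows Equation 6 literally would instead reach P_L at t = Occ.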

FIG. 4, which illustrates the interpolation process of the present invention, shows precisely which samples were transmitted and which were interpolated by the decoder.

When the input speech is unvoiced, the voiced/unvoiced detection stage replaces the unvoiced sound with Gaussian noise.
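Replacing an unvoiced frame with Gaussian noise can be sketched with the standard library's `random.Random.gauss`. Scaling the noise so its standard deviation equals the RMS level of the original frame is an assumption made here so that loudness is roughly preserved; the description only says the unvoiced sound is replaced by Gaussian noise. The helper names `rms` and `substitute_unvoiced` are hypothetical.

```python
import math
import random


def rms(frame):
    """Root-mean-square level of a frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def substitute_unvoiced(frame, seed=None):
    """Return zero-mean Gaussian noise whose standard deviation equals the frame's RMS.

    Matching the RMS level is an assumption, not part of the original description.
    """
    rng = random.Random(seed)  # seeded generator for reproducible output
    sigma = rms(frame)
    return [rng.gauss(0.0, sigma) for _ in frame]


noise = substitute_unvoiced([3.0, -3.0] * 128, seed=1)
print(len(noise))  # 256
```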

With the encoder and decoder of the speech signal processing control method of the present invention, the speech signal waveform of FIG. 5(b) shows excellent clarity and naturalness compared with the waveform of FIG. 5(a).

As described above, after matrixing in the PSI method, the present invention interpolates the decimated pitch-period samples from the samples of the first pitch period to those of the last pitch period, which reduces the data required. Because only the computation for pitch detection and matrixing is needed, the structure can also be kept simple.

In addition, because speech is encoded at a low bit rate in digital communication, the invention can be applied to PCM communication. In particular, speech can be used as an input/output method alongside the mouse, monitor, and keyboard used in human-machine communication, and in the text-to-speech field speech can be synthesized from a small amount of data.

Claims

An apparatus for encoding and decoding speech data in a digital communication system, comprising:

an encoder comprising: a voiced/unvoiced sound separation unit 10 that receives digital speech data and separates it into voiced and unvoiced sounds; a pitch detector 20 that receives only the voiced sounds separated by the voiced/unvoiced sound separation unit 10 and detects the pitch; a matrixing unit 30 that forms the pitch-period set sequence based on the pitch detected by the pitch detector 20; a decimation unit 40 that removes the set sequences lying between the first and last pitch-period set sequences formed by the matrixing unit 30 and transmits only the first and last pitch-period set sequences; and a Gaussian noise generator 50 that, when the voiced/unvoiced sound separation unit 10 separates out an unvoiced sound, generates Gaussian noise to replace it; and

a decoder comprising: a first pitch-period set sequence storage unit 80 and a last pitch-period set sequence storage unit 90 that store, respectively, the first and last pitch-period set sequences transmitted from the encoder; an adder 100 that computes the difference between the first pitch-period set sequence in storage unit 80 and the last pitch-period set sequence in storage unit 90; a pitch-set difference signal compensation unit 110 that compensates the decimated pitch sets in amplitude and number using the difference signal of the two pitch-period set sequences from the adder 100; and an interpolation unit 120 that restores the decimated pitch-period sets from the amplitude and number compensated by the pitch-set difference signal compensation unit 110, thereby reproducing the original speech data.

A method of encoding and decoding speech data in a digital communication system, comprising:

a matrixing process of detecting the pitch of an input speech signal and forming a pitch-period set sequence using the detected pitch as the unit; a decimation process of removing the set sequences lying between the first and last pitch-period set sequences formed in the matrixing process and transmitting only the first and last pitch-period set sequences; and an interpolation process of reconstructing the sets between the two pitch-period set sequences from the difference signal between the first and last pitch-period set sequences transmitted by the decimation process and the number of decimated pitch-period set sequences, thereby reproducing the original speech signal.