KR100452955B1 - Voice encoding method, voice decoding method, voice encoding device, voice decoding device, telephone device, pitch conversion method and medium

Info

Publication number: KR100452955B1
Application number: KR1019970060947A
Authority: KR
Inventors: 마사유끼 니시구찌; 준 마쯔모또; 아끼라 이노우에
Original assignee: 소니 가부시끼 가이샤
Priority date: 1996-11-19
Filing date: 1997-11-18
Publication date: 2005-05-20
Also published as: JPH10147672A; JP3681488B2; KR19980042556A

Abstract

When a speech signal is encoded or decoded, pitch control can be performed with simple processing and a simple structure. The speech signal is divided on the time axis into predetermined encoding units, the linear prediction residual of the speech signal is obtained for each encoding unit, and sinusoidal analysis encoding is performed on that residual. The pitch component of the speech data encoded by the sinusoidal analysis encoding is then changed by a predetermined calculation in a pitch conversion unit.

Description

Voice encoding method, voice decoding method, voice encoding device, voice decoding device, telephone device, pitch conversion method and medium

The present invention relates to an encoding method and a decoding method applied when a speech signal is encoded or decoded with high efficiency, to an encoding device, a decoding device, and a telephone device to which these methods are applied, and to various media on which the data processed by the encoding and decoding are recorded.

Various encoding methods are known that compress a signal by exploiting the statistical properties of audio signals (here, audio signals include both speech signals and acoustic signals) in the time and frequency domains, together with the characteristics of human hearing. These methods are broadly classified into time-domain coding, frequency-domain coding, and analysis coding.

Known examples of high-efficiency coding of speech signals and the like include MBE (multiband excitation) coding, SBE (singleband excitation) coding or sinusoidal synthesis coding, harmonic coding, SBC (sub-band coding), LPC (linear predictive coding), DCT (discrete cosine transform), MDCT (modified DCT), and FFT (fast Fourier transform).

When a speech signal is encoded with one of these methods, or when the encoded speech signal is decoded, it is often desirable to change the pitch of the speech without changing its phonemes.

Conventional high-efficiency encoders and decoders for speech signals do not take pitch change into account, so a separate pitch control device must be connected to perform pitch conversion, which has the disadvantage of complicating the structure.

In view of the above, an object of the present invention is to make it possible to perform the desired pitch control accurately, with simple processing and a simple structure and without changing the phonemes, when a speech signal is encoded or decoded.

To solve the problem described above, according to the present invention, when a speech signal is divided on the time axis into predetermined encoding units, a linear prediction residual is obtained for each encoding unit, and sinusoidal analysis encoding is performed on the linear prediction residual, the pitch component of the speech data encoded by the sinusoidal analysis encoding is changed by a predetermined calculation.

According to the present invention, pitch conversion can be performed simply, without changing the phoneme components, by a calculation on the speech data encoded by the sinusoidal analysis encoding.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the basic configuration of an example of a speech encoding device, and FIG. 3 is a block diagram showing its detailed configuration.

The basic concept of the speech processing in this embodiment is as follows. On the encoding side, the technique of dimension conversion, or data number conversion, previously proposed by the present inventors and described in Japanese Laid-Open Publication No. 6-51800 is used. When the amplitudes of the spectral envelope are quantized with this technique, vector quantization is performed with the number of harmonics held at a fixed number, that is, a fixed number of dimensions. Because the shape of the spectral envelope does not change, the phoneme components contained in the speech do not change.

In this basic concept, the speech signal encoding device of FIG. 1 includes a first encoding unit 110, which obtains a short-term prediction residual such as an LPC (linear predictive coding) residual and performs sinusoidal analysis encoding such as harmonic coding on it, and a second encoding unit 120, which encodes the input speech signal by waveform coding with phase transmission. The first encoding unit 110 is used to encode the voiced (V) portions of the input signal, while the second encoding unit 120 is used to encode the unvoiced (UV) portions.

The first encoding unit 110 is configured to perform sinusoidal analysis encoding, for example harmonic coding or multiband excitation (MBE) coding, on the LPC residual. The second encoding unit 120 uses, for example, a code-excited linear prediction (CELP) configuration, in which vector quantization is performed by a closed-loop search for the optimum vector using an analysis-by-synthesis method.

In the example of FIG. 1, the speech signal supplied to the input terminal 101 is sent to the LPC inverse filter 111 and the LPC analysis and quantization unit 113 of the first encoding unit 110. The LPC coefficients, the so-called α parameters, obtained by the LPC analysis and quantization unit 113 are sent to the LPC inverse filter 111, which extracts the linear prediction residual (LPC residual) of the input speech signal. The LPC analysis and quantization unit 113 also produces a quantized LSP (line spectrum pair) output, as described later, which is sent to the output terminal 102. The LPC residual from the LPC inverse filter 111 is sent to the sinusoidal analysis encoding unit 114.
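The inverse filtering step described above can be sketched as follows. This is a minimal illustration rather than the circuit of the patent: the function name `lpc_inverse_filter`, the sign convention A(z) = 1 + Σ α_k z^(-k), and the zero initial filter state are all assumptions made for the example.

```python
def lpc_inverse_filter(signal, alpha):
    """Return the residual e[n] = s[n] + sum_k alpha[k-1] * s[n-k].

    Assumed convention: A(z) = 1 + sum_k alpha_k z^-k.
    Samples before the start of the signal are treated as zero.
    """
    residual = []
    for n, s in enumerate(signal):
        acc = s
        for k, a in enumerate(alpha, start=1):
            if n - k >= 0:
                acc += a * signal[n - k]
        residual.append(acc)
    return residual


# A signal obeying s[n] = 0.9 * s[n-1] is predicted exactly by alpha = [-0.9],
# so only the first sample survives in the residual.
example = lpc_inverse_filter([1.0, 0.9, 0.81, 0.729], [-0.9])
```

With well-matched coefficients the residual carries mainly the excitation (pitch pulses or noise), which is what the sinusoidal analysis encoding unit 114 then analyzes.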

The sinusoidal analysis encoding unit 114 performs pitch detection and calculates the spectral envelope amplitudes, and the V/UV determination unit 115 decides whether the frame is voiced (V) or unvoiced (UV). The spectral envelope amplitude data from the sinusoidal analysis encoding unit 114 are sent to the vector quantization unit 116. The codebook index from the vector quantization unit 116, which is the vector-quantized output of the spectral envelope, is sent to the output terminal 103 via the switch 117. The pitch data, which are the pitch component supplied from the sinusoidal analysis encoding unit 114, are sent to the output terminal 104 via the pitch conversion unit 119 and the switch 118. The V/UV determination output from the V/UV determination unit 115 is sent to the output terminal 105 and is also sent to the switches 117 and 118 as their control signal. For voiced (V) frames, the index and the pitch described above are selected and appear at the output terminals 103 and 104, respectively.

On receiving a pitch conversion command, the pitch conversion unit 119 changes the pitch data by a calculation based on the command, thereby performing pitch conversion. The details of this processing are described later.
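Since the detailed calculation is deferred to a later part of the description, the following is only a hypothetical sketch of what a calculation on the pitch data could look like. It assumes that the pitch data are a pitch lag in samples and that the command is a frequency ratio; the name `convert_pitch` and the `ratio` parameter are not taken from the source.

```python
def convert_pitch(pitch_lag, ratio):
    """Scale the fundamental frequency by `ratio` by dividing the pitch lag.

    pitch_lag: pitch period in samples (may be fractional).
    ratio: hypothetical command value; 2.0 raises the pitch by one octave,
           0.5 lowers it by one octave.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return pitch_lag / ratio


raised = convert_pitch(80.0, 2.0)   # shorter period, higher pitch
lowered = convert_pitch(80.0, 0.5)  # longer period, lower pitch
```

Because only the pitch value is touched and the spectral envelope data are left alone, a calculation of this kind is consistent with the stated goal of changing the pitch without changing the phonemes.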

For vector quantization in the vector quantization unit 116, the amplitude data for one block of the effective band on the frequency axis are first extended to N_F values by appending a suitable number of dummy values after the last value and before the first value of the block. The dummy values either interpolate from the last value in the block to the first value, or extend the last and first values. Band-limited oversampling by a factor of O_S (for example, 8) then yields O_S times as many amplitude values. These (m_MX + 1) × O_S amplitude values are linearly interpolated and expanded to a larger number N_M (for example, 2048). Finally, the N_M values are decimated to the fixed number M (for example, 44), and the result is vector quantized.
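The effect of this data number conversion, a variable number of amplitudes in and a fixed number M out, can be illustrated with a deliberately simplified sketch. It replaces the dummy-data padding, band-limited oversampling, and decimation described above with a single linear interpolation, so it shows the idea only; `resample_amplitudes` and its parameters are hypothetical names.

```python
def resample_amplitudes(amplitudes, target_count=44):
    """Linearly interpolate a variable-length amplitude list to a fixed length.

    Simplification: one linear interpolation stands in for the padding,
    oversampling, and decimation chain described in the text.
    """
    n = len(amplitudes)
    if n == 0:
        raise ValueError("need at least one amplitude")
    if target_count < 2:
        raise ValueError("target_count must be at least 2")
    if n == 1:
        return [amplitudes[0]] * target_count
    out = []
    for i in range(target_count):
        # Position of output point i on the input index axis [0, n - 1].
        pos = i * (n - 1) / (target_count - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * amplitudes[lo] + frac * amplitudes[hi])
    return out


# 20 harmonic amplitudes become a 44-dimensional vector for quantization.
fixed = resample_amplitudes([float(m) for m in range(20)])
```

The output samples the same envelope shape at a different density, which is why a fixed-dimension codebook can serve every pitch.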

In this example, the second encoding unit 120 has a CELP (code-excited linear prediction) configuration. The output of the noise codebook 121 is synthesized by the weighted synthesis filter 122, and the resulting weighted speech is sent to the subtractor 123, which takes the error between it and the speech obtained by passing the speech signal supplied to the input terminal 101 through the perceptual weighting filter 125. This error is sent to the distance calculation circuit 124, and the noise codebook 121 is searched for the vector that minimizes the error. The time-domain waveform is thus vector quantized by a closed-loop search using the analysis-by-synthesis method. As described above, this CELP coding is used for the unvoiced portions: the codebook index from the noise codebook 121, as the UV data, is taken out at the output terminal 107 via the switch 127, which is turned on when the V/UV determination result from the V/UV determination unit 115 is unvoiced (UV).

Next, the basic configuration of a speech signal decoding device that decodes the speech data encoded by the speech signal encoding device of FIG. 1 is described with reference to FIG. 2.

In FIG. 2, the codebook index that is the quantized LSP (line spectrum pair) output from the output terminal 102 of FIG. 1 is supplied to the input terminal 202. The outputs from the output terminals 103, 104, and 105 of FIG. 1, namely the index that is the envelope quantization output, the pitch, and the V/UV determination output, are supplied to the input terminals 203, 204, and 205, respectively. The index that is the data for unvoiced (UV) speech from the output terminal 107 of FIG. 1 is supplied to the input terminal 207.

The index from the input terminal 203, which is the quantized spectral envelope of the LPC residual, is sent to the inverse vector quantizer 212, inverse vector quantized, and sent to the data conversion unit 270. The pitch data from the input terminal 204 are also supplied to the data conversion unit 270 via the pitch conversion unit 215. The data conversion unit 270 sends to the voiced sound synthesis unit 211 the amplitude data of the spectral envelope of the LPC residual, in a number corresponding to the set pitch, together with the changed pitch data. When instructed to perform pitch conversion, the pitch conversion unit 215 changes the pitch data by a calculation based on that instruction, so that pitch conversion can be performed. The details of this processing are described later.

The voiced sound synthesis unit 211 synthesizes the LPC (linear predictive coding) residual of the voiced portions by sinusoidal synthesis. The V/UV determination output from the terminal 205 is also supplied to the voiced sound synthesis unit 211. The voiced LPC residual from the voiced sound synthesis unit 211 is sent to the LPC synthesis filter 214. The index of the UV data from the input terminal 207 is sent to the unvoiced sound synthesis unit 220, where the LPC residual of the unvoiced portions is obtained by referring to the noise codebook. This LPC residual is also sent to the LPC synthesis filter 214. The LPC synthesis filter 214 applies LPC synthesis to the voiced LPC residual and the unvoiced LPC residual independently. Alternatively, LPC synthesis may be applied to the sum of the voiced and unvoiced LPC residuals. The LSP index from the input terminal 202 is sent to the LPC parameter reproduction unit 213, where the α parameters of the LPC are extracted and sent to the LPC synthesis filter 214. The speech signal obtained by LPC synthesis in the LPC synthesis filter 214 is taken out at the output terminal 201.
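Sinusoidal synthesis of a voiced residual can be sketched as a sum of harmonics of the fundamental. This is a minimal sketch under strong assumptions that the source does not state: all phases are zero, and the amplitudes and pitch are held constant over the frame instead of being interpolated between frames. The name `synthesize_voiced_residual` is hypothetical.

```python
import math


def synthesize_voiced_residual(amplitudes, pitch_lag, n_samples):
    """Sum harmonics m = 1..len(amplitudes) of the fundamental 2*pi/pitch_lag.

    Assumptions: zero phase for every harmonic, parameters constant over
    the frame. pitch_lag is the pitch period in samples.
    """
    w0 = 2.0 * math.pi / pitch_lag  # fundamental in radians per sample
    return [
        sum(a * math.cos(m * w0 * n) for m, a in enumerate(amplitudes, start=1))
        for n in range(n_samples)
    ]


# Two harmonics with a 40-sample period, synthesized over 160 samples.
frame = synthesize_voiced_residual([1.0, 0.5], 40.0, 160)
```

In this sketch the pitch enters only through `pitch_lag`, so a decoder-side pitch change amounts to calling the function with a converted lag and with amplitudes resampled from the same envelope.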

Next, a more detailed configuration of the speech signal encoding device shown in FIG. 1 is described with reference to FIG. 3. In FIG. 3, parts corresponding to those in FIG. 1 are denoted by the same reference numerals.

In the speech signal encoding device shown in FIG. 3, the speech signal supplied to the input terminal 101 is filtered by the high-pass filter (HPF) 109 to remove signals in unneeded bands. It is then sent to the LPC analysis circuit 132 of the LPC (linear predictive coding) analysis and quantization unit 113 and to the LPC inverse filter circuit 111.

The LPC analysis circuit 132 of the LPC analysis and quantization unit 113 takes a length of about 256 samples of the input signal waveform as one block, applies a Hamming window, and obtains the linear prediction coefficients, the so-called α parameters, by the autocorrelation method. The framing interval, which is the unit of data output, is about 160 samples. At a sampling frequency fs of, for example, 8 kHz, one frame interval of 160 samples corresponds to 20 msec.
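The analysis step above can be sketched in a few lines. The block length, frame interval, and sampling frequency come from the text; the use of the Levinson-Durbin recursion to solve the autocorrelation equations, the order of 10, the sign convention A(z) = 1 + Σ α_k z^(-k), and all function names are assumptions for illustration.

```python
import math

FS_HZ = 8000     # sampling frequency given in the text
BLOCK_LEN = 256  # analysis block length given in the text
FRAME_LEN = 160  # frame interval given in the text


def frame_interval_ms(frame_len=FRAME_LEN, fs_hz=FS_HZ):
    """Frame interval in milliseconds: 160 samples at 8 kHz gives 20 ms."""
    return 1000.0 * frame_len / fs_hz


def hamming(length):
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def autocorrelation(x, max_lag):
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (alpha, error) with A(z) = 1 + sum_k alpha_k z^-k (assumed)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input, for example an all-zero block
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:], err


def analyze_block(block, order=10):
    """Window one block and return its alpha parameters and residual energy."""
    windowed = [s * w for s, w in zip(block, hamming(len(block)))]
    return levinson_durbin(autocorrelation(windowed, order), order)
```

Successive 256-sample blocks advanced by 160 samples overlap by 96 samples, so each frame's coefficients are estimated from a window longer than the frame itself.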

The α parameters from the LPC analysis circuit 132 are sent to the α→LSP conversion circuit 133 and converted into line spectrum pair (LSP) parameters. That is, the α parameters obtained as direct-form filter coefficients are converted into, for example, ten LSP parameters, that is, five pairs. The conversion is carried out using, for example, the Newton-Raphson method. The conversion to LSP parameters is made because they have better interpolation characteristics than the α parameters.

The LSP parameters from the α→LSP conversion circuit 133 are matrix or vector quantized by the LSP quantizer 134. Vector quantization may be performed after taking the inter-frame difference, or several frames may be gathered and matrix quantized. Here, with 20 msec as one frame, the LSP parameters calculated every 20 msec are gathered over two frames and subjected to matrix quantization and vector quantization.

The quantized output from the LSP quantizer 134, that is, the LSP quantization index, is taken out via the terminal 102, and the quantized LSP vector is sent to the LSP interpolation circuit 136.

The LSP interpolation circuit 136 interpolates the LSP vectors quantized every 20 msec or 40 msec to raise the rate by a factor of eight, so that the LSP vector is updated every 2.5 msec. The reason is that when the residual waveform is analyzed and synthesized by the harmonic coding and decoding method, the envelope of the synthesized waveform is very smooth, so abrupt changes in the LPC coefficients every 20 msec can produce abnormal noise. Changing the LPC coefficients gradually every 2.5 msec prevents such noise.
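The eightfold rate increase can be illustrated with a simple linear interpolation between the previous and current quantized LSP vectors. The source does not specify the interpolation rule, so linear weighting is an assumption here, as are the name `interpolate_lsp` and the example values.

```python
def interpolate_lsp(prev_lsp, curr_lsp, steps=8):
    """Return `steps` LSP vectors moving linearly from prev_lsp to curr_lsp.

    With a 20 msec frame and steps=8, each returned vector covers 2.5 msec.
    The last vector equals curr_lsp.
    """
    if len(prev_lsp) != len(curr_lsp):
        raise ValueError("LSP vectors must have the same length")
    out = []
    for s in range(1, steps + 1):
        w = s / steps
        out.append([(1.0 - w) * p + w * c for p, c in zip(prev_lsp, curr_lsp)])
    return out


# Hypothetical two-element LSP vectors, in radians.
subframes = interpolate_lsp([0.30, 0.90], [0.38, 1.06])
update_interval_ms = 20.0 / len(subframes)
```

Interpolating in the LSP domain keeps each intermediate vector ordered whenever both endpoints are ordered, which is one reason LSPs interpolate more safely than the α parameters.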

To perform inverse filtering of the input speech using the LSP vectors interpolated every 2.5 msec in this way, the LSP→α conversion circuit 137 converts the LSP parameters into α parameters, which are the coefficients of a direct-form filter of, for example, about tenth order. The output of the LSP→α conversion circuit 137 is sent to the LPC inverse filter circuit 111, which performs inverse filtering with the α parameters updated every 2.5 msec so as to obtain a smooth output. The output of the LPC inverse filter 111 is sent to the sinusoidal analysis encoding unit 114, specifically to the orthogonal transform circuit 145 of, for example, a harmonic coding circuit; the orthogonal transform circuit is, for example, a DFT (discrete Fourier transform) circuit.

The α parameters from the LPC analysis circuit 132 of the LPC analysis and quantization unit 113 are also sent to the perceptual weighting filter calculation circuit 139, which derives the data for perceptual weighting. These weighting data are sent to the perceptually weighted vector quantizer 116, described later, and to the perceptual weighting filter 125 and the perceptually weighted synthesis filter 122 of the second encoding unit 120.

The sinusoidal analysis encoding unit 114, such as a harmonic coding circuit, analyzes the output of the LPC inverse filter 111 by a harmonic coding method. That is, it detects the pitch, calculates the amplitude Am of each harmonic, and discriminates voiced (V) from unvoiced (UV) speech, and it converts the number of harmonic envelope values or amplitudes Am, which varies with the pitch, to a fixed number by dimension conversion.

The specific example of the sinusoidal analysis encoding unit 114 shown in FIG. 3 assumes ordinary harmonic coding. In the case of MBE (multiband excitation) coding in particular, the model assumes that a voiced portion and an unvoiced portion exist in each frequency band at the same time instant (within the same block or frame). In other harmonic coding methods, a binary decision is made as to whether the speech within one block or frame is voiced or unvoiced. In the following description, a frame being UV means, when applied to MBE coding, that all bands are UV.

In the sinusoidal analysis encoding unit 114 of FIG. 3, the input speech signal from the input terminal 101 is supplied to the open-loop pitch search unit 141, and the signal from the HPF (high-pass filter) 109 is supplied to the zero-crossing counter 142. The LPC residual, or linear prediction residual, from the LPC inverse filter 111 is supplied to the orthogonal transform circuit 145. The open-loop pitch search unit 141 takes the LPC residual of the input signal and performs a relatively coarse open-loop pitch search. The extracted coarse pitch data are sent to the high-precision pitch search unit 146, which performs a high-precision closed-loop pitch search (fine pitch search) as described later. Together with the coarse pitch data, the open-loop pitch search unit 141 outputs the normalized autocorrelation maximum r(P), obtained by normalizing the maximum of the autocorrelation of the LPC residual by the power, and sends it to the V/UV (voiced/unvoiced) determination unit 115.
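A coarse open-loop search of the kind described above can be sketched as an exhaustive scan over integer lags of the residual's autocorrelation, returning the best lag together with the maximum normalized by the zero-lag power. The lag range of 20 to 147 samples and the name `open_loop_pitch` are assumptions; the source gives neither.

```python
def open_loop_pitch(residual, min_lag=20, max_lag=147):
    """Return (lag, r_p) for the integer lag maximizing the autocorrelation.

    r_p is that maximum divided by the zero-lag power, standing in for the
    normalized autocorrelation maximum r(P). The lag range is hypothetical.
    """
    power = sum(x * x for x in residual)
    if power == 0.0 or len(residual) <= min_lag:
        return min_lag, 0.0
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, min(max_lag, len(residual) - 1) + 1):
        val = sum(residual[n] * residual[n - lag]
                  for n in range(lag, len(residual)))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag, best_val / power


# An impulse train with a 40-sample period, as a stand-in for a voiced residual.
pulses = [1.0 if n % 40 == 0 else 0.0 for n in range(256)]
lag, r_p = open_loop_pitch(pulses)
```

A value of r_p near 1 indicates strong periodicity and a value near 0 indicates noise-like input, which is why it is a useful input to the V/UV decision.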

The orthogonal transform circuit 145 applies an orthogonal transform such as the DFT (discrete Fourier transform), converting the LPC residual on the time axis into spectral amplitude data on the frequency axis. The output of the orthogonal transform circuit 145 is sent to the high-precision pitch search unit 146 and to the spectrum evaluation unit 148, which evaluates the spectral amplitudes or envelope.

The high-precision (fine) pitch search unit 146 is supplied with the relatively coarse pitch data extracted by the open-loop pitch search unit 141 and with the frequency-domain data produced, for example by DFT, in the orthogonal transform unit 145. Centered on the coarse pitch value, the high-precision pitch search unit 146 varies the pitch by ± several samples in steps of 0.2 to 0.5 and converges on the optimum fine pitch value, which has a fractional (floating-point) part. The fine search uses the so-called analysis-by-synthesis method, selecting the pitch so that the synthesized power spectrum is closest to the power spectrum of the original sound. The pitch data from this closed-loop high-precision pitch search unit 146 are sent to the output terminal 104 via the switch 118. When pitch conversion is required, it is performed by the pitch conversion unit 119 using the processing described later.

The spectrum evaluation unit 148 evaluates the magnitude of each harmonic and the spectral envelope formed by the set of harmonics, based on the pitch and on the spectral amplitudes that are the orthogonal transform output of the LPC residual. The result is sent to the high-precision pitch search unit 146, the V/UV (voiced/unvoiced) determination unit 115, and the perceptually weighted vector quantizer 116.

The V/UV (voiced/unvoiced) determination unit 115 makes the V/UV decision for the frame based on the output of the orthogonal transform circuit 145, the optimum pitch from the high-precision pitch search unit 146, the spectral amplitude data from the spectrum evaluation unit 148, the normalized autocorrelation maximum r(P) from the open-loop pitch search unit 141, and the zero-crossing count from the zero-crossing counter 142. In the case of MBE, the boundary position of the per-band V/UV decisions may also be used as one condition for the V/UV decision of the frame. The decision output from the V/UV determination unit 115 is taken out via the output terminal 105.

A data number conversion unit (a kind of sampling rate conversion) is provided at the output of the spectrum evaluation unit 148 or at the input of the vector quantizer 116. Because the number of bands into which the frequency axis is divided, and hence the number of data, varies with the pitch, this unit brings the envelope amplitude data |Am| to a fixed number. For example, if the effective band extends up to 3400 Hz, it is divided into 8 to 63 bands depending on the pitch, so the number m_MX + 1 of amplitude data |Am| obtained for these bands also varies. The data number conversion unit 119 therefore converts this variable number m_MX + 1 of amplitude data into a fixed number M, for example 44.

The fixed number M (for example, 44) of amplitude or envelope data from the data number conversion unit, provided at the output of the spectrum evaluation unit 148 or the input of the vector quantizer 116, are gathered by the vector quantizer 116 into vectors of a predetermined number of data, for example 44, and subjected to weighted vector quantization. The weights are supplied by the output of the perceptual weighting filter calculation circuit 139. The envelope index from the vector quantizer 116 is taken out at the output terminal 103 via the switch 117. Prior to the weighted vector quantization, an inter-frame difference using a suitable leakage coefficient may be taken for the vector made up of the predetermined number of data.

Next, the second encoding unit 120 is described. The second encoding unit 120 has a so-called CELP (code-excited linear prediction) configuration and is used in particular to encode the unvoiced portions of the input speech signal. In this CELP configuration for unvoiced portions, a noise output corresponding to the LPC residual of unvoiced speech, which is the representative value output from the noise codebook, the so-called stochastic codebook 121, is sent through the gain circuit 126 to the perceptually weighted synthesis filter 122. The weighted synthesis filter 122 applies LPC synthesis to the input noise and sends the resulting weighted unvoiced signal to the subtractor 123. The subtractor 123 also receives the speech signal supplied from the input terminal 101 through the HPF (high-pass filter) 109 and perceptually weighted by the perceptual weighting filter 125, and it outputs the difference, or error, between this signal and the signal from the synthesis filter 122. The error is sent to the distance calculation circuit 124, which calculates the distance, and the noise codebook 121 is searched for the representative value vector that minimizes the error. The time-domain waveform is thus vector quantized by a closed-loop search using the analysis-by-synthesis method.
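The closed-loop search described above can be sketched as follows: each codebook vector is passed through the synthesis filter, the best gain is computed, and the entry with the smallest squared error is kept. This is a simplified sketch that omits perceptual weighting and gain quantization; the function names, the toy codebook, and the sign convention A(z) = 1 + Σ α_k z^(-k) are assumptions.

```python
def synthesize(excitation, alpha):
    """All-pole synthesis 1/A(z), A(z) = 1 + sum_k alpha_k z^-k, zero state."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(alpha, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, alpha):
    """Return (index, gain, error) minimizing |target - gain * synth(code)|^2.

    Perceptual weighting is omitted in this sketch. Returns None if every
    codebook entry synthesizes to silence.
    """
    best = None
    for idx, code in enumerate(codebook):
        synth = synthesize(code, alpha)
        energy = sum(y * y for y in synth)
        if energy == 0.0:
            continue
        gain = sum(t * y for t, y in zip(target, synth)) / energy
        err = sum((t - gain * y) ** 2 for t, y in zip(target, synth))
        if best is None or err < best[2]:
            best = (idx, gain, err)
    return best


# Toy codebook; the target is entry 1 synthesized and scaled by 2.
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 1.0, -1.0]]
alpha = [-0.5]
target = [2.0 * y for y in synthesize(codebook[1], alpha)]
shape_index, gain, error = search_codebook(target, codebook, alpha)
```

The returned index and gain correspond loosely to the shape index and gain index that the second encoding unit 120 outputs for unvoiced frames.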

As the data for the UV (unvoiced) portion from the second encoding unit 120 using this CELP configuration, the shape index of the codebook from the noise codebook 121 and the gain index of the codebook from the gain circuit 126 are taken out. The shape index, which is the UV data from the noise codebook 121, is sent to the output terminal 107s via the switch 127s, and the gain index, which is the UV data of the gain circuit 126, is sent to the output terminal 107g via the switch 127g.

The switches 127s and 127g and the switches 117 and 118 are turned on and off according to the V/UV decision result from the V/UV decision unit 115. The switches 117 and 118 are on when the speech signal of the frame currently to be transmitted is judged voiced (V), and the switches 127s and 127g are on when the speech signal of the frame currently to be transmitted is unvoiced (UV).

Next, a more specific configuration of the speech signal decoding device shown in Fig. 2 is described with reference to Fig. 4. In Fig. 4, parts corresponding to those in Fig. 2 are given the same reference numerals.

In Fig. 4, the input terminal 202 is supplied with the vector-quantized output of the LSP, the so-called codebook index, corresponding to the output from the output terminal 102 of Figs. 1 and 3.

This LSP index is sent to the LSP inverse vector quantizer 231 of the LPC parameter reproduction unit 213, where it is inverse vector-quantized into LSP (line spectrum pair) data. The data are sent to the LSP interpolation circuits 232 and 233 for LSP interpolation and are then converted into LPC α-parameters by the LSP→α conversion circuits 234 and 235. The LSP interpolation circuit 232 and the LSP→α conversion circuit 234 are for voiced sound (V), and the LSP interpolation circuit 233 and the LSP→α conversion circuit 235 are for unvoiced sound (UV). The LPC synthesis filter 214 is likewise divided into the LPC synthesis filter 236 for the voiced portion and the LPC synthesis filter 237 for the unvoiced portion. That is, LPC coefficient interpolation is performed independently for the voiced and unvoiced portions, which prevents the adverse effects of interpolating between LSPs of entirely different character at transitions from voiced to unvoiced sound and from unvoiced to voiced sound.

The input terminal 203 of Fig. 4 is supplied with the weighted vector-quantized code index data of the spectral envelope (Am), corresponding to the output from the encoder-side terminal 103 of Figs. 1 and 3. The input terminal 204 is supplied with the pitch data from the terminal 104 of Figs. 1 and 3, and the input terminal 205 is supplied with the V/UV decision data from the terminal 105 of Figs. 1 and 3.

The vector-quantized index data of the spectral envelope (Am) from the input terminal 203 are sent to the inverse vector quantizer 212 and inverse vector-quantized. As described above, the number of amplitude data of the inverse vector-quantized envelope is a fixed number, for example 44. Data number conversion is then performed so that the number of data matches the number of harmonics corresponding to the pitch data. The data sent from the inverse vector quantizer 212 to the data number conversion unit 270 remain at the fixed number and are converted in number there.

The data number conversion unit 270 is also supplied, via the pitch conversion unit 240, with the pitch data from the input terminal 204, that is, the encoded pitch. When pitch conversion is required, the pitch conversion unit 240 performs it by the processing described later. The data number conversion unit 270 then sends the amplitude data of the spectral envelope of the LPC residual, in a number corresponding to the set pitch, together with the converted pitch data, to the sine wave synthesis circuit 215 of the voiced sound synthesis unit 211.

Various interpolation methods are conceivable for converting the number of amplitude data of the spectral envelope of the LPC residual in the data number conversion unit 270. For example, dummy data that interpolate the values from the last data in a block to the first data in the block, or dummy data that extend the data at the left and right ends (first and last) of the block, may be appended to the amplitude data of one block of the effective band on the frequency axis so as to enlarge the number of data to N_F. Band-limited oversampling by a factor of O_S (for example, 8) is then applied to obtain O_S times as many amplitude data. These (m_MX + 1) × O_S amplitude data are linearly interpolated and expanded to a still larger number N_M (for example, 2048), and the N_M data are decimated to obtain the number M of data corresponding to the set pitch.
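The idea of changing the number of envelope samples while keeping the envelope shape can be illustrated with a much simpler stand-in. The sketch below uses plain linear interpolation between neighbouring samples; it does not implement the dummy-data extension, band-limited oversampling, or decimation steps named above, and the function name `convert_data_count` is hypothetical.

```python
def convert_data_count(amplitudes, new_count):
    """Resample an amplitude envelope to new_count points.

    Simplification: plain linear interpolation with the first and last
    samples held fixed, in place of oversampling followed by decimation.
    """
    old_count = len(amplitudes)
    if old_count < 1 or new_count < 1:
        raise ValueError("both envelopes need at least one point")
    if old_count == 1 or new_count == 1:
        return [amplitudes[0]] * new_count
    out = []
    for i in range(new_count):
        # Fractional position of output sample i on the input grid.
        pos = i * (old_count - 1) / (new_count - 1)
        lo = int(pos)
        hi = min(lo + 1, old_count - 1)
        frac = pos - lo
        out.append((1.0 - frac) * amplitudes[lo] + frac * amplitudes[hi])
    return out
```

For example, `convert_data_count(envelope_44, 20)` would turn a fixed 44-point envelope into 20 samples, where `envelope_44` is an assumed list of 44 amplitudes.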

The data number conversion unit 270 does not change the shape of the spectral envelope; it changes only the positions at which the harmonics stand. The phoneme therefore remains unchanged.

As an example of the operation of the data number conversion unit 270, the case of converting the frequency F₀ = f_s / L for pitch lag L into Fx is described. Here f_s is the sampling frequency, for example f_s = 8 kHz = 8000 Hz.

In this case the pitch frequency is F₀ = 8000 / L, and n = L / 2 harmonics stand in the range up to 4000 Hz. Within the 3400 Hz width of the ordinary speech band there are about (L / 2) × (3400 / 4000) harmonics. These are converted to a fixed number, for example 44, by the data number conversion (dimension conversion) described above and are then vector-quantized.
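The harmonic counts quoted above follow directly from the pitch lag. A small sketch, with the hypothetical helper name `harmonic_count` and the 8000 Hz sampling frequency taken from the text as the default:

```python
def harmonic_count(pitch_lag, fs=8000.0, bandwidth=4000.0):
    """Number of harmonics of F0 = fs / pitch_lag at or below bandwidth (Hz)."""
    if pitch_lag <= 0:
        raise ValueError("pitch lag must be positive")
    f0 = fs / pitch_lag
    return int(bandwidth // f0)
```

With a pitch lag of L = 40 samples, F₀ is 200 Hz, so 20 harmonics (L / 2) fit below 4000 Hz and 17 fit below 3400 Hz.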

If an interframe difference was taken before the vector quantization of the spectrum at encoding, the interframe difference is decoded after the inverse vector quantization here, and data number conversion is then performed to obtain the spectral envelope data.

In addition to the spectral envelope amplitude data of the LPC residual and the pitch data from the data number conversion unit 270, the sine wave synthesis circuit 215 is supplied with the V/UV decision data from the input terminal 205. The sine wave synthesis circuit 215 outputs LPC residual data, which are sent to the adder 218.

The envelope data from the inverse vector quantizer 212, the pitch from the input terminal 204, and the V/UV decision data from the input terminal 205 are also sent to the noise synthesis circuit 216, which adds noise to the voiced (V) portion. The output of the noise synthesis circuit 216 is sent to the adder 218 via the weighted overlap-add circuit 217. The reason is as follows. When the excitation that is input to the LPC synthesis filter for voiced sound is produced by sine wave synthesis, low-pitched sounds such as male voices can sound stuffy, and the sound quality can change abruptly between V (voiced) and UV (unvoiced) so that the result feels unnatural. For this reason, noise that takes into account parameters derived from the encoded speech data, such as the pitch, the spectral envelope amplitude, the maximum amplitude within the frame, and the level of the residual signal, is added to the voiced portion of the LPC residual signal, that is, to the excitation input to the LPC synthesis filter for the voiced portion.

The sum output from the adder 218 is sent to the synthesis filter 236 for voiced sound in the LPC synthesis filter 214, where LPC synthesis turns it into time waveform data. These data are filtered by the post filter 238v for voiced sound and then sent to the adder 239.

The input terminals 207s and 207g of Fig. 4 are supplied with the shape index and the gain index, respectively, as the UV data from the output terminals 107s and 107g of Fig. 3, and these are sent to the unvoiced sound synthesis unit 220. The shape index from the terminal 207s is sent to the noise codebook 221 of the unvoiced sound synthesis unit 220, and the gain index from the terminal 207g is sent to the gain circuit 222. The representative value output read from the noise codebook 221 is the excitation vector, that is, a noise signal component corresponding to the LPC residual of unvoiced sound. It is given an amplitude of the specified gain in the gain circuit 222 and is sent to the windowing circuit 223, where it is windowed to smooth the junction with the voiced portion.

The output of the windowing circuit 223 is sent, as the output of the unvoiced sound synthesis unit 220, to the synthesis filter 237 for UV (unvoiced sound) in the LPC synthesis filter 214. LPC synthesis in the synthesis filter 237 produces the time waveform data of the unvoiced portion, which are filtered by the post filter 238u for unvoiced sound and then sent to the adder 239.

The adder 239 adds the time waveform signal of the voiced portion from the post filter 238v for voiced sound and the time waveform data of the unvoiced portion from the post filter 238u for unvoiced sound, and the sum is taken out at the output terminal 201.

The pitch conversion processing performed by the pitch conversion unit 119 in the speech encoding device described with reference to Figs. 1 and 3, and the pitch conversion processing performed by the pitch conversion unit 240 in the speech decoding device described with reference to Figs. 2 and 4, are described next. This example is configured so that the pitch of the speech can be converted both at encoding and at decoding. To convert the pitch at encoding, the pitch conversion unit 119 in the speech encoding device performs the corresponding processing. To convert the pitch at decoding, the pitch conversion unit 240 in the speech decoding device performs the corresponding processing. Basically, therefore, the pitch conversion processing described in this example can be carried out as long as either the speech encoding device or the speech decoding device has a pitch conversion unit. A speech signal whose pitch was converted at encoding in the speech encoding device may be pitch-converted again at decoding in the speech decoding device.

The processing in the pitch conversion unit is described in detail below. The pitch conversion processing performed by the pitch conversion unit 119 in the speech encoding device and that performed by the pitch conversion unit 240 in the speech decoding device are basically the same. Each of the conversion units 119 and 240 converts the pitch data supplied to it. In this example, the pitch data supplied to each of the pitch conversion units 119 and 240 is the pitch lag (period), as described with reference to Figs. 1 to 4. The pitch lag is converted into other data by arithmetic processing, and pitch conversion is thereby performed.

The specific processing for this pitch conversion can be selected from nine processing modes, the first to the ninth process described below. One of these modes is set under the control of a controller or the like in the encoding device or the decoding device. The pitch in the formulas in the following description denotes the period. In the actual arithmetic processing within the conversion unit, the corresponding processing is performed on the data of the number of harmonics.

First process

This process multiplies the input pitch by a constant. The input pitch (pch_in) is multiplied by the constant K₁ to give the output pitch (pch_out), as shown in Equation 1.

[Equation 1]

pch_out = K₁ × pch_in

If K₁ is set so that 0 < K₁ < 1, the period becomes shorter, the frequency becomes higher, and the voice is changed to a higher-pitched sound. If K₁ is set so that K₁ > 1, the frequency becomes lower and the voice is changed to a lower-pitched sound.
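Equation 1 can be sketched in a few lines. The helper `lag_to_hz` and the 8000 Hz default sampling frequency are assumptions added only to show the effect on frequency; the names are hypothetical.

```python
def lag_to_hz(pitch_lag, fs=8000.0):
    """Fundamental frequency in Hz for a pitch lag given in samples."""
    return fs / pitch_lag


def scale_pitch(pch_in, k1):
    """Equation 1: pch_out = K1 * pch_in (pitch expressed as a period)."""
    if k1 <= 0:
        raise ValueError("K1 must be positive")
    return k1 * pch_in
```

With `K1 = 0.5`, a lag of 80 samples (100 Hz) becomes a lag of 40 samples (200 Hz), that is, the voice is raised by one octave.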

Second process

This process makes the output pitch constant regardless of the input pitch. A suitable preset constant P₂ is always used as the output pitch (pch_out), as shown in Equation 2.

[Equation 2]

pch_out = P₂

Fixing the pitch in this way converts the voice into a monotonous, artificial sound.
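Applied frame by frame, Equation 2 simply discards the incoming pitch track. A minimal sketch, with the hypothetical name `fix_pitch` and pitch values treated as per-frame lags:

```python
def fix_pitch(pitch_frames, p2):
    """Equation 2: every frame gets the same output pitch P2."""
    return [p2 for _ in pitch_frames]
```

The number of frames is preserved; only the pitch values are replaced.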

Third process

This process sets the output pitch (pch_out) to the sum of a suitable preset constant P₃ and a sine wave of suitable amplitude A₃ and frequency F₃, as shown in Equation 3.

[Equation 3]

pch_out = P₃ + A₃ sin(2π F₃ t_n)

In Equation 3, n is the frame number and t_n is the discrete time at that frame, given by Equation 4.

[Equation 4]

t_n = t_(n-1) + Δt

Adding a sine wave to the fixed pitch in this way gives the artificial sound a vibrato.
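Equations 3 and 4 can be combined into one loop over frames. In this sketch the function name `vibrato_fixed_pitch`, the starting time `t0`, and the interpretation of Δt as the frame interval in seconds are assumptions; the specification does not state the units.

```python
import math


def vibrato_fixed_pitch(num_frames, p3, a3, f3, dt, t0=0.0):
    """Equation 3 with the time update of Equation 4, one value per frame."""
    t = t0
    out = []
    for _ in range(num_frames):
        t += dt  # Equation 4: t_n = t_(n-1) + dt
        out.append(p3 + a3 * math.sin(2.0 * math.pi * f3 * t))
    return out
```

Every output value lies between `p3 - a3` and `p3 + a3`, so `a3` should be chosen small enough that the pitch stays in the range the codec supports.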

Fourth process

This process sets the output pitch (pch_out) to the input pitch (pch_in) plus a uniform random number in the range [-A₄, A₄], as shown in Equation 5.

[Equation 5]

pch_out = pch_in + r_n

Here r_n is a random number set for each frame n. For each processed frame, a uniform random number in [-A₄, A₄] is generated and added. This converts the voice into a rattling, unsteady sound.
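A sketch of Equation 5 using Python's standard random number generator. The function name `jitter_pitch` and the optional `seed` argument (added so that results can be reproduced) are assumptions.

```python
import random


def jitter_pitch(pitch_frames, a4, seed=None):
    """Equation 5: add a uniform random number in [-A4, A4] to each frame."""
    rng = random.Random(seed)
    return [pch_in + rng.uniform(-a4, a4) for pch_in in pitch_frames]
```

A fresh random number is drawn for every frame, so consecutive frames are perturbed independently.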

Fifth process

This process sets the output pitch (pch_out) to the input pitch (pch_in) plus a sine wave of suitable amplitude A₅ and frequency F₅, as shown in Equation 6.

[Equation 6]

pch_out = pch_in + A₅ sin(2π F₅ t_n)

In Equation 6, n is again the frame number and t_n is the discrete time at that frame, given by Equation 4 above. This processing adds a vibrato to the input voice. If the frequency F₅ is set to a small value (that is, a long period), the voice is converted into one with slow rises and falls in intonation.
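Equation 6 differs from Equation 3 only in that the sine wave is added to the incoming pitch track rather than to a constant. In this sketch the name `add_vibrato`, the starting time `t0`, and the reading of Δt as the frame interval in seconds are assumptions.

```python
import math


def add_vibrato(pitch_frames, a5, f5, dt, t0=0.0):
    """Equation 6: pch_out = pch_in + A5 * sin(2*pi*F5*t_n) per frame."""
    t = t0
    out = []
    for pch_in in pitch_frames:
        t += dt  # Equation 4: t_n = t_(n-1) + dt
        out.append(pch_in + a5 * math.sin(2.0 * math.pi * f5 * t))
    return out
```

A small `f5` makes the added component vary slowly across frames, which corresponds to the slow rise and fall described above.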

Sixth process

This process sets the output pitch (pch_out) to a suitable constant P₆ minus the input pitch (pch_in), as shown in Equation 7.

[Equation 7]

pch_out = P₆ - pch_in

With this processing, the pitch changes in the opposite direction to that of the input voice. For example, the voice is converted so that the intonation at the end of a word moves in the reverse of the usual direction.
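Equation 7 mirrors the pitch track about P₆ / 2. A minimal sketch with the hypothetical name `invert_pitch`:

```python
def invert_pitch(pch_in, p6):
    """Equation 7: pch_out = P6 - pch_in."""
    return p6 - pch_in
```

For instance, with `p6 = 120`, an input track of lags 40, 50, 60 (a falling frequency) maps to 80, 70, 60 (a rising frequency). `p6` must exceed every input lag for the output to stay positive.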

Seventh process

This process smooths (averages) the input pitch (pch_in) with a suitable time constant τ₇ (where 0 < τ₇ < 1) and uses the resulting avg_pch as the output pitch (pch_out), as shown in Equation 8.

[Equation 8]

avg_pch = (1 - τ₇) avg_pch + τ₇ pch_in

pch_out = avg_pch

For example, setting τ₇ to 0.05 makes avg_pch roughly the average over the past 20 frames, and that value becomes the output pitch. This converts the voice into a flat, languid one with little rise and fall.
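Equation 8 is a first-order recursive average. In the sketch below the name `smooth_pitch` is hypothetical, and initializing avg_pch with the first input frame is an assumption, since the specification does not state the initial value.

```python
def smooth_pitch(pitch_frames, tau7):
    """Equation 8: avg_pch = (1 - tau7) * avg_pch + tau7 * pch_in."""
    if not 0.0 < tau7 < 1.0:
        raise ValueError("tau7 must lie strictly between 0 and 1")
    out = []
    avg_pch = None
    for pch_in in pitch_frames:
        if avg_pch is None:
            avg_pch = pch_in  # assumed initial state
        avg_pch = (1.0 - tau7) * avg_pch + tau7 * pch_in
        out.append(avg_pch)  # pch_out = avg_pch
    return out
```

The smaller `tau7` is, the more slowly the output follows the input, which is why a value such as 0.05 flattens the intonation.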

Eighth process

This process subtracts avg_pch, obtained by smoothing (averaging) the input pitch (pch_in) with a suitable time constant τ₈ (where 0 < τ₈ < 1), from the input pitch (pch_in). The difference is multiplied by a suitable factor K₈ (a constant), and the product is added to the input pitch (pch_in) as an emphasis component to give the output pitch (pch_out), as shown in Equation 9.

[Equation 9]

avg_pch = (1 - τ₈) avg_pch + τ₈ pch_in

pch_out = pch_in + K₈ (pch_in - avg_pch)

With this processing, the pitch is converted with an emphasis component added to the input voice, so the voice is converted into one with more pronounced modulation.
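Equation 9 reuses the recursive average and then exaggerates the deviation from it. In this sketch the name `emphasize_pitch` is hypothetical, and initializing avg_pch with the first input frame is an assumption not stated in the specification.

```python
def emphasize_pitch(pitch_frames, tau8, k8):
    """Equation 9: pch_out = pch_in + K8 * (pch_in - avg_pch)."""
    if not 0.0 < tau8 < 1.0:
        raise ValueError("tau8 must lie strictly between 0 and 1")
    out = []
    avg_pch = None
    for pch_in in pitch_frames:
        if avg_pch is None:
            avg_pch = pch_in  # assumed initial state
        avg_pch = (1.0 - tau8) * avg_pch + tau8 * pch_in
        out.append(pch_in + k8 * (pch_in - avg_pch))
    return out
```

A constant input is passed through unchanged, while a jump in the input is overshot by an amount proportional to `k8`.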

Ninth process

This is a mapping process that converts the input pitch (pch_in) into the closest fixed pitch data in a pitch table prepared in advance in the pitch conversion unit. For example, data at frequency intervals corresponding to a musical scale may be prepared as the fixed pitch data in the pitch table, and the input pitch (pch_in) may be converted into the data of the closest scale note.
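The table lookup can be sketched as a nearest-neighbour search. Several details here are assumptions for illustration: the function names, the equal-tempered scale starting at 110 Hz, the 8000 Hz sampling frequency, and measuring closeness as the absolute difference of pitch lags.

```python
def build_scale_table(fs=8000.0, base_hz=110.0, semitones=25):
    """Pitch lags (in samples) for an equal-tempered scale, lowest note first."""
    return [fs / (base_hz * 2.0 ** (k / 12.0)) for k in range(semitones)]


def quantize_pitch(pch_in, table):
    """Return the table entry closest to the input pitch lag."""
    if not table:
        raise ValueError("pitch table is empty")
    return min(table, key=lambda p: abs(p - pch_in))
```

Calling `quantize_pitch(pch_in, build_scale_table())` on every frame snaps the pitch track to scale notes.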

When any of the first to ninth pitch conversion processes described above is performed by the pitch conversion unit 119 in the encoding device or by the pitch conversion unit 240 in the decoding device, only the pitch data, which control the number of harmonics at decoding, are converted. The pitch alone can therefore be converted simply, without changing the phonemes of the voice.

An example in which the speech encoding device and the speech decoding device described above are applied to a telephone device is described with reference to Figs. 5 and 6. First, Fig. 5 shows an example in which the speech encoding device is applied to the transmitting system of a radiotelephone device (such as a portable telephone). The speech signal picked up by the microphone 301 is amplified by the amplifier 302, converted into a digital signal by the analog/digital converter 303, and sent to the speech encoding unit 304. The speech encoding unit 304 corresponds to the speech encoding device described with reference to Figs. 1 and 3. When necessary, pitch conversion processing is performed in the pitch conversion unit contained in the encoding unit 304 (corresponding to the pitch conversion unit 119 of Figs. 1 and 3). The data encoded by the speech encoding unit 304 are sent, as the output signal of the encoding unit 304, to the transmission path encoding unit 305, which applies so-called channel coding. Its output signal is sent to the modulation circuit 306 and modulated, then sent via the digital/analog converter 307 and the high-frequency amplifier 308 to the antenna 309 and transmitted by radio.

Fig. 6 shows an example in which the speech decoding device is applied to the receiving system of a radiotelephone device. The signal received by the antenna 311 is amplified by the high-frequency amplifier 312 and sent via the analog/digital converter 313 to the demodulation circuit 314. The demodulated signal is sent to the transmission path decoding unit 315, which applies channel decoding and extracts the transmitted speech signal. The extracted speech signal is sent to the speech decoding unit 316, which corresponds to the speech decoding device described with reference to Figs. 2 and 4. When necessary, pitch conversion processing is performed in the pitch conversion unit in the decoding unit 316 (corresponding to the pitch conversion unit 240 of Figs. 2 and 4). The speech signal decoded by the speech decoding unit 316 is sent, as the output signal of the decoding unit 316, to the digital/analog converter 317, subjected to analog speech processing in the amplifier 318, and then sent to the speaker 319 and emitted as sound.

The present invention can of course be applied to devices other than such a radiotelephone device. That is, it can be applied to various devices that handle speech signals and incorporate the speech encoding device described with reference to Fig. 1 and other figures, and to various devices that handle speech signals and incorporate the speech decoding device described with reference to Fig. 2 and other figures.

Furthermore, a processing program corresponding to the processing in the pitch conversion unit 119 of this example may be recorded on a recording medium (optical disc, magneto-optical disc, magnetic tape, or the like) on which a processing program for executing the speech encoding described with reference to Figs. 1 and 3 is recorded. When the processing program read from this medium is executed on a computer or the like to perform encoding, the same pitch conversion processing can be carried out. Similarly, a processing program corresponding to the processing in the pitch conversion unit 240 of this example may be recorded on a recording medium on which a processing program for executing the speech decoding described with reference to Figs. 2 and 4 is recorded. When the processing program read from this medium is executed on a computer or the like to perform decoding, the same pitch conversion processing can be carried out.

According to the speech encoding method of the present invention, the pitch component of speech data encoded by sinusoidal analysis encoding is changed by a predetermined arithmetic operation so that the pitch is converted. As a result, the pitch alone can be converted accurately by simple arithmetic, without changing the phonemes of the input voice, and the result can be decoded.

In this case, data number conversion is performed to bring the number of harmonics to a predetermined number. As a result, pitch conversion based on the decoded data is carried out simply.

When this data number conversion is performed, it is carried out by interpolation using an oversampling operation. As a result, the data number conversion is achieved by simple processing using the oversampling operation.

또, 부호화시에 피치변환이 행해지는 경우에, 사인파분석부호화된 음성부호화데이터의 피치성분에 소정의 계수가 승산되고, 피치변환을 행한다. 그 결과, 예를 들면 입력음성의 음색을 변화시키는 피치변환처리가 가능하게 된다.When pitch conversion is performed at the time of encoding, a predetermined coefficient is multiplied by the pitch component of the sinusoidal coded speech coded data to perform the pitch conversion. As a result, for example, a pitch conversion process for changing the timbre of the input voice becomes possible.

또, 부호화시에 피치변환이 행해지는 경우에, 사인파분석부호화된 음성부호화데이터의 피치성분이 고정된 값으로 변환되고, 항상 일정의 피치로 변환된다. 예를 들면 그러므로 입력음성의 피치가 단조로운 인공적인 음성으로 변환될수 있다.In addition, when pitch conversion is performed at the time of encoding, the pitch component of the sinusoidal encoded audio coded data is converted into a fixed value, and is always converted to a constant pitch. For example, therefore, the pitch of the input speech can be converted to monotonous artificial speech.

또, 이 일정의 피치로 변환하는 경우에, 일정의 피치변환된 데이터에 소정의 주파수의 정현파의 데이터가 가산된다. 그 결과, 예를 들면 일정의 피치를 중심으로 하여 상하에 진동을 가지는 음성으로 변환할수 있다.In the case of converting to a constant pitch, the sinusoidal data of a predetermined frequency is added to the constant pitch-converted data. As a result, for example, the sound can be converted into a voice having vibration up and down about a constant pitch.

또, 부호화시에 피치변환이 행해지는 경우에, 소정의 정수값으로부터 사인파분석부호화된 음성부호화데이터의 피치성분이 감산되고, 피치변환이 행해진다. 그 결과, 예를 들면 입력음성의 어미의 인토네이션등이 역으로 변화하는 것같은 효과가 얻어지는 피치로 변환하는 것이 가능하게 된다.In addition, when pitch conversion is performed at the time of encoding, the pitch component of the sine wave analysis encoded speech encoded data is subtracted from a predetermined integer value, and pitch conversion is performed. As a result, for example, it is possible to convert the pitch into a pitch such that an effect such as inversion of the mother of the input voice is reversed.

Further, when pitch conversion is performed at the time of encoding, a predetermined random number is added to the pitch component of the sine wave analysis encoded speech data. As a result, the speech can be converted to a pitch whose intonation appears to change irregularly.
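The random-number variant might look like the following sketch. The uniform distribution, the bound `max_offset`, and the optional `seed` are assumptions; the text only states that a random number is added.

```python
import random

def jitter_pitch(pitch_track, max_offset, seed=None):
    """Add an independent random offset to each per-frame pitch value."""
    rng = random.Random(seed)  # private generator, reproducible when seeded
    return [p + rng.uniform(-max_offset, max_offset) for p in pitch_track]
```

A fixed seed makes the irregular contour repeatable, which helps when comparing encoder and decoder output.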

Further, when pitch conversion is performed at the time of encoding, sinusoidal data of a predetermined frequency is added to the pitch component of the speech data encoded by sine wave analysis encoding. As a result, the input speech can be converted, for example, into a voice to which a vibrato-like oscillation has been applied.

Further, when pitch conversion is performed at the time of encoding, the average value of the pitch component of the sine wave analysis encoded speech data is calculated, and this average value is used as the pitch-converted data. As a result, the input speech can be converted, for example, into a voice with reduced rises and falls in intonation.

Further, when pitch conversion is performed at the time of encoding, the average value of the pitch component of the sine wave analysis encoded speech data is calculated, and the difference between the pitch data and the average value is added to the pitch data. As a result, the input speech can be converted, for example, into a voice whose rises and falls are emphasized, giving a strongly modulated effect.
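The two average-based conversions are mirror images of each other: one replaces each value with the mean, the other pushes each value further from it. The sketch below assumes the mean is taken over the whole list of frames passed in; the averaging window is not specified here, and the function names are hypothetical.

```python
def flatten_to_mean(pitch_track):
    """Replace every pitch value with the average, reducing intonation."""
    mean = sum(pitch_track) / len(pitch_track)
    return [mean] * len(pitch_track)

def emphasize_intonation(pitch_track):
    """Add each value's deviation from the average back onto the value."""
    mean = sum(pitch_track) / len(pitch_track)
    return [p + (p - mean) for p in pitch_track]
```

With the input `[100.0, 120.0, 140.0]`, the first function gives `[120.0, 120.0, 120.0]` and the second gives `[80.0, 120.0, 160.0]`, doubling the excursion around the mean.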

Further, when pitch conversion is performed at the time of encoding, the pitch component of the sine wave analysis encoded speech data is converted to data in a pitch conversion table prepared in advance, so that it takes one of the pitch steps set in that table. As a result, a conversion becomes possible that, for example, normalizes the pitch of the input speech to the pitches of a given musical scale.
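One way to picture the table lookup is snapping each pitch to the nearest entry of a precomputed list. The sketch assumes the pitch is a positive frequency in Hz and builds a hypothetical table of equal-tempered semitones referenced to 440 Hz; the actual table contents are not given in the text.

```python
import math

# Hypothetical table: equal-tempered semitones from 55 Hz up to 880 Hz.
SCALE_TABLE = [440.0 * 2 ** (n / 12) for n in range(-36, 13)]

def snap_to_table(pitch_track, table=SCALE_TABLE):
    """Replace each positive pitch with the nearest table entry.

    Distance is measured on a logarithmic axis so that "nearest"
    matches musical intervals rather than raw Hz differences.
    """
    return [min(table, key=lambda f: abs(math.log(f / p))) for p in pitch_track]
```

Frames without a valid pitch, such as unvoiced frames, would need to bypass this function because the logarithm requires a positive value.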

According to the speech encoding method of the present invention, the sine wave analysis encoded pitch component is changed by predetermined arithmetic processing. As a result, only the pitch of the decoded speech can be accurately converted by simple arithmetic processing, without changing the phonemes of the speech.

In this case, after the pitch component is changed, data number conversion is performed to convert the number of harmonics from the predetermined number. As a result, decoding with the changed pitch component is carried out simply.
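The reason the number of harmonics must be converted after a pitch change can be sketched as follows: harmonics sit at integer multiples of the fundamental, so a new fundamental changes how many fit in the band and where they sample the spectral envelope. This is an illustrative sketch only. It assumes the pitch is a fundamental frequency in Hz, a 4000 Hz band limit, and linear interpolation of the envelope; none of these is stated in the text, and the function name is hypothetical.

```python
def harmonics_after_pitch_change(amplitudes, old_f0, new_f0, band_hz=4000.0):
    """Re-read the spectral envelope at the harmonics of a new fundamental.

    amplitudes[k - 1] is the envelope value at frequency k * old_f0.
    Returns one amplitude per harmonic of new_f0 that fits below band_hz.
    """
    new_count = int(band_hz // new_f0)
    out = []
    for k in range(1, new_count + 1):
        # Position of the new harmonic on the old (1-based) harmonic grid.
        pos = k * new_f0 / old_f0
        pos = min(max(pos, 1.0), float(len(amplitudes)))  # clamp to known range
        lo = int(pos)                       # lower neighbouring old harmonic
        hi = min(lo + 1, len(amplitudes))   # upper neighbour, clamped
        frac = pos - lo
        out.append(amplitudes[lo - 1] * (1 - frac) + amplitudes[hi - 1] * frac)
    return out
```

Because the envelope itself is left untouched and only re-read at new positions, the spectral shape that carries the phonemes is preserved while the pitch moves.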

Further, when this data number conversion is performed, it is carried out by interpolation using an oversampling operation. The data number conversion is thus achieved with simple processing based on oversampling.

Further, when pitch conversion is performed at the time of decoding, the pitch component of the sine wave analysis encoded speech data is multiplied by a predetermined coefficient. As a result, pitch conversion that changes, for example, the tone of the decoded speech becomes possible.

Further, when pitch conversion is performed at the time of decoding, the pitch component of the sine wave analysis encoded speech data is converted to a fixed value, so that the pitch is always constant. As a result, the decoded speech can be converted, for example, into a monotonous, artificial-sounding voice.

Further, when converting to this constant pitch, sinusoidal data of a predetermined frequency is added to the data converted to the constant pitch. As a result, the speech can be converted, for example, into a voice whose pitch oscillates up and down around the constant pitch.

Further, when pitch conversion is performed at the time of decoding, the pitch component of the sine wave analysis encoded speech data is subtracted from a predetermined constant. As a result, the speech can be converted, for example, to a pitch that produces an effect such as reversing the intonation at the word endings of the decoded speech.

Further, when pitch conversion is performed at the time of decoding, a predetermined random number is added to the pitch component of the sine wave analysis encoded speech data. As a result, the speech can be converted, for example, to a pitch at which the intonation of the decoded speech appears to change irregularly.

Further, when pitch conversion is performed at the time of decoding, sinusoidal data of a predetermined frequency is added to the pitch component of the speech data encoded by sine wave analysis encoding. As a result, the decoded speech can be converted, for example, into a voice to which a vibrato-like oscillation has been applied.

Further, when pitch conversion is performed at the time of decoding, the average value of the pitch component of the sine wave analysis encoded speech data is calculated, and this average value is used as the pitch-converted data. As a result, the decoded speech can be converted, for example, into a voice with reduced intonation.

Further, when pitch conversion is performed at the time of decoding, the average value of the pitch component of the sine wave analysis encoded speech data is calculated, and the difference between the pitch data and the average value is added to the pitch data. As a result, the decoded speech can be converted, for example, into a voice whose intonation is emphasized, giving a strongly modulated effect.

Further, when pitch conversion is performed at the time of decoding, the pitch component of the sine wave analysis encoded speech data is converted to data in a pitch conversion table prepared in advance, so that it takes one of the pitch steps set in that table. As a result, a conversion becomes possible that, for example, normalizes the pitch of the decoded speech to the pitches of a given musical scale.

According to the speech encoding apparatus of the present invention, pitch conversion means is provided for changing the sine wave analysis encoded pitch component by predetermined arithmetic processing. With this simple processing configuration, which merely converts the pitch component of the sine wave analysis encoded data, it becomes possible to accurately convert only the pitch and decode the speech without changing the phonemes of the input speech.

In this case, data number conversion is performed to set the number of harmonics to a predetermined number. As a result, encoding can be carried out with a simple processing configuration, and pitch conversion based on the encoded data is also simple.

Further, this data number conversion is performed by interpolation using a band-limited oversampling filter. As a result, the data number conversion is achieved with a simple processing configuration based on the oversampling filter.

According to the speech decoding apparatus of the present invention, the pitch component of the sine wave analysis encoded data is converted by the pitch conversion means, and the speech decoding means performs decoding using the converted sine wave analysis encoded data and the encoded data based on the linear prediction residual. Therefore, with a simple processing configuration, only the pitch of the decoded speech can be accurately converted without changing the phonemes of the speech.

In this case, data number conversion is performed to convert the number of harmonics from the predetermined number. As a result, speech with the converted pitch is decoded with a simple processing configuration that merely converts the number of harmonics.

Further, this data number conversion is performed by interpolation using a band-limited oversampling filter. As a result, the data number conversion at the time of decoding is achieved with a simple processing configuration based on the oversampling filter.

Further, the telephone apparatus according to the present invention has pitch conversion means for converting the pitch component of the data encoded by the sine wave analysis encoding means. Therefore, the pitch component of the speech data to be transmitted can be converted to a desired state with a simple configuration.

Further, according to the pitch conversion method of the present invention, pitch conversion is performed by multiplying the pitch component data, obtained by sine wave analysis encoding of the speech signal, by a predetermined coefficient. As a result, a conversion that changes, for example, the tone of the input speech is carried out simply.

Further, according to the pitch conversion method of the present invention, the pitch component data obtained by sine wave analysis encoding of the speech signal is converted to a fixed value, so that the pitch is always constant. As a result, the input speech is converted, for example, into a monotonous, artificial-sounding voice.

Further, according to the pitch conversion method of the present invention, pitch conversion is performed by subtracting the speech data encoded by sine wave analysis encoding from a predetermined constant. As a result, the speech can be converted, for example, to a pitch that produces an effect such as reversing the intonation at the word endings of the input speech.

Further, according to the medium of the present invention, a processing program for converting the pitch component of speech data encoded by sine wave analysis encoding is recorded on the medium on which the encoding program is recorded. By executing this processing program, it becomes possible to encode the input speech with only its pitch accurately converted, without changing its phonemes.

Further, according to the medium of the present invention, a pitch conversion processing program for converting the pitch component of the sine wave analysis decoded data is recorded on the medium on which the decoding program is recorded. Therefore, by executing this processing program, it becomes possible to accurately convert only the pitch of the decoded speech without changing the phonemes of the input speech.

Although embodiments of the present invention have been described with reference to the drawings, the present invention is not limited to the embodiments described above, and various changes and modifications can be made by those skilled in the art without departing from the spirit and scope of the invention as defined in the claims.

FIG. 1 is a block diagram showing the basic configuration of an example of a speech encoding apparatus according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the basic configuration of a speech signal decoding apparatus according to an embodiment of the present invention.

FIG. 3 is a block diagram showing a more detailed configuration of the speech signal encoding apparatus of FIG. 1.

FIG. 4 is a block diagram showing a more detailed configuration of the speech signal decoding apparatus of FIG. 2.

FIG. 5 is a block diagram showing an example applied to the transmission system of a radiotelephone apparatus.

FIG. 6 is a block diagram showing an example applied to the reception system of a radiotelephone apparatus.

* Explanation of reference numerals for the main parts of the drawings

110. First encoding unit 111. LPC inverse filter

113. LPC analysis and quantization unit 114. Sine wave analysis encoding unit

115. V/UV determination unit 119. Pitch conversion unit

120. Second encoding unit 121. Noise codebook

122. Weighted synthesis filter 123. Subtractor

124. Distance calculation circuit 125. Perceptual weighting filter

211. Voiced speech synthesis circuit 212. Inverse vector quantizer

215. Pitch conversion unit 270. Data conversion unit

Claims

A speech encoding method comprising the steps of:

dividing a speech signal into predetermined coding units on the time axis;

obtaining a linear prediction residual for each divided coding unit; and

performing sine wave analysis encoding based on the linear prediction residual,

wherein a pitch component of the sine wave analysis encoded speech data is changed by a predetermined calculation process.

The method of claim 1,

wherein the encoding is performed by harmonic encoding, and data number conversion is performed to set the number of harmonics to a predetermined number.

The method of claim 2,

wherein the data number conversion is performed by interpolation using an oversampling operation.

The method of claim 1,

wherein pitch conversion is performed by multiplying the pitch component of the sine wave analysis encoded speech data by a predetermined coefficient.

The method of claim 1,

wherein the pitch component of the sine wave analysis encoded speech data is converted to a fixed value, so that the pitch is always constant.

The method of claim 5,

wherein a sine wave of a predetermined frequency is added to the data converted to the constant pitch.

The method of claim 1,

wherein pitch conversion is performed by subtracting the pitch component of the sine wave analysis encoded speech data from a predetermined constant.

The method of claim 1,

wherein pitch conversion is performed by adding a predetermined random number to the pitch component of said sine wave analysis encoded speech data.

The method of claim 1,

wherein pitch conversion is performed by adding sine wave data of a predetermined frequency to the pitch component of the speech data encoded by the sine wave analysis encoding.

The method of claim 1,

wherein an average value of the pitch component of the speech data encoded by the sine wave analysis encoding is calculated, and the average value is used as the pitch-converted speech data.

The method of claim 1,

wherein an average value of the pitch component of the speech data encoded by the sine wave analysis encoding is calculated, and pitch conversion is performed by adding the difference between the speech data and the average value to the speech data.

The method of claim 1,

wherein the pitch component of the sine wave analysis encoded speech data is converted to data in a pitch conversion table prepared in advance, so that it takes a pitch of a step set in the pitch conversion table.

A speech decoding method in which a speech signal is decoded on the basis of linear predictive residual data of a predetermined coding unit on a time axis and sine wave analysis encoded data,

wherein a pitch component of the sine wave analysis encoded data is changed by a predetermined calculation process.

The method of claim 13,

wherein, after the pitch component is changed by the predetermined calculation process, data number conversion is performed to convert the number of harmonics used in harmonic encoding to a predetermined number.

The method of claim 14,

wherein said data number conversion is performed by interpolation using an oversampling operation.

The method of claim 13,

wherein pitch conversion is performed by multiplying the pitch component of the sine wave analysis encoded speech data by a predetermined coefficient.

The method of claim 13,

wherein the pitch component of the sine wave analysis encoded speech data is converted to a fixed value, so that the pitch is always constant.

The method of claim 17,

The method of claim 13,

wherein pitch conversion is performed by adding sine wave data of a predetermined frequency to the pitch component of the speech data encoded by the sine wave analysis encoding.

The method of claim 13,

Linear predictive residual detection means for obtaining a linear predictive residual from an input speech signal in a predetermined coding unit on a time axis;

Sine wave analysis encoding means for performing sine wave analysis encoding on the linear predictive residual detected by the linear predictive residual detection means;

and pitch conversion means for converting a pitch component of the data encoded by the sine wave analysis encoding means.

The apparatus of claim 25,

wherein data number conversion for setting the number of harmonics used for harmonic encoding to a predetermined number is performed by said sine wave analysis encoding means.

The apparatus of claim 26,

wherein said data number conversion is performed by interpolation using a band-limited oversampling filter.

A speech decoding apparatus for decoding a speech signal on the basis of linear predictive residual data of a predetermined coding unit on a time axis and sine wave analysis encoded data,

Pitch conversion means for converting a pitch component of said sine wave analysis encoded data;

and speech decoding means for performing decoding on the sine wave analysis encoded data converted by the pitch conversion means and on the linear predictive residual data.

The apparatus of claim 28,

wherein data number conversion for setting the number of harmonics used for harmonic encoding to a predetermined number is performed on the basis of the converted pitch component data.

The apparatus of claim 29,

wherein the data number conversion is performed by interpolation using a band-limited oversampling filter.

Sine wave analysis encoding means for performing sine wave analysis encoding on the linear predictive residual detected by the linear predictive residual detection means;

Pitch conversion means for converting a pitch component of the data encoded by the sine wave analysis encoding means;

and transmission means for transmitting the analysis encoded data converted by the pitch conversion means and the linear prediction residual data to a predetermined transmission line.

A pitch conversion method comprising performing sine wave analysis encoding of a speech signal and multiplying the encoded pitch component data by a predetermined coefficient to perform pitch conversion.

A pitch conversion method comprising performing sine wave analysis encoding of a speech signal and converting the encoded pitch component data to a fixed value, so that the pitch is always constant.

A pitch conversion method comprising subtracting pitch component data encoded by sine wave analysis encoding from a predetermined constant to perform pitch conversion.

Processing for dividing an input speech signal into predetermined coding units on the time axis;

A process of detecting a linear prediction residual in each of the divided coding units;

A medium on which a program for executing processing for sine wave analysis encoding of the linear prediction residual is recorded,

wherein a processing program for converting a pitch component of said sine wave analysis encoded speech data is also recorded.

A medium on which is recorded a processing program for decoding a speech signal on the basis of linear predictive residual data of a predetermined coding unit on a time axis and sine wave analysis encoded data,

wherein a pitch conversion processing program for converting a pitch component of said sine wave analysis encoded data is recorded.