KR950003557B1

KR950003557B1 - Encoding method of voice sample and signal sample

Info

Publication number: KR950003557B1
Application number: KR1019910701947A
Authority: KR
Inventors: Ira Alan Gerson
Original assignee: Motorola, Inc.; Vincent Joseph Rauner
Priority date: 1989-06-23
Filing date: 1990-05-02
Publication date: 1995-04-14
Also published as: AU5735990A; KR920702787A; CN1023160C; CA2060310C; IL94119A0; BR9007467A; AU638462B2; EP0484339A4; WO1991001545A1; CA2060310A1; DE69032026T2; IL94119A; NZ234180A; CN1048278A; DE69032026D1; EP0484339B1; EP0484339A1

Abstract

No content.

Description

[Title of the Invention]

Method for Encoding Speech Samples and Signal Samples

[Brief Description of the Drawings]

FIG. 1 is a block diagram illustrating the present invention.

FIG. 2 is a simple vector diagram illustrating one aspect of the present invention.

[Detailed Description of the Invention]

[Technical Field]

The present invention relates generally to speech coders, and more particularly to digital speech coders that use vector excitation sources.

[Background of the Invention]

Speech coders are known in the art. Some speech coders convert analog speech samples into a digitized representation and then use linear predictive coding to represent the spectral speech information. Other speech coders improve on conventional linear predictive coding by providing an excitation signal that is related to the original speech signal. Previously issued U.S. Patent No. 4,817,157 describes a digital speech coder with an improved vector excitation source, in which a codebook of excitation vectors is accessed to select the excitation signal that best matches the available information, thereby providing a reconstructed speech signal that closely represents the original signal.

In general, the greater the number of preliminary (candidate) excitation vectors available for use as excitation sources, the more closely the final decoded speech signal represents the original, unencoded speech signal. However, increasing performance in this manner generally enlarges the codebook, and typically also increases processing complexity and data rate.

Therefore, a need exists for a digital speech coder using vector excitation that, for a codebook of a given size, maximizes the quality of the decoded speech signal while minimizing complexity and with little or no increase in data rate.

[Summary of the Invention]

These needs and others are substantially met by a digital speech coder that uses vector excitation sources and provides improved speech quality. In accordance with the present invention, when encoding a speech sample, the coder first determines a pitch period parameter for the speech sample. Based in part on this pitch period parameter, a particular coded excitation signal can be determined independently of the pitch filter coefficient, after which the pitch filter coefficient parameter is optimized for the particular speech sample. This approach allows the preliminary excitation signals to be evaluated without increasing processing complexity or data rate.

In one embodiment, the coded excitation signal is determined independently of any pitch information. More specifically, the preliminary excitation signals provided by the codebook are processed to substantially remove any component that can be represented, at least in part, by a reference component related at least in part to the intermediate pitch vector. In particular, the vector component associated with the intermediate pitch vector is removed from each preliminary excitation signal (a process known as orthogonalization). The orthogonalized preliminary excitation signals are then compared with the unencoded speech sample to identify the preliminary excitation signal that best represents this particular speech sample. Next, the pitch information, including the pitch filter coefficient parameter, is optimized to best match the selected excitation signal, so that the overall speech signal is optimally represented.

In another embodiment, a second codebook of preliminary excitation signals is provided, so that two excitation signals are used to represent the speech sample. The first excitation signal is selected as described above, and the second excitation signal is selected in a similar manner, except that the preliminary excitation signals of the second codebook are first orthogonalized with respect to both the intermediate pitch vector and the previously selected first excitation signal.

[Detailed Description of the Invention]

The present invention can be realized in a speech coder that uses a suitable digital signal processor, such as a device of the Motorola DSP 56000 family. The computational functions of such a DSP embodiment are shown in block diagram form in FIG. 1.

The pitch period parameter 101 (determined in accordance with prior-art techniques) is provided to a pitch filter state 102, which includes a pitch filter portion. The resulting signal 103 comprises an intermediate pitch vector, which is provided both to a first multiplier 104 and to two orthogonalization units 106 and 107, described in more detail below. The first multiplier 104 multiplies the resulting signal by a pitch filter coefficient 108 to produce the pitch filter output 109. Selection of the pitch filter coefficient 108 is described in more detail below.
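The pitch path of FIG. 1 can be sketched as follows. This is a minimal, hypothetical Python illustration rather than the patent's implementation. It assumes that the pitch filter state 102 is a buffer of past excitation samples and that the pitch period parameter 101 is an integer lag in samples. It also assumes that a lag shorter than the vector length is handled by periodic repetition, a common adaptive-codebook convention that the text does not specify. The names `intermediate_pitch_vector` and `pitch_filter_output` are illustrative.

```python
def intermediate_pitch_vector(history, lag, n):
    """Build the n-sample intermediate pitch vector (signal 103) from the filter state.

    history: past excitation samples, oldest first (assumed model of pitch filter state 102).
    lag:     pitch period parameter 101, in whole samples (assumption).
    """
    if not 1 <= lag <= len(history):
        raise ValueError("lag must be between 1 and len(history)")
    segment = history[-lag:]  # the excitation from one pitch period ago
    # When lag < n, the segment is repeated periodically to fill n samples.
    return [segment[i % lag] for i in range(n)]


def pitch_filter_output(vector, beta):
    """Multiplier 104: scale the intermediate pitch vector by pitch filter coefficient 108."""
    return [beta * v for v in vector]


# Example: a 6-sample state, a lag of 3 samples, and a 5-sample vector.
p = intermediate_pitch_vector([0, 1, 2, 3, 4, 5], lag=3, n=5)
print(p)                            # [3, 4, 5, 3, 4]
print(pitch_filter_output(p, 0.5))  # [1.5, 2.0, 2.5, 1.5, 2.0]
```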

The first codebook 111 contains a set of basis vectors that are linearly combined to form a plurality of resulting excitation signals. Depending on the available memory size and other factors appropriate to the application, the number of possible excitation signals may be, for example, between 64 and 2048. The problem in encoding a particular speech sample is to select which of these excitation sources best represents the corresponding component of the original speech information.
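The text states only that the basis vectors are linearly combined. As a hedged illustration, the sketch below assumes a "vector sum" construction in which each codeword is the sum of all M basis vectors, each weighted by +1 or -1. This weighting is an assumption and is not stated in the patent. Under it, M basis vectors give 2**M codewords, so M = 6 yields 64 and M = 11 yields 2048, which matches the range quoted above. `codebook_from_basis` is a hypothetical name.

```python
from itertools import product


def codebook_from_basis(basis):
    """Form every +1/-1 weighted sum of the basis vectors (assumed combination rule).

    basis: list of M equal-length vectors. Returns 2**M codewords.
    """
    n = len(basis[0])
    codebook = []
    for signs in product((1, -1), repeat=len(basis)):
        codeword = [sum(s * b[k] for s, b in zip(signs, basis)) for k in range(n)]
        codebook.append(codeword)
    return codebook


print(codebook_from_basis([[1, 0], [0, 1]]))       # [[1, 1], [1, -1], [-1, 1], [-1, -1]]
print(len(codebook_from_basis([[1.0] * 40] * 6)))  # 64
```

Under this assumption, only the M basis vectors need to be stored, not all 2**M codewords. This is one way such a codebook can be kept within a modest memory budget.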

According to the present invention, once a particular resulting signal 103 has been determined, the excitation signals formulated by the first codebook 111 are supplied in sequence as preliminary excitation sources. Each preliminary excitation source is first orthogonalized (106) with respect to the resulting signal. For example, referring to FIG. 2, if vector A represents the resulting signal and vector B represents a particular preliminary excitation source, orthogonalizing that preliminary excitation source yields the vector denoted by reference character B'. (In practice, the dimension of the vector space is a function of the number of samples making up the vector, which may be 40 or more. Also, the preliminary excitation vectors are readily orthogonalized by orthogonalizing the basis vectors, since a linear combination of the orthogonalized basis vectors yields an orthogonalized excitation vector.)
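The orthogonalization of FIG. 2 corresponds to the standard projection-removal (Gram-Schmidt) step, B' = B - (<B, A> / <A, A>) A. The minimal Python sketch below implements this step. The names `orthogonalize` and `combine` are hypothetical. The sketch also checks the parenthetical remark about basis vectors. Because projection removal is linear, orthogonalizing each basis vector and then combining the results gives the same vector as orthogonalizing the combined excitation. The ±1 weights are an illustrative assumption.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(b, a):
    """Return B' = B - (<B, A> / <A, A>) A, the part of B orthogonal to A (FIG. 2)."""
    aa = dot(a, a)
    if aa == 0.0:
        return list(b)  # an all-zero A has no direction to remove
    k = dot(b, a) / aa
    return [bi - k * ai for bi, ai in zip(b, a)]


def combine(vectors, signs):
    """Weighted sum of vectors; the weights play the role of codeword signs."""
    return [sum(s * v[k] for s, v in zip(signs, vectors)) for k in range(len(vectors[0]))]


A = [1.0, 1.0, 0.0]              # stands in for the intermediate pitch vector 103
B = [2.0, 0.0, 1.0]              # stands in for one preliminary excitation
B_prime = orthogonalize(B, A)
print(B_prime, dot(B_prime, A))  # [1.0, -1.0, 1.0] 0.0

basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
signs = (1, -1)
direct = orthogonalize(combine(basis, signs), A)
via_basis = combine([orthogonalize(v, A) for v in basis], signs)
print(direct, via_basis)         # [1.0, -1.0, -1.0] [1.0, -1.0, -1.0]
```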

Once orthogonalized, the resulting preliminary excitation source is compared (112) with the unencoded signal 113 (or a suitable representation derived from it) to determine the relative similarity or dissimilarity between the two. This process is then repeated for each excitation source of the first codebook 111, and a determination is made as to which preliminary excitation source most closely aligns with the unencoded signal 113.
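The comparison and selection loop can be sketched as shown below. The patent does not name the similarity measure. This minimal Python sketch assumes squared error with the best gain for each candidate, which reduces to maximizing <x, c'>^2 / <c', c'>. It omits the perceptual weighting and synthesis filtering that a practical coder would apply to both the target and the candidates. Every c' is orthogonal to the pitch vector p, so <x - beta p, c'> = <x, c'> for any beta. This is why the search can run before the pitch filter coefficient is known. `search_codebook` is a hypothetical name.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(b, a):
    """Remove from b its component along a."""
    aa = dot(a, a)
    if aa == 0.0:
        return list(b)
    k = dot(b, a) / aa
    return [bi - k * ai for bi, ai in zip(b, a)]


def search_codebook(target, codebook, pitch_vector):
    """Return (index, gain) of the candidate whose orthogonalized form best matches target."""
    best_index, best_score, best_gain = -1, -1.0, 0.0
    for i, c in enumerate(codebook):
        c_orth = orthogonalize(c, pitch_vector)
        energy = dot(c_orth, c_orth)
        if energy == 0.0:
            continue  # candidate lies entirely along the pitch vector
        corr = dot(target, c_orth)
        score = corr * corr / energy  # error reduction achieved with the optimal gain
        if score > best_score:
            best_index, best_score, best_gain = i, score, corr / energy
    return best_index, best_gain


p = [1.0, 0.0, 0.0]
codebook = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x = [5.0, 2.0, 0.5]
print(search_codebook(x, codebook, p))  # (1, 2.0)
```

If the orthogonalization were skipped, candidate 0 would score highest in this example, even though it only duplicates the pitch contribution. The example therefore shows why the pitch component is removed first.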

In this particular embodiment, a gain factor 114 is also used to scale each preliminary excitation source, as is understood in the art. In addition, if desired, both excitation source selection and gain compensation can be accomplished in substantially the same manner, as is known in the art.
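A short least-squares derivation makes the role of gain factor 114 during the search explicit. The symbols are introduced only for illustration. Here x is the target (the unencoded signal 113, or a suitable representation of it), c' is an orthogonalized candidate, and g is its gain. Setting the derivative of the squared error to zero gives the best gain and the residual error. Ranking candidates by the last expression is therefore equivalent to giving each candidate its best gain and then comparing errors.

```latex
\begin{aligned}
E(g) &= \lVert x - g\,c' \rVert^2
      = \lVert x \rVert^2 - 2g\,\langle x, c' \rangle + g^2 \lVert c' \rVert^2 \\
\frac{dE}{dg} = 0 \;&\Rightarrow\; g^{*} = \frac{\langle x, c' \rangle}{\lVert c' \rVert^2} \\
E(g^{*}) &= \lVert x \rVert^2 - \frac{\langle x, c' \rangle^2}{\lVert c' \rVert^2}
\end{aligned}
```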

Once an appropriate excitation source has been selected from the first codebook 111 through this process, the orthogonalization process 106 no longer needs to be performed, and the proper excitation source signal is selected (116) by a suitable control mechanism 117. Then, assuming a single-codebook coder, the pitch information is gated and summed with the selected excitation source. The excitation gain 114 and the pitch filter coefficient 108 are optimized so that the combined excitation most closely aligns with the unencoded signal 113. Once this optimization is complete, the pitch period parameter, the pitch filter coefficient, and the particular excitation source and gain are known, and a suitable representation of them is then used to represent the original speech sample.
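The final optimization step for a single-codebook coder can be sketched as an ordinary least-squares fit. This sketch rests on several assumptions that the patent does not state. The target x is compared directly with beta p + gamma c, where p is the intermediate pitch vector 103 and c is the selected (non-orthogonalized) excitation. The gains are left unquantized. `optimize_gains` is a hypothetical name. The two gains solve the 2x2 normal equations, here by Cramer's rule.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def optimize_gains(target, pitch_vector, excitation):
    """Jointly pick beta (pitch coefficient 108) and gamma (excitation gain 114).

    Minimizes ||x - beta*p - gamma*c||**2 by solving the 2x2 normal equations
        [<p,p> <p,c>] [beta ]   [<x,p>]
        [<p,c> <c,c>] [gamma] = [<x,c>]
    """
    pp = dot(pitch_vector, pitch_vector)
    cc = dot(excitation, excitation)
    pc = dot(pitch_vector, excitation)
    xp = dot(target, pitch_vector)
    xc = dot(target, excitation)
    det = pp * cc - pc * pc
    if det == 0.0:
        raise ValueError("pitch vector and excitation are linearly dependent")
    beta = (xp * cc - pc * xc) / det
    gamma = (pp * xc - pc * xp) / det
    return beta, gamma


p = [1.0, 0.0, 0.0]
c = [1.0, 1.0, 0.0]
x = [2.5, 2.0, 0.0]                # equals 0.5 * p + 2.0 * c
print(optimize_gains(x, p, c))     # (0.5, 2.0)
```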

If desired, an additional codebook 121 can be used, as shown in FIG. 1. This second codebook likewise contains a plurality of basis vectors from which preliminary excitation sources are generated. The use of multiple codebooks in this way is known in the art. In accordance with the present invention, however, once a first excitation source has been selected from the first codebook 111 as described above, the preliminary excitation sources from the second codebook 121 are orthogonalized (107) with respect to both the resulting signal 103 and the excitation source signal selected from the first codebook 111. The selection process then continues as described above: the orthogonalized preliminary excitation source signals from the second codebook are compared with the unencoded signal 113 to identify the best match. Once this excitation source has been selected, the pitch filter coefficient 108 and the excitation gains 114 and 120 can be optimized as described above.
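Orthogonalization 107 against two vectors can be sketched with two Gram-Schmidt steps. The selected first excitation is first made orthogonal to the pitch vector. Each second-codebook candidate then has its components along the pitch vector and along that result removed. Because the two directions span the same space as the pitch vector and the first excitation, the candidate ends up orthogonal to both. This minimal Python sketch assumes the same squared-error criterion as a plain single-codebook search. The names `orthogonalize_two` and `search_second_codebook` are hypothetical. The final joint optimization of coefficient 108 and gains 114 and 120 is not shown.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def remove_component(b, a):
    """Subtract from b its projection onto a (b is returned unchanged if a is all zeros)."""
    aa = dot(a, a)
    if aa == 0.0:
        return list(b)
    k = dot(b, a) / aa
    return [bi - k * ai for bi, ai in zip(b, a)]


def orthogonalize_two(b, pitch_vector, first_excitation):
    """Make b orthogonal to both the pitch vector and the first selected excitation."""
    u = remove_component(first_excitation, pitch_vector)  # u is orthogonal to the pitch vector
    return remove_component(remove_component(b, pitch_vector), u)


def search_second_codebook(target, codebook, pitch_vector, first_excitation):
    """Return the index of the best second-codebook candidate after orthogonalization."""
    best_index, best_score = -1, -1.0
    for i, c in enumerate(codebook):
        c_orth = orthogonalize_two(c, pitch_vector, first_excitation)
        energy = dot(c_orth, c_orth)
        if energy == 0.0:
            continue  # nothing left that the pitch vector and first excitation don't cover
        corr = dot(target, c_orth)
        score = corr * corr / energy
        if score > best_score:
            best_index, best_score = i, score
    return best_index


p = [1.0, 0.0, 0.0, 0.0]
c1 = [1.0, 1.0, 0.0, 0.0]
codebook2 = [[0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
x = [3.0, 2.0, 0.5, 0.25]
print(search_second_codebook(x, codebook2, p, c1))  # 1
```

In this example, candidate 0 lies entirely in the space already covered by the pitch vector and the first excitation. Orthogonalization therefore leaves it with zero energy, and it is skipped rather than being chosen to duplicate that contribution.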

Claims

1. A speech sample encoding method comprising the steps of: A) determining a pitch period parameter for a speech sample; B) determining an encoded excitation signal for the speech sample independently of any pitch filter coefficient parameter; C) processing each of a plurality of preliminary excitation signals to provide a plurality of processed preliminary excitation signals, each of which comprises information substantially independent of information that can represent a pitch filter output derived at least in part as a function of the pitch period parameter; and D) optimizing at least one pitch filter coefficient parameter for the speech sample.

2. The method of claim 1, wherein the step of determining the encoded excitation signal further comprises processing the plurality of preliminary excitation signals to orthogonalize them with respect to a pitch filter output derived at least in part as a function of the pitch period parameter.

3. The method of claim 1, wherein the step of determining the encoded excitation signal comprises: B1) processing the excitation signals to substantially eliminate any component that can be represented, at least in part, by a reference component related at least in part to the pitch period parameter; and B2) determining an appropriate excitation signal for the speech sample.

4. The method of claim 3, wherein the excitation signal processing step further comprises processing the excitation signals to orthogonalize them with respect to a pitch filter output derived at least in part as a function of the pitch period parameter.

5. The method of claim 3, further comprising: C1) processing the preliminary excitation signals to substantially eliminate any component that can be represented, at least in part, by the reference component related at least in part to the pitch period parameter and by the appropriate excitation signal determined in step C.

6. The method of claim 5, wherein the preliminary excitation signal processing step further comprises processing the preliminary excitation signals to orthogonalize them with respect to both the reference component and the appropriate excitation signal determined in step C.

7. A method of encoding a signal sample using at least two codebooks containing information related to preliminary excitation signals, the method comprising the steps of: A) determining a first excitation signal for the signal sample using a first one of the codebooks; B) determining a second excitation signal for the signal sample using a second one of the codebooks, based on information from that codebook that is substantially independent of information that can be represented by the first excitation signal; and C) representing the signal sample, at least in part, using the first and second excitation signals.

8. The method of claim 7, wherein the signal sample comprises a speech sample.

9. The method of claim 7, wherein the step of determining the second excitation signal further comprises processing the preliminary excitation signals to orthogonalize them with respect to the first excitation signal.