JPH0793000A

JPH0793000A - Speech encoding device

Info

Publication number: JPH0793000A
Application number: JP5239515A
Authority: JP
Inventors: Fumihiro Matsuoka; 文啓松岡; Hirohisa Tazaki; 裕久田崎
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1993-09-27
Filing date: 1993-09-27
Publication date: 1995-04-07

Abstract

PURPOSE:To obtain the speech encoding device which can be improved in total sound source encoding characteristics by securing the possibility that a representative sound source, which is good in encoding characteristics and overlooked when a single representative sound source is used, from a frame. CONSTITUTION:When a pitch cycle 14 has a value, i.e., when a current processed frame is a voiced sound, a voiced sound source encoding means 8 extracts plural representative sound sources 16 by a multiple representative sound extracting means 9 according to previously set extraction standards. Then a sound encoding distortion calculating means 10 compares each of the representative sound sources 16, extracted by the multiple representative sound source extracting means 9, with respective code words used for specific encoding and finds and outputs distortion 17 in specific encoding between two, e.g. the least squared distance on a time waveform as to all combinations. Further, a sound source code selecting means 11 selects the smallest distortion among them and outputs a sound source code 19 corresponding to the code word in the case of it.

Description

Detailed Description of the Invention

【０００１】[0001]

【産業上の利用分野】この発明は、音声信号をディジタ
ル伝送、あるいは蓄積する場合に用いられる音声符号化
装置に関するものである。BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a voice coding device used for digital transmission or storage of voice signals.

【０００２】[0002]

【従来の技術】所定のサンプリング周期でディジタル信
号化された音声信号を、所定の長さのフレーム単位で分
析し、スペクトル形状情報と音源信号に分離し、音声信
号が有声音である場合に、ピッチ周期で類似波形の繰り
返しとして現れることを利用してフレーム内の音源信号
をその中の代表的な１ピッチ周期長区間の繰り返しとし
て表現することで有声音部分の符号化効率を改善する音
声符号化方法は、例えば特開平２−８４６９９号公報に
開示されている。2. Description of the Related Art A voice signal converted into a digital signal at a predetermined sampling period is analyzed in frame units of a predetermined length and separated into spectrum shape information and a sound source signal, and when the voice signal is a voiced sound, A speech code that improves the coding efficiency of a voiced sound portion by expressing a sound source signal in a frame as a repetition of a typical one-pitch cycle length section by utilizing the appearance of similar waveforms in a pitch cycle The conversion method is disclosed in, for example, Japanese Patent Application Laid-Open No. 2-84699.

【０００３】図６は、特開平２−８４６９９号公報に開
示されている従来の音声符号化装置の構成を示すブロッ
ク図である。FIG. 6 is a block diagram showing the configuration of a conventional speech coding apparatus disclosed in Japanese Patent Laid-Open No. 2-84699.

【０００４】音声符号化装置は、図６に示すように、所
定のサンプリング周期でディジタル信号化された音声信
号１を、所定の長さのフレーム単位でスペクトル形状を
分析するスペクトル形状分析手段２と、音声信号１のピ
ッチ周期を求めるピッチ周期抽出手段３と、音声信号が
有声か無声かを判定する有声無声判定手段４とを有して
いる。そして、スペクトル形状分析手段２には、スペク
トル形状情報１２を符号化するスペクトル形状情報符号
化手段５が接続されており、スペクトル形状分析手段２
とピッチ周期抽出手段３とには有声音源符号化手段８が
接続されている。有声音源符号化手段８は、音声信号１
が有声音である場合に、ピッチ周期抽出手段３が求めた
ピッチ周期１４から現在のフレーム内乃至その近傍の音
源信号１３から１ピッチ周期長の代表音源３２を抽出す
る代表音源抽出手段３０と、この代表音源３２を所定の
符号化操作により符号化して音源符号１９を出力する音
源符号化手段３１とから構成されている。As shown in FIG. 6, the speech coding apparatus includes a spectrum shape analyzing means 2 for analyzing a spectrum shape of a speech signal 1 which is digitalized at a predetermined sampling period in frame units of a predetermined length. It has pitch period extraction means 3 for obtaining the pitch period of the voice signal 1 and voiced / unvoiced voice determination means 4 for determining whether the voice signal is voiced or unvoiced. The spectrum shape analysis means 2 is connected to the spectrum shape information coding means 5 for coding the spectrum shape information 12, and the spectrum shape analysis means 2 is connected.
A voiced sound source coding means 8 is connected to the pitch period extraction means 3. The voiced sound source encoding means 8 uses the voice signal 1
Is a voiced sound, a representative sound source extraction unit 30 that extracts a representative sound source 32 having a one-pitch period length from the sound source signal 13 in or near the current frame from the pitch period 14 obtained by the pitch period extraction unit 3, The representative excitation 32 is encoded by a predetermined encoding operation and outputs an excitation code 19, and is composed of an excitation encoding means 31.

【０００５】また、ピッチ周期抽出手段３には、ピッチ
周期抽出手段３が求めたピッチ周期１４を符号化するピ
ッチ周期符号化手段６が接続されている。更に、有声無
声判定手段４には、ピッチ周期抽出手段３と、有声無声
判定手段４が出力する有声無声情報１５を符号化する有
声無声情報符号化手段７が接続されている。The pitch cycle extraction means 3 is also connected to a pitch cycle coding means 6 for coding the pitch cycle 14 obtained by the pitch cycle extraction means 3. Further, the voiced unvoiced determination means 4 is connected to the pitch period extraction means 3 and the voiced unvoiced information encoding means 7 for encoding the voiced unvoiced information 15 output from the voiced unvoiced determination means 4.

【０００６】次に動作について説明する。Next, the operation will be described.

【０００７】スペクトル形状分析手段２は、入力された
現在のフレームの音声信号１を分析し、スペクトル形状
情報１２を求め、求めたスペクトル形状情報１２と音声
信号１とから音源信号１３を算出する。そして、スペク
トル形状情報符号化手段５は、スペクトル形状情報１２
を符号化し、得られたスペクトル形状符号１８を出力す
る。The spectrum shape analyzing means 2 analyzes the input voice signal 1 of the current frame to obtain spectrum shape information 12, and calculates a sound source signal 13 from the obtained spectrum shape information 12 and the voice signal 1. The spectrum shape information coding means 5 then uses the spectrum shape information 12
Is encoded and the obtained spectrum shape code 18 is output.

【０００８】また、有声無声判定手段４は、音声信号１
を分析し、有声音か無声音かを判定し、その結果を有声
無声情報１５として出力する。それから、有声無声情報
符号化手段７は、この有声無声情報１５を符号化し、得
られた有声無声符号２１を出力する。In addition, the voiced / unvoiced determination means 4 uses the voice signal 1
Is analyzed to determine whether it is voiced sound or unvoiced sound, and the result is output as voiced unvoiced information 15. Then, the voiced unvoiced information encoding means 7 encodes this voiced unvoiced information 15 and outputs the obtained voiced unvoiced code 21.

【０００９】更に、ピッチ周期抽出手段３は、有声無声
情報１５が有声音である場合に、音声信号１に対してピ
ッチ周期分析を行い、得られたピッチ周期１４を出力す
る。そして、ピッチ周期符号化手段６は、ピッチ周期１
４を符号化し、得られたピッチ周期符号２０を出力す
る。Further, when the voiced unvoiced information 15 is voiced sound, the pitch period extraction means 3 performs a pitch period analysis on the voice signal 1 and outputs the obtained pitch period 14. Then, the pitch period encoding means 6 determines the pitch period 1
4 is encoded and the obtained pitch period code 20 is output.

【００１０】一方、有声音源符号化手段８は、ピッチ周
期１４が値を持つとき、すなわち有声無声情報１５が有
声音であった場合に、まず代表音源抽出手段３０により
現在のフレーム内の音源信号１３の中から振幅が最大に
なる点を中心とするピッチ周期１４の長さを持つ区間を
代表音源３２として抽出して出力する。それから、音源
符号化手段３１は代表音源３２を所定の符号化操作によ
り符号化して音源符号１９を出力する。On the other hand, when the pitch period 14 has a value, that is, when the voiced unvoiced information 15 is a voiced sound, the voiced sound source encoding means 8 first uses the representative sound source extraction means 30 to generate a sound source signal in the current frame. A section having a length of a pitch period 14 centered on the point where the amplitude is maximum is extracted from 13 as the representative sound source 32 and output. Then, the excitation encoding means 31 encodes the representative excitation 32 by a predetermined encoding operation and outputs the excitation code 19.

【００１１】ここで、音源符号化手段３１に用いる音源
符号帳は、代表音源３２の抽出手順と同じ手順により抽
出された１ピッチ帳の音源信号の中の典型的なものを符
号語とし、それらを集めて構成されており、音源符号帳
内の各符号語の波形は、その振幅最大点が常に波形内の
中心に位置する。また、音源符号化手段３１における符
号化は、代表音源３２と各符号語との間に、歪尺度を設
定し、その距離、すなわち符号化時の歪を全ての符号語
の内最小にする符号語を選択し、そのインデックスであ
る音源符号１９を出力するようになっている。歪尺度と
しては、最小二乗距離を用いている。Here, the excitation codebook used in the excitation encoding means 31 is a typical one of the excitation signals of the one-pitch book extracted by the same procedure as the extraction procedure of the representative excitation 32. , The waveform of each codeword in the excitation codebook has its maximum amplitude point always located at the center of the waveform. In the encoding in the excitation encoding means 31, a distortion measure is set between the representative excitation 32 and each code word, and the distance, that is, the code at which the distortion at the time of encoding is minimized among all the code words. A word is selected and the sound source code 19, which is the index thereof, is output. The least square distance is used as the distortion measure.

【００１２】[0012]

【発明が解決しようとする課題】しかしながら、図６に
示した従来の音声符号化装置においては、代表音源抽出
手段３０で得られる代表音源３２の抽出結果と、音源符
号化手段３１の符号化特性が、復号後の最終的な再生音
声の音質に大きな影響を与える。However, in the conventional speech encoding apparatus shown in FIG. 6, the extraction result of the representative excitation 32 obtained by the representative excitation extraction means 30 and the encoding characteristic of the excitation encoding means 31. However, it has a great influence on the sound quality of the final reproduced sound after decoding.

【００１３】従来の音声符号化装置では、代表音源抽出
手段３０によりフレームの１つの代表音源３２を抽出し
て、この代表音源３２を音源符号化装置３１により符号
化している。ここで代表音源抽出手段３０において、前
記代表音源３２に隣接する１ピッチ周期長の区間の音源
信号を第２の代表音源として抽出して、音源符号化手段
３０にて符号化を行った場合を考える。この符号化の
際、代表音源３２として得られた区間に対しての符号化
歪が大きく、第２の代表音源に対する符号化歪がそれぞ
れに比べて小さくなる場合も考えられ、本来の代表音源
３２に換えて、より符号化歪の小さい第２の代表音源を
用いるほうが符号化特性がよい場合がある。しかしなが
ら、上述従来例では、フレームに１つの代表音源３２し
か抽出していないために、このような場合に対する考慮
がなされていないという問題点があった。In the conventional speech encoding apparatus, the representative excitation extracting means 30 extracts one representative excitation 32 of a frame, and the representative excitation 32 is encoded by the excitation encoding apparatus 31. Here, the case where the representative excitation extracting means 30 extracts the excitation signal of the section of one pitch period length adjacent to the representative excitation 32 as the second representative excitation and the encoding is performed by the excitation encoding means 30. Think At the time of this encoding, it is conceivable that the coding distortion for the section obtained as the representative excitation 32 is large and the coding distortion for the second representative excitation is smaller than that of the representative excitation 32. Instead, it may be better to use the second representative excitation with smaller coding distortion, which has better coding characteristics. However, in the above-described conventional example, since only one representative sound source 32 is extracted in a frame, there is a problem that such a case is not taken into consideration.

【００１４】また、音源信号１３は、有声音の場合に、
概ねピッチ周期１４での類似波形の繰り返しとして現れ
るものの、その波形様態はしばしば急変し、フレーム内
の音源信号１３の中から１つの代表音源３２だけを抽出
する代表音源抽出手段３０では、フレーム内の音源信号
１３の容態を十分に捉えた代表音源３２を抽出すること
が難しい場合があり、復号後の音源が不安定になるとい
う問題点があった。The sound source signal 13 is a voiced sound,
Although it appears as a repetition of similar waveforms with a pitch period of 14, the waveform mode often changes suddenly, and in the representative sound source extraction means 30 that extracts only one representative sound source 32 from the sound source signal 13 in the frame, It may be difficult to extract the representative sound source 32 that sufficiently captures the condition of the sound source signal 13, and the sound source after decoding becomes unstable.

【００１５】更に、一般にディジタル化された信号は、
全く同一のアナログ信号から同一のサンプリング周期で
ディジタル化された信号であっても、サンプリング点
が、サンプリング周期未満の間隔でずれていると、波形
が微妙に異なる場合がある。In addition, generally the digitized signal is
Even if the signals are digitized from the same analog signal at the same sampling period, the waveforms may be slightly different if the sampling points are displaced at intervals less than the sampling period.

【００１６】図７は、上記サンプリング点の位置の違い
に伴う波形のずれの様子を説明するための図であり、こ
の図は、ある音声信号について、所定のサンプリング周
期でサンプリングが行われたサンプリング後の音声信号
ａと、同一のサンプリングが行われたものの、サンプリ
ング点が若干ずれており、サンプリング後の音声信号ｂ
の様子を模式的に表したものである。FIG. 7 is a diagram for explaining how the waveform shifts due to the difference in the positions of the sampling points. This diagram shows a sampling in which a certain audio signal is sampled at a predetermined sampling cycle. Although the same sampling is performed as the subsequent audio signal a, the sampling point is slightly deviated, and the audio signal b after the sampling is performed.
This is a schematic representation of the situation.

【００１７】サンプリング後の音声信号ａとサンプリン
グ後の音声信号ｂとのそれぞれの振幅最大点の位置は若
干ずれている。ここで、この振幅最大点を基準に両信号
の一部を抽出して比較すると、両信号波形の間にサンプ
リング周期未満のずれが生じる。The positions of the maximum amplitude points of the sampled audio signal a and the sampled audio signal b are slightly deviated. Here, when a part of both signals is extracted and compared with this maximum amplitude point as a reference, a deviation of less than the sampling period occurs between the waveforms of both signals.

【００１８】図８は、上記の波形のずれが、数サンプリ
ング周期に亙って生じる様子を説明するための図であ
り、この図は、ある音声信号について、所定のサンプリ
ング周期でサンプリングが行われたサンプリング後の音
声信号ｃと、同一のサンプリングが行われたものの、サ
ンプリング点が若干ずれているサンプリング後の音声信
号ｄの様子を模式的に表したものである。FIG. 8 is a diagram for explaining how the above-mentioned waveform shift occurs over several sampling cycles. In this figure, a certain audio signal is sampled at a predetermined sampling cycle. The sampled audio signal c is the same as the sampled audio signal c, but the sampling point is slightly deviated from the sampled audio signal d.

【００１９】サンプリング後の音声信号ｃとサンプリン
グ後の音声信号ｄとのそれぞれの振幅最大点の位置は、
双峰状の様態を示すサンプリング前の音声信号波形の左
右の最大点に位置する。ここで、この振幅最大点を基準
に両信号の一部を抽出して比較すると、両信号波形の間
に２サンプリング周期のずれが生じる。The positions of the maximum amplitude points of the sampled audio signal c and the sampled audio signal d are
It is located at the left and right maximum points of the voice signal waveform before sampling, which shows a bimodal shape. Here, when a part of both signals is extracted and compared with this maximum amplitude point as a reference, a shift of two sampling periods occurs between the waveforms of both signals.

【００２０】いま、音源符号帳内のある符号語が作成さ
れたときと全く同じ音声信号が従来の音声符号化装置に
入力された場合を考えると、同じ音声信号であるので、
この符号語が選ばれることが望ましいが、符号語を作成
したときと代表音源３２の抽出を行うときの僅かなサン
プリング点のずれの影響で歪みが大きくなってしまい、
異なった符号語が選ばれことがあり、サンプリング点の
ずれの影響で符号化特性が劣化するという問題点があっ
た。Now, considering the case where the same speech signal as when a certain codeword in the excitation codebook was created is input to the conventional speech coding apparatus, it is the same speech signal.
It is desirable to select this code word, but the distortion becomes large due to the effect of a slight shift in the sampling points when the code word is created and when the representative sound source 32 is extracted,
There is a problem that different codewords may be selected, and the coding characteristics may deteriorate due to the influence of the deviation of the sampling points.

【００２１】この発明は、上記のような課題を解消する
ためになされたもので、単一の代表音源を用いた場合に
は見逃してしまうより符号化特性の良い代表音源をフレ
ーム内より抽出できる可能性を確保し、総合的な音源符
号化特性を向上させられる音声符号化装置を得ることを
目的としている。また、フレーム内の平均的な音源信号
波形の様態を代表音源の符号化に反映させ、復号後の音
源の安定度を向上させられる音声符号化装置を得ること
を目的としている。更に、音源符号化時に、抽出された
代表音源と符号後との微小なサンプルずれを吸収し、音
源の符号化特性を向上させられる音声符号化装置を得る
ことを目的としている。The present invention has been made in order to solve the above-mentioned problems, and a representative sound source having a better coding characteristic than that which is missed when a single representative sound source is used can be extracted from within a frame. It is an object of the present invention to obtain a speech coder that secures the possibility and improves the overall excitation coding characteristic. Another object of the present invention is to obtain a speech coding apparatus capable of improving the stability of the sound source after decoding by reflecting the state of the average sound source signal waveform in the frame on the coding of the representative sound source. Further, another object of the present invention is to obtain a speech coding apparatus capable of absorbing a small sample shift between the extracted representative sound source and the coded sound at the time of sound source coding and improving the coding characteristic of the sound source.

【００２２】[0022]

【課題を解決するための手段】請求項１記載の発明に係
る音声符号化装置は、音源信号から代表音源を複数抽出
して出力する複数代表音源抽出手段と、複数代表音源抽
出手段が抽出した各代表音源を所定の符号化操作で出力
可能な各音源符号に符号化した場合の歪を全て算出する
音源符号化歪算出手段と、音源符号化歪算出手段が算出
した歪を比較し、その歪が最小となる代表音源と音源符
号の組み合わせを選択してその音源符号を出力する音源
符号選択手段とを備えることを特徴とするものである。According to a first aspect of the present invention, there is provided a speech coding apparatus, wherein a plurality of representative sound source extracting means for extracting a plurality of representative sound sources from a sound source signal and outputting the plurality of representative sound sources, and a plurality of representative sound source extracting means. Excitation coding distortion calculation means for calculating all distortions when each representative excitation is encoded into each excitation code that can be output by a predetermined encoding operation, and distortions calculated by the excitation coding distortion calculation means are compared, The present invention is characterized by further comprising excitation code selection means for selecting a combination of the representative excitation and the excitation code having the minimum distortion and outputting the excitation code.

【００２３】請求項２記載の発明に係る音声符号化装置
は、音源信号から代表音源を複数抽出して出力する複数
代表音源抽出手段と、複数代表音源抽出手段が抽出した
各代表音源を所定の符号化操作で出力可能な各音源符号
に符号化した場合の歪を全て算出する音源符号化歪算出
手段と、音源符号化歪算出手段が算出した各歪に所定の
重み係数を乗じて音源符号毎に和を取り、その和が最小
になる音源符号を出力する音源符号決定手段とを備える
ことを特徴とするものである。According to a second aspect of the present invention, a speech coding apparatus extracts a plurality of representative sound sources by extracting a plurality of representative sound sources from a sound source signal and outputs the representative sound sources extracted by the plurality of representative sound source extracting means. Excitation coded distortion calculation means for calculating all distortions when encoded into each excitation code that can be output by encoding operation, and excitation code by multiplying each distortion calculated by the excitation coded distortion calculation means by a predetermined weighting coefficient Excitation code determination means for taking the sum for each and outputting the excitation code with the minimum sum is provided.

【００２４】請求項３記載の発明に係る音声符号化装置
は、音源信号から代表音源を複数抽出して出力する複数
代表音源抽出手段と、複数代表音源抽出手段が抽出した
各代表音源の平均化処理を行って得られた１つの平均化
代表音源を出力する平均化代表音源算出手段と、平均化
代表音源に所定の符号化操作を行って符号化時の歪を最
小にする音源符号を出力する平均化音源符号化手段とを
備えることを特徴とするものである。A speech encoding apparatus according to a third aspect of the invention is a plurality of representative sound source extracting means for extracting a plurality of representative sound sources from a sound source signal and outputting the plurality of representative sound sources, and averaging of each representative sound source extracted by the plurality of representative sound source extracting means. An averaged representative sound source calculation means for outputting one averaged representative sound source obtained by processing, and a sound source code for performing a predetermined coding operation on the averaged representative sound source to minimize distortion during coding. And an averaging excitation coding means for performing the same.

【００２５】請求項４記載の発明に係る音声符号化装置
は、前記複数代表音源抽出手段に、現在のフレーム内の
音源信号より１つの代表音源抽出位置を決定して出力す
る代表音源抽出位置決定手段と、代表音源抽出位置の代
表音源とその代表音源抽出位置から前向または後向にお
およそピッチ周期の整数倍長離れた位置の音源とから複
数の１ピッチ周期長の代表音源を抽出するピッチ周期間
隔代表音源抽出手段とを設けたことを特徴とするもので
ある。According to a fourth aspect of the present invention, there is provided a speech coding apparatus, wherein a representative sound source extraction position is determined by the plural representative sound source extraction means by determining one representative sound source extraction position from a sound source signal in a current frame and outputting the representative sound source extraction position. Pitch for extracting a plurality of representative sound sources of one pitch period length from the means, the representative sound source at the representative sound source extraction position, and the sound source at a position separated from the representative sound source extraction position forward or backward by an integer multiple of the pitch period. It is characterized in that a periodic interval representative sound source extraction means is provided.

【００２６】請求項５記載の発明に係る音声符号化装置
は、前記複数代表音源抽出手段に、現在のフレーム内の
音源信号より１つの代表音源抽出位置を決定して出力す
る代表音源抽出位置決定手段と、代表音源抽出位置に対
してピッチ周期に比べて微小な複数種類の前後の移動を
与えてそれらの位置の音源信号より複数の１ピッチ周期
長の代表音源を抽出する変位代表音源抽出手段とを設け
たことを特徴とするものである。In the speech coder according to the fifth aspect of the present invention, the representative sound source extraction position determination means for determining one representative sound source extraction position from the sound source signals in the current frame and outputting it to the plural representative sound source extraction means. And a displacement representative sound source extraction means for extracting a plurality of representative sound sources of one pitch cycle length from the sound source signals at those positions by giving a plurality of kinds of front and rear movements that are minute compared to the pitch period to the representative sound source extraction position. And is provided.

【００２７】[0027]

【作用】請求項１記載の発明における音声符号化装置
は、複数代表音源抽出手段により音源信号から代表音源
を複数抽出して出力し、音源符号化歪算出手段により複
数代表音源抽出手段が抽出した各代表音源を所定の符号
化操作で出力可能な各音源符号に符号化した場合の歪を
全て算出し、音源符号選択手段により音源符号化歪算出
手段が算出した歪を比較し、その歪が最小となる代表音
源と音源符号の組み合わせを選択してその音源符号を出
力する。In the speech coding apparatus according to the first aspect of the present invention, the plurality of representative sound source extraction means extracts a plurality of representative sound sources from the sound source signal and outputs the same, and the sound source coding distortion calculation means extracts the plurality of representative sound source extraction means. Calculates all distortions when each representative excitation is encoded into each excitation code that can be output by a predetermined encoding operation, compares the distortions calculated by the excitation coding distortion calculation means by the excitation code selection means, and finds that distortion The smallest combination of the representative excitation and the excitation code is selected and the excitation code is output.

【００２８】請求項２記載の発明における音声符号化装
置は、複数代表音源抽出手段により音源信号から代表音
源を複数抽出して出力し、音源符号化歪算出手段により
複数代表音源抽出手段が抽出した各代表音源を所定の符
号化操作で出力可能な各音源符号に符号化した場合の歪
を全て算出し、音源符号決定手段により音源符号化歪算
出手段が算出した各歪に所定の重み係数を乗じて音源符
号毎に和を取り、その和が最小になる音源符号を出力す
る。In the speech coder according to the second aspect of the present invention, a plurality of representative excitation extracting means extracts a plurality of representative excitations from the excitation signal and outputs the representative excitations, and an excitation coding distortion calculating means extracts a plurality of representative excitation extracting means. All distortions when each representative excitation is encoded into each excitation code that can be output by a predetermined encoding operation are calculated, and a predetermined weighting coefficient is assigned to each distortion calculated by the excitation coding distortion calculation means by the excitation code determination means. The sum is multiplied to obtain the sum for each excitation code, and the excitation code having the minimum sum is output.

【００２９】請求項３記載の発明における音声符号化装
置は、複数代表音源抽出手段により音源信号から代表音
源を複数抽出して出力し、平均化代表音源算出手段によ
り複数代表音源抽出手段が抽出した各代表音源の平均化
処理を行って得られた１つの平均化代表音源を出力し、
平均化音源符号化手段により平均化代表音源に所定の符
号化操作を行って符号化時の歪を最小にする音源符号を
出力する。In the speech coding apparatus according to the third aspect of the present invention, the plurality of representative sound source extracting means extracts a plurality of representative sound sources from the sound source signal and outputs the same, and the averaged representative sound source calculating means extracts the plurality of representative sound source extracting means. Output one averaged representative sound source obtained by averaging each representative sound source,
The averaging excitation encoding means performs a predetermined encoding operation on the averaged representative excitation to output an excitation code that minimizes distortion during encoding.

【００３０】請求項４記載の発明における音声符号化装
置は、代表音源抽出位置決定手段により現在のフレーム
内の音源信号より１つの代表音源抽出位置を決定して出
力し、ピッチ周期間隔代表音源抽出手段により代表音源
抽出位置の代表音源とその代表音源抽出位置とから前向
または後向におおよそピッチ周期の整数倍長離れた位置
の音源とから複数の１ピッチ周期長の代表音源を抽出す
る。In the speech coding apparatus according to the present invention, the representative sound source extraction position determining means determines and outputs one representative sound source extraction position from the sound source signal in the current frame, and extracts the pitch period interval representative sound source. The means extracts a plurality of representative sound sources having a one-pitch cycle length from the representative sound source at the representative sound source extraction position and the sound source at a position separated from the representative sound source extraction position by an integral multiple of the pitch period in the forward direction or the backward direction.

【００３１】請求項５記載の発明における音声符号化装
置は、代表音源抽出位置決定手段により現在のフレーム
内の音源信号より１つの代表音源抽出位置を決定して出
力する、変位代表音源抽出手段により代表音源抽出位置
に対してピッチ周期に比べて微小な複数種類の前後の移
動を与えてそれらの位置の音源信号より複数の１ピッチ
周期長の代表音源を抽出する。In the speech coder according to the fifth aspect of the present invention, the representative excitation extraction position determining means determines one representative excitation extraction position from the excitation signal in the current frame and outputs the displacement representative excitation extraction means. A plurality of types of front and rear movements that are minute compared to the pitch period are given to the representative sound source extraction position, and a plurality of representative sound sources of one pitch cycle length are extracted from the sound source signals at those positions.

【００３２】[0032]

【実施例】以下、この発明の一実施例を図を用いて説明
する。DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS An embodiment of the present invention will be described below with reference to the drawings.

【００３３】実施例１．図１は、請求項１記載の発明に
係る音声符号化装置の構成を示すブロック図である。Example 1. FIG. 1 is a block diagram showing a configuration of a speech coder according to the invention described in claim 1.

【００３４】音声符号化装置は、図１に示すように、所
定のサンプリング周期でディジタル信号化された音声信
号１を、所定の長さのフレーム単位でスペクトル形状を
分析するスペクトル形状分析手段２と、音声信号１のピ
ッチ周期を求めるピッチ周期抽出手段３と、音声信号が
有声か無声かを判定する有声無声判定手段４とを有して
いる。そして、スペクトル形状分析手段２には、スペク
トル形状情報１２を符号化するスペクトル形状情報符号
化手段５が接続されており、スペクトル形状分析手段２
とピッチ周期抽出手段３とには有声音源符号化手段８が
接続されている。有声音源符号化手段８は、音源信号１
３から複数の代表音源である複数代表音源１６を抽出し
て出力する複数代表音源抽出手段９と、複数代表音源抽
出手段９が抽出した複数代表音源１６を所定の符号化操
作で出力可能な各音源符号に符号化した場合の歪を全て
算出する音源符号化歪算出手段１０と、音源符号化歪算
出手段１０が算出した歪１７を比較し、その歪１７が最
小となる代表音源と音源符号の組み合わせを選択してそ
の音源符号１９を出力する音源符号選択手段１１とから
構成されている。As shown in FIG. 1, the speech coding apparatus includes a spectrum shape analyzing means 2 for analyzing a spectrum shape of a speech signal 1 which is digitalized at a predetermined sampling period in frame units of a predetermined length. It has pitch period extraction means 3 for obtaining the pitch period of the voice signal 1 and voiced / unvoiced voice determination means 4 for determining whether the voice signal is voiced or unvoiced. The spectrum shape analysis means 2 is connected to the spectrum shape information coding means 5 for coding the spectrum shape information 12, and the spectrum shape analysis means 2 is connected.
A voiced sound source coding means 8 is connected to the pitch period extraction means 3. The voiced sound source encoding means 8 uses the sound source signal 1
3, a plurality of representative sound source extraction means 9 for extracting and outputting a plurality of representative sound source 16 which are a plurality of representative sound sources, and a plurality of representative sound source 16 extracted by the plurality of representative sound source extraction means 9 can be output by a predetermined encoding operation. Excitation coded distortion calculation means 10 for calculating all distortions when coded into the excitation code and distortion 17 calculated by the excitation coded distortion calculation means 10 are compared, and the representative excitation and the excitation code with the minimum distortion 17 are compared. And a sound source code selecting means 11 for selecting a combination of and outputting the sound source code 19.

【００３５】また、ピッチ周期抽出手段３には、ピッチ
周期抽出手段３が求めたピッチ周期１４を符号化するピ
ッチ周期符号化手段６が接続されている。更に、有声無
声判定手段４には、ピッチ周期抽出手段３と、有声無声
判定手段４が出力する有声無声情報１５を符号化する有
声無声情報符号化手段７が接続されている。Further, the pitch cycle extraction means 3 is connected to a pitch cycle coding means 6 for coding the pitch cycle 14 obtained by the pitch cycle extraction means 3. Further, the voiced unvoiced determination means 4 is connected to the pitch period extraction means 3 and the voiced unvoiced information encoding means 7 for encoding the voiced unvoiced information 15 output from the voiced unvoiced determination means 4.

【００３６】次に、本実施例の動作について説明する。Next, the operation of this embodiment will be described.

【００３７】スペクトル形状分析手段２は、入力された
現在のフレームの音声信号１を分析し、スペクトル形状
情報１２を求め、求めたスペクトル形状情報１２と音声
信号１とから音源信号１３を算出する。そして、スペク
トル形状情報符号化手段５は、スペクトル形状情報１２
を符号化し、得られたスペクトル形状符号１８を出力す
る。The spectrum shape analyzing means 2 analyzes the input voice signal 1 of the current frame to obtain spectrum shape information 12, and calculates a sound source signal 13 from the obtained spectrum shape information 12 and the voice signal 1. The spectrum shape information coding means 5 then uses the spectrum shape information 12
Is encoded and the obtained spectrum shape code 18 is output.

【００３８】また、有声無声判定手段４は、音声信号１
を分析し、有声音か無声音かを判定し、その結果を有声
無声情報１５として出力する。それから、有声無声情報
符号化手段７は、この有声無声情報１５を符号化し、得
られた有声無声符号２１を出力する。In addition, the voiced / unvoiced determination means 4 uses the voice signal 1
Is analyzed to determine whether it is voiced sound or unvoiced sound, and the result is output as voiced unvoiced information 15. Then, the voiced unvoiced information encoding means 7 encodes this voiced unvoiced information 15 and outputs the obtained voiced unvoiced code 21.

【００３９】更に、ピッチ周期抽出手段３は、有声無声
情報１５が有声音である場合に、音声信号１に対してピ
ッチ周期分析を行い、得られたピッチ周期１４を出力す
る。そして、ピッチ周期符号化手段６は、ピッチ周期１
４を符号化し、得られたピッチ周期符号２０を出力す
る。Further, the pitch period extracting means 3 performs pitch period analysis on the voice signal 1 when the voiced unvoiced information 15 is voiced sound, and outputs the obtained pitch period 14. Then, the pitch period encoding means 6 determines the pitch period 1
4 is encoded and the obtained pitch period code 20 is output.

【００４０】一方、有声音源符号化手段８は、ピッチ周
期１４が値を持つとき、すなわち現処理フレームが有声
音であるとき、複数代表音源抽出手段９は予め設定され
ている抽出基準に従って複数代表音源１６の抽出を行
う。それから、音源符号化歪算出手段１０は、複数代表
音源抽出手段９が抽出した複数代表音源１６の一つ一つ
について所定の符号化に用いられる各符号語と比較し、
２つの間の所定の符号化時の歪１７、例えば時間波形上
での最小二乗距離を全ての組み合わせについて求めて出
力する。更に、音源符号選択手段１１では、これらの歪
１７のうちで最小のものを選択し、その場合の符号語に
対応する音源符号１９を出力する。On the other hand, when the pitch period 14 has a value, that is, when the current processing frame is a voiced sound, the voiced sound source encoding means 8 causes the plural representative sound source extraction means 9 to select a plurality of representative sound sources in accordance with preset extraction criteria. The sound source 16 is extracted. Then, the excitation coding distortion calculation means 10 compares each of the plural representative excitations 16 extracted by the plural representative excitation extraction means 9 with each codeword used for predetermined encoding,
The distortion 17 at the time of predetermined coding between the two, for example, the least square distance on the time waveform is obtained and output for all combinations. Further, the excitation code selection means 11 selects the minimum of these distortions 17 and outputs the excitation code 19 corresponding to the code word in that case.

【００４１】実施例２．図２は、請求項２記載の発明に
係る音声符号化装置の構成を示すブロック図である。な
お、図１と同一構成部分には同一符号を付して説明を省
略する。Example 2. FIG. 2 is a block diagram showing the configuration of the speech encoding apparatus according to the invention of claim 2. It should be noted that the same components as those in FIG.

【００４２】有声音源符号化手段８は、音源信号１３か
ら複数代表音源１６を複数抽出して出力する複数代表音
源抽出手段９と、複数代表音源抽出手段９が抽出した各
複数代表音源１６を所定の符号化操作で出力可能な各音
源符号に符号化した場合の歪を全て算出する音源符号化
歪算出手段１０と、音源符号化歪算出手段１０が算出し
た各歪に所定の重み係数を乗じて音源符号毎に和を取
り、その和が最小になる音源符号１９を出力する音源符
号決定手段２２とから構成されている。The voiced sound source encoding means 8 extracts a plurality of representative sound sources 16 from the sound source signal 13 and outputs them, and a plurality of representative sound sources 16 extracted by the plurality of representative sound source extraction means 9. Excitation coding distortion calculation means 10 for calculating all distortions when encoded into each excitation code that can be output by the encoding operation, and each distortion calculated by the excitation coding distortion calculation means 10 is multiplied by a predetermined weighting coefficient. And the excitation code determination means 22 for taking the sum for each excitation code and outputting the excitation code 19 with the minimum sum.

【００４３】次ぎに、本実施例の動作について説明す
る。Next, the operation of this embodiment will be described.

【００４４】有声音源符号化手段８は、ピッチ周期１４
が値を持つとき、すなわち現処理フレームが有声音であ
るとき、複数代表音源抽出手段９は予め設定されている
抽出基準に従って複数代表音源１６の抽出を行う。それ
から、音源符号化歪算出手段１０は、複数代表音源抽出
手段９が抽出した複数代表音源１６の一つ一つについて
所定の符号化に用いられる各符号語と比較し、２つの間
の所定の符号化時の歪１７、例えば時間波形上での最小
二乗距離を全ての組み合わせについて求めて出力する。
更に、音源符号決定手段２２では、これらの歪１７に各
代表音源の抽出位置に応じて付与された重み係数（ここ
ではフレームの中央に近い場所から抽出された代表音源
に対しては大きな重み係数）を乗じ、それらを音源符号
毎に和を取り、その和が最小になる音源符号１９を選択
して出力する。The voiced sound source encoding means 8 has a pitch period of 14
Has a value, that is, when the current processing frame is a voiced sound, the plural representative sound source extracting means 9 extracts the plural representative sound sources 16 according to a preset extraction criterion. Then, the excitation coding distortion calculation means 10 compares each of the plurality of representative excitations 16 extracted by the plurality of representative excitation extraction means 9 with each codeword used for a predetermined encoding, and determines a predetermined value between the two. The distortion 17 at the time of encoding, for example, the least square distance on the time waveform is obtained and output for all combinations.
Further, the excitation code determination means 22 assigns a weighting factor to these distortions 17 according to the extraction position of each representative excitation (here, a large weighting factor is applied to the representative excitation extracted from a location near the center of the frame. ), The sum is calculated for each excitation code, and the excitation code 19 having the minimum sum is selected and output.

【００４５】実施例３．上述実施例２においては、音源
符号決定手段２２内における重み係数の付与について、
フレームの中央に近い場所から抽出した代表音源に対し
て大きな重み係数を付与するようになっているが、これ
に限らず、前フレーム代表音源と各代表音源の相関に応
じて付与する重み係数の大きさを変えてもよい。Example 3. In the above-described second embodiment, regarding the weighting coefficient assignment in the excitation code determination means 22,
A large weighting factor is given to the representative sound source extracted from a location close to the center of the frame, but not limited to this, the weighting factor of the weighting factor given according to the correlation between the previous frame representative sound source and each representative sound source is You may change the size.

【００４６】実施例４．上述実施例２においては、音源
符号決定手段２２内における重み係数の付与について、
フレームの中央に近い場所から抽出した代表音源に対し
て大きな重み係数を付与または前フレーム代表音源と各
代表音源の相関に応じて付与する重み係数の大きさを変
えるようになっているが、各代表音源に一様な重み付け
をして和を取るようにしてもよい。Example 4. In the above-described second embodiment, regarding the weighting coefficient assignment in the excitation code determination means 22,
A large weighting coefficient is given to the representative sound source extracted from a location near the center of the frame, or the size of the weighting coefficient to be given is changed according to the correlation between the previous frame representative sound source and each representative sound source. The representative sound sources may be uniformly weighted to obtain the sum.

【００４７】実施例５．図３は、請求項３記載の発明に
係る音声符号化装置の構成を示すブロック図である。な
お、図１と同一構成部分には同一符号を付して説明を省
略する。Example 5. FIG. 3 is a block diagram showing the configuration of a speech encoding apparatus according to the invention of claim 3. It should be noted that the same components as those in FIG.

【００４８】有声音源符号化手段８は、音源信号１３か
ら代表音源１６を複数抽出して出力する複数代表音源抽
出手段９と、複数代表音源抽出手段９が抽出した各代表
音源の平均化処理を行って得られた１つの平均化代表音
源２５を出力する平均化代表音源算出手段２３と、平均
化代表音源２５に所定の符号化操作を行って符号化時の
歪を最小にする音源符号１９を出力する平均化音源符号
化手段２４とから構成されている。The voiced sound source encoding means 8 performs a plurality of representative sound source extraction means 9 for extracting and outputting a plurality of representative sound sources 16 from the sound source signal 13, and an averaging process of each representative sound source extracted by the plurality of representative sound source extraction means 9. An averaged representative sound source calculation unit 23 that outputs one averaged representative sound source 25 obtained by performing the sound source code 19 that performs a predetermined coding operation on the averaged representative sound source 25 to minimize distortion at the time of coding. And averaging excitation coding means 24 for outputting.

【００４９】次に、本実施例の動作について説明する。Next, the operation of this embodiment will be described.

【００５０】有声音源符号化手段８は、ピッチ周期１４
が値を持つとき、すなわち現処理フレームが有声音であ
るとき、複数代表音源抽出手段９は予め設定されている
抽出基準に従って複数代表音源１６の抽出を行う。それ
から、平均化代表音源算出手段２３は、複数代表音源抽
出手段９が抽出した各代表音源の時間波形上における各
サンプル毎の振幅の平均値を求め、平均化代表音源２５
を算出して出力する。更に、平均化音源符号化手段２４
は、この平均化代表音源２５と所定の符号化に用いられ
る各符号毎の間の時間波形上の最小二乗距離を計算し、
これを最小とする符号語に対応する音源符号１９を出力
する。The voiced sound source encoding means 8 has a pitch period of 14
Has a value, that is, when the current processing frame is a voiced sound, the plural representative sound source extracting means 9 extracts the plural representative sound sources 16 according to a preset extraction criterion. Then, the averaged representative sound source calculation means 23 obtains the average value of the amplitude of each sample on the time waveform of each representative sound source extracted by the plural representative sound source extraction means 9, and calculates the averaged representative sound source 25.
Is calculated and output. Furthermore, averaging excitation coding means 24
Is the least square distance on the time waveform between the averaged representative excitation 25 and each code used for predetermined encoding,
The excitation code 19 corresponding to the code word that minimizes this is output.

【００５１】実施例６．上述実施例５においては、平均
化音源符号化手段２４で平均値の算出に偏りを持たせて
いなかったが、各代表音源の抽出位置や、前フレームの
代表音源との相関を重み付与基準に用いた加重平均を求
める構成としてもよい。Example 6. In the fifth embodiment described above, the averaged excitation coding means 24 does not bias the calculation of the average value, but the extraction position of each representative excitation and the correlation with the representative excitation of the previous frame are used as weighting criteria. The weighted average used may be obtained.

【００５２】実施例７．上述実施例１、２及び５におい
ては、平均化音源符号化手段２４における符号化の歪尺
度を、時間波形上の最小二乗距離としたが、歪尺度はこ
れに限らず、例えば波形間のバタチャリア距離等を用い
る構成も可能である。また、ＤＦＴスペクトル上での歪
尺度を定義し、歪を計算する構成も可能である。Example 7. In the first, second and fifth embodiments described above, the distortion measure of coding in the averaging excitation coding means 24 is the least square distance on the time waveform, but the distortion measure is not limited to this, and for example, Batacharia between waveforms. A configuration using distance or the like is also possible. Further, a configuration in which a distortion measure on the DFT spectrum is defined and distortion is calculated is also possible.

【００５３】実施例８．上述実施例１、２及び５におい
ては、平均化音源符号化手段２４で、時間波形上の振幅
の平均により、平均化代表音源２５を算出したが、複数
代表音源１６のそれぞれをＤＦＴし、ＤＦＴスペクトル
上での平均を取り、そのまま実施例７の上の歪尺度を用
いた符号化を行う構成とすることも可能である。Example 8. In the first, second and fifth embodiments described above, the averaged excitation coding means 24 calculates the averaged representative excitation 25 by averaging the amplitudes on the time waveform. However, each of the plurality of representative excitations 16 is subjected to DFT and DFT. It is also possible to adopt a configuration in which the average is taken on the spectrum and the coding is performed using the distortion measure as in the seventh embodiment as it is.

【００５４】実施例９．図４は、請求項４記載の発明に
係る音声符号化装置の複数代表音源抽出手段９の構成を
示すブロック図である。Example 9. FIG. 4 is a block diagram showing the configuration of the plural representative excitation extracting means 9 of the speech coding apparatus according to the fourth aspect of the invention.

【００５５】複数代表音源抽出手段９は、現在のフレー
ム内の音源信号より１つの代表音源抽出位置を決定して
出力する代表音源抽出位置決定手段２６と、代表音源抽
出位置の代表音源とその代表音源抽出位置とから前向ま
たは後向におおよそピッチ周期の整数倍長離れた位置の
音源とから複数の１ピッチ周期長の代表音源を抽出する
ピッチ周期間隔代表音源抽出手段２７とを備えている。The plural representative sound source extraction means 9 determines the representative sound source extraction position from the sound source signal in the current frame and outputs it, the representative sound source extraction position determining means 26, the representative sound source at the representative sound source extraction position and its representative. Pitch cycle interval representative sound source extraction means 27 for extracting a plurality of representative sound sources of one pitch cycle length from a sound source extraction position and a sound source located at a position approximately an integer multiple of the pitch cycle forward or backward. .

【００５６】次に、本実施例の動作について説明する。Next, the operation of this embodiment will be described.

【００５７】代表音源抽出位置決定手段２６は、ピッチ
周期１４が値を持つとき、すなわち現処理フレームが有
声音であるとき、フレーム内の音源信号１３の全域よ
り、振幅ピーク最大の点の位置、及びその点を中心とす
る１ピッチ周期区間を抽出し、抽出位置情報２８として
出力する。そして、ピッチ周期間隔代表音源抽出手段２
７は、抽出位置情報２８の１ピッチ周期区間と、それに
前後するフレーム内全域より抽出位置情報２８の中心か
らピッチ周期の整数倍離れた点の±２〜３ポイント以内
の振幅最大点を中心とする１ピッチ周期区間を抽出し、
複数代表音源１６として出力する。この時、フレーム始
終端付近の抽出区間では、前後フレームにまたがって抽
出される場合もある。When the pitch period 14 has a value, that is, when the current processing frame is a voiced sound, the representative sound source extraction position determining means 26 determines the position of the maximum amplitude peak point from the entire region of the sound source signal 13 in the frame, And a 1-pitch cycle section centered on that point is output as extraction position information 28. Then, the pitch period interval representative sound source extraction means 2
7 is centered on one pitch period section of the extraction position information 28, and the maximum amplitude point within ± 2 to 3 points of the point which is an integer multiple of the pitch period away from the center of the extraction position information 28 from the whole area in the frame before and after it. Extract one pitch period section,
It is output as a plurality of representative sound sources 16. At this time, in the extraction section near the start and end of the frame, it may be extracted over the preceding and following frames.

【００５８】実施例１０．図５は、請求項５記載の発明
に係る音声符号化装置の複数代表音源抽出手段９の構成
を示すブロック図である。Example 10. FIG. 5 is a block diagram showing the configuration of the plural representative excitation extracting means 9 of the speech encoding apparatus according to the fifth aspect of the invention.

【００５９】複数代表音源抽出手段９は、現在のフレー
ム内の音源信号より１つの代表音源抽出位置を決定して
出力する代表音源抽出位置決定手段２６と、代表音源抽
出位置に対してピッチ周期に比べて微小な複数種類の前
後の移動を与えてそれらの位置の音源信号より複数の１
ピッチ周期長の代表音源を抽出する変位代表音源抽出手
段２９とを備えている。The plural representative sound source extraction means 9 determines a representative sound source extraction position from the sound source signal in the current frame and outputs the representative sound source extraction position determining means 26, and a pitch period with respect to the representative sound source extraction position. Compared with the sound source signals at those positions, a plurality of 1
Displacement representative sound source extraction means 29 for extracting a representative sound source having a pitch cycle length is provided.

【００６０】次に、本実施例の動作について説明する。Next, the operation of this embodiment will be described.

【００６１】代表音源抽出位置決定手段２６は、ピッチ
周期１４が値を持つとき、すなわち現処理フレームが有
声音であるとき、フレーム内の音源信号１３の全域よ
り、振幅ピーク最大の点の位置、及びその点を中心とす
る１ピッチ周期区間を抽出し、抽出位置情報２８として
出力する。そして、変位代表音源抽出手段２９は、抽出
位置情報２８の１ピッチ周期区間と、それに±１〜３ポ
イント前後した１ピッチ周期区間を独立した区間として
抽出し、複数代表音源１６として出力する。この時、フ
レーム始終端付近の抽出区間では、前後フレームにまた
がって抽出される場合もある。When the pitch period 14 has a value, that is, when the current processing frame is a voiced sound, the representative sound source extraction position determining means 26 determines the position of the maximum amplitude peak point from the entire region of the sound source signal 13 in the frame, And a 1-pitch cycle section centered on that point is output as extraction position information 28. Then, the displacement representative sound source extraction means 29 extracts the 1-pitch cycle section of the extracted position information 28 and the 1-pitch cycle section about ± 1 to 3 points as an independent section, and outputs it as a plurality of representative sound sources 16. At this time, in the extraction section near the start and end of the frame, it may be extracted over the preceding and following frames.

【００６２】実施例１１．上述実施例９及び１０では、
抽出位置情報２８を、フレーム内全域の振幅最大点を基
準に決定していたが、この基準は、前フレームの代表音
源の１つと、フレーム内の音源信号１３の相互相関値が
最大となる位置にすることも可能である。Example 11. In the above-mentioned Examples 9 and 10,
The extraction position information 28 is determined based on the maximum amplitude point in the entire area of the frame. The reference is a position where the cross-correlation value of one of the representative sound sources in the previous frame and the sound source signal 13 in the frame becomes maximum. It is also possible to

【００６３】実施例１２．上述実施例１０では、抽出位
置の変位を、音源信号１３上の抽出区間の移動によって
得ているが、抽出位置情報２８で得られるその位置を中
心とする１ピッチ周期区間をＤＦＴしてＤＦＴスペクト
ルを得、そのスペクトルに時間方向の±３ポイント以下
の変位に相当するスペクトル上の位相回転を複数種類疑
似的に時間方向の微小な変位を段階的に作り出し、それ
ぞれを複数代表音源１６として出力する構成も可能であ
る。この構成ではサンプリング周期未満の変位も疑似的
に作り出すことが可能である。Example 12 In the tenth embodiment described above, the displacement of the extraction position is obtained by moving the extraction segment on the sound source signal 13. However, the DFT spectrum is obtained by performing DFT on the one-pitch period segment centered on the position obtained from the extraction position information 28. A plurality of types of phase rotations on the spectrum corresponding to displacements of ± 3 points or less in the time direction are generated stepwise in a pseudo manner in the spectrum, and each is output as a plurality of representative sound sources 16. A configuration is also possible. With this configuration, it is possible to artificially create a displacement shorter than the sampling period.

【００６４】実施例１３．上述実施例１０では、単一の
抽出位置の変化により複数の代表音源を得ていたが、請
求項４記載の構成を組み合わせることも可能である。Example 13. In the tenth embodiment described above, a plurality of representative sound sources are obtained by changing the single extraction position, but it is also possible to combine the configuration according to claim 4.

【００６５】[0065]

【発明の効果】以上説明したように、請求項１記載の発
明によれば、複数代表音源抽出手段により音源信号から
代表音源を複数抽出して出力し、音源符号化歪算出手段
により複数代表音源抽出手段が抽出した各代表音源を所
定の符号化操作で出力可能な各音源符号に符号化した場
合の歪を全て算出し、音源符号選択手段により音源符号
化歪算出手段が算出した歪を比較し、その歪が最小とな
る代表音源と音源符号の組み合わせを選択してその音源
符号を出力するように構成したので、単一の代表音源を
用いて符号化していたときには見逃していたより符号化
特性の良好な１ピッチ周期区間を符号化に用いることが
でき、総合的に符号化特性を向上することができる。As described above, according to the invention of claim 1, a plurality of representative sound source extraction means extracts a plurality of representative sound sources from a sound source signal and outputs the same, and a sound source coding distortion calculation means makes a plurality of representative sound source. All distortions when each representative excitation extracted by the extraction means is encoded into each excitation code that can be output by a predetermined encoding operation are calculated, and the distortions calculated by the excitation coding distortion calculation means by the excitation code selection means are compared. However, since it is configured to output the excitation code by selecting the combination of the representative excitation and the excitation code that minimizes the distortion, the coding characteristics that are overlooked when encoding using a single representative excitation Can be used for encoding, and the encoding characteristics can be improved comprehensively.

【００６６】請求項２記載の発明によれば、複数代表音
源抽出手段により音源信号から代表音源を複数抽出して
出力し、音源符号化歪算出手段により複数代表音源抽出
手段が抽出した各代表音源を所定の符号化操作で出力可
能な各音源符号に符号化した場合の歪を全て算出し、音
源符号決定手段により音源符号化歪算出手段が算出した
各歪に所定の重み係数を乗じて音源符号毎に和を取り、
その和が最小になる音源符号を出力するように構成した
ので、単一の代表音源を用いて符号化していた場合に比
べ、フレーム内の平均的な音源信号波形の様態を符号化
に反映させることができ、復号後の音源の安定度を向上
させることができる。According to the second aspect of the invention, a plurality of representative sound sources are extracted from the sound source signal by the plurality of representative sound source extracting means and output, and the representative sound source extracted by the plurality of representative sound source extracting means by the sound source coding distortion calculating means. Is calculated by encoding each excitation code that can be output by a predetermined encoding operation, and the excitation code determination means multiplies each distortion calculated by the excitation coding distortion calculation means by a predetermined weighting factor Take the sum for each sign,
Since the excitation code whose sum is the minimum is output, the state of the average excitation signal waveform in the frame is reflected in the encoding as compared with the case where the encoding is performed using a single representative excitation. Therefore, the stability of the sound source after decoding can be improved.

【００６７】請求項３記載の発明によれば、複数代表音
源抽出手段により音源信号から代表音源を複数抽出して
出力し、平均化代表音源算出手段により複数代表音源抽
出手段が抽出した各代表音源の平均化処理を行って得ら
れた１つの平均化代表音源を出力し、平均化音源符号化
手段により平均化代表音源に所定の符号化操作を行って
符号化時の歪を最小にする音源符号を出力するように構
成したので、単一の代表音源を用いて符号化していた場
合に比べ、フレーム内の平均的な音源信号波形の様態を
符号化に反映させることができ、復号後の音源の安定度
を向上させることができる。According to the third aspect of the present invention, a plurality of representative sound sources are extracted from the sound source signal by the plurality of representative sound source extracting means and output, and each representative sound source extracted by the plurality of representative sound source extracting means by the averaged representative sound source calculating means. A sound source that outputs one averaged representative sound source obtained by performing the averaging process and performs a predetermined coding operation on the averaged representative sound source by the averaging sound source coding means to minimize distortion at the time of coding. Since it is configured to output the code, the state of the average excitation signal waveform in the frame can be reflected in the encoding, compared with the case where the encoding is performed using a single representative excitation, The stability of the sound source can be improved.

【００６８】請求項４記載の発明によれば、代表音源抽
出位置決定手段により現在のフレーム内の音源信号より
１つの代表音源抽出位置を決定して出力し、ピッチ周期
間隔代表音源抽出手段により代表音源抽出位置の代表音
源とその代表音源抽出位置とから前向または後向におお
よそピッチ周期の整数倍長離れた位置の音源とから複数
の１ピッチ周期長の代表音源を抽出するように構成した
ので、従来のように単一の代表音源を用いた場合に見ら
れたサンプリング点のずれの影響に対し、複数の代表音
源の中からその振幅ピークに対するサンプリング点の位
置関係が符号後のそれとよく合致し、類似の波形を示す
別の代表音源を用いることができ、符号化特性を向上す
ることができる。According to the fourth aspect of the present invention, the representative sound source extraction position determining means determines and outputs one representative sound source extraction position from the sound source signal in the current frame, and the pitch period interval representative sound source extracting means represents the representative sound source extraction position. It is configured to extract a plurality of representative sound sources of one pitch cycle length from the representative sound source at the sound source extraction position and the sound source at a position that is distant from the representative sound source extraction position forward or backward by an integer multiple of the pitch cycle. Therefore, the positional relationship of the sampling point with respect to its amplitude peak among multiple representative sound sources is often the same as that after coding, against the influence of the deviation of the sampling points seen when using a single representative sound source as in the past. It is possible to use another representative sound source that matches and exhibits a similar waveform, and it is possible to improve the coding characteristic.

【００６９】請求項５記載の発明によれば、代表音源抽
出位置決定手段により現在のフレーム内の音源信号より
１つの代表音源抽出位置を決定して出力する、変位代表
音源抽出手段により代表音源抽出位置に対してピッチ周
期に比べて微小な複数種類の前後の移動を与えてそれら
の位置の音源信号より複数の１ピッチ周期長の代表音源
を抽出するように構成したので、サンプリング点のずれ
の影響を回避し得る代表音源を抽出することができ、符
号化特性を向上することができる。According to the fifth aspect of the invention, the representative sound source extraction position determining means determines and outputs one representative sound source extraction position from the sound source signal in the current frame, and the displacement representative sound source extracting means extracts the representative sound source. Since a plurality of types of front and rear movements that are minute compared to the pitch period are given to the position and a plurality of representative sound sources of one pitch period length are extracted from the sound source signals at those positions, the deviation of the sampling points A representative sound source that can avoid the influence can be extracted, and the coding characteristic can be improved.

[Brief description of drawings]

【図１】請求項１記載の発明に係る音声符号化装置の構
成を示すブロック図である。FIG. 1 is a block diagram showing a configuration of a speech encoding apparatus according to the invention described in claim 1.

【図２】請求項２記載の発明に係る音声符号化装置の構
成を示すブロック図である。[Fig. 2] Fig. 2 is a block diagram showing a configuration of a speech coding apparatus according to a second aspect of the invention.

【図３】請求項３記載の発明に係る音声符号化装置の構
成を示すブロック図である。FIG. 3 is a block diagram showing a configuration of a speech encoding apparatus according to the invention of claim 3.

【図４】請求項４記載の発明に係る音声符号化装置の構
成を示すブロック図である。FIG. 4 is a block diagram showing a configuration of a speech encoding apparatus according to a fourth aspect of the invention.

【図５】請求項５記載の発明に係る音声符号化装置の構
成を示すブロック図である。[Fig. 5] Fig. 5 is a block diagram showing the structure of a speech encoding apparatus according to the invention of claim 5.

【図６】従来の音声符号化装置の構成を示すブロック図
である。FIG. 6 is a block diagram showing a configuration of a conventional speech encoding device.

【図７】従来の音声符号化装置において波形のサンプリ
ング間隔未満のずれが生じる仕組みを説明するための図
である。[Fig. 7] Fig. 7 is a diagram for explaining a mechanism in which a shift of less than a sampling interval of a waveform occurs in a conventional speech encoding device.

【図８】従来の音声符号化装置において波形のサンプリ
ング間隔程度のずれが生じる仕組みを説明するための図
である。[Fig. 8] Fig. 8 is a diagram for explaining a mechanism in which a shift of about a sampling interval of a waveform occurs in a conventional speech encoding device.

[Explanation of symbols]

２スペクトル形状分析手段３ピッチ周期抽出手段４有声無声判定手段５スペクトル形状情報符号化手段６ピッチ周期符号化手段７有声無声情報符号化手段８有声音源符号化手段９複数代表音源抽出手段１０音源符号化歪算出手段１１音源符号選択手段２２音源符号決定手段２３平均化代表音源算出手段２４平均化音源符号化手段２６代表音源抽出位置決定手段２７ピッチ周期間隔代表音源抽出手段２９変位代表音源抽出手段 2 spectrum shape analysis means 3 pitch period extraction means 4 voiced unvoiced determination means 5 spectrum shape information coding means 6 pitch period coding means 7 voiced unvoiced information coding means 8 voiced sound source coding means 9 multiple representative sound source extraction means 10 sound source codes Distortion distortion calculation means 11 Excitation code selection means 22 Excitation code determination means 23 Averaged representative excitation calculation means 24 Averaged excitation coding means 26 Representative excitation extraction position determination means 27 Pitch cycle interval representative excitation extraction means 29 Displacement representative excitation extraction means

Claims

[Claims]

1. A voice signal converted into a digital signal at a predetermined sampling period is analyzed in units of frames of a predetermined length and separated into spectrum shape information and a sound source signal, and when the voice signal is a voiced sound, A pitch period is obtained, a representative excitation having a pitch length of 1 is extracted from the excitation signal in or near the current frame, the representative excitation is encoded by a predetermined encoding operation, and an excitation code is output. ,
In a speech coder for encoding and outputting a plurality of parameters including spectrum shape information and voiced / unvoiced information, a plurality of representative sound source extraction means for extracting and outputting a plurality of representative sound sources from a sound source signal and a plurality of representative sound source extraction means are extracted. Comparing the distortion calculated by the excitation coding distortion calculation means and the excitation coding distortion calculation means for calculating all the distortions when each representative excitation coded into each excitation code that can be output by the predetermined coding operation, A speech encoding apparatus, comprising: an excitation code selecting unit that selects a combination of a representative excitation and an excitation code having the minimum distortion and outputs the excitation code.

2. A voice signal converted into a digital signal at a predetermined sampling period is analyzed in frame units of a predetermined length and separated into spectrum shape information and a sound source signal, and when the voice signal is a voiced sound, A pitch period is obtained, a representative excitation having a pitch length of 1 is extracted from the excitation signal in or near the current frame, the representative excitation is encoded by a predetermined encoding operation, and an excitation code is output. ,
In a speech coder for encoding and outputting a plurality of parameters including spectrum shape information and voiced / unvoiced information, a plurality of representative sound source extraction means for extracting and outputting a plurality of representative sound sources from a sound source signal and a plurality of representative sound source extraction means are extracted. Excitation coding distortion calculation means for calculating all distortions when each representative excitation is coded into each excitation code that can be output by a predetermined encoding operation, and each distortion calculated by the excitation coding distortion calculation means is predetermined for each distortion. A speech coding apparatus comprising: a sound source code determining unit that multiplies a weighting coefficient to obtain a sum for each sound source code, and outputs a sound source code having the smallest sum.

3. A voice signal converted into a digital signal at a predetermined sampling period is analyzed in frame units of a predetermined length and separated into spectrum shape information and a sound source signal, and when the voice signal is a voiced sound, A pitch period is obtained, a representative excitation having a pitch length of 1 is extracted from the excitation signal in or near the current frame, the representative excitation is encoded by a predetermined encoding operation, and an excitation code is output. ,
In a speech coder for encoding and outputting a plurality of parameters including spectrum shape information and voiced / unvoiced information, a plurality of representative sound source extraction means for extracting and outputting a plurality of representative sound sources from a sound source signal and a plurality of representative sound source extraction means are extracted. The averaged representative sound source calculation means for outputting one averaged representative sound source obtained by performing the averaging process of each of the representative sound sources, and the predetermined representative sound source is subjected to a predetermined coding operation to reduce distortion at the time of coding. A speech coding apparatus, comprising: an averaging excitation coding means for outputting an excitation code to be minimized.

4. The representative sound source extraction position deciding means for deciding and outputting one representative sound source extraction position from the sound source signal in the current frame, the plural representative sound source extracting means, and the representative sound source at the representative sound source extraction position Pitch cycle interval representative sound source extraction means for extracting a plurality of representative sound sources of one pitch cycle length from a sound source at a position approximately an integer multiple of the pitch cycle away from the representative sound source extraction position in the forward direction or the backward direction. The speech coding apparatus according to any one of claims 1 to 3, which is characterized.

5. The representative sound source extraction position deciding means for deciding and outputting one representative sound source extraction position from a sound source signal in a current frame, and the plurality of representative sound source extracting means have a pitch cycle with respect to the representative sound source extraction position. A displacement representative sound source extraction means for applying a plurality of minute forward and backward movements to extract a plurality of representative sound sources having a one-pitch cycle length from the sound source signals at those positions. The speech coding apparatus according to any one of claims 1 to 3.