JPH0784597A - Speech encoding device and speech decoding device - Google Patents

Speech encoding device and speech decoding device

Info

Publication number
JPH0784597A
Authority
JP
Japan
Prior art keywords
sound
voice
time
consonant
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
JP23270493A
Other languages
Japanese (ja)
Inventor
Toshiaki Nobumoto
俊明 信本
Shoji Fujino
尚司 藤野
Mitsuru Tsuboi
満 坪井
Naoji Matsuo
直司 松尾
Osahide Eguchi
修英 江口
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to JP23270493A
Publication of JPH0784597A
Legal status: Withdrawn

Landscapes

  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
  • Reduction Or Emphasis Of Bandwidth Of Signals (AREA)

Abstract

PURPOSE: To provide a speech encoding device and a speech decoding device which, when time-axis compression and time-axis expansion of speech are performed, improve the quality of the reproduced sound for unvoiced signals with little periodicity and for rapidly varying consonants, while maintaining the quality of voiced sounds with strong periodicity. CONSTITUTION: A speech encoding device equipped with a pitch period detection part 700, which detects the pitch period of a speech input signal, is further equipped with: a speech type decision means 200 which decides whether the speech input signal is a voiced sound or an unvoiced sound or consonant; a signal converting means 400 which, when the signal is an unvoiced sound or consonant, converts the speech input signal to a constant positive or negative value according to whether the signal is positive or negative; and switch means 300 and 500 which, according to the output of the speech type decision means, apply to the pitch detection part the speech input signal as it is when it is a voiced sound, or the output of the signal converting means when it is an unvoiced sound or consonant.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a speech encoding device and a speech decoding device, and in particular to a speech encoding device and a speech decoding device intended to improve the encoding and decoding quality of unvoiced sounds and consonants when speech is encoded and decoded by the time domain harmonic compression/expansion method (hereinafter TDHS: Time Domain Harmonic Scaling).

[0002] With the spread of digital lines in recent years, various speech signal compression techniques have been proposed that compress the amount of information efficiently while maintaining high speech quality, so that those lines can be used effectively. Among them, TDHS is known. Exploiting the temporal periodicity (pitch period) of the speech signal, the transmitting side multiplies several periods of signal with closely similar characteristics by appropriate weights, compresses them into a signal of one pitch period, and encodes it. The receiving side performs the inverse expansion, taking into account the relationship between each compressed signal and its neighbors.

[0003] In the above TDHS, unvoiced sounds with little periodicity are compressed on the time axis in the same way as periodic voiced sounds.

[0004]

[Prior Art] In a speech coding system in which the transmitting side encodes and transmits a speech signal such as telephone speech and the receiving side decodes it, the transmitting side conventionally compresses the speech signal on the time axis in a time-axis compression unit (not shown), applies a prescribed encoding in an encoding unit (not shown), multiplexes this output with the encoded outputs of other channels, and sends it out onto the transmission line. FIG. 8 is a diagram for explaining an example of a time-axis compression/expansion method; (a) shows the time-axis compression processing on the transmitting side, and (b) shows the time-axis expansion processing on the receiving side.

[0005] In FIG. 8(a), the pitch of the speech input signal is detected first. As the criterion for evaluating the correlation of the signal at this step, the autocorrelation coefficient (Equation 1) or the covariance value (Equation 2) shown below has been used.

[0006]

[Equation 1]

[0007]

[Equation 2]

where S is the input speech signal and F is the evaluation formula. Next, the two consecutive speech signal sequences delimited by the extracted pitch period (the sections Pp1 and Pp2) are compressed into one. That is, the signal of the earlier period is multiplied by the windowing coefficient (1 - Wc(n)), the signal of the later period is multiplied by the windowing coefficient Wc(n), and the two products are added, compressing the time axis into a single signal Sc(n).
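The weighted overlap-add described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shape of Wc(n) is given in FIG. 8(a), which is not reproduced in the text, so the linear ramp used here and the function names `linear_window` and `compress_two_periods` are hypothetical.

```python
def linear_window(p):
    """Hypothetical Wc(n): rises linearly from 0 to 1 over one pitch period.

    The actual shape is defined by FIG. 8(a); a linear ramp is an assumption.
    """
    if p == 1:
        return [0.5]
    return [n / (p - 1) for n in range(p)]


def compress_two_periods(signal, start, p, window=None):
    """Merge two consecutive pitch periods of length p into one period Sc(n)."""
    if window is None:
        window = linear_window(p)
    first = signal[start:start + p]            # section Pp1
    second = signal[start + p:start + 2 * p]   # section Pp2
    if len(second) < p:
        raise ValueError("need two full pitch periods starting at 'start'")
    # Sc(n) = (1 - Wc(n)) * S(n) + Wc(n) * S(n + p)
    return [(1.0 - w) * a + w * b for w, a, b in zip(window, first, second)]
```

With this ramp the output starts on the first sample of Pp1 and ends on the last sample of Pp2, so a perfectly periodic input is returned unchanged as a single period.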

[0008] The autocorrelation coefficient and the covariance value, which are the criteria for pitch detection, evaluate long and short pitches equally. However, taking a longer pitch generally degrades intelligibility, so the pitch that is optimal by these criteria is not necessarily the optimal pitch in terms of speech quality.

[0009] On the receiving side, whose time-axis expansion is shown in FIG. 8(b), pitch period information is sent separately from the compressed speech signal. Based on this pitch period information, out of three pitch periods of compressed signal (the sections Pp1', Pp2', and Pp3' in the figure), the signal of the first two pitch periods is multiplied by the windowing coefficient We(n), the signal of the last two pitch periods is multiplied by the windowing coefficient (1 - We(n)), and the products are added, expanding the time axis back to the original two pitch periods of signal Se(n).
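The expansion step can be sketched in the same way. The shape of We(n) is defined by FIG. 8(b), which is not reproduced in the text, so the linearly falling window below is an assumption, and the function name `expand_three_periods` is hypothetical.

```python
def expand_three_periods(compressed, start, p, window=None):
    """Expand three compressed pitch periods into two periods of output Se(n)."""
    n_out = 2 * p
    if window is None:
        # Hypothetical We(n): falls linearly from 1 to 0 over two pitch periods.
        window = [1.0 - n / (n_out - 1) for n in range(n_out)]
    front = compressed[start:start + 2 * p]       # sections Pp1' and Pp2'
    back = compressed[start + p:start + 3 * p]    # sections Pp2' and Pp3'
    if len(back) < n_out:
        raise ValueError("need three full pitch periods starting at 'start'")
    # Se(n) = We(n) * Sc(n) + (1 - We(n)) * Sc(n + p)
    return [w * a + (1.0 - w) * b for w, a, b in zip(window, front, back)]
```

Under this assumed window the output begins on the first sample of Pp1' and ends on the last sample of Pp3', and a perfectly periodic input yields two identical periods.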

[0010]

[Problems to be Solved by the Invention] Unvoiced sounds with little periodicity, and consonant portions at the onset of speech even within voiced sounds, carry much of the information needed for human speech recognition at a relatively low level and within a short time. Nevertheless, the above TDHS applies to them the same time-axis compression and expansion as to voiced sounds with strong periodicity.

[0011] As a result, unvoiced sounds are not correctly encoded, decoded, and reproduced, and the decoded speech is degraded in the form of indistinct consonants. Moreover, even for voiced sounds with strong periodicity, compressing with a shorter period is more advantageous for speech intelligibility than compressing with a longer period.

[0012] Accordingly, an object of the present invention is to provide a speech encoding device and a speech decoding device which exploit the characteristics of the unvoiced and consonant portions of a speech signal so that, when time-axis compression and time-axis expansion of speech are performed, the quality of the reproduced sound for unvoiced signals with little periodicity and for rapidly varying consonants is improved while the quality of voiced sounds with strong periodicity is maintained.

[0013]

[Means for Solving the Problems] The above problems are solved by the device configurations shown in FIG. 1 and FIG. 2. In FIG. 1 (claim 1), a speech encoding device that comprises a pitch period detection unit 700 for detecting the pitch period of a speech input signal, and that has a transmission unit which compresses the speech input signal on the time axis using the pitch period and encodes it, is provided with: speech type determination means 200 for determining whether the speech input signal is a voiced sound or an unvoiced sound or consonant; signal conversion means 400 for converting, in the case of an unvoiced sound or consonant, the speech input signal into a constant positive or negative value according to whether it is positive or negative; and switch means 300 and 500 for applying to the pitch detection unit, according to the output of the speech type determination means, the speech input signal as it is when it is a voiced sound, or the output of the signal conversion means when it is an unvoiced sound or consonant.

[0014] (Claim 2) A speech encoding device that comprises a pitch period detection unit 700 for detecting the pitch period of a speech input signal and first time-axis compression means 110 for compressing the speech input signal on the time axis for every two consecutive pitch period sections using the pitch period, and that has a transmission unit which encodes and sends out the output of the first time-axis compression means, is provided with: speech type determination means 200 for determining whether the speech input signal is a voiced sound or an unvoiced sound or consonant; second time-axis compression means 120 for performing, in the case of an unvoiced sound or consonant, time-axis compression by multiplying the speech input signals of the two consecutive pitch period sections by a sine wave function and by a function obtained by inverting the phase of that sine wave, respectively; and switch means 105 and 130 for switching, according to the output of the speech type determination means, to the first time-axis compression means side when the speech input signal is a voiced sound, or to the second time-axis compression means side when it is an unvoiced sound or consonant.

[0015] (Claim 3) A speech decoding device that receives the time-axis-compressed and encoded speech signal sent from the speech encoding device of claim 2, decodes it, and expands it on the time axis, is provided with: speech type determination means 205 for determining whether the time-axis-compressed speech signal is a voiced sound or an unvoiced sound or consonant; first time-axis expansion means 230 for performing time-axis expansion in the case of a voiced sound; second time-axis expansion means 240 for performing, in the case of an unvoiced sound or consonant, time-axis expansion by multiplying the time-axis-compressed speech signals of the first two and of the last two of three consecutive pitch period sections by a sine wave function and by a function obtained by inverting the phase of that sine wave, respectively; and switch means 220 and 250 for switching, according to the output of the speech type determination means, to the first time-axis expansion means side when the speech signal is a voiced sound, or to the second time-axis expansion means side when it is an unvoiced sound or consonant.

[0016]

[Operation] In FIG. 1 (claim 1), when the speech input signal is an unvoiced sound or consonant, the signal conversion means 400 converts the speech input signal into a constant positive or negative value according to whether it is positive or negative. This makes it possible to raise the accuracy of calculating the pitch period of unvoiced sounds and consonants, whose level is inherently low and which have little pitch periodicity.

[0017] (Claim 2) When the speech input signal is an unvoiced sound or consonant, a sine wave function and a function obtained by inverting the phase of that sine wave are used as the respective windowing coefficients for the speech input signals of the two consecutive pitch period sections. As shown in FIG. 7(a), the coefficients take smaller values where the two consecutive pitch period sections Pp1 and Pp2 adjoin each other and larger values at both ends, so that the continuity of the signal in the adjoining portions can be maintained.

[0018] In FIG. 2 (claim 3), when the time-axis-compressed speech signal is an unvoiced sound or consonant on the receiving side, a sine wave function and a function obtained by inverting the phase of that sine wave are used as the respective windowing coefficients for the time-axis-compressed speech signals of the first two and of the last two of three consecutive pitch period sections Pp1', Pp2', and Pp3'. As shown in FIG. 7(b), the coefficients take larger values in the middle section Pp2' and smaller values in the end sections Pp1' and Pp3', so that the continuity of the signal in the adjoining portions can be maintained. As a result, high-quality reproduced speech can be provided for unvoiced sounds and consonants.

[0019]

[Embodiment] FIG. 3 is a block diagram showing the configuration of a speech encoding device according to an embodiment of the present invention (transmitting side).

[0020] FIG. 4 is a block diagram showing the configuration of the speech encoding device according to the embodiment of the present invention (receiving side). FIG. 5 is a diagram for explaining the method of calculating the covariance value for voiced sounds in the embodiment.

[0021] FIG. 6 is a diagram for explaining the signal correction method for pitch period extraction for unvoiced sounds and consonants in the embodiment. FIG. 7 is a diagram for explaining the time-axis compression/expansion method for unvoiced and consonant portions in the embodiment.

[0022] In FIG. 3, the speech input signal (SIN) is stored in the buffer 1, after which the voiced/unvoiced-consonant determination unit 2 calculates its level. When the level is below a predetermined threshold, the signal is judged to be an unvoiced sound or consonant; when it is above the threshold, it is judged to be a voiced sound. The unit then outputs the switching signal (SJUDGE) for the switches 3, 5, 10, and 13.
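A minimal sketch of this level-based decision follows. The text does not say how the level is computed, so the mean absolute amplitude used here is an assumption, as are the names `frame_level` and `classify_frame` and the choice to treat a level exactly equal to the threshold as unvoiced.

```python
def frame_level(frame):
    """Assumed level measure: mean absolute amplitude of the buffered section."""
    if not frame:
        raise ValueError("frame must not be empty")
    return sum(abs(x) for x in frame) / len(frame)


def classify_frame(frame, threshold):
    """Return the judgment that would drive the switching signal (SJUDGE)."""
    # Above the threshold: voiced. Otherwise: unvoiced sound or consonant.
    if frame_level(frame) > threshold:
        return "voiced"
    return "unvoiced_or_consonant"
```

A single threshold on level is cheap to evaluate, but the value itself has to be tuned to the input gain; the text only states that it is predetermined.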

[0023] When the signal is judged to be a voiced sound, each of the above switches is set to the a side, the speech input signal (SIN) of the fixed section stored in the buffer 1 is input to the covariance calculation unit 6, and the pitch period of that section is obtained by applying the covariance method described above (Equation 2) to the speech input signal (SIN).

[0024] As shown in FIG. 5, the pitch period is obtained under the control of the pitch control unit 9 by varying the candidate pitch period from the minimum candidate value for one pitch period, Pmin, to the maximum candidate value, Pmax, and computing the covariance evaluation formula F(n) for each candidate. In FIG. 5, n is an integer satisfying Pmin <= n <= Pmax, and Sj denotes the sampled data (speech input signal) at the point j samples away from point A. In other words, the covariance evaluation formula F(n) is computed for each n from Pmin to Pmax.

[0025] The maximum value search unit 7 weights this evaluation formula F(n) as shown in Equations 3 and 4 and stores the values of the weighted evaluation formula Fw(n) in the RAM 8. It then finds the maximum among the values of Fw(n), and the n at which the maximum occurs is taken as the pitch period of that section.

[0026]

[Equation 3]

[0027]

[Equation 4]

where Fw is the weighted evaluation formula, Wm is the weighting function, L is a constant equal to K × Pmin, and K is a constant >= 0.
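The search over pitch candidates can be sketched as follows. Equations 2 to 4 are not reproduced in the text, so this is not the patented formula: the normalized cross-correlation standing in for F(n), the caller-supplied `weight` standing in for Wm, and the names `evaluation` and `find_pitch` are all assumptions made for illustration.

```python
import math


def evaluation(signal, start, n):
    """Assumed stand-in for F(n): normalized cross-correlation of two
    adjacent n-sample sections beginning at 'start' (point A)."""
    a = signal[start:start + n]
    b = signal[start + n:start + 2 * n]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def find_pitch(signal, start, p_min, p_max, weight=lambda n: 1.0):
    """Return the n in [p_min, p_max] that maximizes Fw(n) = weight(n) * F(n)."""
    best_n, best_fw = None, -math.inf
    for n in range(p_min, p_max + 1):
        if start + 2 * n > len(signal):
            break  # not enough samples left for two full sections
        fw = weight(n) * evaluation(signal, start, n)
        if fw > best_fw:  # strict comparison keeps the shortest pitch on ties
            best_n, best_fw = n, fw
    return best_n
```

For a signal that repeats every 8 samples, `find_pitch(signal, 0, 4, 16)` returns 8 rather than 16, because the strict comparison keeps the first (shortest) candidate when two candidates score equally. A weight that decreases with n would express the same preference for short pitches more explicitly.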

[0028] When the voiced/unvoiced-consonant determination unit 2 judges the signal to be an unvoiced sound or consonant, the switches 3, 5, 10, and 13 are set to the b side, the speech input signal (SIN) of the fixed section is input to the zero-cross correction unit 4, and the pitch period of that section is obtained by looking at the correlation of only the zero crossings of the speech input signal SIN.

[0029] As shown in FIG. 6, the speech input signal is corrected about its zero-crossing points to a constant positive value (k) where it is positive and to a constant negative value (-k) where it is negative, and the pitch period is then determined with the covariance formula (Equation 2) in the same way as for voiced sounds. This correction makes it possible to raise the accuracy of calculating the pitch period of unvoiced sounds and consonants, whose level is inherently low and which have little pitch periodicity.
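The correction itself amounts to hard-limiting the signal to its sign. The sketch below illustrates it; the function name `zero_cross_correct` is hypothetical, and mapping an exactly zero sample to 0 is an assumption, since the text only covers positive and negative samples.

```python
def zero_cross_correct(signal, k=1.0):
    """Replace each sample by +k or -k according to its sign (see FIG. 6)."""
    corrected = []
    for x in signal:
        if x > 0:
            corrected.append(k)
        elif x < 0:
            corrected.append(-k)
        else:
            corrected.append(0.0)  # assumption: exact zeros are left at zero
    return corrected
```

For example, `zero_cross_correct([0.02, -0.3, 0.0, 5], k=2.0)` gives `[2.0, -2.0, 0.0, 2.0]`. Amplitude information is discarded and only the zero-crossing pattern remains, so low-level sections contribute to the correlation as strongly as loud ones.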

[0030] Next, in sections judged to be voiced, the windowing coefficient generator 11 generates the windowing coefficients shown in FIG. 8(a); in sections judged to be unvoiced, the windowing coefficient generator 12 generates the windowing coefficients shown in FIG. 7(a). In each case the time-axis compression unit 14 compresses the signal of two pitch periods (the sections Pp1 and Pp2) on the time axis into the signal Sc(n) of one pitch period. The windowing coefficients for unvoiced sounds are as given by Equation 5.

[0031]

[Equation 5]

where Ww is the windowing coefficient, p is the pitch period, and i is the sample index. Because these coefficients depend on the pitch period, they are recalculated each time the pitch period is obtained, but they may instead be prepared in advance as a table for each pitch period.

[0032] As a result, because the windowing coefficients for unvoiced sounds and consonants follow Equation 5 and, as shown in FIG. 7(a), take smaller values where the two consecutive pitch period sections Pp1 and Pp2 adjoin each other and larger values at both ends, the continuity of the signal in the adjoining portions can be maintained. The time-axis-compressed signal is further compressed and encoded in the encoding unit 15, for example by CELP coding, and is multiplexed in the multiplexing unit 16 with the optimum pitch period information read from the RAM 8 and sent out onto the transmission line.

[0033] On the receiving side, the signal received from the transmission line is separated by the multiplexing unit 17 in FIG. 4 into the encoded data and the optimum pitch period information. The encoded data is decoded by the decoding unit 18, which outputs the time-axis-compressed data RIN, and this data is stored in the buffer 19.

[0034] The voiced/unvoiced-consonant determination unit 20 calculates the level of the time-axis-compressed data RIN, judges it to be an unvoiced sound or consonant when the level is below a predetermined threshold and a voiced sound when it is above, and outputs the switching signal (RJUDGE) for the switches 22 and 25. Because this level is calculated per pitch period, the threshold is set to the value given by Equation 6.

[0035]

[Equation 6]

where Ps is the threshold, p is the pitch period, and K is a constant >= 0. In sections judged to be voiced, the windowing coefficient generator 23 generates the windowing coefficients shown by the solid line in FIG. 8(b); in sections judged to be an unvoiced sound or consonant, the windowing coefficient generator 24 generates the windowing coefficients shown in FIG. 7(b). The time-axis expansion unit 21 performs time-axis expansion using this output and outputs the result as ROUT. The windowing coefficients for unvoiced portions are as given by Equations 7 to 9.

[0036]

[Equation 7]

[0037]

[Equation 8]

[0038]

[Equation 9]

where Ww is the windowing coefficient, p is the pitch period, i is the sample index, Ww1 is time-axis expansion windowing coefficient 1, Ww2 is time-axis expansion windowing coefficient 2, Sc is the time-axis-compressed data, and Se is the time-axis-expanded data.

[0039] Because these coefficients depend on the pitch period, they are prepared as a table for each pitch period, but they may instead be recalculated for each pitch period. Because the windowing coefficients for unvoiced sounds and consonants follow Equations 7 and 8 and, as shown in FIG. 7(b), take larger values in the middle section Pp2' of the three consecutive pitch period sections Pp1', Pp2', and Pp3' and smaller values in the end sections Pp1' and Pp3', the continuity of the signal in the adjoining portions can be maintained. As a result, high-quality reproduced speech can be provided for unvoiced sounds and consonants.

[0040] In the embodiment described above, the voiced/unvoiced-consonant determination is made on the receiving side by analyzing the compressed speech, but the determination signal may instead be transmitted from the transmitting side over the transmission line. In that case, the transmitting side multiplexes the determination signal (SJUDGE) together with the optimum pitch period information and the encoded data in the multiplexing unit. The receiving side can then omit its voiced/unvoiced-consonant determination unit 20, and the multiplexing unit 17 separates the received signal into the optimum pitch period, the encoded data, and the determination signal (RJUDGE).

[0041]

[Effects of the Invention] As described above, according to the present invention (claim 1), when the speech input signal is an unvoiced sound or consonant, the signal conversion means 400 converts the speech input signal into a constant positive or negative value according to whether it is positive or negative. This makes it possible to raise the accuracy of calculating the pitch period of unvoiced sounds and consonants, whose level is inherently low and which have little pitch periodicity.

[0042] (Claim 2) Further, when the speech input signal is an unvoiced sound or consonant, a sine wave function and a function obtained by inverting the phase of that sine wave are used as the respective windowing coefficients for the speech input signals of the two consecutive pitch period sections. As shown in FIG. 7(a), the coefficients take smaller values where the two consecutive pitch period sections Pp1 and Pp2 adjoin each other and larger values at both ends, so that the continuity of the signal in the adjoining portions can be maintained.

[0043] (Claim 3) On the receiving side, when the time-axis-compressed speech signal is an unvoiced sound or consonant, a sine wave function and a function obtained by inverting the phase of that sine wave are used as the respective windowing coefficients for the time-axis-compressed speech signals of the first two and of the last two of three consecutive pitch period sections Pp1', Pp2', and Pp3'. As shown in FIG. 7(b), the coefficients take larger values in the middle section Pp2' and smaller values in the end sections Pp1' and Pp3', so that the continuity of the signal in the adjoining portions can be maintained. As a result, high-quality reproduced speech can be provided for unvoiced sounds and consonants.

[Brief Description of the Drawings]

FIG. 1 is a diagram showing the principle of the inventions of claims 1 and 2.

FIG. 2 is a diagram showing the principle of the invention of claim 3.

FIG. 3 is a block diagram showing the configuration of a speech encoding device according to an embodiment of the present invention (transmitting side).

FIG. 4 is a block diagram showing the configuration of the speech encoding device according to the embodiment of the present invention (receiving side).

FIG. 5 is a diagram for explaining the method of calculating the covariance value for voiced sounds in the embodiment.

FIG. 6 is a diagram for explaining the signal correction method for pitch period extraction for unvoiced sounds and consonants in the embodiment.

FIG. 7 is a diagram for explaining the time-axis compression/expansion method for unvoiced and consonant portions in the embodiment.

FIG. 8 is a diagram for explaining an example of a time-axis compression/expansion method.

[Explanation of Symbols]

105, 130, 220, 250, 300, 500: switch means
110: first time-axis compression means
120: second time-axis compression means
200, 205: speech type determination means
230: first time-axis expansion means
240: second time-axis expansion means
400: signal conversion means
700: pitch period detection unit

Continuation of front page

(72) Inventor: Mitsuru Tsuboi (坪井 満), c/o Fujitsu Limited, 1015 Kamikodanaka, Nakahara-ku, Kawasaki, Kanagawa
(72) Inventor: Naoji Matsuo (松尾 直司), c/o Fujitsu Limited, 1015 Kamikodanaka, Nakahara-ku, Kawasaki, Kanagawa
(72) Inventor: Osahide Eguchi (江口 修英), c/o Fujitsu Kyushu Digital Technology Co., Ltd., 3-22-8 Hakata-ekimae, Hakata-ku, Fukuoka, Fukuoka

Claims (3)

[Claims]
1. A speech encoding device having a transmission unit that includes a pitch period detection unit (700) for detecting the pitch period of a speech input signal and that time-axis-compresses and encodes the speech input signal using the pitch period, the speech encoding device comprising: voice type determination means (200) for determining whether the speech input signal is a voiced sound or an unvoiced sound or consonant; signal conversion means (400) for converting, in the case of an unvoiced sound or consonant, the speech input signal into a positive or negative constant value according to whether the signal is positive or negative; and switch means (300, 500) for applying to the pitch period detection unit, according to the output of the voice type determination means, the speech input signal as it is when the speech input signal is a voiced sound, or the output of the signal conversion means when it is an unvoiced sound or consonant.
2. A speech encoding device having a transmission unit that includes a pitch period detection unit (700) for detecting the pitch period of a speech input signal and first time-axis compression means (110) for time-axis-compressing the speech input signal for every two consecutive pitch period sections using the pitch period, and that encodes and transmits the output of the first time-axis compression means, the speech encoding device comprising: voice type determination means (200) for determining whether the speech input signal is a voiced sound or an unvoiced sound or consonant; second time-axis compression means (120) for performing, in the case of an unvoiced sound or consonant, time-axis compression by multiplying the speech input signals of the two consecutive pitch period sections by a sine wave function and by a function obtained by inverting the phase of the sine wave, respectively; and switch means (105, 130) for switching, according to the output of the voice type determination means, to the first time-axis compression means side when the speech input signal is a voiced sound, and to the second time-axis compression means side when it is an unvoiced sound or consonant.
3. A speech decoding device that receives the time-axis-compressed and encoded speech signal transmitted from the speech encoding device according to claim 2, decodes it, and time-axis-expands it, the speech decoding device comprising: voice type determination means (205) for determining whether the time-axis-compressed speech signal is a voiced sound or an unvoiced sound or consonant; first time-axis expansion means (230) for performing time-axis expansion in the case of a voiced sound; second time-axis expansion means (240) for performing, in the case of an unvoiced sound or consonant, time-axis expansion by multiplying the time-axis-compressed speech signals of the front two and the rear two pitch period sections, out of three consecutive pitch period sections, by a sine wave function and by a function obtained by inverting the phase of the sine wave, respectively; and switch means (220, 250) for switching, according to the output of the voice type determination means, to the first time-axis expansion means side when the speech signal is a voiced sound, and to the second time-axis expansion means side when it is an unvoiced sound or consonant.
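
The unvoiced/consonant processing recited in claims 1 to 3 can be illustrated with a short Python sketch. It is not the patented implementation: the function names, the treatment of a zero-valued sample as positive, and the exact window shape (complementary weights built from a sine and its phase-inverted counterpart) are assumptions made for illustration. The expansion step in particular is only one possible reading of "the front two and the rear two" of three consecutive pitch period sections.

```python
import math


def hard_limit(samples, level=1.0):
    """Signal conversion (claim 1): map each sample to +level or -level by sign.

    Assumption: a sample of exactly zero is treated as positive.
    """
    return [level if s >= 0 else -level for s in samples]


def pitch_detector_input(samples, is_voiced, level=1.0):
    """Switch (claim 1): voiced frames pass unchanged, others are hard-limited."""
    return list(samples) if is_voiced else hard_limit(samples, level)


def sine_weights(period):
    """Complementary weights: +sin gives a 1 -> 0 fall, -sin gives a 0 -> 1 rise.

    Assumption: this window shape is illustrative and not taken from the patent.
    """
    fall, rise = [], []
    for n in range(period):
        s = math.sin(math.pi / 2 - math.pi * (n + 0.5) / period)
        fall.append(0.5 * (1.0 + s))
        rise.append(0.5 * (1.0 - s))  # same sine with inverted phase
    return fall, rise


def blend(first, second):
    """Weighted sum of two equal-length pitch period sections."""
    if len(first) != len(second):
        raise ValueError("sections must have the same length")
    fall, rise = sine_weights(len(first))
    return [f * a + r * b for f, r, a, b in zip(fall, rise, first, second)]


def compress(samples, period):
    """Claim 2, unvoiced/consonant path: two pitch periods become one."""
    if len(samples) != 2 * period:
        raise ValueError("expected exactly two pitch periods")
    return blend(samples[:period], samples[period:])


def expand(prev_sec, cur_sec, next_sec):
    """Claim 3, unvoiced/consonant path (assumed reading): blend the front pair
    and the rear pair of three consecutive compressed sections, which yields
    two output sections."""
    return blend(prev_sec, cur_sec) + blend(cur_sec, next_sec)
```

In this sketch, a frame judged unvoiced would be passed through `pitch_detector_input(frame, False)` before pitch detection, and `compress(frame, period)` would halve its length. Because the two weights sum to one at every sample, a constant input passes through `compress` and `expand` unchanged.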
JP23270493A 1993-09-20 1993-09-20 Speech encoding device and speech decoding device Withdrawn JPH0784597A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP23270493A JPH0784597A (en) 1993-09-20 1993-09-20 Speech encoding device and speech decoding device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP23270493A JPH0784597A (en) 1993-09-20 1993-09-20 Speech encoding device and speech decoding device

Publications (1)

Publication Number Publication Date
JPH0784597A true JPH0784597A (en) 1995-03-31

Family

ID=16943474

Family Applications (1)

Application Number Title Priority Date Filing Date
JP23270493A Withdrawn JPH0784597A (en) 1993-09-20 1993-09-20 Speech encoding device and speech decoding device

Country Status (1)

Country Link
JP (1) JPH0784597A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5873058A (en) * 1996-03-29 1999-02-16 Mitsubishi Denki Kabushiki Kaisha Voice coding-and-transmission system with silent period elimination
JP2009515207A (en) * 2005-11-03 2009-04-09 Dolby Sweden AB Improved transform coding for time warping of speech signals.
JP2012068660A (en) * 2005-11-03 2012-04-05 Dolby International Ab Time warped modified transform coding of audio signals
US8412518B2 (en) 2005-11-03 2013-04-02 Dolby International Ab Time warped modified transform coding of audio signals
US8838441B2 (en) 2005-11-03 2014-09-16 Dolby International Ab Time warped modified transform coding of audio signals
JP2011521303A (en) * 2008-07-11 2011-07-21 Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V. Time axis compression curve calculator, audio signal encoder, encoded audio signal representation, method for providing decoded audio signal representation, method for providing encoded audio signal representation, and computer program
JP2011521304A (en) * 2008-07-11 2011-07-21 Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V. Audio signal decoder, time-axis compression curve data providing apparatus, decoded audio signal providing method, and computer program
US9043216B2 (en) 2008-07-11 2015-05-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio signal decoder, time warp contour data provider, method and computer program

Similar Documents

Publication Publication Date Title
CA1301072C (en) Speech coding transmission equipment
JP4866438B2 (en) Speech coding method and apparatus
MXPA04005764A (en) Signal modification method for efficient coding of speech signals.
KR100615480B1 (en) Speech bandwidth extension apparatus and speech bandwidth extension method
EP0770987A2 (en) Method and apparatus for reproducing speech signals, method and apparatus for decoding the speech, method and apparatus for synthesizing the speech and portable radio terminal apparatus
US20020159472A1 (en) Systems and methods for encoding & decoding speech for lossy transmission networks
JP2002055699A (en) Device and method for encoding voice
JP2004177978A (en) Method of generating comfortable noise of digital speech transmission system
JP2707564B2 (en) Audio coding method
US5488704A (en) Speech codec
JP2003223189A (en) Voice code converting method and apparatus
KR20200051858A (en) Audio coding device, audio coding method, audio coding program, audio decoding device, audio decoding method, and audio decoding program
JP2005338200A (en) Device and method for decoding speech and/or musical sound
US6243674B1 (en) Adaptively compressing sound with multiple codebooks
WO1997015046A9 (en) Repetitive sound compression system
JPH0784597A (en) Speech encoding device and speech decoding device
KR100594599B1 (en) Apparatus and method for restoring packet loss based on receiving part
JP2900987B2 (en) Silence compressed speech coding / decoding device
JP3088204B2 (en) Code-excited linear prediction encoding device and decoding device
US6134519A (en) Voice encoder for generating natural background noise
JP3055901B2 (en) Audio signal encoding / decoding method and audio signal encoding device
JP2847730B2 (en) Audio coding method
JPH0736119B2 (en) Piecewise optimal function approximation method
JP2003195900A (en) Speech signal encoding device, speech signal decoding device, and speech signal encoding method
JPH0229234B2 (en)

Legal Events

Date Code Title Description
A300 Withdrawal of application because of no request for examination

Free format text: JAPANESE INTERMEDIATE CODE: A300

Effective date: 20001128