JPH08305396A - Device and method for expanding voice band

Info

Publication number: JPH08305396A
Application number: JP7110425A
Authority: JP
Inventors: Mineo Tsushima; 峰生津島; Takeshi Norimatsu; 武志則松; Yoshihisa Nakato; 良久中藤
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1995-05-09
Filing date: 1995-05-09
Publication date: 1996-11-22
Anticipated expiration: 2013-09-17
Also published as: JP2798003B2

Abstract

PURPOSE: To generate highly intelligible wideband synthesized speech from a narrowband speech signal, using an extracted spectral envelope and a sound source made of a wideband pulse train. CONSTITUTION: An input signal is stored in a buffer 101 for a fixed time, and an analysis section 102 extracts feature quantities relating to the spectral envelope from the signal sequence stored in the buffer. Using these feature quantities, a sound source pulse train generating section 13 estimates a sound source pulse train for the signal in the buffer 101 and restores the missing band using an estimated pitch period. An output speech synthesizing section 14 synthesizes speech from the sound source pulse train and the spectral envelope, applying frequency weighting to the band that is missing. A speech signal with a missing band can thus be band-expanded with a comparatively simple configuration.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an apparatus and method for expanding the band of a narrowband speech signal into a wideband speech signal in an environment where only the narrowband speech signal is available.

[0002]

[Prior Art] In recent years, with the development of digital communication networks, the digitization of speech signals has progressed rapidly, and many techniques have been proposed for processing speech signals with digital signal processing.

[0003] A conventional voice band expanding apparatus and voice band expanding method are described below. Among conventional voice band expanding methods, a method using analysis-synthesis techniques and vector quantization has been proposed, for example in IEICE Technical Report SP93-61 (1993-08).

[0004] FIG. 5 is a block diagram of an apparatus to which the conventional voice band expanding method can be applied. In FIG. 5, 201 is an LPC analysis unit, 202 is a vector quantization unit, 203 is a decoding unit, 204 is a narrowband codebook, 205 is a wideband codebook, 206 is a low-band restoration unit, 207 is a first high-band restoration unit, 208 is a second high-band restoration unit, 209 is an addition unit, and 210 is an upsampling unit.

[0005] The operation of the voice band expanding apparatus configured as described above is explained below.

[0006] First, the input narrowband speech signal undergoes linear prediction analysis in the LPC analysis unit 201 and is separated into spectral envelope information, power information, and pitch information. The extracted spectral envelope information is vector-quantized by the vector quantization unit 202 with reference to the narrowband codebook 204. Based on the output of the vector quantization unit 202, the decoding unit 203 refers to the wideband codebook 205 and estimates wideband spectral envelope information. From the power and pitch information obtained by the LPC analysis unit 201 and the wideband spectral envelope information obtained by the decoding unit 203, the low-band restoration unit 206 generates low-frequency components at or below 300 Hz. Likewise, from the power and pitch information obtained by the LPC analysis unit 201 and the spectral envelope information obtained by the decoding unit 203, the first high-band restoration unit 207 generates high-frequency components at or above 3400 Hz. The addition unit 209 adds the output of the upsampling unit 210, which upsamples the narrowband speech signal, to the outputs of the low-band restoration unit 206 and the high-band restoration unit 207 to obtain a wideband speech signal.
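
The codebook lookup in this conventional scheme can be sketched as follows. This is a minimal illustration, assuming that the narrowband and wideband codebooks are paired by index and that squared Euclidean distance is the matching criterion; the function names and toy codebooks are hypothetical and do not come from the cited report.

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codeword closest to `vector` (squared Euclidean distance)."""
    def distance(codeword):
        return sum((v - w) ** 2 for v, w in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))


def map_envelope(narrow_vector, narrow_codebook, wide_codebook):
    """Quantize with the narrowband codebook, then decode with the paired wideband codebook."""
    return wide_codebook[nearest_codeword(narrow_vector, narrow_codebook)]


# Toy codebooks (hypothetical values): entry i of one book is paired with entry i of the other.
narrow_codebook = [[0.0, 0.0], [1.0, 1.0]]
wide_codebook = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.5, 0.2]]
wide_envelope = map_envelope([0.9, 1.2], narrow_codebook, wide_codebook)
```

The accuracy of such a mapping is bounded by the number of codebook entries, which is the codebook-size problem the patent raises against this approach.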

[0007] A voice band expanding method that uses the second high-band restoration unit 208 in place of the first high-band restoration unit 207 in FIG. 5 has also been proposed. The second high-band restoration unit 208 generates high-band components at or above 3400 Hz by a waveform segment method, based on the power and pitch information obtained by the LPC analysis unit 201 and the output of the vector quantization unit 202. The rest of the configuration is the same as in the method using the first high-band restoration unit 207: the addition unit 209 adds the output of the upsampling unit 210, which upsamples the narrowband speech signal, to the outputs of the low-band restoration unit 206 and the high-band restoration unit 208 to obtain a wideband speech signal.

[0008]

[Problems to Be Solved by the Invention] However, the conventional configuration described above requires a large number of codebooks to expand the spectral envelope information with high accuracy. Furthermore, because it extracts pitch information from the narrowband speech signal and uses it directly as a parameter to create the synthetic sound source, pitch estimation errors degrade the sound quality. In addition, the approach based on the waveform segment method requires a separate codebook of segment data.

[0009] The present invention solves these conventional problems. Its object is to expand the voice band over a wide range as follows. The spectral envelope information is broadened, for example with a linear mapping function, to obtain a wideband spectral envelope. The synthetic sound source is represented by a plurality of pulse trains and is broadened both by adding a pulse train that interpolates the missing band and by using the distortion components produced when pulses are emphasized. Speech is then synthesized from the wideband spectral envelope and the synthetic sound source.

[0010]

[Means for Solving the Problems] To solve the above problems, a voice band expanding apparatus of the present invention comprises: a buffer that stores a fixed amount of the input signal; an analysis unit that calculates, of linear prediction coefficients and PARCOR coefficients, at least the linear prediction coefficients for the signal sequence stored in the buffer; an impulse response calculation unit that calculates an impulse response from the linear prediction coefficients obtained by the analysis unit; a perceptual weighting filter unit that, using the linear prediction coefficients as parameters, applies a weighting simulating human auditory characteristics to the difference signal sequence between the output of the buffer and the output of a first speech synthesis unit; a pitch period estimation unit that estimates a pitch period from the output signal of the perceptual weighting filter unit; a pulse generation unit that receives the output signal of the perceptual weighting filter and the output of the impulse response calculation unit and generates a pulse train with reference to the output value of the pitch period estimation unit; the first speech synthesis unit, which receives the pulse train output from the pulse generation unit and either the linear prediction coefficients or the PARCOR coefficients and synthesizes a speech signal; a frequency weighting filter that applies frequency weighting to the output of the pulse generation unit; and a second speech synthesis unit that receives the output value of the frequency weighting filter and either the linear prediction coefficients or the PARCOR coefficients and synthesizes speech.

[0011] Another voice band expanding apparatus of the present invention comprises: a buffer that stores a fixed amount of the input signal; an analysis unit that calculates, of linear prediction coefficients and PARCOR coefficients, at least the linear prediction coefficients for the signal sequence stored in the buffer; an impulse response calculation unit that calculates an impulse response from the linear prediction coefficients obtained by the analysis unit; an envelope expansion unit that estimates band-expanded linear prediction coefficients or PARCOR coefficients from the linear prediction coefficients or PARCOR coefficients; a perceptual weighting filter unit that, using the linear prediction coefficients as parameters, applies a weighting simulating human auditory characteristics to the difference signal sequence between the buffer and a first speech synthesis unit; a pitch period estimation unit that estimates a pitch period from the output signal of the perceptual weighting filter unit; a pulse generation unit that receives the output signal of the perceptual weighting filter unit and the output of the impulse response calculation unit and generates a pulse train with reference to the output value of the pitch period estimation unit; the first speech synthesis unit, which receives the pulse train output from the pulse generation unit and either the linear prediction coefficients or the PARCOR coefficients and synthesizes a speech signal; a frequency weighting filter that applies frequency weighting to the output of the pulse generation unit; and a second speech synthesis unit that receives the output value of the frequency weighting filter and the output value of the envelope expansion unit and synthesizes speech.

[0012] Here, the pulse generation unit described above may include a frequency weighting filter that applies frequency weighting to the output signal of the perceptual weighting filter, and may have a function of generating a pulse train that restores the missing band.

[0013] The pulse generation unit may also have a function of emphasizing pulses by increasing the amplitude of the pulses located at pitch period intervals, using the value detected by the pitch period estimation unit as a feature quantity.

[0014] Furthermore, the pulse generation unit may have a function of using the value detected by the pitch period estimation unit as a feature quantity and, given a certain threshold, outputting pulses set to that threshold.

[0015] The envelope expansion unit may have a function of receiving the feature quantities obtained by the analysis unit and broadening them, using a linear mapping function, into feature quantities of a spectral envelope that has wideband characteristics.
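
The text does not give the form of the linear mapping function. One plausible reading, shown here purely as an assumption, is an affine map from the narrowband feature vector to a longer wideband feature vector, with a matrix and offset obtained beforehand from training data. The names `linear_map`, `matrix`, and `bias`, and all numeric values, are hypothetical.

```python
def linear_map(features, matrix, bias):
    """Map a narrowband feature vector to a wideband one: out[j] = sum_i matrix[j][i] * features[i] + bias[j]."""
    return [
        sum(weight * value for weight, value in zip(row, features)) + offset
        for row, offset in zip(matrix, bias)
    ]


# Hypothetical 2 -> 3 mapping: the first two outputs copy the inputs, the third is extrapolated.
matrix = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
bias = [0.0, 0.0, 0.1]
wide_features = linear_map([0.2, 0.4], matrix, bias)
```

Unlike the codebook lookup of the conventional method, a map of this kind needs only one matrix and one offset vector in memory.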

[0016] A voice band expanding method of the present invention stores a fixed amount of the input signal in a buffer; extracts feature quantities relating to the spectral envelope from the signal sequence stored in the buffer; generates a wideband sound source pulse train from the sound source pulse train estimated from the extracted feature quantities and the stored signal sequence, using prior information about the frequency characteristics of the signal and estimated pitch information; and synthesizes speech using the feature quantities relating to the spectral envelope and the wideband sound source pulse train.

[0017] Another voice band expanding method of the present invention stores a fixed amount of the input signal in a buffer; extracts feature quantities relating to the spectral envelope from the signal sequence stored in the buffer; interpolates the envelope information that is missing from the envelope information expressed by the feature quantities; generates a wideband sound source pulse train from the sound source pulse train estimated from the feature quantities and the stored signal sequence, using prior information about the frequency characteristics of the signal and estimated pitch information; and synthesizes speech using the feature quantities relating to the spectral envelope and the wideband sound source pulse train.

[0018]

[Operation] With the above configuration, the present invention stores the input signal in a buffer for a fixed time, and the PARCOR analysis unit extracts feature quantities relating to the spectral envelope, such as PARCOR coefficients and linear prediction coefficients, from the signal in the buffer. The synthetic sound source is represented by a plurality of pulse trains derived from the signal in the buffer and the spectral envelope feature quantities. When the pulse train is determined and generated, the bands absent from the input signal are created by periodically emphasizing pulses using the previously estimated pitch period as a feature quantity, by adding distortion with respect to the pitch period, by using clipping distortion, and so on. Speech is then synthesized with the generated pulse train as the sound source, using the PARCOR coefficients or linear prediction coefficients output from the PARCOR analysis unit. The spectral envelope information can also be broadened by a linear mapping function or by vector quantization and used with the same sound source for speech synthesis. Speech synthesized in this way retains the fine structure of the spectrum and is broadened so as to interpolate the missing bands, giving high-quality, clear synthesized speech.

[0019]

[Examples]

(Embodiment 1) A first embodiment of the present invention is described below. FIG. 1 is a block diagram showing the overall configuration of an apparatus to which the voice band expanding method of the first embodiment can be applied.

[0020] In FIG. 1, 101 is a buffer that stores the input signal for a fixed time; 102 is an analysis unit that extracts feature quantities relating to the spectral envelope, such as linear prediction coefficients and PARCOR coefficients, from the input signal sequence stored in the buffer 101; and 103 is an impulse response calculation unit that calculates an impulse response from the linear prediction coefficients.

[0021] Band expansion of the sound source information is realized by: a perceptual weighting filter unit 104 that applies human auditory weighting on the frequency axis to the signal obtained by subtracting the output of the first speech synthesis unit 105 from the signal sequence in the buffer 101; a pitch period estimation unit 106 that estimates the pitch period from the output of the perceptual weighting filter unit 104; a pulse generation unit 107 that determines pulse positions and gains based on the feature quantities extracted by the analysis unit 102 and generates a pulse train; and the first speech synthesis unit 105, which synthesizes speech from the pulse train generated by the pulse generation unit 107 and the linear prediction coefficients.

[0022] The pulse generation unit 107 determines pulse positions and gains so that the error with respect to the input signal from the perceptual weighting filter unit 104 is minimized, and adds distortion, such as low-band emphasis, to the pulse train it generates based on the pitch period obtained by the pitch period estimation unit 106. Examples of low-band emphasis distortion include half-wave rectification, full-wave rectification, clipping, and raising to a power.
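
The four distortions named above are memoryless nonlinearities, and each can be written in a line or two. The following sketch operates on a pulse train given as a list of sample amplitudes. Preserving the sign in the power-law case is an assumption made here, since the text does not say how negative pulses are treated.

```python
import math


def half_wave_rectify(pulses):
    """Keep positive samples and zero out negative ones."""
    return [max(p, 0.0) for p in pulses]


def full_wave_rectify(pulses):
    """Replace every sample by its magnitude."""
    return [abs(p) for p in pulses]


def clip(pulses, limit):
    """Limit every sample to the range [-limit, +limit]."""
    return [max(-limit, min(limit, p)) for p in pulses]


def power_law(pulses, exponent):
    """Raise magnitudes to `exponent`, keeping the original sign (assumed behaviour)."""
    return [math.copysign(abs(p) ** exponent, p) for p in pulses]


train = [0.0, 1.0, 0.0, -0.5, 0.0, 2.0]
```

Because none of these operations is linear, each creates frequency components that are absent from its input, which is the property the pulse generation unit relies on.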

[0023] The final output synthesized speech is produced by an emphasis filter unit 108, which takes the pulse train determined by the pulse generation unit 107 as the input sound source and emphasizes and interpolates the portions missing in frequency, and a second speech synthesis unit 109, which synthesizes speech using the linear prediction coefficients obtained by the analysis unit 102 as feature quantities.

[0024] The first embodiment of the present invention is described in detail below with reference to the block diagram of FIG. 1.

[0025] First, the buffer 101 takes in a fixed duration of the discrete-time speech signal. This duration is, for example, 240 samples at a sampling frequency of 16 kHz, and this unit of time is hereinafter called a "frame". The analysis unit 102 calculates the parameters relating to the spectral envelope of the speech for each frame. For the signal taken into the buffer 101, the analysis unit 102 calculates the m-th order autocorrelation value r(m) by (Equation 1), where y(i) is the observed signal at time i.

[0026]

[Equation 1]
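
Equation 1 itself is not reproduced in this text. A minimal sketch, assuming it is the usual short-time autocorrelation r(m) = Σ y(i)·y(i+m) taken over one frame without windowing, would look like this. The function name and the toy frame are illustrative only.

```python
def autocorrelation(frame, max_lag):
    """Return [r(0), ..., r(max_lag)] with r(m) = sum over i of frame[i] * frame[i + m]."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + m] for i in range(n - m))
        for m in range(max_lag + 1)
    ]


# Toy 4-sample frame; a real frame in this description would hold 240 samples.
r = autocorrelation([1.0, 2.0, 3.0, 4.0], 2)
```

In practice a window function is often applied to the frame first; the text does not say whether one is used here.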

[0027] The analysis unit 102 performs its analysis based on the autocorrelation values calculated by (Equation 1), recursively computing the PARCOR coefficients or the linear prediction coefficients from them. This computation is readily implemented with known techniques, described for example in Furui, "Acoustic and Speech Engineering" (音響・音声工学), Kindai Kagaku Sha, pp. 131-136. Obtaining the PARCOR coefficients uniquely determines the linear prediction coefficients, and obtaining the linear prediction coefficients uniquely determines the PARCOR coefficients. The analysis order is set to about 10 to 25. The impulse response calculation unit 103 calculates the impulse response of the system defined by the linear prediction coefficients obtained by the analysis unit 102, by applying an impulse as its input.
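
The recursive computation referred to here is commonly done with the Levinson-Durbin recursion, which produces both coefficient sets in one pass. The sketch below assumes the predictor convention ŷ(n) = Σ a_i·y(n−i) and the matching all-pole system 1/(1 − Σ a_i z^−i). Sign conventions for PARCOR coefficients differ between textbooks, so the signs here are an assumption, not necessarily those of the cited reference.

```python
def levinson_durbin(r, order):
    """Return (lpc, parcor, error) computed from autocorrelation values r[0..order].

    Assumes r[0] > 0 and a non-degenerate signal (prediction error stays positive).
    """
    a = [0.0] * (order + 1)   # a[1..m] hold the current predictor coefficients
    parcor = []
    error = r[0]
    for m in range(1, order + 1):
        k = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / error
        updated = a[:]
        updated[m] = k
        for i in range(1, m):
            updated[i] = a[i] - k * a[m - i]
        a = updated
        parcor.append(k)
        error *= 1.0 - k * k
    return a[1:], parcor, error


def impulse_response(lpc, length):
    """Response of 1 / (1 - sum_i lpc[i-1] z^-i) to a unit impulse."""
    h = []
    for n in range(length):
        feedback = sum(lpc[i] * h[n - 1 - i] for i in range(len(lpc)) if n - 1 - i >= 0)
        h.append((1.0 if n == 0 else 0.0) + feedback)
    return h


# Autocorrelation of a first-order process with coefficient 0.5 (toy values).
lpc, parcor, error = levinson_durbin([1.0, 0.5, 0.25], 2)
h = impulse_response(lpc, 4)
```

The one-to-one relation between the two coefficient sets stated in the text is visible in the recursion: each step appends one PARCOR value and updates the predictor from it.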

[0028] Sound source generation starts by subtracting the signal synthesized by the first speech synthesis unit 105 from the speech signal in the buffer 101. This subtraction in effect removes the influence of the previous analysis frame, and the resulting signal is the input to the perceptual weighting filter unit 104 for the current analysis frame. The perceptual weighting filter unit 104 is a digital filter having the characteristic of (Equation 2).

[0029]

[Equation 2]

[0030] In (Equation 2), W(z) is a function of z, a_i is the i-th order linear prediction coefficient, c^k is a constant chosen to realize the desired filter characteristic (for example, about 0.8), and z is the complex variable of the z-transform. By concentrating the effect of quantization error around the high-power formant frequencies, the perceptual weighting filter unit 104 yields synthesized speech that sounds less rough. In (Equation 2), the filter is particularly effective when c^k is, for example, 0.8. The pitch period estimation unit 106 estimates the pitch period using a long-term prediction filter. Possible methods include calculating the pitch period from the correlation values for long-term prediction and estimating the pitch period by adding waveforms. One example is a method using the minimum squared prediction error (MSPE) method.
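
Equation 2 is not reproduced in this text. The sketch below assumes the form commonly used for perceptual weighting in analysis-by-synthesis coders, W(z) = A(z) / A(z/c) with A(z) = 1 − Σ a_i z^−i, so that the denominator coefficients are a_i·c^i. Both this form and the sign convention are assumptions and may differ from the patent's equation.

```python
def perceptual_weighting(signal, lpc, c=0.8):
    """Filter `signal` through W(z) = A(z) / A(z/c), with A(z) = 1 - sum_i lpc[i-1] z^-i."""
    order = len(lpc)
    damped = [lpc[i] * c ** (i + 1) for i in range(order)]  # coefficients of A(z/c)
    out = []
    for n in range(len(signal)):
        # Numerator A(z): prediction residual of the input.
        fir = signal[n] - sum(lpc[i] * signal[n - 1 - i] for i in range(order) if n - 1 - i >= 0)
        # Denominator 1 / A(z/c): recursive part with bandwidth-expanded poles.
        iir = sum(damped[i] * out[n - 1 - i] for i in range(order) if n - 1 - i >= 0)
        out.append(fir + iir)
    return out


weighted = perceptual_weighting([1.0, 0.0, 0.0], [0.5], c=0.8)
```

With c = 1 the numerator and denominator cancel and the filter passes the signal unchanged. Smaller values of c de-emphasize the formant regions more strongly, so that error there counts for less in the pulse search.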

[0031] Specifically, the pitch period is calculated, for example, by taking as the pitch period the value of m that maximizes match(m) in (Equation 3).

[0032]

[Equation 3]
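
Equation 3 is not reproduced in this text, so the exact definition of match(m) is unknown. As a stand-in, the sketch below uses the score that a one-tap long-term predictor maximizes when its squared prediction error is minimized: the squared cross-correlation at lag m divided by the energy of the delayed signal. The lag range and the rejection of negative correlations are assumptions.

```python
def estimate_pitch_period(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the best long-term prediction score, or None."""
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        cross = sum(signal[i] * signal[i - lag] for i in range(lag, len(signal)))
        energy = sum(signal[i - lag] ** 2 for i in range(lag, len(signal)))
        if cross <= 0.0 or energy == 0.0:
            continue  # no usable positive correlation at this lag
        score = cross * cross / energy  # stand-in for match(m)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy signal: one unit pulse every 5 samples, 40 samples long.
periodic = [1.0 if i % 5 == 0 else 0.0 for i in range(40)]
period = estimate_pitch_period(periodic, 2, 12)
```

Multiples of the true period also score well, so the search range matters. Here the shortest matching lag wins because fewer terms contribute at longer lags.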

[0033] The pulse generation unit 107 takes the output sequence of the perceptual weighting filter unit 104 as input and calculates the gain g_k(m_k) at position m_k by (Equation 4).

[0034]

[Equation 4]

[0035] In (Equation 4), h_i is the impulse response at a point i samples away. Pulses are then determined one at a time, up to a preset number of pulses, with each pulse placed at the position m_k that minimizes ε_k in (Equation 5).

[0036]

[Equation 5]

[0037] The number of pulses can be set arbitrarily. For example, for an input sequence sampled at 16 kHz, about 8 pulses per 80 observation points is preferable.
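
Equations 4 and 5 are not reproduced in this text. The sketch below follows the conventional greedy multipulse search, which matches the description: for each candidate position the optimal gain is the cross-correlation of the remaining target with the impulse response divided by the response energy, and the position giving the largest error reduction is chosen. All names are illustrative, and the truncation of the response at the frame end is an assumption.

```python
def multipulse_search(target, h, num_pulses):
    """Greedily pick (position, gain) pulses so that pulses convolved with h approximate target."""
    n = len(target)
    residual = list(target)
    pulses = []
    for _ in range(num_pulses):
        best = None  # (error reduction, position, gain)
        for m in range(n):
            span = min(len(h), n - m)  # part of the response that fits in the frame
            corr = sum(residual[m + j] * h[j] for j in range(span))
            energy = sum(h[j] ** 2 for j in range(span))
            if energy == 0.0:
                continue
            gain = corr / energy          # least-squares gain at position m
            reduction = corr * gain       # decrease in squared error if this pulse is used
            if best is None or reduction > best[0]:
                best = (reduction, m, gain)
        if best is None:
            break
        _, m, gain = best
        for j in range(min(len(h), n - m)):
            residual[m + j] -= gain * h[j]
        pulses.append((m, gain))
    return pulses, residual


# Toy target built from two known pulses, so the search should recover them.
h = [1.0, 0.5, 0.25]
target = [0.0] * 16
for position, gain in [(3, 2.0), (10, -1.0)]:
    for j, value in enumerate(h):
        target[position + j] += gain * value
found, leftover = multipulse_search(target, h, 2)
```

With the figures given in the text, `num_pulses` would be about 8 for an 80-sample block.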

[0038] When low-frequency components are missing from the input speech, the pulses of the pulse train determined by (Equation 5) that lie at the estimated pitch period are emphasized, using the estimate from the pitch period estimation unit 106, so that the output of the pulse generation unit 107 restores the fundamental frequency component.

[0039] A simple method of pulse emphasis in the pulse generation unit 107 is as follows. From the ratio of the frame length to the output value of the pitch period estimation unit 106, the number of pitch periods contained in one frame is calculated. Then, within the frame, that number of pulses, taken in descending order of amplitude, are multiplied by a constant (>= 1) to form the output pulse train.
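
This simple emphasis rule can be sketched as follows, with pulses given as (position, gain) pairs. Rounding the ratio down to an integer and ranking by absolute gain are assumptions, as is the example factor; the text only requires the constant to be at least 1.

```python
def emphasize_largest_pulses(pulses, frame_length, pitch_period, factor=1.5):
    """Multiply the N largest-magnitude pulses by `factor`, where N = frame_length // pitch_period."""
    if factor < 1.0:
        raise ValueError("the emphasis constant must be >= 1")
    count = frame_length // pitch_period  # pitch periods per frame (rounded down, assumed)
    ranked = sorted(range(len(pulses)), key=lambda i: abs(pulses[i][1]), reverse=True)
    chosen = set(ranked[:count])
    return [
        (position, gain * factor if i in chosen else gain)
        for i, (position, gain) in enumerate(pulses)
    ]


pulses = [(2, 0.2), (10, -1.0), (25, 0.4), (50, 0.9), (70, 0.1)]
emphasized = emphasize_largest_pulses(pulses, frame_length=80, pitch_period=40, factor=2.0)
```

This rule ignores where the large pulses sit, so it works best when the largest pulses already coincide with the pitch marks.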

[0040] An example of pitch emphasis is described with reference to FIG. 4. In the figure, a schematically shows the estimated pitch period interval obtained by the pitch period estimation unit 106; b shows the estimated pulse train of the current frame calculated by (Equation 5) together with the pulse train determined at the previous analysis time, with gain on the vertical axis and time on the horizontal axis; and c shows the estimated pulse train after emphasis. In b, starting from the pulse emphasized last in the previous frame, each point spaced by the estimated pitch interval shown in a is checked for a pulse determined by (Equation 5). Where such a pulse exists, it is emphasized as shown in c and output from the pulse generation unit 107.
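
The procedure of FIG. 4 can be sketched as a walk along a pitch grid anchored at the last emphasized pulse of the previous frame. Positions are sample indices relative to the start of the current frame, so the anchor is negative. Requiring an exact position match and the rule for carrying the anchor into the next frame are assumptions made for this illustration.

```python
def emphasize_on_pitch_grid(pulses, last_emphasized, pitch_period, frame_length, factor=1.5):
    """Emphasize pulses that fall on the grid last_emphasized + k * pitch_period (k = 1, 2, ...).

    Returns the new pulse list and the anchor for the next frame,
    expressed relative to the start of that next frame.
    """
    grid = set()
    position = last_emphasized + pitch_period
    while position < frame_length:
        if position >= 0:
            grid.add(position)
        position += pitch_period
    emphasized = [(p, gain * factor if p in grid else gain) for p, gain in pulses]
    hits = [p for p, _ in pulses if p in grid]
    # Anchor on the last emphasized pulse, or on the last grid point if none was hit.
    anchor = max(hits) if hits else position - pitch_period
    return emphasized, anchor - frame_length


pulses = [(5, 1.0), (20, 0.5), (45, 0.8), (60, 0.3)]
out, next_anchor = emphasize_on_pitch_grid(pulses, last_emphasized=-35, pitch_period=40,
                                           frame_length=80, factor=2.0)
```

A practical version would probably accept pulses within a few samples of each grid point, since the estimated pitch period is rarely exact.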

[0041] Alternatively, the number of pitch periods in one frame is calculated from the ratio of the frame length to the output value of the pitch period estimation unit 106; sets of pulses located at positions corresponding to the pitch period are searched for within the frame; and, starting from the set with the largest amplitude, as many pulses as the number of pitch periods that should be in one frame are multiplied by a constant (>= 1) to form the output pulse train. The first speech synthesis unit 105 receives the output pulse train of the pulse generation unit 107 and obtains synthesized speech by PARCOR synthesis, using the PARCOR coefficients output from the analysis unit 102 as feature quantities.

[0042] To synthesize the output speech, the emphasis filter unit 108 takes the pulse train generated by the pulse generation unit 107 as input and restores the missing band using, for example, an FIR filter. The second speech synthesis unit 109 receives the output of the emphasis filter unit 108 and outputs the synthesized speech obtained by PARCOR synthesis, using the PARCOR coefficients from the analysis unit 102 as feature quantities. Speech is synthesized in this way for each frame.
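
The output path described here, an FIR emphasis filter followed by synthesis, can be sketched as follows. The FIR taps are placeholders, since the text gives no filter design. Synthesis is shown in direct form with linear prediction coefficients rather than with a PARCOR lattice; this is one of the two options the description allows, and it assumes the convention 1/(1 − Σ a_i z^−i).

```python
def fir_filter(signal, taps):
    """Causal FIR filter: out[n] = sum_j taps[j] * signal[n - j]."""
    return [
        sum(taps[j] * signal[n - j] for j in range(len(taps)) if n - j >= 0)
        for n in range(len(signal))
    ]


def lpc_synthesize(source, lpc):
    """All-pole synthesis: out[n] = source[n] + sum_i lpc[i-1] * out[n - i]."""
    out = []
    for n, sample in enumerate(source):
        feedback = sum(lpc[i] * out[n - 1 - i] for i in range(len(lpc)) if n - 1 - i >= 0)
        out.append(sample + feedback)
    return out


pulse_train = [1.0, 0.0, 0.0]
emphasized = fir_filter(pulse_train, [1.0, -0.5])   # hypothetical emphasis taps
speech = lpc_synthesize(emphasized, [0.5])           # hypothetical first-order envelope
```

In this toy case the FIR taps happen to cancel the single pole, so the output equals the input impulse. A real emphasis filter would instead boost the band that is missing from the input.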

[0043] As described above, the voice band expanding apparatus of this embodiment comprises the buffer 101, which stores the input signal for a fixed time; the analysis unit 102, which extracts feature quantities relating to the spectral envelope from the signal sequence stored in the buffer; the pulse generation unit 107, which uses the feature quantities from the analysis unit 102 to estimate the sound source pulses for the signal in the buffer 101 and emphasizes the pulses corresponding to the estimated pitch period; and the emphasis filter unit 108, which performs frequency weighting. With this comparatively simple configuration, the effect of pitch emphasis makes it possible to expand the voice band of a speech signal with a missing band.

[0044] In this embodiment, the first speech synthesis unit 105 and the second speech synthesis unit 109 obtain the synthesized sound by PARCOR synthesis, using the PARCOR coefficients output from the analysis unit 102 as the feature amount. Speech synthesis may instead be performed using linear prediction coefficients in place of the PARCOR coefficients.
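The two parameter sets are interchangeable because each can be computed from the other. The sketch below shows the standard step-up and step-down recursions; the function names and the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p are assumptions, and texts that use the opposite sign for the reflection coefficients will differ accordingly.

```python
def parcor_to_lpc(parcor):
    """Step-up recursion: reflection coefficients -> a[1..p] of A(z)."""
    a = []
    for m, km in enumerate(parcor, start=1):
        # a_i(m) = a_i(m-1) + k_m * a_{m-i}(m-1), then a_m(m) = k_m
        a = [a[i] + km * a[m - 2 - i] for i in range(m - 1)] + [km]
    return a


def lpc_to_parcor(lpc):
    """Step-down recursion: a[1..p] of A(z) -> reflection coefficients."""
    a = list(lpc)
    parcor = [0.0] * len(a)
    for m in range(len(a), 0, -1):
        km = a[m - 1]
        if abs(km) >= 1.0:
            raise ValueError("coefficients do not describe a stable filter")
        parcor[m - 1] = km
        denom = 1.0 - km * km
        a = [(a[i] - km * a[m - 2 - i]) / denom for i in range(m - 1)]
    return parcor


lpc = parcor_to_lpc([0.5, -0.25])
round_trip = lpc_to_parcor(lpc)
```

A synthesis filter built from `lpc` in direct form and a lattice filter built from the PARCOR coefficients then describe the same all-pole envelope.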

[0045] (Embodiment 2) Next, a second embodiment of the present invention will be described.

[0046] The overall configuration of the second embodiment of the present invention is the same as that of the first embodiment (FIG. 1), but the pulse generation method used by the pulse generator 107 of FIG. 1 differs. Only this pulse generation method is described below; the rest of the description is omitted.

[0047] An estimated pulse train is first obtained from the output of the auditory weighting filter unit 104 and the output of the impulse response calculation unit 103 by the same method as in the first embodiment. The estimate from the pitch period estimation unit 106 is then used to calculate how many pitch periods fit in one frame length. For each frame, that number of pulses is taken in descending order of amplitude and a threshold is applied to them: any estimated pulse at or above the threshold is replaced by a pulse with the magnitude of the threshold. The result is the output pulse train. This is the only difference from the first embodiment.
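The clipping step of this embodiment can be sketched as below. The text does not say how the threshold is chosen or what happens to pulses outside the selected set, so the `threshold` argument, the decision to leave the remaining pulses unchanged, the preservation of the pulse sign, and the function name are all assumptions.

```python
def clip_top_pulses(pulses, frame_len, pitch_period, threshold):
    """Clip the N largest pulses of a frame to `threshold`.

    N is the number of whole pitch periods in the frame. Pulses outside
    the top N are left as they are (an assumption).
    """
    n_periods = max(1, frame_len // pitch_period)
    # Sample indices sorted by descending pulse magnitude.
    order = sorted(range(len(pulses)), key=lambda n: abs(pulses[n]), reverse=True)
    out = list(pulses)
    for n in order[:n_periods]:
        if abs(pulses[n]) >= threshold:
            out[n] = threshold if pulses[n] > 0 else -threshold
    return out


# Hypothetical 8-sample frame with a pitch period of 4 samples (N = 2).
estimated = [0.0, 0.9, 0.0, -0.2, 0.0, 0.7, 0.0, 0.1]
clipped = clip_top_pulses(estimated, frame_len=8, pitch_period=4, threshold=0.5)
```

Flattening the strongest pulses to a common height is a simple nonlinearity, which is consistent with the text's description of deliberately distorting the sound source.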

[0048] As described above, the voice band expanding apparatus of this embodiment comprises: the buffer 101, which stores the input signal for a fixed time; the analysis unit 102, which extracts a feature amount relating to the spectral envelope from the signal sequence stored in the buffer; the pulse generator 107, which uses the feature amount from the analysis unit 102 to estimate sound source pulses for the signal in the buffer 101 and, for the pulses corresponding to the estimated pitch period, sets every pulse whose amplitude is at or above a threshold to that threshold, thereby distorting the sound source and generating a pulse train in which the distortion is effective for low-band emphasis; and the emphasis filter unit 108, which performs frequency weighting. With this relatively simple configuration, the effect of the distortion with respect to the pitch period makes it possible to provide a voice band expanding apparatus that can expand the voice band of a speech signal with a missing band.

[0049] In this embodiment, the first speech synthesis unit 105 and the second speech synthesis unit 109 obtain the synthesized sound by PARCOR synthesis, using the PARCOR coefficients output from the analysis unit 102 as the feature amount. Speech synthesis may instead be performed using linear prediction coefficients in place of the PARCOR coefficients.

[0050] (Embodiment 3) Next, a third embodiment of the present invention will be described.

[0051] The overall configuration of the third embodiment of the present invention is shown in FIG. 2. It differs from the first embodiment in that, as shown in FIG. 2, it has an envelope enlarging unit 110, and in that the analysis unit 102 calculates linear prediction coefficients as the feature amount relating to the spectral envelope. The other configurations and operations are the same as those of the components in FIG. 1, so their description is omitted.

[0052] In the envelope enlarging unit 110, a codebook, which is a collection of linear prediction coefficients having wideband characteristics, is created in advance. Taking the linear prediction coefficients obtained from the analysis unit 102 as input, the unit uses vector quantization to select linear prediction coefficients having wideband characteristics from the codebook. The selected coefficients form the output of the envelope enlarging unit 110 and are input to the second speech synthesis unit 109, so that wideband speech is synthesized from the synthesized sound source and the feature amount of the expanded spectral envelope.
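A codebook lookup of this kind can be sketched as follows. The text only states that a wideband codebook is prepared and searched by vector quantization; the paired narrowband codebook used for matching, the squared Euclidean distance, the toy coefficient values, and the function names are assumptions made for illustration.

```python
def nearest_code(x, codebook):
    """Return the index of the code with the smallest squared distance to x."""
    best_index, best_dist = 0, float("inf")
    for i, code in enumerate(codebook):
        dist = sum((xj - cj) ** 2 for xj, cj in zip(x, code))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def expand_envelope(narrow_lpc, narrow_codebook, wide_codebook):
    """Map narrowband coefficients to the paired wideband code.

    Entry i of `wide_codebook` is assumed to be the wideband counterpart
    of entry i of `narrow_codebook`.
    """
    return wide_codebook[nearest_code(narrow_lpc, narrow_codebook)]


# Hypothetical two-entry codebooks (order 2 narrowband, order 4 wideband).
narrow_cb = [[0.0, 0.0], [1.0, 1.0]]
wide_cb = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.5, 0.5]]
wide_lpc = expand_envelope([0.9, 0.8], narrow_cb, wide_cb)
```

Because the output is always one of the stored codes, the accuracy of the expanded envelope is limited by the codebook size.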

[0053] As described above, the voice band expanding apparatus of this embodiment comprises: the buffer 101, which stores the input signal for a fixed time; the analysis unit 102, which extracts a feature amount relating to the spectral envelope from the signal sequence stored in the buffer; the envelope enlarging unit 110, which estimates a feature amount relating to a spectral envelope with wideband characteristics from the feature amount output by the analysis unit; the pulse generator 107, which uses the feature amount from the analysis unit 102 to estimate a sound source pulse train for the signal in the buffer 101 and restores the missing band using the estimated pitch period; and the emphasis filter unit 108, which applies frequency weighting to the sound source pulse train. With this relatively simple configuration, a high-quality, wideband synthesized sound with an expanded band can be created from the band-expanded spectral envelope and the band-expanded sound source.

[0054] In this embodiment, the first speech synthesis unit 105 and the second speech synthesis unit 109 obtain the synthesized sound using linear prediction coefficients as the feature amount. Speech synthesis may instead be performed using PARCOR coefficients in place of the linear prediction coefficients.

[0055] (Embodiment 4) Next, a fourth embodiment of the present invention will be described.

[0056] The overall configuration of the fourth embodiment of the present invention is shown in FIG. 2. It differs from the third embodiment in the configuration of the envelope enlarging unit. It differs from the first embodiment in that, as shown in FIG. 2, it has an envelope enlarging unit 110, and in that the analysis unit 102 calculates linear prediction coefficients as the feature amount relating to the spectral envelope. The other configurations and operations are the same as those of the components in FIG. 1, so their description is omitted.

[0057] The envelope enlarging unit 110 is described in detail with reference to FIG. 3. FIG. 3 is an internal block diagram of the envelope enlarging unit 110, in which 301 is a narrowband codebook, 302 is a linear mapping function unit, and 303 is a weighted addition unit. For the input spectrum obtained by the analysis unit 102, the distance d_i to each code in the narrowband codebook 301 is calculated by (Equation 6).

[0058]

(Equation 6)

[0059] In (Equation 6), x_j is the j-th order input spectral envelope information, and V_ij is the j-th order spectral envelope information of the i-th code in the codebook 301. The input spectrum is also converted into wideband spectra by the plurality of linear mapping functions in the linear mapping function unit 302. The outputs of the linear mapping function unit 302 are weighted and summed by the weighted addition unit 303 and output as the converted spectrum. The weights used in this addition are given by (Equation 7):

[0060]

(Equation 7)

[0061] In (Equation 7), w_i is the weight applied to the output of the i-th linear mapping function. Denoting each linear mapping function of the linear mapping function unit 302 by A_k, the converted spectrum is calculated by (Equation 8):

[0062]

(Equation 8)

[0063] In (Equation 8), y_j is the j-th order converted spectrum, and A_ij is the j-th order function value of the i-th linear mapping function.

[0064] The y_j obtained in this way is the output of the envelope enlarging unit 110 and is input to the second speech synthesis unit 109, so that wideband speech is synthesized from the synthesized sound source and the feature amount of the expanded spectral envelope.
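The flow through the narrowband codebook 301, the linear mapping function unit 302, and the weighted addition unit 303 can be sketched as below. Equations 6 to 8 are not reproduced in this text, so every formula here is an assumption chosen only to match the surrounding definitions: a squared Euclidean distance for d_i, normalized inverse-distance weights for w_i, and a converted spectrum y_j formed as the weighted sum of the j-th outputs of the mappings A_i. The matrices, the `eps` guard, and the function names are likewise hypothetical.

```python
def squared_distance(x, code):
    """Assumed form of d_i: sum over j of (x_j - V_ij)^2."""
    return sum((xj - vj) ** 2 for xj, vj in zip(x, code))


def soft_weights(distances, eps=1e-9):
    """Assumed form of w_i: inverse distances normalized to sum to 1."""
    inverse = [1.0 / (d + eps) for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


def apply_mapping(matrix, x):
    """Apply one linear mapping A_i (a list of rows) to the input spectrum."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in matrix]


def expand_envelope_soft(x, narrow_codebook, mappings):
    """Weighted sum of all mapping outputs: y_j = sum_i w_i * (A_i x)_j."""
    weights = soft_weights([squared_distance(x, code) for code in narrow_codebook])
    outputs = [apply_mapping(matrix, x) for matrix in mappings]
    return [
        sum(w * out[j] for w, out in zip(weights, outputs))
        for j in range(len(outputs[0]))
    ]


# Hypothetical data: two narrowband codes, each paired with a 3x2 mapping.
narrow_cb = [[0.0, 0.0], [1.0, 1.0]]
mappings = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]],
]
y = expand_envelope_soft([0.5, 0.5], narrow_cb, mappings)
```

Compared with selecting a single code, blending several mappings lets the output vary continuously with the input, which is one plausible reading of why the text calls this expansion more accurate.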

[0065] As described above, the voice band expanding apparatus of this embodiment comprises: the buffer 101, which stores the input signal for a fixed time; the analysis unit 102, which extracts a feature amount relating to the spectral envelope from the signal sequence stored in the buffer; the envelope enlarging unit 110, which estimates a feature amount relating to a spectral envelope with wideband characteristics from the feature amount output by the analysis unit; the pulse generator 107, which uses the feature amount from the analysis unit 102 to estimate a sound source pulse train for the signal in the buffer 101 and restores the missing band using the estimated pitch period; and the emphasis filter unit 108, which applies frequency weighting to the sound source pulse train. With this relatively simple configuration, it is possible to provide a voice band expanding apparatus that can create a high-quality, wideband synthesized sound with an expanded band from a spectral envelope whose band has been expanded with high accuracy and a band-expanded sound source.

[0066] In this embodiment, the first speech synthesis unit 105 and the second speech synthesis unit 109 obtain the synthesized sound using linear prediction coefficients as the feature amount. Speech synthesis may instead be performed using PARCOR coefficients in place of the linear prediction coefficients.

[0067]

[Effects of the Invention] As described above, according to the present invention, a buffer stores the input signal for a fixed time; an analysis unit extracts a feature amount relating to the spectral envelope component from the speech signal in the buffer; the sound source is represented as a train of multiple pulses derived from the output of the analysis unit and the signal in the buffer; the pulse train is weighted by a distortion such as emphasis according to the pitch period estimated by a pitch period estimation unit, producing the pulse train that serves as the sound source; a frequency-weighting filter is additionally applied at synthesis time; and speech is synthesized from the feature amount of the spectral envelope and the sound source pulse train. With this relatively simple configuration, band expansion into bands that are absent from the narrowband signal can be achieved without greatly impairing the characteristics of the original sound source.
Furthermore, by using the envelope enlarging unit to add processing that widens the band of the spectral envelope information obtained by PARCOR analysis with high accuracy, it is possible to provide a voice band expanding apparatus and a voice band expanding method that generate highly intelligible, wideband synthesized speech.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the overall configuration of the voice band expanding apparatus according to the first embodiment of the present invention.

FIG. 2 is a block diagram showing the overall configuration of the voice band expanding apparatus according to the third embodiment of the present invention.

FIG. 3 is a conceptual diagram of the envelope enlarging unit according to the fourth embodiment of the present invention.

FIG. 4 is a conceptual diagram of pulse emphasis in the pulse generator.

FIG. 5 is a block diagram showing the overall configuration of a conventional voice band expanding apparatus.

[Explanation of symbols]

101 Buffer
102 Analysis unit
103 Impulse response calculation unit
104 Auditory weighting filter unit
105 First speech synthesis unit
106 Pitch period estimation unit
107 Pulse generator
108 Emphasis filter unit
109 Second speech synthesis unit
110 Envelope enlarging unit
201 LPC analysis unit
202 Vector quantization unit
203 Decoding unit
204 Narrowband codebook
205 Wideband codebook
206 Low band restoration unit
207 First high band restoration unit
208 Second high band restoration unit
209 Addition unit
210 Upsampling unit
301 Narrowband codebook
302 Linear mapping function unit
303 Weighted addition unit

Claims

[Claims]

1. A voice band expanding apparatus comprising: a buffer that stores a fixed amount of an input signal; an analysis unit that calculates, of linear prediction coefficients and PARCOR coefficients, at least the linear prediction coefficients for the signal sequence stored in the buffer; an impulse response calculation unit that calculates an impulse response from the linear prediction coefficients obtained by the analysis unit; an auditory weighting filter unit that, using the linear prediction coefficients as parameters, weights a difference signal sequence between the output of the buffer and the output of a first speech synthesis unit so as to simulate human auditory characteristics; a pitch period estimation unit that estimates a pitch period from the output signal of the auditory weighting filter unit; a pulse generator that, with reference to the output value from the pitch period estimation unit, receives the output value from the auditory weighting filter unit and the output of the impulse response calculation unit and generates a pulse train; the first speech synthesis unit, which synthesizes a speech signal from the pulse train output by the pulse generator and either the linear prediction coefficients or the PARCOR coefficients; a frequency weighting filter that applies frequency weighting to the output of the pulse generator; and a second speech synthesis unit that synthesizes speech from the output value of the frequency weighting filter and either the linear prediction coefficients or the PARCOR coefficients.

2. A voice band expanding apparatus comprising: a buffer that stores a fixed amount of an input signal; an analysis unit that calculates, of linear prediction coefficients and PARCOR coefficients, at least the linear prediction coefficients for the signal sequence stored in the buffer; an impulse response calculation unit that calculates an impulse response from the linear prediction coefficients obtained by the analysis unit; an envelope enlarging unit that estimates band-expanded linear prediction coefficients or PARCOR coefficients based on the linear prediction coefficients or the PARCOR coefficients; an auditory weighting filter unit that, using the linear prediction coefficients as parameters, weights a difference signal sequence between the output of the buffer and the output of a first speech synthesis unit so as to simulate human auditory characteristics; a pitch period estimation unit that estimates a pitch period from the output signal of the auditory weighting filter unit; a pulse generator that, with reference to the output value from the pitch period estimation unit, receives the output signal from the auditory weighting filter unit and the output of the impulse response calculation unit and generates a pulse train; the first speech synthesis unit, which synthesizes a speech signal from the pulse train output by the pulse generator and either the linear prediction coefficients or the PARCOR coefficients; a frequency weighting filter that applies frequency weighting to the output of the pulse generator; and a second speech synthesis unit that synthesizes speech from the output value of the frequency weighting filter and the output value of the envelope enlarging unit.

3. The voice band expanding apparatus according to claim 1 or 2, wherein the pulse generator has a frequency weighting filter that applies frequency weighting to the output signal from the auditory weighting filter, and has a function of generating a pulse train that restores a missing band.

4. The voice band expanding apparatus according to claim 1 or 2, wherein the pulse generator has a function of emphasizing pulses by increasing the amplitude of the pulses at pitch period intervals, using the detected value of the pitch period estimation unit as a feature amount.

5. The voice band expanding apparatus according to claim 2, wherein the pulse generator has a function of applying a threshold, using the detected value of the pitch period estimation unit as a feature amount, and outputting pulses set to that threshold.

6. The voice band expanding apparatus according to claim 2, wherein the envelope enlarging unit has a function of receiving the feature amount obtained by the analysis unit and, using linear mapping functions, widening it into a feature amount for a spectral envelope having the characteristics of a wideband spectral envelope.

7. A voice band expanding method comprising: storing a fixed amount of an input signal in a buffer; extracting a feature amount relating to the spectral envelope from the signal sequence stored in the buffer; for a sound source pulse train estimated from the extracted feature amount and the signal sequence stored in the buffer, generating a wideband sound source pulse train from a priori information about the frequency characteristics of the signal and from estimated pitch information; and synthesizing speech using the feature amount relating to the spectral envelope and the wideband sound source pulse train.

8. A voice band expanding method comprising: storing a fixed amount of an input signal in a buffer; extracting a feature amount relating to the spectral envelope from the signal sequence stored in the buffer; interpolating the envelope information that is missing from the envelope information represented by the feature amount; for a sound source pulse train estimated from the feature amount and the signal sequence stored in the buffer, generating a wideband sound source pulse train from a priori information about the frequency characteristics of the signal and from estimated pitch information; and synthesizing speech using the feature amount relating to the spectral envelope and the wideband sound source pulse train.