JP3302075B2

JP3302075B2 - Synthetic parameter conversion method and apparatus

Info

Publication number: JP3302075B2
Application number: JP03579893A
Authority: JP
Inventors: 芳則志賀
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1993-02-24
Filing date: 1993-02-24
Publication date: 2002-07-15
Anticipated expiration: 2017-07-15
Also published as: JPH06250692A

Abstract

PURPOSE: To allow cepstrum parameters to be reused as they are, after only a simple transformation, even when the sampling frequency used for speech synthesis differs from the one used for analysis. CONSTITUTION: F1 [Hz] synthesis cepstrum parameters, which are to be converted into F2 [Hz] synthesis cepstrum parameters, are transformed frame by frame into a logarithmic power spectrum by a discrete Fourier transform in a Fourier transform processing section 1. The logarithmic power spectrum is input to a spectrum data number changing section 2. If F1 > F2, the spectrum data on the high-frequency side above F2/2 [Hz] are discarded. If F1 < F2, data with a small constant value are added on the high-frequency side above F1/2 [Hz]. The power spectrum whose number of data points has been increased or reduced in this way is subjected to an inverse discrete Fourier transform in an inverse Fourier transform processing section 3, yielding the F2 [Hz] synthesis cepstrum parameters.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a speech synthesis method that uses the cepstrum as a phoneme parameter, and more particularly to a method and apparatus for converting cepstrum parameters that are suitable for synthesizing speech at a sampling frequency different from the one used when the speech signal was sampled and analyzed.

[0002]

[Prior Art] As is well known, the cepstrum of speech is obtained as follows. Speech uttered by a person is digitized by analog-to-digital (A/D) conversion. A short-time window such as a Hamming window is applied to the digital speech data while being shifted at a fixed period. The speech data in each window are Fourier transformed, the resulting spectrum is converted to a logarithmic scale, and an inverse Fourier transform is then applied.
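The analysis chain just described (window, Fourier transform, logarithm, inverse Fourier transform) can be sketched for a single frame as follows. This is a minimal illustration, not the patent's implementation: the function names, the naive O(N^2) DFT, the use of the log power spectrum, and the `floor` guard are all assumptions made here, and the frame-by-frame window shift is omitted.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) DFT; the inverse transform is scaled by 1/N."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    result = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t, v in enumerate(values))
        result.append(acc / n if inverse else acc)
    return result


def hamming_window(n):
    """Standard Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def frame_cepstrum(frame, floor=1e-12):
    """Cepstrum of one frame: window -> DFT -> log power -> inverse DFT."""
    window = hamming_window(len(frame))
    spectrum = dft([s * w for s, w in zip(frame, window)])
    # `floor` is an assumption: it keeps log() finite for empty spectral bins.
    log_power = [math.log(max(abs(z) ** 2, floor)) for z in spectrum]
    # The log power spectrum of a real frame is symmetric, so the result is real.
    return [z.real for z in dft(log_power, inverse=True)]


# Example: a 64-sample frame of a synthetic two-tone signal.
frame = [math.sin(2 * math.pi * 5 * t / 64) + 0.5 * math.sin(2 * math.pi * 12 * t / 64)
         for t in range(64)]
cepstrum = frame_cepstrum(frame)
```

Because the log power spectrum of a real-valued frame is symmetric, the resulting cepstrum is real and symmetric as well, which is why only its low-order half needs to be stored.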

[0003] Of the cepstrum thus obtained, the higher-order coefficients retain the pitch component of the speech power spectrum, while the lower-order coefficients retain its envelope component. Exploiting this property, the cepstrum is widely used in speech analysis-synthesis and in speech synthesis by rule, both described below.

[0004] First, in speech analysis-synthesis, speech uttered by an announcer or the like is subjected to cepstrum analysis by the method described above, and the low-order components of the cepstrum parameters obtained for each frame are stored together with the voiced/unvoiced information and the pitch frequency of each frame. At synthesis time, this information is input to a synthesizer built from an LMA (logarithmic amplitude approximation) filter or the like, and the synthesized speech is D/A converted and output as sound.

[0005] Next, in speech synthesis by rule, speech uttered by an announcer or the like is subjected in advance to cepstrum analysis by the method described above. From the cepstrum analysis frames corresponding to the basic unit of rule-based synthesis, for example a CV (consonant + vowel sequence) unit, the low-order cepstrum coefficients that hold the envelope information of the speech are extracted and stored in a storage device as speech segments. When speech is synthesized, the CV-unit speech segments are interpolated and concatenated according to the phoneme sequence to be synthesized. The result is used as the phoneme parameters and is input, together with prosodic parameters consisting of separately generated pitch information, to a synthesizer built from an LMA filter or the like. The synthesized speech is then D/A converted and output as sound.

[0006]

[Problems to Be Solved by the Invention] As described above, techniques for speech analysis-synthesis and for speech synthesis by rule using the cepstrum have long been known.

[0007] Conventionally, however, the sampling frequency of the A/D conversion used when the cepstrum parameters were created can come to differ from the sampling frequency of the D/A conversion used at synthesis time. This happens, for example, when the processing system of the synthesizer is slow and the sampling frequency of the synthesized speech signal must be lowered to reduce the number of operations per unit time. In such a case, the cepstrum parameters obtained from the original A/D conversion and analysis cannot be used as they are. In analysis-synthesis, for example, the speech data consisting of cepstrum parameters had to be regenerated by one of the following two methods.

(1) A/D convert the speech again at the desired sampling frequency and redo the cepstrum analysis.

[0008] (2) Decimate the discrete signal data that have already been A/D converted by dropping an appropriate number of samples, pass the result through a low-pass filter so that aliasing distortion does not occur, and redo the cepstrum analysis.

[0009] With either method (1) or (2), the cepstrum parameters created in the first place cannot be used, so the A/D conversion or decimation of the speech and the cepstrum parameter analysis must be redone, which is inefficient.

[0010] In speech synthesis by rule, in addition to the effort of (1) or (2) required for analysis-synthesis, the extraction of speech segments (taking out the low-order cepstrum of the cepstrum frames that correspond to the required portions of speech) must also be redone, which is even more laborious.

[0011] An object of the present invention is therefore to provide a synthesis parameter conversion method and apparatus that allow existing cepstrum parameters to be used, after only a simple conversion, even when the sampling frequency at speech synthesis time is different.

[0012]

[Means for Solving the Problems] To solve the above problems, the present invention is characterized in that first cepstrum parameters, obtained by analyzing speech signal data sampled at a first frequency, are converted frame by frame into a power spectrum by a discrete Fourier transform; the number of data points in this power spectrum is increased or decreased by discarding data from its highest-frequency side or by appending data to its highest-frequency side; and the resulting power spectrum is subjected to an inverse discrete Fourier transform to obtain second cepstrum parameters for speech synthesis at a second frequency different from the first frequency.

[0013]

[Operation] In the above configuration, suppose that second cepstrum parameters for speech synthesis at a second frequency are to be obtained from first cepstrum parameters of speech that was A/D converted at a first frequency and subjected to cepstrum analysis. The first cepstrum parameters are first converted frame by frame into a power spectrum by a discrete Fourier transform. Data are then either discarded from the highest-frequency side of that power spectrum (when the first frequency > the second frequency) or appended to the highest-frequency side with values smaller than those of the low-frequency portion (when the first frequency < the second frequency). Increasing or decreasing the number of data points in this way yields a new power spectrum whose frequency range has been narrowed or widened. Applying an inverse discrete Fourier transform to this new power spectrum yields second cepstrum parameters equivalent to those that would be obtained by analysis at the second (sampling) frequency.

[0014] In this way, from cepstrum parameters obtained by a single A/D conversion and cepstrum analysis, cepstrum parameters equivalent to those obtained by A/D conversion and analysis at a different sampling frequency can be derived. There is therefore no need for resampling, decimation, low-pass filtering, or reanalysis, and because the conversion starts from the speech segments created originally, nothing is wasted and the process is highly efficient.

[0015]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a synthesis parameter (cepstrum parameter) conversion apparatus according to this embodiment.

[0016] The apparatus shown in FIG. 1 converts cepstrum parameters obtained by analyzing speech signal data A/D converted at a first sampling frequency F1 [Hz] (F1 [Hz] synthesis cepstrum parameters) into cepstrum parameters for speech synthesis at a second sampling frequency F2 [Hz] (F2 [Hz] synthesis cepstrum parameters); that is, it converts the sampling frequency of the synthesized speech. It comprises a Fourier transform processing unit 1, a spectrum data number changing unit 2, and an inverse Fourier transform processing unit 3. The Fourier transform processing unit 1 converts the F1 [Hz] synthesis cepstrum parameters into a power spectrum, frame by frame, by a discrete Fourier transform.

[0017] The spectrum data number changing unit 2 takes the power spectrum output by the Fourier transform processing unit 1 and changes its number of data points according to the relative magnitude of F1 and F2: it discards data from the highest-frequency side (when F1 > F2) or appends data to the highest-frequency side (when F1 < F2).

[0018] The inverse Fourier transform processing unit 3 applies an inverse discrete Fourier transform to the power spectrum whose number of data points has been increased or decreased by the spectrum data number changing unit 2, and generates the F2 [Hz] synthesis cepstrum parameters.
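The three processing units can be sketched end to end as three small functions. This is a rough illustration under stated assumptions, not the patented implementation: all function and parameter names are invented here, the cepstrum is assumed to be extended to an even-symmetric array (which requires the order m to be below half the transform size), the output is assumed to keep the same order m, and the default padding value (the minimum of the spectrum) is an arbitrary choice.

```python
import math


def cepstrum_to_log_spectrum(cepstrum, n_points):
    """Unit 1: DFT of the even-symmetric extension of the cepstrum (real result)."""
    return [cepstrum[0] + 2.0 * sum(c * math.cos(2.0 * math.pi * k * q / n_points)
                                    for q, c in enumerate(cepstrum) if q > 0)
            for k in range(n_points)]


def change_data_number(spectrum, n_new, pad_value):
    """Unit 2: drop or add points around the centre (highest frequency)."""
    n = len(spectrum)
    if n_new < n:                      # F1 > F2: discard the central points
        half = n_new // 2
        return spectrum[:half + 1] + spectrum[n - (n_new - half - 1):]
    if n_new > n:                      # F1 < F2: insert small values in the centre
        half = n // 2
        return spectrum[:half + 1] + [pad_value] * (n_new - n - 1) + spectrum[half:]
    return list(spectrum)


def log_spectrum_to_cepstrum(spectrum, order):
    """Unit 3: inverse DFT of a real, symmetric spectrum, low-order terms only."""
    n = len(spectrum)
    return [sum(s * math.cos(2.0 * math.pi * k * q / n)
                for k, s in enumerate(spectrum)) / n
            for q in range(order + 1)]


def convert_cepstrum(cepstrum, f1, f2, n_points=256, pad_value=None):
    """Convert F1 [Hz] synthesis cepstrum parameters to F2 [Hz] ones."""
    if len(cepstrum) - 1 >= n_points // 2:
        raise ValueError("this sketch assumes m < n_points / 2")
    spectrum = cepstrum_to_log_spectrum(cepstrum, n_points)
    n_new = round(n_points * f2 / f1)
    if pad_value is None:
        pad_value = min(spectrum)      # assumption: any "small" value would do
    resized = change_data_number(spectrum, n_new, pad_value)
    return log_spectrum_to_cepstrum(resized, len(cepstrum) - 1)


# Example: halve the sampling frequency of one frame's parameters.
converted = convert_cepstrum([1.0, 0.5, -0.25, 0.1], 12000, 6000)
```

A quick sanity check on such a sketch is that converting with F1 equal to F2 returns the input parameters unchanged, up to rounding.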

[0019] Next, the operation of the synthesis parameter conversion apparatus configured as in FIG. 1 is described with reference to the flowchart of FIG. 2, taking as an example the case where the F1 [Hz] synthesis cepstrum parameters are m-th order cepstrum parameters C_0 to C_m (where m < 256) and are converted into F2 [Hz] synthesis cepstrum parameters.

[0020] The operation of the apparatus of FIG. 1 (specifically, of the spectrum data number changing unit 2) differs depending on the relative magnitude of F1 and F2. The case F1 > F2 is described first, with reference to the operation diagram of FIG. 3.

[0021] First, the Fourier transform processing unit 1 takes the m-th order cepstrum parameters C_0 to C_m, obtained by analyzing speech signal data A/D converted at the sampling frequency F1 [Hz], and arranges them as follows:

[0022]

(Equation 1)

[0023] The parameters are thus set in an array X (step S1). The Fourier transform processing unit 1 then applies a 256-point discrete Fourier transform to this array X to obtain a logarithmic power spectrum (step S2). A well-known fast Fourier transform (FFT) algorithm is used for this.
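Steps S1 and S2 might look like the sketch below. The exact layout of array X is given by Equation 1, which is not reproduced in this text, so the even-symmetric layout used here (C_1 to C_m mirrored at the end of the array, zeros in between) is an assumption, chosen because it makes the transform real-valued and symmetric. The function names are hypothetical, and a plain cosine sum stands in for the FFT to keep the example dependency-free.

```python
import math


def build_array_x(cepstrum, n_points=256):
    """Step S1 (assumed layout): even-symmetric array of n_points values."""
    m = len(cepstrum) - 1
    if m >= n_points // 2:
        raise ValueError("this sketch assumes m < n_points / 2")
    x = [0.0] * n_points
    x[0] = cepstrum[0]
    for q in range(1, m + 1):
        x[q] = cepstrum[q]
        x[n_points - q] = cepstrum[q]  # mirrored copy
    return x


def log_power_spectrum(x):
    """Step S2: DFT of a real, even-symmetric array (the result is real)."""
    n = len(x)
    return [sum(v * math.cos(2.0 * math.pi * k * t / n) for t, v in enumerate(x))
            for k in range(n)]


# Example with a 2nd-order parameter set C_0, C_1, C_2.
x = build_array_x([1.0, 0.5, 0.25])
spectrum = log_power_spectrum(x)
```

With this layout, `spectrum[k]` and `spectrum[256 - k]` are equal, which corresponds to the spectrum being folded back about the Nyquist point at index 128.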

[0024] When the Fourier transform processing unit 1 applies the Fourier transform to the array X, the m-th order cepstrum parameters shown, for example, in FIG. 3(a) yield, as shown in FIG. 3(b), a logarithmic power spectrum covering the range from 0 [Hz] to the Nyquist frequency F1/2 [Hz], in a form folded back on the high-frequency side. Once the Fourier transform processing unit 1 has obtained the logarithmic power spectrum, the spectrum data number changing unit 2 is activated.

[0025] The spectrum data number changing unit 2 first compares F1 and F2 (step S3). When F1 > F2, as in this example, it processes the logarithmic power spectrum, that is, the folded logarithmic power spectrum shown in FIG. 3(b), as follows.

[0026] Of the folded logarithmic power spectrum shown in FIG. 3(b), the spectrum data number changing unit 2 discards the data corresponding to frequencies above F2/2 [Hz] (the hatched data in the central part of the figure), reducing the number of data points to 256 × F2/F1 as shown in FIG. 3(c) (step S4). The spectrum data of FIG. 3(c), with its reduced number of points, has the form of a logarithmic power spectrum over the frequency range 0 [Hz] to F2/2 [Hz] folded back on the high-frequency side. It is almost identical to the logarithmic power spectrum that would be obtained by A/D conversion at the sampling frequency F2, application of a window of 256 × F2/F1 points, and a discrete Fourier transform.
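Step S4 amounts to cutting the middle out of the folded array. The sketch below is one way to do it, assuming the spectrum is stored as a plain list whose centre index is the Nyquist point; the function name and the index arithmetic are illustrative choices, not taken from the patent.

```python
def truncate_folded_spectrum(spectrum, f1, f2):
    """Step S4: keep n * F2 / F1 points by discarding the central part."""
    n = len(spectrum)
    n_new = round(n * f2 / f1)
    if not 0 < n_new < n:
        raise ValueError("expected F1 > F2")
    half = n_new // 2
    tail = n_new - half - 1            # mirrored points kept at the end
    # Points 0..half cover 0 Hz to F2/2; the tail is their folded image.
    return spectrum[:half + 1] + spectrum[n - tail:]


# Example: a folded (symmetric) 256-point test pattern, 12 kHz -> 6 kHz.
folded = [float(min(k, 256 - k)) for k in range(256)]
reduced = truncate_folded_spectrum(folded, 12000, 6000)
```

The spacing between points is unchanged by this operation; only the highest represented frequency drops from F1/2 to F2/2, and the result stays symmetric about its new centre.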

[0027] Thus, when F1 > F2, the spectrum data number changing unit 2 removes from the logarithmic power spectrum of FIG. 3(b), obtained by the Fourier transform processing unit 1, the data corresponding to frequencies above F2/2 [Hz], reducing the number of data points to 256 × F2/F1 as shown in FIG. 3(c). Once this new logarithmic power spectrum with the reduced number of data points has been obtained, the inverse Fourier transform processing unit 3 is activated.

[0028] The inverse Fourier transform processing unit 3 applies an inverse discrete Fourier transform to the logarithmic power spectrum of FIG. 3(c), whose number of data points has been reduced to 256 × F2/F1 (step S6). This yields new cepstrum parameters, as shown in FIG. 3(d), equivalent to those obtained by sampling and analysis at the sampling frequency F2 [Hz], namely the F2 [Hz] synthesis cepstrum parameters. Next, the operation for the case F1 < F2 is described with reference to the operation diagram of FIG. 4.

[0029] The operation for F1 < F2 differs from that for F1 > F2 in that the spectrum data number changing unit 2 increases the number of data points. The operation of the Fourier transform processing unit 1 (steps S1 and S2) and of the inverse Fourier transform processing unit 3 (step S6) is the same as for F1 > F2. Consequently, if the F1 synthesis cepstrum parameters shown in FIG. 4(a) are identical to the cepstrum parameters of FIG. 3(a), the logarithmic power spectrum obtained by the Fourier transform processing unit 1, shown in FIG. 4(b), is also identical to the power spectrum of FIG. 3(b).

[0030] When the Fourier transform processing unit 1 has obtained the logarithmic power spectrum shown in FIG. 4(b), that is, the logarithmic power spectrum over the range 0 [Hz] to F1/2 [Hz] folded back on the high-frequency side, the spectrum data number changing unit 2 compares F1 and F2 (step S3).

[0031] When F1 < F2, as in this example, the spectrum data number changing unit 2 adds data on the high-frequency side above F1/2 [Hz] of the folded logarithmic power spectrum shown in FIG. 4(b), for example data with a constant value that, as a logarithmic power, is small compared with the low-frequency portion. This increases the number of data points to 256 × F2/F1 as shown in FIG. 4(c) (step S5). The spectrum data of FIG. 4(c), with its increased number of points, has the form of a logarithmic power spectrum over the frequency range 0 [Hz] to F2/2 [Hz] folded back on the high-frequency side. It is equivalent to the logarithmic power spectrum that would be obtained by passing the speech through a steep low-pass filter of frequency F1 [Hz], A/D converting it at the sampling frequency F2, applying a window of 256 × F2/F1 points, and performing a discrete Fourier transform.
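Step S5 is the mirror image of the truncation: new points are inserted at the centre of the folded array. In the sketch below the function name is hypothetical, the array length is assumed to be even, and duplicating the old Nyquist point on both sides of the inserted block is a choice made here to keep the result symmetric; the description does not spell out this detail.

```python
def pad_folded_spectrum(spectrum, f1, f2, pad_value):
    """Step S5: grow to n * F2 / F1 points by inserting pad_value centrally."""
    n = len(spectrum)
    if n % 2:
        raise ValueError("this sketch assumes an even number of points")
    n_new = round(n * f2 / f1)
    if n_new <= n:
        raise ValueError("expected F1 < F2")
    half = n // 2                      # index of the old Nyquist point
    # Old points up to F1/2, the new high band, then the folded image.
    return spectrum[:half + 1] + [pad_value] * (n_new - n - 1) + spectrum[half:]


# Example: a folded 256-point test pattern, 6 kHz -> 12 kHz,
# padded with a log power well below the low-frequency values.
folded = [10.0 - 0.05 * min(k, 256 - k) for k in range(256)]
widened = pad_folded_spectrum(folded, 6000, 12000, pad_value=-5.0)
```

How small `pad_value` should be is left open by the description beyond "small compared with the low-frequency portion"; the value used in the example is arbitrary.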

[0032] Thus, when F1 < F2, the spectrum data number changing unit 2 appends data with values smaller than those of the low-frequency portion to the high-frequency side above F1/2 [Hz] of the logarithmic power spectrum of FIG. 4(b), obtained by the Fourier transform processing unit 1, increasing the number of data points to 256 × F2/F1 as shown in FIG. 4(c).

[0033] The inverse Fourier transform processing unit 3 then applies an inverse discrete Fourier transform to the logarithmic power spectrum of FIG. 4(c) with its increased number of data points, yielding F2 [Hz] synthesis cepstrum parameters, as shown in FIG. 4(d), equivalent to those obtained by sampling and analysis at the sampling frequency F2 [Hz]. An application of the above synthesis parameter (cepstrum parameter) conversion to analysis-synthesis is described next with reference to FIGS. 5(a) and 5(b), which show block configurations of speech synthesizers.

[0034] The speech synthesizer of FIG. 5(a) has a well-known configuration comprising a storage unit 11, a sound source generation unit 12, a speech synthesis operation unit 13, a 12 kHz D/A conversion unit 14, and a low-pass filter 15.

[0035] In the storage unit 11 of this speech synthesizer, cepstrum parameters of, for example, 20th order (m = 20), obtained by sampling at 12 kHz and analyzing speech in which sentences or words are read aloud (12 kHz synthesis cepstrum parameters), are held in a cepstrum parameter storage unit 111. The voiced/unvoiced information to be synthesized is held in a voiced/unvoiced information storage unit 112, and the pitch information is held in a pitch information storage unit 113. The sound source generation unit 12 generates sound source data from the contents of the voiced/unvoiced information storage unit 112 and the pitch information storage unit 113. From this sound source data and the contents of the cepstrum parameter storage unit 111, the speech synthesis operation unit (LMA filter operation unit) 13, which implements an LMA filter, generates speech signal data. The D/A conversion unit 14 performs 12 kHz D/A conversion on this speech signal data, and the result is passed through the low-pass filter 15. In this way, speech sampled at 12 kHz is synthesized.

[0036] The speech synthesizer of FIG. 5(b) has the same configuration as that of FIG. 5(a) and comprises a storage unit 21, a sound source generation unit 22, a speech synthesis operation unit 23, a D/A conversion unit 24, and a low-pass filter 25. In the synthesizer of FIG. 5(b), however, the speech synthesis operation unit (LMA filter operation unit) 23 is slow, so synthesizing a 12 kHz speech signal would take far too long to be practical. It therefore synthesizes a 6 kHz speech signal, which requires only half as many operations per unit time. For this reason, unlike the D/A conversion unit 14 of FIG. 5(a), it uses a D/A conversion unit 24 that performs 6 kHz D/A conversion.

[0037] When the speech synthesizer of FIG. 5(b) is to synthesize the same content as the speech held in the storage unit 11 shown in FIG. 5(a), the synthesis parameter conversion apparatus shown in FIG. 1 can be used. That is, F1 and F2 in the flowchart (algorithm) of FIG. 2 are set to F1 = 12 [kHz] and F2 = 6 [kHz].

[0038] With these settings, the synthesis parameter conversion apparatus converts the cepstrum parameters analyzed from speech data sampled at 12 kHz into cepstrum parameters equivalent to those obtained by sampling and analysis at a sampling frequency of 6 kHz, and these are held in the cepstrum parameter storage unit 211 of the speech synthesizer of FIG. 5(b) for use. An embodiment of the present invention has been described above, but the present invention is not limited to this embodiment. For example, although a 256-point Fourier transform was used in the embodiment, neither the number of data points nor the Fourier transform algorithm is limited in any way.
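For the 12 kHz to 6 kHz example with the 256-point transform of the embodiment, the size of the reduced spectrum follows from the 256 × F2/F1 rule given in the description. The symbols N_1, N_2 and Δf are introduced here only for illustration:

```latex
\begin{aligned}
N_1 &= 256, \qquad
N_2 = N_1 \cdot \frac{F_2}{F_1} = 256 \cdot \frac{6\ \text{kHz}}{12\ \text{kHz}} = 128, \\
\Delta f &= \frac{F_1}{N_1} = \frac{12000\ \text{Hz}}{256} = 46.875\ \text{Hz}
          = \frac{F_2}{N_2} = \frac{6000\ \text{Hz}}{128}.
\end{aligned}
```

The spacing between spectral points is therefore unchanged by the conversion; only the upper limit of the spectrum moves, from F1/2 = 6 kHz down to F2/2 = 3 kHz.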

[0039] In the embodiment, when F1 < F2, the spectrum data number changing unit 2 increases the number of data points by adding, at the highest-frequency side (the center of the spectrum data) as shown in FIG. 4(c), data with a constant value that, as a logarithmic power, is small compared with the low-frequency portion. As long as the synthesized sound is not adversely affected, however, the added portion of the spectrum may instead be given a slope (FIG. 6(a)) or a peak (FIG. 6(b)), as illustrated in FIG. 6.
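The sloped variant of FIG. 6(a) could be sketched as a linear ramp from the old Nyquist value down to a floor at the new centre. The description does not specify the shape of the slope, so the linear ramp, the function name, the `floor_value` parameter, and the assumption of even array lengths are all illustrative choices.

```python
def pad_with_slope(spectrum, n_new, floor_value):
    """Insert a linear ramp (and its mirror image) at the centre of a folded
    spectrum, falling from the old Nyquist value to floor_value."""
    n = len(spectrum)
    if n % 2 or n_new % 2 or n_new <= n:
        raise ValueError("this sketch assumes even sizes with n_new > n")
    half = n // 2
    gap = n_new - n - 1                # number of points to insert
    rising = (gap + 1) // 2            # points from old Nyquist up to the new centre
    start = spectrum[half]
    ramp = [start + (floor_value - start) * (i + 1) / rising for i in range(rising)]
    mirror = ramp[:gap - rising][::-1]  # folded image, excluding the centre point
    return spectrum[:half + 1] + ramp + mirror + spectrum[half:]


# Example: a folded 256-point test pattern widened to 512 points.
folded = [float(256 - min(k, 256 - k)) for k in range(256)]
sloped = pad_with_slope(folded, 512, floor_value=0.0)
```

Compared with a constant fill, a ramp avoids an abrupt step at F1/2 in the new spectrum; whether that is audibly preferable is not stated in the description.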

[0040] The application shown in FIG. 5 is an example of speech analysis-synthesis, but the invention can also be used to convert speech segments in rule-based speech synthesis systems that use cepstrum parameters as speech segments. In short, the present invention can be practiced with various modifications without departing from its gist.

[0041]

[Effects of the Invention] As described above, according to the present invention, cepstrum parameters obtained by analyzing speech signal data are converted frame by frame into a power spectrum by a discrete Fourier transform; the number of data points in this power spectrum is increased or decreased by discarding data from its highest-frequency side or appending data to its highest-frequency side; and the resulting power spectrum is subjected to an inverse discrete Fourier transform. Cepstrum parameters equivalent to those obtained by sampling and analysis at a different frequency can therefore be obtained.

[0042] Therefore, according to the present invention, even when the number of operations per unit time must be reduced, for example because of the inherent processing speed of the synthesizer's processing system, and the sampling frequency of the synthesized speech signal is to be lowered, cepstrum parameters equivalent to those obtained by sampling and analysis at a different (target) frequency can easily be derived from cepstrum parameters that were A/D converted and analyzed once and from which the required frames were extracted. The effort of redoing the sampling, decimation, low-pass filtering, analysis, and segment extraction is eliminated, and because the conversion starts from the cepstrum parameters created originally, nothing is wasted.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a synthesis parameter (cepstrum parameter) conversion apparatus according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the procedure of the synthesis parameter (cepstrum parameter) conversion process performed by the apparatus of FIG. 1 to convert the sampling frequency of synthesized speech.

FIG. 3 is a diagram explaining the operation when F2 [Hz] synthesis cepstrum parameters of a lower frequency are obtained from F1 [Hz] synthesis cepstrum parameters.

FIG. 4 is a diagram explaining the operation when F2 [Hz] synthesis cepstrum parameters of a higher frequency are obtained from F1 [Hz] synthesis cepstrum parameters.

FIG. 5 shows an example of application to analysis-synthesis: FIG. 5(a) is a block diagram of a speech synthesizer that synthesizes speech at a sampling frequency of 12 kHz, and FIG. 5(b) is a block diagram of a speech synthesizer that synthesizes speech at a sampling frequency of 6 kHz.

FIG. 6 is a diagram showing modifications of the data addition scheme shown in FIG. 4(c).

[Explanation of symbols]

1: Fourier transform processing unit; 2: spectrum data number changing unit; 3: inverse Fourier transform processing unit; 13, 23: speech synthesis operation unit; 14: 12 kHz D/A conversion unit; 24: 6 kHz D/A conversion unit; 111: 12 kHz synthesis cepstrum parameter storage unit; 211: 6 kHz synthesis cepstrum parameter storage unit.

Claims

(57) [Claims]

1. A synthesis parameter conversion method characterized in that a first cepstrum parameter, obtained by analyzing speech signal data sampled at a first frequency, is converted into a power spectrum by a discrete Fourier transform for each frame, and a second cepstrum parameter for speech synthesis at a second frequency lower than the first frequency is obtained by applying an inverse discrete Fourier transform to the power spectrum whose number of data has been reduced by truncating data from the highest-frequency side of the power spectrum.

2. A synthesis parameter conversion method characterized in that a first cepstrum parameter, obtained by analyzing speech signal data sampled at a first frequency, is converted into a power spectrum by a discrete Fourier transform for each frame, and a second cepstrum parameter for speech synthesis at a second frequency higher than the first frequency is obtained by applying an inverse discrete Fourier transform to the power spectrum whose number of data has been increased by adding data to the highest-frequency side of the power spectrum.

3. A synthesis parameter conversion method for converting a first cepstrum parameter, obtained by analyzing speech signal data sampled at a first frequency, into a second cepstrum parameter for speech synthesis at a second frequency different from the first frequency, the method comprising: converting the first cepstrum parameter into a power spectrum by a discrete Fourier transform for each frame; obtaining a new power spectrum with an increased or decreased number of data by truncating data from the highest-frequency side of the power spectrum if the first frequency is higher than the second frequency, or by adding data to the highest-frequency side of the power spectrum if the first frequency is lower than the second frequency; and obtaining the second cepstrum parameter by applying an inverse discrete Fourier transform to the new power spectrum.

4. The synthesis parameter conversion method according to claim 2, wherein the power value of the data added to the highest-frequency side of the power spectrum is smaller than the power value of the low-frequency part of the power spectrum.

5. A synthesis parameter conversion device comprising: Fourier transform processing means for converting a first cepstrum parameter, obtained by analyzing speech signal data sampled at a first frequency, into a power spectrum by a discrete Fourier transform for each frame; spectrum data number changing means for reducing the number of data of the power spectrum by truncating data from the highest-frequency side of the power spectrum produced by the Fourier transform processing means; and inverse Fourier transform processing means for applying an inverse discrete Fourier transform to the power spectrum whose number of data has been reduced by the spectrum data number changing means, thereby obtaining a second cepstrum parameter for speech synthesis at a second frequency lower than the first frequency.

6. A synthesis parameter conversion device comprising: Fourier transform processing means for converting a first cepstrum parameter, obtained by analyzing speech signal data sampled at a first frequency, into a power spectrum by a discrete Fourier transform for each frame; spectrum data number changing means for increasing the number of data of the power spectrum by adding data to the highest-frequency side of the power spectrum produced by the Fourier transform processing means; and inverse Fourier transform processing means for applying an inverse discrete Fourier transform to the power spectrum whose number of data has been increased by the spectrum data number changing means, thereby obtaining a second cepstrum parameter for speech synthesis at a second frequency higher than the first frequency.

7. A synthesis parameter conversion device for converting a first cepstrum parameter, obtained by analyzing speech signal data sampled at a first frequency, into a second cepstrum parameter for speech synthesis at a second frequency different from the first frequency, the device comprising: Fourier transform processing means for converting the first cepstrum parameter into a power spectrum by a discrete Fourier transform for each frame; spectrum data number changing means for increasing or decreasing the number of data of the power spectrum by truncating data from the highest-frequency side of the power spectrum produced by the Fourier transform processing means if the first frequency is higher than the second frequency, or by adding data to the highest-frequency side of that power spectrum if the first frequency is lower than the second frequency; and inverse Fourier transform processing means for applying an inverse discrete Fourier transform to the power spectrum whose number of data has been increased or decreased by the spectrum data number changing means, thereby obtaining the second cepstrum parameter.

8. The synthesis parameter conversion device according to claim 6 or 7, wherein the power value of the data added to the highest-frequency side of the power spectrum by the spectrum data number changing means is smaller than the power value of the low-frequency part of the power spectrum.
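
For illustration only, the conversion recited in the claims above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation disclosed in the specification: the function names, the default of 64 spectrum intervals (`n_bins`), the use of the spectrum minimum as the default padding value, and the treatment of the cepstrum as a real, symmetric sequence (so that the discrete Fourier transform reduces to a cosine sum) are all hypothetical choices made for this example.

```python
import math


def cepstrum_to_log_spectrum(cep, n_bins):
    """DFT of a real, symmetric cepstrum, evaluated at n_bins + 1 points
    from 0 Hz up to half the sampling frequency (assumes len(cep) <= n_bins)."""
    return [
        cep[0] + 2.0 * sum(cep[m] * math.cos(math.pi * m * k / n_bins)
                           for m in range(1, len(cep)))
        for k in range(n_bins + 1)
    ]


def log_spectrum_to_cepstrum(spec, order):
    """Inverse DFT of a one-sided spectrum of len(spec) points, returning
    cepstrum coefficients 0..order."""
    n_bins = len(spec) - 1
    cep = []
    for m in range(order + 1):
        acc = spec[0] + (-1) ** m * spec[n_bins]
        acc += 2.0 * sum(spec[k] * math.cos(math.pi * m * k / n_bins)
                         for k in range(1, n_bins))
        cep.append(acc / (2.0 * n_bins))
    return cep


def convert_cepstrum(cep, f1, f2, n_bins=64, pad_value=None):
    """Convert cepstrum parameters for f1 [Hz] synthesis into parameters
    for f2 [Hz] synthesis by changing the number of spectrum data."""
    spec = cepstrum_to_log_spectrum(cep, n_bins)
    new_bins = round(n_bins * f2 / f1)
    if f2 < f1:
        # Discard the data above f2/2 (highest-frequency side).
        spec = spec[:new_bins + 1]
    elif f2 > f1:
        # Add small constant data above f1/2. Using the spectrum minimum
        # as the default is an assumption of this sketch.
        if pad_value is None:
            pad_value = min(spec)
        spec = spec + [pad_value] * (new_bins - n_bins)
    return log_spectrum_to_cepstrum(spec, len(cep) - 1)


# Example: hypothetical 12 kHz parameters converted for 6 kHz synthesis.
cep_12k = [1.0, 0.5, -0.2, 0.1]
cep_6k = convert_cepstrum(cep_12k, 12000, 6000)
print(cep_6k)
```

Because the number of spectrum points is scaled by F2/F1, each remaining point keeps its original frequency in hertz, which is why the second cepstrum parameter can be used directly at the second sampling frequency.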