JPH10187180A

JPH10187180A - Musical sound generating device

Info

Publication number: JPH10187180A
Application number: JP8356123A
Authority: JP
Inventors: Goro Sakata; 吾朗坂田
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 1996-12-25
Filing date: 1996-12-25
Publication date: 1998-07-14

Abstract

PROBLEM TO BE SOLVED: To realize a musical sound generating device that can form a voice-synthesized human voice into a musical tone as a natural singing voice. SOLUTION: Partial autocorrelation (PARCOR) coefficients K1 to Kn generated by a PARCOR analysis system 2 are stored in a PARCOR coefficient storage unit 4, and the residual waveform is stored in a residual waveform data memory 3. A control circuit reads the corresponding PARCOR coefficients K1 to Kn from the PARCOR coefficient storage unit 4 according to performance data, and a pitch conversion means 5 pitch-converts the residual waveform read from the residual waveform data memory 3 according to the performance data. A PARCOR synthesis system 6 then PARCOR-synthesizes the pitch-converted residual waveform data according to the PARCOR coefficients K1 to Kn and generates a voice with the desired pitch. The voice-synthesized human voice can thus be formed into a musical tone as a natural singing voice.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001] [Technical Field of the Invention] The present invention relates to a speech synthesis device, and more particularly to a musical sound generating device capable of generating a natural singing voice.

[0002] [Prior Art] Channel vocoders, linear prediction, and a technique called PARCOR have long been known as methods of synthesizing a human voice from feature parameters extracted by speech analysis. These speech synthesis techniques focus on how to convert the analyzed speech into as little information as possible, that is, on analyzing the speech, converting it into feature parameters, and compressing the amount of information by removing redundant components unrelated to the semantic content of the words. They were not conceived for high-quality speech synthesis or for applying the synthesized human voice to musical tone formation.

[0003] Among these, the channel vocoder has a simple structure and is suited to real-time analysis and synthesis, so it was applied to musical sound generating devices that synthesize tones from the power spectrum envelope of speech extracted by a filter bank. However, the channel vocoder could not achieve high-quality speech synthesis, owing to the limit on the number of band-pass filter stages making up the filter bank and to problems such as its inability to synthesize consonants, and it eventually fell out of use.

[0004] [Problems to be Solved by the Invention] Meanwhile, a conventional musical sound generating device of the waveform-memory readout type stores a sampled human voice in a waveform memory. Reading the waveform out at the pitch at which it was sampled reproduces a high-quality human voice in the simplest possible way. If the waveform is read out at a pitch different from the sampling pitch, however, the formant frequencies of the voice shift in proportion to the amount of pitch conversion, so a natural singing voice cannot be generated.
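
The formant shift described above can be illustrated with a small calculation. The sketch below assumes an equal-tempered pitch ratio of 2^(semitones/12); the function name and the 700 Hz example value are illustrative and do not come from the patent.

```python
def resampled_formant_hz(formant_hz, semitones):
    """Formant frequency after a sampled voice is read out `semitones`
    away from the pitch at which it was recorded.

    Plain waveform-memory readout scales every frequency in the signal
    by the same ratio, so formants move together with the pitch.
    """
    ratio = 2.0 ** (semitones / 12.0)  # assumption: equal-tempered pitch ratio
    return formant_hz * ratio


if __name__ == "__main__":
    # A formant near 700 Hz, read out one octave higher.
    print(resampled_formant_hz(700.0, 12))
```

Because the spectral envelope moves with the pitch, the vowel quality changes; the invention avoids this by pitch-converting only the residual and keeping the PARCOR coefficients (the envelope) unchanged.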

[0005] The present invention has been made in view of these circumstances, and its object is to provide a musical sound generating device that can form a voice-synthesized human voice into a musical tone as a natural singing voice.

[0006] [Means for Solving the Problems] To achieve the above object, the invention according to claim 1 comprises: speech analysis means for performing PARCOR analysis on a speech signal sampled in time series to generate a PARCOR coefficient group and a residual waveform; storage means for storing the PARCOR coefficient group and the residual waveform generated by the speech analysis means; coefficient readout means for reading the corresponding PARCOR coefficient group from the storage means in accordance with performance information; pitch conversion means for pitch-converting the residual waveform stored in the storage means in accordance with the performance information; and speech synthesis means for PARCOR-synthesizing the residual waveform pitch-converted by the pitch conversion means in accordance with the PARCOR coefficient group read by the coefficient readout means, thereby generating speech of a desired pitch.

[0007] According to the invention of claim 2, which depends on claim 1, the storage means stores a continuous residual waveform formed by adding, in frame order, the residual waveforms output frame by frame from the speech analysis means.

[0008] According to the invention of claim 3, which depends on claim 1, the pitch conversion means has a plurality of readout channels that operate in a time-division manner at fixed periods, multiplies each residual waveform read from the storage means on each readout channel by a window function, and successively adds the results to generate a residual waveform of the desired pitch.

[0009] Further, in the invention of claim 4, which depends on claim 1, the pitch conversion means has a plurality of readout channels that operate in a time-division manner and, on each readout channel at every period corresponding to the pitch designated by the performance information, synchronizes the PARCOR coefficient group read by the coefficient readout means with the residual waveform associated with those PARCOR coefficients.

[0010] In the invention of claim 5, which depends on claim 1, the speech synthesis means comprises noise generating means for generating a noise signal, and weighting means for weighting and adding the noise signal generated by the noise generating means and the pitch-converted residual waveform generated by the pitch conversion means, in accordance with one of the coefficients in the PARCOR coefficient group output from the coefficient readout means. The output of the weighting means is PARCOR-synthesized in accordance with the PARCOR coefficients read by the coefficient readout means.

[0011] In the present invention, the PARCOR coefficient group and the residual waveform obtained by PARCOR analysis in the speech analysis means are stored in the storage means. The coefficient readout means reads the corresponding PARCOR coefficient group from the storage means in accordance with performance information, while the pitch conversion means pitch-converts the residual waveform stored in the storage means in accordance with the performance information. The speech synthesis means then PARCOR-synthesizes the pitch-converted residual waveform in accordance with the PARCOR coefficient group read by the coefficient readout means and generates speech of the desired pitch. This makes it possible to form a voice-synthesized human voice into a musical tone as a natural singing voice.

[0012] [Embodiments of the Invention] The musical sound generating device according to the present invention can be applied not only to electronic musical instruments but also to speech synthesis devices such as those that give spoken guidance in a human voice. A musical sound generating device embodying the present invention is described below as an example, with reference to the drawings.

[0013] (1) Overall configuration. FIG. 1 is a functional block diagram for explaining the principle of one embodiment of the present invention. In this figure, 2 is a PARCOR analysis system. The PARCOR analysis system 2 successively calculates the autocorrelation of the linear prediction errors between the discrete speech data xi input at terminal IN and generates PARCOR coefficients K1 to Kn and a residual wave. The residual wave indicates whether the speech data in the analysis window is unvoiced or voiced: for unvoiced sound it is white noise, and for voiced sound it is a pulse train that forms the pitch period. It is stored in the residual waveform data memory 3 described later.

[0014] 4 is a PARCOR coefficient storage unit that stores, one analysis frame after another, the PARCOR coefficients K1 to Kn output from the PARCOR analysis system 2. An analysis frame here corresponds to the speech analysis period defined by the window function described later. 5 is pitch conversion means that pitch-converts the residual waveform data read from the residual waveform data memory 3 in accordance with pitch data designating the reproduction pitch, such as a frequency number; its details are given later. 6 is a PARCOR synthesis system that synthesizes speech by the reverse of the process in the PARCOR analysis system 2. It synthesizes discrete speech data xi from the residual waveform data output by the pitch conversion means 5 and the PARCOR coefficients K1 to Kn read from the PARCOR coefficient storage unit 4, and outputs the result at terminal OUT.

[0015] (2) Configuration of the main parts. The configuration of each of the main parts 2 to 6 is described next with reference to FIGS. 1 to 5. Configuration of the PARCOR analysis system 2: in the PARCOR analysis system 2 shown in FIG. 1, 2a is a delay circuit that delays the input signal by at least one sampling period. The delay circuit 2a is not limited to a one-sample delay; when the system sampling frequency Fs exceeds 30 kHz, a two-sample delay is appropriate. 2b is a correlator that weights the N samples (speech data) xi by multiplying them by the window function W(n) and then calculates the autocorrelation value.

[0016] The window function W(n) used for weighting in the correlator 2b is the Hanning window function given by equation (1).

[0017] (Equation 1)

[0018] As shown in FIG. 3, this Hanning window function W(n) has an analysis frame about 30 ms wide, and the analysis proceeds at a frame period of 20 ms. In the correlator 2b, the PARCOR coefficients K1 to Kn are calculated from the autocorrelation function given by equation (2).
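
Equation (1) is not reproduced in this text, so the sketch below uses the standard symmetric Hann window, w[n] = 0.5 - 0.5 cos(2πn/(N-1)), which may differ from the patent's exact indexing. It also shows how the 30 ms frame width and 20 ms frame period translate into sample counts. The function names and the 10 kHz sampling rate in the example are assumptions.

```python
import math


def hann_window(length):
    """Standard symmetric Hann window (assumed form of the Hanning window)."""
    if length < 2:
        return [1.0] * length
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def frame_layout(num_samples, sample_rate, width_ms=30.0, period_ms=20.0):
    """Return (frame width, frame period, frame start indices) in samples.

    With a 30 ms width and a 20 ms period, consecutive analysis frames
    overlap by about 10 ms.
    """
    width = round(sample_rate * width_ms / 1000.0)
    hop = round(sample_rate * period_ms / 1000.0)
    starts = list(range(0, max(num_samples - width, 0) + 1, hop))
    return width, hop, starts


if __name__ == "__main__":
    width, hop, starts = frame_layout(1000, 10000)
    print(width, hop, starts)
```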

[0019] (Equation 2)

[0020] This function is normalized so that the PARCOR coefficients K1 to Kn are not affected by the amplitude of the samples. A PARCOR coefficient is "1" when the correlation is perfect, "0" when there is no correlation, and "-1" when the signals are in exactly opposite phase.
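
Equation (2) is not reproduced in this text. As a rough sketch of a normalized correlation with the stated properties (+1 for perfect correlation, 0 for none, -1 for opposite phase, independent of amplitude), the code below divides the cross-correlation of two error sequences by the geometric mean of their energies. This particular normalization is an assumption and may not match the patent's equation; the function name is also illustrative.

```python
import math


def parcor_coefficient(forward, backward_delayed):
    """Normalized correlation between a forward error sequence and the
    one-sample-delayed backward error sequence.

    Assumption: geometric-mean normalization, so the result lies in
    [-1, 1] and does not depend on the amplitude of either input.
    """
    num = sum(f * b for f, b in zip(forward, backward_delayed))
    den = math.sqrt(sum(f * f for f in forward) *
                    sum(b * b for b in backward_delayed))
    return num / den if den > 0.0 else 0.0
```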

[0021] 2c is a coefficient multiplier that multiplies the one-sample-delayed speech data by the PARCOR coefficient K1, and 2d is a coefficient multiplier that multiplies the currently sampled speech data by the PARCOR coefficient K1. 2e is a subtractor that subtracts the output of the coefficient multiplier 2c from the currently sampled speech data, and 2f is a subtractor that subtracts the output of the coefficient multiplier 2d from the one-sample-delayed speech data.

[0022] The components 2a to 2f above constitute a lattice filter 2-1. Cascading n such stages, lattice filters 2-1 to 2-n, forms the PARCOR analysis system 2, which generates the PARCOR coefficients K1 to Kn representing the correlation of the linear prediction errors between the samples (speech data) xi.
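
The lattice structure described above can be sketched as follows. This is a minimal illustration that takes the coefficients as given (the correlator 2b that estimates them is left out) and uses a one-sample delay per stage; the function and variable names are illustrative, not from the patent.

```python
def lattice_analysis(samples, ks):
    """Run samples through cascaded analysis lattice stages and return
    the residual (forward prediction error of the last stage).

    Per stage, with coefficient k:
      forward_out  = forward_in - k * delayed_backward   (multiplier 2c, subtractor 2e)
      backward_out = delayed_backward - k * forward_in   (multiplier 2d, subtractor 2f)
    """
    delayed = [0.0] * len(ks)  # contents of each stage's delay circuit (2a)
    residual = []
    for x in samples:
        f = x  # forward signal entering stage 1
        b = x  # backward signal entering stage 1
        for m, k in enumerate(ks):
            b_prev = delayed[m]   # backward signal delayed by one sample
            delayed[m] = b        # store the current value for the next sample
            f, b = f - k * b_prev, b_prev - k * f
        residual.append(f)
    return residual


if __name__ == "__main__":
    # x[n] = 0.5 * x[n-1] is fully predicted by a single stage with K1 = 0.5,
    # so only the initial impulse remains in the residual.
    print(lattice_analysis([1.0, 0.5, 0.25], [0.5]))
```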

[0023] Configuration of the PARCOR synthesis system 6: like the PARCOR analysis system 2 described above, the PARCOR synthesis system 6 shown in FIG. 1 consists of n cascaded lattice filters 6-1 to 6-n. Each of these cascaded lattice filters 6-1 to 6-n consists of a delay circuit 6f, coefficient multipliers 6g and 6h, an adder 6i, and a subtractor 6j, and synthesizes speech from the supplied PARCOR coefficients K1 to Kn by the reverse of the PARCOR analysis process described above.
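
A minimal sketch of the inverse (synthesis) lattice follows, assuming a one-sample delay per stage and the same sign convention as the analysis description (analysis subtracts K times the delayed backward signal, so synthesis adds it back). Names are illustrative, not from the patent.

```python
def lattice_synthesis(residual, ks):
    """Reconstruct samples from a residual with cascaded synthesis stages.

    Stages are traversed from the last coefficient back to the first.
    Per stage, with coefficient k:
      forward_out  = forward_in + k * delayed_backward   (multiplier, adder 6i)
      backward_out = delayed_backward - k * forward_out  (multiplier, subtractor 6j)
    """
    n = len(ks)
    delayed = [0.0] * n  # delayed[m]: backward signal of stage m, one sample old
    out = []
    for e in residual:
        f = e
        new_b = [0.0] * n
        for m in range(n - 1, -1, -1):
            b_prev = delayed[m]
            f = f + ks[m] * b_prev                 # undo the analysis subtraction
            if m + 1 < n:
                new_b[m + 1] = b_prev - ks[m] * f  # backward signal for the next stage
        if n:
            new_b[0] = f  # the output sample feeds the first delay circuit
        delayed = new_b
        out.append(f)
    return out


if __name__ == "__main__":
    # An impulse residual with K1 = 0.5 yields a decaying response.
    print(lattice_synthesis([1.0, 0.0, 0.0], [0.5]))
```

For the filter to stay stable, each coefficient should have magnitude below 1, which the normalized correlation in the analysis stage is meant to guarantee.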

[0024] If the delay in the delay circuit 6f is made equal to the sampling delay of the PARCOR analysis system 2, the synthesized speech has the same formants as the analyzed speech signal. Accordingly, to alter the formants deliberately as a special effect during synthesis, it suffices to use a sampling delay different from the one used during analysis.

[0025] Configuration of the PARCOR coefficient storage unit 4: on instruction from a control unit (not shown), the PARCOR coefficients K1 to Kn output from the PARCOR analysis system 2 are stored frame by frame in the PARCOR coefficient storage unit 4, or read frame by frame from it. The frame storage format of the PARCOR coefficient storage unit 4 is described here with reference to FIG. 2. In this embodiment, the analyzed PARCOR coefficients K1 to Kn are stored as an array for each of the analysis frames described above.

[0026] In Japanese, the PARCOR coefficients K1 to Kn hardly change in vowel portions, whereas in consonant portions the harmonics change greatly and the PARCOR coefficients K1 to Kn change correspondingly. In vowel portions, therefore, as shown in FIG. 4, a stop flag "1" is set in the stop bit STB and the neighboring analysis frames with similar PARCOR coefficients K1 to Kn are deleted, which greatly reduces the amount of data.

[0027] That is, when a note-on is given as performance data, the feature parameters are read out in sequence and input to the PARCOR synthesis system 6 up to the frame whose stop flag is "1". When a frame whose stop flag is "1" becomes the readout target, frame-advancing readout is paused. When the next note-on occurs, the feature parameters are again read in sequence up to the next frame whose stop flag is set and are supplied to the PARCOR synthesis system 6 for speech synthesis. A mode (modification) in which PARCOR synthesis follows the performance data is described later.
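
The note-on driven readout can be sketched as below. The `Frame` class and `read_until_stop` function are hypothetical names used for illustration; the fields mirror the stop bit STB and the start address STA described in this section, and the sketch assumes that the frame with the stop flag set is itself read before readout pauses.

```python
class Frame:
    """One stored analysis frame (hypothetical container)."""

    def __init__(self, ks, stop, sta):
        self.ks = ks      # PARCOR coefficients K1..Kn
        self.stop = stop  # stop flag held in the stop bit STB
        self.sta = sta    # start address STA into the residual waveform memory


def read_until_stop(frames, position):
    """Read frames in order from `position`, pausing after the first frame
    whose stop flag is set. Returns (frames read, next position)."""
    read = []
    while position < len(frames):
        frame = frames[position]
        read.append(frame)
        position += 1
        if frame.stop:
            break  # hold here until the next note-on
    return read, position


if __name__ == "__main__":
    frames = [Frame([0.1], False, 0), Frame([0.2], True, 200),
              Frame([0.3], False, 400), Frame([0.4], True, 600)]
    pos = 0
    for note in range(2):  # two successive note-on events
        read, pos = read_until_stop(frames, pos)
        print(note, [f.sta for f in read])
```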

[0028] A start address STA is also attached to the PARCOR coefficients K1 to Kn stored as an array for each frame. The residual waveform data of the corresponding frame is read from the residual waveform data memory 3 by referring to this start address STA. In the example shown in FIG. 4 a start address STA is provided for every frame, but this is not essential for every frame; it may instead be attached once every several frames.

[0029] Configuration of the residual waveform data memory 3: as shown in FIG. 3, the residual waveform data generated in the PARCOR analysis system 2 is output frame by frame, multiplied by the Hanning window function W(n). The residual waveform data memory 3 adds the residual waveform data output in this way from the PARCOR analysis system 2, frame after frame, and stores the result as one continuous waveform (see FIG. 3). With the data stored in this form, reading the residual waveform data from the data memory 3 unchanged and inputting it to the PARCOR synthesis system 6 together with the corresponding PARCOR coefficients K1 to Kn yields a synthesized sound with the same pitch as the original.
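
Adding the windowed frames into one continuous waveform is an overlap-add operation. The sketch below assumes each frame has already been multiplied by the window and that all frames share the same length; the function name and the `hop` parameter (frame period in samples) are illustrative.

```python
def overlap_add(frames, hop):
    """Sum equal-length, already-windowed frames into one continuous
    waveform, placing frame i at sample offset i * hop."""
    if not frames:
        return []
    width = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + width)
    for i, frame in enumerate(frames):
        start = i * hop
        for n, value in enumerate(frame):
            out[start + n] += value  # overlapping regions accumulate
    return out


if __name__ == "__main__":
    # Two 3-sample frames with a 2-sample frame period overlap by one sample.
    print(overlap_add([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]], 2))
```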

[0030] Configuration of the pitch conversion means 5: a known simple method of pitch-converting the residual waveform data stored in the residual waveform data memory 3 is interpolated readout. The read address of the memory 3 is split into an integer part and a fractional part, and the residual waveform data read at two adjacent integer addresses is interpolated using the fractional part as the interpolation coefficient.
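
Interpolated readout can be sketched as follows, assuming linear interpolation and non-negative addresses. The function names and the `ratio` parameter (address increment per output sample, i.e. the pitch ratio) are illustrative.

```python
def read_interpolated(memory, address):
    """Read `memory` at a fractional, non-negative address by linear
    interpolation between the two adjacent integer addresses."""
    i = int(address)     # integer part selects sample a
    alpha = address - i  # fractional part is the interpolation coefficient
    a = memory[i]
    b = memory[i + 1] if i + 1 < len(memory) else memory[i]
    return a + (b - a) * alpha  # same value as a*(1 - alpha) + b*alpha


def read_at_ratio(memory, start, ratio, count):
    """Read `count` samples from `start`, advancing the address by `ratio`
    per output sample (ratio > 1 raises the pitch, ratio < 1 lowers it)."""
    return [read_interpolated(memory, start + n * ratio) for n in range(count)]


if __name__ == "__main__":
    print(read_at_ratio([0.0, 10.0, 20.0, 30.0], 0.0, 1.5, 3))
```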

[0031] When pitch conversion is performed by this interpolated readout, however, the readout time changes with the reproduction pitch, so the PARCOR coefficients K1 to Kn used for synthesis and the residual waveform data being read drift apart in time. The present invention therefore performs pitch conversion on a plurality of time-division channels at fixed periods, which keeps the residual waveform data of the frame being reproduced synchronized with the PARCOR coefficients K1 to Kn. The residual waveform data read on each time-division channel is individually multiplied by a window function, and the results are successively added to generate residual waveform data of the desired pitch.

[0032] An example of pitch conversion using two time-division readout channels CH1 and CH2 is described here with reference to FIG. 4. In this example, readout channel CH1 first reads the corresponding residual waveform data RW1 from the residual waveform data memory 3 by interpolated readout, starting at the waveform read start address STA₁ (see FIG. 2) attached to the PARCOR coefficients K1 to Kn of the frame to be reproduced, and multiplies the residual waveform data RW1 by a window function.

[0033] In the next readout period, readout channel CH2 reads the corresponding residual waveform data RW2 from the residual waveform data memory 3 by interpolated readout, starting at the waveform read start address STA₂ corresponding to the PARCOR coefficients K1 to Kn of the frame to be reproduced, multiplies it by the window function, and adds it to the previously windowed residual waveform data RW1. Thereafter, in the same way, the residual waveform data RW3, RW4, ... read alternately by interpolated readout on channels CH1 and CH2 are each multiplied by the window function and added in turn. This keeps the PARCOR coefficients K1 to Kn used for synthesis synchronized with the residual waveform data regardless of the reproduction pitch.
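
The two-channel scheme can be sketched as below, under several assumptions that are not stated in the patent: each channel reads a segment twice as long as the fixed readout period, the window is a periodic Hann window (so that two alternating channels sum to unity gain), and the segments are placed at fixed output intervals regardless of the pitch ratio. All names are illustrative.

```python
import math


def read_interpolated(memory, address):
    """Linear-interpolated read, clamped at the end of the memory."""
    i = min(int(address), len(memory) - 1)
    alpha = address - int(address)
    a = memory[i]
    b = memory[min(i + 1, len(memory) - 1)]
    return a + (b - a) * alpha


def pitch_convert(memory, start_addresses, ratio, period):
    """Windowed overlap-add of pitch-shifted segments.

    A new segment starts every `period` output samples from the next frame's
    start address (STA) and is read at `ratio`. Segment g runs on channel
    g % 2, and at most two segments overlap at any time. Because segments
    start at fixed output times, the frame sequence stays aligned with the
    coefficient frames whatever the ratio.
    """
    seg_len = 2 * period
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / seg_len)
              for n in range(seg_len)]
    out = [0.0] * (period * (len(start_addresses) + 1))
    for g, sta in enumerate(start_addresses):
        base = g * period  # fixed output position, independent of ratio
        for n in range(seg_len):
            out[base + n] += window[n] * read_interpolated(memory, sta + n * ratio)
    return out
```

With a constant input, the overlapped region of the output is constant too, which is a quick check that the two windowed channels sum to unity gain.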

[0034] The functional configuration of the pitch conversion means 5, which performs pitch conversion by the process described above, is explained next with reference to FIG. 5. In FIG. 5, 5-1 and 5-2 are interpolators that read the corresponding residual waveform data from the residual waveform data memory 3 by interpolated readout, starting at the waveform read start address STA corresponding to the PARCOR coefficients K1 to Kn of the frame to be reproduced. Each consists of a subtractor 5a, a coefficient multiplier 5b, and an adder 5c, and the two interpolate alternately, in a time-division manner, at fixed periods.

[0035] That is, when residual waveform data a and b are read from the residual waveform data memory 3 at adjacent integer-part addresses of the waveform read start address STA, the interpolator 5-1 takes the fractional part of the address STA as the interpolation coefficient α and generates the interpolated residual waveform data a(1−α)+bα, while the interpolator 5-2 similarly generates the interpolated residual waveform data a(1−β)+bβ. 5-3 is an arithmetic unit that, like the interpolators 5-1 and 5-2, consists of a subtractor 5a, a coefficient multiplier 5b, and an adder 5c. It generates the pitch-converted output by interpolating the interpolated residual waveform data output from the interpolators 5-1 and 5-2 with the window function Wn.

[0036] Thus, according to this embodiment, the residual waveform data output from the PARCOR analysis system 2 is added frame after frame and stored in the residual waveform data memory 3 as one continuous waveform. Pitch conversion is performed on a plurality of time-division channels at fixed periods, which keeps the residual waveform data of the frame being reproduced synchronized with the PARCOR coefficients K1 to Kn. The residual waveform data read on each time-division channel is individually multiplied by a window function, and the results are successively added to generate residual waveform data of the desired pitch. A pitch-converted output corresponding to the desired pitch can therefore be generated, which makes it possible to form a voice-synthesized human voice into a musical tone as a natural singing voice.

[0037] Although this embodiment has been described using first-order (linear) interpolation, the invention is not limited to it. Higher-order interpolation using a well-known spline function or the like may be used to achieve higher-quality pitch conversion.

[0038] (3) Modification. A modification that PARCOR-synthesizes speech of the desired pitch while synchronizing the PARCOR coefficients K1 to Kn with the residual waveform data in accordance with performance data is described next with reference to FIG. 6. In this figure, components common to the embodiment described above carry the same reference numbers, and their description is omitted.

[0039] Synchronized reproduction that follows performance data requires both that the progression of the PARCOR coefficients K1 to Kn be controlled by the performance data and that the residual waveform be pitch-converted in accordance with the performance data. The modification therefore provides a control circuit 7 that designates frames to the PARCOR coefficient storage unit 4 in accordance with the performance data and instructs the pitch conversion means 5 to perform interpolated readout on a plurality of time-division channels at every period corresponding to the pitch data.

[0040] The PARCOR coefficients K1 to Kn of each frame stored in the PARCOR coefficient storage unit 4 carry the stop bit STB described above, so frame updating of the PARCOR coefficients K1 to Kn can be paused, mainly at vowel positions. This allows frames to be designated in accordance with the performance data. While the frames are being updated in sequence, the start address STA attached to the PARCOR coefficients K1 to Kn is also incremented at a fixed period; when frame updating is paused by the stop flag in the stop bit STB, the address increment is stopped as well.

[0041] The pitch conversion means 5 then reads repeatedly on the time-division channels, using the start address STA of the paused frame as the read start address and leaving the reproduction pitch unchanged. This keeps the PARCOR coefficients K1 to Kn used for synthesis synchronized with the residual waveform data while generating a pitch-converted output for the desired pitch in accordance with the performance data, so a natural singing voice that follows the performance data is obtained.

[0042] (4) Other examples. The PARCOR coefficients K1 to Kn used for synthesis and the residual waveform data do not have to be strictly synchronized, nor does the frame supply period of the PARCOR coefficients K1 to Kn have to match the period of the pitch conversion and window function exactly. In general, it is appropriate to set the period of the pitch conversion and window function longer than the frame supply period of the PARCOR coefficients K1 to Kn.

[0043] The reason the two need not coincide rests on the premise that the speech to be synthesized changes relatively smoothly and that the frequency components of the residual waveforms in adjacent frames are similar. Consequently, for a consonant where the amount of pitch conversion is large, the residual waveform changes abruptly, and the residual waveform is almost white noise, a mismatch between the PARCOR coefficients K1 to Kn and the residual waveform degrades the sound quality. In that case, the residual waveform can be inferred from the PARCOR coefficients K1 to Kn, and measures such as switching between the pitch conversion output and a white noise source, or varying the weighting between them, can be used together.

[0044] The configuration of such an arrangement is described with reference to FIG. 7. The configuration shown in FIG. 7 adds components 10 to 14 to the embodiment shown in FIG. 1. In this configuration, reference numeral 10 denotes a noise generator that generates white noise WN when an unvoiced sound is synthesized. Reference numeral 11 denotes a comparator that compares the PARCOR coefficient K1 with a predetermined constant (approximately 0.2 to 0.3) to determine whether the sound is voiced or unvoiced, and generates multiplication coefficients W1 and W2 according to the result.

[0045] That is, when the PARCOR coefficient K1 used for the synthesis corresponds to a voiced sound, the comparator 11 outputs the coefficient W1 as "1" and sets the coefficient W2 to "0". Conversely, when K1 corresponds to an unvoiced sound, the comparator 11 outputs the coefficient W2 as "1" and sets the coefficient W1 to "0".
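The decision made by the comparator 11 can be sketched as follows. This is a minimal illustration, not the circuit of FIG. 7. The function name `comparator`, the default threshold of 0.25 (chosen from the stated range of about 0.2 to 0.3), and the assumption that a K1 above the threshold means "voiced" are all hypothetical; the text does not state which side of the constant corresponds to a voiced sound.

```python
def comparator(k1, threshold=0.25, voiced_if_above=True):
    """Return the multiplication coefficients (W1, W2) for one frame.

    k1              : first PARCOR coefficient of the frame
    threshold       : predetermined constant (the text gives about 0.2 to 0.3)
    voiced_if_above : ASSUMPTION, the text does not say which side of the
                      threshold is treated as voiced
    """
    voiced = (k1 > threshold) if voiced_if_above else (k1 <= threshold)
    # Voiced: pass the pitch-converted residual (W1 = 1, W2 = 0).
    # Unvoiced: pass the white noise WN (W1 = 0, W2 = 1).
    return (1.0, 0.0) if voiced else (0.0, 1.0)
```

Under these assumptions, `comparator(0.9)` selects the residual path and `comparator(0.1)` selects the noise path.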

[0046] Reference numerals 12 and 13 denote coefficient multipliers. The coefficient multiplier 12 multiplies the pitch-converted residual waveform data by the coefficient W1 and outputs the result, and the coefficient multiplier 13 multiplies the white noise WN by the coefficient W2 and outputs the result. Reference numeral 14 denotes an adder that adds the outputs of the coefficient multipliers 12 and 13 and supplies the sum to the PARCOR synthesis system 6. Accordingly, the adder 14 supplies the pitch-converted residual waveform data to the PARCOR synthesis system 6 when a voiced sound (vowel) is synthesized, and supplies the white noise WN to the PARCOR synthesis system 6 when an unvoiced sound (consonant) is synthesized.
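The signal path through the noise generator 10, the coefficient multipliers 12 and 13, and the adder 14 amounts to a weighted sum per sample. The sketch below illustrates it under stated assumptions: the function names, the uniform noise distribution, and the fixed seed are hypothetical and are not taken from the text.

```python
import random


def white_noise(n, seed=0, amplitude=1.0):
    """Stand-in for noise generator 10: n samples of uniform white noise.

    The uniform distribution and the seed are assumptions for illustration.
    """
    rng = random.Random(seed)
    return [rng.uniform(-amplitude, amplitude) for _ in range(n)]


def mix_excitation(residual, noise, w1, w2):
    """Multipliers 12 and 13 followed by adder 14.

    Returns w1 * residual + w2 * noise, sample by sample. The result is the
    excitation that would be supplied to the PARCOR synthesis system 6.
    """
    if len(residual) != len(noise):
        raise ValueError("residual and noise must have the same length")
    return [w1 * r + w2 * w for r, w in zip(residual, noise)]
```

With (W1, W2) = (1, 0) the output equals the pitch-converted residual, and with (0, 1) it equals the white noise WN, matching the voiced and unvoiced cases described above.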

[0047] With the above configuration, when a consonant with a large pitch conversion amount and an abruptly changing residual waveform is synthesized, the white noise WN is supplied to the PARCOR synthesis system 6. Sound quality degradation caused by a mismatch between the PARCOR coefficients K1 to Kn and the residual waveform can therefore be prevented, and a natural human voice can be synthesized.

[0048] Further, in the above configuration, if the white noise WN is used when consonants are synthesized, the residual waveform data corresponding to consonants no longer needs to be stored in the residual waveform data memory 3, so the storage capacity of the memory 3 can be reduced. In the comparator 11 described above, the coefficients W1 and W2 may also be crossfaded in accordance with changes between voiced sound (vowel) and unvoiced sound (consonant), so that the transition from voiced to unvoiced sound, or from unvoiced to voiced sound, takes a more natural form.
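One way to picture the crossfade of W1 and W2 is a ramp in which the two coefficients always sum to 1. The sketch below assumes a linear ramp over a fixed number of steps; the text does not specify the shape or length of the crossfade, and the function name and parameters are hypothetical.

```python
def crossfade_weights(steps, to_unvoiced=True):
    """Return a list of (W1, W2) pairs for a voiced/unvoiced transition.

    steps       : number of points in the ramp (at least 2)
    to_unvoiced : True for voiced -> unvoiced, False for the reverse
    The linear shape is an ASSUMPTION; W1 + W2 is kept equal to 1.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    pairs = []
    for i in range(steps):
        w2 = i / (steps - 1)          # noise weight rises from 0 to 1
        w1 = 1.0 - w2                 # residual weight falls from 1 to 0
        pairs.append((w1, w2) if to_unvoiced else (w2, w1))
    return pairs
```

Each pair can be applied to the multipliers 12 and 13 in place of the hard 0/1 switching, so that the excitation moves gradually between the residual waveform and the white noise.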

[0049]

[Effects of the Invention] According to the present invention, the PARCOR coefficient group and the residual waveform obtained through PARCOR analysis by the voice analysis means are stored in the storage means. The coefficient reading means reads the corresponding PARCOR coefficient group from the storage means in accordance with the performance information, while the pitch conversion means pitch-converts the residual waveform stored in the storage means in accordance with the performance information. The voice synthesis means then PARCOR-synthesizes the pitch-converted residual waveform in accordance with the PARCOR coefficient group read by the coefficient reading means and generates a voice of the desired pitch. A synthesized human voice can therefore be formed into a musical tone as a natural singing voice.
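For orientation, the PARCOR synthesis step can be sketched as an all-pole lattice filter driven by the excitation (the pitch-converted residual or the white noise). This is a generic textbook lattice form, not the circuit of the PARCOR synthesis system 6. The sign convention used here, in which a single coefficient k gives y[n] = e[n] - k * y[n-1], is an assumption, as are the function name and the use of one fixed coefficient set for the whole input rather than per-frame updates.

```python
def parcor_synthesize(excitation, k):
    """All-pole lattice synthesis with reflection (PARCOR) coefficients k.

    excitation : list of excitation samples (residual or noise)
    k          : list [K1, ..., Kn], held constant over the input
    Sign convention is an ASSUMPTION; other texts flip the sign of k.
    """
    order = len(k)
    b = [0.0] * (order + 1)   # backward errors from the previous sample
    out = []
    for e in excitation:
        f = e                 # forward error entering the highest stage
        new_b = [0.0] * (order + 1)
        for m in range(order, 0, -1):
            f = f - k[m - 1] * b[m - 1]             # forward path, stage m
            new_b[m] = b[m - 1] + k[m - 1] * f      # backward path, stage m
        new_b[0] = f          # stage 0: backward error equals the output
        b = new_b
        out.append(f)
    return out
```

With all coefficients set to zero the filter passes the excitation through unchanged, and with the single coefficient 0.5 a unit impulse produces the decaying alternating response 1, -0.5, 0.25.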

[Brief description of the drawings]

FIG. 1 is a functional block diagram for explaining the principle of an embodiment according to the present invention.

FIG. 2 is a diagram for explaining how frames are stored in the PARCOR coefficient storage unit 4.

FIG. 3 is a diagram for explaining the residual waveform data generated in the PARCOR analysis system 2.

FIG. 4 is a diagram for explaining an example of pitch conversion in the pitch conversion means 5.

FIG. 5 is a block diagram showing the functional configuration of the pitch conversion means 5.

FIG. 6 is a block diagram showing the configuration of a modification.

FIG. 7 is a block diagram showing the configuration of another example.

[Explanation of symbols]

2 PARCOR analysis system (voice analysis means)
3 Residual waveform data memory (storage means)
4 PARCOR coefficient storage unit (storage means)
5 Pitch conversion means (pitch conversion means)
6 PARCOR synthesis system (voice synthesis means)
7 Control circuit (coefficient reading means)

Claims

[Claims]

1. A musical sound generating device comprising: voice analysis means for performing PARCOR analysis on a voice signal sampled in time series to generate a PARCOR coefficient group and a residual waveform; storage means for storing the PARCOR coefficient group and the residual waveform generated by the voice analysis means; coefficient reading means for reading the corresponding PARCOR coefficient group from the storage means in accordance with performance information; pitch conversion means for pitch-converting the residual waveform stored in the storage means in accordance with the performance information; and voice synthesis means for generating a voice of a desired pitch by PARCOR-synthesizing the residual waveform pitch-converted by the pitch conversion means in accordance with the PARCOR coefficient group read by the coefficient reading means.

2. The musical sound generating device according to claim 1, wherein the storage means stores a continuous residual waveform obtained by adding together, in frame order, the residual waveforms output from the voice analysis means for each frame.

3. The musical sound generating device according to claim 1, wherein the pitch conversion means has a plurality of read channels that operate in a time-division manner at regular intervals, multiplies the residual waveform read from the storage means in each read channel by a window function, and sequentially adds and combines the results to generate a residual waveform of a desired pitch.

4. The musical sound generating device according to claim 1, wherein the pitch conversion means has a plurality of read channels that operate in a time-division manner and, in each read channel, for each period corresponding to the pitch specified by the performance information, synchronizes the PARCOR coefficient group read by the coefficient reading means with the residual waveform associated with that PARCOR coefficient group.

5. The musical sound generating device according to claim 1, wherein the voice synthesis means comprises: noise generation means for generating a noise signal; and weighting means for weighting and adding the noise signal generated by the noise generation means and the pitch-converted residual waveform generated by the pitch conversion means in accordance with any one coefficient of the PARCOR coefficient group output from the coefficient reading means, and wherein the output of the weighting means is PARCOR-synthesized in accordance with the PARCOR coefficients read by the coefficient reading means.