JP2005107315A

JP2005107315A - Apparatus, method, and program for musical sound processing

Info

Publication number: JP2005107315A
Application number: JP2003342254A
Authority: JP
Inventors: Yasuo Yoshioka; 靖雄吉岡; Rosukosu Alex; ロスコスアレックス
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2003-09-30
Filing date: 2003-09-30
Publication date: 2005-04-21
Anticipated expiration: 2023-09-30
Also published as: JP3903975B2

Abstract

PROBLEM TO BE SOLVED: To provide an apparatus, a method, and a program for musical sound processing that impart a natural unison-singing effect.
SOLUTION: A pseudo-random signal generation unit 500 comprises a plurality of pseudo-random signal generation parts 510. Each pseudo-random signal generation part 510 generates a pseudo-random signal that differs from the others in how it varies, and supplies it to the corresponding clone signal generation part 410. Each pseudo-random signal is obtained by passing white noise through an LPF. The cutoff frequency of the LPF is set to the value that best matches the natural variation and fluctuation of a person's singing (specifically, about 2 Hz). Each clone signal generation part 410 uses its pseudo-random signal as a modulation signal to perform voice conversion.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a musical sound processing apparatus, and more particularly to a musical sound processing apparatus, a musical sound processing method, and a musical sound processing program suitable for obtaining an effect in which a plurality of people appear to be singing the same melody (a unison effect).

Devices that impart a chorus effect (an effect that makes the sound of a single source sound as though several sources were playing simultaneously) to an input musical sound are widely known, and such devices have been used to obtain the unison effect described above. As one such device, Patent Document 1 below discloses an apparatus that splits an input musical sound signal into three bands (low, mid, and high), applies a different modulation process (a process that imposes periodic pitch changes, delays, and the like) to each band, and mixes the results with the input signal to produce a chorus effect. Patent Document 2 below discloses a singing voice synthesizer that extracts pitch, volume, note-on timing, and the like from musical score information stored in advance in a memory or the like, and modulates them to synthesize a choral voice. Patent Document 3 below discloses a unison effect device that, while converting an input audio signal through a key change circuit, a filter, and a reverberation circuit, fluctuates the parameters of each of these with a fluctuation controller to generate audio signals that differ from the input audio signal, and mixes them to obtain a unison effect.

Patent Document 1: JP 2003-122361 A
Patent Document 2: JP H07-146695 A
Patent Document 3: JP H09-281966 A

However, the modulation processing disclosed in each of the above patent documents uses a modulation signal such as a triangle wave generated by an LFO (Low Frequency Oscillator), so the variation is monotonous and unnatural, and the impression that several people are actually singing is not obtained. When several people actually sing, their voice quality, singing style, pitch deviation, and so on differ subtly from person to person, and these subtle deviations give the sound its rich sheen and fluctuation.

The present invention has been made in view of the circumstances described above, and its object is to provide a musical sound processing apparatus, a musical sound processing method, and a musical sound processing control program capable of imparting a natural unison effect and the like.

To solve the above problem, a musical sound processing apparatus according to the present invention comprises: noise generating means for generating a noise signal; filter means for removing components of a specific frequency range from the noise signal according to a set cutoff frequency and outputting the result as a pseudo-random signal; and modulating means for modulating an input musical sound on the basis of the pseudo-random signal.

With this configuration, a pseudo-random signal from which components of a specific frequency range have been removed is used as the modulation signal for the input musical sound. Using such a pseudo-random signal as the modulation signal makes it possible to impart a more natural unison effect or the like than when a triangle wave or the like is used as the modulation signal.

The above configuration preferably further comprises cutoff frequency setting means that sets the cutoff frequency and varies it within a certain frequency range. By letting the cutoff frequency fluctuate within such a range (for example, around 2 Hz) rather than fixing it, the modulation can more closely match the natural variation and wavering of a person's singing.

The above configuration may further comprise analyzing/extracting means that analyzes the musical sound and extracts parameters representing its characteristics, in which case the modulating means modulates the musical sound by varying the extracted parameters on the basis of the pseudo-random signal. It is also preferable to further comprise synthesizing means that combines the musical sound after modulation by the modulating means with the musical sound before modulation.

A musical sound processing method according to the present invention comprises: a cutoff frequency setting step of setting a cutoff frequency for filter means that removes components of a specific frequency range from a noise signal generated by a noise generator and outputs the result as a pseudo-random signal; and a modulation step of modulating an input musical sound on the basis of the pseudo-random signal output from the filter means for which the cutoff frequency has been set.

A musical sound processing program according to the present invention causes a computer to function as: noise generating means for generating a noise signal; filter means for removing components of a specific frequency range from the noise signal according to a set cutoff frequency and outputting the result as a pseudo-random signal; and modulating means for modulating an input musical sound on the basis of the pseudo-random signal.

As described above, the present invention makes it possible to impart a natural unison effect and the like.

Embodiments of the present invention will be described below with reference to the drawings.
A. Embodiment
A-1. Overall Configuration
FIG. 1 is a diagram showing the configuration of a sound processing apparatus (musical sound processing apparatus) 100 according to the present embodiment.
The audio signal input unit 200 consists of a microphone or the like and feeds the voice uttered by the user into the sound processing apparatus 100 as an input audio signal.
The audio signal analysis unit (analyzing/extracting means) 300 analyzes the input audio signal supplied from the audio signal input unit 200 frame by frame (frames of about 5 to 10 ms) using FFT (Fast Fourier Transform) analysis and the like, makes a voiced/unvoiced decision, and extracts the pitch, volume, and spectrum. The audio signal analysis unit 300 then supplies the parameters representing the characteristics of the voice obtained by this analysis, namely the voiced/unvoiced decision, pitch, volume, and spectrum, as frame information to each clone signal generation part 410-k (1 ≦ k ≦ n) and so on. Whether the input audio signal is voiced or unvoiced may be determined by analyzing its energy and frequency components.
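The per-frame volume measurement and voiced/unvoiced decision can be illustrated with a minimal sketch. It is not the apparatus's actual analysis: the function name `analyze_frames`, the 10 ms frame length, the dB threshold, and the use of a zero-crossing rate as the "frequency component" measure are all assumptions, and pitch and spectrum extraction are omitted.

```python
import math


def analyze_frames(samples, sample_rate=16000, frame_ms=10,
                   energy_threshold_db=-40.0, zcr_threshold=0.25):
    """Return a (volume_db, voiced) pair for each full frame of a mono signal.

    Hypothetical helper. Volume is the frame RMS in dB relative to a full
    scale of 1.0. The voiced/unvoiced rule (enough energy AND a low
    zero-crossing rate) is an assumed stand-in for analyzing energy and
    frequency components; pitch and spectrum extraction are omitted.
    """
    frame_len = max(2, int(sample_rate * frame_ms / 1000))
    results = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        volume_db = 20.0 * math.log10(rms) if rms > 0 else float("-inf")
        # Fraction of adjacent sample pairs whose sign differs.
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a >= 0) != (b >= 0))
        zcr = crossings / (frame_len - 1)
        voiced = volume_db > energy_threshold_db and zcr < zcr_threshold
        results.append((volume_db, voiced))
    return results
```

A low-frequency tone with adequate level is classified as voiced, while a signal that changes sign at almost every sample (noise-like, as in fricatives) or silence is classified as unvoiced.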

The clone signal generation unit 400 is a means for generating converted audio signals (hereinafter, clone signals) whose timbre, pitch, volume, output timing, and so on differ slightly from those of the input audio signal, and consists of a plurality of clone signal generation parts 410-k. Each clone signal generation part (modulating means) 410-k generates a clone signal by varying the frame information supplied from the audio signal analysis unit 300 on the basis of the pseudo-random signal (described later) supplied from the corresponding pseudo-random signal generation part 510-k (1 ≦ k ≦ n).

The pseudo-random signal generation unit 500 is a means for generating the pseudo-random signals used to generate the clone signals, and consists of a plurality of pseudo-random signal generation parts 510-k (1 ≦ k ≦ n). Each pseudo-random signal generation part 510-k generates a pseudo-random signal that differs from the others in its amplitude behavior and the like, and supplies it to the corresponding clone signal generation part 410-k. FIG. 2 shows the configuration of a pseudo-random signal generation part 510-k, and FIG. 3 illustrates the waveform of a pseudo-random signal generated by it. In the following description, where the individual parts need not be distinguished, each pseudo-random signal generation part 510-k and clone signal generation part 410-k is referred to simply as the pseudo-random signal generation part 510 and the clone signal generation part 410. A function representing a pseudo-random signal such as that shown in FIG. 3 is denoted the pseudo-random function rand(t).

The white noise generator (noise generating means) 511 shown in FIG. 2 randomly generates a noise signal within a fixed level range under the control of a control unit (not shown) and supplies it to the LPF (filter means) 512. The LPF 512 removes from the supplied noise signal the components above the cutoff frequency Fc set by the cutoff frequency setting means 513, and outputs the result to the normalizing means 514. The cutoff frequency setting means 513 generates a cutoff frequency Fc that wavers within a fixed range around 2 Hz and sets it in the LPF 512. Fc is made to waver around 2 Hz because this best matches the natural variation and wavering of a person's singing, which fluctuates at around 2 Hz. The timing at which Fc is varied around 2 Hz can be set arbitrarily, and Fc may of course be fixed instead of varied.

On receiving from the LPF 512 the noise signal with its high-frequency components removed, the normalizing means 514 normalizes it into the range −1 to 1 (centered on 0), as shown in FIG. 3, and outputs it to the output means 515 as a pseudo-random signal. As a result, each pseudo-random signal generation part 510 outputs a pseudo-random signal that best matches the natural variation and wavering of a person's singing (although the amplitude behavior and the like differ from one pseudo-random signal to another).
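A minimal sketch of one pseudo-random signal generation part 510 (white noise generator 511, LPF 512 with a wavering cutoff Fc, normalizing means 514) follows. The source specifies only white noise, an LPF with Fc around 2 Hz, and normalization to −1 to 1; the one-pole filter, the 100 Hz control rate (one value per 10 ms frame), the ±0.5 Hz wobble, the redraw interval, and the name `pseudo_random_signal` are hypothetical choices. Note that a one-pole filter attenuates rather than fully removes components above Fc; a steeper filter could be substituted.

```python
import math
import random


def pseudo_random_signal(num_samples, control_rate=100.0, center_fc=2.0,
                         fc_wobble=0.5, wobble_interval=50, seed=None):
    """Low-pass-filtered white noise, normalized into the range -1..1.

    Hypothetical sketch: one output value per control tick (100 Hz assumed,
    i.e. one per 10 ms frame), a one-pole low-pass filter, and a cutoff Fc
    redrawn every `wobble_interval` ticks within center_fc +/- fc_wobble.
    """
    if num_samples < 1:
        return []
    rng = random.Random(seed)           # independent generator per clone
    fc = center_fc
    y = 0.0
    filtered = []
    for n in range(num_samples):
        if n % wobble_interval == 0:
            # Cutoff frequency setting: let Fc waver around ~2 Hz.
            fc = center_fc + rng.uniform(-fc_wobble, fc_wobble)
        # One-pole low-pass coefficient for cutoff fc at the control rate.
        alpha = 1.0 - math.exp(-2.0 * math.pi * fc / control_rate)
        noise = rng.uniform(-1.0, 1.0)   # white noise sample
        y += alpha * (noise - y)         # attenuate components above ~fc
        filtered.append(y)
    # Normalization: scale so the largest excursion is exactly +/-1.
    peak = max(abs(v) for v in filtered) or 1.0
    return [v / peak for v in filtered]
```

Giving each clone its own seed, for example `pseudo_random_signal(500, seed=k)` for clone k, yields the mutually different signals that the parts 510-k supply to their clone signal generation parts 410-k.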

Returning to FIG. 1, on receiving a pseudo-random signal from the corresponding pseudo-random signal generation part 510, each clone signal generation part 410 varies the frame information on the basis of that signal, thereby generating its own distinct clone signal (that is, converted audio signals whose deviations in timbre, pitch, and so on differ from one another).

The control information input unit (input means) 600 consists of operation buttons, operation switches, and the like, and accepts control instructions, entered externally via these buttons and switches, concerning various effects (the vibrato effect and others, described in detail later).
The signal synthesis unit (synthesizing means) 700 is a means for combining the clone signals generated by the clone signal generation unit 400 with the input audio signal, and comprises a first adder 710, a second adder 720, and a converter 730.

The first adder 710 adds together the spectra of the clone signals supplied from the clone signal generation parts 410 and outputs the sum to the second adder 720. The second adder 720 adds this sum to the spectrum of the input audio signal supplied from the audio signal analysis unit 300 and outputs the result to the converter 730. The converter 730 applies an inverse FFT or the like to the output of the second adder 720 (that is, the sum of all the spectra) to obtain a synthesized audio signal that combines the input audio signal and the clone signals. The signal synthesis unit 700 then supplies this synthesized audio signal (a mixture of several audio signals whose timbre, pitch, and so on are slightly offset from one another) to the audio output unit 800.
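The spectral summation in the signal synthesis unit 700 can be illustrated for a single frame. For self-containment the sketch uses a naive O(N²) DFT in place of an FFT and ignores windowing and overlap-add between frames, which a real implementation would need; all function names are hypothetical. Because the transform is linear, inverse-transforming the sum of the spectra gives the same frame as adding the time-domain signals directly.

```python
import cmath


def dft(frame):
    """Naive discrete Fourier transform (O(N^2)), standing in for an FFT."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT; keeps the real part of each output sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def synthesize_frame(input_spectrum, clone_spectra):
    """Hypothetical model of the signal synthesis unit 700 for one frame."""
    # First adder 710: bin-by-bin sum of all clone spectra.
    if clone_spectra:
        clone_sum = [sum(bins) for bins in zip(*clone_spectra)]
    else:
        clone_sum = [0j] * len(input_spectrum)
    # Second adder 720: add the spectrum of the input audio signal.
    total = [a + b for a, b in zip(input_spectrum, clone_sum)]
    # Converter 730: back to the time domain.
    return idft(total)
```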

The audio output unit 800 consists of a speaker or the like and outputs the synthesized audio signal supplied from the signal synthesis unit 700. Using the sound processing apparatus 100 configured in this way, an effect as if several people were actually singing in unison can be obtained.
The various functions implemented by each clone signal generation part 410 are described in detail below.

A-2. Output Timing Change Function
The output timing change function changes the output timing of a clone signal relative to the input audio signal, and is implemented by the timing changing means 411 shown in FIG. 1.
FIG. 4 shows how the output timing of the frame information is changed by the timing changing means 411. FIG. 4 takes the pitch contained in the frame information as an example: the pitch before the output timing is changed is shown by a solid line and the pitch after the change by a broken line. As FIG. 4 shows, the pitch output timing (that is, the amount by which the pitch is delayed in time) changes at phrase boundaries. FIG. 5 is a flowchart of the processing that changes this output timing (timing change processing), and FIG. 6 is a diagram explaining the timing change processing. In the following description, it is assumed that state = 2 and Delay = 0 are preset in a memory (not shown) as initial conditions.

On receiving frame information from the audio signal analysis unit 300, the timing changing means (detecting means) 411 obtains the volume of the input audio signal from the frame information and stores it as a volume value AMP in a register (not shown) (step S1). The timing changing means 411 then refers to the memory to determine the current state value (step S2). If the state value is 2, the process proceeds to step S3, where the timing changing means 411 determines whether the current volume value AMP is greater than a preset second volume threshold G2 (> G1). If AMP is equal to or less than G2 (step S3: NO), the process ends as is; if AMP is greater than G2 (step S3: YES), the process proceeds to step S4, switches the state value from 2 to 1 (see P1 in FIG. 6), and ends.

If the state value is determined to be 1 in step S2, the process proceeds to step S5, where the timing changing means 411 determines whether the current volume value AMP is less than a preset first volume threshold G1. If AMP is equal to or greater than G1 (step S5: NO), the process ends as is; if AMP is less than G1 (step S5: YES), the process proceeds to step S6, generates a New Delay value using equation (1) below, overwrites the Delay value with it (Delay ← New Delay), switches the state value from 1 to 2 (see P2 in FIG. 6), and ends. The change amount z1 in equation (1) is control information input externally via the control information input unit 600; the output timing can be adjusted by changing z1 (the same applies to the change amounts z2 and so on described below).
New Delay [s] = {1 + rand(t)} * k1 * z1 ... (1)
k1: constant
z1: change amount (0 to 1)

Thus, the timing changing means 411 obtains a new Delay value when the volume of the input audio signal decreases and falls below the first threshold G1 (< G2). Having obtained the new Delay value, the timing changing means 411 changes the output timing of the frame information accordingly. Changing the Delay value makes the audio waveform discontinuous, which can produce an audible glitch. Under the above condition (a decrease in the volume of the input audio signal has been detected and the volume has fallen below G1), however, the glitch is masked, so an unnatural-sounding glitch is prevented from being heard.
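Steps S1 to S6 amount to a two-state hysteresis detector. The sketch below assumes rand(t) is supplied as a callable returning a value in −1 to 1, so by equation (1) the new Delay lies between 0 and 2·k1·z1 seconds. The class name and the threshold and constant values used with it are hypothetical, and applying the Delay to the frame stream is not shown.

```python
class TimingChanger:
    """Two-state hysteresis from steps S1-S6 (FIG. 5), with equation (1).

    `rand` is a callable returning the current pseudo-random value rand(t)
    in -1..1; the thresholds and constants passed in are hypothetical.
    """

    def __init__(self, g1, g2, k1, z1, rand):
        if not g1 < g2:
            raise ValueError("G1 must be lower than G2")
        self.g1, self.g2, self.k1, self.z1 = g1, g2, k1, z1
        self.rand = rand
        self.state = 2      # initial condition: state = 2
        self.delay = 0.0    # initial condition: Delay = 0 s

    def process_frame(self, amp):
        """Steps S1-S6 for one frame's volume AMP; returns the current Delay."""
        if self.state == 2:
            if amp > self.g2:           # S3: volume has risen above G2
                self.state = 1          # S4
        elif amp < self.g1:             # S5 (state == 1): fell below G1
            # S6: New Delay [s] = {1 + rand(t)} * k1 * z1, then state = 2.
            self.delay = (1.0 + self.rand()) * self.k1 * self.z1
            self.state = 2
        return self.delay
```

Because the delay is only redrawn after the volume has first exceeded G2 and then dropped below the lower threshold G1, brief dips in the middle of a note do not trigger a change.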

The frame information output from the timing changing means 411 in this way is supplied to the supply control means 412 shown in FIG. 1. On receiving the frame information, the supply control means 412 refers to it to determine whether the input voice is voiced or unvoiced. If the input voice is voiced, the supply control means 412 outputs the pitch and volume to the trend changing means 413 and the spectrum to the first spectrum changing means 418, and outputs a "voiced" decision to the output switching means 420 (see the voiced path in FIG. 1). If the input voice is unvoiced, the supply control means 412 outputs the spectrum to the second spectrum changing means 419 and outputs an "unvoiced" decision to the output switching means 420 (see the unvoiced path in FIG. 1).

A-3. Trend Change Function
The trend change function imposes relatively large changes (hereinafter, trend changes) on the supplied pitch and volume, and is implemented by the trend changing means 413 shown in FIG. 1.
FIG. 7 shows a trend change in pitch: the pitch before the trend change is shown by a solid line and the pitch after it by a broken line.
On receiving the pitch and volume, the trend changing means (parameter control means) 413 substitutes them into equations (2) and (3) below, respectively, to impose trend changes on them. The input pitch(t) [Hz] and input volume(t) [dB] in equations (2) and (3) are the pitch and volume supplied from the supply control means 412.

Output pitch(t) [Hz] = Input pitch(t) [Hz] * {1 + rand(t) * k2 * z2} ... (2)
Output volume(t) [dB] = Input volume(t) [dB] + rand(t) * k3 * z3 ... (3)
k2, k3: constants
z2, z3: change amounts (0 to 1)
Using a pseudo-random signal to impose trend changes on the pitch and volume in this way produces more natural changes than using a sine wave signal or the like.
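Equations (2) and (3) applied frame by frame might look as follows. Pitch is scaled multiplicatively in Hz while volume is offset additively in dB. The function name and the default values of k2, k3, z2, and z3 are assumptions for illustration only.

```python
def apply_trend(pitches_hz, volumes_db, rand_values,
                k2=0.02, k3=3.0, z2=1.0, z3=1.0):
    """Equations (2) and (3) applied frame by frame.

    rand_values holds one rand(t) sample in -1..1 per frame. The default
    constants (k2 = 2 % pitch swing, k3 = 3 dB volume swing) are assumptions.
    """
    out_pitches = [p * (1.0 + r * k2 * z2)            # (2): scale in Hz
                   for p, r in zip(pitches_hz, rand_values)]
    out_volumes = [v + r * k3 * z3                    # (3): offset in dB
                   for v, r in zip(volumes_db, rand_values)]
    return out_pitches, out_volumes
```

Setting z2 = z3 = 0 leaves pitch and volume unchanged, which corresponds to turning the trend change off from the control information input unit 600.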

A-4. Scoop Effect Function
The scoop effect function changes the pitch and volume trajectories when an attack of the input audio signal is detected, and is implemented by the scoop effect imparting means 414 shown in FIG. 1.
FIG. 8 shows the pitch change when the scoop effect is imparted: the pitch before the effect is imparted is shown by a solid line and the pitch after by a broken line. As is well known, singers sometimes "scoop" (shakuri) into the onset (attack) of a note. This scoop differs from person to person and with the singing situation. The scoop effect imparting means 414 simulates this scoop to impart a natural scoop effect.

FIG. 9 is a diagram explaining how the scoop effect is controlled. The scoop effect imparting means (detecting means) 414 first detects the attack time (see P1 in FIG. 9), for example by comparing the supplied volume with a preset threshold. On detecting the attack time, the scoop effect imparting means (change amount calculating means) 414 obtains the pitch change amount ΔPitch using the pseudo-random function rand(t). Specifically, it obtains ΔPitch such that ΔPitch reaches its maximum when a predetermined entry time has elapsed from the attack time (see P2 in FIG. 9) and converges to 0 when a predetermined convergence time has elapsed from the attack time (see P3 in FIG. 9).

At this time, the scoop effect imparting means (time calculating means) 414 calculates not only the change amount ΔPitch but also the entry time and the convergence time using the pseudo-random function rand(t). How rand(t) is used may be decided as appropriate according to the magnitude, length, and so on of the scoop effect the user desires. The scoop effect imparting means (parameter changing means) 414 then substitutes the change amount ΔPitch obtained above into equation (4) below to obtain the output pitch [cent]. The input pitch [cent] in equation (4) is the pitch supplied from the trend changing means 413.
Output pitch [cent] = Input pitch [cent] + ΔPitch [cent] ... (4)

In this way, the scoop effect imparting means 414 obtains the change amount ΔPitch from the pseudo-random function rand(t) and adds it to the input pitch, yielding an output pitch with the scoop effect shown by the broken line in FIG. 8. The volume change when the scoop effect is imparted follows almost the same logic as the pitch change described above, so its description is omitted.
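The scoop control of FIG. 9 can be sketched as an attack detector plus a ΔPitch envelope added to the input pitch according to equation (4). The source leaves the envelope shape and the way rand(t) is used open. In this sketch the rise and return are assumed to be linear, the attack rule is an upward threshold crossing, and the base values in the hypothetical `draw_scoop_params`, including a negative peak (approaching the note from below), are illustrative only.

```python
def detect_attacks(volumes_db, threshold_db):
    """Frame indices where the volume crosses threshold_db upward (assumed rule)."""
    return [i for i in range(1, len(volumes_db))
            if volumes_db[i - 1] <= threshold_db < volumes_db[i]]


def scoop_delta_pitch(t, attack_time, peak_cents, entry_time, converge_time):
    """Delta Pitch [cent] at time t [s], assuming linear segments.

    0 before the attack, a ramp to peak_cents at attack + entry_time, then a
    ramp back to 0 at attack + converge_time (converge_time > entry_time > 0).
    """
    dt = t - attack_time
    if dt <= 0.0 or dt >= converge_time:
        return 0.0
    if dt <= entry_time:
        return peak_cents * dt / entry_time
    return peak_cents * (converge_time - dt) / (converge_time - entry_time)


def draw_scoop_params(rand, base_peak=-100.0, base_entry=0.03,
                      base_converge=0.15, spread=0.3):
    """Hypothetical use of rand() in -1..1 to vary depth, entry and convergence."""
    peak = base_peak * (1.0 + spread * rand())
    entry = base_entry * (1.0 + spread * rand())
    converge = max(1.5 * entry, base_converge * (1.0 + spread * rand()))
    return peak, entry, converge


def scoop_output_pitch(input_cents, t, attack_time, params):
    """Equation (4): output pitch [cent] = input pitch [cent] + Delta Pitch [cent]."""
    peak, entry, converge = params
    return input_cents + scoop_delta_pitch(t, attack_time, peak, entry, converge)
```

Drawing new parameters for each clone at each detected attack gives every clone a scoop of slightly different depth and length.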

A-5. Vibrato Effect Function
The vibrato effect function adds vibrato to sustained portions of notes and the like, and is implemented by the vibrato effect imparting means 415 shown in FIG. 1.
FIG. 10 shows the pitch change when the vibrato effect is imparted: the pitch before the effect is imparted is shown by a solid line and the pitch after by a broken line.

To impart the vibrato effect, the user first operates the operation buttons and the like of the control information input unit 600 to enter control instructions concerning the vibrato effect (vibrato control information), such as the average vibrato delay, average vibrato depth, and average vibrato speed (= rate). The entered vibrato control information is supplied from the control information input unit 600 to the vibrato effect imparting means 415. On receiving it, the vibrato effect imparting means (modulating means) 415 compares the volume supplied from the scoop effect imparting means 414 with a preset threshold to decide whether to impart the vibrato effect. If the volume exceeds the threshold, the vibrato effect imparting means 415 substitutes the average vibrato delay, average vibrato depth, and average vibrato speed into equations (5), (6), and (7) below to obtain a new vibrato delay, vibrato depth, and vibrato speed.

Vibrato delay = average vibrato delay * {1 + rand(t) * k4} ... (5)
Vibrato depth = average vibrato depth * {1 + rand(t) * k5} ... (6)
Vibrato speed = average vibrato speed * {1 + rand(t) * k6} ... (7)
where k4, k5, and k6 are constants.

In this way, the vibrato effect imparting means 415 modifies the vibrato control information (the average vibrato delay, average vibrato depth, and average vibrato speed) on the basis of the pseudo-random function rand(t) to obtain new vibrato control information. After the new vibrato delay time has elapsed, the vibrato effect imparting means 415 applies vibrato at the vibrato depth and vibrato speed obtained by this calculation. As a result, each clone signal receives vibrato with a different phase, start time, depth, and speed (see FIG. 10), giving a more spread-out, less uniform impression. The volume change when the vibrato effect is applied can be described in substantially the same way as the pitch change above and is therefore omitted.
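The per-clone randomization of equations (5) to (7), together with the volume-threshold gate and the delayed start, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: rand(t) is assumed to return values in [-1, 1], the constants k4 to k6 use hypothetical defaults, depth is assumed to be in cents, and the vibrato waveform is assumed to be a sine. The function names are hypothetical.

```python
import math
import random

def randomize_vibrato(avg_delay_s, avg_depth_cent, avg_speed_hz, r,
                      k4=0.3, k5=0.3, k6=0.2):
    """Equations (5)-(7): scale each average by {1 + rand(t) * k}.

    r is one sample of rand(t), assumed here to lie in [-1, 1].
    The k defaults are hypothetical, not values from the patent.
    """
    delay = avg_delay_s * (1.0 + r * k4)
    depth = avg_depth_cent * (1.0 + r * k5)
    speed = avg_speed_hz * (1.0 + r * k6)
    return delay, depth, speed

def vibrato_onset(volumes, threshold, frame_s):
    """Time of the first frame whose volume exceeds the threshold, else None."""
    for i, v in enumerate(volumes):
        if v > threshold:
            return i * frame_s
    return None

def vibrato_offset_cent(t_s, onset_s, delay_s, depth_cent, speed_hz, phase=0.0):
    """Pitch offset in cents: zero until the randomized delay has elapsed.

    The sinusoidal vibrato shape is an assumption of this sketch.
    """
    elapsed = t_s - onset_s - delay_s
    if elapsed < 0.0:
        return 0.0
    return depth_cent * math.sin(2.0 * math.pi * speed_hz * elapsed + phase)

# Each clone draws its own rand(t) sample and a random start phase.
rng = random.Random(1)
clones = []
for _ in range(4):
    params = randomize_vibrato(0.4, 30.0, 5.5, rng.uniform(-1.0, 1.0))
    clones.append((params, rng.uniform(0.0, 2.0 * math.pi)))
```

Because every clone gets a different delay, depth, speed, and phase, summing the clones yields vibrato that starts and moves differently in each voice, which is the spreading effect the text describes.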

A-6. Transition part changing function
The transition part changing function changes the way the pitch or volume changes at points where it changes greatly (transition parts), and is realized by the transition part changing means 416 shown in FIG. 1.
FIG. 11 shows the pitch change before and after a transition part. The pitch before modification is indicated by a solid line (with the transition part drawn as a thick solid line), and the pitch after modification by a broken line. As is well known, even when the melody being sung is the same, different singers change their pitch and volume differently where the pitch or volume changes greatly (that is, at transition parts). The transition part changing means 416 simulates such pitch and volume changes to produce natural variation in the transition parts.

Detection of a pitch transition part works as follows. First, the transition part changing means (transition part detecting means) 416 computes a short-time average of the pitch supplied from the vibrato effect imparting means 415 (for example, the average pitch over 50 ms intervals). Next, it takes the difference (that is, a discrete derivative) between the short-time average obtained this time and the one obtained last time. It then detects as a transition part the span from the point where the absolute value of this derivative (that is, the absolute amount of pitch change) exceeds a preset first threshold until it falls below a preset second threshold (< the first threshold) (see the transition part indicated by the thick solid line in FIG. 11). More specifically, the transition part changing means (transition part time detecting means) 416 detects the time at which the pitch change exceeds the first threshold as the start time of the transition part (see P1 in FIG. 12), and detects the later time at which the pitch change falls below the second threshold as the end time of the transition part (see P2 in FIG. 12).
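The two-threshold (hysteresis) detection described above can be sketched as below. This is an illustrative sketch only: pitch is assumed to be a per-frame list in cents, averages are taken over non-overlapping blocks of `frames_per_avg` frames, and the returned spans are indices into the list of averages. The function names and threshold values are hypothetical.

```python
def short_time_averages(pitch_cent, frames_per_avg):
    """Average the pitch over consecutive, non-overlapping blocks of frames."""
    avgs = []
    for i in range(0, len(pitch_cent) - frames_per_avg + 1, frames_per_avg):
        block = pitch_cent[i:i + frames_per_avg]
        avgs.append(sum(block) / len(block))
    return avgs

def detect_transitions(pitch_cent, frames_per_avg, th_start, th_end):
    """Return (start, end) spans, in averaging-block indices.

    A transition starts when |difference of successive averages| exceeds
    th_start (P1) and ends when it falls below th_end (P2), th_end < th_start.
    A transition still open at the end of the input is not reported.
    """
    if not th_end < th_start:
        raise ValueError("the second threshold must be below the first")
    avgs = short_time_averages(pitch_cent, frames_per_avg)
    spans, start = [], None
    for j in range(1, len(avgs)):
        delta = abs(avgs[j] - avgs[j - 1])
        if start is None and delta > th_start:
            start = j
        elif start is not None and delta < th_end:
            spans.append((start, j))
            start = None
    return spans
```

Using a lower release threshold than the trigger threshold keeps a slowly settling glide inside one transition instead of splitting it into several short ones.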

FIG. 12 is a diagram for explaining how the pitch is controlled in a transition part detected in this way.
First, the transition part changing means (calculating means) 416 uses equation (8) below to obtain, from the pseudo-random function, the pitch change amount per unit time (for example, one frame), that is, how much the pitch is to be changed in each unit of time.
Pitch change amount [cent] = rand(t) * k7 ... (8)
where k7 is a constant.

Next, the transition part changing means 416 obtains the pitch displacement ΔPitch by substituting the pitch change amount into equation (9) below. Because an excessively large pitch change would make the voice sound off-key, the pitch change function f(t) is defined so that it does not exceed a certain amount (limit value), as shown in FIG. 12. After the transition ends, ΔPitch is made to converge to 0 over a certain time from the transition end time (P2 in FIG. 12). How ΔPitch converges after the transition end time can be set arbitrarily.
ΔPitch [cent] = pitch change amount [cent] * f(t) ... (9)
where f(t) is the pitch change function (see FIG. 12).

The transition part changing means (parameter changing means) 416 then obtains the output pitch [cent] by substituting the pitch displacement ΔPitch obtained in this way into equation (10) below. The input pitch [cent] in equation (10) is the pitch supplied from the vibrato effect imparting means 415.
Output pitch [cent] = input pitch [cent] + ΔPitch [cent] ... (10)
In this way, the transition part changing means 416 obtains the pitch change amount and the pitch displacement ΔPitch from the pseudo-random function rand(t) and adds ΔPitch to the input pitch, producing an output pitch that moves through the transition part differently from the input, as shown by the broken line in FIG. 11. The volume change before and after a transition part can be described in substantially the same way as the pitch change and is therefore omitted.
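Equations (8) to (10) can be sketched per frame as follows. The patent defines f(t) only through FIG. 12, so this sketch assumes a hypothetical f(t) equal to the number of frames elapsed since the transition start, capped at a limit `f_limit`, and a linear decay to 0 over `release_frames` after the transition end (the text says this convergence may be set arbitrarily). All names are hypothetical.

```python
import random

def pitch_change_amount(rng, k7):
    """Equation (8): per-frame pitch change in cents, rand(t) assumed in [-1, 1]."""
    return rng.uniform(-1.0, 1.0) * k7

def transition_delta_pitch(n_frames, start, end, rate_cent, f_limit, release_frames):
    """Equation (9): ΔPitch per frame for one transition spanning frames start..end.

    Hypothetical f(t): frames elapsed since start, capped at f_limit.
    After the end frame, ΔPitch decays linearly to 0 over release_frames.
    """
    out = [0.0] * n_frames
    held = 0.0
    for n in range(n_frames):
        if start <= n <= end:
            f = min(n - start, f_limit)
            out[n] = rate_cent * f
            held = out[n]
        elif n > end:
            k = n - end
            if k < release_frames:
                out[n] = held * (1.0 - k / release_frames)
    return out

def apply_delta(input_pitch_cent, delta_cent):
    """Equation (10): output pitch = input pitch + ΔPitch."""
    return [p + d for p, d in zip(input_pitch_cent, delta_cent)]

rate = pitch_change_amount(random.Random(0), k7=10.0)  # one draw per clone
```

Capping f(t) keeps |ΔPitch| at or below `rate_cent * f_limit`, which mirrors the limit value the text introduces to avoid an off-key result.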

A-7. Small change function
The small change function adds fine variations (hereinafter, small changes) to the supplied pitch and volume, and is realized by the small change means 417 shown in FIG. 1.
FIG. 13 shows a small change in pitch. The pitch before the small change is indicated by a solid line, and the pitch after it by a broken line. The trend change described above applies relatively large changes to the pitch and volume; here, fine changes are applied at even shorter time intervals. If speech were synthesized without such small changes, it would be produced at a constant pitch and volume and would sound mechanical (for example, like a buzzer).

By contrast, applying fine changes to the pitch and volume at short time intervals makes the result sound natural as a voice. A voice therefore needs slight variations in pitch and volume to be heard as a voice, and the way these variations occur naturally differs from person to person. To simulate this, the small change means (parameter control means) 417 applies slight random variations to the supplied pitch and volume using a pure random signal. The pseudo-random function rand(t) may be used instead of the pure random signal.
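A minimal sketch of the small change step, assuming per-frame pitch in cents and volume in dB, and uniform white noise as the "pure random signal". The jitter amplitudes and the function name are hypothetical; the patent gives no values.

```python
import random

def small_change(pitch_cent, volume_db, pitch_jitter_cent, volume_jitter_db, rng):
    """Add independent per-frame random offsets to pitch and volume.

    Uniform noise in [-jitter, +jitter] stands in for the pure random signal;
    passing an rng driven by rand(t) instead would give the pseudo-random variant.
    """
    out_p = [p + rng.uniform(-pitch_jitter_cent, pitch_jitter_cent) for p in pitch_cent]
    out_v = [v + rng.uniform(-volume_jitter_db, volume_jitter_db) for v in volume_db]
    return out_p, out_v

# Each clone uses its own generator so the fine variations differ per voice.
clone_rngs = [random.Random(seed) for seed in range(4)]
```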

A-8. Timbre change function
A-8-1. First timbre change function
The first timbre change function gives each clone signal a different timbre change and is realized by the first spectrum changing means 418 shown in FIG. 1.
FIG. 14 illustrates the spectrum of one frame of the input audio signal, with frequency f [Hz] on the horizontal axis and amplitude magnitude [dB] on the vertical axis. The spectral envelope is indicated by a solid line, and a curve representing the slope (hereinafter, ECurve) by a broken line. The amplitude of ECurve, ECurveMag(f), can be expressed by equation (1') below.
ECurveMag(f) = Gain + 100 * (e^(-slope * f) - 1) ... (1')
where Gain is the gain of the frame and slope is the slope of the frame.

On receiving the spectrum of the input audio signal from the supply control means 412, the first spectrum changing means 418 changes the slope of the ECurve defined above for each clone signal. FIGS. 15 and 16 show how the spectral envelope changes when the ECurve slope is changed: FIG. 15 shows the change when the slope is increased, and FIG. 16 the change when it is decreased. The spectral envelopes and ECurves in FIGS. 15 and 16 are both referenced to those shown in FIG. 14.

As is clear from comparing FIG. 15 with FIG. 14 and FIG. 16 with FIG. 14, changing the slope leaves Gain, which represents the overall volume (the ECurve intercept in each figure), unchanged, while the shape of the spectral envelope changes with the ECurve slope, and the timbre changes accordingly. More specifically, increasing the slope as in FIG. 15 suppresses the high-frequency spectrum, giving a muffled timbre. Conversely, decreasing the slope as in FIG. 16 lets the spectrum extend evenly from the low range to the high range, giving a bright timbre. By changing the slope of the spectral envelope for each clone signal in this way, the first spectrum changing means 418 can give each clone signal a different timbre. The method of varying the slope for each clone signal can be chosen as appropriate.
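Equation (1') and the effect of changing the slope can be checked numerically with the sketch below. The patent does not say exactly how the envelope follows the new ECurve, so `retilt_envelope` assumes, hypothetically, that the envelope is shifted in dB by the difference between the new and old ECurve at each frequency. The slope is treated as a per-Hz coefficient; its actual unit is not given in the text.

```python
import math

def ecurve_mag_db(f_hz, gain_db, slope):
    """Equation (1'): ECurveMag(f) = Gain + 100 * (e^(-slope * f) - 1)."""
    return gain_db + 100.0 * (math.exp(-slope * f_hz) - 1.0)

def retilt_envelope(freqs_hz, env_db, gain_db, old_slope, new_slope):
    """Hypothetical re-tilt: add (new ECurve - old ECurve) to the envelope in dB.

    At f = 0 both curves equal Gain, so the intercept (overall volume) is kept.
    """
    return [
        e + ecurve_mag_db(f, gain_db, new_slope) - ecurve_mag_db(f, gain_db, old_slope)
        for f, e in zip(freqs_hz, env_db)
    ]
```

With a larger slope the added difference becomes more negative as frequency rises, which reproduces the muffled timbre of FIG. 15; a smaller slope raises the highs, as in FIG. 16.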

A-8-2. Second timbre change function
The second timbre change function changes the timbre over time (that is, continuously in time) and, like the first timbre change function, is realized by the first spectrum changing means 418 shown in FIG. 1. For example, varying over time the way the spectral envelope is changed in the attack portion of the input audio signal can produce a gospel-like chorus effect, that is, an effect in which each singer's timbre evolves differently over time, giving the impression that each singer's expression differs. One way to change the timbre over time is to vary over time the change applied to the first formant frequency (the frequency at which the first spectral peak appears). FIG. 17 shows the change in the first formant frequency, with the first formant frequency before the timbre change indicated by a solid line and after it by a broken line. FIG. 18 is a diagram for explaining how the first formant frequency is controlled to realize such a timbre change.

First, on receiving the spectrum of the input audio signal from the timing changing means 411, the first spectrum changing means (modulating means) 418 extracts the spectral envelope from the spectrum and analyzes the spectrum to detect the attack start time. When it detects the attack time (see P1 in FIG. 18), the first spectrum changing means 418 uses the pseudo-random function rand(t) to obtain a target change for the first formant frequency (hereinafter, the first formant change value). Specifically, the first spectrum changing means 418 computes the first formant change value so that it reaches the preset target when a predetermined entry time has elapsed from the attack time (see P2 in FIG. 18), and converges to 0 when a predetermined convergence time has elapsed from the attack time (see P3 in FIG. 18).

At this time, the first spectrum changing means 418 calculates not only the first formant change value but also the entry time and the convergence time using the pseudo-random function rand(t). How rand(t) is used may be decided as appropriate according to the strength, duration, and so on of the gospel-like chorus effect the user wants. The first spectrum changing means 418 then substitutes the first formant change value obtained above into equation (11) below to obtain the output first formant frequency [Hz]. The input first formant frequency [Hz] in equation (11) is the first formant frequency of the input audio signal supplied from the supply control means 412.
Output first formant frequency [Hz] = input first formant frequency [Hz] + first formant change value [Hz] ... (11)
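The attack trajectory of FIG. 18 can be sketched as a piecewise-linear envelope: 0 at the attack (P1), the target at the entry time (P2), and back to 0 at the convergence time (P3). The linear shape, the parameter ranges in `draw_attack_params`, and all names are assumptions made for illustration; the patent only states that the target, entry time, and convergence time are derived from rand(t).

```python
import random

def formant_change_hz(t_s, attack_s, entry_s, converge_s, target_hz):
    """First formant change value at time t_s (linear ramps are an assumption)."""
    if not 0.0 < entry_s < converge_s:
        raise ValueError("need 0 < entry time < convergence time")
    dt = t_s - attack_s
    if dt <= 0.0 or dt >= converge_s:
        return 0.0                                   # before P1 or after P3
    if dt <= entry_s:
        return target_hz * dt / entry_s              # rising from P1 to P2
    return target_hz * (converge_s - dt) / (converge_s - entry_s)  # P2 to P3

def draw_attack_params(rng, k_target=80.0, entry_range=(0.05, 0.15),
                       converge_range=(0.3, 0.6)):
    """Hypothetical use of rand(t) for the target, entry time, and convergence time."""
    target = rng.uniform(-1.0, 1.0) * k_target
    entry = rng.uniform(*entry_range)
    converge = rng.uniform(*converge_range)
    return target, entry, converge

def output_f1(input_f1_hz, change_hz):
    """Equation (11): output F1 = input F1 + first formant change value."""
    return input_f1_hz + change_hz

params = draw_attack_params(random.Random(3))  # one draw per clone and attack
```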

Changing the first formant frequency over time in this way makes the spectral envelope described above (see FIG. 14 and related figures) change over time. Because the slope of the spectral envelope changes with it, the timbre ultimately changes over time. Although the description above changes the first formant frequency over time, any suitable approach may be chosen, for example changing the m-th formant frequency (m ≥ 2) over time, or changing the amplitude of the m-th formant over time. In addition to the first and second timbre change functions described above, the first spectrum changing means 418 also changes the spectrum based on the pitch and volume supplied from the small change means 417. Thus, when the input voice is voiced, the first spectrum changing means 418 outputs the spectrum of a clone signal to which the various effects of the voiced-path means have been applied.

A-9. Unvoiced sound quality changing function
The unvoiced sound quality changing function changes the quality of an unvoiced sound when the input voice is unvoiced, and is realized by the second spectrum changing means 419 shown in FIG. 1.
On receiving the spectrum of the input audio signal (an unvoiced sound) from the supply control means (determination means) 412, the second spectrum changing means (changing means) 419 randomly changes the amplitude magnitude [dB] (see FIG. 14) and the phase at each frequency f [Hz] of the spectrum, based on a random signal supplied from a random signal generator (not shown); here the random signal is a pure random signal. As a result, for an unpitched unvoiced sound such as "s", an unvoiced "s" with a different nuance can be output. Thus, when the input voice is unvoiced, the second spectrum changing means 419 of the unvoiced path outputs the spectrum of a clone signal whose amplitude and phase have been changed as described above. A pseudo-random signal may of course be used instead of the pure random signal.
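A minimal sketch of the per-bin randomization for an unvoiced frame, assuming the spectrum is a list of complex bins, the magnitude is perturbed by a uniform random gain within ±`mag_range_db`, and the phase is replaced by a uniform random phase. The range and the function name are hypothetical.

```python
import cmath
import math
import random

def randomize_unvoiced(spectrum, mag_range_db, rng):
    """Randomly rescale each bin's magnitude (in dB) and randomize its phase."""
    out = []
    for X in spectrum:
        gain = 10.0 ** (rng.uniform(-mag_range_db, mag_range_db) / 20.0)
        phase = rng.uniform(-math.pi, math.pi)
        out.append(cmath.rect(abs(X) * gain, phase))
    return out

clone_spec = randomize_unvoiced([1 + 1j, 0.5j, 2.0], 6.0, random.Random(11))
```

Because an unvoiced sound such as "s" has no pitch structure to preserve, randomizing phase does not damage it; each clone simply becomes a slightly different noise burst with a similar spectral shape.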

A-10. Output switching function
The output switching function switches the output spectrum between the voiced path and the unvoiced path, and is realized by the output switching means 420 shown in FIG. 1. When the determination result supplied from the supply control means 412 is "voiced", the output switching means 420 switches to the voiced path and supplies the spectrum output from the voiced path (voiced spectrum) to the signal synthesis unit 700; when the determination result is "unvoiced", it switches to the unvoiced path and supplies the spectrum output from the unvoiced path (unvoiced spectrum) to the signal synthesis unit 700.

The signal synthesis unit 700 adds all the voiced or unvoiced spectra supplied from the clone signal generation units 410 to the spectrum of the input audio signal, and applies an inverse FFT or similar processing to obtain a synthesized audio signal. This synthesized audio signal carries the various effects applied on the basis of the pseudo-random function rand(t) as described above. By listening to the synthesized audio generated from this signal through the audio output unit 800, the user can therefore enjoy the effect of several people actually singing in unison.
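The switching and summation steps can be sketched for a single frame as below. To stay dependency-free, this sketch uses a direct O(N²) inverse DFT in place of an inverse FFT (same result, slower), and omits windowing and overlap-add, which a real frame-based synthesizer would also need. All names are hypothetical.

```python
import cmath

def select_spectrum(is_voiced, voiced_spec, unvoiced_spec):
    """Output switching means: pick the voiced or unvoiced path for this frame."""
    return voiced_spec if is_voiced else unvoiced_spec

def inverse_dft(spectrum):
    """Direct inverse DFT returning the real part (stand-in for an inverse FFT)."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)).real / n
        for m in range(n)
    ]

def synthesize_frame(input_spec, clone_specs):
    """Add the input spectrum and every clone spectrum bin by bin, then invert."""
    total = list(input_spec)
    for spec in clone_specs:
        for k in range(len(total)):
            total[k] += spec[k]
    return inverse_dft(total)
```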

As described above, the audio processing apparatus according to this embodiment uses, as the modulation signal for audio conversion, a pseudo-random signal obtained by applying an LPF to white noise, and can therefore impart natural variation (that is, variation as if several people were actually singing in unison). Taking into account that the human voice fluctuates at around 2 Hz, the cutoff frequency Fc of the LPF is set to fluctuate around 2 Hz. This matches natural human variation and fluctuation more closely than a fixed cutoff frequency Fc would. A different pseudo-random function rand(t) may of course be used for each of the processes above.
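A pseudo-random control signal of this kind can be sketched as uniform white noise passed through a one-pole low-pass filter whose cutoff is redrawn around 2 Hz at regular intervals. The control rate (100 Hz), the one-pole filter, the cutoff spread, the hold length, and the normalization to [-1, 1] are all assumptions of this sketch; the patent specifies only white noise, an LPF, and a cutoff fluctuating around 2 Hz.

```python
import math
import random

def pseudo_random_signal(n, fs=100.0, fc_center=2.0, fc_spread=0.5, hold=25, seed=0):
    """Low-pass-filtered white noise with a cutoff that wanders around fc_center.

    fs is the control rate in Hz; the cutoff is redrawn every `hold` samples.
    The result is normalized so its peak magnitude is 1.
    """
    rng = random.Random(seed)
    y, fc, out = 0.0, fc_center, []
    for i in range(n):
        if i % hold == 0:
            fc = rng.uniform(fc_center - fc_spread, fc_center + fc_spread)
        a = 1.0 - math.exp(-2.0 * math.pi * fc / fs)  # one-pole smoothing coefficient
        y += a * (rng.uniform(-1.0, 1.0) - y)
        out.append(y)
    if not out:
        return out
    peak = max(abs(v) for v in out) or 1.0
    return [v / peak for v in out]

# One independent generator per clone signal, as with the parts 510.
rand_signals = [pseudo_random_signal(500, seed=s) for s in range(4)]
```

Compared with raw white noise, the filtered signal changes slowly from sample to sample, so parameters driven by it drift smoothly rather than jumping every frame.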

B. Modifications
<Modification 1>
FIG. 19 is a diagram for explaining the transition part changing function according to Modification 1 and corresponds to FIG. 11. In FIG. 19, as in FIG. 11, the pitch before modification is indicated by a solid line (with the pitch transition parts drawn as thick solid lines), and the pitch after modification by a broken line.
The transition part changing function according to Modification 1 determines a detune amount (a slight pitch offset) for the pitch-stable part each time the signal moves from a pitch transition part, where the pitch changes greatly, to a pitch-stable part, where the pitch settles, so that the detune amount differs from one pitch-stable part to the next. It is realized by the transition part changing means 416' shown in FIG. 1. Unlike the transition part changing function of the embodiment, it does not simulate natural changes in the human voice; instead, it simulates changes in the musical tones produced by an instrument such as a violin. As is well known, on a keyboard instrument such as a piano, pressing a key always produces a tone at the just pitch for that key (for example, A = 440 Hz), whereas on a string instrument such as a violin the pitch varies subtly depending on, for example, where the string is stopped. In other words, each time the note changes on an instrument such as a violin, the pitch deviates slightly from the just pitch (for example, A = 440 Hz), yet stays nearly constant at that offset while the note is held. The transition part changing means 416' simulates this behavior (see the detune amounts dt1 to dt3 in FIG. 19).

FIG. 20 is a diagram for explaining how the transition part changing means 416' according to Modification 1 controls the pitch, and schematically shows the portion α in FIG. 19. The transition part changing means (detecting means) 416' detects pitch transition parts in the same way as the transition part changing means 416 described above, and treats the remaining portions as pitch-stable parts. When it detects a move from a pitch-stable part to a pitch transition part (see P1 in FIG. 20), the transition part changing means 416' makes the detune amount dt2 (> 0) of that pitch-stable part, obtained as described below, converge to 0 over a time t1. Then, when the transition part changing means (parameter changing means) 416' detects a move from the pitch transition part to a pitch-stable part (see P2 in FIG. 20), it obtains a new detune amount dt3 (< 0) based on the pseudo-random function rand(t). It then moves the pitch offset to the new detune amount dt3 over a time t2 and holds dt3 until the next move from a pitch-stable part to a pitch transition part is detected.

The transition part changing means 416' obtains the output pitch [cent] by substituting each detune amount dt [cent] obtained in this way into equation (12) below. The input pitch in equation (12) is the pitch supplied from the vibrato effect imparting means 415.
Output pitch [cent] = input pitch [cent] + dt [cent] ... (12)
In this way, the transition part changing means 416' obtains the detune amount dt from the pseudo-random function rand(t) and adds it to the input pitch, producing an output pitch whose detune amount dt differs for each pitch-stable part, as shown by the broken line in FIG. 19.
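The detune control of FIG. 20 can be sketched as a small per-frame state machine. The inputs are assumptions: `in_transition` is a per-frame boolean from a detector such as the one for the transition part changing means 416, `draw_detune` is a hypothetical callable returning rand(t) * k in cents, and t1 and t2 are given in frames with linear ramps. The signal start is treated as if a transition had just ended, so the first stable part also receives a detune.

```python
import random

def detune_track(in_transition, draw_detune, t1_frames, t2_frames):
    """Per-frame detune dt [cent] following FIG. 20.

    Stable -> transition (P1): ramp the current dt to 0 over t1 frames.
    Transition -> stable (P2): draw a new dt and ramp to it over t2 frames, then hold.
    """
    out = []
    current = start_val = target = 0.0
    ramp_len, ramp_pos = 1, 0
    prev = True  # assumption: the signal starts as if leaving a transition
    for flag in in_transition:
        if flag and not prev:
            start_val, target, ramp_len, ramp_pos = current, 0.0, t1_frames, 0
        elif not flag and prev:
            start_val, target, ramp_len, ramp_pos = current, draw_detune(), t2_frames, 0
        if ramp_pos < ramp_len:
            ramp_pos += 1
            current = start_val + (target - start_val) * ramp_pos / ramp_len
        else:
            current = target
        out.append(current)
        prev = flag
    return out

def apply_detune(input_pitch_cent, dt_cent):
    """Equation (12): output pitch = input pitch + dt."""
    return [p + d for p, d in zip(input_pitch_cent, dt_cent)]

rng = random.Random(2)
track = detune_track([False] * 5 + [True] * 3 + [False] * 5,
                     lambda: rng.uniform(-1.0, 1.0) * 8.0, 2, 3)
```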

Applying the method of this modification to input musical tones such as those of a violin lets the user hear tones with a realistic sense of presence, like a performance by a string section. Although the example above determines the detune amount dt from the pseudo-random signal, the time taken for the detune amount to converge to 0 (time t1) or to reach its new value (time t2) may also be determined from the pseudo-random signal. The audio processing apparatus 100 may include both the transition part changing means 416' of Modification 1 and the transition part changing means 416 of the embodiment, or only one of them.

<Modification 2>
The embodiment above describes obtaining a chorus (unison) effect, in which all clone signals share the same basic pitch. The invention can also be applied to obtain a harmonizer effect, in which the basic pitch differs for each clone signal so that combining them forms a musical harmony. To obtain a harmonizer effect, for example, the pitch output from each timing changing means 411 shown in FIG. 1 may be shifted by the desired amount (an amount that forms a musical harmony). For example, when a C4 note is input to the apparatus, the timing changing means 411-1 outputs the pitch changed from C4 to C3, the timing changing means 411-2 outputs the pitch changed from C4 to A4, ..., and the timing changing means 411-n outputs the pitch changed from C4 to G4. A harmonizer effect may be obtained in this way.
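The pitch shifts in this example can be worked out in cents, the unit used for pitch throughout the embodiment. This sketch assumes twelve-tone equal temperament (100 cents per semitone) and the common MIDI numbering in which C4 = 60; the helper names are hypothetical and handle single-digit octaves only.

```python
NOTE_OFFSETS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
                "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

def midi_number(name):
    """'C4' -> 60, 'A4' -> 69 (single-digit octave assumed)."""
    letter, octave = name[:-1], int(name[-1])
    return 12 * (octave + 1) + NOTE_OFFSETS[letter]

def shift_cents(src, dst):
    """Equal-tempered interval from src to dst in cents."""
    return 100 * (midi_number(dst) - midi_number(src))

def shift_ratio(cents):
    """Frequency ratio corresponding to a shift in cents."""
    return 2.0 ** (cents / 1200.0)

# Shifts applied by timing changing means 411-1, 411-2, ..., 411-n for a C4 input.
shifts = [shift_cents("C4", dst) for dst in ("C3", "A4", "G4")]
```

So the three clones are shifted by -1200, +900, and +700 cents respectively; a shift of -1200 cents halves the frequency, as expected for one octave down.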

<Modification 3>
In the embodiment above, a voice is used as the input musical sound, but the invention can also be applied when an instrument sound (for example, strings) is input. The pan parameters that control the localization of each clone signal output from the clone signal generation units 410 shown in FIG. 1 may of course be changed as appropriate before the clone signals are combined. This makes it possible to obtain a chorus effect or the like that spreads appropriately across the stereo field. Also, although the embodiment uses an LPF whose cutoff frequency Fc fluctuates around 2 Hz, a BPF (band-pass filter) or the like may of course be used instead of an LPF. Furthermore, the cutoff frequency Fc is not limited to fluctuating around 2 Hz (that is, varying within a fixed frequency range); it may be changed as appropriate for each parameter being varied.
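The patent leaves the panning method open. One common choice, shown here only as an assumed example, is a constant-power pan law with the clones spread evenly from left to right; the function names are hypothetical and operate on one sample per clone for brevity.

```python
import math

def constant_power_pan(sample, pan):
    """pan in [-1, 1]: -1 = hard left, 0 = centre, +1 = hard right."""
    theta = (pan + 1.0) * math.pi / 4.0
    return sample * math.cos(theta), sample * math.sin(theta)

def spread_pans(n_clones):
    """Evenly spaced pan positions across the stereo field."""
    if n_clones == 1:
        return [0.0]
    return [-1.0 + 2.0 * i / (n_clones - 1) for i in range(n_clones)]

def mix_stereo(clone_samples, pans):
    """Sum the panned clones into one (left, right) sample pair."""
    left = right = 0.0
    for s, p in zip(clone_samples, pans):
        l, r = constant_power_pan(s, p)
        left += l
        right += r
    return left, right
```

The constant-power law keeps left² + right² equal to the clone's own power at every pan position, so moving a clone across the field does not make it louder or quieter.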

<Modification 4>
The functions of each unit of the audio processing apparatus 100 described above are implemented by a program stored in a ROM or the like. That program may therefore be distributed on a recording medium such as a CD-ROM, or over a communication network such as the Internet. Of course, the functions of each unit of the audio processing apparatus 100 may instead be implemented in hardware.

A diagram showing the configuration of the audio processing apparatus according to the embodiment.
A diagram showing the configuration of the pseudo-random signal generation part according to the embodiment.
A diagram illustrating the waveform of a pseudo-random signal according to the embodiment.
A diagram showing the effect of changing the output timing of frame information according to the embodiment.
A flowchart showing the timing change process according to the embodiment.
A diagram for explaining the timing change process according to the embodiment.
A diagram showing a trend change of pitch according to the embodiment.
A diagram showing the pitch change when the scooping effect is applied according to the embodiment.
A diagram for explaining the method of controlling the scooping effect according to the embodiment.
A diagram showing the pitch change when the vibrato effect is applied according to the embodiment.
A diagram showing the pitch change before and after a transition part according to the embodiment.
A diagram for explaining the method of controlling pitch in a transition part according to the embodiment.
A diagram showing a small change of pitch according to the embodiment.
A diagram illustrating the spectrum of one frame of the input voice signal according to the embodiment.
A diagram showing the change in the spectral envelope when the slope of ECurve is increased according to the embodiment.
A diagram showing the change in the spectral envelope when the slope of ECurve is decreased according to the embodiment.
A diagram showing the change of the first formant frequency according to the embodiment.
A diagram for explaining the method of controlling the first formant frequency according to the embodiment.
A diagram for explaining the transition part changing function according to Modification 1.
A diagram for explaining the pitch control method according to Modification 1.

Explanation of symbols

100 ... Audio processing apparatus
200 ... Audio signal input unit
300 ... Audio signal analysis unit
400 ... Clone signal generation unit
410 ... Clone signal generation part
411 ... Timing changing means
412 ... Supply control means
413 ... Trend changing means
414 ... Scooping effect imparting means
415 ... Vibrato effect imparting means
416, 416' ... Transition part changing means
417 ... Small change means
418 ... First spectrum changing means
419 ... Second spectrum changing means
420 ... Output switching means
500 ... Pseudo-random signal generation unit
510 ... Pseudo-random signal generation part
511 ... White noise generator
512 ... LPF
513 ... Cutoff frequency setting means
514 ... Normalization means
515 ... Output means
600 ... Control information input unit
700 ... Signal synthesis unit
710 ... First adder
720 ... Second adder
730 ... Conversion unit
800 ... Audio output unit

Claims

1. A musical sound processing apparatus comprising:
noise generating means for generating a noise signal;
filter means for removing a specific frequency component from the noise signal according to a set cutoff frequency and outputting the result as a pseudo-random signal; and
modulation means for modulating an input musical sound based on the pseudo-random signal.

2. The musical sound processing apparatus according to claim 1, further comprising cutoff frequency setting means for setting the cutoff frequency and varying it within a certain frequency range.

3. The musical sound processing apparatus according to claim 2, wherein the filter means is a low-pass filter that removes frequency components higher than the cutoff frequency, and the cutoff frequency setting means varies the cutoff frequency within a certain frequency range around 2 Hz.
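The sketch below illustrates the noise generating means, the filter means, and the cutoff frequency setting means from claims 1 to 3, together with the normalization means 514 from the list of symbols. Gaussian white noise passes through a one-pole low-pass filter whose cutoff is redrawn every frame within 2 Hz plus or minus a spread. The result is scaled to a peak magnitude of 1. Several details are assumptions and are not stated in the source: the 100 Hz frame rate, the 0.5 Hz spread, the one-pole filter topology, and the per-frame redraw of the cutoff.

```python
import math
import random
from typing import List, Optional


def pseudo_random_signal(n_frames: int, frame_rate: float = 100.0,
                         base_cutoff: float = 2.0, cutoff_spread: float = 0.5,
                         seed: Optional[int] = None) -> List[float]:
    """Low-pass filtered white noise, normalized to a peak magnitude of 1.0."""
    rng = random.Random(seed)
    y = 0.0
    out: List[float] = []
    for _ in range(n_frames):
        # Cutoff frequency setting: vary Fc within base_cutoff +/- cutoff_spread.
        fc = base_cutoff + rng.uniform(-cutoff_spread, cutoff_spread)
        # One-pole low-pass coefficient for this cutoff at the given frame rate.
        alpha = 1.0 - math.exp(-2.0 * math.pi * fc / frame_rate)
        x = rng.gauss(0.0, 1.0)  # white noise sample
        y += alpha * (x - y)     # one-pole low-pass filter
        out.append(y)
    # Normalization: scale so that the largest magnitude becomes 1.0.
    peak = max((abs(v) for v in out), default=0.0) or 1.0
    return [v / peak for v in out]


# One independent modulation signal per clone, e.g. four clones:
signals = [pseudo_random_signal(500, seed=i) for i in range(4)]
```

Each clone gets its own seed. This stands in for the separate pseudo-random signal generation parts 510, which each produce a signal that varies in a different way.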

4. The musical sound processing apparatus according to claim 1, further comprising analyzing/extracting means for analyzing the musical sound and extracting a parameter representing a characteristic of the musical sound,
wherein the modulating means modulates the musical sound by changing the extracted parameter based on the pseudo-random signal.

5. The musical sound processing apparatus according to claim 1, further comprising synthesizing means for synthesizing the musical sound modulated by the modulating means with the musical sound before modulation.

6. The musical sound processing apparatus according to claim 4, wherein the parameter is a pitch or a volume of the musical sound.

7. The musical sound processing apparatus according to claim 6, wherein the parameter control means includes:
detecting means for detecting an attack time of the input musical sound;
change amount calculating means for calculating an amount of change per unit time of the pitch or volume of the musical sound until a certain time has elapsed after the attack time is detected; and
parameter changing means for changing the pitch or volume of the musical sound by adding the calculated change amount to it.
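Claims 7 and 8 can be read as a "scoop". Each note starts below its pitch and glides up to it. The glide uses a fixed change amount per frame, and its length is derived from the pseudo-random signal. The sketch below is one hedged interpretation with hypothetical names. Several values are illustrative choices, not figures from the source: an attack is taken to be the frame where the volume rises through a threshold, pitch is in cents, and the depth and frame counts are arbitrary.

```python
from typing import List, Sequence


def apply_scoop(pitch_cents: Sequence[float], volume: Sequence[float],
                prand: Sequence[float], threshold: float = 0.1,
                depth_cents: float = 100.0, base_frames: int = 10,
                spread_frames: int = 5) -> List[float]:
    """Start each note depth_cents below its pitch and glide up to it."""
    out = list(pitch_cents)
    remaining = 0
    offset = 0.0
    step = 0.0
    prev_vol = 0.0
    for k, (vol, r) in enumerate(zip(volume, prand)):
        if prev_vol < threshold <= vol:  # attack: volume rises through threshold
            # Glide length ("certain time") taken from the pseudo-random value.
            frames = max(1, round(base_frames + spread_frames * r))
            offset = -depth_cents            # start below the target pitch
            step = depth_cents / frames      # change amount per frame
            remaining = frames
        if remaining > 0:
            out[k] = pitch_cents[k] + offset  # add the accumulated change
            offset += step
            remaining -= 1
        prev_vol = vol
    return out
```

With a pseudo-random value of 0, the glide lasts 10 frames. With a value of 1, it lasts 15 frames, so clones driven by different signals arrive at the target pitch at different times.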

8. The musical sound processing apparatus according to claim 7, wherein the parameter control means further includes time calculating means for calculating the certain time using the pseudo-random signal.

9. The musical sound processing apparatus according to claim 4, further comprising:
detecting means for detecting the volume of the musical sound; and
timing changing means for changing the output timing of the parameter based on the pseudo-random signal when the volume detected by the detecting means decreases and falls below a threshold.
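Claim 9 does not say how the output timing is changed. The sketch below takes one possible reading. When the volume falls through the threshold, the last pre-release frame of parameters is held for a number of extra frames taken from the pseudo-random signal, so each clone releases slightly later than the input. The hold strategy, the threshold, and `max_delay` are assumptions, and the function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shift_release_timing(params: Sequence[T], volume: Sequence[float],
                         prand: Sequence[float], threshold: float = 0.2,
                         max_delay: int = 4) -> List[T]:
    """Delay the release of a clone by repeating its last pre-release frame."""
    out: List[T] = []
    prev_vol = None
    for k, (p, vol, r) in enumerate(zip(params, volume, prand)):
        # Volume is falling and has just dropped below the threshold.
        if prev_vol is not None and prev_vol >= threshold > vol:
            delay = round(max_delay * abs(r))
            # Hold the previous frame's parameters so the release comes later.
            out.extend([params[k - 1]] * delay)
        out.append(p)
        prev_vol = vol
    return out
```

Each frame is treated as an opaque parameter set, such as the frame information produced by the analysis unit. This keeps the timing logic separate from the contents of each frame.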

10. The musical sound processing apparatus according to claim 1, further comprising input means for inputting vibrato control information for controlling the addition of vibrato,
wherein the modulating means modulates the musical sound by changing the vibrato control information based on the pseudo-random signal and adding vibrato to the musical sound according to the changed vibrato control information.
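As an illustration of claim 10, the sketch below treats the vibrato control information as a rate (Hz) and a depth (cents). Both are scaled each frame by the pseudo-random value before the vibrato offset is generated. The chosen parameters, jitter amounts, and frame rate are assumptions and do not come from the source. The phase is accumulated frame by frame so that changes in rate do not cause jumps in the waveform.

```python
import math
from typing import List, Sequence


def vibrato_offsets(prand: Sequence[float], rate_hz: float = 5.5,
                    depth_cents: float = 30.0, frame_rate: float = 100.0,
                    rate_jitter: float = 0.2, depth_jitter: float = 0.3) -> List[float]:
    """Per-frame vibrato pitch offsets (cents) with randomized control info."""
    out = []
    phase = 0.0
    for r in prand:
        # Change the vibrato control information using the pseudo-random value.
        rate = rate_hz * (1.0 + rate_jitter * r)
        depth = depth_cents * (1.0 + depth_jitter * r)
        out.append(depth * math.sin(phase))
        # Accumulate phase so a changing rate stays continuous.
        phase += 2.0 * math.pi * rate / frame_rate
    return out
```

The offsets would then be added to the clone's pitch track.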

11. The musical sound processing apparatus according to claim 6, wherein the parameter control means includes:
transition detecting means for detecting, as a transition part of the pitch or volume of the musical sound, a portion in which the way the pitch or volume changes satisfies a certain condition;
transition part time detecting means for detecting a start time and an end time of the transition part;
calculating means for calculating at least an amount of change per unit time of the pitch or volume of the musical sound from when the start time of the transition part is detected until the end time is detected; and
parameter changing means for changing the pitch or volume by adding the calculated change amount to the corresponding pitch or volume of the musical sound.

12. The musical sound processing apparatus according to claim 11, wherein the detecting means obtains an average value of the pitch or volume of the musical sound at regular intervals and, when the difference between the previously obtained average and the currently obtained average exceeds a threshold, detects that portion as a transition part of the pitch or volume of the musical sound.
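Claim 12 describes a detection rule that fits in a few lines. The pitch track is split into fixed-length blocks, the block averages are compared, and any block whose average differs from the previous one by more than a threshold is flagged. The sketch below assumes the block length and threshold, which the claim leaves open, and uses a hypothetical function name.

```python
from typing import List, Sequence


def detect_transitions(pitch: Sequence[float], block: int = 5,
                       threshold: float = 50.0) -> List[int]:
    """Return indices of blocks whose average jumps by more than `threshold`."""
    flagged = []
    prev_avg = None
    for b in range(len(pitch) // block):
        seg = pitch[b * block:(b + 1) * block]
        avg = sum(seg) / len(seg)  # average value for this interval
        if prev_avg is not None and abs(avg - prev_avg) > threshold:
            flagged.append(b)      # this block starts a transition part
        prev_avg = avg
    return flagged
```

For a note at 6000 cents followed by a step to 6200 cents, with blocks of 5 frames, the jump is flagged at the first block of the new note.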

13. The musical sound processing apparatus according to claim 12, wherein the calculating means calculates the change amount from when the start time of the transition part is detected until the end time is detected, and also calculates, using the pseudo-random signal, an amount of change per unit time of the pitch or volume of the musical sound until a certain time has elapsed after the end time is detected.

14. The musical sound processing apparatus, wherein the parameter is a detune amount of the musical sound, and the parameter control means includes:
detecting means for detecting a portion in which the change in pitch of the musical sound satisfies a certain condition as a pitch transition part, and the remaining portion as a pitch stable part; and
parameter changing means for changing the detune amount based on the pseudo-random signal each time a transition from the pitch transition part to the pitch stable part is detected.
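Claim 14 re-randomizes each clone's detune amount whenever the pitch settles after a transition, so the detune holds steady during a sustained note. The sketch below assumes that per-frame transition/stable flags are already available (for example, from a detector like the one in claim 12) and that the detune is the pseudo-random value scaled to a maximum in cents. The maximum value and the function name are hypothetical.

```python
from typing import List, Sequence


def detune_track(is_transition: Sequence[bool], prand: Sequence[float],
                 max_detune_cents: float = 10.0) -> List[float]:
    """Per-frame detune (cents), redrawn at each transition -> stable change."""
    out = []
    detune = 0.0
    prev = False
    for state, r in zip(is_transition, prand):
        if prev and not state:                # pitch transition part just ended
            detune = max_detune_cents * r     # draw a new detune amount
        out.append(detune)                    # otherwise hold the current value
        prev = state
    return out
```

Holding the value between redraws avoids audible pitch wobble in the middle of a stable note.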

15. The musical sound processing apparatus according to any one of claims 1 to 3, wherein the musical sound is a voice, the apparatus further comprising:
determination means for determining whether the voice is voiced or unvoiced; and
changing means for randomly changing, when the voice is determined to be unvoiced, the amplitude value or phase value of each frequency component constituting the voice modulated by the modulation means.
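The sketch below illustrates claim 15. The voiced/unvoiced decision uses a zero-crossing-rate heuristic, which is an assumption, since the claim does not say how the determination is made. For frames judged unvoiced, the phase of every frequency bin is replaced with a random value while its magnitude is kept. The 0.3 threshold and all names are hypothetical.

```python
import cmath
import math
import random
from typing import List, Sequence


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(1, len(frame) - 1)


def is_unvoiced(frame: Sequence[float], zcr_threshold: float = 0.3) -> bool:
    """Noisy, unvoiced sounds tend to cross zero much more often than voiced ones."""
    return zero_crossing_rate(frame) > zcr_threshold


def process_frame(frame: Sequence[float], spectrum: Sequence[complex],
                  rng: random.Random) -> List[complex]:
    """Randomize bin phases (keeping magnitudes) only for unvoiced frames."""
    if is_unvoiced(frame):
        return [cmath.rect(abs(c), rng.uniform(-math.pi, math.pi)) for c in spectrum]
    return list(spectrum)
```

Randomizing only the phase keeps the spectral shape of the unvoiced sound. Each clone's consonants then become decorrelated from the others, so they no longer sum as identical copies.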

16. The musical sound processing apparatus according to any one of claims 1 to 3, wherein the modulating means extracts a spectral envelope of the musical sound and changes the spectral envelope continuously over time based on the pseudo-random signal.
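As one illustration of claim 16, the sketch below applies a spectral tilt to an envelope given in dB. The tilt, in dB per octave around a reference frequency, is proportional to the current pseudo-random value. Because that value changes smoothly from frame to frame, the envelope also changes continuously over time. This is loosely in the spirit of the ECurve slope changes listed among the drawings, but this section does not describe the actual ECurve model. The tilt law, `f_ref`, and the maximum slope are assumptions.

```python
import math
from typing import List, Sequence


def tilt_envelope(env_db: Sequence[float], freqs: Sequence[float], r: float,
                  max_slope_db_per_oct: float = 3.0,
                  f_ref: float = 1000.0) -> List[float]:
    """Tilt a dB envelope by (max_slope * r) dB per octave around f_ref.

    `freqs` must be positive bin frequencies in Hz; `r` is the current
    pseudo-random value in [-1, 1].
    """
    slope = max_slope_db_per_oct * r
    return [e + slope * math.log2(f / f_ref) for e, f in zip(env_db, freqs)]
```

A positive value brightens the envelope above `f_ref` and darkens it below. A negative value does the opposite, and a value of 0 leaves the envelope unchanged.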

17. A musical sound processing method comprising:
a cutoff frequency setting step of setting a cutoff frequency for filter means that removes a specific frequency component from a noise signal generated by noise generating means and outputs the result as a pseudo-random signal; and
a modulation step of modulating an input musical sound based on the pseudo-random signal output from the filter means for which the cutoff frequency has been set.

18. A musical sound processing program for causing a computer to function as:
noise generating means for generating a noise signal;
filter means for removing a specific frequency component from the noise signal according to a set cutoff frequency and outputting the result as a pseudo-random signal; and
modulation means for modulating an input musical sound based on the pseudo-random signal.