JP4428435B2

JP4428435B2 - Pitch converter and program

Info

Publication number: JP4428435B2
Application number: JP2007268394A
Authority: JP
Inventors: 靖雄吉岡; ロスコスアレックス
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2007-10-15
Filing date: 2007-10-15
Publication date: 2010-03-10
Anticipated expiration: 2024-08-25
Also published as: JP2008058986A

Abstract

PROBLEM TO BE SOLVED: To obtain an output sound of natural sound quality from a pitch conversion device that uses phase vocoder technology.

SOLUTION: An amplitude spectrum (A) is obtained by frequency analysis of an input speech waveform through FFT (Fast Fourier Transform) analysis processing. A plurality of local peaks P₀ to P₂, etc., of spectral intensity are detected on the amplitude spectrum (A), and spectrum distribution regions R₀, etc., are designated for the respective local peaks. As shown in (B), the spectrum distribution regions R₀, etc., are moved on the frequency axis according to an input pitch to change the pitch. From among the spectrum distribution regions whose peak frequency approximates a set peak frequency, the region whose local-peak spectral intensity is closest to the envelope value designated for that set peak frequency is selected, and the amplitude spectrum data and phase spectrum data of the selected region are copied and used for sound signal generation.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a pitch conversion apparatus and program suitable for use in singing synthesis and the like, and more particularly to an improvement of pitch conversion technology that uses phase vocoder technology.

Pitch conversion techniques using phase vocoder technology are conventionally known (see, for example, Non-Patent Document 1). A singing synthesis apparatus that changes pitch using such a technique has also been proposed by the same applicant as the present application and is known (see, for example, Patent Document 1). The pitch change processing in this type of singing synthesis apparatus is described below with reference to FIG. 13.

FIG. 13(A) shows an amplitude spectrum obtained by frequency analysis of the speech waveform of the original voice by FFT (Fast Fourier Transform) analysis processing. On such an amplitude spectrum, a plurality of local peaks P₀ to P₂ are designated, and for each local peak a spectrum distribution region, such as R₀, containing the spectrum before and after that peak is designated. The local peaks P₀, P₁, and P₂ correspond to the fundamental, the first overtone (the second harmonic, at twice the fundamental frequency), and the second overtone (the third harmonic, at three times the fundamental frequency), respectively, and R₀, R₁, and R₂ are the spectrum distribution regions corresponding to the peaks P₀, P₁, and P₂, respectively. fa, fb, fc, and fd are the lower limit frequencies of the spectrum distribution regions R₀, R₁, R₂, and R₃, respectively, and the upper limit frequencies of the regions R₀, R₁, and R₂ are set slightly below the lower limit frequencies fb, fc, and fd, respectively.

In the pitch change processing, pitch raising processing such as that shown in FIG. 13(B) is performed as an example. In the pitch raising processing, the amplitude spectrum distribution of each spectrum distribution region is moved toward higher frequencies on the frequency axis so as to obtain a pitch higher than that of the original voice. That is, if the peak frequency of the fundamental of the original voice is f₀ and the peak frequency of the fundamental after the pitch is raised is f₀₁, the pitch change ratio T is T = f₀₁/f₀. The amplitude spectrum distribution in the region R₀ is moved upward on the frequency axis so that the fundamental peak P₀ after the pitch rise is located at the frequency f₀₁ = f₀T. Likewise, if the frequencies of the first- and second-overtone peaks P₁ and P₂ of the original voice are f₁ and f₂, respectively, the amplitude spectrum distribution in the region R₁ is moved upward so that the first-overtone peak P₁ after the pitch rise is located at f₁₁ = f₁T, and the amplitude spectrum distribution in the region R₂ is moved upward so that the second-overtone peak P₂ after the pitch rise is located at f₂₁ = f₂T.
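The per-region shift described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Region` tuple, the function name `shift_regions`, the rigid (width-preserving) translation, and the example frequencies are all hypothetical and are not taken from the patent.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """Hypothetical container for one spectrum distribution region (Hz)."""
    lower: float  # lower limit frequency
    peak: float   # local-peak frequency
    upper: float  # upper limit frequency


def shift_regions(regions: List[Region], ratio: float) -> List[Region]:
    """Translate each region so that its peak lands at peak * ratio.

    Assumption: the region is moved rigidly (its width is unchanged),
    so only a constant offset is added to all three frequencies.
    """
    shifted = []
    for r in regions:
        offset = r.peak * ratio - r.peak  # f_k * T - f_k
        shifted.append(Region(r.lower + offset, r.peak + offset, r.upper + offset))
    return shifted


# Made-up example: a fundamental and two slightly inharmonic overtones.
regions = [
    Region(50.0, 100.0, 150.0),
    Region(150.0, 203.0, 250.0),
    Region(250.0, 297.0, 350.0),
]
T = 1.5  # pitch change ratio f01 / f0
shifted = shift_regions(regions, T)
for r in shifted:
    print(r)
```

In this made-up case the first region ends at 200 Hz after the shift while the second begins at 251.5 Hz, so a rigid shift leaves part of the frequency axis uncovered.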

In the example of FIG. 13(B), the amplitude spectrum distributions are arranged so that the peaks P₀ to P₂ after the pitch rise lie on a spectral envelope EVb having the same shape as the spectral envelope EVa (the spectral envelope of the original voice) that connects the peaks P₀ to P₂ in FIG. 13(A), so the timbre after the pitch rise is the same as that of the original voice. To obtain a timbre different from that of the original voice, the spectral envelope EVb is given a shape different from that of the spectral envelope EVa, and the amplitude spectrum distributions are arranged so that the peaks P₀ to P₂ after the pitch rise lie on the spectral envelope EVb.

Meanwhile, the frequency analysis processing described above also yields a phase spectrum corresponding to the amplitude spectrum of FIG. 13(A). Based on this phase spectrum, a phase spectrum distribution is determined for each of the spectrum distribution regions described above. FIG. 14 shows an amplitude spectrum distribution am₀ and a phase spectrum distribution ph₀ in a certain spectrum distribution region. For simplicity, the amplitude spectrum distribution am₀ is drawn with a simple shape that differs from the amplitude spectrum distribution in the region R₀ of FIG. 13(A). In FIG. 14, f₀ is the peak frequency corresponding to the local peak, φ₀ is the peak phase corresponding to the peak frequency f₀, and f_L and f_U denote the lower limit frequency and upper limit frequency of the spectrum distribution region, respectively.

FIG. 15 shows the amplitude spectrum distribution AM₀ and the phase spectrum distribution PH₀ obtained when the pitch is raised by the pitch change processing described above. The amplitude spectrum distribution AM₀ is the amplitude spectrum distribution am₀ moved upward on the frequency axis so that the peak after the pitch rise is located at the frequency f₀₁ = f₀T. The phase spectrum distribution PH₀ is the phase spectrum distribution ph₀ moved upward on the frequency axis (in correspondence with the frequency change of the amplitude spectrum distribution am₀) so that the peak phase after the pitch rise is located at the frequency f₀₁ = f₀T, with the phase of each spectral bin in the moved phase spectrum distribution then corrected in correspondence with the pitch rise of the amplitude spectrum distribution am₀. Here, each spectral bin is the phase spectrum value corresponding to each frequency in the phase spectrum distribution.

To correct the phase of each spectral bin, a phase change amount Δφ₀ is obtained according to Equation 1, and Δφ₀ is added to the phase of each spectral bin. In Equation 1, Δt denotes the frame interval (frame period).
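Equation 1 itself does not appear in this text. A form consistent with the peak-shifting phase vocoder of Non-Patent Document 1 is offered here as an assumption rather than as the patent's exact expression. Moving a peak from f₀ to f₀T changes its frequency by f₀(T − 1), so over one frame interval Δt the phase would advance by the additional amount

```latex
\Delta\varphi_0 = 2\pi f_0 \,(T - 1)\,\Delta t
```

With f₀ in hertz and Δt in seconds, Δφ₀ is in radians.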

For example, for the spectral bin corresponding to the peak frequency f₀₁ after the pitch rise, Δφ₀ is added to the peak phase φ₀ to give a phase of φ₀ + Δφ₀. For the other spectral bins as well, Δφ₀ is added to the phase of each bin. As a result, a phase spectrum distribution PH₀ such as that shown in FIG. 15 is obtained. In FIG. 15, F_L and F_U denote the lower limit frequency and upper limit frequency, respectively, of the spectrum distribution region after the pitch rise. In FIG. 14, the difference between the frequencies f₀ and f_L is (f₀ − f_L), and the difference between the frequencies f_U and f₀ is (f_U − f₀). The lower limit frequency F_L and the upper limit frequency F_U are set in correspondence with these differences (f₀ − f_L) and (f_U − f₀), respectively.
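The two operations in this paragraph, adding one phase offset to every bin and placing the shifted region limits, might look as follows. This is a sketch only: the function names are hypothetical, the phase offset is passed in as a parameter rather than computed, wrapping the result into the range 0 to 2π is an assumption, and the limits are placed on the reading that the distances from the peak to each limit are preserved.

```python
import math
from typing import List, Tuple


def shift_region_limits(f0: float, f_low: float, f_up: float,
                        ratio: float) -> Tuple[float, float]:
    """Return (F_L, F_U) for a region whose peak moves from f0 to f0 * ratio.

    Assumption: the distances (f0 - f_low) and (f_up - f0) are kept unchanged.
    """
    f01 = f0 * ratio
    return f01 - (f0 - f_low), f01 + (f_up - f0)


def add_phase_offset(phases: List[float], delta_phi: float) -> List[float]:
    """Add the same offset to every bin phase and wrap into [0, 2*pi)."""
    two_pi = 2.0 * math.pi
    return [(p + delta_phi) % two_pi for p in phases]


limits = shift_region_limits(f0=100.0, f_low=80.0, f_up=130.0, ratio=1.5)
new_phases = add_phase_offset([0.0, 6.0], delta_phi=1.0)
print(limits)
print(new_phases)
```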

The amplitude spectrum data representing the amplitude spectrum distribution of each spectrum distribution region shown in FIG. 13(B), and the phase spectrum data representing, for each spectrum distribution region, the phase spectrum distribution that corresponds to the amplitude spectrum distribution of FIG. 13(B) and has undergone the correction described above with reference to FIGS. 14 and 15, are converted into a time-domain audio signal by inverse FFT processing or the like. As a result, an audio signal whose pitch is T times that of the original voice is obtained. By changing the spectral envelope as described above, an audio signal whose timbre differs from that of the original voice can also be obtained.
Non-Patent Document 1: J. Laroche and M. Dolson, "New Phase-Vocoder Techniques for Real-Time Pitch Shifting, Chorusing, Harmonizing, and Other Exotic Audio Modifications," J. Audio Eng. Soc., Vol. 47, No. 11, November 1999
Patent Document 1: JP 2003-255998 A

According to the voice conversion technique described above, voice conversion is performed without separating the frequency analysis result of the speech waveform into harmonic and non-harmonic components, so the non-harmonic components should not be heard separately and a natural converted sound should be obtained. A natural converted sound should also be obtained even for voiced fricatives and plosives. However, research by the inventors of the present application has shown that several points need improvement in order to obtain a more natural sound quality.

First, in the pitch change processing described above with reference to FIG. 13, consider the perfect harmonic frequencies 2f₀ and 3f₀ obtained by doubling and tripling the fundamental peak frequency f₀, as shown in FIG. 13(A). The first-overtone peak frequency f₁ is often shifted above 2f₀, or the second-overtone peak frequency f₂ is often shifted below 3f₀, so that neither coincides with the corresponding perfect harmonic frequency. This is because an actual human voice is not perfectly periodic.

When the pitch raising processing described above with reference to FIG. 13(B) is applied to the amplitude spectrum distribution of each spectrum distribution region shown in FIG. 13(A), the peak frequencies f₁₁ and f₂₁ of the first and second overtones become f₁T and f₂T, respectively, and deviate considerably from the perfect harmonic frequencies 2f₀₁ and 3f₀₁ assumed for the fundamental peak frequency f₀₁. That is, before the pitch rise, the difference Δf₁ between the first-overtone peak frequency f₁ and the perfect harmonic frequency 2f₀ is Δf₁ = f₁ − 2f₀, and the difference Δf₂ between the second-overtone peak frequency f₂ and the perfect harmonic frequency 3f₀ is Δf₂ = f₂ − 3f₀. After the pitch rise, the difference Δf₁₁ between the first-overtone peak frequency f₁₁ and the perfect harmonic frequency 2f₀₁ is Δf₁₁ = f₁T − 2f₀₁ = f₁T − 2f₀T = (f₁ − 2f₀)T, and the difference Δf₂₁ between the second-overtone peak frequency f₂₁ and the perfect harmonic frequency 3f₀₁ is Δf₂₁ = f₂T − 3f₀₁ = f₂T − 3f₀T = (f₂ − 3f₀)T.

Comparing the absolute values of the differences Δf₁ and Δf₂ with those of Δf₁₁ and Δf₂₁ shows that the absolute values of Δf₁₁ and Δf₂₁ are T times the absolute values of Δf₁ and Δf₂, respectively. When a time-domain audio signal is generated as described above from the amplitude spectrum distribution of each spectrum distribution region shown in FIG. 13(B), the overtone peak frequencies deviate considerably from the perfect harmonic frequencies, so the sound quality of the output sound becomes unnatural. It has also been confirmed that this unnaturalness becomes more pronounced as the pitch change ratio T increases.
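The scaling of the deviations by T can be checked numerically. The frequencies below are made-up illustrative values, and `harmonic_deviation` is a hypothetical helper name; only the relations Δf₁₁ = (f₁ − 2f₀)T and Δf₂₁ = (f₂ − 3f₀)T come from the text.

```python
def harmonic_deviation(peak_hz: float, fundamental_hz: float, harmonic_number: int) -> float:
    """Signed distance of an overtone peak from the perfect harmonic frequency."""
    return peak_hz - harmonic_number * fundamental_hz


# Made-up original peaks: f1 sits above 2*f0 and f2 sits below 3*f0.
f0, f1, f2 = 100.0, 203.0, 297.0
T = 1.5  # pitch change ratio

before = [harmonic_deviation(f1, f0, 2), harmonic_deviation(f2, f0, 3)]
after = [harmonic_deviation(f1 * T, f0 * T, 2), harmonic_deviation(f2 * T, f0 * T, 3)]

print(before)  # deviations before the pitch rise
print(after)   # deviations after the pitch rise, T times larger
```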

Moreover, in the amplitude spectrum distribution of each spectrum distribution region as shown in FIG. 13(B), spectrum-missing regions Q₁ and Q₂ arise, for example, on one side and the other side of the amplitude spectrum distribution containing the peak P₀. The output sound therefore lacks the vividness of the original voice.

FIG. 13 shows pitch raising processing as an example of the pitch change processing, but pitch lowering processing is also possible. In pitch lowering processing, the amplitude spectrum distribution of each spectrum distribution region is moved toward lower frequencies on the frequency axis. FIG. 16 shows an example of pitch and timbre change processing from the research of the inventors of the present application; in this example, pitch lowering processing is performed.

FIG. 16(A) shows an amplitude spectrum similar to that of FIG. 13(A); like parts are given like reference signs and their detailed description is omitted. In the pitch lowering processing, the amplitude spectrum distributions of the spectrum distribution regions R₀, R₁, and R₂ are each moved toward lower frequencies on the frequency axis, as shown in FIG. 16(B). In the amplitude spectrum distribution after the move, the peak frequency of the peak P₀₁ corresponding to the peak P₀ is F₀, the peak frequency of the peak P₁₁ corresponding to the peak P₁ is F₂, and the peak frequency of the peak P₂₁ corresponding to the peak P₂ is F₅.

Assume a predetermined spectral envelope EVc whose shape differs from that of the spectral envelope EVa of the original voice. So that such an envelope EVc can be represented adequately, a peak frequency F₁ is set between the peak frequencies F₀ and F₂, peak frequencies F₃ and F₄ are set between the peak frequencies F₂ and F₅, and peak frequencies F₆ and F₇ are set above the peak frequency F₅. For the peak frequency F₁, the spectrum distribution region R₀, which among the regions R₀ to R₂ has the peak frequency f₀ closest to F₁, is selected, and the amplitude spectrum distribution is copied from this region R₀. The copied amplitude spectrum distribution is moved upward on the frequency axis so that its peak frequency changes from f₀ to F₁.
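The nearest-peak rule used here for choosing the source region can be sketched as below. The function name `nearest_region_index` and all frequencies are hypothetical; the values are chosen only so that the added peak frequencies fall where the text places them relative to the original peaks.

```python
from typing import Dict, List


def nearest_region_index(source_peaks_hz: List[float], target_hz: float) -> int:
    """Index of the source region whose peak frequency is closest to target_hz."""
    return min(range(len(source_peaks_hz)),
               key=lambda i: abs(source_peaks_hz[i] - target_hz))


# Made-up original peak frequencies f0, f1, f2 of regions R0, R1, R2.
source_peaks = [200.0, 405.0, 595.0]

# Made-up added peak frequencies standing in for F1, F3, F4, F6, F7.
added_peaks = {"F1": 240.0, "F3": 380.0, "F4": 430.0, "F6": 620.0, "F7": 700.0}

chosen: Dict[str, int] = {
    name: nearest_region_index(source_peaks, hz) for name, hz in added_peaks.items()
}
print(chosen)
```

With these values F₁ draws on R₀, F₃ and F₄ draw on R₁, and F₆ and F₇ draw on R₂, which matches the assignment described in the text.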

For the peak frequencies F₃ and F₄, in the same manner as described above for the peak frequency F₁, the spectrum distribution region R₁ having the peak frequency f₁ closest to F₃ and F₄ is selected, and the amplitude spectrum distribution is copied from this region R₁ once for each of F₃ and F₄. The first copied amplitude spectrum distribution is moved downward on the frequency axis so that its peak frequency changes from f₁ to F₃. The second copied amplitude spectrum distribution is moved upward on the frequency axis so that its peak frequency changes from f₁ to F₄.

For the peak frequencies F₆ and F₇, in the same manner as described above for the peak frequencies F₃ and F₄, the amplitude spectrum distribution of the spectrum distribution region R₂ is copied once for each of F₆ and F₇. The first copied amplitude spectrum distribution is moved upward on the frequency axis so that its peak frequency changes from f₂ to F₆. The second copied amplitude spectrum distribution is moved upward on the frequency axis so that its peak frequency changes from f₂ to F₇.

In the amplitude spectrum distributions corresponding to the peak frequencies F₀, F₂, and F₅, the spectral intensity of each spectral bin is corrected so that the peaks P₀₁, P₁₁, and P₂₁ lie on the spectral envelope EVc. Here, each spectral bin is the amplitude spectrum value corresponding to each frequency in the amplitude spectrum distribution. In the amplitude spectrum distribution corresponding to the peak frequency F₁, the spectral intensity of each spectral bin is corrected so that the peak P₀₂ lies on the spectral envelope EVc.

In the amplitude spectrum distribution corresponding to the peak frequency F₃, the spectral intensity of each spectral bin is corrected so that the peak P₁₂ lies on the spectral envelope EVc. Likewise, in the amplitude spectrum distribution corresponding to the peak frequency F₄, the spectral intensity of each spectral bin is corrected so that the peak P₁₃ lies on the spectral envelope EVc.

In the amplitude spectrum distribution corresponding to the peak frequency F₆, the spectral intensity of each spectral bin is corrected so that the peak P₂₂ lies on the spectral envelope EVc. Likewise, in the amplitude spectrum distribution corresponding to the peak frequency F₇, the spectral intensity of each spectral bin is corrected so that the peak P₂₃ lies on the spectral envelope EVc.

As a result of the pitch and timbre change processing described above, eight amplitude spectrum distributions corresponding to the peak frequencies F₀ to F₇ are arranged, as shown in FIG. 16(B), with the peaks P₀₁, P₀₂, P₁₁ to P₁₃, and P₂₁ to P₂₃ lying on the spectral envelope EVc.
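One simple way to fit a region's peak to an envelope value is to apply a single gain to every bin of that region. The sketch below assumes linear amplitudes and a uniform multiplicative gain; the text only states that each bin's intensity is corrected so that the peak matches the envelope, so both choices, along with the function name and sample values, are assumptions.

```python
from typing import List, Tuple


def fit_region_to_envelope(bins: List[float], envelope_value: float) -> Tuple[List[float], float]:
    """Scale all bins of one region so its largest bin equals envelope_value.

    Assumes linear (not dB) amplitudes and one common gain for the region.
    Returns the scaled bins and the gain that was applied.
    """
    peak = max(bins)
    if peak <= 0.0:
        raise ValueError("region has no positive peak to scale")
    gain = envelope_value / peak
    return [b * gain for b in bins], gain


# Made-up region with a weak peak of 0.4 fitted to an envelope value of 0.8.
scaled_bins, gain = fit_region_to_envelope([0.1, 0.4, 0.2], envelope_value=0.8)
print(scaled_bins, gain)
```

Under this assumption, a weak source peak fitted to a high envelope value needs a gain well above 1, and that gain is applied to every bin in the region.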

According to the pitch and timbre change processing described above, as the spectrum distribution region corresponding to a predetermined peak frequency (for example, F₁) set on one side of a certain peak after the pitch change (for example, the peak P₀₁), the spectrum distribution region having the peak frequency closest to that predetermined peak frequency (for example, f₀) is selected, and the amplitude spectrum distribution of the selected region is copied and used for audio signal generation, so a natural timbre is easy to obtain. However, for the spectrum distribution regions corresponding to the peak frequencies F₃ and F₄, for example, the amplitude spectrum distribution of the region having the peak P₁ is copied and then used with the spectral intensity of each spectral bin increased, so the resulting timbre is strongly noisy. That is, a peak with low spectral intensity is relatively unstable, and increasing its spectral intensity amplifies that instability and gives a noisy impression.

FIGS. 17 and 19 show examples in which the time position of the analysis window is varied in the FFT analysis processing studied by the inventors of the present application. In these figures, t denotes time. Tp denotes one period of the input speech waveform, and this period corresponds to the pitch of the input speech. t_S1 to t_S3 each denote a vocal cord vibration start position.

In the example of FIG. 17, FFT analysis was performed with the center W_C of the analysis window FW placed near the midpoint between the vocal cord vibration start positions t_S2 and t_S3, yielding the peak phases shown in FIG. 18. In FIG. 18, the horizontal axis shows frequency f and the vertical axis shows phase (0 to 2π). f₀ is the peak frequency of the fundamental, and f₁ to f₅ are the peak frequencies of overtones. FIG. 18 shows that the peak phases φ₀ to φ₅ corresponding to the peak frequencies f₀ to f₅ take scattered values and are not aligned.

In the example of FIG. 19, FFT analysis was performed with the center W_C of the analysis window FW placed at the vocal cord vibration start position t_S3, yielding the peak phases shown in FIG. 20. In FIG. 20, parts similar to those in FIG. 18 are given the same reference signs. FIG. 20 shows that the peak phases φ₀′ to φ₅′ are nearly aligned around a certain value. Such a phase-aligned state is characteristic of a natural speech waveform. If the phases at the analysis window position shown in FIG. 19 are scattered, the waveform produced when the time-domain audio signal is generated as described above does not resemble speech, and the output sound is unnatural as a result. In other words, performing voice conversion with the phase spectrum shown in FIG. 18 makes the sound quality of the output sound unnatural.

An object of the present invention is to solve the problems described above and to provide a novel pitch conversion apparatus and program that yield an output sound of natural sound quality.

The pitch conversion apparatus according to the present invention comprises:
input means for inputting pitch information designating a pitch different from that of an original sound;
generation means for generating, based on an amplitude spectrum obtained by applying frequency analysis processing to the sound waveform of the original sound, amplitude spectrum data that represents, with respect to the frequency axis and for each of a plurality of local peaks of spectral intensity, the amplitude spectrum distribution in a spectrum distribution region containing that local peak and the spectrum before and after it, and for generating, based on a phase spectrum obtained by the frequency analysis processing, phase spectrum data that represents the phase spectrum distribution with respect to the frequency axis for each spectrum distribution region;
first correction means for correcting the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data on the frequency axis, for each spectrum distribution region, according to the pitch information;
setting means for setting a desired peak frequency on one side of the peak frequency corresponding to at least one local peak in the amplitude spectrum distribution represented by the amplitude spectrum data corrected by the first correction means;
designation means for designating envelope values that are to form a spectral envelope, in correspondence with each of the peak frequencies of the plurality of local peaks in the amplitude spectrum distribution represented by the amplitude spectrum data corrected by the first correction means and with the peak frequency set by the setting means;
selection means for selecting, from among those spectrum distribution regions indicated by the amplitude spectrum data generated by the generation means whose peak frequency is in a predetermined approximate relationship with the set peak frequency, the spectrum distribution region whose local peak has the spectral intensity closest to the envelope value designated by the designation means for the set peak frequency;
first copy means for copying the amplitude spectrum data and phase spectrum data of the spectrum distribution region selected by the selection means from the amplitude spectrum data and phase spectrum data generated by the generation means;
second correction means for correcting the copied amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data copied by the first copy means on the frequency axis according to the setting of the peak frequency;
third correction means for correcting the spectral intensity of each spectral bin in each amplitude spectrum distribution represented by the amplitude spectrum data corrected by the first correction means so that the spectral intensity of its local peak matches the envelope value designated by the designation means for the peak frequency corresponding to that local peak, and for correcting the spectral intensity of each spectral bin in the amplitude spectrum distribution represented by the amplitude spectrum data corrected by the second correction means so that the spectral intensity of its local peak matches the envelope value designated by the designation means for the set peak frequency;
fourth correction means for correcting, for each spectrum distribution region, the phase spectrum distribution represented by the phase spectrum data generated by the generation means in correspondence with the pitch change made by the first correction means, and for correcting the phase spectrum distribution represented by the phase spectrum data copied by the first copy means in correspondence with the frequency change made by the second correction means; and
conversion means for converting the amplitude spectrum data corrected by the first to third correction means and the phase spectrum data corrected by the fourth correction means into a time-domain sound signal.

According to the above pitch conversion device, a desired peak frequency is set on one side of the peak frequency corresponding to at least one local peak in the amplitude spectrum distribution represented by the amplitude spectrum data corrected by the first correcting means (that is, the amplitude spectrum data that has undergone the pitch changing process). This is done to increase the number of local peaks that represent the spectral envelope.

In addition, envelope values that are to form the spectral envelope are indicated, one for each of the peak frequencies corresponding to the plurality of local peaks in the amplitude spectrum distribution represented by the amplitude spectrum data corrected by the first correcting means, and one for the peak frequency set as described above.

From among the spectrum distribution regions of the amplitude spectrum data generated by the generating means whose peak frequency is in a predetermined approximate relationship with the set peak frequency, the region whose local peak has the spectral intensity closest to the spectral envelope value indicated for the set peak frequency is selected. The amplitude spectrum data and phase spectrum data of the selected region are copied from the amplitude spectrum data and phase spectrum data generated by the generating means. The copied amplitude spectrum data and phase spectrum data are then used for sound signal generation after receiving the necessary corrections.

In this way, the above pitch conversion device selects a spectrum distribution region that has a pitch frequency before the pitch change close to the set peak frequency and whose local peak has the spectral intensity closest to the indicated envelope value, and copies the amplitude spectrum data and phase spectrum data of that region for use in sound signal generation, which makes it easy to obtain a natural timbre. Furthermore, when the spectral intensity of the local peak is matched to the envelope value, the spectral intensity of each spectral bin in the amplitude spectrum data does not have to be increased very much, so the output sound has a natural quality that is free of noisiness.
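The selection rule described above can be sketched in a few lines of Python. This is only an illustration: the `Region` type, the function name `select_region`, and the frequency-ratio test used for the "predetermined approximate relationship" are assumptions made for the example and are not specified in the text.

```python
from dataclasses import dataclass


@dataclass
class Region:
    peak_freq: float   # peak frequency before the pitch change (Hz, assumed > 0)
    peak_level: float  # spectral intensity of the local peak (dB)


def select_region(regions, target_freq, target_level, max_freq_ratio=1.5):
    """Pick the source region to copy for a newly set peak frequency.

    Regions whose peak frequency is "close" to target_freq are kept; the
    ratio threshold is a hypothetical stand-in for the predetermined
    approximate relationship. Among those, the region whose peak level is
    nearest to the indicated envelope value is returned.
    """
    candidates = [
        r for r in regions
        if max(r.peak_freq, target_freq) / min(r.peak_freq, target_freq) <= max_freq_ratio
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: abs(r.peak_level - target_level))


regions = [Region(200.0, -10.0), Region(400.0, -20.0),
           Region(600.0, -32.0), Region(2000.0, -31.0)]
chosen = select_region(regions, target_freq=500.0, target_level=-30.0)
```

In this example the 2000 Hz region has almost the right level but is rejected because its frequency is too far away, so the 600 Hz region is chosen and only a small gain change is needed to reach the envelope value.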

The above pitch conversion device may further comprise:
calculating means for setting a plurality of candidate values of a time shift amount from the peak phase of the fundamental tone with respect to the phase spectrum data, and for calculating the peak phases of the fundamental tone and the n-th harmonics for each candidate value;
fifth correcting means for selecting, from among the plural groups of peak phases respectively corresponding to the plurality of candidate values, the group of peak phases corresponding to the candidate value that yields the phase-aligned state closest to flat, and for correcting the peak phases of the fundamental tone and the n-th harmonics in the phase spectrum data so that they match the peak phases of the fundamental tone and the n-th harmonics in the selected group;
sixth correcting means, used in place of the fourth correcting means, for correcting each frequency in each spectrum distribution region of the phase spectrum distribution represented by the phase spectrum data corrected by the fifth correcting means, in correspondence with the pitch change made by the first correcting means;
seventh correcting means for calculating, with respect to the phase spectrum data corrected by the sixth correcting means, a time shift amount to the peak phase of the fundamental tone before the pitch change, taking into account the amount of pitch change made by the first correcting means, and for correcting the peak phases of the fundamental tone and the n-th harmonics in the phase spectrum data corrected by the sixth correcting means in accordance with the calculated time shift amount; and
eighth correcting means for correcting, in the phase spectrum data corrected by the seventh correcting means, the phases other than the peak phase of the fundamental tone in the spectrum distribution region corresponding to the fundamental tone in accordance with the amount by which the fifth and seventh correcting means changed that peak phase, and the phases other than the peak phase of each n-th harmonic in the spectrum distribution region corresponding to that harmonic in accordance with the amount by which the fifth and seventh correcting means changed the peak phase of that harmonic.
In this case, the conversion means converts the amplitude spectrum data corrected by the first to third correcting means and the phase spectrum data corrected by the fifth to eighth correcting means into a sound signal in the time domain.

According to this aspect, the frequency analysis processing is applied to the sound waveform of the original sound with the center of the analysis window displaced from the vocal cord vibration start position, so in the phase spectrum distribution represented by the phase spectrum data generated by the generating means, the peak phases of the fundamental tone and the n-th harmonics are not aligned, as described above with reference to FIGS. 17 and 18. This misalignment of the peak phases is, however, corrected by the calculating means and the second to fifth correcting means.

位相スペクトルデータに関して基音のピーク位相からのタイムシフト量の候補値が複数設定され、各候補値毎に基音及びｎ倍音のピーク位相が算出される。複数の候補値にそれぞれ対応する複数群のピーク位相のうちから平坦に最も近い位相揃い状態となる候補値に対応する１群のピーク位相が選択され、選択に係る群中の基音及びｎ倍音のピーク位相にそれぞれ一致するように位相スペクトルデータ中の基音及びｎ倍音のピーク位相が修正される。 A plurality of candidate values for the amount of time shift from the peak phase of the fundamental tone are set for the phase spectrum data, and the peak phase of the fundamental tone and the nth harmonic is calculated for each candidate value. A group of peak phases corresponding to a candidate value that is in the closest phase alignment state is selected from among a plurality of groups of peak phases respectively corresponding to a plurality of candidate values, and the fundamental tone and n harmonics in the selected group are selected. The peak phase of the fundamental tone and the n-th overtone in the phase spectrum data is corrected so as to coincide with the peak phase.
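The candidate search described above can be sketched as follows. The sketch assumes that a time shift dt changes the phase of a component at frequency f by 2πf·dt, and it measures "closest to flat" with the length of the mean unit phasor; both the sign convention and the flatness measure are assumptions for illustration, as are all function names.

```python
import math


def wrap(phase):
    """Wrap a phase in radians into the interval [-pi, pi)."""
    return (phase + math.pi) % (2.0 * math.pi) - math.pi


def shifted_phases(peak_freqs, peak_phases, dt):
    """Peak phases of the fundamental and harmonics after a time shift dt (s)."""
    return [wrap(p + 2.0 * math.pi * f * dt)
            for f, p in zip(peak_freqs, peak_phases)]


def flatness_error(phases):
    """0.0 when all phases are equal, growing as they spread (assumed measure)."""
    n = len(phases)
    c = sum(math.cos(p) for p in phases) / n
    s = sum(math.sin(p) for p in phases) / n
    return 1.0 - math.hypot(c, s)


def best_time_shift(peak_freqs, peak_phases, candidates):
    """Return the candidate time shift that gives the flattest peak phases."""
    return min(
        candidates,
        key=lambda dt: flatness_error(shifted_phases(peak_freqs, peak_phases, dt)),
    )


# Synthetic example: phases that would all equal 0.5 rad after a 2 ms shift.
freqs = [100.0, 200.0, 300.0]
observed = [wrap(0.5 - 2.0 * math.pi * f * 0.002) for f in freqs]
candidates = [k * 0.0005 for k in range(20)]   # 0 ms to 9.5 ms
best = best_time_shift(freqs, observed, candidates)
aligned = shifted_phases(freqs, observed, best)
```

The peak phases in the phase spectrum data would then be overwritten with the `aligned` values of the selected candidate.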

In the phase spectrum distribution represented by the phase spectrum data corrected in this way, each frequency is corrected for each spectrum distribution region in correspondence with the pitch change. After that, the time shift amount to the peak phase of the fundamental tone before the pitch change is calculated taking the amount of pitch change into account, and the peak phases of the fundamental tone and the n-th harmonics in the previously corrected phase spectrum data are corrected again in accordance with the calculated time shift amount. This time shift is performed to undo the time shift that was applied for phase alignment.

ここまでの位相修正は、ピーク位相を対象としているので、ピーク位相以外の位相の修正を行なう必要がある。そこで、再修正に係る位相スペクトルデータにおいて基音に対応するスペクトル分布領域では基音のピーク位相の変更量に対応して基音のピーク位相以外の位相が修正され、ｎ倍音に対応するスペクトル分布領域でもｎ倍音のピーク位相の変更量に対応してｎ倍音のピーク位相以外の位相が修正される。 Since the phase correction up to this point is for the peak phase, it is necessary to correct phases other than the peak phase. Therefore, in the phase spectrum data related to the re-correction, in the spectrum distribution region corresponding to the fundamental tone, the phase other than the peak phase of the fundamental tone is corrected in accordance with the amount of change in the peak phase of the fundamental tone. A phase other than the peak phase of the nth harmonic is corrected in accordance with the amount of change in the peak phase of the harmonic.

When phase spectrum data corrected as described above is used for sound signal generation, the waveform of the generated sound signal has the characteristic of a natural voice waveform, namely that the peak phases are aligned at the vocal cord vibration start position, so an output sound with natural sound quality is obtained.

The above pitch conversion device may further comprise:
second copy means for copying spectral bins from a noise component region that lies within the spectrum distribution regions of the amplitude spectrum data generated by the generating means and whose frequency band coincides with a spectrum-lacking region produced on one side of at least one of the amplitude spectrum distributions represented by the amplitude spectrum data corrected by the first correcting means; and
fifth correcting means for correcting, among the corrected amplitude spectrum data, the amplitude spectrum data representing the at least one amplitude spectrum distribution so that the spectral bins copied by the second copy means are added to the spectrum-lacking region.
In this case, the conversion means converts the amplitude spectrum data corrected by the first to third and fifth correcting means and the phase spectrum data corrected by the fourth correcting means into a sound signal in the time domain.

According to this aspect, spectral bins are copied from a noise component region (a region of low spectral intensity sufficiently far from the frequency of a local peak) whose frequency band coincides with a spectrum-lacking region produced on one side of at least one of the amplitude spectrum distributions represented by the amplitude spectrum data corrected by the first correcting means, and the copied bins are added to the spectrum-lacking region. The liveliness of the original voice is thereby reflected in the output sound, and an output sound with natural sound quality is obtained.
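A rough sketch of this gap filling is given below. It assumes that spectra are stored as lists of linear bin magnitudes, and it decides whether an original bin is a noise component with two hypothetical thresholds (`min_distance` in bins from any local peak and `noise_ceiling` in magnitude); the text only says that the region is of low intensity and sufficiently far from a peak.

```python
def fill_gap_with_noise(original, shifted, gap, peak_bins,
                        min_distance=2, noise_ceiling=0.15):
    """Copy noise bins from the original spectrum into a spectrum-lacking gap.

    original  : bin magnitudes before the pitch change
    shifted   : bin magnitudes after the regions were moved (gap bins are empty)
    gap       : iterable of bin indices that the move left empty
    peak_bins : indices of the local peaks in the original spectrum
    """
    out = list(shifted)
    for k in gap:
        far_from_peaks = all(abs(k - p) >= min_distance for p in peak_bins)
        if far_from_peaks and original[k] <= noise_ceiling:
            out[k] = original[k]   # same frequency band, so the same bin index
    return out


original = [0.1, 0.2, 1.0, 0.2, 0.1, 0.05, 0.04, 0.3, 0.9, 0.3]
shifted = [0.0, 0.1, 0.2, 1.0, 0.0, 0.0, 0.0, 0.0, 0.3, 0.9]
filled = fill_gap_with_noise(original, shifted, gap=range(4, 7), peak_bins=[2, 8])
```

Bins 4 to 6 of the shifted spectrum receive the low-level original bins from the same band, while bins close to a peak or above the ceiling would be left untouched.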

According to the present invention, a spectrum distribution region is selected that has a peak frequency before the pitch change close to the set peak frequency and whose local peak has the spectral intensity closest to the indicated envelope value; peak phases that were misaligned in the phase spectrum obtained by frequency analysis are aligned by calculation; and spectral bins copied from a noise component region whose frequency band coincides with a spectrum-lacking region of the amplitude spectrum distribution after a pitch increase are added to that region. As a result, an output sound with natural sound quality can be generated.

図１は、この発明の一実施形態に係るピッチ変換装置の回路構成を示すものである。このピッチ変換装置は、小型コンピュータ１０によって動作が制御される構成になっている。 FIG. 1 shows a circuit configuration of a pitch converter according to an embodiment of the present invention. This pitch converter is configured to be controlled by a small computer 10.

バス１１には、ＣＰＵ（中央処理装置）１２、ＲＯＭ（リード・オンリィ・メモリ）１４、ＲＡＭ（ランダム・アクセス・メモリ）１６、音声入力部１８、制御パラメータ入力部２０、外部記憶装置２２、表示部２４、Ｄ／Ａ（ディジタル／アナログ）変換部２６、ＭＩＤＩ（Musical Instrument Digital Interface）インターフェース２８、通信インターフェース３０等が接続されている。 The bus 11 includes a CPU (Central Processing Unit) 12, a ROM (Read Only Memory) 14, a RAM (Random Access Memory) 16, a voice input unit 18, a control parameter input unit 20, an external storage device 22, and a display. A unit 24, a D / A (digital / analog) conversion unit 26, a MIDI (Musical Instrument Digital Interface) interface 28, a communication interface 30 and the like are connected.

The CPU 12 executes various processes related to pitch conversion and the like in accordance with a program stored in the ROM 14. The processes related to pitch conversion are described later with reference to FIGS. 2 to 4 and other figures.

ＲＡＭ１６は、ＣＰＵ１２の各種処理に際してワーキングエリアとして使用される種々の記憶部を含むものである。この発明の実施に関係する記憶部としては、例えば入力部１８，２０にそれぞれ対応する入力データ記憶領域等が存在するが、詳細については後述する。 The RAM 16 includes various storage units that are used as a working area when the CPU 12 performs various processes. As a storage unit related to the implementation of the present invention, for example, there are input data storage areas corresponding to the input units 18 and 20, respectively, and the details will be described later.

音声入力部１８は、音声信号を入力するためのマイクロホン、音声入力端子等を有するもので、入力した音声信号をディジタル波形データに変換するＡ／Ｄ（アナログ／ディジタル）変換器を備えている。入力に係るディジタル波形データは、ＲＡＭ１６内の所定領域に記憶される。 The audio input unit 18 includes a microphone for inputting an audio signal, an audio input terminal, and the like, and includes an A / D (analog / digital) converter that converts the input audio signal into digital waveform data. Digital waveform data relating to the input is stored in a predetermined area in the RAM 16.

The control parameter input unit 20 includes a keyboard for entering characters, numbers, and the like, a pointing device such as a mouse, and parameter setting controls such as a volume control, and is used to set the various control parameters for the pitch conversion processing. Pitch, timbre, and the like can be set as control parameters. Control parameter data representing the control parameters that have been set is stored in a predetermined area in the RAM 16.

The external storage device 22 accepts one or more types of removable recording media, such as an HD (hard disk), FD (flexible disk), CD (compact disc), DVD (digital versatile disc), or MO (magneto-optical disk). With a desired recording medium mounted in the external storage device 22, data can be transferred from the recording medium to the RAM 16. If the mounted recording medium is writable, as an HD or FD is, data in the RAM 16 can also be transferred to the recording medium.

プログラム記録手段としては、ＲＯＭ１４の代わりに外部記憶装置２２の記録媒体を用いることができる。この場合、記録媒体に記録したプログラムは、外部記憶装置２２からＲＡＭ１６へ転送する。そして、ＲＡＭ１６に記憶したプログラムにしたがってＣＰＵ１２を動作させる。このようにすると、プログラムの追加やバージョンアップ等を容易に行なうことができる。 As the program recording means, a recording medium of the external storage device 22 can be used instead of the ROM 14. In this case, the program recorded on the recording medium is transferred from the external storage device 22 to the RAM 16. Then, the CPU 12 is operated according to the program stored in the RAM 16. In this way, it is possible to easily add a program or upgrade a version.

表示部２４は、液晶表示器等の表示器を含むもので、後述する周波数分析結果等の種々の情報を表示可能である。 The display unit 24 includes a display such as a liquid crystal display, and can display various information such as a frequency analysis result to be described later.

Ｄ／Ａ変換部２６は、ピッチ変換処理により生成されたディジタル音声信号をアナログ音声信号に変換するものである。Ｄ／Ａ変換部２６から送出されるアナログ音声信号は、アンプ、スピーカ等を含むサウンドシステム３２により音響に変換される。 The D / A converter 26 converts the digital audio signal generated by the pitch conversion process into an analog audio signal. The analog audio signal sent from the D / A conversion unit 26 is converted into sound by a sound system 32 including an amplifier, a speaker, and the like.

The MIDI interface 28 is provided for MIDI communication with a MIDI device 34 that is separate from this pitch conversion device. In the present invention, it is used to receive data for pitch conversion from the MIDI device 34.

通信インターフェース３０は、通信ネットワーク（例えばＬＡＮ（ローカル・エリア・ネットワーク）、インターネット、電話回線等）３６を介して他のコンピュータ３８と情報通信を行なうために設けられたものである。この発明の実施に必要なプログラムや各種データは、コンピュータ３８から通信ネットワーク３６及び通信インターフェース３０を介してＲＡＭ１６または外部記憶装置２２へダウンロード要求に応じて取込むようにしてもよい。 The communication interface 30 is provided to perform information communication with another computer 38 via a communication network (for example, a LAN (local area network), the Internet, a telephone line, etc.) 36. Programs and various data necessary for the implementation of the present invention may be fetched from the computer 38 to the RAM 16 or the external storage device 22 via the communication network 36 and the communication interface 30 in response to a download request.

次に、図２を参照して音声変換処理の一例を説明する。ステップ４０では、入力部１８からマイクロホン又は音声入力端子を介して音声信号を入力してＡ／Ｄ変換し、入力音声信号の音声波形を表わすディジタル波形データをＲＡＭ１６に記憶させる。図１７には、入力音声波形の一例を示す。また、発生すべき音声のピッチ（入力音声信号より高い又は低いピッチ）を指示するピッチ情報を入力部２０から入力し、ＲＡＭ１６に記憶させる。 Next, an example of the voice conversion process will be described with reference to FIG. In step 40, an audio signal is input from the input unit 18 via a microphone or an audio input terminal, A / D converted, and digital waveform data representing the audio waveform of the input audio signal is stored in the RAM 16. FIG. 17 shows an example of the input speech waveform. Also, pitch information indicating the pitch of the sound to be generated (higher or lower pitch than the input sound signal) is input from the input unit 20 and stored in the RAM 16.

ステップ４２では、記憶に係るディジタル波形データについてフレーム毎に区間波形を切出す（ディジタル波形データを分割する）。 In step 42, a section waveform is cut out for each frame of the stored digital waveform data (dividing the digital waveform data).

ステップ４４では、フレーム毎にＦＦＴ分析処理により周波数分析を実行して周波数スペクトル（振幅スペクトルと位相スペクトル）を検出する。そして、周波数スペクトルを表わすデータをＲＡＭ１６の所定領域に記憶させる。図５（Ａ）には、ＦＦＴ分析処理により周波数分析して得た振幅スペクトルの一例を示す。 In step 44, frequency analysis is performed by FFT analysis processing for each frame to detect a frequency spectrum (amplitude spectrum and phase spectrum). Data representing the frequency spectrum is stored in a predetermined area of the RAM 16. FIG. 5A shows an example of an amplitude spectrum obtained by performing frequency analysis by FFT analysis processing.
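Steps 42 and 44 can be illustrated with the short sketch below. It is not the patented implementation: the frame length, hop size, and Hann window are assumptions, and a direct DFT written with the standard library is substituted for the FFT so that the example needs no external packages (it yields the same spectrum, only more slowly).

```python
import cmath
import math


def cut_frames(samples, frame_len, hop):
    """Step 42 (sketch): cut the waveform into frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def analyze_frame(frame):
    """Step 44 (sketch): amplitude and phase spectra of one frame."""
    n = len(frame)
    # Periodic Hann window (an assumed choice of analysis window).
    x = [s * (0.5 - 0.5 * math.cos(2.0 * math.pi * t / n))
         for t, s in enumerate(frame)]
    amplitude, phase = [], []
    for k in range(n // 2 + 1):            # bins from 0 Hz up to Nyquist
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        amplitude.append(abs(acc))
        phase.append(cmath.phase(acc))
    return amplitude, phase


# A cosine with exactly 4 cycles per 32-sample frame peaks in bin 4.
signal = [math.cos(2.0 * math.pi * 4 * t / 32) for t in range(64)]
frames = cut_frames(signal, frame_len=32, hop=16)
amplitude, phase = analyze_frame(frames[0])
```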

次に、ステップ４６では、フレーム毎に振幅スペクトルに基づいてピッチを検出し、検出ピッチを表わすピッチデータを生成し、ＲＡＭ１６の所定領域に記憶させる。 Next, in step 46, the pitch is detected for each frame based on the amplitude spectrum, pitch data representing the detected pitch is generated, and stored in a predetermined area of the RAM 16.

ステップ４８では、フレーム毎に振幅スペクトル上でスペクトル強度（振幅）の局所的ピークを複数検知する。局所的ピークを検知するには、近隣の複数（例えば４つ）のピークについて振幅値が最大のピークを検知する方法等を用いることができる。図５（Ａ）には、検知した複数の局所的ピークＰ_０，Ｐ_１，Ｐ_２…が示されている。 In step 48, a plurality of local peaks of the spectrum intensity (amplitude) are detected on the amplitude spectrum for each frame. In order to detect the local peak, a method of detecting a peak having the maximum amplitude value for a plurality of neighboring peaks (for example, four) can be used. FIG. 5A shows a plurality of detected local peaks P ₀ , P ₁ , P ₂ .
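One simple way to detect such local peaks is sketched below. It uses a sliding-window criterion (a bin is a local peak if it is the unique maximum within `span` bins on either side) in place of the comparison among neighboring peaks mentioned above; the function name and the `span` parameter are assumptions for illustration.

```python
def local_peaks(amplitude, span=2):
    """Return the indices of bins that are the unique maximum within +/- span bins."""
    peaks = []
    for k, value in enumerate(amplitude):
        lo = max(0, k - span)
        hi = min(len(amplitude), k + span + 1)
        window = amplitude[lo:hi]
        if value == max(window) and window.count(value) == 1:
            peaks.append(k)
    return peaks


amplitude = [0.0, 1.0, 5.0, 1.0, 0.5, 2.0, 4.0, 2.0, 0.0, 1.0, 3.0, 1.0]
peak_bins = local_peaks(amplitude)
```

For this toy spectrum the detected peaks correspond to P₀, P₁, and P₂ at bins 2, 6, and 10.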

In step 50, a spectrum distribution region corresponding to each local peak is designated on the amplitude spectrum for each frame, and amplitude spectrum data representing the amplitude spectrum distribution in that region with respect to the frequency axis is generated and stored in a predetermined area of the RAM 16. One way to designate the spectrum distribution regions is to cut the frequency axis in half between two adjacent local peaks and assign each half to the spectrum distribution region containing the nearer local peak. FIG. 5(A) shows an example in which spectrum distribution regions R₀, R₁, R₂, ... containing the local peaks P₀, P₁, P₂, ..., respectively, have been designated.
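The midpoint rule for designating regions can be written compactly. The sketch below works on bin indices and returns half-open ranges; assigning a bin that is exactly midway between two peaks to the upper region is an arbitrary choice made for the example.

```python
def assign_regions(peak_bins, n_bins):
    """Split bins 0..n_bins-1 into one half-open (start, stop) range per peak.

    The boundary between two adjacent peaks is placed at their midpoint, so
    every bin belongs to the region of the nearer peak.
    """
    cuts = [(a + b + 1) // 2 for a, b in zip(peak_bins, peak_bins[1:])]
    bounds = [0] + cuts + [n_bins]
    return [(bounds[i], bounds[i + 1]) for i in range(len(peak_bins))]


regions = assign_regions([2, 6, 10], n_bins=12)
```

Here the three regions are bins 0 to 3, 4 to 7, and 8 to 11, each containing exactly one of the peaks.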

ステップ５２では、フレーム毎に位相スペクトルに基づいて各スペクトル分布領域内の位相スペクトル分布を周波数軸に関して表わす位相スペクトルデータを生成し、ＲＡＭ１６内の所定領域に記憶させる。図１４には、あるフレームのあるスペクトル分布領域における振幅スペクトル分布ａｍ_０及び位相スペクトル分布ｐｈ_０を示す。 In step 52, phase spectrum data representing the phase spectrum distribution in each spectrum distribution region with respect to the frequency axis based on the phase spectrum for each frame is generated and stored in a predetermined region in the RAM 16. FIG. 14 shows an amplitude spectrum distribution am ₀ and a phase spectrum distribution ph ₀ in a spectrum distribution region of a certain frame.

ステップ５４〜６８の処理は、各フレームの振幅スペクトルデータ又は位相スペクトルデータに関して行なわれる。ステップ５４では、振幅スペクトルデータに関してステップ４０での入力に係るピッチ情報に応じてピッチ変更すべく振幅スペクトル分布配置を変更する。 The processing in steps 54 to 68 is performed on the amplitude spectrum data or phase spectrum data of each frame. In step 54, the amplitude spectrum distribution arrangement is changed to change the pitch in accordance with the pitch information related to the input in step 40 with respect to the amplitude spectrum data.

FIG. 5 shows an example of the pitch changing process according to the present invention. Parts that are the same as in FIG. 13 are given the same reference symbols, and their detailed description is omitted. The pitch changing process here is a pitch raising process, so in step 40 pitch information indicating a pitch higher than that of the input voice signal is input. If the frequency corresponding to the pitch data obtained in step 46 (the peak frequency of the fundamental tone) is f₀ and the frequency corresponding to the input pitch information is f₀₁, the pitch change ratio T is T = f₀₁ / f₀.

In the pitch raising process, perfect harmonic frequencies 2f₀ and 3f₀, obtained by doubling and tripling the peak frequency f₀ of the fundamental tone, are assumed as shown in FIG. 5(A). The difference Δf₁ = (f₁ − 2f₀) between the peak frequency f₁ of the first harmonic and the perfect harmonic frequency 2f₀ is retained, as is the difference Δf₂ = (f₂ − 3f₀) between the peak frequency f₂ of the second harmonic and the perfect harmonic frequency 3f₀. The differences are retained by storing difference data representing Δf₁ and Δf₂ in a predetermined area in the RAM 16.

Next, the amplitude spectrum distribution in region R₀ is moved toward the high-frequency side on the frequency axis so that the peak P₀ of the fundamental tone is located at the frequency f₀₁ = f₀T. That is, the amplitude spectrum data of region R₀ is corrected to produce this movement (specifically, the frequency of each spectral bin in the amplitude spectrum distribution is corrected). Perfect harmonic frequencies 2f₀₁ and 3f₀₁, obtained by doubling and tripling the peak frequency f₀₁ of the fundamental tone after the pitch increase, are also assumed. The peak frequency of the first harmonic after the pitch increase is taken to be f₁₁ = 2f₀₁ + Δf₁, that is, the perfect harmonic frequency 2f₀₁ shifted by the difference Δf₁ described above, and the peak frequency of the second harmonic after the pitch increase is taken to be f₂₁ = 3f₀₁ + Δf₂, that is, the perfect harmonic frequency 3f₀₁ shifted by the difference Δf₂ described above.

The amplitude spectrum distribution in region R₁ is moved toward the high-frequency side on the frequency axis so that the peak P₁ of the first harmonic after the pitch increase is located at the frequency f₁₁ = 2f₀₁ + Δf₁ (that is, the amplitude spectrum data of region R₁ is corrected to produce this movement). Likewise, the amplitude spectrum distribution in region R₂ is moved toward the high-frequency side on the frequency axis so that the peak P₂ of the second harmonic after the pitch increase is located at the frequency f₂₁ = 3f₀₁ + Δf₂ (that is, the amplitude spectrum data of region R₂ is corrected to produce this movement).
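The frequency bookkeeping of this pitch raising process can be checked numerically. In the sketch below the function name and the sample frequencies are made up for illustration; the formulas are the ones given above, with the n-th peak in the list treated as lying near n times the fundamental.

```python
def shifted_peak_freqs(peak_freqs, target_f0):
    """Return (T, new peak frequencies) for peaks [f0, f1, f2, ...] in Hz.

    Each harmonic keeps its original deviation from the perfect harmonic
    frequency: f_new = n * f01 + (f_old - n * f0), with n = 1 for the
    fundamental, 2 for the first harmonic, 3 for the second, and so on.
    """
    f0 = peak_freqs[0]
    ratio = target_f0 / f0                      # pitch change ratio T = f01 / f0
    new_freqs = []
    for n, f in enumerate(peak_freqs, start=1):
        deviation = f - n * f0                  # delta f (zero for the fundamental)
        new_freqs.append(n * target_f0 + deviation)
    return ratio, new_freqs


ratio, new_freqs = shifted_peak_freqs([100.0, 203.0, 298.0], target_f0=150.0)
```

With f₀ = 100 Hz raised to f₀₁ = 150 Hz, the first harmonic moves from 203 Hz to 303 Hz. Scaling it by T instead would give 304.5 Hz, so the deviation from the perfect harmonic 300 Hz would grow from 3 Hz to 4.5 Hz.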

図５に関して上記したピッチ変更処理によれば、ピッチ変更比Ｔが大きくなってもピッチ上昇後の倍音のピーク周波数ｆ_１１，ｆ_２１が完全倍音周波数２ｆ_０１，３ｆ_０１からそれぞれ大きくずれることはない。従って、自然な音質の出力音を得ることができる。 According to the pitch change processing described above with reference to FIG. 5, even if the pitch change ratio T is increased, the peak frequencies f ₁₁ and f ₂₁ of the harmonics after the pitch increase are not significantly shifted from the perfect harmonic frequencies 2f ₀₁ and 3f ₀₁ , respectively. . Therefore, an output sound with natural sound quality can be obtained.

図５に関して上記したようにピッチ変更前のΔｆ_１等の差分を保持してピッチ変更後のｆ_１１等の倍音周波数に反映させる処理は、ピッチ上昇の場合に限らず、ピッチ低下の場合にも適用することができる。ピッチ低下の場合には、周波数の差分は小さくなるものの、ピッチ上昇の場合と同様に自然な音質の出力音が得られる。 As described above with reference to FIG. 5, the process of holding the difference such as Δf ₁ before the pitch change and reflecting it in the harmonic frequency such as f ₁₁ after the pitch change is not limited to the case where the pitch is increased but also when the pitch is decreased. Can be applied. When the pitch is lowered, the frequency difference is small, but an output sound with natural sound quality can be obtained as in the case of the pitch rise.

In step 56 of FIG. 2, the spectral intensity of each spectral bin in the amplitude spectrum distribution is corrected so that the local peaks of the amplitude spectrum data match the spectral envelope. In the example shown in FIG. 5(B), the spectral intensity of each spectral bin is corrected so that the peaks P₀ to P₂ after the pitch increase match a spectral envelope EVb that has the same shape as the spectral envelope EVa (the spectral envelope of the original voice) connecting the peaks P₀ to P₂ shown in FIG. 5(A). As a result, the same timbre as the original voice is obtained. To obtain a timbre different from the original voice, the spectral envelope EVb is changed as appropriate, as described above with reference to FIG. 13, and the spectral intensity of each spectral bin is corrected to match the changed envelope.
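A minimal sketch of this intensity correction for a single spectrum distribution region follows. It assumes linear bin magnitudes and applies one common gain to every bin of the region so that the local peak lands on the envelope value; with intensities in decibels the same correction would be an additive offset. The function name is hypothetical.

```python
def match_peak_to_envelope(region_amps, envelope_value):
    """Scale all bins of one region so that its local peak equals envelope_value.

    region_amps    : linear magnitudes of the bins in the region (peak > 0)
    envelope_value : target magnitude of the spectral envelope at the peak
    """
    peak = max(region_amps)
    gain = envelope_value / peak
    return [a * gain for a in region_amps]


corrected = match_peak_to_envelope([0.1, 0.4, 0.2], envelope_value=0.8)
```

Because the whole region is scaled together, a large gain also raises the low-level bins around the peak, which is why choosing a source region whose peak is already near the envelope value helps keep the output free of noisiness.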

図６は、この発明に係るピッチ・音色変更処理の一例を示すもので、この例では、ピッチ変更処理としてピッチ低下処理をステップ５４で行ない、音色変更処理として原音声とは異なる音色を付与する処理をステップ５６で行なう。図６において、図１６と同様の部分には同様の符号を付してある。 FIG. 6 shows an example of the pitch / timbre change process according to the present invention. In this example, the pitch reduction process is performed in step 54 as the pitch change process, and a timbre different from the original voice is given as the timbre change process. Processing is performed at step 56. In FIG. 6, the same parts as those in FIG. 16 are denoted by the same reference numerals.

FIG. 6(A) shows the amplitude spectrum distribution of each spectral distribution region, similar to that shown in FIG. 16(A). In the pitch lowering process, the amplitude spectrum distributions of the spectral distribution regions R₀, R₁ and R₂ are each moved toward the low-frequency side on the frequency axis, as shown in FIG. 6(B) (that is, the amplitude spectrum data are modified to make such a move possible). In the moved amplitude spectrum distributions, the peak frequencies corresponding to the local peaks P₀₁, P₁₁ and P₂₁ are F₀, F₂ and F₅, respectively.
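
As a rough sketch of moving one region on the frequency axis, the function below represents a region as a list of (frequency, intensity) pairs and shifts every bin by the same offset so that the local peak lands on the new peak frequency. The data layout and the name `shift_region` are assumptions made for illustration; a real implementation would work on FFT bin indices and handle fractional-bin offsets.

```python
def shift_region(region, old_peak_hz, new_peak_hz):
    """Move a spectral distribution region so its peak sits at new_peak_hz.

    region: list of (frequency_hz, intensity) pairs.
    The shape of the region is preserved; only the frequencies change.
    """
    offset = new_peak_hz - old_peak_hz
    return [(freq + offset, mag) for freq, mag in region]


if __name__ == "__main__":
    r0 = [(190.0, 0.2), (200.0, 1.0), (210.0, 0.3)]
    # Lowering the pitch moves the region toward the low-frequency side.
    print(shift_region(r0, 200.0, 150.0))
```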

In the timbre changing process, a predetermined spectral envelope EVc whose shape differs from that of the spectral envelope EVa of the original voice is assumed. In this case, the amplitude spectrum distributions corresponding to the peak frequencies F₀, F₂ and F₅ alone cannot adequately represent the spectral envelope EVc. Therefore, a peak frequency F₁ is set between F₀ and F₂, peak frequencies F₃ and F₄ are set between F₂ and F₅, and peak frequencies F₆ and F₇ are set on the high-frequency side of F₅. This setting may be made by operating the mouse, keyboard or the like of the input unit 20, or by reading out frequency information stored in association with the type of timbre to be obtained (or with the envelope EVc).

Next, a spectral envelope value (an envelope value for forming the spectral envelope EVc) is specified for each of the peak frequencies F₀ to F₇. For each of these peak frequencies, a spectral intensity associated with one of the local peaks P₀ to P₂ in the amplitude spectrum distributions of the regions R₀ to R₂ shown in FIG. 6(A) is specified as the envelope value. The graph shown in FIG. 7 can be used to specify the envelope values.

FIG. 7 is used to specify, for the amplitude spectrum distributions of FIG. 6(A), a spectral intensity as the envelope value for each peak frequency after the pitch change. The x-axis shows the peak frequencies F₀ to F₇ after the pitch change, the y-axis shows the peak frequencies f₀ to f₃ before the pitch change, and the right-hand side shows the spectral intensities M₀ to M₃ of the local peaks P₀ to P₃ corresponding to the peak frequencies f₀ to f₃. The graph of FIG. 7 was named a Timbre Mapping function diagram by the inventors and is convenient for timbre setting. Line N(x) corresponds to leaving the timbre of the original voice unchanged, and line K(x) corresponds to changing it.

When the graph of FIG. 7 is used, a spectral envelope EVa extending from the lower limit frequency fa of the region R₀ to the upper limit frequency fd of the region R₂ is created by interpolation or the like so as to connect the peaks P₀ to P₂ in the amplitude spectrum distributions of the regions R₀ to R₂ of FIG. 6(A). The display screen of the display unit 24 shows the amplitude spectrum distributions of the regions R₀ to R₂ of FIG. 6(A) (including the envelope EVa), the amplitude spectrum distributions corresponding to the peak frequencies F₀, F₂ and F₅ of FIG. 6(B), and the graph of FIG. 7. In this display state, placing the cursor at a desired position on the graph of FIG. 7 and designating that position by operating the mouse, keyboard or the like of the input unit 20 causes a mark such as K₁ to be displayed at the designated position. An envelope value is thereby specified; the specified envelope value is stored in the RAM 16 and displayed as a point on the envelope EVa. Positions on the display screen may also be designated with an input pen or the like.

As the envelope value corresponding to the peak frequency F₀ (corresponding to point P₀₁ in FIG. 6(B)), designating the position of mark K₀ specifies the spectral intensity on the envelope EVa (slightly greater than M₀) at the frequency y = K(F₀), which is lower than f₀. As the envelope value corresponding to the peak frequency F₁ (corresponding to point P₀₂ in FIG. 6(B)), designating the position of mark K₁ specifies the spectral intensity on the envelope EVa (slightly smaller than M₀) at the frequency y = K(F₁), which is higher than f₀.

Similarly, as the envelope value corresponding to the peak frequency F₂ (corresponding to point P₁₁), designating the position of mark K₂ specifies the spectral intensity corresponding to y = K(F₂) (slightly smaller than M₁). As the envelope value corresponding to the peak frequency F₃ (corresponding to point P₁₂), designating the position of mark K₃ specifies the spectral intensity corresponding to y = K(F₃) (slightly smaller than M₀), and as the envelope value corresponding to the peak frequency F₄ (corresponding to point P₁₃), designating the position of mark K₄ specifies the spectral intensity corresponding to y = K(F₄) (slightly smaller than M₀). Further, as the envelope values corresponding to the peak frequencies F₅, F₆ and F₇ (corresponding to points P₂₁, P₂₂ and P₂₃), designating the positions of marks K₅, K₆ and K₇ specifies the spectral intensities corresponding to y = K(F₅), y = K(F₆) and y = K(F₇) (spectral intensities in the vicinity of peak P₂).
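
The timbre mapping can be sketched as a lookup: each post-change peak frequency F is mapped to a pre-change frequency y = K(F), and the envelope value is the height of the envelope EVa at y. The sketch below assumes EVa is a piecewise-linear interpolation through the original peaks, held constant beyond the outermost peaks; the interpolation method, the clamping, and the names `interpolate_envelope` and `envelope_values` are assumptions, since the text only says "interpolation or the like".

```python
from bisect import bisect_right


def interpolate_envelope(peaks, freq):
    """Height of a piecewise-linear envelope at freq.

    peaks: list of (frequency_hz, intensity), sorted by frequency.
    Outside the outermost peaks the nearest peak intensity is held (an assumption).
    """
    freqs = [f for f, _ in peaks]
    if freq <= freqs[0]:
        return peaks[0][1]
    if freq >= freqs[-1]:
        return peaks[-1][1]
    i = bisect_right(freqs, freq)
    (f_lo, m_lo), (f_hi, m_hi) = peaks[i - 1], peaks[i]
    t = (freq - f_lo) / (f_hi - f_lo)
    return m_lo + t * (m_hi - m_lo)


def envelope_values(peaks, mapping):
    """mapping: {post_change_peak_hz: K(post_change_peak_hz)} -> {post_change_peak_hz: envelope value}."""
    return {new_hz: interpolate_envelope(peaks, y) for new_hz, y in mapping.items()}


if __name__ == "__main__":
    original_peaks = [(100.0, 1.0), (200.0, 0.5), (300.0, 0.25)]
    # Hypothetical K(x): two new peaks both read their value near the first original peak.
    print(envelope_values(original_peaks, {80.0: 100.0, 120.0: 150.0}))
```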

When an envelope value is specified for each of the peak frequencies F₀ to F₇, it can be specified in any of the spectral distribution regions R₀ to R₂. To obtain a natural timbre, however, it is desirable to specify the envelope value for each of F₀ to F₇ in a spectral distribution region whose peak frequency before the pitch change is close to that peak frequency. Because the graph of FIG. 7 shows the peak frequencies F₀ to F₇ on the x-axis and the peak frequencies f₀ to f₃ before the pitch change on the y-axis, an envelope value can be specified for each of F₀ to F₇ with respect to a nearby pre-change peak frequency. In the example of FIG. 7, the envelope values are specified in the region R₀ for the peak frequencies F₀, F₁, F₃ and F₄, in the region R₁ for the peak frequency F₂, and in the region R₂ for the peak frequencies F₅ to F₇.

In the example above, the envelope values were specified using the graph of FIG. 7, but they can also be specified without it. For example, the display screen of the display unit 24 shows the amplitude spectrum distributions of the regions R₀ to R₂ of FIG. 6(A) (including the envelope EVa) and the amplitude spectrum distributions corresponding to the peak frequencies F₀, F₂ and F₅ of FIG. 6(B). In this display state, an envelope value can be specified by designating a position on the envelope EVa for each of the peak frequencies F₀ to F₇. Each specified envelope value is displayed as a point on the envelope EVa. Even if the display of the envelope EVa is omitted, an envelope value can be specified by designating a position in the vertical direction (or in an oblique left-right direction) relative to, for example, the peak P₀. In that case, the specified envelope value may be displayed as a point at the designated position near the reference peak. It is also possible to specify envelope values using the spectral envelope EVc. For example, the display screen of the display unit 24 shows the amplitude spectrum distributions corresponding to the peak frequencies F₀, F₂ and F₅ of FIG. 6(B) together with the envelope EVc. In this display state, the envelope values may be specified by designating the points P₀₁, P₀₂, P₁₁ to P₁₃, and P₂₁ to P₂₃ on the envelope EVc. As the envelope EVc, an envelope of arbitrary shape can be drawn on the display screen with an input pen or the like.

Next, for the newly set peak frequencies F₁, F₃, F₄, F₆ and F₇, the spectral distribution region whose local peak has the spectral intensity closest to the specified envelope value is selected for each specified envelope value. For each of F₁, F₃, F₄, F₆ and F₇, the selection is made from among the plural spectral distribution regions whose peak frequencies before the pitch lowering are in a predetermined proximity relationship with that peak frequency, and the region whose local-peak intensity is closest to the specified envelope value is chosen. The predetermined proximity relationship may be, for example, that the region ranks first or second in closeness. This is because a natural timbre is easier to obtain when the peak frequencies before and after the pitch change are close. In the example of FIG. 7, for the peak frequency F₁, the region having the peak P₀ is selected because the peak frequency closest to the position of mark K₁ is f₀ and the spectral intensity of that peak is M₀. Similarly, the region having the peak P₀ is selected for the peak frequencies F₃ and F₄, and the region having the peak P₂ is selected for the peak frequencies F₆ and F₇. The selection can be performed automatically on the basis of the envelope-value specifying operation made with the mouse, keyboard or the like of the input unit 20.
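
The two-stage selection described above (first restrict to the regions closest in frequency, then pick the one whose peak intensity is closest to the envelope value) can be sketched as follows. The tuple layout, the parameter `max_rank` and the function name `select_region` are illustrative assumptions.

```python
def select_region(target_hz, envelope_value, regions, max_rank=2):
    """Return the index of the region to copy for one newly set peak frequency.

    regions: list of (peak_hz_before_change, peak_intensity).
    Stage 1: keep the max_rank regions whose pre-change peak frequency is nearest target_hz.
    Stage 2: among those, pick the one whose peak intensity is nearest envelope_value.
    """
    by_distance = sorted(
        range(len(regions)), key=lambda i: abs(regions[i][0] - target_hz)
    )
    candidates = by_distance[:max_rank]
    return min(candidates, key=lambda i: abs(regions[i][1] - envelope_value))


if __name__ == "__main__":
    regions = [(100.0, 1.0), (200.0, 0.6), (300.0, 0.3)]
    print(select_region(140.0, 0.9, regions))  # chooses among the two nearest regions
```

Note that a region outside the proximity range is never chosen, even if its peak intensity matches the envelope value exactly.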

Next, the amplitude spectrum data and phase spectrum data of each selected spectral distribution region are copied from the amplitude spectrum data and phase spectrum data generated in steps 50 and 52. For the peak frequencies F₁, F₃ and F₄, the amplitude spectrum data and phase spectrum data of the region R₀ are copied; for the peak frequencies F₆ and F₇, those of the region R₂ are copied.

Next, for each set of copied amplitude spectrum data, the amplitude spectrum distribution it represents is moved on the frequency axis so that the peak frequency corresponding to its local peak becomes the set peak frequency (that is, the amplitude spectrum data are modified to make such a move possible). For example, for the amplitude spectrum data copied for the peak frequency F₁, the amplitude spectrum distribution is moved toward the high-frequency side so that the peak frequency changes from f₀ to F₁. For the amplitude spectrum data copied for the peak frequencies F₃ and F₄, the distributions are moved toward the high-frequency side so that the peak frequency changes from f₀ to F₃ and F₄, respectively. For the amplitude spectrum data copied for the peak frequencies F₆ and F₇, the distributions are moved toward the high-frequency side so that the peak frequency changes from f₂ to F₆ and F₇, respectively.

Next, in the amplitude spectrum distributions that have come to have the peak frequencies F₀, F₂ and F₅ as a result of the pitch lowering process, the spectral intensity of each spectral bin is corrected, distribution by distribution, so that the spectral intensity of the local peak matches the previously specified envelope value. For example, in the amplitude spectrum distribution corresponding to the peak frequency F₀, the spectral intensity of each bin is corrected so that the intensity of the local peak matches the specified envelope value (corresponding to point P₀₁). In the amplitude spectrum distributions corresponding to the peak frequencies F₂ and F₅, the spectral intensity of each bin is likewise corrected so that the intensity of the local peak matches the specified envelope value (corresponding to points P₁₁ and P₂₁).

The same correction of spectral intensity is applied to each set of copied amplitude spectrum data. That is, in the amplitude spectrum distributions corresponding to the peak frequencies F₁, F₃, F₄, F₆ and F₇, the spectral intensity of each bin is corrected so that the intensity of the local peak matches the specified envelope value (corresponding to points P₀₂, P₁₂, P₁₃, P₂₂ and P₂₃, respectively).

With the pitch and timbre changing process described above, eight amplitude spectrum distributions corresponding to the peak frequencies F₀ to F₇ are arranged, as shown in FIG. 6(B), with the peaks P₀₁, P₀₂, P₁₁ to P₁₃, and P₂₁ to P₂₃ fitted to the spectral envelope EVc. In FIG. 6(B) the amplitude spectrum distributions of adjacent spectral distribution regions overlap. The overlap can be avoided by newly setting, for each pair of adjacent regions, the upper limit frequency of the lower region and the lower limit frequency of the higher region near the frequency midway between the two regions. Alternatively, where the distributions of adjacent regions overlap, the spectral intensities of bins at the same frequency may simply be added together. The timbre changing process described above with reference to FIG. 6 can be performed not only when the pitch is lowered but also when it is raised.
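
The second option above, adding the intensities of bins that share a frequency, can be sketched as a simple accumulation into an output spectrum. The sketch assumes each moved region is given as a start bin index plus a list of intensities; that layout and the name `overlap_add` are illustrative assumptions.

```python
def overlap_add(regions, num_bins):
    """Accumulate moved regions into one amplitude spectrum.

    regions: list of (start_bin, intensities).
    Bins covered by more than one region receive the sum of the contributions;
    bins that fall outside 0..num_bins-1 are dropped.
    """
    out = [0.0] * num_bins
    for start, mags in regions:
        for k, m in enumerate(mags):
            idx = start + k
            if 0 <= idx < num_bins:
                out[idx] += m
    return out


if __name__ == "__main__":
    # The two regions overlap at bin 2.
    print(overlap_add([(0, [1.0, 2.0, 1.0]), (2, [0.5, 3.0])], 5))
```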

According to the pitch and timbre changing process described above with reference to FIG. 6, a spectral distribution region is selected whose peak frequency before the pitch change is close to the set peak frequency and whose local peak has the spectral intensity closest to the specified envelope value, and the amplitude spectrum data and phase spectrum data of that region are copied and used for sound signal generation. A natural timbre is therefore easy to obtain. In addition, because the spectral intensity of each bin in the amplitude spectrum data does not have to be increased much when the local-peak intensity is matched to the envelope value, the output sound has a natural quality free of noisiness.

Next, the process moves to step 62 along route J₁ of FIG. 3 (bypassing steps 58 and 60). In step 62, the arrangement of the phase spectrum distributions in the phase spectrum data is changed to correspond to the change in the arrangement of the amplitude spectrum distributions in the amplitude spectrum data. That is, when the pitch raising process described above with reference to FIG. 5 or the pitch lowering process described above with reference to FIG. 6 has been performed, the phase spectrum distribution represented by each set of phase spectrum data generated in step 52 is corrected for each spectral distribution region in accordance with the pitch change of step 54, as described above with reference to FIGS. 14 and 15. In addition, when the pitch lowering process described above with reference to FIG. 6 has been performed, the phase spectrum distribution represented by each set of copied phase spectrum data is corrected in accordance with the frequency change of the copied amplitude spectrum distribution that corresponds to it. For example, the phase spectrum distribution corresponding to the peak frequency F₁ is corrected in accordance with the frequency change from f₀ to F₁. The phase spectrum distributions corresponding to the other peak frequencies F₃, F₄, F₆ and F₇ are corrected in the same way.

After this, the process moves to step 68 along route J₂ of FIG. 3 (bypassing steps 64 and 66). The processing of step 68 is described later with reference to FIG. 12.

FIGS. 8 to 11 show an example of the phase alignment process according to the present invention. In these figures, the horizontal axis represents frequency f and the vertical axis represents phase (0 to 2π). FIG. 8 shows peak phases similar to those shown in FIG. 18. These peak phases are obtained, for example, by performing FFT analysis with the center W_C of the analysis window FW placed near the midpoint between the vocal cord vibration start positions t_S2 and t_S3, as described above with reference to FIG. 17, and the peak phases φ₀ to φ₅ corresponding to the peak frequencies f₀ to f₅ are not aligned. The peak phases φ₀ to φ₅ belong respectively to the six phase spectrum distributions (corresponding to f₀ to f₅) represented by the phase spectrum data of one frame generated in step 52.

In the phase alignment process, the time required to go from the state of FIG. 17 to the state of FIG. 19 (the time shift amount) is determined, and that time shift amount is used to convert the peak phases φ₀ to φ₅ of FIG. 8 into the flatly aligned peak phases φ₀′ to φ₅′ shown in FIG. 9. In step 58 of FIG. 3, a large number of candidate values of the time shift amount from the peak phase φ₀ of the fundamental are set for the phase spectrum data, and the peak phases of the fundamental and the harmonics are calculated for each candidate. To set the candidate time shift amounts, about 40 to 80 candidate phase values φ_0C are set between 0 and 2π, and a candidate time shift amount TS_C is set for each candidate φ_0C according to Equation 2 below.

Equation 2, in the form consistent with the relation φ_0C = φ₀ + 2πf₀ × TS_C given for the fundamental below: TS_C = (φ_0C − φ₀) / (2πf₀)

Here, f₀ is the peak frequency of the fundamental. As an example, if 40 candidate phase values φ_0C are set, there are also 40 candidate time shift amounts TS_C. Next, the peak phases of the fundamental and the harmonics are calculated for each candidate TS_C according to Equation 3 below.

Equation 3, in the form matching the per-harmonic expressions given below: φ_iC = φ_i + 2πf_i × TS_C

Here, i is the index of the peak phase: i = 0 for the fundamental, i = 1 for the first harmonic, i = 2 for the second harmonic, and so on. For one candidate TS_C, taking f₁ = 2f₀, f₂ = 3f₀ and so on, the peak phase of the fundamental is φ_0C = φ₀ + 2πf₀ × TS_C, that of the first harmonic is φ_1C = φ₁ + 2π × 2f₀ × TS_C, that of the second harmonic is φ_2C = φ₂ + 2π × 3f₀ × TS_C, and so on. If the number of fundamental and harmonic components is N, N peak phases are obtained for one candidate TS_C. Since there are 40 candidates TS_C, treating the N peak phases corresponding to one candidate as one group yields 40 groups of peak phases.
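
The candidate generation can be sketched as follows. The per-harmonic update φ_iC = φ_i + 2πf_i × TS_C is taken from the text; the expression for TS_C is inferred by solving the fundamental's relation for TS_C, and wrapping the results into the range 0 to 2π is an assumption based on the phase axis of the figures. Function names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def candidate_time_shifts(phi0, f0, num_points=40):
    """Candidate time shifts TS_C for candidate fundamental phases spread evenly over 0..2π.

    Inferred relation (assumption): TS_C = (phi0c - phi0) / (2π f0).
    """
    shifts = []
    for n in range(num_points):
        phi0c = TWO_PI * n / num_points
        shifts.append((phi0c - phi0) / (TWO_PI * f0))
    return shifts


def shifted_peak_phases(peak_phases, peak_freqs, ts):
    """Peak phases of fundamental and harmonics after a time shift ts, wrapped to 0..2π."""
    return [
        (phi + TWO_PI * f * ts) % TWO_PI
        for phi, f in zip(peak_phases, peak_freqs)
    ]


if __name__ == "__main__":
    shifts = candidate_time_shifts(phi0=0.0, f0=100.0)
    # One group of peak phases per candidate time shift.
    groups = [shifted_peak_phases([0.0, 1.0], [100.0, 200.0], ts) for ts in shifts]
    print(len(groups), groups[10])
```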

Next, in step 60 of FIG. 3, the average φ_ave of the N peak phases is obtained for each group, and the sum of the absolute deviations of the peak phases φ_iC from this average, Σ abs(φ_iC − φ_ave), is obtained. The state in which this sum is smallest is the phase-aligned state closest to flat (Maximally Flat Phase Alignment), hereinafter called the MFPA state. The group of peak phases corresponding to the candidate TS_C that gives the MFPA state is selected.
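
The selection of the MFPA group can be sketched as a minimization of the sum of absolute deviations from the mean. The sketch uses the plain arithmetic mean of the phase values, as the text describes; it does not treat phases near 0 and near 2π as neighbors, which a circular statistic would. The function names are hypothetical.

```python
def flatness_cost(phases):
    """Sum of absolute deviations of the peak phases from their arithmetic mean."""
    mean = sum(phases) / len(phases)
    return sum(abs(p - mean) for p in phases)


def select_mfpa(candidate_groups):
    """Index of the group of peak phases that is closest to flat (smallest cost)."""
    return min(
        range(len(candidate_groups)),
        key=lambda i: flatness_cost(candidate_groups[i]),
    )


if __name__ == "__main__":
    groups = [[0.1, 2.0, 4.0], [1.0, 1.1, 0.9], [3.0, 3.5, 2.5]]
    print(select_mfpa(groups))
```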

As an example, if N = 6, the selected group contains first to sixth peak phases corresponding respectively to the peak phases φ₀ to φ₅ of FIG. 8. By correcting the peak phases φ₀ to φ₅ of FIG. 8 so that they coincide with the first to sixth peak phases in the selected group, the peak phases φ₀′ to φ₅′ in the MFPA state are obtained, as shown in FIG. 9. As a result of the correction, the phase spectrum data that indicated the peak phases φ₀ to φ₅ of FIG. 8 come to indicate the peak phases φ₀′ to φ₅′ of FIG. 9.

Next, in step 62 of FIG. 3, the arrangement of the phase spectrum distributions in the phase spectrum data is changed to correspond to the change in the arrangement of the amplitude spectrum distributions in the amplitude spectrum data. As an example, when a pitch lowering process similar to that described above with reference to FIG. 6 has been performed, each frequency in the phase spectrum distributions represented by the phase spectrum data corrected as in FIG. 9 is modified, region by region, in accordance with the pitch lowering, as shown in FIG. 10.

In the example of FIG. 10, the peak frequencies F₀₁, F₁₁, F₂₁, F₃₁, F₄₁ and F₅₁ are the peak frequencies f₀, f₁, f₂, f₃, f₄ and f₅ moved toward the low-frequency side on the frequency axis, and the peak frequencies F₀₂, F₁₂, F₂₂, F₃₂, F₄₂ and F₅₂ are the peak frequencies f₀, f₁, f₂, f₃, f₄ and f₅ changed in the course of the copy process. When the frequency of each spectral bin in the phase spectrum distribution to which the peak phase φ₀′ belongs is changed in accordance with the change of the peak frequency from f₀ to F₀₁, the peak phase φ₀₁′ is obtained at the position corresponding to the peak frequency F₀₁. Similarly, the peak phases φ₀₂′, φ₁₁′, φ₁₂′, φ₂₁′, φ₂₂′, φ₃₁′, φ₃₂′, φ₄₁′, φ₄₂′, φ₅₁′ and φ₅₂′ are obtained at the positions corresponding to the peak frequencies F₀₂, F₁₁, F₁₂, F₂₁, F₂₂, F₃₁, F₃₂, F₄₁, F₄₂, F₅₁ and F₅₂. If the peak frequencies F₀₂, F₁₂, F₂₂, F₃₂, F₄₂ and F₅₂ were not set in the timbre changing process, the peak phases φ₀₂′, φ₁₂′, φ₂₂′, φ₃₂′, φ₄₂′ and φ₅₂′ do not exist.

Next, in step 64 of FIG. 3, the time shift amount back to the peak phase of the fundamental before phase alignment is determined for the phase spectrum data, taking into account the phase change caused by the pitch change, and each peak phase is corrected according to that time shift amount. This time shift is performed to undo the time shift applied for phase alignment. The time shift amount TS can be obtained according to Equation 4 below.

Here, φ₀ is the peak phase of the fundamental before phase alignment (see FIG. 8), Δφ₀ is the phase change caused by the pitch change and is obtained by Equation 1, φ₀′ is the peak phase of the fundamental after phase alignment (see FIG. 9), and T is the pitch change ratio. Once the time shift amount TS has been obtained, a phase change Δφ_P is calculated from it as Δφ_P = 2πf_P × TS, and Δφ_P is added to each peak phase of FIG. 10 to calculate the new peak phases of FIG. 11. Here, f_P denotes a peak frequency after the pitch change, such as F₀₁. For example, the peak phase φ₀₁ corresponding to the peak frequency F₀₁ is obtained by calculating Δφ₀₁ = 2πF₀₁ × TS and adding Δφ₀₁ to φ₀₁′ of FIG. 10. Similarly, the peak phases φ₀₂, φ₁₁, φ₁₂, φ₂₁, φ₂₂, φ₃₁, φ₃₂, φ₄₁, φ₄₂, φ₅₁ and φ₅₂ corresponding to the peak frequencies F₀₂, F₁₁, F₁₂, F₂₁, F₂₂, F₃₁, F₃₂, F₄₁, F₄₂, F₅₁ and F₅₂ are obtained. FIG. 11 shows the result of correcting each peak phase of FIG. 10 according to the time shift amount TS.

Since the phase correction up to this point targets only the peak phases, the phases other than the peak phase must also be corrected in each spectrum distribution region. In step 66 of FIG. 3, therefore, the phases of the spectrum bins other than the peak are corrected in the phase spectrum data in accordance with the amount of change of the peak phase. For example, in the phase spectrum distribution to which the peak phase φ₀₁ belongs, the phase of each spectrum bin other than the peak phase φ₀₁ is corrected in accordance with the sum of the peak phase change from φ₀ in FIG. 8 to φ₀′ in FIG. 9 and the peak phase change from φ₀₁′ in FIG. 10 (the same as φ₀′ in FIG. 9) to φ₀₁ in FIG. 11. Similarly, in the phase spectrum distributions to which the peak phases φ₀₂, φ₁₁, φ₁₂, φ₂₁, φ₂₂, φ₃₁, φ₃₂, φ₄₁, φ₄₂, φ₅₁, φ₅₂ respectively belong, the phases other than the peak phase are corrected in accordance with the amount of change of the peak phase.
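One way to read step 66 is that every bin in a region receives the same total phase offset that its peak received. The sketch below follows that reading; the specification only says the other bins are corrected "in accordance with" the peak phase change, so applying an identical offset to every bin is an assumption, as are the function names and the list-based data layout.

```python
import math


def wrap_phase(phi):
    """Wrap a phase in radians into [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def correct_region_phases(region_phases, peak_index, corrected_peak_phase):
    """Shift all bins of one spectrum distribution region by the peak's total change.

    region_phases        -- original phases of the bins in the region
    peak_index           -- index of the local peak inside the region
    corrected_peak_phase -- peak phase after alignment and time shift (FIG. 11)
    """
    # Total change applied to the peak; this equals the sum of the alignment
    # change and the later time-shift change described in the text.
    total_change = corrected_peak_phase - region_phases[peak_index]
    return [wrap_phase(p + total_change) for p in region_phases]


# Example: a three-bin region whose peak (index 1) moved from 0.5 to 0.8 rad.
fixed = correct_region_phases([0.1, 0.5, 0.2], 1, 0.8)
```

Keeping the phase differences between the peak and its neighbouring bins unchanged preserves the shape of the main lobe in the time domain, which is the usual motivation for moving a region's phases together.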

According to the phase alignment processing described above with reference to FIGS. 8 to 11, the peak phases, which were not aligned as shown in FIG. 8, are corrected by calculation so as to reach the MFPA state shown in FIG. 9. When the phase spectrum data shown in FIG. 11 is then used for voice signal generation, the waveform of the generated voice signal has the characteristic of a natural voice waveform, namely that the peak phases are aligned at the position where vocal cord vibration starts, and an output sound of natural sound quality can be obtained.

In step 68 of FIG. 3, spectrum bins are added as noise components to the spectrum lack regions of the pitch-raised amplitude spectrum data. The processing of step 68 is performed only when pitch increase processing was performed as the pitch change processing in step 54; it is not performed when pitch decrease processing such as that described above with reference to FIG. 6 was performed.

FIG. 12 shows an example of the spectrum addition processing; parts similar to those in FIG. 5 are given the same reference symbols. FIG. 12(A) shows an amplitude spectrum similar to that shown in FIG. 5(A). In the pitch increase processing, the amplitude spectrum distributions of the regions R₀ and R₁ are moved toward the high-pitch side on the frequency axis as shown in FIG. 12(B), so that spectrum lack regions Q₁ and Q₂ arise on one side and the other side, respectively, of the amplitude spectrum distribution having the peak P₀. The spectrum lack region Q₂ lies between the amplitude spectrum distribution having the peak P₀ and the amplitude spectrum distribution having the peak P₁. Spectrum lack regions such as Q₁ do not exist in the amplitude spectrum of natural speech (for example, the one shown in FIG. 12(A)).

In the spectrum distribution region R₀ shown in FIG. 12(A), the regions K₁₁ and K₁₂, which are sufficiently far from the peak P₀ (for example, 55 Hz or more away), are both noise component regions containing residual components away from the main lobe. Likewise, in the spectrum distribution region R₁, the regions K₂₁ and K₂₂, which are sufficiently far from the peak P₁, are both noise component regions. In step 68, as shown in FIG. 12, the amplitude spectrum data is modified so that spectrum bins are copied from the noise component region k₁ (a part of the region K₁₁), whose frequency band coincides with that of the spectrum lack region Q₁, and added to Q₁, and so that spectrum bins are copied from the noise component region k₂ (a part of the region K₂₁), whose frequency band coincides with that of the spectrum lack region Q₂, and added to Q₂.
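The noise filling of step 68 can be sketched as copying, bin by bin, from the original spectrum into the gaps of the pitch-raised spectrum wherever the original bin counts as noise. In this sketch the 55 Hz distance is the example value given above; the boolean-mask representation, the function names, and measuring the distance to the nearest peak of any region (rather than to the peak of the bin's own region) are simplifying assumptions.

```python
def noise_mask(num_bins, bin_hz, peak_freqs_hz, min_dist_hz=55.0):
    """Mark bins at least min_dist_hz away from every original peak as noise bins."""
    return [
        all(abs(k * bin_hz - f) >= min_dist_hz for f in peak_freqs_hz)
        for k in range(num_bins)
    ]


def fill_lack_regions(shifted_amp, original_amp, lack_mask, is_noise):
    """Copy original noise bins into spectrum lack regions of the same frequency band.

    shifted_amp  -- amplitude spectrum after the pitch increase
    original_amp -- amplitude spectrum before the pitch change
    lack_mask    -- True where the shifted spectrum has no data (Q1, Q2, ...)
    is_noise     -- True where the original bin lies in a noise component region
    """
    filled = list(shifted_amp)
    for k, lacking in enumerate(lack_mask):
        if lacking and is_noise[k]:
            filled[k] = original_amp[k]  # same bin index means same frequency band
    return filled


# Example with six bins: bins 0, 1 and 4 are empty after the shift.
result = fill_lack_regions(
    [0, 0, 9, 9, 0, 9],
    [1, 2, 3, 4, 5, 6],
    [True, True, False, False, True, False],
    [True, False, True, True, True, True],
)
```

Bin 1 in the example stays empty because the original bin at that frequency is too close to a peak to be treated as noise; only bins that are both missing and backed by a noise bin are filled.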

The spectrum addition processing of step 68 makes it possible to reproduce the vividness of the original voice, which improves the quality of the output sound. The spectrum addition processing of step 68 can be performed not only after the pitch increase processing described above with reference to FIG. 5 but also after the pitch increase processing described above with reference to FIG. 13.

In step 70, the amplitude spectrum data and the phase spectrum data are converted into a time-domain audio signal (digital waveform data). As one example, this conversion can be performed by steps 70a to 70c. In step 70a, inverse FFT processing is applied to the frequency-domain frame data (amplitude spectrum data and phase spectrum data) to obtain a time-domain audio signal. In step 70b, windowing is applied to the time-domain audio signal, that is, the signal is multiplied by a time window function. In step 70c, overlap processing is applied to the time-domain audio signal, that is, the time-domain signals of successive frames are joined while their waveforms are overlapped.
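Steps 70a to 70c can be sketched with the standard library alone. The sketch below uses a naive inverse DFT in place of a real FFT for brevity, and the periodic Hann window, the 50 % hop, the even frame length, and the function names are all assumptions made for the example; the specification does not state which window or overlap is used.

```python
import math


def inverse_dft_real(amplitudes, phases, frame_len):
    """Step 70a: rebuild a real frame from bins 0..frame_len//2 (frame_len even)."""
    half = frame_len // 2
    samples = []
    for n in range(frame_len):
        acc = 0.0
        for k in range(half + 1):
            # DC and Nyquist occur once; other bins have a mirrored conjugate partner.
            weight = 1.0 if k in (0, half) else 2.0
            angle = 2.0 * math.pi * k * n / frame_len + phases[k]
            acc += weight * amplitudes[k] * math.cos(angle)
        samples.append(acc / frame_len)
    return samples


def hann(frame_len):
    """Periodic Hann window (an assumed choice of time window function)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
            for n in range(frame_len)]


def overlap_add(frames, hop):
    """Steps 70b and 70c: window each frame, then sum the frames at hop spacing."""
    frame_len = len(frames[0])
    window = hann(frame_len)
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        for n, sample in enumerate(frame):
            out[i * hop + n] += window[n] * sample
    return out


# Example: one bin of energy gives one cosine cycle per 8-sample frame.
frame = inverse_dft_real([0, 4, 0, 0, 0], [0.0] * 5, 8)
signal = overlap_add([frame, frame, frame], hop=4)
```

With a periodic Hann window and a hop of half the frame length, the overlapping windows sum to one, so a stationary input is reconstructed without amplitude ripple where frames overlap.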

In step 72, the audio signal is output to the D/A converter 26. As a result, the pitch-converted sound is produced by the sound system 32.

The present invention is not limited to the embodiment described above and can be implemented in various modified forms. For example, the following changes are possible.

(1) In the embodiment described above, the audio signal input from the input unit 18 is converted into digital waveform data and then pitch-converted. Alternatively, digital waveform data representing the voice waveform of the original voice may be stored in storage means (ROM 14, RAM 16, external storage device 22, or the like), and the desired digital waveform data may be read out from the storage means by a keyboard operation or the like on the input unit 20 and pitch-converted. Digital waveform data representing the voice waveform of the original voice may also be acquired via the interface 28 or 30 and pitch-converted.

(2) The pitch data, amplitude spectrum data, and phase spectrum data obtained in steps 46, 50, and 52, respectively, may be stored in storage means such as those mentioned above, and the desired pitch data, amplitude spectrum data, and phase spectrum data may be read out from the storage means by a keyboard operation or the like on the input unit 20 and used for pitch conversion. Pitch data, amplitude spectrum data, and phase spectrum data corresponding to the original voice waveform may also be acquired via the interface 28 or 30 and used for pitch conversion.

(3) The present invention can be applied not only to pitch conversion of voice but also to pitch conversion of musical tones.

FIG. 1 is a block diagram showing the circuit configuration of a pitch conversion device according to an embodiment of the present invention.
FIG. 2 is a flowchart showing a part of the pitch conversion processing.
FIG. 3 is a flowchart showing another part of the pitch conversion processing of FIG. 2.
FIG. 4 is a flowchart showing still another part of the pitch conversion processing of FIG. 2.
FIG. 5 is a spectrum diagram for explaining the pitch change processing according to the present invention.
FIG. 6 is a spectrum diagram for explaining the pitch and timbre change processing according to the present invention.
FIG. 7 is a graph showing the spectrum envelope value at each peak frequency after the pitch change for the amplitude spectrum of FIG. 6(A).
FIG. 8 is a diagram showing an example of peak phases.
FIG. 9 is a diagram showing the peak phases obtained by applying time shift and selection processing to the peak phases of FIG. 8.
FIG. 10 is a diagram showing the peak phases obtained by applying spectrum distribution rearrangement processing to the peak phases of FIG. 9.
FIG. 11 is a diagram showing the peak phases obtained by applying time shift processing to the peak phases of FIG. 10.
FIG. 12 is a spectrum diagram for explaining the spectrum addition processing performed when the pitch is raised.
FIG. 13 is a spectrum diagram for explaining conventional pitch change processing.
FIG. 14 is a spectrum diagram illustrating the amplitude spectrum distribution and the phase spectrum distribution before a pitch change.
FIG. 15 is a spectrum diagram illustrating the amplitude spectrum distribution and the phase spectrum distribution after a pitch change.
FIG. 16 is a spectrum diagram for explaining pitch and timbre change processing studied by the inventors.
FIG. 17 is a waveform diagram showing an example of the time position of the analysis window in FFT analysis processing studied by the inventors.
FIG. 18 is a diagram showing the peak phases obtained by FFT analysis at the analysis window position of FIG. 17.
FIG. 19 is a waveform diagram showing another example of the time position of the analysis window in FFT analysis processing studied by the inventors.
FIG. 20 is a diagram showing the peak phases obtained by FFT analysis at the analysis window position of FIG. 19.

Explanation of symbols

10: small computer, 11: bus, 12: CPU, 14: ROM, 16: RAM, 18: voice input unit, 20: control parameter input unit, 22: external storage device, 24: display unit, 26: D/A converter, 28: MIDI interface, 30: communication interface, 32: sound system, 34: MIDI device, 36: communication network, 38: other computer.

Claims

Input means for inputting pitch information indicating a pitch different from the original sound;
Generating means for generating, on the basis of an amplitude spectrum obtained by subjecting the sound waveform of the original sound to frequency analysis processing, amplitude spectrum data representing, with respect to the frequency axis, the amplitude spectrum distribution in each spectrum distribution region, each region containing one of a plurality of local peaks of spectral intensity together with the spectrum before and after that local peak, and for generating, on the basis of the phase spectrum obtained by the frequency analysis processing, phase spectrum data representing the phase spectrum distribution with respect to the frequency axis for each spectrum distribution region;
First correcting means for correcting the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data on the frequency axis according to the pitch information for each spectrum distribution region;
Setting means for setting a desired peak frequency on one side of the peak frequency corresponding to at least one local peak in the amplitude spectrum distribution represented by the amplitude spectrum data related to the correction by the first correction means;
Indicating means for indicating envelope values that form a spectral envelope, in correspondence with the peak frequencies respectively corresponding to the plurality of local peaks in the amplitude spectrum distribution represented by the amplitude spectrum data corrected by the first correcting means and with the peak frequency set by the setting means;
Selecting means for selecting, from among those spectrum distribution regions indicated by the amplitude spectrum data generated by the generating means whose peak frequencies are in a predetermined approximate relationship with the peak frequency set by the setting means, the spectrum distribution region whose local peak has the spectral intensity closest to the envelope value indicated by the indicating means for the set peak frequency;
A first copy means for copying the amplitude spectrum data and phase spectrum data of the spectrum distribution region related to the selection by the selection means from the amplitude spectrum data and the phase spectrum data related to the generation by the generation means;
Second correcting means for correcting the amplitude spectrum data copied by the first copy means by moving the amplitude spectrum distribution represented by that data on the frequency axis in accordance with the peak frequency set by the setting means;
In the amplitude spectrum distribution represented by the amplitude spectrum data related to the correction by the first correction means, the spectrum intensity of the local peak for each amplitude spectrum distribution is associated with the peak frequency corresponding to the local peak by the indicating means. The spectral intensity of each spectral bin is corrected to match the indicated envelope value, and the spectral intensity of the local peak in the amplitude spectral distribution represented by the amplitude spectral data related to the correction by the second correcting means is indicated by the indicating means. A third correcting means for correcting the spectral intensity of each spectral bin so as to match the envelope value indicated corresponding to the peak frequency according to the setting;
The phase spectrum distribution represented by the phase spectrum data related to the generation by the generation means is corrected for each spectrum distribution region corresponding to the pitch change by the first correction means, and is copied by the first copy means. Fourth correcting means for correcting the phase spectrum distribution represented by the phase spectrum data according to the frequency change in the second correcting means;
Conversion means for converting the amplitude spectrum data corrected by the first to third correcting means and the phase spectrum data corrected by the fourth correcting means into a sound signal in the time domain; the above means together constituting a pitch conversion device.

A program used in a pitch conversion device including a computer, the computer being
Input means for inputting pitch information indicating a pitch different from the original sound;
Generating means for generating, on the basis of an amplitude spectrum obtained by subjecting the sound waveform of the original sound to frequency analysis processing, amplitude spectrum data representing, with respect to the frequency axis, the amplitude spectrum distribution in each spectrum distribution region, each region containing one of a plurality of local peaks of spectral intensity together with the spectrum before and after that local peak, and for generating, on the basis of the phase spectrum obtained by the frequency analysis processing, phase spectrum data representing the phase spectrum distribution with respect to the frequency axis for each spectrum distribution region;
First correcting means for correcting the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data on the frequency axis according to the pitch information for each spectrum distribution region;
Setting means for setting a desired peak frequency on one side of the peak frequency corresponding to at least one local peak in the amplitude spectrum distribution represented by the amplitude spectrum data related to the correction by the first correction means;
Indicating means for indicating envelope values that form a spectral envelope, in correspondence with the peak frequencies respectively corresponding to the plurality of local peaks in the amplitude spectrum distribution represented by the amplitude spectrum data corrected by the first correcting means and with the peak frequency set by the setting means;
Selecting means for selecting, from among those spectrum distribution regions indicated by the amplitude spectrum data generated by the generating means whose peak frequencies are in a predetermined approximate relationship with the peak frequency set by the setting means, the spectrum distribution region whose local peak has the spectral intensity closest to the envelope value indicated by the indicating means for the set peak frequency;
A first copy means for copying the amplitude spectrum data and phase spectrum data of the spectrum distribution region related to the selection by the selection means from the amplitude spectrum data and the phase spectrum data related to the generation by the generation means;
Second correcting means for correcting the amplitude spectrum data copied by the first copy means by moving the amplitude spectrum distribution represented by that data on the frequency axis in accordance with the peak frequency set by the setting means;
In the amplitude spectrum distribution represented by the amplitude spectrum data related to the correction by the first correction means, the spectrum intensity of the local peak for each amplitude spectrum distribution is associated with the peak frequency corresponding to the local peak by the indicating means. The spectral intensity of each spectral bin is corrected to match the indicated envelope value, and the spectral intensity of the local peak in the amplitude spectral distribution represented by the amplitude spectral data related to the correction by the second correcting means is indicated by the indicating means. A third correcting means for correcting the spectral intensity of each spectral bin so as to match the envelope value indicated corresponding to the peak frequency according to the setting;
The phase spectrum distribution represented by the phase spectrum data related to the generation by the generation means is corrected for each spectrum distribution region corresponding to the pitch change by the first correction means, and is copied by the first copy means. Fourth correcting means for correcting the phase spectrum distribution represented by the phase spectrum data according to the frequency change in the second correcting means;
A program for functioning as a conversion means for converting amplitude spectrum data related to correction by the first to third correction means and phase spectrum data related to correction by the fourth correction means into sound signals in the time domain.

Calculating means for setting a plurality of candidate values of the time shift amount from the peak phase of the fundamental tone with respect to the phase spectrum data, and calculating the peak phase of the fundamental tone and the nth harmonic for each candidate value;
Fifth correcting means for selecting, from among a plurality of groups of peak phases respectively corresponding to the plurality of candidate values, the group of peak phases that is closest to flat, and for correcting the peak phases of the fundamental tone and the n-th harmonic in the phase spectrum data so that they coincide with the peak phases of the fundamental tone and the n-th harmonic in the selected group;
Sixth correcting means, provided in place of the fourth correcting means, for correcting each phase, for each spectrum distribution region, in the phase spectrum distribution represented by the phase spectrum data corrected by the fifth correcting means, in accordance with the pitch change made by the first correcting means;
With respect to the phase spectrum data related to the correction by the sixth correction means, the time shift amount to the peak phase of the fundamental tone before the pitch change is calculated in consideration of the pitch change amount by the first correction means and Seventh correcting means for correcting the peak phase of the fundamental tone and the n-th overtone in the phase spectrum data related to the correction by the sixth correcting means according to the amount of time shift;
Eighth correcting means for correcting, in the spectrum distribution region corresponding to the fundamental tone in the phase spectrum data corrected by the seventh correcting means, the phases other than the peak phase of the fundamental tone in accordance with the amount of change made to that peak phase by the fifth and seventh correcting means, and for correcting, in the spectrum distribution region corresponding to the n-th harmonic, the phases other than the peak phase of the n-th harmonic in accordance with the amount of change made to that peak phase by the fifth and seventh correcting means;
The pitch conversion device according to claim 1, wherein the conversion means converts the amplitude spectrum data corrected by the first to third correcting means and the phase spectrum data corrected by the fifth to eighth correcting means into a sound signal in the time domain.

The computer,
Calculating means for setting a plurality of candidate values of the time shift amount from the peak phase of the fundamental tone with respect to the phase spectrum data, and calculating the peak phase of the fundamental tone and the nth harmonic for each candidate value;
Fifth correcting means for selecting, from among a plurality of groups of peak phases respectively corresponding to the plurality of candidate values, the group of peak phases that is closest to flat, and for correcting the peak phases of the fundamental tone and the n-th harmonic in the phase spectrum data so that they coincide with the peak phases of the fundamental tone and the n-th harmonic in the selected group;
Sixth correcting means, provided in place of the fourth correcting means, for correcting each phase, for each spectrum distribution region, in the phase spectrum distribution represented by the phase spectrum data corrected by the fifth correcting means, in accordance with the pitch change made by the first correcting means;
With respect to the phase spectrum data related to the correction by the sixth correction means, the time shift amount to the peak phase of the fundamental tone before the pitch change is calculated in consideration of the pitch change amount by the first correction means and Seventh correcting means for correcting the peak phase of the fundamental tone and the n-th overtone in the phase spectrum data related to the correction by the sixth correcting means according to the amount of time shift;
Eighth correcting means for correcting, in the spectrum distribution region corresponding to the fundamental tone in the phase spectrum data corrected by the seventh correcting means, the phases other than the peak phase of the fundamental tone in accordance with the amount of change made to that peak phase by the fifth and seventh correcting means, and for correcting, in the spectrum distribution region corresponding to the n-th harmonic, the phases other than the peak phase of the n-th harmonic in accordance with the amount of change made to that peak phase by the fifth and seventh correcting means, the computer being made to function as each of the above means;
The program according to claim 2, wherein the conversion means converts the amplitude spectrum data corrected by the first to third correcting means and the phase spectrum data corrected by the fifth to eighth correcting means into a sound signal in the time domain.

Second copy means for copying spectrum bins from a noise component region that lies within a spectrum distribution region indicated by the amplitude spectrum data generated by the generating means and whose frequency band coincides with a spectrum lack region arising on one side of at least one amplitude spectrum distribution among the amplitude spectrum distributions represented by the amplitude spectrum data corrected by the first correcting means;
Fifth correcting means for correcting, among the amplitude spectrum data corrected by the first correcting means, the amplitude spectrum data representing the at least one amplitude spectrum distribution so that the spectrum bins copied by the second copy means are added to the spectrum lack region;
The pitch conversion device according to claim 1, wherein the conversion means converts the amplitude spectrum data corrected by the first to third and fifth correcting means and the phase spectrum data corrected by the fourth correcting means into a sound signal in the time domain.

The computer,
Second copy means for copying spectrum bins from a noise component region that lies within a spectrum distribution region indicated by the amplitude spectrum data generated by the generating means and whose frequency band coincides with a spectrum lack region arising on one side of at least one amplitude spectrum distribution among the amplitude spectrum distributions represented by the amplitude spectrum data corrected by the first correcting means;
Fifth correcting means for correcting, among the amplitude spectrum data corrected by the first correcting means, the amplitude spectrum data representing the at least one amplitude spectrum distribution so that the spectrum bins copied by the second copy means are added to the spectrum lack region, the computer being made to function as each of the above means;
The program according to claim 2, wherein the conversion means converts the amplitude spectrum data corrected by the first to third and fifth correcting means and the phase spectrum data corrected by the fourth correcting means into a sound signal in the time domain.