JP2005084661A

JP2005084661A - Speech analysis generator and program

Info

Publication number: JP2005084661A
Application number: JP2003320312A
Authority: JP
Inventors: Katsu Setoguchi; 克瀬戸口
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2003-09-11
Filing date: 2003-09-11
Publication date: 2005-03-31
Anticipated expiration: 2023-09-11
Also published as: JP4419486B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique for manipulating the formant position with a lighter processing load.

SOLUTION: Speech data input from an A/D converter 8 is extracted frame by frame and subjected to a fast Fourier transform, which decomposes it into a frequency amplitude component and a phase component. The ratio of the frequency amplitude component before filtering by a moving average filter section 304 to the component after filtering is calculated as a frequency amplitude residual. The frequency amplitude component is shifted and then filtered to obtain a frequency amplitude outline, and the residual is multiplied by that outline to calculate the output frequency amplitude component. The pitch is moved by shifting the instantaneous frequency calculated from the phase component. An inverse fast Fourier transform is then performed using the frequency amplitude component calculated in this way and the phase component obtained from the shifted instantaneous frequency, generating speech data in which the formant position, and additionally the pitch, has been manipulated.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a technique for analyzing a speech waveform and generating (synthesizing) a speech waveform from the analysis result.

A speech analysis/synthesis apparatus, which analyzes a speech waveform and synthesizes a speech waveform from the analysis result, is also used to apply acoustic effects such as changing the voice quality of an input speech waveform.
The voice quality is changed, for example, by manipulating the formants of the speech (for example, a human voice), or by passing the speech through band-pass filters (BPFs) to determine an amplitude value for each band and then passing the speech through a filter constructed from those amplitude values. An example of a speech analysis and generation apparatus that adopts the former method is described in Patent Document 1.

In the conventional speech analysis and generation apparatus described in Patent Document 1, the LPC (linear prediction) coefficients extracted by analyzing a speech waveform are converted into LSP (Line Spectrum Pair) coefficients, and the formant position is moved by applying a frequency conversion to the LSP coefficients. However, converting LPC coefficients into LSP coefficients, and calculating the poles of the LPC filter, requires solving high-order algebraic equations, so the amount of computation is enormous. The processing was therefore very heavy.

Because of this problem, a high-performance, expensive processing system had to be provided to bring the processing time down to a practical level (this is especially demanding when analysis and generation (synthesis) are performed in real time). A lighter processing load has therefore been desired, also from the standpoint of keeping the overall manufacturing cost down.
Patent Document 1: JP 2003-66982 A

An object of the present invention is to provide a technique for manipulating the formant position with a lighter processing load.

The speech analysis and generation apparatuses according to the first to third aspects of the present invention are all premised on analyzing a first speech waveform and generating a second speech waveform using the analysis result, and each comprises the following means.
The speech analysis and generation apparatus of the first aspect comprises: analysis means for analyzing the first speech waveform to extract a first frequency amplitude component and a phase component; filter means for filtering a frequency amplitude component; instruction means for instructing a formant shift amount for the first frequency amplitude component; shift means for performing a shift according to the shift amount instructed by the instruction means; calculation means for calculating a frequency amplitude residual by dividing the first frequency amplitude component by a second frequency amplitude component obtained when the filter means filters the first frequency amplitude component; multiplication means for multiplying the frequency amplitude residual by a third frequency amplitude component obtained by performing one of (a) filtering, by the filter means, of the first frequency amplitude component shifted by the shift means and (b) shifting, by the shift means, of the second frequency amplitude component; and speech waveform generation means for generating the second speech waveform using the phase component and a fourth frequency amplitude component obtained by the multiplication.

The speech analysis and generation apparatus of the second aspect comprises: analysis means for analyzing the first speech waveform to extract a first frequency amplitude component and a first phase component; filter means for filtering a frequency amplitude component; first shift means for applying a formant shift to a frequency amplitude component; residual calculation means for calculating a frequency amplitude residual by dividing the first frequency amplitude component by a second frequency amplitude component obtained when the filter means filters the first frequency amplitude component; instantaneous frequency calculation means for calculating an instantaneous frequency from the first phase component; pitch instruction means for instructing a pitch shift amount; second shift means for shifting the instantaneous frequency and the frequency amplitude residual according to the shift amount instructed by the pitch instruction means; amplitude component calculation means for calculating a fourth frequency amplitude component by multiplying the frequency amplitude residual shifted by the second shift means by a third frequency amplitude component obtained by performing one of (a) filtering, by the filter means, of the first frequency amplitude component shifted by the first shift means and (b) shifting, by the first shift means, of the second frequency amplitude component; phase component calculation means for calculating a second phase component from the instantaneous frequency shifted by the second shift means; and speech waveform generation means for generating the second speech waveform using the fourth frequency amplitude component and the second phase component.

The speech analysis and generation apparatus of the third aspect comprises: analysis means for analyzing the first speech waveform to extract a first frequency amplitude component and a phase component; filter means for filtering a frequency amplitude component; shift means for applying a formant shift to a frequency amplitude component; pitch instruction means for instructing a pitch; sound source waveform generation means for generating a sound source waveform that simulates a vocal cord sound source at the pitch instructed by the pitch instruction means; other analysis means for analyzing the sound source waveform to extract a frequency amplitude component; amplitude component calculation means for calculating a third frequency amplitude component by multiplying the frequency amplitude component extracted from the sound source waveform by the other analysis means by a second frequency amplitude component obtained by performing one of (a) filtering, by the filter means, of the first frequency amplitude component shifted by the shift means and (b) shifting, by the shift means, of the first frequency amplitude component filtered by the filter means; and speech waveform generation means for generating the second speech waveform using the third frequency amplitude component and the phase component.

In the first to third aspects, it is desirable that the analysis means analyzes the first speech waveform using a fast Fourier transform and that the speech waveform generation means generates the second speech waveform using an inverse fast Fourier transform. It is desirable that the filter means functions as a moving average filter.

It is also desirable that the second speech waveform can be superimposed on the first speech waveform and output. It is desirable that the sound source waveform generation means generates a Rosenberg wave as the sound source waveform.
The programs according to the first to third aspects of the present invention carry functions for realizing the speech analysis and generation apparatuses of the first to third aspects, respectively.

The present invention analyzes a first speech waveform to extract a first frequency amplitude component and a phase component; calculates a frequency amplitude residual by dividing the first frequency amplitude component by a second frequency amplitude component obtained by filtering the first frequency amplitude component; calculates a fourth frequency amplitude component by multiplying the frequency amplitude residual by a third frequency amplitude component obtained either by filtering the shifted first frequency amplitude component or by shifting the second frequency amplitude component; and generates a second speech waveform using the fourth frequency amplitude component and the phase component.

The shift applied to the frequency amplitude component on the way to obtaining the third frequency amplitude component also shifts the formant position. Because the phase component is not manipulated, the pitch is substantially maintained. The formant position can therefore be manipulated without changing the pitch. Processing that requires an enormous amount of computation, such as solving high-order equations to move poles by LPC or to convert to LSP coefficients, is unnecessary, so the overall processing load can be greatly reduced compared with approaches that perform such processing. As a result, sufficient processing speed is obtained without adopting an expensive, high-performance processing system (one including a CPU, a DSP, or the like).

When an instantaneous frequency is calculated from the phase component, the instantaneous frequency and the frequency amplitude residual are shifted, the shifted frequency amplitude residual is multiplied by the third frequency amplitude component to calculate the fourth frequency amplitude component, and the second speech waveform is generated using the fourth frequency amplitude component and the phase component obtained from the shifted instantaneous frequency, the pitch can be manipulated (shifted) at the same time as the formant position. Even then, no processing that requires an enormous amount of computation is needed, so the increase in the overall processing load is kept small.

The present invention also analyzes a first speech waveform to extract a first frequency amplitude component and a phase component; generates a sound source waveform that simulates a vocal cord sound source at an instructed pitch; calculates a third frequency amplitude component by multiplying the frequency amplitude component extracted from that sound source waveform by a second frequency amplitude component obtained either by filtering the shifted first frequency amplitude component or by shifting the filtered first frequency amplitude component; and generates a second speech waveform using the third frequency amplitude component and the phase component. The pitch can therefore be manipulated (shifted) at the same time as the formant position. Because no processing that requires an enormous amount of computation is needed, the overall processing load can be greatly reduced compared with approaches that perform such processing. The same effects as the invention described above are thereby obtained.

Embodiments of the present invention will now be described in detail with reference to the drawings.
<First embodiment>
FIG. 1 is a configuration diagram of an electronic musical instrument equipped with a speech analysis and generation apparatus according to this embodiment.
As shown in FIG. 1, the electronic musical instrument comprises: a CPU 1 that controls the entire instrument; a keyboard 2 with a plurality of keys; a switch unit 3 with various switches; a ROM 4 that stores the program executed by the CPU 1, various control data, and the like; a RAM 5 used as working memory by the CPU 1; a display unit 6 with, for example, a liquid crystal display (LCD) and a plurality of LEDs; an A/D converter 8 that A/D-converts the analog audio signal input from a microphone 7 connected to a terminal (not specifically shown) and outputs the resulting audio data; a musical tone generation unit 9 that generates waveform data for sounding musical tones according to instructions from the CPU 1; a D/A converter 10 that D/A-converts the waveform data generated by the generation unit 9 and outputs an analog audio signal; an amplifier 11 that amplifies that audio signal; a speaker 12 that converts the amplified audio signal into sound; and a slider unit 13 with various sliders. The CPU 1, keyboard 2, switch unit 3, ROM 4, RAM 5, display unit 6, A/D converter 8, musical tone generation unit 9, and slider unit 13 are connected to one another by a bus.

As shown in FIG. 2, the slider unit 13 has a pitch slider 21 for instructing a shift of the pitch of the speech input through the microphone 7, and a formant slider 22 for instructing a shift of its formant position (frequency). The speech analysis and generation apparatus according to this embodiment is thereby realized as one that can add an acoustic effect that manipulates the pitch or the formant position of the speech input through the microphone 7.

In addition to the sliders 21 and 22 shown in FIG. 2, the slider unit 13 has a detection circuit for detecting the positions of their knobs. The same applies to the switch unit 3.
The analog audio signal output from the microphone 7 is converted into digital audio data by the A/D converter (ADC) 8. The A/D converter 8 performs A/D conversion (sampling) at a sampling frequency of, for example, 22,052 Hz. Hereinafter, the audio data obtained by that A/D conversion is called "original speech data" or "original waveform data" for convenience, and the speech input to the microphone 7 is called "original speech". Speech may also be input via a storage medium such as a CD-ROM, a DVD, or a magneto-optical disk, or via a communication network such as a LAN or a public network.

FIG. 3 is a functional configuration diagram of the speech analysis and generation apparatus according to this embodiment.
The original speech data output by the A/D converter (ADC) 8 is stored in an input buffer 301, which is, for example, an area reserved in the RAM 5. The frame extraction and windowing unit 302 cuts one frame of original speech data out of the input buffer 301 and multiplies it by a window function, for example a Hanning window.
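The frame extraction and windowing step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the frame size, and the choice of the periodic form of the Hanning window are assumptions made here.

```python
import math


def hann_window(n):
    """Periodic Hanning window of length n (assumed form; the text only names the window)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def extract_frame(buffer, start, frame_size):
    """Cut frame_size samples out of the input buffer and apply the window."""
    window = hann_window(frame_size)
    return [buffer[start + i] * window[i] for i in range(frame_size)]


# Hypothetical usage: one 4-sample frame taken from a constant signal.
frame = extract_frame([1.0] * 8, start=2, frame_size=4)
```

The periodic form is chosen here only because frames windowed this way sum cleanly when they overlap; the source does not say which form is used.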

A fast Fourier transform (FFT) unit 303 performs an FFT on the frame after window function multiplication and calculates a frequency amplitude component and a phase component. The moving average filter unit 304 performs filtering that outputs the average value of the frequency amplitude components. At a sampling frequency of 22,052 Hz, the optimum number of filter points is 7. In that case the moving average filter unit 304 filters in units of seven frequency amplitude components, and this filtering outputs values that represent the outline of the frequency amplitude component. Those values correspond to the formant component.
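A 7-point moving average over the amplitude spectrum can be sketched as below. The centering of the window and the truncation of the window at the ends of the spectrum are assumptions; the text states only the number of points.

```python
def moving_average(amplitudes, points=7):
    """Centered moving average over FFT amplitude bins.

    Near the ends of the spectrum the window is truncated and the average
    is taken over the bins that exist (an assumed edge policy).
    """
    half = points // 2
    n = len(amplitudes)
    outline = []
    for k in range(n):
        lo = max(0, k - half)
        hi = min(n, k + half + 1)
        outline.append(sum(amplitudes[lo:hi]) / (hi - lo))
    return outline


# A single spectral peak is spread over seven bins, leaving a smooth outline.
outline = moving_average([0.0, 0.0, 0.0, 7.0, 0.0, 0.0, 0.0])
```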

The reciprocal calculation unit 306 calculates the reciprocal of the filtered frequency amplitude component (hereinafter called the "frequency amplitude outline"). The multiplier 307 multiplies that reciprocal by the frequency amplitude component before filtering. The product is a value representing the ratio of the frequency amplitude component before filtering to that after filtering. Here it is called the frequency amplitude residual.
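The residual is a per-bin division of the unfiltered amplitude by the outline. A minimal sketch follows; the small floor `eps` that guards against division by zero is an assumption added here and is not mentioned in the text.

```python
def amplitude_residual(amplitudes, outline, eps=1e-12):
    """Per-bin ratio of the unfiltered amplitude to the filtered outline."""
    return [a / max(e, eps) for a, e in zip(amplitudes, outline)]


# Multiplying the residual by the same outline gives back the original amplitudes.
amps = [2.0, 4.0, 1.0]
outline = [1.0, 2.0, 4.0]
residual = amplitude_residual(amps, outline)
restored = [r * e for r, e in zip(residual, outline)]
```

This is why the later multiplication by a shifted outline changes the formants while keeping the fine harmonic structure carried by the residual.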

The frequency amplitude residual is shifted (a shift in the frequency domain) by the shift unit 310b of the pitch moving unit 310. The shifted frequency amplitude residual is sent to the multiplier 312.
The shift unit 305a of the formant moving unit 305 shifts the frequency amplitude component received from the FFT unit 303 (again a shift in the frequency domain). This operation corresponds to shifting the pitch of the speech, and the formant position of the original speech shifts along with it. However, because the phase component is not manipulated, it is not strictly a pitch shift. As a result, effectively only the formant position is manipulated.

The formant moving unit 305 then filters the shifted frequency amplitude component with the moving average filter unit 305b. The resulting frequency amplitude outline is sent to the multiplier 312.
The operation panel 311 instructs the shift amounts of the formant position and of the pitch according to the knob positions of the sliders 21 and 22 shown in FIG. 2. It is realized by, for example, the slider unit 13, the CPU 1, the ROM 4, and the RAM 5. In this embodiment, the shift amounts are instructed as ratios relative to the formant position and pitch of the original speech. Hereinafter, the ratios representing the shift amounts of the formant position and of the pitch are called the "formant shift ratio" and the "pitch shift ratio", respectively.

The frequency amplitude components and phase components are managed by index. A shift is therefore performed by multiplying each index value by the ratio to calculate the shifted index value and relocating the component to the calculated index.
The multiplier 312 multiplies the frequency amplitude outline from the formant moving unit 305 by the shifted frequency amplitude residual and outputs the product. The frequency amplitude component calculated as that product corresponds to one to which the pitch shift and formant shift specified by the user have been applied.
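The index-based shift described above can be sketched as follows. The text specifies only "multiply the index by the ratio and relocate"; the rounding rule, the handling of two bins that land on the same index (the larger value is kept), the zero left in unfilled bins, and the discarding of bins pushed past the end are all assumptions made for this illustration.

```python
def shift_bins(values, ratio):
    """Relocate each non-negative bin value from index k to index round(k * ratio)."""
    n = len(values)
    shifted = [0.0] * n
    for k, v in enumerate(values):
        j = int(k * ratio + 0.5)  # round half up (assumed rounding rule)
        if 0 <= j < n:            # bins moved past the end are dropped (assumption)
            shifted[j] = max(shifted[j], v)  # on collision keep the larger value (assumption)
    return shifted


# A ratio of 2.0 moves bin 1 to bin 2, bin 2 to bin 4, and so on.
up = shift_bins([1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0], 2.0)
```

With a ratio above 1 this leaves empty bins between the relocated ones; in the formant path those gaps are smoothed by the moving average that follows the shift.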

The phase component extracted by the FFT unit 303 is sent to the instantaneous frequency calculation unit 308. That unit calculates the instantaneous frequency by the phase difference measurement method, using the phase component of the current frame and the phase component (phase data) of the previous frame.
The shift unit 310a of the pitch moving unit 310 shifts the instantaneous frequency according to the pitch shift ratio. The frequency-to-phase-difference conversion unit 313 converts the shifted instantaneous frequency into a phase difference. The phase difference accumulation unit 314 generates a phase component by integrating that phase difference.
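One common way to estimate instantaneous frequency from the phases of two successive frames is the phase-vocoder formula sketched below. It is offered as an assumed reading of the "phase difference measurement method"; the FFT size, hop size, and function names are hypothetical and do not come from the text.

```python
import math


def wrap_phase(x):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (x + math.pi) % (2.0 * math.pi) - math.pi


def instantaneous_frequency(phase, prev_phase, k, fft_size, hop, sample_rate):
    """Frequency in Hz for bin k from the phase change across one hop."""
    expected = 2.0 * math.pi * k * hop / fft_size  # advance of the bin-center frequency
    deviation = wrap_phase(phase - prev_phase - expected)
    return (k / fft_size + deviation / (2.0 * math.pi * hop)) * sample_rate


def frequency_to_phase_difference(freq_hz, hop, sample_rate):
    """Phase advance over one hop for a component at freq_hz."""
    return 2.0 * math.pi * freq_hz * hop / sample_rate


def accumulate_phase(prev_output_phase, phase_difference):
    """Integrate the phase difference to obtain the next output phase."""
    return wrap_phase(prev_output_phase + phase_difference)
```

Under these assumptions, a pitch shift amounts to scaling the estimated frequency by the pitch shift ratio before converting it back to a phase difference and accumulating it; the relocation of bins by index is left out of this sketch.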

The inverse fast Fourier transform (IFFT) unit 315 performs an inverse FFT using the phase component from the phase difference accumulation unit 314 and the frequency amplitude component from the multiplier 312. Because that phase component is the phase component of the original speech after manipulation, the time-domain speech data generated by the inverse FFT has not only its formant position but also its pitch changed. The windowing frame addition unit 316 multiplies that speech data by a window function so that it can be added to, and overlapped with, the speech data of other frames. The product is sent to the adder 317.

The adder 317 also receives original speech data obtained when the amplifier 318 amplifies the windowed original speech data output by the frame extraction and windowing unit 302. The adder 317 adds that original speech data to the speech data from the windowing frame addition unit 316. The sum is the original speech with a copy whose formant position or pitch has been manipulated added to it, that is, the original speech with a harmony effect. The sum is added to, and overlapped with, the speech data stored in the output buffer 319. The output buffer 319 is, for example, an area reserved in the RAM 5, and the speech data read from it is output to the D/A converter 10 via the musical tone generation unit 9, so that speech with the harmony effect is emitted from the speaker 12.
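The mixing and overlap-add into the output buffer can be sketched as follows. The function name, the gain value, and the frame layout are assumptions; the text says only that the amplified original frame and the windowed processed frame are summed and added onto the data already in the output buffer.

```python
def mix_and_accumulate(output, start, dry_frame, wet_frame, window, dry_gain=1.0):
    """Add one mixed frame onto the output buffer, overlapping earlier frames.

    dry_frame: original frame, already windowed at analysis time
    wet_frame: frame from the inverse FFT, windowed here before the sum
    dry_gain:  gain of the amplifier on the original path (assumed value)
    """
    for i in range(len(wet_frame)):
        output[start + i] += dry_gain * dry_frame[i] + wet_frame[i] * window[i]


# Two half-overlapping frames of a constant processed signal, no original mixed in.
out = [0.0] * 6
mix_and_accumulate(out, 0, [0.0] * 4, [1.0] * 4, [0.5] * 4)
mix_and_accumulate(out, 2, [0.0] * 4, [1.0] * 4, [0.5] * 4)
```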

As described above, the speech analysis and generation apparatus according to this embodiment performs an FFT on the original speech data to separate it into a frequency amplitude component and a phase component and, when the pitch moving unit 310 performs no shift, moves the formant position by shifting only the frequency amplitude component. Compared with solving high-order equations to move poles by LPC or to convert to LSP coefficients, the overall processing load can therefore be greatly reduced. It was confirmed that this reduction shortens the processing time to 1/4 or less. In addition, by manipulating the phase component, the pitch can be manipulated as well as the formant position, so speech that varies the original speech in many ways can be generated. Manipulating the phase component does not require an enormous amount of computation either, but if that manipulation is omitted the overall processing load can be reduced further.
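The amplitude path as a whole (outline, residual, shifts, and final multiplication) can be put together in a short sketch. The helper functions repeat assumptions that the text does not settle: a centered 7-point average truncated at the spectrum edges, and an index shift that rounds, keeps the larger value on collision, and leaves unfilled bins at zero.

```python
def moving_average(values, points=7):
    """Centered moving average, truncated at the spectrum edges (assumed policy)."""
    half, n = points // 2, len(values)
    return [sum(values[max(0, k - half):min(n, k + half + 1)])
            / (min(n, k + half + 1) - max(0, k - half)) for k in range(n)]


def shift_bins(values, ratio):
    """Relocate bin k to index round(k * ratio); gaps stay zero (assumed policy)."""
    n = len(values)
    shifted = [0.0] * n
    for k, v in enumerate(values):
        j = int(k * ratio + 0.5)
        if 0 <= j < n:
            shifted[j] = max(shifted[j], v)
    return shifted


def output_amplitude(amplitudes, formant_ratio, pitch_ratio, eps=1e-12):
    """Amplitude fed to the inverse FFT for the given shift ratios."""
    outline = moving_average(amplitudes)
    residual = [a / max(e, eps) for a, e in zip(amplitudes, outline)]
    shifted_outline = moving_average(shift_bins(amplitudes, formant_ratio))
    shifted_residual = shift_bins(residual, pitch_ratio)
    return [e * r for e, r in zip(shifted_outline, shifted_residual)]


amps = [0.0, 1.0, 4.0, 2.0, 0.5, 3.0, 1.0, 0.2, 0.0, 0.0]
unchanged = output_amplitude(amps, formant_ratio=1.0, pitch_ratio=1.0)
```

With both ratios set to 1.0 the outline and residual multiply back to the input spectrum, which is a quick sanity check of the decomposition. Every step is a single pass of additions, multiplications, and index moves per bin, with no polynomial root finding.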

The operation of the electronic musical instrument that realizes the speech conversion apparatus described above is explained in detail below with reference to the flowcharts shown in FIGS. 4 to 8.
FIG. 4 is a flowchart of the overall process, which is described first. The overall process is realized by the CPU 1 executing the program stored in the ROM 4 and using the resources of the electronic musical instrument.

First, in step 401, an initialization process is executed when the power is turned on. In the following step 402, a switch process is executed to respond to user operations on the switches of the switch unit 3. The switch process is performed by, for example, having the detection circuit of the switch unit 3 detect the states of the switches, receiving the detection results, and analyzing them to identify which switches have changed state and how.

In step 403, which follows step 402, a keyboard process is executed to respond to user operations on the keyboard 2. Executing the keyboard process causes musical tones to be emitted from the speaker 12 in response to performance operations on the keyboard 2. The process then moves to step 404.
In step 404, a slider process is executed to respond to operations on the sliders 21 and 22 shown in FIG. 2. In the following step 405, other processing is executed, such as driving the LCD or LEDs of the display unit 6 to present information to the user. After that, the process returns to step 402. The processing loop formed by steps 402 to 405 is thus executed repeatedly while the power is on.

FIG. 5 is a flowchart of the slider process executed as step 404. It shows the flow of processing performed after the detection results received from the detection circuit of the slider unit 13 have been analyzed. The slider process is described in detail next with reference to FIG. 5.

First, in step 501, it is determined whether the knob position of the pitch slider 21 has changed. If the user has moved the knob, the analysis reveals this, so the determination is YES; the pitch shift ratio assigned to the variable PitchRatio is set (updated) in step 502, and the process proceeds to step 503. Otherwise, the determination is NO and the process proceeds directly to step 503.

In step 503, it is determined whether the knob position of the formant slider 22 has changed. If the user has moved the knob, the determination is likewise YES; the formant shift ratio assigned to the variable FormantRatio is set (updated) in step 504, and the series of processing ends. Otherwise, the determination is NO and the series of processing ends here.

In this way, when the user moves the knob of slider 21 or 22, the pitch shift ratio or the formant shift ratio is updated according to which slider was operated and the new knob position. The user can thereby specify the pitch shift amount and the shift amount of the formant position. The variable values are updated, for example, by referring to a preset relationship between knob positions and the ratios to be set.
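The slider processing of steps 501 to 504 can be sketched as follows. This is only an illustration under stated assumptions: the class name, the five knob positions, and the ratio values in the two lookup tables are hypothetical, since the text gives neither the number of positions nor the ratios assigned to them.

```python
# Hypothetical lookup tables: knob position (0..4) -> ratio.
# The real positions and ratios are not given in the text.
PITCH_TABLE = [0.5, 0.75, 1.0, 1.5, 2.0]
FORMANT_TABLE = [0.8, 0.9, 1.0, 1.1, 1.2]


class SliderState:
    """Holds the last known knob positions and the two shift ratios."""

    def __init__(self):
        self.pitch_pos = 2
        self.formant_pos = 2
        self.PitchRatio = PITCH_TABLE[self.pitch_pos]
        self.FormantRatio = FORMANT_TABLE[self.formant_pos]

    def process(self, pitch_pos, formant_pos):
        """One pass of the slider processing (steps 501 to 504)."""
        if pitch_pos != self.pitch_pos:                      # step 501
            self.pitch_pos = pitch_pos
            self.PitchRatio = PITCH_TABLE[pitch_pos]         # step 502
        if formant_pos != self.formant_pos:                  # step 503
            self.formant_pos = formant_pos
            self.FormantRatio = FORMANT_TABLE[formant_pos]   # step 504
```

A table lookup keeps the work done per pass of the main loop small; a continuous mapping from knob position to ratio would serve equally well.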

FIG. 6 is a flowchart of the musical tone timer interrupt processing. This processing is executed in response to an interrupt signal generated, for example, at the sampling period, in order to analyze the original voice data and generate (synthesize) voice data. For example, in the switch processing shown in FIG. 4, the interrupt is enabled when it is determined that the switch for instructing generation of voice data has been operated, and the interrupt is disabled when it is determined that the switch for instructing that generation be stopped has been operated. The timer interrupt processing is described in detail below with reference to FIG. 6.

First, in step 601, the original voice data output from the A/D converter 8 is written to the input buffer 301. In the subsequent step 602, it is determined whether it is time for frame processing. If so, the determination is YES and the process proceeds to step 603; otherwise, the determination is NO and the series of processing ends here.
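Steps 601 and 602 can be sketched as a per-sample handler. The sketch assumes that a frame becomes due every FFT_SIZE / OVERLAP_FACTOR samples; that relation, the constant values, the sliding-list buffer, and the class name are all assumptions made for illustration.

```python
FFT_SIZE = 8          # frame size (hypothetical value)
OVERLAP_FACTOR = 4    # overlap factor (hypothetical value)
HOP = FFT_SIZE // OVERLAP_FACTOR  # assumed frame period in samples


class FrameScheduler:
    def __init__(self):
        self.input_buffer = [0.0] * FFT_SIZE  # always holds the latest frame
        self.count = 0                        # samples since the last frame

    def on_sample(self, sample):
        """Step 601: store the sample. Step 602: report frame timing."""
        self.input_buffer = self.input_buffer[1:] + [sample]
        self.count += 1
        if self.count >= HOP:
            self.count = 0
            return True    # YES: continue with step 603
        return False       # NO: this interrupt ends here
```

With these values, every second call to `on_sample` returns True, so consecutive frames overlap by three quarters of their length.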

The generated voice data is added to the voice data of already generated frames in accordance with the set overlap factor. The frame processing timing therefore arrives at a period determined by the sampling frequency and the overlap factor.
In step 603, one frame of original voice data is extracted from the input buffer 301 and multiplied by a window function (for example, a Hanning window). In the next step 604, an FFT is performed on the windowed frame to separate it into a frequency amplitude component and a phase component. In step 605, the frequency amplitude component is filtered, and the frequency amplitude residual, which is the ratio of the frequency amplitude components before and after filtering (= before filtering / after filtering), is calculated. The process then proceeds to step 606.
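Steps 603 to 605 can be sketched as follows. This is a minimal illustration rather than the device's implementation: a naive DFT stands in for the FFT to keep the example short, and the three-bin filter width, the `eps` guard against division by zero, and all function names are assumptions that do not appear in the text.

```python
import cmath
import math


def hann(n):
    """Hanning window of length n (periodic form)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def dft(x):
    """Naive DFT, used here in place of an FFT for brevity."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def moving_average(mag, width=3):
    """Moving-average filter over neighbouring bins (edges use fewer bins)."""
    half = width // 2
    out = []
    for k in range(len(mag)):
        lo, hi = max(0, k - half), min(len(mag), k + half + 1)
        out.append(sum(mag[lo:hi]) / (hi - lo))
    return out


def analyze(frame, eps=1e-12):
    """Steps 603 to 605: window, transform, split, filter, residual."""
    w = hann(len(frame))
    spec = dft([s * wk for s, wk in zip(frame, w)])   # steps 603 and 604
    mag = [abs(c) for c in spec]              # frequency amplitude component
    phase = [cmath.phase(c) for c in spec]    # phase component
    outline = moving_average(mag)             # amplitude after filtering
    # Residual = before filtering / after filtering (step 605).
    res_mag = [m / (o + eps) for m, o in zip(mag, outline)]
    return mag, phase, outline, res_mag
```

Dividing by the smoothed spectrum removes the slowly varying shape and leaves the fine harmonic structure in the residual, which is what allows the two to be manipulated separately later.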

In step 606, formant shift processing is executed to move the formant position in accordance with the formant shift ratio assigned to the variable FormantRatio. After that, frequency amplitude outline calculation processing is executed, in which the frequency amplitude outline is calculated by filtering the shifted frequency amplitude component (step 607).

In step 608, which follows step 607, instantaneous frequency calculation processing is executed to calculate the instantaneous frequency from the phase component obtained in step 604 and the phase component of the previous frame. In the subsequent step 609, pitch shift processing is executed to shift both the calculated instantaneous frequency and the frequency amplitude residual calculated in step 605. Then, in step 610, frequency amplitude calculation processing is executed to multiply the frequency amplitude residual shifted in step 609 by the frequency amplitude outline calculated in step 607, and the process proceeds to step 611.

In step 611, frequency-to-phase-difference conversion processing is executed to convert the instantaneous frequency shifted in step 609 into a phase difference. In the subsequent step 612, phase difference integration processing is executed to calculate the phase component by integrating that phase difference. In the next step 613, an inverse FFT is performed using the phase component calculated in step 612 and the frequency amplitude component calculated in step 610. After the inverse FFT has generated one frame of time-domain voice data (voice data whose formant position or pitch has been shifted), the process proceeds to step 614.
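The phase handling of steps 608, 611, and 612 can be sketched as below. The exact formulas are not stated in this passage, so the sketch uses the usual phase-vocoder relations as an assumption: frequencies are expressed in radians per sample, `hop` is the number of samples between frames, and the function names are hypothetical.

```python
import math


def wrap(p):
    """Wrap a phase value into [-pi, pi)."""
    return (p + math.pi) % (2 * math.pi) - math.pi


def instantaneous_freq(phase, prev_phase, hop, n):
    """Step 608 (assumed form): per-bin frequency in radians per sample."""
    freq = []
    for k in range(len(phase)):
        expected = 2 * math.pi * k * hop / n      # nominal advance per hop
        dev = wrap(phase[k] - prev_phase[k] - expected)
        freq.append(2 * math.pi * k / n + dev / hop)
    return freq


def accumulate_phase(shift_freq, acc_phase, hop):
    """Steps 611 and 612: frequency -> phase difference -> running sum."""
    return [wrap(a + f * hop) for a, f in zip(acc_phase, shift_freq)]
```

In the flow of FIG. 6, the output of `instantaneous_freq` would pass through the pitch shift of step 609 before being handed to `accumulate_phase`, and the accumulated phase would then feed the inverse FFT of step 613.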

In step 614, the generated voice data is multiplied by the window function, and the original voice data that was multiplied by the window function in step 603 is added to the result. In the next step 615, the resulting voice data (voice data to which a harmony effect has been added) is added to, and thus superimposed on, the voice data already stored in the output buffer 319. Then, in step 616, voice data is read from the output buffer 319 and sent to the musical sound generation unit 9, and the series of processing ends.
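Steps 614 and 615 amount to an overlap-add with the windowed original mixed in. A minimal sketch follows, assuming a plain linear output buffer and a caller-supplied write position `pos`; both are simplifications, and the function name is hypothetical.

```python
def overlap_add(output_buffer, generated, original_windowed, window, pos):
    """Window the generated frame, add the windowed original frame
    (step 614), and superimpose the sum on the buffer (step 615)."""
    for t in range(len(generated)):
        mixed = generated[t] * window[t] + original_windowed[t]  # step 614
        output_buffer[pos + t] += mixed                          # step 615
    return output_buffer
```

Calling the function once per frame with `pos` advanced by the frame period makes consecutive frames overlap in the buffer, as the overlap factor requires.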

By executing the musical tone timer interrupt processing in this way, voice data whose formant position, and also pitch, have been manipulated is generated and added to the original voice data. A harmony effect is thereby added, in which the original voice and the voice with the manipulated formant position and pitch are sounded simultaneously.

The subroutine processing executed within the timer interrupt processing is described in detail below with reference to the flowcharts shown in FIG. 7 and FIG. 8.
FIG. 7 is a flowchart of the formant shift processing executed as step 606. This shift processing is described first, with reference to FIG. 7.

In the musical tone timer interrupt processing shown in FIG. 6, the frequency amplitude components obtained by the FFT in step 604 are assigned to the elements of the one-dimensional array variable Mag. The subscript that specifies an element corresponds to the frequency index. The frequency amplitude components are therefore shifted by preparing a one-dimensional array variable ShiftMag to hold the shifted components, identifying the element of ShiftMag into which each component held in Mag should be placed, and assigning it there.

First, in step 701, 0 is assigned to the variable i. In the subsequent step 702, the variable ShiftIdx is assigned the rounded product of the value of i and the value of FormantRatio (= INT(i × FormantRatio)), and the variable Next is assigned the rounded product of the value of i plus 1 and the value of FormantRatio (= INT((i + 1) × FormantRatio)).

In step 703, it is determined whether the value of ShiftIdx is smaller than half the frame size FFT_SIZE. If it is not, the determination is NO and the process proceeds to step 706. If it is, the determination is YES and the process proceeds to step 704.

The second half of the frequency amplitude components within the frame size is a mirror image of the first half. This is why the determination in step 703 is made.
In step 704, the value of the element Mag[i], specified by the value of i, is assigned to the element ShiftMag[ShiftIdx], specified by the value of ShiftIdx, and the value of ShiftIdx is incremented. In the next step 705, it is determined whether the value of ShiftIdx is smaller than the value of Next. If it is, the determination is YES and the process returns to step 703. Otherwise, the determination is NO and the process proceeds to step 706.

In step 706, the value of i is incremented. In the subsequent step 707, it is determined whether the value of i is smaller than half the frame size FFT_SIZE. If it is not, the determination is NO and the series of processing ends here. If it is, the determination is YES and the process returns to step 702. In this way, every frequency amplitude component held in Mag that should be placed in ShiftMag is assigned to the corresponding element of ShiftMag.
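The flow of steps 701 to 707 translates almost directly into code. The sketch below keeps the variable names used in the text. Two points are assumptions: INT is implemented as rounding half up, following the description of the value as rounded, and ShiftMag is taken to start out filled with zeros, since its initial contents are not stated.

```python
def to_index(x):
    """INT(): round half up, following the text's description (assumed)."""
    return int(x + 0.5)


def formant_shift(Mag, FormantRatio, FFT_SIZE):
    """Steps 701 to 707: copy Mag[i] into ShiftMag at scaled indices."""
    half = FFT_SIZE // 2
    ShiftMag = [0.0] * half              # initial contents assumed zero
    for i in range(half):                # steps 701, 706, 707
        ShiftIdx = to_index(i * FormantRatio)          # step 702
        Next = to_index((i + 1) * FormantRatio)
        while ShiftIdx < half:           # step 703
            ShiftMag[ShiftIdx] = Mag[i]  # step 704
            ShiftIdx += 1
            if not ShiftIdx < Next:      # step 705
                break
    return ShiftMag
```

For example, with FFT_SIZE = 8 and FormantRatio = 2.0, the input [1, 2, 3, 4] yields [1, 1, 2, 2]: each source bin fills two destination bins, and source bins whose destination falls at or beyond FFT_SIZE / 2 are dropped.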

FIG. 8 is a flowchart of the pitch shift processing executed as step 609 within the musical tone timer interrupt processing shown in FIG. 6. This shift processing is described in detail next, with reference to FIG. 8.
In the musical tone timer interrupt processing shown in FIG. 6, the frequency amplitude residuals calculated in step 605 are assigned to the elements of the one-dimensional array variable ResMag, and the instantaneous frequencies calculated in step 608 are assigned to the elements of the one-dimensional array variable Freq. The subscripts that specify these elements correspond to the frequency index. These values are therefore shifted in the same way as the frequency amplitude components in the formant shift processing. The shifted frequency amplitude residuals and instantaneous frequencies are assigned to the elements of the one-dimensional array variables ShiftResMag and ShiftFreq, respectively.

First, in step 801, 0 is assigned to the variable i. In the subsequent step 802, the variable ShiftIdx is assigned the rounded product of the value of i and the value of PitchRatio (= INT(i × PitchRatio)), and the variable Next is assigned the rounded product of the value of i plus 1 and the value of PitchRatio (= INT((i + 1) × PitchRatio)).

In step 803, it is determined whether the value of ShiftIdx is smaller than half the frame size FFT_SIZE. If it is not, the determination is NO and the process proceeds to step 808. If it is, the determination is YES and the process proceeds to step 804.

In step 804, the value of the element ResMag[i], specified by the value of i, is assigned to the element ShiftResMag[ShiftIdx], specified by the value of ShiftIdx. In the subsequent step 805, the product of the value of the element Freq[i], specified by the value of i, and the value of PitchRatio is assigned to the element ShiftFreq[ShiftIdx], specified by the value of ShiftIdx. The process then proceeds to step 806.

In step 806, the value of ShiftIdx is incremented. In the next step 807, it is determined whether the value of ShiftIdx is smaller than the value of Next. If it is, the determination is YES and the process returns to step 803. Otherwise, the determination is NO and the process proceeds to step 808.

In step 808, the value of i is incremented. In the subsequent step 809, it is determined whether the value of i is smaller than half the frame size FFT_SIZE. If it is not, the determination is NO and the series of processing ends here. If it is, the determination is YES and the process returns to step 802.
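The pitch shift processing of steps 801 to 809 follows the same index-scaling pattern as the formant shift, with the instantaneous frequency additionally multiplied by PitchRatio. The sketch keeps the variable names from the text. Rounding half up for INT and zero-filled output arrays are assumptions.

```python
def to_index(x):
    """INT(): round half up, following the text's description (assumed)."""
    return int(x + 0.5)


def pitch_shift(ResMag, Freq, PitchRatio, FFT_SIZE):
    """Steps 801 to 809: shift the residual and the instantaneous frequency."""
    half = FFT_SIZE // 2
    ShiftResMag = [0.0] * half           # initial contents assumed zero
    ShiftFreq = [0.0] * half
    for i in range(half):                # steps 801, 808, 809
        ShiftIdx = to_index(i * PitchRatio)            # step 802
        Next = to_index((i + 1) * PitchRatio)
        while ShiftIdx < half:           # step 803
            ShiftResMag[ShiftIdx] = ResMag[i]              # step 804
            ShiftFreq[ShiftIdx] = Freq[i] * PitchRatio     # step 805
            ShiftIdx += 1                # step 806
            if not ShiftIdx < Next:      # step 807
                break
    return ShiftResMag, ShiftFreq
```

Moving a bin to a new index changes where its energy sits in the spectrum, while multiplying Freq[i] by PitchRatio keeps the phase advance consistent with that new position.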

In this embodiment, the multiplier 312 multiplies the frequency amplitude residual from the shift unit 310b by the frequency amplitude outline from the moving average filter unit 305b. However, that outline may instead be obtained by shifting the frequency amplitude outline produced by the filtering of the moving average filter unit 304. The filtering need not use a moving average filter; another low-pass filter may be used.

The frequency amplitude components and the like are shifted by operating on index values, without changing the values of the array elements, but the shift may instead be performed by higher-order interpolation such as Neville interpolation or Lagrange interpolation. Only one voice is superimposed on the original voice, but a plurality of voices with different formant positions and pitches may be superimposed.
<Second embodiment>
In the first embodiment, the frequency amplitude residual is shifted in order to shift the pitch. In the second embodiment, by contrast, the pitch shift is realized by a different method.

The configuration of the electronic musical instrument equipped with the speech analysis generation device according to the second embodiment is basically the same as in the first embodiment. Its operation is also largely the same, or does not differ significantly. Accordingly, for elements that are the same or not different enough to need distinguishing, the reference numerals used in the description of the first embodiment are used as they are, and the description focuses on the parts that differ from the first embodiment.

FIG. 9 is a functional configuration diagram of the speech analysis generation device according to the second embodiment. Its functional configuration and the operation of each unit are described in detail first, with reference to FIG. 9. In FIG. 9, elements that are the same as in the first embodiment, or not different enough to need distinguishing, are given the same reference numerals.

In the second embodiment, as shown in FIG. 9, instead of calculating a frequency amplitude residual, the Rosenberg wave generation unit 901 generates a Rosenberg wave, which has a pitch and simulates a vocal cord sound source waveform. The generation unit 901 generates the Rosenberg wave at the pitch instructed from the operation panel 311.

The FFT unit 902 performs an FFT on the Rosenberg wave generated by the generation unit 901 and sends the frequency amplitude component to the multiplier 312. The multiplier 312 multiplies that frequency amplitude component by the frequency amplitude outline from the moving average filter unit 305b of the formant moving unit 305 and sends the product to the IFFT unit 315. The IFFT unit 315 performs an inverse FFT using the frequency amplitude component resulting from that multiplication and the phase component from the FFT unit 303, and generates voice data.

A Rosenberg wave can be generated at various pitches. Therefore, by generating a Rosenberg wave, multiplying the frequency amplitude component obtained from it by the frequency amplitude outline, and using the resulting frequency amplitude component in the inverse FFT, the pitch can be shifted together with the manipulation of the formant position. Because generating the Rosenberg wave does not require heavy processing, this can be achieved while avoiding a heavier load than in the first embodiment. The second embodiment can also be used in the manner of a vocoder or pitch correction.

As for the operation of the electronic musical instrument that realizes the voice conversion device according to the second embodiment, the musical tone timer interrupt processing (see FIG. 6) differs relatively greatly from the first embodiment. Therefore, only that timer interrupt processing is described in detail, with reference to its flowchart shown in FIG. 10. Description of steps bearing the same reference numerals as in the first embodiment is basically omitted.

In the second embodiment, once the frequency amplitude outline has been calculated in step 607, the process proceeds to step 1001. In step 1001, a Rosenberg wave is generated at the pitch specified by the user with the pitch slider 21. In the subsequent step 1002, an FFT is performed on the Rosenberg wave to extract its frequency amplitude component. The process then proceeds to step 1003.

In step 1003, the frequency amplitude component extracted in step 1002 is multiplied by the frequency amplitude outline calculated in step 607 to calculate a frequency amplitude component. In the next step 1004, an inverse FFT is performed using that frequency amplitude component and the phase component extracted in step 604 to generate voice data. The process then proceeds to step 614, and the subsequent processing is executed in the same way as in the first embodiment.
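Steps 1001 to 1003 can be sketched as below. The text does not give the formula for the Rosenberg wave, so the sketch uses a commonly cited trigonometric form of the Rosenberg glottal pulse as an assumption. The opening and closing ratios, the function names, and the naive DFT used in place of an FFT are likewise illustrative choices.

```python
import cmath
import math


def rosenberg_wave(pitch_hz, fs, n, open_ratio=0.4, close_ratio=0.16):
    """Glottal pulse train at pitch_hz; the phase ratios are assumed values."""
    period = fs / pitch_hz                    # samples per pitch period
    t1, t2 = open_ratio * period, close_ratio * period
    out = []
    for s in range(n):
        t = s % period                        # position inside the period
        if t <= t1:                           # opening phase
            out.append(0.5 * (1.0 - math.cos(math.pi * t / t1)))
        elif t <= t1 + t2:                    # closing phase
            out.append(math.cos(math.pi * (t - t1) / (2.0 * t2)))
        else:                                 # closed phase
            out.append(0.0)
    return out


def dft_mag(x):
    """Magnitude of a naive DFT, standing in for the FFT of step 1002."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) for k in range(n)]


def source_times_outline(pitch_hz, fs, outline):
    """Steps 1001 to 1003: generate, transform, multiply by the outline."""
    wave = rosenberg_wave(pitch_hz, fs, len(outline))    # step 1001
    src_mag = dft_mag(wave)                              # step 1002
    return [m * o for m, o in zip(src_mag, outline)]     # step 1003
```

The returned amplitudes would then be combined with the phase component from step 604 in the inverse FFT of step 1004, which is omitted here.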

Although the present embodiments (the first and second embodiments) apply the present invention to a speech analysis generation device mounted on an electronic musical instrument, the speech analysis generation devices to which the invention can be applied are not limited to such devices. The present invention can be applied widely, regardless of the type or use of the apparatus in which the speech analysis generation device is mounted.

The shift amount of the formant position and the pitch shift amount are both specified by the user, but they may instead be specified automatically. The user may also be allowed to select the specification method, the means of specification, and the like.
A program that realizes the speech analysis generation device described above, or a modification of it, may be recorded on a recording medium such as a CD-ROM, DVD, or magneto-optical disk and distributed. Alternatively, part or all of the program may be delivered via a transmission medium used in a public network or the like. In that case, the user can obtain the program and load it into a data processing device such as a computer, and thereby realize, using that data processing device, a speech analysis generation device to which the present invention is applied. Accordingly, the recording medium may be one that is accessible to a device that distributes the program.

FIG. 1 is a configuration diagram of an electronic musical instrument equipped with the speech analysis generation device according to the present embodiment.
FIG. 2 is a diagram showing the sliders provided in the slider unit 13.
FIG. 3 is a functional configuration diagram of the speech analysis generation device according to the first embodiment.
FIG. 4 is a flowchart of the overall processing.
FIG. 5 is a flowchart of the slider processing.
FIG. 6 is a flowchart of the musical tone timer interrupt processing.
FIG. 7 is a flowchart of the formant shift processing.
FIG. 8 is a flowchart of the pitch shift processing.
FIG. 9 is a functional configuration diagram of the speech analysis generation device according to the second embodiment.
FIG. 10 is a flowchart of the musical tone timer interrupt processing (second embodiment).

Explanation of symbols

1 CPU
3 Switch unit
4 ROM
5 RAM
7 Microphone
8 A/D converter
9 Musical sound generation unit
10 D/A converter
11 Amplifier
12 Speaker
13 Slider unit

Claims

In a speech analysis generation device that analyzes a first speech waveform and generates a second speech waveform using the analysis result,
Analyzing means for analyzing the first speech waveform and extracting a first frequency amplitude component and a phase component;
Filter means for filtering frequency amplitude components;
Indicating means for instructing a formant shift amount with respect to the first frequency amplitude component;
Shift means for shifting according to the shift amount instructed by the instruction means;
Calculating means for calculating a frequency amplitude residual by dividing the first frequency amplitude component by a second frequency amplitude component obtained by filtering the first frequency amplitude component by the filter means;
Multiplying means for multiplying a third frequency amplitude component, obtained either by filtering with the filter means the first frequency amplitude component shifted by the shift means or by shifting with the shift means the second frequency amplitude component, by the frequency amplitude residual;
Voice waveform generation means for generating the second voice waveform using the fourth frequency amplitude component obtained by the multiplication by the multiplication means and the phase component;
A speech analysis generation apparatus comprising:

In a speech analysis generation device that analyzes a first speech waveform and generates a second speech waveform using the analysis result,
Analyzing means for analyzing the first speech waveform and extracting a first frequency amplitude component and a first phase component;
Filter means for filtering frequency amplitude components;
First shift means for performing a formant shift on the frequency amplitude component;
Residual calculating means for calculating a frequency amplitude residual by dividing the first frequency amplitude component by a second frequency amplitude component obtained by filtering the first frequency amplitude component with the filter means;
Instantaneous frequency calculating means for calculating an instantaneous frequency from the first phase component;
Pitch instruction means for instructing the pitch shift amount;
Second shift means for shifting the instantaneous frequency and frequency amplitude residual according to the shift amount instructed by the pitch instruction means;
Amplitude component calculating means for calculating a fourth frequency amplitude component by multiplying a third frequency amplitude component, obtained either by filtering with the filter means the first frequency amplitude component shifted by the first shift means or by shifting with the first shift means the second frequency amplitude component, by the frequency amplitude residual shifted by the second shift means;
Phase component calculation means for calculating a second phase component from the instantaneous frequency shifted by the second shift means;
Voice waveform generation means for generating the second voice waveform using the fourth frequency amplitude component and the second phase component;
A speech analysis generation apparatus comprising:

In a speech analysis generation device that analyzes a first speech waveform and generates a second speech waveform using the analysis result,
Analyzing means for analyzing the first speech waveform and extracting a first frequency amplitude component and a phase component;
Filter means for filtering frequency amplitude components;
Shift means for performing a formant shift on the frequency amplitude component;
Pitch instruction means for indicating the pitch;
Sound source waveform generating means for generating a sound source waveform that simulates a vocal cord sound source at a pitch indicated by the pitch instruction means;
Other analysis means for analyzing the sound source waveform and extracting frequency amplitude components;
Amplitude component calculating means for calculating a third frequency amplitude component by multiplying the frequency amplitude component extracted from the sound source waveform by the other analysis means by a second frequency amplitude component, obtained either by filtering with the filter means the first frequency amplitude component shifted by the shift means or by shifting with the shift means the first frequency amplitude component filtered by the filter means;
Voice waveform generating means for generating the second voice waveform using the third frequency amplitude component and the phase component;
A speech analysis generation apparatus comprising:

The speech analysis generation apparatus according to claim 1, 2, or 3, wherein
the analysis means analyzes the first speech waveform using a fast Fourier transform, and
the speech waveform generation means generates the second speech waveform using an inverse fast Fourier transform.
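
The decomposition and resynthesis recited in this claim can be illustrated with a minimal sketch. It is an illustration only, not the claimed implementation: it uses a naive discrete Fourier transform from the Python standard library in place of a fast Fourier transform (both produce the same values), and the function names dft, idft, analyze, and synthesize are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform (an FFT yields the same values faster)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def analyze(frame):
    """Split one frame into a frequency amplitude component and a phase component."""
    spectrum = dft(frame)
    amplitude = [abs(c) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    return amplitude, phase


def synthesize(amplitude, phase):
    """Rebuild a real-valued frame from amplitude and phase components."""
    spectrum = [cmath.rect(a, p) for a, p in zip(amplitude, phase)]
    return [c.real for c in idft(spectrum)]


if __name__ == "__main__":
    # Hypothetical test frame: three cycles of a sine wave over 32 samples.
    frame = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
    amp, ph = analyze(frame)
    rebuilt = synthesize(amp, ph)
    print(max(abs(a - b) for a, b in zip(frame, rebuilt)))
```

When the amplitude and phase components are left unmodified, the rebuilt frame matches the input up to rounding error; the claimed apparatus modifies these components between the two transforms.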

The speech analysis generation apparatus according to claim 1, wherein
the filter means functions as a moving average filter.
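
A moving average over the frequency amplitude component smooths out the fine harmonic structure and leaves a slowly varying curve. The following is a minimal sketch under assumptions that the claim does not state: the window width of 5 bins, the centered window, and the shrinking of the window at the spectrum edges are all hypothetical choices, as is the function name moving_average.

```python
def moving_average(amplitude, width=5):
    """Centered moving average over frequency bins.

    The window width and the edge handling (the window shrinks near the
    ends of the spectrum) are illustrative assumptions.
    """
    half = width // 2
    n = len(amplitude)
    smoothed = []
    for k in range(n):
        lo = max(0, k - half)
        hi = min(n, k + half + 1)
        smoothed.append(sum(amplitude[lo:hi]) / (hi - lo))
    return smoothed


if __name__ == "__main__":
    # Hypothetical amplitude values alternating between harmonic peaks and gaps.
    print(moving_average([0.0, 3.0, 0.0, 3.0, 0.0], width=3))
```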

The speech analysis generation apparatus according to claim 1, wherein
the second speech waveform can be superimposed on the first speech waveform and output.

The speech analysis generation apparatus according to claim 3, wherein
the sound source waveform generating means generates a Rosenberg wave as the sound source waveform.
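
For reference, a Rosenberg wave at a given pitch can be sketched as below. The claim does not specify the pulse shape or its parameters, so this sketch assumes one commonly cited polynomial form of the Rosenberg glottal pulse, with an opening phase of 40% and a closing phase of 16% of the pitch period. Those ratios, the sample rate, and the function names rosenberg_pulse and rosenberg_wave are hypothetical.

```python
def rosenberg_pulse(phase, open_ratio=0.4, close_ratio=0.16):
    """Value of the pulse at a position `phase` in [0, 1) within one pitch period.

    open_ratio and close_ratio are assumed fractions of the period spent in
    the opening and closing phases; the rest of the period is closed (zero).
    """
    if phase < open_ratio:
        x = phase / open_ratio
        return 3 * x * x - 2 * x * x * x  # smooth rise from 0 to 1
    if phase < open_ratio + close_ratio:
        x = (phase - open_ratio) / close_ratio
        return 1 - x * x  # fall from 1 back to 0
    return 0.0


def rosenberg_wave(pitch_hz, sample_rate, num_samples):
    """Periodic pulse train at the indicated pitch."""
    wave = []
    for n in range(num_samples):
        phase = (n * pitch_hz / sample_rate) % 1.0
        wave.append(rosenberg_pulse(phase))
    return wave


if __name__ == "__main__":
    # Hypothetical settings: 100 Hz pitch at an 8 kHz sample rate, two periods.
    wave = rosenberg_wave(100.0, 8000.0, 160)
    print(max(wave), min(wave))
```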

A program to be executed by a speech analysis generation apparatus that analyzes a first speech waveform and generates a second speech waveform using the analysis result,
An analysis function for analyzing the first speech waveform and extracting a first frequency amplitude component and a phase component;
A filter function for filtering frequency amplitude components;
An instruction function for instructing a formant shift amount with respect to the first frequency amplitude component;
A shift function for shifting according to the shift amount instructed by the instruction function;
A calculation function for calculating a frequency amplitude residual obtained by dividing the first frequency amplitude component by a second frequency amplitude component obtained by filtering the first frequency amplitude component with the filter function;
A multiplication function for multiplying a third frequency amplitude component, obtained either by filtering with the filter function the first frequency amplitude component shifted by the shift function or by shifting the second frequency amplitude component with the shift function, by the frequency amplitude residual;
A speech waveform generation function for generating the second speech waveform using the phase component and the fourth frequency amplitude component obtained by the multiplication function;
A program for realizing each of the above functions.
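
The amplitude processing recited in this claim (filter, residual by division, shift, multiplication) can be sketched on a single frame as follows. This is a sketch under stated assumptions, not the claimed implementation: the moving average window, the shift expressed as a whole number of bins with the edge value held, the small eps guard against division by zero, and all function names are hypothetical.

```python
def moving_average(amplitude, width=5):
    """Assumed filter: centered moving average, window shrinking at the edges."""
    half = width // 2
    n = len(amplitude)
    return [
        sum(amplitude[max(0, k - half):min(n, k + half + 1)])
        / (min(n, k + half + 1) - max(0, k - half))
        for k in range(n)
    ]


def shift_bins(values, shift):
    """Assumed shift: move values by `shift` bins, holding the edge value."""
    n = len(values)
    return [values[min(max(k - shift, 0), n - 1)] for k in range(n)]


def formant_shift(amplitude, shift, width=5, shift_first=False, eps=1e-12):
    """Return the fourth frequency amplitude component for one frame."""
    # Second component: the filtered first component.
    second = moving_average(amplitude, width)
    # Residual: first component divided by second component (eps is an assumed guard).
    residual = [a / max(s, eps) for a, s in zip(amplitude, second)]
    # Third component: either filter the shifted first component,
    # or shift the second component.
    if shift_first:
        third = moving_average(shift_bins(amplitude, shift), width)
    else:
        third = shift_bins(second, shift)
    # Fourth component: third component multiplied by the residual.
    return [t * r for t, r in zip(third, residual)]


if __name__ == "__main__":
    # Hypothetical amplitude values with a peak near bin 2.
    amp = [1.0, 2.0, 4.0, 2.0, 1.0, 1.0, 1.0, 1.0]
    print(formant_shift(amp, shift=2, width=3))
```

With a shift of zero and shift_first left False, the third component equals the second, so the output reproduces the input amplitude up to rounding error.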

A program to be executed by a speech analysis generation apparatus that analyzes a first speech waveform and generates a second speech waveform using the analysis result,
An analysis function for analyzing the first speech waveform and extracting a first frequency amplitude component and a first phase component;
A filter function for filtering frequency amplitude components;
A first shift function for performing a formant shift on the frequency amplitude component;
A residual calculation function for calculating a frequency amplitude residual obtained by dividing the first frequency amplitude component by a second frequency amplitude component obtained by filtering the first frequency amplitude component with the filter function;
An instantaneous frequency calculation function for calculating an instantaneous frequency from the first phase component;
A pitch instruction function for instructing a pitch shift amount;
A second shift function for shifting the instantaneous frequency and the frequency amplitude residual according to the shift amount instructed by the pitch instruction function;
An amplitude component calculation function for calculating a fourth frequency amplitude component by multiplying a third frequency amplitude component, obtained either by filtering with the filter function the first frequency amplitude component shifted by the first shift function or by shifting the second frequency amplitude component with the first shift function, by the frequency amplitude residual shifted by the second shift function;
A phase component calculation function for calculating a second phase component from the instantaneous frequency shifted by the second shift function;
A speech waveform generation function for generating the second speech waveform using the fourth frequency amplitude component and the second phase component;
A program for realizing each of the above functions.

A program to be executed by a speech analysis generation apparatus that analyzes a first speech waveform and generates a second speech waveform using the analysis result,
An analysis function for analyzing the first speech waveform and extracting a first frequency amplitude component and a phase component;
A filter function for filtering frequency amplitude components;
A shift function for shifting the formant with respect to the frequency amplitude component;
A pitch instruction function for instructing the pitch;
A sound source waveform generation function for generating a sound source waveform that simulates a vocal cord sound source at a pitch specified by the pitch instruction function;
Another analysis function for analyzing the sound source waveform and extracting frequency amplitude components;
An amplitude component calculation function for calculating a third frequency amplitude component by multiplying the frequency amplitude component extracted from the sound source waveform by the other analysis function by a second frequency amplitude component, the second frequency amplitude component being obtained either by filtering with the filter function the first frequency amplitude component shifted by the shift function or by shifting with the shift function the first frequency amplitude component filtered by the filter function;
A speech waveform generation function for generating the second speech waveform using the third frequency amplitude component and the phase component;
A program for realizing each of the above functions.