JP2008262140A

JP2008262140A - Musical pitch conversion device and musical pitch conversion method

Info

Publication number: JP2008262140A
Application number: JP2007127383A
Authority: JP
Inventors: Saburo Tsuchiya; 三郎土谷; Kenichi Yamazaki; 健一山崎; Tatsuo Yatagai; 達雄谷田貝
Original assignee: AREX KK
Current assignee: AREX KK
Priority date: 2007-04-11
Filing date: 2007-04-11
Publication date: 2008-10-30

Abstract

PROBLEM TO BE SOLVED: To keep voice quality at approximately the same level as before conversion, and to suppress the noise generated during conversion, even when the musical pitch of a voice is converted.

SOLUTION: A band-pass filter output switching section repeatedly selects only one of the outputs of BPF 15 to BPF 21 on the basis of the output of an output amplitude comparison section. A pitch period detecting section obtains the period of the voice signal on the basis of the output from the band-pass filter output switching section. A storage section for temporarily storing period information stores the period information detected by the pitch period detecting section. A musical pitch control section performs musical pitch conversion processing on the voice data stored in a storage section for temporarily storing voice data, using the pitch period of the voice signal as the unit and the information in the storage section for temporarily storing period information, outputs the result to the storage section for temporarily storing voice data, and updates the stored content. The resulting converted data is output from the storage section for temporarily storing voice data via a voice data output section.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a pitch conversion device and a pitch conversion method that convert the pitch of an input audio signal into a desired pitch and output the result.

Conventionally, a device has been proposed that converts the pitch (pitch period) by making the ratio between the sampling frequency of the input audio signal and the sampling frequency of the output signal equal to the pitch conversion ratio (see, for example, Patent Document 1). A device has also been proposed that extracts the period of the audio signal, cuts out the waveform of each period, and then converts the pitch (pitch period) by joining the per-period waveforms at time intervals that follow the pitch conversion ratio (see, for example, Patent Document 2).

Patent Document 1: Japanese Patent Publication No. 7-62800
Patent Document 2: JP 2001-265400 A

The pitch conversion in the prior art represented by Patent Document 1 is equivalent in principle to changing the tape running speed when playing back an audio signal recorded on magnetic tape. This technique changes the pitch (fundamental frequency) of the voice easily, but it also shifts the frequencies of the formants (the peaks caused by vocal tract resonance). Speaker-specific characteristics are therefore lost, for example the voice quality changes, and the converted voice sounds unnatural.
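The tape-speed analogy can be illustrated with a minimal sketch. It assumes a plain linear-interpolation resampler and a test tone; the function names, the 8000 Hz sample rate, and the ratio 1.5 are illustrative choices, not taken from Patent Document 1. Reading the samples 1.5 times faster multiplies every frequency in the signal by 1.5, which is why formants move together with the fundamental.

```python
import math

def resample_linear(samples, ratio):
    """Read `samples` at `ratio` times the original speed (linear interpolation)."""
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append((1.0 - frac) * samples[i] + frac * samples[i + 1])
        pos += ratio
    return out

def estimate_frequency(samples, sample_rate):
    """Rough frequency estimate: rising zero crossings per second."""
    rising = sum(1 for a, b in zip(samples, samples[1:]) if a < 0.0 <= b)
    return rising * sample_rate / len(samples)

# Hypothetical test signal: a 200 Hz tone, one second long.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE + 0.3)
        for n in range(SAMPLE_RATE)]
faster = resample_linear(tone, 1.5)

print(estimate_frequency(tone, SAMPLE_RATE))    # close to 200 Hz
print(estimate_frequency(faster, SAMPLE_RATE))  # close to 300 Hz
```

Because the scaling applies to the whole spectrum, a component at a formant frequency would be moved by the same factor as the fundamental.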

The pitch conversion in the prior art represented by Patent Document 2 was devised to eliminate the unnaturalness caused by the pitch conversion described above. Because the waveforms are joined with the time interval changed for each period of the voice, the formant frequencies stay constant even when the pitch changes, so voice quality is less likely to degrade during pitch conversion. On the other hand, when the waveforms are joined at a wider time interval, silent sections are inserted between the periodic waveforms, so each single-period waveform becomes isolated and sound quality may degrade, for example through newly generated harmonics. In addition, a window function is used to extract one period of the voice, but the sound quality degradation this causes is not taken into account.

Accordingly, an object of the present invention is to provide a pitch conversion device that, even when the pitch of a human voice is converted, keeps the voice quality after conversion substantially the same as before conversion and suppresses the noise generated during conversion.

A pitch conversion device according to a first aspect of the present invention comprises: pitch change rate setting means for changing the pitch of an input audio signal to a desired pitch; harmonic component suppression means that provides a plurality of band-pass filters whose pass bands are set so as to cover the frequency range of the human voice and to divide that range, and that uses these band-pass filters to obtain a fundamental frequency signal in which harmonic components are suppressed; pitch period detection means for detecting the period of the actual waveform of the audio signal on the basis of the waveform of the fundamental frequency signal; and pitch conversion means for converting the pitch, so as to reach the pitch set by the pitch change rate setting means, by processing the actual waveform of the audio signal at each pitch period interval obtained by the pitch period detection means. The harmonic component suppression means includes output amplitude comparison means that compares the output value of each band-pass filter with a threshold that is updated with reference to those output values, and thereby generates an output for selecting a filter whose amplitude is at or above a certain level, with priority given to lower frequency bands. The pitch conversion means performs waveform joining at each pitch period interval obtained by the pitch period detection means.

In a preferred embodiment according to the first aspect of the present invention, the pitch period detection means sequentially detects the peak positions of the waveform of the fundamental frequency signal, or positions near them, and detects the interval between consecutively detected peak positions (or between the positions near them).

In another embodiment, the pitch period detection means sequentially detects the zero-cross positions of the waveform of the fundamental frequency signal, or positions near them, and detects pairs of consecutively detected zero-cross positions (or positions near them).

In another embodiment, the pitch conversion means converts the pitch, so that playback of the audio signal is controlled to the pitch set by the pitch change rate setting means, by processing the actual waveform of the audio signal at the interval of the peak positions detected by the pitch period detection means. The processing position is the point on the time axis of the actual waveform that coincides with each peak position or with each midpoint between peak positions.

In another embodiment, filter output switching means is provided downstream of the output amplitude comparison means. Based on the output of the output amplitude comparison means, it selects and outputs only one of the outputs of the band-pass filters.

A pitch conversion method according to a second aspect of the present invention comprises: a step of setting a change of the pitch of an input audio signal to a desired pitch; a step of providing a plurality of band-pass filters whose pass bands are set so as to cover the frequency range of the human voice and to divide that range, and using these band-pass filters to obtain a fundamental frequency signal in which harmonic components are suppressed; a step of detecting the period (pitch period) of the actual waveform of the audio signal on the basis of the waveform of the fundamental frequency signal; and a step of converting the pitch, so as to reach the pitch set in the setting step, by processing the actual waveform of the audio signal at each pitch period obtained in the pitch period detection step. The harmonic component suppression step includes a step of comparing the output value of each band-pass filter with a threshold that is updated with reference to those output values, thereby generating an output for selecting a filter whose amplitude is at or above a certain level, with priority given to lower frequency bands. The pitch conversion step performs waveform joining at each pitch period interval obtained in the pitch period detection step.

According to the present invention, when the pitch conversion device changes the pitch of the input voice, it avoids performing waveform processing anywhere other than at the identified period intervals. Speaker-specific features such as formants are therefore less likely to be lost, and high-quality pitch conversion becomes possible.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a functional block diagram showing the internal configuration of a pitch conversion device according to an embodiment of the present invention.

As shown in FIG. 1, the pitch conversion device includes a voice data input unit 1, a pitch change rate setting unit 3, a voice data temporary storage unit 5, a period information temporary storage unit 7, a voice feature detection unit 9, a pitch control unit 11, and a voice data output unit 13.

The voice feature detection unit 9 includes band-pass filters (hereinafter "BPF") BPF 15, BPF 17, BPF 19, and BPF 21, an output amplitude comparison unit 23, a band-pass filter output switching unit 25, and a pitch period detection unit 27.

Among these units, the voice data input unit 1 receives the PCM (Pulse Code Modulation) data whose pitch is to be converted. On receiving the PCM data, the voice data input unit 1 outputs it to the voice data temporary storage unit 5 and to BPF 15, BPF 17, BPF 19, and BPF 21.

The voice data temporary storage unit 5 is provided so that the voice data can undergo pitch conversion. It temporarily stores the input voice data and, under the control of the pitch control unit 11, converts the stored voice data into voice data suited to the pitch conversion process and outputs it to the voice data output unit 13.

The pass bands of BPF 15, BPF 17, BPF 19, and BPF 21 are set so as to cover the frequency range of the human voice and to divide that range among them.

The output amplitude comparison unit 23 compares the outputs of BPF 15, BPF 17, BPF 19, and BPF 21 section by section and generates an output for selecting a filter whose amplitude is at or above a certain level, with priority given to lower frequency bands.

The band-pass filter output switching unit 25 is provided downstream of the output amplitude comparison unit 23. Based on the output of the output amplitude comparison unit 23, it selects and outputs only one of the outputs of BPF 15, BPF 17, BPF 19, and BPF 21.

The pitch period detection unit 27 obtains the period of the audio signal from the output of the band-pass filter output switching unit 25 and outputs the result.

The period information temporary storage unit 7 stores the period information detected by the pitch period detection unit 27.

In accordance with the change rate set in advance in the pitch change rate setting unit 3, the pitch control unit 11 performs pitch conversion on the voice data stored in the voice data temporary storage unit 5, in units of the period of the audio signal, using the information in the period information temporary storage unit 7. It outputs the result to the voice data temporary storage unit 5 and updates the stored content. It then controls the output of the resulting pitch-converted data from the voice data temporary storage unit 5 via the voice data output unit 13.

The voice feature detection unit 9 in the configuration described above is now explained in detail. To detect accurately the period determined by the pitch of the human voice (the pitch period), the voice feature detection unit 9 is provided with a bank of four band filters: BPF 15, BPF 17, BPF 19, and BPF 21. The pass bands of the filters are chosen so that together they cover the frequency range of the voice to be pitch-converted and divide that range among them. Here, for example, as shown by the gain A versus frequency f [Hz] characteristic in FIG. 2, with the gain A in each pass band equal to 1.0, the pass band is 100 to 200 Hz for BPF 15, 200 to 300 Hz for BPF 17, 300 to 400 Hz for BPF 19, and 400 to 500 Hz for BPF 21.
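The four-band split can be sketched as follows. The source gives only the pass bands and the unity pass-band gain, not the filter design, so this sketch assumes second-order band-pass biquads (the widely used Audio EQ Cookbook form with 0 dB peak gain) centered on the geometric mean of each band. The function names and the 8000 Hz sample rate are hypothetical, and a real implementation would likely need steeper filters.

```python
import math

# Pass bands stated in the description (Hz): BPF 15, BPF 17, BPF 19, BPF 21.
BANDS_HZ = [(100, 200), (200, 300), (300, 400), (400, 500)]

def bandpass_coefficients(low_hz, high_hz, sample_rate):
    """Second-order band-pass with unity gain at the band center (assumed design)."""
    center = math.sqrt(low_hz * high_hz)
    q = center / (high_hz - low_hz)
    w0 = 2.0 * math.pi * center / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a

def apply_biquad(samples, b, a):
    """Direct-form I filtering of a list of samples."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def filter_bank(samples, sample_rate):
    """Return one filtered copy of `samples` per band, lowest band first."""
    return [apply_biquad(samples, *bandpass_coefficients(lo, hi, sample_rate))
            for lo, hi in BANDS_HZ]

# Usage: a 150 Hz tone should come through the 100-200 Hz band most strongly.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 150 * n / SAMPLE_RATE) for n in range(4000)]
levels = [sum(abs(v) for v in band) / len(band)
          for band in filter_bank(tone, SAMPLE_RATE)]
print(levels.index(max(levels)))  # index of the strongest band
```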

The outputs of BPF 15, BPF 17, BPF 19, and BPF 21, which are related as shown in FIG. 2, are input to the output amplitude comparison unit 23. The output amplitude comparison unit 23 compares these outputs section by section. Specifically, it compares the average output value of each BPF over each fixed section with a threshold that is updated at a period corresponding to the fundamental frequency obtained by the pitch period detection means, and thereby generates an output for selecting a filter whose amplitude is at or above a certain level, with priority given to lower frequency bands.
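One way to read this selection rule is sketched below. The source does not say how the threshold is derived, so the sketch assumes it is a fixed fraction of the strongest band's average level in the current section; `relative_threshold` and both function names are hypothetical.

```python
def mean_abs(section):
    """Average absolute amplitude of one non-empty section of samples."""
    return sum(abs(v) for v in section) / len(section)

def select_band(band_sections, relative_threshold=0.5):
    """Pick the lowest band whose average level reaches the threshold.

    `band_sections` holds one section of samples per band, lowest band first.
    The threshold rule (a fraction of the strongest band) is an assumption.
    Returns the band index, or None when every band is silent.
    """
    levels = [mean_abs(section) for section in band_sections]
    threshold = relative_threshold * max(levels)
    for index, level in enumerate(levels):
        if level > 0.0 and level >= threshold:
            return index
    return None

# Usage: the fundamental sits in band 0, and a stronger harmonic sits in band 2.
# Band 0 still wins because it is loud enough and lower in frequency.
sections = [[0.6, -0.6], [0.1, -0.1], [1.0, -1.0], [0.0, 0.0]]
print(select_band(sections))  # prints 0
```

Scanning from the lowest band upward is what gives lower frequency bands priority: a strong harmonic in a higher band cannot displace a sufficiently strong fundamental.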

Based on the output of the output amplitude comparison unit 23, the band-pass filter output switching unit 25 repeatedly selects exactly one of the outputs of BPF 15, BPF 17, BPF 19, and BPF 21. Consequently, when the audio signal from the voice data input unit 1 enters the BPFs with an original waveform such as that shown in FIG. 3, the output of the band-pass filter output switching unit 25 is, as shown in FIG. 4, a waveform that contains the fundamental frequency component of the human voice with its harmonic components suppressed, that is, an almost pure sine wave.

At the same time, the pitch period detection unit 27 uses the output of the band-pass filter output switching unit 25 to find the peak positions of the waveform, and then derives the pitch period A between two adjacent peak positions from the distance (time) between them. In FIGS. 3 and 4, the positions marked (1), (2), (3), (4), (5), and (6) are peak positions of the waveform. The output of the pitch period detection unit 27 is input sequentially to the period information temporary storage unit 7, where it is stored and updated as the period information used by the pitch control unit 11.
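A minimal sketch of this peak-based period detection, assuming the selected filter output is close to a sine wave so that a simple local-maximum test is sufficient. The function names and the test tone are illustrative only.

```python
import math

def find_peaks(samples):
    """Indices of positive local maxima in a near-sinusoidal signal."""
    return [i for i in range(1, len(samples) - 1)
            if samples[i] > 0.0 and samples[i - 1] < samples[i] >= samples[i + 1]]

def pitch_periods(peak_positions, sample_rate):
    """Pitch period A (seconds) between each pair of adjacent peaks."""
    return [(later - earlier) / sample_rate
            for earlier, later in zip(peak_positions, peak_positions[1:])]

# Usage with a hypothetical 200 Hz fundamental sampled at 8000 Hz.
SAMPLE_RATE = 8000
fundamental = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE + 0.3)
               for n in range(400)]
peaks = find_peaks(fundamental)
periods = pitch_periods(peaks, SAMPLE_RATE)
print(periods[0])  # about 0.005 s, one cycle of 200 Hz
```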

To reach the pitch set by the pitch change rate setting unit 3, the pitch control unit 11 works at the interval between two peak positions that the pitch period detection unit 27 detected in the output of the band-pass filter output switching unit 25 and that are stored in the period information temporary storage unit 7. For each peak interval it sets a processing reference position (the position marked X in FIG. 3, hereinafter "processing reference position X") and performs waveform processing at the coinciding point on the time axis of the actual waveform of the audio signal, thereby converting the pitch. FIG. 5 shows an example of lowering the pitch, and FIG. 6 shows an example of raising it.
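The relation between the set change rate and the new period can be sketched as follows. The source does not state how the change rate is expressed, so this sketch assumes it is the ratio of the output fundamental frequency to the input fundamental frequency; the semitone helper and both function names are likewise illustrative additions.

```python
def semitones_to_ratio(semitones):
    """Frequency ratio for a shift in equal-tempered semitones (assumed unit)."""
    return 2.0 ** (semitones / 12.0)

def target_period_samples(period_samples, pitch_ratio):
    """New pitch period in samples: a higher pitch means a shorter period."""
    return max(1, round(period_samples / pitch_ratio))

# Usage: a 40-sample period (200 Hz at 8000 Hz sampling).
print(target_period_samples(40, 2.0))   # one octave up: 20 samples
print(target_period_samples(40, 0.5))   # one octave down: 80 samples
```

The difference between the detected period and the target period is the amount by which each period interval has to be widened or narrowed around the processing reference position X.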

In the pitch-lowering example of FIG. 5, a synthesis section is defined between two peak positions selected as appropriate for the setting made in the pitch change rate setting unit 3, and the waveform is shifted so as to spread apart on the time axis around the processing reference position X and then synthesized. To join the waveforms smoothly, a known crossfade process is applied over a fixed section (the synthesis section) centered on the processing reference position X. The other period intervals are crossfaded in the same way according to the setting made in the pitch change rate setting unit 3.

In the pitch-raising example of FIG. 6, a synthesis section is defined between two peak positions selected as appropriate for the setting made in the pitch change rate setting unit 3, and the waveform is shifted so as to close up on the time axis around the processing reference position X and then synthesized. To join the waveforms smoothly, a known crossfade process is applied over a fixed section (the synthesis section) centered on the processing reference position X. The other period intervals are crossfaded in the same way according to the setting made in the pitch change rate setting unit 3.

FIG. 7 is a detailed explanatory diagram that schematically shows an example of lowering the pitch. First, according to the setting made in the pitch change rate setting unit 3, the audio for one pitch period A between the two peak positions (3) and (4) is taken in as shown in FIG. 7(a), and a processing range is set with equal time widths before and after the processing reference position X. Next, within this processing range, the amplitude of the audio after the processing reference position X is reduced from its original value to 0 as shown in FIG. 7(b), and the amplitude of the audio before the processing reference position X is increased from 0 to its original value as shown in FIG. 7(c). As a result, the audio of FIG. 7(b) and the audio of FIG. 7(c) both contain an overlapping time width centered on the processing reference position X. Then, as shown in FIG. 7(d), a known crossfade process joins the audio of FIG. 7(b) and the audio of FIG. 7(c) over this overlapping time width, which is the fixed section (synthesis section) centered on the processing reference position X, so that the period interval of the audio widens from A to A'. Applying the same crossfade process to the other period intervals according to the setting made in the pitch change rate setting unit 3 lowers the pitch as desired.
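The widening step of FIG. 7 can be sketched as below. This is one interpretation under stated assumptions: the crossfade is linear, the overlap length equals the number of added samples (`extra`), and X defaults to the midpoint of the period. The function name and parameters are hypothetical and are not taken from the source.

```python
def stretch_period(period, extra, x=None):
    """Widen one pitch period by `extra` samples around reference position x.

    The earlier piece keeps the samples up to x and fades out after x
    (as in FIG. 7(b)); the later piece starts `extra` samples before x,
    fades in up to x (as in FIG. 7(c)), and is shifted later by `extra`.
    """
    n = len(period)
    if x is None:
        x = n // 2
    if not (0 < extra <= x and x + extra <= n):
        raise ValueError("extra must fit on both sides of x")
    out = list(period[:x])                    # untouched part before X
    for t in range(extra):                    # overlapping synthesis section
        gain_in = (t + 0.5) / extra           # linear ramp from 0 toward 1
        fading_out = period[x + t]            # audio after X, ramping down
        fading_in = period[x - extra + t]     # audio before X, ramping up
        out.append((1.0 - gain_in) * fading_out + gain_in * fading_in)
    out.extend(period[x:])                    # remainder, shifted by `extra`
    return out

# Usage: a 40-sample period A becomes a 50-sample period A'.
longer = stretch_period([float(v) for v in range(40)], 10)
print(len(longer))  # prints 50
```

Note that in this sketch every widened period also lengthens the signal as a whole. The source does not describe how, or whether, the overall duration is restored.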

FIG. 8 is a detailed explanatory diagram that schematically shows an example of raising the pitch. First, according to the setting made in the pitch change rate setting unit 3, the audio for one pitch period A between the two peak positions (3) and (4) is taken in as shown in FIG. 8(a), and a blank time width is set with equal parts before and after the processing reference position X, which gives two processing ranges separated on either side of it. Next, for the processing range before the processing reference position X, the amplitude of the audio is reduced from its original value to 0 as shown in FIG. 8(b), and for the processing range after the processing reference position X, the amplitude is increased from 0 to its original value as shown in FIG. 8(c). As a result, the audio of FIG. 8(b) and the audio of FIG. 8(c) contain a blank time width centered on the processing reference position X. Then, as shown in FIG. 8(d), a known crossfade process joins the audio of FIG. 8(b) and the audio of FIG. 8(c) over this blank time width, which is the fixed section (synthesis section) centered on the processing reference position X, so that the period interval of the audio narrows from A to A''. Applying the same crossfade process to the other period intervals according to the setting made in the pitch change rate setting unit 3 raises the pitch as desired.
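The narrowing step of FIG. 8 can be sketched in the same spirit. As an assumption, the sketch uses a linear crossfade whose length equals the number of removed samples (`cut`), with X defaulting to the midpoint of the period; the function name and parameters are hypothetical.

```python
def shrink_period(period, cut, x=None):
    """Narrow one pitch period by `cut` samples around reference position x.

    The range just before x fades out (as in FIG. 8(b)), the range just
    after x fades in (as in FIG. 8(c)), and the two are overlapped so that
    the period becomes `cut` samples shorter.
    """
    n = len(period)
    if x is None:
        x = n // 2
    if not (0 < cut <= x and x + cut <= n):
        raise ValueError("cut must fit on both sides of x")
    out = list(period[:x - cut])              # untouched part before the fade
    for t in range(cut):                      # synthesis section
        gain_in = (t + 0.5) / cut             # linear ramp from 0 toward 1
        fading_out = period[x - cut + t]      # audio before X, ramping down
        fading_in = period[x + t]             # audio after X, ramping up
        out.append((1.0 - gain_in) * fading_out + gain_in * fading_in)
    out.extend(period[x + cut:])              # remainder, shifted earlier
    return out

# Usage: a 40-sample period A becomes a 30-sample period A''.
shorter = shrink_period([float(v) for v in range(40)], 10)
print(len(shorter))  # prints 30
```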

As described above, in one embodiment of the present invention, the band-pass filter output switching unit 25 repeatedly selects exactly one of the outputs of BPF 15 to BPF 21 based on the output of the output amplitude comparison unit 23. The pitch period detection unit 27 obtains the period (pitch period) of the audio signal from the output of the band-pass filter output switching unit 25. The period information temporary storage unit 7 stores the period information detected by the pitch period detection unit 27. To control playback of the audio signal to the pitch set by the pitch change rate setting unit 3, the pitch control unit 11 performs pitch conversion with reference to the period information that the pitch period detection unit 27 detected in the output of the band-pass filter output switching unit 25 and that is stored in the period information temporary storage unit 7. It outputs the result to the voice data temporary storage unit 5 and updates the stored content. It then controls the output of the resulting pitch-converted data from the voice data temporary storage unit 5 via the voice data output unit 13.

Thus, according to one embodiment of the present invention, pitch conversion is carried out with the period information of the audio signal obtained at high accuracy as described above, so the noise generated during conversion can be suppressed effectively. Moreover, whether the pitch is lowered or raised, no waveform processing is performed anywhere other than at the identified period intervals. The characteristics of the voice are therefore less likely to be lost, and high-quality pitch conversion becomes possible.

A preferred embodiment of the present invention has been described above, but it is an example given to explain the invention and is not intended to limit the scope of the invention to this embodiment alone. The present invention can be implemented in various other forms. For example, in the description given with reference to FIGS. 3 to 8, the processing reference position X is the center between two adjacent peak positions, but it may be set at any position between those two adjacent peak positions.

The processing reference position X is also not limited to one point between two adjacent peak positions: two or more points may be set.

Naturally, other variations are also possible, such as treating a position near a peak position as the peak position when setting the peak period, or using a zero-cross position (or a position near it) in the output of the band-pass filter output switching unit 25, instead of the peak position, as the reference position that determines the peak period.
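The zero-cross variant can be sketched as follows, assuming rising zero crossings of the near-sinusoidal filter output are used as reference positions and refined by linear interpolation between samples. The interpolation and the function names are illustrative choices that the source does not specify.

```python
import math

def rising_zero_crossings(samples):
    """Fractional sample positions where the signal crosses zero going upward."""
    crossings = []
    for i in range(len(samples) - 1):
        a, b = samples[i], samples[i + 1]
        if a < 0.0 <= b:
            crossings.append(i + a / (a - b))  # linear interpolation
    return crossings

def periods_from_crossings(crossings, sample_rate):
    """Period (seconds) between each pair of adjacent reference positions."""
    return [(later - earlier) / sample_rate
            for earlier, later in zip(crossings, crossings[1:])]

# Usage with a hypothetical 200 Hz fundamental sampled at 8000 Hz.
SAMPLE_RATE = 8000
fundamental = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE + 0.3)
               for n in range(400)]
zero_cross_periods = periods_from_crossings(
    rising_zero_crossings(fundamental), SAMPLE_RATE)
print(zero_cross_periods[0])  # about 0.005 s
```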

FIG. 1: A functional block diagram showing the internal configuration of a pitch conversion device according to an embodiment of the present invention.
FIG. 2: A diagram showing the gain versus frequency characteristic used to explain the arrangement of a plurality of band-pass filters.
FIG. 3: A signal waveform diagram showing an example of the output waveform from the voice data input unit shown in FIG. 1.
FIG. 4: A signal waveform diagram relating to the pitch conversion process applied to the input voice data in the pitch control unit shown in FIG. 1.
FIG. 5: A signal waveform diagram relating to the pitch conversion process applied to the input voice data in the pitch control unit shown in FIG. 1.
FIG. 6: A signal waveform diagram relating to the pitch conversion process applied to the input voice data in the pitch control unit shown in FIG. 1.
FIG. 7: A schematic diagram relating to the pitch conversion process applied to the input voice data in the pitch control unit shown in FIG. 1.
FIG. 8: A schematic diagram relating to the pitch conversion process applied to the input voice data in the pitch control unit shown in FIG. 1.

Explanation of symbols

1 Voice data input unit
3 Pitch change rate setting unit
5 Voice data temporary storage unit
7 Period information temporary storage unit
9 Voice feature detection unit
11 Pitch control unit
13 Voice data output unit
15 Bandpass filter (BPF)
17 Bandpass filter (BPF)
19 Bandpass filter (BPF)
21 Bandpass filter (BPF)
23 Output amplitude comparison unit
25 Bandpass filter output switching unit
27 Pitch period detection unit

Claims

A pitch conversion device comprising:
pitch change rate setting means for changing the pitch of an input audio signal to a desired pitch;
harmonic component suppression means that provides a plurality of bandpass filters whose pass frequency bands are set so as to cover, and divide, the frequency range of the human voice, and that uses the output value of each bandpass filter to obtain a fundamental frequency signal of the voice in which harmonic components are suppressed;
pitch period detecting means for detecting the period of the audio signal (pitch period) based on the waveform of the fundamental frequency signal; and
pitch conversion means for converting the pitch by processing the actual waveform of the audio signal at a period corresponding to the pitch period obtained by the pitch period detecting means, so as to control the audio signal to the pitch set by the pitch change rate setting means,
wherein the harmonic component suppression means includes output amplitude comparison means that compares the output value of each bandpass filter with a threshold value that is set and changed with reference to the output values of the bandpass filters, and generates an output for selecting a filter whose amplitude is at least a certain amount, such that lower frequency bands are favored, and
wherein the pitch conversion means performs waveform connection processing at intervals corresponding to the pitch period obtained by the pitch period detecting means.
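As a rough illustration of the output amplitude comparison described in the claim above, the following sketch picks the lowest band whose amplitude clears a threshold derived from the band outputs themselves. The claim does not state how the threshold is computed; the rule used here (a fixed fraction of the largest band amplitude), the name `select_band`, and the parameter `ratio` are assumptions made only for this example.

```python
def select_band(band_amplitudes, ratio=0.5):
    """Pick the lowest-frequency band whose amplitude clears the threshold.

    band_amplitudes: amplitudes ordered from the lowest to the highest pass band.
    ratio: hypothetical fraction of the largest amplitude used as the threshold.
    Returns the index of the selected band, or None if no band qualifies.
    """
    largest = max(band_amplitudes)
    if largest <= 0.0:
        return None  # silence: nothing to select
    threshold = ratio * largest  # threshold follows the band outputs
    for index, amplitude in enumerate(band_amplitudes):
        if amplitude >= threshold:
            return index  # first hit is the lowest qualifying band
    return None

# Hypothetical frame: the fundamental sits in band 1 and a stronger
# harmonic sits in band 2. Scanning from the low side selects band 1.
selected = select_band([0.05, 0.6, 1.0, 0.3])
print(selected)
```

Scanning from the lowest band upward is what favors the fundamental over a louder harmonic; selecting the band with the largest amplitude instead would lock onto the harmonic in this example.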

The pitch conversion device according to claim 1, wherein the pitch period detecting means sequentially detects peak positions of the waveform of the fundamental frequency signal, or positions near those peak positions, and detects the interval between consecutively detected peak positions or nearby positions.

The pitch conversion device according to claim 1, wherein the pitch period detecting means sequentially detects zero-cross positions of the waveform of the fundamental frequency signal, or positions near those zero-cross positions, and detects the interval between consecutively detected zero-cross positions or nearby positions.

The pitch conversion device according to claim 2, wherein the pitch conversion means converts the pitch, so as to control reproduction of the audio signal to the pitch set by the pitch change rate setting means, by processing the actual waveform of the audio signal at the interval of the peak positions detected by the pitch period detecting means, using as the processing position the point in the actual waveform that coincides on the time axis with each peak position or with each midpoint between peak positions.

The pitch conversion device according to claim 1, wherein the waveform connection processing is either a process that, for the voice waveform of each pitch period, deletes a time width corresponding to the pitch set by the pitch change rate setting means at a processing reference position defined at one or more points within the period and joins the cut ends together, thereby shortening the pitch period and raising the pitch, or a process that cuts out the voice waveform of each pitch period so that time widths corresponding to the pitch set by the pitch change rate setting means overlap at a processing reference position defined at one or more points within the period and joins the cut ends together, thereby lengthening the pitch period and lowering the pitch.
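The two branches of the waveform connection processing in the claim above can be sketched on a single pitch period as follows. This is a simplified illustration under stated assumptions: the name `reshape_period`, the use of exactly one processing reference position at the midpoint of the period, the restriction to rates of 0.5 or more, and the absence of any smoothing at the joint are all choices made for the example and are not taken from this publication.

```python
def reshape_period(period, rate):
    """Resize one pitch period by cutting or repeating samples around its midpoint.

    period: list of samples covering exactly one pitch period.
    rate: hypothetical pitch ratio; > 1 raises the pitch, < 1 lowers it.
    """
    if rate < 0.5:
        raise ValueError("this sketch only supports rate >= 0.5")
    n = len(period)
    target = max(1, round(n / rate))  # desired period length in samples
    delta = target - n
    if delta <= 0:
        # Raise pitch: delete a span centered on the period and join the ends.
        cut = -delta
        start = (n - cut) // 2
        return period[:start] + period[start + cut:]
    # Lower pitch: cut two pieces that overlap by `delta` samples and join
    # them end to end, so the overlapped span appears twice in the output.
    a = (n - delta) // 2
    b = a + delta
    return period[:b] + period[a:]

# Hypothetical 8-sample period, using sample indices as values so the
# effect of each branch is visible.
raised = reshape_period([0, 1, 2, 3, 4, 5, 6, 7], 2.0)
lowered = reshape_period([0, 1, 2, 3], 0.5)
print(raised, lowered)
```

Applying `reshape_period` to each detected period in turn and concatenating the results changes the period length, and therefore the pitch, while leaving the waveform shape near the period boundaries untouched.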

The pitch conversion device according to claim 1, further comprising, downstream of the output amplitude comparison means, filter output switching means for selecting and outputting only one of the outputs from the bandpass filters based on the output of the output amplitude comparison means.

A pitch conversion method comprising:
a pitch setting step of setting a desired pitch to which the pitch of an input audio signal is to be changed;
a harmonic component suppression step of providing a plurality of bandpass filters whose pass frequency bands are set so as to cover, and divide, the frequency range of the human voice, and using these bandpass filters to obtain a fundamental frequency signal in which harmonic components are suppressed;
a pitch period detection step of detecting the period of the audio signal (pitch period) based on the waveform of the fundamental frequency signal; and
a pitch conversion step of converting the pitch by processing the actual waveform of the audio signal at a period corresponding to the pitch period obtained in the pitch period detection step, so as to control the audio signal to the pitch set in the pitch setting step,
wherein the harmonic component suppression step includes a step of comparing the output value of each bandpass filter with a threshold value that is changed with reference to the output values of the bandpass filters, and generating an output for selecting a filter whose amplitude is at least a predetermined amount, such that lower frequency bands are favored, and
wherein, in the pitch conversion step, waveform connection processing is performed at intervals corresponding to the pitch period obtained in the pitch period detection step.