JP2008262140A - Musical pitch conversion device and musical pitch conversion method - Google Patents

Publication number
JP2008262140A
Authority
JP
Japan
Prior art keywords
pitch
period
output
waveform
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2007127383A
Other languages
Japanese (ja)
Inventor
Saburo Tsuchiya
三郎 土谷
Kenichi Yamazaki
健一 山崎
Tatsuo Yatagai
達雄 谷田貝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AREX KK
Original Assignee
AREX KK
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AREX KK
Priority to JP2007127383A
Publication of JP2008262140A

Landscapes

  • Electrophonic Musical Instruments (AREA)

Abstract

PROBLEM TO BE SOLVED: To keep voice quality at approximately the same level as before conversion, and to suppress noise generated during conversion, even when the musical pitch of a voice is converted.

SOLUTION: A bandpass filter output switching unit repeatedly selects only one of the outputs of BPF 15 to BPF 21 on the basis of the output of an output amplitude comparison unit. A pitch period detection unit obtains the period of the voice signal from the output of the bandpass filter output switching unit. A period information temporary storage unit stores the period information detected by the pitch period detection unit. Using the information in the period information temporary storage unit, a pitch control unit performs pitch conversion processing on the voice data held in a voice data temporary storage unit, taking the pitch period of the voice signal as the unit of processing, writes the result back to the voice data temporary storage unit, and updates its contents. The pitch control unit then causes the conversion result to be output from the voice data temporary storage unit via a voice data output unit.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a pitch conversion device and a pitch conversion method that convert the pitch of an input voice signal to a desired pitch and output the result.

Conventionally, devices have been proposed that convert the pitch (pitch period) by making the ratio between the sampling frequency of the input voice signal and the sampling frequency of the output signal equal to the pitch conversion ratio (see, for example, Patent Document 1). Devices have also been proposed that extract the period of the voice signal, cut out the waveform one period at a time, and then convert the pitch (pitch period) by connecting the one-period waveforms at time intervals that follow the pitch conversion ratio (see, for example, Patent Document 2).

Patent Document 1: Japanese Examined Patent Publication No. 7-62800 (特公平7-62800号公報)
Patent Document 2: JP 2001-265400 A (特開2001-265400号公報)

The pitch conversion in the prior art represented by Patent Document 1 is in principle equivalent to changing the tape running speed when playing back a voice signal recorded on magnetic tape. This technique changes the pitch (fundamental frequency) of the voice easily, but the formant frequencies (spectral peaks caused by vocal tract resonance) change at the same time. Voice quality therefore changes, speaker-specific characteristics are lost, and the converted voice sounds unnatural.
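The tape-speed analogy can be illustrated with a minimal sketch. It is not taken from either patent document: the sampling rate, the test tone, and the helper names `resample` and `estimate_frequency` are assumptions chosen for illustration. Reading the samples 1.5 times faster multiplies every frequency in the signal by about 1.5, which is why formants move together with the fundamental.

```python
import math

def resample(samples, ratio):
    """Read `samples` at `ratio` times the original rate (linear interpolation)."""
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append((1.0 - frac) * samples[i] + frac * samples[i + 1])
        pos += ratio
    return out

def estimate_frequency(samples, fs):
    """Rough frequency estimate: rising zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0.0 <= b)
    return crossings * fs / len(samples)

fs = 8000  # hypothetical sampling rate in Hz
# One second of a 200 Hz tone; the phase offset avoids samples that are exactly zero.
tone = [math.sin(2.0 * math.pi * 200.0 * n / fs + 0.3) for n in range(fs)]
shifted = resample(tone, 1.5)
ratio_measured = estimate_frequency(shifted, fs) / estimate_frequency(tone, fs)
```

Here `ratio_measured` comes out close to 1.5. A real voice contains many components, and all of them are scaled by the same factor under this method.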

The pitch conversion in the prior art represented by Patent Document 2 was devised to eliminate this unnaturalness. Because the waveforms are joined with the time interval changed for each period of the voice, the formant frequencies stay constant even when the pitch changes, so voice quality is unlikely to degrade during pitch conversion. On the other hand, when the waveforms are joined at wider time intervals, silent intervals are inserted between the periodic waveforms. Each one-period waveform then becomes isolated and new harmonics arise, which may degrade sound quality. In addition, one period of the voice is extracted by waveform processing with a window function, but the sound quality degradation caused by that step is not considered.

Accordingly, an object of the present invention is to provide a pitch conversion device that keeps the voice quality after conversion substantially the same as before conversion, and suppresses noise generated during conversion, even when the pitch of a voice uttered by a person is converted.

A pitch conversion device according to a first aspect of the present invention comprises: pitch change rate setting means for setting and changing the pitch of an input voice signal to a desired pitch; harmonic component suppression means that provides a plurality of bandpass filters whose pass frequency bands are set so as to cover the frequency range of the human voice and divide that range, and that uses these bandpass filters to obtain a fundamental frequency signal in which harmonic components are suppressed; pitch period detecting means for detecting the period of the actual waveform of the voice signal based on the waveform of the fundamental frequency signal; and pitch conversion means that converts the pitch by waveform-processing the actual waveform of the voice signal at each interval of the pitch period obtained by the pitch period detecting means, so as to achieve the pitch set by the pitch change rate setting means. The harmonic component suppression means includes output amplitude comparison means that compares the output value of each bandpass filter with a threshold that is set and changed with reference to the output values of the bandpass filters, and thereby generates an output for selecting a filter whose amplitude is at least a certain amount, with priority given to lower frequency bands. The pitch conversion means performs waveform connection processing at each interval of the pitch period obtained by the pitch period detecting means.

In a preferred embodiment according to the first aspect of the present invention, the pitch period detecting means sequentially detects the peak positions of the waveform of the fundamental frequency signal, or positions near them, and detects the interval between consecutively detected peak positions or between consecutively detected near-peak positions.

In another embodiment, the pitch period detecting means sequentially detects the zero-cross positions of the waveform of the fundamental frequency signal, or positions near them, and detects the interval between consecutively detected zero-cross positions or between consecutively detected near-zero-cross positions.

In another embodiment, in order to control playback of the voice signal to the pitch set by the pitch change rate setting means, the pitch conversion means converts the pitch by waveform-processing the actual waveform of the voice signal at the interval of the peak positions detected by the pitch period detecting means, taking as the processing position, for each peak position or each midpoint between peak positions, the corresponding point on the time axis of the actual waveform of the voice signal.

In another embodiment, filter output switching means is provided downstream of the output amplitude comparison means. Based on the output of the output amplitude comparison means, it selects and outputs only one of the outputs of the bandpass filters.

A pitch conversion method according to a second aspect of the present invention comprises: a step of setting and changing the pitch of an input voice signal to a desired pitch; a step of providing a plurality of bandpass filters whose pass frequency bands are set so as to cover the frequency range of the human voice and divide that range, and using these bandpass filters to obtain a fundamental frequency signal in which harmonic components are suppressed; a step of detecting the period (pitch period) of the actual waveform of the voice signal based on the waveform of the fundamental frequency signal; and a step of converting the pitch by waveform-processing the actual waveform of the voice signal at each pitch period obtained in the pitch period detection step, so as to achieve the pitch set in the setting step. The harmonic component suppression step includes a step of comparing the output value of each bandpass filter with a threshold that is set and changed with reference to the output values of the bandpass filters, thereby generating an output for selecting a filter whose amplitude is at least a certain amount, with priority given to lower frequency bands. In the pitch conversion step, waveform connection processing is performed at each interval of the pitch period obtained in the pitch period detection step.

According to the present invention, when the pitch conversion device changes the pitch of the input voice, waveform processing is confined to the identified period intervals and is not applied anywhere else. Speaker-specific characteristics such as formants are therefore unlikely to be lost, and high-quality pitch conversion becomes possible.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a functional block diagram showing the internal configuration of a pitch conversion device according to one embodiment of the present invention.

As shown in FIG. 1, the pitch conversion device includes a voice data input unit 1, a pitch change rate setting unit 3, a voice data temporary storage unit 5, a period information temporary storage unit 7, a voice feature detection unit 9, a pitch control unit 11, and a voice data output unit 13.

The voice feature detection unit 9 includes bandpass filters (hereinafter "BPF") BPF 15, BPF 17, BPF 19, and BPF 21, an output amplitude comparison unit 23, a bandpass filter output switching unit 25, and a pitch period detection unit 27.

PCM (pulse code modulation) data to be pitch-converted is input to the voice data input unit 1. The voice data input unit 1 outputs the received PCM data to the voice data temporary storage unit 5 and to BPF 15, BPF 17, BPF 19, and BPF 21.

The voice data temporary storage unit 5 is provided for the pitch conversion processing of voice data. It temporarily stores the input voice data and, under the control of the pitch control unit 11, converts the stored voice data into voice data suited to the pitch conversion processing and outputs it to the voice data output unit 13.

The pass frequency bands of BPF 15, BPF 17, BPF 19, and BPF 21 are set so as to cover the frequency range of the human voice and to divide that range.

The output amplitude comparison unit 23 compares the outputs of BPF 15, BPF 17, BPF 19, and BPF 21 at fixed intervals and generates an output for selecting a filter whose amplitude is at least a certain amount, with priority given to lower frequency bands.

The bandpass filter output switching unit 25 is provided downstream of the output amplitude comparison unit 23. Based on the output of the output amplitude comparison unit 23, it selects and outputs only one of the outputs of BPF 15, BPF 17, BPF 19, and BPF 21.

The pitch period detection unit 27 obtains the period of the voice signal from the output of the bandpass filter output switching unit 25 and outputs the result.

The period information temporary storage unit 7 stores the period information detected by the pitch period detection unit 27.

In accordance with the change rate set in advance in the pitch change rate setting unit 3, the pitch control unit 11 performs pitch conversion processing on the voice data stored in the voice data temporary storage unit 5, using the information in the period information temporary storage unit 7 and taking the period of the voice signal as the unit of processing. It outputs the result to the voice data temporary storage unit 5 and updates the stored contents. It then controls output of the resulting pitch-converted data from the voice data temporary storage unit 5 via the voice data output unit 13.

The voice feature detection unit 9 in the configuration described above is now described in detail. To detect accurately the period determined by the pitch of the human voice (the pitch period), the voice feature detection unit 9 is provided with a bank of four band filters: BPF 15, BPF 17, BPF 19, and BPF 21. The passbands of the filters are chosen so as to cover the frequency range of the voice to be pitch-converted and to divide that range. Here, for example, as shown by the gain A versus frequency f [Hz] characteristics in FIG. 2, with a passband gain A of 1.0 for every filter, the passband of BPF 15 is set to 100 to 200 Hz, that of BPF 17 to 200 to 300 Hz, that of BPF 19 to 300 to 400 Hz, and that of BPF 21 to 400 to 500 Hz.
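A filter bank with these four passbands might be sketched as follows. The patent gives only the band edges and the passband gain, not a filter design, so the second-order band-pass used here (the widely used "Audio EQ Cookbook" biquad form with unity gain at the center frequency) is a stand-in chosen for brevity. The names `BANDS_HZ`, `bandpass`, and `filter_bank` and the 8 kHz sampling rate are assumptions, and a second-order section is far less selective than the idealized characteristic of FIG. 2.

```python
import math

# Pass bands from the embodiment, in Hz. The filter design below is an assumption.
BANDS_HZ = [(100.0, 200.0), (200.0, 300.0), (300.0, 400.0), (400.0, 500.0)]

def bandpass(samples, fs, lo, hi):
    """Second-order band-pass filter with unity gain at the band center."""
    f0 = math.sqrt(lo * hi)              # geometric center of the band
    q = f0 / (hi - lo)                   # quality factor from the bandwidth
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0     # b1 is zero for this filter form
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def filter_bank(samples, fs):
    """Run the input through every band; returns one output list per band."""
    return [bandpass(samples, fs, lo, hi) for lo, hi in BANDS_HZ]

fs = 8000  # hypothetical sampling rate in Hz
tone = [math.sin(2.0 * math.pi * 150.0 * n / fs) for n in range(4000)]
outputs = filter_bank(tone, fs)
# Mean absolute level per band, measured after the start-up transient has decayed.
levels = [sum(abs(v) for v in out[2000:]) / 2000 for out in outputs]
```

For the 150 Hz test tone the first band (100 to 200 Hz) carries the largest level, which is the information the output amplitude comparison unit 23 relies on.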

The outputs of BPF 15, BPF 17, BPF 19, and BPF 21, which have the relationship shown in FIG. 2, are input to the output amplitude comparison unit 23. The output amplitude comparison unit 23 compares these outputs at fixed intervals. Specifically, it compares the average output value of each BPF over each fixed interval with a threshold that is set and changed at a period corresponding to the fundamental frequency obtained by the pitch period detecting means, and thereby generates an output for selecting a filter whose amplitude is at least a certain amount, with priority given to lower frequency bands.
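The selection rule can be sketched as "take the lowest band whose average level clears a threshold". The text does not say how the threshold is derived, so the rule below (a fixed fraction of the strongest band's level) and the names `mean_abs`, `select_band`, and `ratio` are assumptions for illustration only.

```python
def mean_abs(block):
    """Average absolute amplitude of one block of samples."""
    return sum(abs(v) for v in block) / len(block)

def select_band(band_blocks, ratio=0.5):
    """Return the index of the lowest band whose level clears the threshold.

    band_blocks: one block of samples per filter, ordered from low to high band.
    ratio: hypothetical tuning constant; threshold = ratio * strongest band level.
    """
    levels = [mean_abs(block) for block in band_blocks]
    threshold = ratio * max(levels)
    for index, level in enumerate(levels):
        if level > 0.0 and level >= threshold:
            return index
    return None  # silence: no band carries any signal

# The second band is the lowest one that is strong enough, so it wins
# even though the third band is almost as strong.
choice = select_band([[0.1, -0.1], [1.0, -1.0], [0.9, -0.9], [0.0, 0.0]])
```

Scanning from the lowest band upward is what gives priority to low frequencies: a strong second harmonic in a higher band cannot displace a fundamental that is already above the threshold.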

Based on the output of the output amplitude comparison unit 23, the bandpass filter output switching unit 25 repeatedly selects only one of the outputs of BPF 15, BPF 17, BPF 19, and BPF 21. Consequently, when the voice signal from the voice data input unit 1 is input to BPF 15, BPF 17, BPF 19, and BPF 21 with an original waveform such as that shown in FIG. 3, the output of the bandpass filter output switching unit 25 becomes, as shown in FIG. 4, a waveform that contains the fundamental frequency component of the human voice with its harmonic components suppressed, that is, a nearly pure sine wave.

At the same time, based on the output of the bandpass filter output switching unit 25, the pitch period detection unit 27 finds the peak positions of the waveform and obtains the pitch period A between two adjacent peak positions from the distance (time) between them. In FIGS. 3 and 4, the positions marked (1) through (6) are the peak positions of the waveform. The output of the pitch period detection unit 27 is input sequentially to the period information temporary storage unit 7, where it is stored and updated as the period information used by the pitch control unit 11.
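Peak picking on the nearly sinusoidal filter output, followed by differencing adjacent peak positions, might look like the sketch below. The function names `find_peaks` and `pitch_periods`, the sampling rate, and the test tone are illustrative assumptions, not part of the patent.

```python
import math

def find_peaks(samples):
    """Indices of positive local maxima of a near-sinusoidal signal."""
    return [i for i in range(1, len(samples) - 1)
            if samples[i] > 0.0 and samples[i - 1] < samples[i] >= samples[i + 1]]

def pitch_periods(peaks, fs):
    """Pitch period in seconds between each pair of adjacent peaks."""
    return [(b - a) / fs for a, b in zip(peaks, peaks[1:])]

fs = 8000  # hypothetical sampling rate in Hz
# Stand-in for the switching unit output: a 200 Hz sine, 40 samples per period.
fundamental = [math.sin(2.0 * math.pi * n / 40.0 + 0.3) for n in range(400)]
peaks = find_peaks(fundamental)
periods = pitch_periods(peaks, fs)
```

Each entry of `periods` is 5 ms here, corresponding to 200 Hz. This simple rule works because the harmonics have already been suppressed; applied to the raw voice waveform it would also pick up secondary maxima within one period.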

To achieve the pitch set by the pitch change rate setting unit 3, the pitch control unit 11 uses the period information stored in the period information temporary storage unit 7, namely the intervals between pairs of peak positions that the pitch period detection unit 27 detected in the output of the bandpass filter output switching unit 25. For each such peak-to-peak interval it sets a processing reference position (the position marked X in FIG. 3, hereinafter "processing reference position X") and performs waveform processing on the corresponding point on the time axis of the actual waveform of the voice signal, thereby converting the pitch. FIG. 5 shows an example of lowering the pitch, and FIG. 6 shows an example of raising it.

In the pitch-lowering example of FIG. 5, a synthesis interval is defined between two peak positions chosen as appropriate for the setting in the pitch change rate setting unit 3, and the waveform is shifted so as to spread apart on the time axis about the processing reference position X and then synthesized. To connect the waveforms smoothly, a known cross-fade process is applied over a fixed interval (the synthesis interval) centered on the processing reference position X. The same cross-fade process is applied to the other period intervals in accordance with the setting in the pitch change rate setting unit 3.

In the pitch-raising example of FIG. 6, a synthesis interval is defined between two peak positions chosen as appropriate for the setting in the pitch change rate setting unit 3, and the waveform is shifted so as to close up on the time axis about the processing reference position X and then synthesized. To connect the waveforms smoothly, a known cross-fade process is applied over a fixed interval (the synthesis interval) centered on the processing reference position X. The same cross-fade process is applied to the other period intervals in accordance with the setting in the pitch change rate setting unit 3.

FIG. 7 is a detailed explanatory diagram schematically showing an example of lowering the pitch. First, in accordance with the setting in the pitch change rate setting unit 3, the voice for one pitch period A between the two peak positions (3) and (4) is taken in as shown in FIG. 7(a), and a processing range is set with equal time widths before and after the processing reference position X. Next, within this processing range, the voice amplitude after the processing reference position X is reduced from its original value to 0 as shown in FIG. 7(b), and the voice amplitude before the processing reference position X is increased from 0 to its original value as shown in FIG. 7(c). As a result, the voice of FIG. 7(b) and the voice of FIG. 7(c) share an overlapping time width centered on the processing reference position X. Then, as shown in FIG. 7(d), over this overlapping time width, which is the fixed interval (synthesis interval) centered on the processing reference position X, the voice of FIG. 7(b) and the voice of FIG. 7(c) are butted together and synthesized by the known cross-fade process, so that the period interval of the voice widens from A to A'. Applying the same cross-fade process to the other period intervals in accordance with the setting in the pitch change rate setting unit 3 lowers the pitch as desired.
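One possible reading of the FIG. 7 procedure is sketched below; the figure itself is not available here, so the details are assumptions. The fading-out copy continues past X, the fading-in copy starts `extra` samples before X, and the two are mixed with linear weights, which lengthens the period by `extra` samples. The function name `lengthen_period`, the linear fade shape, and the use of `extra` as the cross-fade length are all illustrative choices.

```python
def lengthen_period(seg, extra):
    """Stretch one pitch period by `extra` samples with a linear cross-fade.

    seg: samples of one period (peak to peak), as a list.
    extra: number of samples to add; also used as the cross-fade length.
    """
    n = len(seg)
    x = n // 2                            # processing reference position X (center)
    if not 0 < extra <= min(x, n - x):
        raise ValueError("extra must fit on both sides of X")
    mixed = []
    for k in range(extra):
        w = (k + 1) / (extra + 1)         # weight of the fading-in copy
        fade_out = seg[x + k]             # copy (b): runs past X and fades to 0
        fade_in = seg[x - extra + k]      # copy (c): starts before X and fades up
        mixed.append((1.0 - w) * fade_out + w * fade_in)
    return seg[:x] + mixed + seg[x:]

# A 40-sample period stretched to 50 samples: the period grows from A to A' = 1.25 A,
# so the pitch of this period drops to 0.8 times the original.
longer = lengthen_period([1.0] * 40, 10)
```

Because the two weights always sum to 1, a constant input stays constant through the cross-fade, and the samples outside the mixed region are copied unchanged, in line with the statement that no processing is applied outside the identified interval.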

FIG. 8 is a detailed explanatory diagram schematically showing an example of raising the pitch. First, in accordance with the setting in the pitch change rate setting unit 3, the voice for one pitch period A between the two peak positions (3) and (4) is taken in as shown in FIG. 8(a), and equal blank time widths are set before and after the processing reference position X, giving two processing ranges separated by the blank. Next, for the processing range before the processing reference position X, the voice amplitude is reduced from its original value to 0 as shown in FIG. 8(b), and for the processing range after the processing reference position X, the voice amplitude is increased from 0 to its original value as shown in FIG. 8(c). As a result, the voice of FIG. 8(b) and the voice of FIG. 8(c) are separated by a blank time width centered on the processing reference position X. Then, as shown in FIG. 8(d), over this blank time width, which is the fixed interval (synthesis interval) centered on the processing reference position X, the voice of FIG. 8(b) and the voice of FIG. 8(c) are synthesized by the known cross-fade process, so that the period interval of the voice narrows from A to A''. Applying the same cross-fade process to the other period intervals in accordance with the setting in the pitch change rate setting unit 3 raises the pitch as desired.
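The pitch-raising case can be sketched in the same spirit. This is one interpretation of FIG. 8, not a confirmed implementation: `cut` samples around X are dropped, and the material on either side is joined with a linear cross-fade of `fade` samples. The function name `shorten_period` and both parameters are assumptions.

```python
def shorten_period(seg, cut, fade):
    """Shorten one pitch period by `cut` samples using a linear cross-fade.

    seg: samples of one period (peak to peak), as a list.
    cut: number of samples to remove around the center X.
    fade: length of the cross-fade that joins the two sides.
    """
    n = len(seg)
    start = n // 2 - (cut + fade) // 2    # processed region is centered on X
    if cut <= 0 or fade <= 0 or start < 0 or start + cut + fade > n:
        raise ValueError("cut and fade do not fit inside the period")
    mixed = []
    for k in range(fade):
        w = (k + 1) / (fade + 1)          # weight of the fading-in side
        before = seg[start + k]           # side before the gap, fading to 0
        after = seg[start + cut + k]      # side after the gap, fading up
        mixed.append((1.0 - w) * before + w * after)
    return seg[:start] + mixed + seg[start + cut + fade:]

# A 40-sample period shortened to 32 samples: the period shrinks from A to A'' = 0.8 A,
# so the pitch of this period rises to 1.25 times the original.
shorter = shorten_period([1.0] * 40, 8, 8)
```

As in the lengthening case, samples outside the processed region are copied unchanged.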

As described above, in one embodiment of the present invention, the bandpass filter output switching unit 25 repeatedly selects only one of the outputs of BPF 15 to BPF 21 based on the output of the output amplitude comparison unit 23. The pitch period detection unit 27 obtains the period (pitch period) of the voice signal from the output of the bandpass filter output switching unit 25. The period information temporary storage unit 7 stores the period information detected by the pitch period detection unit 27. To control playback of the voice signal to the pitch set by the pitch change rate setting unit 3, the pitch control unit 11 performs pitch conversion processing with reference to the period information that the pitch period detection unit 27 detected in the output of the bandpass filter output switching unit 25 and that is stored in the period information temporary storage unit 7. It outputs the result to the voice data temporary storage unit 5 and updates the stored contents. It then controls output of the resulting pitch-converted data from the voice data temporary storage unit 5 via the voice data output unit 13.

Thus, according to one embodiment of the present invention, pitch conversion is performed with the period information of the voice signal obtained with high accuracy as described above, so noise generated during conversion can be suppressed effectively. Moreover, whether the pitch is lowered or raised, no waveform processing is applied outside the identified period intervals. The characteristics of the voice are therefore unlikely to be lost, and high-quality pitch conversion becomes possible.

A preferred embodiment of the present invention has been described above, but it is an illustration for explaining the present invention and is not intended to limit the scope of the invention to this embodiment alone. The present invention can be carried out in various other forms. For example, in the description with reference to FIGS. 3 to 8 the processing reference position X is the center between two adjacent peak positions, but the processing reference position X may be set at any position between those two adjacent peak positions.

The processing reference position X is also not limited to a single point; two or more points may be set between two adjacent peak positions.

Applications are of course also possible in which a position near a peak position is regarded as the peak position when setting the peak period, or in which, instead of the peak position, a zero-cross position in the output of the bandpass filter output switching unit 25, or a position near it, is used as the reference position that determines the peak period.

FIG. 1: Functional block diagram showing the internal configuration of a pitch conversion device according to one embodiment of the present invention.
FIG. 2: Gain versus frequency characteristics used to explain the arrangement of the plurality of bandpass filters.
FIG. 3: Signal waveform diagram showing an example of the output waveform from the voice data input unit shown in FIG. 1.
FIG. 4: Signal waveform diagram relating to the pitch conversion processing of input voice data performed in the pitch control unit shown in FIG. 1.
FIG. 5: Signal waveform diagram relating to the pitch conversion processing of input voice data performed in the pitch control unit shown in FIG. 1.
FIG. 6: Signal waveform diagram relating to the pitch conversion processing of input voice data performed in the pitch control unit shown in FIG. 1.
FIG. 7: Schematic diagram relating to the pitch conversion processing of input voice data performed in the pitch control unit shown in FIG. 1.
FIG. 8: Schematic diagram relating to the pitch conversion processing of input voice data performed in the pitch control unit shown in FIG. 1.

Explanation of symbols

1 Voice data input unit
3 Pitch change rate setting unit
5 Voice data temporary storage unit
7 Period information temporary storage unit
9 Voice feature detection unit
11 Pitch control unit
13 Voice data output unit
15 Bandpass filter (BPF)
17 Bandpass filter (BPF)
19 Bandpass filter (BPF)
21 Bandpass filter (BPF)
23 Output amplitude comparison unit
25 Bandpass filter output switching unit
27 Pitch period detection unit

Claims (7)

1. A pitch conversion device comprising:
pitch change rate setting means for changing the pitch of an input voice signal to a desired pitch;
harmonic component suppression means that provides a plurality of bandpass filters whose pass bands are set so as to cover the frequency range of the human voice and to divide that range, and that uses the output values of these bandpass filters to obtain a fundamental frequency signal of the voice in which harmonic components are suppressed;
pitch period detection means for detecting the period of the voice signal (the pitch period) based on the waveform of the fundamental frequency signal; and
pitch conversion means for converting the pitch by applying waveform processing to the actual waveform of the voice signal, at a period corresponding to the pitch period obtained by the pitch period detection means, so as to control the pitch to the pitch set by the pitch change rate setting means,
wherein the harmonic component suppression means includes output amplitude comparison means that compares the output value of each bandpass filter with a threshold that is set and changed with reference to the output values of the bandpass filters, and thereby generates an output for selecting a filter whose amplitude is at or above a certain level, with priority given to lower frequency bands, and
wherein the pitch conversion means performs waveform connection processing at intervals corresponding to the pitch period obtained by the pitch period detection means.
2. The pitch conversion device according to claim 1, wherein the pitch period detection means sequentially detects peak positions of the waveform of the fundamental frequency signal, or positions near them, and detects the interval between peak positions, or between positions near the peak positions, that are consecutive in detection order.
3. The pitch conversion device according to claim 1, wherein the pitch period detection means sequentially detects zero-cross positions of the waveform of the fundamental frequency signal, or positions near them, and detects zero-cross positions, or positions near the zero-cross positions, that are consecutive in detection order.
4. The pitch conversion device according to claim 2, wherein the pitch conversion means converts the pitch by applying waveform processing to the actual waveform of the voice signal at the interval of the peak positions detected by the pitch period detection means, taking as the processing position the point in the actual waveform of the voice signal that coincides on the time axis with each peak position or with each midpoint between peak positions, so as to control reproduction of the voice signal to the pitch set by the pitch change rate setting means.
5. The pitch conversion device according to claim 1, wherein the waveform connection processing is either:
processing that, for the voice waveform of each pitch period, deletes a time width corresponding to the pitch set by the pitch change rate setting means at a processing reference position, one or more of which are set in the middle of the period, and then joins the resulting end faces, thereby shortening the pitch period and raising the pitch; or
processing that, for the voice waveform of each pitch period, cuts out the waveform at a processing reference position, one or more of which are set in the middle of the period, so that time widths overlap according to the pitch set by the pitch change rate setting means, and then joins the resulting end faces, thereby extending the pitch period and lowering the pitch.
6. The pitch conversion device according to any one of claims 1 to 4, further comprising, downstream of the output amplitude comparison means, filter output switching means that, based on the output of the output amplitude comparison means, selects and outputs only one of the outputs from the bandpass filters.
7. A pitch conversion method comprising:
a step of changing the pitch of an input voice signal to a desired pitch;
a step of providing a plurality of bandpass filters whose pass bands are set so as to cover the frequency range of the human voice and to divide that range, and of obtaining, using these bandpass filters, a fundamental frequency signal in which harmonic components are suppressed;
a step of detecting the period of the voice signal (the pitch period) based on the waveform of the fundamental frequency signal; and
a step of converting the pitch by applying waveform processing to the actual waveform of the voice signal, at a period corresponding to the pitch period obtained in the pitch period detection step, so as to control the pitch to the pitch set in the pitch setting step,
wherein the harmonic component suppression step includes a step of comparing the output value of each bandpass filter with a threshold that is set and changed with reference to the output values of the bandpass filters, and thereby generating an output for selecting a filter whose amplitude is at or above a certain level, with priority given to lower frequency bands, and
wherein, in the pitch conversion step, waveform connection processing is performed at intervals corresponding to the pitch period obtained in the pitch period detection step.
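As a rough illustration of two operations recited in the claims, the sketch below shows (a) selecting the lowest band whose amplitude clears a threshold derived from the filter outputs, and (b) the per-period waveform connection processing that deletes or repeats a span around the middle of each pitch period. All names are hypothetical. The relative-threshold rule (`fraction` times the largest output), the use of a single processing reference position at the midpoint, and the plain butt joint without any smoothing are assumptions made for the example, not details confirmed by the text.

```python
def select_band(amplitudes, fraction=0.5):
    """Return the index of the lowest band that reaches the threshold.

    amplitudes: non-negative output amplitudes, ordered from the lowest band up.
    fraction:   assumed tuning value; the threshold follows the largest output.
    """
    threshold = fraction * max(amplitudes)
    return next(i for i, a in enumerate(amplitudes) if a >= threshold)


def splice_period(segment, ratio):
    """Resize one pitch period around its midpoint.

    ratio > 1 shortens the period (pitch up); ratio < 1 lengthens it (pitch down).
    """
    n = len(segment)
    target = max(1, round(n / ratio))
    mid = n // 2
    delta = target - n
    if delta < 0:
        # Delete a span centred on the midpoint, then join the end faces.
        cut = -delta
        start = mid - cut // 2
        return segment[:start] + segment[start + cut:]
    # Cut two pieces that overlap around the midpoint, then join them,
    # so the overlapping span appears twice. Growth is capped at one period.
    overlap = min(delta, n)
    first_end = mid + (overlap + 1) // 2
    second_start = first_end - overlap
    return segment[:first_end] + segment[second_start:]


def shift_pitch(samples, marks, ratio):
    """Apply splice_period to every period delimited by consecutive marks."""
    if not marks:
        return list(samples)
    out = list(samples[:marks[0]])
    for start, end in zip(marks, marks[1:]):
        out.extend(splice_period(list(samples[start:end]), ratio))
    out.extend(samples[marks[-1]:])
    return out
```

Here `marks` would be the peak (or zero-cross) positions found on the fundamental frequency signal, applied to the actual voice waveform. Note that this sketch changes the overall duration along with the pitch; it makes no attempt to restore the original length.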
JP2007127383A 2007-04-11 2007-04-11 Musical pitch conversion device and musical pitch conversion method Pending JP2008262140A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2007127383A JP2008262140A (en) 2007-04-11 2007-04-11 Musical pitch conversion device and musical pitch conversion method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2007127383A JP2008262140A (en) 2007-04-11 2007-04-11 Musical pitch conversion device and musical pitch conversion method

Publications (1)

Publication Number Publication Date
JP2008262140A true JP2008262140A (en) 2008-10-30

Family

ID=39984632

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2007127383A Pending JP2008262140A (en) 2007-04-11 2007-04-11 Musical pitch conversion device and musical pitch conversion method

Country Status (1)

Country Link
JP (1) JP2008262140A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012046447A1 (en) * 2010-10-06 2012-04-12 パナソニック株式会社 Encoding device, decoding device, encoding method, and decoding method
JPWO2012046447A1 (en) * 2010-10-06 2014-02-24 パナソニック株式会社 Encoding device, decoding device, encoding method, and decoding method
US9117461B2 (en) 2010-10-06 2015-08-25 Panasonic Corporation Coding device, decoding device, coding method, and decoding method for audio signals

Similar Documents

Publication Publication Date Title
JP3815347B2 (en) Singing synthesis method and apparatus, and recording medium
US11410637B2 (en) Voice synthesis method, voice synthesis device, and storage medium
EP2264696B1 (en) Voice converter with extraction and modification of attribute data
JP6024191B2 (en) Speech synthesis apparatus and speech synthesis method
JP2007316254A (en) Audio signal interpolation method and audio signal interpolation device
JP2006030575A (en) Speech synthesizing device and program
Bonada et al. Sample-based singing voice synthesizer by spectral concatenation
US6629067B1 (en) Range control system
JPH0193795A (en) Enunciation speed conversion for voice
JP2002268658A (en) Device, method, and program for analyzing and synthesizing voice
JPH04358200A (en) Speech synthesizer
JP3795201B2 (en) Acoustic signal encoding method and computer-readable recording medium
JP2008262140A (en) Musical pitch conversion device and musical pitch conversion method
JP3197975B2 (en) Pitch control method and device
JP4552533B2 (en) Acoustic signal processing apparatus and voice level calculation method
JP2000010597A (en) Speech transforming device and method therefor
JP4513556B2 (en) Speech analysis / synthesis apparatus and program
JP2000003200A (en) Voice signal processor and voice signal processing method
JP3289511B2 (en) How to create sound source data for speech synthesis
JP3410387B2 (en) Speech unit creation device, speech synthesis device, speech unit creation method, speech synthesis method, and recording medium
JP3540609B2 (en) Voice conversion device and voice conversion method
JP2654643B2 (en) Voice analysis method
JP2008020870A (en) Method and apparatus for converting voice speed
JPWO2003042648A1 (en) Speech coding apparatus, speech decoding apparatus, speech coding method, and speech decoding method
JP2000099100A (en) Voice conversion device