JP4471780B2 - Audio signal processing apparatus and method - Google Patents


Info

Publication number
JP4471780B2
Authority
JP
Japan
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP2004243882A
Other languages
Japanese (ja)
Other versions
JP2006064755A (en
Inventor
孝之 稗方
哲也 高橋
陽平 池田
敏章 下田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kobe Steel Ltd
Original Assignee
Kobe Steel Ltd
Application filed by Kobe Steel Ltd
Priority to JP2004243882A
Publication of JP2006064755A
Application granted
Publication of JP4471780B2
Expired - Fee Related

Landscapes

  • Reverberation, Karaoke And Other Acoustics (AREA)

Description

The present invention relates to an audio signal processing apparatus and method that detect a pitch period from an input audio signal and compress or expand the time axis of the input audio signal based on that pitch period.

When changing the tempo (speed) of karaoke or the playback speed of video, time-axis compression/expansion processing (an example of audio signal processing) is required, which speeds up or slows down playback of an audio signal without changing its pitch.
Non-Patent Document 1 and Non-Patent Document 2 disclose a technique that finds a strongly periodic portion of an audio signal and performs time-axis compression/expansion by omitting or repeating (inserting) the audio signal in units of that period (the pitch period). The technique uses a method called PICOLA (Pointer Interval Control OverLap and Add, an overlap-add method with pointer movement control). For compression, the signal of the pitch period to be omitted is overlap-added, with crossfade weighting, to the signal of the next pitch period. For expansion, the signal of the pitch period to be inserted is generated by overlap-adding, with crossfade weighting, the signals of the pitch periods before and after it.

FIG. 5 schematically shows the waveform of an audio signal when time-axis compression is performed by the PICOLA method.
First, as shown in FIG. 5(a), a pointer is set at the start position Po1 of the range of the audio signal subject to time-axis compression (omission of part of the audio signal), and the pitch period P (a period with strong periodicity) of the audio signal starting at pointer position Po1 is detected. An example of a method for detecting the pitch period P is described later.
Next, as shown in FIG. 5(b), a signal a' is generated by overlap-adding, with crossfade weighting, the two consecutive signals a and b, each one pitch period P long, that start at pointer position Po1. That is, when signals a and b are combined (added), as indicated by broken lines W1 and W2 in FIG. 5(a), the weight for signal a fades out (gradually decreases) as time advances and the weight for signal b fades in (gradually increases).
Next, signal a is deleted (omitted) and signal b is replaced with signal a'. This completes time-axis compression by one pitch period P. Because the signal a' placed at the omitted portion is an overlap-add with crossfade weighting, it connects smoothly to the audio signal before and after it, so the compression introduces little audible discontinuity.
Next, given a target compression ratio Rx (0 < Rx < 1), the pointer is reset to a position Po2 advanced by C (= P × Rx / (1 − Rx)) from Po1, the compressed audio signal from Po1 to Po2 is output, and the same compression processing is repeated from pointer position Po2. As a result, a compressed audio signal of length C is generated (output) from the original audio signal of length P + C, which achieves the target compression ratio Rx (= C / (P + C)).
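
The compression step described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the patent's implementation: the function name `picola_compress_step`, the linear crossfade weights, and the restriction Rx ≥ 0.5 (so that C ≥ P) are assumptions made for the sketch, and the pitch period P is taken as already detected.

```python
def picola_compress_step(x, po1, p, rx):
    """One PICOLA compression step; returns (output_samples, next_pointer).

    x   : list of samples (original signal)
    po1 : pointer position Po1 (index into x)
    p   : pitch period P in samples (assumed already detected)
    rx  : target compression ratio Rx (this sketch assumes 0.5 <= Rx < 1)
    """
    if not 0.0 < rx < 1.0:
        raise ValueError("Rx must satisfy 0 < Rx < 1")
    c = round(p * rx / (1.0 - rx))          # C = P * Rx / (1 - Rx)
    if c < p:
        raise ValueError("this sketch assumes C >= P, i.e. Rx >= 0.5")
    a = x[po1:po1 + p]                      # signal a
    b = x[po1 + p:po1 + 2 * p]              # signal b
    # a': weight on a fades out, weight on b fades in (linear weights assumed)
    merged = [(1.0 - k / p) * a[k] + (k / p) * b[k] for k in range(p)]
    # a is dropped and b is replaced by a', so the output is a' followed by
    # the untouched samples after b, C samples in total.
    out = merged + x[po1 + 2 * p:po1 + p + c]
    return out, po1 + p + c                 # P + C original samples consumed
```

Calling the function repeatedly, each time with the returned pointer and a freshly detected pitch period, would yield an output whose length is roughly Rx times the input length.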

FIG. 6 schematically shows the waveform of an audio signal when time-axis expansion is performed by the PICOLA method.
First, as shown in FIG. 6(a), a pointer is set at the start position Po3 of the range of the audio signal subject to time-axis expansion (insertion of audio signal), and the pitch period P (a period with strong periodicity) of the audio signal starting at pointer position Po3 is detected.
Next, as shown in FIG. 6(b), a signal a' is generated by overlap-adding, with crossfade weighting, the two consecutive signals a and b, each one pitch period P long, that start at pointer position Po3. For expansion, as indicated by broken lines W3 and W4 in FIG. 6(a), the weight for signal a fades in (gradually increases) as time advances and the weight for signal b fades out (gradually decreases).
Next, signal a' is inserted between signals a and b. This completes time-axis expansion by one pitch period P. Because the inserted signal a' is an overlap-add with crossfade weighting, it connects smoothly to the audio signal before and after it, so the expansion introduces little audible discontinuity.
Next, given a target expansion ratio Ry (Ry > 1, as required for S below to be positive), the pointer is reset to a position Po4 advanced by P + S (S = P × 1 / (Ry − 1)) from Po3, the expanded audio signal from Po3 to Po4 is output, and the same expansion processing is repeated from pointer position Po4. As a result, an expanded audio signal of length P + S is generated (output) from the original audio signal of length S, which achieves the target expansion ratio Ry (= (P + S) / S).
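
The expansion step can be sketched in the same way. Again this is a hedged illustration rather than the patent's implementation: the name `picola_expand_step` and the linear crossfade weights are assumptions. The sketch takes Ry > 1, which the formulas S = P / (Ry − 1) and Ry = (P + S) / S require, and further assumes Ry ≤ 2 so that S ≥ P.

```python
def picola_expand_step(x, po3, p, ry):
    """One PICOLA expansion step; returns (output_samples, next_pointer).

    x   : list of samples (original signal)
    po3 : pointer position Po3 (index into x)
    p   : pitch period P in samples (assumed already detected)
    ry  : target expansion ratio Ry (this sketch assumes 1 < Ry <= 2)
    """
    if ry <= 1.0:
        raise ValueError("Ry must be greater than 1")
    s = round(p / (ry - 1.0))               # S = P / (Ry - 1)
    if s < p:
        raise ValueError("this sketch assumes S >= P, i.e. Ry <= 2")
    a = x[po3:po3 + p]                      # signal a
    b = x[po3 + p:po3 + 2 * p]              # signal b
    # a': weight on a fades in, weight on b fades out (linear weights assumed)
    inserted = [(k / p) * a[k] + (1.0 - k / p) * b[k] for k in range(p)]
    # Output is a, then the inserted a', then the original samples from b
    # onward, P + S samples in total for S original samples consumed.
    out = a + inserted + x[po3 + p:po3 + s]
    return out, po3 + s
```

For a perfectly periodic input the inserted segment equals one more copy of the period, which is why the splice is inaudible in the ideal case.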

When the audio signal to be processed has multiple channels, such as a stereo audio signal, applying PICOLA to each channel separately raises two problems. The computationally heavy pitch period detection must be run for every channel, so the processing load becomes very high. In addition, the pitch period can differ from channel to channel, so the compressed or expanded audio signal acquires inter-channel phase differences that were not present in the original signal, which sounds unnatural to the listener.
To solve this problem, it is effective to unify the pitch period used for compression/expansion across all channels.
For example, Patent Document 1 proposes detecting the pitch period of the sum signal (L + R) of the L and R channels of a stereo audio signal and performing compression/expansion (PICOLA) of both channels based on that pitch period.
Patent Document 2 proposes detecting the pitch period of either the sum of multiple channel signals or the channel signal with the largest amplitude, and compressing/expanding all channel signals based on that pitch period.
With these techniques, the heavy pitch period computation is needed for only one audio signal, which prevents an increase in processing load, and the processed audio signal is free of inter-channel phase differences that would sound unnatural to the listener.
Patent Document 1: JP 2001-5500 A
Patent Document 2: JP 2002-297200 A
Non-Patent Document 1: Morita and Itakura, "Time-axis expansion and contraction of speech using the autocorrelation function," Proceedings of the Acoustical Society of Japan, March 1986, pp. 199-200
Non-Patent Document 2: Morita and Itakura, "Time-axis expansion and compression of speech using the overlap-add method with pointer movement control (PICOLA) and its evaluation," Proceedings of the Acoustical Society of Japan, October 1986, pp. 149-150
Non-Patent Document 3: Hiroshi Saruwatari, "Basics of blind source separation using array signal processing," IEICE Technical Report, April 2001, vol. EA2001-7, pp. 49-56
Non-Patent Document 4: Tomoya Takatani et al., "High-fidelity blind source separation using ICA based on the SIMO model," IEICE Technical Report, January 2003, vol. US2002-87, EA2002-108

However, when the audio signal used for pitch period detection (a one-channel (monaural) input audio signal, or a signal synthesized from multi-channel input audio signals) contains audio from several different sound sources, the techniques of Patent Document 1 and Patent Document 2 detect the pitch period of the representative source, that is, the source with the strongest periodicity.
Consequently, when signals from several sources are mixed, the audio of the one representative source remains clear after time-axis compression or expansion, but the audio of the other sources loses clarity, and the quality of the audio signal as a whole is degraded.
For example, when a singing voice and an instrumental performance are mixed in the input audio signal, the performance remains clear while the singing becomes indistinct.
The present invention was made in view of these circumstances. Its object is to provide an audio signal processing apparatus and method that detect a pitch period from an input audio signal and compress or expand the time axis of the input audio signal based on that pitch period, and that, even when audio from several sound sources is mixed in the input audio signal, keep the clarity of each source well balanced in the compressed or expanded signal and thereby prevent degradation of sound quality.

To achieve the above object, the present invention may be configured as an audio signal processing apparatus comprising: first signal processing means for generating an audio signal by separating the audio signal of a first predetermined frequency band from one input audio signal in which audio from a plurality of sound sources is mixed, or from a signal synthesized from multi-channel input audio signals; second signal processing means for generating an audio signal by separating, from the same input or synthesized signal, the audio signal of a second predetermined frequency band different from the first; first pitch period candidate detection means for detecting candidates for the pitch period, which is a frequency component constituting a specific audio signal contained in the first predetermined frequency band separated by the first signal processing means, by sampling the input audio signal or the synthesized audio signal at a predetermined first pitch period sampling period and detecting one or more first pitch period candidates in ascending order of the difference in signal intensity between the sampled signal of one period and the sampled signal of another period; second pitch period candidate detection means for sampling a specific audio signal contained in the second predetermined frequency band separated by the second signal processing means at a predetermined second pitch period sampling period different from the first, and detecting one or more second pitch period candidates in ascending order of the difference in signal intensity between the sampled signal of one period and the sampled signal of another period; pitch period selection means for selecting the one pitch period with the strongest periodicity from among the candidates common to the first pitch period candidates and the second pitch period candidates; and time axis adjustment means for compressing and/or expanding the time axis of the input audio signal based on the one pitch period selected by the pitch period selection means.
That is, to detect one pitch period from one input audio signal or from a signal synthesized from multi-channel input audio signals (hereinafter, the pitch period detection signal), a first stage applies predetermined signal processing (first signal processing) to the pitch period detection signal and detects a plurality of pitch period candidates from the processed signal.
Here, the first signal processing is processing such as extracting or removing part of the audio signals from the plurality of sources mixed in the input audio signal, or emphasizing or attenuating the periodicity (pitch period) of some of those audio signals. The plurality of pitch period candidates may be, for example, a predetermined number of candidates taken in descending order of their evaluation value as a pitch period.
Thus, when audio from several sources is mixed in the input audio signal, the signal processing can extract the audio of a source that is not necessarily the representative one (the one with the strongest periodicity), for example a singing voice mixed with instrumental performance sounds, and a plurality of pitch period candidates corresponding to that signal can be detected.
Then, as a second stage, one pitch period is selected from the plurality of candidates based on the pitch period detection signal itself, or on a signal obtained by applying to it other signal processing (second signal processing) different from the first signal processing.
The one pitch period selected in this way corresponds both to the signal used to detect the candidates, that is, the audio of the source extracted or emphasized by the signal processing from among the audio signals mixed in the pitch period detection signal, and to the audio of the other sources.
Therefore, when the time axis of the input audio signal is compressed or expanded based on the one pitch period detected (selected) in this way, the clarity of each of the sources mixed in the input audio signal is kept well balanced in the processed audio signal.
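
The two-stage idea can be sketched in Python as follows. This is a hedged illustration only. It reads "ascending order of the difference in signal intensity" as an average magnitude difference function (AMDF), which is an assumption; the function names and the toy signals are hypothetical. The band-limited signal of stage 1 is stood in for by the isolated `voice` pattern, and stage 2 evaluates the candidates on the unprocessed mixture.

```python
def amdf(x, lag, window):
    """Average magnitude difference between x and x shifted by `lag` samples."""
    return sum(abs(x[i] - x[i + lag]) for i in range(window)) / window

def detect_candidates(processed, lags, window, count):
    """Stage 1: the `count` lags with the smallest AMDF on the processed signal."""
    return sorted(lags, key=lambda j: amdf(processed, j, window))[:count]

def select_pitch(detection_signal, candidates, window):
    """Stage 2: among the candidates, the lag most periodic in the detection signal."""
    return min(candidates, key=lambda j: amdf(detection_signal, j, window))

# Toy data (hypothetical): a "voice" with period 6 and an "accompaniment"
# with period 4, mixed into one detection signal.
voice = [[0, 2, 1, -1, -2, -1][i % 6] for i in range(64)]
accompaniment = [[3, 0, -3, 0][i % 4] for i in range(64)]
mixture = [v + a for v, a in zip(voice, accompaniment)]

candidates = detect_candidates(voice, range(4, 21), 24, 3)
pitch = select_pitch(mixture, candidates, 24)
```

In this toy case the candidates from the voice are its period and multiples of it, and the second stage picks the one multiple that is also a period of the accompaniment, so splicing at that lag disturbs neither source.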

Furthermore, the audio signal processing apparatus according to the present invention may comprise: third signal processing means for separating, from the one input audio signal from a plurality of sound sources or from the signal synthesized from multi-channel input audio signals, the audio signal of one or more third predetermined frequency bands different from the first and second predetermined frequency bands; and pitch period narrowing means that samples the audio signal of the third predetermined frequency band separated by the third signal processing means at a fourth predetermined pitch period sampling period, detects a plurality of third candidates as strongly periodic periods in ascending order of the difference in signal intensity between the sampled signal of one period and the sampled signal of another period, and narrows down the pitch period candidates by selecting the strongly periodic periods that correspond to both the plurality of first candidates and the plurality of third candidates. In that case, the pitch period selection means selects the one pitch period from among the candidates narrowed down by the pitch period narrowing means.
That is, the pitch period detection signal may be subjected to one or more kinds of signal processing (third signal processing) different from the first signal processing, the pitch period candidates detected in the first stage may be narrowed down based on each of the resulting signals, and the one pitch period may then be selected from the narrowed candidates by the second-stage processing.
In this way, an intermediate stage between the first and second stages can extract or emphasize (third signal processing) the audio of sources other than those targeted by pitch period detection or selection in the first and second stages, and the pitch period is selected (narrowed down) so that it also corresponds to those audio signals. As a result, the first stage, the second stage, and one or more intermediate stages between them together make it possible to obtain a pitch period that corresponds, in a well-balanced manner, to each of three or more different sources.
Here, the first signal processing may separate the audio signal of the first predetermined frequency band by means of a band-limiting filter.
Likewise, the second signal processing and/or the third signal processing may separate, by means of a band-limiting filter, the audio signal of a frequency band different from the first predetermined frequency band.
That is, the signal processing at each stage (first, second, and intermediate) may be, for example, band-limiting filter processing for a different frequency band at each stage.
Other possibilities include signal processing that amplifies or attenuates the signal in a specific frequency band by equalization, and signal processing that separates the audio signals of the plurality of sources contained in the input audio signal by blind source separation (BSS). Details of the BSS (Blind Source Separation) method are described in, for example, Non-Patent Document 3 and Non-Patent Document 4.
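
As a rough illustration of band-limiting filter processing, the sketch below builds low-pass, high-pass, and band-pass filters from a one-pole recursive filter. The filter design, the function names, and all cutoff frequencies and sampling rates are assumptions for illustration; the source does not specify a filter structure, and a practical device would likely use steeper filters.

```python
import math

def lowpass(x, cutoff_hz, fs):
    """One-pole low-pass filter (a deliberately simple stand-in)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, state = [], 0.0
    for sample in x:
        state += alpha * (sample - state)   # move the state toward the input
        y.append(state)
    return y

def highpass(x, cutoff_hz, fs):
    """High-pass as the input minus its low-passed version."""
    return [s - l for s, l in zip(x, lowpass(x, cutoff_hz, fs))]

def bandpass(x, low_hz, high_hz, fs):
    """Band-pass as a low-pass at high_hz followed by a high-pass at low_hz."""
    return highpass(lowpass(x, high_hz, fs), low_hz, fs)

def rms(x):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in x) / len(x))
```

Running each stage with a different `(low_hz, high_hz)` pair would give each stage a differently band-limited copy of the pitch period detection signal.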

According to the present invention, a first stage detects a plurality of pitch period candidates corresponding to the audio of a desired source that has been extracted or emphasized by signal processing from the input audio signal or from a signal synthesized from a plurality of input audio signals; a subsequent second stage selects from those candidates one pitch period that corresponds to the audio of another source; and an intermediate stage may further narrow the candidates to those that correspond to the audio of yet another source. A pitch period that corresponds to each of the sources in a well-balanced manner can thus be selected. By compressing or expanding the time axis of the input audio signal based on the pitch period selected in this way, the clarity of each of the sources mixed in the input audio signal is kept well balanced in the processed audio signal, and degradation of sound quality is prevented.

Embodiments of the present invention are described below with reference to the accompanying drawings to aid understanding of the invention. The following embodiments are examples embodying the present invention and do not limit its technical scope.
FIG. 1 is a block diagram showing the schematic configuration of the audio signal processing device Z1 according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing the schematic configuration of the audio signal processing device Z2 according to the second embodiment.
FIG. 3 is a block diagram showing the schematic configuration of the audio signal processing device Z3 according to the third embodiment.
FIG. 4 is a block diagram showing the schematic configuration of the audio signal processing device Z4 according to the fourth embodiment.
FIG. 5 schematically shows the waveform of an audio signal when time-axis compression is performed by the PICOLA method.
FIG. 6 schematically shows the waveform of an audio signal when time-axis expansion is performed by the PICOLA method.
FIG. 7 shows an example of the waveform of an audio signal (original sound) used for time-axis compression/expansion processing.
FIGS. 8 and 9 show examples of the waveform obtained by time-axis expansion of the audio signal (original sound) of FIG. 7 with the conventional method.
FIG. 10 shows an example of the waveform obtained by time-axis expansion of the audio signal (original sound) of FIG. 7 with the method of the present invention.
FIG. 11 is a graph showing an example of the waveform of a music audio signal and the change over time of the pitch period detected for that signal by the conventional method and by the method of the present invention.

<First Embodiment>
The audio signal processing device Z1 according to the first embodiment of the present invention is described below with reference to the block diagram of FIG. 1.
As shown in FIG. 1, the audio signal processing device Z1 comprises a first filter (1), a pitch period detection unit 2, a pitch period selection unit 3, and a signal compression/expansion unit 4.
The first filter (1) applies band-limiting filter processing (an example of the first signal processing) to an input monaural signal M supplied from outside (an example of the one input audio signal and of the pitch period detection signal).
The pitch period detection unit 2 receives the signal filtered by the first filter (1) and detects a plurality of pitch period candidates for that signal (an example of the pitch period candidate detection means).
The pitch period selection unit 3 receives the input monaural signal M (an example of the pitch period detection signal) and, based on that signal, selects from the plurality of pitch period candidates detected by the pitch period detection unit 2 the one pitch period to be used for signal compression or expansion (an example of the pitch period selection means).
The signal compression/expansion unit 4 receives the one pitch period selected by the pitch period selection unit 3 (pitch period selection means) and uses it to compress and expand the time axis of the input monaural signal M (an example of the input audio signal), for example by the PICOLA method described above (see FIGS. 5 and 6) (an example of the time axis adjustment means).
Each component of the audio signal processing device Z1 shown in FIG. 1, and of the audio signal processing devices Z2 to Z4 of the other embodiments described later, may be implemented as a processing circuit made up of a CPU, memory, and the like, or as a DSP (Digital Signal Processor). Alternatively, a processing program that implements the processing (steps) of each component may be executed by a given computer.
The audio signal processing device Z1 is characterized in that pitch period detection is performed in two stages: by the first filter (1) and the pitch period detection unit 2, and then by the pitch period selection unit 3. This is described in detail below.

<< First Stage >>
First, the first filter (1) applies band-limiting filter processing, such as a band-pass, low-pass, or high-pass filter, to the input monaural signal M.
When audio signals from a plurality of sound sources are mixed in the input monaural signal M, the first filter (1) passes only the band of the audio signal whose clarity should be preserved after compression/expansion, even though that signal is not necessarily the representative (most strongly periodic) one among them. An example is a singing voice signal mixed with instrument sounds; for a human voice the band is, for example, 200 Hz to 8 kHz.
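As an illustration only, a band limitation of this kind might be sketched in Python with two cascaded second-order IIR sections (a high-pass at 200 Hz followed by a low-pass at 8 kHz). The coefficient formulas are the widely used "audio EQ cookbook" biquad forms with a Butterworth Q; the function names, the cascade structure, and the default parameters are assumptions for this sketch. A second-order section has a slope of 12 dB per octave, which is at least consistent with the -12 dB/oct IIR filter mentioned later in this description, although the text does not state how that filter was realized.

```python
import math

def biquad_coeffs(kind, cutoff_hz, fs_hz, q=1 / math.sqrt(2)):
    """Second-order low-pass/high-pass coefficients, normalized so a[0] == 1."""
    w0 = 2.0 * math.pi * cutoff_hz / fs_hz
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    if kind == "lowpass":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    elif kind == "highpass":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    else:
        raise ValueError("kind must be 'lowpass' or 'highpass'")
    a0 = 1.0 + alpha
    b = [v / a0 for v in b]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a

def biquad_filter(x, b, a):
    """Filter a list of samples with one biquad (direct form I)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def band_limit(x, fs_hz=44100, low_hz=200.0, high_hz=8000.0):
    """Hypothetical first filter: keep roughly the low_hz..high_hz band."""
    b, a = biquad_coeffs("highpass", low_hz, fs_hz)
    y = biquad_filter(x, b, a)
    b, a = biquad_coeffs("lowpass", high_hz, fs_hz)
    return biquad_filter(y, b, a)
```

With these defaults a 50 Hz tone is strongly attenuated while a 1 kHz tone passes almost unchanged, which is the kind of separation the first stage relies on.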
The pitch period detection unit 2 then detects a plurality of pitch period candidates corresponding to the filtered signal. An example of a method by which the pitch period detection unit 2 detects (calculates) the plurality of candidates is given below.
A predetermined range j = N0 to N is set in advance as the full set of candidates j for the pitch period P considered appropriate for the input monaural signal M (for example, an audio signal in which singing voice, instrument sounds, and the like are mixed). Here j is a number of samples of the digital audio signal, so the pitch period in time units is j × (sampling period). The digital audio signal after signal processing (filtering) by the first filter (1) is taken as the signal Xi to be evaluated, and for its (2N + 1) samples Xi (i = 0 to 2N, i ≥ 1) the strength of periodicity is evaluated for each of the candidates j (N0 to N). The plurality of pitch period candidates is then obtained from the periodicity evaluation results: for example, a predetermined number (two or more) of candidates taken in order from the one evaluated as most strongly periodic, or all candidates evaluated as more strongly periodic than a preset evaluation value, or a combination of these.
In this case, when the time range i (in samples) of the signal Xi evaluated for periodicity is 0 to N (so that the evaluated signal is referenced over a maximum range of 0 to 2N), the following expression (1) or (2) may be used as the evaluation function for the strength of periodicity.
Figure 0004471780
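The formula image itself is not reproduced in this text. Going only by the verbal description of expressions (1) and (2) in this section (differences between signal values j samples apart, taken as absolute or squared values, with i running from 0 to N), they plausibly take a form like the following. The symbols D_1 and D_2 and the absence of any normalization factor are assumptions.

```latex
D_1(j) = \sum_{i=0}^{N} \left| X_i - X_{i+j} \right| \qquad (1)

D_2(j) = \sum_{i=0}^{N} \left( X_i - X_{i+j} \right)^{2} \qquad (2)
```

With j at most N and i at most N, the largest index referenced is 2N, which agrees with the stated maximum range of 0 to 2N.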
These expressions compute the difference (absolute value or squared value) between signal values j samples apart; the smaller the difference, the stronger the periodicity at period j (that is, the more closely the waveform repeats every j samples). Accordingly, the evaluation value of expression (1) or (2) is calculated for each j = N0 to N, and a plurality of values of j, taken according to a predetermined rule starting from the smallest evaluation value (the highest periodicity evaluation), are detected (calculated) as the plurality of pitch period candidates.
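As an illustration only, this evaluation might be sketched in Python as follows. The sketch assumes an unnormalized sum of absolute differences for expression (1), and it uses "keep the `count` lags with the smallest evaluation values" as the predetermined rule; both choices, and all function names, are assumptions for this sketch.

```python
def periodicity_score(x, j, n):
    """Sum of |x[i] - x[i + j]| for i = 0..n; smaller means more periodic at lag j."""
    return sum(abs(x[i] - x[i + j]) for i in range(n + 1))

def top_candidates(x, n0, n, count):
    """Return `count` lags in n0..n with the smallest scores, best first.

    x must hold at least 2 * n + 1 samples.
    """
    scored = sorted((periodicity_score(x, j, n), j) for j in range(n0, n + 1))
    return [j for _, j in scored[:count]]

# Usage: a sawtooth that repeats every 50 samples.
signal = [i % 50 for i in range(2 * 120 + 1)]
print(top_candidates(signal, 30, 120, 2))
```

For a signal that repeats exactly every 50 samples, lags 50 and 100 both give an evaluation value of zero, so integer multiples of the true period appear as candidates alongside the period itself.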
Alternatively, the range of j may be divided into a plurality of sections and the value with the highest periodicity evaluation (the smallest evaluation value) selected in each section. That is, the range of j is divided into N0 to N1, N1 to N2, ..., Nk to N (where N0 < N1 < N2 < ... < Nk < N), and the j that gives the highest periodicity evaluation in each section (for example, the smallest evaluation value from expression (1) or (2)) is taken as one of the plurality of pitch period candidates.
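The section-wise variant might be sketched as follows, again as an illustration only. The text does not say whether a boundary value such as N1 belongs to the section below or above it; this sketch treats every section as half-open except the last, which is an assumption, as are the function names and the absolute-difference score.

```python
def periodicity_score(x, j, n):
    """Sum of |x[i] - x[i + j]| for i = 0..n; smaller means more periodic at lag j."""
    return sum(abs(x[i] - x[i + j]) for i in range(n + 1))

def best_per_section(x, boundaries, n):
    """boundaries = [N0, N1, ..., N]; return the best lag found in each section."""
    candidates = []
    last = len(boundaries) - 2
    for k in range(len(boundaries) - 1):
        lo, hi = boundaries[k], boundaries[k + 1]
        # Assumed convention: [lo, hi) for every section, [lo, hi] for the last one.
        lags = range(lo, hi + 1 if k == last else hi)
        candidates.append(min(lags, key=lambda j: periodicity_score(x, j, n)))
    return candidates

# Usage: two sections covering lags 30..120 on a period-50 sawtooth.
signal = [i % 50 for i in range(2 * 120 + 1)]
print(best_per_section(signal, [30, 75, 120], 120))
```

One design consequence of sectioning is that every part of the lag range contributes exactly one candidate, so a long-period candidate cannot be crowded out by several short-period ones.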

<< Second Stage >>
Next, based on the input monaural signal M, the pitch period selection unit 3 selects the one pitch period to be used for compression/expansion from among the plurality of pitch period candidates obtained in the first stage.
Specifically, the input monaural signal M (a digital audio signal) is taken as the signal Xi to be evaluated, and for its (2N + 1) samples Xi (i = 0 to 2N, i ≥ 1) the strength of periodicity is evaluated for each of the pitch period candidates obtained in the first stage. The candidate with the strongest periodicity is taken as the one pitch period used for compression/expansion. The evaluation method is the same as in the first stage.
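A minimal sketch of this second-stage selection, assuming the same absolute-difference score as in the first stage; the function names and the toy signal are assumptions chosen for illustration.

```python
def periodicity_score(x, j, n):
    """Sum of |x[i] - x[i + j]| for i = 0..n; smaller means more periodic at lag j."""
    return sum(abs(x[i] - x[i + j]) for i in range(n + 1))

def select_pitch_period(unfiltered, candidates, n):
    """Pick the first-stage candidate that is most periodic on the unfiltered signal."""
    return min(candidates, key=lambda j: periodicity_score(unfiltered, j, n))

# Usage: a mixture of a period-50 and a period-200 sawtooth.
# Suppose the first stage, looking only at the period-50 component,
# proposed the multiples 50, 100, 150 and 200.
n = 200
mixture = [(i % 50) + (i % 200) for i in range(2 * n + 1)]
print(select_pitch_period(mixture, [50, 100, 150, 200], n))
```

In this toy case only lag 200 makes the whole mixture repeat, so it is the candidate chosen: it fits the component seen in the first stage (as a multiple of 50) and the rest of the signal as well.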
The one pitch period selected in this way corresponds both to the audio signal of the sound source that the first filter (1) extracted, for candidate detection, from the audio signals mixed in the input monaural signal M (the pitch period detection signal), and to the audio signals of the other sound sources.
The signal compression/expansion unit 4 then applies time axis compression (or expansion) to the input monaural signal M at the desired compression (expansion) rate, using the one pitch period detected from the input monaural signal M by the two-stage processing, and outputs the compressed (expanded) audio signal M'. The PICOLA method described above is used for compression and expansion.
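To show why the choice of pitch period matters for the waveform, the following is a simplified sketch of a single expansion step in the spirit of PICOLA, under the common description of that method: two adjacent segments A and B of one pitch period each are taken, and a cross-faded segment C (starting like B and ending like A) is inserted between them. Rate control, pointer updates, and the compression case are omitted, and the function name is an assumption; this is not the implementation of the device described here.

```python
def expand_once(x, start, period):
    """Insert one cross-faded pitch period after x[start:start + period]."""
    a = x[start:start + period]
    b = x[start + period:start + 2 * period]
    if len(b) < period:
        raise ValueError("need two full periods after start")
    # C fades from B (weight 1 -> 0) to A (weight 0 -> 1), so it joins
    # smoothly onto the end of A and onto the beginning of B.
    c = [b[i] * (1 - i / period) + a[i] * (i / period) for i in range(period)]
    return x[:start + period] + c + x[start + period:]

# Usage: a sawtooth with an exact period of 50 samples.
signal = [float(i % 50) for i in range(200)]
longer = expand_once(signal, 0, 50)
print(len(signal), len(longer))
```

When the period passed in matches the signal, A and B are identical and the inserted segment is simply one more cycle. When it does not match a component of the signal, A and B differ for that component and the cross-fade alters it, which is the kind of degradation the later waveform examples illustrate.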
In the audio signal M' output after compression/expansion, the clarity of each of the audio signals from the plurality of sound sources mixed in the input audio signal is preserved in a well-balanced manner, and the sound quality is improved.

<Second Embodiment>
Next, the audio signal processing device Z2 according to the second embodiment of the present invention is described with reference to the block diagram of FIG. 2.
As shown in FIG. 2, the audio signal processing device Z2 adds a composite signal generation unit 5 to the audio signal processing device Z1 as a new component.
When the input audio signal has a plurality of channels, as with a stereo audio signal, detecting the pitch period and performing compression/expansion separately for each channel and then combining the results is problematic. The pitch period can differ from channel to channel, so the processed audio signal acquires inter-channel phase differences that were not present in the original, which sounds unnatural to the listener.
To solve this problem, it is effective to use a single, common pitch period for compression/expansion across all channels.
In the audio signal processing device Z2, therefore, the composite signal generation unit 5 generates a composite audio signal (an example of a pitch period detection signal) from the multi-channel input stereo signal (an example of an input audio signal), and one pitch period is obtained from that composite audio signal through the same two-stage processing as in the audio signal processing device Z1.
The composite signal generation unit 5 may, for example, add the channel signals (L + R for two-channel stereo), or generate both the sum signal (L + R) and the difference signal (L - R) and use whichever has the greater power (amplitude) as the composite audio signal.
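The second option (choosing between L + R and L - R by power) might be sketched as follows. Mean squared amplitude is used as the power measure and ties go to the sum signal; both choices, and the function names, are assumptions for this sketch.

```python
def mean_power(x):
    """Mean squared amplitude of a list of samples."""
    return sum(v * v for v in x) / len(x)

def composite_signal(left, right):
    """Return L + R or L - R, whichever has the greater mean power."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total if mean_power(total) >= mean_power(diff) else diff

# Usage: channels in antiphase cancel in L + R, so L - R is chosen.
left = [1.0, -1.0, 1.0, -1.0]
right = [-1.0, 1.0, -1.0, 1.0]
print(composite_signal(left, right))
```

The difference signal is useful precisely in cases like this one, where content that is out of phase between the channels would largely vanish from a plain sum.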
The signal compression/expansion unit 4 then applies time axis compression (or expansion) at the desired compression (expansion) rate to each of the two channel signals of the stereo signal (L, R), using the one pitch period detected from the composite audio signal by the two-stage processing, and outputs the compressed (expanded) audio signals L' and R'. The PICOLA method described above is used for compression and expansion.
Because all channel signals are compressed/expanded using the one pitch period P obtained from the multi-channel audio input signal, an increase in computational load is avoided, as is the occurrence of inter-channel phase differences after compression/expansion that would sound unnatural to the listener. Such a configuration is also an example of an embodiment of the present invention.

<Third Embodiment>
Next, the audio signal processing device Z3 according to the third embodiment of the present invention is described with reference to the block diagram of FIG. 3.
As shown in FIG. 3, the audio signal processing device Z3 adds to the audio signal processing device Z1, as a new component, a second filter (6) that applies band-limiting filter processing (an example of second signal processing) to the input signal of the pitch period selection unit 3. The filter characteristic of the second filter (6) differs from that of the first filter (1).
In other words, in the second stage the pitch period selection unit 3 (an example of pitch period selection means) may select the one pitch period from the plurality of candidates based on a signal obtained by applying to the input monaural signal M (an example of a pitch period detection signal) filter processing (an example of second signal processing) different from the signal processing of the first filter (1).
This second-stage filter processing (signal processing) makes it possible to choose freely which sound source signal is used for pitch period detection, for example by removing the audio signal of the most strongly periodic sound source or by extracting only the audio signal of a desired sound source, so that the sound quality of the compressed/expanded signal (M') can be adjusted as desired.
Of course, this configuration with signal processing in the second stage may also be applied to the audio signal processing device Z2 (the device for multi-channel input audio signals such as stereo audio signals).

<Fourth Embodiment>
Next, the audio signal processing device Z4 according to the fourth embodiment of the present invention is described with reference to the block diagram of FIG. 4.
As shown in FIG. 4, the audio signal processing device Z4 adds to the audio signal processing device Z1, as a new component, a pitch period candidate intermediate selection unit 20. At an intermediate stage between the detection of the plurality of pitch period candidates in the first stage and the selection of the one pitch period in the second stage, this unit further narrows down the candidates detected in the first stage.
The pitch period candidate intermediate selection unit 20 includes a plurality of third filters 11, 12, ..., 1N, each of which applies to the input monaural signal M (an example of a pitch period detection signal) filter processing (an example of third signal processing) different from the processing of the first filter (1) (the first signal processing), and a plurality of pitch period intermediate selection units 21, 22, ..., 2N, which successively narrow down the plurality of pitch period candidates detected by the pitch period detection unit 2 (pitch period candidate detection means) based on the respective filtered signals (an example of pitch period narrowing means).
Each of the third filters (11 to 1N) has a filter characteristic that extracts the audio signal of a sound source different from the sound sources targeted by pitch period detection (selection) in the first and second stages.
The pitch period intermediate selection units (21 to 2N) are connected in series. Each takes as the signal Xi to be evaluated a signal obtained by applying to the input monaural signal M (an example of a pitch period detection signal) filter processing different from that of the first filter (1), and evaluates the strength of periodicity for each of the pitch period candidates output by the preceding intermediate selection unit (21 to 2(N-1)); the first unit 21 evaluates the candidates detected by the pitch period detection unit 2. Based on the results, the candidates are successively narrowed down to a smaller number (still two or more). The narrowing method (selection of a plurality of pitch periods) is the same as the method by which the pitch period detection unit 2 selects the plurality of candidates from the full set of pitch period candidates.
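The series connection of intermediate selection units might be sketched as a simple loop, as an illustration only. Each pass scores the surviving candidates on a differently filtered copy of the input and keeps the best few. The absolute-difference score, the "keep the best `keep`" rule, and all names are assumptions; the filtered signals are supplied ready-made here rather than produced by actual third filters.

```python
def periodicity_score(x, j, n):
    """Sum of |x[i] - x[i + j]| for i = 0..n; smaller means more periodic at lag j."""
    return sum(abs(x[i] - x[i + j]) for i in range(n + 1))

def narrow(signal, candidates, keep, n):
    """Keep the `keep` candidates that are most periodic on `signal`."""
    return sorted(candidates, key=lambda j: periodicity_score(signal, j, n))[:keep]

def cascade(filtered_signals, candidates, keeps, n):
    """Apply one narrowing pass per filtered signal, in series."""
    for signal, keep in zip(filtered_signals, keeps):
        candidates = narrow(signal, candidates, keep, n)
    return candidates

# Usage: stand-ins for two filtered signals with periods 100 and 150.
n = 300
sig_a = [i % 100 for i in range(2 * n + 1)]
sig_b = [i % 150 for i in range(2 * n + 1)]
print(cascade([sig_a, sig_b], [50, 100, 150, 200, 300], [3, 1], n))
```

In this toy case only lag 300 is a multiple of both 100 and 150, so it is the one candidate that survives both passes.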
With this configuration, a pitch period that also corresponds to the audio signals of sound sources other than those of the first and second stages can be selected at the intermediate stage, so that one pitch period corresponding in a well-balanced manner to each of the audio signals of three or more different sound sources can be obtained.
The pitch period selection unit 3 (pitch period selection means) then selects the one pitch period from among the candidates narrowed down by the pitch period intermediate selection units (21 to 2N; an example of pitch period narrowing means).

Next, the operation and effects of the present invention are described with reference to the audio waveforms shown in FIGS. 7 to 10. In each of the waveforms in FIGS. 7 to 10, the width of the horizontal (time) axis is 0.2 seconds, the sampling rate of each audio signal is 44100 Hz, and the full range of pitch periods evaluated for periodicity during pitch period detection is 350 to 1400 samples (corresponding to the full candidate range N0 to N described above).
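For orientation, the sample counts stated above can be converted to time and frequency using only the 44100 Hz sampling rate. The symbols T and f are introduced here for illustration.

```latex
\begin{aligned}
T_{\min} &= \frac{350}{44100\ \mathrm{Hz}} \approx 7.94\ \mathrm{ms},
  & f_{\max} &= \frac{44100}{350}\ \mathrm{Hz} = 126\ \mathrm{Hz},\\
T_{\max} &= \frac{1400}{44100\ \mathrm{Hz}} \approx 31.75\ \mathrm{ms},
  & f_{\min} &= \frac{44100}{1400}\ \mathrm{Hz} = 31.5\ \mathrm{Hz},\\
0.2\ \mathrm{s} \times 44100\ \mathrm{Hz} &= 8820\ \text{samples}.
\end{aligned}
```

The evaluated pitch periods therefore correspond to repetition rates between 31.5 Hz and 126 Hz, and each plotted waveform spans 8820 samples.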
FIG. 7(a) shows an example of the waveform of a simulated signal used for pitch period detection (hereinafter, the original sound; it corresponds to the input monaural signal M or the composite audio signal and is an example of a pitch period detection signal). FIGS. 7(b) and 7(c) show the waveforms of the two different audio signals contained in the original sound (hereinafter, original sound component 1 and original sound component 2). Original sound component 1 (b) is a 50 Hz sine wave, and original sound component 2 (c) is a 533 Hz sine wave.
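A simulated signal of this kind can be generated in a few lines, as sketched below. The frequencies and sampling rate come from the text; the equal amplitudes, zero starting phase, and function names are assumptions, since the text does not give them.

```python
import math

FS_HZ = 44100  # sampling rate stated in the text

def period_in_samples(freq_hz, fs_hz=FS_HZ):
    """Length of one cycle of a tone, in samples (not necessarily an integer)."""
    return fs_hz / freq_hz

def original_sound(n_samples, amp1=1.0, amp2=1.0, fs_hz=FS_HZ):
    """Sum of a 50 Hz and a 533 Hz sine wave (amplitudes are assumed values)."""
    return [
        amp1 * math.sin(2 * math.pi * 50 * i / fs_hz)
        + amp2 * math.sin(2 * math.pi * 533 * i / fs_hz)
        for i in range(n_samples)
    ]

# Usage: 0.2 seconds of signal and the two component periods.
signal = original_sound(8820)
print(period_in_samples(50), round(period_in_samples(533), 2))
```

The 50 Hz component has a period of 882 samples, which lies inside the evaluated range of 350 to 1400 samples. One cycle of the 533 Hz component is only about 82.7 samples, below that range, so only multiples of it (for example, five cycles, about 414 samples) can appear among the evaluated periods.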
FIG. 8(a) shows the waveform of the signal obtained by applying 1.41-times time axis expansion to the original sound by the PICOLA method, using the pitch period obtained from original sound component 2 (equivalent to 533 Hz). FIGS. 8(b) and 8(c) show the expanded signals corresponding to original sound component 1 and original sound component 2, respectively, contained in the expanded signal of FIG. 8(a).
From the viewpoint of preserving sound quality, the expanded waveform of FIG. 8(a) should be close to the waveform of the original sound (FIG. 7(a)) apart from the stretched time axis. However, as FIG. 8(a) shows, expanding the time axis with the pitch period best suited to original sound component 2 alone produces a waveform that differs greatly from that of the original sound. The reason, as FIG. 8(b) shows, is that the expanded waveform corresponding to the other component, original sound component 1 (the low-frequency signal that was not considered at all when choosing the pitch period for expansion), is heavily distorted.

FIG. 9(a), on the other hand, shows the waveform of the signal obtained by applying 1.41-times time axis expansion to the original sound by the PICOLA method, using the pitch period obtained from original sound component 1 (equivalent to 50 Hz). FIGS. 9(b) and 9(c) show the waveforms of the expanded signals corresponding to original sound component 1 and original sound component 2, respectively, contained in the expanded signal of FIG. 9(a).
In this case too, as in FIG. 8, the expanded waveform of FIG. 9(a) differs greatly from the waveform of the original sound. The reason, as FIG. 9(c) shows, is that the power (amplitude) of the expanded signal of the high-frequency original sound component 2, which was not considered at all when choosing the pitch period for expansion, is attenuated.
Such waveform differences (of FIGS. 8(a) and 9(a) relative to FIG. 7(a)) are also perceived audibly as a large degradation in sound quality.

Next, an example in which pitch period detection and time axis expansion were performed with the configuration of the audio signal processing device Z3 (FIG. 3) is described with reference to FIG. 10.
Here, the first filter (1) is a filter that extracts only original sound component 2, and the second filter (6) is a filter that extracts only original sound component 1.
For the detection of the plurality of pitch period candidates by the pitch period detection unit 2 in the first stage, the full range of pitch periods (all candidates, 350 to 1400 samples) was divided evenly into four sections, and in each section the candidate with the highest periodicity evaluation (the smallest evaluation value from expression (1) above) was detected, giving four candidates.
For the selection of the one pitch period by the pitch period selection unit 3 in the second stage, the candidate with the highest periodicity evaluation (the smallest evaluation value from expression (1) above) among the four candidates was selected as the one pitch period.
FIG. 10(a) shows the waveform of the signal obtained by applying 1.41-times time axis expansion to the original sound (FIG. 7(a)), using the pitch period obtained by the audio signal processing device Z3 through the two-stage pitch period detection (selection) with signal processing (filter processing).
FIGS. 10(b) and 10(c) show the waveforms of the expanded signals corresponding to original sound component 1 and original sound component 2, respectively, contained in the expanded signal of FIG. 10(a).
As FIGS. 10(a) to 10(c) show, applying the present invention yields an output waveform without large degradation relative to the waveform of the original sound. This is because, in the first stage, the first filter (1) extracts original sound component 2 from the original sound and a plurality of pitch period candidates corresponding to original sound component 2 are detected; then, from among those candidates, the one that best corresponds to original sound component 1 is selected. The selected pitch period therefore corresponds to both components in a well-balanced manner. Furthermore, providing intermediate-stage processing between the first stage and the second stage, as in the audio signal processing device Z4 (FIG. 4), makes it possible to select a pitch period that corresponds in a well-balanced manner to the audio signals of even more sound sources.
An audio signal obtained by compressing/expanding the original sound with such a pitch period (for example, FIG. 10(a)) also shows less audible degradation in sound quality than signals compressed/expanded using the results of conventional pitch period detection, such as those of FIGS. 8(a) and 9(a).

FIG. 11 shows (a) an example of the waveform of a music audio signal in which singing voice and instrument sounds are mixed, (b) a graph of the time variation of the pitch period detected for that signal by the conventional method, and (c) a graph of the time variation of the pitch period detected by the method of the present invention. In the graphs of FIGS. 11(b) and 11(c), the vertical axis is the detected pitch period (in samples) and the horizontal axis is time, expressed as a sample count on the basis of 44100 samples per second.
Here, the sampling rate of the audio signal is 44100 Hz, and the full range of pitch periods evaluated for periodicity during pitch period detection is 350 to 1400 samples (corresponding to the full candidate range N0 to N described above) for both the conventional method and the method of the present invention.
For the method of the present invention (FIG. 11(c)), the audio signal processing device Z1 (FIG. 1) was used, with an IIR filter having cutoff frequencies of 200 Hz and 8 kHz and a slope of -12 dB/oct as the first filter (1).
For the detection of the plurality of pitch period candidates by the pitch period detection unit 2 in the first stage, the full range of pitch periods (all candidates, 350 to 1400 samples) was divided evenly into four sections, and in each section the candidate with the highest periodicity evaluation (the smallest evaluation value from expression (1) above) was detected, giving four candidates.
For the selection of the one pitch period by the pitch period selection unit 3 in the second stage, the candidate with the highest periodicity evaluation (the smallest evaluation value from expression (1) above) among the four candidates was selected as the one pitch period.
With the conventional method (FIG. 11(b)), the detected pitch periods are widely scattered, with many near the upper and lower limits of the range. With the method of the present invention (FIG. 11(c)), the scatter is smaller, and the pitch period is extracted relatively smoothly and continuously around approximately 700 samples.
This shows that the filter (audio band limitation) causes a pitch period to be detected (selected) that corresponds to the singing voice while also being natural for the music audio signal as a whole, and it indicates objectively that the method of the present invention gives better perceived sound quality than the conventional method.

In the embodiments described above, band-limiting filter processing, which can separate the audio signals from a plurality of sound sources with a relatively light computational load, was applied as the signal processing in the first stage.
Other possible signal processing in the first stage includes signal processing that amplifies or attenuates the signal in a specific frequency band by frequency emphasis through equalizing, and signal processing that separates the audio signals of the plurality of sound sources contained in the input audio signal by a blind source separation method (BSS method).
In frequency emphasis by equalizing, for example, an amplification gain is set for the frequency band of a specific sound source, and the amplitude of the audio signal in that frequency band is amplified (emphasized) by applying FIR filter processing or the like. This makes it possible to obtain a pitch period corresponding to the audio signal of the specific sound source.
Sound source separation based on the BSS method is effective in that the sound sources are separated automatically, and the audio signal of each sound source is obtained, without specifying the frequency band of each sound source in advance. However, the computational load is larger.
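As an illustration of frequency emphasis by equalizing with an FIR filter, the following sketch designs a windowed-sinc band-pass filter and adds the filtered band back to the input with a gain. The design method (Hamming-windowed sinc), the tap count, the band edges, and the function names `bandpass_taps` and `emphasize_band` are all assumptions chosen for this example; the document does not specify a particular filter design.

```python
import math


def bandpass_taps(f_lo, f_hi, fs, num_taps=101):
    """Hamming-windowed sinc band-pass FIR taps; num_taps must be odd.

    f_lo, f_hi and fs are in Hz. The band-pass response is formed as the
    difference of two ideal low-pass responses.
    """
    mid = (num_taps - 1) // 2

    def lowpass(fc, n):
        x = n - mid
        if x == 0:
            return 2.0 * fc / fs
        return math.sin(2.0 * math.pi * fc * x / fs) / (math.pi * x)

    taps = []
    for n in range(num_taps):
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append((lowpass(f_hi, n) - lowpass(f_lo, n)) * window)
    return taps


def emphasize_band(signal, taps, gain):
    """Amplify the band selected by `taps` by roughly `gain`.

    The band-pass output is computed as a centered (zero-delay) convolution
    with zero padding at the edges, then mixed back into the input, so that
    components outside the band pass through almost unchanged.
    """
    mid = (len(taps) - 1) // 2
    out = []
    for n in range(len(signal)):
        band = 0.0
        for k, h in enumerate(taps):
            idx = n + mid - k
            if 0 <= idx < len(signal):
                band += h * signal[idx]
        out.append(signal[n] + (gain - 1.0) * band)
    return out


# Toy usage: emphasize 500-1500 Hz by a factor of 2 at an 8 kHz sampling rate.
fs = 8000
taps = bandpass_taps(500.0, 1500.0, fs)
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(800)]
emphasized = emphasize_band(tone, taps, gain=2.0)
```

Pitch period detection would then be run on the emphasized signal, so that the periodicity of the emphasized sound source dominates the evaluation.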

The present invention can be used for audio signal processing that performs time-axis compression and expansion of an audio signal.

FIG. 1 is a block diagram showing the schematic configuration of the audio signal processing device Z1 according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing the schematic configuration of the audio signal processing device Z2 according to the second embodiment of the present invention.
FIG. 3 is a block diagram showing the schematic configuration of the audio signal processing device Z3 according to the third embodiment of the present invention.
FIG. 4 is a block diagram showing the schematic configuration of the audio signal processing device Z4 according to the fourth embodiment of the present invention.
FIG. 5 is a diagram schematically showing the waveform of an audio signal when time-axis compression is performed by the PICOLA method.
FIG. 6 is a diagram schematically showing the waveform of an audio signal when time-axis expansion is performed by the PICOLA method.
FIG. 7 is a diagram showing an example of the waveform of an audio signal (original sound) used for time-axis compression and expansion processing.
FIG. 8 is a diagram showing an example of the waveform of the signal after time-axis expansion of the audio signal (original sound) shown in FIG. 7 by the conventional method.
FIG. 9 is a diagram showing an example of the waveform of the signal after time-axis expansion of the audio signal (original sound) shown in FIG. 7 by the conventional method.
FIG. 10 is a diagram showing an example of the waveform of the signal after time-axis expansion of the audio signal (original sound) shown in FIG. 7 by the method of the present invention.
FIG. 11 is a graph showing an example of the waveform of a music audio signal and the change over time of the pitch period detected for that signal by the conventional method and by the method of the present invention.

Explanation of symbols

Z1 to Z4 ... audio signal processing device
1, 11 to 1N, 6 ... filter
2 ... pitch period detection unit
3 ... pitch period selection unit
4 ... signal compression/expansion unit
5 ... synthesized signal generation unit
20 ... pitch period candidate intermediate selection unit
21 to 2N ... pitch period intermediate selection unit

Claims (5)

1. An audio signal processing apparatus comprising:
first signal processing means for generating an audio signal by separating an audio signal of a first predetermined frequency band from a single input audio signal in which sounds from a plurality of sound sources are mixed, or from a synthesized audio signal of input audio signals of a plurality of channels;
second signal processing means for generating an audio signal by separating an audio signal of a second predetermined frequency band, different from the first predetermined frequency band, from the single input audio signal in which sounds from a plurality of sound sources are mixed, or from the synthesized audio signal of input audio signals of a plurality of channels;
first pitch period candidate detecting means for detecting candidates for a pitch period, the pitch period being a frequency component constituting a specific audio signal included in the first predetermined frequency band separated by the first signal processing means, which samples the input audio signal or the synthesized audio signal at a predetermined first pitch-period sampling period and detects one or more pitch period candidates in ascending order of the difference in signal intensity between the sampled signal of one period and the sampled signal of another period;
second pitch period candidate detecting means which samples a specific audio signal included in the second predetermined frequency band separated by the second signal processing means at a predetermined second pitch-period sampling period different from the first pitch-period sampling period, and detects one or more second pitch period candidates in ascending order of the difference in signal intensity between the sampled signal of one period and the sampled signal of another period;
pitch period selecting means for selecting the one pitch period with the strongest periodicity from among the candidates common to the one or more first pitch period candidates and the one or more second pitch period candidates; and
time axis adjusting means for compressing and/or expanding the time axis of the input audio signal based on the one pitch period selected by the pitch period selecting means.
2. The audio signal processing apparatus according to claim 1, further comprising:
third signal processing means for separating audio signals of one or more third predetermined frequency bands, different from the first predetermined frequency band and the second predetermined frequency band, from the single input audio signal from a plurality of sound sources or from the synthesized audio signal of input audio signals of a plurality of channels; and
pitch period narrowing means which samples the audio signal of the third predetermined frequency band separated by the third signal processing means at a fourth predetermined pitch-period sampling period, detects a plurality of third candidates of the pitch period detection signal by treating periods as strongly periodic in ascending order of the difference in signal intensity between the sampled signal of one period and the sampled signal of another period, selects as third candidates for the pitch period a plurality of strongly periodic periods corresponding to both the plurality of first candidates and the plurality of third candidates, and thereby narrows down the plurality of pitch period candidates,
wherein the pitch period selecting means selects the one pitch period from among the candidates narrowed down by the pitch period narrowing means.
3. The audio signal processing apparatus according to claim 1 or 2, wherein the first signal processing separates the audio signal of the first predetermined frequency band by means of a band-limiting filter.
4. The audio signal processing apparatus according to claim 3, wherein the second signal processing and/or the third signal processing separates an audio signal of a frequency band different from the first predetermined frequency band by means of a band-limiting filter.
5. An audio signal processing method comprising:
a first signal processing step of generating an audio signal by separating an audio signal of a first predetermined frequency band from a single input audio signal in which sounds from a plurality of sound sources are mixed, or from a synthesized audio signal of input audio signals of a plurality of channels;
a second signal processing step of generating an audio signal by separating an audio signal of a second predetermined frequency band, different from the first predetermined frequency band, from the single input audio signal in which sounds from a plurality of sound sources are mixed, or from the synthesized audio signal of input audio signals of a plurality of channels;
a first pitch period candidate detecting step of detecting candidates for a pitch period, the pitch period being a frequency component constituting a specific audio signal included in the first predetermined frequency band separated by the first signal processing step, in which the input audio signal or the synthesized audio signal is sampled at a predetermined first pitch-period sampling period and one or more pitch period candidates are detected in ascending order of the difference in signal intensity between the sampled signal of one period and the sampled signal of another period;
a second pitch period candidate detecting step in which a specific audio signal included in the second predetermined frequency band separated by the second signal processing step is sampled at a predetermined second pitch-period sampling period different from the first pitch-period sampling period, and one or more second pitch period candidates are detected in ascending order of the difference in signal intensity between the sampled signal of one period and the sampled signal of another period;
a pitch period selecting step of selecting the one pitch period with the strongest periodicity from among the candidates common to the one or more first pitch period candidates and the one or more second pitch period candidates; and
a time axis adjusting step of compressing and/or expanding the time axis of the input audio signal based on the one pitch period selected by the pitch period selecting step.
JP2004243882A 2004-08-24 2004-08-24 Audio signal processing apparatus and method Expired - Fee Related JP4471780B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2004243882A JP4471780B2 (en) 2004-08-24 2004-08-24 Audio signal processing apparatus and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2004243882A JP4471780B2 (en) 2004-08-24 2004-08-24 Audio signal processing apparatus and method

Publications (2)

Publication Number Publication Date
JP2006064755A JP2006064755A (en) 2006-03-09
JP4471780B2 true JP4471780B2 (en) 2010-06-02

Family

ID=36111337

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2004243882A Expired - Fee Related JP4471780B2 (en) 2004-08-24 2004-08-24 Audio signal processing apparatus and method

Country Status (1)

Country Link
JP (1) JP4471780B2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5228300B2 (en) * 2006-08-10 2013-07-03 カシオ計算機株式会社 Audio expansion device, audio expansion method, and program
WO2009025142A1 (en) * 2007-08-22 2009-02-26 Nec Corporation Speaker speed conversion system, its method and speed conversion device

Also Published As

Publication number Publication date
JP2006064755A (en) 2006-03-09

Similar Documents

Publication Publication Date Title
CA2253749C (en) Method and device for instantly changing the speed of speech
JP5255663B2 (en) Speech gain control using auditory event detection based on specific loudness
JP2004527000A (en) High quality time scaling and pitch scaling of audio signals
Lee et al. The effect of loudness on the reverberance of music: Reverberance prediction using loudness models
JP6174856B2 (en) Noise suppression device, control method thereof, and program
US20170353170A1 (en) Intelligent Method And Apparatus For Spectral Expansion Of An Input Signal
JP2005266797A (en) Method and apparatus for separating sound-source signal and method and device for detecting pitch
US8635077B2 (en) Apparatus and method for expanding/compressing audio signal
JP2009296298A (en) Sound signal processing device and method
EP3772224A1 (en) Vibration signal generation apparatus and vibration signal generation program
JP4471780B2 (en) Audio signal processing apparatus and method
JP2008072600A (en) Acoustic signal processing apparatus, acoustic signal processing program, and acoustic signal processing method
JP2008048342A (en) Sound acquisition apparatus
CN108604454A (en) Audio signal processor and input audio signal processing method
JP2005031169A (en) Sound signal processing device, method therefor and program therefor
JPH06289898A (en) Speech signal processor
JP2007033804A (en) Sound source separation device, sound source separation program, and sound source separation method
JP6321334B2 (en) Signal processing apparatus and program
JP3197975B2 (en) Pitch control method and device
JP2010277023A (en) Telephone voice section detector and program of the same
JP3185363B2 (en) hearing aid
JP2002236499A (en) Music signal compressor, music signal compander and music signal preprocessing controller
JP2006220806A (en) Audio signal processor, audio signal processing program and audio signal processing method
JP3102553B2 (en) Audio signal processing device
KR100870870B1 (en) High quality time-scaling and pitch-scaling of audio signals

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20060925

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20090812

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20090901

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20091102

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20091201

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20100128

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20100302

A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20100302

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130312

Year of fee payment: 3

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20140312

Year of fee payment: 4

LAPS Cancellation because of no payment of annual fees