JP5201259B2

JP5201259B2 - Apparatus for detecting basic period of speech and apparatus for converting speech speed using the basic period

Info

Publication number: JP5201259B2
Application number: JP2011282994A
Authority: JP
Inventors: 次男伊藤
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2011-12-26
Filing date: 2011-12-26
Publication date: 2013-06-05
Anticipated expiration: 2027-08-09
Also published as: JP2012058764A

Description

The present invention relates to a technique for detecting the fundamental period of speech and for performing speech speed conversion using that fundamental period.

In recent years, mobile phones aimed at user groups different from the conventional ones have begun to appear on the market in order to broaden the user base. One example is a mobile phone for elderly people who are unaccustomed to key operations, in which the number of operation keys is reduced to the bare minimum. In such a phone it is convenient to provide a speech speed conversion function so that the call voice is easier to hear. In an ordinary mobile phone, too, it is convenient to be able to convert the speech speed, for example by slowing it down, so that important call content is not misheard. One technique for realizing such speech speed conversion is disclosed in Non-Patent Document 1, which describes an algorithm called PICOLA that realizes speech speed conversion by inserting or deleting waveforms in units of the fundamental period.

One method for detecting the fundamental period of speech is the autocorrelation calculation method: phase shift processing is applied to the voice data while the phase shift amount is varied, a correlation value between the voice data before the shift and the voice data after the shift is calculated, and the time length corresponding to the phase shift amount at which the correlation value peaks is detected as the fundamental period of the speech represented by the voice data. For example, when the voice data is a sequence of samples of a voice signal, the correlation value between that sequence and the same sequence shifted by n samples (n being a natural number) is calculated for varying values of n, and the fundamental period is detected by identifying the value of n at which the correlation value peaks.

Non-Patent Document 1: Naotaka Morita and Fumitada Itakura, "Time-axis expansion and compression of speech using the pointer-movement-controlled overlap-add method (PICOLA) and its evaluation", Proceedings of the Acoustical Society of Japan, pp. 149-150, October 1986.
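The lag search described in the background can be sketched as follows. This is a minimal illustration, not the implementation of the patent or of Non-Patent Document 1: the function name `detect_period`, the lag bounds, and the use of a mean absolute difference as the "correlation value" are all assumptions. The difference-based score is chosen because it is smaller when the similarity is higher, which is the convention the description uses for its figures.

```python
import math

def detect_period(samples, min_lag, max_lag):
    """Return the shift n (in samples) at which the signal best matches itself.

    The score is the mean absolute difference between the signal and its
    copy shifted by n samples, so a SMALLER score means a HIGHER similarity.
    """
    best_lag, best_score = None, float("inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag
        if n <= 0:
            break  # shift is longer than the data; nothing left to compare
        score = sum(abs(samples[i] - samples[i + lag]) for i in range(n)) / n
        if score < best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic check: a 100 Hz tone sampled at 8 kHz has a period of 80 samples.
fs = 8000
tone = [math.sin(2 * math.pi * 100 * i / fs) for i in range(800)]
period = detect_period(tone, min_lag=20, max_lag=120)
period_seconds = period / fs  # time length corresponding to the shift
```

The search range is deliberately bounded (here 20 to 120 samples) so that integer multiples of the true period are not considered; a practical detector would pick bounds from the expected pitch range of speech.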

In voice calls using mobile phones, the voice data representing the call voice is generally compressed before transmission and reception in order to make effective use of the limited communication bandwidth. Specifically, a mobile phone compresses the voice data to be sent to the other party with a voice codec and transmits it, while it decompresses the compressed voice data received from the other party with the same voice codec to restore the voice data, and reproduces the voice according to that data. To raise the compression ratio, a voice codec performs so-called lossy (irreversible) compression. The voice data as it was before compression in the transmitting phone therefore cannot be restored completely in the receiving phone. For example, voice data that has been compressed and decompressed by a voice codec tends to have its higher-order harmonic components emphasized compared with the voice data before compression. This emphasis has the advantage that the features of the voice stand out and become easier to hear. On the other hand, when the fundamental period is detected as described above, the period of an emphasized higher-order harmonic may be erroneously detected as the fundamental period.

FIG. 6(a) shows the result of fundamental period detection by the autocorrelation calculation method for voice data that has not been compressed and decompressed by a voice codec, and FIG. 6(b) shows the result when the same voice data is compressed and decompressed by a voice codec and then subjected to the same detection processing. In FIGS. 6(a) and 6(b), the horizontal axis represents the phase shift amount in units of samples and the vertical axis represents the correlation value (in these figures, a smaller correlation value means a higher correlation, that is, a higher similarity). As is apparent from a comparison of FIG. 6(a) and FIG. 6(b), the fundamental period detected in the latter is shorter than in the former, because the higher-order harmonic components are emphasized in the latter. Thus, for voice data that has been compressed and decompressed by a voice codec, it becomes difficult to detect the fundamental period accurately, and speech speed conversion may be impaired, for example by noise caused by adding or deleting waveforms in units of an erroneously detected fundamental period.

The present invention has been made in view of the above problem. Its first object is to provide a technique that makes it possible to accurately detect the fundamental period of voice data that has been compressed and decompressed by a voice codec and whose higher-order harmonic components are emphasized. Its second object is to provide a technique that makes it possible to avoid the generation of noise in speech speed conversion of voice data whose higher-order harmonic components are emphasized.

To achieve the first object, the present invention provides a fundamental period detection device comprising: a filter that receives voice data, removes higher-order harmonic components from the voice data, and outputs the result; and a fundamental period detection unit that applies phase shift processing to the output data of the filter while varying the phase shift amount, calculates a correlation value between the output data before the phase shift and the output data after the phase shift, and detects, as the fundamental period of the speech represented by the voice data, the time length corresponding to the phase shift amount at which the correlation value peaks. With such a device, the higher-order harmonic components have already been removed by the filter from the voice data subject to detection, so the fundamental period can be detected accurately.

In a more preferred aspect, the filter is a low-pass filter or a band-pass filter that blocks the passage of band components having frequencies higher than a predetermined cutoff frequency.

In another preferred aspect, the filter of the fundamental period detection device is a filter with variable filter characteristics, and the device further comprises: specifying means for identifying, among the higher-order harmonic components contained in the voice data, the frequency components that were subjected to emphasis processing at the transmission source of the voice data; and a control unit that sets in the filter a filter characteristic that suppresses the frequency components identified by the specifying means.

In still another preferred aspect, the filter of the fundamental period detection device is a filter with variable filter characteristics, and the device further comprises: storage means storing a plurality of items of frequency characteristic data, each representing the frequency characteristic of speech of a different voice quality as the signal strength in each of the bands obtained by dividing the spectrum in octave units; analyzing means for analyzing the voice data supplied to the filter and generating the frequency characteristic data for the speech represented by that voice data; and a control unit that compares the frequency characteristic data generated by the analyzing means with each of the plurality of items of frequency characteristic data stored in the storage means, identifies the item with the smallest deviation, and controls the filter characteristic of the filter according to the difference between the generated frequency characteristic data and the identified frequency characteristic data.

To achieve the second object, the present invention provides a speech speed conversion apparatus comprising: the fundamental period detection device according to any of the above aspects; and a speech speed conversion unit that supplies received voice data to the fundamental period detection device to have the fundamental period detected, inserts or deletes waveforms in the received voice data in units of the fundamental period detected by the fundamental period detection device, and outputs the result. With such an apparatus, waveforms are inserted or deleted in units of an accurate fundamental period detected from voice data from which the higher-order harmonics have been removed, so problems such as the noise generation described above are avoided.

The best mode for carrying out the present invention is described below with reference to the drawings.

(A: First Embodiment)

FIG. 1 is a block diagram showing a configuration example of a speech speed conversion apparatus 1A according to the first embodiment of the present invention. The speech speed conversion apparatus 1A is built into, for example, a mobile phone. It applies speech speed conversion to the voice data supplied from the voice codec 2 of that mobile phone and supplies the result to a voice output system 3, which includes a D/A converter and a speaker (neither shown), so that the speed-converted voice is output. As shown in FIG. 1, the speech speed conversion apparatus 1A has a filter 10A, a fundamental period detection unit 20A, and a speech speed conversion unit 30A.

The filter 10A is, for example, a low-pass filter (hereinafter, LPF). It removes band components having frequencies higher than a predetermined cutoff frequency from the voice data supplied by the voice codec 2 and outputs the result. The cutoff frequency of the filter 10A may be chosen by suitable experiment, according to how well the higher-order harmonic components that affect detection of the fundamental period are removed. In this embodiment the filter 10A is an LPF, but it may of course be a band-pass filter instead.
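As an illustration of what a filter such as the filter 10A might look like in software, the sketch below implements a first-order and a second-order digital low-pass filter. The description does not specify any filter design, so everything here is an assumption: the function names, the one-pole smoothing form of the first-order filter, and the second-order coefficients, which follow the widely used biquad low-pass formulas (Butterworth response when `q` is 1/sqrt(2)). The 8 kHz sampling rate is likewise only an example.

```python
import math

def one_pole_lowpass(samples, fc, fs):
    """First-order low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * fc / fs)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

def biquad_lowpass(samples, fc, fs, q=1 / math.sqrt(2)):
    """Second-order low-pass (Butterworth response when q = 1/sqrt(2))."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    # Coefficients normalised so that the leading denominator term is 1.
    b0 = (1.0 - cos_w0) / 2.0 / a0
    b1 = (1.0 - cos_w0) / a0
    b2 = b0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Compare how a 400 Hz cutoff treats a 100 Hz tone and a 2 kHz tone.
fs = 8000
low = [math.sin(2 * math.pi * 100 * i / fs) for i in range(1600)]
high = [math.sin(2 * math.pi * 2000 * i / fs) for i in range(1600)]
low_out = rms(biquad_lowpass(low, 400, fs)[400:])    # skip start-up transient
high_out = rms(biquad_lowpass(high, 400, fs)[400:])
```

With these settings the 100 Hz tone passes almost unchanged while the 2 kHz tone is strongly attenuated, and the second-order filter attenuates the high tone more than the first-order one does.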

The fundamental period detection unit 20A is an electronic circuit that detects the fundamental period of the speech represented by the voice data from which the filter 10A has removed the band components above the cutoff frequency, and supplies data representing that fundamental period (hereinafter, fundamental period data) to the speech speed conversion unit 30A. An existing algorithm can be used for fundamental period detection in the fundamental period detection unit 20A; this embodiment adopts the detection algorithm based on the autocorrelation calculation method described above. In more detail, the fundamental period detection unit 20A applies the sample-unit phase shift processing described above to the voice data supplied from the filter 10A, calculates the correlation value between the voice data before the phase shift and the voice data after the phase shift, and detects, as the fundamental period of the speech represented by the voice data, the time length corresponding to the phase shift amount at which the correlation value peaks (in this embodiment, is at its minimum).

Whether the fundamental period is taken from the phase shift amount that minimizes the correlation value or from the one that maximizes it depends on how the correlation value is defined, that is, on whether a smaller value or a larger value indicates a higher correlation.

The speech speed conversion unit 30A is an electronic circuit that applies time-axis compression and expansion to the voice data supplied from the voice codec 2 according to a predetermined speech speed conversion algorithm (for example, the PICOLA algorithm mentioned above) and supplies the resulting voice data to the voice output system 3. An operation unit (not shown) of the mobile phone is connected to the speech speed conversion unit 30A. When an instruction to slow down the speech speed, together with the degree of slowing, is given via the operation unit, the speech speed conversion unit 30A inserts into the voice data supplied from the voice codec a number of waveforms corresponding to the instructed degree, in units of the fundamental period, and outputs the result. Conversely, when an instruction to speed up the speech speed, together with the degree, is given via the operation unit, it deletes from the voice data supplied from the voice codec a number of waveforms corresponding to the instructed degree, in units of the fundamental period, and outputs the result. Speech speed conversion is applied not to the output data of the filter 10A but to the voice data output from the voice codec 2 (that is, the voice data in which the higher-order harmonic components are emphasized). The reason is to perform speech speed conversion while keeping the advantage obtained from the emphasis of the higher-order harmonic components, namely that the features of the voice are emphasized and the voice is easier to hear.

The above is the configuration of the speech speed conversion apparatus 1A.
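The insertion and deletion of waveforms in units of the fundamental period can be illustrated with the simplified sketch below. It is not the PICOLA algorithm itself, whose pointer control decides how often such an operation is applied; it only shows one deletion and one insertion step, using a linear cross-fade between two adjacent periods so that the waveform stays continuous at the joins. The function names and the choice of a linear fade are assumptions made for illustration, and the period is taken as an input, as it would be when supplied by a separate detector.

```python
import math

def crossfade(head, tail):
    """Blend two equal-length segments: starts like `head`, ends like `tail`."""
    n = len(head)
    return [head[i] * (1 - i / n) + tail[i] * (i / n) for i in range(n)]

def delete_period(samples, start, period):
    """Replace two adjacent periods by one blended period (faster speech)."""
    a = samples[start:start + period]
    b = samples[start + period:start + 2 * period]
    # The blend starts like A (matching what came before) and ends like B
    # (matching what comes after), so one period is removed without a jump.
    return samples[:start] + crossfade(a, b) + samples[start + 2 * period:]

def insert_period(samples, start, period):
    """Insert one blended period between two adjacent periods (slower speech)."""
    a = samples[start:start + period]
    b = samples[start + period:start + 2 * period]
    # The inserted segment starts like B (continuing from the end of A)
    # and ends like A (leading into the start of B).
    return samples[:start + period] + crossfade(b, a) + samples[start + period:]

# Example on a perfectly periodic signal with a period of 80 samples.
period = 80
x = [math.sin(2 * math.pi * i / period) for i in range(400)]
shorter = delete_period(x, start=80, period=period)
longer = insert_period(x, start=80, period=period)
```

Because the example signal is exactly periodic, both results are still the same sine wave, only one period shorter or longer. If the period passed in were wrong, the two blended segments would not line up, which is the kind of artifact the description attributes to an erroneously detected fundamental period.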

With the configuration described above, the speech speed conversion apparatus 1A according to this embodiment detects the fundamental period from voice data from which the band components above the cutoff frequency have been removed. If the cutoff frequency is chosen appropriately, it is therefore possible to remove the higher-order harmonic components emphasized by the voice codec of the other party's mobile phone and to detect an accurate fundamental period. FIGS. 2 and 3 illustrate the effect of this embodiment; they show the results of applying the autocorrelation calculation to the same voice data as in FIG. 6(b). FIG. 2(a) shows the relationship between the autocorrelation value and the phase shift amount when a first-order LPF with a cutoff frequency of 400 Hz is used as the filter 10A, and FIG. 2(b) shows the same relationship when a second-order LPF with a cutoff frequency of 400 Hz is used. As is apparent from a comparison of FIG. 6(b) with FIGS. 2(a) and 2(b), in the results shown in FIGS. 2(a) and 2(b) the influence of the higher-order harmonic components is reduced and the fundamental period can be detected almost accurately.

In FIG. 2(b) the influence of the higher-order harmonic components is reduced further than in FIG. 2(a), and even in comparison with FIG. 6(a) it has almost disappeared. When an LPF is used as the filter 10A, a second-order LPF is therefore preferable to a first-order LPF.

FIG. 3(a) shows the relationship between the autocorrelation value and the phase shift amount when a second-order LPF with a cutoff frequency of 600 Hz is used as the filter 10A, and FIG. 3(b) shows the same relationship when a second-order LPF with a cutoff frequency of 800 Hz is used. In the results shown in FIGS. 3(a) and 3(b), too, the influence of the higher-order harmonic components has almost disappeared. Accordingly, when a second-order LPF is used as the filter 10A, a cutoff frequency at least within the range of 400 to 800 Hz allows the fundamental period to be obtained almost accurately and noise during speech speed conversion to be avoided. In this embodiment, the filter 10A is an LPF that blocks band components above a predetermined cutoff frequency. However, the cutoff frequency may be switched according to an instruction entered via the operation unit of the mobile phone, and whether the filter 10A operates as a first-order or a second-order LPF may also be switched according to such an instruction.

(B: Second Embodiment)

FIG. 4 is a block diagram showing a configuration example of a speech speed conversion apparatus 1B according to the second embodiment of the present invention. As is apparent from a comparison of FIG. 4 with FIG. 1, the speech speed conversion apparatus 1B differs from the speech speed conversion apparatus 1A in that it has a control unit 40B and in that it has a filter 10B in place of the filter 10A.

The filter 10B is a filter with variable filter characteristics, which are controlled by the control unit 40B. The control unit 40B includes, for example, a CPU (Central Processing Unit), a non-volatile memory such as a flash ROM in which a control program that causes the CPU to execute the filter characteristic control processing is written, and a RAM, which is a volatile memory used as a work area when the control program is executed (none shown). Operating according to the control program, the CPU acquires, from the transmission source of the compressed voice data that the voice codec 2 decompresses, characteristic data indicating which higher-order harmonic components are emphasized in that compressed voice data, and sets the filter characteristic of the filter 10B according to that characteristic data. In more detail, the CPU sets in the filter 10B a filter characteristic that suppresses the higher-order harmonic components indicated by the characteristic data (for example, by giving the filter 10B a shelving-type filter characteristic and adjusting its cutoff frequency and gain).
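A shelving-type characteristic with adjustable cutoff frequency and gain, as mentioned for the filter 10B, could be realised in software roughly as sketched below. The description gives no formulas, so this is only one possible design: the coefficients follow the high-shelf biquad formulas from the widely circulated "Audio EQ Cookbook", and the function names, the shelf slope of 1, and the example values (800 Hz corner, 12 dB cut, 8 kHz sampling) are assumptions. A negative gain makes the shelf attenuate the band above the corner frequency.

```python
import cmath
import math

def high_shelf(fc, fs, gain_db, slope=1.0):
    """Return normalised biquad coefficients (b0, b1, b2), (a1, a2)."""
    a_lin = 10 ** (gain_db / 40)          # square root of the linear shelf gain
    w0 = 2 * math.pi * fc / fs
    cos_w0, sin_w0 = math.cos(w0), math.sin(w0)
    alpha = sin_w0 / 2 * math.sqrt((a_lin + 1 / a_lin) * (1 / slope - 1) + 2)
    k = 2 * math.sqrt(a_lin) * alpha
    b0 = a_lin * ((a_lin + 1) + (a_lin - 1) * cos_w0 + k)
    b1 = -2 * a_lin * ((a_lin - 1) + (a_lin + 1) * cos_w0)
    b2 = a_lin * ((a_lin + 1) + (a_lin - 1) * cos_w0 - k)
    a0 = (a_lin + 1) - (a_lin - 1) * cos_w0 + k
    a1 = 2 * ((a_lin - 1) - (a_lin + 1) * cos_w0)
    a2 = (a_lin + 1) - (a_lin - 1) * cos_w0 - k
    return (b0 / a0, b1 / a0, b2 / a0), (a1 / a0, a2 / a0)

def gain_at(b, a, freq, fs):
    """Magnitude of the filter's frequency response at `freq` hertz."""
    z1 = cmath.exp(-1j * 2 * math.pi * freq / fs)   # z^-1 on the unit circle
    z2 = z1 * z1
    return abs((b[0] + b[1] * z1 + b[2] * z2) / (1 + a[0] * z1 + a[1] * z2))

# A control unit could re-compute the coefficients whenever new
# characteristic data arrives; here: cut everything above 800 Hz by 12 dB.
fs = 8000
b, a = high_shelf(fc=800, fs=fs, gain_db=-12.0)
dc_gain = gain_at(b, a, 0, fs)             # low frequencies pass unchanged
nyquist_gain = gain_at(b, a, fs / 2, fs)   # top of the band is cut by 12 dB
```

Unlike a plain low-pass filter, a shelf leaves a controlled amount of the upper band in place, which fits the idea of suppressing only as much emphasis as the sending codec introduced.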

One example of the characteristic data is data indicating the input/output characteristics of the voice codec built into the mobile phone that is the transmission source of the compressed voice data. If these input/output characteristics are known, it is possible to identify which higher-order harmonic components were emphasized by the transmitting-side voice codec, and therefore to set in the filter 10B a filter characteristic that suppresses those components. The characteristic data may be exchanged using, for example, a control channel of the mobile telephone network; in a voice call between mobile phones that can perform voice communication and packet communication in parallel, it may instead be exchanged by packet communication. In this way, the speech speed conversion apparatus 1B sets in the filter 10B a filter characteristic that matches the input/output characteristics of the voice codec installed in the other party's mobile phone.

In recent years many types of mobile phones have come onto the market, and the voice codecs installed in them are equally varied. The input/output characteristics of a voice codec generally differ according to its type. With the speech speed conversion apparatus 1B according to this embodiment, however, a filter characteristic that matches the input/output characteristics of the voice codec of the other party's mobile phone is set in the filter 10B. The fundamental period of the call voice can therefore be detected accurately whatever type of voice codec the other party's mobile phone has, and noise in speech speed conversion processing that uses this fundamental period can be avoided.

In this embodiment, the characteristic data indicating the input/output characteristics of the voice codec of the other party's mobile phone is acquired from that mobile phone. If the input/output characteristics can be determined from, for example, the telephone number, the characteristic data need not be acquired from the other party's mobile phone. For example, if the input/output characteristics of the installed voice codec differ for each telecommunications carrier that provides mobile telephone service, each telephone number stored in the phone book table of the mobile phone may be associated with data representing a filter characteristic that corresponds to the input/output characteristics of the voice codec installed in mobile phones for the carrier identified by that number. The control unit 40B then identifies the filter characteristic corresponding to the other party's telephone number from the contents of the phone book table and sets it in the filter 10B.

In the second embodiment, the filter characteristic of the filter 10B is set based on the input/output characteristics of the voice codec installed in the other party's mobile phone. The input/output characteristics of the voice codec installed in the mobile phone in which the speech speed conversion apparatus 1B itself is mounted (the receiving-side mobile phone) may additionally be taken into account when the filter characteristic of the filter 10B is set.
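The phone book variant can be pictured as a simple table lookup. The sketch below is purely illustrative: the table layout, the field names, the telephone numbers, and the preset values are all invented for the example, and the description does not say how the filter characteristic data is encoded or what happens for an unlisted number (the fallback preset here is an assumption).

```python
# Fallback used when the caller is not in the phone book (an assumption;
# the description does not cover this case).
DEFAULT_PRESET = {"cutoff_hz": 400.0, "gain_db": -12.0}

# Hypothetical phone book table: each entry carries the filter preset that
# matches the codec assumed for that number's carrier.
PHONE_BOOK = {
    "090-0000-0001": {"name": "Contact A",
                      "filter": {"cutoff_hz": 600.0, "gain_db": -9.0}},
    "080-0000-0002": {"name": "Contact B",
                      "filter": {"cutoff_hz": 800.0, "gain_db": -6.0}},
}

def filter_preset_for(number, phone_book=PHONE_BOOK):
    """Return the filter preset stored for `number`, or the default preset."""
    entry = phone_book.get(number)
    if entry is None:
        return DEFAULT_PRESET
    return entry.get("filter", DEFAULT_PRESET)

# At call set-up the control unit would look up the other party's number
# and hand the preset to the variable filter.
preset = filter_preset_for("090-0000-0001")
unknown = filter_preset_for("070-0000-0003")
```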

(C: Third Embodiment)

FIG. 5 is a block diagram showing a configuration example of a speech speed conversion apparatus 1C according to the third embodiment of the present invention. As is apparent from a comparison of FIG. 5 with FIG. 4, the speech speed conversion apparatus 1C differs from the speech speed conversion apparatus 1B in that a control unit 40C is provided in place of the control unit 40B and in that a storage unit 50C is newly provided. The storage unit 50C is a non-volatile memory such as a flash ROM. It stores frequency characteristic data that represents the frequency characteristic of speech as the signal strength in each of the frequency bands obtained by dividing the spectrum in octave units. Specifically, the storage unit 50C stores first frequency characteristic data representing the frequency characteristic of a standard male voice and second frequency characteristic data representing the frequency characteristic of a standard female voice. Because the first and second frequency characteristic data represent the frequency characteristic only as signal strengths in octave-wide bands, they cannot express the acoustic features of a voice in detail, but they can express the general features of a standard male voice and a standard female voice.

The control unit 40C has the same hardware configuration as the control unit 40B, but a control program different from that of the control unit 40B is written in advance in its non-volatile memory. This control program causes the CPU of the control unit 40C to execute the following two processes. The first process analyzes the voice data received from the voice codec 2 (that is, the voice data supplied to the filter 10B) and generates the frequency characteristic data for the speech it represents, that is, data expressing the frequency characteristic of that speech as the signal strength in each of the octave-wide frequency bands. The second process compares the frequency characteristic data generated in the first process with each of the two kinds of frequency characteristic data stored in the storage unit 50C, identifies the one with the smaller deviation, and sets in the filter 10B a filter characteristic that depends on the difference between the generated and the identified frequency characteristic data (specifically, a filter characteristic that reduces that difference).

For example, when the second process determines that the deviation from the second frequency characteristic data is smaller than the deviation from the first frequency characteristic data, the control unit 40C sets in the filter 10B a filter characteristic that reduces the difference between the second frequency characteristic data and the frequency characteristic data generated in the first process. Specifically, when, for a given frequency band among the octave-wide frequency bands, the signal intensity in the frequency characteristic data generated in the first process exceeds the signal intensity of the second frequency characteristic data in the same band by a predetermined threshold or more, the control unit 40C sets in the filter 10B a filter characteristic that attenuates the signal in that frequency band.
As described above, the first frequency characteristic data represent the frequency characteristics of a standard male voice, and the second frequency characteristic data represent those of a standard female voice. Determining which of the two has the smaller deviation from the frequency characteristic data generated in the first process therefore identifies whether the other party is male or female. Setting in the filter 10B a filter characteristic that reduces the difference between the frequency characteristic data identified in this way and the frequency characteristic data generated in the first process causes the band components of the voice data received from the voice codec 2 that are emphasized relative to a standard male (or female) voice (that is, the high-order harmonic components emphasized by the other party's voice codec) to be suppressed by the filter 10B before the voice data is given to the fundamental period detection unit 20A. In other words, the speech speed conversion apparatus 1C according to the present embodiment sets in the filter 10B a filter characteristic suited to the voice quality of the other party and removes the high-order harmonic components that hinder detection of the fundamental period. In the present embodiment, frequency characteristic data for two voice qualities, a standard male voice and a standard female voice, are stored in the storage unit 50C, but frequency characteristic data representing the frequency characteristics of three or more mutually different voice qualities may be stored instead.
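The second process can be sketched as follows. The description specifies neither the deviation measure nor the amount of attenuation, so the sum of squared per-band differences, the 6 dB threshold, the rule of attenuating by exactly the excess, and the names `choose_reference` and `band_gains_db` are assumptions for illustration only.

```python
def choose_reference(measured, references):
    """Return the name of the stored profile with the smallest deviation.

    Deviation is taken here as the sum of squared per-band differences,
    which is an assumed measure.
    """
    def deviation(ref):
        return sum((m - r) ** 2 for m, r in zip(measured, ref))

    return min(references, key=lambda name: deviation(references[name]))


def band_gains_db(measured, reference, threshold_db=6.0):
    """Return a per-band gain in dB for the variable filter.

    A band whose measured level exceeds the reference by at least
    threshold_db is attenuated by the excess; other bands pass unchanged.
    """
    return [-(m - r) if (m - r) >= threshold_db else 0.0
            for m, r in zip(measured, reference)]


# Hypothetical per-octave levels in dB, invented for this example.
stored = {
    "male": [60.0, 55.0, 50.0, 40.0, 30.0],
    "female": [50.0, 58.0, 52.0, 45.0, 35.0],
}
measured = [51.0, 57.0, 53.0, 55.0, 36.0]

closest = choose_reference(measured, stored)
gains = band_gains_db(measured, stored[closest])
```

In this invented example the measured profile is closest to the "female" entry, and only the fourth band, which exceeds the reference by 10 dB, is attenuated.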

(D: Other embodiments)
Although the embodiments of the present invention have been described above, the following modifications may be made to them.
(1) In the embodiments described above, noise in speech speed conversion using the fundamental period (that is, insertion or deletion of waveforms in units of the fundamental period) is avoided by accurately detecting the fundamental period of the voice represented by the voice data decompressed by the voice codec. However, voice processing that can use the fundamental period (or the fundamental frequency, which is its reciprocal) is not limited to speech speed conversion, and it goes without saying that accurate detection is equally desirable in other voice processing that uses the fundamental period or fundamental frequency. Accordingly, the filter 10A and the fundamental period detection unit 20A described above may be combined to form a fundamental period detection device (or fundamental frequency detection device), and this device may be incorporated into a signal processing apparatus that performs such other voice processing. Similarly, a fundamental period detection device may be formed by combining the filter 10B, the fundamental period detection unit 20A, and the control unit 40B of the second embodiment, or by combining the filter 10B, the fundamental period detection unit 20A, the control unit 40C, and the storage unit 50C of the third embodiment.
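A standalone fundamental period detection device of the kind described in (1) can be sketched as a low-pass filter followed by an autocorrelation search. This is a minimal illustration, not the patented implementation: the one-pole filter, the normalized correlation score, the lag range, and the function names are assumptions.

```python
import math


def one_pole_lowpass(samples, sample_rate, cutoff_hz):
    """Apply a simple first-order low-pass filter (assumed filter design)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def fundamental_period(samples, min_lag, max_lag):
    """Return the lag (in samples) with the highest normalized correlation.

    The data is compared with a copy of itself shifted by each candidate
    lag; the lag at which the correlation peaks is taken as the period.
    min_lag must be at least 1.
    """
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        a = samples[:-lag]
        b = samples[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b)) or 1.0
        score = num / den
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic test signal: 100 Hz fundamental plus a strong 1 kHz harmonic.
rate = 8000
signal = [math.sin(2 * math.pi * 100 * t / rate)
          + 0.8 * math.sin(2 * math.pi * 1000 * t / rate)
          for t in range(800)]
filtered = one_pole_lowpass(signal, rate, 400.0)
period = fundamental_period(filtered, 20, 120)
```

Dividing the detected lag by the sampling rate gives the fundamental period in seconds, and its reciprocal gives the fundamental frequency.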

(2) In the embodiments described above, the speech speed conversion apparatus according to the present invention is incorporated into a mobile phone, but the apparatus is not limited to mobile phones and may be incorporated into, for example, a teleconference terminal. In short, any electronic device that transmits, via a communication network, compressed voice data in which high-order harmonic components have been emphasized by voice codec compression, and that reproduces sound from voice data obtained by decompressing, with the voice codec, compressed voice data received via the communication network, can perform speech speed conversion with little noise by incorporating the speech speed conversion apparatus according to the present invention.

FIG. 1 is a block diagram showing a configuration example of the speech speed conversion apparatus 1A according to the first embodiment of the present invention.
FIG. 2 is a diagram showing the processing results when a first-order LPF with a cutoff frequency of 400 Hz and when a second-order LPF with the same cutoff frequency is used as the filter 10A of the speech speed conversion apparatus 1A.
FIG. 3 is a diagram showing the processing results when a second-order LPF with a cutoff frequency of 600 Hz and when a second-order LPF with a cutoff frequency of 800 Hz is used as the filter 10A of the speech speed conversion apparatus 1A.
FIG. 4 is a block diagram showing a configuration example of the speech speed conversion apparatus 1B according to the second embodiment of the present invention.
FIG. 5 is a block diagram showing a configuration example of the speech speed conversion apparatus 1C according to the third embodiment of the present invention.
FIG. 6 is a diagram for explaining the problem of conventional speech speed conversion in a mobile phone.

DESCRIPTION OF SYMBOLS 1A, 1B, 1C ... speech speed conversion apparatus, 2 ... voice codec, 3 ... voice output system, 10A, 10B ... filter, 20A ... fundamental period detection unit, 30A ... speech speed conversion unit, 40B, 40C ... control unit, 50C ... storage unit.

Claims

1. A fundamental period detection device comprising:
a filter that receives voice data, removes high-order harmonic components from the voice data, and outputs the result, the filter having a variable filter characteristic;
storage means for storing a plurality of frequency characteristic data representing frequency characteristics of voices having mutually different voice qualities, each frequency characteristic data representing the frequency characteristics by the signal intensity in each band divided in octave units;
analyzing means for analyzing the voice data given to the filter and generating the frequency characteristic data for the voice represented by the voice data;
a control unit that compares the frequency characteristic data generated by the analyzing means with each of the plurality of frequency characteristic data stored in the storage means, identifies the one having the smallest deviation, and controls the filter characteristic of the filter according to the difference between the generated frequency characteristic data and the identified frequency characteristic data; and
a fundamental period detection unit that applies phase shift processing to the output data of the filter while changing the phase shift amount, calculates a correlation value between the output data before the phase shift and the output data after the phase shift, and detects the time length corresponding to the phase shift amount at which the correlation value peaks as the fundamental period of the voice represented by the voice data.

2. The fundamental period detection device according to claim 1, wherein the plurality of frequency characteristic data stored in the storage means include data representing the frequency characteristics of a standard male voice and data representing the frequency characteristics of a standard female voice.

3. A speech speed conversion device comprising:
the fundamental period detection device according to claim 1 or 2; and
a speech speed conversion unit that supplies received voice data to the fundamental period detection device to have the fundamental period detected, and outputs the received voice data after inserting or deleting waveforms in units of the fundamental period detected by the fundamental period detection device.