JP2001109499A

JP2001109499A - Speech speed conversion device

Info

Publication number: JP2001109499A
Application number: JP25185899A
Authority: JP
Inventors: Tatsuo Inoue; 健生井上
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1999-08-03
Filing date: 1999-09-06
Publication date: 2001-04-20
Anticipated expiration: 2019-09-06
Also published as: JP3691304B2

Abstract

PROBLEM TO BE SOLVED: To prevent a speech speed conversion device from applying speech speed conversion to a sound signal that is already spoken slowly, which would make it even slower. SOLUTION: A pitch period comparison part 3 compares a pitch period Tn-1 stored in a pitch period storage part 2 with a pitch period Tn newly extracted by a pitch period extraction part 1. The value of a counter 4 is incremented according to the comparison result. A comparison part 6 compares a value M, obtained by multiplying the pitch period Tn by the value C of the counter 4, with a threshold S set in a threshold setting part 5. Based on this comparison result, a speech speed conversion part 7 applies speech speed conversion processing to the input sound signal in a prescribed mode.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

FIELD OF THE INVENTION: The present invention relates to a speech speed conversion device that changes the speech speed of an audio signal. The device can be used in a wide range of equipment, for example:
- audio playback devices that provide fast or slow listening of the sound accompanying video from televisions, laser discs, VTRs and the like;
- hearing aids that convert broadcast audio into slow, easy-to-hear speech for the hearing impaired and the elderly, and equipment such as telephones that incorporate such hearing aids;
- English-learning devices that convert English spoken at native speed into slow, easy-to-hear speech.

[0002] Speech speed conversion here means either of two operations. The first compresses the time axis of an audio signal so that it is reproduced faster than its original speed. The second expands the time axis so that the signal is reproduced slower than its original speed.

[0003]

DESCRIPTION OF THE RELATED ART: A speech speed conversion device is known, for example from Japanese Patent Application Laid-Open No. 7-192392. It applies either time-axis compression/expansion processing or deletion processing to an input audio signal, depending on whether the signal is in a speech section or a silent section.

[0004]

PROBLEM TO BE SOLVED BY THE INVENTION: The conventional device described above, however, slows speech down uniformly regardless of the speech speed of the input signal. Speech that is already slow rather than rapid is therefore made even slower. Conversely, speech that is already rapid can be made even faster. Both results are very annoying to the listener.

[0005]

MEANS FOR SOLVING THE PROBLEM: To solve the above problem, a speech speed conversion device according to the present invention comprises:
- pitch period detection means for detecting a pitch period from an audio signal;
- counting means for counting the number of repetitions of a predetermined pitch period, based on the pitch periods extracted by the pitch period detection means;
- comparison/determination means for comparing a predetermined threshold with the product of the pitch period extracted by the pitch period detection means and the repetition count counted by the counting means;
- speech speed conversion means for performing speech speed conversion based on the determination result of the comparison/determination means.

[0006] A speech speed conversion device of the present invention may also comprise:
- pitch period detection means for detecting a pitch period from an audio signal;
- counting means for counting the number of repetitions of a predetermined pitch period, based on the pitch periods extracted by the pitch period detection means;
- comparison/determination means for comparing a predetermined threshold with the product of the extracted pitch period and the repetition count;
- threshold changing means for changing the predetermined threshold;
- speech speed conversion means for performing speech speed conversion based on the determination result of the comparison/determination means.

[0007] The speech speed conversion means converts the audio signal into a slower audio signal only when the comparison/determination means determines that the product of the extracted pitch period and the repetition count does not exceed the predetermined threshold.

[0008] The speech speed conversion means acts on the comparison/determination means's result for the product of the extracted pitch period and the repetition count. If the product does not exceed the predetermined threshold, it converts the audio signal into a slower audio signal. If the product exceeds the threshold, it performs no speech speed conversion.
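The decision in [0007] and [0008] reduces to one comparison of the product M = Tn × C against the threshold S. The sketch below illustrates that comparison; it is not the patented circuit. Three things are assumptions: the function name `select_magnification`, the slow-playback value 0.6 (borrowed from the initial value used in the first embodiment), and reading "does not exceed" as M ≤ S.

```python
def select_magnification(pitch_period: int, repeat_count: int,
                         threshold: int, slow_n: float = 0.6) -> float:
    """Return the speech speed magnification N for one decision step.

    pitch_period: latest extracted pitch period Tn, in samples.
    repeat_count: counter value C (how many times a matching period recurred).
    threshold:    threshold S, in samples.
    slow_n:       hypothetical slow-playback magnification (N < 1).
    """
    product = pitch_period * repeat_count  # M = Tn * C
    if product <= threshold:
        # M does not exceed S: convert to slower speech ([0007], [0008]).
        return slow_n
    # M exceeds S: the speech is already slow, so leave it unconverted (N = 1).
    return 1.0
```

For example, `select_magnification(61, 1, 1500)` returns 0.6. `select_magnification(60, 26, 1500)` returns 1.0, because 60 × 26 = 1560 exceeds 1500.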

[0009] The speech speed conversion means acts on the comparison/determination means's result for the product of the extracted pitch period and the repetition count. If the product is within the predetermined threshold, it increases the speech speed magnification. If the product is greater than the threshold, it decreases the speech speed magnification. It then performs speech speed conversion (where speech speed magnification = time length of input audio signal / time length of output audio signal).

[0010] A speech speed conversion device of the present invention may also comprise:
- pitch period detection means for detecting a pitch period from an audio signal;
- counting means for counting the number of repetitions of a predetermined pitch period, based on the pitch periods extracted by the pitch period detection means;
- comparison/determination means for comparing the product of the extracted pitch period and the repetition count with a predetermined first threshold and a predetermined second threshold (where first threshold < second threshold);
- speech speed conversion means for performing speech speed conversion based on the determination result of the comparison/determination means.

[0011] A speech speed conversion device of the present invention may also comprise:
- pitch period detection means for detecting a pitch period from an audio signal;
- counting means for counting the number of repetitions of a predetermined pitch period, based on the pitch periods extracted by the pitch period detection means;
- comparison/determination means for comparing the product of the extracted pitch period and the repetition count with a predetermined first threshold and a predetermined second threshold (where first threshold < second threshold);
- threshold changing means for changing the predetermined first threshold or the predetermined second threshold;
- speech speed conversion means for performing speech speed conversion based on the determination result of the comparison/determination means.

[0012] The speech speed conversion means performs no speech speed conversion when the comparison/determination means determines that the product of the extracted pitch period and the repetition count lies between the predetermined first threshold and the predetermined second threshold.

[0013] When the comparison/determination means determines that the product of the extracted pitch period and the repetition count is smaller than the predetermined first threshold, the speech speed conversion means decreases the speech speed magnification and performs speech speed conversion (where speech speed magnification = time length of input audio signal / time length of output audio signal).

[0014] When the comparison/determination means determines that the product of the extracted pitch period and the repetition count is greater than the predetermined second threshold, the speech speed conversion means increases the speech speed magnification and performs speech speed conversion (where speech speed magnification = time length of input audio signal / time length of output audio signal).

[0015] When the comparison/determination means determines that the product of the extracted pitch period and the repetition count is smaller than the predetermined first threshold, the speech speed conversion means sets the speech speed magnification below 1 and performs speech speed conversion (where speech speed magnification = time length of input audio signal / time length of output audio signal).

[0016] When the comparison/determination means determines that the product of the extracted pitch period and the repetition count is greater than the predetermined second threshold, the speech speed conversion means sets the speech speed magnification above 1 and performs speech speed conversion (where speech speed magnification = time length of input audio signal / time length of output audio signal).
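Paragraphs [0010] to [0016] refine the single threshold into three bands around a first threshold S1 and a second threshold S2 (S1 < S2). The sketch below shows one way to code those bands. The class name, the thresholds used in the example and the magnifications 0.8 and 1.2 are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoThresholdPolicy:
    """Hypothetical three-band rule built on M = Tn * C (in samples)."""
    first_threshold: int     # S1
    second_threshold: int    # S2, must satisfy S1 < S2
    slow_n: float = 0.8      # assumed N < 1, used when M < S1 ([0015])
    fast_n: float = 1.2      # assumed N > 1, used when M > S2 ([0016])

    def __post_init__(self) -> None:
        if not self.first_threshold < self.second_threshold:
            raise ValueError("first threshold must be smaller than second")
        if not self.slow_n < 1.0 < self.fast_n:
            raise ValueError("need slow_n < 1 < fast_n")

    def magnification(self, pitch_period: int, repeat_count: int) -> float:
        product = pitch_period * repeat_count  # M
        if product < self.first_threshold:
            return self.slow_n   # short sounds (rapid speech): slow it down
        if product > self.second_threshold:
            return self.fast_n   # long sounds (slow speech): speed it up
        return 1.0               # between S1 and S2: no conversion ([0012])
```

With `TwoThresholdPolicy(600, 1500)`, a product of 300 gives N = 0.8, a product of 900 gives N = 1.0 and a product of 1650 gives N = 1.2.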

[0017] The speech speed conversion means changes the speech speed magnification according to the free capacity of the storage means that stores the speed-converted audio signal (where speech speed magnification = time length of input audio signal / time length of output audio signal).

[0018] As the free capacity of the storage means that stores the speed-converted audio signal decreases, the speech speed conversion means changes the speech speed magnification so that it approaches 1 (where speech speed magnification = time length of input audio signal / time length of output audio signal).

[0019] As the free capacity of the storage means that stores the speed-converted audio signal increases, the speech speed conversion means changes the speech speed magnification so that it approaches a predetermined magnification (where speech speed magnification = time length of input audio signal / time length of output audio signal).
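Paragraphs [0017] to [0019] move N toward 1 as the output memory fills and back toward the selected magnification as it drains. A plausible reason: with real-time input, slowed output (N < 1) takes longer to play than the input took to arrive, so a backlog builds up in the memory. The patent links N to the storage amount through a table that is not reproduced in this text. The linear blend below is therefore only an assumption that shows the direction of the adjustment, and the function name is hypothetical.

```python
def buffer_adjusted_magnification(target_n: float, free_capacity: int,
                                  total_capacity: int) -> float:
    """Blend the selected magnification toward 1.0 as free buffer space shrinks.

    target_n:       magnification chosen by the mode (e.g. 0.6).
    free_capacity:  empty space left in the output memory.
    total_capacity: size of the output memory (same units).
    Linear blending is an illustrative assumption, not the patent's table.
    """
    if total_capacity <= 0:
        raise ValueError("total_capacity must be positive")
    free_ratio = min(max(free_capacity / total_capacity, 0.0), 1.0)
    # free_ratio = 1 -> target_n ([0019]); free_ratio = 0 -> 1.0 ([0018])
    return 1.0 + (target_n - 1.0) * free_ratio
```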

[0020] The predetermined pitch period is the same pitch period, twice the pitch period or half the pitch period, together with pitch periods that approximate these.
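One way to implement the "predetermined pitch period" test of [0020] is to accept a new period if it lies within a tolerance of the previous period, or of twice or half of it. Accepting doubled and halved periods presumably absorbs octave errors in the pitch extractor. The 5 % relative tolerance and the function name are assumptions.

```python
def is_matching_pitch(prev_period: float, new_period: float,
                      rel_tol: float = 0.05) -> bool:
    """Return True if new_period ~ prev_period, 2 * prev_period or prev_period / 2."""
    if prev_period <= 0 or new_period <= 0:
        return False  # no valid previous period (e.g. storage just cleared)
    for ratio in (1.0, 2.0, 0.5):
        expected = prev_period * ratio
        if abs(new_period - expected) <= rel_tol * expected:
            return True
    return False
```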

[0021]

EMBODIMENTS OF THE INVENTION: The speech speed conversion device of the present invention is described in detail below with reference to the drawings.

[0022] FIG. 1 is a schematic block diagram showing the configuration of the speech speed conversion device of the present invention. In the figure, reference numeral 1 denotes a pitch period extraction unit. It receives an audio signal that has been converted into a digital signal by an A/D converter (not shown) and extracts the pitch period from that signal. Autocorrelation, for example, is used as the extraction method.

[0023] One pitch period extraction method based on autocorrelation uses the short-time autocorrelation. The signal is assumed to be time-limited: it exists only within an interval of length Ts and is always treated as zero outside that interval. This method is described in "Digital Processing of Speech Signals" (Vol. 1) by L.R. Rabiner and R.W. Schafer, Japanese translation by Hisaki Suzuki (Corona Publishing), p. 152. If the speech waveform is represented by digital speech data x(n), the short-time autocorrelation value Rn(k) obtained by this method is given by the following equation.

[0024]

[Equation 1]

[0025] Here, Ts is the time interval in which the speech signal is assumed to exist. k is the delay by which the speech waveform is shifted when the short-time autocorrelation value Rn(k) is calculated, and Ts ≫ k. The value of k that maximizes Rn(k) in Equation 1 is the pitch period.
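Equation 1 itself is not reproduced in this text. The sketch below assumes the usual Rabiner and Schafer form of the short-time autocorrelation:

Rn(k) = Σ_{m=0}^{Ts-1-k} x(n+m)·x(n+m+k)

It returns the lag k with the largest value. The lag search range is an added assumption. Rn(0) is always the maximum (it equals the window energy), so k = 0 and other implausibly short lags must be excluded. The default range of 20 to 160 samples corresponds to about 50 to 400 Hz at an assumed 8 kHz sampling rate.

```python
from typing import Sequence


def short_time_autocorrelation(x: Sequence[float], n: int, ts: int, k: int) -> float:
    """Rn(k) = sum_{m=0}^{ts-1-k} x[n+m] * x[n+m+k]; samples outside the window count as zero."""
    return sum(x[n + m] * x[n + m + k] for m in range(ts - k))


def extract_pitch_period(x: Sequence[float], n: int, ts: int,
                         min_lag: int = 20, max_lag: int = 160) -> int:
    """Return the lag (in samples) in [min_lag, max_lag] that maximizes Rn(k)."""
    if not 0 < min_lag <= max_lag < ts:
        raise ValueError("need 0 < min_lag <= max_lag < ts (Ts >> k)")
    if n < 0 or n + ts > len(x):
        raise ValueError("analysis window extends past the signal")
    return max(range(min_lag, max_lag + 1),
               key=lambda k: short_time_autocorrelation(x, n, ts, k))
```

A pulse train with a period of 60 samples, `x = [1.0 if i % 60 == 0 else 0.0 for i in range(800)]`, gives `extract_pitch_period(x, 0, 400) == 60`. Within the search range only lags 60 and 120 line up with the pulses, and lag 60 sums over more terms.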

[0026] Reference numeral 2 denotes a pitch period storage unit that stores the pitch period extracted by the pitch period extraction unit 1. Reference numeral 3 denotes a pitch period comparison unit that compares the pitch period stored in the pitch period storage unit 2 with the pitch period newly extracted by the pitch period extraction unit 1. Reference numeral 4 denotes a counter that is incremented according to the comparison result of the pitch period comparison unit 3. Reference numeral 5 denotes a threshold setting unit in which a threshold (described in detail later) is set in advance.

[0027] Reference numeral 6 denotes a comparison unit. It compares the threshold set in the threshold setting unit 5 with the value obtained by multiplying the pitch period extracted by the pitch period extraction unit 1 by the value of the counter 4, and outputs the result. Reference numeral 7 denotes a speech speed conversion unit. It performs speech speed conversion on the input audio signal in a predetermined mode, based on the comparison result output from the comparison unit 6, and outputs the converted signal. Reference numeral 8 denotes a mode selection unit that outputs a mode selection signal for selecting the speech speed conversion mode (described in detail later).

[0028] FIG. 2 is a schematic block diagram showing the configuration of the speech speed conversion unit 7.

[0029] In FIG. 2, reference numeral 11 denotes a speech time-axis compression/expansion unit that compresses or expands the time axis of the input audio signal. Known methods can be used for the compression/expansion, for example:
- the overlap-add method with pointer movement control (Pointer Interval Control Overlap and Add: PICOLA);
- the TDHS (Time Domain Harmonic Scaling) method.

The method is not limited to these. Any method that compresses or expands the time axis of the audio signal to change its playback speed may be used.

[0030] Reference numeral 14 denotes a speech encoding unit that encodes the audio signal processed by the speech time-axis compression/expansion unit 11, using existing ADPCM processing or the like. Reference numeral 15 denotes a memory that stores the signal encoded by the speech encoding unit 14. Reference numeral 16 denotes a speech decoding unit that decodes the signal read from the memory 15, again using existing ADPCM processing or the like. The audio signal decoded by the speech decoding unit 16 is converted into an analog audio signal by a D/A conversion circuit (not shown) and output.

[0031] Reference numeral 12 denotes a silent section detection unit that detects silent sections in the input audio signal and sends the detection result to the speech time-axis compression/expansion unit. Reference numeral 13 denotes a speech speed control unit that supplies the speech speed magnification N (× speed) to the speech time-axis compression/expansion unit. Reference numeral 17 denotes a storage amount detection unit that detects the amount j of signal data stored in the memory 15.

[0032] The speech speed magnification N is expressed as

N (× speed) = [time length of input audio signal] / [time length of output audio signal]

The time length of the input audio signal is the length of the signal before compression/expansion, as input to the speech time-axis compression/expansion unit 11. The time length of the output audio signal is the length of that signal after compression/expansion, as decoded by the speech decoding unit 16.
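As a quick check of the definition N = (input duration) / (output duration): at N = 0.6, 3 s of input speech is played back over 3 / 0.6 = 5 s. The two helpers below express this relation; their names are hypothetical.

```python
def output_duration(input_seconds: float, n: float) -> float:
    """Duration after conversion, from N = input duration / output duration."""
    if n <= 0:
        raise ValueError("speech speed magnification must be positive")
    return input_seconds / n


def speech_speed_magnification(input_seconds: float, output_seconds: float) -> float:
    """N < 1 means slower than the original, N > 1 faster."""
    if output_seconds <= 0:
        raise ValueError("output duration must be positive")
    return input_seconds / output_seconds
```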

[0033] This paragraph describes the speech speed modes. A mode is selected by the mode selection unit 8 of FIG. 1, and the speech speed control unit 13 applies speech speed conversion according to that selection. As shown in Table 1 below, the modes include a [slow playback mode], which expands the time axis of the input audio signal to convert it into a slower audio signal. Within the [slow playback mode], the speech speed magnification N can be selected finely in four levels, 1 to 4, as shown in the table.

[0034]

[Table 1]

[0035] The value of N is further subdivided according to the storage amount j of the memory 15, which the storage amount detection unit 17 supplies. The relationship between the storage amount j and the speech speed magnification N is described later.

[0036] In each table, the speech speed magnification N is [time length of input audio signal] / [time length of output audio signal], as defined above. A value of N smaller than 1 therefore means the speech is slower than normal, and the smaller the value, the slower the speech. Conversely, a value of N greater than 1 means the speech is faster than normal, and the larger the value, the faster the speech.

[0037] The slow playback mode may be selected, for example, with four operation buttons labeled by the user's age group: [50s], [60s], [70s] and [80s].

[0038] As supplementary information, the paper "Evaluation experiments on speech-speed-converted speech for elderly listeners" (Atsushi Imai, Nobumasa Seiyama, Tohru Takagi, Eiichi Miyasaka, Hiroshi Ono), Proceedings of the Acoustical Society of Japan meeting, March 1993, evaluates converted speech speeds for ease of listening by age group. It reports that the easiest speeds to follow were 0.87× for listeners in their 60s, 0.87× to 0.73× for listeners in their 70s, and 0.73× for listeners in their 80s. The older the listener, the slower the speech speed that is perceived as easy to hear.

[0039] In the speech speed conversion unit configured as described above, the speech speed control unit 13 determines the speech speed magnification N from several inputs:
- the mode selection signal from the mode selection unit 8 shown in FIG. 1;
- the comparison result from the comparison unit 6;
- the storage amount j from the storage amount detection unit 17.

It then outputs N to the speech time-axis compression/expansion unit 11.

[0040] The speech time-axis compression/expansion unit 11 compresses or expands the time axis of the audio signal according to the speech speed magnification N from the speech speed control unit 13. It also receives the silent-section detection result from the silent section detection unit 12 and deletes silent portions as appropriate while it compresses or expands the time axis.
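The patent does not state how the silent section detection unit 12 decides that a section is silent, or how much of a silent section is deleted. The sketch below assumes one common approach for illustration: a frame counts as silent when its mean energy falls below a threshold, and long silent runs are shortened. The frame length, the threshold and the `keep_frames` rule are all hypothetical.

```python
from typing import List, Sequence


def silent_frame_flags(x: Sequence[float], frame_len: int,
                       energy_threshold: float) -> List[bool]:
    """Mark each full frame as silent if its mean energy is below the threshold."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    flags = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        flags.append(energy < energy_threshold)
    return flags


def shorten_silences(x: Sequence[float], frame_len: int,
                     energy_threshold: float, keep_frames: int = 1) -> List[float]:
    """Drop silent frames beyond the first keep_frames of each silent run."""
    flags = silent_frame_flags(x, frame_len, energy_threshold)
    out: List[float] = []
    run = 0  # length of the current run of silent frames
    for i, silent in enumerate(flags):
        run = run + 1 if silent else 0
        if silent and run > keep_frames:
            continue  # delete this part of the silent section
        out.extend(x[i * frame_len:(i + 1) * frame_len])
    out.extend(x[len(flags) * frame_len:])  # keep any trailing partial frame
    return out
```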

[0041] The operation of a first embodiment of the speech speed conversion device configured as described above is now described with reference to the flowchart of FIG. 4.

[0042] First, the user operates an operation button (not shown) of the mode selection unit 8 to select a mode (S11). As an example, assume that [slow playback mode 1] in Table 1 is selected. This operation sends a mode selection signal from the mode selection unit 8 to the speech speed control unit 13 of the speech speed conversion unit 7.

[0043] Next, the storage amount detection unit 17 checks how much data is stored in the memory 15, obtains the storage amount j and passes its value to the speech speed control unit 13 (S12).

[0044] Assume that no encoded audio is stored in the memory 15 at this point. According to Table 1, a speech speed magnification N of 0.6 [× speed] is then set as the initial value (S13).

[0045] The contents of the pitch period storage unit 2 are also initialized (cleared). These contents are hereinafter referred to by the variable name pitch period Tn-1.

[0046] After the initial speech speed magnification N (= 0.6 [× speed]) has been set, the pitch period extraction unit 1 extracts the pitch period of the input audio signal based on Equation 1 (S14). This value is hereinafter referred to by the variable name pitch period Tn. As an example, assume that Tn = 60 [samples] is obtained. For a digital audio signal, "samples" means the number of samples taken at the chosen sampling frequency.

[0047] The pitch period comparison unit 3 compares the pitch period Tn extracted by the pitch period extraction unit 1 with the pitch period Tn-1 stored in the pitch period storage unit 2 (S15).

[0048] The contents of the pitch period storage unit 2 have been cleared, as described above, so the process proceeds to step S21. There, the count value C of the counter 4 is initialized (cleared) to 0. In the next step, S19, the [slow playback mode] is set. The [slow playback mode] was already selected in step S11, so the mode does not actually change and the [slow playback mode] is maintained.

[0049] The value of the pitch period Tn (= 60 [samples]) is stored in the pitch period storage unit 2 and becomes the new value of Tn-1 (S22). The speech speed magnification N = 0.6 [× speed] set in step S13 is passed to the speech time-axis compression/expansion unit 11, which expands the time axis of the input audio signal so that the speech speed becomes 0.6 [× speed]. The speech encoding unit 14 encodes the expanded audio signal, and the memory 15 stores it temporarily. The speech decoding unit 16 then decodes it into the output audio signal. The process returns to step S12 via step S23.

[0050] The storage amount j of the memory is checked again (S12), and the speech speed magnification N is set from the storage amount j (S13).

[0051] The pitch period Tn is then extracted again (S14). Assume, for example, that Tn = 61 [samples] is obtained.

[0052] The pitch period comparison unit 3 compares this pitch period Tn (= 61 [samples]) with the pitch period Tn-1 (= 60 [samples]) stored in the pitch period storage unit 2 (S15).

[0053] The new pitch period Tn (= 61 [samples]) and the stored pitch period Tn-1 (= 60 [samples]) are checked against the condition Tn ≈ Tn-1, meaning that the new pitch period is approximately equal to the previous one. If the condition holds, the count value C of the counter 4 is incremented by one, to C = 1 (S16).

[0054] The pitch period Tn (= 61 [samples]) is multiplied by the count value C (= 1) of the counter 4 to give the product M (= 61). M is passed to the comparison unit 6 (S17).

[0055] The comparison unit 6 compares the product M with the threshold S (= 1500) set in the threshold setting unit 5 (S18).

[0056] Because the integrated value M = 61 does not exceed the threshold S, the speech speed control unit 13 keeps the speech speed mode in [slow playback mode]. It gives the speech speed magnification N, which was set according to the memory storage amount j, to the audio time axis compression/expansion unit 11. The time axis compression/expansion unit 11 then expands the time axis of the input audio signal to the given magnification N.

[0057] As before, the audio signal whose time axis has been expanded by the time axis compression/expansion unit 11 is encoded by the audio encoding unit 14 and temporarily stored in the memory 15. It is then decoded by the audio decoding unit 16 to become the output audio signal.

[0058] The value of the pitch period Tn is then stored in the pitch period storage unit 2 as the new Tn-1 (S22), and the process returns to step S12 via step S23.

[0059] Consider the loop that runs from step S12 through step S23 and back to step S12. If the input audio signal is slowly spoken speech, the loop repeats until the integrated value M eventually exceeds the threshold S at step S18.

[0060] FIG. 3(c) shows an audio signal of high-pitched, slow speech, a waveform in which the same pitch period Tn is repeated 10 times. FIG. 3(d) shows an audio signal of low-pitched, slow speech, a waveform in which the same pitch period Tn is repeated 4 times. In slowly spoken speech, each individual sound ("a", "i", "u", ...) lasts a long time, whether the voice is high or low. For such waveforms the counter 4 is incremented repeatedly until the integrated value M exceeds the threshold S (= 1500). The process then proceeds from step S18 to step S20, and the device enters the normal playback mode (speech speed magnification N = 1.0), so substantially no speech speed conversion is performed.

[0061] FIG. 3(a) shows a high-pitched, fast-spoken audio signal, a waveform in which the same pitch period Tn is repeated four times. FIG. 3(b) shows a low-pitched, fast-spoken audio signal, a waveform in which the same pitch period Tn is repeated twice. As FIGS. 3(a) and 3(b) show, when the input audio signal is fast speech, each individual sound ("a", "i", "u", ...) is short, whether the voice is high or low. Therefore, even as the loop repeats, pitch period extraction moves on to the next sound before the integrated value M exceeds the threshold S. At step S15 the newly extracted pitch period Tn (for example, that of the sound "i") then differs from the previously extracted pitch period Tn-1 (for example, that of the sound "a").

[0062] The process therefore moves from step S15 to step S21, where the count value C of the counter 4 is cleared, and then to step S19. As a result, [slow playback mode] is maintained for as long as fast speech continues, whether the voice is high or low.

[0063] As described above, the present invention exploits the fact that each individual sound is short in fast speech and long in slow speech. It multiplies the pitch period by the number of times a waveform of that pitch period repeats and compares the product with a predetermined threshold. This determines whether the audio signal is fast or slow speech without being affected by whether the voice is high or low. Speech speed conversion is then applied only to fast speech, so that fast speech becomes slow speech. The invention is further characterized by using this determination to change the speech speed magnification of the speech speed conversion.
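One way to read the criterion numerically is shown below. This is an interpretation, not stated in this form in the source, and it assumes that each pass through the loop advances by one pitch period. Under that assumption, the product M approximates how long the current sound has been sustained, measured in samples, which is why it does not depend on voice pitch. The numbers use the sampling frequency fs = 12.8 kHz of this embodiment and the threshold S = 1500 of the first embodiment:

$$
M = C\,T_n \approx \text{sustained duration of the current sound [samples]},
\qquad
\frac{S}{f_s} = \frac{1500}{12\,800\ \mathrm{Hz}} \approx 0.117\ \mathrm{s}
$$

Take hypothetical pitch periods of 50 samples (high voice) and 150 samples (low voice). M first exceeds S at C = 31 (M = 1550, about 0.121 s) and at C = 11 (M = 1650, about 0.129 s) respectively. Both correspond to roughly the same sustained duration even though the repetition counts differ.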

[0064] The comparison condition used by the pitch period comparison unit 3 in step S15 is Tn ≈ Tn-1 and related conditions. The reason is explained below.

[0065] Unlike a stable signal such as a sine wave generated by an electronic circuit, the human voice often fluctuates in pitch and other properties. Pitch periods extracted one after another for the same sound may therefore differ slightly between extractions. The approximate comparison condition above prevents the device from wrongly concluding that the sound has changed when the same sound is in fact continuing.

[0066] The tolerance used to judge the newly extracted pitch period Tn and the immediately preceding pitch period Tn-1 substantially equal also depends on the sampling frequency. The higher the sampling frequency, the wider the tolerance must be set. In this embodiment the sampling frequency is fs = 12.8 kHz and the tolerance is set to within 3 [samples].

[0067] Because of the same pitch fluctuations, a newly extracted pitch period from the pitch period extraction unit 1 may be roughly twice or roughly half the previously extracted pitch period. For this reason, as shown in the flowchart, the cases Tn ≈ 2Tn-1 and Tn ≈ (1/2)Tn-1 are also judged to be the same pitch period.
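The comparison rule of [0066] and [0067] can be sketched as a small Python predicate. The function name is hypothetical. The tolerance of 3 samples for Tn ≈ Tn-1 comes from the text. Applying the same absolute tolerance to the doubled and halved cases is an assumption, because the text gives no separate tolerance for them.

```python
def pitch_similar(tn: float, tn_prev: float, tol: float = 3) -> bool:
    """Judge whether a new pitch period tn matches the previous one, tn_prev.

    Accepts Tn ~ Tn-1, Tn ~ 2*Tn-1 and Tn ~ Tn-1/2, each within +/- tol
    samples. Using the same absolute tolerance for the doubled and halved
    cases is an assumption; the text gives tol = 3 only for Tn ~ Tn-1.
    """
    if tn_prev <= 0:
        # Pitch period storage unit cleared: there is nothing to match yet.
        return False
    candidates = (tn_prev, 2 * tn_prev, tn_prev / 2)
    return any(abs(tn - c) <= tol for c in candidates)
```

For example, `pitch_similar(61, 60)` accepts the 60 to 61 sample change used in the walkthrough, and `pitch_similar(121, 60)` treats a doubled extraction as the same sound.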

[0068] Next, for the processing in step S12, the relationship between the speech speed magnification N and the storage amount j of the memory 15, as obtained by the storage amount detection unit 17, is described.

[0069] As speech speed conversion continues in the speech speed conversion unit 7, encoded audio accumulates in the memory 15 and its free capacity decreases. The memory 15 is configured to hold a fixed amount of encoded digital audio. If the input audio contains few silent sections that can be deleted, the memory 15 may be unable to store all of the audio, and the audio that cannot be stored may be lost. To avoid this, the storage amount of the memory 15 is checked, and the speech speed magnification N is corrected as the remaining capacity of the memory 15 decreases.

[0070] In the above example the initial value of the speech speed magnification N is 0.6. As shown in Table 1, the value of N is shifted to the right as the storage amount j of the memory 15 increases. Specifically, while the loop repeats, N is changed from 0.6 to 0.7 when the storage amount j reaches the range 20 ≤ j < 40 [%] at step S12. Thereafter N is changed to the values shown in the table according to j. N therefore reaches 1.0 before the memory 15 overflows, which is normal playback without time-axis compression or expansion, the same as playback in [normal mode].

[0071] Conversely, when the input audio signal contains many silent sections that can be deleted, the storage amount j of the memory 15 gradually decreases and N is shifted to the left in Table 1. When N reaches its initial value, the shifting stops.
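A minimal Python sketch of the storage-based correction in [0069] to [0071] follows. Table 1 itself is not reproduced in this text. Only these values come from the description: the initial N = 0.6, N = 0.7 for 20 ≤ j < 40 %, and the final N = 1.0. The intermediate values and the 20 % band width are assumptions, and the function name is hypothetical.

```python
# Hypothetical Table 1 row for one [slow playback mode]: 0.6, 0.7 and 1.0
# come from the text, the other entries are assumed.
LADDER = [0.6, 0.7, 0.8, 0.9, 1.0]
BAND_PERCENT = 20  # assumed width of each storage band

def magnification_for_storage(j_percent: float) -> float:
    """Map memory-15 occupancy j (0 to 100 %) to speech speed magnification N."""
    if not 0 <= j_percent <= 100:
        raise ValueError("storage amount must be a percentage")
    # Higher occupancy selects an entry further right (less expansion);
    # the last entry, N = 1.0, is normal-speed playback.
    index = min(int(j_percent // BAND_PERCENT), len(LADDER) - 1)
    return LADDER[index]
```

Because the mapping is a plain lookup, a falling storage amount moves N back to the left on its own, and the lookup never goes below `LADDER[0]`, the initial value. This matches the behaviour described in [0071].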

[0072] When the user gives a stop instruction, the device stops (S23).

[0073] In summary, in this embodiment [normal playback mode] is selected automatically when the integrated value M is greater than the threshold S. [Slow playback mode] is selected automatically when M is equal to or less than S.
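The decision logic of the first embodiment (steps S14 to S22) can be sketched in Python as follows. The sketch models only the mode decision, not time-axis expansion or the memory. The function names are hypothetical, and each list element stands for one pass through the loop. The tolerance check assumes the same ±3-sample rule for Tn-1, 2Tn-1 and Tn-1/2.

```python
from typing import Iterable, List

def pitch_similar(tn: float, tn_prev: float, tol: float = 3) -> bool:
    """Tn ~ Tn-1, 2*Tn-1 or Tn-1/2 within tol samples (assumed rule)."""
    if tn_prev <= 0:
        return False
    return any(abs(tn - c) <= tol for c in (tn_prev, 2 * tn_prev, tn_prev / 2))

def classify_frames(pitch_periods: Iterable[int], threshold: int = 1500) -> List[str]:
    """Return 'slow' or 'normal' playback mode for each extracted pitch period."""
    modes: List[str] = []
    tn_prev = 0  # pitch period storage unit 2, cleared at start
    count = 0    # counter 4
    for tn in pitch_periods:                # S14: one extracted Tn per pass
        if pitch_similar(tn, tn_prev):      # S15: same sound continues
            count += 1                      # S16
            m = tn * count                  # S17: M = Tn * C
            modes.append("normal" if m > threshold else "slow")  # S18 -> S20 / S19
        else:
            count = 0                       # S21: new sound, clear counter
            modes.append("slow")            # S19: keep slow playback
        tn_prev = tn                        # S22: Tn becomes Tn-1
    return modes
```

With a steady pitch of 60 samples, the mode switches to normal playback on the 27th pass, when M = 60 × 26 = 1560 first exceeds S = 1500. A pitch alternating between 60 and 80 samples never accumulates a count and stays in slow playback.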

[0074] Next, a second embodiment of the present invention is described with reference to the flowchart of FIG. 5. The basic configuration of the device is the same as that shown in FIGS. 1 and 2, so its detailed description is omitted and only the operation is described.

[0075] First, as in the previous embodiment, the user operates an operation button (not shown) of the mode selection unit 8 to select a mode (S31). As an example, assume that [slow playback mode 1] shown in Table 1 is selected. The mode selection unit 8 then sends a mode selection signal to the speech speed control unit 13 of the speech speed conversion unit 7. On the basis of that signal, the speech speed control unit 13 is set to [slow playback mode 1].

[0076] Next, the storage amount detection unit 17 checks the memory 15 to obtain the storage amount j and gives the value of j to the speech speed control unit 13 (S32).

[0077] Assuming that no encoded audio has yet been stored in the memory 15 at this point, N = 0.6 is set as the initial speech speed magnification from Table 1 (S33).

[0078] The contents of the pitch period storage unit 2 (hereinafter referred to by the variable name pitch period Tn-1) are also initialized (cleared).

[0079] After the initial speech speed magnification N (= 0.6) has been set in this way, the pitch period extraction unit 1 extracts the pitch period of the input audio signal on the basis of Equation 1 (S34). This pitch period is hereinafter referred to by the variable name pitch period Tn. As an example, assume that a pitch period Tn = 60 [samples] is obtained.

[0080] The pitch period comparison unit 3 compares the pitch period Tn extracted by the pitch period extraction unit 1 with the pitch period Tn-1 stored in the pitch period storage unit 2 (S35).

[0081] However, the contents of the pitch period storage unit 2 have been cleared as described above. The process therefore proceeds to step S41, where the count value C of the counter 4 is initialized (cleared) to 0.

[0082] The value of the pitch period Tn (= 60 [samples]) is stored in the pitch period storage unit 2 and becomes the new value of Tn-1 (S42). The speech speed magnification N = 0.6 set in step S33 is given to the audio time axis compression/expansion unit 11, which expands the time axis of the input audio signal so that the speech speed becomes 0.6x. The expanded audio signal is encoded by the audio encoding unit 14 and temporarily stored in the memory 15. It is then decoded by the audio decoding unit 16 to become the output audio signal, and the process returns to step S32 via step S43.

[0083] The storage amount j of the memory is then checked again (S32), and N is set again on the basis of j (S33).

[0084] The pitch period Tn is then extracted again (S34). Assume, for example, that Tn = 61 [samples] is obtained. The pitch period comparison unit 3 compares this Tn (= 61 [samples]) with the pitch period Tn-1 (= 60 [samples]) stored in the pitch period storage unit 2 (S35).

[0085] If the newly extracted pitch period Tn (= 61 [samples]) and the pitch period Tn-1 (= 60 [samples]) stored in the pitch period storage unit 2 satisfy the condition Tn ≈ Tn-1, the count value C of the counter 4 is incremented by one (S36).

[0086] The pitch period Tn (= 61 [samples]) extracted by the pitch period extraction unit 1 is multiplied by the count value C (= 1) of the counter 4 to obtain the integrated value M (= 61) (S37), which is given to the comparison unit 6.

[0087] The comparison unit 6 compares the integrated value M with the threshold S (= 1500) set in the threshold setting unit 5 (S38). Since M = 61 does not exceed S, the speech speed control unit 13 keeps the mode in [slow playback mode]. It gives the speech speed magnification N, set according to the memory storage amount j, to the audio time axis compression/expansion unit 11. That unit then expands the time axis of the input audio signal to the speed corresponding to N.

[0088] As before, the audio signal whose time axis has been expanded by the time axis compression/expansion unit 11 is encoded by the audio encoding unit 14 and temporarily stored in the memory 15. It is then decoded by the audio decoding unit 16 to become the output audio signal.

[0089] The value of the pitch period Tn is then stored in the pitch period storage unit 2 as the new Tn-1 (S42), and the process returns to step S32 via step S43.

[0090] Consider the loop that runs from step S32 through step S43 and back to step S32. If the input audio signal is slowly spoken speech, the loop repeats until the integrated value M exceeds the threshold S set in the threshold setting unit 5 at step S38, and the process reaches step S40. In step S40, the speech speed magnification N is shifted from its current value to the next value to the right in Table 1. As a result, the time-axis expansion ratio of the input audio signal becomes slightly smaller than before, and the speech becomes slightly faster than before.

[0091] Conversely, when the input audio signal is fast speech, the integrated value M does not exceed the threshold S set in the threshold setting unit 5, so the process proceeds from step S38 to step S39. In step S39, N is shifted from its current value to the next value to the left in Table 1. As a result, the time-axis expansion ratio of the input audio signal becomes slightly larger than before, and the speech becomes slightly slower than before.

[0092] As already described, the speech speed magnification N is also changed according to the memory storage amount j.

[0093] When the user gives a stop instruction, the device stops (S43).

[0094] In summary, in this embodiment the expansion ratio in [slow playback mode] is automatically changed to be slightly larger when the integrated value M is smaller than the threshold S. It is automatically changed to be slightly smaller when M is larger than S.
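The one-step shift of the second embodiment (S38 to S40) can be sketched as an index moving along one row of Table 1. The ladder values are hypothetical because Table 1 is not reproduced here, and the function name is illustrative. Clamping at the ends of the row is an assumption, since the text does not say what happens when N is already at the first or last entry.

```python
# Hypothetical Table 1 row for one [slow playback mode]; the text gives
# 0.6 (initial), 0.7 and 1.0, the other entries are assumed.
LADDER = [0.6, 0.7, 0.8, 0.9, 1.0]

def step_index(index: int, m: float, threshold: float = 1500) -> int:
    """Second embodiment: move one position along the ladder per decision."""
    if m > threshold:
        index += 1   # S40: slow speech, shift right (less expansion, faster)
    else:
        index -= 1   # S39: fast speech, shift left (more expansion, slower)
    # Clamping at the row ends is an assumption, not stated in the text.
    return max(0, min(index, len(LADDER) - 1))
```

Unlike the first embodiment, which jumps straight to normal playback, this version nudges N one entry per decision. For example, `LADDER[step_index(1, 1560)]` moves from 0.7 to 0.8.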

[0095] Next, a third embodiment of the present invention is described with reference to the flowchart of FIG. 6. The basic configuration of the device is the same as that shown in FIGS. 1 and 2, and its detailed description is omitted. In this embodiment, however, the threshold setting unit 5 holds a first threshold S1 and a second threshold S2 that is set to a larger value.

[0096] In FIG. 6, the user first operates an operation button (not shown) of the mode selection unit 8 to select a mode, as in the previous embodiments (S51). As an example, assume that [slow playback mode 1] shown in Table 1 is selected. The mode selection unit 8 then sends a mode selection signal to the speech speed control unit 13 of the speech speed conversion unit 7. On the basis of that signal, the speech speed control unit 13 is set to [slow playback mode 1].

[0097] Next, the storage amount detection unit 17 checks the memory 15 to obtain the storage amount j and gives the value of j to the speech speed control unit 13 (S52).

[0098] Assuming that no encoded audio has yet been stored in the memory 15 at this point, N = 0.6 is set as the initial speech speed magnification from Table 1 (S53), and speech speed conversion is performed at N = 0.6. The audio signal whose time axis has been expanded by the time axis compression/expansion unit 11 is encoded by the audio encoding unit 14 and temporarily stored in the memory 15. It is then decoded by the audio decoding unit 16 to become the output audio signal.

[0099] The contents of the pitch period storage unit 2 (hereinafter referred to by the variable name pitch period Tn-1) are also initialized (cleared).

[0100] After the initial speech speed magnification N (= 0.6) has been set in this way, the pitch period extraction unit 1 extracts the pitch period of the input audio signal on the basis of Equation 1 (S54). This pitch period is hereinafter referred to by the variable name pitch period Tn. As an example, assume that a pitch period Tn = 60 [samples] is obtained.

[0101] The pitch period comparison unit 3 compares the pitch period Tn extracted by the pitch period extraction unit 1 with the pitch period Tn-1 stored in the pitch period storage unit 2 (S55).

[0102] However, the contents of the pitch period storage unit 2 have been cleared as described above. The process therefore proceeds to step S62, where the count value C of the counter 4 is initialized (cleared) to 0. The value of the pitch period Tn (= 60 [samples]) is then stored in the pitch period storage unit 2 and becomes the new value of Tn-1 (S63), and the process returns to step S52 via step S64.

[0103] The storage amount j of the memory is then checked again (S52), and N is set again on the basis of j (S53).

[0104] The pitch period Tn is then extracted again (S54). Assume, for example, that Tn = 61 [samples] is obtained. The pitch period comparison unit 3 compares this Tn (= 61 [samples]) with the pitch period Tn-1 (= 60 [samples]) stored in the pitch period storage unit 2 (S55).

[0105] If the newly extracted pitch period Tn (= 61 [samples]) and the pitch period Tn-1 (= 60 [samples]) stored in the pitch period storage unit 2 satisfy the condition Tn ≈ Tn-1, the count value C of the counter 4 is incremented by one (S56).

[0106] The pitch period Tn (= 61 [samples]) extracted by the pitch period extraction unit 1 is multiplied by the count value C (= 1) of the counter 4 to obtain the integrated value M (= 61) (S57), which is given to the comparison unit 6.

[0107] The comparison unit 6 compares the integrated value M with the first threshold S1 (for example, 1000) and the second threshold S2 (for example, 2000) set in the threshold setting unit 5 (S58). Since M = 61 is smaller than the first threshold S1, the speech speed control unit 13 changes the expansion ratio of the input signal to be slightly larger (S59). That is, the current value of N is shifted one position to the left in Table 1.

[0108] As in the preceding embodiments, the audio signal whose time axis has been expanded by the time axis compression/expansion unit 11 is encoded by the audio encoding unit 14 and temporarily stored in the memory 15. It is then decoded by the audio decoding unit 16 to become the output audio signal.

[0109] The value of the pitch period Tn is then stored in the pitch period storage unit 2 as the new Tn-1 (S63), and the process returns to step S52 via step S64.

[0110] Consider the loop that runs from step S52 through step S64 and back to step S52. If, as the loop repeats, the integrated value M falls between the first threshold S1 and the second threshold S2, the process reaches step S60. In this case the value of N is not changed.

[0111] In the same loop from step S52 through step S64 and back to step S52, if the input audio signal is slowly spoken speech, the loop repeats until the integrated value M exceeds the second threshold S2 set in the threshold setting unit 5 at step S58. The process then reaches step S61. In step S61, N is shifted one position to the right of its current value in Table 1. As a result, the time-axis expansion ratio of the input audio signal becomes slightly smaller than before, and the speech becomes slightly faster than before.

[0112] Conversely, when the input audio signal is fast speech, the integrated value M is smaller than the first threshold S1 set in the threshold setting unit 5, so the process proceeds from step S58 to step S59. In step S59, N is shifted one position to the left of its current value in Table 1. As a result, the time-axis expansion ratio of the input audio signal becomes slightly larger than before, and the speech becomes slightly slower than before.

[0113] As already described, the speech speed magnification N is also changed according to the memory storage amount j.

[0114] When the user gives a stop instruction, the device stops (S64).

[0115] In summary, this embodiment adjusts the expansion ratio in [slow playback mode] automatically, as follows:
- When the integrated value M is smaller than the first threshold S1, the expansion ratio is changed to be slightly larger.
- When M is larger than the second threshold S2 (where S1 < S2), the expansion ratio is changed to be slightly smaller.
- When M lies between S1 and S2, the expansion ratio is left unchanged.
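The three-way rule of the third embodiment (S58 to S61) can be sketched as a dead band around the ladder index. The ladder is hypothetical and the function name is illustrative. The example thresholds S1 = 1000 and S2 = 2000 come from [0107]. Treating M exactly equal to S1 or S2 as "between" (no change) and clamping at the row ends are assumptions.

```python
# Hypothetical Table 1 row; only 0.6, 0.7 and 1.0 appear in the text.
LADDER = [0.6, 0.7, 0.8, 0.9, 1.0]

def step_index_two_thresholds(index: int, m: float,
                              s1: float = 1000, s2: float = 2000) -> int:
    """Third embodiment: shift N left below S1, right above S2, else hold."""
    if not s1 < s2:
        raise ValueError("the first threshold must be below the second")
    if m < s1:
        index -= 1   # S59: fast speech, shift left (slower output)
    elif m > s2:
        index += 1   # S61: slow speech, shift right (faster output)
    # S1 <= m <= S2 (S60): leave N unchanged
    return max(0, min(index, len(LADDER) - 1))
```

The gap between S1 and S2 acts as hysteresis: values of M near a single threshold no longer flip N back and forth on every pass through the loop.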

[0116] Next, a fourth embodiment of the present invention is described with reference to the flowchart of FIG. 7. The basic configuration of the device is the same as that shown in FIGS. 1 and 2, and its detailed description is omitted. In this embodiment, as in the third embodiment, the threshold setting unit 5 holds a first threshold S1 and a second threshold S2 that is set to a larger value.

[0117] In addition, as shown in Table 2 below, a [fast playback mode] is provided alongside the [slow playback mode] shown in Table 1. Like [slow playback mode], the [fast playback mode] has four modes, numbered 1 to 4.

[0118]

[Table 2]

[0119] In FIG. 7, the user first operates an operation button (not shown) of the mode selection unit 8 to select a mode, as in the previous embodiments (S71). Because this embodiment has both a [slow playback mode] and a [fast playback mode], one of the four modes is selected for each. As an example, assume that [slow playback mode 1] shown in Table 1 and [fast playback mode 1] shown in Table 2 are selected. The mode selection unit 8 then sends a mode selection signal to the speech speed control unit 13 of the speech speed conversion unit 7. On the basis of that signal, the speech speed control unit 13 receives the setting information for [slow playback mode 1] and [fast playback mode 1].

[0120] Next, the storage amount detection unit 17 checks the amount of data stored in the memory 15 to obtain the storage amount j and supplies j to the speech speed control unit 13 (S72). Unlike the preceding embodiments, however, at this point the speech speed magnification N is set to an initial value of 1.0x (that is, the [normal playback mode]) (S73).
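The claims define the speech speed magnification as N = (time length of the input audio signal) / (time length of the output audio signal). As a worked example, using the 1.0x initial value of step S73 and two values chosen only for illustration (L_in and L_out are labels introduced here for the input and output time lengths):

$$N = \frac{L_{\mathrm{in}}}{L_{\mathrm{out}}} \quad\Longrightarrow\quad L_{\mathrm{out}} = \frac{L_{\mathrm{in}}}{N}$$

$$N = 1.0:\ L_{\mathrm{out}} = L_{\mathrm{in}}, \qquad N = 0.8:\ L_{\mathrm{out}} = 1.25\,L_{\mathrm{in}}, \qquad N = 1.25:\ L_{\mathrm{out}} = 0.8\,L_{\mathrm{in}}$$

A magnification below 1 stretches the output and slows the speech (the case of claim 11, for fast input), while a magnification above 1 shortens it (the case of claim 12, for slowly spoken input).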

[0121] In addition, the contents of the pitch period storage unit 2 (hereinafter referred to by the variable name pitch period Tn-1) are initialized (cleared).

[0122] After the speech speed magnification N has thus been set to its initial value (1.0x), the pitch period extraction unit 1 extracts the pitch period of the input audio signal (hereinafter referred to by the variable name pitch period Tn) based on Equation 1 above (S74). As an example, assume that a pitch period Tn = 60 [samples] is obtained.

[0123] The pitch period comparison unit 3 compares the pitch period Tn extracted by the pitch period extraction unit 1 with the pitch period Tn-1 stored in the pitch period storage unit 2 (S75).

[0124] However, since the contents of the pitch period storage unit 2 have been cleared as described above, the process proceeds to step S82, where the count value C of the counter 4 is initialized (cleared) to 0, and the [normal playback mode] is selected in step S80. (Since the [normal playback mode] was already set in step S73, the mode does not actually change.)

[0125] The value of the pitch period Tn (= 60 [samples]) is stored in the pitch period storage unit 2 and becomes the new value of the pitch period Tn-1 (S83), and the process returns to step S72 via step S84.

[0126] The storage amount j of the memory is then checked again (S72), and the speech speed magnification N is set based on the storage amount j (S73).

[0127] The pitch period Tn is then extracted again (S74); assume, for example, that a pitch period Tn = 61 [samples] is obtained. The pitch period comparison unit 3 compares this extracted pitch period Tn (= 61 [samples]) with the pitch period Tn-1 (= 60 [samples]) stored in the pitch period storage unit 2 (S75).

[0128] If the newly extracted pitch period Tn (= 61 [samples]) and the pitch period Tn-1 (= 60 [samples]) stored in the pitch period storage unit 2 satisfy the condition Tn ≈ Tn-1, the count value C of the counter 4 is incremented by one (S76).
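Paragraph [0128] does not specify how close Tn and Tn-1 must be to count as Tn ≈ Tn-1. The Python sketch below assumes a hypothetical 5 % relative tolerance and, following claim 16, also accepts a period close to double or half the stored one. It further assumes that any mismatch clears the counter as in step S82, which this section describes explicitly only for the cleared-storage case. The function names are illustrative and do not come from the source.

```python
from typing import Optional


def pitch_matches(tn: float, tn_prev: Optional[float], rel_tol: float = 0.05) -> bool:
    """S75: decide whether Tn ≈ Tn-1 (hypothetical 5 % relative tolerance).

    Following claim 16, a period close to double or half the stored period
    is also treated as a match.
    """
    if tn_prev is None:            # pitch period storage cleared, as in [0124]
        return False
    for ref in (tn_prev, 2 * tn_prev, tn_prev / 2):
        if abs(tn - ref) <= rel_tol * ref:
            return True
    return False


def update_counter(count: int, tn: float, tn_prev: Optional[float]) -> int:
    """S76 increments C on a match; S82 clears C otherwise (assumed for mismatches)."""
    return count + 1 if pitch_matches(tn, tn_prev) else 0
```

Replaying the example from the text: `update_counter(0, 60, None)` gives C = 0 for the first period, and `update_counter(0, 61, 60)` gives C = 1, as in [0128].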

[0129] The pitch period Tn (= 61 [samples]) extracted by the pitch period extraction unit 1 is then multiplied by the count value C (= 1) of the counter 4 to obtain the product M (= 61) (S77), which is supplied to the comparison unit 6.

[0130] As in the third embodiment, this embodiment also has a first threshold S1 and a second threshold S2 set to a larger value. When the product M of the pitch period Tn of the input audio signal and the count value C of the counter 4 lies between the first threshold S1 and the second threshold S2, the speech is judged to be at a standard speed; when M is smaller than the first threshold S1, the speech is judged to be fast; and when M is larger than the second threshold S2, the speech is judged to be slowly spoken.

[0131] The comparison unit 6 compares the product M with the first threshold S1 (for example, 1000) and the second threshold S2 (for example, 2000) set by the threshold setting unit 5 (S78). Since M = 61 as described above, M is smaller than the first threshold S1, and the [slow playback mode] is therefore selected (S79). Because [slow playback mode 1] was set in step S71 as the initial mode of the [slow playback mode], [slow playback mode 1] is set as the mode to be switched to next.
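Steps S77 to S81 can be sketched in Python as follows, using the example thresholds S1 = 1000 and S2 = 2000 from [0131]. Because M is the pitch period in samples multiplied by the number of consecutive matching periods, it roughly measures how long a run of near-identical pitch periods has lasted, and long runs point to slowly spoken speech. The source does not say how a value exactly equal to S1 or S2 is classified, so treating it as standard speed is an assumption, as are the function names.

```python
def product_m(tn: int, count: int) -> int:
    """S77: product M of the pitch period Tn (samples) and the count value C."""
    return tn * count


def select_mode(m: int, s1: int = 1000, s2: int = 2000) -> str:
    """S78: classify M against S1 < S2 and return the mode to switch to next."""
    if not s1 < s2:
        raise ValueError("the first threshold S1 must be smaller than S2")
    if m < s1:
        return "slow playback"    # S79: judged to be fast speech
    if m > s2:
        return "fast playback"    # S81: judged to be slowly spoken speech
    return "normal playback"      # S80: standard speed (boundaries assumed here)
```

With the values from the text, `select_mode(product_m(61, 1))` returns `"slow playback"`, matching [0131].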

[0132] As in the preceding embodiments, the audio signal whose time axis has been expanded by the time axis compression/expansion unit 11 is encoded by the audio encoding unit 14, temporarily stored in the memory 15, and then decoded by the audio decoding unit 16 to become the output audio signal.

[0133] The value of the pitch period Tn is then newly stored in the pitch period storage unit 2 (S83), and the process returns to step S72 via step S84.

[0134] In the loop described above, which runs from step S72 through step S84 and back to step S72, when repeated iterations bring the product M between the first threshold S1 and the second threshold S2, the process reaches step S80; in this case, the value of the speech speed magnification N is not changed.

[0135] Likewise, in the loop from step S72 through step S84 back to step S72, if the input audio signal is slowly spoken speech, the loop repeats until, in step S78, the product M exceeds the second threshold S2 set by the threshold setting unit 5, and the process proceeds to step S81. In step S81, the [fast playback mode] is selected as the mode to be switched to next. Because [fast playback mode 1] was set in step S71 as the initial mode of the [fast playback mode], [fast playback mode 1] is set as the mode to be switched to next.

[0136] After the mode is set, the value of the pitch period Tn (= 61 [samples]) is stored in the pitch period storage unit 2 as before and becomes the new value of the pitch period Tn-1 (S83), and the process returns to step S72.

[0137] The storage amount j of the memory is then checked again (S72), and the speech speed magnification N is set based on the storage amount j (S73). In step S73, the value of the speech speed magnification N in Table 2 is changed as appropriate according to the storage amount j of the memory.
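Table 2 and the exact mapping from the storage amount j to N are not reproduced in this text. Claims 14 and 15 only state the direction: N is moved closer to 1 as the free capacity of the storage decreases, and closer to a predetermined magnification as it increases. The Python sketch below assumes a simple linear interpolation between those two ends; the function and its parameters `target_n`, `stored`, and `capacity` are hypothetical.

```python
def adjust_magnification(target_n: float, stored: int, capacity: int) -> float:
    """Move the speech speed magnification N toward 1.0 as memory 15 fills up.

    Hypothetical linear rule: the full target magnification while the memory
    is empty, exactly 1.0 when it is full.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    used = min(max(stored / capacity, 0.0), 1.0)  # storage amount j as a fraction
    free = 1.0 - used                             # remaining empty capacity
    return 1.0 + (target_n - 1.0) * free
```

For instance, with a target of 0.8 and the memory half full, `adjust_magnification(0.8, 500, 1000)` returns approximately 0.9. A linear rule is only the simplest choice satisfying the claimed direction of change.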

[0138] Conversely, when the input audio signal is fast speech, the product M becomes smaller than the first threshold S1 set by the threshold setting unit 5, and the process proceeds from step S78 to step S79. In step S79, the [slow playback mode] is selected as the mode to be switched to next. Because [slow playback mode 1] was set in step S71 as the initial mode of the [slow playback mode], [slow playback mode 1] is set as the mode to be switched to next.

[0139] After the mode is set as described above, the value of the pitch period Tn is stored in the pitch period storage unit 2 as before and becomes the new value of the pitch period Tn-1 (S83). In addition, an audio signal whose time axis has been neither compressed nor expanded is output from the time axis compression/expansion unit 11, encoded by the audio encoding unit 14, temporarily stored in the memory 15, and then decoded by the audio decoding unit 16 to become the output audio signal. The process then returns to step S72.

[0140] The storage amount j of the memory is then checked again (S72), and the speech speed magnification N is set based on the storage amount j (S73). In step S73, the value of the speech speed magnification N in Table 1 is changed as appropriate according to the storage amount j of the memory.

[0141] As already explained, the speech speed magnification N is also changed according to the storage amount k of the memory.

[0142] When the user gives a stop instruction, the device stops (S84).

[0143] In summary, in this embodiment, when the product M is smaller than the first threshold S1, the [slow playback mode] is selected automatically; when M is larger than the second threshold S2 (where S1 < S2), the [fast playback mode] is selected automatically; and when M lies between the first threshold S1 and the second threshold S2, the [normal playback mode] is selected automatically.
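The decision loop of the fourth embodiment can be summarized in a short Python sketch that replays steps S74 to S83 for a sequence of already-extracted pitch periods and reports the mode selected for each one. It covers only mode selection, not the audio path through units 11, 14, 15, and 16. It assumes a hypothetical 5 % tolerance for Tn ≈ Tn-1 (for brevity, only near-equal periods count as matches) and a counter reset (S82) on any mismatch, and it uses the example thresholds S1 = 1000 and S2 = 2000 from [0131]. The function name is illustrative.

```python
from typing import Iterable, List, Optional


def run_mode_loop(periods: Iterable[int], s1: int = 1000, s2: int = 2000,
                  rel_tol: float = 0.05) -> List[str]:
    """Replay the mode decisions of steps S74-S83 for extracted pitch periods."""
    tn_prev: Optional[int] = None  # [0121]: pitch period storage 2 starts cleared
    count = 0                      # count value C of counter 4
    modes: List[str] = []
    for tn in periods:             # S74: pitch period Tn, in samples
        # S75: compare Tn with the stored Tn-1 (hypothetical tolerance).
        if tn_prev is not None and abs(tn - tn_prev) <= rel_tol * tn_prev:
            count += 1             # S76: Tn ≈ Tn-1, increment C
            m = tn * count         # S77: product M = Tn x C
            if m < s1:             # S78 -> S79: fast speech, slow it down
                modes.append("slow playback")
            elif m > s2:           # S78 -> S81: slow speech, speed it up
                modes.append("fast playback")
            else:                  # S78 -> S80: standard speed, N unchanged
                modes.append("normal playback")
        else:
            count = 0              # S82: clear C (assumed for any mismatch)
            modes.append("normal playback")  # S80
        tn_prev = tn               # S83: Tn becomes the new Tn-1
    return modes
```

With a steady 60-sample pitch, `run_mode_loop([60] * 40)` reports normal playback for the first period (storage cleared), slow playback while M stays below 1000, normal playback once M reaches 1020, and fast playback from M = 2040 onward.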

[0144] Further, FIG. 8 shows the speech speed conversion device of FIG. 1 provided with a threshold change operation unit 18, with which the user can change the threshold set by the threshold setting unit 5. Listeners differ in how fast speech must be before it feels hard to follow, or how slow before it feels too slow. The user can therefore use the threshold change operation unit 18 to change the threshold S, or the first threshold S1 and the second threshold S2, which serve as the criteria for automatically changing the speech speed, so that the device automatically settles on a speech speed that suits that user. The threshold change operation unit 18 can take various forms, such as [+] and [-] operation keys, [up] and [down] operation keys or other operation keys, a jog dial, or a slide lever. Alternatively, a set of buttons (such as [fast], [slightly fast], [normal], [slightly slow], and [slow]) may be provided, each assigned its own threshold, from which the user selects.
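The threshold change operation unit 18 is described only by its possible controls. A minimal Python model of the [+]/[-] (or [up]/[down]) keys might look like the sketch below; the class name, the step size of 100, the initial values (the examples from [0131]), and the rule of ignoring presses that would violate 0 < S1 < S2 are all assumptions. The `which` parameter reflects the option in [0153] of changing only one of the two thresholds. Because M < S1 is classified as fast speech, raising S1 makes the device slow speech down more often, and raising S2 makes it speed speech up less often.

```python
from dataclasses import dataclass


@dataclass
class ThresholdSetting:
    """Hypothetical model of threshold change operation unit 18."""
    s1: int = 1000   # first threshold S1 (example value from [0131])
    s2: int = 2000   # second threshold S2 (example value from [0131])
    step: int = 100  # hypothetical change per key press

    def press(self, key: str, which: str = "both") -> None:
        """Apply one '+'/'up' or '-'/'down' key press to S1, S2, or both."""
        if which not in ("s1", "s2", "both"):
            raise ValueError(f"unknown threshold: {which!r}")
        if key in ("+", "up"):
            delta = self.step
        elif key in ("-", "down"):
            delta = -self.step
        else:
            raise ValueError(f"unknown key: {key!r}")
        new_s1 = self.s1 + delta if which in ("s1", "both") else self.s1
        new_s2 = self.s2 + delta if which in ("s2", "both") else self.s2
        # Ignore presses that would break 0 < S1 < S2.
        if 0 < new_s1 < new_s2:
            self.s1, self.s2 = new_s1, new_s2
```

For example, one `press("+")` moves the thresholds from (1000, 2000) to (1100, 2100).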

[0145] FIG. 9 is a flowchart showing the operation of a fifth embodiment of the present invention. It adds step S24, in which the user changes and sets the threshold S, to the flowchart of FIG. 4 showing the operation of the speech speed conversion device. Steps identical to those in FIG. 4 are given the same reference numerals, and their detailed description is omitted.

[0146] After the user selects a mode (any of slow playback modes 1 to 4 shown in Table 1) in step S11, the user can change the threshold S in the following step S24 by operating the threshold change operation unit 18.

[0147] FIG. 10 is a flowchart showing the operation of a sixth embodiment of the present invention. It adds step S44, in which the user changes and sets the threshold S, to the flowchart of FIG. 5 showing the operation of the speech speed conversion device. Steps identical to those in FIG. 5 are given the same reference numerals, and their detailed description is omitted.

[0148] After the user selects a mode (any of slow playback modes 1 to 4 shown in Table 1) in step S31, the user can change the threshold S in the following step S44 by operating the threshold change operation unit 18.

[0149] Next, FIG. 11 is a flowchart showing the operation of a seventh embodiment of the present invention. It adds step S64, in which the user changes and sets the first threshold S1 and the second threshold S2, to the flowchart of FIG. 6 showing the operation of the speech speed conversion device. Steps identical to those in FIG. 6 are given the same reference numerals, and their detailed description is omitted.

[0150] After the user selects a mode (any of slow playback modes 1 to 4 shown in Table 1) in step S51, the user can change the first threshold S1 and the second threshold S2 in the following step S64 by operating the threshold change operation unit 18.

[0151] Further, FIG. 12 is a flowchart showing the operation of an eighth embodiment of the present invention. It adds step S85, in which the user changes and sets the first threshold S1 and the second threshold S2, to the flowchart of FIG. 7 showing the operation of the speech speed conversion device. Steps identical to those in FIG. 7 are given the same reference numerals, and their detailed description is omitted.

[0152] After the user selects a mode (one of slow playback modes 1 to 4 shown in Table 1, or one of fast playback modes 1 to 4 shown in Table 2) in step S71, the user can change the first threshold S1 and the second threshold S2 in the following step S85 by operating the threshold change operation unit 18.

[0153] In each of the above embodiments, the mode selection and threshold setting processes appear only at the beginning of the flowcharts for convenience of illustration; the device may, however, be configured so that these settings can also be changed as needed during speech speed conversion. Further, although each of the above embodiments allows both the first threshold and the second threshold to be changed, the device may instead be configured so that only one of them can be changed.

[0154]

[Effects of the Invention] As described in detail above, according to the present invention, it is determined whether an input audio signal is fast speech, and only fast speech is converted into slower speech by speech speed conversion. Consequently, slowly spoken input is never subjected to speech speed conversion that would make it even slower.

[0155] Further, according to the present invention, the magnification of the speech speed conversion is changed as appropriate according to the speaking speed.

[0156] Furthermore, according to the present invention, not only is the magnification of the speech speed conversion changed as appropriate according to the speaking speed, but speech spoken at a standard speed is not subjected to speech speed conversion.

[0157] Moreover, according to the present invention, fast speech is converted into slower speech, slowly spoken speech is converted into faster speech, and speech spoken at a standard speed is not subjected to speech speed conversion.

[Brief Description of the Drawings]

FIG. 1 is a schematic block diagram showing the configuration of the speech speed conversion device of the present invention.

FIG. 2 is a schematic block diagram showing the configuration of the speech speed conversion unit in the speech speed conversion device of the present invention.

FIG. 3 is a diagram for explaining the operation of the speech speed conversion device of the present invention.

FIG. 4 is a flowchart for explaining the operation of the speech speed conversion device of the present invention.

FIG. 5 is a flowchart for explaining the operation of the second embodiment of the present invention.

FIG. 6 is a flowchart for explaining the operation of the third embodiment of the present invention.

FIG. 7 is a flowchart for explaining the operation of the fourth embodiment of the present invention.

FIG. 8 is a schematic block diagram showing the configuration of a speech speed conversion device according to another embodiment of the present invention.

FIG. 9 is a flowchart for explaining the operation of the fifth embodiment of the present invention.

FIG. 10 is a flowchart for explaining the operation of the sixth embodiment of the present invention.

FIG. 11 is a flowchart for explaining the operation of the seventh embodiment of the present invention.

FIG. 12 is a flowchart for explaining the operation of the eighth embodiment of the present invention.

[Description of Reference Numerals]

1 Pitch period extraction unit
2 Pitch period storage unit
3 Pitch period comparison unit
4 Counter
5 Threshold setting unit
6 Comparison unit
7 Speech speed conversion unit
8 Mode selection unit
11 Audio time axis compression/expansion unit
12 Silent section detection unit
13 Speech speed control unit
14 Audio encoding unit
15 Memory
16 Audio decoding unit
17 Storage amount detection unit
18 Threshold change operation unit

Claims


1. A speech speed conversion device comprising: pitch cycle detecting means for detecting a pitch cycle from an audio signal; counting means for counting the number of repetitions of a predetermined pitch cycle based on the pitch cycle extracted by the pitch cycle detecting means; comparing and judging means for comparing the product of the pitch cycle extracted by the pitch cycle detecting means and the number of repetitions counted by the counting means with a predetermined threshold; and speech speed converting means for performing speech speed conversion based on the judgment result of the comparing and judging means.

2. A speech speed conversion device comprising: pitch cycle detecting means for detecting a pitch cycle from an audio signal; counting means for counting the number of repetitions of a predetermined pitch cycle based on the pitch cycle extracted by the pitch cycle detecting means; comparing and judging means for comparing the product of the pitch cycle extracted by the pitch cycle detecting means and the number of repetitions counted by the counting means with a predetermined threshold; threshold changing means for changing the predetermined threshold; and speech speed converting means for performing speech speed conversion based on the judgment result of the comparing and judging means.

3. The speech speed converting means, wherein the product of the pitch cycle extracted by the pitch cycle detecting means and the number of repetitions counted by the counting means does not exceed a predetermined threshold value in the comparing and judging means. 3. The speech speed conversion device according to claim 1, wherein the speech speed is converted into a slow voice signal.

4. The speech speed conversion means, when the comparison determination means determines that the product of the pitch cycle extracted by the pitch cycle detection means and the number of repetitions counted by the counting means does not exceed a predetermined threshold. The voice signal is converted into a slow voice signal and the speech speed is converted, and the product of the pitch cycle extracted by the pitch cycle detection means in the comparison determination means and the number of repetitions counted by the counting means exceeds a predetermined threshold value. 4. The speech speed conversion device according to claim 1, wherein the speech speed conversion is not performed when the judgment is made.

5. The speech speed conversion means, when the comparison / determination means determines that the product of the pitch cycle extracted by the pitch cycle detection means and the number of repetitions counted by the counting means is within a predetermined threshold value. If the product of the pitch cycle extracted by the pitch cycle detection means and the number of repetitions counted by the counting means is determined to be greater than a predetermined threshold, the speech rate magnification is decreased. 4. The speech speed conversion device according to claim 1, wherein the speech speed conversion is performed. (However, speech speed ratio = time length of input voice signal / time length of output voice signal)

6. A speech speed conversion device comprising: pitch cycle detecting means for detecting a pitch cycle from an audio signal; counting means for counting the number of repetitions of a predetermined pitch cycle based on the pitch cycle extracted by the pitch cycle detecting means; comparison and determination means for comparing the product of the pitch cycle extracted by the pitch cycle detecting means and the number of repetitions counted by the counting means with a predetermined first threshold and a predetermined second threshold; and speech speed conversion means for performing speech speed conversion. (Where the first threshold < the second threshold.)

7. A speech speed conversion device comprising: pitch cycle detecting means for detecting a pitch cycle from an audio signal; counting means for counting the number of repetitions of a predetermined pitch cycle based on the pitch cycle extracted by the pitch cycle detecting means; comparison and determination means for comparing the product of the pitch cycle extracted by the pitch cycle detecting means and the number of repetitions counted by the counting means with a predetermined first threshold and a predetermined second threshold; threshold changing means for changing the predetermined first threshold or the predetermined second threshold; and speech speed conversion means for performing speech speed conversion based on the determination result of the comparison and determination means. (Where the first threshold < the second threshold.)

8. The speech speed conversion means, wherein the product of the pitch cycle extracted by the pitch cycle detection means in the comparison determination means and the number of repetitions counted by the counting means is a predetermined first threshold value and a predetermined first threshold value. 8. The speech speed conversion device according to claim 6, wherein the speech speed conversion is not performed when it is determined that the speech speed is between two threshold values.

9. The speech speed conversion means, wherein the product of the pitch cycle extracted by the pitch cycle detection means and the number of repetitions counted by the counting means is smaller than a predetermined first threshold value in the comparison determination means. 9. The speech speed conversion device according to claim 6, wherein when the judgment is made, the speech speed conversion is performed with the speech speed magnification reduced. (However, speech speed ratio = time length of input voice signal / time length of output voice signal)

10. The speech speed conversion means, wherein the product of the pitch cycle extracted by the pitch cycle detection means and the number of repetitions counted by the counting means in the comparison determination means is greater than a predetermined second threshold value. 10. The speech speed conversion device according to claim 6, wherein when the judgment is made, the speech speed conversion is performed by increasing the speech speed magnification. (However, speech speed ratio = time length of input voice signal / time length of output voice signal)

11. The speech speed conversion means, wherein the product of the pitch cycle extracted by the pitch cycle detection means and the number of repetitions counted by the counting means in the comparison determination means is smaller than a predetermined first threshold value. 9. The speech speed conversion device according to claim 6, wherein when the judgment is made, the speech speed conversion is performed with the speech speed magnification smaller than one. (However, speech speed ratio = time length of input voice signal / time length of output voice signal)

12. The speech speed conversion means, wherein the product of the pitch cycle extracted by the pitch cycle detection means and the number of repetitions counted by the counting means in the comparison determination means is greater than a predetermined second threshold value. 9. The speech speed conversion device according to claim 6, wherein when the judgment is made, the speech speed conversion is performed with the speech speed magnification greater than 1. (However, speech speed ratio = time length of input voice signal / time length of output voice signal)

13. The speech speed conversion unit according to claim 1, wherein the speech speed conversion unit changes the speech speed magnification in accordance with an empty capacity of the storage unit that stores the speech signal whose speech speed has been converted. Speech speed converter. (However, speech speed ratio = time length of input voice signal / time length of output voice signal)

14. The speech speed conversion means changes the speech speed magnification to be closer to 1 as the empty capacity of the storage means for storing speech speed converted speech signals decreases. The speech speed conversion device according to claim 1. (However, speech speed ratio = time length of input voice signal / time length of output voice signal)

15. The speech speed conversion means changes the speech speed magnification to be closer to a predetermined magnification in accordance with an increase in the empty capacity of the storage means for storing the speech speed converted speech signal. 14. The speech speed conversion device according to claim 1, wherein: (However, speech speed ratio = time length of input voice signal / time length of output voice signal)

16. The speech speed conversion device according to any one of claims 1 to 15, wherein the predetermined pitch cycle is the same pitch cycle, a doubled pitch cycle, a halved pitch cycle, or a pitch cycle close to any of these.