JPH08292790A

JPH08292790A - Video tape recorder

Info

Publication number: JPH08292790A
Application number: JP7095335A
Authority: JP
Inventors: Tomoshi Tanaka; 智志田中
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1995-04-20
Filing date: 1995-04-20
Publication date: 1996-11-05
Anticipated expiration: 2016-05-08
Also published as: JP3162945B2

Abstract

PURPOSE: During double-speed playback, to perform speech speed conversion that keeps the offset between video and audio small while retaining as much audio information as possible. During playback at triple speed or faster, to reproduce part of the audio at normal speed by thinning. CONSTITUTION: For double-speed playback, the VTR has a double-speed mode. In this mode it performs speech speed conversion by applying compression/expansion or deletion processing to the input audio signal, depending on whether the reproduced signal is in a voice section or a silent section. For ±N-times speed playback (N is a natural number of 3 or more), a command from system microcomputer 114 can also switch the VTR to an N-times speed playback mode. That mode thins out voice sections of predetermined periods of the reproduced audio signal according to the playback speed.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a video tape recorder (VTR) equipped with a speech speed conversion device that changes the speaking speed of an audio signal.

[0002]

[Prior Art] VTRs that let the user hear the audio at normal speed even during double-speed playback have been commercialized. Their basic configuration is described, for example, in the magazine "Electronics", April 1993 issue, pp. 34-37.

[0003]

[Problems to Be Solved by the Invention] Such a VTR does let the audio be heard at normal speed during double-speed playback. However, it unconditionally thins out and deletes half of the audio information. As a result, the recorded content of the tape may not be understandable from the reproduced audio, and there is a large time offset between the video and the audio.

[0004] Accordingly, the present invention aims to provide a video tape recorder with two behaviors. During double-speed playback, it performs speech speed conversion that limits the audio speed so that the offset between video and audio is small and as much audio information as possible is retained. At speeds faster than double speed, for example the speeds used for fast-forward or rewind playback (5x, 9x), it reproduces the audio by the same method as conventional double-speed playback.

[0005]

[Means for Solving the Problems] According to the present invention, the video tape recorder is provided with control means for setting two modes:

(1) A double-speed audio playback mode. During double-speed playback, speech speed conversion is performed by applying compression/expansion processing or deletion processing to the input audio signal, depending on whether the reproduced audio signal is in a voice section or a silent section.

(2) An N-times speed playback mode. During ±N-times speed playback (N is a natural number of 3 or more), voice sections of predetermined periods of the reproduced audio signal are thinned out according to the playback speed.

[0006] To carry out the double-speed audio playback mode, the above video tape recorder has a speech speed conversion device comprising:

- speech speed conversion processing means that performs speech speed conversion on the input audio signal;
- a ring memory into which the output of the speech speed conversion processing means is written;
- means for reading data from the ring memory at a constant rate.

The speech speed conversion processing means includes means for applying compression/expansion processing or deletion processing to the input audio signal. The choice depends on whether the input audio signal is in a voice section or a silent section and on the amount of data stored in the ring memory.

[0007] Alternatively, to carry out the double-speed audio playback mode, the above video tape recorder has a speech speed conversion device comprising:

- A/D conversion means that samples the input analog audio signal at a sampling frequency corresponding to the set playback speed multiple;
- a frame memory that receives the audio signal output by the A/D conversion means;
- speech speed conversion processing means that performs speech speed conversion on the audio samples each time a required number of them has been input to the frame memory;
- a ring memory into which the output of the speech speed conversion processing means is written;
- reading means that reads data from the ring memory at a constant rate;
- storage amount calculation means that calculates the amount of data stored in the ring memory from the ring memory's write signal and read signal.

The speech speed conversion processing means includes:

- section determination means that determines whether the input audio corresponding to the required number of samples in the frame memory is a voice section or a silent section;
- signal processing means that applies compression/expansion processing or deletion processing to those samples according to the outputs of the section determination means and the storage amount calculation means.

[0008] Alternatively, to carry out the double-speed audio playback mode, the above video tape recorder has a speech speed conversion device comprising:

- a frame memory into which the input digital audio signal is written at a rate corresponding to the set playback speed multiple;
- speech speed conversion processing means that performs speech speed conversion on the audio samples each time a required number of them has been input to the frame memory;
- a ring memory into which the output of the speech speed conversion processing means is written;
- reading means that reads data from the ring memory based on a read signal whose frequency equals the rate at which the frame memory is written during 1x playback;
- storage amount calculation means that calculates the amount of data stored in the ring memory from the ring memory's write signal and read signal.

The speech speed conversion processing means includes:

- section determination means that determines whether the input audio corresponding to the required number of samples in the frame memory is a voice section or a silent section;
- signal processing means that applies compression/expansion processing or deletion processing to those samples according to the outputs of the section determination means and the storage amount calculation means.

[0009] To carry out the N-times speed playback mode, the above video tape recorder controls a memory so that audio data is written into it at N-times speed and the written data is read out at 1x speed.

[0010]

[Operation] According to the present invention, the VTR is controlled to perform adaptive speech speed conversion during double-speed playback and simple thinning at triple speed or higher.

[0011] According to the present invention, during double-speed playback, compression/expansion processing or deletion processing is applied to the input audio signal. The choice depends on whether the input audio signal is in a voice section or a silent section.

[0012] According to the present invention, during double-speed playback the input audio signal passes through the following stages:

1. The speech speed conversion processing means performs speech speed conversion on the input audio signal.
2. Its output is written into the ring memory.
3. The data in the ring memory is read out at a constant rate.

Within the speech speed conversion processing means, compression/expansion processing or deletion processing is applied to the input audio signal. The choice depends on whether the signal is in a voice section or a silent section and on the amount of data stored in the ring memory.

[0013] According to the present invention, during double-speed playback the input analog audio signal is handled as follows:

1. The A/D conversion means samples it at a sampling frequency corresponding to the set playback speed multiple.
2. The audio signal output by the A/D conversion means is input to the frame memory.
3. Each time a required number of samples has been input to the frame memory, the speech speed conversion processing means performs speech speed conversion on them.
4. The output of the speech speed conversion processing means is written into the ring memory.
5. The data in the ring memory is read out based on a read signal whose frequency equals the sampling frequency used during 1x playback.

The storage amount calculation means calculates the amount of data stored in the ring memory from the ring memory's write signal and read signal.

[0014] According to the present invention, during double-speed playback, the section determination means determines whether the input audio corresponding to the required number of samples in the frame memory is a voice section or a silent section. Compression/expansion processing or deletion processing is then applied to those samples according to the outputs of the section determination means and the storage amount calculation means.

[0015] According to the present invention, during double-speed playback the input digital audio signal is handled as follows:

1. It is written into the frame memory at a rate corresponding to the set playback speed multiple.
2. Each time a required number of samples has been input to the frame memory, the speech speed conversion processing means performs speech speed conversion on them.
3. The output of the speech speed conversion processing means is written into the ring memory.
4. The data in the ring memory is read out at a constant rate based on the read signal.

The storage amount calculation means calculates the amount of data stored in the ring memory from the ring memory's write signal and read signal.

[0016] Within the speech speed conversion processing means, the section determination means determines whether the input audio corresponding to the required number of samples in the frame memory is a voice section or a silent section. Compression/expansion processing or deletion processing is then applied to those samples according to the outputs of the section determination means and the storage amount calculation means.

[0017] According to the present invention, during ±N-times speed playback, part of the audio is expanded and the remaining signal is thinned out.

[0018]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

[0019] FIG. 23 is a schematic block diagram of a VTR embodying the present invention. The signal path is as follows:

- Audio head H picks up a monaural audio signal reproduced from a horizontal track of tape T and inputs it to equalizer amplifier 111.
- Equalizer amplifier 111 equalizes and amplifies the signal, then supplies it both to terminal b of switching circuit 115 and to speech speed conversion IC 112.
- The output of speech speed conversion IC 112 is supplied to terminal a of switching circuit 115.
- Switching circuit 115 selects terminal a or b in response to a command from system microcomputer 114 and outputs the selected signal via mute circuit 116.

When the audio from tape T is output through terminal b, its speed is proportional to the tape speed. Terminal a outputs the reproduced audio signal after processing such as compression, expansion, or deletion. Speech speed conversion IC 112 is controlled by system microcomputer 114 and performs this processing in cooperation with memory (dynamic RAM) 113.

[0020] Next, the operation of the circuit of FIG. 23 is described with reference to the flowchart of FIG. 24.

[0021] First, system microcomputer 114 determines whether the VTR is in playback mode (S1). If it is, the microcomputer next determines whether double-speed playback key 117 has been pressed (S2). Pressing this key doubles the tape speed, and video and audio are reproduced at that speed. For the audio, pressing double-speed playback key 117 triggers double-speed adaptive speech speed processing (S3).

[0022] This adaptive speech speed processing is explained with reference to FIG. 25. Suppose that at normal speed the utterance "This is a VTR using speech speed conversion" (「話速変換を用いたVTRです」) is reproduced over time T, as shown in FIG. 25(a). Double-speed adaptive speech speed processing reproduces it in half the time (T/2), as shown in FIG. 25(b).

To achieve this, the silent portions between words in the normal-speed audio (the portions where no one is speaking) are deleted. The words are joined together, and each word keeps a speaking speed close to that of the normal-speed audio. In practice there is never complete silence, because of ambient noise such as incidental sounds. The target voice is distinguished from ambient noise by adaptively changing the silence determination threshold according to the audio conditions. If removing silence alone does not halve the playback time, part of the voice is compressed adaptively according to the length of the silent sections, so that the playback time is halved.

When it is determined in step 6 (S6) that playback has ended, playback stops.
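The double-speed rule in [0022] can be illustrated with an offline calculation: delete silence first, and compress voice only if that is not enough to reach T/2. The sketch below is hypothetical and much simpler than the device, which decides frame by frame using the ring memory state. The function name `plan_double_speed` and the uniform compression of all voice are assumptions made for illustration.

```python
from typing import List, Tuple


def plan_double_speed(segments: List[Tuple[str, float]]) -> Tuple[float, float]:
    """Return (silence_kept, voice_scale) so the output lasts half as long.

    segments: (label, duration) pairs at normal speed, label "voice" or "silence".
    Hypothetical offline sketch; the real converter works frame by frame.
    """
    total = sum(duration for _, duration in segments)
    target = total / 2  # double speed: material of length T must play in T/2
    voice = sum(duration for label, duration in segments if label == "voice")
    # Step 1: delete silence, keeping only as much as still fits in the target.
    silence_kept = max(0.0, target - voice)
    # Step 2: if the voice alone is still too long, compress it (uniformly here).
    voice_scale = 1.0 if voice <= target else target / voice
    return silence_kept, voice_scale


if __name__ == "__main__":
    # Deleting part of the silence is enough: keep 0.5 s of pauses, voice untouched.
    print(plan_double_speed([("voice", 2.0), ("silence", 3.0)]))  # (0.5, 1.0)
    # Deleting all silence is not enough: voice must shrink to about 2/3.
    print(plan_double_speed([("voice", 3.0), ("silence", 1.0)]))
```

In both cases `voice * voice_scale + silence_kept` equals half the input duration, which is the T/2 condition of FIG. 25(b).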

[0023] If the double-speed playback key has not been pressed in step 2 (S2), step 4 (S4) determines whether the ±N-times speed mode is active. This mode corresponds, for example, to fast-forward playback or rewind playback. If step 4 determines that the mode is not the ±N-times speed mode, the process returns to step 6.

[0024] If step 4 determines that the ±N-times speed mode is active, compression by simple thinning is performed (S5), and the process proceeds to step 6.
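Steps S1 to S5 of the FIG. 24 flowchart amount to a small dispatch on the playback state. The sketch below is a hypothetical rendering. Two choices in it are assumptions, not taken from the document:

- treating "±N-times speed mode" as `abs(speed) >= 3`;
- returning `None` when no speech processing applies.

```python
from typing import Optional


def select_audio_processing(in_playback: bool, double_key_pressed: bool,
                            speed: int) -> Optional[str]:
    """Hypothetical mirror of steps S1-S5 in FIG. 24."""
    if not in_playback:          # S1: not in playback mode
        return None
    if double_key_pressed:       # S2 -> S3: double-speed adaptive speech speed processing
        return "adaptive"
    if abs(speed) >= 3:          # S4 -> S5: ±N-times mode (assumed test), simple thinning
        return "thinning"
    return None                  # neither: continue to the end-of-playback check (S6)
```

For example, `select_audio_processing(True, False, -9)` selects simple thinning for 9x rewind playback.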

[0025] The simple thinning process is explained with reference to FIG. 26, taking audio reproduced at triple speed as an example. As shown in FIG. 26(a), the audio 「ビデオテープレコーダに時間差適応話速変換回路を」 ("... a time-difference adaptive speech speed conversion circuit to the video tape recorder ...") is reproduced over time T1. Applying simple thinning to this reproduced signal keeps only the segment 「ビデオテ」 ("bideo-te", the opening syllables of "video tape"), which occupies period T2. As shown in FIG. 26(b), this segment is expanded and reproduced over the full time T1. In other words, the triple-speed audio signal is converted to standard speaking speed by thinning out 2/3 of it (the thinned-out period is denoted TD), so that T1:T2 = 3:1.

[0026] Similarly, for an input signal reproduced at 5x speed, 4/5 is thinned out and the remainder is converted to standard speaking speed (T1:T2 = 5:1). For an input signal reproduced at 9x speed, 8/9 is thinned out and the remainder is converted to standard speaking speed (T1:T2 = 9:1).
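The thinning of [0025] and [0026] can be sketched on a sample stream. The sketch assumes the audio is captured at n times the normal sampling rate, in line with the "write at N-times speed, read at 1x" memory control of [0009]. Under that assumption, every real-time period T1 yields `n * block_out` samples. Only the first `block_out` of them (period T2 = T1/n) are kept and later read out at 1x, which fills all of T1. The parameter `block_out` (T1 times the normal sampling rate) is a hypothetical name, not a value from the document.

```python
from typing import List, Sequence


def simple_thin(samples: Sequence[int], n: int, block_out: int) -> List[int]:
    """Keep the first 1/n of every block and drop the rest (period TD).

    samples:   audio captured at n times the normal sampling rate (assumption)
    n:         playback speed multiple (3, 5, 9, ...)
    block_out: output samples per block, i.e. T1 * fs (hypothetical parameter)
    """
    block_in = n * block_out  # input samples spanning one real-time period T1
    out: List[int] = []
    for start in range(0, len(samples), block_in):
        # Period T2 = T1 / n is kept; read out at 1x it lasts the whole of T1.
        out.extend(samples[start:start + block_out])
    return out
```

With `n = 3` the kept fraction is 1/3 (2/3 thinned), and with `n = 9` it is 1/9, matching T1:T2 = 3:1 and 9:1.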

[0027] Simple thinning for reverse N-times speed playback (-N-times speed playback) is explained with reference to FIG. 27. As shown in FIG. 27, part of the reversed input audio is extracted and converted to normal speaking speed in the forward direction. FIG. 27(a) shows the N-times speed reversed audio signal, and FIG. 27(b) shows the result of thinning it. For 5x reverse playback, 4/5 is thinned out and the remainder is converted to standard speaking speed (T1:T2 = 5:1). For 9x reverse playback, 8/9 is thinned out and the remainder is converted to standard speaking speed (T1:T2 = 9:1). In FIG. 27, TD denotes the thinned-out period.
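For reverse playback, the extracted piece also has to be turned around so that it plays forwards ("converted to normal speaking speed in the forward direction"). The sketch below is a hypothetical variant of block thinning under the same assumptions as the forward case: capture at n times the normal rate and a hypothetical block size `block_out`. It keeps 1/n of each block and reverses the kept samples. Successive pieces still step backwards through the recording, as the tape rewinds.

```python
from typing import List, Sequence


def simple_thin_reverse(samples: Sequence[int], n: int, block_out: int) -> List[int]:
    """Reverse (-N) playback: keep 1/n of each block and flip it to forward order.

    samples arrive in reversed time order because the tape runs backwards.
    block_out is a hypothetical block size (T1 * fs); n is the speed multiple.
    """
    block_in = n * block_out
    out: List[int] = []
    for start in range(0, len(samples), block_in):
        kept = list(samples[start:start + block_out])
        out.extend(reversed(kept))  # play the extracted piece forwards
    return out
```

For example, if the recording is samples 0..29 and the reversed stream is 29..0, then with `n = 3` and `block_out = 5` the output is 25..29 followed by 10..14.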

[0028] Adaptive speech speed conversion is used at double speed and simple thinning at triple speed or higher for the following reason. At triple speed or higher, even adaptive speech speed conversion must delete so much audio that the result differs little from simple thinning, which needs only simple signal processing. Adaptive speech speed conversion requires complex signal processing, and at these speeds it is not only pointless but actually makes the audio harder to hear. Dividing the work this way simplifies the configuration of the speech speed conversion IC and also improves the commercial value of the VTR.

[0029] FIG. 1 shows the overall configuration of a speech speed conversion device. It corresponds to the part of the speech speed conversion IC that performs adaptive speech speed conversion.

[0030] The input audio signal is amplified by ALC amplifier 1 and sent to A/D converter 2, which converts it into a digital signal of, for example, 12 bits. The standard sampling frequency of A/D converter 2 is, for example, 8 kHz. During double-speed playback, the sampling frequency fsAD of A/D converter 2 becomes 16 kHz.

[0031] The output of A/D converter 2 is sent to DSP (Digital Signal Processor) 4 and also to level detector 3. When data converted by A/D converter 2 reaches the maximum value of the conversion range, level detector 3 outputs an ALC (automatic level control) signal to ALC amplifier 1. This signal controls the gain of ALC amplifier 1 so that the input signal to A/D converter 2 does not exceed the maximum range. The control is needed because a change in the VTR's playback tape speed also changes the level of the signal entering ALC amplifier 1. Adjusting the amplifier gain automatically from the output of level detector 3 keeps the input to A/D converter 2 within its maximum range.
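The level-detection loop of [0031] can be modeled in a few lines. This is a hypothetical sketch with three assumptions:

- the converter is a signed 12-bit device with codes -2048 to 2047;
- "reaching the maximum value of the conversion range" means hitting either end code;
- each ALC signal multiplies the gain by a fixed factor `step`.

The document describes only the gain reduction, so the sketch never raises the gain again.

```python
from typing import List, Sequence


class AlcLoop:
    """Hypothetical model of ALC amplifier 1 and level detector 3 around a 12-bit ADC."""

    CODE_MIN, CODE_MAX = -2048, 2047  # assumed signed 12-bit conversion range

    def __init__(self, gain: float = 1.0, step: float = 0.5) -> None:
        self.gain = gain   # current amplifier gain
        self.step = step   # hypothetical factor applied each time full scale is hit

    def convert(self, analog: Sequence[float]) -> List[int]:
        codes: List[int] = []
        for x in analog:
            code = max(self.CODE_MIN, min(self.CODE_MAX, round(self.gain * x)))
            if code in (self.CODE_MIN, self.CODE_MAX):
                # Level detector 3: the conversion hit the end of its range, so the
                # ALC signal turns the amplifier gain down for the following samples.
                self.gain *= self.step
            codes.append(code)
        return codes
```

For instance, feeding `[100, 5000, 5000, 100]` to a fresh loop clips twice, halving the gain each time, so the last sample comes out as 25.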

[0032] DSP 4 contains frame memory 5 and speech speed converter 6. Frame memory 5 can hold two frames of audio samples. Speech speed converter 6 performs speech speed conversion, frame by frame, on the audio samples stored in frame memory 5. Here, one frame consists of 200 samples.

[0033] Frame memory 5 is divided into a first-half area and a second-half area. While speech speed converter 6 processes the frame stored in one area, the signal from A/D converter 2 accumulates in the other area. Once a full frame has accumulated there, speech speed converter 6 processes that area. At the same time, the signal from A/D converter 2 accumulates in the first area, whose data has already been processed.
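The two-area scheme of [0033] is a ping-pong (double) buffer. The hypothetical sketch below models only the bookkeeping. The class and method names are invented for illustration, and 200 samples per frame comes from [0032]. `write` stores one A/D sample and returns a completed frame when one half fills. That frame is handed to the converter while the other half starts filling.

```python
from typing import List, Optional


class PingPongFrameMemory:
    """Hypothetical model of frame memory 5 with two one-frame areas."""

    def __init__(self, frame_len: int = 200) -> None:
        self.frame_len = frame_len
        self.halves: List[List[int]] = [[], []]
        self.fill = 0  # index of the area currently being filled by the A/D side

    def write(self, sample: int) -> Optional[List[int]]:
        """Store one A/D sample; return a full frame when an area completes."""
        area = self.halves[self.fill]
        area.append(sample)
        if len(area) == self.frame_len:
            self.fill ^= 1               # switch filling to the other area
            self.halves[self.fill] = []  # that area's old frame was already handed off
            return area                  # this frame goes to speech speed converter 6
        return None
```

With `frame_len = 3`, writing samples 0 to 6 yields the frames `[0, 1, 2]` and `[3, 4, 5]`, and sample 6 waits in the area now being filled.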

[0034] Data output from speech speed converter 6 is written into ring memory 7 according to a write clock and read out according to a read clock. D/A converter 8 converts the signal read from ring memory 7 into an analog signal. Amplifier 10 then amplifies it, and it is output as the audio output signal.

[0035] The sampling frequency fsDA of D/A converter 8 is 8 kHz, and the read clock of ring memory 7 is also 8 kHz. Ring memory 7 is a 21845 × 12-bit memory, that is, it holds 21845 words. The maximum time data can be held in ring memory 7 is therefore 21845 × 1/8000 ≈ 2.73 seconds. This is also the maximum delay of the output relative to the input signal.

[0036] The write clock for ring memory 7 is input to the up-count input (UP) of up/down counter 9, and the read clock to its down-count input (DOWN). Up/down counter 9 counts the difference between the total number of write clocks and the total number of read clocks received, which is the amount of data stored in ring memory 7. It outputs this count as a 15-bit digital signal, which is sent to speech speed converter 6.
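Ring memory 7 and up/down counter 9 together behave like a circular buffer with an occupancy counter. The sketch below is hypothetical. Raising an exception on a full or empty buffer is an illustrative choice: the document intends the converter to avoid those states rather than to signal them. The 21845-word capacity and the 8 kHz read clock are from [0035]. A count of at most 21845 fits in the 15-bit counter output, since 2^15 = 32768.

```python
class RingMemory:
    """Hypothetical model of ring memory 7 plus up/down counter 9."""

    def __init__(self, words: int = 21845) -> None:
        self.buf = [0] * words
        self.words = words
        self.wr = 0     # write address
        self.rd = 0     # read address
        self.count = 0  # up/down counter 9: +1 per write clock, -1 per read clock

    def write(self, sample: int) -> None:
        if self.count == self.words:
            raise OverflowError("ring memory 7 is full")  # illustrative choice
        self.buf[self.wr] = sample
        self.wr = (self.wr + 1) % self.words
        self.count += 1  # UP input

    def read(self) -> int:
        if self.count == 0:
            raise IndexError("ring memory 7 is empty")  # illustrative choice
        sample = self.buf[self.rd]
        self.rd = (self.rd + 1) % self.words
        self.count -= 1  # DOWN input
        return sample


def max_delay_seconds(words: int = 21845, read_clock_hz: float = 8000.0) -> float:
    """Longest time a sample can wait in the memory: capacity / read rate."""
    return words / read_clock_hz
```

`max_delay_seconds()` returns 2.730625, the roughly 2.73 s figure in [0035].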

[0037] FIG. 2 shows the detailed configuration of speech speed converter 6.

[0038] The audio samples read from frame memory 5 are sent to power calculator 11, which calculates the average power P of one frame of audio. Let the amplitudes of the samples in one frame be i_0, i_1, ..., i_{N-1} (where N = 200). The average power P is then given by Equation 1 below.

[0039]

[Equation 1]

[0040] The average power P obtained by power calculator 11 is sent to comparator 12. Comparator 12 also receives threshold Th from threshold memory 13 and checks whether P is greater than or equal to Th (P ≥ Th) or less than Th (P < Th). If P ≥ Th, comparator 12 outputs a signal indicating that the current frame is a voice section. If P < Th, it outputs a signal indicating that the current frame is a silent section.
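The voice/silence decision of [0038] to [0040] can be sketched as follows. Equation 1 itself is not reproduced in this text, so the sketch assumes that P is the mean of the squared amplitudes, which is a common definition of average power. That reading is consistent with, but not confirmed by, the example threshold 2^12 used for a 12-bit converter. The names `average_power` and `classify_frame` are hypothetical.

```python
from typing import Sequence

FRAME_LEN = 200       # samples per frame
TH_DEFAULT = 2 ** 12  # example threshold for a 12-bit A/D converter


def average_power(frame: Sequence[int]) -> float:
    # ASSUMPTION: Equation 1 is taken to be the mean of the squared amplitudes
    # i_0 ... i_{N-1}; the original formula is not shown in this text.
    return sum(i * i for i in frame) / len(frame)


def classify_frame(frame: Sequence[int], th: float = TH_DEFAULT) -> str:
    """Comparator 12: voice section if P >= Th, otherwise silent section."""
    return "voice" if average_power(frame) >= th else "silence"
```

Under this assumption, a frame of constant amplitude 64 has P = 4096 = Th and counts as voice, while amplitude 63 (P = 3969) counts as silence.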

[0041] When A/D converter 2 quantizes with 12 bits, threshold Th is set to, for example, 2^12. Threshold Th may also be changed as follows. As shown by the dotted line in FIG. 2, a power steady-state detection and threshold update unit 14 is provided. This unit checks whether the average power P from power calculator 11 has remained constant over a predetermined number of frames (for example, 40 frames). If it has (a steady state), the unit writes twice the current average power P into threshold memory 13, updating threshold Th. The updated threshold is capped at a predetermined maximum, for example 2^14. This makes it possible to treat continuously present noise as silent sections.
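Threshold update unit 14 can be sketched as a sliding window over the last 40 frame powers. The document does not say how "constant" is judged, so the sketch makes two assumptions, both hypothetical:

- the window counts as constant when its spread is within a relative tolerance `tol`;
- an all-zero window is not treated as steady, so that Th cannot collapse to 0.

The initial value 2^12, the window of 40 frames, the factor of 2, and the cap of 2^14 follow [0041].

```python
from collections import deque


class ThresholdUpdater:
    """Hypothetical model of power steady-state detection and threshold update unit 14."""

    def __init__(self, th: float = 2 ** 12, frames: int = 40,
                 th_max: float = 2 ** 14, tol: float = 0.1) -> None:
        self.th = th          # value held in threshold memory 13
        self.th_max = th_max  # cap on the updated threshold
        self.tol = tol        # hypothetical tolerance for "constant"
        self.history: deque = deque(maxlen=frames)

    def update(self, p: float) -> float:
        """Feed one frame's average power P; return the current threshold Th."""
        self.history.append(p)
        if len(self.history) == self.history.maxlen:
            lo, hi = min(self.history), max(self.history)
            if hi > 0 and hi - lo <= self.tol * hi:  # steady state (assumed test)
                self.th = min(2 * p, self.th_max)    # Th = 2P, capped
        return self.th
```

For example, 40 frames of steady background noise at P = 3000 raise Th from 4096 to 6000. Steady P = 10000 would ask for 20000 but is capped at 16384.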

[0042] Alternatively, voice sections and silent sections of the input signal may be distinguished using the cumulative power Pa of each frame's audio signal, given by Equation 2 below, together with a given threshold.

[0043]

[Equation 2]

[0044] Conditional branching unit 15 is connected as follows:

- It receives the output of comparator 12.
- It receives the output of ring memory storage state determination unit 16.
- It receives the audio samples from frame memory 5 via power calculator 11.
- Pause duration setting memory 17 is connected to it.

Pause duration setting memory 17 holds a pause duration Tdel (the silence deletion start point determination value). Tdel determines the point at which deletion of a silent section starts.

[0045] The ring memory storage amount state determination unit 16 works from the storage amount sent by the up/down counter 9. From this amount, it detects when the ring memory 7 has reached the state immediately before overflow, and when it has reached the state immediately before underflow.

[0046] Specifically, the overflow detection data memory 18 stores overflow detection data Tmax, and the underflow detection data memory 19 stores underflow detection data Tmin. Tmax is set, for example, to 21645, which is 200 less than the total number of words (TOTAL) in the ring memory 7, 21845. Tmin is set, for example, to 200.

[0047] When the storage amount sent from the up/down counter 9 becomes equal to or greater than the overflow detection data Tmax, the ring memory storage amount state determination unit 16 outputs a pre-overflow detection signal. When the storage amount becomes equal to or less than the underflow detection data Tmin, the unit outputs a pre-underflow detection signal. While the pre-overflow detection signal is being input, the conditional branching unit 15 judges that the ring memory 7 is in the state immediately before overflow. While the pre-underflow detection signal is being input, it judges that the ring memory 7 is in the state immediately before underflow.
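The following sketch models the ring memory 7. An occupancy count stands in for the up/down counter 9, and two comparisons stand in for the storage amount state determination unit 16. The class and method names are hypothetical. The default sizes are the example values given above: Tmax = 21845 - 200 = 21645 and Tmin = 200. Returning a zero sample when the memory is empty is an assumption.

```python
class RingMemory:
    """FIFO ring buffer with near-overflow / near-underflow flags."""

    def __init__(self, capacity=21845, margin=200):
        self.capacity = capacity        # TOTAL words in the ring memory
        self.data = [0] * capacity
        self.write_pos = 0
        self.read_pos = 0
        self.count = 0                  # plays the role of the up/down counter 9
        self.t_max = capacity - margin  # overflow detection data Tmax
        self.t_min = margin             # underflow detection data Tmin

    def write(self, samples):
        """Write samples at the write clock; the counter counts up."""
        for s in samples:
            if self.count == self.capacity:
                raise OverflowError("ring memory is full")
            self.data[self.write_pos] = s
            self.write_pos = (self.write_pos + 1) % self.capacity
            self.count += 1

    def read(self):
        """Read one sample at the read clock; the counter counts down."""
        if self.count == 0:
            return 0  # ASSUMPTION: emit silence rather than stale data
        s = self.data[self.read_pos]
        self.read_pos = (self.read_pos + 1) % self.capacity
        self.count -= 1
        return s

    def near_overflow(self):
        return self.count >= self.t_max

    def near_underflow(self):
        return self.count <= self.t_min
```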

[0048] The conditional branching unit 15 classifies each situation into one of the following six cases. It bases this on three inputs:
- the voice-section/silent-section discrimination signal from the comparison unit 12,
- the ring memory state detection signals from the ring memory storage amount state determination unit 16, and
- the pause duration Tdel set in the pause duration setting memory 17.

It then controls the multiplexer 20 accordingly, sending the audio signal to the appropriate processing unit.

(1) First case (case 1)
The first case applies when the input signal is in a voice section and the ring memory 7 is not in the state immediately before overflow.

[0049] In this case, the audio signal is sent via the multiplexer 20 to the pitch compression/expansion means 23. This means performs variable speech control (VSC). With n denoting the reproduction speed multiple, it applies time-axis compression/expansion to the input signal at a compression ratio greater than 1/n. Usable methods include the pointer-interval-controlled overlap-and-add method (Pointer Interval Control Overlap and Add: PICOLA) and the TDHS (Time Domain Harmonic Scaling) method. The processed signal is sent to the ring memory 7 via the demultiplexer 27 and written into it in accordance with the write clock.

[0050] During double-speed VTR reproduction, the A/D conversion unit 2 samples at fsAD = 16 kHz and the D/A conversion unit 8 samples at fsDA = 8 kHz. The pitch is therefore restored to its original value at the output.

[0051] In conventional time-axis compression/expansion, the signal is compressed at a ratio of 1/2 during double-speed VTR reproduction. In other words, every two pitch periods are reduced to one. In ordinary double-speed reproduction the output speech therefore runs at twice the standard speaking rate, although its pitch is unchanged.

[0052] In contrast, the pitch compression/expansion means 23 in the speech speed conversion unit 6 of FIG. 2 sets the compression ratio above 1/2. Assume here that the ratio is 2/3, so that every three pitch periods are reduced to two. The output speech then runs at 3/2 times the standard speaking rate, again with the pitch unchanged. Compared with compression at 1/2, compression at 2/3 expands the signal by 2/3 - 1/2 = 1/6. This expanded portion is what accumulates in the ring memory 7.
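The figures above can be checked with exact arithmetic. The function name below is hypothetical. Its parameters are:
- n, the tape speed multiple;
- r, the compression ratio (output length / input length);
- fs_ad and fs_da, the converter sampling rates.

One second of input carries n seconds of programme material and fs_ad samples. Of those, fs_ad * r are written to the ring memory while fs_da are read out.

```python
from fractions import Fraction


def rate_figures(n, r, fs_ad, fs_da):
    """Return (speaking-rate factor at the output, net ring-memory gain per input sample)."""
    r = Fraction(r)
    written = fs_ad * r                        # samples written per second of input
    speed = n / (written / fs_da)              # programme seconds heard per output second
    accumulation = r - Fraction(fs_da, fs_ad)  # samples gained per input sample
    return speed, accumulation


# Double-speed VTR with fsAD = 16 kHz and fsDA = 8 kHz
conventional = rate_figures(2, Fraction(1, 2), 16000, 8000)  # 1/2 compression
proposed = rate_figures(2, Fraction(2, 3), 16000, 8000)      # 2/3 compression
```

For the 2/3 case this gives a speaking rate of 3/2 and a gain of 1/6 of the input samples. That is about 16000/6 ≈ 2667 samples per second of uninterrupted speech. Starting from an empty ring memory, Tmax = 21645 would be reached after roughly 8 seconds. The same function reproduces the English-learning figures (n = 1, r = 3/2, both rates 8 kHz): speed 2/3 and gain 1/2.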

[0053] A method of compressing an input signal at a ratio of 2/3 using PICOLA is briefly described with reference to FIG. 3. First, the pitch period is extracted from the input signal; call it Tp. Waveform A is multiplied by a weight that falls linearly from 1 to 0 (weighting function K1) to create waveform A'. Waveform B is multiplied by a weight that rises from 0 to 1 (weighting function K2) to create waveform B'.

[0054] Waveforms A' and B' are then added together to create a waveform A'*B' of length Tp. The weights keep the signal continuous at the junctions before and after A'*B'. Next, the pointer is advanced by 3Tp, a length determined by the compression ratio, and the same operation is repeated. The three waveforms A, B and C thus yield two waveforms, A'*B' and C, so three pitch periods of signal are compressed into two.
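The 3-to-2 overlap-add step can be sketched as below. This is a simplified illustration, not a full PICOLA implementation. The period is passed in as a fixed number of samples, whereas PICOLA extracts the pitch period Tp from the signal before each step. Passing the leftover tail through unchanged is an assumption. With a constant period, the same routine also matches the fixed-frame method of FIG. 17(a).

```python
def compress_3_to_2(x, period):
    """Compress x to about 2/3 of its length by overlap-add.

    Each group of three `period`-long waveforms A, B, C becomes
    A'*B' (A weighted 1 -> 0 plus B weighted 0 -> 1) followed by C.
    """
    out = []
    step = 3 * period  # pointer advance per group (3Tp)
    pos = 0
    while pos + step <= len(x):
        a = x[pos:pos + period]
        b = x[pos + period:pos + 2 * period]
        c = x[pos + 2 * period:pos + step]
        for i in range(period):
            k1 = 1.0 - i / period  # K1: falls linearly from 1 toward 0
            k2 = i / period        # K2: rises linearly from 0 toward 1
            out.append(k1 * a[i] + k2 * b[i])
        out.extend(c)              # C is copied unchanged
        pos += step
    out.extend(x[pos:])            # ASSUMPTION: a tail shorter than 3 periods passes through
    return out
```

A'*B' starts like A and ends like B, so it joins smoothly to the preceding group and to C. This is the continuity the weights are meant to preserve.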

[0055] As shown in FIGS. 17(a) and 17(b), the pitch compression/expansion means 23 may instead work in units of a fixed frame length Ts, without pitch extraction. Ts is set, for example, to 200 samples of input data. The example of FIG. 17 reduces 3Ts to 2Ts.

[0056] In the method of FIG. 17(a), waveforms A, B and C each have the fixed frame length Ts. Waveform A is multiplied by a weight that falls linearly from 1 to 0 (weighting function K1) to create waveform A". Waveform B is multiplied by a weight that rises from 0 to 1 (weighting function K2) to create waveform B".

[0057] Waveforms A" and B" are then added together to create a waveform A"*B" of length Ts. The weights keep the signal continuous at the junctions before and after A"*B". The next waveform, C, is output unchanged. The three waveforms A, B and C thus yield two waveforms, A"*B" and C, and 3Ts of signal is compressed into 2Ts.

[0058] In the method of FIG. 17(b), waveforms A to C each have the fixed frame length Ts.
- Waveform A: its first samples (for example, 20) are multiplied by a weight rising linearly from 0 to 1 (weighting function K3), giving waveform A".
- Waveform B: its 181st to 200th input samples are multiplied by a weight falling linearly from 1 to 0 (weighting function K4), giving waveform B".
- Waveform C: deleted.

The next three waveforms, D to F, are processed in the same way. A signal of three waveforms A to C (or D to F) is thus compressed into a signal of two waveforms A" and B" (or D" and E"). That is, 3Ts of signal is compressed into 2Ts.
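The FIG. 17(b) variant keeps two of every three fixed-length blocks and ramps only the outer edges of the kept pair. Below is a minimal sketch using the example values Ts = 200 and a 20-sample ramp. The function name is hypothetical. The exact ramp shape beyond "linear" is an assumption.

```python
def compress_3_to_2_drop(x, ts=200, ramp=20):
    """FIG. 17(b)-style compression: keep A and B, drop C, for each 3*ts group."""
    if not 0 < ramp <= ts:
        raise ValueError("ramp must be between 1 and ts")
    out = []
    pos = 0
    while pos + 3 * ts <= len(x):
        a = list(x[pos:pos + ts])
        b = list(x[pos + ts:pos + 2 * ts])
        for i in range(ramp):
            a[i] *= i / ramp                          # K3: 0 -> 1 over the first `ramp` samples of A
            b[ts - ramp + i] *= 1.0 - (i + 1) / ramp  # K4: 1 -> 0 over the last `ramp` samples of B
        out.extend(a)
        out.extend(b)
        pos += 3 * ts                                 # waveform C is deleted
    out.extend(x[pos:])                               # ASSUMPTION: leftover tail passes through
    return out
```

No crossfade is needed between A" and B", because they are adjacent in the original signal. The ramps only soften the cut where C was removed.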

[0059] Compression/expansion in fixed-frame-length units gives lower sound quality than compression/expansion per pitch period, but it requires less processing.

[0060] When this speech speed conversion device is used in an English learning device (1x-speed reproduction), both the A/D conversion unit 2 (fsAD) and the D/A conversion unit 8 (fsDA) sample at 8 kHz. The pitch compression/expansion means 23 then expands the audio signal at a compression ratio of 3/2, so that, for example, two pitch periods become three. Voice sections are thus stretched to 1.5 times their length. Relative to normal 1x-speed reproduction, the signal is expanded by 3/2 - 1 = 1/2, and this expanded portion accumulates in the ring memory 7.

(2) Second case (case 2)
The second case applies when the input signal is in a voice section and the ring memory 7 is in the state immediately before overflow.

[0061] In this case, the audio signal is sent via the multiplexer 20 to the input signal deletion unit 21 and deleted. Specifically, writing to the ring memory 7 stops until the count value of the up/down counter 9 falls to or below the underflow detection data Tmin. In other words, writing resumes only when the ring memory 7 reaches the state immediately before underflow.

[0062] When the ring memory 7 reaches the state immediately before underflow, the mute insertion unit 22 outputs 200 or fewer mute samples (for example, 100 samples of value "0"). These are written into the ring memory 7 via the demultiplexer 27. Writing the mute samples prevents the deletion from producing a click at the joint in the voice signal.

(3) Third case (case 3)
The third case applies when all three of the following hold: the input signal is in a silent section, the silent section is shorter than the set pause duration Tdel, and the ring memory 7 is not in the state immediately before overflow.

[0063] In this case, the same processing as in the first case is performed, except that a compression ratio of 1/n (n being the reproduction speed multiple) may also be used. That is, in the third case, compression/expansion is performed at a compression ratio of 1/n or more.

(4) Fourth case (case 4)
The fourth case applies when all three of the following hold: the input signal is in a silent section, the silent section is shorter than the set pause duration Tdel, and the ring memory 7 is in the state immediately before overflow.

[0064] In this case, the same processing as in the second case is performed.

(5) Fifth case (case 5)
The fifth case applies when all three of the following hold: the input signal is in a silent section, the silent section is equal to or longer than the set pause duration Tdel, and the ring memory 7 is not in the state immediately before underflow.

[0065] In this case, the audio signal is sent via the multiplexer 20 to the input signal deletion unit 25 and deleted; specifically, writing to the ring memory 7 stops. The waveform synthesis insertion unit 26 then performs waveform synthesis insertion. This serves two purposes: it keeps the start portion of a voice section (an unvoiced segment) from being lost, and it keeps the deletion from producing a click at the joint.

[0066] The waveform synthesis insertion performed by the waveform synthesis insertion unit 26 is described with reference to FIGS. 4(a) and 4(b). In the method of FIG. 4(a), the unit has a first memory 31 and a second memory 32. When the input signal deletion unit 25 starts deleting, the input signal from the deletion start point is stored in the first memory 31 in address order. The stored length is a predetermined length Ts of one frame or less, for example one frame. The content A of the first memory 31 is then multiplied by a function K1, which falls linearly from 1 to 0 as the address increases. The product A' is written back into the first memory 31.

[0067] Likewise, the input signal of length Ts immediately before the end point of the deleted section is stored in the second memory 32 in address order. The content B of the second memory 32 is multiplied by a function K2, which rises linearly from 0 to 1 as the address increases. The product B' is written back into the second memory 32. The contents A' and B' of the two memories are then added together, giving data A'*B' of length Ts. This data is sent via the demultiplexer 27 to the ring memory 7 and written there.

[0068] In the method of FIG. 4(b), the input signal from the deletion start point is again stored in the first memory 31 in address order, for a length Ts of one frame or less (for example, one frame). The content A of the first memory 31 is multiplied by a function K3, which has a slope falling linearly from 1 to 0 at its rear end. The product A' is written back into the first memory 31.

[0069] Likewise, the input signal of length Ts immediately before the end point of the deleted section is stored in the second memory 32 in address order. The content B of the second memory 32 is multiplied by a function K4, which has a slope rising linearly from 0 to 1 at its front end. The product B' is written back into the second memory 32. The contents A' and B' of the two memories are then joined end to end, giving data A'+B' of length 2Ts. This data is sent via the demultiplexer 27 to the ring memory 7 and written there. FIG. 4(b) shows Ts as one frame long, but Ts may also be half a frame long.
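Both splicing methods can be sketched on a buffered copy of the deleted run. In the device, the first Ts samples are captured at the deletion start point and the last Ts samples just before the end point. For simplicity, the sketch assumes the whole deleted run is available as a list. The function names and the ramp length for K3/K4 in FIG. 4(b) are hypothetical.

```python
def splice_crossfade(deleted, ts):
    """FIG. 4(a): a single Ts-long block A'*B' stands in for the deleted run."""
    if len(deleted) < ts:
        raise ValueError("deleted run is shorter than Ts")
    a = deleted[:ts]    # first Ts samples from the deletion start point (memory 31)
    b = deleted[-ts:]   # last Ts samples before the deletion end point (memory 32)
    # K1 falls 1 -> 0 and K2 rises 0 -> 1 with address; the products are summed.
    return [(1.0 - i / ts) * a[i] + (i / ts) * b[i] for i in range(ts)]


def splice_concat(deleted, ts, ramp):
    """FIG. 4(b): A' (ramped down at its rear end) followed by B' (ramped up at its front end)."""
    if len(deleted) < ts or not 0 < ramp <= ts:
        raise ValueError("need len(deleted) >= ts and 0 < ramp <= ts")
    a = list(deleted[:ts])
    b = list(deleted[-ts:])
    for i in range(ramp):
        a[ts - ramp + i] *= 1.0 - (i + 1) / ramp  # K3: slope 1 -> 0 at the rear end
        b[i] *= i / ramp                          # K4: slope 0 -> 1 at the front end
    return a + b                                  # 2*Ts samples, A' then B'
```

The FIG. 4(a) form writes Ts samples per deletion; the FIG. 4(b) form writes 2Ts. The 4(b) form preserves more of the unvoiced onset held in B, at the cost of a longer retained segment.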

[0070] While the input signal deletion unit 25 is repeatedly deleting the audio signal of a silent section, the ring memory 7 may reach the state immediately before underflow. In that case, the input signal of length Ts from that moment is stored in the second memory 32. The same waveform synthesis insertion as above is then performed on the data in the first memory 31 and the second memory 32.

(6) Sixth case (case 6)
The sixth case applies when all three of the following hold: the input signal is in a silent section, the silent section is equal to or longer than the set pause duration Tdel, and the ring memory 7 is in the state immediately before underflow.

[0071] In this case, the input signal is sent via the multiplexer 20 to the thinning processing unit 24. With n the reproduction speed multiple of the VTR, the thinning processing unit 24 thins the signal to a compression ratio of 1/n. For example:
- double-speed reproduction: ratio 1/2;
- triple-speed reproduction: ratio 1/3;
- 1x-speed reproduction: the input signal is output unchanged.
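The six-way branch performed by the conditional branching unit 15 reduces to three yes/no inputs plus the running length of the current silent section. The sketch below (names hypothetical) returns the case number. A small table gives a one-line summary of the processing the text assigns to each case. The silent-run length is counted in frames; the text elsewhere gives Tdel = 4 frames as an example.

```python
CASE_ACTIONS = {
    1: "pitch compression/expansion means 23, ratio above 1/n",
    2: "input signal deletion unit 21 until near-underflow, then mute samples",
    3: "as case 1 (ratio of 1/n or more)",
    4: "as case 2",
    5: "input signal deletion unit 25 with waveform synthesis insertion unit 26",
    6: "1/n thinning processing unit 24",
}


def select_case(is_voice, silent_frames, t_del, near_overflow, near_underflow):
    """Return the case number (1-6) chosen by the conditional branching unit."""
    if is_voice:
        return 2 if near_overflow else 1
    if silent_frames < t_del:           # short pause: handled like speech (cases 3, 4)
        return 4 if near_overflow else 3
    return 6 if near_underflow else 5   # long pause: delete, or thin if memory is low
```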

[0072] The 1/n thinning processing unit 24 uses methods such as the following. Double-speed reproduction is taken as the example.

[0073] One method uses the PICOLA or TDHS time-axis compression described above. The pitch of the input signal is extracted, and the pitch-period data are thinned out to a compression ratio of 1/2.

[0074] Alternatively, as shown in FIGS. 5(a) to 5(c), the waveform may be thinned out at every predetermined time Ts, without pitch extraction.

[0075] In the method of FIG. 5(a), waveforms B and D are removed from waveforms A to D, leaving a signal composed of waveforms A and C.

[0076] In the method of FIG. 5(b), waveforms B and D are likewise removed from waveforms A to D. In addition, waveform A is multiplied by a function with two slopes: one rising from 0 to 1 at its front end (function K4) and one falling from 1 to 0 at its rear end (function K3). This gives waveform A'. Waveform C is multiplied by the same function to give waveform C'. The four waveforms A to D are thus compressed into the two waveforms A' and C'.

[0077] In the method of FIG. 5(c), waveform A is multiplied by a weight falling linearly from 1 to 0 (weighting function K1) to create waveform A'. Waveform B is multiplied by a weight rising from 0 to 1 (weighting function K2) to create waveform B'. Waveforms A' and B' are then added together to create a waveform A'*B' of length Ts.

[0078] Similarly, waveform C is multiplied by a weight falling linearly from 1 to 0 (function K1) to create waveform C'. Waveform D is multiplied by a weight rising from 0 to 1 (function K2) to create waveform D'. Waveforms C' and D' are then added together to create a waveform C'*D' of length Ts. The four waveforms A to D are thus compressed into the two waveforms A'*B' and C'*D'.
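The three block-based thinning variants of FIGS. 5(a) to 5(c) for double-speed reproduction can be sketched as follows. The helper names are hypothetical. A trailing partial block is simply ignored. The ramp length used for K3/K4 in FIG. 5(b) is an assumed parameter.

```python
def _blocks(x, ts):
    """Split x into whole ts-long blocks; a trailing partial block is ignored."""
    return [list(x[i:i + ts]) for i in range(0, len(x) - ts + 1, ts)]


def thin_drop(x, ts):
    """FIG. 5(a): keep A, C, ...; drop B, D, ..."""
    return [s for blk in _blocks(x, ts)[::2] for s in blk]


def thin_drop_ramped(x, ts, ramp):
    """FIG. 5(b): as 5(a), but each kept block gets a rising front and falling rear slope."""
    if not 0 < 2 * ramp <= ts:
        raise ValueError("need 0 < 2 * ramp <= ts")
    out = []
    for blk in _blocks(x, ts)[::2]:
        for i in range(ramp):
            blk[i] *= i / ramp                          # K4: 0 -> 1 at the front end
            blk[ts - ramp + i] *= 1.0 - (i + 1) / ramp  # K3: 1 -> 0 at the rear end
        out.extend(blk)
    return out


def thin_crossfade(x, ts):
    """FIG. 5(c): each pair (A, B) becomes one Ts-long block A'*B'."""
    blocks = _blocks(x, ts)
    out = []
    for j in range(0, len(blocks) - 1, 2):
        a, b = blocks[j], blocks[j + 1]
        out.extend((1.0 - i / ts) * a[i] + (i / ts) * b[i] for i in range(ts))
    return out
```

The variants trade processing against smoothness:
- FIG. 5(a) is the cheapest but leaves hard cuts.
- FIG. 5(b) softens each cut with ramps.
- FIG. 5(c) keeps every junction continuous: A'*B' ends like B, and the next block starts like C, which follows B in the original signal.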

[0079] As described above, in the sixth case the thinning is performed at a compression ratio of 1/n, where n is the reproduction speed multiple of the VTR. The compression ratio may, however, be controlled as follows.

[0080] Suppose thinning is performed at a compression ratio of 1/n. If the ratio fsDA/fsAD of the D/A converter 8 sampling frequency to the A/D converter 2 sampling frequency equals 1/n, the storage amount of the ring memory 7 does not change. However, fsDA/fsAD may not equal 1/n exactly. This can happen because of the computational precision of the ratio 1/n or the clock accuracy of fsAD and fsDA.

[0081] Write fsDA/fsAD = 1/a (a > 0). When fsDA/fsAD becomes larger than 1/n (fsDA/fsAD > 1/n), the compression ratio is in effect too small by {(1/a) - (1/n)}. The signal is thinned too heavily, the storage amount of the ring memory 7 decreases, and the ring memory 7 may underflow.

[0082] Conversely, when fsDA/fsAD becomes smaller than 1/n (fsDA/fsAD < 1/n), again with fsDA/fsAD = 1/a (a > 0), the compression ratio is in effect too large by {(1/n) - (1/a)}. The signal is thinned too lightly, and the storage amount of the ring memory 7 increases.

[0083] Therefore, when thinning is performed, the storage amount of the ring memory 7 is checked and the compression ratio is controlled as follows. With fsDA/fsAD = 1/a (a > 0), a value α is chosen that satisfies (1/n) - α < 1/a < (1/n) + α. Here α lies between 0 and 1, for example in the range 0.001 to 0.1.

[0084] When fsDA/fsAD becomes larger than 1/n, that is, when the storage amount of the ring memory 7 is decreasing, the compression ratio is changed from 1/n to {(1/n) + α}. Raising the compression ratio lets the storage amount of the ring memory 7 recover.

[0085] When fsDA/fsAD becomes smaller than 1/n, that is, when the storage amount of the ring memory 7 is increasing, the compression ratio is changed from 1/n to {(1/n) - α}. Lowering the compression ratio reduces the storage amount of the ring memory 7.

[0086] Above, the compression ratio is changed according to the storage amount of the ring memory 7. Alternatively, during thinning the compression ratio may simply alternate between {(1/n) - α} and {(1/n) + α} from frame to frame.
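The two ways of nudging the thinning ratio around 1/n can be sketched with exact fractions. The function names are hypothetical. α = 1/100 in the test is one value inside the 0.001 to 0.1 range suggested above. Judging "decreasing" and "increasing" by comparing two successive occupancy readings is an assumption.

```python
from fractions import Fraction


def alpha_covers_mismatch(n, alpha, fs_da, fs_ad):
    """Check (1/n) - alpha < fsDA/fsAD < (1/n) + alpha."""
    return abs(Fraction(fs_da, fs_ad) - Fraction(1, n)) < alpha


def ratio_from_trend(n, alpha, previous_count, current_count):
    """Raise the ratio while the ring memory drains, lower it while it fills."""
    base = Fraction(1, n)
    if current_count < previous_count:   # storage decreasing: thin less
        return base + alpha
    if current_count > previous_count:   # storage increasing: thin more
        return base - alpha
    return base


def ratio_alternating(n, alpha, frame_index):
    """Alternate between (1/n) - alpha and (1/n) + alpha frame by frame."""
    base = Fraction(1, n)
    return base - alpha if frame_index % 2 == 0 else base + alpha
```

Take a hypothetical fsAD that has drifted to 16.008 kHz against fsDA = 8 kHz. Then fsDA/fsAD differs from 1/2 by 1/4002, so α = 1/100 satisfies the condition while α = 1/10000 does not. With the alternating scheme, every pair of frames averages exactly to 1/n.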

[0087] FIGS. 6 and 7 show the processing procedure of the speech speed conversion unit 6.

[0088] The processing performed by the speech speed conversion unit 6 during double-speed VTR reproduction is described below.

(1) Processing at the start of reproduction
When reproduction starts, the power calculation unit 11 calculates the average power value P of the first frame (step 1). The output of the comparison unit 12 then shows whether P is equal to or greater than the threshold value Th (step 2).

[0089] When the input voice signal begins with a silent section, the average power value P of the first frame is smaller than the threshold value Th, and the process proceeds to step 11. The duration of the silent section (the number of consecutive silent frames) is then calculated. Step 12 determines whether this duration is equal to or longer than the pause duration Tdel set in the pause duration setting memory 17. Tdel is set, for example, to four frames.

[0090] For the first frame, the silent section is shorter than Tdel. The output of the ring memory storage amount state determination unit 16 is therefore used to determine whether the ring memory 7 is in the state immediately before underflow (steps 13 and 14).

[0091] For the first frame, the ring memory 7 is in the state immediately before underflow. The thinning processing unit 24 therefore thins the frame data at a compression ratio of 1/2 (step 28), and the thinned data are written into the ring memory 7. The process then returns to step 1.

(2) Processing for the first case
When step 2 finds that the average power value P is equal to or greater than the threshold value Th, the current frame is judged to be a voice section, and the process proceeds to step 3. Step 3 uses the state of the first flag F1 to determine whether the previous frame was a deletion section.
- If it was not, the output of the ring memory storage amount state determination unit 16 is used to determine whether the ring memory 7 is in the state immediately before overflow (steps 6 and 7).
- If it was, steps 4 and 5 are executed first, and then the same overflow check is made (steps 6 and 7).

Steps 4 and 5 are described later.

[0092]
When it is determined in step 7 that ring memory 7 is not in the state immediately before overflow, the first case applies: pitch compression/expansion means 23 compresses the current frame data on the time axis at a compression ratio of 2/3 (step 8). The compressed data is sent to and written into ring memory 7. The process then returns to step 1.

(2) Description of the processing in the second case

When it is determined in step 2 that the average power value P is equal to or greater than the threshold Th, the frame just received is judged to be a voice section and the process proceeds to step 3. In step 3, whether the previous frame was a deletion section is determined from the state of the first flag F1. If it was not, whether ring memory 7 is in the state immediately before overflow is determined from the output of ring memory storage amount state determination unit 16 (steps 6 and 7). If it was, steps 4 and 5 are performed first, and the same overflow check then follows (steps 6 and 7). Steps 4 and 5 are described later.
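
The time-axis compression to 2/3 in step 8 preserves pitch. One common way to achieve this, shown here only as an illustration and not as the method actually used by pitch compression/expansion means 23, is to replace every three pitch periods with two, where the second output period is a linear crossfade of input periods 2 and 3. The sketch assumes the pitch period is already known in samples; a real device would have to estimate it frame by frame.

```python
def compress_two_thirds(x: list[float], period: int) -> list[float]:
    """Shorten x to about 2/3 of its length without changing its pitch.

    Assumptions: `period` (in samples) is known; each run of three periods
    becomes two; leftover samples shorter than three periods pass through.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    group = 3 * period
    n_full = len(x) // group
    out: list[float] = []
    for g in range(n_full):
        base = g * group
        p2 = x[base + period:base + 2 * period]
        p3 = x[base + 2 * period:base + group]
        out.extend(x[base:base + period])  # first period kept unchanged
        for i in range(period):
            # Fade weight rises from 0 to 1 so the merged period starts like
            # period 2 and ends like period 3.
            w = i / (period - 1) if period > 1 else 1.0
            out.append((1.0 - w) * p2[i] + w * p3[i])
    out.extend(x[n_full * group:])  # tail shorter than three periods
    return out
```

Because the merged period begins with the first sample of period 2 and ends with the last sample of period 3, both of its edges join the neighboring input samples without a jump.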

[0093]
When it is determined in step 7 that ring memory 7 is in the state immediately before overflow, the second case applies: input signal deletion unit 21 deletes the input signal until ring memory storage amount state determination unit 16 outputs the underflow detection signal (step 9). In other words, writing to ring memory 7 is stopped until ring memory 7 reaches the state immediately before underflow.

[0094]
When ring memory 7 reaches the state immediately before underflow, mute insertion unit 22 writes a predetermined number (200 or fewer) of mute samples of value "0" into ring memory 7 (step 10). The process then returns to step 1.
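
Steps 9 and 10 act like a hysteresis on the fill level of ring memory 7. The sketch below models the ring memory as a FIFO; the capacity, the near-overflow level `t_max`, and the class names are hypothetical, and the pitch compression that would normally precede each write is left out.

```python
from collections import deque


class RingMemory:
    """FIFO stand-in for ring memory 7 (hypothetical capacity and levels)."""

    def __init__(self, capacity: int, t_min: int, t_max: int):
        self.buf: deque = deque()
        self.capacity = capacity
        self.t_min = t_min  # underflow detection data Tmin
        self.t_max = t_max  # assumed "state immediately before overflow" level

    def level(self) -> int:
        return len(self.buf)

    def write(self, samples) -> None:
        if self.level() + len(samples) > self.capacity:
            raise OverflowError("ring memory full")
        self.buf.extend(samples)

    def read(self, n: int) -> list:
        return [self.buf.popleft() for _ in range(min(n, self.level()))]


class OverflowGuard:
    """Steps 9-10: drop input from near-overflow until near-underflow, then write mutes."""

    def __init__(self, mute_len: int = 200):
        self.mute_len = mute_len  # "200 or fewer" mute samples
        self.deleting = False

    def write_frame(self, mem: RingMemory, frame) -> None:
        if not self.deleting and mem.level() >= mem.t_max:
            self.deleting = True  # step 7 YES: enter step 9
        if self.deleting:
            if mem.level() <= mem.t_min:  # underflow detection signal
                mem.write([0.0] * self.mute_len)  # step 10
                self.deleting = False
            return  # the current frame is not written
        mem.write(frame)
```

While `deleting` is set, the read side keeps draining the buffer, so the level falls from `t_max` to `t_min` without any new data being written.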

[0095]
Instead of the processing of step 10, the processing shown in FIG. 9(a) or FIG. 9(b) may be performed. In the method of FIG. 9(a), a weight that decreases linearly from 1 to 0 (weighting function K1) is applied to waveform A, consisting of, for example, the 200 input samples that follow the point at which the state immediately before overflow was detected in step 7, to obtain waveform A'. A weight that increases from 0 to 1 (weighting function K2) is applied to waveform B, consisting of the 200 input samples that end just before the state immediately before underflow is reached, to obtain waveform B'.

[0096]
The two waveforms A' and B' are then added to form a 200-sample waveform A'*B', and these 200 samples are written into ring memory 7. The point 200 samples before the state immediately before underflow is detected from the count value of up/down counter 9. This effectively prevents a click from occurring at the junction between the audio signals before and after the deleted section.
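
The FIG. 9(a) splice is a linear crossfade. A minimal sketch, assuming the ramps K1 and K2 reach exactly 1 and 0 at the segment ends (the function name is hypothetical):

```python
def crossfade_add(a: list[float], b: list[float]) -> list[float]:
    """Return A'*B': A weighted by K1 (1 -> 0) plus B weighted by K2 (0 -> 1)."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("segments must have equal length >= 2")
    out: list[float] = []
    for i in range(n):
        k2 = i / (n - 1)  # weighting function K2: 0 -> 1
        k1 = 1.0 - k2     # weighting function K1: 1 -> 0
        out.append(k1 * a[i] + k2 * b[i])
    return out
```

Since K1 + K2 = 1 at every sample, a signal that is identical in A and B passes through unchanged, while differing signals blend smoothly instead of jumping.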

[0097]
In the method of FIG. 9(b), a weight that decreases linearly from 1 to 0 (weighting function K1) is applied to waveform A, consisting of, for example, the 100 input samples that follow the point at which the state immediately before overflow was detected in step 7, to obtain waveform A'. A weight that increases from 0 to 1 (weighting function K2) is applied to waveform B, consisting of the 100 input samples that end just before the state immediately before underflow, to obtain waveform B'. The 200 samples obtained by concatenating waveforms A' and B' are then written into ring memory 7.
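
The FIG. 9(b) variant places A' and B' one after the other instead of adding them. A minimal sketch, assuming linear ramps that reach exactly 1 and 0 at the segment ends (the function name is hypothetical):

```python
def fade_out_fade_in(a: list[float], b: list[float]) -> list[float]:
    """Return A' followed by B': A faded out with K1, B faded in with K2."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("segments must have length >= 2")
    na, nb = len(a), len(b)
    a_out = [a[i] * (1.0 - i / (na - 1)) for i in range(na)]  # K1: 1 -> 0
    b_out = [b[j] * (j / (nb - 1)) for j in range(nb)]        # K2: 0 -> 1
    return a_out + b_out
```

With these ramps the output falls to zero at the junction between A' and B', so the splice behaves like a brief fade to silence rather than a blend of the two waveforms.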

[0098]
In step 9 above, when the state immediately before overflow is detected, input signal deletion unit 21 deletes the input signal until ring memory storage amount state determination unit 16 outputs the underflow detection signal. Alternatively, data already stored in ring memory 7 may be deleted so that ring memory 7 is brought to the state immediately before underflow.

[0099]
Specifically, the write start address of ring memory 7 jumps from the address at the state immediately before overflow (point C), shown in FIG. 18(a), back to the address at which ring memory 7 is in the state immediately before underflow (point A), shown in FIG. 18(b). In step 9, the data stored at the addresses from point A to point C is therefore discarded. After this, as shown in FIG. 18(c), the mute signal is written in step 10, and input data is then written after it.

[0100]
When, as described above, the data stored in ring memory 7 is deleted in step 9 so that ring memory 7 is brought to the state immediately before underflow, the processing shown in FIG. 19(a) or FIG. 19(b) may be performed instead of writing the mute signal into ring memory 7 in step 10.

[0101]
Suppose that the write start address of ring memory 7 has jumped from the address at the state immediately before overflow (point C, FIG. 18(a)) to the address at which ring memory 7 is in the state immediately before underflow (point A, FIG. 18(b)). As shown in FIG. 19(a), a weight that decreases linearly from 1 to 0 (weighting function K1) is applied to the data S stored from point A up to a predetermined number of addresses ahead, for example 200 (point B in FIG. 19(a)), to obtain waveform S'. A weight that increases from 0 to 1 (weighting function K2) is applied to the next 200 samples of input data written into ring memory 7 (waveform T) to obtain waveform T'.

[0102]
The two waveforms S' and T' are then added to form a 200-sample waveform S'*T', and these 200 samples are written into ring memory 7 starting at point A. This effectively prevents a click from occurring at the junction between the audio signals before and after the section of deleted stored data.
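
The FIG. 19(a) splice crossfades data already held in the ring with new input, writing from point A and wrapping around the end of the buffer. A minimal sketch, assuming a Python list stands in for ring memory 7 and that the linear ramps reach exactly 0 and 1 (the function name is hypothetical):

```python
def splice_into_ring(ring: list[float], addr_a: int, t: list[float]) -> int:
    """Overwrite len(t) ring cells from addr_a with S'*T'.

    S is the data already stored from point A onward (faded out with K1),
    T is the new input (faded in with K2). Indices wrap modulo len(ring).
    Returns the next write start address.
    """
    n, size = len(t), len(ring)
    if n < 2 or n > size:
        raise ValueError("need 2 <= len(t) <= len(ring)")
    for i in range(n):
        k2 = i / (n - 1)  # K2: 0 -> 1, and K1 = 1 - K2
        idx = (addr_a + i) % size
        ring[idx] = (1.0 - k2) * ring[idx] + k2 * t[i]
    return (addr_a + n) % size
```

Because S is read from the same cells that receive S'*T', the splice is done in place without a separate copy of S.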

[0103]
In the method of FIG. 19(b), a weight that decreases linearly from 1 to 0 (weighting function K1) is applied to the data S stored from point A in FIG. 18(b) up to a predetermined number of addresses ahead, for example 100 (point B in FIG. 19(b)), to obtain waveform S'. A weight that increases from 0 to 1 (weighting function K2) is applied to the next 100 samples of input data written into ring memory 7 (waveform T) to obtain waveform T'. The 200 samples obtained by concatenating waveforms S' and T' are then written into ring memory 7 starting at point A.

(3) Description of the processing in the third case

When it is determined in step 2 that the average power value P is smaller than the threshold Th, the duration of the silent section up to the current frame is calculated (step 11), and it is determined whether this duration is equal to or longer than the pause duration Tdel set in pause duration memory 17 (step 12). When the duration of the silent section is shorter than Tdel, whether ring memory 7 is in the state immediately before underflow is determined from the output of ring memory storage amount state determination unit 16 (steps 13 and 14).

[0104]
When ring memory 7 is not in the state immediately before underflow, whether it is in the state immediately before overflow is determined from the output of ring memory storage amount state determination unit 16 (steps 6 and 7). If it is not, the third case applies: pitch compression/expansion means 23 compresses the current frame data on the time axis at a compression ratio of 2/3 (step 8). The compressed data is sent to and written into ring memory 7. The process then returns to step 1.

(4) Description of the processing in the fourth case

When it is determined in step 2 that the average power value P is smaller than the threshold Th, the duration of the silent section up to the current frame is calculated (step 11), and it is determined whether this duration is equal to or longer than the pause duration Tdel set in pause duration memory 17 (step 12). When the duration of the silent section is shorter than Tdel, whether ring memory 7 is in the state immediately before underflow is determined from the output of ring memory storage amount state determination unit 16 (steps 13 and 14).

[0105]
When ring memory 7 is not in the state immediately before underflow, whether it is in the state immediately before overflow is determined from the output of ring memory storage amount state determination unit 16 (steps 6 and 7). If it is, the fourth case applies: input signal deletion unit 21 deletes the input signal until ring memory storage amount state determination unit 16 outputs the underflow detection signal (step 9). In other words, writing to ring memory 7 is suspended until ring memory 7 reaches the state immediately before underflow.

[0106]
When ring memory 7 reaches the state immediately before underflow, mute insertion unit 22 writes a predetermined number (200 or fewer) of mute samples of value "0" into ring memory 7 (step 10). The process then returns to step 1.

(5) Description of the processing in the fifth case

When it is determined in step 2 that the average power value P is smaller than the threshold Th, the duration of the silent section up to the current frame is calculated (step 11), and it is determined whether this duration is equal to or longer than the pause duration Tdel set in pause duration memory 17 (step 12). When the duration of the silent section is equal to or longer than Tdel, whether ring memory 7 is in the state immediately before underflow is determined from the output of ring memory storage amount state determination unit 16 (steps 15 and 16).

[0107]
When ring memory 7 is not in the state immediately before underflow, the fifth case applies, and the first flag F1, which indicates that the current frame belongs to a section deleted by input signal deletion unit 25, is set (step 17). F1 is reset (F1 = 0) during initialization at power-on. It is then determined whether the second flag F2, which indicates whether the current frame is the first frame of a deletion section of input signal deletion unit 25, is reset (step 18).

[0108]
The second flag F2 is reset (F2 = 0) during initialization at power-on. It is set (F2 = 1) when processing of the first frame of a deletion section by input signal deletion unit 25 is completed, and it is reset (F2 = 0) again when processing of the whole series of deletion-section frames by input signal deletion unit 25 is completed.

[0109]
Therefore, when the current frame is the first frame of a deletion section of input signal deletion unit 25, F2 is reset (F2 = 0). In that case, waveform synthesis insertion unit 26 stores the current frame data in first memory 31 (step 19), and input signal deletion unit 25 stops the current frame data from being written into ring memory 7 (step 20); that is, the current frame data is deleted. After F2 is set (F2 = 1) (step 21), the process returns to step 1.

[0110]
If the silent section continues, the process passes through steps 2, 11, 12, and 15 to step 16, where whether ring memory 7 is in the state immediately before underflow is determined from the output of ring memory storage amount state determination unit 16.

[0111]
When ring memory 7 is not in the state immediately before underflow, F1, which indicates that the current frame belongs to a section deleted by input signal deletion unit 25, is set (step 17). It is then determined whether F2, which indicates whether the current frame is the first frame of a deletion section of input signal deletion unit 25, is reset (step 18).

[0112]
In this case F2 is set (F2 = 1), so the current frame is judged not to be the first frame of the deletion section. Waveform synthesis insertion unit 26 therefore stores the current frame data in second memory 32 (step 22), and input signal deletion unit 25 stops the current frame data from being written into ring memory 7 (step 23). The process then returns to step 1.

[0113]
While the silent section continues and ring memory 7 has not reached the state immediately before underflow, steps 2, 11, 12, 15, 16, 17, 18, 22, and 23 are repeated. In other words, the frame data held in second memory 32 is overwritten with each new frame, while writing of frame data into ring memory 7 remains stopped.

[0114]
When frame data of a voice section is subsequently input, the average power value P is equal to or greater than Th in step 2, so whether the previous frame belonged to a section deleted by input signal deletion unit 25 is determined from the state of F1 (step 3). Here F1 is set (F1 = 1), so the previous frame is judged to have been a deletion section and the process proceeds to step 4. In step 4, the deletion processing by input signal deletion unit 25 is stopped, and waveform synthesis insertion unit 26 performs the waveform synthesis insertion processing.

[0115]
That is, as already described with reference to FIG. 4(a), the contents of first memory 31 are multiplied by a function that changes linearly from 1 to 0, the contents of second memory 32 are multiplied by a function that changes linearly from 0 to 1, and the two products are added. The sum (corresponding to A'*B' in FIG. 4(a)) is sent through demultiplexer 27 to ring memory 7 and written there.

[0116]
After this, F1 and F2 are reset (F1 = F2 = 0) (step 5), and the process proceeds to step 6.

[0117]
While the deletion processing by input signal deletion unit 25 is being repeated over a continuing silent section as described above, ring memory 7 may reach the state immediately before underflow. In that case the answer in step 16 is YES and the process moves to step 24, where whether the previous frame belonged to a section deleted by input signal deletion unit 25 is determined from the state of F1.

[0118]
Here F1 is set (F1 = 1), so the process proceeds to step 25, where the current frame data is stored in second memory 32. The deletion processing by input signal deletion unit 25 is then stopped, and waveform synthesis insertion unit 26 performs the waveform synthesis insertion processing (step 26). After F1 and F2 are reset (F1 = F2 = 0) (step 27), the process returns to step 1.
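
The flag handling in steps 17 to 27 can be summarized as a small state object. This is a hypothetical model: frames are Python lists, a linear crossfade stands in for waveform synthesis insertion unit 26, and the treatment of a deletion section that contains only one frame (reusing first memory 31) is an assumption the text does not cover.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PauseDeletion:
    """Sketch of flags F1/F2 with first memory 31 and second memory 32."""

    f1: bool = False  # frame belongs to a deletion section of unit 25
    f2: bool = False  # first frame of the section has already been stored
    mem1: list = field(default_factory=list)  # first memory 31
    mem2: list = field(default_factory=list)  # second memory 32

    def delete(self, frame: list) -> list:
        """Steps 17-23: remember the frame and write nothing to ring memory 7."""
        self.f1 = True  # step 17
        if not self.f2:  # step 18: first frame of the section?
            self.mem1 = list(frame)  # step 19 (step 20: write suppressed)
            self.f2 = True  # step 21
        else:
            self.mem2 = list(frame)  # step 22 (step 23: write suppressed)
        return []

    def finish(self, frame: Optional[list] = None) -> list:
        """Steps 4-5 (voice resumes) or 25-27 (ring memory near underflow).

        Passing `frame` models step 25, which stores the current frame in
        second memory 32 before the synthesis.
        """
        if frame is not None:
            self.mem2 = list(frame)  # step 25
        second = self.mem2 if self.mem2 else self.mem1  # assumption for one-frame sections
        n = min(len(self.mem1), len(second))
        out = []
        for i in range(n):
            k2 = i / (n - 1) if n > 1 else 1.0  # 0 -> 1; memory 31 gets 1 - k2
            out.append((1.0 - k2) * self.mem1[i] + k2 * second[i])  # step 4 / step 26
        self.f1 = self.f2 = False  # step 5 / step 27
        self.mem1, self.mem2 = [], []  # sketch only: clear the stored frames
        return out  # data to be written into ring memory 7
```

Calling `finish()` without an argument corresponds to the path through steps 4 and 5, and `finish(frame)` to the path through steps 25 to 27.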

[0119]
The waveform synthesis insertion processing performed by waveform synthesis insertion unit 26 in step 26 is almost the same as that described for step 4. The difference is that the frame data stored in second memory 32 is frame data obtained after ring memory 7 reached the state immediately before underflow.

[0120]
Alternatively, step 25 may be omitted so that, when the answer in step 24 is YES, the process moves to step 26 without storing the current frame data in second memory 32. In this case, the waveform synthesis insertion processing in step 26 uses, as in step 4, the frame data stored in second memory 32 before the state immediately before underflow was reached (that is, the previous frame data).

[0121]
As a further alternative, step 22 may be omitted and a step that stores the frame data in second memory 32 may be added between steps 3 and 4. In this case, the waveform synthesis insertion processing in step 4 is based on the contents recorded in first memory 31 in step 19 and the contents recorded in second memory 32 in the step added between steps 3 and 4.

(6) Description of the processing in the sixth case

When it is determined in step 2 that the average power value P is smaller than the threshold Th, the duration of the silent section up to the current frame is calculated (step 11), and it is determined whether this duration is equal to or longer than the pause duration Tdel set in pause duration memory 17 (step 12). When the duration of the silent section is equal to or longer than Tdel, whether ring memory 7 is in the state immediately before underflow is determined from the output of ring memory storage amount state determination unit 16 (steps 15 and 16).

[0122]
When ring memory 7 is in the state immediately before underflow, whether the previous frame belonged to a section deleted by input signal deletion unit 25 is determined from the state of F1 (step 24). If F1 is reset (F1 = 0), that is, if the previous frame was not a deletion section, the sixth case applies and the process moves to step 28. In step 28, thinning processing unit 24 thins out the current frame data at a compression ratio of 1/2, and the thinned data is sent to and written into ring memory 7. The process then returns to step 1.

[0123]
In other words, even when the duration of the silent section is equal to or longer than Tdel, if ring memory 7 is in the state immediately before underflow and the previous frame was not a section deleted by input signal deletion unit 25, the frame data is not deleted; it is thinned out at a compression ratio of 1/2 and then written into ring memory 7.

[0124]
In FIG. 7, step 12 determines whether the duration of the silent section is longer than the set pause duration Tdel. Alternatively, as shown in step 12A of FIG. 8, it may be determined whether the duration T of the silent section is shorter than a set first reference length T1 (T < T1), is at least T1 but shorter than a set second reference length T2, where T1 < T2 (T1 ≤ T < T2), or is at least T2 (T ≥ T2). For example, a length of 4 frames may be set as the first reference length and a length of 40 frames as the second reference length.

[0125]
As shown in FIG. 8, the process may then branch according to the result. When T < T1, the process proceeds to step 13. When T1 ≤ T < T2, the process proceeds to step 28, where the frame is thinned out by 1/n thinning processing. When T ≥ T2, the process proceeds to step 15.

[0126]
FIG. 10 shows the relationship between the input signal and the output signal during double-speed reproduction, and in particular how the input signal in a silent section is deleted. FIGS. 11 and 12 show the write start point and the read start point of ring memory 7, and the state of ring memory 7 at each of points A to H in FIG. 10.

[0127]
In FIG. 10, at the start of double-speed reproduction the input signal is in a silent section and ring memory 7 is empty (see FIG. 11(a)), so thinning processing unit 24 thins out the frame data at a compression ratio of 1/2 before it is written into ring memory 7.

[0128]
When the storage amount Tm of ring memory 7 reaches the underflow detection data Tmin, reading of data from ring memory 7 begins (see FIG. 11(b)).

[0129]
When frame data for voice section a of the input signal arrives (point A), pitch compression/expansion means 23 compresses it at a compression ratio of 2/3. Measured against compression at a ratio of 1/2, at which the input and output signals have the same length, this amounts to expanding the frame data; in this sense the processing is labeled "expansion" in FIG. 10. The compressed data is written into ring memory 7. At point A, as shown in FIG. 11(c), the storage amount TmA remains at Tmin.

[0130]
Output signal a1, corresponding to voice section a of the input signal, is read out with a delay equal to the storage amount TmA at point A. At the moment the input of voice section a ends (point B), as shown in FIG. 11(d), the storage amount of ring memory 7 is TmB = StB + Tmin, the sum of the storage amount Tmin at point A (the start of the current compression section) and the expansion StB, relative to 1/2 compression, of the compressed data for voice section a from point A to point B. Output signal a1 therefore finishes being output when TmB (= StB + Tmin) has elapsed after point B.
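
The storage amounts in FIG. 11 follow from rate arithmetic. Assuming that during double-speed reproduction the output side reads half as many samples as arrive at the input, compressing L input samples to 2L/3 leaves an excess of (2/3 - 1/2)L = L/6 in ring memory 7, so St = L/6 and Tm = Tmin + St. A sketch with exact fractions; the function name and the numbers in the test are illustrative only.

```python
from fractions import Fraction


def storage_after(input_len: int, t_min: int,
                  ratio: Fraction = Fraction(2, 3),
                  realtime: Fraction = Fraction(1, 2)) -> tuple[Fraction, Fraction]:
    """Return (St, Tm) after compressing input_len samples from point A onward.

    Assumption: while input_len samples arrive, input_len * realtime samples
    are read out, so only the excess over 1/2 compression accumulates.
    """
    st = (ratio - realtime) * input_len  # expansion relative to 1/2 compression
    return st, t_min + st                # e.g. TmB = StB + Tmin
```

For instance, 4800 input samples between points A and B with Tmin = 1000 give StB = 800 and TmB = 1800, so output signal a1 ends 1800 sample periods after point B.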

[0131]
The frame data of the silent section that follows voice section a and is shorter than the pause duration Tdel is also compressed by pitch compression/expansion means 23 at a compression ratio of 2/3. When voice section b is input after this silent section, its frame data is likewise compressed at a compression ratio of 2/3.

[0132]
At the moment the input of voice section b ends (point C), as shown in FIG. 11(e), the storage amount of ring memory 7 is TmC = StC + Tmin, where Tmin is the storage amount at point A (the start of the current compression section) and StC is the expansion, relative to 1/2 compression, of the compressed data for the input from point A to point C. Output signal b1, corresponding to voice section b, therefore finishes being output when TmC (= StC + Tmin) has elapsed after point C.

[0133]
When voice section b is followed by a silent section at least as long as the pause duration Tdel, the frame data is compressed by pitch compression/expansion means 23 at a compression ratio of 2/3 until the pause duration Tdel is reached (point D).

[0134] At point D, as shown in FIG. 11(f), the accumulated amount of the ring memory 7 is TmD = StD + Tmin: the sum of the accumulated amount Tmin at point A (the start point of the current compression section) and the expansion StD, relative to 1/2 compression, of the compressed data corresponding to the input signal from point A to point D. Therefore, the output signal for the silent section between voice section b and point D finishes being output when TmD (= StD + Tmin) has elapsed from point D.
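As a rough check on the bookkeeping in [0132] to [0134]: if the input stream carries twice as much speech as the output can play in the same time, 1/2 compression would exactly keep pace, so compressing to 2/3 instead leaves an excess of (2/3 - 1/2)L = L/6 for every L samples of input. The Python sketch below computes St and Tm = St + Tmin under that reading; the function names and the use of samples as the unit are assumptions for illustration.

```python
from fractions import Fraction

# Ratio applied by the pitch compression/expansion step in the text (2/3)
# and the ratio that would exactly keep pace with double-speed input (1/2).
APPLIED_RATIO = Fraction(2, 3)
REALTIME_RATIO = Fraction(1, 2)


def expansion(input_len):
    """St: excess data, relative to 1/2 compression, left in the ring
    memory after `input_len` input samples are compressed at 2/3."""
    return (APPLIED_RATIO - REALTIME_RATIO) * input_len


def accumulated(input_len, t_min):
    """Tm = St + Tmin, counted from the start of the current compression
    section, where the ring memory held `t_min` samples."""
    return expansion(input_len) + t_min


print(expansion(1200), accumulated(1200, 400))  # 200 600
```

For example, 1200 input samples compressed since point A leave St = 200 extra samples, so with Tmin = 400 the ring memory holds 600.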

[0135] Frame data in the silent section beyond the pause duration Tdel is deleted by the input signal deletion unit 25 until the accumulated amount in the ring memory 7 falls to the underflow detection data Tmin or below. The length Std of this deleted pause portion equals StD, the expansion, relative to 1/2 compression, of the compressed data corresponding to the input signal from point A (the start point of the current compression section) to point D. After the input signal deletion unit 25 performs the deletion, the waveform synthesis insertion unit 22 inserts a synthesized waveform to prevent click noise; the inserted synthesized waveform is omitted from FIG. 10.

[0136] At the end point of the deleted section of the input signal (point E), as shown in FIG. 12(g), the accumulated amount TmE of the ring memory 7 is equal to or less than the underflow detection data Tmin. The example shown here is the case where TmE equals Tmin.

[0137] Frame data for the silent section starting at point E is thinned out by the thinning processing unit 24 at a compression ratio of 1/2 and then written to the ring memory 7. When the signal of voice section c is input (point F), its frame data is compressed by the pitch compression/expansion means 23 at a compression ratio of 2/3; that is, a new compression section begins, and the compressed data is written to the ring memory 7.
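The per-frame choices described in [0131] to [0137] can be summarized as a small decision rule plus a ring-memory level update. The sketch below is a simplified model, not the patent's implementation: it assumes frames of 200 samples (the N of [0153]), treats Tdel as a frame count, and models the D/A side as draining half a frame per input frame, which follows from the 1/2-compression break-even point. Under those assumptions compression adds 1/6 frame per frame, deletion removes 1/2 frame, and 1/2 thinning leaves the level unchanged, consistent with TmF = TmE in [0138].

```python
from fractions import Fraction

FRAME = 200  # samples per frame (N = 200 in [0153]); an assumed frame size

# Samples written to the ring memory per input frame in each mode.
WRITTEN = {
    "compress": Fraction(2, 3) * FRAME,  # pitch compression/expansion at 2/3
    "delete": Fraction(0),               # input signal deletion unit
    "thin": Fraction(1, 2) * FRAME,      # thinning unit at 1/2
}
# Samples the D/A side consumes while one frame of double-speed input arrives.
READ_PER_FRAME = Fraction(1, 2) * FRAME


def choose_mode(is_voice, silent_run, fill, t_min, t_del_frames):
    """Processing for one frame. `silent_run` counts the frames of the
    current silent run, including this one (0 for a voice frame)."""
    if is_voice or silent_run <= t_del_frames:
        return "compress"  # voice, or silence still within Tdel
    if fill > t_min:
        return "delete"    # long pause: drain the ring memory toward Tmin
    return "thin"          # ring memory at or below Tmin: thin out at 1/2


def simulate(frames, t_min, t_del_frames):
    """Return the mode chosen for each frame and the final fill level."""
    fill = Fraction(t_min)
    silent_run = 0
    modes = []
    for is_voice in frames:
        silent_run = 0 if is_voice else silent_run + 1
        mode = choose_mode(is_voice, silent_run, fill, t_min, t_del_frames)
        fill += WRITTEN[mode] - READ_PER_FRAME
        modes.append(mode)
    return modes, fill
```

For instance, `simulate([True] * 6 + [False] * 8 + [True] * 2, t_min=400, t_del_frames=2)` compresses the six voice frames and the first two silent frames, deletes the next three silent frames, thins the remaining three, and resumes compression when speech returns.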

[0138] At point F, as shown in FIG. 12(h), the accumulated amount TmF of the ring memory 7 is Tmin, the same as at point E.

[0139] The output signal c1 for voice section c is output with a delay equal to the accumulated amount Tmin at point F. The frame data of the silent section that follows voice section c, up to the pause duration Tdel (the silent section from voice section c to point G), is also compressed by the pitch compression/expansion means 23 at a compression ratio of 2/3.

[0140] At point G, as shown in FIG. 12(i), the accumulated amount of the ring memory 7 is TmG = StG + Tmin: the sum of the accumulated amount Tmin at point F (the start point of the current compression section) and the expansion StG, relative to 1/2 compression, of the compressed data corresponding to the input signal from point F to point G. Therefore, the output signal for the silent section from voice section c to point G finishes being output when TmG (= StG + Tmin) has elapsed from point G.

[0141] Frame data in the silent section beyond the pause duration Tdel is deleted by the input signal deletion unit 25 until the accumulated amount in the ring memory 7 reaches the underflow detection data Tmin. The length Std of this deleted pause portion equals StG, the expansion, relative to 1/2 compression, of the compressed data corresponding to the input signal from point F (the start point of the current compression section) to point G.

[0142] At the end point of the deleted section of the input signal (point H), as shown in FIG. 12(j), the accumulated amount TmH of the ring memory 7 is equal to or less than the underflow detection data Tmin. The example shown here is the case where TmH equals Tmin.

[0143] Frame data for the silent section starting at point H is thinned out by the thinning processing unit 24 at a compression ratio of 1/2 and then written to the ring memory 7. When the signal of voice section d is input, its frame data is compressed by the pitch compression/expansion means 23 at a compression ratio of 2/3, and the resulting data (expanded relative to 1/2 compression) is written to the ring memory 7.

[0144] FIG. 13 shows the relationship between the input signal and the output signal during double-speed playback, and in particular how the input signal is deleted when the ring memory reaches the state immediately before overflow. FIG. 14 shows the state of the ring memory 7 at points S to U in FIG. 13.

[0145] Assume that the frame data for a series of input signals from some point up to point T, containing voice sections a, b, c, etc. and silent sections, is compressed by the pitch compression/expansion means 23 at a compression ratio of 2/3 (that is, expanded relative to compression at a ratio of 1/2). In this case, the expansion accumulates in the ring memory 7.

[0146] At the input start point of voice section b (point S), as shown in FIG. 14(a), the accumulated amount of the ring memory 7 is TmS = StS + Tmin: the sum of the accumulated amount Tmin at the start of the compression processing for this series of input signals and the expansion StS, relative to 1/2 compression, of the compressed data corresponding to the input signal from that start point to point S. Therefore, output of the output signal b1 for voice section b begins when TmS (= StS + Tmin) has elapsed from point S.

[0147] Assume that when the compressed data corresponding to the input signal of voice section c has been written to the ring memory 7 (point T), the ring memory 7 reaches the state immediately before overflow; that is, at point T the accumulated amount in the ring memory 7 becomes equal to or greater than the overflow detection data Tmax.

[0148] At point T, as shown in FIG. 14(b), the accumulated amount of the ring memory 7 is TmT = StT + Tmin: the sum of the accumulated amount Tmin at the start of the compression processing for this series of input signals and the expansion StT, relative to 1/2 compression, of the compressed data corresponding to the input signal from that start point to point T. In other words, if the total number of words in the ring memory 7 is TOTAL, the overflow detection data is Tmax, and the difference between TOTAL and Tmax is Dmin, then the accumulated amount TmT at point T equals Tmax and is therefore TOTAL - Dmin.

[0149] Therefore, the output signal corresponding to this series of input signals finishes being output at the point delayed from point T by the accumulated amount TmT (= StT + Tmin).

[0150] Once the ring memory 7 reaches the state immediately before overflow at point T, all subsequent input is deleted unconditionally by the input signal deletion unit 21 until the ring memory 7 reaches the state immediately before underflow. After the input signal deletion unit 21 performs the deletion, the muting insertion unit 22 inserts silence; the inserted silent portion is omitted from FIG. 13. Suppose that after the ring memory 7 reaches the pre-overflow state (point T), frame data continues to be deleted until, as shown in FIG. 14(c), the ring memory 7 reaches the pre-underflow state at point U (accumulated amount TmU = Tmin). In this case, the input signal from point T to point U, consisting of four silent sections and three voice sections d, e, and f, is deleted and therefore does not appear in the output signal.
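The overflow rule in [0147] to [0150] behaves like a hysteresis switch: deletion turns on when the fill reaches Tmax = TOTAL - Dmin and stays on until the fill falls to Tmin. A minimal Python sketch of that switch follows; the class name and the idea of calling it once per frame with the current fill level are assumptions.

```python
class OverflowGuard:
    """Unconditional-deletion switch with hysteresis between Tmin and Tmax."""

    def __init__(self, total_words, d_min, t_min):
        self.t_max = total_words - d_min  # Tmax = TOTAL - Dmin ([0148])
        self.t_min = t_min
        self.deleting = False

    def delete_frame(self, fill):
        """Return True if the current input frame should be discarded."""
        if self.deleting:
            if fill <= self.t_min:      # pre-underflow state reached (point U)
                self.deleting = False
        elif fill >= self.t_max:        # pre-overflow state reached (point T)
            self.deleting = True
        return self.deleting
```

Because the switch only turns off at Tmin, a burst of input arriving near Tmax is dropped as one contiguous stretch (the sections d, e, f in FIG. 13) rather than in scattered fragments.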

[0151] When the signal of voice section g is input after point U, its frame data is compressed by the pitch compression/expansion means 23 at a compression ratio of 2/3 (expanded relative to compression at a ratio of 1/2) and then written to the ring memory 7. Output of the output signal g for voice section g begins with a delay equal to the accumulated amount Tmin of the ring memory 7 at point U.

[0152] In the embodiment above, voice sections and silent sections of the input signal are distinguished on the basis of the average power value P of each frame, but they may instead be distinguished on the basis of the average amplitude of each frame. In this case, as shown in FIG. 15, an average amplitude calculation unit 11A that calculates the average amplitude value frame by frame is provided in place of the power calculation unit 11 of FIG. 2, and when the A/D converter 2 quantizes with 12 bits, a threshold of, for example, 26 is set in the threshold memory 13A. The comparison unit 12A compares the average amplitude value calculated by the average amplitude calculation unit 11A with the threshold in the threshold memory 13A to determine whether the frame belongs to a voice section or a silent section.

[0153] That is, a frame is judged to be a voice section if its average amplitude value is equal to or greater than the threshold, and a silent section if it is below the threshold. Letting the amplitudes of the sampled audio signals in one frame be i0, i1, ..., iN-1 (where N = 200), the average amplitude value W of the frame is calculated by Equation 3 below.

[0154]

[Equation 3]

$$W = \frac{1}{N}\sum_{k=0}^{N-1} \lvert i_k \rvert$$

[0155] The other processing is the same as that performed by the speech speed conversion unit 6 in FIG. 2, so its description is omitted.
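Reading Equation 3 as the mean absolute sample value over the N = 200 samples of a frame (an assumption consistent with the term "average amplitude"), the decision of [0152] and [0153] reduces to the sketch below. The default threshold of 26 is the example value quoted for 12-bit quantization; the function names are hypothetical.

```python
def average_amplitude(frame):
    """W: mean absolute value of the samples i0 ... iN-1 in one frame."""
    return sum(abs(s) for s in frame) / len(frame)


def is_voice_frame(frame, threshold=26):
    """Voice section if W >= threshold, silent section otherwise ([0153])."""
    return average_amplitude(frame) >= threshold
```

Compared with the average power of the earlier embodiment, this avoids squaring each sample, which is cheaper on simple fixed-point hardware.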

[0156] In this case too, the threshold may be updated as follows. As shown by the dotted line in FIG. 15, an average amplitude steady-state detection and threshold update unit 14A is provided. The unit 14A determines whether the average amplitude value W from the average amplitude calculation unit 11A has remained constant over a predetermined number of frames. If it has (steady state), the unit writes twice the average amplitude value W at that time to the threshold memory 13A, updating the threshold. The updated threshold is, however, capped at a predetermined maximum, for example 28.
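A minimal sketch of the update rule in [0156], assuming "constant" means the last `steady_frames` values of W differ by at most a small tolerance. The window length, the tolerance, and the class name are assumptions; the defaults 26 and 28 are the example values from the text.

```python
from collections import deque


class AmplitudeThresholdUpdater:
    """When W stays constant for `steady_frames` frames, set the threshold
    to 2 * W, but never above `cap`."""

    def __init__(self, threshold=26, cap=28, steady_frames=10, tolerance=0):
        self.threshold = threshold
        self.cap = cap
        self.tolerance = tolerance  # allowed spread for "constant" (assumed)
        self.history = deque(maxlen=steady_frames)

    def update(self, w):
        """Feed one frame's average amplitude W; return the current threshold."""
        self.history.append(w)
        full = len(self.history) == self.history.maxlen
        if full and max(self.history) - min(self.history) <= self.tolerance:
            self.threshold = min(2 * w, self.cap)
        return self.threshold
```

With `steady_frames=3`, feeding W = 10, 10, 10, 20, 20, 20 yields thresholds 26, 26, 20, 20, 20, 28: the steady background of 10 lowers the threshold to 20, and the later steady level of 20 would give 40 but is capped at 28.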

[0157] Voice sections and silent sections of the input signal may also be distinguished on the basis of a given threshold and the cumulative amplitude value Wa of the audio signal in each frame, given by Equation 4 below.

[0158]

[Equation 4]

$$W_a = \sum_{k=0}^{N-1} \lvert i_k \rvert$$

[0159] Alternatively, the periodicity of the signal in each frame may be detected. A frame is judged to be a voice section if the detected period lies within a predetermined pitch period range for voice signals, and a silent section if it lies outside that range.

[0160] In this case, as shown in FIG. 16, a pitch period detection unit 11B that detects the periodicity of each frame by the autocorrelation method is provided in place of the power calculation unit 11 of FIG. 2, and the pitch period range of voice signals is set in the threshold memory 13B. The comparison unit 12B then compares the period detected by the pitch period detection unit 11B with the pitch period range set in the threshold memory 13B.

[0161] The pitch range set for voice signals depends on the playback speed. For n-times-speed playback it is set, for example, to 66 × n Hz to 320 × n Hz (expressed as a fundamental frequency). For double-speed playback the range is therefore 132 Hz to 640 Hz. The other processing is the same as that performed by the speech speed conversion unit 6 in FIG. 2, so its description is omitted.
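The sketch below shows one way to implement the autocorrelation test of [0159] to [0161]. Several details are assumptions rather than statements from the text: the 8000 Hz sampling rate, skipping the main lobe around lag 0 by starting the search at the first non-positive autocorrelation value, the 0.5 minimum peak strength, and the function names. Frames with no clear period are treated as silence.

```python
import math


def autocorr(x, lag):
    """Biased autocorrelation R(lag) = sum of x[i] * x[i + lag]."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))


def detect_pitch_hz(frame, fs, min_strength=0.5):
    """Estimate the fundamental frequency of one frame, or return None
    if the frame is silent or not clearly periodic."""
    r0 = autocorr(frame, 0)
    if r0 <= 0:
        return None
    max_lag = len(frame) * 4 // 5  # keep at least 1/5 of the frame overlapping
    # Skip the main lobe around lag 0: start at the first non-positive R.
    start = next((k for k in range(1, max_lag) if autocorr(frame, k) <= 0), None)
    if start is None:
        return None
    best = max(range(start, max_lag + 1), key=lambda k: autocorr(frame, k))
    if autocorr(frame, best) / r0 < min_strength:
        return None  # peak too weak to count as periodic
    return fs / best


def is_voice_by_pitch(frame, fs, speed):
    """Voice if the detected pitch lies in 66*n .. 320*n Hz ([0161])."""
    f0 = detect_pitch_hz(frame, fs)
    return f0 is not None and 66 * speed <= f0 <= 320 * speed


fs = 8000  # assumed sampling rate
tone = [math.sin(2 * math.pi * 200 * i / fs) for i in range(200)]
print(detect_pitch_hz(tone, fs))  # 200.0
```

A 100 Hz tone is detected as 100 Hz, so it counts as voice at normal speed (66 to 320 Hz) but not during double-speed playback (132 to 640 Hz), which shows why the range must scale with n.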

[0162] Voice sections and silent sections of the input signal may also be distinguished by comparing the power spectrum of the signal in each frame with the steady-state power spectrum.

[0163] In this case, as shown in FIG. 20, a power spectrum calculation unit 11C that calculates, for each frame, the power spectrum in one or more predetermined frequency bands is provided in place of the power calculation unit 11 of FIG. 2. The steady-state power spectrum for the same one or more frequency bands is stored in a power spectrum storage unit 13C.

[0164] When the power spectrum steady-state detection unit 14B detects, from the way the power spectrum calculated by the power spectrum calculation unit 11C changes, that the spectrum is in a steady state, the contents of the power spectrum storage unit 13C are updated to the power spectrum of the detected steady state.

[0165] When the input signal is fed to the power spectrum calculation unit 11C, the power spectrum in the one or more predetermined frequency bands is calculated for each frame. The comparison unit 12C then compares the calculated power spectrum with the steady-state power spectrum stored in the power spectrum storage unit 13C.

[0166] If the calculated power spectrum deviates from the steady-state power spectrum, the frame is judged to be a voice section. Conversely, if it does not deviate from the steady-state power spectrum, the frame is judged to be a silent section.

[0167] Specifically, the power spectrum storage unit 13C stores a threshold for each of the predetermined one or more frequency bands, derived from the steady-state power spectrum in those bands. The power spectrum calculated by the power spectrum calculation unit 11C for those bands is compared with the corresponding thresholds stored in the power spectrum storage unit 13C to determine whether the input signal is a voice section or a silent section.

[0168] For example, suppose the steady-state power spectrum is the noise-only spectrum shown in FIG. 21(a), and the spectrum of noise-free speech is the one shown in FIG. 21(b). If an audio signal with the spectrum of FIG. 21(b) is input while the noise of FIG. 21(a) is present in the steady state, the resulting power spectrum is the combination of the two, as shown in FIG. 21(c).

[0169] Consequently, the power in frequency bands fa and fb, where the steady-state spectrum has relatively little power, increases substantially in the power spectrum of a voice section. That is, by comparing the steady-state power in one or more frequency bands where the steady-state spectrum is relatively weak with the power of the input signal's spectrum in the same bands, it can be determined whether the input signal is a voice section or a silent section.
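The band comparison of [0167] to [0169] can be sketched as follows. This is an illustration under stated assumptions: band power is taken from a direct DFT of the frame, the per-band threshold is a fixed multiple (`factor`, assumed 4, about 6 dB) of the stored steady-state power, and the function names are hypothetical.

```python
import cmath
import math


def band_power(frame, fs, f_lo, f_hi):
    """Power of `frame` in the band f_lo <= f < f_hi, from a direct DFT."""
    n = len(frame)
    total = 0.0
    for k in range(n // 2 + 1):
        if f_lo <= k * fs / n < f_hi:
            x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            total += abs(x_k) ** 2
    return total / n


def is_voice_by_spectrum(frame, fs, bands, steady_powers, factor=4.0):
    """Voice if any watched band exceeds `factor` times its steady-state power."""
    return any(
        band_power(frame, fs, lo, hi) >= factor * ref
        for (lo, hi), ref in zip(bands, steady_powers)
    )
```

For example, with steady-state noise that is strong at 3000 Hz but weak around 400 Hz, watching the 300 to 500 Hz band (playing the role of fa) flags a frame once speech energy appears there, while the noise alone stays below the threshold.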

[0170] When the steady-state noise is known to lie in a high frequency band, the power spectrum may instead be calculated for a low frequency band that is less affected by the noise (for example, 4 kHz and below), and the input signal judged to be a voice section or a silent section according to whether that power is equal to or greater than a predetermined threshold.

[0171] Where voice sections and silent sections are distinguished by comparing the average power P of each frame with a threshold Th, the threshold Th may be varied according to the accumulated amount in the ring memory 7. That is, the smaller the accumulated amount in the ring memory 7 (in other words, the more free space it has), the smaller the threshold Th is made, so that fewer parts of voice sections are lost. This makes the output speech sound more natural.

[0172] Specifically, as shown in FIG. 22, a threshold adjusting means 51 is provided. The threshold adjusting means 51 obtains the accumulated amount of the ring memory 7 from the ring memory accumulation state determination unit 16 and calculates the accumulation time Tm by dividing that amount by the sampling frequency of the D/A converter 8. It then determines the threshold Th from the calculated accumulation time Tm and updates the contents of the threshold memory 13.

[0173] More specifically, the accumulation time Tm is obtained by dividing the accumulated amount of the ring memory 7, obtained from the ring memory accumulation state determination unit 16, by 8000, the sampling frequency of the D/A converter 8. The threshold Th for that accumulation time Tm is then looked up in previously prepared data giving Th as a function of Tm.

[0174] The following table shows an example of threshold Th data versus accumulation time Tm when the A/D converter 2 quantizes with 12 bits.

[0175]

[Table 1]
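The contents of Table 1 are not reproduced here, so the sketch below uses clearly hypothetical placeholder values and keeps only what the text states: Tm is the stored amount divided by 8000, and Th is read from a prepared Tm-to-Th table, smaller when less audio is stored. The table rows and function names are assumptions.

```python
import bisect

D_A_RATE = 8000  # D/A sampling frequency given in [0173]

# HYPOTHETICAL placeholder for Table 1: (upper bound of Tm in seconds, Th).
# Only the direction is taken from the text: less stored audio, lower Th.
TH_TABLE = [(0.25, 100), (0.5, 200), (1.0, 400), (float("inf"), 800)]


def accumulation_time(stored_samples, rate=D_A_RATE):
    """Tm: how many seconds of output the ring memory currently holds."""
    return stored_samples / rate


def threshold_for(tm, table=TH_TABLE):
    """Return Th for the first table row whose upper bound is >= tm."""
    bounds = [upper for upper, _ in table]
    return table[bisect.bisect_left(bounds, tm)][1]
```

With these placeholder rows, 4000 stored samples give Tm = 0.5 s and Th = 200. The last row uses an infinite bound so that any Tm maps to some threshold.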

[0176] Likewise, the threshold may be varied according to the accumulated amount in the ring memory 7 when voice sections and silent sections are distinguished by comparing a threshold with the cumulative power value Pa of each frame, the average amplitude value W of each frame, the cumulative amplitude value Wa of each frame, or the power spectrum of each frame.

[0177] The pause duration Tdel, which determines where deletion of a silent section starts, may also be varied according to the accumulated amount in the ring memory 7. That is, the smaller the accumulated amount in the ring memory 7 (in other words, the more free space it has), the longer the pause duration Tdel is made, so that less of each silent section is deleted. This makes the output speech sound more natural.

[0178] Specifically, as shown in FIG. 22, a pause duration adjusting means 52 is provided. The pause duration adjusting means 52 obtains the accumulated amount of the ring memory 7 from the ring memory accumulation state determination unit 16 and calculates the accumulation time Tm by dividing that amount by the sampling frequency of the D/A converter 8. It then determines the pause duration Tdel from the calculated accumulation time Tm and updates the contents of the pause duration setting memory 17.

[0179] More specifically, the accumulation time Tm is obtained by dividing the accumulated amount of the ring memory 7, obtained from the ring memory accumulation state determination unit 16, by 8000, the sampling frequency of the D/A converter 8. The pause duration Tdel for that accumulation time Tm is then looked up in previously prepared data giving Tdel as a function of Tm.

[0180] The following table shows an example of pause duration Tdel data versus accumulation time Tm during double-speed playback on the VTR.

[0181]

[Table 2]
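Table 2's values are likewise not reproduced, so the points below are hypothetical. As a variation on a plain step lookup, this sketch interpolates linearly between table points and clamps at both ends. Interpolation is an assumption, since [0179] says only that Tdel is read from prepared data. The one property taken from the text is that Tdel shrinks as Tm grows.

```python
import bisect

# HYPOTHETICAL placeholder for Table 2: (Tm in seconds, Tdel in seconds).
# Only the direction comes from the text: less stored audio, longer Tdel.
TDEL_POINTS = [(0.0, 0.6), (0.5, 0.4), (1.0, 0.2), (2.0, 0.1)]


def pause_duration_for(tm, points=TDEL_POINTS):
    """Tdel for accumulation time `tm`, interpolated linearly between the
    table points and clamped at both ends."""
    xs = [x for x, _ in points]
    if tm <= xs[0]:
        return points[0][1]
    if tm >= xs[-1]:
        return points[-1][1]
    i = bisect.bisect_right(xs, tm)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (tm - x0) / (x1 - x0)
```

Interpolating avoids sudden jumps in Tdel when Tm crosses a table boundary. A step table like the one used for Th would be simpler and closer to a literal reading of the text.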

[0182] The description above concerns an analog input signal, but the present invention can also be applied when the input signal is digital data. For example, when a compressed digital audio signal is received from an IC memory, a magnetic disk, a digital communication line, or the like, it is decompressed and converted into a PCM audio signal, which is temporarily stored in a buffer. The PCM audio data is then read out of the buffer at a rate corresponding to the set playback speed multiplier and sent to the frame memory 5 of FIG. 1.

[0183] The adaptive speech speed conversion processing has been described in detail above. Next, the specific memory control operation for speech speed conversion by the simple thinning method is described. The speech speed conversion IC 112 performs speech speed conversion by controlling the memory 113 as follows.

[0184] FIG. 28 shows the memory control operation during triple-speed playback. The audio signal is written at triple speed, reading starts at the same time as writing, and reading ends so that the audio plays at the same speed as normal (1x) playback. That is, the memory is controlled so that writing takes place in a period T/3, one third of the read period T. The period T0 in the figure is consequently thinned out (discarded). Similarly, the write time is T/5 during 5x playback and T/9 during 9x playback.
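Measured in programme audio rather than wall-clock time, the simple thinning of [0184] keeps one read period's worth of sound out of every N periods and plays it back at normal speed. The sketch below expresses that as list slicing. `block` stands in for the read period T measured in samples, and the function name is hypothetical.

```python
def simple_thin(samples, speed, block):
    """Keep the first `block` samples of every `speed * block` samples.

    `samples` is the programme audio at its normal sample rate. Each kept
    block is what the memory receives in T/speed of write time and plays
    back over one full read period T; the rest (the T0 period) is dropped.
    """
    out = []
    for start in range(0, len(samples), speed * block):
        out.extend(samples[start:start + block])
    return out
```

The output is 1/speed as long as the input, so it keeps pace with the picture. Unlike the adaptive double-speed processing, no pitch-synchronous compression is needed, only address control.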

[0185] FIG. 29 shows the memory read/write timing during reverse 5x playback, where T is the normal-speed (1x) period, W is the write period, *2 is T/5, and * is T/6. The memory write address advances at 5x speed, and the write cycle period is 5/6 of that for forward 5x playback. The write cycle counter, which is set to six T periods, wraps around after five counts, and writing is enabled only while its value is 0. As a result, as shown in the figure, writing and reading cycle with a small offset and return to their original alignment after five reads, so no content change occurs at point α and the reproduced sound is heard smoothly. As shown in the figure, writing and reading are performed in the pairs d and D, e and E, f1 and F1, and f2 and F2. In FIGS. 28 and 29, the broken lines show how the write address advances and the solid lines show how the read address advances.

[0186]

[Effects of the Invention] As described above, the VTR of the present invention performs adaptive speech speed conversion during double-speed playback and simple thinning during playback at triple speed or faster. This not only simplifies the configuration of the speech speed conversion IC but also selects the appropriate speech speed conversion processing automatically, which improves the commercial value of the VTR. Further, during double-speed playback the invention reduces the processing load, keeps the offset between video and audio small, and avoids the need for an enormous memory capacity to store the audio signal.

[Brief description of drawings]

FIG. 1 is a block diagram showing the overall configuration of a speech speed conversion device.

FIG. 2 is a block diagram showing the configuration of the speech speed conversion unit.

FIG. 3 is an explanatory diagram showing a method of compressing an input signal at a compression ratio of 2/3 using PICOLA.

FIG. 4 is an explanatory diagram illustrating the processing performed by a waveform synthesis processing unit.

FIG. 5 is an explanatory diagram for explaining various thinning processing methods performed by a thinning processing unit.

FIG. 6 is a flowchart showing a processing procedure performed by the speech speed conversion unit.

FIG. 7 is a flowchart showing a processing procedure performed by the speech speed conversion unit.

FIG. 8 is a flowchart corresponding to FIG. 7, showing a modification of the processing procedure performed by the speech speed conversion unit.

FIG. 9 is an explanatory diagram for explaining a process that can replace the process of step 10 of FIG. 6.

FIG. 10 is a time chart showing the relationship between the input signal and the output signal during double-speed reproduction, in particular how the input signal in a silent section is deleted.

FIG. 11 is a schematic diagram showing the data write start point in the ring memory 7, the data read start point in the ring memory 7, and the state of the ring memory 7 at points A to D in FIG. 10.

FIG. 12 is a schematic diagram showing the state of the ring memory 7 at points E to H in FIG. 10.

FIG. 13 is a time chart showing the relationship between the input signal and the output signal during double-speed reproduction, in particular how the input signal is deleted when a state immediately before overflow is reached.

FIG. 14 is a schematic diagram showing the state of the ring memory 7 at points S to U in FIG. 13.

FIG. 15 is a block diagram corresponding to FIG. 2, showing a modified example of the circuit for discriminating between a voice section and a silent section.

FIG. 16 is a block diagram corresponding to FIG. 2, showing another modified example of the circuit for discriminating between a voice section and a silent section.

FIG. 17 is an explanatory diagram showing a method of compressing an input signal at a compression ratio of 2/3 in units of fixed frames.

FIG. 18 is an explanatory diagram for explaining a process that can replace the process of step 9 of FIG. 6.

FIG. 19 is an explanatory diagram for explaining a process that can replace the process of step 10 of FIG. 6 when the process of FIG. 18 is adopted as the process of step 9 of FIG. 6.

FIG. 20 is a block diagram corresponding to FIG. 2, showing still another modified example of the circuit for discriminating between a voice section and a silent section.

FIG. 21 is a graph showing a steady-state power spectrum, the power spectrum of noise-free speech, and the power spectrum of a speech section.

FIG. 22 is a block diagram showing a speech speed conversion unit to which threshold adjusting means and pause duration adjusting means are added.

FIG. 23 is a circuit block diagram of the essential part of a video tape recorder embodying the present invention.

FIG. 24 is a flowchart for explaining the operation of the circuit shown in the block diagram of FIG. 23.

FIG. 25 is a diagram for explaining the concept of adaptive speech speed conversion processing.

FIG. 26 is a diagram for explaining the concept of speech speed conversion by simple thinning processing.

FIG. 27 is a diagram for explaining the concept of speech speed conversion by simple thinning processing during reverse reproduction.

FIG. 28 is a diagram for explaining a memory control method for realizing simple thinning processing.

FIG. 29 is a diagram for explaining a memory control method for realizing simple thinning processing during reverse reproduction.
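The double-speed behavior shown in FIGS. 10 and 13, and recited in claims 2 to 4, chooses per frame between deletion and compression/expansion using a voice/silent decision and the storage amount of the ring memory. A minimal sketch follows, assuming mean-square frame power against a fixed threshold as the discriminator (cf. the power calculation unit 11 and comparison unit 12), an up-down counter for the storage amount (cf. up-down counter 9), and a hypothetical 90% near-overflow level. The patent's actual rules and thresholds may differ; `choose_processing` and its parameters are illustrative names.

```python
from typing import Sequence


class UpDownCounter:
    """Storage amount of the ring memory, derived from write and read events."""

    def __init__(self) -> None:
        self.count = 0

    def on_write(self, n: int = 1) -> None:
        self.count += n  # count up for samples written

    def on_read(self, n: int = 1) -> None:
        self.count -= n  # count down for samples read


def frame_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    if not frame:
        raise ValueError("empty frame")
    return sum(x * x for x in frame) / len(frame)


def choose_processing(frame: Sequence[float], threshold: float, stored: int,
                      capacity: int, near_full: float = 0.9) -> str:
    """Hypothetical double-speed policy returning 'delete' or 'compress_expand'."""
    if frame_power(frame) < threshold:
        return "delete"  # silent section: drop the frame (cf. FIG. 10)
    if stored >= near_full * capacity:
        return "delete"  # voice section, but ring memory near overflow (cf. FIG. 13)
    return "compress_expand"  # voice section: pitch-based processing (cf. FIG. 3)


counter = UpDownCounter()
counter.on_write(600)
counter.on_read(300)
print(choose_processing([0.5, -0.5], 0.1, counter.count, 1000))  # compress_expand
```

Under the same assumptions, if the ring memory is read at 1× while a frame of F samples arrives at double speed, F/2 samples are read out during each frame. Compressing a voiced frame to 2/3 therefore adds F*2/3 - F/2 = F/6 samples to the storage amount, whereas deleting a silent frame removes F/2, which is why the choice depends on both the section decision and the storage amount.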

[Explanation of symbols]

2 A/D conversion unit
4 DSP
5 Frame memory
6 Speech speed conversion unit
7 Ring memory
8 D/A conversion unit
9 Up-down counter
11 Power calculation unit
11A Average amplitude calculation unit
11B Pitch period detection unit
11C Power spectrum calculation unit
12, 12A, 12B, 12C Comparison unit
15 Conditional branching unit
16 Ring memory storage amount state determination unit
21, 25 Input signal deletion unit
23 Pitch compression/expansion means
24 Thinning processing unit
51 Threshold adjusting means
52 Pause duration adjusting means
112 Speech speed conversion IC
114 Microcomputer

Continuation of the front page. (51) Int.Cl.⁶: H04N 5/93; FI: H04N 5/93 G

Claims

1. A video tape recorder comprising control means for setting a double-speed audio reproduction mode, in which, during double-speed reproduction, speech speed conversion is performed by applying compression/expansion processing or deletion processing to the input audio signal depending on whether the reproduced audio signal is in a voice section or a silent section, and an N-times-speed reproduction mode, in which, during ±N-times-speed reproduction (N: a natural number of 3 or more), voice sections of the reproduced audio signal are thinned out over predetermined periods according to the reproduction speed.

2. A video tape recorder having a speech speed conversion device that comprises, in order to perform a double-speed audio reproduction mode, speech speed conversion processing means for converting the speech speed of an input audio signal and a ring memory into which the output of the speech speed conversion processing means is written, wherein the speech speed conversion processing means has means for performing compression/expansion processing or deletion processing on the input audio signal according to whether the input audio signal is in a voice section or a silent section and according to the amount of data stored in the ring memory.

3. A video tape recorder having a speech speed conversion device that comprises, in order to perform a double-speed audio reproduction mode: A/D conversion means for sampling an input analog audio signal at a sampling frequency according to a set reproduction speed multiple; a frame memory to which the audio signal output from the A/D conversion means is input; speech speed conversion processing means for performing speech speed conversion processing on the audio signals each time a required number of audio signals have been input to the frame memory; a ring memory into which the output of the speech speed conversion processing means is written; reading means for reading data from the ring memory at a constant speed; and storage amount calculating means for calculating the amount stored in the ring memory based on the write signal and the read signal of the ring memory, wherein the speech speed conversion processing means comprises section discriminating means for discriminating whether the input audio corresponding to the required number of audio signals input to the frame memory is a voice section or a silent section, and signal processing means for performing compression/expansion processing or deletion processing on the required number of audio signals according to the outputs of the section discriminating means and the storage amount calculating means.

4. A video tape recorder having a speech speed conversion device that comprises, in order to perform a double-speed audio reproduction mode: a frame memory into which an input digital audio signal is written at a speed corresponding to a set reproduction speed multiple; speech speed conversion processing means for performing speech speed conversion processing on the audio signals each time a required number of audio signals have been input to the frame memory; a ring memory into which the output of the speech speed conversion processing means is written; reading means for reading data from the ring memory based on a read signal whose frequency equals the write speed to the frame memory during 1× speed reproduction; and storage amount calculating means for calculating the amount stored in the ring memory based on the write signal and the read signal of the ring memory, wherein the speech speed conversion processing means comprises section discriminating means for discriminating whether the input audio corresponding to the required number of audio signals input to the frame memory is a voice section or a silent section, and signal processing means for performing compression/expansion processing or deletion processing on the required number of audio signals according to the output of the section discriminating means and the output of the storage amount calculating means.

5. The video tape recorder according to claim 1, further comprising memory control means for controlling a memory so that, in order to perform the N-times-speed reproduction mode, audio data is written into the memory at N-times speed and the written data is read out at 1× speed.
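The memory control of claim 5, writing at N-times speed and reading at 1× speed, can be simulated with a ring buffer. This is a hypothetical sketch: one output tick delivers N input samples; a write gate, loosely modeled on the write cycle counter that permits writing only when its value is 0, admits the first `seg_len` samples of every `N*seg_len`; and one sample is read per tick. The names `ring_memory_thinning`, `seg_len` and `mem_size` are assumptions, and reverse reproduction is not modeled.

```python
from typing import List, Sequence


def ring_memory_thinning(samples: Sequence[float], n: int, seg_len: int,
                         mem_size: int) -> List[float]:
    """Simulate writing at n-times speed and reading at 1x (hypothetical sketch)."""
    mem: List[float] = [0.0] * mem_size
    waddr = raddr = 0
    stored = 0            # up-down count of samples held in the ring memory
    block = n * seg_len   # input samples per kept segment plus skipped ones
    out: List[float] = []
    for tick_start in range(0, len(samples), n):
        # Write phase: n input samples arrive during this tick (n-times speed).
        for i in range(tick_start, min(tick_start + n, len(samples))):
            if i % block < seg_len:  # write gate open
                if stored == mem_size:
                    raise OverflowError("ring memory overflow")
                mem[waddr] = samples[i]
                waddr = (waddr + 1) % mem_size
                stored += 1
        # Read phase: one sample per tick, i.e. normal (1x) speed.
        if stored:
            out.append(mem[raddr])
            raddr = (raddr + 1) % mem_size
            stored -= 1
    while stored:  # drain what remains after the input ends
        out.append(mem[raddr])
        raddr = (raddr + 1) % mem_size
        stored -= 1
    return out


print(ring_memory_thinning(list(range(30)), 3, 4, 8))
```

Because writes and reads balance over each block, the storage amount never exceeds `seg_len` samples, and the output is the same segment sequence that slicing one segment out of every N would give, only delayed by the buffering.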