JP3081469B2 - Speech speed converter

Info

Publication number: JP3081469B2
Application number: JP06253929A
Authority: JP
Inventors: 浩司田中; 正蔵杉下; 正幸飯田; 正典宮武
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1993-10-19
Filing date: 1994-10-19
Publication date: 2000-08-28
Anticipated expiration: 2015-08-28
Also published as: JPH08147874A

Abstract

PURPOSE: To reduce the processing load, reduce the lag between video and audio, and prevent an increase in memory capacity by converting the speech speed of an input audio signal with a speech speed conversion processing means. CONSTITUTION: The faster the speaking rate of a program, the more readily ring memory 7 approaches the near-overflow state, so for such programs the compression rate is set so that the audio playback speed approaches double speed. Conversely, the slower the speaking rate of a program, the closer the compression rate brings the audio playback speed to normal (1x) speed. The audio playback speed therefore stays at or below double speed and follows the original speaking rate, so natural-sounding playback is obtained.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech speed converter that changes the speech speed of an audio signal. It is used, for example, in audio playback devices for fast or slow listening to the audio of laser discs, VTRs, TVs, videophones, and video conferencing systems that carry video; in radios, telephones, and hearing aids with a hearing-assist function that converts an audio signal into slow, easy-to-hear speech; in tape recorders, stereo systems, CD players, and voice guidance systems that provide fast or slow listening; and in English-learning devices that convert English spoken at native speed into slow, easy-to-hear speech.

[0002]

2. Description of the Related Art

One conventional technique for converting speech speed is analog time-axis expansion/compression. However, speech speed conversion based on analog time-axis expansion/compression merely thins out the speech waveform or repeatedly inserts portions of it, so the joints in the audio are discontinuous and sound quality suffers.

[0003] One time-axis expansion/compression technique that yields good sound quality uses digital signal processing to detect the pitch period of the speech and then thins out or inserts pitch segments in units of the detected pitch period. However, speech speed conversion based on this digital technique compresses or expands the audio signal at a uniform rate regardless of whether a section of the signal is silent or voiced. As a result, during double-speed VTR playback, during English playback on an English-learning device, and in similar cases, the voice sections may be played back too fast for the speech to be understood.

[0004]

[Problems to Be Solved by the Invention] To address this problem, speech speed conversion methods have already been developed that distinguish the silent sections of an audio signal from its voice sections, delete the silent sections, and expand the voice sections in units of the pitch period. See Reference A (hereinafter, the first conventional method): IEICE Technical Report SP92-56, HC92-33 (1992-09), "A Method for Absorbing the Time Expansion Accompanying Speech Speed Conversion," published by The Institute of Electronics, Information and Communication Engineers; and Reference B (hereinafter, the second conventional method): IEICE Technical Report SP92-150 (1993-03), "Evaluation of Speech Speed Conversion Methods by Hearing-Impaired Listeners," published by The Institute of Electronics, Information and Communication Engineers. These methods slow the playback of the voice sections and make the speech easier to hear. However, they have the following problems.

[0005] In the first conventional method, the processing load is large, so high-speed computation is required and power consumption increases. In the second conventional method, the lag between video and audio grows so large that the content becomes difficult to follow, and the memory needed to store the audio signal becomes enormous and costly.

SUMMARY OF THE INVENTION

[0006] An object of the present invention is to provide a speech speed converter that reduces the processing load, keeps the lag between video and audio small, and does not require an enormous memory capacity for storing the audio signal.

[0007] Another object of the present invention is to provide a speech speed converter that can make the playback speed of the speech in voice sections slower than the set playback speed multiplier, while keeping dropped speech in the voice sections of the input signal to a minimum.

[0008]

[Means for Solving the Problems] A first speech speed converter according to the present invention comprises speech speed conversion processing means that performs speech speed conversion on an input audio signal, a ring memory to which the output of the speech speed conversion processing means is written, and means for reading data from the ring memory at a constant speed. The speech speed conversion processing means includes means that, when the input audio signal is a voice section and the ring memory is not in the near-overflow state, performs compression/expansion on the input audio signal at a compression rate that is at least 1/n, where n is the set playback speed multiplier, and that is determined according to the program type set by the operator.
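As an illustration only, the choice of a compression rate by program type might be sketched as follows. The program type names, the weights in `PROGRAM_WEIGHT`, the restriction to n of 1 or more, and the upper bound of 1 on the rate are all assumptions made for this sketch; the specification states only that the rate is at least 1/n and depends on the program type set by the operator.

```python
# Hypothetical program types and weights (not taken from the specification).
# Weight 0.0 maps to the lower bound 1/n (fast speech, playback near n-times speed);
# weight 1.0 maps to a rate of 1 (slow speech, playback near 1x speed).
PROGRAM_WEIGHT = {"news": 0.0, "drama": 0.5, "lecture": 1.0}


def compression_rate(n: float, program_type: str) -> float:
    """Return a compression rate in [1/n, 1] for the given program type."""
    if n < 1:
        raise ValueError("this sketch assumes a playback speed multiplier n >= 1")
    lower = 1.0 / n
    weight = PROGRAM_WEIGHT[program_type]
    # Interpolate between the lower bound 1/n and the assumed upper bound 1.
    return lower + weight * (1.0 - lower)
```

With n = 2, this sketch returns 0.5 for "news", 0.75 for "drama", and 1.0 for "lecture".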

[0009] A second speech speed converter according to the present invention comprises: A/D conversion means that samples an input analog audio signal at a sampling frequency corresponding to the set playback speed multiplier; a frame memory that receives the audio signal output by the A/D conversion means; speech speed conversion processing means that performs speech speed conversion on the audio signals each time a required number of audio signals has been input to the frame memory; a ring memory to which the output of the speech speed conversion processing means is written; reading means that reads data from the ring memory based on a read signal whose frequency equals the sampling frequency used for 1x playback; and stored-amount calculating means that calculates the amount stored in the ring memory from the write signal and the read signal of the ring memory. The speech speed conversion processing means comprises section discriminating means that determines whether the input speech corresponding to the required number of audio signals input to the frame memory is a voice section or a silent section, and signal processing means that performs compression/expansion or deletion on the required number of audio signals according to the output of the section discriminating means and the output of the stored-amount calculating means. The signal processing means includes means that, when the input speech is a voice section and the ring memory is not in the near-overflow state, performs compression/expansion at a compression rate that is at least 1/n, where n is the set playback speed multiplier, and that is determined according to the program type set by the operator.
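The relationship between the sampling frequency, the compression rate, and the ring memory can be illustrated with a small calculation. The sketch below assumes that the compression rate means output samples divided by input samples and that writing happens at n times the 1x sampling frequency; the function name and the 8000 Hz figure in the usage note are hypothetical.

```python
def ring_memory_growth(base_rate_hz: float, n: float, compression_rate: float) -> float:
    """Net samples per second accumulating in the ring memory.

    Input arrives at n * base_rate_hz, is scaled by the compression rate
    (assumed here to be output samples / input samples), and is read out
    at base_rate_hz, the 1x sampling frequency.
    """
    written_per_second = n * base_rate_hz * compression_rate
    read_per_second = base_rate_hz
    return written_per_second - read_per_second
```

For example, with a 1x sampling frequency of 8000 Hz and n = 2, a compression rate of exactly 1/2 gives zero net growth, while a rate of 0.75 gives 4000 samples per second of growth. Under these assumptions, any rate above 1/n fills the ring memory during voice sections, and that surplus has to be recovered by deleting samples elsewhere.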

[0010] A third speech speed converter according to the present invention comprises: a frame memory to which an input digital audio signal is written at a speed corresponding to the set playback speed multiplier; speech speed conversion processing means that performs speech speed conversion on the audio signals each time a required number of audio signals has been input to the frame memory; a ring memory to which the output of the speech speed conversion processing means is written; reading means that reads data from the ring memory at a constant speed; and stored-amount calculating means that calculates the amount stored in the ring memory from the write signal and the read signal of the ring memory. The speech speed conversion processing means comprises section discriminating means that determines whether the input speech corresponding to the required number of audio signals input to the frame memory is a voice section or a silent section, and signal processing means that performs compression/expansion or deletion on the required number of audio signals according to the output of the section discriminating means and the output of the stored-amount calculating means. The signal processing means includes means that, when the input speech is a voice section and the ring memory is not in the near-overflow state, performs compression/expansion at a compression rate that is at least 1/n, where n is the set playback speed multiplier, and that is determined according to the program type set by the operator.

[0011] The ring memory referred to above is a memory having a ring structure. A ring structure is a linked list in which the pointer of the last item points back to the first item.
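A minimal sketch of such a ring memory follows. It substitutes a fixed-size array with wrap-around indices for the linked list described above, which is a common equivalent realization and not necessarily the one used in the embodiment. The class and method names are hypothetical. The stored amount is derived from running write and read counts, by analogy with the stored-amount calculation from the write and read signals.

```python
class RingMemory:
    """Fixed-capacity ring buffer; the last slot wraps around to the first."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buf = [0] * capacity
        self.write_count = 0  # total samples written so far
        self.read_count = 0   # total samples read so far

    def stored(self) -> int:
        """Amount currently stored: writes minus reads."""
        return self.write_count - self.read_count

    def write(self, samples: list) -> None:
        if self.stored() + len(samples) > self.capacity:
            raise OverflowError("ring memory would overflow")
        for sample in samples:
            self.buf[self.write_count % self.capacity] = sample
            self.write_count += 1

    def read(self, count: int) -> list:
        count = min(count, self.stored())  # never read past the write position
        start = self.read_count
        out = [self.buf[(start + i) % self.capacity] for i in range(count)]
        self.read_count += count
        return out
```

In use, a writer would call `write` with each processed frame while a reader calls `read` at the constant output rate, and `stored()` would supply the value compared against the near-overflow and near-underflow limits.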

[0012] The signal processing means of the second or third speech speed converter may comprise, for example, mode discriminating means and first to fourth processing means. The mode discriminating means determines, from the output of the section discriminating means and the output of the stored-amount calculating means, which of the following modes applies:
(1) a first mode, in which the input speech is a voice section and the ring memory is not in the near-overflow state;
(2) a second mode, in which the input speech is a voice section and the ring memory is in the near-overflow state;
(3) a third mode, in which the input speech is a silent section, the duration of the silent section is less than a predetermined silence deletion start point threshold, and the ring memory is not in the near-overflow state;
(4) a fourth mode, in which the input speech is a silent section, the duration of the silent section is less than the predetermined silence deletion start point threshold, and the ring memory is in the near-overflow state;
(5) a fifth mode, in which the input speech is a silent section, the duration of the silent section is equal to or greater than the predetermined silence deletion start point threshold, and the ring memory is not in the near-underflow state; and
(6) a sixth mode, in which the input speech is a silent section, the duration of the silent section is equal to or greater than the predetermined silence deletion start point threshold, and the ring memory is in the near-underflow state.
The first processing means, when the first or third mode is determined, performs compression/expansion at a compression rate that is at least 1/n, where n is the set playback speed multiplier, and that is determined according to the program type set by the operator. The second processing means, when the second or fourth mode is determined, deletes the audio signal until the amount stored in the ring memory reaches the near-underflow state. The third processing means, when the fifth mode is determined, deletes the audio signal of the silent section. The fourth processing means, when the sixth mode is determined, performs compression/expansion at a compression rate of 1/n ± α, where n is the set playback speed multiplier and α is a value from 0 to 1 inclusive.
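The six-mode determination described above can be sketched as a small decision function. The function name, the parameter names, and the use of a frame count for the silence duration are assumptions made for illustration; the mode conditions and the mode-to-processing mapping follow the text.

```python
from enum import Enum


class Action(Enum):
    COMPRESS_BY_PROGRAM_TYPE = "first processing means"
    DELETE_UNTIL_NEAR_UNDERFLOW = "second processing means"
    DELETE_SILENCE = "third processing means"
    COMPRESS_AT_ONE_OVER_N = "fourth processing means"


def classify_mode(is_voice: bool, silence_frames: int, threshold_frames: int,
                  near_overflow: bool, near_underflow: bool) -> int:
    """Return the mode number (1 to 6) for the current frame."""
    if is_voice:
        return 2 if near_overflow else 1
    if silence_frames < threshold_frames:
        # Short silence: treated like speech, only overflow matters.
        return 4 if near_overflow else 3
    # Long silence: a deletion candidate, only underflow matters.
    return 6 if near_underflow else 5


# Which processing means handles each mode, as stated in the text.
ACTION_FOR_MODE = {
    1: Action.COMPRESS_BY_PROGRAM_TYPE,
    3: Action.COMPRESS_BY_PROGRAM_TYPE,
    2: Action.DELETE_UNTIL_NEAR_UNDERFLOW,
    4: Action.DELETE_UNTIL_NEAR_UNDERFLOW,
    5: Action.DELETE_SILENCE,
    6: Action.COMPRESS_AT_ONE_OVER_N,
}
```

Note that the near-overflow flag is consulted only for voice sections and short silences, and the near-underflow flag only for long silences, so each frame maps to exactly one mode.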

[0013] The silence deletion start point threshold may be adjusted according to the amount stored in the ring memory.
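One way such an adjustment might look is sketched below. The linear rule, its direction (a fuller ring memory shortens the threshold so that silence is deleted sooner), the frame-count units, and the default limits are all assumptions; the text states only that the threshold may depend on the stored amount.

```python
def silence_deletion_threshold(stored: int, capacity: int,
                               max_frames: int = 20, min_frames: int = 2) -> int:
    """Return a silence deletion start point threshold in frames.

    Hypothetical rule: the threshold falls linearly from max_frames
    (empty ring memory) to min_frames (full ring memory).
    """
    if not 0 <= stored <= capacity:
        raise ValueError("stored amount must lie between 0 and capacity")
    fill = stored / capacity
    return round(max_frames - fill * (max_frames - min_frames))
```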

[0014] A fourth speech speed converter according to the present invention comprises speech speed conversion processing means that performs speech speed conversion on an input audio signal, a ring memory to which the output of the speech speed conversion processing means is written, and means for reading data from the ring memory at a constant speed. The speech speed conversion processing means includes means that, when the input audio signal is a voice section and the ring memory is not in the near-overflow state, performs compression/expansion on the input audio signal at a compression rate that is at least 1/n, where n is the set playback speed multiplier, and that is determined according to the program type set by the operator and the amount stored in the ring memory.
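A compression rate that depends on both the program type and the stored amount could be sketched as follows. Here `program_rate` stands for a rate already chosen from the program type, and the linear pull toward 1/n as the ring memory fills is an assumption made for this sketch; the specification does not give the formula at this point.

```python
def variable_compression_rate(n: float, program_rate: float,
                              stored: int, capacity: int) -> float:
    """Blend the program-type rate toward the lower bound 1/n as the memory fills.

    Hypothetical rule: an empty ring memory uses program_rate unchanged,
    and a full one uses exactly 1/n.
    """
    lower = 1.0 / n
    if program_rate < lower:
        raise ValueError("program_rate must be at least 1/n")
    if not 0 <= stored <= capacity:
        raise ValueError("stored amount must lie between 0 and capacity")
    fill = stored / capacity
    return program_rate - fill * (program_rate - lower)
```

Because the result never drops below 1/n, the sketch keeps the stated lower bound while letting the rate relax when the ring memory has room.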

[0015] A fifth speech speed converter according to the present invention comprises: A/D conversion means that samples an input analog audio signal at a sampling frequency corresponding to the set playback speed multiplier; a frame memory that receives the audio signal output by the A/D conversion means; speech speed conversion processing means that performs speech speed conversion on the audio signals each time a required number of audio signals has been input to the frame memory; a ring memory to which the output of the speech speed conversion processing means is written; reading means that reads data from the ring memory based on a read signal whose frequency equals the sampling frequency used for 1x playback; and stored-amount calculating means that calculates the amount stored in the ring memory from the write signal and the read signal of the ring memory. The speech speed conversion processing means comprises section discriminating means that determines whether the input speech corresponding to the required number of audio signals input to the frame memory is a voice section or a silent section, and signal processing means that performs compression/expansion or deletion on the required number of audio signals according to the output of the section discriminating means and the output of the stored-amount calculating means. The signal processing means includes means that, when the input speech is a voice section and the ring memory is not in the near-overflow state, performs compression/expansion at a compression rate that is at least 1/n, where n is the set playback speed multiplier, and that is determined according to the program type set by the operator and the amount stored in the ring memory.

[0016] A sixth speech speed converter according to the present invention comprises: a frame memory to which an input digital audio signal is written at a speed corresponding to the set playback speed multiplier; speech speed conversion processing means that performs speech speed conversion on the audio signals each time a required number of audio signals has been input to the frame memory; a ring memory to which the output of the speech speed conversion processing means is written; reading means that reads data from the ring memory at a constant speed; and stored-amount calculating means that calculates the amount stored in the ring memory from the write signal and the read signal of the ring memory. The speech speed conversion processing means comprises section discriminating means that determines whether the input speech corresponding to the required number of audio signals input to the frame memory is a voice section or a silent section, and signal processing means that performs compression/expansion or deletion on the required number of audio signals according to the output of the section discriminating means and the output of the stored-amount calculating means. The signal processing means includes means that, when the input speech is a voice section and the ring memory is not in the near-overflow state, performs compression/expansion at a compression rate that is at least 1/n, where n is the set playback speed multiplier, and that is determined according to the program type set by the operator and the amount stored in the ring memory.

[0017] The signal processing means of the fifth or sixth speech speed converter may comprise, for example, mode discriminating means and first to fourth processing means. The mode discriminating means determines, from the output of the section discriminating means and the output of the stored-amount calculating means, which of the following modes applies:
(1) a first mode, in which the input speech is a voice section and the ring memory is not in the near-overflow state;
(2) a second mode, in which the input speech is a voice section and the ring memory is in the near-overflow state;
(3) a third mode, in which the input speech is a silent section, the duration of the silent section is less than a predetermined silence deletion start point threshold, and the ring memory is not in the near-overflow state;
(4) a fourth mode, in which the input speech is a silent section, the duration of the silent section is less than the predetermined silence deletion start point threshold, and the ring memory is in the near-overflow state;
(5) a fifth mode, in which the input speech is a silent section, the duration of the silent section is equal to or greater than the predetermined silence deletion start point threshold, and the ring memory is not in the near-underflow state; and
(6) a sixth mode, in which the input speech is a silent section, the duration of the silent section is equal to or greater than the predetermined silence deletion start point threshold, and the ring memory is in the near-underflow state.
The first processing means, when the first or third mode is determined, performs compression/expansion at a compression rate that is at least 1/n, where n is the set playback speed multiplier, and that is determined according to the program type set by the operator and the amount stored in the ring memory. The second processing means, when the second or fourth mode is determined, deletes the audio signal until the amount stored in the ring memory reaches the near-underflow state. The third processing means, when the fifth mode is determined, deletes the audio signal of the silent section. The fourth processing means, when the sixth mode is determined, performs compression/expansion at a compression rate of 1/n ± α, where n is the set playback speed multiplier and α is a value from 0 to 1 inclusive.

[0018] The silence deletion start point threshold may be adjusted according to the amount stored in the ring memory.

[0019] A seventh speech speed converter according to the present invention comprises: A/D conversion means that samples an input analog audio signal at a sampling frequency corresponding to the set playback speed multiplier; a frame memory that receives the audio signal output by the A/D conversion means; speech speed conversion processing means that performs speech speed conversion on the audio signals each time a required number of audio signals has been input to the frame memory; a ring memory to which the output of the speech speed conversion processing means is written; reading means that reads data from the ring memory based on a read signal whose frequency equals the sampling frequency used for 1x playback; and stored-amount calculating means that calculates the amount stored in the ring memory from the write signal and the read signal of the ring memory. The speech speed conversion processing means comprises section discriminating means that determines whether the input speech corresponding to the required number of audio signals input to the frame memory is a voice section or a silent section, and signal processing means that performs compression/expansion or deletion on the required number of audio signals according to the output of the section discriminating means and the output of the stored-amount calculating means. The signal processing means includes means that operates when the input speech is a voice section and the ring memory is not in the near-overflow state. When the fixed compression rate mode is selected, it performs compression/expansion at a compression rate that is at least 1/n, where n is the set playback speed multiplier, and that is determined according to the program type set by the operator. When the variable compression rate mode is selected, it performs compression/expansion at a compression rate that is at least 1/n and that is determined according to the program type set by the operator and the amount stored in the ring memory.
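The switch between the fixed and variable compression rate modes might be sketched as below. The enum, the function name, and the use of a fill fraction between 0 and 1 are hypothetical, and the linear blend in the variable branch is an assumed rule, not one given in the specification.

```python
from enum import Enum


class RateMode(Enum):
    FIXED = "fixed compression rate mode"
    VARIABLE = "variable compression rate mode"


def select_compression_rate(mode: RateMode, n: float, program_rate: float,
                            fill_fraction: float) -> float:
    """Pick the compression rate for a voice section (ring memory not near overflow).

    program_rate: rate chosen from the operator's program type, at least 1/n.
    fill_fraction: stored amount divided by ring memory capacity, in [0, 1].
    """
    lower = 1.0 / n
    if program_rate < lower:
        raise ValueError("program_rate must be at least 1/n")
    if not 0.0 <= fill_fraction <= 1.0:
        raise ValueError("fill_fraction must lie in [0, 1]")
    if mode is RateMode.FIXED:
        # Fixed mode: the program type alone decides the rate.
        return program_rate
    # Variable mode (assumed rule): approach 1/n as the ring memory fills.
    return program_rate - fill_fraction * (program_rate - lower)
```

For n = 2 and a program rate of 0.75, the fixed mode returns 0.75 regardless of fill, while the variable mode returns 0.75 for an empty ring memory and 0.5 for a full one.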

[0020] An eighth speech speed converter according to the present invention comprises: a frame memory to which an input digital audio signal is written at a speed corresponding to the set playback speed multiplier; speech speed conversion processing means that performs speech speed conversion on the audio signals each time a required number of audio signals has been input to the frame memory; a ring memory to which the output of the speech speed conversion processing means is written; reading means that reads data from the ring memory at a constant speed; and stored-amount calculating means that calculates the amount stored in the ring memory from the write signal and the read signal of the ring memory. The speech speed conversion processing means comprises section discriminating means that determines whether the input speech corresponding to the required number of audio signals input to the frame memory is a voice section or a silent section, and signal processing means that performs compression/expansion or deletion on the required number of audio signals according to the output of the section discriminating means and the output of the stored-amount calculating means. The signal processing means includes means that operates when the input speech is a voice section and the ring memory is not in the near-overflow state. When the fixed compression rate mode is selected, it performs compression/expansion at a compression rate that is at least 1/n, where n is the set playback speed multiplier, and that is determined according to the program type set by the operator. When the variable compression rate mode is selected, it performs compression/expansion at a compression rate that is at least 1/n and that is determined according to the program type set by the operator and the amount stored in the ring memory.

【００２１】As the signal processing means of the seventh or eighth speech speed conversion device, one comprising the following mode discriminating means and first to fourth processing means is used, for example. The mode discriminating means determines, based on the output of the section discriminating means and the output of the storage amount calculating means, which of the following modes applies: (1) a first mode in which the input voice is a voice section and the ring memory is not in the state immediately before overflow; (2) a second mode in which the input voice is a voice section and the ring memory is in the state immediately before overflow; (3) a third mode in which the input voice is a silent section, the duration of the silent section is less than a predetermined silence deletion start point discrimination value, and the ring memory is not in the state immediately before overflow; (4) a fourth mode in which the input voice is a silent section, the duration of the silent section is less than the predetermined silence deletion start point discrimination value, and the ring memory is in the state immediately before overflow; (5) a fifth mode in which the input voice is a silent section, the duration of the silent section is equal to or greater than the predetermined silence deletion start point discrimination value, and the ring memory is not in the state immediately before underflow; and (6) a sixth mode in which the input voice is a silent section, the duration of the silent section is equal to or greater than the predetermined silence deletion start point discrimination value, and the ring memory is in the state immediately before underflow. The first processing means operates when the first mode or the third mode is determined. When the fixed compression rate mode is selected, it performs compression/expansion processing at a compression rate of 1/n or more, where n is the set reproduction speed magnification, the rate being determined according to the program type set by the operator. When the compression rate fluctuation mode is selected, it performs compression/expansion processing at a compression rate of 1/n or more that is determined according to the program type set by the operator and the storage amount of the ring memory. The second processing means, when the second mode or the fourth mode is determined, deletes the audio signal until the storage amount of the ring memory reaches the state immediately before underflow. The third processing means, when the fifth mode is determined, deletes the audio signal of the silent section. The fourth processing means, when the sixth mode is determined, performs compression/expansion processing at a compression rate of 1/n ± α, where n is the set reproduction speed magnification and α is a value of 0 or more and 1 or less.
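The six-way mode decision described above can be sketched as a small lookup. This is a minimal illustration only: the names `Mode`, `ACTION`, and `classify_mode`, and the representation of the ring memory state as two booleans, are assumptions made for the example and do not appear in the patent.

```python
from enum import IntEnum


class Mode(IntEnum):
    VOICE = 1                        # voice section, not near overflow
    VOICE_NEAR_OVERFLOW = 2          # voice section, near overflow
    SHORT_SILENCE = 3                # silence < t_del, not near overflow
    SHORT_SILENCE_NEAR_OVERFLOW = 4  # silence < t_del, near overflow
    LONG_SILENCE = 5                 # silence >= t_del, not near underflow
    LONG_SILENCE_NEAR_UNDERFLOW = 6  # silence >= t_del, near underflow


# Processing means applied in each mode (labels are hypothetical).
ACTION = {
    Mode.VOICE: "compress_expand",                                    # first means
    Mode.SHORT_SILENCE: "compress_expand",                            # first means
    Mode.VOICE_NEAR_OVERFLOW: "delete_until_near_underflow",          # second means
    Mode.SHORT_SILENCE_NEAR_OVERFLOW: "delete_until_near_underflow",  # second means
    Mode.LONG_SILENCE: "delete_silence",                              # third means
    Mode.LONG_SILENCE_NEAR_UNDERFLOW: "compress_expand_1_over_n_pm_alpha",  # fourth means
}


def classify_mode(is_voice, silence_len, t_del, near_overflow, near_underflow):
    """Return the mode for one frame.

    silence_len and t_del must use the same unit (for example, frames).
    """
    if is_voice:
        return Mode.VOICE_NEAR_OVERFLOW if near_overflow else Mode.VOICE
    if silence_len < t_del:
        if near_overflow:
            return Mode.SHORT_SILENCE_NEAR_OVERFLOW
        return Mode.SHORT_SILENCE
    if near_underflow:
        return Mode.LONG_SILENCE_NEAR_UNDERFLOW
    return Mode.LONG_SILENCE
```

Note that the overflow flag only matters for voice and short-silence frames, and the underflow flag only for long-silence frames, which mirrors the wording of the six mode definitions.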

【００２２】The silence deletion start point discrimination value may be adjusted according to the storage amount of the ring memory.

【００２３】In the second, third, fifth, sixth, seventh, or eighth speech speed conversion device, the section discriminating means may comprise, for example, means for calculating the power average value of the required number of audio signals input to the frame memory, and discriminating means for determining whether the input voice is a voice section or a silent section based on the calculated power average value and a given threshold value. The threshold value may be adjusted according to the storage amount of the ring memory.

【００２４】The section discriminating means may instead comprise, for example, means for calculating the power cumulative value of the required number of audio signals input to the frame memory, and discriminating means for determining whether the input voice is a voice section or a silent section based on the calculated power cumulative value and a given threshold value. The threshold value may be adjusted according to the storage amount of the ring memory.

【００２５】The section discriminating means may instead comprise, for example, means for calculating the amplitude average value of the required number of audio signals input to the frame memory, and discriminating means for determining whether the input voice is a voice section or a silent section based on the calculated amplitude average value and a given threshold value. The threshold value may be adjusted according to the storage amount of the ring memory.

【００２６】The section discriminating means may instead comprise, for example, means for calculating the amplitude cumulative value of the required number of audio signals input to the frame memory, and discriminating means for determining whether the input voice is a voice section or a silent section based on the calculated amplitude cumulative value and a given threshold value. The threshold value may be adjusted according to the storage amount of the ring memory.
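The four frame statistics named in the preceding paragraphs can be sketched as follows. This is an illustration under stated assumptions: "power" is taken as the squared sample value and "amplitude" as the absolute sample value, and a frame is classed as voice when the statistic is at or above the threshold. The function names are hypothetical.

```python
def power_average(frame):
    """Mean of squared samples (assumed definition of the power average value)."""
    return sum(x * x for x in frame) / len(frame)


def power_cumulative(frame):
    """Sum of squared samples (assumed definition of the power cumulative value)."""
    return sum(x * x for x in frame)


def amplitude_average(frame):
    """Mean of absolute samples (assumed definition of the amplitude average value)."""
    return sum(abs(x) for x in frame) / len(frame)


def amplitude_cumulative(frame):
    """Sum of absolute samples (assumed definition of the amplitude cumulative value)."""
    return sum(abs(x) for x in frame)


def is_voice_section(statistic, threshold):
    """Classify a frame: voice when the statistic reaches the threshold."""
    return statistic >= threshold
```

The cumulative variants avoid a division per frame. Because the frame length is fixed, they are equivalent to the averaged variants once the threshold is scaled by the frame length.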

【００２７】The section discriminating means may instead comprise, for example, detecting means for detecting the periodicity of the required number of audio signals input to the frame memory, and discriminating means for determining whether the input voice is a voice section or a silent section based on the detected period.

【００２８】The section discriminating means may instead comprise, for example, calculating means for calculating the power spectrum, in one or more predetermined frequency bands, of the required number of audio signals input to the frame memory, and discriminating means for determining whether the input voice is a voice section or a silent section based on the calculated power spectrum and a given threshold value. The threshold value may be adjusted according to the storage amount of the ring memory.

【００２９】[0029]

OPERATION: In the first speech speed conversion device according to the present invention, the input audio signal is subjected to speech speed conversion processing by the speech speed conversion processing means. The output of the speech speed conversion processing means is written to the ring memory, and the data written to the ring memory is read out at a constant speed. In the speech speed conversion processing means, when the input audio signal is a voice section and the ring memory is not in the state immediately before overflow, compression/expansion processing is performed on the input audio signal at a compression rate of 1/n or more, where n is the set reproduction speed magnification, the rate being determined according to the program type set by the operator.
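One way to picture a program-type-dependent compression rate that never falls below 1/n is the sketch below. Everything specific in it is an assumption for illustration: the program type names, the weights in `PROGRAM_WEIGHT`, and the linear blend between 1/n and 1 are hypothetical and are not taken from the patent. The sketch also assumes the compression rate is the ratio of output length to input length.

```python
# Hypothetical weights: 0.0 keeps the rate at 1/n (fast-spoken material),
# 1.0 leaves the speech uncompressed (slow-spoken material).
PROGRAM_WEIGHT = {"news": 0.0, "drama": 0.5, "lecture": 1.0}


def fixed_mode_rate(n, program_type):
    """Compression rate in [1/n, 1] chosen from the program type."""
    low = 1.0 / n
    weight = PROGRAM_WEIGHT[program_type]
    return low + weight * (1.0 - low)
```

With n = 2, this gives 0.5 for "news", 0.75 for "drama", and 1.0 for "lecture", so every result satisfies the 1/n lower bound stated above.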

【００３０】In the second speech speed conversion device according to the present invention, the input analog audio signal is sampled by the A/D conversion means at a sampling frequency corresponding to the set reproduction speed magnification. The audio signal output from the A/D conversion means is input to the frame memory. Each time the required number of audio signals has been input to the frame memory, the speech speed conversion processing means performs speech speed conversion processing on those audio signals. The output of the speech speed conversion processing means is written to the ring memory. The data written to the ring memory is read out based on a read signal whose frequency equals the sampling frequency used for 1x-speed reproduction. The storage amount calculating means calculates the storage amount of the ring memory based on the write signal and the read signal of the ring memory.
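The balance between writing and reading can be made concrete with a short calculation. This sketch assumes that the A/D sampling frequency is n times the 1x-speed frequency and that the compression rate scales the number of samples written. The function name and both assumptions are illustrative.

```python
def ring_memory_growth(fs_read_hz, n, rate):
    """Net words per second added to the ring memory (written minus read)."""
    written_per_s = fs_read_hz * n * rate  # samples arrive at n * fs, then are scaled by rate
    return written_per_s - fs_read_hz      # reading proceeds at the fixed 1x-speed frequency
```

At a rate of exactly 1/n the memory neither fills nor drains. Any rate above 1/n makes the storage amount grow during voice sections, which is why silent sections are deleted and overflow must be watched.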

【００３１】In the speech speed conversion processing means, the section discriminating means determines whether the input voice represented by the required number of audio signals input to the frame memory is a voice section or a silent section. Compression/expansion processing or deletion processing is then performed on those audio signals according to the output of the section discriminating means and the output of the storage amount calculating means. In the signal processing means, when the input voice is a voice section and the ring memory is not in the state immediately before overflow, compression/expansion processing is performed at a compression rate of 1/n or more, where n is the set reproduction speed magnification, the rate being determined according to the program type set by the operator.

【００３２】In the third speech speed conversion device according to the present invention, the input digital audio signal is written to the frame memory at a speed corresponding to the set reproduction speed magnification. Each time the required number of audio signals has been input to the frame memory, the speech speed conversion processing means performs speech speed conversion processing on those audio signals. The output of the speech speed conversion processing means is written to the ring memory. The data written to the ring memory is read out at a constant speed based on a read signal. The storage amount calculating means calculates the storage amount of the ring memory based on the write signal and the read signal of the ring memory.

【００３３】In the speech speed conversion processing means, the section discriminating means determines whether the input voice represented by the required number of audio signals input to the frame memory is a voice section or a silent section. Compression/expansion processing or deletion processing is then performed on those audio signals according to the output of the section discriminating means and the output of the storage amount calculating means. In the signal processing means, when the input voice is a voice section and the ring memory is not in the state immediately before overflow, compression/expansion processing is performed at a compression rate of 1/n or more, where n is the set reproduction speed magnification, the rate being determined according to the program type set by the operator.

【００３４】In the fourth speech speed conversion device according to the present invention, the input audio signal is subjected to speech speed conversion processing by the speech speed conversion processing means. The output of the speech speed conversion processing means is written to the ring memory, and the data written to the ring memory is read out at a constant speed. In the speech speed conversion processing means, when the input audio signal is a voice section and the ring memory is not in the state immediately before overflow, compression/expansion processing is performed on the input audio signal at a compression rate of 1/n or more, where n is the set reproduction speed magnification, the rate being determined according to the program type set by the operator and the storage amount of the ring memory.
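A rate that also depends on the storage amount could, for instance, slide from a program-type rate toward 1/n as the ring memory fills. The linear rule below is purely an assumed illustration of that idea; the patent text here does not specify the mapping, and the names `fluctuating_rate` and `base_rate` are hypothetical.

```python
def fluctuating_rate(n, base_rate, stored, capacity):
    """Compression rate that approaches 1/n as the ring memory fills.

    base_rate is the (hypothetical) rate chosen from the program type,
    stored is the current storage amount, capacity is the memory size in words.
    """
    low = 1.0 / n
    base = max(base_rate, low)                     # never start below the 1/n bound
    fill = min(max(stored / capacity, 0.0), 1.0)   # clamp fill level to [0, 1]
    return low + (1.0 - fill) * (base - low)
```

Writing the result as `low` plus a non-negative term keeps the value at or above 1/n for any fill level.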

【００３５】In the fifth speech speed conversion device according to the present invention, the input analog audio signal is sampled by the A/D conversion means at a sampling frequency corresponding to the set reproduction speed magnification. The audio signal output from the A/D conversion means is input to the frame memory. Each time the required number of audio signals has been input to the frame memory, the speech speed conversion processing means performs speech speed conversion processing on those audio signals. The output of the speech speed conversion processing means is written to the ring memory. The data written to the ring memory is read out based on a read signal whose frequency equals the sampling frequency used for 1x-speed reproduction. The storage amount calculating means calculates the storage amount of the ring memory based on the write signal and the read signal of the ring memory.

【００３６】In the speech speed conversion processing means, the section discriminating means determines whether the input voice represented by the required number of audio signals input to the frame memory is a voice section or a silent section. Compression/expansion processing or deletion processing is then performed on those audio signals according to the output of the section discriminating means and the output of the storage amount calculating means. In the signal processing means, when the input voice is a voice section and the ring memory is not in the state immediately before overflow, compression/expansion processing is performed at a compression rate of 1/n or more, where n is the set reproduction speed magnification, the rate being determined according to the program type set by the operator and the storage amount of the ring memory.

【００３７】In the sixth speech speed conversion device according to the present invention, the input digital audio signal is written to the frame memory at a speed corresponding to the set reproduction speed magnification. Each time the required number of audio signals has been input to the frame memory, the speech speed conversion processing means performs speech speed conversion processing on those audio signals. The output of the speech speed conversion processing means is written to the ring memory. The data written to the ring memory is read out at a constant speed based on a read signal. The storage amount calculating means calculates the storage amount of the ring memory based on the write signal and the read signal of the ring memory.

【００３８】In the speech speed conversion processing means, the section discriminating means determines whether the input voice represented by the required number of audio signals input to the frame memory is a voice section or a silent section. Compression/expansion processing or deletion processing is then performed on those audio signals according to the output of the section discriminating means and the output of the storage amount calculating means. In the signal processing means, when the input voice is a voice section and the ring memory is not in the state immediately before overflow, compression/expansion processing is performed at a compression rate of 1/n or more, where n is the set reproduction speed magnification, the rate being determined according to the program type set by the operator and the storage amount of the ring memory.

【００３９】In the seventh speech speed conversion device according to the present invention, the input analog audio signal is sampled by the A/D conversion means at a sampling frequency corresponding to the set reproduction speed magnification. The audio signal output from the A/D conversion means is input to the frame memory. Each time the required number of audio signals has been input to the frame memory, the speech speed conversion processing means performs speech speed conversion processing on those audio signals. The output of the speech speed conversion processing means is written to the ring memory. The data written to the ring memory is read out based on a read signal whose frequency equals the sampling frequency used for 1x-speed reproduction. The storage amount calculating means calculates the storage amount of the ring memory based on the write signal and the read signal of the ring memory.

【００４０】In the speech speed conversion processing means, the section discriminating means determines whether the input voice represented by the required number of audio signals input to the frame memory is a voice section or a silent section. Compression/expansion processing or deletion processing is then performed on those audio signals according to the output of the section discriminating means and the output of the storage amount calculating means. In the signal processing means, when the input voice is a voice section and the ring memory is not in the state immediately before overflow, the processing depends on the selected mode. When the fixed compression rate mode is selected, compression/expansion processing is performed at a compression rate of 1/n or more, where n is the set reproduction speed magnification, the rate being determined according to the program type set by the operator. When the compression rate fluctuation mode is selected, compression/expansion processing is performed at a compression rate of 1/n or more that is determined according to the program type set by the operator and the storage amount of the ring memory.

【００４１】In the eighth speech speed conversion device according to the present invention, the input digital audio signal is written to the frame memory at a speed corresponding to the set reproduction speed magnification. Each time the required number of audio signals has been input to the frame memory, the speech speed conversion processing means performs speech speed conversion processing on those audio signals. The output of the speech speed conversion processing means is written to the ring memory. The data written to the ring memory is read out at a constant speed based on a read signal. The storage amount calculating means calculates the storage amount of the ring memory based on the write signal and the read signal of the ring memory.

【００４２】In the speech speed conversion processing means, the section discriminating means determines whether the input voice represented by the required number of audio signals input to the frame memory is a voice section or a silent section. Compression/expansion processing or deletion processing is then performed on those audio signals according to the output of the section discriminating means and the output of the storage amount calculating means. In the signal processing means, when the input voice is a voice section and the ring memory is not in the state immediately before overflow, the processing depends on the selected mode. When the fixed compression rate mode is selected, compression/expansion processing is performed at a compression rate of 1/n or more, where n is the set reproduction speed magnification, the rate being determined according to the program type set by the operator. When the compression rate fluctuation mode is selected, compression/expansion processing is performed at a compression rate of 1/n or more that is determined according to the program type set by the operator and the storage amount of the ring memory.

【００４３】[0043]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS: An embodiment in which the present invention is applied to a VTR is described below with reference to the drawings.

【００４４】FIG. 1 shows the overall configuration of the speech speed conversion device.

【００４５】The input audio signal is amplified by the ALC amplifier 1 and then sent to the A/D converter 2, where it is converted into, for example, a 12-bit digital signal. The standard sampling frequency of the A/D converter 2 is, for example, 8 kHz. During 2x-speed reproduction, the sampling frequency fsAD of the A/D converter 2 is 16 kHz.

【００４６】The output of the A/D converter 2 is sent to the DSP (Digital Signal Processor) 4 and also to the level detector 3. When the data converted by the A/D converter 2 reaches the maximum value of the conversion range, the level detector 3 outputs an ALC (automatic level control) signal to the ALC amplifier 1. The gain of the ALC amplifier 1 is thereby controlled so that the input signal of the A/D converter 2 does not exceed the maximum range. That is, when the playback tape speed of the VTR changes, the input signal level of the ALC amplifier 1 also changes. The amplifier gain is therefore adjusted automatically based on the output of the level detector 3 so that the input signal of the A/D converter 2 does not exceed the maximum range.

【００４７】The DSP 4 includes a frame memory 5 with a capacity for storing two frames of audio signals, and a speech speed conversion unit 6 that performs speech speed conversion processing, frame by frame, on the audio signals stored in the frame memory 5. Here, one frame is assumed to consist of 200 items of sampling data.

【００４８】The frame memory 5 is divided into a first-half area and a second-half area. While the speech speed conversion unit 6 processes the one frame of audio signals stored in one of these areas, signals from the A/D converter 2 are accumulated in the other area. When one frame of signals has accumulated in that other area, the speech speed conversion unit 6 processes the data in it, and at the same time signals from the A/D converter 2 are accumulated in the first area, whose data has already been processed.
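The alternating use of the two halves of the frame memory is a double-buffering (ping-pong) scheme. A minimal sketch follows, assuming samples arrive one at a time. The class name `FrameMemory` and its `push` method are hypothetical, and real hardware would process one half concurrently with filling the other rather than sequentially as here.

```python
FRAME_LEN = 200  # samples per frame, as stated for the embodiment


class FrameMemory:
    """Two-half buffer: one half fills while the other is handed off for processing."""

    def __init__(self, frame_len=FRAME_LEN):
        self.frame_len = frame_len
        self.halves = [[], []]
        self.write_idx = 0  # index of the half currently being filled

    def push(self, sample):
        """Store one sample; return a completed frame when a half fills, else None."""
        half = self.halves[self.write_idx]
        half.append(sample)
        if len(half) < self.frame_len:
            return None
        # Switch to the other half, which held the previously processed frame.
        self.write_idx ^= 1
        self.halves[self.write_idx] = []
        return half
```

A caller would pass each returned frame to the speech speed conversion step while continuing to push new samples into the other half.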

【００４９】The data output from the speech speed conversion unit 6 is written to the ring memory 7 based on a write clock. The data written to the ring memory 7 is read out based on a read clock. The signal read from the ring memory 7 is converted into an analog signal by the D/A converter 8, amplified by the amplifier 10, and output as the audio output signal.

【００５０】The sampling frequency fsDA of the D/A converter 8 is 8 kHz. The frequency of the read clock of the ring memory 7 is also 8 kHz. A memory of 21845 × 12 bits, that is, 21845 words, is used as the ring memory 7. Accordingly, the maximum time for which data can be held in the ring memory 7 (the maximum delay of the output relative to the input signal) is 21845 × 1/8000 = 2.73 seconds.
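The delay figure follows directly from the memory size and the read clock. The short check below uses only the two numbers given in the text; the variable names are illustrative.

```python
RING_WORDS = 21845      # ring memory capacity in 12-bit words
READ_CLOCK_HZ = 8000    # read clock frequency

# One word is read per clock, so a full memory takes this long to drain.
max_delay_s = RING_WORDS / READ_CLOCK_HZ
print(f"{max_delay_s:.2f} s")  # prints "2.73 s"
```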

【００５１】The write clock for the ring memory 7 is input to the up-count input terminal (UP) of the up/down counter 9, and the read clock for the ring memory 7 is input to its down-count input terminal (DOWN). The up/down counter 9 counts the difference between the total number of write clocks and the total number of read clocks input to it (that is, the storage amount of the ring memory 7) and outputs the count as a 15-bit digital signal. The output of the up/down counter 9 is sent to the speech speed conversion unit 6.
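The role of the up/down counter can be sketched in software as a counter incremented on each write clock and decremented on each read clock. The class and method names are hypothetical, and clamping at 0 and at the 15-bit maximum is an assumption made so the example stays well defined; the text does not describe the counter's behavior at its limits.

```python
class UpDownCounter:
    """Tracks ring memory storage amount as (write clocks) - (read clocks)."""

    def __init__(self, bits=15):
        self.max_count = (1 << bits) - 1  # 32767 for a 15-bit output
        self.count = 0

    def write_clock(self):
        # UP input: one word was written to the ring memory.
        self.count = min(self.count + 1, self.max_count)

    def read_clock(self):
        # DOWN input: one word was read from the ring memory.
        self.count = max(self.count - 1, 0)
```

A 15-bit output is sufficient because the ring memory capacity of 21845 words is below 2^15 = 32768.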

【００５２】FIG. 2 shows the detailed configuration of the speech speed conversion unit 6.

【００５３】The audio signal read from the frame memory 5 is sent to the power calculator 11, which calculates the average power value P of one frame of audio signals. Letting the amplitudes of the sampled audio signals in one frame be i_0, i_1, ..., i_{N-1} (where N = 200), the average power value P is obtained by Equation 1 below.

【００５４】[0054]

【数１】 (Equation 1)
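The body of Equation 1 does not survive in this text. A plausible form, assuming the conventional mean-square definition of average power over the N samples named above (an assumption, not confirmed by the source), would be:

```latex
P = \frac{1}{N} \sum_{k=0}^{N-1} i_k^{2}, \qquad N = 200
```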

【００５５】The average power value P obtained by the power calculator 11 is sent to the comparison unit 12. The threshold value Th is supplied to the comparison unit 12 from the threshold value memory 13, and the comparison unit 12 determines whether the average power value P is equal to or greater than the threshold value Th (P ≥ Th) or smaller than it (P < Th). When P ≥ Th, the comparison unit 12 outputs a signal indicating that the current frame is a voice section; when P < Th, it outputs a signal indicating that the current frame is a silent section.

【００５６】When the number of quantization bits of the A/D converter 2 is 12, the threshold value Th is set to, for example, 2¹². The threshold value Th may also be changed as follows. As shown by the dotted line in FIG. 2, a power steady state detection and threshold updating unit 14 is provided. This unit determines whether the average power value P from the power calculator 11 has remained constant over a predetermined number of frames (for example, 40 frames). If it has (steady state), the unit writes a value twice the average power value P at that time into the threshold value memory 13, thereby updating the threshold value Th. The updated threshold value is, however, limited to a predetermined maximum, for example 2¹⁴. In this way, noise that occurs steadily can be treated as a silent section.
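The threshold update rule can be sketched as below. The numeric values (initial threshold 2^12, 40 frames, doubling, ceiling 2^14) come from the text. The class name, the `tolerance` parameter used to decide what counts as "constant", and the choice to compare each frame against the first frame of the current run are assumptions for the example.

```python
class ThresholdUpdater:
    """Raises the voice/silence threshold when the frame power stays steady."""

    def __init__(self, initial=2**12, steady_frames=40, ceiling=2**14, tolerance=0.0):
        self.threshold = initial
        self.steady_frames = steady_frames
        self.ceiling = ceiling
        self.tolerance = tolerance  # hypothetical: allowed deviation within a steady run
        self._ref = None            # power of the first frame in the current run
        self._run = 0               # number of consecutive steady frames

    def update(self, p):
        """Feed one frame's average power; return the current threshold."""
        if self._ref is not None and abs(p - self._ref) <= self.tolerance:
            self._run += 1
        else:
            self._ref = p
            self._run = 1
        if self._run >= self.steady_frames:
            # Steady state: threshold becomes twice the power, capped at the ceiling.
            self.threshold = min(2 * p, self.ceiling)
        return self.threshold
```

With steady background noise of power 1000, the threshold drops from 4096 to 2000 on the 40th steady frame, so that noise is still classed as silence while quieter speech above it is not.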

【００５７】Alternatively, the voice sections and silent sections of the input signal may be discriminated based on the power cumulative value Pa of the audio signals of each frame, given by Equation 2 below, and a given threshold value.

【００５８】[0058]

【数２】 (Equation 2)
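The body of Equation 2 does not survive in this text either. Assuming the power cumulative value is the sum of squared amplitudes over the frame, using the same symbols i_k and N as for the average power value (an assumption, not confirmed by the source), a plausible form would be:

```latex
P_a = \sum_{k=0}^{N-1} i_k^{2}, \qquad N = 200
```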

【００５９】The output of the comparison unit 12 is sent to the conditional branch unit 15. The output of the ring memory storage amount state determination unit 16 is also input to the conditional branch unit 15. In addition, the audio signal from the frame memory 5 is sent to the conditional branch unit 15 via the power calculator 11. A pause continuation length setting memory 17 is also connected to the conditional branch unit 15. The pause continuation length Tdel (the silence deletion start point discrimination value), which determines the point at which deletion of a silent section starts, is set in the pause continuation length setting memory 17.

【００６０】Based on the storage amount sent from the up/down counter 9, the ring memory storage amount state determination unit 16 detects that the ring memory 7 has entered the state immediately before overflow, and that the ring memory 7 has entered the state immediately before underflow.

【００６１】つまり、オーバーフロー検出用データメモ
リ１８にはオーバーフロー検出用データＴｍａｘが、ア
ンダーフロー検出用データメモリ１９にはアンダーフロ
ー検出用データＴｍｉｎが、それぞれ記憶されている。
オーバーフロー検出用データＴｍａｘは、例えば、リン
グメモリ７の総ワード数（ＴＯＴＡＬ）２１８４５より
２００小さい値２１６４５に設定されている。アンダー
フロー検出用データＴｍｉｎは、例えば、２００に設定
されている。That is, the overflow detection data memory 18 stores overflow detection data Tmax, and the underflow detection data memory 19 stores underflow detection data Tmin.
The overflow detection data Tmax is set to, for example, 21645, which is 200 less than the total number of words (TOTAL) of the ring memory 7, 21845. The underflow detection data Tmin is set to, for example, 200.

【００６２】そして、アップダウンカウンタ９から送ら
れてきた蓄積量がオーバーフロー検出用データＴｍａｘ
以上になると、リングメモリ蓄積量状態判別部１６から
オーバーフロー直前検出信号が出力される。また、アッ
プダウンカウンタ９から送られてきた蓄積量がアンダー
フロー検出用データＴｍｉｎ以下になると、リングメモ
リ蓄積量状態判別部１６からアンダーフロー直前検出信
号が出力される。条件分岐部１５は、オーバーフロー直
前検出信号が入力されているときにはリングメモリ７が
オーバーフロー直前状態であると判別し、アンダーフロ
ー直前検出信号が入力されているときにはリングメモリ
７がアンダーフロー直前状態であると判別する。When the storage amount sent from the up/down counter 9 becomes equal to or greater than the overflow detection data Tmax, the ring memory storage amount state determination unit 16 outputs a detection signal indicating the state immediately before overflow. When the storage amount sent from the up/down counter 9 becomes equal to or less than the underflow detection data Tmin, the ring memory storage amount state determination unit 16 outputs a detection signal indicating the state immediately before underflow. The conditional branch unit 15 determines that the ring memory 7 is in the state immediately before overflow while the former signal is being input, and in the state immediately before underflow while the latter signal is being input.
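
The threshold comparison described above can be sketched as follows. This is only an illustrative sketch: the constants are the example values given in the text, while the names `RingState` and `ring_state` are hypothetical and do not appear in the specification.

```python
from enum import Enum

TOTAL_WORDS = 21845        # total words in ring memory 7 (example value from the text)
T_MAX = TOTAL_WORDS - 200  # overflow detection data Tmax (21645 in the example)
T_MIN = 200                # underflow detection data Tmin (example value)


class RingState(Enum):
    """Hypothetical labels for the three possible detection results."""
    NORMAL = "normal"
    NEAR_OVERFLOW = "immediately before overflow"
    NEAR_UNDERFLOW = "immediately before underflow"


def ring_state(stored_words: int, t_max: int = T_MAX, t_min: int = T_MIN) -> RingState:
    """Classify the storage amount reported by the up/down counter."""
    if stored_words >= t_max:   # "equal to or greater than Tmax"
        return RingState.NEAR_OVERFLOW
    if stored_words <= t_min:   # "equal to or less than Tmin"
        return RingState.NEAR_UNDERFLOW
    return RingState.NORMAL
```

Both comparisons are inclusive, matching the "equal to or greater than" and "equal to or less than" wording of the paragraph.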

【００６３】条件分岐部１５は、比較部１２から送られ
てくる音声区間または無音区間の判別信号と、リングメ
モリ蓄積量状態判別部１６から送られてくるリングメモ
リ状態に関する検出信号と、ポーズ継続長設定メモリ１
７に設定されているポーズ継続長Ｔｄｅｌとに基づい
て、以下の６つのケースに場合分けを行なう。そして、
それに応じて、マルチプレクサ２０を制御して、音声信
号を所定の処理部に送る。The conditional branch unit 15 distinguishes the following six cases on the basis of the voice/silence discrimination signal sent from the comparison unit 12, the detection signal on the ring memory state sent from the ring memory storage amount state determination unit 16, and the pause duration Tdel set in the pause duration setting memory 17. It then controls the multiplexer 20 accordingly so that the audio signal is sent to the appropriate processing unit.

【００６４】（１）第１ケース（ｃａｓｅ１）入力信号が音声区間であり、かつリングメモリ７がオー
バーフロー直前状態ではないと判別されたときには、第
１ケースとなる。(1) First Case (case 1) When it is determined that the input signal is in the voice section and the ring memory 7 is not in the state immediately before the overflow, the first case occurs.

【００６５】この場合には、音声信号は、マルチプレク
サ２０を介して、ピッチ圧縮伸長手段２３に送られる。
ピッチ圧縮伸長手段２３は、バリアブルスピーチコ
ントロール（ＶＳＣ）を行なうものであり、ＶＴＲの再
生速度倍率をｎとすると、入力信号に対して、圧縮率１
／ｎ以上の圧縮率αで伸長圧縮処理を行なう。圧縮率α
は、圧縮伸長率調整手段４２によって決定される。ここ
で用いられる伸長圧縮法としては、例えば、ポインター
移動量制御による重複加算法（Pointer Interval Contr
ol Overlap and Add : ＰＩＣＯＬＡ）、ＴＤＨＳ(Tim
e Domain Harmonic Scaling)法等がある。ピッチ伸長圧
縮手段２３で伸長圧縮処理が行なわれた信号は、デマル
チプレクサ２７を介してリングメモリ７に送られ、書き
込みクロックにしたがって、リングメモリ７に書き込ま
れる。In this case, the audio signal is sent to the pitch compression/expansion means 23 via the multiplexer 20. The pitch compression/expansion means 23 performs variable speech control (VSC): where n is the playback speed multiple of the VTR, it applies expansion/compression to the input signal at a compression ratio α equal to or greater than 1/n. The compression ratio α is determined by the compression/expansion ratio adjusting means 42. Expansion/compression methods that can be used here include, for example, the overlap-add method with pointer interval control (Pointer Interval Control Overlap and Add: PICOLA) and the TDHS (Time Domain Harmonic Scaling) method. The signal processed by the pitch expansion/compression means 23 is sent to the ring memory 7 via the demultiplexer 27 and written into the ring memory 7 in accordance with the write clock.

【００６６】ＶＴＲの２倍速再生時においては、Ａ／Ｄ
変換部２のサンプリング周波数ｆｓＡＤは１６ＫＨＺで
あり、Ｄ／Ａ変換部８のサンプリング周波数ｆｓＤＡは
８ＫＨＺである。このため、音程は元に戻されて出力さ
れる。During double-speed playback of the VTR, the sampling frequency fsAD of the A/D converter 2 is 16 kHz and the sampling frequency fsDA of the D/A converter 8 is 8 kHz. The pitch is therefore restored to its original value on output.

【００６７】従来の一般的な時間軸伸長圧縮において
は、２倍速再生時には圧縮率１／２で、圧縮される。言
い換えれば、２ピッチ周期が１ピッチ周期に間引かれ
る。このため、出力音声は標準音声速度の２倍速とな
る。つまり、２倍速再生の通常再生では、出力音声は標
準音声速度の２倍速となる。ただし、音程は元のままと
なる。In conventional time-axis expansion/compression, the signal is compressed at a compression ratio of 1/2 during double-speed playback. In other words, every two pitch periods are thinned to one. In ordinary double-speed playback, therefore, the output speech runs at twice the standard speech speed, although the pitch is unchanged.

【００６８】これに対し、図２の話速変換部６に設けら
れた上記ピッチ伸長圧縮手段２３では、圧縮率αは、ユ
ーザによって操作部（図示略）を用いて設定されたモー
ドおよびリングメモリ７の蓄積量の変化に基づいて、圧
縮伸長率調整手段４２によって決定される。ただし、圧
縮率αは、１／２以上の値である。In contrast, in the pitch expansion/compression means 23 provided in the speech speed conversion unit 6 of FIG. 2, the compression ratio α is determined by the compression/expansion ratio adjusting means 42 on the basis of the mode set by the user via the operation unit (not shown) and the change in the storage amount of the ring memory 7. The compression ratio α is, however, always 1/2 or more.

【００６９】ここで、ユーザによって操作部（図示略）
を用いて、モード設定する方法としては２種類ある。There are two ways in which the user can set the mode using the operation unit (not shown).

【００７０】第１番目としては、録画（録音）開始直前
もしくは録画（録音）中に番組設定モード（番組種類）
を記録媒体、例えばビデオテープ、オーディオテープ、
ＤＡＴ等、又は設定番組メモリ（図示略）にユーザが設
定し記憶させる。In the first method, immediately before or during video (audio) recording, the user sets the program setting mode (program type) and stores it on the recording medium, such as a video tape, audio tape, or DAT, or in a set-program memory (not shown).

【００７１】そして、再生時には、記録媒体又は設定番
組メモリに記憶された番組設定モード（番組種類）を読
み出し、このモード信号（図２参照）を用いて音声再生
速度を制御する。At the time of reproduction, the program setting mode (program type) stored in the recording medium or the set program memory is read, and the audio reproduction speed is controlled using this mode signal (see FIG. 2).

【００７２】第２番目としては、再生開始直前もしくは
再生時にユーザが番組設定モードを指定し、このモード
信号（図２参照）を用いて音声再生速度を制御する。Second, the user specifies the program setting mode immediately before the start of reproduction or at the time of reproduction, and controls the audio reproduction speed using this mode signal (see FIG. 2).

【００７３】操作部によって設定されるモードの種類に
は、番組を選択するための番組設定モードと、番組設定
モードによって設定された番組に対して圧縮率αを固定
させるか変動させるかを設定する固定変動設定モードと
がある。The modes set via the operation unit are a program setting mode for selecting a program, and a fixed/variable setting mode for choosing whether the compression ratio α is held fixed or varied for the program selected in the program setting mode.

【００７４】次の表は、ＶＴＲの２倍速再生時におい
て、番組設定モードによって設定される番組の例と、各
番組に対して固定モードが設定されたときの、各番組に
対する音声再生速度（圧縮率）と、各番組に対して変動
モードが設定されたときの、各番組に対する音声再生速
度（圧縮率）の変動範囲の一例をそれぞれ示している。The following table shows, for double-speed playback of the VTR, examples of programs selectable in the program setting mode, the audio playback speed (compression ratio) for each program when the fixed mode is set, and an example of the range over which the audio playback speed (compression ratio) varies for each program when the variable mode is set.

【００７５】[0075]

【表１】 [Table 1]

【００７６】各番組に対する固定モードに対する音声再
生速度および変動モードに対する音声再生速度範囲は、
次のような考え方に基づいて設定されている。すなわ
ち、番組内容によって、発声速度が異なっている。例え
ば、ドラマ、ニュース、Ｆ１中継および将棋番組では、
発声速度は、Ｆ１中継が最も速く、ニュース、ドラマ、
将棋対局の順に発声速度が遅くなる。このような、発声
速度の違いは、単位時間当たりのモーラ数に起因してい
る。モーラ（ｍｏｒａ）とは、韻律音において、強勢や
抑揚などの単位となる音の相対的長さをいい、１モーラ
は、短母音を含む１音節の長さに相当する。The audio playback speed for the fixed mode and the audio playback speed range for the variable mode are set for each program on the following basis. Speaking rate differs with program content. Among dramas, news, F1 broadcasts, and shogi programs, for example, F1 broadcasts have the fastest speaking rate, followed in decreasing order by news, dramas, and shogi matches. These differences in speaking rate arise from the number of morae per unit time. A mora is the relative length of a sound that serves as a unit of stress, intonation, and the like in prosody; one mora corresponds to the length of one syllable containing a short vowel.

【００７７】発話者により変動はあるが、各番組の単位
時間当りのモーラ数の平均値は、次のようになる。The average value of the number of mora per unit time of each program is as follows, although it varies depending on the speaker.

【００７８】
Ｆ１中継：１２モーラ／秒
ニュース：８モーラ／秒
ドラマ：５モーラ／秒
将棋対局：３モーラ／秒

F1 broadcast: 12 morae/s
News: 8 morae/s
Drama: 5 morae/s
Shogi match: 3 morae/s

【００７９】固定モードが設定されているときには、設
定番組についての固定モードにおける音声再生速度に対
する圧縮率が、圧縮率αとして決定される。たとえば、
ニュース番組が設定され、かつ固定モードが設定されて
いるときには、圧縮率αは、１．４倍速に対する圧縮
率、たとえば０．７１４と決定される。このように、発
声速度が速い番組ほど圧縮率が小さく（音声再生速度が
速く）されるので、次のような利点がある。When the fixed mode is set, the compression ratio corresponding to the fixed-mode audio playback speed of the selected program is used as the compression ratio α. For example, when a news program is selected and the fixed mode is set, α is set to the compression ratio corresponding to 1.4× speed, for example 0.714. Because programs with faster speaking rates are thus given a smaller compression ratio (a faster audio playback speed), the following advantage is obtained.

【００８０】つまり、発声速度が速い番組ほど、リング
メモリ７がオーバーフロー直前状態になりやすくなるの
で、音声再生速度が２倍速に近くなるように、圧縮率が
決定される。逆に、発声速度が遅い番組ほど、音声再生
速度が１倍速に近くなるように、圧縮率が決定される。
したがって、音声再生速度は、２倍速以下の速度であっ
て、かつ元の発声速度に応じた速度となり、より自然な
再生音が得られる。That is, the higher the utterance speed of a program, the more likely it is that the ring memory 7 will be in the state immediately before the overflow. Therefore, the compression ratio is determined so that the sound reproduction speed is close to twice the speed. Conversely, the compression rate is determined such that the lower the utterance speed of the program, the closer the sound reproduction speed to the normal speed.
Therefore, the sound reproduction speed is equal to or lower than the double speed and is a speed corresponding to the original utterance speed, and a more natural reproduced sound can be obtained.
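
The relation between audio playback speed and compression ratio used in the examples above (1.4× speed corresponding to about 0.714, and 2× speed to 1/2) is a simple reciprocal. A minimal sketch, in which the function name is hypothetical:

```python
def compression_ratio_for_speed(speed: float) -> float:
    """Return the compression ratio alpha for a given audio playback speed V.

    Assumes alpha = 1 / V, which matches the worked values in the text
    (V = 1.4 gives roughly 0.714; V = 2 gives 0.5).
    """
    if speed <= 0:
        raise ValueError("playback speed must be positive")
    return 1.0 / speed
```

Under this relation, the constraint that the audio playback speed stays at or below double speed is the same as the constraint that α stays at or above 1/2.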

【００８１】変動モードが設定されている場合には、設
定番組についての変動モードにおける音声再生速度範囲
に対する圧縮率の範囲内で、圧縮率αが次のようにして
決定される。圧縮伸長率調整手段４２は、リングメモリ
７の蓄積量が少なくなるほど、圧縮率が大きくなるよう
に、つまり音声再生速度が遅くなるように、そして、リ
ングメモリ７の蓄積量が多くなるほど、圧縮率が小さく
なるように、つまり音声再生速度が速くなるように、圧
縮率αを決定する。When the variable mode is set, the compression ratio α is determined as follows, within the range of compression ratios corresponding to the variable-mode audio playback speed range of the selected program. The compression/expansion ratio adjusting means 42 sets α larger (slower audio playback) as the storage amount of the ring memory 7 decreases, and smaller (faster audio playback) as the storage amount increases.

【００８２】つまり、第１ケース（ｃａｓｅ１）に該当
すると判別されたときには、圧縮伸長率調整手段４２
は、リングメモリ蓄積量状態判別部１６からリングメモ
リ７の蓄積量を得る。そして、得られたリングメモリ７
の蓄積量を、Ｄ／Ａ変換部８のサンプリング周波数で除
することにより、蓄積時間Ｔｍを算出する。算出された
蓄積時間Ｔｍに基づいて、圧縮率αを決定する。That is, when the first case (case 1) is determined to apply, the compression/expansion ratio adjusting means 42 obtains the storage amount of the ring memory 7 from the ring memory storage amount state determination unit 16. It divides this storage amount by the sampling frequency of the D/A converter 8 to calculate the accumulation time Tm, and determines the compression ratio α on the basis of Tm.

【００８３】より具体的に説明すると、リングメモリ蓄
積量状態判別部１６から得られたリングメモリ７の蓄積
量が、Ｄ／Ａ変換部８のサンプリング周波数である８０
００で除されることにより、蓄積時間Ｔｍが求められ
る。そして、各番組ごとに予め作成された蓄積時間に対
する圧縮率のデータに基づいて、蓄積時間Ｔｍに対する
圧縮率αが求められる。More specifically, the storage amount of the ring memory 7 obtained from the ring memory storage amount state determination unit 16 is divided by 8000, the sampling frequency of the D/A converter 8, to obtain the accumulation time Tm. The compression ratio α for that accumulation time is then obtained from data, prepared in advance for each program, that relate accumulation time to compression ratio.

【００８４】次の表は、ＶＴＲの２倍速再生時における
Ｆ１中継の番組についての蓄積時間Ｔｍに対する圧縮率
αのデータの一例を示している。この表において、Ｖ
は、圧縮率に対応する音声再生速度を示している。The following table shows an example of compression ratio α data as a function of the accumulation time Tm for an F1 broadcast program during double-speed playback of the VTR. In the table, V denotes the audio playback speed corresponding to the compression ratio.

【００８５】[0085]

【表２】 [Table 2]

【００８６】この表からわかるように、リングメモリ７
の蓄積時間Ｔｍが小さくなるほど、圧縮率αは大きくな
り、音声再生速度が遅くなる。逆に、リングメモリ７の
蓄積時間Ｔｍが大きくなるほど、圧縮率αは小さくな
り、音声再生速度が速くなる。したがって、変動モード
が設定されている場合には、固定モードが設定された場
合に述べた上記の利点に加えて、入力信号の音声区間に
おける音声の欠落部をできるだけ少なくできるとい利点
がある。As the table shows, the shorter the accumulation time Tm of the ring memory 7, the larger the compression ratio α and the slower the audio playback speed. Conversely, the longer the accumulation time Tm, the smaller the compression ratio α and the faster the audio playback speed. When the variable mode is set, therefore, in addition to the advantage described above for the fixed mode, dropouts of speech within the voice sections of the input signal can be kept to a minimum.
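
The variable-mode lookup can be sketched as below. The division by 8000 is taken from the text; the table entries are purely hypothetical placeholders (the actual values of Table 2 are not reproduced here) chosen only so that α falls as Tm grows and never drops below 1/2. The function and constant names are likewise assumptions.

```python
from bisect import bisect_right

FS_DA = 8000  # sampling frequency of D/A converter 8 in Hz (from the text)

# HYPOTHETICAL steps: (exclusive upper bound of Tm in seconds, compression ratio).
# These are NOT the values of Table 2; they only illustrate the shape of the data.
HYPOTHETICAL_TABLE = [(0.5, 0.67), (1.0, 0.625), (1.5, 0.59), (2.0, 0.55)]
HYPOTHETICAL_FLOOR = 0.5  # alpha used above the last bound (double speed)


def accumulation_time(stored_words: int, fs_da: int = FS_DA) -> float:
    """Accumulation time Tm in seconds: storage amount divided by fsDA."""
    return stored_words / fs_da


def lookup_compression_ratio(tm: float,
                             table=HYPOTHETICAL_TABLE,
                             floor: float = HYPOTHETICAL_FLOOR) -> float:
    """Pick alpha for accumulation time tm from a step table."""
    bounds = [bound for bound, _ in table]
    idx = bisect_right(bounds, tm)  # first step whose bound is greater than tm
    return table[idx][1] if idx < len(table) else floor
```

A step table keeps the run-time work to one division and one search, which is in keeping with data "prepared in advance for each program".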

【００８７】上記方法では、音声の欠落部をできるだけ
少なくするようにしているが、Ｆ１中継、早口のニュー
スでは、高齢者には聞き取れない場合が起こりうる。こ
のような場合には、音声の欠落部を多くし、例えば、蓄
積時間に対する音声再生速度範囲を１．０〜１．３倍速
とし、音声をゆっくりにするようにしてもよい。このよ
うにすると、音声の欠落は多くなるが、再生される音声
速度がゆっくりになり、高齢者にも音声が聞取り易くな
る。The method above keeps speech dropouts to a minimum, but with F1 broadcasts or fast-paced news the result may still be too fast for elderly listeners to follow. In such cases, more dropouts may be accepted and the speech slowed down, for example by setting the audio playback speed range over the accumulation time to 1.0× to 1.3× speed. More speech is then lost, but the reproduced speech is slower and easier for elderly listeners to follow.

【００８８】圧縮率αが、１／２以上の圧縮率、たとえ
ば上記表１の中にはないが、説明の便宜上、２／３に決
定されたとすると、３ピッチ周期が２ピッチ周期に間引
かれる。このため、出力音声は標準音声速度の３／２倍
速となる。この場合も音程は、元のままである。このよ
うに、圧縮率２／３で圧縮された場合には、圧縮率１／
２の場合に比べて、２／３−１／２＝１／６だけ、信号
が伸長されることになる。この伸長分が、リングメモリ
７の蓄積量となる。Suppose the compression ratio α is set to a value of 1/2 or more, say 2/3 (a value that does not appear in Table 1 above but is convenient for explanation). Every three pitch periods are then thinned to two, so the output speech runs at 3/2 times the standard speech speed. The pitch is again unchanged. Compared with compression at a ratio of 1/2, compression at a ratio of 2/3 leaves the signal longer by 2/3 − 1/2 = 1/6. This surplus accumulates in the ring memory 7.
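
The arithmetic above can be turned into a rate. The sketch below assumes, for illustration only, that speech is continuous (case 1 applies throughout), so that the ring memory is written at α·fsAD words per second and read at fsDA words per second; the function name is hypothetical.

```python
from fractions import Fraction


def net_accumulation_rate(alpha: Fraction, fs_ad: int, fs_da: int) -> Fraction:
    """Words per second gained by the ring memory.

    Written at alpha * fs_ad (compressed input), read at fs_da (D/A output).
    """
    return Fraction(alpha) * fs_ad - fs_da


# Double-speed VTR playback with alpha = 2/3: the surplus is 16000 * (2/3 - 1/2).
rate = net_accumulation_rate(Fraction(2, 3), 16000, 8000)

# Time for an empty ring memory to reach Tmax = 21645 words at that rate.
seconds_to_tmax = Fraction(21645) / rate
```

With the example figures, the surplus is 8000/3 words per second, so an initially empty ring memory would reach Tmax in a little over eight seconds of uninterrupted speech. This is why the later cases delete or thin the signal during silent sections.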

【００８９】ＰＩＣＯＬＡを用いて、入力信号を圧縮率
２／３で圧縮する方法について、図３を用いて簡単に説
明する。まず、入力信号からピッチ周期が抽出される。
抽出されたピッチ周期をＴｐとする。波形Ａに対して
は、１から０へ直線的に向かう重み（重み関数Ｋ１）が
つけられて、波形Ａ’が作成される。波形Ｂに対しては
０から１に向かう重み（重み関数Ｋ２）がつけられて、
波形Ｂ’が作成される。A method of compressing the input signal at a compression ratio of 2/3 using PICOLA is briefly described with reference to FIG. 3. First, the pitch period is extracted from the input signal; let the extracted pitch period be Tp. Waveform A is multiplied by a weight that falls linearly from 1 to 0 (weight function K1) to create waveform A′. Waveform B is multiplied by a weight that rises from 0 to 1 (weight function K2) to create waveform B′.

【００９０】そして、それらの波形Ａ’およびＢ’が加
え合わされ、長さＴｐの波形Ａ’＊Ｂ’が作成される。
これらの重みは、波形Ａ’＊Ｂ’の前後の接続点での連
続性を保つためにつけられている。つぎに、ポインター
が、圧縮率に基づいて決まる長さである３Ｔｐ分だけ移
動され、同様な操作が行なわれる。これにより、３つの
波形Ａ、Ｂ、Ｃから２つの波形Ａ’＊Ｂ’およびＣが得
られる。このようにして、３ピッチ周期分の信号が、２
ピッチ周期分の信号に圧縮される。Waveforms A′ and B′ are then added together to create a waveform A′*B′ of length Tp. The weights are applied to maintain continuity at the junctions before and after waveform A′*B′. Next, the pointer is advanced by 3Tp, a length determined by the compression ratio, and the same operation is repeated. Two waveforms, A′*B′ and C, are thus obtained from the three waveforms A, B, and C, and a signal of three pitch periods is compressed into a signal of two pitch periods.
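
The 3-to-2 overlap-add step can be sketched as follows. This is a simplified illustration, not the full PICOLA procedure: pitch extraction is omitted and the period is passed in as a fixed argument, and the function names are hypothetical.

```python
def crossfade(a, b):
    """Return A' * B': a weighted by K1 (1 -> 0) plus b weighted by K2 (0 -> 1)."""
    n = len(a)
    if n != len(b):
        raise ValueError("waveforms must have equal length")
    out = []
    for i in range(n):
        k2 = i / (n - 1) if n > 1 else 0.5  # rising weight K2
        out.append((1.0 - k2) * a[i] + k2 * b[i])  # K1 = 1 - K2
    return out


def compress_three_to_two(samples, period):
    """Compress every 3 periods (A, B, C) into 2 periods (A' * B', C)."""
    if period <= 0:
        raise ValueError("period must be positive")
    out, pos, block = [], 0, 3 * period
    while pos + block <= len(samples):
        a = samples[pos:pos + period]
        b = samples[pos + period:pos + 2 * period]
        c = samples[pos + 2 * period:pos + block]
        out.extend(crossfade(a, b))  # one period replaces A and B
        out.extend(c)                # C is copied unchanged
        pos += block                 # pointer advances by 3 periods
    out.extend(samples[pos:])        # incomplete tail is passed through
    return out
```

Because the merged waveform starts with the value of A and ends with the value of B, it joins smoothly to whatever preceded A and to C, which originally followed B.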

【００９１】ピッチ伸長圧縮手段２３による伸長圧縮法
としては、図１７（ａ）、（ｂ）に示すように、ピッチ
抽出をすることなく、所定長の固定フレーム長Ｔｓ単位
で伸長圧縮処理を行うようにしてもよい。固定フレーム
長Ｔｓは、たとえば入力データの２００個分の長さに設
定される。図１７の例では、３Ｔｓを２Ｔｓにする例を
示している。The pitch expansion/compression means 23 may instead perform expansion/compression in units of a predetermined fixed frame length Ts, without pitch extraction, as shown in FIGS. 17(a) and 17(b). The fixed frame length Ts is set to, for example, the length of 200 input samples. FIG. 17 shows an example in which 3Ts is converted to 2Ts.

【００９２】図１７（ａ）の方法では、固定フレーム長
Ｔｓの波形Ａ、Ｂ、Ｃのうち、波形Ａに対しては、１か
ら０へ直線的に向かう重み（重み関数Ｋ１）がつけられ
て、波形Ａ”が作成される。波形Ｂに対しては０から１
に向かう重み（重み関数Ｋ２）がつけられて、波形Ｂ”
が作成される。In the method of FIG. 17(a), of the waveforms A, B, and C of fixed frame length Ts, waveform A is multiplied by a weight that falls linearly from 1 to 0 (weight function K1) to create waveform A″. Waveform B is multiplied by a weight that rises from 0 to 1 (weight function K2) to create waveform B″.

【００９３】そして、それらの波形Ａ”およびＢ”が加
え合わされ、長さＴｓの波形Ａ”＊Ｂ”が作成される。
これらの重みは、波形Ａ”＊Ｂ”の前後の接続点での連
続性を保つためにつけられている。そして、次の波形Ｃ
に対しては、そのまま出力される。これにより、３つの
波形Ａ、Ｂ、Ｃから２つの波形Ａ”＊Ｂ”およびＣが得
られる。このようにして、３Ｔｓ分の信号が、２Ｔｓ分
の信号に圧縮される。Waveforms A″ and B″ are then added together to create a waveform A″*B″ of length Ts. The weights are applied to maintain continuity at the junctions before and after waveform A″*B″. The next waveform C is output as is. Two waveforms, A″*B″ and C, are thus obtained from the three waveforms A, B, and C, and a signal of length 3Ts is compressed into a signal of length 2Ts.

【００９４】図１７（ｂ）の方法では、固定フレーム長
Ｔｓの波形Ａ〜Ｃのうちの波形Ａには先頭からたとえば
２０個のデータに０から１へ直線的に向かう重み（重み
関数Ｋ３）をつけて波形Ａ”を得る。波形Ｂには１８１
個目〜２００個目までの入力データに１から０へ直線的
に向かう重み（重み関数Ｋ４）をつけて波形Ｂ”を得
る。そして、波形Ｃを削除する。次の３つの波形Ｄ〜Ｆ
に対しても、同様な処理が行われる。このようにして、
３つの波形Ａ〜Ｃ（またはＤ〜Ｆ）からなる信号は、２
つの波形Ａ”およびＢ”（またはＤ”およびＥ”）から
なる信号に圧縮される。つまり、３Ｔｓ分の信号が、２
Ｔｓ分の信号に圧縮される。In the method of FIG. 17(b), of the waveforms A to C of fixed frame length Ts, waveform A is given a weight that rises linearly from 0 to 1 (weight function K3) over, for example, its first 20 samples, yielding waveform A″. Waveform B is given a weight that falls linearly from 1 to 0 (weight function K4) over its 181st to 200th input samples, yielding waveform B″. Waveform C is deleted. The next three waveforms D to F are processed in the same way. A signal consisting of three waveforms A to C (or D to F) is thus compressed into a signal consisting of two waveforms A″ and B″ (or D″ and E″); that is, a signal of length 3Ts is compressed into a signal of length 2Ts.
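
A minimal sketch of the FIG. 17(b) variant follows, using the example figures from the text (frames of 200 samples, ramps of 20 samples). The function name and the handling of an incomplete final group are assumptions.

```python
def taper_and_drop(samples, frame_len=200, ramp_len=20):
    """For each group of frames A, B, C: fade in the head of A, fade out the
    tail of B, and delete C, so that 3 frames become 2."""
    if frame_len <= 0 or not 2 <= ramp_len <= frame_len:
        raise ValueError("need frame_len > 0 and 2 <= ramp_len <= frame_len")
    out, pos, block = [], 0, 3 * frame_len
    while pos + block <= len(samples):
        a = list(samples[pos:pos + frame_len])
        b = list(samples[pos + frame_len:pos + 2 * frame_len])
        for i in range(ramp_len):
            w = i / (ramp_len - 1)
            a[i] *= w                                  # K3: 0 -> 1 at the head of A
            b[frame_len - ramp_len + i] *= 1.0 - w     # K4: 1 -> 0 at the tail of B
        out.extend(a)
        out.extend(b)                                  # C is not copied
        pos += block
    out.extend(samples[pos:])  # incomplete final group is passed through (assumption)
    return out
```

Only the 2 × 20 ramp samples per group need a multiplication, which illustrates why this variant is cheaper than a full-frame overlap-add.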

【００９５】上記固定フレーム長単位での伸長圧縮処理
を用いた場合には、ピッチ周期ごとの伸長圧縮処理を用
いた場合に比べて、音質は低下するが、処理量は軽減さ
れる。Expansion/compression in units of the fixed frame length gives lower sound quality than expansion/compression per pitch period, but reduces the processing load.

【００９６】なお、この話速変換装置が英語学習器に適
用されている場合には（１倍速再生時）、Ａ／Ｄ変換部
２のサンプリング周波数ｆｓＡＤは８ＫＨＺであり、Ｄ
／Ａ変換部８のサンプリング周波数ｆｓＤＡは８ＫＨＺ
である。この場合には、圧縮伸長率調整手段４２によっ
て、圧縮率αが１以上の値に決定される。圧縮率αが、
たとえば、１．５に決定された場合には、ピッチ圧縮伸
長手段２３で、２ピッチ周期が３ピッチ周期になるよう
に、音声信号が伸長される。つまり、音声区間が１．５
倍に伸長される。したがって、この場合には、１倍速再
生の通常再生時に対して、３／２−１＝１／２だけ信号
が伸長されることになり、この伸長分がリングメモリ７
の蓄積量となる。When this speech speed converter is applied to an English learning device (1× speed playback), the sampling frequency fsAD of the A/D converter 2 is 8 kHz and the sampling frequency fsDA of the D/A converter 8 is also 8 kHz. In this case the compression/expansion ratio adjusting means 42 sets the compression ratio α to a value of 1 or more. If α is set to 1.5, for example, the pitch compression/expansion means 23 expands the audio signal so that every two pitch periods become three; that is, the voice section is expanded by a factor of 1.5. Relative to ordinary 1× speed playback, the signal is therefore expanded by 3/2 − 1 = 1/2, and this surplus accumulates in the ring memory 7.

【００９７】（２）第２ケース（ｃａｓｅ２）入力信号が音声区間であり、かつリングメモリ７がオー
バーフロー直前状態であると判別されたときには、第２
ケースとなる。(2) Second Case (case 2) The second case applies when the input signal is determined to be in a voice section and the ring memory 7 is determined to be in the state immediately before overflow.

【００９８】この場合には、音声信号はマルチプレクサ
２０を介して、入力信号削除部２１に送られ、音声信号
が削除される。具体的には、アップダウンカウンタ９の
カウント値が、アンダーフロー検出用データＴｍｉｎに
なるまで、すなわちリングメモリ７がアンダーフロー直
前状態になるまで、リングメモリ７への書き込み動作が
停止される。In this case, the audio signal is sent to the input signal deleting section 21 via the multiplexer 20, and the audio signal is deleted. Specifically, the writing operation to the ring memory 7 is stopped until the count value of the up / down counter 9 becomes the underflow detection data Tmin, that is, until the ring memory 7 is in the state immediately before the underflow.

【００９９】リングメモリ７がアンダーフロー直前状態
になると、２００個以下の個数、例えば１００個の消音
信号（値”０”の信号）が消音挿入部２２から出力さ
れ、この消音信号がデマルチプレクサ２７を介してリン
グメモリ７に送られて書き込まれる。このように、消音
信号をリングメモリ７へ書き込んでいるのは、音声削除
によって音声信号の繋ぎ目にクリック音が発生するのを
防止するためである。When the ring memory 7 enters the state immediately before underflow, 200 or fewer, for example 100, mute signals (signals of value "0") are output from the mute insertion unit 22, sent to the ring memory 7 via the demultiplexer 27, and written there. The mute signals are written to the ring memory 7 to prevent a click from occurring at the junction of the audio signal as a result of the deletion.

【０１００】（３）第３ケース（ｃａｓｅ３）入力信号が無音区間であり、かつ無音区間の継続長が設
定されたポーズ継続長Ｔｄｅｌ未満であり、かつリング
メモリ７がオーバーフロー直前状態ではないと判別され
たときには、第３ケースとなる。(3) Third Case (case 3) The third case applies when the input signal is determined to be in a silent section, the duration of the silent section is less than the set pause duration Tdel, and the ring memory 7 is not in the state immediately before overflow.

【０１０１】この場合は、上記第１ケースの場合と同じ
処理が行なわれる。In this case, the same processing as in the first case is performed.

【０１０２】（４）第４ケース（ｃａｓｅ４）入力信号が無音区間であり、かつ無音区間の継続長が設
定されたポーズ継続長Ｔｄｅｌ未満であり、かつリング
メモリ７がオーバーフロー直前状態であると判別された
ときには、第４ケースとなる。(4) Fourth Case (case 4) The fourth case applies when the input signal is determined to be in a silent section, the duration of the silent section is less than the set pause duration Tdel, and the ring memory 7 is in the state immediately before overflow.

【０１０３】この場合は、上記第２ケースの場合と同じ
処理が行なわれる。In this case, the same processing as in the second case is performed.

【０１０４】（５）第５ケース（ｃａｓｅ５）入力信号が無音区間であり、かつ無音区間の継続長が設
定されたポーズ継続長Ｔｄｅｌ以上であり、かつリング
メモリ７がアンダーフロー直前状態ではないと判別され
たときには、第５ケースとなる。(5) Fifth Case (case 5) The fifth case applies when the input signal is determined to be in a silent section, the duration of the silent section is equal to or longer than the set pause duration Tdel, and the ring memory 7 is not in the state immediately before underflow.

【０１０５】この場合には、音声信号はマルチプレクサ
２０を介して、入力信号削除部２５に送られ、音声信号
が削除される。具体的には、リングメモリ７への書き込
み動作が停止される。ただし、音声区間のスタート部分
（無声区間）が欠落するのを防止したり、音声の削除に
よって繋ぎ目にクリック音が発生したりするのを防止す
るために、波形合成挿入部２６によって波形合成挿入処
理が行なわれる。In this case, the audio signal is sent via the multiplexer 20 to the input signal deletion unit 25 and deleted; specifically, writing to the ring memory 7 is stopped. However, to prevent the start of a voice section (its unvoiced portion) from being lost and to prevent a click from occurring at the junction as a result of the deletion, the waveform synthesis and insertion unit 26 performs waveform synthesis and insertion.

【０１０６】波形合成挿入部２６による波形合成挿入処
理について、図４（ａ）、（ｂ）を用いて説明する。図
４（ａ）による方法では、波形合成挿入部２６は、第１
メモリ３１および第２メモリ３２を備えている。入力信
号削除部２５による入力信号削除処理の開始時において
は、削除開始点から、１フレーム長以下の所定長さＴ
ｓ、例えば１フレーム分の入力信号が、第１メモリ３１
にアドレス順に順次記憶される。次に、第１メモリ３１
のアドレスが大きくなるにしたがって１から０に直線的
に変化する関数Ｋ１が、第１メモリ３１の内容Ａに乗算
される。そして、その乗算結果Ａ’が、再度第１メモリ
３１に書き込まれる。The waveform synthesis and insertion processing performed by the waveform synthesis and insertion unit 26 is described with reference to FIGS. 4(a) and 4(b). In the method of FIG. 4(a), the waveform synthesis and insertion unit 26 includes a first memory 31 and a second memory 32. When the input signal deletion unit 25 starts deleting the input signal, the input signal of a predetermined length Ts of one frame or less, for example one frame, counted from the deletion start point is stored sequentially in the first memory 31 in address order. The contents A of the first memory 31 are then multiplied by a function K1 that changes linearly from 1 to 0 as the address of the first memory 31 increases, and the product A′ is written back into the first memory 31.

【０１０７】また、入力信号削除部２５による入力信号
削除区間の終了点直前の所定長さＴｓ分の入力信号が、
第２メモリ３２にアドレス順に順次記憶される。次に、
第２メモリ３２のアドレスが大きくなるほど、０から１
に直線的に変化する関数Ｋ２が、第２メモリ３２の内容
Ｂに乗算される。そして、その乗算結果Ｂ’が、再度第
２メモリ３２に書き込まれる。この後、第１メモリ３１
の内容Ａ’と、第２メモリ３２の内容Ｂ’とが加え合わ
されて、所定長さＴｓのデータＡ’＊Ｂ’が得られる。
そして、得られた所定長さＴｓ分のデータＡ’＊Ｂ’が
デマルチプレクサ２７を介して、リングメモリ７に送ら
れ、リングメモリ７に書き込まれる。The input signal of the predetermined length Ts immediately preceding the end point of the section deleted by the input signal deletion unit 25 is stored sequentially in the second memory 32 in address order. The contents B of the second memory 32 are then multiplied by a function K2 that changes linearly from 0 to 1 as the address of the second memory 32 increases, and the product B′ is written back into the second memory 32. The contents A′ of the first memory 31 and the contents B′ of the second memory 32 are then added together to obtain data A′*B′ of the predetermined length Ts. The resulting data A′*B′ of length Ts is sent to the ring memory 7 via the demultiplexer 27 and written there.
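
The FIG. 4(a) procedure amounts to replacing a deleted silent stretch with a single crossfaded bridge of length Ts. A minimal sketch, assuming the deleted stretch is available as one list and is at least Ts samples long; the function name is hypothetical.

```python
def bridge_deleted_silence(segment, ts):
    """Return the Ts-sample bridge A' * B' that replaces a deleted stretch.

    segment: samples from the deletion start point to the deletion end point.
    ts:      predetermined length Ts in samples.
    """
    if ts <= 0 or len(segment) < ts:
        raise ValueError("segment must contain at least ts samples")
    head = segment[:ts]    # contents A of first memory 31 (Ts from the start point)
    tail = segment[-ts:]   # contents B of second memory 32 (Ts before the end point)
    bridge = []
    for i in range(ts):
        k2 = i / (ts - 1) if ts > 1 else 0.5  # K2 rises 0 -> 1; K1 = 1 - K2
        bridge.append((1.0 - k2) * head[i] + k2 * tail[i])
    return bridge
```

The bridge begins with the signal as it was at the deletion start point and ends with the signal just before the deletion end point, so both junctions stay continuous and the onset of the following voice section is retained.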

【０１０８】図４（ｂ）による方法では、削除開始点か
ら、１フレーム長以下の所定長さＴｓ、例えば１フレー
ム分の入力信号が、第１メモリ３１にアドレス順に順次
記憶される。次に、後端に１から０に直線的に変化する
スロープがついた関数Ｋ３が、第１メモリ３１の内容Ａ
に乗算される。そして、その乗算結果Ａ’が、再度第１
メモリ３１に書き込まれる。In the method of FIG. 4(b), the input signal of a predetermined length Ts of one frame or less, for example one frame, counted from the deletion start point is stored sequentially in the first memory 31 in address order. The contents A of the first memory 31 are then multiplied by a function K3 whose rear end has a slope changing linearly from 1 to 0, and the product A′ is written back into the first memory 31.

【０１０９】また、入力信号削除部２５による入力信号
削除区間の終了点直前の所定長さＴｓ分の入力信号が、
第２メモリ３２にアドレス順に順次記憶される。次に、
前端に０から１に直線的に変化するスロープがついた関
数Ｋ４が、第２メモリ３２の内容Ｂに乗算される。そし
て、その乗算結果Ｂ’が、再度第２メモリ３２に書き込
まれる。この後、第１メモリ３１の内容Ａ’と、第２メ
モリ３２の内容Ｂ’とが繋ぎ合わされて、２Ｔｓ分のの
データＡ’＋Ｂ’が得られる。そして、得られた２Ｔｓ
分のデータＡ’＋Ｂ’がデマルチプレクサ２７を介し
て、リングメモリ７に送られ、リングメモリ７に書き込
まれる。図４（ｂ）では、Ｔｓが、１フレーム分の長さ
である例を示したが、１フレームの半分の長さのデータ
をＴｓとしてもよい。The input signal of the predetermined length Ts immediately preceding the end point of the section deleted by the input signal deletion unit 25 is stored sequentially in the second memory 32 in address order. The contents B of the second memory 32 are then multiplied by a function K4 whose front end has a slope changing linearly from 0 to 1, and the product B′ is written back into the second memory 32. The contents A′ of the first memory 31 and the contents B′ of the second memory 32 are then concatenated to obtain data A′+B′ of length 2Ts, which is sent to the ring memory 7 via the demultiplexer 27 and written there. FIG. 4(b) shows an example in which Ts is the length of one frame, but Ts may instead be half the length of one frame.

【０１１０】なお、入力信号削除部２５による無音区間
の音声信号の削除処理が繰り返し行なわれている場合
に、リングメモリ７がアンダーフロー直前状態になるこ
とがある。この場合には、リングメモリ７がアンダーフ
ロー直前状態なったときから、所定長さＴｓ分の入力信
号が第２メモリ３２に記憶される。そして、第１メモリ
３１に記憶されているデータと、第２メモリ３２に記憶
されているデータにもとづいて、上記と同様な入力信号
削除処理が行なわれる。When the input signal deleting section 25 repeatedly performs the process of deleting the audio signal in the silent section, the ring memory 7 may be in a state immediately before the underflow. In this case, the input signal for the predetermined length Ts is stored in the second memory 32 from the time when the ring memory 7 enters the state immediately before the underflow. Then, based on the data stored in the first memory 31 and the data stored in the second memory 32, the same input signal deletion processing as described above is performed.

【０１１１】（６）第６ケース（ｃａｓｅ６）入力信号が無音区間であり、かつ無音区間の継続長が設
定されたポーズ継続長Ｔｄｅｌ以上であり、かつリング
メモリ７がアンダーフロー直前状態であると判別された
ときには、第６ケースとなる。(6) Sixth Case (case 6) The sixth case applies when the input signal is determined to be in a silent section, the duration of the silent section is equal to or longer than the set pause duration Tdel, and the ring memory 7 is in the state immediately before underflow.

【０１１２】この場合は、入力信号は、マルチプレクサ
２０を介して間引き処理部２４に送られる。間引き処理
部２４では、ＶＴＲの再生速度倍率をｎとして、圧縮率
が１／ｎとなるように間引き処理が行なわれる。たとえ
ば、２倍速再生時には入力信号に対して圧縮率１／２で
間引きが行なわれ、３倍速再生時には入力信号に対して
圧縮率１／３で間引きが行なわれる。１倍速再生時に
は、入力信号がそのまま出力される。In this case, the input signal is sent to the thinning processing section 24 via the multiplexer 20. The thinning-out section 24 performs thinning-out processing so that the compression rate is 1 / n, where n is the reproduction speed magnification of the VTR. For example, at the time of double speed reproduction, the input signal is thinned at a compression rate of 1/2, and at the time of triple speed reproduction, the input signal is thinned at a compression rate of 1/3. During 1 × speed reproduction, the input signal is output as it is.
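
The six-way classification performed by the conditional branch unit 15 can be summarized in a short sketch. The function name, argument names, and the unit of the silence duration are assumptions; the routing strings paraphrase the processing described for each case.

```python
def classify_case(is_voice: bool, silence_len: int, t_del: int,
                  near_overflow: bool, near_underflow: bool) -> int:
    """Return the case number (1 to 6) for the current frame.

    silence_len and t_del must be in the same unit (for example, frames);
    the unit is an assumption of this sketch.
    """
    if is_voice:
        return 2 if near_overflow else 1
    if silence_len < t_del:           # short pause: treated like voice
        return 4 if near_overflow else 3
    return 6 if near_underflow else 5  # long pause


# Destination selected through multiplexer 20 for each case.
ROUTING = {
    1: "pitch compression/expansion means 23 (ratio alpha >= 1/n)",
    2: "input signal deletion unit 21, then mute insertion unit 22",
    3: "same processing as case 1",
    4: "same processing as case 2",
    5: "input signal deletion unit 25 with waveform synthesis and insertion unit 26",
    6: "thinning unit 24 (ratio 1/n)",
}
```

Note that the overflow flag matters only for voice and short pauses, while the underflow flag matters only for long pauses, so each input combination maps to exactly one case.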

【０１１３】１／ｎ間引き処理部２４による間引き処理
としては、次のような方法が用いられる。ここでは、２
倍速再生時を例にとって説明する。The following methods are used for the thinning performed by the 1/n thinning unit 24. Double-speed playback is taken as the example here.

【０１１４】上述したＰＩＣＯＬＡまたはＴＤＨＳを用
いた時間軸圧縮法を用い、入力信号のピッチを抽出し、
ピッチデータ部分を圧縮率が１／２となるように、間引
く。Using the time-axis compression method based on PICOLA or TDHS described above, the pitch of the input signal is extracted and the pitch data portions are thinned so that the compression ratio becomes 1/2.

【０１１５】また、図５（ａ）〜（ｃ）に示すように、
ピッチ抽出をすることなく、所定時間Ｔｓごとに波形を
間引くようにしてもよい。Alternatively, as shown in FIGS. 5(a) to 5(c), the waveform may be thinned at intervals of a predetermined time Ts without pitch extraction.

【０１１６】図５（ａ）の方法では、波形Ａ〜Ｄのう
ち、波形Ｂおよび波形Ｄが間引かれ、波形Ａ、Ｃからな
る信号が得られる。In the method of FIG. 5A, the waveform B and the waveform D among the waveforms A to D are thinned out, and a signal including the waveforms A and C is obtained.

【０１１７】図５（ｂ）の方法では、波形Ａ〜Ｄのう
ち、波形Ｂと波形Ｄが間引かれている。また、波形Ａに
は、前端に０から１に上昇するスロープ（関数Ｋ４）
が、後端に１から０に下降するスロープ（関数Ｋ３）が
ついた関数が乗算されて、波形Ａ’が作成される。ま
た、波形Ｃには、前端に０から１に上昇するスロープ
（関数Ｋ４）が、後端に１から０に下降するスロープ
（関数Ｋ３）がついた関数が乗算されて、波形Ｃ’が作
成される。このようにして、４つの波形Ａ〜Ｄからなる
信号は、２つの波形Ａ’およびＣ’からなる信号に圧縮
される。In the method of FIG. 5(b), waveforms B and D of the waveforms A to D are likewise thinned out. In addition, waveform A is multiplied by a function having a slope rising from 0 to 1 at its front end (function K4) and a slope falling from 1 to 0 at its rear end (function K3) to create waveform A′. Waveform C is multiplied by a function of the same form to create waveform C′. A signal consisting of four waveforms A to D is thus compressed into a signal consisting of two waveforms A′ and C′.

【０１１８】図５（ｃ）の方法では、波形Ａに対して
は、１から０へ直線的に向かう重み（重み関数Ｋ１）が
つけられて、波形Ａ’が作成される。波形Ｂに対しては
０から１に向かう重み（重み関数Ｋ２）がつけられて、
波形Ｂ’が作成される。そして、それらの波形Ａ’およ
びＢ’が加え合わされ、長さＴｓの波形Ａ’＊Ｂ’が作
成される。In the method of FIG. 5C, waveform A is given a weight that goes linearly from 1 to 0 (weight function K1) to create waveform A'. Waveform B is given a weight that goes from 0 to 1 (weight function K2) to create waveform B'. The waveforms A' and B' are then added together to create a waveform A'*B' of length Ts.

【０１１９】同様に、波形Ｃに対しては、１から０へ直
線的に向かう重み（関数Ｋ１）がつけられて、波形Ｃ’
が作成される。波形Ｄに対しては０から１に向かう重み
（関数Ｋ２）がつけられて、波形Ｄ’が作成される。そ
して、それらの波形Ｃ’およびＤ’が加え合わされ、長
さＴｓの波形Ｃ’＊Ｄ’が作成される。このようにし
て、４つの波形Ａ〜Ｄからなる信号は、２つの波形Ａ’
＊Ｂ’およびＣ’＊Ｄ’からなる信号に圧縮される。Similarly, waveform C is given a weight that goes linearly from 1 to 0 (function K1) to create waveform C', and waveform D is given a weight that goes from 0 to 1 (function K2) to create waveform D'. The waveforms C' and D' are then added together to create a waveform C'*D' of length Ts. In this way, a signal consisting of the four waveforms A to D is compressed into a signal consisting of the two waveforms A'*B' and C'*D'.
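The FIG. 5C method amounts to cross-fading each pair of adjacent segments. The following Python sketch assumes linear weights that reach exactly 1 and 0 at the segment ends and uses hypothetical function names; it is an illustration of the idea, not the patent's circuit.

```python
def crossfade(a, b):
    """Return A'*B': a weighted by K1 (1 -> 0) plus b weighted by K2 (0 -> 1)."""
    n = len(a)
    if len(b) != n:
        raise ValueError("segments must have equal length")
    denom = max(n - 1, 1)  # avoid division by zero for one-sample segments
    return [a[i] * (1 - i / denom) + b[i] * (i / denom) for i in range(n)]


def thin_by_crossfade(samples, ts):
    """Merge each pair of adjacent segments of length ts into one segment."""
    out = []
    for start in range(0, len(samples) - 2 * ts + 1, 2 * ts):
        a = samples[start:start + ts]
        b = samples[start + ts:start + 2 * ts]
        out.extend(crossfade(a, b))
    return out


# Segments A (all 1.0) and B (all 0.0) fade into each other; C and D are both 2.0.
signal = [1.0] * 5 + [0.0] * 5 + [2.0] * 10
print(thin_by_crossfade(signal, 5))
```

Unlike the FIG. 5A method, every input sample contributes to the output, and the start of each merged segment matches the start of its first half while its end matches the end of its second half.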

【０１２０】上述のように、第６ケースに該当する場合
には、ＶＴＲの再生倍率をｎとして、圧縮率１／ｎで間
引き処理が行われているが、次のようにして圧縮率を制
御するようにしてもよい。As described above, in the sixth case the thinning process is performed at a compression ratio of 1/n, where n is the reproduction magnification of the VTR. The compression ratio may instead be controlled as follows.

【０１２１】圧縮率１／ｎで間引き処理が行われている
場合、Ｄ／Ａ変換器８のサンプリング周波数ｆｓＤＡと
Ａ／Ｄ変換器２のサンプリング周波数ｆｓＡＤとの比ｆ
ｓＤＡ／ｆｓＡＤが、圧縮率１／ｎと等しい場合には、
リングメモリ７の蓄積量は、変化しない。しかしなが
ら、圧縮率１／ｎの演算精度、サンプリング周波数ｆｓ
ＡＤとｆｓＤＡのクロック精度によっては、ｆｓＤＡ／
ｆｓＡＤが圧縮率１／ｎと等しくならないことが起こり
うる。When the thinning process is performed at a compression ratio of 1/n and the ratio fsDA/fsAD of the sampling frequency fsDA of the D/A converter 8 to the sampling frequency fsAD of the A/D converter 2 is equal to the compression ratio 1/n, the amount stored in the ring memory 7 does not change. However, depending on the calculation accuracy of the compression ratio 1/n and the clock accuracy of the sampling frequencies fsAD and fsDA, fsDA/fsAD may not be equal to the compression ratio 1/n.

【０１２２】ｆｓＤＡ／ｆｓＡＤが圧縮率１／ｎより大
きくなったとき（ｆｓＤＡ／ｆｓＡＤ＞１／ｎ）には、
ｆｓＤＡ／ｆｓＡＤ＝１／ａ（ａ＞０）として、｛（１
／ａ）−（１／ｎ）｝だけ、圧縮率が小さくなり、間引
きの度合いが大きくなり、リングメモリ７の蓄積量が減
少していき、リングメモリ７の蓄積量がアンダーフロー
するおそれがある。When fsDA/fsAD becomes larger than the compression ratio 1/n (fsDA/fsAD > 1/n), then, writing fsDA/fsAD = 1/a (a > 0), the compression ratio is smaller than fsDA/fsAD by {(1/a) - (1/n)} and the degree of thinning is correspondingly larger. The amount stored in the ring memory 7 therefore gradually decreases, and the ring memory 7 may underflow.

【０１２３】一方、ｆｓＤＡ／ｆｓＡＤが圧縮率１／
ｎより小さくなったとき（ｆｓＤＡ／ｆｓＡＤ＜１／
ｎ）には、ｆｓＤＡ／ｆｓＡＤ＝１／ａ（ａ＞０）とし
て、｛（１／ｎ）−（１／ａ）｝だけ、圧縮率が大きく
なり、間引きの度合いが小さくなり、リングメモリ７の
蓄積量が増加していく。On the other hand, when fsDA/fsAD becomes smaller than the compression ratio 1/n (fsDA/fsAD < 1/n), then, writing fsDA/fsAD = 1/a (a > 0), the compression ratio is larger than fsDA/fsAD by {(1/n) - (1/a)} and the degree of thinning is correspondingly smaller, so the amount stored in the ring memory 7 gradually increases.
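The two drift directions can be checked with simple arithmetic. The sketch below assumes that thinned samples enter the ring memory at fsAD/n samples per second and leave it at fsDA samples per second; the sampling frequencies used are hypothetical values chosen only to illustrate the sign of the drift.

```python
def ring_memory_drift(fs_ad, fs_da, n):
    """Net samples per second added to the ring memory when thinning at ratio 1/n."""
    written = fs_ad / n  # samples per second written after 1/n thinning
    read = fs_da         # samples per second read out toward the D/A converter
    return written - read


# fsDA/fsAD == 1/n: no drift.
print(ring_memory_drift(16000, 8000, 2))
# fsDA/fsAD > 1/n: negative drift, the stored amount shrinks toward underflow.
print(ring_memory_drift(16000, 8008, 2))
# fsDA/fsAD < 1/n: positive drift, the stored amount grows.
print(ring_memory_drift(16000, 7992, 2))
```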

【０１２４】したがって、間引き処理を行う場合には、
リングメモリ７の蓄積量を確認して、次のように圧縮率
を制御する。ｆｓＤＡ／ｆｓＡＤ＝１／ａ（ａ＞０）と
して、（１／ｎ）−α＜１／ａ＜（１／ｎ）＋αの条件
を満たすαを選定する。ただし、αは、０以上で１以下
の値であり、例えば０．００１〜０．１の範囲の値であ
る。Therefore, when the thinning process is performed, the amount stored in the ring memory 7 is checked and the compression ratio is controlled as follows. Writing fsDA/fsAD = 1/a (a > 0), a value α is selected that satisfies the condition (1/n) - α < 1/a < (1/n) + α. Here α is a value of 0 or more and 1 or less, for example a value in the range of 0.001 to 0.1.

【０１２５】ｆｓＤＡ／ｆｓＡＤが圧縮率１／ｎより大
きくなったとき、すなわち、リングメモリ７の蓄積量が
減少していく場合には、圧縮率を１／ｎから｛（１／
ｎ）＋α｝にする。つまり、圧縮率を大きくし、リング
メモリ７の蓄積量を増加させるようにする。When fsDA/fsAD becomes larger than the compression ratio 1/n, that is, when the amount stored in the ring memory 7 is decreasing, the compression ratio is changed from 1/n to {(1/n) + α}. In other words, the compression ratio is increased so that the amount stored in the ring memory 7 increases.

【０１２６】ｆｓＤＡ／ｆｓＡＤが圧縮率１／ｎより小
さくなったとき、すなわち、リングメモリ７の蓄積量が
増加していく場合には、圧縮率を１／ｎから｛（１／
ｎ）−α｝にする。つまり、圧縮率を小さくし、リング
メモリ７の蓄積量を減少させるようにする。When fsDA/fsAD becomes smaller than the compression ratio 1/n, that is, when the amount stored in the ring memory 7 is increasing, the compression ratio is changed from 1/n to {(1/n) - α}. In other words, the compression ratio is reduced so that the amount stored in the ring memory 7 decreases.
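A minimal Python sketch of this control rule follows. It assumes that the trend of the stored amount is judged by comparing two successive readings of the ring memory fill level; the text only says that the stored amount is checked, so that detail and all names here are illustrative assumptions.

```python
def alpha_covers_mismatch(fs_ad, fs_da, n, alpha):
    """Check the condition (1/n) - alpha < fsDA/fsAD < (1/n) + alpha."""
    ratio = fs_da / fs_ad
    return (1.0 / n) - alpha < ratio < (1.0 / n) + alpha


def controlled_ratio(n, alpha, previous_fill, current_fill):
    """Pick the compression ratio for the next stretch of thinning."""
    base = 1.0 / n
    if current_fill < previous_fill:   # stored amount decreasing
        return base + alpha            # thin less, so more data is written
    if current_fill > previous_fill:   # stored amount increasing
        return base - alpha            # thin more, so less data is written
    return base


print(alpha_covers_mismatch(16000, 8008, 2, 0.001))
print(controlled_ratio(2, 0.01, previous_fill=100, current_fill=90))
```

The condition on α ensures that switching to {(1/n) + α} or {(1/n) - α} is always enough to reverse the drift caused by the clock mismatch.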

【０１２７】上記では、リングメモリ７の蓄積量に基づ
いて、圧縮率を変化させているが、間引き処理が行われ
る場合に、フレーム毎に圧縮率を｛（１／ｎ）−α｝ま
たは｛（１／ｎ）＋α｝に、交互に変化させるようにし
てもよい。In the above description the compression ratio is changed based on the amount stored in the ring memory 7. Alternatively, when the thinning process is performed, the compression ratio may be switched alternately between {(1/n) - α} and {(1/n) + α} frame by frame.
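The alternating scheme can be illustrated as follows. Which of the two ratios is used for the first frame is not stated in the text, so starting with {(1/n) - α} is an assumption of this sketch, as is the function name.

```python
def alternating_ratios(n, alpha, frames):
    """Per-frame compression ratios that alternate around 1/n."""
    base = 1.0 / n
    return [base - alpha if i % 2 == 0 else base + alpha for i in range(frames)]


ratios = alternating_ratios(2, 0.01, 6)
print(ratios)
# Over an even number of frames the ratios average out to about 1/n.
print(sum(ratios) / len(ratios))
```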

【０１２８】図６および図７は、話速変換部６による処
理手順を示している。FIGS. 6 and 7 show the processing procedure performed by the speech speed conversion unit 6.

【０１２９】以下、ＶＴＲの２倍速再生時の場合の話速
変換部６による処理について、説明する。The processing performed by the speech speed conversion unit 6 in the case of double speed reproduction of a VTR will be described below.

【０１３０】（１）再生開始時の処理再生が開始されて、パワー計算部１１によって最初のフ
レームの平均パワー値Ｐが算出されると（ステップ
１）、算出された平均パワー値Ｐがしきい値Ｔｈ以上か
否かが比較部１２の出力に基づいて判別される（ステッ
プ２）。(1) Processing at the start of reproduction: When reproduction is started and the power calculator 11 has calculated the average power value P of the first frame (step 1), whether the calculated average power value P is equal to or greater than the threshold Th is determined based on the output of the comparison unit 12 (step 2).

【０１３１】入力音声信号が無音区間から開始した場
合、最初のフレームにおいては、平均パワー値Ｐはしき
い値Ｔｈより小さくなり、ステップ１１に進む。そし
て、無音区間の継続長（無音区間が継続するフレーム
数）が算出され、算出された継続長がポーズ継続長メモ
リ１７に設定されているポーズ継続長Ｔｄｅｌ以上か否
かが判別される（ステップ１２）。このポーズ継続長Ｔ
ｄｅｌは、たとえば、フレーム数にして４フレーム分の
長さに設定されている。When the input audio signal starts with a silent section, the average power value P of the first frame is smaller than the threshold Th, and the process proceeds to step 11. The duration of the silent section (the number of frames over which the silent section continues) is then calculated, and whether the calculated duration is equal to or longer than the pause duration Tdel set in the pause duration memory 17 is determined (step 12). The pause duration Tdel is set to, for example, a length of four frames.
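Steps 1, 2, 11 and 12 can be sketched as a per-frame power test followed by a run-length count. The code below assumes that the average power of a frame is the mean of its squared samples, which the text does not define; the threshold value and frame contents are hypothetical.

```python
T_DEL = 4  # pause duration in frames, following the example value in the text


def average_power(frame):
    """Mean squared sample value of one frame (assumed definition of P)."""
    return sum(s * s for s in frame) / len(frame)


def silent_run_lengths(frames, th):
    """For each frame, the number of consecutive silent frames ending at it."""
    run = 0
    runs = []
    for frame in frames:
        run = run + 1 if average_power(frame) < th else 0  # P < Th means silent
        runs.append(run)
    return runs


frames = [[0.0] * 4] * 5 + [[0.5] * 4] + [[0.0] * 4] * 2
runs = silent_run_lengths(frames, th=0.01)
print(runs)
print([r >= T_DEL for r in runs])  # True where the pause has lasted Tdel or longer
```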

【０１３２】最初のフレームに対する処理においては、
無音区間の継続長がポーズ継続長Ｔｄｅｌ未満であるの
で、リングメモリ蓄積量状態判別部１６の出力に基づい
て、リングメモリ７がアンダーフロー直前状態か否かが
判別される（ステップ１３、１４）。In the processing for the first frame, the duration of the silent section is less than the pause duration Tdel, so whether the ring memory 7 is in the state immediately before underflow is determined based on the output of the ring memory storage amount state determination unit 16 (steps 13 and 14).

【０１３３】最初のフレームに対する処理においては、
リングメモリ７は、アンダーフロー直前状態になってい
るので、フレームデータが間引き処理部２４によって圧
縮率１／２で間引きされ（ステップ２８）、間引き処理
後の圧縮データがリングメモリ７に書き込まれる。この
後、ステップ１に戻る。In the processing for the first frame, the ring memory 7 is in the state immediately before underflow, so the frame data is thinned by the thinning processing unit 24 at a compression rate of 1/2 (step 28), and the compressed data after thinning is written to the ring memory 7. The process then returns to step 1.

【０１３４】（２）第１ケースとなる処理の説明ステップ２で、平均パワー値Ｐがしきい値Ｔｈ以上であ
ると判別されたときには、今回のフレームが音声区間で
あると判断され、ステップ３に進む。ステップ３では、
前フレームが削除区間であったか否かが、第１フラグＦ
１の状態に基づいて判別される。前フレームが削除区間
でない場合には、リングメモリ蓄積量状態判別部１６の
出力に基づいて、リングメモリ７がオーバーフロー直前
状態か否かが判別される（ステップ６、７）。前フレー
ムが削除区間である場合には、ステップ４および５の処
理が行なわれた後、リングメモリ７がオーバーフロー直
前状態か否かが判別される（ステップ６、７）。ステッ
プ４および５の処理については、後述する。(2) Processing in the first case: If it is determined in step 2 that the average power value P is equal to or greater than the threshold Th, the current frame is judged to be a voice section and the process proceeds to step 3. In step 3, whether the previous frame was a deletion section is determined based on the state of the first flag F1. If the previous frame was not a deletion section, whether the ring memory 7 is in the state immediately before overflow is determined based on the output of the ring memory storage amount state determination unit 16 (steps 6 and 7). If the previous frame was a deletion section, the processes of steps 4 and 5 are performed and then whether the ring memory 7 is in the state immediately before overflow is determined (steps 6 and 7). Steps 4 and 5 are described later.

【０１３５】ステップ７において、オーバーフロー直前
状態ではないと判別された場合には、第１ケースとな
り、ピッチ圧縮伸長手段２３によって、今回のフレーム
データが圧縮伸長率調整手段４２によって決定された圧
縮率αで時間軸圧縮される（ステップ８）。圧縮データ
は、リングメモリ７に送られて書き込まれる。この後、
ステップ１に戻る。If it is determined in step 7 that the ring memory 7 is not in the state immediately before overflow, the first case applies, and the pitch compression/expansion means 23 compresses the current frame data on the time axis at the compression rate α determined by the compression/expansion rate adjusting means 42 (step 8). The compressed data is sent to the ring memory 7 and written there. The process then returns to step 1.

【０１３６】（２）第２ケースとなる処理の説明ステップ２で、平均パワー値Ｐがしきい値Ｔｈ以上であ
ると判別されたときには、今回送られてきたフレームは
音声区間であると判断され、ステップ３に進む。ステッ
プ３では、前フレームが削除区間であったか否かが、第
１フラグＦ１の状態に基づいて判別される。前フレーム
が削除区間でない場合には、リングメモリ蓄積量状態判
別部１６の出力に基づいて、リングメモリ７がオーバー
フロー直前状態か否かが判別される（ステップ６、
７）。前フレームが削除区間である場合には、ステップ
４および５の処理が行なわれた後、リングメモリ７がオ
ーバーフロー直前状態か否かが判別される（ステップ
６、７）。ステップ４および５の処理については、後述
する。(2) Processing in the second case: If it is determined in step 2 that the average power value P is equal to or greater than the threshold Th, the frame received this time is judged to be a voice section and the process proceeds to step 3. In step 3, whether the previous frame was a deletion section is determined based on the state of the first flag F1. If the previous frame was not a deletion section, whether the ring memory 7 is in the state immediately before overflow is determined based on the output of the ring memory storage amount state determination unit 16 (steps 6 and 7). If the previous frame was a deletion section, the processes of steps 4 and 5 are performed and then whether the ring memory 7 is in the state immediately before overflow is determined (steps 6 and 7). Steps 4 and 5 are described later.

【０１３７】ステップ７において、オーバーフロー直前
状態であると判別された場合には、第２ケースとなり、
リングメモリ蓄積量状態判別部１６からアンダーフロー
検出信号が出力されるまで、入力信号削除部２１によっ
て入力信号が削除される（ステップ９）。つまり、リン
グメモリ７がアンダーフロー直前状態になるまで、リン
グメモリ７への書き込みが停止される。If it is determined in step 7 that the ring memory 7 is in the state immediately before overflow, the second case applies, and the input signal deletion unit 21 deletes the input signal until the ring memory storage amount state determination unit 16 outputs the underflow detection signal (step 9). That is, writing to the ring memory 7 is stopped until the ring memory 7 reaches the state immediately before underflow.

【０１３８】そして、リングメモリ７がアンダーフロー
直前状態になると、消音挿入部２２によって、２００個
以下の所定数の消音信号”０”がリングメモリ７に書き
込まれる（ステップ１０）。そして、ステップ１に戻
る。When the ring memory 7 reaches the state immediately before underflow, the silence insertion unit 22 writes a predetermined number (200 or fewer) of silence signals "0" into the ring memory 7 (step 10). The process then returns to step 1.

【０１３９】上記ステップ１０の処理の代わりに、図９
（ａ）または図９（ｂ）に示すような処理を行なっても
よい。図９（ａ）に示す方法について説明すると、ステ
ップ７でオーバーフロー直前状態と判別されたときか
ら、たとえば、２００個の入力信号に対する波形Ａに対
しては、１から０へ直線的に向かう重み（重み関数Ｋ
１）をつけて波形Ａ’を得る。また、アンダーフロー直
前から２００個前までの２００個の入力信号に対する波
形Ｂに対しては０から１に向かう重み（重み関数Ｋ２）
をつけて、波形Ｂ’を得る。Instead of the processing in step 10 described above, the processing shown in FIG. 9A or FIG. 9B may be performed. In the method shown in FIG. 9A, waveform A, consisting of, for example, the 200 input signals that follow the point at which step 7 determined the state immediately before overflow, is given a weight that goes linearly from 1 to 0 (weight function K1) to obtain waveform A'. Waveform B, consisting of the 200 input signals from 200 signals before the state immediately before underflow up to that state, is given a weight that goes from 0 to 1 (weight function K2) to obtain waveform B'.

【０１４０】そして、得られた２つの波形Ａ’および
Ｂ’を加え合わせて、２００個分の長さの波形Ａ’＊
Ｂ’を作成する。そして、この波形Ａ’＊Ｂ’に対する
２００個の信号をリングメモリ７に書き込む。なお、ア
ンダーフロー直前から２００個前の時点の検出は、アッ
プダウンカウンタ９のカウント値に基づいて行なわれ
る。これにより、音声削除区間の前後の音声信号の繋ぎ
目にクリック音が発生するのを、効果的に防止できる。The two waveforms A' and B' thus obtained are added together to create a waveform A'*B' with a length of 200 signals, and the 200 signals of this waveform A'*B' are written to the ring memory 7. The point 200 signals before the state immediately before underflow is detected based on the count value of the up/down counter 9. This effectively prevents a click sound from occurring at the joint between the audio signals before and after the audio deletion section.

【０１４１】図９（ｂ）に示す方法について説明する
と、ステップ７でオーバーフロー直前状態と判別された
ときから、たとえば、１００個の入力信号に対する波形
Ａに対しては、１から０へ直線的に向かう重み（重み関
数Ｋ１）をつけて波形Ａ’を得る。また、アンダーフロ
ー直前から１００個前までの１００個の入力信号に対す
る波形Ｂに対しては０から１に向かう重み（重み関数Ｋ
２）をつけて、波形Ｂ’を得る。そして、得られた２つ
の波形Ａ’およびＢ’を繋ぎ合わせた２００個分の信号
をリングメモリ７に書き込む。In the method shown in FIG. 9B, waveform A, consisting of, for example, the 100 input signals that follow the point at which step 7 determined the state immediately before overflow, is given a weight that goes linearly from 1 to 0 (weight function K1) to obtain waveform A'. Waveform B, consisting of the 100 input signals from 100 signals before the state immediately before underflow up to that state, is given a weight that goes from 0 to 1 (weight function K2) to obtain waveform B'. The 200 signals obtained by joining the two waveforms A' and B' end to end are then written to the ring memory 7.
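The FIG. 9B variant fades out the start of the deleted span and fades in its end, then joins the two pieces instead of adding them. The sketch below treats the deleted input as a single Python list and assumes linear weights that reach exactly 0 and 1; the function names and the `half` parameter are illustrative.

```python
def fade_out(samples):
    """Apply weight K1, going linearly from 1 to 0."""
    denom = max(len(samples) - 1, 1)
    return [s * (1 - i / denom) for i, s in enumerate(samples)]


def fade_in(samples):
    """Apply weight K2, going linearly from 0 to 1."""
    denom = max(len(samples) - 1, 1)
    return [s * (i / denom) for i, s in enumerate(samples)]


def bridge_deleted_span(deleted, half=100):
    """Replace a deleted span by its faded-out head joined to its faded-in tail."""
    if len(deleted) < 2 * half:
        raise ValueError("deleted span is shorter than the bridge")
    head = deleted[:half]    # waveform A: first samples after near-overflow
    tail = deleted[-half:]   # waveform B: last samples before near-underflow
    return fade_out(head) + fade_in(tail)


bridge = bridge_deleted_span([1.0] * 1000)
print(len(bridge), bridge[0], bridge[99], bridge[100], bridge[199])
```

The bridge passes through zero in the middle, so it behaves like a short, smoothed version of the silence that step 10 would otherwise insert.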

【０１４２】上記ステップ９では、オーバーフロー直前
状態であると判別された場合には、リングメモリ蓄積量
状態判別部１６からアンダーフロー検出信号が出力され
るまで、入力信号削除部２１によって入力信号が削除さ
れているが、リングメモリ７に蓄積されているデータ
を、リングメモリ７がアンダーフロー直前状態になるよ
うに、削除するようにしてもよい。In step 9 described above, when the state immediately before overflow is detected, the input signal deletion unit 21 deletes the input signal until the ring memory storage amount state determination unit 16 outputs the underflow detection signal. Alternatively, the data already stored in the ring memory 7 may be deleted so that the ring memory 7 is brought to the state immediately before underflow.

【０１４３】具体的には、リングメモリ７の書込開始ア
ドレスを、図１８（ａ）に示すオーバーフロー直前状態
の時のアドレス（Ｃ地点）から、図１８（ｂ）に示すよ
うにリングメモリ７がアンダーフロー直前状態となるア
ドレス（Ａ地点）までジャンプさせる。したがって、ス
テップ９の処理では、Ａ地点からＣ地点までのアドレス
に蓄積されていたデータが削除されることになる。この
後、図１８（ｃ）に示すように、ステップ１０によって
消音信号が書き込まれた後、入力データが書き込まれて
いく。Specifically, the write start address of the ring memory 7 is made to jump from the address in the state immediately before overflow shown in FIG. 18A (point C) to the address at which the ring memory 7 is in the state immediately before underflow (point A), as shown in FIG. 18B. In the process of step 9, the data stored at the addresses from point A to point C is therefore deleted. After that, as shown in FIG. 18C, the mute signal is written in step 10 and then the input data is written.
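The address jump of FIG. 18 can be modelled with a small ring buffer class. This is a sketch under stated assumptions: the class name, the threshold parameters, and the use of a `fill` counter in the role of the up/down counter 9 are all hypothetical, and the buffer size is chosen only for the example.

```python
class RingMemory:
    def __init__(self, size, near_underflow, near_overflow):
        self.buf = [0.0] * size
        self.size = size
        self.read_pos = 0
        self.write_pos = 0
        self.fill = 0  # number of stored samples (role of the up/down counter)
        self.near_underflow = near_underflow
        self.near_overflow = near_overflow

    def write(self, sample):
        if self.fill >= self.size:
            raise OverflowError("ring memory full")
        self.buf[self.write_pos] = sample
        self.write_pos = (self.write_pos + 1) % self.size
        self.fill += 1

    def read(self):
        if self.fill == 0:
            raise IndexError("ring memory empty")
        sample = self.buf[self.read_pos]
        self.read_pos = (self.read_pos + 1) % self.size
        self.fill -= 1
        return sample

    def drop_to_near_underflow(self):
        """Jump the write address from point C back to point A."""
        dropped = max(self.fill - self.near_underflow, 0)
        self.write_pos = (self.write_pos - dropped) % self.size
        self.fill -= dropped
        return dropped  # number of stored samples discarded (A to C)


rm = RingMemory(size=16, near_underflow=2, near_overflow=12)
for v in range(12):
    rm.write(float(v))
print(rm.fill >= rm.near_overflow, rm.drop_to_near_underflow(), rm.fill)
```

Only the newest stored samples are discarded; the oldest samples, which are about to be read out, stay in place.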

【０１４４】ステップ９において、上記のように、リン
グメモリ７に蓄積されているデータをリングメモリ７が
アンダーフロー直前状態になるように削除した場合、ス
テップ１０で消音信号をリングメモリ７に書き込む代わ
りに図１９（ａ）または図１９（ｂ）のような処理を行
ってもよい。If, in step 9, the data stored in the ring memory 7 is deleted as described above so that the ring memory 7 is in the state immediately before underflow, the processing shown in FIG. 19A or FIG. 19B may be performed in step 10 instead of writing the mute signal to the ring memory 7.

【０１４５】今、リングメモリ７の書込開始アドレス
が、図１８（ａ）に示すオーバーフロー直前状態の時の
アドレス（Ｃ地点）から、図１８（ｂ）に示すようにリ
ングメモリ７がアンダーフロー直前状態となるアドレス
（Ａ地点）までジャンプしたとする。このＡ地点から所
定数、例えば２００先のアドレス（図１９（ａ）のＢ地
点）までに蓄積されているデータＳに対しては、図１９
（ａ）に示すように、１から０へ直線的に向かう重み
（重み関数Ｋ１）をつけて波形Ｓ’を得る。また、それ
以後にリングメモリ７に書き込まれる２００個分の入力
データ（波形Ｔ）に対しては、図１９（ａ）に示すよう
に、０から１に向かう重み（重み関数Ｋ２）をつけて、
波形Ｔ’を得る。Suppose that the write start address of the ring memory 7 has jumped from the address in the state immediately before overflow shown in FIG. 18A (point C) to the address at which the ring memory 7 is in the state immediately before underflow (point A), as shown in FIG. 18B. As shown in FIG. 19A, the data S stored from point A up to an address a predetermined number ahead, for example 200 (point B in FIG. 19A), is given a weight that goes linearly from 1 to 0 (weight function K1) to obtain waveform S'. Also as shown in FIG. 19A, the 200 pieces of input data written to the ring memory 7 thereafter (waveform T) are given a weight that goes from 0 to 1 (weight function K2) to obtain waveform T'.

【０１４６】そして、得られた２つの波形Ｓ’および
Ｔ’を加え合わせて、２００個分の長さの波形Ｓ’＊
Ｔ’を作成する。そして、この波形Ｓ’＊Ｔ’に対する
２００個の信号をＡ地点からリングメモリ７に書き込
む。これにより、蓄積データ削除区間の前後の音声信号
の繋ぎ目にクリック音が発生するのを、効果的に防止で
きる。The two waveforms S' and T' thus obtained are added together to create a waveform S'*T' with a length of 200 signals, and the 200 signals of this waveform S'*T' are written to the ring memory 7 starting from point A. This effectively prevents a click sound from occurring at the joint between the audio signals before and after the stored-data deletion section.

【０１４７】図１９（ｂ）に示す方法について説明する
と、図１８（ｂ）のＡ地点から所定数、例えば１００個
先のアドレス（図１９（ｂ）のＢ地点）までに蓄積され
ているデータＳに対しては、１から０へ直線的に向かう
重み（重み関数Ｋ１）をつけて波形Ｓ’を得る。また、
それ以後にリングメモリ７に書き込まれる１００個分の
入力データ（波形Ｔ）に対しては、０から１に向かう重
み（重み関数Ｋ２）をつけて、波形Ｔ’を得る。そし
て、得られた２つの波形Ｓ’およびＴ’を繋ぎ合わせた
２００個分の信号をＡ地点からリングメモリ７に書き込
む。In the method shown in FIG. 19B, the data S stored from point A in FIG. 18B up to an address a predetermined number ahead, for example 100 (point B in FIG. 19B), is given a weight that goes linearly from 1 to 0 (weight function K1) to obtain waveform S'. The 100 pieces of input data written to the ring memory 7 thereafter (waveform T) are given a weight that goes from 0 to 1 (weight function K2) to obtain waveform T'. The 200 signals obtained by joining the two waveforms S' and T' end to end are then written to the ring memory 7 starting from point A.

【０１４８】（３）第３ケースとなる処理の説明ステップ２で平均パワー値Ｐがしきい値Ｔｈより小さい
と判別されたときには、今回までの無音区間の継続長が
算出され（ステップ１１）、算出された継続長がポーズ
継続長メモリ１７に設定されているポーズ継続長Ｔｄｅ
ｌ以上か否かが判別される（ステップ１２）。そして、
無音区間の継続長がポーズ継続長Ｔｄｅｌ未満であると
判別された場合には、リングメモリ蓄積量状態判別部１
６の出力に基づいて、アンダーフロー直前状態か否かが
判別される（ステップ１３、１４）。(3) Processing in the third case: If it is determined in step 2 that the average power value P is smaller than the threshold Th, the duration of the silent section up to the current frame is calculated (step 11), and whether the calculated duration is equal to or longer than the pause duration Tdel set in the pause duration memory 17 is determined (step 12). If the duration of the silent section is determined to be less than the pause duration Tdel, whether the ring memory 7 is in the state immediately before underflow is determined based on the output of the ring memory storage amount state determination unit 16 (steps 13 and 14).

【０１４９】リングメモリ７がアンダーフロー直前状態
になっていないときには、リングメモリ蓄積量状態判別
部１６の出力に基づいて、オーバーフロー直前状態か否
かが判別される（ステップ６、７）。オーバーフロー直
前状態でない場合には、第３ケースとなり、ピッチ圧縮
伸長手段２３によって、今回のフレームデータが圧縮伸
長率調整手段４２によって決定された圧縮率αで時間軸
圧縮される（ステップ８）。圧縮データは、リングメモ
リ７に送られて書き込まれる。この後、ステップ１に戻
る。If the ring memory 7 is not in the state immediately before the underflow, it is determined whether or not it is in the state immediately before the overflow based on the output of the ring memory storage amount state determination section 16 (steps 6 and 7). If it is not the state immediately before the overflow, the third case occurs, and the pitch compression / expansion means 23 compresses the current frame data on the time axis at the compression rate α determined by the compression / expansion rate adjusting means 42 (step 8). The compressed data is sent to and written to the ring memory 7. Thereafter, the process returns to step 1.

【０１５０】（４）第４ケースとなる処理の説明ステップ２で平均パワー値Ｐがしきい値Ｔｈより小さい
と判別されたときには、今回までの無音区間の継続長が
算出され（ステップ１１）、算出された継続長がポーズ
継続長メモリ１７に設定されているポーズ継続長Ｔｄｅ
ｌ以上か否かが判別される（ステップ１２）。そして、
無音区間の継続長がポーズ継続長Ｔｄｅｌ未満であると
判別された場合には、リングメモリ蓄積量状態判別部１
６の出力に基づいて、アンダーフロー直前状態か否かが
判別される（ステップ１３、１４）。(4) Processing in the fourth case: If it is determined in step 2 that the average power value P is smaller than the threshold Th, the duration of the silent section up to the current frame is calculated (step 11), and whether the calculated duration is equal to or longer than the pause duration Tdel set in the pause duration memory 17 is determined (step 12). If the duration of the silent section is determined to be less than the pause duration Tdel, whether the ring memory 7 is in the state immediately before underflow is determined based on the output of the ring memory storage amount state determination unit 16 (steps 13 and 14).

【０１５１】リングメモリ７がアンダーフロー直前状態
になっていないときには、リングメモリ蓄積量状態判別
部１６の出力に基づいて、オーバーフロー直前状態か否
かが判別される（ステップ６、７）。オーバーフロー直
前状態である場合には、第４ケースとなり、リングメモ
リ蓄積量状態判別部１６からアンダーフロー検出信号が
出力されるまで、入力信号削除部２１によって入力信号
が削除される（ステップ９）。つまり、リングメモリ７
がアンダーフロー直前状態になるまで、リングメモリ７
への書き込みが中断される。If the ring memory 7 is not in the state immediately before underflow, whether it is in the state immediately before overflow is determined based on the output of the ring memory storage amount state determination unit 16 (steps 6 and 7). If it is in the state immediately before overflow, the fourth case applies, and the input signal deletion unit 21 deletes the input signal until the ring memory storage amount state determination unit 16 outputs the underflow detection signal (step 9). That is, writing to the ring memory 7 is suspended until the ring memory 7 reaches the state immediately before underflow.

【０１５２】そして、リングメモリ７がアンダーフロー
直前状態になると、消音挿入部２２によって、２００個
以下の所定数の消音信号”０”がリングメモリ７に書き
込まれる（ステップ１０）。そして、ステップ１に戻
る。When the ring memory 7 reaches the state immediately before underflow, the silence insertion unit 22 writes a predetermined number (200 or fewer) of silence signals "0" into the ring memory 7 (step 10). The process then returns to step 1.

【０１５３】（５）第５ケースとなる処理の説明ステップ２で平均パワー値Ｐがしきい値Ｔｈより小さい
と判別されたときには、今回までの無音区間の継続長が
算出され（ステップ１１）、算出された継続長がポーズ
継続長メモリ１７に設定されているポーズ継続長Ｔｄｅ
ｌ以上か否かが判別される（ステップ１２）。そして、
無音区間の継続長がポーズ継続長Ｔｄｅｌ以上であると
判別された場合には、リングメモリ蓄積量状態判別部１
６の出力に基づいて、アンダーフロー直前状態か否かが
判別される（ステップ１５、１６）。(5) Processing in the fifth case: If it is determined in step 2 that the average power value P is smaller than the threshold Th, the duration of the silent section up to the current frame is calculated (step 11), and whether the calculated duration is equal to or longer than the pause duration Tdel set in the pause duration memory 17 is determined (step 12). If the duration of the silent section is determined to be equal to or longer than the pause duration Tdel, whether the ring memory 7 is in the state immediately before underflow is determined based on the output of the ring memory storage amount state determination unit 16 (steps 15 and 16).

【０１５４】リングメモリ７がアンダーフロー直前状態
でないときには、第５ケースとなり、今回のフレームが
入力信号削除部２５による削除区間であることを示す第
１フラグＦ１がセットされる（ステップ１７）。この第
１フラグＦ１は、電源投入時の初期設定において、リセ
ット（Ｆ１＝０）されている。そして、今回のフレーム
が入力信号削除部２５による削除区間の最初のフレーム
であるか否かを示す第２フラグＦ２がリセットされてい
るか否かが判別される（ステップ１８）。When the ring memory 7 is not in the state immediately before the underflow, the fifth case is set, and the first flag F1 indicating that the current frame is a deletion section by the input signal deletion section 25 is set (step 17). The first flag F1 has been reset (F1 = 0) in the initial setting when the power is turned on. Then, it is determined whether or not the second flag F2 indicating whether or not the current frame is the first frame of the deletion section by the input signal deletion unit 25 has been reset (step 18).

【０１５５】この第２フラグＦ２は、電源投入時の初期
設定において、リセット（Ｆ２＝０）されている。そし
て、入力信号削除部２５による削除区間の最初のフレー
ムに対する処理が終了したときにセット（Ｆ２＝１）に
される。そして、入力信号削除部２５による一連の削除
区間に対する処理が終了したときにリセット（Ｆ２＝
０）される。The second flag F2 is reset (F2 = 0) in the initial setting at power-on. It is set (F2 = 1) when the processing for the first frame of a deletion section by the input signal deletion unit 25 is completed, and it is reset (F2 = 0) when the processing for the whole series of deletion sections by the input signal deletion unit 25 is completed.

【０１５６】したがって、今回のフレームが入力信号削
除部２５による削除区間の最初のフレームであるときに
は、第２フラグＦ２は、リセット（Ｆ２＝０）されてい
る。第２フラグＦ２がリセットされているときには、波
形合成挿入部２６によって第１メモリ３１に今回のフレ
ームデータが記憶される（ステップ１９）。また、入力
信号削除部２５によって今回のフレームデータのリング
メモリ７への書き込みが停止される（ステップ２０）。
つまり、今回のフレームデータが削除される。そして、
第２フラグＦ２がセット（Ｆ２＝１）された後（ステッ
プ２１）、ステップ１に戻る。Therefore, when the current frame is the first frame of a deletion section by the input signal deletion unit 25, the second flag F2 is reset (F2 = 0). When the second flag F2 is reset, the waveform synthesis insertion unit 26 stores the current frame data in the first memory 31 (step 19), and the input signal deletion unit 25 stops the writing of the current frame data to the ring memory 7 (step 20). That is, the current frame data is deleted. After the second flag F2 is set (F2 = 1) (step 21), the process returns to step 1.

【０１５７】さらに、無音区間が続いている場合には、
ステップ２、１１、１２、１５を通ってステップ１６に
移り、リングメモリ蓄積量状態判別部１６の出力に基づ
いて、リングメモリ７がアンダーフロー直前状態か否か
が判別される。If the silent section continues further, the process passes through steps 2, 11, 12, and 15 to step 16, where whether the ring memory 7 is in the state immediately before underflow is determined based on the output of the ring memory storage amount state determination unit 16.

【０１５８】リングメモリ７がアンダーフロー直前状態
でないときには、今回のフレームが入力信号削除部２５
による削除区間であることを示す第１フラグＦ１がセッ
トされる（ステップ１７）。そして、今回のフレームが
入力信号削除部２５による削除区間の最初フレームであ
るか否かを示す第２フラグＦ２がリセットされているか
否かが判別される（ステップ１８）。When the ring memory 7 is not in the state immediately before underflow, the first flag F1, which indicates that the current frame is a deletion section by the input signal deletion unit 25, is set (step 17). Whether the second flag F2, which indicates whether the current frame is the first frame of the deletion section by the input signal deletion unit 25, is reset is then determined (step 18).

【０１５９】この場合には、第２フラグＦ２はセット
（Ｆ２＝１）されているので、今回のフレームが入力信
号削除部２５による削除区間の最初のフレームでないと
判断される。この場合には、波形合成挿入部２６によっ
て第２メモリ３２に今回のフレームデータが記憶される
（ステップ２２）。また、入力信号削除部２５によって
今回のフレームデータのリングメモリ７への書き込みが
停止される（ステップ２３）。そして、ステップ１に戻
る。In this case, since the second flag F2 is set (F2 = 1), it is determined that the current frame is not the first frame of the deletion section by the input signal deletion unit 25. In this case, the current frame data is stored in the second memory 32 by the waveform synthesis insertion unit 26 (step 22). Further, the writing of the current frame data to the ring memory 7 is stopped by the input signal deleting unit 25 (step 23). Then, the process returns to step 1.

【０１６０】そして、さらに、無音区間が続きかつリン
グメモリ７がアンダーフロー直前状態となっていないと
きには、ステップ２、１１、１２、１５、１６、１７、
１８、２２および２３の処理が繰り返される。つまり、
第２メモリ３２のフレームデータが更新されるととも
に、フレームデータのリングメモリ７への書き込みが停
止される。When the silent section continues still further and the ring memory 7 is not in the state immediately before underflow, the processing of steps 2, 11, 12, 15, 16, 17, 18, 22, and 23 is repeated. That is, the frame data in the second memory 32 is updated and the writing of frame data to the ring memory 7 is stopped.

【０１６１】この後、音声区間のフレームデータが入力
されたときには、ステップ２において、平均パワー値Ｐ
がしきい値Ｔｈ以上となるので、前フレームが入力信号
削除部２５による削除区間であったか否かが、第１フラ
グＦ１状態に基づいて判別される（ステップ３）。この
場合には、第１フラグＦ１がセット（Ｆ１＝１）されて
いるので、前フレームが入力信号削除部２５による削除
区間であったと判別され、ステップ４に移る。ステップ
４では、入力信号削除部２５による削除処理が停止せし
められるとともに、波形合成挿入部２６による波形合成
挿入処理が行なわれる。Thereafter, when frame data of a voice section is input, the average power value P is equal to or greater than the threshold Th in step 2, so whether the previous frame was a deletion section by the input signal deletion unit 25 is determined based on the state of the first flag F1 (step 3). In this case the first flag F1 is set (F1 = 1), so the previous frame is determined to have been a deletion section by the input signal deletion unit 25, and the process proceeds to step 4. In step 4, the deletion process by the input signal deletion unit 25 is stopped and the waveform synthesis and insertion process by the waveform synthesis insertion unit 26 is performed.

【０１６２】すなわち、図４（ａ）を用いて既に説明し
たように、第１メモリ３１の内容に１から０に直線的に
変化する関数が乗算され、第２メモリ３２の内容に０か
ら１に直線的に変化する関数が乗算され、これらの両乗
算結果が加え合わされる。この加算結果（図４（ａ）の
Ａ’＊Ｂ’に相当する。）が、デマルチプレクサ２７を
介して、リングメモリ７に送られ、リングメモリ７に書
き込まれる。That is, as described above with reference to FIG. 4A, the content of the first memory 31 is multiplied by a function that changes linearly from 1 to 0, and the content of the second memory 32 is multiplied by 0 to 1. Is multiplied by a function that changes linearly, and the results of both multiplications are added. The result of this addition (corresponding to A ′ * B ′ in FIG. 4A) is sent to the ring memory 7 via the demultiplexer 27 and written into the ring memory 7.

【０１６３】この後、第１フラグＦ１および第２フラグ
Ｆ２がリセット（Ｆ１＝Ｆ２＝０）される（ステップ
５）。そして、ステップ６に進む。Thereafter, the first flag F1 and the second flag F2 are reset (F1 = F2 = 0) (step 5). Then, the process proceeds to Step 6.
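The interplay of the flags F1 and F2 with the first memory 31 and second memory 32 (steps 17 to 23, then steps 4 and 5) can be sketched as a small state holder. This is an illustrative reading of the flowchart, not the patent's implementation. In particular, the text does not say what happens when only one frame was deleted before speech resumes; reusing the first frame as the second waveform in that situation is an assumption of this sketch, as is clearing both memories on resume.

```python
def crossfade(a, b):
    """Weight a from 1 to 0 and b from 0 to 1, then add them sample by sample."""
    n = len(a)
    if len(b) != n:
        raise ValueError("frames must have equal length")
    denom = max(n - 1, 1)
    return [a[i] * (1 - i / denom) + b[i] * (i / denom) for i in range(n)]


class PauseDeleter:
    def __init__(self):
        self.f1 = False     # F1: previous frame belonged to a deletion section
        self.f2 = False     # F2: first frame of the section already handled
        self.first = None   # first memory 31
        self.second = None  # second memory 32

    def delete_frame(self, frame):
        """Steps 17-23: keep the frame aside instead of writing it out."""
        self.f1 = True
        if not self.f2:
            self.first = list(frame)   # step 19: first deleted frame
            self.f2 = True             # step 21
        else:
            self.second = list(frame)  # step 22: overwritten by each later frame

    def resume(self):
        """Steps 4-5: return the frame to write to the ring memory, reset flags."""
        if not self.f1:
            return []
        # Assumption: with a single deleted frame, reuse it as the second waveform.
        tail = self.second if self.second is not None else self.first
        joined = crossfade(self.first, tail)
        self.f1 = self.f2 = False
        self.first = self.second = None
        return joined


pd = PauseDeleter()
for frame in ([1.0] * 5, [9.0] * 5, [0.0] * 5):
    pd.delete_frame(frame)
print(pd.resume())  # cross-fade of the first and the last deleted frame
```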

【０１６４】ところで、連続している無音区間に対し
て、上記のような入力信号削除部２５による削除処理が
繰り返し行なわれている場合において、リングメモリ７
がアンダーフロー直前状態になることがある。この場合
には、上記ステップ１６でＹＥＳとなり、ステップ２４
に移る。ステップ２４では、前フレームが入力信号削除
部２５による削除区間であったか否かが、第１フラグＦ
１の状態に基づいて判別される。Incidentally, while the deletion process by the input signal deletion unit 25 described above is being repeated over a continuing silent section, the ring memory 7 may reach the state immediately before underflow. In this case the result of step 16 is YES and the process proceeds to step 24. In step 24, whether the previous frame was a deletion section by the input signal deletion unit 25 is determined based on the state of the first flag F1.

【０１６５】この場合には、第１フラグＦ１がセット
（Ｆ１＝１）されているので、ステップ２５に進み、第
２メモリ３２に今回のフレームデータが記憶される。そ
して、入力信号削除部２５による削除処理が停止せしめ
られるとともに、波形合成挿入部２６による波形合成挿
入処理が行なわれる（ステップ２６）。そして、第１フ
ラグＦ１および第２フラグＦ２がリセット（Ｆ１＝Ｆ２
＝０）された後（ステップ２７）、ステップ１に進む。In this case the first flag F1 is set (F1 = 1), so the process proceeds to step 25, where the current frame data is stored in the second memory 32. The deletion process by the input signal deletion unit 25 is then stopped and the waveform synthesis and insertion process by the waveform synthesis insertion unit 26 is performed (step 26). After the first flag F1 and the second flag F2 are reset (F1 = F2 = 0) (step 27), the process proceeds to step 1.

【０１６６】上記ステップ２６における波形合成挿入部
２６による波形合成挿入処理には、上記ステップ４で説
明した波形合成挿入処理とほぼ同様であるが、第２メモ
リ３２に記憶されているフレームデータが、リングメモ
リ７がアンダーフロー直前状態になった後のフレームデ
ータである点が、上記ステップ４で説明した処理の場合
と異なっている。The waveform synthesis and insertion process performed by the waveform synthesis insertion unit 26 in step 26 is almost the same as the one described for step 4. It differs in that the frame data stored in the second memory 32 is frame data from after the ring memory 7 reached the state immediately before underflow.

【０１６７】なお、上記ステップ２５の処理を省略し、
ステップ２４でＹＥＳとなった場合に、第２メモリ３２
に今回のフレームデータを記憶させることなく、ステッ
プ２６に移るようにしてもよい。この場合には、ステッ
プ２６で行なわれる波形合成挿入処理においては、上記
ステップ４で説明した波形合成挿入処理と同様に、第２
メモリ３２に記憶されているアンダーフロー直前状態よ
り前のフレームデータ（前回のフレームデータ）が用い
られる。Alternatively, the processing of step 25 may be omitted so that, when the result of step 24 is YES, the process proceeds to step 26 without storing the current frame data in the second memory 32. In that case, the waveform synthesis and insertion process performed in step 26 uses the frame data from before the state immediately before underflow (the previous frame data) stored in the second memory 32, as in the waveform synthesis and insertion process described for step 4.

【０１６８】また、上記ステップ２２の処理を省略する
とともに上記ステップ３と上記ステップ４との間に、フ
レームデータを第２メモリ３２に記憶させるステップを
追加するようにしてもよい。この場合には、ステップ４
においては、上記ステップ１９において第１メモリ３１
に記憶された内容と、上記ステップ３と上記ステップ４
との間に追加されたステップにおいて第２メモリ３２に
記憶された内容とに基づいて、波形合成挿入処理が行わ
れる。Alternatively, the processing of step 22 may be omitted and a step of storing the frame data in the second memory 32 may be added between step 3 and step 4. In that case, the waveform synthesis and insertion process in step 4 is performed based on the content stored in the first memory 31 in step 19 and the content stored in the second memory 32 in the step added between step 3 and step 4.

【０１６９】（６）第６ケースとなる処理の説明ステップ２で平均パワー値Ｐがしきい値Ｔｈより小さい
と判別されたときには、今回までの無音区間の継続長が
算出され（ステップ１１）、算出された継続長がポーズ
継続長メモリ１７に設定されているポーズ継続長Ｔｄｅ
ｌ以上か否かが判別される（ステップ１２）。そして、
無音区間の継続長がポーズ継続長Ｔｄｅｌ以上であると
判別された場合には、リングメモリ蓄積量状態判別部１
６の出力に基づいて、アンダーフロー直前状態か否かが
判別される（ステップ１５、１６）。(6) Description of the process in the sixth case: When step 2 determines that the average power value P is smaller than the threshold Th, the duration of the silent section up to the current frame is calculated (step 11), and it is determined whether the calculated duration is equal to or longer than the pause duration Tdel set in the pause duration memory 17 (step 12). If the duration of the silent section is equal to or longer than the pause duration Tdel, whether the ring memory 7 is in the state immediately before underflow is determined based on the output of the ring memory storage amount state determination unit 16 (steps 15 and 16).

【０１７０】リングメモリ７がアンダーフロー直前状態
であるときには、前フレームが入力信号削除部２５によ
る削除区間であったか否かが、第１フラグＦ１の状態に
基づいて判別される（ステップ２４）。第１フラグＦ１
がリセットされている場合（Ｆ１＝０）、すなわち、前
フレームが入力信号削除部２５による削除区間でなかっ
た場合には、第６ケースとなり、ステップ２８に移る。
ステップ２８では、間引き処理部２４によって、今回の
フレームデータが圧縮率１／２で間引き処理が行なわれ
る。そして、間引き処理されたデータは、リングメモリ
７に送られて書き込まれる。この後、ステップ１に戻
る。When the ring memory 7 is in the state immediately before underflow, whether the previous frame was a deletion section of the input signal deletion unit 25 is determined from the state of the first flag F1 (step 24). If the first flag F1 is reset (F1 = 0), that is, if the previous frame was not a deletion section of the input signal deletion unit 25, this is the sixth case and the process proceeds to step 28. In step 28, the thinning processing unit 24 thins out the current frame data at a compression ratio of 1/2. The thinned data is sent to the ring memory 7 and written there. The process then returns to step 1.

【０１７１】つまり、無音区間の継続長がポーズ継続長
Ｔｄｅｌ以上であっても、リングメモリ７がアンダーフ
ロー直前状態であり、かつ前フレームが入力信号削除部
２５による削除区間でない場合には、フレームデータは
削除されず、圧縮率１／２で間引き処理が行なわれた
後、リングメモリ７に書き込まれる。That is, even when the duration of the silent section is equal to or longer than the pause duration Tdel, if the ring memory 7 is in the state immediately before underflow and the previous frame was not a deletion section of the input signal deletion unit 25, the frame data is not deleted; it is thinned out at a compression ratio of 1/2 and then written to the ring memory 7.
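The branch taken for a silent frame whose pause has already reached Tdel can be sketched as follows. This is a minimal illustration only: the function names and the returned labels are hypothetical, the "delete" branch for the non-underflow case is inferred from the later walkthrough rather than stated here, and keeping every other sample is just one possible way to thin a frame at a ratio of 1/2.

```python
def long_pause_action(near_underflow: bool, prev_frame_deleted: bool) -> str:
    """Choose the action for a silent frame whose pause length is >= Tdel.

    near_underflow     -- ring memory is in the state immediately before underflow
    prev_frame_deleted -- first flag F1 is set (previous frame was deleted)
    """
    if not near_underflow:
        # Assumed: ordinary pause deletion by the input signal deletion unit.
        return "delete"
    if prev_frame_deleted:
        # F1 = 1: stop deleting and insert a synthesized waveform (steps 25-27).
        return "stop_deletion_and_insert_synthesized_waveform"
    # F1 = 0: the sixth case, thin the frame at ratio 1/2 (step 28).
    return "thin_half"


def thin_half(frame):
    """Thin a frame at compression ratio 1/2 by keeping every other sample
    (an assumed decimation scheme, not taken from the source)."""
    return frame[::2]
```

Thinning rather than deleting keeps data flowing into the ring memory, which is what the sixth case needs when the buffer is nearly empty.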

【０１７２】図７においては、ステップ１２において、
無音区間の継続長が設定されたポーズ継続長Ｔｄｅｌよ
り長いか否かが判別されているが、図８のステップ１２
Ａに示すように、無音区間の継続長Ｔが設定された第１
基準長Ｔ１未満か（Ｔ＜Ｔ１）、無音区間の継続長Ｔが
設定された第１基準長Ｔ１以上で設定された第２基準長
Ｔ２（ただしＴ１＜Ｔ２）未満か（Ｔ１≦Ｔ＜Ｔ２）、
または無音区間の継続長Ｔが設定された第２基準長Ｔ２
以上か（Ｔ≧Ｔ２）を、判別するようにしてもよい。第
１基準長としては、たとえば、４フレーム分の長さが、
第２基準長としてはたとえば４０フレーム分の長さが設
定される。In FIG. 7, step 12 determines whether the duration of the silent section is longer than the set pause duration Tdel. Alternatively, as shown in step 12A of FIG. 8, it may be determined whether the duration T of the silent section is less than a set first reference length T1 (T < T1), is equal to or longer than T1 and less than a set second reference length T2, where T1 < T2 (T1 ≦ T < T2), or is equal to or longer than T2 (T ≧ T2). The first reference length is set to, for example, the length of 4 frames, and the second reference length to, for example, the length of 40 frames.

【０１７３】そして、図８に示すように、各判別結果に
応じて、次のようなステップに進むようにしてもよい。
すなわち、無音区間の継続長Ｔが設定された第１基準長
Ｔ１未満（Ｔ＜Ｔ１）である場合には、ステップ１３に
進む。無音区間の継続長Ｔが設定された第１基準長Ｔ１
以上で設定された第２基準長Ｔ２（Ｔ１＜Ｔ２）未満
（Ｔ１≦Ｔ＜Ｔ２）であるときには、ステップ２８に進
んで１／ｎ間引き処理による間引きを行なう。無音区間
の継続長Ｔが設定された第２基準長Ｔ２以上（Ｔ≧Ｔ
２）であるときには、ステップ１５に進む。Then, as shown in FIG. 8, the process may proceed to the following steps according to each determination result. When the duration T of the silent section is less than the set first reference length T1 (T < T1), the process proceeds to step 13. When T is equal to or longer than T1 and less than the set second reference length T2 (T1 ≦ T < T2), the process proceeds to step 28, where thinning is performed by the 1/n thinning process. When T is equal to or longer than T2 (T ≧ T2), the process proceeds to step 15.
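The three-way test of step 12A can be sketched as below, using the example reference lengths of 4 and 40 frames given in the text. The function and constant names are hypothetical, and the silence duration is assumed to be counted in frames.

```python
T1_FRAMES = 4    # example first reference length from the text
T2_FRAMES = 40   # example second reference length from the text


def next_step(silence_frames: int, t1: int = T1_FRAMES, t2: int = T2_FRAMES) -> int:
    """Return the flowchart step to run next for a silent run of the given length."""
    if not t1 < t2:
        raise ValueError("T1 must be smaller than T2")
    if silence_frames < t1:
        return 13    # T < T1
    if silence_frames < t2:
        return 28    # T1 <= T < T2: 1/n thinning
    return 15        # T >= T2: check the ring memory state
```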

【０１７４】図１０は、２倍速再生時の入力信号と出力
信号との関係を示し、特に無音区間の入力信号が削除さ
れる様子を示している。図１１および図１２は、リング
メモリ７へのデータ書き込み開始点、リングメモリ７か
らのデータ読み出し開始点ならびに図１０の各点Ａ〜Ｈ
におけるリングメモリ７の状態を示している。FIG. 10 shows the relationship between the input signal and the output signal during double-speed playback, and in particular shows how the input signal in silent sections is deleted. FIGS. 11 and 12 show the state of the ring memory 7 at the start of data writing to the ring memory 7, at the start of data reading from the ring memory 7, and at each of points A to H in FIG. 10.

【０１７５】２倍速再生開始時においては、入力信号は
無音区間となっており、かつリングメモリ７は空状態で
あるので（図１１（ａ）参照）、フレームデータが間引
き処理部２４によって圧縮率１／２で間引かれた後、リ
ングメモリ７に書き込まれていく。At the start of double-speed playback, the input signal is in a silent section and the ring memory 7 is empty (see FIG. 11(a)), so the frame data is thinned out by the thinning processing unit 24 at a compression ratio of 1/2 and then written to the ring memory 7.

【０１７６】そして、リングメモリ７の蓄積量Ｔｍがア
ンダーフロー検出用データＴｍｉｎに達すると、リング
メモリ７からのデータの読み出しが開始される（図１１
（ｂ）参照）。When the accumulated amount Tm of the ring memory 7 reaches the underflow detection data Tmin, reading of data from the ring memory 7 starts (see FIG. 11(b)).

【０１７７】そして、入力信号の音声区間ａに対するフ
レームデータが送られてくると（Ａ点）、フレームデー
タは、圧縮伸長率調整手段４２によって決定された１／
２以上の圧縮率αで、ピッチ圧縮伸長手段２３により圧
縮される。入力信号と出力信号との長さが一致する圧縮
率１／２の圧縮を基準とすると、圧縮率αが１／２以外
のときにはフレームデータが伸長される。この意味で、
図１０には、伸長処理と記載されている。そして、この
圧縮データがリングメモリ７に書き込まれる。Ａ点にお
いては、図１１（ｃ）に示すように、蓄積量ＴｍＡは、
Ｔｍｉｎのままである。When the frame data for voice section a of the input signal arrives (point A), the frame data is compressed by the pitch compression/expansion means 23 at a compression ratio α of 1/2 or more, as determined by the compression/expansion ratio adjusting means 42. Taking compression at a ratio of 1/2, at which the input and output signals have the same length, as the reference, the frame data is in effect expanded whenever α is other than 1/2. It is in this sense that FIG. 10 labels the processing as expansion. The compressed data is then written to the ring memory 7. At point A, as shown in FIG. 11(c), the accumulated amount TmA remains Tmin.

【０１７８】入力信号の音声区間ａに対する出力信号ａ
１は、Ａ点での蓄積量ＴｍＡ分だけ遅れて読み出されて
いく。そして、入力信号の音声区間ａが入力され終わっ
た時点（Ｂ点）では、図１１（ｄ）に示すように、今回
の圧縮区間の開始点であるＡ点での蓄積量Ｔｍｉｎと、
Ａ点からＢ点までの音声区間ａの圧縮データの、圧縮率
１／２の圧縮に対する伸長分ＳｔＢとの和がリングメモ
リ７の蓄積量ＴｍＢ（＝ＳｔＢ＋Ｔｍｉｎ）となる。し
たがって、入力信号の音声区間ａに対する出力信号ａ１
は、Ｂ点からＴｍＢ（＝ＳｔＢ＋Ｔｍｉｎ）分が経過し
た点で出力され終わる。The output signal a1 for voice section a of the input signal is read out with a delay equal to the accumulated amount TmA at point A. At the point when voice section a of the input signal has finished being input (point B), as shown in FIG. 11(d), the accumulated amount TmB of the ring memory 7 is the sum of the accumulated amount Tmin at point A, the start of the current compression section, and the expansion amount StB of the compressed data of voice section a from point A to point B relative to compression at a ratio of 1/2 (TmB = StB + Tmin). Therefore, the output signal a1 for voice section a of the input signal finishes being output when TmB (= StB + Tmin) has elapsed from point B.
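The bookkeeping behind Tm = St + Tmin can be illustrated with a small calculation. The source does not give a formula for the expansion amount St; the sketch below assumes St = (α − 1/2) × L for an input run of L samples, on the reasoning that at double speed a ratio of exactly 1/2 writes data as fast as it is read. The function name and the sample-count units are hypothetical.

```python
def accumulated_after(input_len: int, alpha: float, t_min: float) -> float:
    """Ring memory fill after compressing input_len samples at ratio alpha.

    Assumed model: alpha * input_len samples are written while
    0.5 * input_len samples are read out, starting from a fill of t_min.
    """
    if alpha < 0.5:
        raise ValueError("the text only considers alpha >= 1/2")
    stretch = (alpha - 0.5) * input_len   # expansion relative to 1/2 compression
    return t_min + stretch
```

Under this model a ratio of 1/2 leaves the fill at Tmin, and any larger ratio makes the fill grow in proportion to the length of input processed.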

【０１７９】入力信号の音声区間ａに続くポーズ継続長
Ｔｄｅｌ未満の無音区間のフレームデータも、ピッチ圧
縮伸長手段２３によって１／２以上の圧縮率αで圧縮さ
れる。この無音区間に続いて音声区間ｂが入力される
と、この音声区間ｂのフレームデータもピッチ圧縮伸長
手段２３によって１／２以上の圧縮率αで圧縮される。The frame data in a silent section shorter than the pause duration Tdel following the audio section a of the input signal is also compressed by the pitch compression / expansion means 23 at a compression rate α of 1/2 or more. When a voice section b is input following the silent section, the frame data of the voice section b is also compressed by the pitch compression / expansion means 23 at a compression rate α of 1/2 or more.

【０１８０】そして、入力信号の音声区間ｂが入力され
終わった時点（Ｃ点）では、図１１（ｅ）に示すよう
に、今回の圧縮区間の開始点であるＡ点での蓄積量Ｔｍ
ｉｎと、Ａ点からＣ点までの入力信号に対応する圧縮デ
ータの、１／２圧縮に対する伸長分ＳｔＣとの和がリン
グメモリ７の蓄積量ＴｍＣ（＝ＳｔＣ＋Ｔｍｉｎ）とな
る。したがって、入力信号の音声区間ｂに対する出力信
号ｂ１は、Ｃ点からＴｍＣ（＝ＳｔＣ＋Ｔｍｉｎ）分が
経過した点で出力され終わる。At the point when voice section b of the input signal has finished being input (point C), as shown in FIG. 11(e), the accumulated amount TmC of the ring memory 7 is the sum of the accumulated amount Tmin at point A, the start of the current compression section, and the expansion amount StC of the compressed data corresponding to the input signal from point A to point C relative to 1/2 compression (TmC = StC + Tmin). Therefore, the output signal b1 for voice section b of the input signal finishes being output when TmC (= StC + Tmin) has elapsed from point C.

【０１８１】入力信号の音声区間ｂに続いて、ポーズ継
続長Ｔｄｅｌ以上の長さの無音区間の信号が送られてき
たときには、ポーズ継続長Ｔｄｅｌに達するまで（Ｄ
点）はフレームデータが、ピッチ圧縮伸長手段２３によ
って１／２以上の圧縮率αで圧縮される。When voice section b of the input signal is followed by a silent section whose length is equal to or longer than the pause duration Tdel, the frame data is compressed by the pitch compression/expansion means 23 at a compression ratio α of 1/2 or more until the pause duration Tdel is reached (point D).

【０１８２】Ｄ点では、図１１（ｆ）に示すように、今
回の圧縮区間の開始点であるＡ点での蓄積量Ｔｍｉｎ
と、Ａ点からＤ点までの入力信号に対応する圧縮データ
の、１／２圧縮に対する伸長分ＳｔＤとの和がリングメ
モリ７の蓄積量ＴｍＤ（＝ＳｔＤ＋Ｔｍｉｎ）となる。
したがって、入力信号の音声区間ｂとＤ点との間の無音
区間に対する出力信号は、Ｄ点からＴｍＤ（＝ＳｔＤ＋
Ｔｍｉｎ）分が経過した点で出力され終わる。At point D, as shown in FIG. 11(f), the accumulated amount TmD of the ring memory 7 is the sum of the accumulated amount Tmin at point A, the start of the current compression section, and the expansion amount StD of the compressed data corresponding to the input signal from point A to point D relative to 1/2 compression (TmD = StD + Tmin). Therefore, the output signal for the silent section between voice section b of the input signal and point D finishes being output when TmD (= StD + Tmin) has elapsed from point D.

【０１８３】ポーズ継続長Ｔｄｅｌ以降の無音区間のフ
レームデータは、リングメモリ７の蓄積量がアンダーフ
ロー検出用データＴｍｉｎ以下になるまで、入力信号削
除部２５によって削除される。このポーズ削除部分の長
さＳｔｄは、今回の圧縮区間の開始点であるＡ点からＤ
点までの入力信号に対応する圧縮データの、１／２圧縮
に対する伸長分ＳｔＤと等しくなる。入力信号削除部２
５によって削除処理が行なわれた後においては、波形合
成挿入部２６によってクリック音防止のための合成波形
が挿入されるが、図１０には挿入された合成波形部分を
省略してある。The frame data of the silent section after the pause duration Tdel is deleted by the input signal deletion unit 25 until the accumulated amount of the ring memory 7 becomes equal to or less than the underflow detection data Tmin. The length Std of this deleted pause portion equals the expansion amount StD of the compressed data corresponding to the input signal from point A, the start of the current compression section, to point D relative to 1/2 compression. After the deletion by the input signal deletion unit 25, the waveform synthesis insertion unit 26 inserts a synthesized waveform to prevent click noise, but the inserted synthesized waveform is omitted from FIG. 10.

【０１８４】入力信号が削除された区間の最終点（Ｅ
点）においては、図１２（ｇ）に示すように、リングメ
モリ７の蓄積量ＴｍＥは、アンダーフロー検出用データ
Ｔｍｉｎ以下となる。ここでは、蓄積量ＴｍＥがアンダ
ーフロー検出用データＴｍｉｎに等しくなった例を示し
ている。At the last point of the section in which the input signal was deleted (point E), as shown in FIG. 12(g), the accumulated amount TmE of the ring memory 7 is equal to or less than the underflow detection data Tmin. The example here shows the case where TmE has become equal to Tmin.

【０１８５】Ｅ点からの無音区間に対するフレームデー
タは、間引き処理部２４によって、圧縮率１／２で間引
かれた後、フレームメモリ７に書き込まれる。そして、
音声区間ｃの信号が入力さると（Ｆ点）、この音声区間
ｃのフレームデータがピッチ圧縮伸長手段２３によっ
て、１／２以上の圧縮率αで圧縮される。つまり、新た
な圧縮区間が開始される。そして、圧縮データがリング
メモリ７に書き込まれる。The frame data for the silent section from point E is thinned out by the thinning processing unit 24 at a compression ratio of 1/2 and then written to the frame memory 7. When the signal of voice section c is input (point F), the frame data of voice section c is compressed by the pitch compression/expansion means 23 at a compression ratio α of 1/2 or more. That is, a new compression section starts. The compressed data is then written to the ring memory 7.

【０１８６】Ｆ点では、図１２（ｈ）に示すように、リ
ングメモリ７の蓄積量ＴｍＦは、Ｅ点のときと同じＴｍ
ｉｎとなっている。At point F, as shown in FIG. 12(h), the accumulated amount TmF of the ring memory 7 is Tmin, the same as at point E.

【０１８７】入力信号の音声区間ｃに対する出力信号ｃ
１は、Ｆ点での蓄積量Ｔｍｉｎ分だけ遅れて出力されて
いく。入力信号の音声区間ｃに続くポーズ継続長Ｔｄｅ
ｌ未満の無音区間（音声区間ｃからＧ点までの無音区
間）のフレームデータも、ピッチ圧縮伸長手段２３によ
って１／２以上の圧縮率αで圧縮される。The output signal c1 for voice section c of the input signal is output with a delay equal to the accumulated amount Tmin at point F. The frame data of the silent section shorter than the pause duration Tdel that follows voice section c (the silent section from voice section c to point G) is also compressed by the pitch compression/expansion means 23 at a compression ratio α of 1/2 or more.

【０１８８】Ｇ点では、図１２（ｉ）に示すように、今
回の圧縮区間の開始点であるＦ点での蓄積量Ｔｍｉｎ
と、Ｆ点からＧ点までの入力信号に対応する圧縮データ
の、１／２圧縮に対する伸長分ＳｔＧとの和がリングメ
モリ７の蓄積量ＴｍＧ（＝ＳｔＧ＋Ｔｍｉｎ）となる。
したがって、入力信号の音声区間ｃからＧ点までの無音
区間に対する出力信号は、Ｇ点からＴｍＧ（＝ＳｔＧ＋
Ｔｍｉｎ）分が経過した点で出力され終わる。At point G, as shown in FIG. 12(i), the accumulated amount TmG of the ring memory 7 is the sum of the accumulated amount Tmin at point F, the start of the current compression section, and the expansion amount StG of the compressed data corresponding to the input signal from point F to point G relative to 1/2 compression (TmG = StG + Tmin). Therefore, the output signal for the silent section from voice section c of the input signal to point G finishes being output when TmG (= StG + Tmin) has elapsed from point G.

【０１８９】ポーズ継続長Ｔｄｅｌ以降の無音区間のフ
レームデータは、リングメモリ７の蓄積量がアンダーフ
ロー検出用データＴｍｉｎになるまで、入力信号削除部
２５によって削除される。このポーズ削除部分の長さＳ
ｔｄは、今回の圧縮区間の開始点であるＦ点からＧ点ま
での入力信号に対応する圧縮データの、１／２圧縮に対
する伸長分ＳｔＧと等しくなる。The frame data of the silent section after the pause duration Tdel is deleted by the input signal deletion unit 25 until the accumulated amount of the ring memory 7 reaches the underflow detection data Tmin. The length Std of this deleted pause portion equals the expansion amount StG of the compressed data corresponding to the input signal from point F, the start of the current compression section, to point G relative to 1/2 compression.

【０１９０】入力信号が削除された区間の最終点（Ｈ
点）においては、図１２（ｊ）に示すように、リングメ
モリ７の蓄積量ＴｍＨは、アンダーフロー検出用データ
Ｔｍｉｎ以下となる。ここでは、蓄積量ＴｍＨがアンダ
ーフロー検出用データＴｍｉｎに等しくなった例を示し
ている。At the last point of the section in which the input signal was deleted (point H), as shown in FIG. 12(j), the accumulated amount TmH of the ring memory 7 is equal to or less than the underflow detection data Tmin. The example here shows the case where TmH has become equal to Tmin.

【０１９１】Ｈ点からの無音区間に対するフレームデー
タは、間引き処理部２４によって、圧縮率１／２で間引
かれた後、フレームメモリ７に書き込まれる。そして、
音声区間ｄの信号が入力されると、この音声区間ｄのフ
レームデータがピッチ圧縮伸長手段２３によって、１／
２以上の圧縮率αで圧縮される。そして、伸長されたデ
ータがリングメモリ７に書き込まれる。The frame data for the silent section from point H is thinned out by the thinning processing unit 24 at a compression ratio of 1/2 and then written to the frame memory 7. When the signal of voice section d is input, the frame data of voice section d is compressed by the pitch compression/expansion means 23 at a compression ratio α of 1/2 or more. The expanded data is then written to the ring memory 7.

【０１９２】図１３は、２倍速再生時の入力信号と出力
信号との関係を示し、特にオーバーフロー直前状態とな
ったときに、入力信号が削除される様子を示している。
図１４は、図１３の各点Ｓ〜Ｕにおけるリングメモリ７
の状態を示している。FIG. 13 shows the relationship between the input signal and the output signal during double-speed playback, and in particular shows how the input signal is deleted when the state immediately before overflow is reached. FIG. 14 shows the state of the ring memory 7 at each of points S to U in FIG. 13.

【０１９３】ある時点からＴ点までの、音声区間ａ、
ｂ、ｃ等と無音区間とを含む一連の入力信号に対するフ
レームデータが、ピッチ圧縮伸長手段２３によって１／
２以上の圧縮率αで圧縮され（圧縮率αが１／２以外の
ときには、圧縮率１／２の圧縮に対しては伸長され）て
いるとする。この場合には、リングメモリ７に伸長分が
蓄積されていく。Suppose that the frame data for a series of input signals from some point up to point T, containing voice sections a, b, c, and so on together with silent sections, is compressed by the pitch compression/expansion means 23 at a compression ratio α of 1/2 or more (when α is other than 1/2, the data is expanded relative to compression at a ratio of 1/2). In this case, the expansion amount accumulates in the ring memory 7.

【０１９４】音声区間ｂの入力開始点（Ｓ点）において
は、図１４（ａ）に示すように、当該１連の入力信号の
圧縮処理の開始点での蓄積量Ｔｍｉｎと、上記圧縮処理
の開始点からＳ点までの入力信号に対応する圧縮データ
の、１／２圧縮に対する伸長分ＳｔＳとの和がリングメ
モリ７の蓄積量ＴｍＳ（＝ＳｔＳ＋Ｔｍｉｎ）となる。
したがって、音声区間ｂに対する出力信号ｂ１は、Ｓ点
からＴｍＳ（＝ＳｔＳ＋Ｔｍｉｎ）分が経過した点で出
力され始められる。At the input start point of voice section b (point S), as shown in FIG. 14(a), the accumulated amount TmS of the ring memory 7 is the sum of the accumulated amount Tmin at the start of compression processing for the series of input signals and the expansion amount StS of the compressed data corresponding to the input signal from that start point to point S relative to 1/2 compression (TmS = StS + Tmin). Therefore, the output signal b1 for voice section b starts being output when TmS (= StS + Tmin) has elapsed from point S.

【０１９５】音声区間ｃの入力信号に対応する圧縮デー
タがリングメモリ７に書き込まれた時点（Ｔ点）におい
て、リングメモリ７がオーバーフロー直前状態になった
とする。すなわち、Ｔ点において、リングメモリ７の蓄
積量がオーバーフロー検出用データＴｍａｘ以上になっ
たとする。It is assumed that the ring memory 7 is in the state immediately before the overflow at the time (point T) when the compressed data corresponding to the input signal of the voice section c is written into the ring memory 7. That is, it is assumed that, at the point T, the accumulated amount of the ring memory 7 is equal to or larger than the overflow detection data Tmax.

【０１９６】Ｔ点においては、図１４（ｂ）に示すよう
に、当該１連の入力信号に対する圧縮処理の開始点での
蓄積量Ｔｍｉｎと、上記圧縮処理開始点からＴ点までの
入力信号に対応する圧縮データの、１／２圧縮に対する
伸長分ＳｔＴとの和がリングメモリ７の蓄積量ＴｍＴ
（＝ＳｔＴ＋Ｔｍｉｎ）となる。言い換えれば、リング
メモリ７の全ワード数をＴＯＴＡＬとし、オーバーフロ
ー検出用データをＴｍａｘとし、ＴＯＴＡＬとＴｍａｘ
との差をＤｍｉｎとすると、Ｔ点での蓄積量Ｔｍｔは、
Ｔｍａｘに等しいので、ＴＯＴＡＬ−Ｄｍｉｎとなる。At point T, as shown in FIG. 14(b), the accumulated amount TmT of the ring memory 7 is the sum of the accumulated amount Tmin at the start of compression processing for the series of input signals and the expansion amount StT of the compressed data corresponding to the input signal from that start point to point T relative to 1/2 compression (TmT = StT + Tmin). In other words, letting TOTAL be the total number of words in the ring memory 7, Tmax be the overflow detection data, and Dmin be the difference between TOTAL and Tmax, the accumulated amount TmT at point T is equal to Tmax and is therefore TOTAL − Dmin.

【０１９７】したがって、当該１連の入力信号に対する
出力信号は、Ｔ点から蓄積量ＴｍＴ（＝ＳｔＴ＋Ｔｍｉ
ｎ）分遅れた時点で出力され終わる。Therefore, the output signal for the series of input signals finishes being output at a time delayed from point T by the accumulated amount TmT (= StT + Tmin).

【０１９８】Ｔ点において、リングメモリ７がオーバー
フロー直前状態になると、それ以後の入力信号に対して
は、リングメモリ７がアンダーフロー直前状態になるま
で、入力信号削除部２１によって無条件に削除される。
入力信号削除部２１によって削除処理が行なわれた後に
おいては、消音挿入部２２によって消音が挿入される
が、図１３には挿入された消音部分を省略してある。リ
ングメモリ７がオーバーフロー直前状態になった後（Ｔ
点）、フレームデータが削除されていき、図１４（ｃ）
に示すようにＵ点でリングメモリ７がアンダーフロー直
前状態（蓄積量ＴｍＵ＝Ｔｍｉｎ）になったとする。こ
の場合には、Ｔ点からＵ点までの４つの無音区間および
３つの音声区間ｄ、ｅ、ｆからなる入力信号が削除され
る。したがって、Ｔ点からＵ点までの入力信号は、出力
信号としては現れない。At point T, once the ring memory 7 enters the state immediately before overflow, all subsequent input signals are deleted unconditionally by the input signal deletion unit 21 until the ring memory 7 enters the state immediately before underflow. After the deletion by the input signal deletion unit 21, silence is inserted by the silence insertion unit 22, but the inserted silence is omitted from FIG. 13. Suppose that, after the ring memory 7 enters the state immediately before overflow (point T), frame data continues to be deleted and the ring memory 7 reaches the state immediately before underflow (accumulated amount TmU = Tmin) at point U, as shown in FIG. 14(c). In this case, the input signal from point T to point U, consisting of four silent sections and three voice sections d, e, and f, is deleted. The input signal from point T to point U therefore does not appear in the output signal.
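The overflow handling described above behaves like a two-threshold (hysteresis) gate: dropping starts when the fill reaches Tmax and continues until it falls back to Tmin. A minimal sketch follows; the class and method names are hypothetical, and the silence insertion performed after deletion is not modelled.

```python
class OverflowGate:
    """Tracks whether incoming frames should be dropped unconditionally."""

    def __init__(self, t_min: int, t_max: int):
        self.t_min = t_min        # underflow detection data Tmin
        self.t_max = t_max        # overflow detection data Tmax
        self.dropping = False

    def should_drop(self, stored: int) -> bool:
        """Update the state from the current fill level and report it."""
        if self.dropping:
            if stored <= self.t_min:      # state immediately before underflow
                self.dropping = False
        elif stored >= self.t_max:        # state immediately before overflow
            self.dropping = True
        return self.dropping
```

Using two thresholds instead of one avoids toggling between dropping and writing on every frame while the fill hovers near Tmax.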

【０１９９】Ｕ点の後に音声区間ｇの信号が入力される
と、この音声区間に対するフレームデータは、ピッチ圧
縮伸長手段２３によって１／２以上の圧縮率αで圧縮さ
れ（圧縮率αが１／２以外のときには、圧縮率１／２の
圧縮に対しては伸長され）た後、リングメモリ７に書き
込まれていく。音声区間ｇに対する出力信号ｇは、Ｕ点
でのリングメモリ７の蓄積量Ｔｍｉｎ分だけ遅れて出力
され始められる。When the signal of voice section g is input after point U, the frame data for this voice section is compressed by the pitch compression/expansion means 23 at a compression ratio α of 1/2 or more (when α is other than 1/2, the data is expanded relative to compression at a ratio of 1/2) and then written to the ring memory 7. The output signal g for voice section g starts being output with a delay equal to the accumulated amount Tmin of the ring memory 7 at point U.

【０２００】上記実施例では、入力信号の音声区間と無
音区間とを、各フレームの平均パワー値に基づいて判別
しているが、各フレームの平均振幅に基づいて判別する
ようにしてもよい。この場合には、図１５に示すよう
に、図２のパワー計算部１１の代わりにフレーム単位で
平均振幅値を計算する平均振幅計算部１１Ａが設けら
れ、しきい値メモリ１３Ａには、たとえば、値２⁶のし
きい値が設定される。そして、平均振幅計算部１１Ａに
よって計算された平均振幅値と、しきい値メモリ１３Ａ
のしきい値とが、比較部１２Ａによって比較されること
により、音声区間か無音区間かが判別される。In the above embodiment, the voice and silent sections of the input signal are discriminated based on the average power value of each frame, but they may instead be discriminated based on the average amplitude of each frame. In this case, as shown in FIG. 15, an average amplitude calculation unit 11A, which calculates the average amplitude value for each frame, is provided in place of the power calculation unit 11 of FIG. 2, and a threshold of, for example, 2⁶ is set in the threshold memory 13A. The comparison unit 12A compares the average amplitude value calculated by the average amplitude calculation unit 11A with the threshold in the threshold memory 13A to determine whether the frame is a voice section or a silent section.

【０２０１】つまり、平均振幅値がしきい値以上であれ
ば音声区間と判別され、平均振幅値がしきい値未満であ
れば無音区間と判別される。フレーム単位の平均振幅値
Ｗは、サンプリングされた１フレーム内の各音声信号の
振幅をｉ₀、ｉ₁、…ｉ_N-1（ただし、Ｎ＝２００）と
すると、次の数式３に基づいて算出される。That is, a frame is determined to be a voice section if its average amplitude value is equal to or greater than the threshold, and a silent section if it is less than the threshold. The average amplitude value W of a frame is calculated by the following Equation 3, where i₀, i₁, …, i_(N−1) (N = 200) are the amplitudes of the sampled audio signal within one frame.

【０２０２】[0202]

【数３】 (Equation 3)

【０２０３】その他の処理については、図２の話速変換
部６による処理と同じであるので、その説明を省略す
る。The other processing is the same as the processing performed by the speech speed conversion unit 6 in FIG. 2, and a description thereof will be omitted.
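The average-amplitude test can be sketched as follows. Equation 3 is not reproduced in this text, so the sketch assumes the average amplitude is the mean of the absolute sample values over the frame; the function names are hypothetical, and the default threshold is the example value 2⁶ from the text.

```python
def mean_amplitude(frame):
    """Assumed form of W: mean absolute sample value over one frame."""
    return sum(abs(s) for s in frame) / len(frame)


def is_voice(frame, threshold=2 ** 6):
    """Voice section if W >= threshold, silent section otherwise."""
    return mean_amplitude(frame) >= threshold
```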

【０２０４】なお、この場合においても、次のようにし
て、しきい値を変更するようにしてもよい。すなわち、
図１５に点線で示すように、平均振幅定常状態検出およ
びしきい値更新部１４Ａを設ける。平均振幅定常状態検
出およびしきい値更新部１４Ａは、平均振幅計算部１１
Ａからの平均振幅値Ｗが、所定フレーム数にわたって一
定であったか否かを判別し、一定であったときには（定
常状態）、そのときの平均振幅値Ｗの２倍の値をしきい
値メモリ１３Ａに書き込み、しきい値を更新させる。た
だし、更新されるしきい値の最大値は、所定値、たとえ
ば２⁸に制限される。In this case as well, the threshold may be changed as follows. As shown by the dotted line in FIG. 15, an average amplitude steady state detection and threshold update unit 14A is provided. This unit determines whether the average amplitude value W from the average amplitude calculation unit 11A has been constant over a predetermined number of frames; if it has (steady state), it writes twice that average amplitude value W into the threshold memory 13A, updating the threshold. The updated threshold is, however, limited to a predetermined maximum, for example 2⁸.
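The threshold update rule can be sketched as below. The doubling and the cap of 2⁸ come from the text; the number of frames that counts as a steady state (`hold_frames`) is not given in the source and is a hypothetical value here, as is the use of exact equality to mean "constant".

```python
def updated_threshold(recent_w, current, hold_frames=10, cap=2 ** 8):
    """Return the threshold to store after seeing the latest average amplitudes.

    recent_w    -- average amplitude values W, oldest first
    current     -- threshold currently held in the threshold memory
    hold_frames -- assumed number of frames W must stay constant
    """
    if len(recent_w) < hold_frames:
        return current
    window = recent_w[-hold_frames:]
    if max(window) != min(window):      # W varied: not a steady state
        return current
    return min(2 * window[-1], cap)     # twice W, limited to the cap
```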

【０２０５】また、入力信号の音声区間と無音区間と
を、次の数式４で示す各フレームの音声信号の振幅累積
値Ｗａと所与のしきい値とに基づいて判別するようにし
てもよい。Further, the speech section and the silence section of the input signal may be determined based on the cumulative amplitude value Wa of the speech signal of each frame and the given threshold value as shown in the following Expression 4. .

【０２０６】[0206]

【数４】 (Equation 4)

【０２０７】また、入力信号の音声区間と無音区間と
を、各フレームの信号の周期性を検出し、検出した周期
が予め定められた音声信号のピッチ周期範囲内であれ
ば、音声区間であると判別し、検出した周期が予め定め
られた音声信号のピッチ周期範囲外であれば無音区間で
あると判別するようにしてもよい。Alternatively, the voice and silent sections of the input signal may be discriminated by detecting the periodicity of the signal in each frame: if the detected period is within a predetermined pitch period range for speech signals, the frame is determined to be a voice section, and if it is outside that range, the frame is determined to be a silent section.

【０２０８】この場合には、図１６に示すように、図２
のパワー計算部１１の代わりに、自己相関法に基づい
て、フレームごとの周期性を検出するピッチ周期検出部
１１Ｂが設けられ、しきい値メモリ１３Ｂには、音声信
号のピッチ周期範囲が設定される。そして、ピッチ周期
検出部１１Ｂで検出された周期と、しきい値メモリ１３
Ｂに設定された音声信号のピッチ周期範囲とが、比較部
１２Ｂによって比較される。In this case, as shown in FIG. 16, a pitch period detection unit 11B, which detects the periodicity of each frame by the autocorrelation method, is provided in place of the power calculation unit 11 of FIG. 2, and the pitch period range for speech signals is set in the threshold memory 13B. The comparison unit 12B compares the period detected by the pitch period detection unit 11B with the pitch period range set in the threshold memory 13B.

【０２０９】設定される音声信号のピッチ周期範囲は、
再生速度により異なり、ｎ倍速再生のときには、たとえ
ば、６６×ｎ（Ｈｚ）〜３２０×ｎ（Ｈｚ）の範囲に設
定される。したがって、２倍速再生時には、音声信号の
ピッチ周期範囲は、１３２Ｈｚ〜６４０Ｈｚの範囲に設
定される。その他の処理については、図２の話速変換部
６による処理と同じであるので、その説明を省略する。The pitch period range set for speech signals depends on the playback speed; for n-times-speed playback it is set to, for example, 66 × n Hz to 320 × n Hz. For double-speed playback, the range is therefore 132 Hz to 640 Hz. The rest of the processing is the same as that of the speech speed conversion unit 6 in FIG. 2, so its description is omitted.
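A periodicity check of this kind can be sketched with a plain autocorrelation search. This is an illustrative variant, not the patent's circuit: it searches only the lags that fall inside the allowed pitch range and requires a minimum normalized correlation, rather than detecting a period first and comparing it afterwards. The 8000 Hz sampling rate is taken from elsewhere in this description; `min_corr` and all function names are hypothetical.

```python
import math


def pitch_range_hz(speed):
    """Example pitch range from the text for n-times-speed playback."""
    return 66.0 * speed, 320.0 * speed


def pitch_hz(frame, fs=8000, min_hz=132.0, max_hz=640.0, min_corr=0.5):
    """Return the frequency of the strongest in-range period, or None."""
    n = len(frame)
    energy = sum(s * s for s in frame)
    if energy == 0:
        return None                          # all-zero frame: no periodicity
    lo = max(1, math.ceil(fs / max_hz))      # shortest lag to test
    hi = min(n - 1, math.floor(fs / min_hz)) # longest lag to test
    best_lag, best = None, 0.0
    for lag in range(lo, hi + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
        if r > best:
            best_lag, best = lag, r
    if best_lag is None or best < min_corr:
        return None                          # treat as a silent section
    return fs / best_lag


def is_voice_by_pitch(frame, speed=2):
    low, high = pitch_range_hz(speed)
    return pitch_hz(frame, min_hz=low, max_hz=high) is not None
```

With 200-sample frames at 8000 Hz, the double-speed range of 132 Hz to 640 Hz corresponds to lags of 13 to 60 samples, so the search stays well inside one frame.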

【０２１０】また、入力信号の音声区間と無音区間と
を、各フレームの信号のパワースペクトルと、定常状態
のパワースペクトルと比較することにより、判別するよ
うにしてもよい。Alternatively, the voice and silent sections of the input signal may be discriminated by comparing the power spectrum of the signal in each frame with the steady-state power spectrum.

【０２１１】この場合には、図２０に示すように、図２
のパワー計算部１１の代わりに、フレームごとに所定の
１または複数の周波数帯域に対するパワースペクトルを
算出するパワースペクトル算出部１１Ｃが設けられる。
また、上記所定の１または複数の周波数帯域に対する定
常状態のパワースペクトルがパワースペクトル記憶部１
３Ｃに記憶されている。In this case, as shown in FIG. 20, a power spectrum calculation unit 11C, which calculates for each frame the power spectrum for one or more predetermined frequency bands, is provided in place of the power calculation unit 11 of FIG. 2. The steady-state power spectrum for those frequency bands is stored in the power spectrum storage unit 13C.

【０２１２】パワースペクトル記憶部１３Ｃの内容は、
パワースペクトル算出部１１Ｃによって算出されたパワ
ースペクトルの変化状態に基づいて、パワースペクトル
定常状態検出部１４Ｂが定常状態であることを検出した
ときには、検出された定常状態でのパワースペクトルに
更新される。When the power spectrum steady state detection unit 14B detects a steady state, based on how the power spectrum calculated by the power spectrum calculation unit 11C changes, the content of the power spectrum storage unit 13C is updated to the power spectrum of the detected steady state.

【０２１３】入力信号がパワースペクトル算出部１１Ｃ
に送られてくると、フレームごとに所定の１または複数
の周波数帯域に対するパワースペクトルが算出される。
そして、算出されたパワースペクトルと、パワースペク
トル記憶部１３Ｃに記憶されている定常状態のパワース
ペクトルとが比較部１２Ｃによって比較される。When the input signal is sent to the power spectrum calculation unit 11C, the power spectrum for the one or more predetermined frequency bands is calculated for each frame. The comparison unit 12C then compares the calculated power spectrum with the steady-state power spectrum stored in the power spectrum storage unit 13C.

【０２１４】算出されたパワースペクトルが定常状態の
パワースペクトルに対して、変動していれば、そのフレ
ームは音声区間と判別される。逆に、算出されたパワー
スペクトルが定常状態のパワースペクトルに対して、変
動していなければ、そのフレームは無音区間と判別され
る。If the calculated power spectrum fluctuates with respect to the steady-state power spectrum, the frame is determined to be a voice section. Conversely, if the calculated power spectrum does not fluctuate from the steady-state power spectrum, the frame is determined to be a silent section.

【０２１５】具体的には、パワースペクトル記憶部１３
Ｃには、上記所定の１または複数の周波数帯域に対する
定常状態のパワースペクトルに基づいて、上記所定の１
または複数の周波数帯域に対するしきい値が記憶され
る。そして、パワースペクトル記憶部１３Ｃに記憶され
ている。パワースペクトル算出部１１Ｃによって算出さ
れた上記所定の１または複数の周波数帯域に対するパワ
ースペクトルと、パワースペクトル記憶部１３Ｃに記憶
されている対応するしきい値とが比較されることによ
り、入力信号が音声区間か無音区間かが判別される。Specifically, the power spectrum storage unit 13C stores thresholds for the predetermined one or more frequency bands, derived from the steady-state power spectrum for those bands. The power spectrum for those bands calculated by the power spectrum calculation unit 11C is compared with the corresponding thresholds stored in the power spectrum storage unit 13C to determine whether the input signal is a voice section or a silent section.

【０２１６】たとえば、定常状態のパワースペクトルが
図２１の（ａ）に示されているように、雑音のみのパワ
ースペクトルであるとする。また、雑音が含まれていな
い音声のパワースペクトルが図２１の（ｂ）に示されて
いるものとする。定常状態において、図２１（ａ）のパ
ワースペクトルで示される雑音が存在する場合に、図２
１（ｂ）で示すパワースペクトルを持つ音声信号が入力
すると、そのパワースペクトルは、図２１（ｃ）に示さ
れるように、両者のパワースペクトルが合成されたもの
となる。For example, suppose that the steady-state power spectrum is a noise-only power spectrum, as shown in FIG. 21(a), and that the power spectrum of speech containing no noise is as shown in FIG. 21(b). If a speech signal with the power spectrum of FIG. 21(b) is input while the noise with the power spectrum of FIG. 21(a) is present in the steady state, the resulting power spectrum is the combination of the two, as shown in FIG. 21(c).

【０２１７】したがって、たとえば、定常状態のパワー
スペクトルにおいてパワーが比較的小さい周波数帯域ｆ
ａおよびｆｂに対するパワーは、音声区間のパワースペ
クトルにおいては大幅に増加する。つまり、定常状態の
パワースペクトルにおいてパワーが比較的小さい１また
は複数の周波数帯域における定常状態のパワーと、入力
信号のパワースペクトルの上記１または複数の周波数帯
域におけるパワーとを比較することにより、入力信号が
音声区間か無音区間かを判別することができる。Therefore, for example, the power in frequency bands fa and fb, where the steady-state power spectrum has relatively small power, increases greatly in the power spectrum of a voice section. In other words, whether the input signal is a voice section or a silent section can be determined by comparing the steady-state power in one or more frequency bands where the steady-state power spectrum is relatively small with the power of the input signal's power spectrum in those same bands.
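The band comparison can be sketched as below, taking the per-band powers as already computed. The source says only that thresholds are derived from the steady-state spectrum; the multiplicative margin `ratio` and the rule that one exceeding band is enough are assumptions made for this illustration, and the function name is hypothetical.

```python
def is_voice_by_bands(frame_band_power, steady_band_power, ratio=4.0):
    """Voice section if any monitored band clearly exceeds its steady-state power.

    frame_band_power  -- power of the current frame in each monitored band
    steady_band_power -- stored steady-state power in the same bands
    ratio             -- assumed margin that turns steady power into a threshold
    """
    return any(p > ratio * q for p, q in zip(frame_band_power, steady_band_power))
```

Monitoring bands where the steady-state noise is weak, such as fa and fb in the text, keeps the thresholds low and makes the jump caused by speech easier to detect.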

【０２１８】なお、定常状態の雑音が高い周波数帯域の
雑音であると判明している場合には、雑音の影響の少な
い低い周波数帯域（例えば、４ＫＨｚ以下の周波数帯
域）に対するパワースペクトルを算出し、算出されたパ
ワースペクトルが所定のしきい値以上か否かによって、
入力信号が音声区間か無音区間かを判別することもでき
る。If the steady-state noise is known to lie in a high frequency band, the power spectrum may be calculated for a low frequency band that is little affected by the noise (for example, 4 kHz or below), and whether the input signal is a voice section or a silent section may be determined by whether the calculated power spectrum is at or above a predetermined threshold.

【０２１９】また、各フレームのパワー平均値Ｐと、し
きい値Ｔｈとを比較することにより、音声区間と無音区
間とを判別する場合において、リングメモリ７の蓄積量
に基づいて、しきい値Ｔｈを変化させるようにしてもよ
い。すなわち、リングメモリ７の蓄積量が少なくなるほ
ど、言い換えれば、リングメモリ７の空領域が多くなる
ほど、音声区間の欠落部が少なくなるようにしきい値Ｔ
ｈは小さくされる。これにより、出力音声が自然により
近くなる。When voice and silent sections are discriminated by comparing the average power value P of each frame with the threshold Th, the threshold Th may be varied based on the accumulated amount of the ring memory 7. That is, the smaller the accumulated amount of the ring memory 7 (in other words, the larger its free area), the smaller the threshold Th is made, so that fewer portions of voice sections are lost. This brings the output speech closer to natural speech.

【０２２０】つまり、図２２に示すように、しきい値調
整手段５１を設ける。しきい値調整手段５１は、リング
メモリ蓄積量状態判別部１６からリングメモリ７の蓄積
量を得る。そして、得られたリングメモリ７の蓄積量
を、Ｄ／Ａ変換部８のサンプリング周波数で除すること
により、蓄積時間Ｔｍを算出する。そして、算出された
蓄積時間Ｔｍに基づいて、しきい値Ｔｈを決定し、しき
い値メモリ１３の内容を更新する。That is, as shown in FIG. 22, a threshold adjusting means 51 is provided. The threshold adjusting unit 51 obtains the storage amount of the ring memory 7 from the ring memory storage amount state determination unit 16. Then, the storage time Tm is calculated by dividing the obtained storage amount of the ring memory 7 by the sampling frequency of the D / A converter 8. Then, the threshold value Th is determined based on the calculated accumulation time Tm, and the content of the threshold value memory 13 is updated.

【０２２１】より具体的に説明すると、リングメモリ蓄
積量状態判別部１６から得られたリングメモリ７の蓄積
量がＤ／Ａ変換部８のサンプリング周波数である８００
０で除されることにより、蓄積時間Ｔｍが求められる。
そして、予め作成された蓄積時間Ｔｍに対するしきい値
Ｔｈのデータに基づいて、蓄積時間Ｔｍに対するしきい
値Ｔｈが求められる。More specifically, the accumulation time Tm is obtained by dividing the storage amount of the ring memory 7, obtained from the ring memory storage amount state determination unit 16, by 8000, which is the sampling frequency of the D/A conversion unit 8. The threshold value Th for that accumulation time Tm is then obtained from data, prepared in advance, that give the threshold value Th for each accumulation time Tm.
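The calculation of the accumulation time Tm and the table lookup of Th can be sketched as below. The division by 8000 comes from the text; everything else is an assumption for illustration. In particular, the contents of Table 3 are not reproduced in this text, so the table rows here are hypothetical placeholders, and the power average P is assumed to be the mean of the squared samples of a frame.

```python
SAMPLING_FREQUENCY_HZ = 8000  # sampling frequency of the D/A conversion unit 8 (from the text)

# HYPOTHETICAL rows of (upper bound of Tm in seconds, threshold Th).
# Only the trend follows the text: less stored audio means a smaller Th.
THRESHOLD_TABLE = [
    (0.5, 100.0),
    (1.0, 200.0),
    (2.0, 400.0),
]
DEFAULT_THRESHOLD = 800.0  # hypothetical Th used when Tm exceeds every bound


def accumulation_time_s(storage_amount_samples):
    """Tm = storage amount of the ring memory / sampling frequency."""
    return storage_amount_samples / SAMPLING_FREQUENCY_HZ


def threshold_for(storage_amount_samples):
    """Look up Th for the current storage amount in the placeholder table."""
    tm = accumulation_time_s(storage_amount_samples)
    for upper_bound_s, th in THRESHOLD_TABLE:
        if tm < upper_bound_s:
            return th
    return DEFAULT_THRESHOLD


def power_average(frame):
    """Assumed definition of P: mean of the squared sample values of one frame."""
    return sum(s * s for s in frame) / len(frame)


def is_voice_section(frame, storage_amount_samples):
    """Voice section if P reaches the threshold chosen for the current storage amount."""
    return power_average(frame) >= threshold_for(storage_amount_samples)
```

With these placeholder values, the same quiet frame is kept as voice when the ring memory is nearly empty and treated as silence once more audio has accumulated.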

【０２２２】次の表は、Ａ／Ｄ変換部２の量子化ビット
数が１２ｂｉｔである場合における蓄積時間Ｔｍに対す
るしきい値Ｔｈのデータの一例を示している。The following table shows an example of data of the threshold value Th with respect to the accumulation time Tm when the number of quantization bits of the A / D converter 2 is 12 bits.

【０２２３】[0223]

【表３】 [Table 3]

【０２２４】また、各フレームのパワー累積値Ｐａとし
きい値とを比較することにより、音声区間と無音区間と
を判別する場合、各フレームの平均振幅値Ｗとしきい値
とを比較することにより、音声区間と無音区間とを判別
する場合、各フレームの振幅累積値Ｗａとしきい値とを
比較することにより、各フレームのパワースペクトルと
しきい値とを比較することにより、音声区間と無音区間
とを判別する場合にも、上記と同様に、リングメモリ７
の蓄積量に基づいて、しきい値を変化させるようにして
もよい。The threshold value may likewise be varied based on the storage amount of the ring memory 7 when voice sections and silent sections are discriminated by comparing any of the following with a threshold value: the cumulative power value Pa of each frame, the average amplitude value W of each frame, the cumulative amplitude value Wa of each frame, or the power spectrum of each frame.
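As a rough sketch of how the alternative frame measures could share one storage-dependent threshold lookup: the definitions below (Pa as the sum of squared samples, W as the mean absolute sample, Wa as the sum of absolute samples) are assumptions inferred from the names, and the table passed in is hypothetical.

```python
# Assumed definitions of the per-frame measures named in the text.
MEASURES = {
    "Pa": lambda frame: sum(s * s for s in frame),              # cumulative power
    "W": lambda frame: sum(abs(s) for s in frame) / len(frame),  # average amplitude
    "Wa": lambda frame: sum(abs(s) for s in frame),              # cumulative amplitude
}


def threshold_for(table, accumulation_time_s):
    """Pick the threshold of the first row whose upper bound exceeds Tm.

    `table` is a hypothetical list of (upper bound of Tm in seconds, threshold)
    rows sorted by bound; the last row is reused for any larger Tm.
    """
    for upper_bound_s, threshold in table:
        if accumulation_time_s < upper_bound_s:
            return threshold
    return table[-1][1]


def classify(frame, measure, table, accumulation_time_s):
    """Return 'voice' or 'silent' using the chosen measure and table."""
    value = MEASURES[measure](frame)
    return "voice" if value >= threshold_for(table, accumulation_time_s) else "silent"
```

Each measure needs its own table, since the scale of a cumulative value differs from that of an average by a factor of the frame length.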

【０２２５】また、リングメモリ７の蓄積量に基づい
て、無音区間の削除開始点を決定するためのポーズ継続
長Ｔｄｅｌを変化させるようにしてもよい。すなわち、
リングメモリ７の蓄積量が少なくなるほど、言い換えれ
ば、リングメモリ７の空領域が多くなるほど、無音区間
の削除部が少なくなるように、ポーズ継続長Ｔｄｅｌが
長くされる。これにより、出力音声が自然により近くな
る。Further, the pause duration Tdel, which determines the point at which deletion of a silent section starts, may be varied based on the storage amount of the ring memory 7. Specifically, the smaller the storage amount of the ring memory 7 (in other words, the larger its empty area), the longer the pause duration Tdel is made, so that less of each silent section is deleted. This brings the output sound closer to natural speech.

【０２２６】つまり、図２２に示すように、ポーズ継続
長調整手段５２を設ける。ポーズ継続長調整手段５２
は、リングメモリ蓄積量状態判別部１６からリングメモ
リ７の蓄積量を得る。そして、得られたリングメモリ７
の蓄積量を、Ｄ／Ａ変換部８のサンプリング周波数で除
することにより、蓄積時間Ｔｍを算出する。そして、算
出された蓄積時間Ｔｍに基づいて、ポーズ継続長Ｔｄｅ
ｌを決定し、ポーズ継続長設定メモリ１７の内容を更新
する。That is, as shown in FIG. 22, a pause continuation length adjusting means 52 is provided. The pause continuation length adjusting means 52 obtains the storage amount of the ring memory 7 from the ring memory storage amount state determination unit 16, and calculates the accumulation time Tm by dividing that storage amount by the sampling frequency of the D/A conversion unit 8. It then determines the pause duration Tdel based on the calculated accumulation time Tm and updates the content of the pause continuation length setting memory 17.

【０２２７】より具体的に説明すると、リングメモリ蓄
積量状態判別部１６から得られたリングメモリ７の蓄積
量がＤ／Ａ変換部８のサンプリング周波数である８００
０で除されることにより、蓄積時間Ｔｍが求められる。
そして、予め作成された蓄積時間Ｔｍに対するポーズ継
続長Ｔｄｅｌのデータに基づいて、蓄積時間Ｔｍに対す
るポーズ継続長Ｔｄｅｌが求められる。More specifically, the accumulation time Tm is obtained by dividing the storage amount of the ring memory 7, obtained from the ring memory storage amount state determination unit 16, by 8000, which is the sampling frequency of the D/A conversion unit 8. The pause duration Tdel for that accumulation time Tm is then obtained from data, prepared in advance, that give the pause duration Tdel for each accumulation time Tm.
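The pause-duration lookup can be sketched in the same way. The values of Table 4 are not reproduced in this text, so the rows below are hypothetical; only the trend (less stored audio, longer Tdel) follows the description. The helper `frames_to_delete` is also an assumption: it treats the part of a silent run beyond Tdel as deletable and ignores the underflow check that the claims apply before deleting.

```python
SAMPLING_FREQUENCY_HZ = 8000  # sampling frequency of the D/A conversion unit 8 (from the text)

# HYPOTHETICAL rows of (upper bound of Tm in seconds, Tdel in milliseconds).
PAUSE_TABLE = [
    (0.5, 400),
    (1.0, 200),
    (2.0, 100),
]
MIN_PAUSE_MS = 50  # hypothetical Tdel used when Tm exceeds every bound


def pause_duration_ms(storage_amount_samples):
    """Look up Tdel from Tm = storage amount / sampling frequency."""
    tm = storage_amount_samples / SAMPLING_FREQUENCY_HZ
    for upper_bound_s, tdel_ms in PAUSE_TABLE:
        if tm < upper_bound_s:
            return tdel_ms
    return MIN_PAUSE_MS


def frames_to_delete(silent_run_frames, frame_ms, storage_amount_samples):
    """Count the frames of a silent run that lie past the deletion start point.

    Frames covering the first Tdel of the run are kept (rounded up to whole
    frames); the rest are candidates for deletion.
    """
    tdel_ms = pause_duration_ms(storage_amount_samples)
    kept = min(silent_run_frames, -(-tdel_ms // frame_ms))  # ceiling division
    return silent_run_frames - kept
```

For example, with 20 ms frames and these placeholder rows, a 30-frame pause loses 10 frames when the ring memory holds 0.25 s of audio but 27 frames when it holds 4 s.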

【０２２８】次の表は、ＶＴＲの２倍速再生時における
蓄積時間Ｔｍに対するポーズ継続長Ｔｄｅｌのデータの
一例を示している。The following table shows an example of the data of the pause duration Tdel with respect to the accumulation time Tm at the time of double speed reproduction of the VTR.

【０２２９】[0229]

【表４】 [Table 4]

【０２３０】以上は、入力信号がアナログ信号の場合に
ついて説明したが、入力信号がディジタルデータである
場合にもこの発明を適用することができる。たとえば、
ＩＣメモリ、磁気ディスク、ディジタル通信回線等か
ら、圧縮されたディジタル音声信号が送られてきた場合
には、圧縮されたディジタル音声信号が伸長されてＰＣ
Ｍ音声信号に変換され、得られたＰＣＭ音声信号がバッ
ファに一旦格納される。その後、設定された再生速度倍
率に応じた速度で、ＰＣＭ音声データがバッファから読
み出されて、図１のフレームメモリ５に送られる。The above description concerns the case where the input signal is an analog signal, but the present invention can also be applied when the input signal is digital data. For example, when a compressed digital audio signal is supplied from an IC memory, a magnetic disk, a digital communication line, or the like, the compressed digital audio signal is expanded and converted into a PCM audio signal, and the resulting PCM audio signal is temporarily stored in a buffer. The PCM audio data is then read from the buffer at a speed corresponding to the set reproduction speed magnification and sent to the frame memory 5 in FIG. 1.

【０２３１】次に本発明による話速変換装置をＴＶ電話
機に応用する場合の実施例について述べる。尚、以下の
話速変換装置１０９は図１に示す話速変換装置に相当す
る。Next, an embodiment in which the speech speed conversion device according to the present invention is applied to a TV telephone will be described. Note that the following speech speed converter 109 corresponds to the speech speed converter shown in FIG.

【０２３２】図２３は、本発明を応用した話速変換機能
付きＴＶ電話機の概略構成図である。映像と音声とが混
在する入力信号は、信号分離合成部１０１により、映像
信号と音声信号とに分離され各々映像信号処理ブロック
１０２および音声信号処理ブロック１０３に送信され処
理される。また、上記各々の信号処理ブロック１０２お
よび１０３で処理された映像および音声信号は信号分離
合成部１０１により合成され、映像信号と音声信号とが
混在する出力信号となる。FIG. 23 is a schematic configuration diagram of a TV telephone with a speech speed conversion function to which the present invention is applied. An input signal in which video and audio are mixed is separated into a video signal and an audio signal by a signal separation / synthesis unit 101, and transmitted to a video signal processing block 102 and an audio signal processing block 103, respectively, for processing. The video and audio signals processed by the respective signal processing blocks 102 and 103 are combined by the signal separation / combination unit 101 to become an output signal in which the video signal and the audio signal are mixed.

【０２３３】次に映像信号処理ブロック１０２での処理
について述べる。信号分離合成部１０１によって分離さ
れた映像入力信号は、映像受信部１０４により受信さ
れ、上記映像をモニター１０５に映し出す。また、カメ
ラ１０７によって撮影された映像は映像送信部１０６に
より映像信号として信号分離合成部１０１に送信され
る。Next, the processing in the video signal processing block 102 will be described. The video input signal separated by the signal separation / synthesis unit 101 is received by the video reception unit 104, and the video is displayed on a monitor 105. Further, the video image captured by the camera 107 is transmitted by the video transmission unit 106 to the signal separation / combination unit 101 as a video signal.

【０２３４】同様に、音声信号ブロック１０３では、信
号分離合成部１０１によって分離された音声入力信号
は、受話部１０８により受信され、本発明による話速変
換装置１０９により受信者が聞きやすい速度に話速制御
された音声として、スピーカー１１０により発声され
る。また、マイク１１２によって集音された音声は送話
部１１１により音声信号として信号分離合成部１０１に
送信される。この時、上記入出力信号に混在する映像と
音声は時間的なズレがないことが好ましく、本発明によ
る話速変換装置１０９により極力映像と音声のズレのな
いＴＶ電話機の提供が可能となる。Similarly, in the audio signal processing block 103, the audio input signal separated by the signal separation/synthesis unit 101 is received by the receiving unit 108, converted by the speech speed conversion device 109 according to the present invention to a speech speed that is easy for the listener to hear, and output from the speaker 110. The voice collected by the microphone 112 is transmitted by the transmission unit 111 to the signal separation/synthesis unit 101 as an audio signal. Here, it is preferable that the video and audio mixed in the input and output signals have no time lag between them, and the speech speed conversion device 109 according to the present invention makes it possible to provide a TV telephone in which the lag between video and audio is kept as small as possible.

【０２３５】加えて、上記ＴＶ電話機で映像信号の授受
がなく、電話機の如き音声信号のみを送受信する場合で
も、本発明による話速変換装置１０９により送話者と受
話者との会話のタイミングのズレがおこりにくく、か
つ、受話者が聞きやすい速度で話速変換可能な話速変換
機能付き電話機も提供可能となることは言うまでもな
い。In addition, needless to say, even when the above TV telephone exchanges no video signal and transmits and receives only audio signals, as an ordinary telephone does, the speech speed conversion device 109 according to the present invention makes it possible to provide a telephone with a speech speed conversion function in which the timing of the conversation between the talker and the listener is unlikely to drift and the speech speed can be converted to one that is easy for the listener to hear.

【０２３６】[0236]

【発明の効果】この発明によれば、処理負荷を低減でき
るとともに、映像と音声のズレを小さくでき、しかも音
声信号を蓄積するためのメモリの容量も膨大とならない
話速変換装置が得られる。According to the present invention, it is possible to obtain a speech speed conversion device capable of reducing the processing load, reducing the difference between the video and the audio, and not increasing the capacity of the memory for storing the audio signal.

【０２３７】また、この発明によれば、入力信号の音声
区間における音声の欠落部をできるだけ少なくしつつ、
音声区間における音声に対する音声再生速度を、設定さ
れた再生速度倍率に対して遅くさせることができる。Further, according to the present invention, the reproduction speed of the speech in the voice sections can be made slower than the set reproduction speed magnification while keeping the dropped portions of speech in the voice sections of the input signal as small as possible.

[Brief description of the drawings]

【図１】話速変換装置の全体的な構成を示すブロック図
である。FIG. 1 is a block diagram showing an overall configuration of a speech speed conversion device.

【図２】話速変換部の構成を示すブロック図である。FIG. 2 is a block diagram illustrating a configuration of a speech speed conversion unit.

【図３】ＰＩＣＯＬＡを用いて、入力信号を圧縮率２／
３で圧縮する方法を示す説明図である。FIG. 3 is an explanatory diagram showing a method of compressing an input signal at a compression ratio of 2/3 using PICOLA.

【図４】波形合成処理部による処理を説明するための説
明図である。FIG. 4 is an explanatory diagram for describing processing by a waveform synthesis processing unit;

【図５】間引き処理部によって行なわれる各種の間引き
処理方法を説明するための説明図である。FIG. 5 is an explanatory diagram for explaining various thinning processing methods performed by a thinning processing unit.

【図６】話速変換部による処理手順を示すフローチャー
トである。FIG. 6 is a flowchart illustrating a processing procedure by a speech speed conversion unit;

【図７】話速変換部による処理手順を示すフローチャー
トである。FIG. 7 is a flowchart illustrating a processing procedure by a speech speed conversion unit;

【図８】話速変換部による処理手順の変形例を示し、図
７に相当するフローチャートである。FIG. 8 is a flowchart illustrating a modified example of the processing procedure by the speech speed conversion unit and corresponding to FIG. 7;

【図９】図６のステップ１０の処理と置き換え可能な処
理を説明するための説明図である。FIG. 9 is an explanatory diagram for explaining a process that can be replaced with the process of step 10 in FIG. 6;

【図１０】２倍速再生時の入力信号と出力信号との関係
を示し、特に無音区間の入力信号が削除される様子を示
すタイムチャートである。FIG. 10 is a time chart showing a relationship between an input signal and an output signal at the time of double-speed reproduction, and particularly showing a state in which an input signal in a silent section is deleted.

【図１１】リングメモリ７へのデータ書き込み開始点、
リングメモリ７からのデータ読み出し開始点ならびに図
１０の点Ａ〜Ｄにおけるリングメモリ７の状態を示す模
式図である。FIG. 11 is a schematic diagram showing the data write start point of the ring memory 7, the data read start point of the ring memory 7, and the state of the ring memory 7 at points A to D in FIG. 10.

【図１２】図１０の点Ｅ〜Ｈにおけるリングメモリ７の
状態を示す模式図である。FIG. 12 is a schematic diagram showing the state of the ring memory 7 at points E to H in FIG. 10.

【図１３】２倍速再生時の入力信号と出力信号との関係
を示し、特にオーバーフロー直前状態となったときに、
入力信号が削除される様子を示すタイムチャートであ
る。FIG. 13 is a time chart showing the relationship between an input signal and an output signal at the time of double-speed reproduction, and particularly showing how the input signal is deleted when the state immediately before overflow is reached.

【図１４】図１３の各点Ｓ〜Ｕにおけるリングメモリ７
の状態を示す模式図である。FIG. 14 is a schematic diagram showing the state of the ring memory 7 at each of points S to U in FIG. 13.

【図１５】音声区間と無音区間とを判別するための回路
の変形例を示し、図２に相当するブロック図である。FIG. 15 is a block diagram illustrating a modified example of a circuit for determining a voice section and a silent section, corresponding to FIG. 2;

【図１６】音声区間と無音区間とを判別するための回路
の他の変形例を示し、図２に相当するブロック図であ
る。FIG. 16 is a block diagram showing another modified example of the circuit for distinguishing between a voice section and a silent section, and corresponds to FIG. 2;

【図１７】固定フレーム単位で、入力信号を圧縮率２／
３で圧縮する方法を示す説明図である。FIG. 17 is an explanatory diagram showing a method of compressing an input signal at a compression ratio of 2/3 in units of fixed frames.

【図１８】図６のステップ９の処理と置き換え可能な処
理を説明するための説明図である。FIG. 18 is an explanatory diagram for explaining a process that can be replaced with the process of step 9 in FIG. 6;

【図１９】図６のステップ９の処理として図１８の処理
を採用した場合に、図６のステップ１０の処理と置き換
え可能な処理を説明するための説明図である。FIG. 19 is an explanatory diagram for explaining a process that can replace the process of step 10 in FIG. 6 when the process of FIG. 18 is adopted as the process of step 9 in FIG. 6.

【図２０】音声区間と無音区間とを判別するための回路
のさらに他の変形例を示し、図２に相当するブロック図
である。FIG. 20 is a block diagram showing still another modification of the circuit for discriminating between a voice section and a silent section, corresponding to FIG. 2;

【図２１】定常状態のパワースペクトル、雑音を含まな
い音声のパワースペクトルおよび音声区間のパワースペ
クトルを示すグラフである。FIG. 21 is a graph showing a power spectrum in a steady state, a power spectrum of a voice without noise, and a power spectrum of a voice section.

【図２２】しきい値調整手段およびポーズ継続長調整手
段が付加された話速変換部を示すブロック図である。FIG. 22 is a block diagram showing a speech speed conversion unit to which a threshold adjustment unit and a pause duration adjustment unit are added.

【図２３】本発明を応用した話速変換機能付きＴＶ電話
機の概略構成図である。FIG. 23 is a schematic configuration diagram of a TV phone with a speech speed conversion function to which the present invention is applied.

[Explanation of symbols]

２ Ａ／Ｄ変換部、４ ＤＳＰ、５ フレームメモリ、６ 話速変換部、７ リングメモリ、８ Ｄ／Ａ変換部、９ アップダウンカウンタ、１１ パワー計算部、１１Ａ 平均振幅計算部、１１Ｂ ピッチ周期検出部、１１Ｃ パワースペクトル計算部、１２、１２Ａ、１２Ｂ、１２Ｃ 比較部、１５ 条件分岐部、１６ リングメモリ蓄積量状態判別部、２１、２５ 入力信号削除部、２３ ピッチ圧縮伸長手段、２４ 間引き処理部、４２ 圧縮伸長率調整手段、５１ しきい値調整手段、５２ ポーズ継続長調整手段

Reference Signs List
2: A/D conversion unit
4: DSP
5: Frame memory
6: Speech speed conversion unit
7: Ring memory
8: D/A conversion unit
9: Up/down counter
11: Power calculation unit
11A: Average amplitude calculation unit
11B: Pitch period detection unit
11C: Power spectrum calculation unit
12, 12A, 12B, 12C: Comparison unit
15: Conditional branch unit
16: Ring memory storage amount state determination unit
21, 25: Input signal deletion unit
23: Pitch compression/expansion means
24: Thinning processing unit
42: Compression/expansion rate adjusting means
51: Threshold adjusting means
52: Pause continuation length adjusting means

フロントページの続き (72)発明者 宮武正典 大阪府守口市京阪本通２丁目５番５号 三洋電機株式会社内 (56)参考文献 特開平３−205656（ＪＰ，Ａ) 特開平８−137492（ＪＰ，Ａ) (58)調査した分野(Int.Cl.⁷，ＤＢ名) G10L 21/04 G11B 20/02

Continuation of the front page
(72) Inventor: Masanori Miyatake, c/o Sanyo Electric Co., Ltd., 2-5-5 Keihan-Hondori, Moriguchi-shi, Osaka
(56) References: JP H03-205656 (JP, A); JP H08-137492 (JP, A)
(58) Fields searched (Int. Cl.⁷, DB name): G10L 21/04, G11B 20/02

Claims

(57) [Claims]

1. A speech speed conversion device comprising: speech speed conversion processing means for performing speech speed conversion processing on an input audio signal; a ring memory into which the output of the speech speed conversion processing means is written; and means for reading data from the ring memory at a constant speed, wherein the speech speed conversion processing means includes means for performing, when the input audio signal is in a voice section and the ring memory is not in a state immediately before overflow, compression/expansion processing on the input audio signal at a compression rate that is 1/n or more, where n is the set reproduction speed magnification, and that is determined according to a program type set by an operator.

2. A / D conversion means for sampling an input analog audio signal at a sampling frequency corresponding to a set reproduction speed magnification, a frame memory to which an audio signal output from the A / D conversion means is input, Each time a required number of voice signals are input to the frame memory, voice speed conversion processing means for performing voice speed conversion processing on those voice signals, a ring memory in which the output of the voice speed conversion processing means is written, 1 × speed reproduction Reading means for reading data from the ring memory based on a read signal having a frequency equal to the sampling frequency at the time, and storage amount calculating means for calculating the storage amount of the ring memory based on a write signal and a read signal of the ring memory. The speech speed conversion processing means includes an input voice corresponding to a required number of voice signals input to the frame memory. A signal for performing compression / expansion processing or deletion processing on the required number of audio signals in accordance with an output of the section determination means and an output of the accumulation amount calculation means, When the input sound is a voice section and the ring memory is not in a state immediately before overflow, the signal processing means has a compression ratio of 1 / n or more, where n is a set reproduction speed magnification, and is set by an operator. A speech speed conversion device including means for performing a compression / expansion process at a compression rate determined according to a selected program type.

3. A frame memory in which an input digital audio signal is written at a speed corresponding to a set reproduction speed magnification. Each time a required number of audio signals are input to the frame memory, the digital audio signal is Speed conversion processing means for performing voice speed conversion processing, a ring memory to which the output of the voice speed conversion processing means is written, reading means for reading data from the ring memory at a constant speed, and a write signal and a read signal of the ring memory. The speech rate conversion processing means determines whether the input voice corresponding to the required number of voice signals input to the frame memory is a voice section or a silent section. Section discriminating means for discriminating, and according to the output of the section discriminating means and the output of the accumulation amount calculating means, Signal processing means for performing a decompression / expansion processing or a deletion processing, wherein the signal processing means performs compression of 1 / n or more with the set reproduction speed magnification being n when the input voice is a voice section and the ring memory is not in a state immediately before overflowing. A speech speed conversion device including means for performing compression / expansion processing at a compression rate determined according to a program type set by an operator.

4. A speech speed conversion device comprising: speech speed conversion processing means for performing speech speed conversion processing on an input audio signal; a ring memory into which the output of the speech speed conversion processing means is written; and means for reading data from the ring memory at a constant speed, wherein the speech speed conversion processing means includes means for performing, when the input audio signal is in a voice section and the ring memory is not in a state immediately before overflow, compression/expansion processing on the input audio signal at a compression rate that is 1/n or more, where n is the set reproduction speed magnification, and that is determined according to a program type set by an operator and the storage amount of the ring memory.

5. An A / D converter for sampling an input analog audio signal at a sampling frequency corresponding to a set reproduction speed magnification, a frame memory to which an audio signal output from the A / D converter is input, Each time a required number of voice signals are input to the frame memory, voice speed conversion processing means for performing voice speed conversion processing on those voice signals, a ring memory in which the output of the voice speed conversion processing means is written, 1 × speed reproduction Reading means for reading data from the ring memory based on a read signal having a frequency equal to the sampling frequency at the time, and storage amount calculating means for calculating the storage amount of the ring memory based on a write signal and a read signal of the ring memory. The speech speed conversion processing means includes an input voice corresponding to a required number of voice signals input to the frame memory. A signal for performing compression / expansion processing or deletion processing on the required number of audio signals in accordance with an output of the section determination means and an output of the accumulation amount calculation means, When the input sound is a voice section and the ring memory is not in a state immediately before overflow, the signal processing means has a compression ratio of 1 / n or more, where n is a set reproduction speed magnification, and is set by an operator. A speech speed conversion device including means for performing compression / expansion processing at a compression ratio determined in accordance with a selected program type and an amount of storage in a ring memory.

6. A frame memory in which an input digital audio signal is written at a speed corresponding to a set reproduction speed magnification. Speed conversion processing means for performing voice speed conversion processing, a ring memory to which the output of the voice speed conversion processing means is written, reading means for reading data from the ring memory at a constant speed, and a write signal and a read signal of the ring memory. The speech rate conversion processing means determines whether the input voice corresponding to the required number of voice signals input to the frame memory is a voice section or a silent section. Section discriminating means for discriminating, and according to the output of the section discriminating means and the output of the accumulation amount calculating means, Signal processing means for performing a decompression / expansion processing or a deletion processing, wherein the signal processing means performs compression of 1 / n or more with the set reproduction speed magnification being n when the input voice is a voice section and the ring memory is not in a state immediately before overflowing. A speech speed conversion device including means for performing a compression / expansion process at a compression ratio determined according to a program type and a storage amount of a ring memory set by an operator.

7. A / D conversion means for sampling an input analog audio signal at a sampling frequency corresponding to a set reproduction speed magnification, a frame memory to which an audio signal output from the A / D conversion means is input, Each time a required number of voice signals are input to the frame memory, voice speed conversion processing means for performing voice speed conversion processing on those voice signals, a ring memory in which the output of the voice speed conversion processing means is written, 1 × speed reproduction Reading means for reading data from the ring memory based on a read signal having a frequency equal to the sampling frequency at the time, and storage amount calculating means for calculating the storage amount of the ring memory based on a write signal and a read signal of the ring memory. The speech speed conversion processing means includes an input voice corresponding to a required number of voice signals input to the frame memory. A signal for performing compression / expansion processing or deletion processing on the required number of audio signals in accordance with an output of the section determination means and an output of the accumulation amount calculation means, Signal processing means, when the input voice is a voice section and the ring memory is not in a state immediately before overflow,
includes means that, when the fixed compression rate mode is selected, performs compression/expansion processing at a compression rate that is 1/n or more, where n is the set reproduction speed magnification, and that is determined according to the program type set by the operator, and that, when the compression rate fluctuation mode is selected, performs compression/expansion processing at a compression rate that is 1/n or more, where n is the set reproduction speed magnification, and that is determined according to the program type set by the operator and the storage amount of the ring memory.

8. A frame memory in which an input digital audio signal is written at a speed corresponding to a set reproduction speed magnification, and each time a required number of audio signals are input to the frame memory, the digital audio signal is Speed conversion processing means for performing voice speed conversion processing, a ring memory to which the output of the voice speed conversion processing means is written, reading means for reading data from the ring memory at a constant speed, and a write signal and a read signal of the ring memory. The speech rate conversion processing means determines whether the input voice corresponding to the required number of voice signals input to the frame memory is a voice section or a silent section. Section discriminating means for discriminating, and according to the output of the section discriminating means and the output of the accumulation amount calculating means, A signal processing means for performing a reduced expansion processing or deletion processing, the signal processing means, at the time the input speech is the speech section and the ring memory is not overflow state immediately before,
includes means that, when the fixed compression rate mode is selected, performs compression/expansion processing at a compression rate that is 1/n or more, where n is the set reproduction speed magnification, and that is determined according to the program type set by the operator, and that, when the compression rate fluctuation mode is selected, performs compression/expansion processing at a compression rate that is 1/n or more, where n is the set reproduction speed magnification, and that is determined according to the program type set by the operator and the storage amount of the ring memory.

9. The signal processing means includes: (1) a first mode in which the input voice is a voice section and the ring memory is not in a state immediately before overflowing, based on an output of the section determination means and an output of the storage amount calculation means; 2) the second mode in which the input voice is a voice section and the ring memory is in a state immediately before overflow; (3) the input voice is a silent section and the duration of the silent section is less than a predetermined silent deletion start point discriminating value. And a third mode in which the ring memory is not in the state immediately before the overflow, and (4) the input voice is in the silent section, the duration of the silent section is less than the predetermined silent deletion start point determination value, and the ring memory is in the state immediately before the overflow. A fourth mode, (5) the input voice is a silent section, the duration of the silent section is greater than or equal to a predetermined silent deletion start point determination value, and (5) The input mode is a silent section, the duration of the silent section is greater than or equal to a predetermined silent deletion start point determination value, and the ring memory is in a state immediately before the underflow. A mode discriminating means for discriminating which of the sixth mode and the third mode. When the mode is determined to be the first mode or the third mode, the set reproduction speed magnification is set to n and the compression ratio is 1 / n or more. So,
First processing means for performing a compression / expansion process at a compression ratio determined according to a program type set by an operator; when the mode is determined to be the second mode or the fourth mode, the storage amount of the ring memory is immediately before the underflow; A second processing unit for deleting the audio signal until the state is reached; a third processing unit for deleting the audio signal in a silent section when the fifth mode is determined; and a setting when the sixth mode is determined. Playback speed magnification is n
And a fourth processing means for performing a compression / expansion process at a compression ratio of 1 / n ± α (where α is a value of 0 or more and 1 or less). The speech speed conversion device according to 1.

10. The signal processing means includes: (1) a first mode in which an input voice is a voice section and a ring memory is not in a state immediately before an overflow, based on an output of the section determining means and an output of the storage amount calculating means; 2) the second mode in which the input voice is a voice section and the ring memory is in a state immediately before overflow; (3) the input voice is a silent section and the duration of the silent section is less than a predetermined silent deletion start point discriminating value. And a third mode in which the ring memory is not in the state immediately before the overflow, and (4) the input voice is in the silent section, the duration of the silent section is less than the predetermined silent deletion start point determination value, and the ring memory is in the state immediately before the overflow. A fourth mode, (5) the input voice is a silent section, the duration of the silent section is equal to or greater than a predetermined silent deletion start point determination value, and A fifth mode in which the memory is not in the state immediately before the underflow; and (6) the input voice is in the silent section, the duration of the silent section is equal to or greater than the predetermined silent deletion start point determination value, and the ring memory is in the state immediately before the underflow. A mode discriminating means for discriminating which of the sixth mode and the third mode. When the mode is determined to be the first mode or the third mode, the set reproduction speed magnification is set to n and the compression ratio is 1 / n or more. So,
First processing means for performing compression / expansion processing at a compression rate determined according to the program type set by the operator and the amount of storage in the ring memory; when the mode is determined to be the second mode or the fourth mode, A second processing unit for deleting the audio signal until the accumulated amount becomes a state immediately before the underflow; a third processing unit for deleting an audio signal in a silent section when the fifth mode is determined; and a sixth mode. Is set, the set playback speed magnification is set to n
And a fourth processing means for performing a compression / expansion process at a compression ratio of 1 / n ± α (where α is a value of 0 or more and 1 or less). The speech speed conversion device according to 1.

11. The speech speed conversion device according to 1, wherein the signal processing means includes:
mode discriminating means for determining, based on an output of the section determining means and an output of the storage amount calculating means, which of the following six modes applies:
(1) a first mode, in which the input voice is a voice section and the ring memory is not in a state immediately before overflow;
(2) a second mode, in which the input voice is a voice section and the ring memory is in the state immediately before overflow;
(3) a third mode, in which the input voice is a silent section, the duration of the silent section is less than a predetermined silence deletion start point discrimination value, and the ring memory is not in the state immediately before overflow;
(4) a fourth mode, in which the input voice is a silent section, the duration of the silent section is less than the predetermined silence deletion start point discrimination value, and the ring memory is in the state immediately before overflow;
(5) a fifth mode, in which the input voice is a silent section, the duration of the silent section is equal to or greater than the predetermined silence deletion start point discrimination value, and the ring memory is not in a state immediately before underflow; and
(6) a sixth mode, in which the input voice is a silent section, the duration of the silent section is equal to or greater than the predetermined silence deletion start point discrimination value, and the ring memory is in the state immediately before underflow;
first processing means for performing, when the first mode or the third mode is determined, compression/expansion processing at a compression rate of 1/n or more, where n is the set reproduction speed magnification, the compression rate being determined according to the program type set by the operator when the fixed compression ratio mode is selected, and being determined according to both the compression rate for the program type set by the operator and the amount of storage in the ring memory when the compression rate fluctuation mode is selected;
second processing means for deleting the audio signal, when the second mode or the fourth mode is determined, until the storage amount of the ring memory reaches the state immediately before underflow;
third processing means for deleting the audio signal in the silent section when the fifth mode is determined; and
fourth processing means for performing, when the sixth mode is determined, compression/expansion processing at a compression ratio of 1/n ± α, where n is the set reproduction speed magnification and α is a value of 0 or more and 1 or less.
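The six-way mode discrimination of claim 11 can be sketched as a small decision function. This is only an illustrative sketch: the names `Mode`, `discriminate_mode`, and `PROCESSING` are hypothetical, the boolean inputs stand in for the outputs of the section determining means and the storage amount calculating means, and the pairing of the fifth mode with silent-section deletion is inferred by elimination from the other three processing means.

```python
from enum import Enum


class Mode(Enum):
    FIRST = 1
    SECOND = 2
    THIRD = 3
    FOURTH = 4
    FIFTH = 5
    SIXTH = 6


def discriminate_mode(is_voice, silence_duration, delete_start_value,
                      near_overflow, near_underflow):
    """Return the mode for the current frame (hypothetical helper)."""
    if is_voice:
        # Voice section: only the overflow state matters.
        return Mode.SECOND if near_overflow else Mode.FIRST
    if silence_duration < delete_start_value:
        # Short silence: treated like voice with respect to overflow.
        return Mode.FOURTH if near_overflow else Mode.THIRD
    # Long silence: only the underflow state matters.
    return Mode.SIXTH if near_underflow else Mode.FIFTH


# Which processing means handles each mode (labels are descriptive only).
PROCESSING = {
    Mode.FIRST: "first: compress/expand at the selected rate (>= 1/n)",
    Mode.THIRD: "first: compress/expand at the selected rate (>= 1/n)",
    Mode.SECOND: "second: delete audio until just before underflow",
    Mode.FOURTH: "second: delete audio until just before underflow",
    Mode.FIFTH: "third: delete the silent section",
    Mode.SIXTH: "fourth: compress/expand at 1/n +/- alpha",
}
```

For example, a voice frame arriving while the ring memory is close to overflow yields `Mode.SECOND`, so audio is discarded rather than compressed.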

12. The speech speed conversion device according to any one of claims 11 and 11, wherein the section discriminating means includes: means for calculating a power average value of a required number of audio signals input to the frame memory; and determining means for determining, based on the calculated power average value and a given threshold value, whether the input voice is a voice section or a silent section.

13. The speech speed conversion device according to any one of claims 11 and 11, wherein the section discriminating means includes: means for calculating a cumulative power value of a required number of audio signals input to a frame memory; and determining means for determining, based on the calculated cumulative power value and a given threshold value, whether the input voice is a voice section or a silent section.

14. The speech speed conversion device according to any one of claims 11 and 11, wherein the section discriminating means includes: means for calculating an amplitude average value of a required number of audio signals input to the frame memory; and determining means for determining, based on the calculated amplitude average value and a given threshold value, whether the input voice is a voice section or a silent section.

15. The speech speed conversion device according to any one of claims 11 and 11, wherein the section discriminating means includes: means for calculating a cumulative amplitude value of a required number of audio signals input to the frame memory; and determining means for determining, based on the calculated cumulative amplitude value and a given threshold value, whether the input voice is a voice section or a silent section.
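The four frame measures named in claims 12 to 15 (power average, cumulative power, amplitude average, cumulative amplitude) differ only in how the samples of one frame are reduced to a single number. The sketch below illustrates that under stated assumptions: the function names and the `kind` strings are hypothetical, and treating a measure at or above the threshold as a voice section is an assumption, since the claims only say the decision is based on the measure and a threshold.

```python
def frame_measure(frame, kind):
    """Reduce one frame of samples to a single level measure."""
    if kind == "power_average":
        return sum(s * s for s in frame) / len(frame)
    if kind == "power_cumulative":
        return sum(s * s for s in frame)
    if kind == "amplitude_average":
        return sum(abs(s) for s in frame) / len(frame)
    if kind == "amplitude_cumulative":
        return sum(abs(s) for s in frame)
    raise ValueError(f"unknown measure: {kind}")


def is_voice_section(frame, kind, threshold):
    """Assumed rule: voice if the measure reaches the threshold."""
    return frame_measure(frame, kind) >= threshold
```

The cumulative variants avoid a division per frame, which may matter on a fixed-point signal processor; for a fixed frame length they are equivalent to the averages with a rescaled threshold.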

16. The speech speed conversion device according to any one of claims 11 and 11, wherein the section discriminating means includes: detecting means for detecting the periodicity of a required number of audio signals input to the frame memory; and determining means for determining, based on the detected periodicity, whether the input voice is a voice section or a silent section.
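Claim 16 does not say how periodicity is detected. One common choice, used here purely as an assumed stand-in, is the peak of the normalized autocorrelation over a range of candidate lags. The names `periodicity_score` and `is_voice_by_periodicity`, the lag range, and the 0.8 threshold are all hypothetical.

```python
import math


def periodicity_score(frame, min_lag, max_lag):
    """Return (best normalized autocorrelation, lag at which it occurs)."""
    best, best_lag = 0.0, 0
    for lag in range(min_lag, max_lag + 1):
        n = len(frame) - lag
        if n <= 0:
            break
        num = sum(frame[i] * frame[i + lag] for i in range(n))
        e1 = sum(frame[i] ** 2 for i in range(n))
        e2 = sum(frame[i + lag] ** 2 for i in range(n))
        denom = math.sqrt(e1 * e2)
        if denom > 0 and num / denom > best:
            best, best_lag = num / denom, lag
    return best, best_lag


def is_voice_by_periodicity(frame, min_lag, max_lag, threshold=0.8):
    """Assumed rule: a strongly periodic frame is a voice section."""
    score, _ = periodicity_score(frame, min_lag, max_lag)
    return score >= threshold
```

A frame of all zeros has no defined correlation and scores 0.0, so it is classified as a silent section under this rule.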

17. The speech speed conversion device according to any one of claims 11 and 11, wherein the section discriminating means includes: calculating means for calculating a power spectrum of a required number of audio signals input to the frame memory for one or more predetermined frequency bands; and determining means for determining, based on the calculated power spectrum and a given threshold value, whether the input voice is a voice section or a silent section.
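The band-limited power spectrum of claim 17 can be illustrated with a direct discrete Fourier transform restricted to the bins that fall inside a band. This is a minimal sketch, not the patented implementation: `band_power`, the sample rate, and the band edges are hypothetical, and a real device would more likely use an FFT or a filter bank.

```python
import cmath
import math


def band_power(frame, sample_rate, f_lo, f_hi):
    """Sum |X[k]|^2 / N over DFT bins whose frequency lies in [f_lo, f_hi]."""
    n = len(frame)
    total = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if f_lo <= freq <= f_hi:
            x = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
            total += abs(x) ** 2 / n
    return total


def is_voice_by_band_power(frame, sample_rate, f_lo, f_hi, threshold):
    """Assumed rule: voice if the in-band power reaches the threshold."""
    return band_power(frame, sample_rate, f_lo, f_hi) >= threshold
```

Restricting the decision to a band where speech energy is expected makes it less sensitive to out-of-band noise than a full-band power measure.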

18. The speech speed conversion device according to any one of claims 12 to 17, wherein the threshold value is adjusted according to the amount of storage in the ring memory.
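Claim 18 states only that the threshold is adjusted according to the amount stored in the ring memory. The sketch below assumes one plausible policy: the threshold rises linearly with the fill level, so that more frames are judged silent (and become candidates for deletion) as the memory fills. The function name, the linear law, and `max_scale` are assumptions.

```python
def adjusted_threshold(base_threshold, stored, capacity, max_scale=2.0):
    """Scale the voice/silence threshold with the ring memory fill level.

    Assumed policy: at an empty memory the base threshold is used, at a
    full memory it is multiplied by max_scale, with linear interpolation
    in between.
    """
    fill = min(max(stored / capacity, 0.0), 1.0)
    return base_threshold * (1.0 + (max_scale - 1.0) * fill)
```

With a base threshold of 100 and a half-full memory, this gives 150.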

19. The speech speed conversion device according to claim 9, wherein said first processing means performs compression / expansion processing in units of a pitch cycle or in integral multiples of a pitch cycle.

20. The speech speed conversion device according to claim 9, wherein said first processing means performs compression / expansion processing in fixed frame length units.

21. The speech speed conversion device according to claim 9, wherein the silence deletion start point discrimination value is adjusted according to the amount of storage in the ring memory.
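Claim 21 likewise leaves the adjustment rule open. As a sketch under an assumed policy, the silence deletion start point discrimination value can shrink as the ring memory fills, so that silent sections start being deleted sooner when storage is scarce. The name `silence_delete_start_value`, the unit (frames), and the linear law are hypothetical.

```python
def silence_delete_start_value(base_frames, stored, capacity, min_frames=1):
    """Shorten the silence duration needed before deletion may begin.

    Assumed policy: the value falls linearly from base_frames (empty
    memory) toward zero (full memory), but never below min_frames.
    """
    fill = min(max(stored / capacity, 0.0), 1.0)
    return max(min_frames, round(base_frames * (1.0 - fill)))
```

For a base value of 20 frames, a half-full memory gives 10 frames and a full memory gives the floor of 1 frame.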