JP2001222300A

JP2001222300A - Voice reproducing device and recording medium

Info

Publication number: JP2001222300A
Application number: JP2000030959A
Authority: JP
Inventors: Atsushi Imai; 篤今井; Nobumasa Seiyama; 信正清山; Toru Tsugi; 徹都木
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2000-02-08
Filing date: 2000-02-08
Publication date: 2001-08-17

Abstract

PROBLEM TO BE SOLVED: To play back recorded speech at a specified high speed in a time frame nearly equal to that required to play the whole recording at that speed, while keeping information loss as small as possible so that the speech remains easy to follow. SOLUTION: When an instruction to play back audio data at n-times speed (the specified speed) is input, an acoustic analysis unit 2a separates the audio data, supplied via a communication line, CD-RW, DVD or the like, into speech sections and non-speech sections. A speech-rate conversion unit 3a then converts the speech rate of each speech section bounded by non-speech sections of at least a fixed duration, so that its beginning is slower than the predetermined playback speed and the rate gradually returns to that speed toward the end. A synthesis unit 5a combines the converted speech signal with the non-speech-section signal separated by the acoustic analysis unit 2a, so that the audio is played back at n-times speed without falling far behind the playback time frame while important portions remain easy to hear.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to an audio reproducing apparatus and a recording medium that assist listening to high-speed speech far exceeding a normal speaking rate.

[0002] [Summary of the invention] The present invention concerns playing back a pre-recorded audio signal, by any signal processing method, faster than it was recorded, either without changing the pitch of the voice or while intentionally changing it. The beginning of each utterance and the speech portions considered important for comprehension are detected automatically from information such as the pitch and intensity of the voice, and those portions are converted to a slower speech rate than the surrounding speech. Meanwhile, non-speech portions and low-power portions at the ends of utterances are adaptively deleted. This makes it possible to grasp the content of high-speed speech up to about 10-times-speed playback, which could not previously be followed by ear.

[0003]

[Prior art] Conventionally, fast listening to speech has generally been done in one of two ways. Material recorded on an analog tape recorder or VTR is played back in fast-forward, and the listener hears speech whose pitch changes in proportion to the playback speed. Alternatively, material recorded on a digital tape recorder or VTR is played back while skipping data in proportion to the playback speed, and the listener hears discontinuous speech.

[0004] A speech-rate conversion method (Japanese Patent No. 2955247) has also been proposed that successively extracts the fundamental frequency and thins out the signal in units of the corresponding waveform, thereby reproducing continuous high-speed speech without changing the pitch of the voice.

[0005]

[Problems to be solved by the invention] Of these fast-listening methods, the analog tape recorder method loses no data but is very hard to listen to because of the change in pitch. With the digital tape recorder method, information loss grows as the playback speed increases. With both methods, listening was limited to about double speed.

[0006] The speech-rate conversion method described above can be expected to allow listening at higher speeds. In practice, however, when the content of broadcast material VTRs and the like is searched, video-based search is performed at about 10-times speed, whereas audio-based search has been possible only up to about double speed at most.

[0007] In addition, a method has been proposed that expands speech sections by partially deleting information (1995 IEICE General Conference, D-695). This method performs high-speed playback of speech by controlling the speech rate according to the amount of audio data accumulated in memory and by deleting silent sections.

[0008] With this method, however, information is often deleted, or the speech rate is controlled, regardless of how important the content is, so the content sometimes cannot be heard or understood adequately.

[0009] As is clear from the above, none of the conventionally proposed methods can play back speech at high speed beyond roughly 2-times or 3-times speed, the range in which the content can still be understood.

[0010] For this reason, there has been a strong demand for an apparatus that can search audio content at a playback speed of about 10-times speed, as is done when searching video content.

[0011] In view of the above circumstances, an object of the present invention is to provide an audio reproducing apparatus and a recording medium that keep information loss as small as possible and make speech played back at a specified speed easier to follow, while playing back the entire recording at high speed in nearly the same time frame as is needed to play it at the specified speed, so that a program editor or the like can search the content of recorded audio at high speed.

[0012]

[Means for solving the problems] To achieve the above object, the invention according to claim 1 comprises: an acoustic analysis unit that acoustically analyzes an audio signal to be reproduced and separates it into an audio signal of speech sections and an audio signal of non-speech sections; a speech-rate conversion unit that converts the speech rate of the speech-section audio signal separated by the acoustic analysis unit so that its beginning is slower than a predetermined playback speed and the rate gradually returns to the predetermined playback speed toward the end; and a synthesis unit that combines the audio signal converted by the speech-rate conversion unit with the non-speech-section audio signal separated by the acoustic analysis unit to generate a converted audio signal.

[0013] Claim 2 is the audio reproducing apparatus of claim 1, further comprising a non-speech section length control unit that, based on time information of the audio signal converted by the speech-rate conversion unit, adaptively deletes or compresses the non-speech-section audio signal separated by the acoustic analysis unit and has the synthesis unit combine it with the speech-rate-converted audio signal.

[0014] Claim 3 comprises: a speech/non-speech determination unit that evaluates the power of the audio signal to be reproduced and generates the power threshold needed to separate the audio signal into an audio signal of speech sections and an audio signal of non-speech sections; an acoustic analysis unit that uses the power threshold obtained by the speech/non-speech determination unit to separate the audio signal to be reproduced into the speech-section audio signal and the non-speech-section audio signal; a speech-rate conversion unit that converts the speech rate of the speech-section audio signal separated by the acoustic analysis unit so that its beginning is slower than a predetermined playback speed and the rate gradually returns to the predetermined playback speed toward the end; and a synthesis unit that combines the audio signal converted by the speech-rate conversion unit with the non-speech-section audio signal separated by the acoustic analysis unit to generate a converted audio signal.

[0015] Claim 4 is the audio reproducing apparatus of claim 3, wherein, when the speech/non-speech determination unit generates the power threshold needed to separate the audio signal to be reproduced into the speech-section audio signal and the non-speech-section audio signal, it adaptively varies the power threshold in proportion to the accumulated delay from the original audio caused by expanding the speech-section audio signal.
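The adaptive power threshold described above can be illustrated with a short sketch. The source states only that the threshold varies in proportion to the accumulated delay; the linear form, the direction (raising the threshold as delay grows, so that more low-power material is treated as non-speech) and the names `adaptive_power_threshold` and `gain` are assumptions made for illustration.

```python
def adaptive_power_threshold(base_threshold: float,
                             accumulated_delay: float,
                             gain: float = 0.5) -> float:
    """Return a power threshold that grows linearly with accumulated delay.

    base_threshold: threshold used when there is no delay (assumed value).
    accumulated_delay: seconds by which the output lags the nominal timeline.
    gain: hypothetical proportionality constant, per second of delay.
    """
    delay = max(accumulated_delay, 0.0)  # a negative delay is treated as zero
    return base_threshold * (1.0 + gain * delay)


if __name__ == "__main__":
    for delay in (0.0, 1.0, 2.0):
        print(delay, adaptive_power_threshold(0.01, delay))
```

With no delay the function returns the base threshold unchanged, so the behavior reduces to a fixed-threshold separation.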

[0016] Claim 5 comprises: an acoustic analysis unit that acoustically analyzes an audio signal to be reproduced and separates it into an audio signal of speech sections and an audio signal of non-speech sections; a speech-rate conversion unit that converts the speech rate of the speech-section audio signal separated by the acoustic analysis unit so that its beginning is slower than a predetermined playback speed and the rate gradually returns to the predetermined playback speed toward the end; a non-speech section adjustment unit that, when adaptively deleting or compressing the non-speech-section audio signal separated by the acoustic analysis unit based on time information of the audio signal converted by the speech-rate conversion unit, outputs it without shortening it below a predetermined length; and a synthesis unit that combines the audio signal converted by the speech-rate conversion unit with the non-speech-section audio signal obtained by the non-speech section adjustment unit to generate a converted audio signal.

[0017] Claim 6 comprises: an acoustic analysis unit that acoustically analyzes an audio signal to be reproduced and separates it into an audio signal of speech sections and an audio signal of non-speech sections; a fundamental frequency calculation unit that calculates the fundamental frequency contained in the speech-section audio signal separated by the acoustic analysis unit; a speech-rate conversion unit that converts the speech rate by adaptively expanding the speech-section audio signal separated by the acoustic analysis unit according to the rate of change of the fundamental frequency obtained by the fundamental frequency calculation unit; and a synthesis unit that combines the audio signal converted by the speech-rate conversion unit with the non-speech-section audio signal separated by the acoustic analysis unit to generate a converted audio signal.

[0018] Claim 7 is the audio reproducing apparatus of claim 6, wherein, when the speech-rate conversion unit expands the speech-section audio signal separated by the acoustic analysis unit, it compares the rate of change of the fundamental frequency obtained by the fundamental frequency calculation unit with a preset rate-of-change threshold, and makes the expansion ratio for the audio signal in any section where the rate of change exceeds the threshold larger than the expansion ratio for the preceding and following audio signal.
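The threshold comparison described above can be sketched as follows. This is a minimal illustration, not the patented procedure: the frame-to-frame difference used as the rate of change, the convention that an F0 of 0 marks an unvoiced frame, and the names `expansion_ratios`, `base_ratio` and `boosted_ratio` are all assumptions.

```python
from typing import List, Sequence


def expansion_ratios(f0_hz: Sequence[float],
                     frame_sec: float,
                     rate_threshold: float,
                     base_ratio: float = 1.0,
                     boosted_ratio: float = 1.5) -> List[float]:
    """Assign a per-frame expansion ratio from the F0 rate of change.

    f0_hz: one fundamental-frequency estimate per frame; 0 means unvoiced.
    rate_threshold: rate-of-change threshold in Hz per second.
    Frames whose |dF0/dt| exceeds the threshold get boosted_ratio.
    """
    if not f0_hz:
        return []
    ratios = [base_ratio]  # the first frame has no predecessor to compare
    for prev, cur in zip(f0_hz, f0_hz[1:]):
        voiced = prev > 0 and cur > 0
        rate = abs(cur - prev) / frame_sec if voiced else 0.0
        ratios.append(boosted_ratio if rate > rate_threshold else base_ratio)
    return ratios


if __name__ == "__main__":
    f0 = [100, 100, 130, 131, 0, 120]
    print(expansion_ratios(f0, frame_sec=0.01, rate_threshold=1000.0))
```

Only the frame where the pitch jumps from 100 Hz to 130 Hz within 10 ms (3000 Hz/s) exceeds the example threshold, so only that frame receives the larger ratio.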

[0019] Claim 8 is the audio reproducing apparatus of claim 6, wherein, when the speech-rate conversion unit expands the speech-section audio signal separated by the acoustic analysis unit, it compares the rate of change of the fundamental frequency obtained by the fundamental frequency calculation unit with a preset rate-of-change threshold and, when the rate of change exceeds the threshold, expands the audio signal at the same expansion ratio for a fixed time from the onset time of that speech section, or until a fixed number of voiced sections have appeared after that speech section.

[0020] Claim 9 is a recording medium storing a program for operating a computer, wherein the stored audio reproducing program, when installed on the computer and an audio playback instruction is input, creates within the computer: an acoustic analysis unit that acoustically analyzes an audio signal to be reproduced and separates it into an audio signal of speech sections and an audio signal of non-speech sections; a speech-rate conversion unit that converts the speech rate of the speech-section audio signal separated by the acoustic analysis unit so that its beginning is slower than a predetermined playback speed and the rate gradually returns to the predetermined playback speed toward the end; and a synthesis unit that combines the audio signal converted by the speech-rate conversion unit with the non-speech-section audio signal separated by the acoustic analysis unit to generate a converted audio signal.

[0021] Claim 10 is the recording medium of claim 9, wherein the audio reproducing program, when installed on the computer and an audio playback instruction is input, creates within the computer a non-speech section length control unit that, based on time information of the audio signal converted by the speech-rate conversion unit, adaptively deletes or compresses the non-speech-section audio signal separated by the acoustic analysis unit and has the synthesis unit combine it with the speech-rate-converted audio signal.

[0022] Claim 11 is a recording medium storing a program for operating a computer, wherein the stored audio reproducing program, when installed on the computer and an audio playback instruction is input, creates within the computer: a speech/non-speech determination unit that evaluates the power of the audio signal to be reproduced and generates the power threshold needed to separate the audio signal into an audio signal of speech sections and an audio signal of non-speech sections; an acoustic analysis unit that uses the power threshold obtained by the speech/non-speech determination unit to separate the audio signal to be reproduced into the speech-section audio signal and the non-speech-section audio signal; a speech-rate conversion unit that converts the speech rate of the speech-section audio signal separated by the acoustic analysis unit so that its beginning is slower than a predetermined playback speed and the rate gradually returns to the predetermined playback speed toward the end; and a synthesis unit that combines the audio signal converted by the speech-rate conversion unit with the non-speech-section audio signal separated by the acoustic analysis unit to generate a converted audio signal.

[0023] Claim 12 is the recording medium of claim 11, wherein the audio reproducing program, when installed on the computer and an audio playback instruction is input, creates within the computer the speech/non-speech determination unit such that it generates the power threshold needed to separate the audio signal to be reproduced into the speech-section audio signal and the non-speech-section audio signal, and adaptively varies the power threshold in proportion to the accumulated delay from the original audio caused by expanding the speech-section audio signal.

[0024] Claim 13 is a recording medium storing a program for operating a computer, wherein the stored audio reproducing program, when installed on the computer and an audio playback instruction is input, creates within the computer: an acoustic analysis unit that acoustically analyzes an audio signal to be reproduced and separates it into an audio signal of speech sections and an audio signal of non-speech sections; a speech-rate conversion unit that converts the speech rate of the speech-section audio signal separated by the acoustic analysis unit so that its beginning is slower than a predetermined playback speed and the rate gradually returns to the predetermined playback speed toward the end; a non-speech section adjustment unit that, when adaptively deleting or compressing the non-speech-section audio signal separated by the acoustic analysis unit based on time information of the audio signal converted by the speech-rate conversion unit, outputs it without shortening it below a predetermined length; and a synthesis unit that combines the audio signal converted by the speech-rate conversion unit with the non-speech-section audio signal obtained by the non-speech section adjustment unit to generate a converted audio signal.

[0025] Claim 14 is a recording medium storing a program for operating a computer, wherein the stored audio reproducing program, when installed on the computer and an audio playback instruction is input, creates within the computer: an acoustic analysis unit that acoustically analyzes an audio signal to be reproduced and separates it into an audio signal of speech sections and an audio signal of non-speech sections; a fundamental frequency calculation unit that calculates the fundamental frequency contained in the speech-section audio signal separated by the acoustic analysis unit; a speech-rate conversion unit that converts the speech rate by adaptively expanding the speech-section audio signal separated by the acoustic analysis unit according to the rate of change of the fundamental frequency obtained by the fundamental frequency calculation unit; and a synthesis unit that combines the audio signal converted by the speech-rate conversion unit with the non-speech-section audio signal separated by the acoustic analysis unit to generate a converted audio signal.

[0026] Claim 15 is the recording medium of claim 14, wherein the audio reproducing program, when installed on the computer and an audio playback instruction is input, creates within the computer a speech-rate conversion unit that, when expanding the speech-section audio signal separated by the acoustic analysis unit, compares the rate of change of the fundamental frequency obtained by the fundamental frequency calculation unit with a preset rate-of-change threshold, and makes the expansion ratio for the audio signal in any section where the rate of change exceeds the threshold larger than the expansion ratio for the preceding and following audio signal.

[0027] Claim 16 is the recording medium of claim 14, wherein the audio reproducing program, when installed on the computer and an audio playback instruction is input, creates within the computer a speech-rate conversion unit that, when expanding the speech-section audio signal separated by the acoustic analysis unit, compares the rate of change of the fundamental frequency obtained by the fundamental frequency calculation unit with a preset rate-of-change threshold and, when the rate of change exceeds the threshold, expands the audio signal at the same expansion ratio for a fixed time from the onset time of that speech section, or until a fixed number of voiced sections have appeared after that speech section.

[0028] With the configurations of the above claims, the beginning of each utterance is played back more slowly than the specified speed, information loss is kept as small as possible, and speech played back at the specified speed is made easier to follow, while the entire recording is played back at high speed in nearly the same time frame as is needed to play it at the specified speed. A program editor or the like can thereby search the content of the recorded audio at high speed.

[0029]

[Embodiments of the invention] <<First embodiment>> FIG. 1 is a block diagram showing an embodiment, corresponding to claims 1 and 2, of the audio reproducing apparatus and recording medium according to the present invention.

[0030] As shown in FIG. 1, the audio reproducing apparatus 1a comprises an acoustic analysis unit 2a, a speech-rate conversion unit 3a, a non-speech section length control unit 4a, and a synthesis unit 5a. It separates the supplied audio data into speech sections and non-speech sections and, for each speech section bounded by non-speech sections of at least a fixed duration, converts the speech rate so that the beginning is slower than the predetermined playback speed and the rate gradually returns to that speed toward the end. This makes important portions easier to hear while the audio is played back at the specified speed (n-times speed) without falling far behind the playback time frame. The supplied audio data may be audio data recorded together with video data on a CD-RW, DVD or the like via a communication line or the like; audio data supplied over a communication line (audio data paired with video data output from a master control room, sub-control room, or the like); audio data obtained by digitizing video and audio signals recorded in analog form on videotape or the like; or audio data digitally recorded together with video data on videotape or the like. The same applies to the second to fourth embodiments below.

[0031] The acoustic analysis unit 2a uses a preset power threshold to separate the audio data designated for playback at n-times speed (n is a rational number with n > 0; the same applies below) into speech sections and non-speech sections. It supplies the audio data contained in the speech sections, together with time information for that data, to the speech-rate conversion unit 3a, and supplies the audio data of the non-speech sections to the non-speech section length control unit 4a.
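A minimal sketch of a power-threshold separation of this kind is given below. The source specifies only that a preset power threshold is used; the fixed-length framing, the mean-square power measure, and the names `split_sections`, `frame_len` and `power_threshold` are assumptions for illustration. Sample indices stand in for the time information attached to each section.

```python
from typing import List, Sequence, Tuple

Section = Tuple[bool, int, int]  # (is_speech, start_sample, end_sample)


def split_sections(samples: Sequence[float],
                   frame_len: int,
                   power_threshold: float) -> List[Section]:
    """Label fixed-length frames by mean power and merge equal neighbours."""
    sections: List[Section] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / len(frame)  # mean-square power
        is_speech = power >= power_threshold
        end = start + len(frame)
        if sections and sections[-1][0] == is_speech:
            # Extend the current section instead of opening a new one.
            sections[-1] = (is_speech, sections[-1][1], end)
        else:
            sections.append((is_speech, start, end))
    return sections


if __name__ == "__main__":
    signal = [0.0] * 4 + [1.0] * 4 + [0.0] * 4
    print(split_sections(signal, frame_len=2, power_threshold=0.5))
```

In this toy signal the middle four samples form the only speech section, and the silent stretches on either side become non-speech sections.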

[0032] The speech-rate conversion unit 3a expands the waveform of the audio data output from the acoustic analysis unit 2a (the audio data contained in the speech sections) according to a predetermined rule, thereby varying the speech rate progressively within each speech section. One example of such a rule is that the beginning of speech uttered in one breath is always converted to be slower than the desired playback speed, for example n-times speed, and the remainder is converted so as to return gradually to the predetermined playback speed toward the end. The unit supplies the resulting audio data to the synthesis unit 5a. It also compares the time information output from the acoustic analysis unit 2a (the time information attached to the speech-section audio data) with the time information of the audio data after conversion, generates delay time information indicating the delay of the converted waveform relative to the waveform before conversion, and supplies this information to the non-speech section length control unit 4a.
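The following sketch illustrates one way the example rule and the resulting delay could be computed. The source does not specify the shape of the return to n-times speed; the linear ramp, the piecewise (chunked) approximation, and the names `ramp_speeds`, `converted_duration`, `delay_vs_nominal` and `start_speed` are assumptions. Here the delay is measured against the nominal duration at n-times speed.

```python
from typing import List


def ramp_speeds(n: float, start_speed: float, chunks: int) -> List[float]:
    """Playback speed per chunk, rising linearly from start_speed to n."""
    return [start_speed + (n - start_speed) * (i + 0.5) / chunks
            for i in range(chunks)]


def converted_duration(duration: float, n: float, start_speed: float,
                       chunks: int = 100) -> float:
    """Output duration of a speech section of `duration` seconds of source."""
    step = duration / chunks  # source seconds covered by each chunk
    return sum(step / s for s in ramp_speeds(n, start_speed, chunks))


def delay_vs_nominal(duration: float, n: float, start_speed: float,
                     chunks: int = 100) -> float:
    """Extra output time compared with uniform n-times playback."""
    return converted_duration(duration, n, start_speed, chunks) - duration / n


if __name__ == "__main__":
    # 4 s of speech, target 4x, section starts at 2x (assumed values).
    print(delay_vs_nominal(4.0, n=4.0, start_speed=2.0))
```

For a linear ramp the delay approaches duration * ln(n / start_speed) / (n - start_speed) - duration / n, which is about 0.386 s for the values in the usage example.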

【００３３】Based on the delay time information output from the speech speed conversion unit 3a, the non-voice section length control unit 4a adaptively deletes or compresses the voice data output from the acoustic analysis unit 2a (the voice data contained in a non-voice section), so that the non-voice section is shortened to the length needed to cancel the delay introduced by the waveform expansion in the speech speed conversion unit 3a. It supplies the non-voice-section voice data obtained by this deletion or compression to the synthesis unit 5a.
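The bookkeeping described here can be sketched as follows. This is a minimal illustration that assumes the delay is simply subtracted from the next pause, with any remainder carried over to later pauses; the function name and the carry-over behaviour are assumptions, not details given in the description.

```python
def shorten_pause(pause_s, pending_delay_s):
    """Shorten one non-voice section to absorb accumulated delay.

    Returns (new_pause_s, remaining_delay_s). The pause cannot become
    negative, so any delay it cannot absorb is carried forward.
    """
    cut = min(pause_s, pending_delay_s)
    return pause_s - cut, pending_delay_s - cut
```

For example, a 0.5 s pause facing 0.25 s of delay is cut to 0.25 s and leaves no delay, whereas a 0.25 s pause facing 0.75 s of delay is removed entirely and leaves 0.5 s still to be absorbed.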

【００３４】The synthesis unit 5a combines the speech-speed-converted voice data output from the speech speed conversion unit 3a with the deleted or compressed voice data output from the non-voice section length control unit 4a, and supplies the combined voice data to a sound board (not shown) so that the speaker outputs speech that has little delay and is easy to hear.

【００３５】Thus, in the first embodiment, when an instruction to play back voice data at a specified speed (n-times speed) is input, the supplied voice data is separated into voice sections and non-voice sections. Each voice section lying between non-voice sections of at least a fixed length is then speech-speed-converted so that its beginning is slower than the predetermined playback speed and the speed gradually returns to the predetermined playback speed toward the end. The important portions are thereby made easy to hear while the speech is played back at n-times speed without falling far behind the playback time frame. Consequently, when recorded speech is played back at n-times speed, the speech is divided into units that can be uttered in a single breath, and the beginning of each unit is played back more slowly than n-times speed. This keeps the loss of information as small as possible and makes the n-times-speed speech easy to hear, while the speech is played back in nearly the same time frame as would be needed to play the entire recording at n-times speed, so that a program editor or similar user can search the recorded content quickly (effect of claim 1).

【００３６】Also, in the first embodiment, when recorded speech is played back at n-times speed, the non-voice section length control unit 4a shortens the non-voice sections by the amount by which the speech speed conversion unit 3a has lengthened the waveforms of the voice portions, and the synthesis unit 5a then adds the voice data of the voice sections to the voice data of the non-voice sections to generate the converted voice data. The loss of information is thereby kept as small as possible and the n-times-speed speech is made easy to hear, while the total length of the voice and non-voice sections after speech speed conversion is kept nearly equal to their total length before conversion (effect of claim 2).

【００３７】《Second Embodiment》 FIG. 2 is a block diagram showing an embodiment of the voice reproducing device and recording medium according to the present invention that corresponds to claims 3 and 4.

【００３８】The voice reproducing device 1b shown in this figure comprises a voice/non-voice determination unit 6, an acoustic analysis unit 2b, a speech speed conversion unit 3b, a non-voice section length control unit 4b, and a synthesis unit 5b. It separates the voice data into voice sections and non-voice sections using a threshold corresponding to the power of the supplied voice data, and speech-speed-converts each voice section lying between non-voice sections of at least a fixed length so that its beginning is slower than the predetermined playback speed and the speed gradually returns to the predetermined playback speed toward the end. The important portions are thereby made easy to hear while the speech is played back at the specified speed (n-times speed) without falling far behind the playback time frame.

【００３９】Here, the voice/non-voice determination unit 6 detects the power of the voice data designated for n-times-speed playback and generates the power threshold needed to separate the voice data into voice-section data and non-voice-section data. Based on the delay time information output from the non-voice section length control unit 4b, it also determines how far the speech has been delayed by the speech speed conversion. When it determines that the converted speech lags noticeably behind the original speech, it adaptively adjusts the power threshold so that a smaller proportion of the data is judged to be voice sections and a larger proportion is judged to be non-voice sections, which increases the amount of data eligible for deletion. It supplies the resulting power threshold and the voice data designated for n-times-speed playback to the acoustic analysis unit 2b.
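The feedback loop in this paragraph can be sketched as below. The decibel scale, the fixed step size, the delay limit, and the ceiling are all assumptions chosen for illustration; the description states only that the threshold is adjusted adaptively when the delay becomes noticeable.

```python
def adapt_threshold(threshold_db, delay_s, max_delay_s=0.5,
                    step_db=1.0, ceiling_db=-20.0):
    """Raise the power threshold when the converted speech lags too far.

    A higher threshold makes more frames count as non-voice, so more
    material becomes eligible for deletion. All defaults are hypothetical.
    """
    if delay_s > max_delay_s:
        return min(threshold_db + step_db, ceiling_db)
    return threshold_db

def non_voice_fraction(frame_powers_db, threshold_db):
    """Share of frames whose power falls below the threshold."""
    below = sum(1 for p in frame_powers_db if p < threshold_db)
    return below / len(frame_powers_db)
```

With frame powers of -50, -39.5, -30 and -20 dB, raising the threshold from -40 dB to -39 dB doubles the share of frames treated as non-voice from one quarter to one half.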

【００４０】Using the power threshold output from the voice/non-voice determination unit 6, the acoustic analysis unit 2b separates the voice data output from the voice/non-voice determination unit 6 into voice sections and non-voice sections. It supplies the voice data contained in the voice sections, together with the time information for that voice data, to the speech speed conversion unit 3b, and supplies the voice data of the non-voice sections to the non-voice section length control unit 4b.
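A threshold-based separation of this kind might look like the following sketch, which groups consecutive frames on the same side of the threshold into sections and attaches start and end times. The frame-wise power representation and the tuple layout are assumptions for the example.

```python
from itertools import groupby

def split_sections(frame_powers_db, threshold_db, frame_s):
    """Group frames into (is_voice, start_s, end_s) sections.

    A frame is treated as voice when its power is at or above the
    threshold; consecutive frames of the same kind form one section.
    """
    sections = []
    start = 0
    for is_voice, run in groupby(frame_powers_db,
                                 key=lambda p: p >= threshold_db):
        count = len(list(run))
        sections.append((is_voice, start * frame_s, (start + count) * frame_s))
        start += count
    return sections
```

The start and end times play the role of the time information that accompanies each voice section on its way to the speech speed conversion unit.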

【００４１】The speech speed conversion unit 3b expands the waveform of the voice data output from the acoustic analysis unit 2b (the voice data contained in a voice section) according to a predetermined rule, thereby varying the speech speed within the voice section successively. One example of such a rule is that the beginning of speech uttered in a single breath is always converted to a speed relatively slower than the desired playback speed (for example, n-times speed), and the rest of the speech is converted so that it gradually returns to the predetermined playback speed toward the end. The unit supplies the resulting voice data to the synthesis unit 5b. It also compares the time information output from the acoustic analysis unit 2b (the time information attached to the voice data of the voice section) with the time information of the voice data after speech speed conversion, generates delay time information indicating how far the converted waveform lags behind the waveform before conversion, and supplies this information to the non-voice section length control unit 4b.

【００４２】The non-voice section length control unit 4b forwards the delay time information output from the speech speed conversion unit 3b to the voice/non-voice determination unit 6. At the same time, based on that delay time information, it adaptively deletes or compresses the voice data output from the acoustic analysis unit 2b (the voice data contained in a non-voice section), so that the non-voice section is shortened to the length needed to cancel the delay introduced by the waveform expansion in the speech speed conversion unit 3b. It supplies the non-voice-section voice data obtained by this deletion or compression to the synthesis unit 5b.

【００４３】The synthesis unit 5b combines the speech-speed-converted voice data output from the speech speed conversion unit 3b with the deleted or compressed voice data output from the non-voice section length control unit 4b, and supplies the combined voice data to a sound board (not shown) so that the speaker outputs speech that has little delay and is easy to hear.

【００４４】Thus, in the second embodiment, when an instruction to play back voice data at n-times speed is input, the voice data is separated into voice sections and non-voice sections using a threshold corresponding to the power of the supplied voice data. Each voice section lying between non-voice sections of at least a fixed length is then speech-speed-converted so that its beginning is slower than the predetermined playback speed and the speed gradually returns to the predetermined playback speed toward the end. The important portions are thereby made easy to hear while the speech is played back at n-times speed without falling far behind the playback time frame. Consequently, when recorded speech is played back at n-times speed, an optimum power threshold matched to the power of the speech is used, so that even within a voice section, portions considered to be of low importance for listening can be deleted as efficiently as non-voice sections. This keeps the loss of information as small as possible and makes the n-times-speed speech easy to hear, while the speech is played back in nearly the same time frame as would be needed to play the entire recording at n-times speed, so that a program editor or similar user can search the recorded content quickly (effect of claim 3).

【００４５】Also, in the second embodiment, when recorded speech is played back at n-times speed, the power threshold needed to separate the voice data into voice-section data and non-voice-section data is varied according to the delay of the waveform after speech speed conversion relative to the waveform before conversion. As a result, not only the voice data in non-voice sections but also those parts of the voice data in voice sections that are considered to be of low importance for listening are deleted. Therefore, even when the non-voice sections are short relative to the voice sections, the speech can be played back at high speed, with ease of listening maintained, in nearly the same time frame as would be needed to play the entire recording at n-times speed (effect of claim 4).

【００４６】《Third Embodiment》 FIG. 3 is a block diagram showing an embodiment of the voice reproducing device and recording medium according to the present invention that corresponds to claim 5.

【００４７】The voice reproducing device 1c shown in this figure comprises an acoustic analysis unit 2c, a speech speed conversion unit 3c, a non-voice section length determination/control unit 7, and a synthesis unit 5c. It separates the supplied voice data into voice sections and non-voice sections, and speech-speed-converts each voice section lying between non-voice sections of at least a fixed length so that its beginning is slower than the predetermined playback speed and the speed gradually returns to the predetermined playback speed toward the end. It further keeps the length of each non-voice section at or above a fixed length. The important portions are thereby made easy to hear while the speech is played back at the specified speed (n-times speed) without falling far behind the playback time frame.

【００４８】Here, the acoustic analysis unit 2c uses a preset power threshold to separate the voice data designated for n-times-speed playback into voice sections and non-voice sections. It supplies the voice data contained in the voice sections, together with the time information for that voice data, to the speech speed conversion unit 3c, and supplies the voice data of the non-voice sections to the non-voice section length determination unit 7.

【００４９】The speech speed conversion unit 3c expands the waveform of the voice data output from the acoustic analysis unit 2c (the voice data contained in a voice section) according to a predetermined rule, thereby varying the speech speed within the voice section successively. One example of such a rule is that the beginning of speech uttered in a single breath is always converted to a speed relatively slower than the desired playback speed (for example, n-times speed), and the rest of the speech is converted so that it gradually returns to the predetermined playback speed toward the end. The unit supplies the resulting voice data to the synthesis unit 5c. It also compares the time information output from the acoustic analysis unit 2c (the time information attached to the voice data of the voice section) with the time information of the voice data after speech speed conversion, generates delay time information indicating how far the converted waveform lags behind the waveform before conversion, and supplies this information to the non-voice section length determination/control unit 7.

【００５０】While non-voice-section voice data is being output from the acoustic analysis unit 2c, the non-voice section length determination/control unit 7 adaptively deletes or compresses the voice data output from the acoustic analysis unit 2c (the voice data contained in a non-voice section), based on the delay time information output from the speech speed conversion unit 3c, so that the non-voice section is shortened to the length needed to cancel the delay introduced by the waveform expansion in the speech speed conversion unit 3c. In doing so, it always leaves at least a preset, very short section length (for example, about 100 ms when the voice data contained in the voice sections is played back at 10-times speed). It supplies the non-voice-section voice data obtained by this deletion or compression to the synthesis unit 5c.
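The minimum-length constraint can be sketched as a small variation on simple pause shortening. The default of 0.1 s follows the 100 ms example given for 10-times speed; the function name and the carry-over of unabsorbed delay are assumptions made for this illustration.

```python
def shorten_pause_with_floor(pause_s, pending_delay_s, min_pause_s=0.1):
    """Shorten a non-voice section but keep at least min_pause_s of it.

    Returns (new_pause_s, remaining_delay_s). A pause that is already
    shorter than the floor is left untouched.
    """
    removable = max(pause_s - min_pause_s, 0.0)
    cut = min(removable, pending_delay_s)
    return pause_s - cut, pending_delay_s - cut
```

Keeping a short residual pause means that the boundary before each utterance survives even when a large delay is waiting to be absorbed.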

【００５１】The synthesis unit 5c combines the speech-speed-converted voice data output from the speech speed conversion unit 3c with either the non-voice-section voice data output from the non-voice section length determination unit 7 or the deleted or compressed voice data output from the non-voice section length control unit 4c, and supplies the combined voice data to a sound board (not shown) so that the speaker outputs speech that has little delay and is easy to hear.

【００５２】Thus, in the third embodiment, when an instruction to play back voice data at n-times speed is input, the voice data supplied via a CD-RW, a DVD, a communication line, or the like is separated into voice sections and non-voice sections. Each voice section lying between non-voice sections of at least a fixed length is then speech-speed-converted so that its beginning is slower than the predetermined playback speed and the speed gradually returns to the predetermined playback speed toward the end, and the length of each non-voice section is kept at or above a fixed length. The important portions are thereby made easy to hear while the speech is played back at n-times speed without falling far behind the playback time frame. Consequently, when recorded speech is played back at n-times speed with its non-voice sections being deleted, the speech is divided into units that can be uttered in a single breath while features such as the voice-onset boundary at the beginning of an utterance are preserved, and the beginning of each unit is played back more slowly than n-times speed. This keeps the loss of information as small as possible and makes the n-times-speed speech easy to hear, while the speech is played back in nearly the same time frame as would be needed to play the entire recording at n-times speed, so that a program editor or similar user can search the recorded content quickly (effect of claim 5).

【００５３】《Fourth Embodiment》 FIG. 4 is a block diagram showing an embodiment of the voice reproducing device and recording medium according to the present invention that corresponds to claims 6, 7, and 8.

【００５４】The voice reproducing device 1d shown in this figure comprises an acoustic analysis unit 2d, a fundamental frequency calculation unit 8, a speech speed conversion unit 3d, a non-voice section length control unit 4d, and a synthesis unit 5d. It separates the supplied voice data into voice sections and non-voice sections, and then adaptively expands the waveform of each voice section lying between non-voice sections of at least a fixed length according to the variation of its fundamental frequency. The important portions are thereby made easy to hear while the speech is played back at the specified speed (n-times speed) without falling far behind the playback time frame.

【００５５】Here, the acoustic analysis unit 2d supplies the time information of the voice data designated for n-times-speed playback to the speech speed conversion unit 3d, and uses a preset power threshold to separate that voice data into voice sections and non-voice sections. It supplies the voice data contained in the voice sections to the fundamental frequency calculation unit 8, and supplies the voice data of the non-voice sections to the non-voice section length control unit 4d.

【００５６】The fundamental frequency calculation unit 8 successively calculates the fundamental frequency of the speech waveform represented by the voice-section voice data output from the acoustic analysis unit 2d and generates fundamental frequency information from the result. It supplies this fundamental frequency information, together with the voice-section voice data output from the acoustic analysis unit 2d, to the speech speed conversion unit 3d.

【００５７】The speech speed conversion unit 3d compares the temporal rate of change of the fundamental frequency, as indicated by the fundamental frequency information output from the fundamental frequency calculation unit 8, with a preset rate-of-change threshold. When the rate of change is smaller than the threshold, the unit adaptively expands the waveform of the voice data output from the acoustic analysis unit 2d (the voice data contained in a voice section) according to the temporal change of the fundamental frequency, thereby varying the speech speed within the voice section successively. When the rate of change is larger than the threshold, the unit expands the waveform according to a predetermined rule. One such rule is to expand the waveform of the voice data output from the acoustic analysis unit 2d (the voice data contained in a voice section) only in the section where the rate of change exceeds the threshold, at an expansion rate larger than that of the preceding and following sections. Another is to expand it at a constant expansion rate for a fixed time from the moment the rate of change exceeds the threshold (or until a fixed number of voiced sections have appeared after the voice section containing that moment). Expanding the waveform in this way stabilizes the specific places that depend on changes in the fundamental frequency and emphasizes the portions where the tone of voice changes. The unit supplies the resulting voice data to the synthesis unit 5d. In parallel with these operations, it compares the time information output from the acoustic analysis unit 2d (the time information attached to the voice data of the voice section) with the time information of the voice data after speech speed conversion, generates delay time information indicating how far the converted waveform lags behind the waveform before conversion, and supplies this information to the non-voice section length control unit 4d.
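The two rules for the above-threshold case can be sketched as a per-frame choice of expansion rate. This is a simplified illustration: it assumes frame-wise fundamental frequency values, measures the rate of change as the absolute frame-to-frame difference, and uses a single `base_rate` for the below-threshold case, whereas the description expands adaptively there. The parameter names, including `hold_frames` for the fixed-time variant, are assumptions.

```python
def expansion_rates(f0_hz, frame_s, base_rate, boost_rate,
                    threshold_hz_per_s, hold_frames=0):
    """Choose an expansion rate for each frame from its F0 slope.

    hold_frames=0 boosts only the frames whose slope exceeds the
    threshold; hold_frames>0 keeps the same boosted rate for that many
    frames after each such frame (the fixed-time variant).
    """
    rates = []
    hold_left = 0
    for i, f0 in enumerate(f0_hz):
        prev = f0_hz[i - 1] if i > 0 else f0
        slope = abs(f0 - prev) / frame_s  # Hz per second
        if slope > threshold_hz_per_s:
            hold_left = hold_frames
            rates.append(boost_rate)
        elif hold_left > 0:
            hold_left -= 1
            rates.append(boost_rate)
        else:
            rates.append(base_rate)
    return rates
```

With 10 ms frames and a threshold of 1000 Hz/s, a jump from 101 Hz to 130 Hz triggers the boosted rate for that frame only, or for that frame plus the following `hold_frames` frames in the fixed-time variant.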

【００５８】Based on the delay time information output from the speech speed conversion unit 3d, the non-voice section length control unit 4d adaptively deletes or compresses the voice data output from the acoustic analysis unit 2d (the voice data contained in a non-voice section), so that the non-voice section is shortened to the length needed to cancel the delay introduced by the waveform expansion in the speech speed conversion unit 3d. It supplies the non-voice-section voice data obtained by this deletion or compression to the synthesis unit 5d.

【００５９】The synthesis unit 5d combines the speech-speed-converted voice data output from the speech speed conversion unit 3d with the deleted or compressed voice data output from the non-voice section length control unit 4d, and supplies the combined voice data to a sound board (not shown) so that the speaker outputs speech that has little delay and is easy to hear.

【００６０】Thus, in the fourth embodiment, when an instruction to play back voice data at n-times speed is input, the supplied voice data is separated into voice sections and non-voice sections, and the waveform of each voice section lying between non-voice sections of at least a fixed length is then expanded adaptively according to the variation of its fundamental frequency. The important portions are thereby made easy to hear while the speech is played back at n-times speed without falling far behind the playback time frame. Consequently, when recorded speech is played back at n-times speed, the speech is divided into units that can be uttered in a single breath, and the portions where the fundamental frequency of each unit varies can be played back more slowly than n-times speed. The portions where the pitch of the voice changes are thereby expanded preferentially, the loss of information is kept as small as possible, and the n-times-speed speech is made easy to hear, while the speech is played back in nearly the same time frame as would be needed to play the entire recording at n-times speed, so that a program editor or similar user can search the recorded content quickly (effect of claim 6).

【００６１】Also, in the fourth embodiment, the rule applied when the temporal rate of change of the fundamental frequency exceeds the rate-of-change threshold can be selected to be the rule that expands the waveform of the voice data output from the acoustic analysis unit 2d (the voice data contained in a voice section) only in the section where the rate of change exceeds the threshold, at an expansion rate larger than that of the preceding and following sections. Therefore, when recorded speech is played back at n-times speed, the portions where the fundamental frequency varies greatly can be played back even more slowly than their surroundings. The loss of information is thereby kept as small as possible while the portions where the pitch of the voice changes are expanded preferentially, making the n-times-speed speech easy to hear (effect of claim 7).

【００６２】Also, in the fourth embodiment, the rule applied when the temporal rate of change of the fundamental frequency exceeds the rate-of-change threshold can be selected to be the rule that expands the waveform at a constant expansion rate for a fixed time from the moment the rate of change exceeds the threshold (or until a fixed number of voiced sections have appeared after the voice section containing that moment). Therefore, when recorded speech is played back at n-times speed, the portion where the fundamental frequency varies greatly and the voice sections that follow it can be played back at the same slow speed for a fixed time, or until a fixed number of voiced sections have appeared. The loss of information is thereby kept as small as possible while a fixed span including the portion where the pitch of the voice changes is expanded preferentially, making the n-times-speed speech easy to hear (effect of claim 8).

[0063] <<Other Embodiments>> FIG. 5 is a block diagram showing an example in which a recording medium storing the audio reproduction program according to the present invention is installed in a computer device to constitute the audio reproducing apparatus shown in FIGS. 1 to 4.

[0064] That is, the audio reproducing apparatuses 1a to 1d are constituted by a computer device on which the audio reproduction program 13 stored in the recording medium 11 has been installed, the program generating the acoustic analysis units 2a to 2d, the speech speed conversion units 3a to 3d, the non-voice section length control units 4a to 4d, the synthesis units 5a to 5d, the voice/non-voice determination unit 6, the non-voice section length determination unit 7, and the fundamental frequency calculation unit 8. When the keyboard, mouse, or the like of the computer device is operated and an instruction to reproduce audio data targeted for n-times-speed playback is input, the audio data supplied via the communication line 15, the CD-RW 16, the DVD 17, or the like is separated into voice sections and non-voice sections. Each voice section sandwiched between non-voice sections of at least a fixed length is then speech-speed converted so that its beginning is slower than the predetermined playback speed and the speed gradually returns to the predetermined playback speed toward the end. The audio is thus played back at n-times speed without falling far behind the playback time frame, while the important portions are made easy to hear.
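The speech speed conversion described above (slow at the beginning of a voice section, gradually returning to the n-times speed toward its end) can be illustrated with a per-frame speed profile. The sketch below assumes a linear return to the target speed, which the source does not specify; the function names, the frame-based representation, and the head speed are hypothetical.

```python
from typing import List


def speed_profile(num_frames: int, n: float, head_speed: float) -> List[float]:
    """Playback speed per frame: head_speed at the start, n at the end."""
    if num_frames == 1:
        return [head_speed]
    step = (n - head_speed) / (num_frames - 1)  # linear ramp (assumption)
    return [head_speed + step * i for i in range(num_frames)]


def output_duration(frame_s: float, speeds: List[float]) -> float:
    """Total playback time when each frame is played at its own speed."""
    return sum(frame_s / s for s in speeds)


profile = speed_profile(5, n=2.0, head_speed=1.0)
delay = output_duration(0.1, profile) - 5 * 0.1 / 2.0  # extra time vs. plain 2x
```

The positive `delay` is the amount by which the section falls behind plain n-times playback; it is this backlog that the shortening of non-voice sections is meant to absorb.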

[0065] In this case, the recording medium 11 comprises a recording medium body 12 such as a CD-ROM or DVD, the audio reproduction program 13 recorded on the recording medium body 12, and a setup program 14 that sets up the audio reproduction program 13 on the computer device body to build the audio reproducing apparatuses 1a to 1d. When an installation instruction is input, the setup program 14 stored on the recording medium body 12 transfers the audio reproduction program 13 stored on the recording medium body 12 to the CPU of the computer device body and installs it on the hard disk mechanism of the computer device. Each of the audio reproducing apparatuses 1a to 1d shown in FIGS. 1 to 4 can be constituted in this way.

[0066] In the above example, the audio reproducing apparatuses 1a to 1d are realized by installing the audio reproduction program 13 stored in the recording medium 11 on a computer device so as to generate the acoustic analysis units 2a to 2d, the speech speed conversion units 3a to 3d, the non-voice section length control units 4a to 4d, the synthesis units 5a to 5d, the voice/non-voice determination unit 6, the non-voice section length determination unit 7, and the fundamental frequency calculation unit 8. However, these units may instead be built from discrete components such as LSI elements, IC elements, transistor elements, resistors, capacitors, and coils to produce an audio reproducing apparatus.

[0067] Using an audio reproducing apparatus built in this way allows speech speed conversion to be performed faster and more efficiently than with the computer-based audio reproducing apparatuses 1a to 1d, further improving the efficiency of program editing work.

【００６８】[0068]

[Effects of the Invention] As described above, according to the present invention, in claims 1 and 9, when recorded audio is played back at a specified speed (n-times speed), the audio to be played back at high speed is divided into units that can be uttered in a single breath, and the beginning of each unit is played back at a speed slower than the specified speed. This keeps the loss of information as small as possible and makes audio played back at the specified speed easier to hear, while allowing high-speed playback within almost the same time frame as that needed to play back the entire recorded audio at the specified speed, so that program editors and others can search the contents of recorded audio at high speed.

[0069] In claims 2 and 10, when recorded audio is played back at a specified speed, the audio to be played back at high speed is divided into units that can be uttered in a single breath, the beginning of each unit is played back at a speed slower than the specified speed, and silent portions are efficiently deleted. This keeps the loss of information as small as possible and makes audio played back at the specified speed easier to hear, while allowing high-speed playback within almost the same time frame as that needed to play back the entire recorded audio at the specified speed, so that program editors and others can search the contents of recorded audio at high speed.
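One way to picture how deleting silent portions keeps playback within the time frame is the bookkeeping sketch below. It assumes that each voice section reports how much longer it took than plain n-times playback, and that the pause following it is shortened by as much of the accumulated backlog as possible. The function name, argument layout, and greedy policy are assumptions for illustration, not details taken from the source.

```python
from typing import List, Tuple


def trim_pauses(voice_delays_s: List[float],
                pause_lengths_s: List[float],
                n: float) -> Tuple[List[float], float]:
    """Shorten pauses to absorb the delay caused by slowed-down speech.

    voice_delays_s[i]  extra time section i took compared with n-times playback
    pause_lengths_s[i] original length of the pause that follows section i
    Returns the output pause lengths and the backlog still outstanding.
    """
    backlog = 0.0
    out = []
    for delay, pause in zip(voice_delays_s, pause_lengths_s):
        backlog += delay
        nominal = pause / n          # pause length at plain n-times speed
        cut = min(backlog, nominal)  # cannot remove more than the pause holds
        out.append(nominal - cut)
        backlog -= cut
    return out, backlog
```

With delays of 0.1 s and 0.3 s and two 0.4 s pauses at n = 2, the first pause is halved and the second is removed entirely, leaving roughly 0.1 s of backlog for later pauses to absorb.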

[0070] In claims 3 and 11, when recorded audio is played back at a specified speed, an optimum power threshold matched to the power value of the audio to be played back at high speed is used to divide that audio into units that can be uttered in a single breath. The beginning of each unit is played back at a speed slower than the specified speed, and audio portions considered less important for listening, as well as silent portions, are efficiently deleted. This keeps the loss of information as small as possible and makes audio played back at the specified speed easier to hear, while allowing high-speed playback within almost the same time frame as that needed to play back the entire recorded audio at the specified speed, so that program editors and others can search the contents of recorded audio at high speed.

[0071] In claims 4 and 12, when recorded audio is played back at a specified speed, an optimum power threshold matched to the power value of the audio to be played back at high speed is used to divide that audio into units that can be uttered in a single breath. The beginning of each unit is played back at a speed slower than the specified speed, and audio portions considered less important for listening, as well as silent portions, are deleted even more efficiently. This keeps the loss of information as small as possible and makes audio played back at the specified speed easier to hear, while allowing high-speed playback within almost the same time frame as that needed to play back the entire recorded audio at the specified speed, so that program editors and others can search the contents of recorded audio at high speed.
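Claim 4 states that the power threshold is changed adaptively in proportion to the accumulated delay. A minimal sketch of one possible reading follows: the threshold rises linearly with the delay, so that more low-power frames are classified as non-voice and become candidates for deletion. The direction of the adjustment, the decibel scale, the gain, and the function names are all assumptions made for this example.

```python
def adaptive_threshold(base_threshold_db: float,
                       accumulated_delay_s: float,
                       gain_db_per_s: float) -> float:
    """Raise the power threshold in proportion to the accumulated delay."""
    # Negative delay (playback ahead of schedule) leaves the base threshold.
    return base_threshold_db + gain_db_per_s * max(0.0, accumulated_delay_s)


def is_voice(frame_power_db: float, threshold_db: float) -> bool:
    """Classify a frame as voice when its power reaches the threshold."""
    return frame_power_db >= threshold_db


# A -38 dB frame counts as voice with no delay, but not after 0.5 s of delay.
no_delay = is_voice(-38.0, adaptive_threshold(-40.0, 0.0, 10.0))
with_delay = is_voice(-38.0, adaptive_threshold(-40.0, 0.5, 10.0))
```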

[0072] In claims 5 and 13, when recorded audio is played back at a specified speed while its non-voice sections are deleted, the audio to be played back at high speed is divided into units that can be uttered in a single breath while features such as the voice-onset boundary at the start of each utterance are preserved, and the beginning of each unit is played back at a speed slower than the specified speed. This keeps the loss of information as small as possible and makes audio played back at the specified speed easier to hear, while allowing high-speed playback within almost the same time frame as that needed to play back the entire recorded audio at the specified speed, so that program editors and others can search the contents of recorded audio at high speed.
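Claim 5 requires that a non-voice section not be made shorter than a predetermined length when it is deleted or compressed. The sketch below shows that constraint for a single pause; the function name, parameters, and the handling of pauses already shorter than the floor are hypothetical choices for illustration.

```python
from typing import Tuple


def trim_pause_with_floor(nominal_pause_s: float,
                          backlog_s: float,
                          min_pause_s: float) -> Tuple[float, float]:
    """Shorten one pause to absorb backlog, but never below min_pause_s.

    Returns the output pause length and the backlog that remains.
    """
    # A pause already shorter than the floor is passed through unchanged.
    floor = min(min_pause_s, nominal_pause_s)
    cut = max(0.0, min(backlog_s, nominal_pause_s - floor))
    return nominal_pause_s - cut, backlog_s - cut
```

Keeping a short remainder of each pause preserves the boundary at which the next utterance begins, at the cost of carrying some backlog forward to later pauses.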

[0073] In claims 6 and 14, when recorded audio is played back at a specified speed, the audio to be played back at high speed is divided into units that can be uttered in a single breath, and the portions where the fundamental frequency of each unit fluctuates are played back at a speed slower than the specified speed, so that the portions where the voice pitch changes are preferentially expanded. This keeps the loss of information as small as possible and makes audio played back at the specified speed easier to hear, while allowing high-speed playback within almost the same time frame as that needed to play back the entire recorded audio at the specified speed, so that program editors and others can search the contents of recorded audio at high speed.

[0074] In claims 7 and 15, when recorded audio is played back at a specified speed, the audio to be played back at high speed is divided into units that can be uttered in a single breath, and the portions where the fundamental frequency of each unit fluctuates greatly are played back even more slowly than their surroundings, so that the portions where the voice pitch changes are preferentially expanded. This keeps the loss of information as small as possible and makes audio played back at the specified speed easier to hear, while allowing high-speed playback within almost the same time frame as that needed to play back the entire recorded audio at the specified speed, so that program editors and others can search the contents of recorded audio at high speed.

[0075] In claims 8 and 16, when recorded audio is played back at a specified speed, the audio to be played back at high speed is divided into units that can be uttered in a single breath, and, starting from the portion where the fundamental frequency of each unit fluctuates greatly, the voice sections following that voiced section are played back at the same slow speed for a fixed time or until a fixed number of voiced sections have appeared, so that the portions where the voice pitch changes are preferentially expanded. This keeps the loss of information as small as possible and makes audio played back at the specified speed easier to hear, while allowing high-speed playback within almost the same time frame as that needed to play back the entire recorded audio at the specified speed, so that program editors and others can search the contents of recorded audio at high speed.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an embodiment of the audio reproducing apparatus and recording medium according to the present invention, corresponding to claims 1, 2, 9, and 10.

FIG. 2 is a block diagram showing an embodiment of the audio reproducing apparatus and recording medium according to the present invention, corresponding to claims 3, 4, 11, and 12.

FIG. 3 is a block diagram showing an embodiment of the audio reproducing apparatus and recording medium according to the present invention, corresponding to claims 5 and 13.

FIG. 4 is a block diagram showing an embodiment of the audio reproducing apparatus and recording medium according to the present invention, corresponding to claims 6, 7, 8, 14, 15, and 16.

FIG. 5 is a block diagram showing an example in which a recording medium storing the audio reproduction program according to the present invention is installed in a computer device to constitute the audio reproducing apparatus shown in FIGS. 1 to 4.

[Explanation of symbols]

1a to 1d: audio reproducing apparatus
2a to 2d: acoustic analysis unit
3a to 3d: speech speed conversion unit
4a, 4b, 4d: non-voice section length control unit
5a to 5d: synthesis unit
6: voice/non-voice determination unit
7: non-voice section length determination/control unit (non-voice section adjustment unit)
8: fundamental frequency calculation unit
11: recording medium
12: recording medium body
13: audio reproduction program
14: setup program
15: communication line
16: CD-RW
17: DVD

Continued from the front page: (72) Inventor: Toru Tsugi, c/o Japan Broadcasting Corporation (NHK) Broadcasting Technology Research Laboratories, 1-10-11 Kinuta, Setagaya-ku, Tokyo. F-terms (reference): 5D045 BA02 9A001 BB02 BB03 BB04 DD11 EE02 EE04 FF03 HH16 HH18 JJ72 KK43 KK60

Claims

1. An audio reproducing apparatus comprising: an acoustic analysis unit that acoustically analyzes an audio signal to be reproduced and separates it into an audio signal of a voice section and an audio signal of a non-voice section; a speech speed conversion unit that converts the speech speed of the audio signal of the voice section separated by the acoustic analysis unit so that its beginning is slower than a predetermined playback speed and the speed gradually returns to the predetermined playback speed toward the end; and a synthesis unit that synthesizes the audio signal speech-speed converted by the speech speed conversion unit with the audio signal of the non-voice section separated by the acoustic analysis unit to generate a converted audio signal.

2. The audio reproducing apparatus according to claim 1, further comprising a non-voice section length control unit that, based on time information of the audio signal speech-speed converted by the speech speed conversion unit, adaptively deletes or compresses the audio signal of the non-voice section separated by the acoustic analysis unit and causes the synthesis unit to synthesize it with the speech-speed-converted audio signal.

3. An audio reproducing apparatus comprising: a voice/non-voice determination unit that determines the power value of an audio signal to be reproduced and generates a power threshold needed to separate the audio signal into an audio signal of a voice section and an audio signal of a non-voice section; an acoustic analysis unit that uses the power threshold obtained by the voice/non-voice determination unit to separate the audio signal to be reproduced into an audio signal of a voice section and an audio signal of a non-voice section; a speech speed conversion unit that converts the speech speed of the audio signal of the voice section separated by the acoustic analysis unit so that its beginning is slower than a predetermined playback speed and the speed gradually returns to the predetermined playback speed toward the end; and a synthesis unit that synthesizes the audio signal speech-speed converted by the speech speed conversion unit with the audio signal of the non-voice section separated by the acoustic analysis unit to generate a converted audio signal.

4. The audio reproducing apparatus according to claim 3, wherein, when generating the power threshold needed to separate the audio signal to be reproduced into an audio signal of a voice section and an audio signal of a non-voice section, the voice/non-voice determination unit adaptively changes the power threshold in proportion to the accumulated delay time relative to the original audio caused by expansion of the audio signal of the voice section.

5. An audio reproducing apparatus comprising: an acoustic analysis unit that acoustically analyzes an audio signal to be reproduced and separates it into an audio signal of a voice section and an audio signal of a non-voice section; a speech speed conversion unit that converts the speech speed of the audio signal of the voice section separated by the acoustic analysis unit so that its beginning is slower than a predetermined playback speed and the speed gradually returns to the predetermined playback speed toward the end; a non-voice section adjustment unit that, when adaptively deleting or compressing the audio signal of the non-voice section separated by the acoustic analysis unit based on time information of the audio signal speech-speed converted by the speech speed conversion unit, outputs the audio signal of the non-voice section without making it shorter than a predetermined length; and a synthesis unit that synthesizes the audio signal speech-speed converted by the speech speed conversion unit with the audio signal of the non-voice section obtained by the non-voice section adjustment unit to generate a converted audio signal.

6. An audio reproducing apparatus comprising: an acoustic analysis unit that acoustically analyzes an audio signal to be reproduced and separates it into an audio signal of a voice section and an audio signal of a non-voice section; a fundamental frequency calculation unit that calculates the fundamental frequency contained in the audio signal of the voice section separated by the acoustic analysis unit; a speech speed conversion unit that adaptively expands the audio signal of the voice section separated by the acoustic analysis unit, and thereby converts its speech speed, in accordance with the change rate of the fundamental frequency obtained by the fundamental frequency calculation unit; and a synthesis unit that synthesizes the audio signal speech-speed converted by the speech speed conversion unit with the audio signal of the non-voice section separated by the acoustic analysis unit to generate a converted audio signal.

7. The audio reproducing apparatus according to claim 6, wherein, when expanding the audio signal of the voice section separated by the acoustic analysis unit, the speech speed conversion unit compares the change rate of the fundamental frequency obtained by the fundamental frequency calculation unit with a preset change rate threshold, and makes the expansion rate for the audio signal in a section where the change rate of the fundamental frequency exceeds the preset change rate threshold larger than the expansion rate for the preceding and following audio signals.

8. The audio reproducing apparatus according to claim 6, wherein, when expanding the audio signal of the voice section separated by the acoustic analysis unit, the speech speed conversion unit compares the change rate of the fundamental frequency obtained by the fundamental frequency calculation unit with a preset change rate threshold, and, when the change rate of the fundamental frequency exceeds the preset change rate threshold, expands the audio signal at the same expansion rate for a fixed time from the appearance time of that voice section, or until a fixed number of voiced sections have appeared after that voice section.

9. A recording medium storing a program for operating a computer device, the recording medium storing an audio reproduction program that, when installed on the computer device and an audio reproduction instruction is input, generates in the computer device: an acoustic analysis unit that acoustically analyzes an audio signal to be reproduced and separates it into an audio signal of a voice section and an audio signal of a non-voice section; a speech speed conversion unit that converts the speech speed of the audio signal of the voice section separated by the acoustic analysis unit so that its beginning is slower than a predetermined playback speed and the speed gradually returns to the predetermined playback speed toward the end; and a synthesis unit that synthesizes the audio signal speech-speed converted by the speech speed conversion unit with the audio signal of the non-voice section separated by the acoustic analysis unit to generate a converted audio signal.

10. The recording medium according to claim 9, wherein the audio reproduction program, when installed on the computer device and an audio reproduction instruction is input, further generates in the computer device a non-voice section length control unit that, based on time information of the audio signal speech-speed converted by the speech speed conversion unit, adaptively deletes or compresses the audio signal of the non-voice section separated by the acoustic analysis unit and causes the synthesis unit to synthesize it with the speech-speed-converted audio signal.

11. A recording medium storing a program for operating a computer device, the recording medium storing an audio reproduction program that, when installed on the computer device and an audio reproduction instruction is input, generates in the computer device: a voice/non-voice determination unit that determines the power value of an audio signal to be reproduced and generates a power threshold needed to separate the audio signal into an audio signal of a voice section and an audio signal of a non-voice section; an acoustic analysis unit that uses the power threshold obtained by the voice/non-voice determination unit to separate the audio signal to be reproduced into an audio signal of a voice section and an audio signal of a non-voice section; a speech speed conversion unit that converts the speech speed of the audio signal of the voice section separated by the acoustic analysis unit so that its beginning is slower than a predetermined playback speed and the speed gradually returns to the predetermined playback speed toward the end; and a synthesis unit that synthesizes the audio signal speech-speed converted by the speech speed conversion unit with the audio signal of the non-voice section separated by the acoustic analysis unit to generate a converted audio signal.

12. The recording medium according to claim 11, wherein the audio reproduction program, when installed on the computer device and an audio reproduction instruction is input, generates in the computer device the voice/non-voice determination unit such that, when generating the power threshold needed to separate the audio signal to be reproduced into an audio signal of a voice section and an audio signal of a non-voice section, it adaptively changes the power threshold in proportion to the accumulated delay time relative to the original audio caused by expansion of the audio signal of the voice section.

13. A recording medium storing a program for operating a computer device, the recording medium storing an audio reproduction program that, when installed on the computer device and an audio reproduction instruction is input, generates in the computer device: an acoustic analysis unit that acoustically analyzes an audio signal to be reproduced and separates it into an audio signal of a voice section and an audio signal of a non-voice section; a speech speed conversion unit that converts the speech speed of the audio signal of the voice section separated by the acoustic analysis unit so that its beginning is slower than a predetermined playback speed and the speed gradually returns to the predetermined playback speed toward the end; a non-voice section adjustment unit that, when adaptively deleting or compressing the audio signal of the non-voice section separated by the acoustic analysis unit based on time information of the audio signal speech-speed converted by the speech speed conversion unit, outputs the audio signal of the non-voice section without making it shorter than a predetermined length; and a synthesis unit that synthesizes the audio signal speech-speed converted by the speech speed conversion unit with the audio signal of the non-voice section obtained by the non-voice section adjustment unit to generate a converted audio signal.

14. A recording medium storing a program for operating a computer device, the recording medium storing an audio reproduction program that, when installed on the computer device and an audio reproduction instruction is input, generates in the computer device: an acoustic analysis unit that acoustically analyzes an audio signal to be reproduced and separates it into an audio signal of a voice section and an audio signal of a non-voice section; a fundamental frequency calculation unit that calculates the fundamental frequency contained in the audio signal of the voice section separated by the acoustic analysis unit; a speech speed conversion unit that adaptively expands the audio signal of the voice section separated by the acoustic analysis unit, and thereby converts its speech speed, in accordance with the change rate of the fundamental frequency obtained by the fundamental frequency calculation unit; and a synthesis unit that synthesizes the audio signal speech-speed converted by the speech speed conversion unit with the audio signal of the non-voice section separated by the acoustic analysis unit to generate a converted audio signal.

15. The recording medium according to claim 14, wherein the audio reproduction program, when installed on the computer device and an audio reproduction instruction is input, generates in the computer device a speech speed conversion unit that, when expanding the audio signal of the voice section separated by the acoustic analysis unit, compares the change rate of the fundamental frequency obtained by the fundamental frequency calculation unit with a preset change rate threshold, and makes the expansion rate for the audio signal in a section where the change rate of the fundamental frequency exceeds the preset change rate threshold larger than the expansion rate for the preceding and following audio signals.

16. The recording medium according to claim 14, wherein the audio reproduction program, when installed on the computer device and an audio reproduction instruction is input, generates in the computer device a speech speed conversion unit that, when expanding the audio signal of the voice section separated by the acoustic analysis unit, compares the change rate of the fundamental frequency obtained by the fundamental frequency calculation unit with a preset change rate threshold, and, when the change rate of the fundamental frequency exceeds the preset change rate threshold, expands the audio signal at the same expansion rate for a fixed time from the appearance time of that voice section, or until a fixed number of voiced sections have appeared after that voice section.