JPH08123483A

JPH08123483A - Speech time base conversion device

Info

Publication number: JPH08123483A
Application number: JP6260206A
Authority: JP
Inventors: Masayuki Misaki; 正之三崎; Takeshi Norimatsu; 武志則松; Kazuhiko Sato; 和彦佐藤
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1994-10-25
Filing date: 1994-10-25
Publication date: 1996-05-17
Anticipated expiration: 2016-07-16
Also published as: JP3189597B2

Abstract

PURPOSE: To convert speech to a rate that is faster than the playback speed but slower than the recording speed, while preserving the pitch at the time of recording, when an audio signal is played back more slowly than it was recorded. CONSTITUTION: Speech played back at M times speed (0<M<1) by a recording/playback unit 101 is written sequentially to an input buffer 103, and a voiced/silent decision unit 104 classifies it as voiced or silent. A time-base control unit 105 expands the time base of the data read from the input buffer 103, varying the expansion ratio according to whether the section is voiced or silent. The time-base-expanded data written to an output buffer is converted into an analog signal by a D/A converter 111 at sampling period T. A remaining-data monitoring unit 109 measures the amount of data still in the output buffer that has not yet been read out to the D/A converter 111 and supplies the result to an expansion ratio control unit 110. The expansion ratio control unit 110 determines the expansion ratios for voiced and silent sections independently, according to a conversion rule based on the remaining amount of data, and supplies them to the time-base control unit 105.

Description

Detailed Description of the Invention

[0001]

[Field of industrial application] The present invention relates to a speech time-base conversion device that can extend the time-axis length of speech by an arbitrary amount, as is required when audio is played back at low speed on a video tape recorder (VTR) or similar equipment.

[0002]

[Prior art] Speech time-base conversion devices that play back an audio signal at a speed different from the speed at which it was recorded have existed for some time. A tape recorder, for example, can speed up or slow down playback by adjusting the tape running speed. However, the pitch changes together with the playback speed, which makes the content hard to follow. Speech time-base conversion devices that can change the playback speed without changing the pitch have therefore been proposed.

[0003] A conventional speech time-base conversion device is described below with reference to the drawings. FIG. 4 is a block diagram showing the configuration of a conventional speech time-base conversion device. In FIG. 4, 1 is a recording/playback unit that records and plays back acoustic signals, 2 is an A/D converter that converts the played-back analog signal into a digital signal, 3 is a buffer memory for storing digital data, 4 is a D/A converter, 5 is a write control unit that controls the writing of data to the buffer memory, and 6 is a read control unit that controls the reading of data from the memory.

[0004] The operation of the speech time-base conversion device configured as above is described below. The description covers a device that, when an audio signal is played back at or below the speed at which it was recorded on the medium, restores the pitch to its value at the time of recording.

[0005] First, the recording/playback unit 1 plays back the acoustic signal at M times (0 < M < 1) the recording speed. The recording/playback unit is, for example, a VTR or a tape recorder. Next, the A/D converter 2 converts the played-back acoustic signal into a digital signal at a sampling period of T/M, which is inversely proportional to the playback speed. Here T is a sampling period that satisfies the sampling theorem for the acoustic signal at the time of recording; for a signal played back at M times speed, the period becomes 1/M of T. The write control unit 5 stores these A/D-converted digital signals in the buffer memory 3 one after another at period T/M.

If the digital signals stored in the buffer memory 3 are read out and played back at period T, the pitch at the time of recording is restored, but there is not enough input data to keep the output signal going continuously, so gaps appear in time. The read control unit 6 therefore provides sections in which the digital signal stored in the buffer memory 3 is read out twice in succession, in frames of several tens of milliseconds, to make up for the missing data. The D/A converter 4 converts the digital signal read out by the read control unit 6 into an analog signal at sampling period T. This series of operations achieves speech time-base conversion without changing the pitch. The technique of changing only the speed while keeping the pitch constant is explained in detail in, for example, "Tape recorder that compresses/expands the time axis of conversation", Kosaka, Yokobori, Fujita, Nikkei Electronics (1976.7.26).

[0006] FIG. 5 shows an example of the processing at 1/2 speed. (a) shows the data at the time of recording, and (b) shows the position in time of the data as it accumulates in the buffer memory. Playing back each block of (b) twice in succession at sampling period T yields the data sequence of (c), which has the same pitch as the data sequence of (a) and a time axis stretched to twice the scale.
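The frame repetition described for the conventional device can be sketched as follows. This is a minimal illustration, not the circuit of FIG. 4: the function name `repeat_frames`, the list-of-samples representation, and the frame length are assumptions made for the example.

```python
def repeat_frames(samples, frame_len, repeats=2):
    """Read each frame of `frame_len` samples `repeats` times in a row.

    With repeats=2 this mirrors the 1/2-speed case: the output holds
    twice as many samples as the input, so it can be played at period T
    without gaps while the pitch inside each frame is unchanged.
    """
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        for _ in range(repeats):
            out.extend(frame)
    return out


# Hypothetical buffer contents: two frames of three samples each.
buffered = [1, 2, 3, 4, 5, 6]
print(repeat_frames(buffered, frame_len=3))
```

A real device would smooth the joins between repeated frames; the sketch only shows how the data count is doubled.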

[0007]

[Problems to be solved by the invention] In the conventional example above, the pitch at the time of recording is preserved, but the speech rate is the same as the playback speed of the recording medium, so the speech is slower than when it was recorded. When the playback speed of the medium is reduced, for instance to examine the picture on a VTR slowly and in detail, a conventional time-base conversion device slows the speech down together with the picture. The range over which the rate of human speech can be varied without sounding unnatural is said to be about 0.75 to 1.5 times. If the playback speed is reduced substantially, mainly in order to examine the picture, the reproduced speech becomes slower than necessary, sounds unnatural, and is actually harder to follow. If, to avoid this, one tries to listen to the speech at a rate faster than the current playback speed of the medium, there is not enough audio signal data to fill the time. Sections with missing audio data then occur periodically, and even if silent data is inserted into them, the result is a discontinuous audio signal and a highly unnatural reproduced sound.

[0008] The present invention solves the above problem. Its object is to provide a speech time-base conversion device that, when a recording medium is read at a playback speed at or below the recording speed, delivers speech that is easy to follow, without slowing the speech more than necessary and without introducing discontinuities.

[0009]

[Means for solving the problems] To solve the above problem, the speech time-base conversion device according to claim 1 comprises: a playback unit that reads an acoustic signal stored on a recording medium at M times (0 < M < 1) the recording speed; an A/D converter that converts the analog signal read by the playback unit into a digital signal; an input buffer that stores the output data of the A/D converter; a voiced/silent decision unit that determines the voiced sections and silent sections of the input signal; a time-base control unit that performs time-base expansion in which, according to the decision result of the voiced/silent decision unit, the expansion ratio for silent sections and the expansion ratio for voiced sections are set independently of each other; an output buffer for storing the output data of the time-base control unit; a remaining-data monitoring unit that measures the amount of data remaining in the output buffer; an expansion ratio control unit that determines the expansion ratio for time-base conversion according to a predetermined conversion rule based on the remaining amount obtained from the remaining-data monitoring unit; and a D/A converter that converts the audio data stored in the output buffer into an analog signal.

[0010] The speech time-base conversion device according to claim 2 comprises an expansion ratio setting unit that sets the expansion ratio for silent sections and the expansion ratio for voiced sections independently of each other according to the remaining amount of data.

[0011] The speech time-base conversion device according to claim 3 comprises an expansion ratio control unit that sets the silent-section expansion ratio to 1/M or more, sets the voiced-section expansion ratio to between 1.0 and 1/M inclusive, and determines each expansion ratio based on a conversion rule corresponding to the remaining amount of data.
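The ranges stated for claim 3 can be expressed as a small check. The following is only a sketch under assumptions: the function name `clamp_to_claim3` and the idea of clamping candidate ratios into range are illustrative, and the text does not say the device enforces the ranges this way.

```python
def clamp_to_claim3(voiced_ratio, silent_ratio, m):
    """Force candidate expansion ratios into the ranges given for claim 3.

    m is the playback speed relative to recording (0 < m < 1), so 1/m is
    the ratio at which output data is produced exactly as fast as the
    D/A converter consumes it.
    """
    full_rate = 1.0 / m
    voiced = min(max(voiced_ratio, 1.0), full_rate)  # 1.0 <= voiced <= 1/M
    silent = max(silent_ratio, full_rate)            # silent >= 1/M
    return voiced, silent


# Hypothetical values with M = 0.5, so 1/M = 2.0.
print(clamp_to_claim3(0.8, 1.2, 0.5))
```

Keeping the silent-section ratio at or above 1/M means silent sections never drain the output buffer, which is what leaves room for voiced sections to run below 1/M.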

[0012] The speech time-base conversion device according to claim 4 comprises an expansion ratio control unit that sets the voiced-section expansion ratio to 1/M when the remaining amount of data is at or below a predetermined value and otherwise to a specified fixed value, and that determines the silent-section expansion ratio, within the range of 1/M or more, based on a conversion rule corresponding to the remaining amount of data.

[0013]

[Operation] With the above configuration, the result of the voiced/silent decision is used to perform time-base expansion in which the expansion ratio for silent sections is larger than that for voiced sections, and the result is written to the buffer memory. The amount of data remaining in the buffer memory is measured, and the expansion ratio is increased as the remaining amount falls. Even when the proportion of silent sections is small, the expansion ratio is adjusted automatically so that the buffer memory always holds enough data. As a result, the playback speed of voiced sections is kept as close as possible to the speed at the time of recording, and slow-playback sound that is easy to follow is obtained.

[0014] With the configuration of claim 4, when the remaining amount of data, that is, the number of data items left in the buffer memory, is extremely small, even voiced sections are expanded at a ratio of 1/M to prevent dropouts; otherwise the expansion ratio for silent sections is adjusted based on the remaining amount of data. The speech is thus played back at a predetermined fixed rate, the output signal is never interrupted by the buffer memory running empty, and a natural reproduced sound is obtained.

[0015]

[Embodiment] A first embodiment of the present invention is described below with reference to the drawings.

[0016] FIG. 1 is a block diagram showing the configuration of the speech time-base conversion device according to the first embodiment of the present invention.

[0017] In FIG. 1, 101 is a recording/playback unit that records and plays back acoustic signals; 102 is an A/D converter that converts the analog signal played back by the recording/playback unit 101 into a digital signal; 103 is an input buffer that temporarily stores the A/D-converted acoustic signal; 104 is a voiced/silent decision unit that determines whether the digital signal sequence read from the input buffer is a voiced section or a silent section; 105 is a time-base control unit that applies time-base expansion at a given expansion ratio to the signal read from the input buffer; 106 is a read control unit that controls the reading of data from the input buffer and the read address; 107 is a write control unit that controls the writing of data to the output buffer and the write address; 108 is an output buffer that temporarily holds the data processed by the time-base control unit; 109 is a remaining-data monitoring unit that monitors the amount of data temporarily held in the output buffer; 110 is an expansion ratio control unit that determines the expansion ratio of the time-base control unit according to the output of the remaining-data monitoring unit; and 111 is a D/A converter that converts the digital data stored in the output buffer into an analog signal.

[0018] The operation of the speech time-base conversion device configured as above is described in detail below with reference to FIG. 1.

[0019] First, the acoustic signal is read from the recording/playback unit 101 at M times (0 < M < 1) the recording speed. From here on, "speed" means speed relative to the recording speed. Let T be the sampling period used when recording in the recording/playback unit 101. The acoustic signal played back at M times speed by the recording/playback unit 101 is converted by the A/D converter 102 into a digital signal sequence at sampling period T/M and written to the input buffer 103. The D/A converter 111, on the other hand, converts to an analog signal at the same sampling period T as at the time of recording, so per unit time the output buffer must have ready 1/M times as much signal as the input provides. The basic idea is not to expand the whole input signal by the same ratio but to expand silent sections by a larger ratio than voiced sections, which lowers the expansion ratio needed for the voiced sections.

[0020] The voiced/silent decision unit 104 determines from the signal sequence read from the input buffer whether the sample sequence is a voiced section or a silent section. This voiced/silent decision can be made readily with known techniques. Based on the decision result, the time-base control unit 105 applies time-base expansion to the data read from the input buffer and outputs it to the output buffer 108. Silent sections are expanded at the expansion ratio for silent sections, and voiced sections at the expansion ratio for voiced sections. The values of these expansion ratios are set by the expansion ratio control unit 110 based on the remaining amount of data obtained by the remaining-data monitoring unit 109.
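The text leaves the voiced/silent decision to "known techniques" without naming one. As one possible illustration, a short-time energy threshold is sketched below; the choice of technique, the function name `is_voiced`, and the threshold value are all assumptions and are not taken from the patent.

```python
def is_voiced(frame, threshold=0.01):
    """Classify one frame by its mean squared amplitude.

    Returns True (voiced) when the frame energy exceeds `threshold`,
    and False (silent) otherwise. An empty frame counts as silent.
    The threshold is a hypothetical value for samples scaled to [-1, 1].
    """
    if not frame:
        return False
    energy = sum(x * x for x in frame) / len(frame)
    return energy > threshold


# Hypothetical frames: one with clear signal, one near the noise floor.
print(is_voiced([0.5, -0.5, 0.4]), is_voiced([0.001] * 10))
```

A practical detector would usually add hangover time or zero-crossing information so that weak consonants are not classified as silence.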

[0021] The remaining-data monitoring unit monitors the amount of data that has been written to the output buffer but not yet output to the D/A converter 111, and the expansion ratios for voiced and silent sections are determined from that remaining amount. Adjusting the expansion ratios according to how much data has accumulated in the output buffer prevents the output buffer from running empty.

[0022] The relationship between the remaining amount of data and the expansion ratio may be given by a linear function, as in FIG. 2(a), or may change in steps. In the example of FIG. 2(a), the closer the output buffer is to empty, the larger the expansion ratio, so that data accumulates in the output buffer more readily. The expansion ratio for silent sections in particular is made large, so that the output buffer does not run empty even when the expansion ratio for voiced sections is lowered. In the example of FIG. 2(b), voiced sections are given an expansion ratio of 1.0 (no expansion) unless the remaining amount of data reaches 0, so they are played back at the same speech rate as at the time of recording. With the voiced-section expansion ratio fixed at 1, the data remaining in the output buffer falls rapidly when voiced sections follow one another, so the silent-section expansion ratio is set generally high to let data accumulate in the output buffer. Time-base expansion can increase the amount of data so that the output buffer does not run empty, but an indiscriminately large expansion ratio would exceed the capacity of the output buffer and the continuity of the output signal could no longer be maintained. The expansion ratio is therefore kept smaller as the remaining amount of data increases.

[0023] The operation is described below for the example in which the playback speed of the recording medium is 2/3 of the recording speed (M = 2/3).

[0024] First, in the expansion ratio setting table of FIG. 2, the expansion ratio for voiced sections is set to 1.5 when the remaining amount of data is 0, which prevents the output buffer from running empty even when the input signal is voiced. When the remaining amount of data is nearly equal to the capacity of the output buffer, the expansion ratio for silent sections must be held to 1.5 or less.
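A conversion rule of the linear kind can be sketched for M = 2/3 as follows. Only two points come from the text: the voiced ratio is 1.5 when the buffer is empty, and the silent ratio is at most 1.5 when the buffer is nearly full. The other endpoints (voiced ratio 1.0 at full, silent ratio 3.0 at empty), the function name `ratios_for`, and the straight-line interpolation are assumptions for illustration.

```python
from fractions import Fraction

M = Fraction(2, 3)  # playback speed relative to recording, from the example


def ratios_for(remaining, capacity, m=M, silent_max=Fraction(3)):
    """Return (voiced_ratio, silent_ratio) for a given buffer fill level.

    Both ratios fall linearly as the output buffer fills:
      voiced: 1/M at empty  -> 1.0 at full   (1.0 at full is assumed)
      silent: silent_max    -> 1/M at full   (silent_max is assumed)
    """
    fill = Fraction(min(max(remaining, 0), capacity), capacity)
    full_rate = 1 / m
    voiced = full_rate - (full_rate - 1) * fill
    silent = silent_max - (silent_max - full_rate) * fill
    return voiced, silent


# Hypothetical 900-sample output buffer, sampled at three fill levels.
for level in (0, 450, 900):
    print(level, ratios_for(level, 900))
```

Exact fractions are used so the endpoints come out as 3/2 rather than a rounded float; a stepwise table would serve equally well according to the text.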

[0025] FIG. 3 shows schematically, along the time axis, how processing proceeds when silent sections and voiced sections are expanded at different time-base expansion ratios. (a) is the input signal at the time of recording, and (b) is the audio played back from the recording medium at 2/3 of the recording speed. The expansion ratios for silent and voiced sections must be chosen according to the proportion of silent sections in the input signal. (c) and (d) show two examples with different proportions of silence. Of input blocks 1 to 6, the example in (c) treats blocks 1, 2 and 3 as silent and blocks 4, 5 and 6 as voiced, while the example in (d) treats blocks 1 and 2 as silent and blocks 3, 4, 5 and 6 as voiced. In both examples the expansion ratio for voiced sections is 1.0, so the expansion ratio for silent sections is 2.0 in (c) and 2.5 in (d).

If the proportion of silent sections can be estimated in advance, as in these examples, the output buffer can keep supplying the D/A converter with exactly the right amount of output data, and the expansion ratios can simply be fixed. However, the proportion of silence varies with the kind of source being played. The amount of data held in the output buffer is therefore monitored and the expansion ratios are determined from that value, with the output buffer absorbing temporary surpluses and shortfalls of output data. In this way the expansion ratios for silent and voiced sections can be set independently even for speech whose proportion of silence cannot be predicted.
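The silent-section ratios of 2.0 and 2.5 in these examples follow from requiring that the output be 1/M times as long as the input overall. The sketch below reproduces that arithmetic; the function name `required_silent_ratio` and the use of exact fractions are choices made for the example.

```python
from fractions import Fraction


def required_silent_ratio(m, silent_fraction, voiced_ratio=Fraction(1)):
    """Silent-section ratio that makes the average expansion equal 1/M.

    Solves  p * r_silent + (1 - p) * r_voiced = 1/M  for r_silent,
    where p is the fraction of the input that is silent.
    """
    p = Fraction(silent_fraction)
    target = 1 / Fraction(m)
    return (target - (1 - p) * voiced_ratio) / p


m = Fraction(2, 3)
# Case (c): 3 of 6 blocks silent. Case (d): 2 of 6 blocks silent.
print(required_silent_ratio(m, Fraction(3, 6)))  # 2
print(required_silent_ratio(m, Fraction(2, 6)))  # 5/2
```

The formula also shows why a fixed ratio fails when the silence proportion is unknown: as p shrinks, the required silent ratio grows without bound.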

[0026] The time-base conversion processing itself is described in detail in, for example, "Implementation of a high-quality speech rate conversion method on a DSP", Suzuki, Misaki, IEICE Technical Report on Speech, SP90-34 (1990.8.23).

[0027] With the expansion ratios controlled in this way, the ratio used for time-base expansion varies somewhat with the proportion of silent sections, but the audio signal can be heard at a speech rate that is no faster than the rate at the time of recording and faster than the playback speed of the recording medium.

[0028] As described above, in this embodiment the time-base expansion ratios for voiced and silent sections are each set independently based on the remaining amount of data. When the remaining amount is below a predetermined level, the expansion ratio for voiced sections is set to 1/M to keep the output signal from being interrupted; otherwise the expansion ratios are controlled so that voiced sections are played as close as possible to the speech rate at the time of recording. A reproduced sound that is natural and easy to follow is therefore obtained even when the playback speed of the recording medium is reduced.
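The closed-loop behaviour summarised here can be tried out with a small simulation. Everything numeric in it is an assumption chosen for illustration (200 input samples and 300 output samples per tick, giving M = 2/3; a 1200-sample buffer; a low-water mark of 300; a silent ratio falling linearly from 3.0 to 1.5), and `simulate` is a hypothetical name, not part of the described device.

```python
def simulate(voiced_flags, in_per_tick=200, out_per_tick=300,
             capacity=1200, low_mark=300, start=600):
    """Track output-buffer occupancy, one entry per input frame.

    Each tick, one input frame is expanded and written to the output
    buffer, then the D/A converter removes out_per_tick samples.
    """
    level = start
    history = []
    for voiced in voiced_flags:
        if voiced:
            # Voiced: 1/M when the buffer is low, otherwise no expansion.
            ratio = out_per_tick / in_per_tick if level <= low_mark else 1.0
        else:
            # Silent: larger ratio when the buffer is emptier (assumed rule).
            ratio = 3.0 - 1.5 * (level / capacity)
        produced = int(in_per_tick * ratio)
        # Writes beyond capacity are dropped in this simplified model.
        level = min(level + produced, capacity) - out_per_tick
        history.append(level)
    return history


# Fifty voiced frames in a row: the level falls, then holds at the mark.
print(simulate([True] * 50)[-1])
```

In this model a long voiced run settles at the low-water mark instead of emptying the buffer, and silent frames refill it, which is the behaviour the embodiment aims for.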

[0029]

[Effects of the invention] As described above, in the present invention the voiced/silent decision unit determines the voiced and silent sections of an acoustic signal played back at M times (0 < M < 1) the recording speed, and the time-base control unit expands the time base using expansion ratios set independently for voiced and silent sections and stores the result in the output buffer. The expansion ratios for silent and voiced sections are determined by a predetermined rule according to the amount of data remaining in the output buffer. By setting the silent-section expansion ratio to 1/M or more and the voiced-section expansion ratio to between 1.0 and 1/M inclusive, and varying each ratio independently, the speech rate in voiced sections can be made faster than the playback speed.

Alternatively, by setting the voiced-section expansion ratio to 1/M when the remaining amount of data is at or below a predetermined value and to a specified fixed value otherwise, and determining the silent-section expansion ratio, within the range of 1/M or more, based on a conversion rule corresponding to the remaining amount of data, the speech can be played back at a constant rate faster than the playback speed. The speech rate in voiced sections can therefore be brought closer to the rate at the time of recording. In addition, the remaining-data monitoring unit makes it possible to adjust both the silent and voiced expansion ratios, or the silent expansion ratio alone, according to the proportion of silent sections, so that the output signal can be played back without interruption whatever input signal is supplied.

[0030] Thus, according to the present invention, even when the playback speed of the recording medium is reduced in order to view the picture slowly, the audio no longer has to be heard at a speech rate slower than necessary. A speech time-base conversion device that gives natural, easy-to-follow slow playback can therefore be provided.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a speech time-base conversion device according to an embodiment of the present invention.

FIG. 2 is an explanatory diagram of the expansion ratio setting table of the embodiment.

FIG. 3 is a schematic diagram of the time-base expansion processing of the embodiment.

FIG. 4 is a block diagram of a conventional speech time-base conversion device.

FIG. 5 is a schematic diagram of conventional time-base expansion processing.

[Explanation of symbols]

101 Recording/playback unit
102 A/D converter
103 Input buffer
104 Voiced/silent decision unit
105 Time-base control unit
106 Read control unit
107 Write control unit
108 Output buffer
109 Remaining-data monitoring unit
110 Expansion ratio control unit
111 D/A converter

Claims

[Claims]

1. A speech time-base conversion device comprising: a playback unit that reads an acoustic signal stored on a recording medium at M times (0 < M < 1) the recording speed; an A/D converter that converts the analog signal read by the playback unit into a digital signal; an input buffer that stores the output data of the A/D converter; a voiced/silent decision unit that determines voiced sections and silent sections of the input signal; a time-base control unit that, according to the decision result of the voiced/silent decision unit, expands the audio data in the input buffer along the time axis using an expansion ratio for silent sections and an expansion ratio for voiced sections that are set independently of each other; an output buffer that stores the output data of the time-base control unit; a remaining-data monitoring unit that measures the amount of data remaining in the output buffer; an expansion ratio control unit that determines the expansion ratio for time-base conversion according to a predetermined rule based on the remaining amount of data obtained from the remaining-data monitoring unit; and a D/A converter that converts the audio data stored in the output buffer into an analog signal.

2. The speech time-base conversion device according to claim 1, wherein the expansion ratio control unit sets the expansion ratio for silent sections and the expansion ratio for voiced sections independently of each other according to the remaining amount of data.

3. The speech time-base conversion device, wherein the expansion ratio control unit sets the silent-section expansion ratio to 1/M or more, sets the voiced-section expansion ratio to between 1.0 and 1/M inclusive, and determines each expansion ratio based on a conversion rule corresponding to the remaining amount of data.

4. The speech time-base conversion device according to claim 3, wherein the expansion ratio control unit sets the voiced-section expansion ratio to 1/M when the remaining amount of data is at or below a predetermined value and otherwise to a specified fixed value, and determines the silent-section expansion ratio, within a range of 1/M or more, based on a conversion rule corresponding to the remaining amount of data.