JP2002297200A

JP2002297200A - Speaking speed converting device

Info

Publication number: JP2002297200A
Application number: JP2001098735A
Authority: JP
Inventors: Tatsuo Inoue; 健生井上
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2001-03-30
Filing date: 2001-03-30
Publication date: 2002-10-11
Anticipated expiration: 2021-03-30
Also published as: JP4212253B2

Abstract

PROBLEM TO BE SOLVED: To provide a speaking speed converting device that can perform appropriate speaking speed conversion on audio signals of multiple channels while keeping the reproduced audio signals synchronized. SOLUTION: The speaking speed converting device performs speaking speed conversion on multichannel input audio signals supplied from an audio reproducing device, according to pitch cycles obtained from those input audio signals. It is equipped with a pitch cycle calculation part 124, which calculates a single pitch cycle for each processing section from the input audio signals of all channels, and a time-base compression and expansion part 14, which compresses the input audio signals of all channels on the time base in each processing section according to the obtained pitch cycle.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech speed conversion device that changes the speech speed of an audio signal. The device can be used in a wide range of equipment. Examples include audio playback devices that let the audio accompanying video from a television, laser disc, VTR, hard disk recorder or the like be heard faster or slower than normal. Others are hearing aids that convert broadcast audio into slow, easy-to-hear speech for the hearing impaired and the elderly, and equipment such as telephones incorporating such aids. A further example is an English learning device that converts English spoken at native speed into slow, easy-to-hear speech.

[0002] Speech speed conversion means one of two operations. The time axis of an audio signal can be compressed so that it plays back faster than its original speed. Conversely, the time axis can be expanded so that it plays back slower than its original speed.

[0003]

2. Description of the Related Art

A speech speed conversion device for VTR high-speed playback is known, as disclosed for example in Japanese Patent Application Laid-Open No. 7-192392. It works on the audio signal read from the videotape. It deletes the parts of that signal that fall in silent sections, expands the audio of voiced sections on the time axis based on its pitch period, and so plays the voiced sections more slowly than the VTR playback speed set by the user. VTRs equipped with such a speech speed conversion device have also been put to practical use.

[0004] In recent years, there has been strong demand for higher-quality processed audio output from such speech speed conversion devices.

[0005] However, the conventional device described above converts only part of the audio. When the reproduced audio signal is monaural, it applies speech speed conversion to that signal. When the signal is stereo, it applies speech speed conversion to only one of the channels. Stereo and multichannel audio signals are never converted as a whole, so the quality of the audio output is low.

[0006] One conceivable remedy is to apply speech speed conversion to the audio signal of each channel in turn. Another is to use several conventional audio processing devices so that the channels are processed in parallel.

[0007]

PROBLEMS TO BE SOLVED BY THE INVENTION

In either case, however, the audio signal of each channel is processed independently. When the channels carry different audio, the amount of data left after speech speed conversion differs from channel to channel. As a result, the reproduced audio signals lose synchronization with one another, which sounds unnatural to the user.
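The synchronization problem can be made concrete with a short Python sketch that is not taken from the patent. The frame length, the power threshold and the mean-power silence test are all hypothetical. Silent frames are deleted from each channel independently. Because the two channels carry different content, their output lengths no longer match.

```python
FRAME = 4          # hypothetical frame length in samples
THRESHOLD = 0.01   # hypothetical mean-power threshold below which a frame is "silent"


def frames_of(x):
    """Split x into whole frames of FRAME samples (a trailing partial frame is dropped)."""
    return [x[i:i + FRAME] for i in range(0, len(x) - FRAME + 1, FRAME)]


def mean_power(frame):
    return sum(s * s for s in frame) / len(frame)


def drop_silent_frames(x):
    """Delete the silent frames of ONE channel, looking only at that channel."""
    return [s for f in frames_of(x) if mean_power(f) >= THRESHOLD for s in f]


# Left channel has sound in frames 0 and 2; right channel only in frame 0.
left = [0.5] * 4 + [0.0] * 4 + [0.5] * 4
right = [0.5] * 4 + [0.0] * 8

out_left = drop_silent_frames(left)
out_right = drop_silent_frames(right)
print(len(out_left), len(out_right))  # 8 4 -> the channels are now misaligned
```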

[0008] The present invention has been made in view of this problem. Its object is to provide a speech speed conversion device that performs appropriate speech speed conversion on multichannel audio signals while keeping the reproduced audio signals synchronized.

[0009]

MEANS FOR SOLVING THE PROBLEMS

The speech speed conversion device according to claim 1 of the present invention performs speech speed conversion on input audio signals of a plurality of channels supplied from an audio playback device. The conversion is based on a pitch period obtained from those input audio signals. The device comprises:
pitch period calculation means for calculating, for each processing section, a pitch period common to all channels from the input audio signals of the plurality of channels; and
time axis compression/expansion means for compressing the input audio signal of each channel in that processing section on the time axis, based on the obtained pitch period.

[0010] According to claim 2, in the speech speed conversion device of claim 1, the processing section changes according to the pitch period calculated by the pitch period calculation means.

[0011] According to claim 3, in the speech speed conversion device of claim 1 or 2, the pitch period calculation means includes adding means for adding the input audio signals of the plurality of channels for each processing section. The pitch period is calculated from the summed input audio signal obtained by the adding means.

[0012] According to claim 4, in the speech speed conversion device of claim 1 or 2, the pitch period calculation means includes maximum signal strength detection means. For each processing section, this means detects the channel whose input audio signal has the greatest signal strength, and the pitch period is calculated from that channel's input audio signal. Here, signal strength means, for example, the power, the average amplitude, or the cumulative amplitude of the signal.
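Below is a minimal Python sketch of the channel selection in claim 4. It assumes each channel's processing section is a list of samples. "Power" is taken as the mean squared amplitude, "average amplitude" as the mean absolute value, and "cumulative amplitude" as the sum of absolute values. These readings and the function names are illustrative assumptions, not definitions from the patent.

```python
def power(frame):
    """Mean squared amplitude of one frame (assumed meaning of 'power')."""
    return sum(s * s for s in frame) / len(frame)


def mean_amplitude(frame):
    """Mean absolute amplitude (assumed meaning of 'average amplitude')."""
    return sum(abs(s) for s in frame) / len(frame)


def cumulative_amplitude(frame):
    """Sum of absolute amplitudes (assumed meaning of 'cumulative amplitude')."""
    return sum(abs(s) for s in frame)


def strongest_channel(channel_frames, strength=power):
    """Index of the channel whose frame has the greatest signal strength."""
    return max(range(len(channel_frames)), key=lambda c: strength(channel_frames[c]))


left = [0.1, -0.2, 0.1, -0.1]
right = [0.6, -0.5, 0.4, -0.6]
print(strongest_channel([left, right]))                        # 1
print(strongest_channel([left, right], cumulative_amplitude))  # 1
```

The pitch period would then be estimated from `channel_frames[strongest_channel(...)]` alone and applied to every channel.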

[0013] According to claim 5, in the speech speed conversion device of claim 1 or 2, the pitch period calculation means includes maximum autocorrelation detection means. For each processing section, this means detects the channel whose input audio signal has the greatest autocorrelation value, and the pitch period is calculated from that channel's input audio signal.
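The claim does not say which autocorrelation value is compared. The Python sketch below assumes it is the largest autocorrelation over a plausible range of pitch lags, normalized by the zero-lag value. The normalization stops a loud but aperiodic channel from winning on energy alone. Both the normalization and the lag range are assumptions, and the function names are hypothetical.

```python
def autocorr(x, k):
    """Short-time autocorrelation of frame x at lag k (samples outside x treated as zero)."""
    return sum(x[m] * x[m + k] for m in range(len(x) - k))


def peak_autocorr(x, min_lag, max_lag):
    """Largest autocorrelation over [min_lag, max_lag], normalized by R(0) (an assumption)."""
    r0 = autocorr(x, 0)
    if r0 == 0:
        return 0.0
    return max(autocorr(x, k) for k in range(min_lag, max_lag + 1)) / r0


def most_periodic_channel(channel_frames, min_lag, max_lag):
    """Index of the channel with the greatest (normalized) autocorrelation peak."""
    scores = [peak_autocorr(f, min_lag, max_lag) for f in channel_frames]
    return scores.index(max(scores))


periodic = [1.0, 0.0, -1.0, 0.0] * 4                      # period of 4 samples
irregular = [1.0, -0.3, 0.2, 0.9, -0.7, 0.1, -0.4, 0.5] * 2
print(most_periodic_channel([irregular, periodic], 2, 6))  # 1
```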

[0014] The speech speed conversion device according to claim 6 of the present invention performs speech speed conversion on input audio signals of a plurality of channels supplied from an audio playback device. The conversion is based on a pitch period obtained from those input audio signals. The device comprises:
adding means for adding the input audio signals of the plurality of channels for each first processing section;
section determination means for determining, from the summed input audio signal obtained by the adding means, whether that processing section is a voiced section or a silent section;
silent section deletion means for deleting the input audio signal of each channel in a processing section judged silent by the section determination means;
pitch period calculation means for calculating a single pitch period for each second processing section from the summed input audio signal judged voiced by the section determination means; and
time axis compression/expansion means for compressing the input audio signal of each channel in that processing section on the time axis, based on the pitch period obtained by the pitch period calculation means.

[0015] The speech speed conversion device according to claim 7 of the present invention comprises speech speed conversion processing means, an audio memory and reading means. The speech speed conversion processing means performs speech speed conversion on input audio signals of a plurality of channels supplied from an audio playback device. The audio memory stores the audio signal processed by that means, and the reading means reads the audio signal from the audio memory. The speech speed conversion processing means comprises:
maximum signal strength detection means for detecting, for each first processing section, the channel with the greatest signal strength among the input audio signals of the plurality of channels;
section determination means for determining, from the input audio signal of the detected channel, whether that processing section is a voiced section or a silent section;
silent section deletion means for deleting the input audio signal of each channel in a processing section judged silent by the section determination means;
pitch period calculation means for calculating a single pitch period for each second processing section from the input audio signal of the channel judged voiced by the section determination means; and
time axis compression/expansion means for compressing the input audio signal of each channel in that processing section on the time axis, based on the pitch period obtained by the pitch period calculation means.

[0016] The speech speed conversion device according to claim 8 of the present invention comprises speech speed conversion processing means, an audio memory and reading means. The speech speed conversion processing means performs speech speed conversion on input audio signals of a plurality of channels supplied from an audio playback device. The audio memory stores the audio signal processed by that means, and the reading means reads the audio signal from the audio memory. The speech speed conversion processing means comprises:
maximum autocorrelation detection means for detecting, for each first processing section, the channel with the greatest autocorrelation value among the input audio signals of the plurality of channels;
section determination means for determining, from the input audio signal of the detected channel, whether that processing section is a voiced section or a silent section;
silent section deletion means for deleting the input audio signal of each channel in a processing section judged silent by the section determination means;
pitch period calculation means for calculating a single pitch period for each second processing section from the input audio signal of the channel judged voiced by the section determination means; and
time axis compression/expansion means for compressing the input audio signal of each channel in that processing section on the time axis, based on the pitch period obtained by the pitch period calculation means.

[0017] According to claim 9, in the speech speed conversion device of any one of claims 6 to 8, the second processing section changes according to the pitch period calculated by the pitch period calculation means.

【００１８】[0018]

EMBODIMENTS OF THE INVENTION

Embodiments of the present invention are described below with reference to the drawings.

[1] First Embodiment

FIG. 1 is a schematic block diagram showing the configuration of the speech speed conversion device according to this embodiment. The following terms are used in the description below.

The compression ratio is defined as P/Q. P is the time length (data amount) of a signal input to the time axis compression/expansion unit 14. Q is the time length (data amount) of the signal that the time axis compression/expansion unit 14 outputs for that input.

The accumulation rate of unread audio signal in the audio memory 15 is the amount of unread audio signal stored, as a percentage [%] of the total amount of audio data the audio memory 15 can hold. It is hereinafter simply called the accumulation rate.
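Both definitions reduce to simple ratios. Below is a Python sketch with hypothetical values; the function names are illustrative only. A ratio P/Q above 1 means the time axis was compressed, and below 1 that it was expanded.

```python
def compression_ratio(input_len, output_len):
    """P / Q: data in to the time axis compression/expansion unit over data out."""
    return input_len / output_len


def accumulation_rate(unread, capacity):
    """Unread data in the audio memory as a percentage of its total capacity."""
    return 100.0 * unread / capacity


print(compression_ratio(1500, 1000))   # 1.5  (time axis compressed)
print(compression_ratio(1000, 1000))   # 1.0  (no compression)
print(accumulation_rate(6000, 20000))  # 30.0
```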

[0019] In FIG. 1, reference numeral 10 denotes a hard disk drive. It is assumed to hold a video signal and the left-channel and right-channel audio signals needed for stereo playback, each recorded with header information indicating its signal type. The left-channel audio signal and the right-channel audio signal that follows it correspond to each other in time.

[0020] Take, for example, a digital television broadcast encoded in the MPEG format. The identification information of each packet in the TS (Transport Stream) of the received broadcast signal is consulted to separate out two kinds of packets. Video packets carry the video signal. Audio packets carry the left-channel and right-channel audio signals needed for stereo playback. Each extracted signal is MPEG-decoded, given header information that indicates its type, and recorded on the hard disk drive 10. The right-channel and left-channel audio signals are recorded alternately, one data item at a time.
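The Python sketch below shows two channels stored alternately, one item at a time, and later split apart again. The splitting step is analogous to what the L/R separation unit 18 does before D/A conversion. The sketch assumes one data item is one sample. It also assumes the left sample comes first, matching the "left channel followed by the right channel" order used elsewhere in the description. The function names are hypothetical.

```python
def interleave(left, right):
    """Store L and R alternately, one sample each (left first, as assumed here)."""
    out = []
    for l_sample, r_sample in zip(left, right):
        out.extend((l_sample, r_sample))
    return out


def deinterleave(samples):
    """Split an interleaved L/R sequence back into its two channels."""
    return samples[0::2], samples[1::2]


stored = interleave([1, 2, 3], [10, 20, 30])
print(stored)                # [1, 10, 2, 20, 3, 30]
print(deinterleave(stored))  # ([1, 2, 3], [10, 20, 30])
```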

[0021] The signal read from the hard disk drive 10 is stored in a frame memory 11. From there it goes to a signal separation unit 121 in a signal analysis unit 12, which separates it into a video signal and an audio signal based on the header information. The separated video signal is sent to a monitor (not shown) and displayed.

[0022] Meanwhile, the left-channel and right-channel audio signals separated by the signal separation unit 121 are sent to a signal addition unit 122.

[0023] The signal addition unit 122 successively adds the left-channel and right-channel audio signals received from the signal separation unit 121. Once one frame of audio from both channels has been added, it outputs the resulting summed audio signal.

[0024] A section determination unit 123 takes each frame of summed audio signal from the signal addition unit 122 and computes a signal strength such as its power, average amplitude or cumulative amplitude. From that value it decides whether the summed audio signal belongs to a voiced section or a silent section. Here, the power of the signal is used as the signal strength.

[0025] Let i_0, i_1, ..., i_{N-1} be the amplitude values of the summed audio signal in one frame. The power P is then obtained by Equation 1 below.

【００２６】[0026]

(Equation 1)

[0027] The average power P obtained by Equation 1 is compared with a preset threshold Th. This decides whether P is greater than or equal to Th (P ≥ Th) or less than Th (P < Th). When P ≥ Th, the section determination unit 123 outputs a signal indicating that the current frame is a voiced section. When P < Th, it outputs a signal indicating that the current frame is a silent section.
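The addition and section determination steps can be sketched in Python. The image of Equation 1 is not reproduced here. The sketch therefore assumes Equation 1 is the mean square of the frame, P = (1/N)·Σ i_n² for n = 0, ..., N−1, which matches the text's description of P as an average power. The threshold value and the function names are hypothetical.

```python
def summed_frame(left, right):
    """Sample-wise sum of one L/R frame (a sketch of the signal addition unit 122)."""
    return [l_sample + r_sample for l_sample, r_sample in zip(left, right)]


def average_power(frame):
    """ASSUMED form of Equation 1: P = (1/N) * sum(i_n ** 2)."""
    return sum(i * i for i in frame) / len(frame)


def is_voiced(left, right, th):
    """Voiced if P >= Th, silent if P < Th (a sketch of section determination unit 123)."""
    return average_power(summed_frame(left, right)) >= th


TH = 0.01  # hypothetical threshold
print(is_voiced([0.2, -0.2, 0.1], [0.1, -0.1, 0.0], TH))    # True
print(is_voiced([0.01, 0.0, -0.01], [0.0, 0.02, 0.0], TH))  # False
```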

[0028] A silent section deletion unit 13 acts on the signal from the section determination unit 123. For a section judged silent, it deletes both the left-channel audio signal and the following right-channel audio signal from which that section's summed audio signal was generated. For a section judged voiced, it outputs the left-channel audio signal and the following right-channel audio signal from which that section's summed audio signal was generated.

[0029] Each frame deleted as a silent section is therefore one in which the left-channel and right-channel audio signals correspond to each other in time. Frame-level synchronization between the two channels is thus maintained in the voiced sections.

[0030] In addition, a silent section of the summed audio signal is a frame in which the left-channel and right-channel audio signals are both weak. Silent sections are therefore determined appropriately for the audio signals of both channels.
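The joint deletion can be sketched in Python. As before, the frame length, the threshold and the mean-power silence test are hypothetical. The keep-or-drop decision is made once per frame from the summed L+R signal and applied to both channels. The two outputs therefore always have the same length, even where one channel is silent on its own.

```python
FRAME = 4     # hypothetical frame length in samples
TH = 0.01     # hypothetical power threshold


def mean_power(frame):
    return sum(s * s for s in frame) / len(frame)


def delete_silent_frames(left, right):
    """Drop a frame from BOTH channels when the summed L+R frame is silent."""
    out_l, out_r = [], []
    for start in range(0, min(len(left), len(right)) - FRAME + 1, FRAME):
        fl = left[start:start + FRAME]
        fr = right[start:start + FRAME]
        if mean_power([a + b for a, b in zip(fl, fr)]) >= TH:
            out_l.extend(fl)
            out_r.extend(fr)
    return out_l, out_r


left = [0.5] * 4 + [0.0] * 4 + [0.5] * 4
right = [0.5] * 4 + [0.0] * 8
out_l, out_r = delete_silent_frames(left, right)
print(len(out_l), len(out_r))  # 8 8
```

In this example the right channel's third frame is kept even though it is silent on its own, because the left channel is active there.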

[0031] One known way to compress or expand the time axis while keeping reproduced sound quality high is to process the audio signal in units of its pitch period. That method is used here.

[0032] For this purpose, the speech speed conversion device of this embodiment includes a pitch period calculation unit 124. It calculates a pitch period from the summed audio signal received from the signal addition unit 122 and outputs it. Here, the pitch period is calculated by autocorrelation.

[0033] One way to calculate the pitch period by autocorrelation uses the short-time autocorrelation. The signal is assumed to be time-limited: it exists only within an interval of length Ts and is always zero outside it. The autocorrelation is computed on that assumption. This method is described in L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals" (Japanese edition, Vol. 1, translated by Hisaki Suzuki, Corona Publishing, p. 152). If the speech waveform is represented by a digital audio signal x(n), the short-time autocorrelation value Rn(k) obtained by this method is as follows.

【００３４】[0034]

(Equation 2)

[0035] Here, Ts is the time interval in which the audio signal is assumed to exist. k is the delay applied to the speech waveform when calculating the short-time autocorrelation value Rn(k), with Ts ≫ k. The pitch period is the value of k that maximizes Rn(k) in Equation 2, excluding k = 0, where Rn(k) is always largest. The pitch period obtained is sent to the time axis compression/expansion unit 14.
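Below is a Python sketch of pitch estimation by short-time autocorrelation. It assumes Equation 2 has the usual rectangular-window form R(k) = Σ x(m)·x(m+k), summed over the samples of the analysis interval for which both terms exist. The lag search range, the sampling rate and the test tone are hypothetical. A practical estimator would also need to guard against picking a multiple of the true period.

```python
import math


def short_time_autocorr(x, k):
    """R(k) = sum_m x[m] * x[m + k], with x treated as zero outside the analysis window."""
    return sum(x[m] * x[m + k] for m in range(len(x) - k))


def pitch_period(x, min_lag, max_lag):
    """Lag (in samples) in [min_lag, max_lag] that maximizes R(k); k = 0 is never searched."""
    return max(range(min_lag, max_lag + 1), key=lambda k: short_time_autocorr(x, k))


FS = 8000   # hypothetical sampling rate (Hz)
F0 = 200    # test tone: 200 Hz, i.e. a period of 40 samples at 8 kHz
x = [math.sin(2 * math.pi * F0 * n / FS) for n in range(400)]
print(pitch_period(x, 20, 160))  # 40
```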

[0036] The time axis compression/expansion unit 14 processes the audio left after the silent section deletion unit 13 has removed the silent-section audio, that is, the voiced-section audio. It applies time axis compression/expansion using the pitch period from the pitch period calculation unit 124 as a period common to the left-channel audio signal and the following right-channel audio signal. The processing units vary in size according to the pitch period obtained by the pitch period calculation unit 124.

[0037] The time axis compression in the time axis compression/expansion unit 14 can be performed as shown, for example, in FIG. 2. A speech waveform two pitch periods long is cut out. Waveform A, the first pitch period, is multiplied by a weighting coefficient that falls linearly from 1 to 0 to produce waveform A′. The remaining waveform B is multiplied by a weighting coefficient that rises linearly from 0 to 1 to produce waveform B′. Adding A′ and B′ gives a waveform C one pitch period long.

Specifically, for a compression ratio of 1.5, the processing is as shown in FIG. 3:
the waveforms of the first and second pitch periods are compressed into one waveform, forming the first output waveform;
the third pitch period is used unchanged as the second output waveform;
the waveforms of the fourth and fifth pitch periods are compressed into one waveform, forming the third output waveform.
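The compression pattern of FIGS. 2 and 3 can be sketched in Python for a signal whose pitch period is fixed and already known. This is a simplification: in the device the period, and hence the processing unit, changes with each estimate from unit 124. Linear ramps from 1 to 0 and from 0 to 1 serve as the weighting coefficients. The function names are hypothetical.

```python
def crossfade(a, b):
    """Merge two equal-length pitch periods into one.

    A is weighted by a ramp falling linearly from 1 to 0, B by a ramp rising from 0 to 1,
    and the weighted waveforms are added (waveform C in the description of FIG. 2).
    """
    t = len(a)
    assert t == len(b) and t >= 2
    return [a[n] * (1 - n / (t - 1)) + b[n] * (n / (t - 1)) for n in range(t)]


def compress_1_5(x, period):
    """Time axis compression by 1.5 in whole pitch periods (the pattern of FIG. 3).

    Every block of three periods becomes two: periods 1 and 2 are cross-faded into one,
    and period 3 is copied unchanged. An incomplete tail is passed through as-is.
    """
    out, pos = [], 0
    while pos + 3 * period <= len(x):
        p1 = x[pos:pos + period]
        p2 = x[pos + period:pos + 2 * period]
        p3 = x[pos + 2 * period:pos + 3 * period]
        out += crossfade(p1, p2) + p3
        pos += 3 * period
    return out + x[pos:]


x = [float(n % 4) for n in range(24)]   # toy signal with a 4-sample period
y = compress_1_5(x, period=4)
print(len(x), len(y), len(x) / len(y))  # 24 16 1.5
```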

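[0038] The time axis compression/expansion in the time axis compression/expansion unit 14 can also be performed as shown, for example, in FIG. 4. A speech waveform three pitch periods long is cut out. A waveform A two pitch periods long is multiplied by a weighting coefficient that rises linearly from 0 to 1 to produce waveform A′. A waveform B two pitch periods long is multiplied by a weighting coefficient that falls linearly from, for example, 1 to 0 to produce waveform B′. Adding A′ and B′ gives waveforms D and E, each one pitch period long.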
A waveform A 'is generated by multiplying a waveform A for two pitch periods by a weight coefficient that linearly changes from 0 to 1 and a waveform B for two pitch periods linearly changes from 1 to 0, for example. This is performed by generating a waveform B 'by multiplying by a weight coefficient and adding the waveforms B' to each other to obtain a waveform D and a waveform E for one pitch period.

[0039] In this pitch-period-based time axis compression/expansion, the left-channel and right-channel audio signals use the same pitch period. Synchronization between the left and right channels is therefore maintained in the audio signal after processing.
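A length count alone shows why a shared pitch period keeps the channels aligned. The Python sketch assumes the 1.5x scheme of FIG. 3 is applied in complete blocks of three pitch periods, with any leftover tail copied unchanged. The sample count and pitch values are hypothetical.

```python
def compressed_length(n_samples, period):
    """Output length of a 1.5x pitch-synchronous compression done in whole blocks.

    Assumed scheme: each complete block of three pitch periods becomes two periods,
    and any incomplete tail is copied unchanged.
    """
    blocks = n_samples // (3 * period)
    tail = n_samples - blocks * 3 * period
    return blocks * 2 * period + tail


N = 1000  # samples per channel (hypothetical)

# Pitch estimated separately for each channel: the output lengths drift apart.
print(compressed_length(N, 40), compressed_length(N, 45))  # 680 685

# One pitch period shared by both channels: the output lengths stay identical.
print(compressed_length(N, 42), compressed_length(N, 42))  # 706 706
```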

[0040] Consider first the case where the left-channel and right-channel audio signals differ in power. The higher-power channel has the greater influence on the quality of the reproduced sound. The pitch period obtained from the summed audio signal lies close to that channel's pitch period, so that channel is reproduced with high quality. The lower-power channel loses some reproduced quality, but the effect is small, and overall the reproduced sound quality is high.

Now consider the case where the two channels do not differ in power. The pitch period obtained from the summed signal then lies between the pitch periods of the two channels. Both channels are reproduced with comparable quality, and neither suffers a marked loss of reproduced sound quality.

[0041] Furthermore, the pitch period calculation unit 124 processes the two-channel (left and right) audio signal with only the computation needed for a single channel. This lightens the computational load.

[0042] After processing by the time axis compression/expansion unit 14, the audio signal is temporarily stored in an audio memory 15. It is read out from there at the standard playback speed, regardless of the compression ratio used.

[0043] The compression ratio used in this time axis compression depends on the playback mode specified by the user and on the accumulation rate of the audio memory 15. For this purpose, the speech speed conversion device of this embodiment includes an accumulation rate calculation unit 16, which calculates the accumulation rate of the audio memory 15. The calculated accumulation rate is sent to a speech speed control unit 17.
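One way to model the audio memory 15, and the accumulation rate reported by unit 16, is a ring buffer with separate write and read counters. This Python sketch is a hypothetical model, not the patent's design. The class and method names are invented for illustration.

```python
class AudioMemory:
    """Hypothetical ring-buffer model of the audio memory 15."""

    def __init__(self, capacity):
        self.buf = [0.0] * capacity
        self.capacity = capacity
        self.write_pos = 0   # total samples ever written
        self.read_pos = 0    # total samples ever read

    def unread(self):
        return self.write_pos - self.read_pos

    def accumulation_rate(self):
        """Unread data as a percentage of total capacity."""
        return 100.0 * self.unread() / self.capacity

    def write(self, samples):
        for s in samples:
            if self.unread() == self.capacity:
                raise OverflowError("audio memory full")
            self.buf[self.write_pos % self.capacity] = s
            self.write_pos += 1

    def read(self, n):
        """Read up to n unread samples (fewer if the memory holds less)."""
        n = min(n, self.unread())
        out = [self.buf[(self.read_pos + i) % self.capacity] for i in range(n)]
        self.read_pos += n
        return out


mem = AudioMemory(capacity=100)
mem.write([0.1] * 45)
mem.read(15)
print(mem.accumulation_rate())  # 30.0
```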

[0044] The speech speed control unit 17 controls the compression ratio used in the time axis compression/expansion unit 14. It does so based on the playback mode set by the user and on the accumulation rate.

[0045] Suppose, for example, that the user has set the high-speed playback mode. The video read from the hard disk drive 10 is then output to the monitor at 1.5 times normal speed, hereinafter called the set playback speed. The audio is output at a speed equal to or slower than the set playback speed.

[0046] Table 1 shows the relationship between the accumulation rate and the compression ratio in the high-speed playback mode.

【００４７】[0047]

[Table 1]

[0048] The speech speed control unit 17 holds an accumulation rate/compression ratio table storing the relationship shown in Table 1. On receiving an accumulation rate from the accumulation rate calculation unit 16, it looks up the corresponding compression ratio in this table and sets it in the time axis compression/expansion unit 14.

(1) When the accumulation rate is 0 to 30% (0% or more and less than 30%)

In this range, the compression ratio is set to 1. The audio signal read from the hard disk drive 10 is temporarily stored in the frame memory 11. It is read out from there at the playback rate corresponding to the set playback speed factor of 1.5 (48 kHz).

[0049] The silent section deletion unit 13 removes the silent sections from the audio signal read from the frame memory 11; the time-axis compression/expansion unit 14 then passes the signal through without compression or expansion, and the signal is stored in the audio memory 15. The audio signal stored in the audio memory 15 is separated into left- and right-channel audio signals by the L/R separation unit 18, converted by the D/A conversion units 191 and 192 at the standard sampling frequency (32 kHz), and output. Consequently, the speech speed of the output audio equals that obtained at the standard playback speed (1x playback).

[0050] When the audio signal contains few silent sections, data is written to the audio memory 15 faster than it is read out, so the amount of unread data accumulated in the audio memory 15 increases. The fewer silent sections the audio signal contains, the faster this accumulated amount grows. Conversely, when the audio signal contains many silent sections, the amount of unread data in the audio memory 15 may instead decrease, depending on the amount of silent-section data removed.
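The balance described in [0050] can be written as a simple rate equation. This is a sketch under assumptions not stated in the source: B is the amount of unread data in the audio memory 15 (in samples per channel), v (0 < v <= 1) is the fraction of the samples read from the frame memory 11 that survive silent-section deletion, and c is the compression rate P:Q set by the speech speed control unit 17.

```latex
\frac{dB}{dt} = \frac{v \, f_{\mathrm{read}}}{c} - f_{\mathrm{out}},
\qquad f_{\mathrm{read}} = 1.5 \times 32\ \mathrm{kHz} = 48\ \mathrm{kHz},
\qquad f_{\mathrm{out}} = 32\ \mathrm{kHz}.
```

In case (1), c = 1, so speech without silence (v = 1) fills the memory at 48 - 32 = 16 thousand samples per second per channel, and the memory drains only when v < 2/3. With c = 1.5, 48/1.5 - 32 = 0 for v = 1, so the write rate equals the read rate and the memory can only drain, by the amount of removed silence.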

[0051] Although not shown in Table 1, when the accumulation rate of the audio memory 15 falls below 10%, the speech speed control unit 17 outputs to the silent section deletion unit 13 a control signal that prohibits the deletion of silent sections; when the accumulation rate subsequently exceeds 20%, the speech speed control unit 17 outputs a control signal that resumes the deletion of silent sections.

(2) When the accumulation rate is 30% to 60% (30% or more and less than 60%)
When the accumulation rate is 30% to 60%, the compression rate is set to 1.2. In this case, the time-axis compression/expansion unit 14 processes the audio signal so that the ratio of the input-signal duration P to the output-signal duration Q is 1.2:1. As a result, the output speech is slightly faster than speech reproduced at the standard playback speed (1x playback). At the same time, because less voiced-section data is written to the audio memory 15, the amount of unread data in the audio memory 15 grows more slowly than in case (1) and, depending on the amount of silent-section data, may even decrease.

(3) When the accumulation rate is 60% to 90% (60% or more and less than 90%)
When the accumulation rate is 60% to 90%, the compression rate is set to 1.4. In this case, the time-axis compression/expansion unit 14 processes the input signal so that the ratio of the input-signal duration P to the output-signal duration Q is 1.4:1. As a result, the output speech is faster still than in case (2).

[0052] At the same time, because the amount of voiced-section data written to the audio memory 15 is reduced further than in case (2), the amount of unread data in the audio memory 15 grows more slowly and, depending on the amount of silent-section data, may even decrease.

(4) When the accumulation rate is 90% to 100% (90% or more and less than 100%)
When the accumulation rate is 90% to 100%, the compression rate is set to 1.5. In this case, the time-axis compression/expansion unit 14 processes the input signal so that the ratio of the input-signal duration P to the output-signal duration Q is 1.5:1. As a result, the output speech is faster still than in case (3).

[0053] Here, the rate at which data is written to the audio memory 15 equals the rate at which it is read out, so the amount of unread data in the audio memory 15 decreases by exactly the amount of silent-section data removed by the silent section deletion unit 13. The more silent-section data there is, the faster the amount of unread audio in the audio memory 15 decreases.
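The table lookup of [0048] and the on/off rule for silent-section deletion in [0051] can be sketched as follows. The band limits and rates are taken from cases (1) to (4); the names, the percent units, and the handling of exactly 100% are illustrative assumptions, not the patent's implementation.

```python
from dataclasses import dataclass

# (upper bound of the accumulation-rate band in percent, compression rate P:Q)
# for the high-speed playback mode, as listed in cases (1) to (4).
HIGH_SPEED_TABLE = [(30.0, 1.0), (60.0, 1.2), (90.0, 1.4), (100.0, 1.5)]


def compression_rate(accumulation_pct: float) -> float:
    """Look up the compression rate for an accumulation rate given in percent."""
    for upper, rate in HIGH_SPEED_TABLE:
        if accumulation_pct < upper:
            return rate
    # Assumption: a completely full memory (100%) keeps the highest rate.
    return HIGH_SPEED_TABLE[-1][1]


@dataclass
class SilenceDeletionGate:
    """Hysteresis from [0051]: stop deleting silence below 10%, resume above 20%."""

    stop_below: float = 10.0
    resume_above: float = 20.0
    enabled: bool = True

    def update(self, accumulation_pct: float) -> bool:
        """Feed a new accumulation rate; return True while deletion is allowed."""
        if self.enabled and accumulation_pct < self.stop_below:
            self.enabled = False
        elif not self.enabled and accumulation_pct > self.resume_above:
            self.enabled = True
        return self.enabled


if __name__ == "__main__":
    for pct in (10, 45, 75, 95):
        print(pct, compression_rate(pct))
    gate = SilenceDeletionGate()
    print([gate.update(p) for p in (25, 8, 15, 21)])  # [True, False, False, True]
```

The two thresholds (10% and 20%) keep the gate from toggling on every frame when the accumulation rate hovers near a single limit.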

[0054] Next, the operation in the slow listening mode is described, in which playback is performed at the standard playback speed and the audio is reproduced at a speed equal to or slower than that playback speed.

[0055] Table 2 shows the relationship between the accumulation rate and the compression rate in the slow listening playback mode.

[0056]

[Table 2]

[0057] The speech speed control unit 17 holds an accumulation rate/compression rate table that stores the relationship shown in Table 2. When the accumulation rate calculation unit 16 sends an accumulation rate, the speech speed control unit 17 reads the corresponding compression rate from this table and sets it in the time-axis compression/expansion unit 14. Then, as in the high-speed playback mode described above, the time-axis compression/expansion unit 14 performs time-axis compression/expansion at the compression rate corresponding to the accumulation rate.

[2] Second Embodiment
The first embodiment described a case in which section determination and pitch period calculation are performed on a summed audio signal obtained by adding the left- and right-channel audio signals. This embodiment describes a case in which section determination and pitch period calculation are performed on whichever of the left- and right-channel audio signals is selected on the basis of their respective signal strengths.

[0058] FIG. 5 is a schematic block diagram showing an alternative configuration (signal analysis unit 22) of the signal analysis unit 12 of the first embodiment. Components identical to those of the first embodiment are given the same reference numerals.

[0059] In the figure, reference numeral 221 denotes a signal strength calculation unit, which calculates the signal strength over one frame for each of the left- and right-channel audio signals input from the frame memory 11 via the signal separation unit 121. As described in the first embodiment, the signal power, the mean amplitude, the cumulative amplitude, or the like can be used as the signal strength.

[0060] The signal strength determination unit 222 compares the signal strengths, supplied by the signal strength calculation unit 221, of temporally corresponding left- and right-channel audio signals, and outputs a signal indicating the channel with the higher signal strength.

[0061] Based on the signal from the signal strength determination unit 222, the audio signal selection unit 223 discards the audio signal of the channel judged to have the lower signal strength and outputs the audio signal of the channel judged to have the higher signal strength. The output audio signal is therefore a frame-by-frame concatenation of whichever of the temporally corresponding left- and right-channel frames has the higher signal strength.
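The frame-by-frame selection in [0059] to [0061] can be sketched as below. It assumes each channel is a plain list of samples split into fixed-length frames; the function names are hypothetical, ties go to the lower channel index (left before right) as an arbitrary choice, and the code accepts any number of channels, in line with [0089].

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]


def frame_power(frame: Frame) -> float:
    """Mean power of one frame (one of the strength measures named in [0059])."""
    return sum(s * s for s in frame) / len(frame)


def mean_abs_amplitude(frame: Frame) -> float:
    """Mean absolute amplitude of one frame (another strength measure)."""
    return sum(abs(s) for s in frame) / len(frame)


def select_strongest(
    channels: Sequence[Sequence[float]],
    frame_len: int,
    strength: Callable[[Frame], float] = frame_power,
) -> Tuple[List[float], List[int]]:
    """Concatenate, frame by frame, the frame of whichever channel is strongest.

    Returns the concatenated signal and the chosen channel index per frame.
    """
    n_frames = min(len(ch) for ch in channels) // frame_len
    selected: List[float] = []
    chosen: List[int] = []
    for i in range(n_frames):
        lo, hi = i * frame_len, (i + 1) * frame_len
        frames = [list(ch[lo:hi]) for ch in channels]
        # max() returns the first maximal index, so ties favour channel 0.
        best = max(range(len(channels)), key=lambda c: strength(frames[c]))
        selected.extend(frames[best])
        chosen.append(best)
    return selected, chosen


if __name__ == "__main__":
    left = [0.5] * 4 + [0.0] * 4
    right = [0.1] * 4 + [0.9] * 4
    signal, chosen = select_strongest([left, right], frame_len=4)
    print(chosen)  # [0, 1]
```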

[0062] For the audio signal input from the audio signal selection unit 223, the section determination unit 123 determines whether each frame is a voiced section or a silent section by referring to the signal strength already calculated by the signal strength calculation unit 221.

[0063] The signal strength calculated by the signal strength calculation unit 221 is compared with a preset threshold. If the signal strength is equal to or greater than the threshold, a signal indicating that the current frame is a voiced section is output to the silent section deletion unit 13; if it is smaller than the threshold, a signal indicating that the current frame is a silent section is output instead.

[0064] As in the first embodiment, based on the signal from the section determination unit 123, the silent section deletion unit 13 deletes both the left-channel audio signal and the following right-channel audio signal of each section judged to be silent, and outputs the left-channel audio signal and the following right-channel audio signal of each section judged to be voiced.

[0065] Because each frame deleted as a silent section consists of temporally corresponding left- and right-channel audio signals, frame-level synchronization between the left- and right-channel audio signals is preserved in the voiced sections.
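A minimal sketch of the threshold test in [0063] and the paired deletion in [0064] and [0065], assuming the per-frame strengths come from the selected channel and that the two channels are held as separate lists rather than interleaved as in the audio memory. The function names and the threshold value are hypothetical.

```python
from typing import List, Sequence, Tuple


def voiced_flags(strengths: Sequence[float], threshold: float) -> List[bool]:
    """[0063]: a frame is voiced if its strength >= threshold, else silent."""
    return [s >= threshold for s in strengths]


def delete_silent_frames(
    left: Sequence[float],
    right: Sequence[float],
    flags: Sequence[bool],
    frame_len: int,
) -> Tuple[List[float], List[float]]:
    """[0064]-[0065]: drop a silent frame from BOTH channels so they stay aligned."""
    out_left: List[float] = []
    out_right: List[float] = []
    for i, voiced in enumerate(flags):
        if voiced:
            lo, hi = i * frame_len, (i + 1) * frame_len
            out_left.extend(left[lo:hi])
            out_right.extend(right[lo:hi])
    return out_left, out_right


if __name__ == "__main__":
    flags = voiced_flags([1.0, 0.0, 4.0], threshold=0.5)
    print(delete_silent_frames([1, 1, 0, 0, 2, 2], [3, 3, 0, 0, 4, 4], flags, 2))
    # ([1, 1, 2, 2], [3, 3, 4, 4])
```

Because one decision drives both channels, the two outputs always have the same length, which is the synchronization property stated in [0065].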

[0066] Moreover, in the audio signal obtained by selecting, frame by frame, whichever of the left- and right-channel signals is stronger, a frame is silent only if the weaker channel's frame is also silent. Silent sections are therefore determined appropriately for the audio signals of both channels.

[0067] Meanwhile, as in the first embodiment, the pitch period calculation unit 124 calculates the pitch period of the audio signal input from the audio signal selection unit 223 using autocorrelation, and outputs it to the time-axis compression/expansion unit 14.
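Autocorrelation-based pitch estimation, as referred to in [0067], can be sketched as follows: compute R(k) over one analysis frame and take the lag with the largest value inside a search range. The lag limits are hypothetical parameters (for example, at 32 kHz a 50 Hz to 500 Hz pitch range corresponds to lags of 64 to 640 samples); the first embodiment's exact procedure is not reproduced here.

```python
from typing import Sequence


def autocorrelation(x: Sequence[float], k: int) -> float:
    """R(k) = sum over n of x[n] * x[n + k] within one analysis frame."""
    return sum(x[n] * x[n + k] for n in range(len(x) - k))


def pitch_period(x: Sequence[float], min_lag: int, max_lag: int) -> int:
    """Return the lag (in samples) in [min_lag, max_lag] that maximizes R(k)."""
    if not 0 < min_lag <= max_lag < len(x):
        raise ValueError("lag range must lie inside the frame")
    return max(range(min_lag, max_lag + 1), key=lambda k: autocorrelation(x, k))


if __name__ == "__main__":
    # Synthetic test frame: a sawtooth that repeats every 40 samples.
    frame = [(n % 40) / 40 - 0.5 for n in range(400)]
    print(pitch_period(frame, 20, 100))  # 40
```

The unnormalized R(k) sums fewer terms at longer lags, which biases the search toward the shortest true period (40 rather than 80 here).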

[0068] Based on the single pitch period supplied by the pitch period calculation unit 124, the time-axis compression/expansion unit 14 applies time-axis compression/expansion to the left-channel audio signal and the following right-channel audio signal of each section judged to be voiced, and outputs the result.

[0069] Because the left- and right-channel audio signals are both time-axis compressed/expanded using the common pitch period calculated from whichever channel the audio signal selection unit 223 selected, synchronization between the two channels is maintained.
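Why a common pitch period keeps the channels in step can be illustrated with a simplified pitch-synchronous cut: two adjacent periods are merged into one with a linear crossfade, and the same cut (same start, same period) is applied to every channel. This crossfade is an illustrative stand-in chosen here, not the procedure of FIG. 2; all names are hypothetical.

```python
from typing import List, Sequence


def remove_one_period(x: Sequence[float], start: int, period: int) -> List[float]:
    """Shorten x by `period` samples, starting at `start`.

    x[start:start+P] and x[start+P:start+2P] are merged into one period with a
    linear crossfade, so the waveform stays continuous across the cut.
    """
    if start < 0 or start + 2 * period > len(x):
        raise ValueError("need two full pitch periods after start")
    out = list(x[:start])
    for i in range(period):
        w = i / period  # weight rises from 0 toward 1 across the merged period
        out.append((1.0 - w) * x[start + i] + w * x[start + period + i])
    out.extend(x[start + 2 * period:])
    return out


def compress_all_channels(
    channels: Sequence[Sequence[float]], start: int, period: int
) -> List[List[float]]:
    """Apply the identical cut to every channel, so all outputs stay aligned."""
    return [remove_one_period(ch, start, period) for ch in channels]


def segment_length_for_rate(rate: float, period: int) -> int:
    """Input length L from which one period is removed so that L / (L - P) = rate."""
    return round(rate * period / (rate - 1.0))


if __name__ == "__main__":
    saw = [(n % 40) / 40 - 0.5 for n in range(200)]
    quiet = [0.3 * v for v in saw]
    left, right = compress_all_channels([saw, quiet], start=10, period=40)
    print(len(left), len(right))  # 160 160
    print(segment_length_for_rate(1.5, 40))  # 120
```

For example, a compression rate of 1.5:1 corresponds to removing one pitch period from every three periods of input (120 samples become 80 when P = 40).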

[0070] In addition, because the pitch period is calculated from whichever of the left- and right-channel audio signals has the higher signal strength, the effect on reproduction is small even if that pitch period differs from the pitch period of the weaker channel, and overall the quality of the reproduced audio is high.

[0071] Furthermore, because the computation in the pitch period calculation unit 124 is reduced from the two left and right channels to a single channel's worth, the computational load is reduced.

[0072] The audio signal time-axis compressed/expanded by the time-axis compression/expansion unit 14 is temporarily stored in the audio memory 15 and then read out.

[3] Third Embodiment
The second embodiment described a case in which section determination and pitch period calculation are performed on whichever of the left- and right-channel audio signals is selected on the basis of their respective signal strengths. This embodiment describes a case in which section determination and pitch period calculation are performed on whichever audio signal is selected on the basis of the autocorrelation coefficients of the left and right channels.

[0073] FIG. 6 is a schematic block diagram showing another alternative configuration (signal analysis unit 23) of the signal analysis unit 12 of the first embodiment. Components identical to those of the first embodiment are given the same reference numerals.

[0074] In the figure, reference numeral 231 denotes an autocorrelation coefficient calculation unit. For each of the left- and right-channel audio signals input from the frame memory 11 via the signal separation unit 121, it calculates the maximum of the autocorrelation coefficient, which is obtained by dividing the autocorrelation value Rn(k), computed in the same manner as the pitch period calculation of the first embodiment, by the autocorrelation value R(0).

[0075] The autocorrelation determination unit 232 compares the maximum autocorrelation coefficients, supplied by the autocorrelation coefficient calculation unit 231, of temporally corresponding left- and right-channel audio signals, and outputs a signal indicating the channel with the larger maximum autocorrelation coefficient.

[0076] Based on the signal from the autocorrelation determination unit 232, the audio signal selection unit 233 discards the audio signal of the channel judged to have the lower correlation (the smaller maximum autocorrelation coefficient) and outputs the audio signal of the channel judged to have the higher correlation (the larger maximum autocorrelation coefficient). The output audio signal is therefore a frame-by-frame concatenation of whichever of the temporally corresponding left- and right-channel frames has the higher correlation.
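The selection in [0074] to [0076] can be sketched per frame: normalize R(k) by R(0), take its maximum over a lag range, and keep the channel with the larger value. The function also returns the lag at which that maximum occurs, which is the pitch period the third embodiment reuses in [0082]. Lag limits, names, and the treatment of an all-zero frame are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], k: int) -> float:
    """R(k) = sum over n of x[n] * x[n + k] within one frame."""
    return sum(x[n] * x[n + k] for n in range(len(x) - k))


def max_autocorr_coefficient(
    x: Sequence[float], min_lag: int, max_lag: int
) -> Tuple[float, int]:
    """Return (max over k of R(k) / R(0), lag at which the maximum occurs)."""
    r0 = autocorrelation(x, 0)
    if r0 == 0.0:
        return 0.0, min_lag  # assumption: an all-zero frame has no correlation
    lag = max(range(min_lag, max_lag + 1), key=lambda k: autocorrelation(x, k))
    return autocorrelation(x, lag) / r0, lag


def select_most_periodic(
    frames: Sequence[Sequence[float]], min_lag: int, max_lag: int
) -> Tuple[int, int]:
    """Return (index of the channel with the largest coefficient, its pitch lag)."""
    results: List[Tuple[float, int]] = [
        max_autocorr_coefficient(f, min_lag, max_lag) for f in frames
    ]
    best = max(range(len(frames)), key=lambda c: results[c][0])
    return best, results[best][1]


if __name__ == "__main__":
    periodic = [(n % 40) / 40 - 0.5 for n in range(400)]
    rng = random.Random(1)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(400)]
    print(select_most_periodic([noise, periodic], 20, 100))  # (1, 40)
```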

[0077] Meanwhile, the signal addition unit 234 successively adds the left- and right-channel audio signals input from the signal separation unit 121 and, once one frame's worth of data from both channels has been added, outputs the summed audio signal.

[0078] The section determination unit 123 obtains the signal strength of the audio signal input from the signal addition unit 234, determines from that value whether the input audio signal is a voiced section or a silent section, and outputs the result to the silent section deletion unit 13.
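The path of [0077] and [0078] (sum the channels, measure each summed frame, compare with a threshold) can be sketched as below. Mean absolute amplitude is used as the strength measure and the threshold value is arbitrary; both are assumptions, as are the function names.

```python
from typing import List, Sequence


def summed_frame_strengths(
    left: Sequence[float], right: Sequence[float], frame_len: int
) -> List[float]:
    """Add the channels sample by sample, then return each frame's mean |amplitude|."""
    summed = [a + b for a, b in zip(left, right)]
    n_frames = len(summed) // frame_len
    return [
        sum(abs(s) for s in summed[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def classify_frames(strengths: Sequence[float], threshold: float) -> List[str]:
    """Label each frame 'voiced' (strength >= threshold) or 'silent'."""
    return ["voiced" if s >= threshold else "silent" for s in strengths]


if __name__ == "__main__":
    s = summed_frame_strengths([0.2, 0.2, 0.0, 0.0], [0.1, 0.1, 0.0, 0.01], 2)
    print(classify_frames(s, threshold=0.05))  # ['voiced', 'silent']
```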

[0079] As in the first embodiment, based on the signal from the section determination unit 123, the silent section deletion unit 13 deletes both the left-channel audio signal and the following right-channel audio signal of each section judged to be silent, and outputs the left-channel audio signal and the following right-channel audio signal of each section judged to be voiced.

[0080] Because each frame deleted as a silent section consists of temporally corresponding left- and right-channel audio signals, frame-level synchronization between the left- and right-channel audio signals is preserved in the voiced sections.

[0081] A silent section in the summed audio signal is a frame in which the left- and right-channel audio signals both have low signal strength, so silent sections are determined appropriately for the audio signals of both channels.

[0082] Meanwhile, the pitch period calculation unit 124 obtains the pitch period corresponding to the maximum autocorrelation coefficient already calculated by the autocorrelation coefficient calculation unit 231, and outputs it to the time-axis compression/expansion unit 14.

[0083] Based on the single pitch period supplied by the pitch period calculation unit 124, the time-axis compression/expansion unit 14 applies time-axis compression/expansion to the left-channel audio signal and the following right-channel audio data of each section judged to be voiced, and outputs the result.

[0084] Because the left- and right-channel audio signals are both time-axis compressed/expanded using the common pitch period calculated from whichever channel the audio signal selection unit 233 selected, synchronization between the two channels is maintained.

[0085] In addition, because the pitch period is calculated from whichever of the left- and right-channel audio signals has the higher correlation, the effect on reproduction is small even if that pitch period differs from the pitch period of the less correlated channel, and overall the quality of the reproduced audio is high.

[0086] The audio signal time-axis compressed/expanded by the time-axis compression/expansion unit 14 is temporarily stored in the audio memory 15 and then read out.

[0087] In each of the embodiments described above, silent-section deletion and time-axis compression/expansion are applied to all of the input left- and right-channel audio signals. Alternatively, a configuration may be provided that subtracts temporally corresponding left- and right-channel audio signals frame by frame and decides, according to the resulting difference value, whether to perform silent-section deletion and time-axis compression/expansion. That is, if the difference value is equal to or greater than a predetermined threshold, the two audio signals differ, so neither silent-section deletion nor time-axis compression/expansion is performed; if the difference value is smaller than the threshold, the two audio signals are similar, so silent-section deletion and time-axis compression/expansion are performed.

[0088] In this case, silent-section deletion and time-axis compression/expansion are applied only to frames whose temporally corresponding left- and right-channel audio signals are similar, so using a pitch period common to the left and right channels causes very little degradation in sound quality.
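The frame-by-frame similarity test of [0087] might look like the sketch below. The text does not say how the difference is reduced to a single value, so summing absolute sample differences is an assumption, as are the threshold value and the function name.

```python
from typing import List, Sequence


def similar_frames(
    left: Sequence[float], right: Sequence[float], frame_len: int, threshold: float
) -> List[bool]:
    """True for frames where L and R are similar enough to be processed.

    Assumed difference measure: sum of |L[n] - R[n]| over the frame.
    """
    n_frames = min(len(left), len(right)) // frame_len
    flags: List[bool] = []
    for i in range(n_frames):
        lo, hi = i * frame_len, (i + 1) * frame_len
        diff = sum(abs(a - b) for a, b in zip(left[lo:hi], right[lo:hi]))
        flags.append(diff < threshold)
    return flags


if __name__ == "__main__":
    # Frame 0 identical in both channels, frame 1 in opposite polarity.
    print(similar_frames([0.5] * 4, [0.5, 0.5, -0.5, -0.5], 2, 0.1))  # [True, False]
```

Frames flagged False would bypass silent-section deletion and time-axis compression/expansion, so the shared pitch period is only ever applied where the channels nearly coincide.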

[0089] In each of the embodiments described above, the audio data read from the hard disk drive 10 has two channels, but the same effect can be obtained with three or more channels.

[0090]

[Effect of the Invention] According to the present invention, because time-axis compression/expansion is applied to the input audio signals of multiple channels using a single pitch period, speech speed conversion can be performed while keeping the reproduced audio signals synchronized.

[Brief description of the drawings]

FIG. 1 is a schematic block diagram showing the configuration of a speech speed conversion device according to a first embodiment of the present invention.

FIG. 2 is an explanatory diagram illustrating the principle of time-axis compression in the speech speed conversion device of FIG. 1.

FIG. 3 is an explanatory diagram illustrating an example of time-axis compression based on the principle of FIG. 2.

FIG. 4 is an explanatory diagram illustrating the principle of time-axis compression/expansion in the speech speed conversion device of FIG. 1.

FIG. 5 is a schematic block diagram showing the configuration of the signal analysis unit of a speech speed conversion device according to a second embodiment of the present invention.

FIG. 6 is a schematic block diagram showing the configuration of the signal analysis unit of a speech speed conversion device according to a third embodiment of the present invention.

[Explanation of symbols]

10: Hard disk drive
11: Frame memory
12: Signal analysis unit
121: Signal separation unit
122: Signal addition unit
123: Section determination unit
124: Pitch period calculation unit
13: Silent section deletion unit
14: Time-axis compression/expansion unit
15: Audio memory
16: Accumulation rate calculation unit
17: Speech speed control unit
18: L/R separation unit
191: D/A converter
192: D/A converter

Continued from the front page: (51) Int.Cl.7 / FI (theme code, reference): G09B 19/06; G10L 9/00 D; 21/04; 9/08 B

[Claims]

1. A speech speed conversion device that performs speech speed conversion on input audio signals of a plurality of channels, input from an audio reproduction device, on the basis of a pitch period obtained from the input audio signals, the device comprising: pitch period calculating means for calculating, for each processing section, a pitch period common to all channels from the input audio signals of the plurality of channels; and time axis compression/expansion means for time-axis compressing the input audio signal of each channel in the processing section on the basis of the obtained pitch period.

2. The speech speed conversion device according to claim 1, wherein the processing section changes according to the pitch period calculated by the pitch period calculating means.

3. The speech speed conversion device according to claim 1 or 2, wherein the pitch period calculating means includes adding means for adding the input audio signals of the plurality of channels for each processing section, and calculates the pitch period from the summed input audio signal obtained by the adding means.

4. The speech speed conversion device according to claim 1 or 2, further comprising maximum signal strength detecting means for detecting, for each processing section, the channel having the maximum signal strength among the input audio signals of the plurality of channels, wherein the pitch period calculating means calculates the pitch period from the input audio signal of the channel detected by the maximum signal strength detecting means.

5. The speech speed conversion device according to claim 1 or 2, further comprising maximum autocorrelation value detecting means for detecting, for each processing section, the channel having the maximum autocorrelation value among the input audio signals of the plurality of channels, wherein the pitch period calculating means calculates the pitch period from the input audio signal of the channel detected by the maximum autocorrelation value detecting means.

6. A speech speed conversion device that performs speech speed conversion on input audio signals of a plurality of channels, input from an audio reproduction device, on the basis of a pitch period obtained from the input audio signals, the device comprising: adding means for adding the input audio signals of the plurality of channels for each first processing section; section determining means for determining, on the basis of the summed input audio signal obtained by the adding means, whether the processing section is a voiced section or a silent section; silent section deleting means for deleting the input audio signal of each channel in a processing section determined to be a silent section by the section determining means; pitch period calculating means for calculating a single pitch period for each second processing section from the summed input audio signal of a section determined to be a voiced section by the section determining means; and time axis compression/expansion means for time-axis compressing the input audio signal of each channel in the processing section on the basis of the pitch period obtained by the pitch period calculating means.

7. A speech speed conversion device comprising: speech speed conversion processing means for performing speech speed conversion on input audio signals of a plurality of channels input from an audio reproduction device; an audio memory into which the audio signal processed by the speech speed conversion processing means is written; and reading means for reading the audio signal from the audio memory, wherein the speech speed conversion processing means includes: maximum signal strength detecting means for detecting, for each first processing section, the channel having the maximum signal strength among the input audio signals of the plurality of channels; section determining means for determining, on the basis of the input audio signal of the channel detected by the maximum signal strength detecting means, whether the processing section is a voiced section or a silent section; silent section deleting means for deleting the input audio signal of each channel in a processing section determined to be a silent section by the section determining means; pitch period calculating means for calculating a single pitch period for each second processing section from the input audio signal of the channel in a section determined to be a voiced section by the section determining means; and time axis compression/expansion means for time-axis compressing the input audio signal of each channel in the processing section on the basis of the pitch period obtained by the pitch period calculating means.

8. A speech speed conversion device comprising: speech speed conversion processing means for performing speech speed conversion on input audio signals of a plurality of channels input from an audio reproduction device; an audio memory into which the audio signal processed by the speech speed conversion processing means is written; and reading means for reading the audio signal from the audio memory, wherein the speech speed conversion processing means includes: maximum autocorrelation value detecting means for detecting, for each first processing section, the channel having the maximum autocorrelation value among the input audio signals of the plurality of channels; section determining means for determining, on the basis of the input audio signal of the channel detected by the maximum autocorrelation value detecting means, whether the processing section is a voiced section or a silent section; silent section deleting means for deleting the input audio signal of each channel in a processing section determined to be a silent section by the section determining means; pitch period calculating means for calculating a single pitch period for each second processing section from the input audio signal of the channel in a section determined to be a voiced section by the section determining means; and time axis compression/expansion means for time-axis compressing the input audio signal of each channel in the processing section on the basis of the pitch period obtained by the pitch period calculating means.

9. The speech speed conversion device according to claim 6, wherein the second processing section changes according to the pitch period calculated by the pitch period calculating means.