JP2005092021A

JP2005092021A - Method and device for speaking speed conversion

Info

Publication number: JP2005092021A
Application number: JP2003327784A
Authority: JP
Inventors: Mikio Oda; 幹夫小田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2003-09-19
Filing date: 2003-09-19
Publication date: 2005-04-07

Abstract

PROBLEM TO BE SOLVED: To perform speaking speed conversion processing that is free of time deviation when a stereo speech signal is input.

SOLUTION: Provided are an adder which adds the input stereo speech signals of channels L and R together, a speaking speed conversion processing part which receives the speech signal made monaural by the adder and performs a specified speaking speed conversion, and a distributor which distributes the signal output by the speaking speed conversion processing part to the speech channels L and R respectively, so that the distributed L and R speech signals are reproduced by a speaker. Speed conversion free of time deviation for stereo speech input is realized without adding an algorithm for mutual time management of the input stereo speech signals and without extra memory.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to speech speed conversion, which controls the speed of speech.

With the recent increase in the number of broadcasting stations and the launch of digital satellite broadcasting, television can now carry programs of many genres, to the enjoyment of viewers. However, some of these programs feature fast-talking announcers and performers whom elderly viewers may be unable to follow. As a technique for solving this problem, advances in digital technology have led to proposals for digital speech speed conversion devices that slow down only the speed of speech without changing its pitch (see, for example, Patent Document 1).

An example of the first prior art in a digital speech speed conversion device will be described below with reference to FIG. 4. In FIG. 4, reference numeral 10 denotes an A/D converter that converts the input audio signal from analog to digital, 11 denotes a speech speed conversion unit that performs speech speed conversion on the digital audio signal, 12 denotes a D/A converter that converts the speed-converted digital audio signal back to an analog audio signal, and 13 denotes a memory, configured as a ring memory, that temporarily stores the input digitized audio signal.

The operation of the speech speed conversion device configured as described above will now be described. The input audio signal is converted from analog to digital by the A/D converter 10, input to the speech speed conversion unit 11, and at the same time stored in the memory 13 configured as a ring memory. The speech speed conversion unit 11 performs processing such as extracting the silent sections of the input digital audio signal and extracting the vowels of the utterances to be converted. If the example utterance "Sore wa watashi no desu" ("It is mine") shown in FIG. 5(a) is simply stretched to twice its duration, as shown in FIG. 5(b), vowels are added to lengthen it to something like "So-o re-e wa-a wa-a ta-a shi-i no-o de-e su-u". The lengthened digital audio signal is converted back to an analog audio signal by the D/A converter 12, which completes the series of speech speed conversion steps. As speech is converted and lengthened, however, the memory 13, whose capacity is finite, may overflow, that is, the read address may overtake the write address. The conventional method determines the conversion ratio of the speech speed conversion while checking the remaining capacity of the memory 13. Conversely, silent sections need not be stored, so the conversion is realigned at the onset of the next audio signal in order to reduce overflow.
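The overflow risk in the ring memory can be illustrated with a rough calculation. The sketch below rests on a simplified model that the source does not spell out: input is written in real time, speech is continuous with no silent sections, and the converter reads stored input back at 1/ratio of real time. The function name and the figures are hypothetical.

```python
def seconds_until_full(capacity_s: float, ratio: float) -> float:
    """Seconds of continuous speech before a buffer of `capacity_s` seconds fills.

    Assumed model (not from the source): input is written in real time and
    read back at 1/ratio of real time, so the backlog grows by
    (1 - 1/ratio) seconds for every second of speech.
    """
    if ratio <= 1.0:
        return float("inf")  # no slow-down, so the backlog never grows
    return capacity_s / (1.0 - 1.0 / ratio)


# Hypothetical figures: a 10-second ring memory and a 2x stretch.
print(seconds_until_full(10.0, 2.0))  # 20.0
```

Under this model a longer memory only postpones the overflow, which is consistent with the conventional method of adjusting the conversion ratio to the remaining capacity and catching up during silent sections.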

In the conventional configuration described above, speech speed conversion makes monaural audio easier to hear. Recent television audio, however, is usually stereo, and if stereo audio is converted by simply running the conventional configuration as two parallel channels, as shown in FIG. 6, a loss of synchronization can occur. Consider the stereo audio signal of FIG. 5(c) input to the speech speed conversion device of FIG. 6, where the L-channel speech ends at time t1 and the R-channel speech then begins at time t2. After conversion, as shown in FIG. 5(d), the L-channel speech is delayed until time t3, yet the R-channel speech, which was waiting in a silent section, starts its conversion at time t2. The two channels therefore overlap in time between t2 and t3, which is the synchronization problem.
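The overlap between t2 and t3 can be made concrete with a small calculation. This is a minimal sketch with hypothetical timings. It assumes that the L-channel utterance is stretched uniformly by the conversion ratio and that the independently processed R channel starts at its original time t2.

```python
def overlap_seconds(l_start: float, t1: float, t2: float, ratio: float) -> float:
    """Overlap between L and R speech when each channel is converted independently.

    L speaks from l_start to t1; R starts at t2 (t2 >= t1) and is not delayed.
    """
    t3 = l_start + (t1 - l_start) * ratio  # stretched end of the L utterance
    return max(0.0, t3 - t2)


# Hypothetical timings in seconds: L speaks 0.0-2.0, R starts at 2.5, 2x stretch.
print(overlap_seconds(0.0, 2.0, 2.5, 2.0))  # 1.5
```

With these figures the L utterance now ends at t3 = 4.0 s, so the two speakers talk over each other for 1.5 s.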

As a technique for solving this problem, a speech speed conversion device has been proposed that adds the stereo audio signals, detects silent sections in the added audio signal, deletes those silent sections from both the L-channel and R-channel audio, and thereby keeps the L and R channels synchronized (see, for example, Patent Document 2).

An example of the second prior art in a speech speed conversion device will be described below with reference to FIG. 7. In FIG. 7, reference numeral 20 denotes a hard disk drive serving as a recording medium for video and audio signals; 21, a frame memory that stores, frame by frame, the video and audio data read from the hard disk drive 20; 22, a signal separation unit that separates the video and audio signals; 23, a signal addition unit that adds the separated stereo audio signals; 24, a pitch period calculation unit that detects the pitch period for speech speed conversion; 25, a section determination unit that determines whether one frame of the added audio signal is a voiced or a silent section; 26, a silent section deletion unit that deletes both the L-channel and R-channel audio of any section that the section determination unit 25 judges to be silent; 27, a time axis compression/expansion unit that performs time axis compression and expansion in units of the pitch period; 28, an audio memory that stores the time-axis-compressed or expanded audio data; 29, a storage rate calculation and speech speed control unit that calculates the data storage rate of the audio memory 28 and controls the speech speed conversion; 30, an L/R separation unit that separates the LR audio data read sequentially from the audio memory 28; 31, a D/A converter that converts the separated L-channel audio signal from digital to analog; and 32, a D/A converter that converts the separated R-channel audio signal from digital to analog.

The operation of the second conventional speech speed conversion device configured as described above will now be described. Video and audio signals transmitted by digital television broadcasting or the like are demodulated and recorded on the hard disk drive 20. The recorded signals are read into the frame memory 21 frame by frame and separated into video and audio signals by the signal separation unit 22. The separated L and R audio signals are added sequentially by the signal addition unit 23, which outputs the sum once one frame of audio from both channels has been added. The section determination unit 25 detects the signal level of the one-frame sum, determines whether the frame is a voiced or a silent section, and controls the silent section deletion unit 26: if the frame is silent, that frame of both the L-channel and R-channel audio is deleted. Frame-level synchronization is maintained in this way, although the L-channel and R-channel audio themselves remain independent. Next, the pitch period calculation unit 24 calculates the pitch period, and time axis compression and expansion is performed with the same pitch period for both channels, which yields high-quality time compression and expansion while keeping the L and R audio channels synchronized afterward. The compressed or expanded audio data is recorded in the audio memory 28, read out sequentially, and reproduced as two analog channels through the L/R separation unit 30, the D/A converter 31 for the L channel, and the D/A converter 32 for the R channel. The storage rate calculation and speech speed control unit 29 calculates the data storage rate of the audio memory 28 and controls the conversion accordingly, for example by not deleting silent sections when the storage rate is low or by raising the compression ratio when it is high. In other words, this is a speech speed conversion method for fast listening.

Patent Document 1: JP-A-7-192392
Patent Document 2: JP 2002-297200 A

However, when stereo audio is speech-speed converted by simply processing the two channels in parallel, as in the first prior art, the L-channel and R-channel audio lose time synchronization and overlapping sections arise. Preventing this overlap, as in the second prior art, requires adding an algorithm for mutual time management of the L and R channels and adding separate audio memory for L and R.

To solve the above problem, the speech speed conversion method of the present invention is characterized in that the input audio signals of a plurality of channels, for example the stereo audio signals of the L and R channels, are added together and speech speed conversion is performed on the resulting monaural signal.

The speech speed conversion device comprises an adder that adds the input L-channel and R-channel stereo audio signals, a speech speed conversion processing unit that receives the audio signal made monaural by the adder and performs a predetermined speech speed conversion, and a distributor that distributes the output signal of the speech speed conversion processing unit to the L channel and the R channel and outputs it as an audio LR signal. The speed-converted and distributed audio LR signal is reproduced by a speaker or an earphone.

With this configuration, the present invention prevents the overlap between L-channel and R-channel audio that arises from lost time synchronization when stereo audio is speech-speed converted, and it performs the conversion with a simple circuit configuration, without adding a mutual time management algorithm for the L and R channels or the memory that such an algorithm would require.

(Embodiment 1)
The first embodiment of the present invention will be described below with reference to FIGS. 1 and 2.

FIG. 1 is a block diagram showing the configuration of the speech speed conversion device according to the first embodiment of the present invention. In FIG. 1, reference numeral 1 denotes an adder that adds the input L-channel and R-channel stereo audio signals; 2 denotes a speech speed conversion processing unit that receives the audio signal made monaural by the adder 1 and performs a predetermined speech speed conversion; and 3 denotes a distributor that distributes the signal output by the speech speed conversion processing unit 2 to the L channel and the R channel and outputs it as an audio LR signal. The audio LR signal distributed by the distributor 3 is reproduced by the L-channel and R-channel speakers. The audio LR signal is the speed-converted monaural signal distributed to the L and R channels as signals carrying identical information.
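The three blocks of FIG. 1 map naturally onto a short software pipeline. The following is a minimal sketch, not the patented circuit: the function names are hypothetical, samples are plain floats, and the converter passed in (`repeat_each_sample`) is only a placeholder that doubles the duration by repeating samples.

```python
from typing import Callable, List, Sequence, Tuple

Samples = List[float]


def convert_stereo(
    left: Sequence[float],
    right: Sequence[float],
    convert_mono: Callable[[Samples], Samples],
) -> Tuple[Samples, Samples]:
    """Add L and R, convert the mono sum, and hand the result to both channels."""
    if len(left) != len(right):
        raise ValueError("L and R must have the same number of samples")
    mono = [l + r for l, r in zip(left, right)]  # adder 1
    converted = convert_mono(mono)               # speech speed conversion processing unit 2
    return list(converted), list(converted)      # distributor 3: identical L and R


def repeat_each_sample(mono: Samples) -> Samples:
    """Placeholder converter: doubles the duration by repeating every sample."""
    return [s for s in mono for _ in range(2)]


out_l, out_r = convert_stereo([1.0, 0.0], [0.0, 2.0], repeat_each_sample)
print(out_l)  # [1.0, 1.0, 2.0, 2.0]
```

Because only one stream is ever time-stretched, the two outputs cannot drift apart, and no per-channel bookkeeping is needed.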

The operation of the speech speed conversion device configured as described above and the details of each part will now be described. In the audio reproduction of a television receiver, for example, the audio detected as L-channel and R-channel stereo audio signals is added by the adder 1 and converted into a monaural signal. If it handles digital signals, the adder 1 can easily be built from an adder circuit and latches that align the timing of the two channels; if it handles analog signals, it can easily be built from a resistor network or the like. Its configuration is not limited as long as it adds the audio information of a plurality of channels and converts it into a monaural signal.
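For the digital case, the addition can be sketched on 16-bit PCM samples. The source only says that the two channels are added. The saturation step below is an assumption added here, because the sum of two 16-bit samples can exceed the 16-bit range. The function name is hypothetical.

```python
from typing import List, Sequence

PCM16_MIN, PCM16_MAX = -32768, 32767


def add_pcm16(left: Sequence[int], right: Sequence[int]) -> List[int]:
    """Sample-by-sample sum of two 16-bit PCM channels, clamped to 16 bits.

    Clamping is an assumption of this sketch; the source does not say how
    out-of-range sums are handled.
    """
    if len(left) != len(right):
        raise ValueError("channels must be the same length")
    return [max(PCM16_MIN, min(PCM16_MAX, l + r)) for l, r in zip(left, right)]


print(add_pcm16([30000, -30000, 100], [10000, -10000, 23]))  # [32767, -32768, 123]
```

Halving the sum instead of clamping it would be another way to stay in range, at the cost of a lower output level.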

Next, the speech speed conversion processing unit 2 receives the monaural signal output from the adder 1 and performs a predetermined speech speed conversion. The details of the conversion are described below with reference to FIG. 2.

Suppose that, as shown in FIG. 2(a), the L channel carries the utterance "Sore wa watashi no desu" ("It is mine") and the R channel carries the reply "Iie, kare no desu" ("No, it is his"). The adder 1 converts these into a monaural signal, "Sore wa watashi no desu. Iie, kare no desu.", as shown in FIG. 2(b), that preserves the temporal relationship of the conversation between the L and R channels.

As one example of the conversion performed by the speech speed conversion processing unit 2, the unit first extracts the silent sections of the input monaural signal and the vowels of the utterances to be converted. Assuming a conversion ratio that doubles the duration, it then adds vowels to the spoken sections, excluding the silent sections, to stretch them to twice their length. The result is shown in FIG. 2(c).
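The idea of stretching only the spoken sections can be sketched frame by frame. This is a deliberately naive stand-in, not the vowel-lengthening method of the source: voiced frames are simply repeated, silence is detected with a fixed amplitude threshold, and silent frames pass through unchanged. All names and parameter values are hypothetical.

```python
from typing import List, Sequence


def frame_is_silent(frame: Sequence[float], threshold: float) -> bool:
    """A frame counts as silent when every sample is below the threshold."""
    return max(abs(s) for s in frame) < threshold


def stretch_mono(
    samples: Sequence[float], frame_len: int, ratio: int, threshold: float
) -> List[float]:
    """Repeat each voiced frame `ratio` times; leave silent frames as they are."""
    out: List[float] = []
    for i in range(0, len(samples), frame_len):
        frame = list(samples[i:i + frame_len])
        if frame_is_silent(frame, threshold):
            out.extend(frame)            # silence is not stretched
        else:
            for _ in range(ratio):       # crude substitute for vowel lengthening
                out.extend(frame)
    return out


mono = [0.5, 0.5, 0.0, 0.0, 0.3, 0.3]
print(stretch_mono(mono, frame_len=2, ratio=2, threshold=0.1))
# [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.3, 0.3, 0.3, 0.3]
```

Plain frame repetition produces audible artifacts on real speech. A practical converter would lengthen the signal in units of the pitch period, and it could also shorten silent sections to absorb the accumulated delay.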

As the result in FIG. 2(c) shows, the end of the L-channel utterance has moved from t1 to t3, so the utterance is longer. The start of the R-channel utterance has likewise shifted from t2 to t3. The R-channel utterance therefore does not overlap the L-channel utterance, and the speech speed conversion preserves the temporal relationship of the conversation.

Next, the signal output from the speech speed conversion processing unit 2 is distributed by the distributor 3 to the L channel and the R channel as an audio LR signal. If it handles digital signals, the distributor 3 need only pass the same audio information to both channels; if it handles analog signals, it need only pass the same signal to both channels by resistive division or the like. Its configuration is not limited as long as it fulfils this function.

The speed-converted and distributed audio LR signal is then supplied to a circuit for speaker reproduction, such as an amplifier circuit, and reproduced by the speakers built into the television receiver for viewing.

As described above, the speech speed conversion device of the present invention uses a simple circuit configuration that merely adds the L-channel and R-channel stereo audio signals, yet achieves speech speed conversion free of time deviation for stereo audio input and makes television easier to listen to.

When speech speed conversion is applied to the stereo signal as is, as in the prior art configurations, mutual time management of the L and R channels is needed and extra memory is likely to be required, which makes the conversion system larger. In the present invention, by contrast, reproducing the audio LR signal, which is a distributed monaural signal, loses sound localization information but avoids those drawbacks. A simple configuration achieves easy-to-hear speech speed conversion that preserves the temporal relationship of the conversation between the two channels, which is a substantial benefit for viewers who have difficulty following fast speech, such as the elderly.

With independent L and R conversion in particular, the R-channel conversion begins only after the end of the L-channel conversion has been detected. This adds processing time for the extra synchronization algorithm, and because the R-channel conversion is kept waiting until then, a time lag arises between the end of the L-channel utterance and the start of the R-channel utterance, widening the gap between the converted speech and the picture.

With the speech speed conversion device of the present invention, however, the silent sections between the two channels are processed at the same time. As shown in FIG. 2(c), for example, the L-channel utterance ends at t3 and the R-channel utterance also starts at t3, so there is no time lag and the deviation from the picture caused by speech speed conversion is kept to a minimum. The device is therefore particularly effective for speech speed conversion in AV equipment that involves pictures, such as television receivers, VTRs, and DVDs.

Although this embodiment has described a conversion ratio of two, the ratio is not limited to that value and may be set as required. In that case only the timing corresponding to time t3 in FIG. 2 changes, and the same effects of the present invention are obtained.

The description has used two-channel stereo audio as an example, but it goes without saying that the same applies to 5.1-channel audio such as MPEG-2 AAC and Dolby Digital and to other multichannel audio signals: adding all the audio channels and converting the resulting monaural signal achieves speech speed conversion free of time deviation with a simple circuit configuration.
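The multichannel case generalizes the two-input adder to any number of channels. The sketch below sums all channels with equal weight, which is the plain addition the text describes. Real downmixers often weight individual channels differently, which is outside what the source states. The function name is hypothetical.

```python
from typing import List, Sequence


def downmix_to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Add any number of equally long channels sample by sample."""
    if not channels:
        raise ValueError("at least one channel is required")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must be the same length")
    return [sum(frame) for frame in zip(*channels)]


# Three short hypothetical channels; a 5.1 signal would pass six.
print(downmix_to_mono([[1, 0], [0, 2], [3, 3]]))  # [4, 5]
```

The resulting mono stream can then go through the same converter and distributor as in the two-channel case.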

(Embodiment 2)
Next, a second embodiment of the present invention will be described with reference to FIG. 3. FIG. 3 is a block diagram showing the configuration of the speech speed conversion device according to the second embodiment of the present invention. Components identical to those of the first embodiment are given the same reference numerals, and overlapping description is omitted as appropriate.

In FIG. 3, reference numeral 1 denotes an adder that adds the input L-channel and R-channel stereo audio signals; 2 denotes a speech speed conversion processing unit that receives the audio signal made monaural by the adder 1 and performs a predetermined speech speed conversion; and 3 denotes a distributor that distributes the signal output by the speech speed conversion processing unit 2 to the L channel and the R channel and outputs it as an audio LR signal. The audio LR signal distributed by the distributor 3 is sent to the L-channel and R-channel earphone outputs, while the speakers reproduce the ordinary input L-channel and R-channel stereo audio signals.

The operation of the speech speed conversion device configured as described above will now be described. In the audio reproduction of the television receiver, the audio detected as L-channel and R-channel stereo audio signals is reproduced as is by the speakers built into the television receiver. Meanwhile, the same stereo audio signal is added by the adder 1 and converted into a monaural signal, subjected to a predetermined speech speed conversion by the speech speed conversion processing unit 2, distributed to the L and R audio channels by the distributor 3, and reproduced as the speed-converted and distributed audio LR signal at an earphone terminal added to the television receiver. The conversion operation and the details of each component are the same as in Embodiment 1 and are omitted here.
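The routing of the second embodiment, with untouched stereo to the speakers and the converted monaural signal to both earphone channels, can be sketched as follows. The names are hypothetical, and the converter passed in is a placeholder that doubles the duration by repeating samples.

```python
from typing import Callable, List, NamedTuple, Sequence


class Outputs(NamedTuple):
    speaker_l: List[float]
    speaker_r: List[float]
    earphone_l: List[float]
    earphone_r: List[float]


def route(
    left: Sequence[float],
    right: Sequence[float],
    convert_mono: Callable[[List[float]], List[float]],
) -> Outputs:
    """Speakers get the original stereo; earphones get the converted mono signal."""
    if len(left) != len(right):
        raise ValueError("L and R must have the same number of samples")
    mono = [l + r for l, r in zip(left, right)]  # adder 1
    slow = list(convert_mono(mono))              # speech speed conversion processing unit 2
    return Outputs(list(left), list(right), slow, list(slow))  # distributor 3 feeds the earphones


def double(mono: List[float]) -> List[float]:
    """Placeholder converter: repeats every sample once."""
    return [s for s in mono for _ in range(2)]


result = route([1.0, 0.0], [0.0, 2.0], double)
print(result.speaker_l, result.earphone_l)  # [1.0, 0.0] [1.0, 1.0, 2.0, 2.0]
```

The speaker path never passes through the converter, so viewers listening at normal speed keep the original stereo image.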

With this configuration, the ordinary L-channel and R-channel stereo audio is reproduced by the speakers built into the television receiver. Viewers who can follow speech at normal speed listen through the speakers, while elderly viewers and others who cannot keep up listen to the speed-converted audio through earphones connected to the earphone terminal. Everyone can thus watch the same television receiver, and the family can enjoy its time together. For elderly viewers the audio becomes monaural and sound localization information is lost, but the gain in intelligibility from speech speed conversion outweighs that loss.

Furthermore, because the speed-converted signal is output from the earphone terminal, it can be used as the input to a wireless audio transmission function, such as an infrared one, giving elderly viewers an even more comfortable and easy-to-hear viewing environment.

It is desirable to provide the earphone terminal output with a volume adjustment unit so that its volume can be adjusted independently of the speaker side. A pitch adjustment unit that varies the pitch while keeping the predetermined speech speed can also be provided. Elderly listeners generally not only have difficulty following fast speech but also have reduced hearing sensitivity and a narrower audible frequency range. Adding these units to the earphone terminal output lets such viewers watch television with the settings easiest for them to hear, without affecting other viewers.

The speech speed conversion method and device according to the present invention achieve speech speed conversion free of time deviation with a simple circuit configuration. When a picture is displayed, the deviation between picture and sound can be kept as small as possible, so the invention is effective for speech speed conversion accompanied by pictures and is especially useful in AV equipment such as VTRs and DVDs as well as televisions.

FIG. 1: Block diagram of the speech speed conversion device according to the first embodiment of the present invention
FIG. 2: Timing diagram of the speech speed conversion of the present invention
FIG. 3: Block diagram of the speech speed conversion device according to the second embodiment of the present invention
FIG. 4: Block diagram of a conventional speech speed conversion device
FIG. 5: Timing diagram of conventional speech speed conversion
FIG. 6: Block diagram of a first conventional speech speed conversion device that does not use the present invention
FIG. 7: Block diagram of a second conventional speech speed conversion device that does not use the present invention

Explanation of symbols

1 Adder
2 Speech speed conversion processing unit
3 Distributor

Claims

A speech speed conversion method comprising: adding input audio signals of a plurality of channels to convert them into a monaural signal; and performing speech speed conversion on the monaural signal.

A speech speed conversion device comprising: an adder that adds input L-channel and R-channel stereo audio signals; a speech speed conversion processing unit that receives the audio signal made monaural by the adder and performs a predetermined speech speed conversion; and a distributor that distributes the output signal of the speech speed conversion processing unit to the L channel and the R channel and outputs it as an audio LR signal, wherein the audio LR signal is reproduced by a speaker.

A speech speed conversion device comprising: an adder that adds input L-channel and R-channel stereo audio signals; a speech speed conversion processing unit that receives the audio signal made monaural by the adder and performs a predetermined speech speed conversion; and a distributor that distributes the output signal of the speech speed conversion processing unit to the L channel and the R channel and outputs it as an audio LR signal, wherein the stereo audio signal is reproduced by a speaker and the audio LR signal is reproduced at an earphone terminal.