JP2008107381A

JP2008107381A - Speaking speed converting device and speaking speed converting control method

Info

Publication number: JP2008107381A
Application number: JP2006287305A
Authority: JP
Inventors: Atsushi Hotta; 厚堀田; Takashi Sudo; 貴志須藤
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2006-10-23
Filing date: 2006-10-23
Publication date: 2008-05-08

Abstract

PROBLEM TO BE SOLVED: To provide a speaking speed converting device in which a delay of the voice relative to the image hardly occurs, by setting the speaking speed conversion magnification rate of a voice signal to more than 1.0, that is, by reproducing the voice faster, only for voice parts that are hard to hear.

SOLUTION: A voiced/silent discrimination section discriminates whether an input voice signal is voiced or silent, and a signal rejection determination section determines, from the current total time of the input voice signal and the total time of the output voice signal, whether or not a silent signal is to be rejected. A silent signal determined to be rejected is rejected in a silent signal rejection section, and the other signals undergo extension processing in a speed conversion section, without changing the voice pitch, at the speaking speed conversion magnification rate determined in the previous processing. The processed signals are accumulated in an output voice control section and sequentially forwarded to an output signal terminal. A conversion speed control section calculates the speaking speed conversion magnification rate from the difference between the voice accumulation amount at the current processing time and that at the previous processing time, and outputs it to the speed conversion section.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a speech speed conversion technique that makes the audio of a television set, telephone, mobile phone, or the like easier for a listener to understand by playing it back slowly without changing the voice pitch.

Speech speed conversion is a technique that, when an audio signal is played back, slows the audio down without changing the speaker's voice pitch so that the listener can understand it more easily. A conventional speech speed conversion device based on this technique determines whether the audio signal is voiced or silent and applies time expansion processing, without changing the voice pitch, only to audio signals determined to be voiced. The playback delay caused by this time expansion is absorbed by discarding the audio signals determined to be silent (see, for example, Patent Documents 1 and 2).

JP 2002-297200 A
JP 2001-109499 A

However, when the audio signal input to the speech speed conversion device contains background sound in addition to the speaker's voice, as in on-site relay audio or recorded street interviews in a television news program, or when background sound plays continuously, as in a CM (Commercial Message), it is difficult to determine that the audio signal is silent, and discarding silent signals becomes practically impossible. In such cases, the time expansion processing applied to the audio signal causes the audio to fall steadily further behind the video, and the television viewer feels that something is wrong.
In addition, the audio signal corresponding to this delay must be held in a separate memory while playback continues, which has the disadvantage of increasing the scale and cost of the device.

As one way of solving this problem, Patent Document 2 describes a method of changing the speech speed conversion magnification of the audio signal (the time length of the input audio signal divided by the time length of the output audio signal) according to the fill level of the memory that stores the audio signal after compression/expansion processing: 0.6 when the fill level is 0% or more and less than 20%, 0.7 when it is 20% or more and less than 40%, 0.8 when it is 40% or more and less than 60%, 0.9 when it is 60% or more and less than 80%, and 1.0 (no compression/expansion processing) when it is 80% or more and 100% or less.
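The banded mapping described for Patent Document 2 can be sketched as a small lookup. This is only an illustration of the thresholds quoted above; the function name `magnification_from_fill` and the use of a percentage argument are assumptions made for this sketch and do not come from either patent document.

```python
def magnification_from_fill(fill_percent: float) -> float:
    """Return the speech speed conversion magnification for a memory fill level.

    The bands are the ones quoted from Patent Document 2; the function itself
    is a hypothetical helper written for illustration.
    """
    if not 0.0 <= fill_percent <= 100.0:
        raise ValueError("fill level must be between 0 and 100 percent")
    # (exclusive upper bound in percent, magnification)
    bands = [(20.0, 0.6), (40.0, 0.7), (60.0, 0.8), (80.0, 0.9)]
    for upper_bound, magnification in bands:
        if fill_percent < upper_bound:
            return magnification
    return 1.0  # 80% to 100%: no compression/expansion processing
```

Because the magnification depends only on how full the memory is, a signal with no silent portions eventually settles in the top band, which is the weakness discussed in the following paragraphs.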

However, the above method is aimed primarily at preventing the memory from overflowing, and it has the problem that the more the memory fills, the harder it becomes to play the speaker's voice back slowly.

Furthermore, some programs, such as sports news, play BGM (Back Ground Music) behind the announcer's voice, so an audio signal with no silent portions at all is input to the speech speed conversion device continuously. When the above method is applied to such a program, the rate at which the audio delay grows is moderated as the memory fills, but because there are no silent portions, the memory fill level eventually reaches 80% or more. If, for example, the memory can hold 2 seconds of audio, this means the audio stays 1.6 seconds (= 2 seconds × 80%) behind the video while no longer being slowed down, which departs from the original purpose of a speech speed conversion device: providing the listener with audio that is easier to understand by slowing it down.

The present invention has been made to solve these problems. Its object is to provide a speech speed conversion device that calculates the increase in audio delay time per unit time and, according to the result, switches among a plurality of speech speed conversion magnifications prepared in advance at every unit time when executing compression/expansion processing, thereby suppressing the delay of the audio relative to the video and reducing the discomfort that a television viewer feels from the mismatch between video and audio.

With regard to the latter problem, a further object is to provide a speech speed conversion device in which a delay of the audio relative to the video is unlikely to occur, by making the speech speed conversion magnification of the audio signal larger than 1.0, that is, by playing the audio back faster, only for audio portions that are perceptually hard to hear.

The speech speed conversion device according to the present invention comprises an audio input terminal, a voiced/silent determination unit, a signal discard determination unit, a silent signal discard unit, a speed conversion unit, an output audio control unit, an audio output terminal, and a conversion speed control unit, wherein:
the voiced/silent determination unit determines whether the input audio signal from the audio input terminal is voiced or silent;
the signal discard determination unit, based on the determination signal from the voiced/silent determination unit, does not discard a voiced signal and, for a silent signal, calculates the total time of the input audio signal from the audio input terminal and the total time of the output audio signal from the output audio control unit, subtracts the total audio input time from the total audio output time, and determines from the result whether or not to discard the silent signal;
the silent signal discard unit likewise receives the audio signal, discards the silent signal in response to a discard signal from the signal discard determination unit, and passes all other signals to the speed conversion unit;
the speed conversion unit applies expansion processing to the audio signal passed from the silent signal discard unit, without changing the voice pitch, at the speech speed conversion magnification supplied by the conversion speed control unit;
the output audio control unit stores the audio signal that has undergone time compression/expansion processing in the speed conversion unit in an audio storage memory, passes the time length of each such audio signal, every time one is input, to the signal discard determination unit as the time length of the output audio signal, outputs the audio signals waiting in the audio storage memory to the audio output terminal in order from the oldest, counts the number of audio samples output, and passes the current audio accumulation amount to the conversion speed control unit at fixed intervals; and
the conversion speed control unit calculates the difference between the current audio accumulation amount received from the output audio control unit and the audio accumulation amount at the previous processing, divides this difference by the unit time to obtain an audio delay increase degree, and outputs a speech speed conversion magnification corresponding to the audio delay increase degree to the speed conversion unit.

The speech speed conversion control method according to the present invention comprises:
a voiced/silent determination step of determining whether the input audio signal from the audio input terminal is voiced or silent and outputting a determination signal;
a signal discard determination step of, based on the determination signal from the voiced/silent determination step, not discarding a voiced signal and, for a silent signal, subtracting the total audio input time from the total audio output time at the present point and determining from the result whether or not to discard the silent signal;
a silent signal discard step of receiving the audio signal, discarding the silent signal in response to a discard signal from the signal discard determination step, and passing all other signals to the subsequent speed conversion step;
a speed conversion step of applying expansion processing to the audio signal passed from the silent signal discard step, without changing the voice pitch, at the speech speed conversion magnification determined in the previous processing;
an output audio control step of storing the audio signal that has undergone time compression/expansion processing in the speed conversion step in an audio storage memory, outputting the time length of each such audio signal, every time one is input, as the time length of the output audio signal that serves as source data for the total audio output time used in the signal discard determination step, outputting the audio signals waiting in the audio storage memory to the audio output terminal in order from the oldest, counting the number of audio samples output, and outputting the current audio accumulation amount at fixed intervals; and
a conversion speed control step of calculating the difference between the current audio accumulation amount output from the output audio control step and the audio accumulation amount at the previous processing, dividing this difference by the unit time to obtain an audio delay increase degree, and outputting a speech speed conversion magnification corresponding to the audio delay increase degree as the speech speed conversion magnification used in the speed conversion step.

With the speech speed conversion device according to the present invention, the speech speed conversion magnification used in time compression/expansion processing can be switched at every unit time. When the device is applied to a television receiver, for example, this has the effect of suppressing the delay of the audio relative to the video.

The best mode for carrying out the present invention is described below.
Embodiment 1.
A speech speed conversion device according to Embodiment 1 of the present invention is described with reference to the drawings. FIG. 1 is a functional block diagram showing the configuration of the speech speed conversion device according to Embodiment 1 of the present invention. In the figure, 1 is an audio input terminal, 2 is a voiced/silent determination unit, 3 is a signal discard determination unit, 4 is a silent signal discard unit, 5 is a speed conversion unit, 6 is an output audio control unit, 7 is an audio output terminal, and 8 is a conversion speed control unit.

Next, the operation is described with reference to FIG. 1.
The audio signal is input to the speech speed conversion device from the audio input terminal 1 and passed to the voiced/silent determination unit 2, the signal discard determination unit 3, and the silent signal discard unit 4. The voiced/silent determination unit 2 determines whether the input audio signal is voiced or silent and passes the determination result to the signal discard determination unit 3.

When the determination result passed from the voiced/silent determination unit 2 is voiced, the signal discard determination unit 3 passes an instruction signal not to discard the signal to the silent signal discard unit 4.
When the determination result received from the voiced/silent determination unit 2 is silent, the signal discard determination unit 3 judges whether or not the silent signal can be discarded. If discarding is judged possible, it passes an instruction signal to discard the signal to the silent signal discard unit 4; if discarding is judged impossible, it passes an instruction signal not to discard the signal.
The operation of the signal discard determination unit 3 is now described further with reference to the flowchart of FIG. 2.

FIG. 2 is a processing flowchart of the signal discard determination unit 3, in which 3a, 3b, 3c, 3e, and 3f are processing blocks and 3d is a decision block.
The audio input time count used in processing block 3b described later and the audio output time count used in processing block 3c both have the value 0 when the speech speed conversion device starts operating; that is, their initial values are 0. The operation is described next.

In processing block 3a, the time length of the input audio signal received from the audio input terminal 1 is added to the audio input time count to calculate the total time of the input audio signal.
In processing block 3b, the time length of the output audio signal received from the output audio control unit 6, described later, is added to the audio output time count to calculate the total time of the output audio signal.

In processing block 3c, the audio input time count is subtracted from the audio output time count.
In decision block 3d, when the subtraction result is positive, processing block 3e sends an instruction signal indicating that the signal can be discarded to the silent signal discard unit 4, and the processing ends.
When the subtraction result is negative or 0, processing block 3f sends an instruction signal indicating that the signal cannot be discarded to the silent signal discard unit 4, and the processing ends.
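The flow of blocks 3a to 3f can be sketched as follows. This is a minimal illustration rather than the patented implementation: the class name `SignalDiscardJudge`, the use of seconds as the time unit, and the choice to update both totals on every frame (voiced or silent) are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class SignalDiscardJudge:
    """Running totals kept by the signal discard determination unit 3 (sketch)."""

    total_input_s: float = 0.0   # audio input time count, initial value 0
    total_output_s: float = 0.0  # audio output time count, initial value 0

    def step(self, input_len_s: float, output_len_s: float, is_voiced: bool) -> bool:
        """Return True when the current frame may be discarded."""
        self.total_input_s += input_len_s    # block 3a
        self.total_output_s += output_len_s  # block 3b
        if is_voiced:
            return False  # a voiced frame is never discarded
        difference = self.total_output_s - self.total_input_s  # block 3c
        return difference > 0.0  # block 3d: 3e when positive, otherwise 3f
```

A positive difference means that more audio time has been produced than consumed, so there is accumulated delay that discarding a silent frame can absorb.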

This series of operations in the signal discard determination unit 3 is executed so that, for example, when the audio signal input to the speech speed conversion device is silent or contains many silent portions, repeatedly discarding the silent portions does not leave the device with no audio signal to output.
This concludes the description of the operation of the signal discard determination unit 3. The description now returns to the overall operation of the speech speed conversion device of FIG. 1.

When the instruction signal received from the signal discard determination unit 3 indicates that the signal is not to be discarded, the silent signal discard unit 4 passes the audio signal input from the audio input terminal 1 to the speed conversion unit 5 unchanged. When the instruction indicates that the signal is to be discarded, it discards the audio signal input from the audio input terminal 1 and does not pass it to the speed conversion unit 5.

The speed conversion unit 5 applies time compression/expansion processing to the audio signal received from the silent signal discard unit 4 without changing the voice pitch, and passes the resulting audio signal to the output audio control unit 6. A plurality of speech speed conversion magnifications of 1.0 or less, such as 1.0, 0.9, and 0.8, are prepared in advance for the time compression/expansion processing in the speed conversion unit 5. When the speech speed conversion device starts operating, the lowest magnification is used (0.8 in this example); in subsequent processing, the conversion speed control unit 8 decides which magnification to select. How the magnification is decided is explained in detail in the description of the operation of the conversion speed control unit 8 below.

The output audio control unit 6 passes the audio signal received from the speed conversion unit 5 to the audio output signal terminal 7, and the audio signal is output from the speech speed conversion device.
The operation of the output audio control unit 6 is now described further with reference to FIG. 3. FIG. 3 is a functional block diagram showing the configuration of the output audio control unit 6, in which 6a is an output audio count calculation unit, 6b is a converted audio storage unit, and 6c is a delay time calculation unit.
The operation is described next.

Each time an audio signal that has undergone time compression/expansion processing is input from the speed conversion unit 5, the output audio count calculation unit 6a passes its time length to the signal discard determination unit 3 as the time length of the output audio signal. The converted audio storage unit 6b stores the time-compressed/expanded audio signal received from the speed conversion unit 5 in an audio storage memory prepared in advance. The audio signals stored in the audio storage memory and waiting for output are then output to the audio output signal terminal 7 in order from the oldest, and the audio signal is finally output from the speech speed conversion device.

The delay time calculation unit 6c counts the number of audio samples output from the converted audio storage unit 6b to the audio output signal terminal 7 and, at fixed intervals, passes the current audio accumulation amount to the conversion speed control unit 8. The fixed interval here is, for example, 1 second; for an audio signal with a sampling frequency of 48 kHz, this means every time 48000 audio samples have been output from the converted audio storage unit 6b to the audio output signal terminal 7.
This concludes the description of the operation of the output audio control unit 6.
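The three sub-units 6a to 6c can be sketched together as a FIFO buffer with a sample counter. This is an illustration under stated assumptions: the class and method names are hypothetical, the "current audio accumulation amount" is taken to be the duration of audio still waiting in the buffer, and the list `reports` stands in for the signal sent to the conversion speed control unit 8.

```python
from collections import deque


class OutputAudioControl:
    """Sketch of the output audio control unit 6 (names are assumptions)."""

    def __init__(self, sample_rate: int = 48000, period_s: float = 1.0):
        self.sample_rate = sample_rate
        self.period_samples = int(sample_rate * period_s)
        self.buffer = deque()   # 6b: converted audio waiting for output
        self.output_count = 0   # 6c: samples output since the last report
        self.reports = []       # accumulation amounts (seconds) reported to unit 8

    def push(self, samples) -> float:
        """Store converted samples and return their time length in seconds (6a)."""
        self.buffer.extend(samples)
        return len(samples) / self.sample_rate

    def pop(self, n: int) -> list:
        """Output up to n samples, oldest first, reporting once per period."""
        out = []
        for _ in range(min(n, len(self.buffer))):
            out.append(self.buffer.popleft())
            self.output_count += 1
            if self.output_count == self.period_samples:
                self.output_count = 0
                # Assumed meaning of "current audio accumulation amount":
                # the duration of audio still waiting in the buffer.
                self.reports.append(len(self.buffer) / self.sample_rate)
        return out
```

With the 48 kHz example from the text, `period_samples` is 48000, so one report is produced for every second of audio actually output.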

Next, the operation of the conversion speed control unit 8, which decides the speech speed conversion magnification applied to the audio signal in the speed conversion unit 5, is described with reference to the flowchart of FIG. 4.

FIG. 4 is a processing flowchart of the conversion speed control unit 8, in which 8a, 8b, 8e, 8f, 8g, and 8h are processing blocks and 8c and 8d are decision blocks. The operation is described next.

In processing block 8a, the difference between the current audio accumulation amount received from the output audio control unit 6 and the audio accumulation amount recorded during the previous processing is calculated.

In processing block 8b, the difference is divided by the unit time (in the following description, the result of this division is called the audio delay increase degree). The unit time here is the same as the period at which the current audio accumulation amount is received from the output audio control unit 6, which is 1.0 second in the example given in the description of the output audio control unit 6.
Examples of calculating the audio delay increase degree are as follows.
When the previous audio accumulation amount is 0.1 seconds, the current audio accumulation amount is 0.3 seconds, and the unit time is 1.0 second:
Audio delay increase degree = (0.3 s - 0.1 s) / 1.0 s = 0.2
When the previous audio accumulation amount is 0.3 seconds, the current audio accumulation amount is 0.1 seconds, and the unit time is 1.0 second:
Audio delay increase degree = (0.1 s - 0.3 s) / 1.0 s = -0.2

In decision block 8c, when the audio delay increase degree is 0.1 or more, processing block 8f sends an instruction signal setting the speech speed conversion magnification of the time compression/expansion processing to 1.0 to the speed conversion unit 5, and the flow proceeds to processing block 8h.
When the audio delay increase degree is less than 0.1 in decision block 8c, the flow proceeds to decision block 8d.

In decision block 8d, when the audio delay increase degree is 0.05 or more, processing block 8g sends an instruction signal setting the speech speed conversion magnification of the time compression/expansion processing to 0.9 to the speed conversion unit 5, and the flow proceeds to processing block 8h.
When the audio delay increase degree is less than 0.05 in decision block 8d, processing block 8e sends an instruction signal setting the speech speed conversion magnification of the time compression/expansion processing to 0.8 to the speed conversion unit 5, and the flow proceeds to processing block 8h.

Finally, in processing block 8h, the current audio accumulation amount received from the output audio control unit 6 is recorded, and the processing ends.
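The flowchart of FIG. 4 (blocks 8a to 8h) can be sketched as a small stateful controller. The thresholds and magnifications are the ones given in the text; the class name `ConversionSpeedControl`, the method name `update`, and the initial previous accumulation amount of 0 seconds are assumptions made for this sketch.

```python
class ConversionSpeedControl:
    """Sketch of the conversion speed control unit 8 (names are assumptions)."""

    def __init__(self, unit_time_s: float = 1.0):
        self.unit_time_s = unit_time_s
        self.previous_s = 0.0  # assumed initial audio accumulation amount

    def update(self, current_s: float) -> float:
        """Return the magnification for the next unit time."""
        difference = current_s - self.previous_s  # block 8a
        degree = difference / self.unit_time_s    # block 8b: delay increase degree
        if degree >= 0.1:                         # decision 8c -> block 8f
            magnification = 1.0
        elif degree >= 0.05:                      # decision 8d -> block 8g
            magnification = 0.9
        else:                                     # block 8e
            magnification = 0.8
        self.previous_s = current_s               # block 8h
        return magnification
```

Called once per unit time with the accumulation amount reported by the output audio control unit 6, the controller stops slowing the audio while the delay is growing quickly and returns to the slowest setting once the delay stops growing.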

This concludes the description of the series of operations of the speech speed conversion device according to Embodiment 1 of the present invention.
Regarding the method by which the voiced/silent determination unit 2 determines whether a signal is voiced or silent, an example is disclosed in detail in JP 2002-297200 A and elsewhere, as cited in the description of the background art. An example of a method by which the speed conversion unit 5 expands an audio signal without changing the voice pitch is disclosed in detail in JP 2001-109499 A and elsewhere.

As described in detail above, according to Embodiment 1 of the present invention, the speech speed conversion magnification used in the time compression/expansion processing of the speech speed conversion device can be switched at every unit time, which has the effect of suppressing the delay of the audio relative to the video.

Other Embodiment 1.
In the detailed description of Embodiment 1 above, the conversion speed control unit 8, which switches the speech speed conversion magnification according to the audio delay increase degree, was used to decide the magnification applied to the audio signal. Alternatively, a conversion speed control unit A81 may be used, which leaves the magnification unchanged from its previous value when the audio delay increase degree satisfies a certain condition.
Next, the operation of the conversion speed control unit A81, which decides the speech speed conversion magnification applied to the audio signal in the speed conversion unit 5, is described with reference to the flowchart of FIG. 5.

FIG. 5 is a processing flowchart of the conversion speed control unit A81, in which 81a, 81b, 81f, 81g, 81h, 81i, and 81j are processing blocks and 81c, 81d, and 81e are decision blocks. The operation is described next.

In processing block 81a, the difference between the current audio accumulation amount received from the output audio control unit 6 and the audio accumulation amount recorded during the previous processing is calculated.
In processing block 81b, the difference is divided by the unit time to calculate the audio delay increase degree.

In decision block 81c, when the audio delay increase degree is 0.1 or more, processing block 81i sends an instruction signal setting the speech speed conversion magnification of the time compression/expansion processing to 1.0 to the speed conversion unit 5, and the flow proceeds to processing block 81j.
When the audio delay increase degree is less than 0.1 in decision block 81c, the flow proceeds to decision block 81d.

In decision block 81d, if the voice delay increase degree is 0.05 or more, processing block 81h sends the speed conversion unit 5 an instruction signal that sets the speech speed conversion magnification of the time compression/expansion processing to 0.9, and the process proceeds to processing block 81j.
If, on the other hand, the voice delay increase degree is less than 0.05 in decision block 81d, the process proceeds to decision block 81e.

In decision block 81e, if the voice delay increase degree is -0.1 or more, processing block 81g sends the speed conversion unit 5 an instruction signal that leaves the speech speed conversion magnification of the time compression/expansion processing unchanged from the previous magnification, and the process proceeds to processing block 81j.
If, on the other hand, the voice delay increase degree is less than -0.1 in decision block 81e, processing block 81f sends the speed conversion unit 5 an instruction signal that sets the speech speed conversion magnification of the time compression/expansion processing to 0.8, and the process proceeds to processing block 81j.

Finally, in processing block 81j, the current voice accumulation amount received from the output voice control unit 6 is recorded, and the processing ends.

The distinguishing feature of the conversion speed control unit A81 is as follows. The conversion speed control unit 8 of Embodiment 1 shown in FIG. 4 set the speech speed conversion magnification to 0.8 whenever the voice delay increase degree was less than 0.05. The conversion speed control unit A81 lowers that threshold so that 0.8 is selected only when the voice delay increase degree is less than -0.10, and treats values from -0.10 up to but not including 0.05 as a dead zone in which the speech speed conversion magnification is not changed. The reason is that when, for example, several speakers are engaged in a discussion, the voice accumulation amount may decrease for just a moment during the conversation and then start increasing again. Returning the speech speed conversion magnification to 0.8, the slowest speed, in such a case would cause the audio delay relative to the video to start increasing again, and the dead zone avoids this.
The above is the description of the operation of the conversion speed control unit A81.
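The decision flow of FIG. 5 can be illustrated with a minimal Python sketch. The function and parameter names are hypothetical and do not appear in the specification, and the sketch assumes that the voice accumulation amount and the unit time are expressed in the same time unit (for example seconds), so that the voice delay increase degree is dimensionless. Only the thresholds (0.1, 0.05, -0.1) and the magnifications (1.0, 0.9, 0.8) are taken from the description.

```python
def voice_delay_increase(current_accumulation: float,
                         previous_accumulation: float,
                         unit_time: float) -> float:
    """Blocks 81a and 81b: difference of accumulation amounts per unit time."""
    return (current_accumulation - previous_accumulation) / unit_time


def select_magnification_a81(delay_increase: float,
                             previous_magnification: float) -> float:
    """Blocks 81c to 81i: choose the speech speed conversion magnification."""
    if delay_increase >= 0.1:        # decision block 81c -> processing block 81i
        return 1.0
    if delay_increase >= 0.05:       # decision block 81d -> processing block 81h
        return 0.9
    if delay_increase >= -0.1:       # decision block 81e -> processing block 81g
        return previous_magnification  # dead zone: keep the previous magnification
    return 0.8                       # processing block 81f


# Example: a small momentary decrease keeps the magnification that was in use.
increase = voice_delay_increase(0.98, 1.00, 1.0)
magnification = select_magnification_a81(increase, previous_magnification=1.0)
```

In this sketch the caller is responsible for storing the current accumulation amount after each call, which corresponds to processing block 81j.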

As described in detail above, according to Other Example 1, in which the conversion speed control unit A81 is used in place of the conversion speed control unit 8, the speech speed conversion magnification of the time compression/expansion processing can be switched as needed at every unit time, which suppresses the audio delay relative to the video. In addition, the current speed can be maintained when the voice accumulation amount rises or falls only slightly, which suppresses the audio delay relative to the video still further.

Embodiment 2.
A speech speed conversion device according to best mode 2 for carrying out the present invention will be described with reference to the drawings. FIG. 6 is a functional block diagram showing the configuration of the speech speed conversion device according to Embodiment 2 of the present invention, in which 1 is a voice input terminal, 21 is a voice/silence discrimination unit A, 3 is a signal discard determination unit, 4 is a silence signal discard unit, 51 is a speed conversion unit A, 6 is an output voice control unit, 7 is a voice output terminal, and 82 is a conversion speed control unit B.

Next, the operation will be described with reference to FIG. 6.
The voice signal is input to the speech speed conversion device from the voice input terminal 1 and passed to the voice/silence discrimination unit A21, the signal discard determination unit 3, and the silence signal discard unit 4. The voice/silence discrimination unit A21 determines whether the input voice signal is voiced or silent and passes the result to the signal discard determination unit 3. It also calculates the level of any input voice signal determined to be voiced and passes that level to the conversion speed control unit B82.

When the result passed from the voice/silence discrimination unit A21 is "voiced", the signal discard determination unit 3 passes the silence signal discard unit 4 an instruction signal not to discard the signal. When the result received from the voice/silence discrimination unit A21 is "silent", the signal discard determination unit 3 determines whether the silence signal can be discarded. If it can, the unit passes the silence signal discard unit 4 an instruction signal to discard the signal; if it cannot, the unit passes an instruction signal not to discard the signal.
This signal discard determination unit 3 is the same as the one described in detail with reference to FIG. 2 in the description of Embodiment 1, so its description is omitted here.

When the instruction signal received from the signal discard determination unit 3 indicates that the signal is not to be discarded, the silence signal discard unit 4 passes the input voice signal unchanged to the speed conversion unit A51. When the instruction signal indicates that the signal is to be discarded, the silence signal discard unit 4 discards the voice signal input from the voice input terminal 1 and does not pass it to the speed conversion unit A51.

The speed conversion unit A51 applies time compression/expansion processing to the voice signal received from the silence signal discard unit 4 without changing the voice pitch, and passes the resulting voice signal to the output voice control unit 6. Several speech speed conversion magnifications, such as 1.1, 1.0, 0.9, and 0.8, are prepared in advance for the time compression/expansion processing in the speed conversion unit A51. When the speech speed conversion device starts operating, the lowest magnification is used, which is 0.8 in this example. In subsequent processing, the conversion speed control unit B82 decides which magnification to select. How the magnification is determined is explained in detail in the description of the operation of the conversion speed control unit B82 below.

The output voice control unit 6 passes the voice signal received from the speed conversion unit A51 to the voice output terminal 7, and the voice signal is output from the speech speed conversion device.
The output voice control unit 6 is equivalent to the one described in detail with reference to FIG. 3 in the description of Embodiment 1, so the description of its configuration and operation is omitted.

Next, the operation of the conversion speed control unit B82, which determines the speech speed conversion magnification of the voice signal in the speed conversion unit A51, will be described with reference to the flowchart of FIG. 7.

FIG. 7 shows the processing flowchart of the conversion speed control unit B82, in which 82a, 82b, 82e, 82g, 82h, 82i, 82j, and 82k are processing blocks, and 82c, 82d, and 82f are decision blocks. Its operation is described next.

In processing block 82a, the difference between the current voice accumulation amount received from the output voice control unit 6 and the voice accumulation amount recorded during the previous processing is calculated.
In processing block 82b, this difference is divided by the unit time to calculate the voice delay increase degree.

In decision block 82c, if the voice delay increase degree is 0.1 or more, the process proceeds to processing block 82e. If, on the other hand, the voice delay increase degree is less than 0.1, the process proceeds to decision block 82d.

In decision block 82d, if the voice delay increase degree is 0.05 or more, processing block 82h sends the speed conversion unit A51 an instruction signal that sets the speech speed conversion magnification of the time compression/expansion processing to 0.9, and the process proceeds to processing block 82k.
If, on the other hand, the voice delay increase degree is less than 0.05 in decision block 82d, processing block 82g sends the speed conversion unit A51 an instruction signal that sets the speech speed conversion magnification of the time compression/expansion processing to 0.8, and the process proceeds to processing block 82k.

In processing block 82e, the difference between the level of the input voice signal determined to be voiced, received from the voice/silence discrimination unit A21, and a predetermined silence threshold level is calculated.
The silence threshold level here is the level at or below which a voice signal is judged to be silent. For example, when the maximum signal level is taken as 0 dB in decibel notation, the level judged to be silent is -45 dB. If the voice/silence discrimination unit A21 uses the signal level as its means of discriminating voice from silence, the configuration may be changed so that the level it judges to be silent is received from the voice/silence discrimination unit A21 and used as the silence threshold level.

In decision block 82f, if the level difference calculated in processing block 82e is 5 dB or less, processing block 82j sends the speed conversion unit A51 an instruction signal that sets the speech speed conversion magnification of the time compression/expansion processing to 1.1, and the process proceeds to processing block 82k.
If, on the other hand, the level difference is greater than 5 dB in decision block 82f, processing block 82i sends the speed conversion unit A51 an instruction signal that sets the speech speed conversion magnification of the time compression/expansion processing to 1.0, and the process proceeds to processing block 82k.

Finally, in processing block 82k, the current voice accumulation amount received from the output voice control unit 6 is recorded, and the processing ends.

The distinguishing feature of the conversion speed control unit B82 is as follows. The conversion speed control unit 8 of Embodiment 1 shown in FIG. 1 set the speech speed conversion magnification to 1.0 whenever the voice delay increase degree was 0.1 or more. The conversion speed control unit B82 changes the magnification to 1.1 only when the voice delay increase degree is 0.1 or more and the difference between the level of the input voice signal and the silence threshold level is 5 dB or less. The aim is to reduce the audio delay relative to the video by playing back faster only those voice signals that are slightly above the silence threshold level and are therefore hard to hear.
The above is the description of the operation of the conversion speed control unit B82.
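The level-dependent branch of FIG. 7 can be sketched in Python as follows. The function name, parameter names, and constants are hypothetical. The sketch assumes that the input level is already available in decibels relative to the maximum signal level, since the description does not say how the voice/silence discrimination unit A21 computes it. The -45 dB silence threshold and the 5 dB margin are the example values given in the description.

```python
SILENCE_THRESHOLD_DB = -45.0   # example value from the description (0 dB = maximum level)
LEVEL_MARGIN_DB = 5.0          # example margin checked in decision block 82f


def select_magnification_b82(delay_increase: float,
                             input_level_db: float,
                             silence_threshold_db: float = SILENCE_THRESHOLD_DB) -> float:
    """Blocks 82c to 82j: choose the speech speed conversion magnification."""
    if delay_increase >= 0.1:                                  # decision block 82c
        level_difference = input_level_db - silence_threshold_db  # processing block 82e
        if level_difference <= LEVEL_MARGIN_DB:                # decision block 82f
            return 1.1   # processing block 82j: quiet, hard-to-hear speech is sped up
        return 1.0       # processing block 82i
    if delay_increase >= 0.05:                                 # decision block 82d
        return 0.9       # processing block 82h
    return 0.8           # processing block 82g


# Example: speech 3 dB above the silence threshold while the delay is growing quickly.
magnification = select_magnification_b82(delay_increase=0.15, input_level_db=-42.0)
```

The level check is reached only when the delay is growing quickly, so louder speech is never played back faster than 1.0 in this sketch.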

The above is the description of the series of operations of the speech speed conversion device according to Embodiment 2 of the present invention. As noted in the description of the background art, an example of the method by which the voice/silence discrimination unit 2 determines whether a signal is voiced or silent is disclosed in detail in Japanese Patent Application Laid-Open No. 2002-297200 and elsewhere. An example of the method by which the speed conversion unit A51 expands the voice signal without changing the voice pitch is disclosed in detail in Japanese Patent Application Laid-Open No. 2001-109499 and elsewhere.

As described in detail above, according to Embodiment 2 of the present invention, the speech speed conversion magnification of the time compression/expansion processing can be switched as needed at every unit time, which suppresses the audio delay relative to the video.

Further, according to Embodiment 2 of the present invention, the speech speed conversion magnification used in the time compression/expansion processing of voice signals that are hard to hear can be raised from the conventional 1.0 to 1.1, which suppresses the audio delay relative to the video still further.

Other Example 2.
As described above, the detailed description of Embodiment 2 used the conversion speed control unit B82, which switches the speech speed conversion magnification according to the voice delay increase degree, as the method for determining the speech speed conversion magnification of the voice signal. Alternatively, a conversion speed control unit C83 may be used, which leaves the speech speed conversion magnification unchanged from the previous magnification only when the voice delay increase degree satisfies a certain condition.
Next, the operation of the conversion speed control unit C83, which determines the speech speed conversion magnification of the voice signal in the speed conversion unit A51, will be described with reference to the flowchart of FIG. 8.

FIG. 8 shows the processing flowchart of the conversion speed control unit C83, in which 83a, 83b, 83f, 83h, 83i, 83j, 83k, 83l, and 83m are processing blocks, and 83c, 83d, 83e, and 83g are decision blocks.
Its operation is described next.

In processing block 83a, the difference between the current voice accumulation amount received from the output voice control unit 6 and the voice accumulation amount recorded during the previous processing is calculated.
In processing block 83b, this difference is divided by the unit time to calculate the voice delay increase degree.

In decision block 83c, if the voice delay increase degree is 0.1 or more, the process proceeds to processing block 83f. If, on the other hand, the voice delay increase degree is less than 0.1, the process proceeds to decision block 83d.

In decision block 83d, if the voice delay increase degree is 0.05 or more, processing block 83j sends the speed conversion unit A51 an instruction signal that sets the speech speed conversion magnification of the time compression/expansion processing to 0.9, and the process proceeds to processing block 83m. If, on the other hand, the voice delay increase degree is less than 0.05, the process proceeds to decision block 83e.

In decision block 83e, if the voice delay increase degree is -0.1 or more, processing block 83i sends the speed conversion unit A51 an instruction signal that leaves the speech speed conversion magnification of the time compression/expansion processing unchanged from the previous magnification, and the process proceeds to processing block 83m. If, on the other hand, the voice delay increase degree is less than -0.1, processing block 83h sends the speed conversion unit A51 an instruction signal that sets the speech speed conversion magnification of the time compression/expansion processing to 0.8, and the process proceeds to processing block 83m.

In processing block 83f, the difference between the level of the input voice signal determined to be voiced, received from the voice/silence discrimination unit A21, and a predetermined silence threshold level is calculated.
The silence threshold level here is the level at or below which a voice signal is judged to be silent. For example, when the maximum signal level is taken as 0 dB in decibel notation, the level judged to be silent is -45 dB. If the voice/silence discrimination unit A21 uses the signal level as its means of discriminating voice from silence, the configuration may be changed so that the level it judges to be silent is received from the voice/silence discrimination unit A21 and used as the silence threshold level.

In decision block 83g, if the level difference calculated in processing block 83f is 5 dB or less, processing block 83l sends the speed conversion unit A51 an instruction signal that sets the speech speed conversion magnification of the time compression/expansion processing to 1.1, and the process proceeds to processing block 83m. If, on the other hand, the level difference is greater than 5 dB, processing block 83k sends the speed conversion unit A51 an instruction signal that sets the speech speed conversion magnification of the time compression/expansion processing to 1.0, and the process proceeds to processing block 83m.

Finally, in processing block 83m, the current voice accumulation amount received from the output voice control unit 6 is recorded, and the processing ends.

The distinguishing feature of the conversion speed control unit C83 is as follows. The conversion speed control unit B82 of Embodiment 2 shown in FIG. 6 set the speech speed conversion magnification to 0.8 whenever the voice delay increase degree was less than 0.05. The conversion speed control unit C83 lowers that threshold so that 0.8 is selected only when the voice delay increase degree is less than -0.10, and treats values from -0.10 up to but not including 0.05 as a dead zone in which the speech speed conversion magnification is not changed. The reason is that when, for example, several speakers are engaged in a discussion, the voice accumulation amount may decrease for just a moment during the conversation and then start increasing again. Returning the speech speed conversion magnification to 0.8, the slowest speed, in such a case would cause the audio delay relative to the video to start increasing again, and the dead zone avoids this.
The above is the description of the operation of the conversion speed control unit C83.
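As an illustration, the complete cycle of FIG. 8, including the recording step of block 83m, could be modelled as a small stateful object. This is only a sketch under stated assumptions: the class and attribute names are hypothetical, the voice accumulation amount and unit time are assumed to share the same time unit, the level is assumed to be given in decibels, and the initial previous accumulation amount of 0.0 is an assumption. The initial magnification of 0.8 follows the start-up behaviour described for the speed conversion unit A51.

```python
class ConversionSpeedControllerC:
    """Hypothetical model of the conversion speed control unit C83 (FIG. 8)."""

    def __init__(self, unit_time: float = 1.0,
                 silence_threshold_db: float = -45.0,
                 initial_magnification: float = 0.8) -> None:
        self.unit_time = unit_time
        self.silence_threshold_db = silence_threshold_db
        self.magnification = initial_magnification
        self.previous_accumulation = 0.0  # assumption: nothing accumulated at start

    def update(self, current_accumulation: float, input_level_db: float) -> float:
        # Blocks 83a and 83b: voice delay increase degree.
        increase = (current_accumulation - self.previous_accumulation) / self.unit_time
        if increase >= 0.1:                                   # decision block 83c
            # Block 83f: level difference to the silence threshold.
            level_difference = input_level_db - self.silence_threshold_db
            # Decision block 83g: 1.1 for quiet speech (83l), otherwise 1.0 (83k).
            self.magnification = 1.1 if level_difference <= 5.0 else 1.0
        elif increase >= 0.05:                                # decision block 83d
            self.magnification = 0.9                          # processing block 83j
        elif increase < -0.1:                                 # decision block 83e
            self.magnification = 0.8                          # processing block 83h
        # Otherwise (-0.1 <= increase < 0.05): dead zone, block 83i keeps the value.
        self.previous_accumulation = current_accumulation     # processing block 83m
        return self.magnification


controller = ConversionSpeedControllerC()
rates = [controller.update(acc, level)
         for acc, level in [(0.2, -20.0), (0.19, -20.0), (0.0, -20.0), (0.3, -42.0)]]
```

In this example sequence the second update falls in the dead zone, so the magnification chosen by the first update is retained.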

As described in detail above, according to Other Example 2, in which the conversion speed control unit C83 is used in place of the conversion speed control unit B82, the speech speed conversion magnification of the time compression/expansion processing can be switched as needed at every unit time, which suppresses the audio delay relative to the video. In addition, the current speed can be maintained when the voice accumulation amount rises or falls only slightly, which suppresses the audio delay relative to the video still further.

The embodiments above were described using the relationship between video and audio in television broadcasting as an example. The present invention can also be applied to telephones and the like, where it prevents awkward gaps in a conversation, such as replies to the other party arriving late.

The present invention can be applied to television devices, telephones, mobile phones, and the like. It allows the audio of these devices to be played back slowly without changing the voice pitch, providing the listener with speech that is easy to follow.

FIG. 1 is a functional block diagram showing the configuration of the speech speed conversion device according to Embodiment 1 of the present invention.
FIG. 2 is a processing flowchart of the signal discard determination unit in Embodiment 1.
FIG. 3 is a functional block diagram showing the configuration of the output voice control unit in Embodiment 1.
FIG. 4 is a flowchart of the operation of the conversion speed control unit in Embodiment 1.
FIG. 5 is a processing flowchart of the conversion speed control unit A of the other example according to Embodiment 1.
FIG. 6 is a functional block diagram showing the configuration of the speech speed conversion device according to Embodiment 2 of the present invention.
FIG. 7 is a processing flowchart of the conversion speed control unit B according to Embodiment 2 of the present invention.
FIG. 8 is a processing flowchart of the conversion speed control unit C of the other example according to Embodiment 2.

Explanation of symbols

1: voice input terminal; 2: voice/silence discrimination unit; 3: signal discard determination unit; 4: silence signal discard unit; 5: speed conversion unit; 6: output voice control unit; 6a: output voice count calculation unit; 6b: converted voice accumulation unit; 6c: delay time calculation unit; 7: voice output terminal; 8: conversion speed control unit; 21: voice/silence discrimination unit A; 51: speed conversion unit A; 81: conversion speed control unit A; 82: conversion speed control unit B; 83: conversion speed control unit C.

Claims

A voice input terminal, a voice / silence determination unit, a signal discard determination unit, a silence signal discard unit, a speed conversion unit, an output voice control unit, a voice output terminal, and a conversion speed control unit;
The voice / silence discrimination unit discriminates whether the voice signal input from the voice input terminal is voiced or silent,
The signal discard judgment unit does not discard the voice signal by the discrimination signal from the voice / silence discrimination unit, and the silence signal calculates the total time of the input voice signal from the voice input terminal and outputs it from the output voice control unit Calculate the total time of the audio signal, subtract the total audio input time from the total audio output time, and determine whether to discard the silence signal based on the result,
The silent signal discarding unit also inputs the audio signal, discards the silent signal with the discard signal from the signal discard determination unit, and passes the other signals to the speed conversion unit,
The speed conversion unit performs a voice signal expansion process on the voice signal passed from the silence signal discarding unit without changing the voice pitch at the conversion rate of the voice speed from the conversion speed control unit,
The output audio control unit accumulates the audio signal subjected to the time compression / decompression processing from the speed conversion unit in the audio accumulation memory, and discards the signal as the time length of the output audio signal every time the audio signal is input. Output to the audio output terminal the audio signals waiting to be output that have been passed to the discriminator and stored in the audio storage memory, in order from the oldest to the audio output terminal, and count the number of output audio samples. To the conversion speed control unit
The conversion speed control unit calculates a difference value between the current voice accumulation amount received from the output voice control unit and the voice accumulation amount at the previous processing, and divides this difference value by unit time to obtain a voice delay increase degree. And a speech speed conversion device configured to output a speech speed conversion magnification corresponding to the degree of increase in voice delay to the speed converter.

The conversion speed control unit does not perform the time compression / expansion processing based on the voice delay increase degree and the amplitude level of the input voice signal is smaller than a predetermined value. The speech speed converting apparatus according to claim 1, wherein the apparatus is configured to output to the speed converting unit.

3. The speech speed conversion device according to claim 1, wherein the conversion speed control unit is configured to output the speech speed conversion magnification of the previous processing to the speed conversion unit when the voice delay increase degree is within a specific range.

A voice / silence discrimination step for discriminating whether the input voice signal from the voice input terminal is voiced or silent, and outputting a judgment signal;
Whether or not the sound signal is discarded by the determination signal from the sound / silence determination step, and the sound signal subtracts the total time of sound input from the total sound output time at the present time, and whether to discard the sound signal according to the result A signal discarding determination step for determining whether or not
The audio signal is input, the silence signal is discarded by the discard signal from the signal discard determination step, and other signals are sent to the silence signal discard step and the silence signal discard step that are passed to the speed conversion step of the next processing. On the other hand, a speed conversion step for performing speech signal expansion processing at a speech speed conversion magnification determined in the previous processing without changing the voice pitch,
The audio signal subjected to the time compression / decompression processing from the speed conversion step is accumulated in the audio accumulation memory, and every time the audio signal is input, the time length is set as the time length of the output audio signal, and the audio used in the signal discard determination step Output as the original data for the total number of output times, and output the standby audio signals stored in the audio storage memory to the audio output terminal in order from the oldest one, and count the number of output audio samples And an output audio control step for outputting the current audio accumulation amount at regular intervals,
Calculate the difference between the current audio accumulation amount output from the output audio control step and the audio accumulation amount at the previous processing, and divide this difference value by unit time to calculate the audio delay increase degree. A speech speed conversion control method, comprising: a conversion speed control step for outputting a speech speed conversion magnification according to a delay increase degree as a speech speed conversion magnification used in the speed conversion step.