JP2000250566A

JP2000250566A - Sound and soundless deciding device and speech rate converting device

Info

Publication number: JP2000250566A
Application number: JP11047533A
Authority: JP
Inventors: Tatsuo Inoue; 健生井上
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1999-02-25
Filing date: 1999-02-25
Publication date: 2000-09-14

Abstract

PROBLEM TO BE SOLVED: To provide a sound/silence determination device whose determination accuracy is improved by using the motion of an image. SOLUTION: The sound/silence determination device receives a video signal and an audio signal synchronized with the video signal as input. It includes a means for calculating sound/silence determination data from the input audio signal and a determination means for deciding whether the input audio is a sound section or a silent section by comparing the obtained sound/silence determination data with a sound/silence determination threshold. The device further includes a motion detection means 1 for detecting the motion of the image based on the input video signal and a threshold control means 2 for changing the sound/silence determination threshold based on the detection result of the motion detection means 1.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a sound/silence determination device and a speech speed conversion device.

[0002]

2. Description of the Related Art: A speech speed conversion device is known which, during high-speed playback on a VTR, deletes or compresses on the time axis the audio signal of silent sections read from the video tape and expands on the time axis the audio signal of sound sections, thereby outputting the audio of sound sections at a speed lower than the playback speed.

[0003] Whether an audio signal read from the video tape belongs to a silent section or a sound section is determined, for example, by comparing the audio signal power with a preset threshold. That is, a section in which the audio signal power is equal to or greater than the threshold is determined to be a sound section, and a section in which the audio signal power is smaller than the threshold is determined to be a silent section.

[0004] With this method, however, a section that should be treated as silent may be determined to be a sound section when the background noise is loud or when background music (BGM) is present. Conversely, a portion containing quietly spoken lines may be determined to be a silent section.
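The fixed-threshold rule and its two failure cases can be sketched as follows. This is a minimal illustration only: the sample values, the threshold value, and the function names are assumptions made for the example and do not come from the source.

```python
def mean_power(samples):
    """Mean of the squared sample values over one analysis period."""
    return sum(s * s for s in samples) / len(samples)


def is_sound_section(samples, threshold):
    """True when the power of the period is at or above the fixed threshold."""
    return mean_power(samples) >= threshold


# Hypothetical data: a quietly spoken line and a noisy pause with no speech.
quiet_line = [0.02, -0.03, 0.02, -0.02]
noisy_pause = [0.10, -0.11, 0.09, -0.10]
FIXED_THRESHOLD = 0.005  # hypothetical preset threshold

print(is_sound_section(quiet_line, FIXED_THRESHOLD))   # False: speech is missed
print(is_sound_section(noisy_pause, FIXED_THRESHOLD))  # True: noise passes as sound
```

With a single preset threshold, no value separates these two periods correctly, which is the weakness the invention sets out to address.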

[0005]

Problems to be Solved by the Invention: In programs such as dramas and news, it is important to output the voices of the performers. In scenes where a performer is speaking, the performer's mouth moves. In golf programs, tennis programs, and the like, it is important to output the swing sound. In scenes where a swing sound occurs, a golf club, a racket, or the like is moving.

[0006] SUMMARY OF THE INVENTION: An object of the present invention is to provide a sound/silence determination device that can improve sound/silence determination accuracy by using the motion of an image.

[0007] Another object of the present invention is to provide a speech speed conversion device equipped with a highly accurate sound/silence section determination means.

[0008]

Means for Solving the Problems: A sound/silence determination device according to the present invention receives a video signal and an audio signal synchronized with the video signal as input, and comprises a means for calculating sound/silence determination data from the input audio signal and a determination means for determining whether the input audio is a sound section or a silent section by comparing the obtained sound/silence determination data with a sound/silence determination threshold. The device is characterized by further comprising a motion detection means for detecting the motion of an image based on the input video signal, and a threshold control means for changing the sound/silence determination threshold based on the detection result of the motion detection means.

[0009] As the motion detection means, for example, a means that detects a motion vector for each of a plurality of areas set within one screen is used. As the threshold control means, for example, a means is used that controls the sound/silence determination threshold so that the threshold is decreased when the maximum value of the motion vectors is equal to or greater than a predetermined value and increased when the maximum value of the motion vectors is smaller than the predetermined value.

[0010] As the sound/silence determination data, for example, the average power of the input audio signal per predetermined period unit, the cumulative power of the input audio signal per predetermined period unit, the average amplitude of the input audio signal per predetermined period unit, or the cumulative amplitude of the input audio signal per predetermined period unit is used.
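The four candidate measures listed above can be sketched as a single helper. The function name, the `kind` labels, and the sample values are assumptions made for this illustration; the source names the measures but gives no formulas, so power is taken here as the squared sample value and amplitude as the absolute sample value.

```python
def decision_data(samples, kind="power_mean"):
    """Compute one sound/silence decision value for a single period unit."""
    powers = [s * s for s in samples]          # assumed definition of power
    amplitudes = [abs(s) for s in samples]     # assumed definition of amplitude
    if kind == "power_mean":
        return sum(powers) / len(samples)
    if kind == "power_sum":
        return sum(powers)
    if kind == "amplitude_mean":
        return sum(amplitudes) / len(samples)
    if kind == "amplitude_sum":
        return sum(amplitudes)
    raise ValueError(f"unknown kind: {kind}")


frame = [1.0, -2.0, 2.0, -1.0]  # hypothetical samples for one period unit
for kind in ("power_mean", "power_sum", "amplitude_mean", "amplitude_sum"):
    print(kind, decision_data(frame, kind))
```

Whichever measure is chosen, the sound/silence determination threshold has to be expressed on the same scale, since a cumulative value grows with the number of samples in the period while an average does not.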

[0011] A speech speed conversion device according to the present invention receives a video signal and an audio signal synchronized with the video signal as input and converts the speech speed of the input audio signal. It comprises a section discrimination means for discriminating whether the input audio signal is a sound section or a silent section, an expansion means for expanding the input audio signal on the time axis in a sound section, and a means for compressing on the time axis or deleting the input audio signal in a silent section. The section discrimination means is characterized by comprising a calculation means for calculating sound/silence determination data from the input audio signal, a determination means for determining whether the input audio is a sound section or a silent section by comparing the obtained sound/silence determination data with a sound/silence determination threshold, a motion detection means for detecting the motion of an image based on the input video signal, and a threshold control means for changing the sound/silence determination threshold based on the detection result of the motion detection means.

[0012] As the motion detection means, for example, a means that detects a motion vector for each of a plurality of areas set within one screen is used. As the threshold control means, for example, a means is used that controls the sound/silence determination threshold so that the threshold is decreased when the maximum value of the motion vectors is equal to or greater than a predetermined value and increased when the maximum value of the motion vectors is smaller than the predetermined value.

[0013] As the sound/silence determination data, for example, the average power of the input audio signal per predetermined period unit, the cumulative power of the input audio signal per predetermined period unit, the average amplitude of the input audio signal per predetermined period unit, or the cumulative amplitude of the input audio signal per predetermined period unit is used.

[0014]

BEST MODE FOR CARRYING OUT THE INVENTION

[0015] An embodiment of the present invention is described below with reference to the drawings.

[0016] FIG. 1 shows the configuration of the speech speed conversion device.

[0017] Here, a case is described in which, during double-speed playback on a VTR, the audio signal of sound sections is output at a speed lower than the double-speed playback output and the audio signal of silent sections is deleted.

[0018] The video signal read from the VTR at double playback speed is sent to the motion vector detection unit 1. As is well known, the motion vector detection unit 1 generates data for detecting motion vectors based on the representative point matching method.

[0019] The representative point matching method is briefly described. As shown in FIG. 2, a plurality of motion vector detection areas A0 to E7 are set within the video area 100 of each frame. The motion vector detection areas A0 to E7 all have the same size. As shown in FIG. 3, each of the motion vector detection areas A0 to E7 is further divided into a plurality of small areas e. As shown in FIG. 4, a plurality of sampling points S and one representative point R are set in each small area e.

[0020] The difference between the video signal level at each sampling point S in each small area e of the current frame and the video signal level at the representative point R of the corresponding small area e of the previous frame (the correlation value at each sampling point) is obtained for each of the motion vector detection areas A0 to E7. Then, for each of the motion vector detection areas A0 to E7, the correlation values of sampling points that have the same displacement relative to the representative point R are cumulatively added across all small areas in that motion vector detection area. Thus, for each of the motion vector detection areas A0 to E7, as many cumulative correlation values are obtained as there are sampling points in one small area e.

[0021] Within each of the motion vector detection areas A0 to E7, the displacement of the point with the smallest cumulative correlation value, that is, the displacement of the point with the highest correlation, is extracted as the motion vector (the motion of the subject) of that motion vector detection area.
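The representative point matching steps for a single motion vector detection area can be sketched as follows. This is an illustration under stated assumptions, not the circuit of the motion vector detection unit 1: the function name, the square search range `search`, the use of the absolute difference as the correlation value, and the toy frame data are all hypothetical.

```python
def representative_point_match(prev_frame, cur_frame, rep_points, search=2):
    """Return the (dy, dx) displacement with the smallest accumulated difference.

    prev_frame, cur_frame: 2D lists of video signal levels for one detection area.
    rep_points: (row, col) of the representative point R of each small area e.
    search: sampling points S lie within +/- search pixels of R (assumption).
    """
    accumulated = {}
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            total = 0
            for row, col in rep_points:
                # Correlation value: current-frame sampling point vs. previous-frame R.
                total += abs(cur_frame[row + dy][col + dx] - prev_frame[row][col])
            # One accumulated value per displacement, summed over all small areas.
            accumulated[(dy, dx)] = total
    # Smallest accumulated value means highest correlation.
    return min(accumulated, key=accumulated.get)


# Toy 12x12 area whose content moves down 1 pixel and right 2 pixels.
SIZE = 12
prev_frame = [[row * 100 + col for col in range(SIZE)] for row in range(SIZE)]
cur_frame = [[prev_frame[(row - 1) % SIZE][(col - 2) % SIZE] for col in range(SIZE)]
             for row in range(SIZE)]
rep_points = [(3, 3), (3, 8), (8, 3), (8, 8)]

print(representative_point_match(prev_frame, cur_frame, rep_points))  # (1, 2)
```

The dictionary holds one accumulated value per displacement, which matches the statement that the number of cumulative correlation values equals the number of sampling points in one small area.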

[0022] The data generated by the motion vector detection unit 1 is sent to the sound/silence determination threshold control unit 2, which determines the sound/silence determination threshold. The motion determination threshold set by the motion determination threshold setting unit 7 is also input to the sound/silence determination threshold control unit 2. The motion determination threshold setting unit 7 stores motion determination thresholds corresponding to program modes, and the motion determination threshold corresponding to the program mode entered by the user is sent to the sound/silence determination threshold control unit 2.

[0023] Table 1 shows an example of the relative values of the motion determination threshold for each program mode.

[0024]

[Table 1]

[0025] For example, comparing the relative value of the motion determination threshold for a news program with that for a sports program (golf program), the relative value for sports is set larger. In a news program, whether the newscaster's mouth is moving is an important factor in sound/silence determination, whereas in a golf program, whether the golf club is moving is the important factor. The motion vector for mouth movement is small, and the motion vector for golf club movement is large. Therefore, for a news program the motion determination threshold is set to a small value so that the presence or absence of the newscaster's mouth movement can be determined, and for a golf program the motion determination threshold is set to a large value so that small movements such as mouth movement are ignored and the presence or absence of large movements such as golf club movement can be determined.

[0026] The sound/silence determination threshold control unit 2 compares the maximum value of the motion vectors detected in the motion vector detection areas A0 to E7 with the motion determination threshold. When the maximum value of the motion vectors detected in the motion vector detection areas A0 to E7 is equal to or greater than the motion determination threshold, the unit decreases the sound/silence determination threshold. When the maximum value of the motion vectors is smaller than the motion determination threshold, it increases the sound/silence determination threshold.
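The threshold control rule can be sketched as below. The source states only that the threshold is decreased or increased, so the two levels `low` and `high`, the function name, and the use of the Euclidean length as the size of a motion vector are assumptions made for this illustration.

```python
import math


def control_threshold(motion_vectors, motion_threshold, low, high):
    """Pick the sound/silence determination threshold from the largest motion vector.

    motion_vectors: one (dx, dy) per detection area (A0 to E7 in the source).
    low, high: hypothetical threshold levels; the source gives no values.
    """
    largest = max(math.hypot(dx, dy) for dx, dy in motion_vectors)
    # Motion at or above the motion determination threshold lowers the audio threshold.
    return low if largest >= motion_threshold else high


vectors = [(0, 0), (1, 0), (3, 4)]  # largest magnitude is 5.0
print(control_threshold(vectors, 4.0, low=0.0003, high=0.005))  # 0.0003
print(control_threshold(vectors, 6.0, low=0.0003, high=0.005))  # 0.005
```

Using the same motion vectors with a larger motion determination threshold leaves the sound/silence determination threshold high, which is how a sports program mode could ignore small movements such as mouth motion.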

[0027] The input audio signal read from the VTR at double playback speed is sent to the power calculation unit 3. The power calculation unit 3 calculates the average power of the audio signal over a predetermined period unit, for example, one frame. The average power calculated by the power calculation unit 3 is sent to the sound/silence section determination unit 4.

[0028] The sound/silence section determination unit 4 compares the average power sent from the power calculation unit 3 with the sound/silence determination threshold determined by the sound/silence determination threshold control unit 2. When the average power is equal to or greater than the sound/silence determination threshold, the current frame is determined to be a sound section. When the average power is smaller than the sound/silence determination threshold, the current frame is determined to be a silent section.
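The per-frame judgement with a threshold that changes from frame to frame can be sketched as follows. The function names, the sample values, and the two threshold levels are hypothetical; only the comparison rule (at or above the threshold means a sound section) is taken from the text.

```python
def average_power(frame):
    """Average power of one frame of audio samples."""
    return sum(s * s for s in frame) / len(frame)


def judge_sections(audio_frames, thresholds):
    """Label each frame 'sound' or 'silent' using the threshold set for that frame."""
    labels = []
    for frame, threshold in zip(audio_frames, thresholds):
        labels.append("sound" if average_power(frame) >= threshold else "silent")
    return labels


quiet = [0.02, -0.03, 0.02, -0.02]  # hypothetical quiet frame, power 0.000525
LOW, HIGH = 0.0003, 0.005           # hypothetical lowered and raised thresholds

# Same audio in both frames: motion in the first frame lowered its threshold.
print(judge_sections([quiet, quiet], [LOW, HIGH]))  # ['sound', 'silent']
```

The identical quiet frame is kept when image motion has lowered the threshold and dropped when it has not, which is the intended effect of coupling the audio decision to the video.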

[0029] The input audio signal and the determination result of the sound/silence section determination unit 4 are sent to the speech speed control unit 5. The speech speed control unit 5 includes a silence deletion unit 51 and a time-axis expansion unit 52. An audio signal determined to be a sound section undergoes time-axis expansion processing by the time-axis expansion unit 52 so that it is output at a speed lower than the double-speed playback output. An audio signal determined to be a silent section undergoes time-axis compression processing by the silence deletion unit 51.
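The division of work inside the speech speed control unit 5 can be sketched as below for the case in which silent sections are deleted. The source does not say how the time-axis expansion unit 52 stretches the signal, so this example substitutes a naive linear-interpolation stretch, which also lowers the pitch and is used here only as a stand-in. The function names and the expansion `factor` are hypothetical.

```python
def stretch(frame, factor):
    """Naive time-axis expansion by linear interpolation (stand-in technique)."""
    n_out = int(len(frame) * factor)
    out = []
    for i in range(n_out):
        pos = i / factor                       # position in the input frame
        left = int(pos)
        right = min(left + 1, len(frame) - 1)  # clamp at the last sample
        frac = pos - left
        out.append(frame[left] * (1 - frac) + frame[right] * frac)
    return out


def convert_speech_speed(frames, labels, factor=1.5):
    """Expand frames labelled 'sound' and drop frames labelled 'silent'."""
    output = []
    for frame, label in zip(frames, labels):
        if label == "sound":
            output.extend(stretch(frame, factor))
        # Silent frames are deleted: nothing is appended for them.
    return output


frames = [[0.0, 1.0], [5.0, 5.0], [2.0, 4.0]]
labels = ["sound", "silent", "sound"]
print(convert_speech_speed(frames, labels, factor=2.0))
# [0.0, 0.5, 1.0, 1.0, 2.0, 3.0, 4.0, 4.0]
```

A practical device would more likely use a pitch-preserving stretch, and the time freed by deleting silent frames limits how far the sound frames can be expanded while the audio stays aligned with the picture.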

[0030] The audio signal of a sound section that has undergone time-axis expansion processing by the time-axis expansion unit 52 is temporarily stored in the audio memory 6. The audio signal stored in the audio memory 6 is read out sequentially and output.

[0031] In the above embodiment, the power calculation unit 3 calculates the average power of the audio signal over a predetermined period unit. Alternatively, the cumulative power of the input audio signal per predetermined period unit, the average amplitude of the input audio signal per predetermined period unit, or the cumulative amplitude of the input audio signal per predetermined period unit may be calculated.

[0032] In the above embodiment, an audio signal determined to be a silent section is deleted by the silence deletion unit 51. Alternatively, time-axis compression processing may be performed on an audio signal determined to be a silent section. In that case, the audio signal of the silent section that has undergone time-axis compression processing is also temporarily stored in the audio memory 6.

[0033] In the above embodiment, the sound/silence determination threshold is decreased when the maximum value of the motion vectors detected in the motion vector detection areas A0 to E7 is equal to or greater than the motion determination threshold, and is increased when the maximum value of the motion vectors is smaller than the motion determination threshold.

[0034] Therefore, in programs such as dramas and news, for example, the performer's mouth moves in scenes where the performer is speaking, so the sound/silence determination threshold is decreased. As a result, sections in which the performer speaks quietly are more readily determined to be sound sections, and the performer's voice becomes easier to hear.

[0035] Likewise, in golf programs, tennis programs, and the like, a golf club, racket, or the like moves in scenes where a swing sound occurs, so the sound/silence determination threshold is decreased. As a result, sections in which a swing sound occurs are more readily determined to be sound sections, and the swing sound becomes easier to hear.

[0036]

Effects of the Invention: According to the present invention, a sound/silence determination device that can improve sound/silence determination accuracy by using the motion of an image is realized. Also according to the present invention, a speech speed conversion device equipped with a highly accurate sound/silence section determination means is realized.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of the speech speed conversion device.

FIG. 2 is a schematic diagram showing the plurality of motion vector detection areas set in the video area of each frame.

FIG. 3 is a schematic diagram showing the small areas e set in each motion vector detection area.

FIG. 4 is a schematic diagram showing the plurality of sampling points S and the one representative point R set in each small area e.

[Explanation of symbols]

1 Motion vector detection unit
2 Sound/silence determination threshold control unit
3 Power calculation unit
4 Sound/silence section determination unit
5 Speech speed control unit
6 Audio memory
7 Motion determination threshold setting unit
51 Silence deletion unit
52 Time-axis expansion unit

Claims

1. A sound/silence determination device that receives a video signal and an audio signal synchronized with the video signal as input, and that comprises a means for calculating sound/silence determination data from the input audio signal and a determination means for determining whether the input audio is a sound section or a silent section by comparing the obtained sound/silence determination data with a sound/silence determination threshold, the sound/silence determination device being characterized by comprising: a motion detection means for detecting the motion of an image based on the input video signal; and a threshold control means for changing the sound/silence determination threshold based on the detection result of the motion detection means.

2. The sound/silence determination device according to claim 1, wherein the motion detection means detects a motion vector for each of a plurality of areas set within one screen, and the threshold control means controls the sound/silence determination threshold so that the threshold is decreased when the maximum value of the motion vectors is equal to or greater than a predetermined value and increased when the maximum value of the motion vectors is smaller than the predetermined value.

3. The sound/silence determination device according to claim 1, wherein the sound/silence determination data is one arbitrarily selected from the average power of the input audio signal per predetermined period unit, the cumulative power of the input audio signal per predetermined period unit, the average amplitude of the input audio signal per predetermined period unit, and the cumulative amplitude of the input audio signal per predetermined period unit.

4. A speech speed conversion device that receives a video signal and an audio signal synchronized with the video signal as input and converts the speech speed of the input audio signal, comprising: a section discrimination means for discriminating whether the input audio signal is a sound section or a silent section; an expansion means for expanding the input audio signal on the time axis in a sound section; and a means for compressing on the time axis or deleting the input audio signal in a silent section, wherein the section discrimination means comprises: a calculation means for calculating sound/silence determination data from the input audio signal; a determination means for determining whether the input audio is a sound section or a silent section by comparing the obtained sound/silence determination data with a sound/silence determination threshold; a motion detection means for detecting the motion of an image based on the input video signal; and a threshold control means for changing the sound/silence determination threshold based on the detection result of the motion detection means.

5. The speech speed conversion device according to claim 4, wherein the motion detection means detects a motion vector for each of a plurality of areas set within one screen, and the threshold control means controls the sound/silence determination threshold so that the threshold is decreased when the maximum value of the motion vectors is equal to or greater than a predetermined value and increased when the maximum value of the motion vectors is smaller than the predetermined value.

6. The speech speed conversion device according to claim 4, wherein the sound/silence determination data is one arbitrarily selected from the average power of the input audio signal per predetermined period unit, the cumulative power of the input audio signal per predetermined period unit, the average amplitude of the input audio signal per predetermined period unit, and the cumulative amplitude of the input audio signal per predetermined period unit.