JP5016556B2

JP5016556B2 - Scene change detection apparatus, encoding apparatus, and scene change detection method

Info

Publication number: JP5016556B2
Application number: JP2008153197A
Authority: JP
Inventors: 康明笹倉
Original assignee: Renesas Electronics Corp
Current assignee: Renesas Electronics Corp
Priority date: 2008-06-11
Filing date: 2008-06-11
Publication date: 2012-09-05
Anticipated expiration: 2028-06-11
Also published as: JP2009302767A

Description

The present invention relates to a scene change detection device, an encoding device, and a scene change detection method, and more particularly to a scene change detection device, an encoding device, and a scene change detection method that detect a scene change based on change points in input audio and input images.

A video recorder that records television broadcasts and the like can detect commercial message (CM) portions in the recorded information and skip or delete them. Conventionally, CM sections in a television broadcast have been detected by combining audio feature information, such as a switch in the audio signal (for example, between stereo and monaural), with video feature information, such as a change in video luminance. CM section detection may also use time features such as time information. When time feature information is used, one unit is defined as, for example, 15 seconds, and one CM section is defined as an integer multiple of that unit time. Various devices and methods for detecting commercial portions in such recorders have been proposed. Examples are disclosed in Patent Documents 1 and 2 (hereinafter referred to as Conventional Examples 1 and 2).

In Conventional Example 1, scene change points are detected from the input image and input audio, and signal sections are calculated from the detected scene change points. The apparatus has specific section determination means that determines whether a signal section matches the specific features of a CM. Here, the specific features of a CM are the preset audio feature information, video feature information, or time feature information described above (hereinafter collectively referred to as feature information).

Conventional Example 2, on the other hand, has detection means that detects edge black frame information and fade black frame information based on image information, time difference means that determines first time difference information based on two adjacent items of the edge black frame information or fade black frame information, and test means that determines whether the first time difference information falls within first preset time difference information. That is, in Conventional Example 2, when the first time difference information falls within the first preset time difference information, that section is detected as a commercial section. The first preset time difference information is a set of ranges, each spanning 0.5 seconds before and after one of a plurality of set values such as 10 seconds, 20 seconds, and 30 seconds.
Patent Document 1: JP 2004-147204 A
Patent Document 2: Published Japanese Translation of PCT Application No. 2003-534757
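As a rough illustration of the test used in Conventional Example 2, the following sketch checks whether the interval between two black-frame events lies within 0.5 seconds of a preset value. The function name and parameter names are hypothetical; only the 10/20/30-second set values and the 0.5-second margin come from the description above.

```python
def matches_preset(time_diff_s, presets=(10.0, 20.0, 30.0), tolerance_s=0.5):
    """Return True if time_diff_s lies within tolerance_s of any preset value.

    presets and tolerance_s mirror the example values in the text; a real
    device would hold its own set of preset time differences.
    """
    return any(abs(time_diff_s - p) <= tolerance_s for p in presets)
```

A section of 20.3 seconds passes this test, while a section of 17 seconds does not, which is why CMs of arbitrary length are missed by this approach.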

However, there are cases in which the CM unit time is not prescribed (for example, in European and American broadcasts), cases in which no change such as stereo/monaural switching occurs, and cases, such as movie broadcasts, in which a program section contains video and audio features similar to those at a switch between a CM section and a program section. For such broadcast forms, Conventional Examples 1 and 2 have difficulty detecting CM sections with high accuracy. That is, Conventional Examples 1 and 2 detect as CM portions only those sections that match a predefined signal section. Consequently, when recording a television broadcast whose CM periods have indefinite lengths (for example, when the length of each CM is random), the CM portions may not be detected.

One aspect of the present invention is a scene change detection device including: an audio information determination unit that detects a silent state of an audio signal and outputs silence detection time information; an image information determination unit that detects a luminance change or a decrease in luminance level of an image in an image signal and stores scene change time information indicating the time at which the luminance change or decrease in luminance level was detected; a scene change candidate point detection unit that outputs a first scene change candidate time based on the silence detection time information and the scene change time information; a scene change candidate point determination unit that determines the validity of the first scene change candidate time based on the amount of change in the audio level of the audio signal at a time later than the first scene change candidate time and on a preset determination minimum value, and outputs the first scene change candidate time determined to be valid as a second scene change candidate time; and an output determination unit that outputs scene change detection information based on the time difference between successive second scene change candidate times and on a detection maximum value.

Another aspect of the present invention is a scene change detection method in a scene change detection device that receives video composed of an audio signal and an image signal. The method detects a silent state of the audio signal and generates silence detection time information; detects a luminance change or a decrease in luminance level of an image in the image signal and generates scene change time information indicating the time at which the luminance change or decrease in luminance level was detected; generates a first scene change candidate time based on the silence detection time information and the scene change time information; determines the validity of the first scene change candidate time based on the amount of change in the audio level of the audio signal at a time later than the first scene change candidate time and on a preset determination minimum value; outputs the first scene change candidate time determined to be valid as a second scene change candidate time; and outputs scene change detection information when the time difference between successive second scene change candidate times is equal to or less than a detection maximum time.

When detecting a scene change, the scene change detection device, the scene change detection method, and the encoding device including them according to the present invention output a first scene change candidate time based on the silence detection time and on scene change time information indicating the time at which the image luminance changed. The validity of the first scene change candidate time is then determined based on the amount of change in the audio level of the audio signal after the first scene change candidate time. That is, it is determined whether the first scene change candidate time legitimately lies between a program section and a CM section. A valid first scene change candidate time is taken as a second scene change candidate time, and scene change detection information is output from the time difference between the current second scene change candidate time and the preceding one. Thus, based on second scene change candidate times detected with high accuracy, the scene change detection device, the scene change detection method, and the encoding device including them can distinguish and detect each section with high accuracy, without holding a plurality of settings, even when switches occur between programs whose section lengths are indefinite.

Another aspect of the present invention is a scene change detection device including: an audio information determination unit that detects a silent state of an audio signal and outputs silence detection time information; an image information determination unit that detects a luminance change or a decrease in luminance level of an image in an image signal and stores scene change time information indicating the time at which the luminance change or decrease in luminance level was detected; and a section determination unit that, when the scene change time information falls within a silence detection period set based on the silence detection time information, outputs scene change detection information indicating a change point of the video signal containing the audio signal and the image signal, in response to detection, after the time indicated by the scene change time information, of an amount of change in the audio level of the audio signal that is equal to or greater than a preset determination minimum value.

According to the scene change detection device of the present invention, when the scene change time information falls within the silence detection period set based on the silence detection time information, scene change detection information is output in response to detection, after the time indicated by the scene change time information, of an amount of change in the audio level of the audio signal that is equal to or greater than the preset determination minimum value. That is, by basing the generation of scene change detection information on the amount of change in audio level, which is characteristic of a switch between sections, program sections and CM sections can be distinguished more accurately. The scene change detection information output by the scene change detection device of the present invention can therefore indicate the time of the boundary between a program section and a CM section with higher accuracy.

According to the scene change detection device and the scene change detection method of the present invention, the sections to be detected can be detected accurately even when their lengths are indefinite.

Embodiment 1
Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 shows a block diagram of an encoding device 1 according to Embodiment 1. As shown in FIG. 1, the encoding device 1 includes an analog/digital converter (A/D converter) 10, a delay circuit (Time Base Corrector unit: TBC unit) 20, an encoder 30, a storage device 40, an application execution unit (CPU) 50, and a scene change detection device 60.

The A/D converter 10 converts an analog image signal and an analog audio signal into a digital image signal and a digital audio signal and outputs them. In the following description, the digital image signal and the digital audio signal are simply referred to as the image signal and the audio signal. The TBC unit 20 is a kind of delay circuit that reduces temporal fluctuation of the image signal by, for example, adjusting the delay of the image signal output from the A/D converter 10. The encoder 30 encodes the image signal output from the TBC unit 20 and the audio signal output from the A/D converter 10 in accordance with a predetermined format (for example, MPEG2 (Moving Picture Experts Group 2)). The storage device 40 stores the data encoded by the encoder 30.

The application execution unit 50 executes application software such as recording software and is implemented by, for example, a central processing unit (CPU). The application execution unit 50 receives scene change detection information from the scene change detection device 60 described later, generates an information file describing, for example, CM (Commercial Message) sections, and stores the file in the storage device 40 together with the recorded data.

The scene change detection device 60 outputs scene change detection information based on the image signal output from the TBC unit 20 and the audio signal output from the A/D converter 10. The scene change detection device 60 includes an audio information determination unit 70, an image information determination unit 80, and a section determination unit 90. The audio information determination unit 70 analyzes the audio level of the audio signal and outputs silence detection time information A. The audio information determination unit 70 has a silence period determination unit 71. The silence period determination unit 71 takes as a start time the time at which the audio level of the audio signal falls to or below a preset silence determination level. If the audio level does not exceed that threshold throughout a preset silence determination period beginning at the start time, the unit outputs the end time of the silence determination period as the silence detection time information A.
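The behavior of the silence period determination unit 71 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented circuit: the audio is assumed to arrive as sampled (time, level) pairs, the function and parameter names are hypothetical, and emitting only one detection per quiet run is an assumption the text does not state.

```python
def detect_silence_times(times, levels, silence_level, silence_period):
    """Return the silence detection times (the 'A' values).

    times, levels: equal-length sequences of sample timestamps and audio levels.
    A detection is emitted when the level has stayed at or below silence_level
    for silence_period, counted from the moment it first dropped to that level.
    """
    detections = []
    start = None       # time at which the current quiet run began
    reported = False   # assumption: report each quiet run only once
    for t, level in zip(times, levels):
        if level <= silence_level:
            if start is None:
                start, reported = t, False
            if not reported and t - start >= silence_period:
                # End time of the silence determination period.
                detections.append(start + silence_period)
                reported = True
        else:
            start = None  # level exceeded the threshold: restart the search
    return detections
```

With samples at one-unit spacing, a quiet run that begins at time 1 and lasts at least 3 units yields a single detection at time 4; a quiet run that is interrupted earlier yields none.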

The image information determination unit 80 detects a luminance change or a decrease in luminance level in the image signal output from the TBC unit 20 and stores the time at which the luminance change or decrease in luminance level was detected as scene change time information. The image information determination unit 80 includes a scene change detection unit 81, a black image detection unit 82, an OR circuit 83, and a scene change time information storage unit 84.

The scene change detection unit 81 determines that a scene change has occurred at the current time when, for example, the difference between the luminance level of the image signal at the previous time and that at the current time is equal to or greater than a preset luminance change width threshold, and outputs the current time as a scene change detection result B. The black image detection unit 82 determines that a change to a black image has occurred at the current time when the absolute value of the luminance level of the image signal at the current time is equal to or less than a preset luminance level threshold, and outputs the current time as a black image detection result C.
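The two image tests can be sketched per frame as below. The sketch assumes each frame has been reduced to a single luminance level (for example, a frame average) and that the frame-to-frame difference is taken as an absolute value; both are assumptions, as are the function and parameter names.

```python
def detect_scene_changes(times, luma, change_threshold):
    """Times at which luminance differs from the previous frame by at least
    change_threshold (corresponds to detection result B)."""
    result = []
    for i in range(1, len(luma)):
        if abs(luma[i] - luma[i - 1]) >= change_threshold:
            result.append(times[i])
    return result


def detect_black_frames(times, luma, black_threshold):
    """Times at which the luminance level is at or below black_threshold
    (corresponds to detection result C)."""
    return [t for t, y in zip(times, luma) if abs(y) <= black_threshold]
```

Note that this black-frame test flags every dark frame, not only the first frame of a dark run; whether the actual unit 82 reports only the transition is not specified in the text.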

The OR circuit 83 outputs the logical OR of the scene change detection result B and the black image detection result C as scene change times. That is, when the scene change detection result B and the black image detection result C indicate the same time, the OR circuit 83 outputs that time as a scene change time; when they indicate different times, it outputs each of them as a scene change time. The scene change time information storage unit 84 stores the scene change times output from the OR circuit 83 as scene change time information D.
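In software terms, the OR circuit 83 amounts to taking the union of the two sets of detection times. A minimal sketch, with a hypothetical function name and with sorting added only for convenience:

```python
def merge_detection_times(scene_change_times, black_image_times):
    """Union of results B and C: a time reported by both appears once,
    and times reported by only one of them are all kept."""
    return sorted(set(scene_change_times) | set(black_image_times))
```

For example, merging B = [2, 5] with C = [2, 3] gives the scene change times 2, 3, and 5.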

The section determination unit 90 outputs scene change detection information based on the silence detection time information A output from the audio information determination unit 70 and the scene change time information D stored in the image information determination unit 80. The section determination unit 90 includes a scene change candidate point detection unit 91, a scene change candidate point determination unit 92, an output determination unit 93, a determination minimum value storage unit 94, and a detection maximum value storage unit 95. The scene change candidate point detection unit 91 sets a predetermined period that includes the time indicated by the silence detection time information A as a silence detection period. When a scene change time extracted from the scene change time information D falls within the silence detection period, the scene change candidate point detection unit 91 outputs that scene change time as a scene change candidate time E. When a plurality of scene change times fall within the silence detection period, the scene change candidate point detection unit 91 outputs the earliest of them as the scene change candidate time E.
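The candidate selection performed by the scene change candidate point detection unit 91 can be sketched as follows. The parameters `pre_s` and `post_s`, which give the extent of the silence detection period before and after time A, are hypothetical names, and treating the period boundaries as inclusive is an assumption.

```python
def find_candidate(silence_time, scene_change_times, pre_s, post_s):
    """Return the earliest scene change time inside the silence detection
    period around silence_time, or None if there is none."""
    window_start = silence_time - pre_s
    window_end = silence_time + post_s
    inside = [t for t in scene_change_times if window_start <= t <= window_end]
    return min(inside) if inside else None
```

With A = 10.0 and a period of one second on each side, scene change times 9.5 and 10.5 both fall inside the period and 9.5 is returned as candidate E.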

The scene change candidate point determination unit 92 compares the amount of change in audio level occurring immediately after the scene change candidate time E (in this embodiment, the audio level increase ΔAUDIO) with the determination minimum value A2_th stored in the determination minimum value storage unit 94. More specifically, the scene change candidate point determination unit 92 sets a predetermined period (for example, a candidate point determination period Δt) after the scene change candidate time E. If it detects an audio level increase ΔAUDIO equal to or greater than the determination minimum value A2_th during the candidate point determination period Δt, it determines that scene change candidate time E to be valid and outputs it as a valid scene change candidate time F. If no audio level increase ΔAUDIO equal to or greater than the determination minimum value A2_th is detected during the candidate point determination period Δt, it determines that scene change candidate time E to be invalid and discards it. The determination minimum value storage unit 94 stores the determination minimum value. This determination minimum value may be sent from the application execution unit 50 or may be a preset value.
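The validity check can be sketched as below. How ΔAUDIO is measured is not spelled out in this passage, so the sketch assumes it is the rise of the audio level above the level at the candidate time, observed anywhere within Δt; that choice and all names are assumptions made for illustration.

```python
def is_valid_candidate(candidate_time, times, levels, delta_t, a2_th):
    """True if the audio level rises by at least a2_th within delta_t
    after candidate_time (assumed definition of the increase)."""
    window = [lv for t, lv in zip(times, levels)
              if candidate_time <= t <= candidate_time + delta_t]
    if not window:
        return False  # no audio samples in the determination period
    base = window[0]  # level at (or just after) the candidate time
    return max(window) - base >= a2_th
```

A candidate followed by a sharp rise in level, as when a CM or program starts, is kept; a candidate followed by continued quiet is discarded.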

The output determination unit 93 calculates detection sections from the differences between successive valid scene change candidate times F, compares the length of each detection section with the detection maximum value CMmax stored in the detection maximum value storage unit 95, and outputs scene change detection information. The determination process for the scene change detection information is described in detail later. The detection maximum value storage unit 95 stores the detection maximum value CMmax. The detection maximum value CMmax may be sent from the application execution unit 50 or may be a preset value. The detection maximum value CMmax is defined as, for example, the maximum length of a single CM; in the following description, a section shorter than the detection maximum value is treated as a CM section and a longer section as a main program section.

The operation of the scene change candidate point detection unit 91 will now be described in detail. FIG. 2 shows a flowchart that includes the operation of the scene change candidate point detection unit 91. In the flow of FIG. 2, the scene change candidate point detection unit 91 first operates based on the operation of the audio information determination unit 70 and then operates further based on the operation of the image information determination unit 80.

According to the flowchart shown in FIG. 2, when detection of a scene change candidate time starts, the audio information determination unit 70 first analyzes the audio level of the audio signal and the silence detection time information A is acquired (step S1). Next, the scene change candidate point detection unit 91 adds a pre-detection period value before the time indicated by the silence detection time information A and a post-detection period value after that time, and sets the span covered by the pre-detection period value and the post-detection period value together as the silence detection period (step S2).

In step S3, the scene change time information D stored in the image information determination unit 80 is referenced to determine whether any scene change time falls within the silence detection period. If no scene change time falls within the silence detection period in step S3, no scene change candidate time is output. If a scene change time does fall within the silence detection period in step S3, the scene change candidate point detection unit 91 acquires that scene change time as the n-th scene change candidate time Ctime(n) (step S4). Here, n is an integer indicating the order of the scene change candidate points detected in the scene change detection operation of the scene change detection device of this embodiment. The same meaning of n applies in the following description.

Next, in step S5, the scene change candidate point determination unit 92 refers to the determination minimum value A2_th stored in the determination minimum value storage unit 94 and determines whether Ctime(n) is valid. Specifically, it determines whether the audio level increase ΔAUDIO occurring in the candidate point determination period Δt immediately after Ctime(n) (for example, a period of up to 500 msec) exceeds the determination minimum value A2_th. If the audio level increase ΔAUDIO is below the determination minimum value A2_th in step S5, Ctime(n) is discarded as an erroneously detected scene change candidate time (step S6). When no scene change time falls within the silence detection period in step S3, or when the scene change candidate time is invalidated in step S6, the process ends with no valid scene change candidate time. If the audio level increase ΔAUDIO exceeds the determination minimum value A2_th in step S5, the detected scene change candidate time is output as the valid scene change candidate time F (step S7), and the process ends.

FIG. 3 shows a timing chart of the operations of the scene change candidate point detection unit 91 and the scene change candidate point determination unit 92, and these operations are described in further detail below. First, when the audio level of the audio signal input to the audio information determination unit 70 decreases and falls below the silence determination level A1_th, the audio information determination unit 70 starts silence determination with that time as the start time. If the audio level does not rise above the silence determination level A1_th while the preset silence determination period t1 elapses from the silence determination start time, the audio information determination unit 70 determines the time at which the silence determination period has elapsed to be the silence detection time and outputs that time as the silence detection time information A.

Next, as the predetermined period, the scene change candidate point detection unit 91 adds the pre-detection period value before the time indicated by the silence detection time information A and the post-detection period value after that time, and sets the span covered by the pre-detection period value and the post-detection period value together as the silence detection period. Even if the audio level of the audio signal rises above the silence determination level A1_th during this silence detection period, the silence detection period is set uniquely and includes that interval. When scene change times fall within this silence detection period, the scene change candidate point detection unit 91 outputs the earliest of them as the scene change candidate time Ctime(n).

If the audio level increase ΔAUDIO exceeds the determination minimum value A2_th during the candidate point determination period Δt following the scene change candidate time Ctime(n), the scene change candidate point determination unit 92 determines that scene change candidate time Ctime(n) to be valid and outputs it as the valid scene change candidate time Ctime(n). If the audio level increase ΔAUDIO does not exceed the determination minimum value during the candidate point determination period Δt following the scene change candidate time Ctime(n), the unit determines that scene change candidate time Ctime(n) to be invalid and discards it. In that case, no valid scene change candidate time F is output.

Next, the operation of the output determination unit 93 is described in detail. FIG. 4 is a flowchart of that operation. As shown in FIG. 4, the output determination unit 93 first calculates the second detection interval Ltime(n−1) as the difference between the scene change candidate time two positions before the current one, Ctime(n−2), and the immediately preceding scene change candidate time Ctime(n−1) (step S10). It then calculates the first detection interval Ltime(n) as the difference between the immediately preceding scene change candidate time Ctime(n−1) and the current scene change candidate time Ctime(n) (step S11).

The unit then compares the first detection interval Ltime(n) with the detection maximum value CMmax (step S12). If step S12 finds that Ltime(n) is equal to or less than CMmax, the second detection interval Ltime(n−1) is compared with CMmax (step S13). If step S13 finds that Ltime(n−1) is larger than CMmax, the scene change candidate time Ctime(n−1) is output as the CM start time (step S14). If step S13 finds that Ltime(n−1) is equal to or less than CMmax, the process ends without outputting scene change detection information.

If, on the other hand, step S12 finds that the first detection interval Ltime(n) is larger than the detection maximum value CMmax, the second detection interval Ltime(n−1) is compared with CMmax (step S15). If step S15 finds that Ltime(n−1) is equal to or less than CMmax, the scene change candidate time Ctime(n−1) is output as the CM end time (step S16). If step S15 finds that Ltime(n−1) is larger than CMmax, the process ends without outputting scene change detection information.
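The decision flow of steps S10 to S16 can be summarized in a short sketch. The function name, the string labels, and the tuple return value are illustrative assumptions; only the comparisons themselves follow the flow described above.

```python
from typing import Optional, Tuple


def judge_output(
    c_prev2: float, c_prev1: float, c_now: float, cm_max: float
) -> Optional[Tuple[str, float]]:
    """Apply steps S10 to S16 to three consecutive scene change candidate times.

    c_prev2, c_prev1, c_now stand for Ctime(n-2), Ctime(n-1), Ctime(n).
    """
    l_prev = c_prev1 - c_prev2  # S10: second detection interval Ltime(n-1)
    l_now = c_now - c_prev1     # S11: first detection interval Ltime(n)
    if l_now <= cm_max:         # S12
        if l_prev > cm_max:     # S13
            return ("cm_start", c_prev1)  # S14
        return None             # both intervals short: no output
    if l_prev <= cm_max:        # S15
        return ("cm_end", c_prev1)        # S16
    return None                 # both intervals long: no output


# Hypothetical times in seconds with CMmax = 60.
start = judge_output(0.0, 600.0, 615.0, 60.0)
end = judge_output(600.0, 615.0, 1200.0, 60.0)
```

Here a long interval followed by a short one yields a CM start at 600.0, and a short interval followed by a long one yields a CM end at 615.0.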

FIGS. 5 to 8 schematically show the scene change detection information detected by the scene change detection device described above. FIG. 5 shows the case in which the CM start time is output in step S14. In this case, the valid scene change candidate time F is detected based on the silence detection period, the scene change detection result B, the black image detection result C, and the result of the audio level increase determination. Of the intervals obtained from the three scene change candidate times, the first detection interval Ltime(n) is equal to or less than the detection maximum value CMmax, and the second detection interval Ltime(n−1) is larger than CMmax. The immediately preceding scene change candidate time Ctime(n−1) is therefore detected as the CM start time. FIG. 5 also shows a case in which the scene change detection result B is detected during a silence detection period but the audio level increase ΔAUDIO does not exceed the determination minimum value A2_th at that time. In that case, the scene change candidate time E is discarded by the determination in step S6, and no valid scene change candidate time F is output.

FIG. 6 shows the case in which the CM end time is output in step S16. In this case, the scene change candidate time E is detected based on the silence detection period, the scene change detection result B, the black image detection result C, and the result of the audio level increase determination. Of the intervals obtained from the three scene change candidate times, the first detection interval Ltime(n) is larger than the detection maximum value CMmax, and the second detection interval Ltime(n−1) is equal to or less than CMmax. The immediately preceding scene change candidate time Ctime(n−1) is therefore detected as the CM end time. FIG. 6 also shows a case in which no scene change detection result B is detected during a silence detection period. In that case, even if the audio level increase ΔAUDIO exceeds the determination minimum value A2_th, the scene change candidate point detection unit 91 does not output a scene change candidate time E, and so no valid scene change candidate time F is output.

FIG. 7 shows the case in which step S13 finds that the second detection interval is equal to or less than the detection maximum value. In this case, the scene change candidate time E is detected based on the silence detection period, the scene change detection result B, and the black image detection result C. Of the intervals obtained from the three scene change candidate times E, the first detection interval Ltime(n) is equal to or less than the detection maximum value CMmax, and the second detection interval Ltime(n−1) is also equal to or less than CMmax. The immediately preceding scene change candidate time Ctime(n−1) is therefore not detected as a CM end time. When the next scene change candidate time Ctime(n+1) is then detected, and the third detection interval obtained from the difference between Ctime(n+1) and Ctime(n) is larger than CMmax, the scene change candidate time Ctime(n), which now immediately precedes the current time, is detected as the CM end time.

FIG. 8 shows the case in which step S15 finds that the second detection interval is larger than the detection maximum value CMmax. In this case, the scene change candidate time E is detected based on the silence detection period, the scene change detection result B, and the black image detection result C. Of the intervals obtained from the three scene change candidate times E, the first detection interval Ltime(n) is larger than the detection maximum value CMmax, and the second detection interval Ltime(n−1) is also larger than CMmax. The immediately preceding scene change candidate time Ctime(n−1) is therefore not detected as a CM start time. When the next scene change candidate time Ctime(n+1) is then detected, and the third detection interval obtained from the difference between Ctime(n+1) and Ctime(n) is equal to or less than CMmax, the scene change candidate time Ctime(n), which now immediately precedes the current time, is detected as the CM start time.

As described above, the scene change detection device 60 according to the present embodiment calculates scene change candidate times based on the silence detection period of the audio signal, the change points of the image signal, and the amount of change in the audio signal, and calculates detection intervals from a plurality of scene change candidate times. In doing so, the scene change detection device 60 takes into account the amount of change in audio level between the silence detection period and the period with sound, and discards a detected scene change candidate time unless that amount of change is at least the determination minimum value A2_th. The scene change detection device 60 thus exploits the fact that the difference in audio level between a CM section and a program section is larger than the differences in audio level within each section, and thereby improves the detection accuracy of the scene change candidate times. Based on these accurately detected scene change candidate times, it accurately detects the detection intervals, each of which represents a section between the start and end of a program or between the start and end of a CM. Furthermore, by comparing the detection intervals with the detection maximum value, erroneous detection of the section to be detected can be prevented on the basis of a single set value even when the length of the detection interval varies. In short, the scene change detection device 60 according to the present embodiment can detect the boundary between a program section and a CM section with high accuracy by using accurately detected detection intervals together with the detection maximum value.

In addition, the scene change detection device 60 according to the present embodiment compares each of two adjacent detection intervals with the detection maximum value and, based on the results, decides which scene change candidate time should be detected. This improves detection accuracy compared with a decision based on a single detection interval. For example, as shown in FIG. 8, a scene change candidate time Ctime(n−1) detected within the main program section can be left undetected. Using a plurality of detection intervals in this way also makes it possible to determine the start time and end time of a run of consecutive CM sections. For example, as shown in FIG. 7, when a plurality of CM sections are consecutive, the device can detect the start time of the first CM section and the end time of the last CM section in the run, without detecting the times between the consecutive CM sections. Through such processing, the scene change detection device according to the present embodiment can reduce the number of pieces of scene change detection information it outputs.
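To illustrate how comparing two adjacent detection intervals suppresses the boundaries between consecutive CM sections, the following sketch applies the same comparisons to a whole list of candidate times. The function name, the event labels, and the sample times are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def detect_cm_boundaries(
    candidate_times: Sequence[float], cm_max: float
) -> List[Tuple[str, float]]:
    """Scan consecutive candidate times and report CM start and end times only."""
    events: List[Tuple[str, float]] = []
    for i in range(2, len(candidate_times)):
        l_prev = candidate_times[i - 1] - candidate_times[i - 2]
        l_now = candidate_times[i] - candidate_times[i - 1]
        if l_now <= cm_max < l_prev:
            # A long interval followed by a short one: a CM run begins.
            events.append(("cm_start", candidate_times[i - 1]))
        elif l_prev <= cm_max < l_now:
            # A short interval followed by a long one: the CM run ends.
            events.append(("cm_end", candidate_times[i - 1]))
        # Two short or two long intervals in a row produce no output.
    return events


# Hypothetical broadcast: program, three back-to-back CMs, program again.
times = [0.0, 600.0, 615.0, 630.0, 660.0, 1300.0]
events = detect_cm_boundaries(times, cm_max=60.0)
```

With these values, the candidate times 615.0 and 630.0 that separate the individual CMs produce no output, so only the start of the first CM (600.0) and the end of the last CM (660.0) are reported.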

Furthermore, the scene change detection device 60 according to the present embodiment prevents erroneous detection of scene change candidate times by setting a silence detection period of a predetermined width around the silence detection time at which the audio information determination unit 70 judged the audio to be silent. For example, in the encoding device 1 shown in FIG. 1, the delay introduced by the TBC unit 20 causes a shift between the audio signal and the image signal, as shown in FIG. 9. In the example of FIG. 9, the audio level of the audio signal should fall in synchronization with the black image in the image signal, but the delay of the TBC unit 20 shifts the period in which the audio level falls away from the period of the black image. Even in such a case, because the scene change detection device 60 sets a silence detection period of a predetermined width around the silence detection time, the shift is absorbed by the pre-detection set value and the post-detection set value, and the scene change candidate time can be detected correctly.

The present invention is not limited to the embodiment described above and can be modified as appropriate without departing from its spirit. For example, the detection maximum value is not limited to the maximum length of a CM section; it may be set to the maximum length of whatever section is to be detected.

FIG. 1 is a block diagram of an encoding device including the scene change detection device according to the first embodiment.
FIG. 2 is a flowchart of the scene change candidate time detection process in the scene change detection device according to the first embodiment.
FIG. 3 is a timing chart of the scene change candidate time detection process in the scene change detection device according to the first embodiment.
FIG. 4 is a flowchart of the output determination process in the scene change detection device according to the first embodiment.
FIG. 5 is a diagram schematically showing scene change detection information detected by the scene change detection device according to the first embodiment.
FIG. 6 is a diagram schematically showing scene change detection information detected by the scene change detection device according to the first embodiment.
FIG. 7 is a diagram schematically showing scene change detection information detected by the scene change detection device according to the first embodiment.
FIG. 8 is a diagram schematically showing scene change detection information detected by the scene change detection device according to the first embodiment.
FIG. 9 is a timing chart showing an example in which the scene change detection device according to the first embodiment prevents erroneous detection caused by a shift between the audio signal and the image signal.

Explanation of symbols

1 Encoding device
10 A/D converter
20 TBC unit
30 Encoder
40 Storage device
50 Application execution unit
60 Scene change detection device
70 Audio information determination unit
71 Silent period determination unit
80 Image information determination unit
81 Scene change detection unit
82 Black image detection unit
83 OR circuit
84 Scene change time information storage unit
90 Section determination unit
91 Scene change candidate point detection unit
92 Scene change candidate point determination unit
93 Output determination unit
94 Determination minimum value storage unit
95 Detection maximum value storage unit
A Silence detection time information
B Scene change detection result
C Black image detection result
D Scene change time information
E Scene change candidate time
F Valid scene change candidate time
CMmax Detection maximum value
Ctime Scene change candidate time
Ltime Detection interval

Claims

A scene change detection device comprising:
An audio information determination unit that detects a silent state of an audio signal and outputs silence detection time information;
An image information determination unit that detects a luminance change or a decrease in luminance level of an image in an image signal and stores scene change time information indicating the time at which the luminance change or the decrease in luminance level was detected;
A scene change candidate point detection unit that outputs a first scene change candidate time based on the silence detection time information and the scene change time information;
A scene change candidate point determination unit that determines the validity of the first scene change candidate time based on a preset determination minimum value and the amount of change in the audio level of the audio signal at a time later than the first scene change candidate time, and outputs the first scene change candidate time determined to be valid as a second scene change candidate time; and
An output determination unit that outputs scene change detection information based on a detection maximum value and the time differences between successive second scene change candidate times,
wherein the output determination unit
calculates a first detection interval based on the time difference between the current second scene change candidate time and the second scene change candidate time immediately preceding the current one,
calculates a second detection interval based on the time difference between the second scene change candidate time immediately preceding the current one and the second scene change candidate time two positions before the current one,
outputs, as the scene change detection information, the immediately preceding second scene change candidate time as a start time when the first detection interval is equal to or less than the detection maximum value and the second detection interval is larger than the detection maximum value, and
outputs, as the scene change detection information, the immediately preceding second scene change candidate time as an end time when the first detection interval is larger than the detection maximum value and the second detection interval is equal to or less than the detection maximum value.

The scene change detection device according to claim 1, wherein a candidate point determination period, which defines a predetermined period after the first scene change candidate time, is preset in the scene change candidate point determination unit, and the scene change candidate point determination unit uses the amount of change in the audio level within the candidate point determination period to determine the validity.

The scene change detection device according to claim 2, wherein the scene change candidate point determination unit discards the first scene change candidate time when the amount of change in the audio level is smaller than the determination minimum value.

The scene change detection device according to claim 1, wherein the scene change candidate point detection unit sets a predetermined period including the time indicated by the silence detection time information as a silence detection period, and outputs the first scene change candidate time when the scene change time information falls within the silence detection period.

The scene change detection device according to claim 4, wherein, when a plurality of pieces of the scene change time information fall within the silence detection period, the scene change candidate point detection unit outputs the scene change time information indicating the earliest time as the first scene change candidate time.

The scene change detection device according to any one of claims 1 to 5, wherein the image information determination unit includes a scene change detection unit that detects a luminance change of an image in the image signal, a black image detection unit that detects a decrease in luminance level in the image signal, and an OR circuit that calculates the logical sum of the output of the scene change detection unit and the output of the black image detection unit.

An encoding device comprising:
The scene change detection device according to any one of claims 1 to 6;
An analog/digital converter that converts an analog audio signal and an analog image signal into a digital audio signal and a digital image signal;
A delay circuit that delays the digital image signal; and
An encoder that encodes the digital audio signal and the digital image signal input via the delay circuit,
wherein the audio information determination unit of the scene change detection device outputs the silence detection time information based on the digital audio signal, and the image information determination unit of the scene change detection device outputs the scene change time information based on the digital image signal input to the encoder.

A scene change detection method in a scene change detection device to which a video composed of an audio signal and an image signal is input, the method comprising:
Detecting a silent state of the audio signal and generating silence detection time information;
Detecting a luminance change or a decrease in luminance level of an image in the image signal, and generating scene change time information indicating the time at which the luminance change or the decrease in luminance level was detected;
Generating a first scene change candidate time based on the silence detection time information and the scene change time information;
Determining the validity of the first scene change candidate time based on a preset determination minimum value and the amount of change in the audio level of the audio signal at a time later than the first scene change candidate time, and outputting the first scene change candidate time determined to be valid as a second scene change candidate time;
Calculating a first detection interval based on the time difference between the current second scene change candidate time and the second scene change candidate time immediately preceding the current one;
Calculating a second detection interval based on the time difference between the second scene change candidate time immediately preceding the current one and the second scene change candidate time two positions before the current one;
Outputting, as scene change detection information, the immediately preceding second scene change candidate time as a start time when the first detection interval is equal to or less than a detection maximum value and the second detection interval is larger than the detection maximum value; and
Outputting, as the scene change detection information, the immediately preceding second scene change candidate time as an end time when the first detection interval is larger than the detection maximum value and the second detection interval is equal to or less than the detection maximum value.