JP4214990B2 - Event detection method, apparatus and program

Info

Publication number: JP4214990B2
Application number: JP2004355505A
Authority: JP
Inventors: 弾三上; 精一紺谷; 正志森本
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2004-12-08
Filing date: 2004-12-08
Publication date: 2009-01-28
Anticipated expiration: 2024-12-08
Also published as: JP2006166105A

Description

The present invention relates to a linear pattern identification method, apparatus, and program, and more particularly to an event detection method, apparatus, and program for detecting a specific event in video.

Research on improving browsability by detecting events in video has long been conducted. For example, there is a method that produces a summary by detecting pitching events in baseball video (see, for example, Patent Document 1).
Patent Document 1: JP 2003-52003 A

However, the above conventional method identifies a pitching event by detecting green regions and brown regions in the video and thereby detecting the composition seen when the pitcher throws. In other words, it is detection by template matching.

However, some fields have no grass in the infield, and the color of the mound dirt ranges from reddish brown to ordinary sand color. The appearance also changes depending on whether it is a day game or a night game. Furthermore, the camera position varies with the stadium and the broadcaster, so it is very difficult to prepare a template that can be used for every stadium, game time, and broadcaster.

The present invention has been made in view of the above points, and an object thereof is to provide an event detection method, apparatus, and program capable of detecting events in video with high accuracy without being affected by the shooting environment.

FIG. 1 is a diagram for explaining the principle of the present invention.

The present invention (claim 1) is an event detection method for detecting a specific event in video, comprising:
a video reading step (step 1) of reading video from video storage means;
an acoustic feature detection step (step 2) of detecting acoustic features in the read video;
an event candidate detection step (step 3) of detecting event candidates based on the video sections in which the acoustic features were detected;
a video feature extraction step (step 4) of extracting video features in the video sections of the event candidates;
a clustering step (step 5) of performing unsupervised clustering of the candidate videos based on the video features; and
a correct cluster selection step (step 6) of storing the top clusters having the largest numbers of elements in storage means as correct clusters.

Further, in the present invention (claim 2), the following steps are additionally performed after the correct cluster selection step (step 6):
a template image acquisition step of acquiring a template image from the clustering result; and
an event re-search step of re-searching the entire video in the video storage means using the template image.

Further, in the present invention (claim 3), the acoustic feature detection step (step 2) extracts, as the acoustic feature, a portion whose acoustic power transition has the characteristics of a sudden sound.

Further, in the present invention (claim 4), the video feature extraction step (step 4) uses, as the video feature, the RGB values of the images in the event candidate sections.

Further, in the present invention (claim 5), the video feature extraction step (step 4) uses, as the video feature, the direction components of an edge image.

Further, in the present invention (claim 6), the video feature extraction step (step 4) uses motion features as the video feature.

FIG. 2 is a diagram showing the principle configuration of the present invention.

The present invention (claim 7) is an event detection apparatus for detecting a specific event in video, comprising:
video storage means 110 for storing video;
video reading means 1 for reading video from the video storage means 110;
acoustic feature detection means 2 for detecting acoustic features in the read video;
event candidate detection means 3 for detecting event candidates based on the video sections in which the acoustic features were detected;
video feature extraction means 4 for extracting video features in the video sections of the event candidates;
clustering means 5 for performing unsupervised clustering of the candidate videos based on the video features; and
correct cluster selection means 6 for storing the top clusters having the largest numbers of elements in storage means 120 as correct clusters.

Further, the present invention (claim 8) additionally comprises:
template image acquisition means for acquiring a template image from the clustering result; and
event re-search means for re-searching the entire video in the video storage means 110 using the template image.

Further, in the present invention (claim 9), the acoustic feature detection means 2 extracts, as the acoustic feature, a portion whose acoustic power transition has the characteristics of a sudden sound.

Further, in the present invention (claim 10), the video feature extraction means 4 uses, as the video feature, the RGB values of the images in the event candidate sections.

Further, in the present invention (claim 11), the video feature extraction means 4 uses, as the video feature, the direction components of an edge image.

Further, in the present invention (claim 12), the video feature extraction means 4 uses motion features as the video feature.

The present invention (claim 13) is an event detection program for detecting a specific event in video, the program causing a computer to execute the event detection method according to any one of claims 1 to 6.

As described above, according to the present invention, when matching sports video (for example, baseball) against video scenes of hitting or catching, scenes are first extracted from the audio and the extracted video scenes are then clustered. This allows hitting and catching scenes to be matched with higher accuracy than with the conventional technique, so events can be detected with high accuracy without being affected by the shooting environment.

Embodiments of the present invention will be described below with reference to the drawings.

In the present invention, the event detection apparatus first performs detection using acoustic information, which is empirically known to vary less with shooting angle and similar factors than video does, and then narrows down the events using video features.

FIG. 3 shows the configuration of the event detection apparatus according to an embodiment of the present invention.

The event detection apparatus shown in the figure comprises a reading unit 1, an acoustic feature extraction unit 2, an event candidate detection unit 3, a video feature extraction unit 4, a clustering unit 5, a correct cluster selection unit 6, a template creation unit 7, a template use re-search unit 8, a video storage unit 110, and an event storage unit 120.

The reading unit 1 reads video from the video storage unit 110 and outputs it to the acoustic feature extraction unit 2.

The acoustic feature extraction unit 2 extracts acoustic features from the read video and outputs them to the event candidate detection unit 3.

The event candidate detection unit 3 detects event candidates from the acoustic features and outputs them to the video feature extraction unit 4.

The video feature extraction unit 4 extracts video features from the sections identified as event candidates and outputs them to the correct candidate clustering unit 5.

The correct candidate clustering unit 5 clusters the event candidates based on the video feature amounts and outputs the result to the correct cluster selection unit 6.

The correct cluster selection unit 6 selects the correct clusters from the clustering result and outputs them to the template creation unit 7.

The template creation unit 7 acquires a template image from the clustering result.

The template use re-search unit 8 re-searches the entire video using the template image and stores the result in the event storage unit 120.

The video storage unit 110 is a storage medium such as a disk device, and stores video.

The event storage unit 120 is a storage medium such as a disk device, and records the events of the clusters output from the template use re-search unit 8.

FIG. 4 is a flowchart of the operation of the event detection apparatus according to an embodiment of the present invention.

Step 101) The reading unit 1 reads video from the video storage unit 110.

Step 102) The acoustic feature extraction unit 2 calculates the power of the acoustic signal of the read video and extracts acoustic features based on the transition of that power. One way to extract acoustic features is to detect portions where the power transition has the characteristics of a sudden sound.

Step 103) The event candidate detection unit 3 detects event candidates based on the acoustic features extracted by the acoustic feature extraction unit 2. Specifically, it detects the video sections that contain the extracted acoustic features.

Step 104) The video feature extraction unit 4 obtains video feature amounts, by any method, from the video sections detected by the event candidate detection unit 3.

Step 105) The correct candidate clustering unit 5 clusters the video based on the feature amounts obtained by the video feature extraction unit 4. One clustering method is unsupervised clustering using the video feature amounts.

Step 106) The correct cluster selection unit 6 selects the correct clusters based on the clustering result.

Step 107) The template creation unit 7 acquires a template image from the clustering result.

Step 108) The template use re-search unit 8 re-searches the video storage unit 110 using the template image and stores the result in the event storage unit 120.

Steps 107 and 108 are not essential. When a feature that appears often enough to be usable for re-detection (for example, an MHI: Motion History Image) can be detected as the video feature amount, good results can be obtained by generating a template in step 107 and re-searching with that template in step 108.

Examples of the present invention will be described below with reference to the drawings.

In this example, acoustic analysis of baseball video yields candidates for the hitting sound or catching sound that follows a pitch, from which pitching event candidates are obtained. The moving image sections of those candidates are then clustered by motion features to obtain the pitching events in the video.

FIG. 5 shows the configuration of the event detection apparatus according to an example of the present invention.

The event detection apparatus shown in the figure comprises a video reading unit 10, a sudden sound section detection unit 20, a sudden sound section video acquisition unit 40, a sudden sound section video clustering unit 50, a template creation unit 70, a template use re-search unit 80, a video storage unit 110, and a hit/catch event storage unit 120.

The video reading unit 10 reads video from the video storage unit 110 and outputs it to the sudden sound section detection unit 20.

The sudden sound section detection unit 20 detects the portions of the read video that contain a sudden sound and outputs them to the sudden sound section video acquisition unit 40. At this point, sudden sounds are detected by the acoustic analysis described later, and pitching scene candidates are obtained.

The sudden sound section video acquisition unit 40 acquires the video for each detected sudden sound section and outputs it to the sudden sound section video clustering unit 50.

The sudden sound section video clustering unit 50 selects events by clustering the sudden sound section video.

The template creation unit 70 acquires a template image from the clustering result.

The template use re-search unit 80 performs a re-search using the template image and stores the result in the hit/catch event storage unit 120.

FIG. 6 is a flowchart of the event detection process according to an example of the present invention.

Step 201) The video reading unit 10 reads video from the video storage unit 110.

Step 202) The sudden sound section detection unit 20 detects, by acoustic analysis, the portions of the read video that contain a sudden sound.

Step 203) The sudden sound section video acquisition unit 40 acquires the video corresponding to the sudden sound sections detected in step 202.

Step 204) The sudden sound section video clustering unit 50 clusters the sudden sound section video acquired in step 203 and selects events based on the clustering result.

Step 205) The template creation unit 70 acquires a template image from the clustering result obtained in step 204.

Step 206) The template use re-search unit 80 re-searches the video storage unit 110 using the template image and stores the result in the hit/catch event storage unit 120.

The above operations are described in detail below.

First, the sudden sound detection operation of step 202 will be described.

FIG. 7 is a flowchart of sudden sound detection according to an example of the present invention.

Step 301) The sudden sound section detection unit 20 calculates the power of the acoustic signal of the read video.

Step 302) Portions where the power transition has the characteristics of a sudden sound are detected.

Next, step 301 of FIG. 7 will be described in detail.

FIG. 8 is a flowchart of the method for calculating the power of the acoustic signal according to an example of the present invention.

Step 401) The sudden sound section detection unit 20 initializes i = 0.

Step 402) A window function is applied to the acoustic signal data[j] for j = i to j = i + WindowSize (the window size). Well-known window functions include the Hamming window, the Hanning window, and the rectangular window; any window function may be used here.

Step 403) The power of the acoustic signal over the WindowSize-wide span starting at i is calculated as \sum_{j=i}^{i+WindowSize} data[j] and assigned to p(i).

Step 404) WindowSlide (the window slide amount) is added to i.

Step 405) It is determined whether i + WindowSlide is smaller than DataMax. If so, the process returns to step 402; otherwise, the process ends.
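The loop of steps 401 to 405 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names `hamming` and `frame_power` are hypothetical, the power is assumed to be the sum of squared windowed samples (the usual definition of short-time power), each frame is assumed to hold exactly `window_size` samples, and the loop stops when a full frame no longer fits rather than using the exact test of step 405.

```python
import math


def hamming(n):
    """Hamming window of length n (any other window could be substituted)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def frame_power(data, window_size, window_slide):
    """Return the list p of short-time powers, one value per frame."""
    window = hamming(window_size)
    data_max = len(data)
    p = []
    i = 0                                        # step 401
    while i + window_size <= data_max:           # stop when a full frame no longer fits
        # step 402: apply the window function to the current frame
        frame = [data[i + k] * window[k] for k in range(window_size)]
        # step 403: power of the frame (sum of squares is an assumption)
        p.append(sum(s * s for s in frame))
        i += window_slide                        # step 404
    return p
```

With `window_slide` smaller than `window_size` the frames overlap, which gives a smoother power curve at the cost of more computation.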

Next, the suddenness detection process of step 302 in FIG. 7 will be described in detail.

FIG. 9 is a flowchart of the suddenness detection process according to an example of the present invention.

Step 501) i is initialized to 0.

Step 502) flag is initialized to 0.

Step 503) new_flag is assigned to the current flag, and then new_flag = F(p(i), flag) is calculated. F(p(i), flag) is a function determined by the power obtained in step 301 and the current flag. FIG. 10 shows how new_flag is obtained.

Step 504) If new_flag is 1, the process proceeds to step 505; otherwise, it proceeds to step 506.

Step 505) i is assigned to the time st.

Step 506) If flag = 3 and new_flag is 1 or 0, the process proceeds to step 507; otherwise, it proceeds to step 510.

Step 507) i - st is assigned to t.

Step 508) If th4 < t < th5 is satisfied, the process proceeds to step 509; otherwise, it proceeds to step 510.

Step 509) The time st is saved as the start of a sudden sound section.

Step 510) It is determined whether i + WindowSlide is smaller than DataMax. If so, the process proceeds to step 511; otherwise, the process ends.

Step 511) WindowSlide is added to i, and the process returns to step 503.

FIG. 10 is a table representing the function for obtaining new_flag. The shaded cells are those detected as a sudden sound: when th1 > p(i) and flag = 3, new_flag = 1, and when th3 > p(i) > th1 and flag = 3, new_flag = 0.
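Because the full transition table of FIG. 10 is not reproduced in the text, the flag function F cannot be implemented faithfully here. The Python sketch below is therefore a simplified stand-in, not the state machine of steps 501 to 511: it keeps only the idea of steps 505 and 507 to 509, namely remembering where a loud run starts and accepting it when its duration t satisfies th4 < t < th5. The name `find_bursts`, the single threshold `th_high`, and the parameter names `min_len` and `max_len` (standing in for th4 and th5) are assumptions made for illustration.

```python
def find_bursts(p, th_high, min_len, max_len):
    """Return start indices of runs with p > th_high whose length t
    satisfies min_len < t < max_len (measured in frames)."""
    starts = []
    st = None                          # start frame of the current loud run, if any
    for i, power in enumerate(p):
        if power > th_high:
            if st is None:
                st = i                 # power just rose: remember the start (cf. step 505)
        elif st is not None:
            t = i - st                 # run length (cf. step 507)
            if min_len < t < max_len:  # duration gate (cf. step 508)
                starts.append(st)      # keep the start of the burst (cf. step 509)
            st = None
    return starts
```

The upper bound on duration is what separates a short impact such as a bat or glove sound from sustained loud passages such as crowd noise. A run still open at the end of the data is deliberately ignored because its length is unknown.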

Next, the thumbnail image clustering process performed by the sudden sound section video clustering unit 50 in step 204 will be described.

FIG. 11 is a flowchart of thumbnail image clustering according to an example of the present invention.

Step 701) Features are extracted from the sudden sound section video.

Step 702) Unsupervised clustering is performed on the sudden sound section video using the feature amounts.

Step 703) The N clusters with the most elements are taken as the hit/catch clusters and saved in the hit/catch event storage unit 80 shown in FIG. 12.
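Steps 702 and 703 can be illustrated with the Python sketch below. The text does not name a specific unsupervised clustering algorithm, so a simple leader (sequential threshold) clustering with Euclidean distance is used here purely as an assumed example; the function names and the `radius` parameter are hypothetical.

```python
def leader_clustering(features, radius):
    """Assign each feature vector to the first cluster whose leader lies
    within `radius`; otherwise start a new cluster with it as leader.
    Returns a list of clusters, each a list of indices into `features`."""
    leaders, clusters = [], []
    for idx, f in enumerate(features):
        for c, leader in enumerate(leaders):
            dist = sum((a - b) ** 2 for a, b in zip(f, leader)) ** 0.5
            if dist <= radius:
                clusters[c].append(idx)
                break
        else:                              # no leader was close enough
            leaders.append(f)
            clusters.append([idx])
    return clusters


def select_top_clusters(clusters, n):
    """Return the n clusters with the most elements (cf. step 703)."""
    return sorted(clusters, key=len, reverse=True)[:n]
```

Selecting by cluster size relies on the true events looking alike while false detections look different from one another, so the false detections scatter across many small clusters.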

Next, the video feature extraction process of step 701 will be described.

FIG. 13 is a flowchart (part 1) of the thumbnail image feature amount detection process according to an example of the present invention. The process shown in the figure detects feature amounts using the average pixel values obtained when the image is divided into small blocks.

Step 801) x and y are initialized to 0.

Step 802) For the rectangular region whose upper-left corner is (x, y) and whose lower-right corner is (x+block_size, y+block_size), the average pixel value is obtained for each of R, G, and B (red, green, blue) and assigned to v[x/block_size][y/block_size].

Step 803) If x+block_size < width-block_size, the process proceeds to step 804; otherwise, it proceeds to step 805.

Step 804) block_size is added to x, and the process returns to step 802.

Step 805) 0 is assigned to x.

Step 806) If y+block_size < height-block_size, the process proceeds to step 807; otherwise, the process ends.

Step 807) block_size is added to y, and the process returns to step 802.

The above method uses v as the feature amount.
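The block-average feature v can be sketched in Python as follows. This is an illustrative version under stated assumptions: the image is represented as a list of rows of (r, g, b) tuples, blocks are taken as non-overlapping block_size by block_size squares, and only blocks that fit entirely inside the image are used, which simplifies the boundary tests of steps 803 and 806. The function name `block_mean_rgb` is hypothetical.

```python
def block_mean_rgb(image, block_size):
    """image: list of rows, each row a list of (r, g, b) tuples.
    Returns v, where v[bx][by] is the (mean_r, mean_g, mean_b) of the
    block whose upper-left pixel is (bx * block_size, by * block_size)."""
    height, width = len(image), len(image[0])
    area = block_size * block_size
    v = []
    for x in range(0, width - block_size + 1, block_size):
        column = []
        for y in range(0, height - block_size + 1, block_size):
            sums = [0.0, 0.0, 0.0]
            for yy in range(y, y + block_size):
                for xx in range(x, x + block_size):
                    for ch in range(3):
                        sums[ch] += image[yy][xx][ch]
            column.append(tuple(s / area for s in sums))
        v.append(column)
    return v
```

Flattening v into a single vector gives a compact color-layout descriptor that can be passed to a clustering routine.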

Next, another way of carrying out step 701 will be described.

FIG. 14 is a flowchart (part 2) of the thumbnail image feature amount detection process according to an example of the present invention. The process shown in the figure detects video feature amounts using the direction components of an edge image.

Step 1001) x and y are initialized to 0.

Step 1002) For the rectangular region whose upper-left corner is (x, y) and whose lower-right corner is (x+block_size, y+block_size), the x-direction difference of the pixel values is obtained for each of R, G, and B and assigned to d_x[x/block_size][y/block_size].

Step 1003) For the rectangular region whose upper-left corner is (x, y) and whose lower-right corner is (x+block_size, y+block_size), the y-direction difference of the pixel values is obtained for each of R, G, and B and assigned to d_y[x/block_size][y/block_size].

Step 1004) If x+block_size < width-block_size, the process proceeds to step 1005; otherwise, it proceeds to step 1006.

Step 1005) block_size is added to x, and the process returns to step 1002.

Step 1006) 0 is assigned to x.

Step 1007) If y+block_size < height-block_size, the process proceeds to step 1008; otherwise, the process ends.

Step 1008) block_size is added to y, and the process returns to step 1002.

The above method uses d_x and d_y as the feature amounts.
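A possible reading of the d_x and d_y features is sketched below in Python. The text does not say how the per-block "difference" is aggregated, so this sketch assumes the mean absolute difference between horizontally (for d_x) and vertically (for d_y) adjacent pixels inside each block, computed separately for R, G, and B. The image layout (list of rows of (r, g, b) tuples), the function name `block_gradients`, and the restriction to full blocks are also assumptions.

```python
def block_gradients(image, block_size):
    """Return (d_x, d_y); each is indexed [bx][by] and holds a per-channel
    tuple of mean absolute neighbor differences inside the block."""
    if block_size < 2:
        raise ValueError("block_size must be at least 2 to form neighbor pairs")
    height, width = len(image), len(image[0])
    pairs = block_size * (block_size - 1)      # neighbor pairs per direction
    d_x, d_y = [], []
    for x in range(0, width - block_size + 1, block_size):
        col_x, col_y = [], []
        for y in range(0, height - block_size + 1, block_size):
            gx = [0.0, 0.0, 0.0]
            gy = [0.0, 0.0, 0.0]
            for yy in range(y, y + block_size):
                for xx in range(x, x + block_size):
                    for ch in range(3):
                        if xx + 1 < x + block_size:   # horizontal neighbor in block
                            gx[ch] += abs(image[yy][xx + 1][ch] - image[yy][xx][ch])
                        if yy + 1 < y + block_size:   # vertical neighbor in block
                            gy[ch] += abs(image[yy + 1][xx][ch] - image[yy][xx][ch])
            col_x.append(tuple(g / pairs for g in gx))
            col_y.append(tuple(g / pairs for g in gy))
        d_x.append(col_x)
        d_y.append(col_y)
    return d_x, d_y
```

A block containing a vertical edge produces a large d_x and a small d_y, so the pair (d_x, d_y) encodes the dominant edge orientation of each block.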

The image feature amount may also be detected using all of v, d_x, and d_y together.

Furthermore, motion features may be used as another method for step 701. In that case, for example, the technique of K. Fujii and K. Arakawa, "Video editing based on motion recognition using temporal templates," Proc. ACM User Interface Software Technology (UIST) 2003, pp. 71-72, 2003, can be used. When motion features are used, an additional process is required between step 104 and step 105 of FIG. 4, in which the entire video stored in the video storage unit 110 is re-searched using the video features acquired in step 104 as a template.
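For readers unfamiliar with temporal templates, the Python sketch below shows the generic Motion History Image update: pixels where motion is detected are set to a maximum value tau, and all other pixels fade by one per frame. This is the textbook formulation, given only as an assumed illustration; it is not claimed to match the details of the cited paper or of this patent. Frames are assumed to be grayscale lists of rows, motion is assumed to be detected by thresholded frame differencing, and the names `update_mhi`, `mhi_distance`, `tau`, and `diff_threshold` are hypothetical.

```python
def update_mhi(mhi, prev_frame, frame, tau, diff_threshold):
    """Return the MHI after observing `frame`, given the previous frame."""
    new_mhi = []
    for y, row in enumerate(frame):
        new_row = []
        for x, value in enumerate(row):
            if abs(value - prev_frame[y][x]) > diff_threshold:
                new_row.append(tau)                     # fresh motion at this pixel
            else:
                new_row.append(max(0, mhi[y][x] - 1))   # older motion fades out
        new_mhi.append(new_row)
    return new_mhi


def mhi_distance(a, b):
    """Euclidean distance between two MHIs of the same size."""
    return sum((p - q) ** 2
               for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b)) ** 0.5
```

Comparing the MHI of each video position against a template MHI with `mhi_distance` yields a distance curve over time, and small values indicate motion similar to the template.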

Next, an evaluation experiment conducted with this example will be described.

The content used in the experiment consisted of Major League Baseball broadcast video (hereinafter, broadcast video) and video of playing catch that we shot ourselves (hereinafter, personally shot video).

Experiments were conducted on three pieces of broadcast video and two pieces of personally shot video. All content is 44.1 kHz MPEG video. The following describes the results of hit/catch sound detection using audio, an example of the template MHI (Motion History Image) created by clustering those detections, and the results of detecting pitching scenes in the video using the template MHI.

First, pitching scene detection by sound was tested on television broadcast content and on video of playing catch shot with a home video camera. The results are shown in Table 1.

The accuracy of pitching scene detection by sound varies greatly with the stadium and the shooting environment. The broadcast video is content whose audio level is such that a person listening can only barely make out the hitting and catching sounds. As a result, recall is on the low side.

The personally shot video shows people playing catch while talking, and contains no loud sudden noise other than the conversation. The missed detections are cases where a bad throw produced no catching sound, and the low recall is largely due to detection of return throws from the catcher.

At these precision and recall levels, the broadcast video poses no problem because its precision exceeds 50% as expected, but the precision for the personally shot video falls below 50%, so at clustering time the number of elements in the pitching event cluster may fall below that of other clusters.

In this clustering, if recall is too low, problems may arise when the accuracy of sound-based pitching event detection is poor, while precision must always be kept at 100%. In addition, as an implementation issue, the computational cost grows as O(n^2) in the initial number of cluster elements n. In view of these considerations, n = 3 was used in the subsequent experiments.

A template is created using the pitching event cluster obtained by clustering the pitching event candidates detected from the hitting and catching sounds.

For the personally shot video, precision is as low as about 30%, so there was a risk that a cluster other than the pitching event cluster would have the most elements. In practice, however, the candidates from sound-based detection that are not pitching events show widely varying motion at the detected time and do not gather into one large cluster, so empirically the template MHI can be created with sufficient accuracy even for personally shot video.

テンプレートＭＨＩとのマッチングによる投球シーン検出の結果を表２に示す。 Table 2 shows the results of pitching scene detection by matching with the template MHI.

FIG. 15 shows a graph of how the distance to the template MHI changes over time. As Table 2 shows, recall is 95.8% and precision is 85.0%, so detection is more accurate than detection by sound. As for precision, all of the false detections were pitching scenes in replays. Because replays are shown in slow motion, ideally they should not be detected, and so they are counted as incorrect; excluding them, however, is naturally very difficult. If replays are also counted as correct, precision is 100%, and both precision and recall are high.

In the time variation of the distance between the template MHI and the personally shot video shown in FIG. 15, every portion where the distance becomes small was a pitch event. This shows that pitch events are sufficiently separable from other events and that the template is appropriate. The method is therefore extremely useful for baseball video indexing.
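The re-search with the template can be pictured as computing a distance curve over time and reporting the points where it dips. The sketch below is a hedged illustration only: the text does not specify the distance measure or the decision rule, so the Euclidean distance, the fixed `threshold`, and the local-minimum rule are all assumptions, as are the function names.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect_events(frame_features, template, threshold):
    """Return (distances, hits).

    distances: distance from each time step's feature vector to the template
    hits:      indices that are local minima of the curve and lie below
               the threshold (assumed decision rule)
    """
    distances = [euclidean_distance(f, template) for f in frame_features]
    hits = []
    for i, d in enumerate(distances):
        if d >= threshold:
            continue
        left = distances[i - 1] if i > 0 else math.inf
        right = distances[i + 1] if i + 1 < len(distances) else math.inf
        # On a flat valley only the last point is reported.
        if d <= left and d < right:
            hits.append(i)
    return distances, hits


if __name__ == "__main__":
    # Hypothetical toy sequence with two dips toward the template.
    template = [0.0, 0.0]
    frames = [[5, 5], [1, 0], [0, 0], [1, 0], [6, 6], [0, 1], [7, 7]]
    _, hits = detect_events(frames, template, threshold=2.0)
    print(hits)
```

Reporting only local minima keeps one detection per dip, which matches the observation that each dip in the distance curve corresponds to one pitch event.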

It goes without saying that some or all of the functions of the event detection apparatus shown in FIGS. 3 and 5 can be implemented as a computer program and executed on a computer to realize the present invention, and that the processing procedures shown in the flowcharts of FIGS. 4, 6 to 9, 11, 13, and 14 and in the explanatory text can be implemented as a computer program and executed by a computer. A program for realizing these functions on a computer, or for causing a computer to execute these procedures, can be recorded on a computer-readable storage medium such as an HDD, MO, ROM, memory card, CD, DVD, or removable disk, and can be stored or distributed in that form.

The above program can also be provided over a network, for example via the Internet or e-mail.

Representative embodiments and examples of the present invention have been described above. The present invention is not limited to these embodiments and examples, and various modifications and applications are possible within the scope of the claims.

The present invention is applicable to techniques for detecting events from video.

FIG. 1 is a diagram explaining the principle of the present invention.
FIG. 2 is a diagram showing the principle configuration of the present invention.
FIG. 3 is a configuration diagram of the event detection apparatus in one embodiment of the present invention.
FIG. 4 is a flowchart of the operation of the event detection apparatus in one embodiment of the present invention.
FIG. 5 is a configuration diagram of the event detection apparatus in one example of the present invention.
FIG. 6 is a flowchart of the event detection process in one example of the present invention.
FIG. 7 is a flowchart of sudden sound detection in one example of the present invention.
FIG. 8 is a flowchart of the method of calculating the power of the acoustic signal in one example of the present invention.
FIG. 9 is a flowchart of the suddenness detection process in one example of the present invention.
FIG. 10 is an example of how new_flag, F(p(i), flag), is obtained in one example of the present invention.
FIG. 11 is a flowchart of the thumbnail image clustering process in one example of the present invention.
FIG. 12 is an example of the hitting and catching event storage unit in one example of the present invention.
FIG. 13 is a flowchart (part 1) of the thumbnail image feature detection process in one example of the present invention.
FIG. 14 is a flowchart (part 2) of the thumbnail image feature detection process in one example of the present invention.
FIG. 15 is a diagram showing the distance to the template MHI.

Explanation of symbols

1 Reading means, reading unit
2 Acoustic feature extraction means, acoustic feature extraction unit
3 Event candidate detection means, event candidate detection unit
4 Video feature extraction means, video feature extraction unit
5 Clustering means, clustering unit
6 Correct cluster selection means, correct cluster selection unit
7 Template creation unit
8 Template-based re-search unit
10 Video reading unit
20 Sudden sound section detection unit
40 Sudden sound section video acquisition unit
50 Sudden sound section video clustering unit
70 Template creation unit
80 Template-based re-search unit
110 Video storage means, video storage unit
120 Event storage means, event storage unit, hitting and catching event storage unit

Claims

1. An event detection method for detecting a specific event in a video, comprising:
a video reading step of reading video from video storage means;
an acoustic feature detection step of detecting an acoustic feature in the read video;
an event candidate detection step of detecting an event candidate based on the video section in which the acoustic feature is detected;
a video feature extraction step of extracting video features in the video section of the event candidate;
a clustering step of performing unsupervised clustering of video candidates based on the video features; and
a correct cluster selection step of storing, in storage means, a top-ranked cluster having a large number of elements as a correct cluster.

2. The event detection method according to claim 1, further comprising, after the correct cluster selection step:
a template image acquisition step of acquiring a template image from the clustering result; and
an event re-search step of re-searching the entire video in the video storage means using the template image.

3. The event detection method according to claim 1, wherein, in the acoustic feature detection step,
a portion of the sound power transition having a sudden sound characteristic is extracted as the acoustic feature.

4. The event detection method according to claim 1, wherein, in the video feature extraction step,
an RGB value of an image in the event candidate section is used as the video feature.

5. The event detection method according to claim 4, wherein, in the video feature extraction step,
a direction component of an edge image is used as the video feature.

6. The event detection method according to claim 4, wherein, in the video feature extraction step,
a motion feature is used as the video feature.

7. An event detection apparatus for detecting a specific event from a video, comprising:
video storage means for storing video;
video reading means for reading video from the video storage means;
acoustic feature detection means for detecting an acoustic feature in the read video;
event candidate detection means for detecting an event candidate based on the video section in which the acoustic feature is detected;
video feature extraction means for extracting video features in the video section of the event candidate;
clustering means for performing unsupervised clustering of video candidates based on the video features; and
correct cluster selection means for storing, in storage means, a top-ranked cluster having a large number of elements as a correct cluster.

8. The event detection apparatus according to claim 7, further comprising:
template image acquisition means for acquiring a template image from the clustering result; and
event re-search means for re-searching the entire video in the video storage means using the template image.

9. The event detection apparatus according to claim 7, wherein the acoustic feature detection means
extracts, as the acoustic feature, a portion of the sound power transition having a sudden sound characteristic.

10. The event detection apparatus according to claim 7, wherein the video feature extraction means
uses an RGB value of an image in the event candidate section as the video feature.

11. The event detection apparatus according to claim 10, wherein the video feature extraction means
uses a direction component of an edge image as the video feature.

12. The event detection apparatus according to claim 10, wherein the video feature extraction means
uses a motion feature as the video feature.

13. An event detection program for detecting a specific event from a video,
the program causing a computer to execute processing for realizing the event detection method according to claim 1.