JP4906588B2

JP4906588B2 - Specific operation determination device, reference data generation device, specific operation determination program, and reference data generation program

Info

Publication number: JP4906588B2
Application number: JP2007133179A
Authority: JP
Inventors: 正樹高橋; 昌秀苗村; 真人藤井; 伸行八木
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2007-05-18
Filing date: 2007-05-18
Publication date: 2012-03-28
Anticipated expiration: 2027-05-18
Also published as: JP2008287594A

Description

The present invention relates to video analysis technology, and more particularly to a specific motion determination device, a reference data generation device, a specific motion determination program, and a reference data generation program for determining a specific motion of a person appearing in a video.

In general, a pitching scene is broadcast using video shot from behind the pitcher (hereinafter, "PC-size video"). It might therefore seem that every pitching scene in a baseball broadcast could be collected simply by detecting PC-size video. In practice, however, PC-size video is also used for scenes other than pitches, such as a pitching change, so detecting PC-size video alone cannot isolate the pitching scenes, nor can it determine the timing of a pitch.

Methods for retrieving a desired scene from a video signal such as a television broadcast have therefore been proposed. For example, a method has been disclosed that determines the start and end times of a baseball play by referring to pre-registered facial feature quantities of batters and to information on the transition of the vertical movement of the pitcher region, and then extracts the on-play video (see Patent Document 1).

A method has also been proposed that detects scoring scenes from the score displayed as a superimposed caption near the center of the screen of a sports program, and detects characteristic scenes from the magnitude of the motion vectors of objects and the background in the video (see Patent Document 2).
Patent Document 1: JP 2006-238393 A (paragraphs 0034 to 0089)
Patent Document 2: JP-A-11-55613 (paragraphs 0024 to 0035)

However, because the method described in Patent Document 1 registers facial feature quantities, the amount of information that must be registered in advance is enormous and the processing load is high, which makes the method unsuitable for real-time processing. Furthermore, because the transition of the pitcher's vertical movement is a simple feature quantity, motions other than pitching may be erroneously determined to be a pitching motion.

The method described in Patent Document 2 determines a specific scene from a superimposed caption, so it cannot determine a specific scene in video that has no such caption. Furthermore, its determination using motion vectors relies on simple threshold processing and is therefore prone to erroneous determination.

Most baseball events (for example, strikeouts, hits, and double plays) start from a pitching scene. To determine a baseball event, it is therefore essential first to identify the pitching scene. When such event information is to be attached as metadata to a huge archive of past video, it has been desired that scenes of a specific motion, such as pitching scenes, be determined automatically from that archive.

The present invention has been made to solve the above problems of the prior art. Its object is to provide a specific motion determination device, a reference data generation device, a specific motion determination program, and a reference data generation program that can determine a scene of a specific motion from video with high accuracy without using facial feature quantities or superimposed captions.

To solve the above problems, the specific motion determination device according to claim 1 refers to reference time-series data held in a reference time-series data storage device and determines, from an input video, the video of a specific motion of a person included in that input video. The reference time-series data is generated in advance as follows: for each time-series frame image in a video of the specific motion of a person, the shift of the entire frame image at the time of shooting relative to an adjacent frame image (another frame image contiguous with that frame image) is corrected; a difference image between the corrected frame image and the adjacent frame image is generated; a plurality of feature quantities that indicate the characteristics of the movement of the person region, normalized by a predetermined criterion, are calculated from the difference image; and these feature quantities are arranged in time series for each video and dimensionally compressed. The device comprises shooting direction correction means, feature quantity calculation means, time-series data calculation means, and motion determination means.

With this configuration, the specific motion determination device uses the shooting direction correction means to detect and correct, for each time-series frame image of the input video, the shift of the entire frame image at the time of shooting relative to the adjacent frame image, which is another frame image contiguous with that frame image. The feature quantity calculation means generates a difference image between the frame image corrected by the shooting direction correction means and the adjacent frame image, and calculates, as a plurality of feature quantities, the number of pixels in the difference region, the variance of the difference region, the moving speed of the centroid of the difference region, the number of difference pixels in the person region, and the inclination of the principal axis of the person region, each normalized by a predetermined criterion. The time-series data calculation means arranges the feature quantities calculated by the feature quantity calculation means in time series for the input video and applies principal component analysis to generate dimensionally compressed time-series data. The motion determination means then calculates correlation coefficients, for all principal components, between the reference time-series data stored in the reference time-series data storage device and the time-series data generated by the time-series data calculation means, and determines that the video corresponding to a section of the time-series data in which the correlation coefficient is equal to or greater than a predetermined threshold is video of the specific motion.

As a result, the specific motion determination device can generate time-series data from the input video and determine the video of the specific motion included in the input video. Here, the reference time-series data is generated in advance from video of the specific motion. Because the reference time-series data is derived from the difference images between frame images in the video of the specific motion, it indicates the characteristics of the movement of the person region in that video. Likewise, because the time-series data is derived from the difference images between frame images in the input video, it indicates the characteristics of the movement of the person region in the input video. The specific motion determination device can therefore determine that the part of the input video corresponding to a section of the time-series data that is highly correlated with the reference time-series data is video of the specific motion.
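The correlation-based determination described above can be sketched as a sliding-window comparison. This is only an illustrative sketch: the function names, the window stepping of one frame, the threshold value of 0.8, and the rule that every principal component must reach the threshold are assumptions made here, not details stated in the text.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient of two 1-D arrays (0.0 if either is constant)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def find_matching_sections(series, reference, threshold=0.8):
    """Return (start, end) frame ranges whose time-series data correlates with the reference.

    series:    array of shape (T, K), time-series data of the input video
    reference: array of shape (W, K), reference time-series data (W <= T)
    A window is accepted only if the correlation coefficient of every
    principal component is >= threshold (an assumed combination rule).
    """
    total, n_components = series.shape
    width = reference.shape[0]
    hits = []
    for start in range(total - width + 1):
        window = series[start:start + width]
        corrs = [pearson(window[:, k], reference[:, k]) for k in range(n_components)]
        if min(corrs) >= threshold:
            hits.append((start, start + width))  # end index is exclusive
    return hits
```

Overlapping hits around the same motion would normally be merged into one section; that step is omitted here for brevity.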

The reference data generation device according to claim 2 generates the reference time-series data used by the specific motion determination device according to claim 1, that is, data indicating the characteristics of the movement of the person region in a video of a specific motion of a person. It comprises shooting direction correction means, feature quantity calculation means, and time-series data calculation means.

With this configuration, the reference data generation device uses the shooting direction correction means to detect and correct, for each time-series frame image in the video of the specific motion of a person, the shift of the entire frame image at the time of shooting relative to the adjacent frame image, which is another frame image contiguous with that frame image. The feature quantity calculation means generates a difference image between the frame image corrected by the shooting direction correction means and the adjacent frame image, and calculates, as a plurality of feature quantities, the number of pixels in the difference region, the variance of the difference region, the moving speed of the centroid of the difference region, the number of difference pixels in the person region, and the inclination of the principal axis of the person region, each normalized by a predetermined criterion. The time-series data calculation means then arranges the feature quantities calculated by the feature quantity calculation means in time series for the video and applies principal component analysis to generate dimensionally compressed reference time-series data.

As a result, the reference data generation device can generate, from video of a specific motion, reference time-series data indicating the characteristics of the movement of the person region in that video.
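The dimensional compression by principal component analysis can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `pca_compress`, the use of a singular value decomposition to obtain the principal axes, and the choice of two retained components are illustrative and are not specified in the text.

```python
import numpy as np

def pca_compress(features, n_components=2):
    """Project a (T, D) matrix of normalized per-frame features onto its
    first n_components principal axes.

    Returns (scores, components, mean):
      scores     (T, n_components) dimensionally compressed time series
      components (n_components, D) principal axes, one per row
      mean       (D,) per-feature mean removed before projection
    """
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T
    return scores, components, mean
```

With five feature quantities per frame, `features` would have shape (T, 5), and each column of `scores` is one principal-component time series.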

Furthermore, the specific motion determination program according to claim 3 causes a computer to function as shooting direction correction means, feature quantity calculation means, time-series data calculation means, and motion determination means, in order to determine, from an input video, the video of a specific motion of a person included in that input video by referring to reference time-series data held in a reference time-series data storage device. The reference time-series data is generated in advance as follows: for each time-series frame image in a video of the specific motion of a person, the shift of the entire frame image at the time of shooting relative to an adjacent frame image (another frame image contiguous with that frame image) is corrected; a difference image between the corrected frame image and the adjacent frame image is generated; a plurality of feature quantities that indicate the characteristics of the movement of the person region, normalized by a predetermined criterion, are calculated from the difference image; and these feature quantities are arranged in time series for each video and dimensionally compressed.

With this configuration, the specific motion determination program uses the shooting direction correction means to detect and correct, for each time-series frame image of the input video, the shift of the entire frame image at the time of shooting relative to the adjacent frame image, which is another frame image contiguous with that frame image. The feature quantity calculation means generates a difference image between the frame image corrected by the shooting direction correction means and the adjacent frame image, and calculates, as a plurality of feature quantities, the number of pixels in the difference region, the variance of the difference region, the moving speed of the centroid of the difference region, the number of difference pixels in the person region, and the inclination of the principal axis of the person region, each normalized by a predetermined criterion. The time-series data calculation means arranges the feature quantities calculated by the feature quantity calculation means in time series for the input video and applies principal component analysis to generate dimensionally compressed time-series data. The motion determination means then calculates correlation coefficients, for all principal components, between the reference time-series data stored in the reference time-series data storage device and the time-series data generated by the time-series data calculation means, and determines that the video corresponding to a section of the time-series data in which the correlation coefficient is equal to or greater than a predetermined threshold is video of the specific motion.

As a result, the specific motion determination program can generate time-series data from the input video and determine the video of the specific motion included in the input video. Because reference time-series data indicating the characteristics of the movement of the person region is generated in advance from video of the specific motion, the program can determine that the video corresponding to a section of the time-series data that is highly correlated with the reference time-series data is video of the specific motion.

The reference data generation program according to claim 4 causes a computer to function as shooting direction correction means, feature quantity calculation means, and time-series data calculation means, in order to generate the reference time-series data used by the specific motion determination device according to claim 1, that is, data indicating the characteristics of the movement of the person region in a video of a specific motion of a person.

With this configuration, the reference data generation program uses the shooting direction correction means to detect and correct, for each time-series frame image in the video of the specific motion of a person, the shift of the entire frame image at the time of shooting relative to the adjacent frame image, which is another frame image contiguous with that frame image. The feature quantity calculation means generates a difference image between the frame image corrected by the shooting direction correction means and the adjacent frame image, and calculates, as a plurality of feature quantities, the number of pixels in the difference region, the variance of the difference region, the moving speed of the centroid of the difference region, the number of difference pixels in the person region, and the inclination of the principal axis of the person region, each normalized by a predetermined criterion. The time-series data calculation means then arranges the feature quantities calculated by the feature quantity calculation means in time series for the video and applies principal component analysis to generate dimensionally compressed reference time-series data.

As a result, the reference data generation program can generate, from video of a specific motion, reference time-series data indicating the characteristics of the movement of the person region in that video.

The specific motion determination device, reference data generation device, specific motion determination program, and reference data generation program according to the present invention provide the following advantageous effects.

According to the invention of claim 1 or claim 3, video of a specific motion included in an input video can be determined on the basis of the time-series data of that input video. Unlike the prior art, there is no need to identify faces, and because the feature quantities are dimensionally compressed when the time-series data is generated, the processing load can be kept low. In addition, because there is no need to use additional information added to the video afterwards, such as the superimposed captions of the prior art, the invention can also be applied to input video that carries no such additional information. Finally, because whether the video shows the specific motion of a person is determined on the basis of time-series data indicating the characteristics of the movement of the person region in the input video, the determination can be made with high accuracy.

According to the invention of claim 2 or claim 4, reference time-series data can be generated from video of a specific motion.

Embodiments of the present invention are described below with reference to the drawings. The description covers a specific motion extraction system that includes the reference data generation device and the specific motion determination device of the present invention.

[Configuration of the specific motion extraction system]
First, the configuration of the specific motion extraction system S is described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of a specific motion extraction system that includes the reference data generation device and the specific motion determination device of the present invention.

The specific motion extraction system S generates reference time-series data indicating the characteristics of a person's movement during a specific motion, on the basis of specific motion videos, which are a plurality of past videos in which that motion was captured. Using the reference time-series data, it then determines, from an input video supplied from outside, the video of that motion included in the input video. The specific motion extraction system S comprises a reference data generation device 1, a reference time-series data storage device 3, and a specific motion determination device 5. The description here takes as an example the case in which the specific motion extraction system S extracts video (scenes) of a baseball pitching motion. However, provided the video is shot under conditions close to fixed shooting, in which the camera is held stationary, the system is applicable not only to pitching but also to other baseball motions, to motions in various other sports, and to motions outside sports.

The reference data generation device 1 generates, on the basis of specific motion video input from outside, reference time-series data indicating the characteristics of the person's movement during that motion. The generated reference time-series data is stored in the reference time-series data storage device 3.

The specific motion video input here is video of a specific motion of a person: for example, video from which only the section from the start to the end of the specific motion has been extracted by hand from previously shot footage. The reference time-series data indicates the characteristics of the person's movement in the specific motion video, and it serves as the reference against which the specific motion determination device 5, described later, determines whether or not video shows the specific motion.

The reference time-series data storage device 3 stores the reference time-series data generated by the reference data generation device 1 and is an ordinary storage means such as a hard disk. The reference time-series data is referenced and used by the specific motion determination device 5.

The specific motion determination device 5 determines, from an input video supplied from outside, the video of a specific motion of a person on the basis of the reference time-series data stored in the reference time-series data storage device 3. Information indicating the section of video determined to show the specific motion is output to the outside.

Here, the specific motion determination device 5 generates time-series data indicating the characteristics of the person's movement in the input video by the same method that the reference data generation device 1 uses to generate the reference time-series data. It then treats the reference time-series data stored in the reference time-series data storage device 3 as the learned result for the feature quantities of video of the specific motion, and determines that the video of any section whose time-series data is highly correlated with the reference time-series data is video of the specific motion. It is assumed that the reference time-series data is already stored in the reference time-series data storage device 3 when the specific motion determination device 5 evaluates the input video.

The detailed configurations of the reference data generation device 1 and the specific motion determination device 5 are described below.
(Configuration of the reference data generation device)
The configuration of the reference data generation device 1 is described first. The reference data generation device 1 comprises image storage means 11, shooting direction correction means 12, feature quantity calculation means 13, and time-series data calculation means 14.

The image storage means 11 is a memory that stores the specific motion video input from outside for use in various kinds of signal and image processing; for example, it records the specific motion video as digital data frame by frame. The frame image stored here is delayed by one frame and output to the shooting direction correction means 12 and the feature quantity calculation means 13.
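The one-frame delay can be pictured as pairing each incoming frame with the frame before it. The following is a small sketch of that idea; the generator name `frame_pairs` and the representation of a video as any Python iterable of frames are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")

def frame_pairs(frames: Iterable[Frame]) -> Iterator[Tuple[Frame, Frame]]:
    """Yield (current, previous) pairs, i.e. each frame together with the
    frame delayed by one. The first frame has no predecessor and is skipped."""
    previous = None
    have_previous = False
    for current in frames:
        if have_previous:
            yield current, previous
        previous = current
        have_previous = True
```

Each yielded pair corresponds to one current frame and its adjacent (one-frame-delayed) frame, which is the input needed by the later correction and differencing steps.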

The shooting direction correction means 12 corrects the image shift, caused by a shift in shooting direction, of the frame image of the externally supplied input signal (hereinafter, the current frame image Image_t) relative to the frame image delayed by one frame that is input from the image storage means 11 (the adjacent frame image Image_{t-1}). The corrected current frame image Image_t is output to the feature quantity calculation means 13.

In pitching video from a baseball broadcast, the camera's shooting direction often changes so little that the shot can be regarded as fixed. In practice, however, slight camera shake arises from the camera operator's hand tremor, vibration at the camera's mounting position, and similar causes. The feature quantity calculation means 13, described later, generates a difference image between the current frame image Image_t and the adjacent frame image Image_{t-1} and detects movement in the person region of the current frame image Image_t. If the current frame image Image_t is shifted relative to the adjacent frame image Image_{t-1} because of camera shake, accurate analysis of the movement of the person region becomes difficult. For this reason, the shooting direction correction means 12 calculates the motion vector of the current frame image Image_t relative to the adjacent frame image Image_{t-1} and corrects the position of the entire current frame image Image_t.

This motion vector can be calculated by, for example, the block matching method. With reference to FIG. 2 (and to FIG. 1 as appropriate), the following describes how the shooting direction correction means 12 calculates the motion vector by block matching and corrects the shift of the current frame image Image_t. FIG. 2 schematically shows an example of a current frame image and a rectangular region of interest within it: (a) shows an example of the current frame image, and (b) shows an example of the rectangular region of interest in the current frame image, indicated by B in (a).

First, the shooting direction correction means 12 sets a rectangular region of interest B in the current frame image Image_t shown in FIG. 2(a). Here, the shooting direction correction means 12 sets the rectangular region of interest B on the basis of a command from the operator. In the pitching scenes of a baseball broadcast the shooting direction is almost fixed, so characteristic background elements such as signboards always appear at the same position. Within one baseball broadcast, therefore, once the operator has specified the position and size of the region (the rectangular region of interest B), the same region can be used thereafter without change. It is preferable to specify a region that overlaps the players as little as possible. Alternatively, the shooting direction correction means 12 may set the rectangular region of interest B automatically, for example with a predetermined size at a predetermined position, without a command from the operator.

そして、図２（ｂ）に示すような注目矩形領域Ｂが設定されたとする。そうすると、撮影方向補正手段１２は、ブロックマッチングにより、画像記憶手段１１から入力された１フレーム遅延された隣接フレーム画像Ｉｍａｇｅ_ｔ−１から、この注目矩形領域Ｂの画像と一致する箇所を求めて、この注目矩形領域Ｂの動きベクトルを求める。ここで、動きベクトルが例えば（１，２）であった場合に、撮影方向補正手段１２は、現フレーム画像Ｉｍａｇｅ_ｔの全画素を（−１，−２）ずつ移動することで、現フレーム画像Ｉｍａｇｅ_ｔと隣接フレーム画像Ｉｍａｇｅ_ｔ−１との間のずれを補正することができる。ここで、撮影方向補正手段１２には、特定動作映像を構成する現フレーム画像Ｉｍａｇｅ_ｔと隣接フレーム画像Ｉｍａｇｅ_ｔ−１とが順次入力され、撮影方向補正手段１２は、前記の処理をフレームごとに行う。 Assume that a target rectangular area B as shown in FIG. 2(b) has been set. The shooting direction correction unit 12 then uses block matching to find the location in the adjacent frame image Image_t−1 (delayed by one frame and input from the image storage unit 11) that matches the image of the target rectangular area B, and obtains the motion vector of the target rectangular area B. If the motion vector is, for example, (1, 2), the shooting direction correction unit 12 can correct the shift between the current frame image Image_t and the adjacent frame image Image_t−1 by moving all pixels of the current frame image Image_t by (−1, −2). The current frame image Image_t and the adjacent frame image Image_t−1 constituting the specific motion video are sequentially input to the shooting direction correction unit 12, which performs the above processing for each frame.
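
The block matching and shift correction described above can be sketched as follows. This is a minimal sketch in plain Python, assuming grayscale frames stored as lists of rows, a sum-of-absolute-differences matching cost, and an exhaustive search over a small window. The names `shift`, `motion_vector`, `correct_shift`, and the `search` parameter are illustrative and do not come from the patent, which does not specify the matching cost or the search range.

```python
def shift(image, dx, dy, fill=0):
    """Move every pixel of `image` by (dx, dy); uncovered pixels get `fill`."""
    rows, cols = len(image), len(image[0])
    out = [[fill] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                out[ny][nx] = image[y][x]
    return out


def motion_vector(prev, cur, top, left, height, width, search=2):
    """Return (mx, my): how far block B moved from `prev` to `cur`.

    Block B is the rectangle of `cur` whose upper-left corner is (left, top).
    Assumption: the cost is the sum of absolute differences (SAD).
    """
    rows, cols = len(prev), len(prev[0])
    best = None
    for oy in range(-search, search + 1):
        for ox in range(-search, search + 1):
            # Skip candidate positions that fall outside the previous frame.
            if top + oy < 0 or left + ox < 0:
                continue
            if top + oy + height > rows or left + ox + width > cols:
                continue
            cost = sum(
                abs(cur[top + y][left + x] - prev[top + oy + y][left + ox + x])
                for y in range(height)
                for x in range(width)
            )
            if best is None or cost < best[0]:
                best = (cost, ox, oy)
    # The block was found at offset (ox, oy) in the previous frame, so it
    # moved by the opposite amount between the previous and current frame.
    return -best[1], -best[2]


def correct_shift(prev, cur, top, left, height, width, search=2):
    """Align `cur` to `prev` by shifting all pixels by the negated vector."""
    mx, my = motion_vector(prev, cur, top, left, height, width, search)
    return shift(cur, -mx, -my)
```

With a motion vector of (1, 2), `correct_shift` moves every pixel by (−1, −2), which matches the example in the text.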

図１に戻って説明を続ける。特徴量算出手段１３は、撮影方向補正手段１２から入力された現フレーム画像Ｉｍａｇｅ_ｔと、画像記憶手段１１から入力された隣接フレーム画像Ｉｍａｇｅ_ｔ−１とに基づいて、特定動作映像の人物の動きの特徴を示す特徴量を算出するものである。ここで算出された特徴量は、時系列データ算出手段１４に出力される。ここで、特徴量算出手段１３は、現フレーム画像Ｉｍａｇｅ_ｔと隣接フレーム画像Ｉｍａｇｅ_ｔ−１との差分画像を生成して、この画像における特徴量を算出する。 Returning to FIG. 1, the description continues. The feature amount calculation unit 13 calculates feature amounts indicating the characteristics of the person's motion in the specific motion video, based on the current frame image Image_t input from the shooting direction correction unit 12 and the adjacent frame image Image_t−1 input from the image storage unit 11. The calculated feature amounts are output to the time-series data calculation unit 14. The feature amount calculation unit 13 generates a difference image between the current frame image Image_t and the adjacent frame image Image_t−1 and calculates the feature amounts from this image.

ここでは、特徴量算出手段１３は、現フレーム画像Ｉｍａｇｅ_ｔと隣接フレーム画像Ｉｍａｇｅ_ｔ−１との差分を示す２値化画像を生成して、特徴量を算出することとした。以下、特徴量算出手段１３が差分を示す２値化画像を生成して、特徴量を算出する方法について説明する。 Here, the feature amount calculation means 13 generates a binarized image indicating the difference between the current frame image Image _t and the adjacent frame image Image _t−1 and calculates the feature amount. Hereinafter, a method in which the feature amount calculation unit 13 generates a binarized image indicating the difference and calculates the feature amount will be described.

ここで、特徴量算出手段１３による差分算出時の画素値は、輝度であってもよいし、赤、青、緑などの特定の色としてもよい。そして、特徴量算出手段１３は、算出された差分の絶対値が閾値以上である領域の画素値を「１」、それ以外の領域の画素値を「０」とした２値化画像を生成する。これによって、現フレーム画像Ｉｍａｇｅ_ｔと隣接フレーム画像Ｉｍａｇｅ_ｔ−１との間で動きが生じた領域（例えば、投球時の腕）のみを抽出することができる。 Here, the pixel value at the time of calculating the difference by the feature amount calculating unit 13 may be luminance or a specific color such as red, blue, or green. Then, the feature amount calculating unit 13 generates a binarized image in which the pixel value of the region where the absolute value of the calculated difference is equal to or greater than the threshold is “1” and the pixel value of the other region is “0”. . As a result, it is possible to extract only an area (for example, an arm at the time of pitching) in which movement has occurred between the current frame image Image _t and the adjacent frame image Image _t-1 .
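
The thresholding step above can be illustrated with a short sketch. It assumes single-channel frames (for example, luminance) stored as lists of rows; the function name `binary_difference` and the threshold value used in the usage note are illustrative, since the patent does not give a concrete threshold.

```python
def binary_difference(cur, prev, threshold):
    """Return a 0/1 image: 1 where |cur - prev| >= threshold, else 0."""
    return [
        [1 if abs(c - p) >= threshold else 0 for c, p in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(cur, prev)
    ]
```

For example, `binary_difference([[10, 40], [12, 10]], [[10, 10], [10, 10]], 20)` marks only the pixel whose value changed by 30.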

更に、特徴量算出手段１３は、得られた差分画像（２値化画像）から、その画像における特徴量を算出する。ここで、野球の映像では、投球ごとに撮影方向が若干変化するため、差分が生じ、動きが生じた領域として抽出された領域（以下、差分領域と言う）の画像座標をそのまま特徴量とすることができない。例えば、ホームベースの重心の画像座標は、撮影方向が変われば変化する。そこで、特徴量算出手段１３は、撮影方向の変化を考慮して、撮影方向の変化の影響を排除した下記の特徴量を算出することとした。なお、これらの特徴量は一例であり、特徴量算出手段１３は、人物の動きの特徴を示す他の特徴量を算出することとしてもよい。 Further, the feature amount calculation unit 13 calculates feature amounts from the obtained difference image (binarized image). In baseball video, the shooting direction changes slightly from pitch to pitch, so the image coordinates of the area in which a difference arises and which is extracted as an area in which motion has occurred (hereinafter referred to as the difference area) cannot be used directly as a feature amount. For example, the image coordinates of the centroid of home plate change when the shooting direction changes. The feature amount calculation unit 13 therefore calculates the following feature amounts, which are chosen to exclude the influence of changes in the shooting direction. Note that these feature amounts are merely examples, and the feature amount calculation unit 13 may calculate other feature amounts indicating the characteristics of the person's motion.

ここでは、特徴量算出手段１３は、差分画像から、差分領域の画素数（以下、差分画素数と言う）、差分領域の分散、差分領域の重心の移動速度（以下、重心速度と言う）、投手領域の差分画素数、投手領域の主軸の傾きを特徴量として算出することとした。ここで、図３を参照して、特徴量算出手段１３が各特徴量を算出する方法について説明する。図３は、特徴量算出手段によって生成された差分画像を模式的に示す模式図である。なお、図３では、差分領域を白で示した。 Here, the feature amount calculation means 13 calculates, from the difference image, the number of pixels in the difference area (hereinafter referred to as the difference pixel number), the variance of the difference area, the moving speed of the centroid of the difference area (hereinafter referred to as the centroid speed), The number of difference pixels in the pitcher area and the inclination of the main axis of the pitcher area are calculated as feature amounts. Here, with reference to FIG. 3, a method by which the feature amount calculation unit 13 calculates each feature amount will be described. FIG. 3 is a schematic diagram schematically showing the difference image generated by the feature amount calculation means. In FIG. 3, the difference area is shown in white.

＜差分画素数＞ <Number of difference pixels>
差分画素数は、差分画像全体における差分領域の画素の総数である。ここで、特徴量算出手段１３は、差分画像の画素値が「１」である画素の総数を求める。 The number of difference pixels is the total number of pixels in the difference area over the entire difference image. Here, the feature amount calculation unit 13 obtains the total number of pixels whose value in the difference image is "1".

＜差分領域の分散＞ <Variance of the difference area>
差分領域の分散は、差分領域の重心Ｃからの差分の分散である。ここで、特徴量算出手段１３は、水平方向の分散と、垂直方向の分散とを算出することとした。 The variance of the difference area is the variance of the difference pixels about the centroid C of the difference area. Here, the feature amount calculation unit 13 calculates the variance in the horizontal direction and the variance in the vertical direction.
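
The first two feature amounts can be sketched as below. This is an illustrative sketch, assuming the difference image is a list of rows of 0/1 values and that the variance is the population variance of the pixel coordinates about the centroid C; the function names are hypothetical.

```python
def difference_pixels(diff):
    """List the (x, y) coordinates of all pixels whose value is 1."""
    return [(x, y) for y, row in enumerate(diff) for x, v in enumerate(row) if v == 1]


def centroid(points):
    """Centroid C of the difference area (assumes at least one pixel)."""
    n = len(points)
    return sum(x for x, _ in points) / n, sum(y for _, y in points) / n


def variance(points):
    """Horizontal and vertical variance of the pixels about the centroid."""
    cx, cy = centroid(points)
    n = len(points)
    var_x = sum((x - cx) ** 2 for x, _ in points) / n
    var_y = sum((y - cy) ** 2 for _, y in points) / n
    return var_x, var_y
```

The number of difference pixels is then simply `len(difference_pixels(diff))`.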

＜重心速度＞ <Centroid velocity>
重心速度は、差分領域の重心Ｃの、前フレームの差分領域の重心Ｃからの移動量である。ここで、特徴量算出手段１３は、重心Ｃの水平方向の移動量と垂直方向の移動量とを算出することとした。 The centroid velocity is the displacement of the centroid C of the difference area from the centroid C of the difference area in the previous frame. Here, the feature amount calculation unit 13 calculates the horizontal displacement and the vertical displacement of the centroid C.

＜投手領域の差分画素数＞ <Number of difference pixels in the pitcher area>
投手領域の差分画素数は、例えば図３に示すように、差分領域の重心Ｃから水平及び垂直方向にそれぞれ一定画素数（例えば、σを標準偏差として２σ）以内の領域を投手領域Ｑとした時の、投手領域Ｑ内の差分領域の画素数である。更に、ここでは、特徴量算出手段１３は、投手領域Ｑを４等分して、左上領域Ｑ１、右上領域Ｑ２、左下領域Ｑ３及び右下領域Ｑ４の各々についての差分画素数も求めることとした。 The number of difference pixels in the pitcher area is, as shown for example in FIG. 3, the number of pixels of the difference area within the pitcher area Q, where the pitcher area Q is the area within a fixed number of pixels (for example, 2σ, with σ being the standard deviation) of the centroid C of the difference area in both the horizontal and vertical directions. Furthermore, the feature amount calculation unit 13 here divides the pitcher area Q into four equal parts and also obtains the number of difference pixels for each of the upper-left area Q1, the upper-right area Q2, the lower-left area Q3, and the lower-right area Q4.
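
A possible way to count the difference pixels in the pitcher area Q and its four quadrants is sketched below. It assumes that σ is computed separately for the horizontal and vertical coordinates of the difference pixels, that the quadrants are split at the centroid C, and that image y coordinates grow downward; the patent does not state these details, and the function name and the `k` parameter are illustrative.

```python
import math


def pitcher_region_counts(points, k=2.0):
    """Count difference pixels within k*sigma of the centroid, per quadrant.

    `points` is a non-empty list of (x, y) difference-pixel coordinates.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sx = math.sqrt(sum((x - cx) ** 2 for x, _ in points) / n)
    sy = math.sqrt(sum((y - cy) ** 2 for _, y in points) / n)
    # (is_left, is_upper) -> quadrant label
    quadrant = {
        (True, True): "Q1",    # upper left
        (False, True): "Q2",   # upper right
        (True, False): "Q3",   # lower left
        (False, False): "Q4",  # lower right
    }
    counts = {"Q": 0, "Q1": 0, "Q2": 0, "Q3": 0, "Q4": 0}
    for x, y in points:
        if abs(x - cx) <= k * sx and abs(y - cy) <= k * sy:
            counts["Q"] += 1
            counts[quadrant[(x < cx, y < cy)]] += 1
    return counts
```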

＜投手領域の主軸の傾き＞ <Inclination of the principal axis of the pitcher area>
投手領域の主軸の傾きは、投手領域Ｒ内の差分領域の主軸Ｍの傾きである。ここで、特徴量算出手段１３は、投手領域Ｒ内の画像の２次モーメントを求めることで、投手領域の主軸の傾きを算出することができる。ここで、画像Ｉ（ｘ，ｙ）のｐ＋ｑ次モーメントＭ_ｐｑは、以下の式（１）で表される。そして、特徴量算出手段１３は、投手領域の主軸の傾きθを以下の式（２）によって算出することができる。 The inclination of the principal axis of the pitcher area is the inclination of the principal axis M of the difference area within the pitcher area R. The feature amount calculation unit 13 can calculate this inclination by obtaining the second-order moments of the image within the pitcher area R. The (p+q)-th order moment M_pq of an image I(x, y) is expressed by the following equation (1), and the feature amount calculation unit 13 can calculate the inclination θ of the principal axis of the pitcher area by the following equation (2).
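
Equations (1) and (2) are not reproduced in this text. The sketch below therefore uses the standard image-moment definitions as an assumption: M_pq = Σ_x Σ_y x^p y^q I(x, y), with the principal-axis angle taken from the second-order central moments as θ = ½·atan2(2μ11, μ20 − μ02). The function name is illustrative, and the patent's own equation (2) may be written differently.

```python
import math


def principal_axis_angle(points):
    """Angle (radians) of the principal axis of a set of difference pixels.

    `points` is a non-empty list of (x, y) coordinates where I(x, y) = 1,
    so each raw moment M_pq reduces to a sum of x**p * y**q over the points.
    """
    m00 = len(points)
    cx = sum(x for x, _ in points) / m00  # M_10 / M_00
    cy = sum(y for _, y in points) / m00  # M_01 / M_00
    # Second-order central moments.
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
```

Using `atan2` rather than a plain arctangent avoids a division by zero when μ20 equals μ02.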

更に、特徴量算出手段１３は、以上のようにして算出した各値を、０〜１の範囲で正規化して、差分画像の特徴量とする。例えば、特徴量算出手段１３は、正規化後の差分画素数ｓ＿ｃｎｔを、以下の式（３）によって算出することができる。ここで、ＮＩは、差分画像の全画素数、ＮＳは、差分画素数である。 Furthermore, the feature amount calculation unit 13 normalizes each value calculated as described above to the range 0 to 1 and uses the results as the feature amounts of the difference image. For example, the feature amount calculation unit 13 can calculate the normalized number of difference pixels s_cnt by the following equation (3), where NI is the total number of pixels in the difference image and NS is the number of difference pixels.
ｓ＿ｃｎｔ＝ＮＳ／ＮＩ …（３） s_cnt = NS / NI … (3)

図１に戻って説明を続ける。時系列データ算出手段１４は、特徴量算出手段１３によって算出された特定動作映像ごとの特徴量を時系列に並べて次元圧縮するものである。ここで次元圧縮されて生成されたリファレンス時系列データは、リファレンス時系列データ記憶装置３に記憶される。 Returning to FIG. 1, the description will be continued. The time series data calculation means 14 arranges the feature quantities for each specific motion video calculated by the feature quantity calculation means 13 in time series and performs dimension compression. Here, the reference time series data generated by the dimension compression is stored in the reference time series data storage device 3.

ここで、投球動作は３秒程度で完了することが多い。例えば、３秒間の特定動作映像である場合には、この特定動作映像は９０フレーム（３［秒］×３０［フレーム／秒］）となり、フレームごとに処理した場合に、次元ごとに９０個の特徴量が時系列に並ぶこととなる。ただし、特徴量算出手段１３によって算出された特徴量をすべて用いると次元数が高く（ここでは１１次元）、処理負荷が大きい。そこで、時系列データ算出手段１４は、特徴量について主成分分析を施し、次元を圧縮する。なお、圧縮後の次元数は、２〜５次元程度が好ましい。以下、時系列データ算出手段１４による次元圧縮について説明する。 Here, the pitching operation is often completed in about 3 seconds. For example, in the case of a specific action video for 3 seconds, the specific action video is 90 frames (3 [seconds] × 30 [frames / second]). The feature amounts are arranged in time series. However, if all the feature amounts calculated by the feature amount calculation means 13 are used, the number of dimensions is high (here, 11 dimensions) and the processing load is large. Therefore, the time-series data calculation unit 14 performs principal component analysis on the feature amount and compresses the dimension. The number of dimensions after compression is preferably about 2 to 5 dimensions. Hereinafter, dimension compression by the time-series data calculating unit 14 will be described.

時系列データ算出手段１４は、以下の式（４）に示すような特徴量の行列Ａを生成する。ここで、ａ_ｔｊ（ｔ＝０〜Ｔ，ｊ＝１〜Ｎ）は、時刻ｔにおけるｊ番目の特徴量であり、行列Ａの成分である。また、Ｔは映像の時間長、Ｎは特徴量の種類の数（次元数）を示す。 The time-series data calculation unit 14 generates a feature quantity matrix A as shown in the following equation (4). Here, a _tj (t = 0 to T, j = 1 to N) is the j-th feature amount at time t and is a component of the matrix A. T represents the time length of the video, and N represents the number of types of feature values (number of dimensions).

時系列データ算出手段１４は、この行列Ａについて主成分分析を施すことで、以下の式（５）に示す固有値Ｐ_０〜Ｐ_Ｔと、固有ベクトルＶ_０〜Ｖ_Ｔが求められる。なお、式（５）では、固有値Ｐ_０〜Ｐ_Ｔの値が大きい順に上から並べた。ここで、ｖ_ｉｊ（ｉ＝０〜Ｔ，ｊ＝１〜Ｎ）は、固有ベクトルＶ_ｔの成分である。そして、時系列データ算出手段１４は、一定値以上の固有値Ｐ_０〜Ｐ_Ｋ（Ｋ＜Ｔ）を持つ主成分のみを利用することにより、次元数をＫ個に圧縮できる。 By performing principal component analysis on this matrix A, the time-series data calculation unit 14 obtains the eigenvalues P_0 to P_T and the eigenvectors V_0 to V_T shown in the following equation (5). In equation (5), the eigenvalues P_0 to P_T are arranged from the top in descending order. Here, v_ij (i = 0 to T, j = 1 to N) is a component of the eigenvector V_t. By using only the principal components whose eigenvalues P_0 to P_K (K < T) are equal to or greater than a fixed value, the time-series data calculation unit 14 can compress the number of dimensions to K.

更に、時系列データ算出手段１４は、フレームごとに固有ベクトルと特徴量との内積を算出することで、主成分得点を算出し、この主成分得点を時系列に並べたものをリファレンス時系列データとする。リファレンス時系列データの例を、図４に示す。なお、ここでは、次元圧縮後の次元数を３とし、図４において、３次元の主成分得点の時系列のデータｓ１〜ｓ３から構成されるリファレンス時系列データの例を示している。図４は、時系列データ算出手段によって算出されたリファレンス時系列データの例を示すグラフである。図４において、横軸をフレーム数（時刻）、縦軸を主成分得点とした。 Furthermore, the time-series data calculation unit 14 calculates principal component scores by computing, for each frame, the inner product of each eigenvector and the feature amounts, and uses these principal component scores arranged in time series as the reference time-series data. An example of the reference time-series data is shown in FIG. 4. Here, the number of dimensions after compression is 3, and FIG. 4 shows an example of reference time-series data composed of the time series s1 to s3 of three-dimensional principal component scores. FIG. 4 is a graph showing an example of the reference time-series data calculated by the time-series data calculation unit. In FIG. 4, the horizontal axis is the frame number (time) and the vertical axis is the principal component score.
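
The dimension compression and the principal component scores can be sketched as follows. This is a minimal sketch under stated assumptions: the feature matrix A has one row per frame and one column per feature, the features are mean-centered before projection (the patent text only mentions an inner product), and the eigenvectors of the covariance matrix are found by power iteration with deflation so that no external library is needed. All function names are illustrative.

```python
import math


def covariance_matrix(rows):
    """Return (covariance, centered rows) for a T x N feature matrix."""
    count, dims = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / count for j in range(dims)]
    centered = [[row[j] - means[j] for j in range(dims)] for row in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / count for j in range(dims)]
        for i in range(dims)
    ]
    return cov, centered


def top_eigenpairs(matrix, k, iterations=500):
    """Power iteration with deflation: the k largest (eigenvalue, vector) pairs."""
    dims = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        vec = [1.0 / (i + 1) for i in range(dims)]  # fixed, non-uniform start
        for _step in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
            norm = math.sqrt(sum(v * v for v in nxt))
            if norm < 1e-12:  # nothing left to extract
                break
            vec = [v / norm for v in nxt]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
        # Rayleigh quotient gives the eigenvalue for the unit vector `vec`.
        value = sum(vec[i] * work[i][j] * vec[j] for i in range(dims) for j in range(dims))
        pairs.append((value, vec))
        # Deflate so the next pass converges to the next component.
        for i in range(dims):
            for j in range(dims):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def principal_component_scores(rows, k):
    """Return (eigenpairs, scores); scores[t][x] is component x at frame t."""
    cov, centered = covariance_matrix(rows)
    pairs = top_eigenpairs(cov, k)
    scores = [
        [sum(c[j] * vec[j] for j in range(len(c))) for _, vec in pairs]
        for c in centered
    ]
    return pairs, scores
```

Reading one column of `scores` across all frames gives one curve of the kind plotted as s1 to s3 in FIG. 4.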

以上によって、リファレンスデータ生成装置１は、外部から入力された特定動作映像から、後記する特定動作判定装置５において特定の動作の映像であるか否かを判定する基準となるリファレンス時系列データを生成することができる。 As described above, the reference data generation device 1 generates reference time-series data as a reference for determining whether or not the video is a specific motion in the specific motion determination device 5 to be described later, from the specific motion video input from the outside. can do.

（特定動作判定装置の構成） (Configuration of the specific motion determination device)
次に、特定動作判定装置５の構成について説明する。特定動作判定装置５は、画像記憶手段５１と、撮影方向補正手段５２と、特徴量算出手段５３と、時系列データ算出手段５４と、入力時系列データ記憶手段５５と、動作判定手段５６とを備える。 Next, the configuration of the specific motion determination device 5 will be described. The specific motion determination device 5 includes an image storage unit 51, a shooting direction correction unit 52, a feature amount calculation unit 53, a time-series data calculation unit 54, an input time-series data storage unit 55, and a motion determination unit 56.

画像記憶手段５１は、外部から入力された入力映像を記憶して、各種の信号／画像処理をするためのメモリであり、例えば、入力映像を１フレーム単位でデジタルデータとして記録するものである。ここで記憶されるフレーム画像は１フレーム分遅延されて、撮影方向補正手段５２及び特徴量算出手段５３に出力される。 The image storage means 51 is a memory for storing input video input from the outside and performing various signal / image processing. For example, the input video is recorded as digital data in units of one frame. The frame image stored here is delayed by one frame and output to the shooting direction correction unit 52 and the feature amount calculation unit 53.

撮影方向補正手段５２は、外部から入力された入力信号のフレーム画像（以下、現フレーム画像Ｉｍａｇｅ_ｔと言う）の、画像記憶手段５１から入力された１フレーム遅延されたフレーム画像（隣接フレーム画像Ｉｍａｇｅ_ｔ−１）に対する撮影方向のずれによる画像のずれを補正するものである。ここで補正された現フレーム画像Ｉｍａｇｅ_ｔは、特徴量算出手段５３に出力される。なお、この撮影方向補正手段５２は、リファレンスデータ生成装置１の撮影方向補正手段１２と比べて、入力される映像が、例えば、野球中継の映像のような、特定の動作の映像を含む入力映像となり、出力先が特徴量算出手段５３となっただけで、機能は同一である。 The shooting direction correction unit 52 corrects the image shift caused by a shift in shooting direction of the frame image of the externally input signal (hereinafter referred to as the current frame image Image_t) relative to the frame image delayed by one frame and input from the image storage unit 51 (the adjacent frame image Image_t−1). The corrected current frame image Image_t is output to the feature amount calculation unit 53. The shooting direction correction unit 52 has the same function as the shooting direction correction unit 12 of the reference data generation device 1; the only differences are that its input is an input video containing video of the specific motion, such as a baseball broadcast, and that its output destination is the feature amount calculation unit 53.

特徴量算出手段５３は、撮影方向補正手段５２から入力された現フレーム画像Ｉｍａｇｅ_ｔと、画像記憶手段５１から入力された隣接フレーム画像Ｉｍａｇｅ_ｔ−１とに基づいて、入力映像の人物の動きの特徴を示す特徴量を算出するものである。ここで算出された特徴量は、時系列データ算出手段５４に出力される。なお、この特徴量算出手段５３は、リファレンスデータ生成装置１の特徴量算出手段１３と比べて、入力される映像が、例えば、野球中継の映像のような、特定の動作の映像を含む入力映像となり、出力先が時系列データ算出手段５４となっただけで、算出される特徴量の種類や算出方法は同一である。 Based on the current frame image Image _t input from the shooting direction correction unit 52 and the adjacent frame image Image _t−1 input from the image storage unit 51, the feature amount calculation unit 53 determines the movement of the person in the input video. A feature amount indicating a feature is calculated. The feature amount calculated here is output to the time-series data calculation unit 54. Note that the feature amount calculation means 53 is compared with the feature amount calculation means 13 of the reference data generation device 1 in that the input video includes a video of a specific operation such as a baseball broadcast video, for example. Thus, the type and the calculation method of the feature amount to be calculated are the same only when the output destination is the time-series data calculation unit 54.

時系列データ算出手段５４は、特徴量算出手段５３によって算出された入力映像ごとの特徴量を時系列に並べて次元圧縮するものである。ここで次元圧縮されて生成された時系列データは、入力時系列データ記憶手段５５に記憶される。なお、この時系列データ算出手段５４は、リファレンスデータ生成装置１の時系列データ算出手段１４と比べて、入力される特徴量が入力映像の特徴量となり、生成される時系列データが入力時系列データ記憶手段５５に記憶されることとしただけで、次元圧縮の方法や圧縮後の次元数は同一である。 The time series data calculation unit 54 arranges the feature amounts for each input video calculated by the feature amount calculation unit 53 in time series and performs dimension compression. Here, the time series data generated by dimension compression is stored in the input time series data storage means 55. Note that the time series data calculation means 54 is characterized in that the input feature quantity is the feature quantity of the input video and the generated time series data is the input time series as compared to the time series data calculation means 14 of the reference data generation apparatus 1. The dimensional compression method and the number of dimensions after compression are the same only by storing in the data storage means 55.

ここで、特定動作判定装置５が、入力映像から人物の特定の動作の映像をリアルタイムで判定する場合には、時系列データ算出手段５４は、特徴量算出手段５３から１フレーム分の特徴量が入力されるたびに、このフレームから所定数（例えば、数秒間に相当するフレーム数）のフレームだけ遡った特徴量を時系列に並べて次元圧縮して、時系列データを生成することとしてもよい。 Here, when the specific motion determination device 5 determines the video of the specific motion of the person from the input video in real time, the time-series data calculation unit 54 receives the feature amount for one frame from the feature amount calculation unit 53. Each time an input is made, time series data may be generated by dimensionally compressing feature quantities that are traced back by a predetermined number (for example, the number of frames corresponding to several seconds) from this frame in time series.

入力時系列データ記憶手段５５は、時系列データ算出手段５４によって生成された時系列データを記憶するもので、ハードディスク等の一般的な記憶手段である。この時系列データは、動作判定手段５６によって参照されて用いられる。 The input time series data storage means 55 stores the time series data generated by the time series data calculation means 54 and is a general storage means such as a hard disk. This time series data is referred to and used by the operation determination means 56.

動作判定手段５６は、リファレンス時系列データ記憶装置３に記憶されたリファレンス時系列データと、入力時系列データ記憶手段５５に記憶された時系列データとの相関係数を算出し、入力映像に含まれる特定の動作（投球）の映像を判定して検出するものである。ここで特定の動作の映像と判定された映像の区間を示す情報は、外部に出力される。 The operation determination means 56 calculates a correlation coefficient between the reference time series data stored in the reference time series data storage device 3 and the time series data stored in the input time series data storage means 55, and is included in the input video. The video of a specific action (throwing) is determined and detected. Here, information indicating the section of the video determined as the video of the specific operation is output to the outside.

ここで、動作判定手段５６は、ある主成分ｘについての相関係数ｒ_ｘを、以下の式（６）によって算出することができる。ここで、Ｉ_ｔは、時刻ｔにおける時系列データの主成分ｘの主成分得点、Ｒ_ｔは、時刻ｔにおけるリファレンス時系列データの主成分ｘの主成分得点である。 Here, the motion determination unit 56 can calculate the correlation coefficient r_x for a given principal component x by the following equation (6), where I_t is the principal component score of principal component x in the time-series data at time t, and R_t is the principal component score of principal component x in the reference time-series data at time t.

そして、動作判定手段５６は、すべての主成分について相関係数を算出し、固有値に応じた重みを掛けて足し合わせることで、最終的な相関係数を算出する。ここで、動作判定手段５６は、以下の式（７）によって最終的な相関係数Ｒを算出することができる。なお、ｗ_ｘは主成分ｘの重み、Ｐ_ｘは固有値を示す。 The motion determination unit 56 then calculates the correlation coefficient for every principal component, multiplies each by a weight corresponding to its eigenvalue, and sums the results to obtain the final correlation coefficient. The motion determination unit 56 can calculate the final correlation coefficient R by the following equation (7), where w_x is the weight of principal component x and P_x is its eigenvalue.
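
Equations (6) and (7) are not reproduced in this text, so the sketch below rests on two assumptions: that r_x is the ordinary Pearson correlation coefficient between the input scores I_t and the reference scores R_t, and that each weight is the eigenvalue share w_x = P_x / ΣP. The function names are illustrative, and the patent's actual formulas may differ.

```python
import math


def correlation(inp, ref):
    """Pearson correlation between two equally long score sequences."""
    n = len(inp)
    mean_i = sum(inp) / n
    mean_r = sum(ref) / n
    cov = sum((a - mean_i) * (b - mean_r) for a, b in zip(inp, ref))
    var_i = sum((a - mean_i) ** 2 for a in inp)
    var_r = sum((b - mean_r) ** 2 for b in ref)
    if var_i == 0 or var_r == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / math.sqrt(var_i * var_r)


def weighted_correlation(inp_components, ref_components, eigenvalues):
    """Combine per-component correlations with eigenvalue-proportional weights."""
    total = sum(eigenvalues)
    return sum(
        (p / total) * correlation(i, r)
        for i, r, p in zip(inp_components, ref_components, eigenvalues)
    )
```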

動作判定手段５６は、この相関係数Ｒを、時系列データに対して１フレームずつずらして時系列データ全体の区間に対して算出する。なお、特定動作判定装置５が、入力映像から人物の特定の動作の映像をリアルタイムで判定する場合には、時系列データ算出手段５４によって１フレームずつ開始時刻をずらした所定数のフレーム分の時系列データが生成されるため、動作判定手段５６は、各々の時系列データについて、式（６）によって相関係数ｒ_ｘを算出すればよい。そして、動作判定手段５６が、時系列データにおいて、相関係数が所定の閾値を越えた区間に対応する入力映像の一部の映像を、特定の動作の映像と判定し、この映像の区間を示す時刻（フレームの番号）の情報を出力する。 The operation determination unit 56 calculates the correlation coefficient R for the entire time series data section by shifting the correlation coefficient R by one frame from the time series data. When the specific motion determination device 5 determines the video of the specific motion of the person from the input video in real time, the time corresponding to a predetermined number of frames in which the start time is shifted frame by frame by the time-series data calculation unit 54. Since the series data is generated, the operation determination unit 56 may calculate the correlation coefficient r _x with respect to each time series data by the equation (6). Then, the motion determination means 56 determines a part of the input video corresponding to the section where the correlation coefficient exceeds a predetermined threshold in the time series data as the video of the specific motion, and the section of the video is Outputs information about the indicated time (frame number).
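
The frame-by-frame scan described above can be sketched as a sliding window. For brevity this sketch uses a single principal component and a Pearson correlation, both of which are simplifying assumptions; the eigenvalue weighting over several components is omitted, and the function names and the threshold are illustrative.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either sequence is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def detect_sections(series, reference, threshold):
    """Return the start frames whose window correlates above `threshold`.

    The window has the same length as `reference` and is moved one frame
    at a time over the whole input time series.
    """
    width = len(reference)
    starts = []
    for start in range(len(series) - width + 1):
        window = series[start:start + width]
        if pearson(window, reference) > threshold:
            starts.append(start)
    return starts
```

Each returned start frame, together with the reference length, identifies a section of the input video that would be reported as the specific motion.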

これによって、特定動作判定装置５は、リファレンス時系列データ記憶装置３に記憶されたリファレンス時系列データに基づいて、入力映像から特定の動作の映像を判定することができる。 As a result, the specific operation determination device 5 can determine the video of the specific operation from the input video based on the reference time-series data stored in the reference time-series data storage device 3.

以上、特定動作抽出システムＳの構成について説明したが、本発明はこれに限定されるものではない。例えば、ここでは、野球の投球のシーンを判定する場合を例に挙げて説明したが、本発明のリファレンスデータ生成装置１及び特定動作判定装置５は、野球に限らず、様々な動作に適用可能である。例えば、特定動作抽出システムＳは、サッカのコーナ付近を撮影したカメラ映像を解析することで、コーナキックが行われたか否かを判定することができる。また、例えば、セキュリティの分野においても、防犯カメラ等の映像に適用することで、異常行動の判定などを行うことができる。 The configuration of the specific motion extraction system S has been described above, but the present invention is not limited to this. For example, although the determination of a baseball pitching scene has been used as an example here, the reference data generation device 1 and the specific motion determination device 5 of the present invention are applicable not only to baseball but to various motions. For example, the specific motion extraction system S can determine whether a corner kick has been taken by analyzing camera video of the area around the corner of a soccer field. In the field of security as well, applying the system to video from security cameras and the like makes it possible to determine abnormal behavior.

更に、特定動作抽出システムＳによって判定された映像に対してメタデータを付与することで、メタデータの生成を自動で行うことができる。また、得られた情報を、野球中継などで提示して、視聴者への状況理解を促進することも可能である。特に、テレビ放送やデータ放送や、インタネットなどに向けた映像制作を支援することに貢献できる。 Furthermore, by adding metadata to the video determined by the specific action extraction system S, it is possible to automatically generate metadata. It is also possible to promote the understanding of the situation to the viewer by presenting the obtained information on a baseball game. In particular, it can contribute to supporting video production for television broadcasting, data broadcasting, and the Internet.

なお、リファレンスデータ生成装置１が、特定の動作を行う人物（投手）に対応付けられた特定動作映像を、外部から順次入力することとして、この人物ごとにリファレンス時系列データを生成することとしてもよい。このとき、リファレンス時系列データ記憶装置３は、人物ごとのリファレンス時系列データを、人物を識別する識別情報（例えば、人名や投手名）に対応付けて記憶する。更に、特定動作判定装置５の動作判定手段５６は、生成された時系列データと、それぞれの人物に対応するリファレンス時系列データとの相関を分析する。そして、動作判定手段５６は、ある人物の識別情報が対応付けられたリファレンス時系列データと相関の高い時系列データに対応する映像を、当該人物の特定の動作の映像と判定することができる。このように、特定動作判定装置５は、人物を識別して特定の動作の映像を判定でき、人物の識別情報とともに、特定の動作の映像と判定された投球の映像の時刻の情報を出力することができる。 Note that the reference data generation device 1 may generate reference time-series data for each person by sequentially inputting a specific action video associated with a person (pitcher) performing a specific action from the outside. Good. At this time, the reference time-series data storage device 3 stores the reference time-series data for each person in association with identification information (for example, a person name or pitcher name) for identifying the person. Further, the action determination unit 56 of the specific action determination device 5 analyzes the correlation between the generated time series data and the reference time series data corresponding to each person. Then, the motion determination unit 56 can determine a video corresponding to time-series data having a high correlation with reference time-series data associated with identification information of a certain person as a video of a specific motion of the person. As described above, the specific motion determination device 5 can identify a person and determine a video of a specific motion, and outputs information on the time of the video of the pitch determined to be a video of the specific motion together with the identification information of the person. be able to.

また、リファレンスデータ生成装置１が、リファレンス時系列データを生成する際に、例えば、操作者によって指定された特定動作映像の投球のタイミングの時刻の情報を当該リファレンス時系列データに対応付けてリファレンス時系列データ記憶装置３に記憶することとしてもよい。このとき、特定動作判定装置５の動作判定手段５６は、あるリファレンス時系列データと相関の高い時系列データに対応する映像において、当該リファレンス時系列データに対応付けられた投球タイミングの時刻に相当する時刻をこの映像における投球タイミングとすることで、特定動作判定装置５は、特定の動作の映像と判定した映像の投球のタイミングも判定して出力することができる。 Further, when the reference data generation device 1 generates the reference time-series data, for example, the reference time-series data is associated with the time information of the timing of pitching of the specific motion video specified by the operator in the reference time-series data. It may be stored in the series data storage device 3. At this time, the motion determination means 56 of the specific motion determination device 5 corresponds to the time of the pitching timing associated with the reference time-series data in the video corresponding to the time-series data having a high correlation with certain reference time-series data. By setting the time as the pitching timing in the video, the specific motion determination device 5 can also determine and output the pitching timing of the video determined as the video of the specific motion.

更に、リファレンスデータ生成装置１が、リファレンス時系列データを生成する際に、投球のタイミングの時刻の情報に加えて、特徴量算出手段１３によって算出された、当該時刻に対応する人物領域の位置（重心）の情報もまた当該リファレンス時系列データに対応付けてリファレンス時系列データ記憶装置３に記憶することとしてもよい。このとき、特定動作判定装置５の動作判定手段５６は、あるリファレンス時系列データと相関の高い時系列データに対応する映像において、当該リファレンス時系列データに対応付けられた投球タイミングの時刻をこの映像における投球タイミングとするとともに、当該時刻における人物（投手）のフレーム画像内における位置も出力することができる。 Furthermore, when generating the reference time-series data, the reference data generation device 1 may store in the reference time-series data storage device 3, in association with the reference time-series data, not only the time of the pitch timing but also the position (centroid) of the person area at that time, as calculated by the feature amount calculation unit 13. In this case, for a video corresponding to time-series data that correlates highly with some reference time-series data, the motion determination unit 56 of the specific motion determination device 5 can take the pitch-timing time associated with that reference time-series data as the pitch timing in the video, and can also output the position of the person (pitcher) within the frame image at that time.

そして、この特定動作判定装置５によって特定の動作の映像と判定された映像の区間を示す時刻の情報と、投球タイミングの時刻の情報と、人物の位置の情報とを、同一の入力映像を入力して投球の映像からボールを抽出・追跡して投球軌跡を示す軌跡画像を生成する投球軌跡作画装置（例えば、特開２００５−１２３８２４号公報参照、図示せず）に出力することとしてもよい。この投球軌跡作画装置は、入力映像の各々のフレーム画像中に、ボールを探索する狭い探索範囲を設定することで、この探索範囲内からボールをリアルタイムで抽出して追跡することができる。このような投球軌跡作画装置に野球中継の映像のみを直接入力しても、この装置では投球の有無や投球のタイミングは検出できず、更に、放送カメラでは投球の合間にカメラ動作が行われるため、投球タイミングやボールの探索範囲の初期位置を毎回手動で設定する必要があった。 Then, the same input video is input as the time information indicating the section of the video determined as the video of the specific motion by the specific motion determination device 5, the time information of the pitching timing, and the position information of the person. Then, it may be output to a pitching trajectory drawing device (see, for example, JP-A-2005-123824, not shown) that extracts and tracks the ball from the video of the pitching and generates a trajectory image indicating the pitching trajectory. This pitching trajectory drawing device sets a narrow search range for searching for a ball in each frame image of the input video, and can extract and track the ball from the search range in real time. Even if only a baseball broadcast video is directly input to such a pitch trajectory drawing device, this device cannot detect the presence or timing of the pitch and the timing of the pitch, and the broadcast camera performs the camera operation between pitches. It was necessary to manually set the pitching timing and the initial position of the ball search range each time.

そこで、特定動作判定装置５が、特定の動作の映像と判定された映像の区間を示す時刻の情報と、投球タイミングの時刻の情報と、人物の位置の情報とを当該投球軌跡作画装置に入力することとすると、投球軌跡作画装置は、特定動作判定装置５から入力された映像の区間を示す時刻の情報に基づいて、入力映像の当該区間について軌跡画像を作画し、投球のタイミングを示す時刻をボールの抽出開始の時刻とし、更に、人物の位置をボールのおおよそのリリースポイントとみなして、この位置の情報に基づいて探索範囲を設定することで、特定動作判定装置５と投球軌跡作画装置とを用いて、軌跡画像の生成をすべて自動で行うことが可能になる。更に、特定動作判定装置５と投球軌跡作画装置とに入力する入力映像には放送カメラ映像を用いることができるため、新たなカメラの設置やキャリブレーションの必要がなく、運用性が高い。 Therefore, the specific motion determination device 5 inputs time information indicating a video segment determined to be a video of a specific motion, pitch timing time information, and person position information to the pitch trajectory drawing device. Then, the pitching trajectory drawing device draws a trajectory image for the segment of the input video based on the time information indicating the segment of the video input from the specific motion determination device 5, and indicates the timing of the pitching , And the position of the person is regarded as an approximate release point of the ball, and the search range is set based on the position information, whereby the specific motion determination device 5 and the pitching trajectory drawing device It is possible to automatically perform the generation of the trajectory image by using. Furthermore, since broadcast camera images can be used as input images to be input to the specific motion determination device 5 and the pitching trajectory drawing device, there is no need to install a new camera or calibration, and operability is high.

ただし、リアルタイムで軌跡画像を生成する場合において、本発明の特定動作判定装置５は、投球終了後に動作の判定がなされるため、この判定結果が投球軌跡作画装置に入力されるのを待っていては、投球軌跡作画装置が探索範囲を初期設定するタイミングを過ぎてしまう。そこで、特定動作判定装置５に入力される入力映像に比べて数秒程度遅延させた映像を投球軌跡作画装置に入力することで、特定動作判定装置５からの判定結果の入力後に、投球軌跡作画装置が、遅延された入力映像のフレーム画像中に探索範囲を初期設定し、ボールを抽出して追跡することができる。 However, when trajectory images are generated in real time, the specific motion determination device 5 of the present invention determines the motion only after the pitch has finished, so if the pitching trajectory drawing device waited for the determination result, the moment at which it must initialize the search range would already have passed. Therefore, a video delayed by about several seconds relative to the input video supplied to the specific motion determination device 5 is input to the pitching trajectory drawing device, so that, after receiving the determination result from the specific motion determination device 5, the pitching trajectory drawing device can initialize the search range in a frame image of the delayed input video and then extract and track the ball.
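
The delayed video feed described above can be pictured as a simple first-in, first-out frame buffer. The sketch below is only an illustration of that idea; the class name `FrameDelay` and the choice of a frame count as the delay unit are assumptions, and the patent does not describe how the delay is implemented.

```python
from collections import deque


class FrameDelay:
    """Delay a stream of frames by a fixed number of frames."""

    def __init__(self, delay_frames):
        self._delay = delay_frames
        self._buffer = deque()

    def push(self, frame):
        """Store `frame`; return the frame from `delay_frames` ago, or None."""
        self._buffer.append(frame)
        if len(self._buffer) > self._delay:
            return self._buffer.popleft()
        return None  # the buffer is still filling up
```

At 30 frames per second, a delay of about 3 seconds would correspond to `FrameDelay(90)`, so the trajectory drawing stage would see each frame 90 frames after the determination stage.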

The reference data generation device 1 and the specific motion determination device 5 can also be implemented by realizing each means as a function program on a computer, and these function programs can be combined to operate as a reference data generation program and a specific motion determination program.

[Operation of the specific motion extraction system]
Next, the operation of the specific motion extraction system S, which includes the reference data generation device 1 and the specific motion determination device 5 of the present invention, will be described with reference to FIGS. 5 and 6 (and FIG. 1 as appropriate). FIG. 5 is a flowchart showing the reference data generation operation of the reference data generation device according to the present invention. FIG. 6 is a flowchart showing the specific motion determination operation of the specific motion determination device according to the present invention.

(Reference data generation operation)
First, with reference to FIG. 5 (and FIG. 1 as appropriate), the reference data generation operation will be described, in which the reference data generation device 1 generates the reference data that serves as the criterion for the specific motion in the determination made by the specific motion determination device 5.

The reference data generation device 1 receives specific motion video from outside by the shooting direction correction means 12 and corrects the image shift of the current frame image Image_t of that video, caused by a change in shooting direction, relative to the adjacent frame image Image_t−1, which is delayed by one frame and input from the image storage means 11 (step S11). Subsequently, the reference data generation device 1 uses the feature amount calculation means 13 to generate a difference image between the current frame image Image_t corrected in step S11 and the adjacent frame image Image_t−1 input from the image storage means 11, and calculates from this difference image the feature amounts that indicate the features of the person's movement in the specific motion video (step S12).
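To make the difference-image step concrete, the sketch below computes three of the quantities named in the claims (the number of pixels in the difference area, its variance, and the moving speed of its center of gravity) from two grayscale frames. It assumes the shooting direction correction has already been applied. The binarization threshold of 20, the definition of variance as the mean squared distance from the centroid, and all function names are assumptions made for illustration; the normalization by a predetermined reference is omitted.

```python
import math
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale image as rows of 0-255 intensities
Point = Tuple[float, float]


def difference_mask(current: Frame, previous: Frame, threshold: int = 20) -> List[List[bool]]:
    """Mark pixels whose absolute inter-frame difference exceeds `threshold`."""
    return [
        [abs(c - p) > threshold for c, p in zip(row_c, row_p)]
        for row_c, row_p in zip(current, previous)
    ]


def difference_features(mask: List[List[bool]]) -> Tuple[int, Optional[Point], float]:
    """Return (pixel count, centroid (x, y), positional variance) of the difference area."""
    points = [(x, y) for y, row in enumerate(mask) for x, on in enumerate(row) if on]
    n = len(points)
    if n == 0:
        return 0, None, 0.0  # no motion detected between the two frames
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Assumed definition: mean squared distance of difference pixels from the centroid.
    variance = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return n, (cx, cy), variance


def centroid_speed(previous: Point, current: Point) -> float:
    """Centroid displacement in pixels per frame between consecutive difference images."""
    return math.hypot(current[0] - previous[0], current[1] - previous[1])


# Tiny 4x4 example: a 2x2 bright block appears in an otherwise black frame.
prev_frame = [[0] * 4 for _ in range(4)]
cur_frame = [[0, 0, 0, 0], [0, 100, 100, 0], [0, 100, 100, 0], [0, 0, 0, 0]]
count, centroid, variance = difference_features(difference_mask(cur_frame, prev_frame))
```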

The reference data generation device 1 then determines, by the shooting direction correction means 12, whether the feature amounts have been calculated for all frame images of the specific motion video (step S13). If not (No in step S13), the reference data generation device 1 returns to step S11, and the shooting direction correction means 12 repeats the processing from the correction of the shift of the next frame image onward.

If the calculation has finished (Yes in step S13), the reference data generation device 1 uses the time-series data calculation means 14 to arrange the feature amounts of the specific motion video calculated in step S12 in time series and dimensionally compress them, thereby generating reference time-series data (step S14). The reference data generation device 1 then stores the reference time-series data generated in step S14 in the reference time-series data storage device 3 by the time-series data calculation means 14 (step S15) and ends the operation.
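The claims state that the dimension compression is performed by principal component analysis. As a rough illustration only, the sketch below projects per-frame feature vectors onto their first principal component, found by power iteration on the covariance matrix. Keeping a single component, the iteration count, the starting vector, and the function names are assumptions of this sketch; the feature vectors are assumed to be already normalized.

```python
from typing import List

Matrix = List[List[float]]


def column_means(rows: Matrix) -> List[float]:
    """Mean of each feature column (rows = frames, columns = features)."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance(rows: Matrix) -> Matrix:
    """Sample covariance matrix of the feature columns (needs at least 2 frames)."""
    n, d = len(rows), len(rows[0])
    means = column_means(rows)
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov: Matrix, iterations: int = 200) -> List[float]:
    """Dominant eigenvector of `cov` by power iteration.

    The non-uniform start vector is an arbitrary choice; a start that happens
    to be orthogonal to the dominant eigenvector would not converge to it.
    """
    d = len(cov)
    v = [1.0 + 0.1 * i for i in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # every feature is constant, so there is no direction to find
        v = [x / norm for x in w]
    return v


def compress(rows: Matrix) -> List[float]:
    """Project each frame's centered feature vector onto the first principal component."""
    means = column_means(rows)
    axis = first_component(covariance(rows))
    return [sum((r[j] - means[j]) * axis[j] for j in range(len(axis))) for r in rows]


# Five frames with three features; the second feature is twice the first.
features = [[float(t), 2.0 * t, 0.0] for t in range(5)]
series = compress(features)  # one value per frame, in time order
```

In this toy input all variation lies along one direction, so a single component captures it completely; real feature data would generally need more than one component.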

(Specific motion determination operation)
Next, with reference to FIG. 6, the specific motion determination operation will be described, in which the specific motion determination device 5 determines video of the specific motion from input video by using the reference time-series data generated by the reference data generation operation shown in FIG. 5.

The specific motion determination device 5 receives input video from outside by the shooting direction correction means 52 and corrects the image shift of the current frame image Image_t of that input video, caused by a change in shooting direction, relative to the adjacent frame image Image_t−1, which is delayed by one frame and input from the image storage means 51 (step S51). Subsequently, the specific motion determination device 5 uses the feature amount calculation means 53 to generate a difference image between the current frame image Image_t corrected in step S51 and the adjacent frame image Image_t−1 input from the image storage means 51, and calculates from this difference image the feature amounts that indicate the features of the person's movement in the input video (step S52).

The specific motion determination device 5 then determines, by the shooting direction correction means 52, whether processing has finished for all frame images of the input video (step S53). If not (No in step S53), the specific motion determination device 5 returns to step S51, and the shooting direction correction means 52 repeats the processing from the correction of the shift of the next frame image onward.

If processing has finished (Yes in step S53), the specific motion determination device 5 uses the time-series data calculation means 54 to arrange the feature amounts of the input video calculated in step S52 in time series and dimensionally compress them, thereby generating time-series data (step S54). The specific motion determination device 5 then stores the time-series data generated in step S54 in the input time-series data storage means 55 by the time-series data calculation means 54 (step S55). Finally, the specific motion determination device 5 uses the motion determination means 56 to calculate the correlation coefficient between the time-series data stored in step S55 and each set of reference time-series data generated by the reference data generation operation (see FIG. 5), determines video whose correlation coefficient is equal to or greater than a threshold to be video of the specific motion (step S56), and ends the operation.
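The thresholding in step S56 can be illustrated with a Pearson correlation coefficient computed over a window that slides along the input time-series data. This is a sketch under assumptions: the sliding-window scan, the single one-dimensional series (rather than one series per principal component), the threshold of 0.9, and the function names are all illustrative and not taken from this specification.

```python
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences.

    Returns 0.0 when either sequence is constant, where the coefficient is undefined.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = (sum(x * x for x in da) * sum(y * y for y in db)) ** 0.5
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def matching_starts(
    series: Sequence[float], reference: Sequence[float], threshold: float = 0.9
) -> List[int]:
    """Start indices of windows of `series` whose correlation with `reference`
    is equal to or greater than `threshold`."""
    m = len(reference)
    return [
        start
        for start in range(len(series) - m + 1)
        if pearson(series[start:start + m], reference) >= threshold
    ]


# Hypothetical data: the reference shape reappears, scaled by 2, at index 3.
reference = [0.0, 1.0, 3.0, 1.0, 0.0]
series = [0.0, 0.0, 0.0, 0.0, 2.0, 6.0, 2.0, 0.0, 0.0, 0.0]
hits = matching_starts(series, reference)
```

Because the correlation coefficient ignores scale and offset, the scaled copy of the reference shape is detected even though its amplitude differs.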

FIG. 1 is a block diagram showing the configuration of the specific motion extraction system including the reference data generation device and the specific motion determination device of the present invention.
FIG. 2 is a schematic diagram showing an example of a current frame image and a rectangular region of interest in the current frame image: (a) schematically shows an example of the current frame image, and (b) schematically shows an example of the rectangular region of interest in the current frame image, indicated by B in (a).
FIG. 3 is a schematic diagram showing a difference image generated by the feature amount calculation means of the reference data generation device of the present invention.
FIG. 4 is a graph showing an example of reference time-series data calculated by the time-series data calculation means of the reference data generation device of the present invention.
FIG. 5 is a flowchart showing the reference data generation operation of the reference data generation device according to the present invention.
FIG. 6 is a flowchart showing the specific motion determination operation of the specific motion determination device according to the present invention.

Explanation of symbols

S Specific motion extraction system
1 Reference data generation device
11 Image storage means
12 Shooting direction correction means
13 Feature amount calculation means
14 Time-series data calculation means
3 Reference time-series data storage device
5 Specific motion determination device
51 Image storage means
52 Shooting direction correction means
53 Feature amount calculation means
54 Time-series data calculation means
55 Input time-series data storage means
56 Motion determination means

Claims

A specific motion determination device that determines video of a specific motion of a person included in input video by referring to reference time-series data in a reference time-series data storage device, the reference time-series data having been generated and stored in advance, for each video of the specific motion of a person, by: correcting, for each time-series frame image in the video, a shift at the time of shooting of the entire frame image relative to an adjacent frame image that is another frame image consecutive with that frame image; generating a difference image between the corrected frame image and the adjacent frame image; calculating, based on the difference image, a plurality of feature amounts that indicate features of the person's movement and are normalized by a predetermined reference; and arranging the feature amounts in time series and dimensionally compressing them, the device comprising:
shooting direction correction means for detecting and correcting, for each time-series frame image of the input video, a shift at the time of shooting of the entire frame image relative to an adjacent frame image that is another frame image consecutive with that frame image;
feature amount calculation means for generating a difference image between the frame image corrected by the shooting direction correction means and the adjacent frame image, normalizing by the predetermined reference the number of pixels in the difference area, the variance of the difference area, the moving speed of the center of gravity of the difference area, the number of difference pixels in the person region, and the inclination of the principal axis of the person region, and calculating them as a plurality of feature amounts;
time-series data calculation means for generating dimensionally compressed time-series data by arranging the feature amounts calculated by the feature amount calculation means in time series for the input video and performing principal component analysis; and
motion determination means for calculating a correlation coefficient for all principal components between the reference time-series data stored in the reference time-series data storage device and the time-series data generated by the time-series data calculation means, and determining that the video corresponding to a section of the input video in which the correlation coefficient is equal to or greater than a predetermined threshold is video of the specific motion.

A reference data generation device that generates reference time-series data indicating features of the movement of the person region in video of a specific motion of a person, for use in the specific motion determination device according to claim 1, the device comprising:
shooting direction correction means for detecting and correcting, for each time-series frame image in the video of the specific motion of the person, a shift at the time of shooting of the entire frame image relative to an adjacent frame image that is another frame image consecutive with that frame image;
feature amount calculation means for generating a difference image between the frame image corrected by the shooting direction correction means and the adjacent frame image, normalizing by a predetermined reference the number of pixels in the difference area, the variance of the difference area, the moving speed of the center of gravity of the difference area, the number of difference pixels in the person region, and the inclination of the principal axis of the person region, and calculating them as a plurality of feature amounts; and
time-series data calculation means for generating the dimensionally compressed reference time-series data by arranging the feature amounts calculated by the feature amount calculation means in time series for the video and performing principal component analysis.

A specific motion determination program for determining video of a specific motion of a person included in input video by referring to reference time-series data in a reference time-series data storage device, the reference time-series data having been generated and stored in advance, for each video of the specific motion of a person, by: correcting, for each time-series frame image in the video, a shift at the time of shooting of the entire frame image relative to an adjacent frame image that is another frame image consecutive with that frame image; generating a difference image between the corrected frame image and the adjacent frame image; calculating, based on the difference image, a plurality of feature amounts that indicate features of the person's movement and are normalized by a predetermined reference; and arranging the feature amounts in time series and dimensionally compressing them, the program causing a computer to function as:
shooting direction correction means for detecting and correcting, for each time-series frame image of the input video, a shift at the time of shooting of the entire frame image relative to an adjacent frame image that is another frame image consecutive with that frame image;
feature amount calculation means for generating a difference image between the frame image corrected by the shooting direction correction means and the adjacent frame image, normalizing by the predetermined reference the number of pixels in the difference area, the variance of the difference area, the moving speed of the center of gravity of the difference area, the number of difference pixels in the person region, and the inclination of the principal axis of the person region, and calculating them as a plurality of feature amounts;
time-series data calculation means for generating dimensionally compressed time-series data by arranging the feature amounts calculated by the feature amount calculation means in time series for the input video and performing principal component analysis; and
motion determination means for calculating a correlation coefficient for all principal components between the reference time-series data stored in the reference time-series data storage device and the time-series data generated by the time-series data calculation means, and determining that the video corresponding to a section of the input video in which the correlation coefficient is equal to or greater than a predetermined threshold is video of the specific motion.

A reference data generation program for generating reference time-series data indicating features of the movement of the person region in video of a specific motion of a person, for use in the specific motion determination device according to claim 1, the program causing a computer to function as:
shooting direction correction means for detecting and correcting, for each time-series frame image in the video of the specific motion of the person, a shift at the time of shooting of the entire frame image relative to an adjacent frame image that is another frame image consecutive with that frame image;
feature amount calculation means for generating a difference image between the frame image corrected by the shooting direction correction means and the adjacent frame image, normalizing by a predetermined reference the number of pixels in the difference area, the variance of the difference area, the moving speed of the center of gravity of the difference area, the number of difference pixels in the person region, and the inclination of the principal axis of the person region, and calculating them as a plurality of feature amounts; and
time-series data calculation means for generating the dimensionally compressed reference time-series data by arranging the feature amounts calculated by the feature amount calculation means in time series for the video and performing principal component analysis.