JP7122243B2 - Image identification device, classification system, production support system, methods and programs thereof

Info

Publication number: JP7122243B2
Application number: JP2018239678A
Authority: JP
Inventors: 誠佐藤; 正斗神崎; 寿晃鈴木; 貴之篠田
Original assignee: Nippon Television Network Corp
Current assignee: Nippon Television Network Corp
Priority date: 2018-03-05
Filing date: 2018-12-21
Publication date: 2022-08-19
Anticipated expiration: 2038-12-21
Also published as: JP2020074055A

Description

The present invention relates to an image identification device, a classification system, a production support system, and methods and programs therefor.

In recent years, image recognition technology has advanced, and techniques for identifying players in video of sporting events have been proposed (for example, Patent Literature 1). To obtain information on who a player in the video is, this technique uses features such as the player's face, uniform color, uniform number, and the on-screen captions (telops) superimposed for the broadcast.

Japanese Unexamined Patent Application Publication No. 2002-236913 (JP-A-2002-236913)

When identifying a person in a video, however, the person to be identified is not necessarily facing the front and usually faces in various directions.

Under such circumstances, there has been a demand for a technique that can identify a specific person in a video with high accuracy.

Accordingly, an object of the present invention is to provide an image identification device, a classification system, a production support system, and methods and programs therefor that can identify a specific person in a video with high accuracy.

One aspect of the present invention is an image identification device that identifies a specific person in a video, the device comprising: an orientation estimation unit that recognizes persons in the video, refers to the positions of the recognized persons in the video, and estimates the orientation of the specific person in the video; a plurality of specific person identification units, each trained to identify the specific person for a respective orientation of the specific person; and a control unit that selects the specific person identification unit corresponding to the orientation of the specific person estimated by the orientation estimation unit and causes the selected specific person identification unit to identify the specific person in the video.

One aspect of the present invention is a classification system that classifies training data for an image identification device that identifies a specific person in a video, the system comprising: an orientation estimation unit that recognizes persons in the video, refers to the positions of the recognized persons in the video, and estimates the orientation of the specific person in the video; and a storage unit that stores the video for which the orientation of the specific person has been estimated in the storage unit corresponding to the orientation of the specific person estimated by the orientation estimation unit.

One aspect of the present invention is a production support system comprising a relay vehicle carrying a camera, an image identification device that identifies a specific person in the video from the camera, and a reporting system that reports that the specific person has passed a predetermined point. The relay vehicle has a position information acquisition unit that acquires position information of the relay vehicle on the course. The image identification device has: an orientation estimation unit that recognizes persons in the video from the camera carried by the relay vehicle, refers to the positions of the recognized persons in the video, and estimates the orientation of the specific person in the video; a plurality of specific person identification units, each trained to identify the specific person for a respective orientation of the specific person; and a control unit that selects the specific person identification unit corresponding to the orientation of the specific person estimated by the orientation estimation unit and causes the selected specific person identification unit to identify the specific person in the video. The specific person identification unit refers to the orientation of the specific person estimated by the orientation estimation unit and identifies the positional relationship of the specific person. The reporting system has a tallying unit that tallies the results of the reports. The production support system has a screen generation unit that generates a screen visually representing the positional relationship between the relay vehicle and the specific person, using the position information of the relay vehicle, the positional relationship of the specific person identified by the image identification device from the video of the camera carried by the relay vehicle, and the tallied results of the reporting system.

One aspect of the present invention is an image identification method for identifying a specific person in a video, the method comprising: recognizing persons in the video, referring to the positions of the recognized persons in the video, and estimating the orientation of the specific person in the video; selecting the specific person identification unit corresponding to the estimated orientation of the specific person; and causing the selected specific person identification unit to identify the specific person in the video, wherein each specific person identification unit has been trained to identify the specific person for a respective orientation of the specific person.

One aspect of the present invention is a classification method for classifying training data for an image identification device that identifies a specific person in a video, the method comprising: recognizing persons in the video, referring to the positions of the recognized persons in the video, and estimating the orientation of the specific person in the video; and storing the video for which the orientation of the specific person has been estimated in the storage unit corresponding to the estimated orientation of the specific person.

One aspect of the present invention is a program for an information processing device that identifies a specific person in a video, the program causing the information processing device to function as: an orientation estimation unit that recognizes persons in the video, refers to the positions of the recognized persons in the video, and estimates the orientation of the specific person in the video; a plurality of specific person identification units, each trained to identify the specific person for a respective orientation of the specific person; and a control unit that selects the specific person identification unit corresponding to the orientation of the specific person estimated by the orientation estimation unit and causes the selected specific person identification unit to identify the specific person in the video.

One aspect of the present invention is a program for an information processing device that classifies training data for an image identification device that identifies a specific person in a video, the program causing the information processing device to function as: an orientation estimation unit that recognizes persons in the video, refers to the positions of the recognized persons in the video, and estimates the orientation of the specific person in the video; and a storage unit that stores the video for which the orientation of the specific person has been estimated in the storage unit corresponding to the orientation of the specific person estimated by the orientation estimation unit.

The present invention makes it possible to identify a specific person in a video with high accuracy.

FIG. 1 is a block diagram showing a configuration example of the video production system according to the first embodiment.
FIG. 2 is a diagram for explaining the first embodiment.
FIG. 3 is a diagram for explaining the first embodiment.
FIG. 4 is a diagram for explaining the first embodiment.
FIG. 5 is a diagram for explaining the first embodiment.
FIG. 6 is a block diagram showing a configuration example of the image identification device 2 according to the first embodiment.
FIG. 7 is a conceptual diagram of the detection of persons present in the image of the image frame at time t.
FIG. 8 is a conceptual diagram of the detection of persons present in the image of the image frame at time t+1.
FIG. 9 is a conceptual diagram of the detection of persons present in the image of the image frame at time t.
FIG. 10 is a conceptual diagram of the detection of persons present in the image of the image frame at time t+1.
FIG. 11 is a diagram for explaining a region of interest.
FIG. 12 is a diagram showing an example of the likelihood calculated for each identification target (class).
FIG. 13 is a diagram for explaining the first embodiment.
FIG. 14 is a block diagram showing a configuration example of the image identification device according to the second embodiment.
FIG. 15 is a diagram for explaining the second embodiment.
FIG. 16 is a diagram for explaining the second embodiment.
FIG. 17 is a block diagram of the classification system according to the third embodiment.
FIG. 18 is a block diagram of the fourth embodiment.
FIG. 19 is a diagram showing an example of metadata.
FIG. 20 is a diagram for explaining the fourth embodiment.
FIG. 21 is a block diagram of the fifth embodiment.
FIG. 22 is a diagram for explaining the fifth embodiment.
FIG. 23 is a block diagram showing a configuration example of the video production system according to Modification 2 of the first embodiment.
FIG. 24 is a diagram for explaining Modification 2 of the first embodiment.
FIG. 25 is a diagram for explaining another example of the fifth embodiment.

<First embodiment>
A video production system including an image identification device and a program according to the first embodiment of the present invention will be described below with reference to the drawings.

In the first embodiment of the present invention, as an example of identifying a specific person in a video, the players (specific persons) competing in an ekiden (a long-distance relay road race) are identified in video in which many spectators (persons) are present. This is only an example, and the present invention is not limited to it.

FIG. 1 is a block diagram showing a configuration example of the video production system according to this embodiment. The video production system includes a plurality of cameras 1a to 1d and an image identification device 2.

The cameras 1a to 1d capture video of the players competing in the ekiden. FIG. 1 shows four cameras.

As shown in FIG. 2, the camera 1a shoots the player from an angle that captures the player's face from the front. That is, it captures video in which the player faces the front. As shown in FIG. 3, the camera 1b shoots the player from an angle that captures the player from behind. That is, it captures video in which the player faces the back. As shown in FIG. 4, the camera 1c shoots the player from an angle that captures the player's face from the left side. That is, it captures video in which the player faces left. As shown in FIG. 5, the camera 1d shoots the player from an angle that captures the player's face from the right side. That is, it captures video in which the player faces right. Alternatively, a single camera may shoot the player from various angles.

FIG. 6 is a block diagram showing a configuration example of the image identification device 2 according to this embodiment. The image identification device 2 includes an orientation estimation unit 21, a control unit 22, and image identification units 23a to 23d.

Video from one of the cameras 1a to 1d is input to the orientation estimation unit 21. The orientation estimation unit 21 recognizes persons in the input video, refers to the positions of the recognized persons in the video, and estimates the orientation of the player (specific person) in the video.

For this recognition, it is enough to recognize that something in the video is a person; the attributes of individual persons need not be recognized. In other words, the recognition only has to distinguish persons from other objects in the video, such as trees and cars. Attributes that identify a recognized person as an individual (name, team, and so on) need not be determined.

A specific example of orientation estimation by the orientation estimation unit 21 is as follows. The orientation estimation unit 21 detects persons in the image of the image frame at time t of the input video. FIG. 7 is a conceptual diagram of the detection of persons present in the image of the image frame at time t. In FIGS. 7 and 8, each square encloses something detected as a person; everything that can be identified as a person is detected, whether player or spectator. Conventional image recognition techniques can be used to recognize the persons in these videos.

Next, the orientation estimation unit 21 estimates the orientation of the player (specific person) in the video from the positions of the recognized persons in the video. One estimation method refers to the temporal change in the positions of the recognized persons in the video. In sports such as marathons and ekiden, for example, the camera often moves while keeping a constant distance from the players so that they appear at a constant size. In this case, the players move together with the camera, whereas the spectators tend to stay where they are.

For example, consider a specific spectator in video shot from the front, where the player's face is visible. If the image frame at time t is as shown in FIG. 7, then in the image frame at time t+1 that spectator has stayed in place and the image is as shown in FIG. 8. That is, the positional relationship between the player and the camera is maintained as time advances, but the distance between that spectator and the camera increases, so the spectator moves backward in the video. In such a video, the player can be estimated to be facing the front.

Conversely, consider a specific spectator in video shot from behind the player. If the image frame at time t is as shown in FIG. 9, then in the image frame at time t+1 that spectator has stayed in place and the image is as shown in FIG. 10. That is, the positional relationship between the player and the camera is maintained as time advances, but the distance between that spectator and the camera decreases, so the spectator moves forward in the video. In such a video, the player can be estimated to be facing away from the camera (back orientation).

In this way, a tracking process that associates the same person across frame images obtained at different times makes it possible to estimate each person's movement vector relative to the camera. The orientation of the player can then be estimated from these movement vectors. A movement vector is expressed as the difference vector between the coordinate positions of the associated person in the images obtained at the different times.
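The difference-vector computation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a tracker has already assigned each person a stable ID across frames, and the names `movement_vectors`, `frame_t`, and `frame_t1` are hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) image coordinates of a detected person


def movement_vectors(pos_t: Dict[int, Point],
                     pos_t1: Dict[int, Point]) -> Dict[int, Point]:
    """Return the difference vector (t+1 minus t) for every tracked person.

    Keys are hypothetical track IDs produced by some tracking step.
    Persons that appear in only one of the two frames are skipped.
    """
    return {
        pid: (pos_t1[pid][0] - pos_t[pid][0],
              pos_t1[pid][1] - pos_t[pid][1])
        for pid in pos_t.keys() & pos_t1.keys()
    }


# Illustrative positions only: person 1 barely moves relative to the camera,
# person 2 drifts, person 3 leaves the frame and person 4 enters it.
frame_t = {1: (320.0, 400.0), 2: (100.0, 300.0), 3: (600.0, 310.0)}
frame_t1 = {1: (321.0, 401.0), 2: (130.0, 280.0), 4: (50.0, 50.0)}
vectors = movement_vectors(frame_t, frame_t1)
```

Only persons matched in both frames yield a vector, which mirrors the requirement that the same person be associated across the two times.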

Specifically, if, in a given video, the movement vectors of a small number of persons have small magnitudes, the movement vectors of a large number of persons point backward in the video (the receding direction), and the magnitudes of those vectors exceed a predetermined threshold, the video is estimated to have been shot from the front, where the player's face is visible. That is, the player is estimated to be facing the front.

Conversely, if, in a given video, the movement vectors of a small number of persons have small magnitudes, the movement vectors of a large number of persons point forward in the video (the approaching direction), and the magnitudes of those vectors exceed a predetermined threshold, the video is estimated to have been shot from behind the player. That is, the player is estimated to be facing the back.

The orientation of the player is estimated using movement vectors in this way.

The example above covers the cases where the player faces the front or the back, but the same approach can estimate the cases where the player faces left or right. For example, when the player faces left, the temporal movement vectors of the spectators, that is, the persons other than the player, are large in the rightward direction. When the player faces right, the spectators' temporal movement vectors are large in the leftward direction. The left or right orientation of the player can be estimated in this way.
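The four-way decision described above might be sketched as below. Several things here are assumptions not stated in the text: each motion is given as a pair `(dx, dz)`, where `dx > 0` means rightward in the image and `dz` is a hypothetical depth displacement (negative for receding, positive for approaching) derived by some means outside this sketch; "a large number of persons" is interpreted as a strict majority; and the threshold value is arbitrary.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

Motion = Tuple[float, float]  # (dx, dz): assumed image-x and depth displacement


def estimate_orientation(motions: List[Motion],
                         threshold: float = 5.0) -> Optional[str]:
    """Estimate the player's orientation from the crowd's apparent motion.

    Returns "front", "back", "left", "right", or None when fewer than a
    majority of persons move by more than the threshold.
    """
    moving = [(dx, dz) for dx, dz in motions if math.hypot(dx, dz) > threshold]
    if len(moving) * 2 <= len(motions):
        return None  # the large-motion group is not the majority

    votes: Counter = Counter()
    for dx, dz in moving:
        if abs(dz) >= abs(dx):
            # Receding crowd -> shot from the front; approaching -> from behind.
            votes["front" if dz < 0 else "back"] += 1
        else:
            # Crowd drifting right -> player faces left, and vice versa.
            votes["left" if dx > 0 else "right"] += 1
    return votes.most_common(1)[0][0]


front_case = estimate_orientation([(0.5, 0.2), (1.0, -12.0), (-2.0, -15.0), (0.0, -9.0)])
left_case = estimate_orientation([(0.2, 0.1), (14.0, 1.0), (11.0, -2.0), (9.0, 0.0)])
undecided = estimate_orientation([(0.0, 0.0), (0.0, 0.0), (10.0, 0.0)])
```

The few persons with small motion (the players keeping pace with the camera) are simply excluded from the vote; only the dominant crowd motion decides the result.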

The method of estimating the orientation of the player (specific person) described above is one example, and an appropriate method is selected according to the type of video. In video of field sports such as volleyball and soccer, for example, the situation is often the reverse of marathons and ekiden: the players move while the spectators stay in place (for example, seated in the stands). In such sports, the orientation of the player can be estimated from the direction of the movement vectors with large magnitudes.

Alternatively, the orientation of the player may be estimated from the arrangement of the recognized persons in the video, without using movement vectors. In volleyball, for example, video shot from the front side, with the net in view, shows a characteristic arrangement pattern of persons seen through the net: three players standing in the front row (the side near the net; from the left, front left (FL), front center (FC), and front right (FR)) and three players standing in the back row (the side far from the net; from the left, back left (BL), back center (BC), and back right (BR)). In such a case, the players can be estimated to be facing the front. In video shot with the net in the background, on the other hand, the detected arrangement pattern has the net behind the players, with three players standing in the front row (from the left, front left (FL), front center (FC), and front right (FR)) and three players standing in the back row (from the left, back left (BL), back center (BC), and back right (BR)). In such a case, the players can be estimated to be facing the back.

As described, these methods of estimating the orientation of the player (specific person) are examples, and an appropriate method is selected according to the type of video.

The control unit 22 receives the estimation result from the orientation estimation unit 21, selects whichever of the image identification units 23a to 23d corresponds to the estimated orientation of the player, and outputs the video to the selected image identification unit.
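The selection performed by the control unit 22 amounts to a dispatch on the estimated orientation. The sketch below illustrates that idea only; the stand-in identifiers, the `bytes` frame type, and the names `make_identifier`, `IDENTIFIERS`, and `control_unit` are hypothetical and not taken from the source.

```python
from typing import Callable, Dict

Frame = bytes  # placeholder for whatever video frame type is actually used


def make_identifier(name: str) -> Callable[[Frame], str]:
    """Build a stand-in for an orientation-specific identification unit."""
    def identify(frame: Frame) -> str:
        # A real unit would run a model trained for this orientation.
        return f"{name} identifier processed {len(frame)} bytes"
    return identify


# One identification unit per orientation, mirroring units 23a to 23d.
IDENTIFIERS: Dict[str, Callable[[Frame], str]] = {
    "front": make_identifier("23a"),
    "back": make_identifier("23b"),
    "left": make_identifier("23c"),
    "right": make_identifier("23d"),
}


def control_unit(orientation: str, frame: Frame) -> str:
    """Forward the frame to the unit matching the estimated orientation."""
    identifier = IDENTIFIERS.get(orientation)
    if identifier is None:
        raise ValueError(f"no identification unit for orientation {orientation!r}")
    return identifier(frame)


result = control_unit("back", b"abc")
```

Keeping the mapping in a table means that adding a finer orientation split only requires registering another unit, without changing the dispatch logic.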

The image identification units 23a to 23d are image identification units that have each been trained to identify players for a respective player orientation.

In the first embodiment, the image identification unit 23a has been trained to identify players using video of the players shot from the front. In other words, it is trained with video in which the player faces the front (the player's face is visible) as training data, and it is used to identify players in video in which the player faces the front. The training data need not be limited to video in which the player faces directly forward; video in which the player faces to the left or right of the front (for example, up to 45 degrees to either side) may be added to the training data.

The image identification unit 23b has been trained to identify players using video of the players shot from behind. In other words, it is trained with video in which the player faces away (the player's face is not visible) as training data, and it is used to identify players in video in which the player faces away. The training data need not be limited to video in which the player faces directly away; video in which the player faces to the left or right of the back (for example, up to 45 degrees to either side) may be added to the training data.

The image identification unit 23c has been trained to identify players using video shot from the player's left side. In other words, it is trained with video in which the player faces left (the left side of the player's face is visible) as training data, and it is used to identify players in video in which the player faces left. The training data need not be limited to video in which the player faces directly left; video in which the player faces to either side of that direction (for example, up to 45 degrees to either side) may be added to the training data.

The image identification unit 23d has been trained to identify players using video shot from the player's right side. In other words, it is trained with video in which the player faces right (the right side of the player's face is visible) as training data, and it is used to identify players in video in which the player faces right. The training data need not be limited to video in which the player faces directly right; video in which the player faces to either side of that direction (for example, up to 45 degrees to either side) may be added to the training data.

Methods for training the image identification units 23a to 23d include pattern matching and machine learning using techniques such as deep learning.

The identification targets of the image identification units 23a to 23d are, for example, a player (specific person) in the video and the attributes of that player. A player's attributes include not only the player's name and age but also, for example, the team or university to which the player belongs and the player's role (pitcher or fielder in baseball, offense or defense in soccer). In the following description, an identification target may be referred to as a class for convenience. A category is a type of feature used to identify a player and the player's attributes. Typical categories include the player's face, the uniform the player is wearing, the uniform number, and the characters written on the sash (tasuki) and the race bib.

To correctly identify a player and the player's attributes in video containing many persons, it matters which categories the identification focuses on. For example, to identify a player's position in the video or the team to which the player belongs, the color and pattern of the uniform and the characters written on the sash and race bib are the important categories. If the aim is to identify individual players (by name, for example), the face category is important. Which categories to use, and how to weight them, also differs from sport to sport. When the sport is a marathon or ekiden, for example, the video contains many persons other than the players, such as spectators. Even if features such as the shoes the players are wearing are emphasized, some spectators wear similar shoes, so identification based on them is not accurate. When the sport is baseball or soccer, on the other hand, the positions of the spectators in the video are largely fixed, so it is relatively easy to exclude them from the identification targets beforehand, and the identification itself can focus mainly on the uniforms the players are wearing.

上記の理由から、画像識別部２３ａ～ｄは、競技毎又は識別する選手毎に各カテゴリーに対して異なる重みのパラメータを記憶するように構成しても良い。 For the above reasons, the image identification units 23a-23d may be configured to store different weight parameters for each category for each sport or for each player to be identified.
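The per-sport weighting described above can be pictured as a small lookup table. The following is a minimal sketch only: the category names, the weight values, and the function `combined_score` are hypothetical and do not appear in the specification.

```python
# Hypothetical per-sport category weights; names and values are illustrative only.
CATEGORY_WEIGHTS = {
    "ekiden": {"face": 0.2, "uniform": 0.3, "sash_text": 0.3, "bib_text": 0.2},
    "soccer": {"face": 0.1, "uniform": 0.7, "uniform_number": 0.2},
}


def combined_score(sport, category_scores):
    """Weighted sum of per-category scores; categories without a weight are ignored."""
    weights = CATEGORY_WEIGHTS[sport]
    return sum(weights[c] * s for c, s in category_scores.items() if c in weights)
```

Storing one such table per sport, or per player to be identified, corresponds to the configuration in which the image identification units hold different weight parameters for each category.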

上述した画像識別装置２は、例えば、ＣＰＵ（Central Processing Unit）等のプロセッサと、一時記憶としてのメモリと、不揮発性の記憶装置（ＥＥＰＲＯＭやハードディスク）とを含み構成される。記憶装置に記憶されたプログラムをメモリに読み出して実行することにより、ＣＰＵ等のプロセッサが、向き推定部２１、制御部２２及び画像識別部２３ａ～ｄとして機能する。 The image identification device 2 described above includes, for example, a processor such as a CPU (Central Processing Unit), a memory as temporary storage, and a nonvolatile storage device (EEPROM or hard disk). A processor such as a CPU functions as the orientation estimation unit 21, the control unit 22, and the image identification units 23a to 23d by reading the program stored in the storage device into the memory and executing the program.

次に、本発明の実施の形態における画像識別装置及びプログラムを含む映像制作システムの動作を説明する。尚、以下の説明では、競技が駅伝であり、識別する特定人物が選手であり、識別対象が選手の位置及びその選手が所属する大学名である例を説明する。 Next, the operation of the video production system including the image identification device and program according to the embodiment of the present invention will be described. In the following description, an example will be described in which the competition is an Ekiden, the specific person to be identified is a player, and the identification target is the position of the player and the name of the university to which the player belongs.

まず、カメラ１ａ～１ｄは、観客及び選手を含む映像を撮影する。本例では、カメラ１ａは、選手の顔を正面から捉える撮影角度から選手を撮影するカメラである。カメラ１ｂは、選手を背面から捉える撮影角度から選手を撮影するカメラである。カメラ１ｃは、選手の顔を左側から捉える撮影角度から選手を撮影するカメラである。カメラ１ｄは、選手の顔を右側から捉える撮影角度から選手を撮影するカメラである。 First, the cameras 1a to 1d shoot images including spectators and athletes. In this example, the camera 1a is a camera for photographing the player from a photographing angle that captures the face of the player from the front. The camera 1b is a camera for photographing the player from a photographing angle that captures the player from behind. The camera 1c is a camera that takes an image of the player from a shooting angle that captures the player's face from the left side. The camera 1d is a camera that takes an image of the player from a shooting angle that captures the player's face from the right side.

これらのカメラ１ａ～１ｄのうち、例えば、放送用にいずれかのひとつが選択されて使用される。ここでは、最初に、カメラ１ａの映像が使用されるものとする。従って、向き推定部２１に入力される映像は、カメラ１ａの映像である。 One of these cameras 1a to 1d is selected and used for broadcasting, for example. Here, it is assumed that the image of the camera 1a is used first. Therefore, the image input to the direction estimation unit 21 is the image of the camera 1a.

向き推定部２１は、カメラ１ａの映像を入力し、映像上の選手の向きを推定する。本例では、カメラ１ａの映像の時刻ｔの画像フレームの映像が図７に示すものとする。そして、カメラ１ａの映像の時刻ｔ＋１の画像フレームの映像が図８に示すものとする。すると、上述したように、特定の観客に着目して考えると、選手とカメラとの位置は時刻が進んでも維持されるが、その特定の観客とカメラとの距離は離れるため、その特定の観客は、映像上後方に移動することになる。このような映像では、選手の向きは正面であると推定される。推定された結果（選手の向きが正面）は、制御部２２に出力される。 The direction estimation unit 21 inputs the image of the camera 1a and estimates the direction of the player on the image. In this example, the image of the image frame at the time t of the image of the camera 1a is shown in FIG. It is assumed that the image of the image frame at the time t+1 of the image of the camera 1a is shown in FIG. Then, as described above, focusing on a specific spectator, the positions of the players and the camera are maintained even if the time advances, but the distance between the specific spectator and the camera increases. moves backward on the image. In such a video, the orientation of the player is presumed to be frontal. The estimated result (the player is facing forward) is output to the control unit 22 .

制御部２２は、向き推定部２１の推定結果（選手の向きが正面）を受信し、選手及びその選手の属性を識別させる画像識別部を、画像識別部２３ａ～ｄのいずれかから選択する。本例では、向き推定部２１の推定結果は選手の向きが正面であるので、選手の向きが正面用の画像識別部２３ａを選択する。そして、カメラ１ａの映像を、画像識別部２３ａに出力する。 The control unit 22 receives the estimation result of the orientation estimation unit 21 (the orientation of the player is the front), and selects an image identification unit for identifying the player and the attributes of the player from any of the image identification units 23a to 23d. In this example, the estimation result of the orientation estimating unit 21 indicates that the player is facing forward, so the image identification unit 23a for the player facing front is selected. Then, the image of the camera 1a is output to the image identifying section 23a.
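The selection performed by the control unit 22 amounts to a dispatch on the estimated orientation. The sketch below assumes the orientation is passed as a string label and uses placeholder functions in place of the trained image identification units 23a to 23d; all names are hypothetical.

```python
# Placeholder stand-ins for the image identification units 23a to 23d (hypothetical).
def identify_front(frame):
    return ("23a", frame)


def identify_back(frame):
    return ("23b", frame)


def identify_left(frame):
    return ("23c", frame)


def identify_right(frame):
    return ("23d", frame)


IDENTIFIERS = {
    "front": identify_front,
    "back": identify_back,
    "left": identify_left,
    "right": identify_right,
}


def control(orientation, frame):
    """Select the identifier matching the estimated orientation and pass it the frame."""
    return IDENTIFIERS[orientation](frame)
```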

The image identification unit 23a identifies the player and the player's attributes from the video of the camera 1a. First, the image identification unit 23a detects persons in the frame image. The method of detecting persons is not limited. In FIG. 7, each rectangle encloses something detected as a person; anything that can be recognized as a person is detected, whether athlete or spectator. Since the orientation estimation unit 21 has already recognized the persons in the video, the device may also be configured so that the image identification unit takes over that result and does not perform person detection itself.

Next, for each detected person, the image identification unit 23a takes L to be the length of the straight line from the top of the head to the midpoint of the neck, as shown in FIG. 11, and sets a region of interest of size L×2L below the center of the person's neck. It then calculates the likelihood (confidence) that the detected person belongs to an identification target (class) learned in advance, that is, to a specific player. The image identification unit 23a determines that the person with the highest likelihood, provided that the likelihood exceeds a predetermined threshold, belongs to that class (specific player).
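The geometry of the region of interest can be sketched as follows. The specification gives only the size L×2L and the placement below the neck; this sketch additionally assumes image coordinates with y increasing downward, a width of L, a height of 2L, and horizontal centering on the neck midpoint. The function name is hypothetical.

```python
import math


def attention_region(head_top, neck_mid):
    """Return (left, top, width, height) of the region of interest.

    Assumes y grows downward, width L, height 2L, centered on the neck
    midpoint horizontally; these layout choices are assumptions.
    """
    L = math.dist(head_top, neck_mid)  # head-top to neck-midpoint distance
    x, y = neck_mid
    return (x - L / 2, y, L, 2 * L)
```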

FIG. 12 shows an example of the likelihood calculated for each identification target (class). In the example of FIG. 12, classes for police motorcycles and spectators are provided in addition to the university names of the players to be identified, in order to improve accuracy. In this example, person A is most likely a spectator, person B is most likely an athlete of university Y, person C is most likely an athlete of university X, and person D is most likely an athlete of university Z. With a threshold of 0.7, the image identification unit 23a determines that person A belongs to the spectator class, person B to the university Y class, person C to the university X class, and person D to the university Z class. It then outputs the position of person B on the video (for example, coordinates specifying the region of interest) together with the university name Y. Likewise, it outputs the position of person C together with the university name X, and the position of person D together with the university name Z. Person A is not an athlete and is therefore excluded from the output, although outputting person A is not precluded.
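The threshold decision can be sketched as below. The threshold of 0.7 comes from the example above; the likelihood values in the usage note are invented for illustration and are not taken from FIG. 12, and the function name is hypothetical.

```python
THRESHOLD = 0.7  # threshold value used in the example above


def classify(likelihoods, threshold=THRESHOLD):
    """Return the most likely class, or None if its likelihood does not exceed the threshold."""
    best = max(likelihoods, key=likelihoods.get)
    return best if likelihoods[best] > threshold else None
```

For instance, `classify({"university X": 0.10, "university Y": 0.85, "spectator": 0.05})` yields `"university Y"`, whereas a person whose best likelihood is 0.5 is assigned to no class.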

本例では、選手の向きが正面の場合の識別に特化した画像識別部２３ａを用いて、選手の顔を正面から捉える撮影角度で選手を撮影した映像から、選手（クラス）を識別しているので、選手の向きを考慮しない画像認識に比較して、高精度に選手（特定人物）を識別することができる。 In this example, the player (class) is identified from the video of the player captured at a shooting angle that captures the player's face from the front using the image identification unit 23a specialized for identification when the player faces the front. Therefore, the player (specific person) can be identified with higher accuracy than image recognition that does not consider the direction of the player.

このようにして、映像から選手及び選手の属性を識別する。識別された選手の位置情報及びその選手の属性（上述した例では大学名）は、映像の加工に用いることができる。例えば、図１３に示されるように、識別された選手の大学名の画像をコンピュータグラフィックにより、各識別された選手の上に重畳するようにする。このような形態を取れば、映像制作は、映像を目視により確認して選手を識別する必要がなく、映像制作の自動化を図ることができる。 In this way, players and their attributes are identified from the video. The identified player's position information and the player's attribute (university name in the above example) can be used for video processing. For example, as shown in FIG. 13, an image of the identified player's college name may be computer-graphically superimposed on each identified player. If such a form is adopted, video production does not need to be visually confirmed to identify players, and video production can be automated.

更に、駅伝では、区間ごとに走者が決まっているので、区間ごとにその区間を走る選手を識別するためのクラスに限定することもできる。更に、識別した選手の大学名が識別できれば、その区間を走る選手名も特定することもでき、その特定した選手名を、映像上の選手の上に表示することもできる。 Furthermore, in the Ekiden, runners are determined for each section, so it is possible to limit the classes for each section to identify the runners who will run in that section. Furthermore, if the university name of the identified athlete can be identified, the name of the athlete who runs the section can also be identified, and the identified athlete name can be displayed above the athlete on the video.

続いて、時間が経過し、放送用の映像がカメラ１ｂの映像に切り替えられたものとする。この場合、向き推定部２１に入力される映像は、カメラ１ｂの映像である。 Subsequently, it is assumed that time has passed and the image for broadcasting has been switched to the image of the camera 1b. In this case, the image input to the direction estimation unit 21 is the image of the camera 1b.

向き推定部２１は、カメラ１ｂの映像を入力し、映像上の選手の向きを推定する。本例では、カメラ１ｂの映像の時刻ｔの画像フレームの映像が図９に示すものとする。そして、カメラ１ｂの映像の時刻ｔ＋１の画像フレームの映像が図１０に示すものとする。すると、上述したように、特定の観客に着目して考えると、選手とカメラとの位置は時刻が進んでも維持されるが、その特定の観客とカメラとの距離は近づくため、その特定の観客は、映像上前方に移動することになる。このような映像では、選手の向きは後ろを向いている（背面）であると推定される。推定された結果（選手の向きが背面）は、制御部２２に出力される。 The orientation estimation unit 21 inputs the image from the camera 1b and estimates the orientation of the player on the image. In this example, the image of the image frame at the time t of the image of the camera 1b is shown in FIG. 10 shows the image of the image frame at time t+1 of the image of the camera 1b. Then, as described above, focusing on a specific spectator, the positions of the athletes and the camera are maintained even if the time advances, but the distance between the specific spectator and the camera is shortened. will move forward on the image. In such a video, it is assumed that the player is facing backwards (back). The estimated result (the player is facing the back) is output to the control unit 22 .
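The front and back cases can be summarized in one rule: with a camera that moves with the athletes, a spectator who recedes from the camera implies a front view, and a spectator who approaches implies a back view. The sketch below uses the change in a spectator's apparent height between frames t and t+1 as a proxy for receding or approaching. That proxy, the tolerance value, and the function name are assumptions; the specification does not state how the spectator's movement is measured.

```python
def estimate_orientation(spectator_height_t, spectator_height_t1, tolerance=0.02):
    """Estimate the athletes' orientation from one tracked spectator.

    Heights are apparent heights in pixels at times t and t+1 (an assumed proxy
    for the spectator's distance from the camera).
    """
    ratio = spectator_height_t1 / spectator_height_t
    if ratio < 1 - tolerance:
        return "front"  # spectator recedes: camera is ahead of the athletes
    if ratio > 1 + tolerance:
        return "back"   # spectator approaches: camera is behind the athletes
    return "undetermined"
```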

制御部２２は、向き推定部２１の推定結果（選手の向きが背面）を受信し、選手及びその選手の属性を識別させる画像識別部を、画像識別部２３ａ～ｄのいずれかから選択する。本例では、向き推定部２１の推定結果は選手の向きが背面であるので、選手の向きが背面用の画像識別部２３ｂを選択する。そして、カメラ１ｂの映像を、画像識別部２３ｂに出力する。 The control unit 22 receives the estimation result of the direction estimation unit 21 (the player is facing the back), and selects an image identification unit for identifying the player and the attributes of the player from any of the image identification units 23a to 23d. In this example, the estimation result of the orientation estimating unit 21 indicates that the player is facing the back, so the image identification unit 23b for the player facing the back is selected. Then, the image of the camera 1b is output to the image identifying section 23b.

画像識別部２３ｂは、カメラ１ｂの映像から、選手及びその選手の属性を識別する。まず、画像識別部２３ｂは、フレームの画像中から人物を検出する。人物を検出する方法は限定するものではない。図９は、時刻ｔにおける画像フレームの画像内に存在する人物を検出した場合の概念図である。図９では、四角で囲んだものが人物であると検出されたものであり、選手、観客にかかわらず、人物であると識別できるものを検出している。 The image identification unit 23b identifies the player and the attributes of the player from the image of the camera 1b. First, the image identification unit 23b detects a person from the image of the frame. A method for detecting a person is not limited. FIG. 9 is a conceptual diagram when detecting a person existing in an image of an image frame at time t. In FIG. 9, the objects enclosed in squares are detected as persons, and those that can be identified as persons are detected regardless of whether they are athletes or spectators.

In the same manner as described above, for each detected person the image identification unit 23b takes L to be the length of the straight line from the top of the head to the midpoint of the neck, as shown in FIG. 11, and sets a region of interest of size L×2L below the center of the person's neck. It then calculates the likelihood (confidence) that the detected person belongs to an identification target (class) learned in advance. The image identification unit 23b determines that the person with the highest likelihood, provided that the likelihood exceeds a predetermined threshold, belongs to that class.

本例では、選手の向きが背面の場合の識別に特化した画像識別部２３ｂを用いて、選手を背面から捉える撮影角度で選手を撮影した映像から、選手（クラス）を識別しているので、選手の向きを考慮しない画像認識に比較して、高精度に選手（特定人物）を識別することができる。 In this example, the player (class) is identified from the video of the player captured at a shooting angle that captures the player from the back using the image identification unit 23b that specializes in identifying when the player is facing the back. , the player (specific person) can be identified with high accuracy compared to image recognition that does not consider the direction of the player.

Depending on the determination made by the orientation estimation unit 21, the orientation of the player (specific person) may not be uniquely determined, for example when the player is not facing straight ahead but slightly to the left. In this case, the orientation estimation unit 21 may output both a result that the player is facing front and a result that the player is facing left. The control unit 22 may then select both the front image identification unit 23a and the left image identification unit 23c and have both perform image identification.

Furthermore, the orientation estimation unit 21 may output a likelihood for each orientation of the player (specific person) (for example, likelihood of facing front = 0.7, likelihood of facing left = 0.3), and the image identification unit for the orientation with the higher likelihood may be selected on that basis.
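Both selection policies, running every plausible identifier or only the most likely one, can be sketched with a single function. The parameter `min_likelihood` and the function name are hypothetical; the likelihoods 0.7 and 0.3 in the usage note are the example values given above.

```python
def select_identifiers(orientation_likelihoods, min_likelihood=None):
    """Choose which orientation-specific identifiers to run.

    With min_likelihood=None, only the most likely orientation is returned.
    Otherwise every orientation at or above min_likelihood is returned.
    """
    if min_likelihood is None:
        return [max(orientation_likelihoods, key=orientation_likelihoods.get)]
    return [o for o, p in orientation_likelihoods.items() if p >= min_likelihood]
```

With `{"front": 0.7, "left": 0.3}`, the default call selects only the front identifier, while `min_likelihood=0.25` selects both the front and left identifiers.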

第１の実施の形態では、入力される映像から映像中の選手（特定人物）の向きを推定し、各選手（特定人物）の向きの識別に特化した画像認識部により、その映像から選手（特定人物）を識別しているので、選手（特定人物）の向きを考慮しない画像認識に比較して、高精度に選手（特定人物）を識別することができる。 In the first embodiment, the direction of the player (specific person) in the video is estimated from the input video, and an image recognition unit specialized in identifying the direction of each player (specific person) recognizes the player from the video. Since the (specific person) is identified, the player (specific person) can be identified with higher accuracy than image recognition that does not consider the orientation of the player (specific person).

<Modification 1 of the first embodiment>
Next, Modification 1 of the above-described first embodiment will be described.

上述した第１の実施の形態では、向き推定部２１が選手の向きを推定する際に、認識された人物の移動ベクトルとの相違から、選手の向きを推定している。 In the above-described first embodiment, when the orientation estimation unit 21 estimates the orientation of the player, the orientation of the player is estimated from the difference from the recognized movement vector of the person.

そこで、第１の実施の形態の変形例１では、認識された人物の移動ベクトルを用いて、観客（人物）と選手（特定人物）とを区別する。 Therefore, in Modification 1 of the first embodiment, the recognized movement vector of the person is used to distinguish between the spectator (person) and the player (specific person).

具体的には、向き推定部２１は、移動ベクトルの移動量と、各フレーム単位で算出された人物とを関連付けて記憶していく。そして、移動ベクトルの移動量と各フレーム単位で算出された人物とが関連付けられた情報を、制御部２２に出力する。 Specifically, the orientation estimation unit 21 associates and stores the movement amount of the movement vector and the person calculated for each frame. Then, information in which the movement amount of the movement vector and the person calculated for each frame are associated is output to the control unit 22 .

制御部２２では、向き推定部２１の向きの推定結果を受信し、選手及びその選手の属性を識別させる画像識別部を、画像識別部２３ａ～ｄのいずれかから選択する。そして、選択した画像識別部２３ａ～ｄに、映像とともに、移動ベクトルの移動量と各フレーム単位で算出された人物とが関連付けられた情報を出力する。 The control unit 22 receives the direction estimation result of the direction estimation unit 21, and selects an image identification unit for identifying the player and the attributes of the player from any of the image identification units 23a to 23d. Then, to the selected image identification units 23a to 23d, information in which the amount of movement of the movement vector and the person calculated for each frame are associated is output together with the image.

Based on the information associating the movement amount of the movement vector with the person calculated for each frame, the image identification units 23a to 23d treat a person whose movement amount is equal to or greater than a predetermined value as a spectator, exclude that person from the candidates for the identification target (specific person), and identify the remaining persons as identification targets (specific persons).

このようにすれば、観客（人物）と選手（特定人物）とを精度よく識別可能であり、結果として、識別対象の選手（特定人物）を高精度で識別することができる。 In this way, the spectator (person) and the player (specific person) can be identified with high accuracy, and as a result, the player (specific person) to be identified can be identified with high accuracy.

<Modification 2 of the first embodiment>
Next, Modification 2 of the above-described first embodiment will be described.

上述した第１の実施の形態では、各カメラ１ａ～１ｄは、基本的に、選手とともに移動するカメラとして説明した。しかし、カメラの中には選手とともに移動せず、固定されたカメラもある（例えば、定点カメラ）。そこで、第１の実施の形態の変形例２では、カメラが選手とともに移動しない定点カメラの場合を説明する。 In the first embodiment described above, the cameras 1a to 1d are basically described as cameras that move together with the players. However, some cameras do not move with the player and are fixed (eg fixed point cameras). Therefore, in Modified Example 2 of the first embodiment, a case where the camera is a fixed-point camera that does not move with the player will be described.

図２３は第１の実施の形態の変形例２における映像制作システムの構成例を示すブロック図である。第１の実施の形態の変形例２における映像制作システムは、カメラ１００と、向き推定部１０１と、制御部２２と、画像識別部２３ａ～２３ｄとを備える。 FIG. 23 is a block diagram showing a configuration example of a video production system according to modification 2 of the first embodiment. A video production system according to Modification 2 of the first embodiment includes a camera 100, an orientation estimation section 101, a control section 22, and image identification sections 23a to 23d.

第１の実施の形態の変形例２が第１の実施の形態と異なる点は、カメラ１００がある場所に設置されて撮影方向も固定された定点カメラであることと、向き推定部１０１による向き推定の方法が向き推定部２１と異なる点である。 Modification 2 of the first embodiment differs from the first embodiment in that the camera 100 is a fixed-point camera that is installed at a certain location and the shooting direction is also fixed, and that the orientation estimation unit 101 The estimation method differs from that of the orientation estimation unit 21 .

Here, the camera 100 is installed at a position corresponding to the apex of a U-shaped curve in the road, the center of its shooting direction points from the apex of the U toward the center of the U, and its angle of view is assumed to cover the entire U-shaped road.

すると、カメラ１００は、例えば、図２４のような映像を撮影することができる。具体的には、Ｕ字にカーブの左上方向の位置（地点Ａ）から選手が地点Ｂに移動し、Ｕ字にカーブの頂点である地点Ｃに到達する映像である。更に、地点Ｃから右上方向の地点Ｄに移動し、最後に右上方向の地点Ｅに到達する映像である。 Then, the camera 100 can shoot an image as shown in FIG. 24, for example. Specifically, it is an image in which the player moves from a position (point A) in the upper left direction of the U-shaped curve to point B, and reaches point C, which is the top of the U-shaped curve. Further, the image moves from point C to point D in the upper right direction, and finally reaches point E in the upper right direction.

このような映像の場合、定点カメラであるカメラ１００は選手と共に移動しないので、上述した第１の実施の形態とは逆の状況が起きる。すなわち、カメラ１００が撮影した映像では、選手は移動するが、観客はその場に留まるため、時刻が異なるフレーム間で選手の移動が検出でき、観客の移動は検出できない。更に、移動が検出できる人物において、その移動の方向（移動ベクトル）も異なる。 In the case of such an image, since the camera 100, which is a fixed-point camera, does not move with the player, a situation opposite to that of the first embodiment described above occurs. That is, in the video captured by the camera 100, the player moves, but the spectator stays in place. Therefore, the player's movement can be detected between frames at different times, but the spectator's movement cannot be detected. Furthermore, the direction of movement (movement vector) differs between persons whose movements can be detected.

図２４を用いて具体的に説明すると、地点Ａでは観客の移動は検出できず、選手の画面下方向の移動ベクトルが検出できる。また、地点Ｂでは観客の移動は検出できず、選手の画面右下方向の移動ベクトルが検出できる。また、地点Ｃでは観客の移動は検出できず、選手の画面右方向の移動ベクトルが検出できる。また、地点Ｄでは観客の移動は検出できず、選手の画面右上方向の移動ベクトルが検出できる。また、地点Ｅでは観客の移動は検出できず、選手の画面上方向の移動ベクトルが検出できる。 Specifically, with reference to FIG. 24, the movement of the spectators cannot be detected at the point A, but the movement vector of the player in the downward direction of the screen can be detected. At the point B, the movement of the spectators cannot be detected, but the movement vector of the player in the lower right direction of the screen can be detected. At point C, the movement of the spectators cannot be detected, but the movement vector of the player in the right direction of the screen can be detected. At the point D, the movement of the spectators cannot be detected, but the movement vector of the player in the upper right direction of the screen can be detected. At the point E, the movement of the spectators cannot be detected, but the movement vector of the player in the upward direction of the screen can be detected.

このように、カメラ１００の映像の特殊性に鑑みて、向き推定部１０１はカメラ１００が撮影した映像の選手の向きを推定する。 In this way, the direction estimating unit 101 estimates the direction of the player in the video captured by the camera 100 in view of the particularity of the video captured by the camera 100 .

具体的には、向き推定部１０１は、選手が地点Ａに存在している映像では、選手の画面下方向の移動ベクトルを検出することにより、選手の向きは正面であると推定する。 Specifically, the direction estimation unit 101 detects the movement vector of the player in the downward direction of the screen in the video in which the player is present at the point A, and estimates that the player is facing forward.

また、向き推定部１０１は、選手が地点Ｂに存在している映像では、選手の画面右下方向の移動ベクトルを検出し、そのベクトルの方向と所定の閾値とを比較する。そして、所定の閾値と比較して、ベクトルの方向が下方向に近いならば、選手の向きは正面であると推定する。一方、所定の閾値と比較して、ベクトルの方向が右方向に近いならば、選手の向きは右横方向であると推定する。 In addition, the orientation estimation unit 101 detects the movement vector of the player in the lower right direction of the screen in the video in which the player is present at the point B, and compares the direction of the vector with a predetermined threshold. Then, compared with a predetermined threshold, if the direction of the vector is closer to the downward direction, it is estimated that the player is facing forward. On the other hand, if the direction of the vector is closer to the right than the predetermined threshold, then the player's orientation is assumed to be lateral to the right.

また、向き推定部１０１は、選手が地点Ｃに存在している映像では、選手の画面右方向の移動ベクトルを検出し、選手の向きは右方向（右向き）であると推定する。 In addition, the direction estimation unit 101 detects the movement vector of the player in the right direction of the screen in the video in which the player is present at the point C, and estimates that the player is facing rightward (rightward).

また、向き推定部１０１は、選手が地点Ｄに存在している映像では、選手の画面右上方向の移動ベクトルを検出し、そのベクトルの方向と所定の閾値とを比較する。そして、所定の閾値と比較して、ベクトルの方向が上方向に近いならば、選手の向きは背面であると推定する。一方、所定の閾値と比較して、ベクトルの方向が右方向に近いならば、選手の向きは右横方向であると推定する。 In addition, the direction estimation unit 101 detects the movement vector of the player in the upper right direction of the screen in the video in which the player is present at the point D, and compares the direction of the vector with a predetermined threshold. Then, if the direction of the vector is closer to the upward direction than a predetermined threshold, it is estimated that the player is facing the back. On the other hand, if the direction of the vector is closer to the right than the predetermined threshold, then the player's orientation is assumed to be lateral to the right.

また、向き推定部１０１は、選手が地点Ｅに存在している映像では、選手の画面上方向の移動ベクトルを検出し、選手の向きは背面であると推定する。 In addition, the orientation estimation unit 101 detects the movement vector of the player in the upward direction of the screen in the video in which the player is present at the point E, and estimates that the player is facing the back.
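The mapping from movement vector to orientation at points A to E can be sketched as an angle comparison. The sketch assumes image coordinates with x increasing to the right and y increasing downward, and a boundary of 45 degrees as the "predetermined threshold"; the boundary value, the function name, and the "left" branch (which the U-curve example does not exercise) are assumptions.

```python
import math


def orientation_from_vector(dx, dy, boundary_deg=45.0):
    """Map a player's on-screen movement vector to an orientation label."""
    angle = math.degrees(math.atan2(dy, dx))  # 0 = right, 90 = down, -90 = up
    if abs(angle) <= boundary_deg:
        return "right"  # mostly rightward movement (point C)
    if boundary_deg < angle < 180.0 - boundary_deg:
        return "front"  # mostly downward movement (point A)
    if -(180.0 - boundary_deg) < angle < -boundary_deg:
        return "back"   # mostly upward movement (point E)
    return "left"       # assumed by symmetry; not part of the example
```

A lower-right vector such as (1, 3) is closer to downward and maps to "front", while (3, 1) is closer to rightward and maps to "right", mirroring the comparison described for points B and D.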

そして、制御部２２は、向き推定部２１の推定結果を受け、推定された選手の向きに対応する画像識別部２３ａ～ｄのいずれかを選択し、選択した画像識別部２３ａ～ｄに、カメラ１００の映像を出力する。 Then, the control unit 22 receives the estimation result of the orientation estimation unit 21, selects one of the image identification units 23a to 23d corresponding to the estimated orientation of the player, and provides the selected image identification unit 23a to d with the camera 100 images are output.

尚、向き推定部１０１の移動ベクトルの検出は、映像の所定フレーム分、例えば、５フレーム毎に検出を行う。 Note that detection of the motion vector by the direction estimation unit 101 is performed for a predetermined number of frames of the video, for example, every five frames.

このようにすれば、設置位置及び撮影方向が固定されたカメラの映像であっても、精度よく人物の向きを推定することができる。 In this way, even if the image is captured by a camera whose installation position and shooting direction are fixed, the orientation of the person can be estimated with high accuracy.

Furthermore, Modification 2 of the first embodiment has been described using the front image identification unit 23a, the back image identification unit 23b, the left-side image identification unit 23c, and the right-side image identification unit 23d. In addition to these, an image identification unit trained on teacher data showing persons facing the 45-degree direction between front and right side, and an image identification unit trained on teacher data showing persons facing the 45-degree direction between back and right side, may be prepared, and the video in which the player is at point B or point D may be identified by those units. This can further improve recognition accuracy.

また、上述した第１の実施の形態の変形例１と同様に、認識された人物の移動ベクトルを用いて、観客（人物）と選手（特定人物）とを区別することも可能である。 Further, as in the first modification of the first embodiment described above, it is also possible to distinguish between spectators (persons) and players (specific persons) using the recognized movement vectors of the persons.

As described above, in video captured by a fixed-point camera, a person whose movement can be detected between predetermined frames is a player, and a person whose movement cannot be detected is a spectator. Using this property, the image identification units 23a to 23d, based on the information associating the movement amount of the movement vector with the person calculated for each frame, treat a person whose movement amount is equal to or less than a predetermined value as a spectator, exclude that person from the candidates for the identification target (specific person), and identify the remaining persons as identification targets (specific persons).
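The spectator exclusion rule differs only in the direction of the comparison between the two camera types: with a camera that moves with the athletes (Modification 1), large movement marks a spectator, and with a fixed-point camera, small movement does. A minimal sketch, assuming movement vectors are given in pixels per person and using hypothetical names:

```python
import math


def player_candidates(movements, threshold, fixed_camera=False):
    """Return the ids of persons kept as identification candidates.

    movements maps a person id to its (dx, dy) movement vector between frames.
    """
    kept = []
    for person_id, (dx, dy) in movements.items():
        amount = math.hypot(dx, dy)  # movement amount of the vector
        if fixed_camera:
            is_spectator = amount <= threshold  # spectators stay in place
        else:
            is_spectator = amount >= threshold  # spectators sweep past the camera
        if not is_spectator:
            kept.append(person_id)
    return kept
```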

<Second Embodiment>
The second embodiment describes a video production system including an image identification device and a program that identify the positional relationship of identified identification targets based on their sizes on the video.

２次元の映像では、手前にあるものが大きく映り、遠方にあるものほど小さく映る。これは競技を撮影している場合も同様である。例えば、マラソンや駅伝では、選手を正面から撮影する場合が多い。この場合、手前を走る選手ほど映像上で大きく映り、後方の選手ほど映像上で小さく映る。 In a two-dimensional image, objects in the foreground appear larger, and objects in the distance appear smaller. This is also the case when shooting a competition. For example, in marathons and relay races, athletes are often photographed from the front. In this case, the player running in front appears larger in the image, and the player in the back appears smaller in the image.

第２の実施の形態では、このような性質を利用し、識別した識別対象の位置関係を識別する。 In the second embodiment, using such properties, the positional relationship of the identified identification objects is identified.

図１４は、第２の実施形態における画像識別装置２の構成例を示すブロック図である。第２の実施形態における画像識別装置２は、第１の実施の形態における画像識別装置２に加えて、位置関係識別部２４を備える。 FIG. 14 is a block diagram showing a configuration example of the image identification device 2 according to the second embodiment. The image identification device 2 in the second embodiment includes a positional relationship identification section 24 in addition to the image identification device 2 in the first embodiment.

The positional relationship identification unit 24 identifies the positional relationship of the identification targets from their sizes on the video. In the first embodiment described above, to identify an identification target, L was taken to be the length of the straight line from the top of the head to the midpoint of the neck of each detected person, as shown in FIG. 11, and a region of interest of size L×2L was set below the center of the person's neck.

位置関係識別部２４は、識別した識別対象の映像上の大きさの指標として、この注目領域の大きさに着目する。 The positional relationship identification unit 24 focuses on the size of this attention area as an index of the size of the identified identification target on the video.

更に、位置関係識別部２４は、注目領域の大きさに加えて、向き推定部２１が推定した選手の向き（特定人物）も着目する。 Furthermore, the positional relationship identification unit 24 focuses on the direction of the player (specific person) estimated by the direction estimation unit 21 in addition to the size of the region of interest.

例えば、図１５の例は、選手の向きが正面の場合の映像である。そして、本映像では、人物Ｄの注目領域が最も大きく、人物Ｃの注目領域が最も小さい。従って、選手の順位は、人物Ｄ（大学Ｚの選手）、人物Ｂ（大学Ｙの選手）、人物Ｃ（大学Ｘの選手）の順であり、その順番で走行していることが識別できる。一方、図１６に示すように、選手の向きが背面の場合の映像である場合、図１５と同様な人物の位置状況であっても、注目領域が小さい程、選手の順位が高いことになり、選手の順位は、人物Ｃ（大学Ｘの選手）、人物Ｂ（大学Ｙの選手）、人物Ｄ（大学Ｚの選手）の順であり、その順番で走行していることが識別できる。 For example, the example in FIG. 15 is an image when the player faces the front. In this image, the attention area of person D is the largest and the attention area of person C is the smallest. Therefore, the order of the athletes is Person D (athlete of University Z), Person B (athlete of University Y), and Person C (athlete of University X), and it can be identified that they are running in that order. On the other hand, as shown in FIG. 16, in the case of an image in which the player faces the back, even if the position of the person is the same as in FIG. 15, the smaller the attention area, the higher the player's ranking. , the order of the athletes is person C (athlete of university X), person B (athlete of university Y), and person D (athlete of university Z) in that order, and it can be identified that they are running in that order.
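The ordering rule for FIG. 15 and FIG. 16 can be sketched as a sort whose direction depends on the orientation. The numeric sizes in the usage note are invented for illustration; only their relative order follows the example, and the function name is hypothetical.

```python
def running_order(roi_sizes, orientation):
    """Order persons from first place to last by region-of-interest size.

    Front view: a larger region means the runner is further ahead.
    Back view: a smaller region means the runner is further ahead.
    """
    if orientation not in ("front", "back"):
        raise ValueError("ordering is defined here only for front or back views")
    return sorted(roi_sizes, key=roi_sizes.get, reverse=(orientation == "front"))
```

With `{"B": 50, "C": 30, "D": 80}`, a front view gives the order D, B, C and a back view gives C, B, D, matching the two cases described above.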

このように、識別された選手の映像上の大きさと、識別された選手の向きとを考慮することにより、撮影方向に関係なく、識別した選手の映像上の位置関係を識別することができる。 Thus, by considering the size of the identified player on the video and the direction of the identified player, the positional relationship of the identified player on the video can be identified regardless of the shooting direction.
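
The ordering rule described above can be sketched as follows. This is only a minimal illustration, not the implementation of the positional relationship identification unit 24: the `Detection` class, the `rank_players` function, and the pixel areas are hypothetical, and only the front and back orientations discussed in the text are handled.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str       # identified university, e.g. "University Z"
    roi_area: float  # area of the L x 2L region of interest, in pixels


def rank_players(detections, orientation):
    """Return labels ordered from the leading player to the last one."""
    if orientation == "front":
        # Players run toward the camera: the largest region is the leader.
        largest_first = True
    elif orientation == "back":
        # Players run away from the camera: the smallest region is the leader.
        largest_first = False
    else:
        # The text only describes the front and back cases.
        raise ValueError(f"unsupported orientation: {orientation}")
    ordered = sorted(detections, key=lambda d: d.roi_area, reverse=largest_first)
    return [d.label for d in ordered]


# Hypothetical areas chosen so that D > B > C, as in FIG. 15 and FIG. 16.
people = [
    Detection("University Y", 3200.0),  # person B
    Detection("University X", 1800.0),  # person C
    Detection("University Z", 5000.0),  # person D
]
print(rank_players(people, "front"))
print(rank_players(people, "back"))
```

With the same three regions, the front-facing case yields Z, Y, X and the back-facing case yields X, Y, Z, matching the two orders given in the text.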

In the example above, the positional relationship was identified from the region of interest of each identified player, but the invention is not limited to this. For example, the size of the player's whole figure in the video may be used. Another method is to focus on an item that all players wear in common. A bib (race number), for example, is suitable because it is the same size for every player.

Furthermore, as an application of this example, relative distances can also be obtained from the size used to identify the positional relationship. A camera used to shoot a competition can provide shooting information, such as the angle of view, together with the captured video. Therefore, if the relationship between distance and the size used to identify the positional relationship is learned in advance in association with the angle of view and the like, the distance between players (identification targets) can also be calculated.
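
As a rough illustration of how an apparent size and an angle of view can yield a distance, the sketch below substitutes a simple pinhole-camera model for the learned size-to-distance relationship that the text describes. That substitution is an assumption made only for this example, and the function names, the 0.2 m bib height, and the pixel values are all hypothetical.

```python
import math


def focal_length_px(image_height_px, vertical_fov_deg):
    """Focal length in pixels for a pinhole camera with the given vertical angle of view."""
    return (image_height_px / 2) / math.tan(math.radians(vertical_fov_deg) / 2)


def distance_m(real_height_m, apparent_height_px, image_height_px, vertical_fov_deg):
    """Distance from the camera to an object of known real height (e.g. a bib)."""
    f = focal_length_px(image_height_px, vertical_fov_deg)
    return real_height_m * f / apparent_height_px


def gap_m(real_height_m, size_a_px, size_b_px, image_height_px, vertical_fov_deg):
    """Difference in camera distance between two players wearing the same-sized item."""
    da = distance_m(real_height_m, size_a_px, image_height_px, vertical_fov_deg)
    db = distance_m(real_height_m, size_b_px, image_height_px, vertical_fov_deg)
    return abs(da - db)


# Hypothetical numbers: a 0.2 m bib seen at 20 px and 10 px in a 1080-line frame.
print(gap_m(0.2, 20, 10, 1080, 90.0))
```

A learned mapping, as proposed in the text, could replace `distance_m` without changing how the gap is derived from two per-player distances.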

Furthermore, the obtained distance between players can be used to predict changes in their positional relationship. For example, if the distance between two players decreases over time, the trailing player may overtake the leading player. In such a case, issuing a warning can notify the production staff of the impending change in the players' positional relationship.
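
One way such a warning could be triggered is sketched below. The rule used here (a strictly shrinking gap that has fallen below a threshold) and the 5 m default are assumptions for illustration; the text does not specify the criterion.

```python
def overtake_warning(gaps_m, warn_below_m=5.0):
    """Decide whether to warn that the trailing player may overtake.

    gaps_m: gap between the leading and the trailing player, sampled at a
    fixed interval, oldest sample first.
    """
    if len(gaps_m) < 2:
        return False  # not enough samples to see a trend
    # The gap must shrink at every step ...
    shrinking = all(later < earlier for earlier, later in zip(gaps_m, gaps_m[1:]))
    # ... and the latest gap must already be small.
    return shrinking and gaps_m[-1] < warn_below_m


print(overtake_warning([12.0, 9.0, 6.0, 4.0]))
print(overtake_warning([12.0, 11.0, 10.0]))
```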

<Third Embodiment>
A third embodiment will be described.

The third embodiment is a system for classifying the teacher data used to train the image identification units 23a to 23d of the first embodiment described above.

As described above, the orientation estimation unit 21 recognizes persons in the input video, refers to the positions of the recognized persons in the video, and estimates the orientation of the player (specific person) in the video. By using this orientation estimation unit 21, the video teacher data to be learned by the image identification units 23a to 23d can be classified by player orientation.

FIG. 17 is a block diagram of the classification system according to the third embodiment.

The orientation estimation unit 21 operates in the same way as the orientation estimation unit 21 described in the first embodiment. In this example, as in the first embodiment, it estimates the orientation of the player (specific person) in the video as one of front, back, left, and right.

The storage unit 30 receives the estimation result of the orientation estimation unit 21 and stores the video in the corresponding one of the teacher data storage folders 31a to 31d for front, back, left, and right.

In this way, the teacher data for training the image identification units 23a to 23d can be classified automatically.
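
The role of the storage unit 30 can be sketched as a small filing routine. The folder names, the `store_teacher_data` function, and the choice to copy rather than move the file are assumptions for illustration; the orientation label is taken as an input, standing in for the output of the orientation estimation unit 21.

```python
import shutil
from pathlib import Path

# One folder per orientation, standing in for folders 31a to 31d.
ORIENTATIONS = ("front", "back", "left", "right")


def store_teacher_data(clip_path, orientation, root):
    """Copy a clip into the teacher-data folder for its estimated orientation."""
    if orientation not in ORIENTATIONS:
        raise ValueError(f"unknown orientation: {orientation}")
    folder = Path(root) / orientation
    folder.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the path of the newly created file.
    return Path(shutil.copy2(clip_path, folder))
```

Calling `store_teacher_data("clip001.mp4", "back", "teacher_data")` would place the clip under `teacher_data/back/`, so each image identification unit can later be trained from its own folder.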

<Fourth Embodiment>
A fourth embodiment will be described.

FIG. 18 is a block diagram of the fourth embodiment.

In addition to the configuration of the first and second embodiments, the fourth embodiment includes a metadata generation unit 40, a metadata storage unit 41, and a scene search unit 42.

The metadata generation unit 40 generates metadata that associates the identification results of the image identification units 23a to 23d (the attributes of each player (specific person), the position of the player (specific person) in the video, and so on) and the identification results of the positional relationship identification unit 24 (the positional relationship of the players (specific persons), and so on) with the videos of the cameras 1a to 1d.

The image identification units 23a to 23d and the positional relationship identification unit 24 output identification results at a predetermined time interval (for example, every second). The metadata generation unit 40 therefore associates each identification result received at that interval with video identification information that identifies the video on which the identification was performed and with the time code of the video corresponding to the identification result, and stores them in the metadata storage unit 41. The metadata storage unit 41 may store not only the metadata but also the videos to which the metadata corresponds.

Specifically, each of the image identification units 23a to 23d outputs an identification result for each player (specific person) such as that shown in FIG. 12. In addition, because the coordinates of the base of the neck (neck midpoint) and the face length (L) of each recognized player (specific person) are calculated when the region of interest is set, this information can also be output. The positional relationship identification unit 24 can output the positional relationship of the recognized players (specific persons). The metadata generation unit 40 therefore associates, for each identified player (specific person), the confidence that the player belongs to each university, the coordinates of the base of the neck (neck midpoint) and the face length (L), the player's rank, the video identification information of the video on which the identification was performed, and the time code of that video, and stores them in the metadata storage unit 41. Hereinafter, these items (the per-university confidence, the neck coordinates and face length (L), the rank, the video identification information, and the time code) are referred to as the indexes of the metadata.

FIG. 19 shows an example of the metadata. In the example of FIG. 19, the indexes are the confidence that each identified player (specific person) belongs to each university, the coordinates of the base of the neck (neck midpoint) and the face length (L) of each identified player (specific person), the player's rank, the video identification information of the video on which the identification was performed, and its time code, and these are associated with one another to form the metadata. For example, for person B, the confidence of belonging to each university (University X = 0.2, University Y = 0.9, University Z = 0.1, ...), the neck coordinates (= (781, 450)), the face length (L) (= 5), the rank (= 2), the video identification information (camera 1), and the time code (= 02:10:01) are associated to form the metadata. In the example of FIG. 19, identified persons other than players, such as spectators, are excluded.
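
A metadata record of the kind described above could be represented as follows. The `MetadataRecord` class and its field names are hypothetical; the values are those given in the text for person B, with the elided confidences for other universities left out.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    video_id: str      # video identification information
    time_code: str     # time code of the identified frame
    person: str        # label of the identified person
    confidence: dict   # university name -> confidence of belonging to it
    neck_xy: tuple     # coordinates of the base of the neck (neck midpoint)
    face_length: float # face length L
    rank: int          # rank from the positional relationship identification


person_b = MetadataRecord(
    video_id="camera 1",
    time_code="02:10:01",
    person="B",
    confidence={"University X": 0.2, "University Y": 0.9, "University Z": 0.1},
    neck_xy=(781, 450),
    face_length=5,
    rank=2,
)


def most_likely_university(record):
    """Return the university with the highest confidence for this record."""
    return max(record.confidence, key=record.confidence.get)


print(most_likely_university(person_b))
```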

The scene search unit 42 searches for scenes that match a given search condition using the metadata stored in the metadata storage unit 41. Any condition that can be evaluated from the indexes of the metadata may be used as a search condition.

For example, when "a scene showing players of University X and University Y" is set in the scene search unit 42 as the search condition, the scene search unit 42 searches the indexes of the metadata stored in the metadata storage unit 41 for time codes at which, within the same video identification information, the confidences for "University X" and "University Y" both exceed a predetermined threshold (for example, 0.7). It then extracts the video scene corresponding to the time code of the matching video identification information. In the example of FIG. 19, the scene that matches the condition "a scene showing players of University X and University Y" is, as shown in FIG. 20, the scene at time code "02:10:01" of the "camera 1 video". That scene is therefore extracted from the "camera 1 video" and displayed. Because the condition in the example of FIG. 20 is "a scene showing players of University X and University Y", players of other universities may also appear in the scene as long as players of University X and University Y appear. If the search condition is instead "a scene showing only players of University X and University Y", scenes showing only players of those two universities can be retrieved.
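
The threshold search described above might look like the following sketch. The record layout (plain dictionaries), the `find_scenes` function, and all sample values other than those of person B are hypothetical.

```python
from collections import defaultdict


def find_scenes(records, universities, threshold=0.7, only=False):
    """Return (video_id, time_code) pairs in which every requested university appears.

    A university "appears" when some person in that frame has a confidence
    above the threshold for it. With only=True, no other university may appear.
    """
    required = set(universities)
    present = defaultdict(set)
    for rec in records:
        key = (rec["video_id"], rec["time_code"])
        for univ, conf in rec["confidence"].items():
            if conf > threshold:
                present[key].add(univ)
    return sorted(
        key for key, found in present.items()
        if required <= found and (not only or found == required)
    )


records = [
    # Person B as given in the text; the other rows are made up for the example.
    {"video_id": "camera 1", "time_code": "02:10:01",
     "confidence": {"University X": 0.2, "University Y": 0.9, "University Z": 0.1}},
    {"video_id": "camera 1", "time_code": "02:10:01",
     "confidence": {"University X": 0.8, "University Y": 0.1, "University Z": 0.2}},
    {"video_id": "camera 1", "time_code": "02:10:01",
     "confidence": {"University X": 0.1, "University Y": 0.2, "University Z": 0.9}},
    {"video_id": "camera 2", "time_code": "02:10:01",
     "confidence": {"University X": 0.1, "University Y": 0.3, "University Z": 0.8}},
]
print(find_scenes(records, ["University X", "University Y"]))
print(find_scenes(records, ["University X", "University Y"], only=True))
```

In this sample the first search returns the camera 1 scene at 02:10:01, while the "only" variant returns nothing because a University Z player is also in that frame.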

Conditions such as "a scene in which the player of University Z overtakes the player of University Y" can also be set as search conditions. In this case, the rank index is referenced in addition to the indexes described above, and the scenes before and after the time code at which the rank of the person with a high University Z confidence and the rank of the person with a high University Y confidence are swapped are retrieved.

In the fourth embodiment, converting the identification results into video metadata makes it possible to search for video scenes that match specific conditions.
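
A rank-swap search of this kind can be sketched as follows. The input is assumed to have already been reduced to one rank per university per time code (for instance by taking the most confident person for each university); that preprocessing, the `find_overtakes` function, and the sample data are assumptions for illustration.

```python
def find_overtakes(rank_history, overtaker, overtaken):
    """Return (before, after) time-code pairs where `overtaker` passes `overtaken`.

    rank_history: list of (time_code, ranks) in time order, where ranks maps
    a university name to its rank (1 = leader).
    """
    hits = []
    for (t_prev, prev), (t_next, nxt) in zip(rank_history, rank_history[1:]):
        # Skip samples in which either university was not identified.
        if not all(u in r for r in (prev, nxt) for u in (overtaker, overtaken)):
            continue
        was_behind = prev[overtaker] > prev[overtaken]
        now_ahead = nxt[overtaker] < nxt[overtaken]
        if was_behind and now_ahead:
            hits.append((t_prev, t_next))
    return hits


history = [
    ("02:10:01", {"University Y": 1, "University Z": 2}),
    ("02:10:02", {"University Y": 1, "University Z": 2}),
    ("02:10:03", {"University Y": 2, "University Z": 1}),
]
print(find_overtakes(history, "University Z", "University Y"))
```

The returned pair brackets the moment of the swap, so the scenes just before and after it can be extracted.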

<Fifth Embodiment>
A fifth embodiment will be described.

The fifth embodiment describes a production support system that uses the image identification device of the second embodiment, which identifies the positional relationship of identification targets, together with a distance meter system and a marathon bulletin online system.

First, the distance meter system and the marathon bulletin online system are described.

Each relay van carrying a camera is equipped with a distance meter. Using the travel distance measured by a sensor on the relay van and the position of the relay van obtained from a GPS-based positioning system, the distance meter obtains the relay van's distance along the marathon course.

In the marathon bulletin online system, on the other hand, staff are stationed at predetermined points on the marathon course. They check which players (university names) pass their point and report to a tallying system. The tallying system tallies the passage status of the players (university names) reported from each point and estimates the approximate position and rank of each player (university name). Marathon bulletin online systems of this kind have been in use for some time.

FIG. 21 is a block diagram of the fifth embodiment.

In FIG. 21, 51 is relay van No. 1 carrying the camera 1a, 52 is relay van No. 2 carrying the camera 1b, 53 is relay van No. 3 carrying the camera 1c, 54 is relay van No. 4 carrying the camera 1d, 55 is an image identification device, 56 is a marathon bulletin online system, 57 is a positional relationship specifying unit, and 58 is a display.

The relay vans 51 to 54 are each equipped with the distance meter described above. Using the travel distance measured by a sensor on the relay van and the position of the relay van obtained from a GPS-based positioning system, the distance meter calculates the relay van's distance along the marathon course and outputs it to the positional relationship specifying unit 57. The videos captured by the cameras 1a to 1d mounted on the relay vans 51 to 54 are output to the image identification device 55.

The image identification device 55 is the image identification device described in the second embodiment. Its identification results and positional relationships for the videos are output to the positional relationship specifying unit 57.

In the marathon bulletin online system 56, the tallying system tallies the passage status of the players (university names) reported by the staff stationed at predetermined points on the marathon course, and outputs the result to the positional relationship specifying unit 57.

Next, the operation of the fifth embodiment is described using a specific example.

This operation example assumes an ekiden (road relay race) run by players from a total of 12 universities, from AA University to JJ University. The players from AA University to JJ University are currently running Section 6, which is 20 km long, and the relay vans 51 to 54 are also traveling in Section 6.

According to its distance meter, relay van 51 (car No. 1) is at the 19.5 km point of Section 6, and the camera 1a mounted on relay van 51 (car No. 1) is shooting the player running in the lead. According to its distance meter, relay van 52 (car No. 2) is at the 19.2 km point of Section 6, and the camera 1b mounted on it is shooting the running players. According to its distance meter, relay van 53 (car No. 3) is at the 18.6 km point of Section 6, and the camera 1c mounted on it is shooting the running players. According to its distance meter, relay van 54 (car No. 4) is at the 17.8 km point of Section 6, and the camera 1d mounted on it is shooting the running players. The distances at which the relay vans 51 to 54 are located, obtained from the distance meters, are output to the positional relationship specifying unit 57. The videos captured by the cameras 1a to 1d of the relay vans 51 to 54 are output to the image identification device 55.

From the video captured by the camera 1a, the image identification device 55 identifies the players and their positional relationship in the video. Assume that the only player identified here is the player of AA University. This identification result is output to the positional relationship specifying unit 57.

From the video captured by the camera 1b, the image identification device 55 identifies the players and their positional relationship in the video. Assume that the players identified here are the player of EE University and the player of CC University, with the player of EE University ahead and the player of CC University next. This identification result is output to the positional relationship specifying unit 57.

From the video captured by the camera 1c, the image identification device 55 identifies the players and their positional relationship in the video. Assume that the players identified here are the player of BB University, the player of II University, and the player of GG University, in the order BB University (ahead), II University, GG University. This identification result is output to the positional relationship specifying unit 57.

From the video captured by the camera 1d, the image identification device 55 identifies the players and their positional relationship in the video. Assume that the only player identified here is the player of FF University. This identification result is output to the positional relationship specifying unit 57.

Assume that the marathon bulletin online system 56 has received the following reports from the points. The order in which the players are listed is the order in which they passed each point.
(1) 19.5 km point
Passage of the player of AA University
(2) 19.0 km point
Passage of the players of AA University, EE University, and CC University
(3) 18.5 km point
Passage of the players of AA University, EE University, CC University, KK University, BB University, II University, and GG University
(4) 18.0 km point
Passage of the players of AA University, EE University, CC University, KK University, BB University, II University, GG University, DD University, LL University, and HH University
(5) 17.5 km point
Passage of the players of AA University, EE University, CC University, KK University, BB University, II University, GG University, DD University, LL University, HH University, FF University, and JJ University

The positional relationship specifying unit 57 receives the distances at which the relay vans 51 to 54 are located, the university players and positional relationships (ranks) identified by the image identification device 55, and the tally results from the marathon bulletin online system 56, and generates the player running position screen to be displayed on the display 58.

The positional relationship specifying unit 57 first determines, from the distance meters of the relay vans 51 to 54, that the relay vans are at the following positions.
(1) Relay van 51 (car No. 1): at the 19.5 km point of Section 6
(2) Relay van 52 (car No. 2): at the 19.2 km point of Section 6
(3) Relay van 53 (car No. 3): at the 18.6 km point of Section 6
(4) Relay van 54 (car No. 4): at the 17.8 km point of Section 6
From the above, the positional relationship of the relay vans can be determined as follows.
(1) The leading relay van is relay van 51 (car No. 1).
(2) Relay van 52 (car No. 2) is running second, and the distance between relay van 51 (car No. 1) and relay van 52 (car No. 2) is 300 m.
(3) Relay van 53 (car No. 3) is running third, and the distance between relay van 52 (car No. 2) and relay van 53 (car No. 3) is 600 m.
(4) Relay van 54 (car No. 4) is running fourth, and the distance between relay van 53 (car No. 3) and relay van 54 (car No. 4) is 800 m.
Further, assume that the results identified by the image identification device 55 from the videos of the cameras 1a to 1d are as follows.
(1) Identification result from the video of camera 1a
Player of AA University
(2) Identification result from the video of camera 1b
Player of EE University, player of CC University (listed in rank order)
(3) Identification result from the video of camera 1c
Player of BB University, player of II University, player of GG University (listed in rank order)
(4) Identification result from the video of camera 1d
Player of FF University
From these results, the distance meter system portion and the image identification device portion of the player running position screen shown in FIG. 22 can be generated.
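
The distance meter figures above reduce to simple subtraction. The sketch below, with a hypothetical `van_gaps` function, derives the gap to the van ahead and the gap to the leading van from the four positions in the example. For 19.5, 19.2, 18.6, and 17.8 km the successive gaps come to 300 m, 600 m, and 800 m.

```python
def van_gaps(positions_km):
    """Order vans from the front and compute gaps in metres.

    positions_km: van name -> distance within the section, in km.
    Returns rows of (name, km, gap_to_van_ahead_m, gap_to_leader_m).
    """
    ordered = sorted(positions_km.items(), key=lambda kv: kv[1], reverse=True)
    leader_km = ordered[0][1]
    rows = []
    for i, (name, km) in enumerate(ordered):
        # The leading van has no van ahead of it.
        gap_ahead = None if i == 0 else round((ordered[i - 1][1] - km) * 1000)
        behind_leader = round((leader_km - km) * 1000)
        rows.append((name, km, gap_ahead, behind_leader))
    return rows


positions = {"car No. 1": 19.5, "car No. 2": 19.2, "car No. 3": 18.6, "car No. 4": 17.8}
for row in van_gaps(positions):
    print(row)
```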

The distance meter system portion of the player running position screen of FIG. 22 displays the distance within the section of each of relay van 51 (car No. 1), relay van 52 (car No. 2), relay van 53 (car No. 3), and relay van 54 (car No. 4), the distance between adjacent relay vans, and the gap to the leader (relay van 51 (car No. 1)).

The player running position screen of FIG. 22 also displays the identification results that the image identification device 55 obtained from each camera's video. In the image identification device portion of the screen, the players' university names are listed from the top in rank order.

Next, for the marathon bulletin online system portion of the screen, the positional relationship specifying unit 57 picks out the university players that the image identification device 55 could not detect. Here, these are the players of KK University, DD University, LL University, HH University, and JJ University. From the tally results of the marathon bulletin online system, their positions can be estimated as follows.
(1) Player of KK University
Running between the 19.0 km point and the 18.5 km point
(2) Player of DD University
Running between the 18.5 km point and the 18.0 km point
(3) Player of LL University
Running between the 18.5 km point and the 18.0 km point
(4) Player of HH University
Running between the 18.5 km point and the 18.0 km point
(5) Player of JJ University
Running between the 18.0 km point and the 17.5 km point
From these results, the following can be determined.
(1) Players located between relay van 51 (car No. 1) and relay van 52 (car No. 2)
None
(2) Players located between relay van 52 (car No. 2) and relay van 53 (car No. 3)
Player of KK University
(3) Players located between relay van 53 (car No. 3) and relay van 54 (car No. 4)
Player of DD University, player of LL University, player of HH University
(4) Players located behind relay van 54 (car No. 4)
Player of JJ University
From the above, the marathon bulletin online system portion of the player running position screen of FIG. 22 can be created. In that portion of the screen, the players' university names are listed from the top in rank order.
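
The placement of the undetected players can be reproduced with the sketch below. The rule it uses is an assumption, since the text does not state how the positional relationship specifying unit 57 combines the two sources: players are first put in an overall order (farthest checkpoint passed, then passing order at that checkpoint), and each undetected player is then placed between the nearest camera-detected players ahead and behind. All function and variable names are hypothetical.

```python
def estimated_order(reports):
    """Overall running order from checkpoint reports.

    reports: checkpoint km -> players that passed it, in passing order.
    Players past a farther checkpoint come first; ties keep passing order.
    """
    order, seen = [], set()
    for km in sorted(reports, reverse=True):
        for player in reports[km]:
            if player not in seen:
                seen.add(player)
                order.append(player)
    return order


def place_between_vans(reports, detected_by):
    """Map each undetected player to (van ahead, van behind); None means no van."""
    order = estimated_order(reports)
    placement = {}
    for i, player in enumerate(order):
        if player in detected_by:
            continue
        ahead = next((detected_by[p] for p in reversed(order[:i]) if p in detected_by), None)
        behind = next((detected_by[p] for p in order[i + 1:] if p in detected_by), None)
        placement[player] = (ahead, behind)
    return placement


front = ["AA", "EE", "CC", "KK", "BB", "II", "GG"]
reports = {
    19.5: ["AA"],
    19.0: ["AA", "EE", "CC"],
    18.5: front,
    18.0: front + ["DD", "LL", "HH"],
    17.5: front + ["DD", "LL", "HH", "FF", "JJ"],
}
detected_by = {
    "AA": "car No. 1", "EE": "car No. 2", "CC": "car No. 2",
    "BB": "car No. 3", "II": "car No. 3", "GG": "car No. 3", "FF": "car No. 4",
}
print(place_between_vans(reports, detected_by))
```

For the reports and identification results of this example, the sketch places KK between car No. 2 and car No. 3, places DD, LL, and HH between car No. 3 and car No. 4, and places JJ behind car No. 4, which agrees with the result stated in the text.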

In the example of FIG. 22, the marathon bulletin online system portion is displayed aligned with the positions of the relay vans. However, as in FIG. 25, the ranks of all universities obtained from the marathon bulletin online system may be displayed instead.

Furthermore, the player running position screen displayed on the display 58 may be published on the Web so that the people involved can view it. In that case, a chat function may be provided so that they can communicate with one another. In the example of FIG. 25, a chat screen is provided below the player running position screen so that each person's remarks can be displayed. For example, when XXX, who selects the video, wants video of the player of DD University, XXX tells car No. 4 to move. The crew of car No. 4 can look at the player running position screen, move to the position of the player of DD University, who is behind the player of FF University, and reply to acknowledge the move.

According to the fifth embodiment, by linking the distance information of each relay van, the identification results and positional relationships from the image identification device, and the tally results of the marathon bulletin online system, the positional relationship of the relay vans and the participating players can be grasped visually, and useful information can be provided to producers who broadcast a marathon or similar event.

The present invention has been described above with reference to preferred embodiments. It is not necessary to include all the configurations of the embodiments, and they may be combined as appropriate. Moreover, the present invention is not necessarily limited to the above embodiments and can be modified and implemented in various ways within the scope of its technical idea.

1a to 1d Camera
2 Image identification device
21 Orientation estimation unit
22 Control unit
23a to 23d Image identification unit
24 Positional relationship identification unit
30 Storage unit
31a to 31d Teacher data storage folder
40 Metadata generation unit
41 Metadata storage unit
42 Scene search unit
51 to 54 Relay van
55 Image identification device
56 Marathon bulletin online system
57 Positional relationship specifying unit
58 Display
100 Camera
101 Orientation estimation unit

Claims

1. An image identification device that identifies a specific person from video, comprising:
an orientation estimation unit that recognizes persons in the video, refers to the positions of the recognized persons in the video, and estimates the orientation of the specific person in the video;
a plurality of specific person identification units, each trained to identify the specific person for a respective orientation of the specific person; and
a control unit that selects the specific person identification unit corresponding to the orientation of the specific person estimated by the orientation estimation unit and causes the selected specific person identification unit to identify the specific person from the video.

2. The image identification device according to claim 1, wherein the orientation estimating unit refers to a temporal change in the position of the recognized person on the image to estimate the orientation of the specific person on the image.

3. The image identification device according to claim 2, wherein the orientation estimating unit estimates the orientation of the specific person on the image by referring to a temporal change in the position of the recognized person on the image according to whether or not a camera that captures the image is movable.

4. The image identification device according to any one of claims 1 to 3, wherein the specific person identification unit refers to the position of the recognized person in the video determined by the orientation estimation unit and distinguishes, among the recognized persons, the specific person from persons other than the specific person.

5. The image identification device according to any one of claims 1 to 4, wherein the specific person identification unit identifies the positional relationship of the specific person by referring to the orientation of the specific person estimated by the orientation estimation unit.

6. The image identification device according to any one of claims 1 to 5, wherein the orientation estimation unit estimates any one of front, back, left, and right as the orientation of the specific person.

7. The image identification device according to any one of claims 1 to 6, further comprising a metadata generation unit that associates the identification result of the specific person identification unit with time information of the video in which the identification result was obtained, and generates metadata of the video.

8. The image identification device according to claim 7, further comprising a scene search unit that searches for video scenes that match predetermined search conditions by referring to the metadata.
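
The metadata generation and scene search recited in the two claims above could look roughly like the following. This is a minimal sketch under stated assumptions: the `MetadataEntry` fields, the choice of "person name" as the only search condition, and the `max_gap_sec` rule for merging nearby detections into one scene are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class MetadataEntry:
    time_sec: float   # time information of the video frame
    person: str       # identification result
    orientation: str  # orientation under which the person was identified


def generate_metadata(
    results: Iterable[Tuple[float, Optional[str], str]]
) -> List[MetadataEntry]:
    """Keep only frames in which a person was actually identified."""
    return [MetadataEntry(t, p, o) for t, p, o in results if p is not None]


def search_scenes(metadata: List[MetadataEntry], person: str,
                  max_gap_sec: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start, end) ranges in which `person` appears continuously."""
    times = sorted(e.time_sec for e in metadata if e.person == person)
    scenes: List[Tuple[float, float]] = []
    for t in times:
        if scenes and t - scenes[-1][1] <= max_gap_sec:
            scenes[-1] = (scenes[-1][0], t)  # extend the current scene
        else:
            scenes.append((t, t))            # start a new scene
    return scenes


metadata = generate_metadata([
    (0.0, "A", "front"), (0.5, "A", "front"), (1.0, None, "left"),
    (5.0, "A", "back"), (5.5, "B", "front"),
])
scenes_for_a = search_scenes(metadata, "A")
```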

9. A classification system for classifying teacher data for an image identification device that identifies a specific person from a video, comprising:
an orientation estimation unit that recognizes a person in the video, refers to the position of the recognized person in the video, and estimates the orientation of the specific person in the video; and
a storage unit corresponding to the orientation of the specific person estimated by the orientation estimation unit.

10. A production support system comprising a broadcast van on which a camera is mounted, an image identification device that identifies a specific person from the video of the camera, and a reporting system that reports that the specific person has passed a predetermined point, wherein
the broadcast van has a position information acquisition unit that acquires position information of the broadcast van on the course,
the image identification device has:
an orientation estimation unit that recognizes a person in the video captured by the camera mounted on the broadcast van, refers to the position of the recognized person in the video, and estimates the orientation of the specific person in the video;
a plurality of specific person identification units trained to identify the specific person for each orientation of the specific person; and
a control unit that selects the specific person identification unit corresponding to the orientation of the specific person estimated by the orientation estimation unit and causes the selected specific person identification unit to identify the specific person from the video,
the specific person identification unit refers to the orientation of the specific person estimated by the orientation estimation unit and identifies the positional relationship of the specific person,
the reporting system has an aggregation unit that aggregates the results of the reports, and
the production support system has a screen generation unit that uses the position information of the broadcast van, the positional relationship of the specific person identified by the image identification device from the video of the camera mounted on the broadcast van, and the aggregated results of the reporting system to generate a screen that visually expresses the positional relationship between the broadcast van and the specific person.

11. An image identification method for identifying a specific person from a video, comprising:
recognizing a person in the video, referring to the position of the recognized person in the video, and estimating the orientation of the specific person in the video; and
selecting a specific person identification unit corresponding to the estimated orientation of the specific person, and identifying the specific person from the video by the selected specific person identification unit,
wherein the specific person identification unit has been trained to identify the specific person for each orientation of the specific person.

12. A classification method for classifying teacher data for an image identification device that identifies a specific person from a video, comprising:
recognizing a person in the video, referring to the position of the recognized person in the video, and estimating the orientation of the specific person in the video; and
storing an image for which the orientation of the specific person has been estimated in a storage unit corresponding to the estimated orientation of the specific person.
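
The storing step of the classification method above can be illustrated with one folder per orientation, in the spirit of the teacher data storage folders 31a to 31d. This is a sketch under assumptions: the folder names, the `store_teacher_image` helper, the file name, and the use of a temporary directory as the storage root are all hypothetical.

```python
import tempfile
from pathlib import Path

# Assumed orientation labels; one teacher-data folder is kept per label.
ORIENTATIONS = ("front", "back", "left", "right")


def store_teacher_image(root: Path, orientation: str,
                        name: str, data: bytes) -> Path:
    """Save an image under the folder matching its estimated orientation."""
    if orientation not in ORIENTATIONS:
        raise ValueError(f"unknown orientation: {orientation!r}")
    folder = root / orientation
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / name
    path.write_bytes(data)
    return path


# Usage: an image whose orientation was estimated as "back" lands in back/.
with tempfile.TemporaryDirectory() as tmp:
    saved = store_teacher_image(Path(tmp), "back", "frame_0001.jpg", b"\xff\xd8")
    relative = saved.relative_to(tmp).as_posix()
    stored_bytes = saved.read_bytes()
```

Sorting teacher data this way lets each orientation-specific identification unit be trained only on images taken from its own viewpoint.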

13. A program for an information processing device that identifies a specific person from a video, the program causing the information processing device to function as:
an orientation estimation unit that recognizes a person in the video, refers to the position of the recognized person in the video, and estimates the orientation of the specific person in the video;
a plurality of specific person identification units trained to identify the specific person for each orientation of the specific person; and
a control unit that selects the specific person identification unit corresponding to the orientation of the specific person estimated by the orientation estimation unit and causes the selected specific person identification unit to identify the specific person from the video.

14. A program for an information processing device that classifies teacher data for an image identification device that identifies a specific person from a video, the program causing the information processing device to function as:
an orientation estimation unit that recognizes a person in the video, refers to the position of the recognized person in the video, and estimates the orientation of the specific person in the video; and
a storage unit that stores an image for which the orientation of the specific person has been estimated in a storage unit corresponding to the orientation of the specific person estimated by the orientation estimation unit.