JP2007134913A

JP2007134913A - Method and device for selecting image

Info

Publication number: JP2007134913A
Application number: JP2005325446A
Authority: JP
Inventors: Masahiro Iwasaki; 正宏岩崎; Takeo Azuma; 健夫吾妻; Kazuo Nobori; 一生登
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2005-11-09
Filing date: 2005-11-09
Publication date: 2007-05-31

Abstract

PROBLEM TO BE SOLVED: To provide a method capable of selecting, from a plurality of camera images, an image in which a user can easily recognize the movements and shapes of a person's arms and legs.

SOLUTION: The method selects at least one image from a plurality of images picked up by a plurality of cameras. It includes a step (S301) of acquiring a plurality of images that are picked up by the plurality of cameras and include the person; steps (S302 to S305) of extracting, for each of the images, a feature quantity obtained from a plurality of representative points provided on the outline of a subject silhouette; an unevenness degree decision step (S306) of deciding, from the feature quantities, a degree of unevenness indicating how uneven the shape of the subject silhouette is in each of the images; and a step (S307) of selecting, based on the degrees of unevenness, at least one image for recognizing the shapes of the person's arms or legs.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a video selection method for selecting at least one video from among videos captured simultaneously by a plurality of cameras. In particular, it relates to a video selection method for selecting, from videos of a scene including a person captured simultaneously by a plurality of cameras, a video in which a user can easily recognize the movement and shape of the person's limbs.

Conventionally, when stores, streets, and the like are monitored with cameras installed at a plurality of locations, the video from each camera is displayed on a monitor and watched by an observer. Likewise, in broadcasts of sports such as soccer, where a wide area must be covered with a plurality of cameras, the video used for broadcast is switched on the director's instructions. In such cases the observer or director has to watch a plurality of videos at the same time, which is a heavy burden. Moreover, as the number of cameras increases, it becomes difficult to pay attention to all of the videos simultaneously.

On the other hand, methods have been proposed in which a selector automatically selects a camera video from videos captured simultaneously by a plurality of cameras (see, for example, Patent Document 1 and Patent Document 2).

Patent Document 1 discloses a method in which a motion vector is detected for each of the videos captured by a plurality of cameras and, when the magnitude of a motion vector is equal to or greater than a predetermined value, the corresponding camera video is selected by the selector. Similarly, there is a method in which a background image is prepared in advance, the presence or absence of an intruding object is determined from the magnitude of the difference between the background image and the current video, and, when an intruding object is determined to be present, the corresponding camera video is selected. There is also a method in which a person detection method is used in combination so that a camera video in which a person has been detected is selected.

Patent Document 2 discloses a method in which an observer marks a specific person in the videos of a plurality of surveillance cameras, and the face image of the marked person is collated with the face images of persons being captured by the surveillance cameras so that video of the specific person is selectively displayed.

Patent Document 1: Japanese Patent Laid-Open No. H1-171097
Patent Document 2: Japanese Patent Laid-Open No. 2004-128615

However, even with camera video selection methods typified by the above prior art, it is not possible to select and display a video in which the user can easily recognize the movement and shape of a person's limbs.

This is because the camera video selection method typified by Patent Document 1 selects a camera video according to whether the magnitude of a motion vector exceeds a predetermined threshold. In this case, a camera video is selected whenever a temporal change occurs in it, regardless of whether the moving object in the video is a person. The selected video is therefore not necessarily one in which the movement and shape of a person's limbs are easy to recognize.

Likewise, when a person detection method is used to select a camera video in which a person is present, or when a face detection technique typified by Patent Document 2 is used to select a camera video in which a person's face is present, it is possible to select a camera video whose subject is a person, but it is not always possible to select a video in which the movement and shape of the person's limbs are easy to recognize.

In addition, since a person is an articulated object, changes in posture are an issue, unlike objects whose shape does not change, such as cars and trains. That is, how easily the movement and shape of the limbs can be recognized depends on the posture. For this reason it is difficult to uniquely determine the relative positional relationship between the camera and the subject in advance. Therefore, to obtain a video in which the user can easily recognize the movement and shape of the person's limbs, the relative positional relationship between the camera and the subject has to be determined each time, for each posture and action.

Furthermore, to obtain a video in which the movement and shape of a person's limbs are easy to recognize, it is desirable that the limbs not be hidden by other body parts such as the torso. It is therefore necessary to evaluate, in particular, how the person's limbs appear in the video.

The present invention has been made to solve the above problems, and its object is to provide a video selection method capable of selecting, from a plurality of camera videos, a video in which the user can easily recognize the movement and shape of a person's limbs.

To achieve the above object, a video selection method according to the present invention is a video selection method for selecting at least one video from a plurality of videos captured by a plurality of cameras, and includes: a video acquisition step of acquiring a plurality of videos that are captured by the plurality of cameras and include a person; a feature quantity extraction step of extracting a feature quantity obtained from a plurality of representative points provided on the contour of the subject silhouette of the person included in each of the plurality of videos; an unevenness degree determination step of determining, for each of the plurality of videos, an unevenness degree indicating how uneven the shape of the subject silhouette is, from the feature quantity; and a video selection step of selecting, based on the unevenness degrees of the plurality of videos, at least one video for recognizing the shape of the person's hands or feet from the plurality of videos.

The present invention can be realized not only as a video selection method including these characteristic steps, but also as a video selection device having means corresponding to the characteristic steps of the method, or as a program causing a computer to execute those steps. Needless to say, such a program can be distributed via a recording medium such as a CD-ROM (Compact Disc-Read Only Memory) or via a communication network such as the Internet.

According to the present invention, it is possible to provide a video selection method capable of selecting, from a plurality of camera videos, a video in which the user can easily recognize the movement and shape of a person's limbs.

A video selection method according to the present invention is a video selection method for selecting at least one video from a plurality of videos captured by a plurality of cameras, and includes: a video acquisition step of acquiring a plurality of videos that are captured by the plurality of cameras and include a person; a feature quantity extraction step of extracting a feature quantity obtained from a plurality of representative points provided on the contour of the subject silhouette of the person included in each of the plurality of videos; an unevenness degree determination step of determining, for each of the plurality of videos, an unevenness degree indicating how uneven the shape of the subject silhouette is, from the feature quantity; and a video selection step of selecting, based on the unevenness degrees of the plurality of videos, at least one video for recognizing the shape of the person's hands or feet from the plurality of videos. More specifically, in the video selection step, the video with the maximum unevenness degree is selected.

With this configuration, the unevenness degree of the shape of the subject silhouette is determined for each video including the person. It is therefore possible to select a video in which the user can easily recognize the shape of the person's limbs.
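The selection rule described above (pick the video whose silhouette has the maximum unevenness degree) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_camera` and the dictionary of per-camera scores are hypothetical, and the scores are assumed to have been computed already.

```python
from typing import Dict


def select_camera(unevenness_by_camera: Dict[str, float]) -> str:
    """Return the id of the camera whose silhouette has the largest unevenness degree.

    `unevenness_by_camera` is a hypothetical mapping from a camera id to the
    unevenness degree already determined for that camera's current video.
    """
    if not unevenness_by_camera:
        raise ValueError("no camera scores given")
    # max() compares the keys by their score; on a tie the first inserted key wins.
    return max(unevenness_by_camera, key=unevenness_by_camera.get)
```

For example, with scores `{"101a": 2.0, "101b": 5.0, "101c": 3.0}` the camera `"101b"` would be chosen.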

Preferably, in the feature quantity extraction step, the distances between a predetermined reference point and the plurality of representative points are extracted as the feature quantity.

By using representative points on the contour, the shape of the subject silhouette can be expressed more efficiently than when all pixel values of the subject silhouette image are used.

More preferably, the unevenness degree determination step includes: a curve generation step of generating a curve by connecting the feature quantities at the plurality of representative points; a local maximum detection step of detecting local maxima of the curve; and an unevenness degree calculation step of calculating the unevenness degree based on the local maxima.

By using the number of local maxima of the curve in this way, it is possible to express a subject silhouette shape that strongly reflects the posture of the person's limbs.
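As an illustration of the local maximum detection described above, the sketch below treats the feature curve as a closed (cyclic) sequence, since the contour is a closed loop, and counts strict local maxima. The names `local_maxima` and `unevenness_degree`, and the choice to ignore plateaus, are assumptions made for this example only.

```python
from typing import List, Sequence


def local_maxima(curve: Sequence[float]) -> List[int]:
    """Indices of strict local maxima of a closed (cyclic) curve."""
    n = len(curve)
    if n < 3:
        return []
    peaks = []
    for i in range(n):
        prev_value = curve[i - 1]        # index -1 wraps to the last sample
        next_value = curve[(i + 1) % n]  # wrap past the end to the first sample
        if curve[i] > prev_value and curve[i] > next_value:
            peaks.append(i)
    return peaks


def unevenness_degree(curve: Sequence[float]) -> int:
    """Unevenness degree taken here as the number of local maxima (an assumption)."""
    return len(local_maxima(curve))
```

A silhouette with the arms and legs spread apart would be expected to produce more peaks in the distance curve than one with the limbs held against the torso.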

More preferably, the unevenness degree determination step includes: a curve generation step of generating a curve by connecting the feature quantities at the plurality of representative points; a local maximum detection step of detecting local maxima of the curve; an association step of associating each local maximum with a hand or foot of the person based on the local maxima and the number of local maxima; and an unevenness degree calculation step of calculating the unevenness degree based on the result of the association and the local maxima. More specifically, in the unevenness degree calculation step, either the local maxima associated with hands or the local maxima associated with feet are weighted according to the result of the association, and the unevenness degree is then calculated.

By associating the local maxima with the limbs and weighting them, video selection can focus on either the hands or the feet. For example, video selection that focuses on the hands can be used for applications such as monitoring for shoplifting.

More preferably, in the unevenness degree determination step, for each of the plurality of videos, the time average of the unevenness degrees of a plurality of temporally consecutive images is calculated, and the result is used as the unevenness degree of the subject silhouette shape.

By using the time average of the unevenness degree as the new unevenness degree in this way, it is possible to select a video in which the user can easily recognize the shape and movement of the limbs, particularly while they are in motion.
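The time averaging described above might look like the following sketch. The per-frame history structure and the `window` parameter (number of most recent frames to average) are hypothetical; the text does not specify how many frames are averaged.

```python
from typing import Dict, List


def time_averaged_unevenness(
    history: Dict[str, List[float]], window: int
) -> Dict[str, float]:
    """Average each camera's most recent `window` per-frame unevenness degrees."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averaged: Dict[str, float] = {}
    for camera_id, values in history.items():
        recent = values[-window:]  # fewer than `window` values is allowed
        if recent:                 # cameras without any history are skipped
            averaged[camera_id] = sum(recent) / len(recent)
    return averaged
```

Averaging damps single-frame spikes, so a camera that shows the limbs well throughout a motion is preferred over one that does so only momentarily.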

More preferably, in the video selection step, with the selected video as a reference, a video capturing the subject from the direction opposite to that of the selected video is selected at the same time. Alternatively, in the video selection step, with the selected video as a reference, a video capturing the subject from a direction orthogonal to the optical axis of the camera capturing the selected video may be selected at the same time.

Selecting a video captured from another direction as well makes it easier for the user to recognize the shape and movement of the limbs than selecting only a video captured from one direction. Selecting a video captured from another direction also makes it possible to select a video showing the person's face.
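One way to pick the companion video described above is sketched below. It assumes, purely for illustration, that each camera's optical-axis direction is known as an azimuth angle in degrees; the text does not say how camera directions are represented, and the names `angular_gap` and `companion_camera` are hypothetical.

```python
from typing import Dict


def angular_gap(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def companion_camera(
    azimuths: Dict[str, float], selected: str, offset_deg: float
) -> str:
    """Camera whose azimuth is closest to the selected camera's azimuth plus an offset.

    Use offset_deg=180 for the opposite direction and 90 for an orthogonal one.
    `azimuths` (camera id -> optical-axis azimuth in degrees) is an assumed input.
    """
    target = (azimuths[selected] + offset_deg) % 360.0
    others = [c for c in azimuths if c != selected]
    return min(others, key=lambda c: angular_gap(azimuths[c], target))
```

With eight cameras spaced 45 degrees apart, the opposite camera is four positions away and an orthogonal camera is two positions away.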

More preferably, the video selection method further includes a face image detection step of detecting, from the plurality of videos captured by the plurality of cameras, a video including a face image of a person, and in the video selection step, at least one video for recognizing the person's face is additionally selected from the plurality of videos based on the result of detecting videos including a face image.

With this method, it is possible to select videos in which the user can easily recognize the face image and the shape of the person's limbs at the same time.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

(Embodiment 1)
First, a video display apparatus according to Embodiment 1 of the present invention will be described. The video display apparatus according to Embodiment 1 selects, from videos captured by a plurality of cameras, a video in which the user can easily recognize the shape of a person's limbs, and displays it.

FIG. 1 is a diagram showing the configuration of the video display apparatus according to Embodiment 1 of the present invention. The video display apparatus of FIG. 1 selects and displays a video in which the user can easily recognize the shape of a person's limbs by determining the unevenness degree, which indicates how uneven the shape of the subject silhouette is in a video including the person. It includes an imaging unit 101, an image holding unit 102, a feature extraction unit 103, a shape determination unit 104, and a camera selection unit 105.

The imaging unit 101 is a processing unit that consists of a plurality of cameras 101a to 101h and captures a plurality of videos; it is an example of video acquisition means.

The image holding unit 102 is a storage device that holds the plurality of images captured by the imaging unit 101. The images held by the image holding unit 102 may be images arranged in time series.

The feature extraction unit 103 is a processing unit that extracts, from the images held in the image holding unit 102, a contour feature quantity representing the contour shape of the subject; it is an example of feature quantity extraction means.

FIG. 2 is a diagram showing the detailed configuration of the feature extraction unit 103.
As shown in FIG. 2, the feature extraction unit 103 consists of a person region extraction unit 1031, a silhouette extraction unit 1032, and a contour feature extraction unit 1033.

The person region extraction unit 1031 is a processing unit that extracts a person region from an image held in the image holding unit 102. The silhouette extraction unit 1032 is a processing unit that extracts the subject silhouette by binarization, setting the pixel values of the extracted person region to "1" and all other pixel values to "0". The contour feature extraction unit 1033 is a processing unit that extracts, from the subject silhouette, a feature quantity relating to its contour.

The shape determination unit 104 is a processing unit that determines, from the contour feature quantity extracted by the contour feature extraction unit 1033, the unevenness degree of the shape of the subject silhouette obtained by the silhouette extraction unit 1032; it is an example of unevenness degree determination means. The unevenness degree is a value that strongly reflects how the person's limbs appear.

The camera selection unit 105 is a processing unit that selects, from the videos of the plurality of cameras 101a to 101h, a camera video in which the user can easily recognize the shape of the person's limbs, based on the unevenness degree of the subject silhouette shape determined by the shape determination unit 104; it is an example of video selection means.

The display unit 106 is a processing unit that displays the video selected by the camera selection unit 105 on a monitor or the like.

This makes it possible to select and display a video in which the user can easily recognize the shape of the person's limbs.

The video display method performed by the video display apparatus according to the present embodiment is described in detail below with reference to FIGS. 3 and 4.

FIG. 3 is a flowchart of the processing executed by the video display apparatus. FIG. 4 is a diagram for explaining the method of extracting the contour feature quantity.

First, the person region extraction unit 1031 receives an input image 401, such as that shown in FIG. 4(a), from the image holding unit 102 (S301).

Next, the person region extraction unit 1031 performs background difference processing on the input image 401 using a background image 402, such as that shown in FIG. 4(b), and extracts the person region (S302). Here, inter-frame difference processing may be performed instead of background difference processing to extract the person region, or the person region may be extracted using a region segmentation technique. Furthermore, when person detection processing is used in combination, person detection may be performed using, for example, the technique disclosed in M. Oren, C. Papageorgiou, P. Sinha, E. Osuna and T. Poggio, "Pedestrian Detection using wavelet templates", Proc. of CVPR97, pp. 193-199, 1997, and the person region may be cut out using the result. Edge detection processing may also be used in combination. When background difference processing is performed, a background image containing no person must be prepared in advance and held in the image holding unit 102. Alternatively, a background sprite image can be generated from a moving image and used as the background image for background difference processing.

Next, the silhouette extraction unit 1032 binarizes the image containing the person region extracted in S302, setting the pixels of the extracted person region to on-pixels and all other pixels to off-pixels (S303). This yields a subject silhouette 403 of the person image, as shown in FIG. 4(c). In the subject silhouette 403 shown there, the hatched portion consists of on-pixels and the white portion consists of off-pixels.
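Steps S302 and S303 can be sketched together as a thresholded background difference that directly yields the binary silhouette. This is a simplified illustration under stated assumptions: images are grayscale lists of rows, and the `threshold` parameter is hypothetical (the text does not give a thresholding rule).

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def silhouette_mask(frame: Image, background: Image, threshold: int) -> Image:
    """Binary silhouette: 1 (on-pixel) where the frame differs from the background.

    A pixel is treated as part of the person region when its absolute
    difference from the background image exceeds `threshold`.
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]
```

In practice the result would typically still contain noise, which is why the smoothing mentioned in the text is useful before the contour is extracted.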

Here, it is also effective to smooth the subject silhouette in order to reduce unwanted irregularities in the contour caused by noise.

Next, the contour feature extraction unit 1033 extracts the contour of the subject silhouette (S304). The contour can be extracted by extracting the boundary points between on-pixels and off-pixels. If binarization is not performed in S303, the contour can be extracted by applying a spatial differential filter to the subject silhouette. If the extracted contour is to be thinned, Hilditch's thinning algorithm can be used. The spatial differential filter and Hilditch's thinning algorithm are described in detail in C. J. Hilditch, "Linear Skeletons From Square Cupboards", Machine Intelligence 4, Edinburgh Univ. Press, p. 403, 1969.
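The boundary-point extraction of S304 can be illustrated as follows. The sketch marks an on-pixel as a contour pixel when at least one of its four neighbours is an off-pixel or lies outside the image; the use of the 4-neighbourhood and the name `contour_pixels` are assumptions for this example.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary silhouette: 1 = on-pixel, 0 = off-pixel


def contour_pixels(mask: Mask) -> List[Tuple[int, int]]:
    """Return (x, y) positions of on-pixels that border an off-pixel or the image edge."""
    height = len(mask)
    width = len(mask[0]) if height else 0

    def is_on(y: int, x: int) -> bool:
        # Positions outside the image count as off-pixels.
        return 0 <= y < height and 0 <= x < width and mask[y][x] == 1

    contour = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] != 1:
                continue
            interior = (
                is_on(y - 1, x) and is_on(y + 1, x)
                and is_on(y, x - 1) and is_on(y, x + 1)
            )
            if not interior:
                contour.append((x, y))
    return contour
```

Note that this returns the contour pixels in raster order, not in the order obtained by tracing along the contour.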

Next, the contour feature extraction unit 1033 extracts, from the extracted contour of the subject silhouette, a contour feature quantity expressing the contour shape of the subject silhouette (S305). An example of the method of extracting the contour feature quantity is described below with reference to FIGS. 4 and 5. The contour feature quantity is not limited to the following example, as long as it is a feature quantity expressing the contour shape of the subject silhouette.

First, the contour feature extraction unit 1033 determines a reference point for the subject silhouette 403 whose contour was extracted in S304. As a desirable example, the centroid position g of the subject silhouette is obtained as the reference point, as shown in FIG. 4(d). The reference point may instead be the centroid of the contour pixel positions of the subject silhouette, so that not all pixels of the subject silhouette need to be used.
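The reference point described above, the centroid g of the silhouette, is simply the mean position of the on-pixels. A minimal sketch, with the hypothetical name `silhouette_centroid` and a binary mask as input:

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary silhouette: 1 = on-pixel, 0 = off-pixel


def silhouette_centroid(mask: Mask) -> Tuple[float, float]:
    """Centroid (g_x, g_y) of all on-pixels of a binary silhouette."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value == 1:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        raise ValueError("silhouette contains no on-pixels")
    return sum_x / count, sum_y / count
```

Passing a mask that contains only the contour pixels gives the alternative reference point mentioned in the text, the centroid of the contour pixel positions.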

Next, the contour feature extraction unit 1033 calculates the main axis 405 of the subject silhouette. The main axis 405 can be obtained as the axis that minimizes the sum of the squares of the lengths of the perpendiculars dropped onto it from the contour points of the silhouette.
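The least-squares axis described above has a well-known closed-form solution in terms of the second-order moments of the contour points about the centroid. The sketch below uses that standard result rather than any formula quoted from this document; the function name `main_axis_angle` is hypothetical, and the angle is measured in image coordinates (with y pointing down, a positive angle appears clockwise on screen).

```python
import math
from typing import Iterable, Tuple


def main_axis_angle(
    points: Iterable[Tuple[float, float]], centroid: Tuple[float, float]
) -> float:
    """Angle theta (radians) of the line through the centroid that minimizes the
    sum of squared perpendicular distances from the given points."""
    gx, gy = centroid
    sxx = syy = sxy = 0.0
    for x, y in points:
        dx, dy = x - gx, y - gy
        sxx += dx * dx
        syy += dy * dy
        sxy += dx * dy
    # atan2 selects the minimizing branch of tan(2*theta) = 2*sxy / (sxx - syy).
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)
```

For an upright person the returned angle is expected to be close to a right angle, that is, a nearly vertical axis.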

Specifically, the main axis 405 to be obtained is a straight line with slope tan θ that passes through the centroid (g_x, g_y), so it can be expressed as follows.

[Equation 1: straight line with slope tan θ passing through (g_x, g_y)]

Next, when a perpendicular is dropped from each contour point (x_i, y_i) to the straight line given by (Equation 1), the length d_i of the perpendicular is expressed as follows.

[Equation 2: length d_i of the perpendicular]

The function J to be minimized is the sum of the squares of the lengths of the perpendiculars dropped from the contour points, so using (Equation 1) and (Equation 2) it is expressed as follows.

[Equation 3: J as the sum of the squares of d_i over all contour points]

Here, in order to minimize (Equation 3), the condition

[Equation 4]

gives

[Equation 5]

From (Equation 5), the slope of the straight line can be expressed by (Equation 6). Substituting this slope into (Equation 1) yields the main axis 405.

[Equation 6: slope of the straight line]
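For reference, one derivation consistent with the definitions of Equations 1 to 6 is sketched below. The exact forms used in the published equations may differ; the expressions here are a reconstruction from the surrounding text (a line with slope tan θ through (g_x, g_y), perpendicular lengths d_i, and the sum of squares J) and should be read as an assumption.

```latex
\begin{align}
  % (1) line through the centroid with slope \tan\theta
  y - g_y &= \tan\theta \,(x - g_x) \\
  % (2) perpendicular distance from contour point (x_i, y_i)
  d_i &= \bigl| (x_i - g_x)\sin\theta - (y_i - g_y)\cos\theta \bigr| \\
  % (3) objective: sum of squared perpendicular lengths
  J(\theta) &= \sum_i d_i^{2}
            = \sum_i \bigl[ (x_i - g_x)\sin\theta - (y_i - g_y)\cos\theta \bigr]^{2} \\
  % (4) stationarity condition
  \frac{\partial J}{\partial \theta} &= 0 \\
  % (5) resulting condition on the angle
  \tan 2\theta &= \frac{2\sum_i (x_i - g_x)(y_i - g_y)}
                       {\sum_i (x_i - g_x)^{2} - \sum_i (y_i - g_y)^{2}} \\
  % (6) slope of the main axis
  \tan\theta &= \tan\!\left( \frac{1}{2}\arctan
                \frac{2\sum_i (x_i - g_x)(y_i - g_y)}
                     {\sum_i (x_i - g_x)^{2} - \sum_i (y_i - g_y)^{2}} \right)
\end{align}
```

The stationarity condition has two solutions that are perpendicular to each other; the one giving the smaller J is the main axis.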

Next, to determine the start point of the contour feature quantity, the contour feature extraction unit 1033 finds the intersections of the contour of the subject silhouette with the main axis 405. When there are a plurality of intersections 407 between the contour and the main axis, the uppermost of these contour points is taken as the start point 406 of the contour feature quantity, as shown in FIG. 4(e). The start point 406 does not necessarily have to be the uppermost contour point. However, particularly when the target is a person, it is effective to use the uppermost contour point that intersects the main axis 405 as the start point 406, because the start point 406 then tends to correspond to the head. Specifically, unlike the arm and leg regions, the head is unlikely to deform. In addition, shadows tend to appear near the ground, so the head is also less affected by them.

Next, the contour feature extraction unit 1033 traces the points on the contour of the subject silhouette, beginning at the start point 406. The contour tracing method is described in detail in Takeshi Agui and Tomoharu Nagao, "Processing and Recognition of Images" (画像の処理と認識), p. 72, Shokodo, 1992.

Next, as shown in FIG. 4(e), the contour feature extraction unit 1033 calculates the distance l between each traced contour point and the centroid g. As shown by the contour feature quantity 503 in FIG. 5, the contour feature quantity is expressed as a vector whose elements are the distances l between the points on the contour of the subject silhouette and the centroid. The symbols A to I in FIG. 5 indicate some of the representative points on the contours of the subject silhouettes shown in images 501 and 502, together with the corresponding points in the contour feature quantity.
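The distance vector described above can be computed as in the following sketch. It assumes the contour points are already given in traced order beginning at the start point; the function name `centroid_distances` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def centroid_distances(
    contour: Sequence[Tuple[float, float]], centroid: Tuple[float, float]
) -> List[float]:
    """Distance l from the centroid g to each contour point, in traced order."""
    gx, gy = centroid
    return [math.hypot(x - gx, y - gy) for x, y in contour]
```

Plotted against the position along the contour, these distances form the curve whose peaks tend to correspond to protruding parts such as the head, hands, and feet.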

ここで、図６（ａ）に示すように、被写体の姿勢や画像中での大きさによって、輪郭点の数が異なる場合について説明する。この場合、それぞれの被写体シルエットにおいて、輪郭点の数がｎの被写体シルエットと輪郭点の数がｍの被写体シルエットが得られたものとする。ここで、ｎ＜ｍである。 Here, as shown in FIG. 6A, a case where the number of contour points varies depending on the posture of the subject and the size in the image will be described. In this case, in each subject silhouette, a subject silhouette having n contour points and a subject silhouette having m contour points are obtained. Here, n <m.

この例の場合、輪郭特徴量ベクトルの要素数がｎおよびｍとなる。そこで、輪郭特徴量のベクトル要素数が異なることを防ぐために、ベクトルの要素数はあらかじめ決定しておき、固定値Ｚとすることが望ましい。具体的には、任意の入力画像に対して、輪郭点数がＺとなるように、全輪郭点の中から輪郭特徴量として利用する輪郭点上の代表点を図６（ｂ）に示すように（ｎ／Ｚ）点ごとまたは（ｍ／Ｚ）点ごとに選択する。本実施の形態では、このように選択した輪郭点を代表点とした特徴量を輪郭特徴量とする。 In this example, the number of elements of the contour feature quantity vector is n and m. Therefore, in order to prevent the number of vector elements of the contour feature quantity from differing, it is desirable that the number of vector elements is determined in advance and set to a fixed value Z. Specifically, as shown in FIG. 6B, representative points on the contour points to be used as the contour feature amount from all the contour points so that the number of contour points is Z for an arbitrary input image. Select every (n / Z) points or every (m / Z) points. In the present embodiment, a feature amount having the contour point selected in this way as a representative point is set as a contour feature amount.
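The fixed-length sampling described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `contour_signature`, the representation of the contour as a list of (x, y) tuples in trace order, and the choice to round the sampling index down are all assumptions.

```python
import math


def contour_signature(contour, centroid, z):
    """Distances from the centroid to z representative contour points.

    contour  : list of (x, y) points in trace order, beginning at the start point
    centroid : (x, y) center of gravity g of the silhouette
    z        : fixed vector length Z (assumed here to satisfy 0 < z <= len(contour))
    """
    n = len(contour)
    if not 0 < z <= n:
        raise ValueError("z must satisfy 0 < z <= len(contour)")
    gx, gy = centroid
    signature = []
    for k in range(z):
        # Take one point every n/z contour points (index rounded down).
        x, y = contour[(k * n) // z]
        signature.append(math.hypot(x - gx, y - gy))
    return signature
```

With this sampling, silhouettes with n and m contour points both yield vectors of length Z, so their contour feature quantities can be compared element by element.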

ここで、輪郭特徴量の例について説明する。図５に示すように、複数台のカメラ１０１ａ〜１０１ｈで構成される撮像部１０１で得られた映像に対して、それぞれ被写体シルエットを抽出した後、輪郭特徴量５０３を抽出する。輪郭特徴量ｌ_cは、被写体シルエットの重心ｇと輪郭特徴量として利用する輪郭上の代表点との距離ｌを要素として持つベクトルで構成される。さらに、被写体の大きさの違いの影響を排除する場合は、被写体シルエットの面積（画素数）Ｓを計算し、次式のように、面積Ｓで正規化しても良い。

Here, an example of the contour feature quantity will be described. As shown in FIG. 5, a subject silhouette is extracted from each video obtained by the imaging unit 101, which consists of the plurality of cameras 101a to 101h, and the contour feature quantity 503 is then extracted. The contour feature quantity l_c is a vector whose elements are the distances l between the center of gravity g of the subject silhouette and the representative points on the contour used for the contour feature quantity. Furthermore, to remove the effect of differences in subject size, the area (number of pixels) S of the subject silhouette may be calculated and the vector normalized by the area S, as in the following equation.

ここで、ｌ_c__zは、カメラｃで撮像された画像から得た被写体シルエットの重心ｇと輪郭上の代表点ｚとの距離を示す。また、Ｓ_cは、カメラｃで撮像された画像から得た被写体シルエットの面積である。なお、このようにして得られた輪郭特徴量に対して、ローパスフィルタをかけることも有効である。これは、前述したように被写体シルエットに対して平滑化を行うことと同様の効果がある。 Here, l_{c_z} denotes the distance between the center of gravity g of the subject silhouette obtained from the image captured by the camera c and the representative point z on the contour. S_c is the area of the subject silhouette obtained from the image captured by the camera c. It is also effective to apply a low-pass filter to the contour feature quantity obtained in this way. This has the same effect as smoothing the subject silhouette, as described above.
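The normalization and low-pass filtering mentioned here might look like the sketch below. The equation itself is not reproduced in this text, so dividing each distance directly by the area S_c is only one reading of "normalized by the area"; dividing by the square root of S_c would be another. The moving-average filter and its wrap-around at the ends are also assumptions, chosen because the contour is closed. Both function names are hypothetical.

```python
def normalize_by_area(signature, area):
    """Divide each centroid distance by the silhouette area S_c (assumed form)."""
    if area <= 0:
        raise ValueError("area must be positive")
    return [l / area for l in signature]


def smooth_circular(signature, radius=1):
    """Moving-average low-pass filter that wraps around the closed contour."""
    n = len(signature)
    width = 2 * radius + 1
    return [
        sum(signature[(i + k) % n] for k in range(-radius, radius + 1)) / width
        for i in range(n)
    ]
```

A larger `radius` removes more of the small bumps caused by silhouette noise, at the cost of flattening narrow limbs.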

次に、形状判定部１０４は、カメラｃで得られた画像ごとに輪郭特徴抽出部１０３３で抽出した輪郭特徴量ｌ_cからシルエット抽出部１０３２で得た被写体シルエット形状の凹凸度ｄ_cを計算する（Ｓ３０６）。具体的には、（数８）を用いて、図５に示すような輪郭特徴量ｌ_c（輪郭特徴量５０３）を輪郭点に沿って差分をとることで凹凸度を計算することができる。

Next, for each image obtained by the camera c, the shape determination unit 104 calculates the degree of unevenness d_c of the subject silhouette shape obtained by the silhouette extraction unit 1032, from the contour feature quantity l_c extracted by the contour feature extraction unit 1033 (S306). Specifically, the degree of unevenness can be calculated with (Equation 8), by taking differences of the contour feature quantity l_c (contour feature quantity 503) shown in FIG. 5 along the contour points.

なお、絶対値の代わりに２乗値を用いても構わないし、１階差分の代わりに２階差分を行っても構わない。これにより、例えば、図５（ａ）の場合は、ｄ＝４．１０となり、図５（ｂ）の場合は、ｄ＝５．８１となる。このように、シルエット中に人物の四肢が表現される場合は、そうでない場合と比較して、シルエット形状の凹凸度が大きくなる。 Squared values may be used instead of absolute values, and second-order differences may be used instead of first-order differences. With this calculation, for example, d = 4.10 in the case of FIG. 5(a) and d = 5.81 in the case of FIG. 5(b). Thus, when a person's limbs appear in the silhouette, the degree of unevenness of the silhouette shape is larger than when they do not.
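As a rough sketch of the difference-based measure described above: Equation 8 is not reproduced in this text, so the form below (a sum of absolute differences between neighbouring elements of the contour feature quantity) is an assumption, as is treating the contour as circular so that the last element is differenced against the first. The function name `unevenness` is hypothetical, and the sketch is not expected to reproduce the values 4.10 and 5.81 quoted for FIG. 5.

```python
def unevenness(signature, order=1, squared=False):
    """Sum of absolute (or squared) differences along the contour.

    order   : 1 for first-order differences, 2 for second-order differences
    squared : use squared values instead of absolute values
    """
    values = list(signature)
    for _ in range(order):
        n = len(values)
        # Circular difference between neighbouring contour samples.
        values = [values[(i + 1) % n] - values[i] for i in range(n)]
    return sum(v * v if squared else abs(v) for v in values)
```

A nearly circular silhouette gives almost constant distances and a value near zero, while protruding arms and legs make the distances rise and fall and increase the sum.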

さらに、時系列画像から被写体シルエット形状の凹凸度を判定する場合は、カメラｃで得られたＴ枚の画像列に対して、凹凸度ｄ_cの時間平均Ｄ_cを計算することによって、時間平均Ｄ_cを新たな凹凸度としてもよい。

Further, when the degree of unevenness of the subject silhouette shape is determined from time-series images, the time average D_c of the degree of unevenness d_c may be calculated over the sequence of T images obtained by the camera c, and this time average D_c may be used as the new degree of unevenness.

なお、ここで、ｄ_c（ｔ）は、カメラｃで撮像した時刻ｔの画像から得られた凹凸度である。 Here, d _c (t) is the degree of unevenness obtained from the image at time t captured by the camera c.
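Equation 9 is not reproduced in this text. From the description (the time average of d_c over the T images obtained by the camera c), a form consistent with the surrounding definitions would be the following; numbering the frames from t = 1 to T is an assumption.

```latex
D_c = \frac{1}{T} \sum_{t=1}^{T} d_c(t)
```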

また、１つのカメラに複数の人物が撮像されている場合は、被写体シルエットごとに（数８）または（数９）に基づいて凹凸度を算出し、算出した値を被写体数で割ることにより、被写体シルエット１つ当たりの凹凸度の平均値を求め、その平均値を凹凸度としてもよい。また、複数の被写体シルエットから得られる複数の凹凸度の合計値を凹凸度としても良い。 Further, when a plurality of persons are captured by one camera, the degree of unevenness is calculated based on (Equation 8) or (Equation 9) for each subject silhouette, and the calculated value is divided by the number of subjects. An average value of the unevenness per subject silhouette may be obtained, and the average value may be used as the unevenness. Further, the sum of a plurality of unevenness levels obtained from a plurality of subject silhouettes may be used as the unevenness level.

次に、カメラ選択部１０５は、Ｓ３０６で算出した被写体シルエット形状の凹凸度に基づいて、カメラ映像を選択する（Ｓ３０７）。 Next, the camera selection unit 105 selects a camera image based on the unevenness degree of the subject silhouette shape calculated in S306 (S307).

ここでは、（数９）のように、被写体シルエット形状の凹凸度の時間平均を用いた場合について説明するが、（数８）の場合も同様である。 Here, the case where the time average of the unevenness degree of the subject silhouette shape is used as in (Equation 9) will be described, but the same applies to the case of (Equation 8).

図７のように、複数台のカメラ１０１ａ〜１０１ｈによって一定の領域を撮像している例で説明する。もちろん、スポーツ中継のように、各カメラはカメラマンによって撮像されていても構わない。 The description uses an example in which a fixed area is imaged by the plurality of cameras 101a to 101h, as shown in FIG. 7. Of course, each camera may instead be operated by a camera operator, as in sports broadcasting.

人物の四肢の形状をユーザが認識しやすいカメラ映像を得るためには、（数９）におけるＤ_cが最大となるカメラ映像を選択することが有効である。また、Ｄ_cの値に基づいて、上位Ｎ個のカメラ映像を選択することも可能である。なお、ここでＮは、自然数でありカメラの台数以下とする。 In order to obtain a camera image in which the user can easily recognize the shape of a person's limb, it is effective to select a camera image that maximizes D _c in (Equation 9). It is also possible to select the top N camera videos based on the value of D _c . Here, N is a natural number that is equal to or less than the number of cameras.
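The selection rule above (the camera with the largest D_c, or the top N cameras) can be sketched as below. The dictionary keyed by camera identifier and the function name `select_cameras` are illustrative assumptions; the scores in the usage line are made-up values.

```python
def select_cameras(scores, n=1):
    """Return the n camera ids with the largest unevenness score D_c.

    scores : dict mapping a camera id to its D_c value
    n      : natural number no larger than the number of cameras
    """
    if not 1 <= n <= len(scores):
        raise ValueError("n must be between 1 and the number of cameras")
    ranked = sorted(scores, key=lambda cam: scores[cam], reverse=True)
    return ranked[:n]


# Hypothetical scores for three of the cameras.
best_two = select_cameras({"101a": 4.10, "101b": 5.81, "101c": 3.20}, n=2)
```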

また、本実施の形態では、被写体シルエットから輪郭特徴量を推定しているため、例えば被写体の胸側と背中側のように、被写体の前後の区別がつき難い場合がある。このような場合を回避するために、Ｄ_cの値に基づいて選択したカメラ映像を基準として、選択したカメラの撮像方向と対面する位置に設置され、撮像されたカメラ映像を選択することも有効である。さらに、選択したカメラの撮像方向と直交する方向のカメラ映像を選択することもできる。 Further, in the present embodiment, the contour feature quantity is estimated from the subject silhouette, so it can be difficult to distinguish the front of the subject from the back, for example the chest side from the back side. To avoid this, it is also effective to take the camera image selected on the basis of the value of D_c as a reference and additionally select the image from the camera installed at the position facing the imaging direction of the selected camera. It is also possible to select a camera image taken in a direction orthogonal to the imaging direction of the selected camera.

また、全設置カメラの中から図８に示すように、複数のカメラの組をＡ〜Ｄのように事前に決定しておいて、一方のカメラ映像が選択された場合は、組となるもう一方のカメラ映像も選択されるようにしておいても良い。 Alternatively, as shown in FIG. 8, sets of cameras such as A to D may be determined in advance from among all the installed cameras, so that when the image from one camera of a set is selected, the image from the other camera of that set is selected as well.
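The predetermined camera sets could be handled as in the sketch below. Which cameras belong to which set in FIG. 8 is not stated in this text, so the pairs in the usage line are hypothetical, as is the function name `with_paired_cameras`.

```python
def with_paired_cameras(selected, pairs):
    """Add the partner of each selected camera, keeping the selection order.

    selected : list of camera ids chosen by the unevenness score
    pairs    : list of (camera_a, camera_b) sets determined in advance
    """
    partner = {}
    for a, b in pairs:
        partner[a] = b
        partner[b] = a
    result = []
    for cam in selected:
        for candidate in (cam, partner.get(cam)):
            if candidate is not None and candidate not in result:
                result.append(candidate)
    return result


# Hypothetical sets: each camera is paired with the one facing it.
chosen = with_paired_cameras(["101b"], [("101a", "101e"), ("101b", "101f")])
```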

次に、表示部１０６は、図９に示すようにＳ３０７で選択された映像をモニタ９０１に表示する（Ｓ３０８）。具体的には、モニタ９０１に、Ｓ３０７で被写体シルエット形状の凹凸度に基づいて選択された映像９０２を表示する。ここでは、凹凸度の高い順にＮ個の映像を表示しても構わない。 Next, the display unit 106 displays the video selected in S307 on the monitor 901 as shown in FIG. 9 (S308). Specifically, the image 902 selected based on the unevenness degree of the subject silhouette shape in S307 is displayed on the monitor 901. Here, N images may be displayed in descending order of unevenness.

また、選択された映像９０２に加えて、そのカメラの撮像方向と対面する位置に設置、撮像されたカメラ映像９０３を表示する。この時さらに、カメラ配置９０６と共に、現在表示されている映像９０２を撮像しているカメラ９０４および当該カメラと対面する位置に設置されているカメラ９０５の設置位置を表示することも可能である。 In addition to the selected image 902, a camera image 903 that is installed and imaged at a position facing the imaging direction of the camera is displayed. At this time, it is also possible to display the camera arrangement 906 and the installation position of the camera 904 capturing the currently displayed video 902 and the camera 905 installed at the position facing the camera.

なお、映像の表示方法は、Ｓ３０７で被写体シルエット形状の凹凸度に基づいて選択された映像が表示されていれば良く、上記の例に限らない。 Note that the video display method is not limited to the above example as long as the video selected based on the unevenness degree of the subject silhouette shape in S307 is displayed.

以上により、人物を含むシーンを複数のカメラで同時に撮像した映像から、人物の四肢の形状をユーザが認識しやすい映像を自動的に選択することができる。そのため、監視員やディレクターは、複数のカメラから得られた映像すべてに注視する必要がなく、注視すべきカメラ映像の数を減らすことが可能となり、負担を軽減することができる。 As described above, an image in which the user can easily recognize the shape of a person's limb can be automatically selected from images obtained by simultaneously capturing a scene including a person with a plurality of cameras. Therefore, it is not necessary for the supervisor or director to watch all the videos obtained from the plurality of cameras, and the number of camera videos to be watched can be reduced, thereby reducing the burden.

（実施の形態１の変形例）
次に、被写体シルエットの輪郭点上の代表点を実施の形態１とは異なる方法で選択して輪郭特徴量を抽出する方法について説明する。 (Modification of Embodiment 1)
Next, a description will be given of a method of extracting a contour feature amount by selecting a representative point on the contour point of the subject silhouette by a method different from that of the first embodiment.

映像表示装置の構成は、図１および図２に示したものと同様である。また、映像表示装置の実行する処理は、図３に示したフローチャートと基本的には同一ではあるが、輪郭特徴量抽出処理（Ｓ３０５）における輪郭特徴量抽出方法が異なる。それに伴い、Ｓ３０６〜Ｓ３０８で使用される輪郭特徴量も実施の形態１とは異なる。それ以外の処理については、実施の形態１で説明したものと同様である。 The configuration of the video display device is the same as that shown in FIGS. 1 and 2. The processing executed by the video display device is basically the same as the flowchart shown in FIG. 3, except that the contour feature quantity extraction method used in the contour feature quantity extraction process (S305) is different. Accordingly, the contour feature quantity used in S306 to S308 also differs from that of the first embodiment. The other processes are the same as those described in the first embodiment.

以下に、輪郭特徴量抽出処理（Ｓ３０５）について、詳細に説明する。
図１０は、輪郭特徴量を抽出する際の代表点の決定方法を説明するための図である。 Hereinafter, the outline feature amount extraction process (S305) will be described in detail.
FIG. 10 is a diagram for explaining a method for determining a representative point when extracting a contour feature amount.

被写体シルエットの重心ｇの決定方法および被写体シルエットの主軸１００２の求め方は、実施の形態１で説明したものと同様である。また、輪郭特徴量の始点１００３の求め方も実施の形態１で説明したものと同様である。 The method for determining the center of gravity g of the subject silhouette and the method of obtaining the subject silhouette principal axis 1002 are the same as those described in the first embodiment. Further, the method for obtaining the start point 1003 of the contour feature quantity is the same as that described in the first embodiment.

次に、輪郭特徴抽出部１０３３は、図１０の矢印のように、重心ｇを中心として一定角度ごとに時計方向または反時計方向に直線を引き、被写体シルエットの輪郭との交点を代表点１００４として求める。そして、重心ｇを基準点として、基準点と代表点との距離ｌを算出する。 Next, as indicated by the arrows in FIG. 10, the contour feature extraction unit 1033 draws straight lines from the center of gravity g at fixed angular intervals, clockwise or counterclockwise, and obtains their intersections with the contour of the subject silhouette as representative points 1004. Then, taking the center of gravity g as the reference point, it calculates the distance l between the reference point and each representative point.

この時、図１０の点Ａおよび点Ｂに示すように、重心ｇから一定角度ごとに引いた直線と被写体シルエットとの交点が２点以上存在する場合には、重心ｇからの距離が長い方の交点Ａを代表点として用いる。 At this time, as shown by points A and B in FIG. 10, when a straight line drawn from the center of gravity g at one of the fixed angles intersects the subject silhouette at two or more points, the intersection A that is farther from the center of gravity g is used as the representative point.
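The angular sampling can be sketched on a binary silhouette mask as follows. This is an illustrative approximation under stated assumptions: the mask is a list of rows of 0/1 values, each ray is walked in steps of half a pixel, and the farthest silhouette pixel found along the ray stands in for the farthest intersection with the contour. The function name `radial_signature` and the `step` parameter are hypothetical.

```python
import math


def radial_signature(mask, centroid, num_angles, step=0.5):
    """Distance from the centroid to the farthest silhouette pixel per angle.

    mask       : list of rows, each a list of 0/1 values (1 = silhouette)
    centroid   : (x, y) center of gravity g
    num_angles : number of rays, spaced evenly over 360 degrees
    """
    height, width = len(mask), len(mask[0])
    gx, gy = centroid
    max_r = math.hypot(width, height)  # no ray needs to go farther than this
    signature = []
    for k in range(num_angles):
        theta = 2.0 * math.pi * k / num_angles
        dx, dy = math.cos(theta), math.sin(theta)
        farthest = 0.0
        r = 0.0
        while r <= max_r:
            x = int(round(gx + r * dx))
            y = int(round(gy + r * dy))
            # Keep overwriting, so the farthest crossing wins over nearer ones.
            if 0 <= x < width and 0 <= y < height and mask[y][x]:
                farthest = r
            r += step
        signature.append(farthest)
    return signature
```

Because the number of rays is fixed, the resulting vector has the same length for every silhouette regardless of how many contour points it has.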

輪郭特徴量は、上記の手順で求めた基準点と代表点との距離を用いて、（数７）の代わりに次式のように定義することができる。

The contour feature amount can be defined as the following equation instead of (Equation 7) using the distance between the reference point and the representative point obtained in the above procedure.

ここで、ｌ_c__yは、カメラｃで得られた映像から抽出した被写体シルエットの主軸１００２上に設定された基準点（重心ｇ）と代表点１００４との距離を示す。なお、Ｓ_cは、カメラｃで得られた映像から抽出した被写体シルエットの面積である。 Here, l_{c_y} denotes the distance between the reference point (center of gravity g) set on the main axis 1002 of the subject silhouette extracted from the video obtained by the camera c and the representative point 1004. S_c is the area of the subject silhouette extracted from the video obtained by the camera c.

なお、このようにして得られた輪郭特徴量に対して、ローパスフィルタをかけることも有効である。これは、前述したように被写体シルエットに対して平滑化を行うことと同様の効果がある。 It is also effective to apply a low-pass filter to the contour feature value obtained in this way. This has the same effect as smoothing the subject silhouette as described above.

次に、Ｓ３０６からＳ３０８の処理は、（数７）の代わりに（数１０）を用いることで実現できるため、その詳細な説明は繰り返さない。 Next, since the processing from S306 to S308 can be realized by using (Equation 10) instead of (Equation 7), detailed description thereof will not be repeated.

本変形例で用いられる輪郭特徴量は、重心から引く直線の角度によって代表点を決定するため、輪郭点数が図６に示すように被写体シルエットの形状や大きさに依存しない。そのため、図１１のように、足部に生じた影の影響を受けて、輪郭抽出した画像の足部が地面と繋がった場合とそうでない場合においても、輪郭点の数に差異が生じないため、比較的、影の影響を受けにくい。 In the contour feature quantity used in this modification, the representative points are determined by the angles of the straight lines drawn from the center of gravity, so the number of contour points does not depend on the shape or size of the subject silhouette, unlike the case shown in FIG. 6. As a result, even when a shadow at the feet causes the foot region of the extracted contour to merge with the ground, as in FIG. 11, the number of contour points is the same as when it does not, which makes this feature relatively insensitive to shadows.

（実施の形態２）
次に、本発明の実施の形態２に係る映像表示装置について説明する。 (Embodiment 2)
Next, a video display apparatus according to Embodiment 2 of the present invention will be described.

実施の形態２に係る映像表示装置は、形状判定部１０４における被写体シルエットの凹凸度を実施の形態１に示した方法とは異なる方法で算出する。 The video display apparatus according to the second embodiment calculates the unevenness degree of the subject silhouette in the shape determination unit 104 by a method different from the method described in the first embodiment.

実施の形態２に係る映像表示装置の構成は、図１に示したものと同様である。ただし、形状判定部１０４の内部構成が実施の形態１とは異なる。それ以外の、撮像部１０１、画像保持部１０２、特徴抽出部１０３、カメラ選択部１０５および表示部１０６は、実施の形態１と同様であるので、それらの説明はここでは繰り返さない。 The configuration of the video display apparatus according to the second embodiment is the same as that shown in FIG. However, the internal configuration of the shape determination unit 104 is different from that of the first embodiment. Other than that, imaging unit 101, image holding unit 102, feature extraction unit 103, camera selection unit 105, and display unit 106 are the same as those in the first embodiment, and therefore, description thereof will not be repeated here.

図１２は、形状判定部１０４の内部構成を示すブロック図である。
形状判定部１０４は、輪郭特徴抽出部１０３３で抽出された輪郭特徴量からシルエット抽出部１０３２で得られた被写体シルエット形状の凹凸度を判定する処理部であり、ピーク検出部１０４１と、四肢判定部１０４２と、凹凸度判定部１０４３とを備えている。この凹凸度は、人物の四肢の見えを大きく反映する。 FIG. 12 is a block diagram illustrating an internal configuration of the shape determination unit 104.
The shape determination unit 104 is a processing unit that determines, from the contour feature quantity extracted by the contour feature extraction unit 1033, the degree of unevenness of the subject silhouette shape obtained by the silhouette extraction unit 1032. It includes a peak detection unit 1041, a limb determination unit 1042, and an unevenness degree determination unit 1043. This degree of unevenness strongly reflects how visible the person's limbs are.

ピーク検出部１０４１は、輪郭特徴抽出部１０３３で抽出した輪郭特徴量のピーク位置とピーク数を検出する。 The peak detection unit 1041 detects the peak position and the number of peaks of the contour feature amount extracted by the contour feature extraction unit 1033.

四肢判定部１０４２は、ピーク検出部１０４１で検出した輪郭特徴量のピーク位置に対して手または足の対応付けを行う。 The limb determination unit 1042 associates a hand or a foot with the peak position of the contour feature value detected by the peak detection unit 1041.

凹凸度判定部１０４３は、輪郭特徴量のピーク位置に対応付けられた手および足の情報を用いて、被写体シルエット形状の凹凸度を判定する。 The unevenness degree determination unit 1043 determines the unevenness degree of the subject silhouette shape by using the hand and foot information associated with the peak position of the contour feature value.

なお、本実施の形態では、四肢判定部１０４２において、手または足の情報を付加することによって、凹凸度判定部１０４３において凹凸度を判定する例について説明するが、四肢判定部１０４２を用いずに、凹凸度判定部１０４３において輪郭特徴量のピーク数を被写体シルエットの凹凸度としても良い。 Note that in this embodiment, an example in which the unevenness determination unit 1043 determines the unevenness by adding hand or foot information in the limb determination unit 1042 will be described, but the limb determination unit 1042 is not used. In the unevenness degree determination unit 1043, the peak number of the contour feature amount may be used as the unevenness degree of the subject silhouette.

以下に、本実施の形態に係る映像表示装置による映像表示方法について説明する。図１３は、本実施の形態に係る映像表示装置が実行する処理のフローチャートである。 Hereinafter, a video display method by the video display apparatus according to the present embodiment will be described. FIG. 13 is a flowchart of processing executed by the video display apparatus according to this embodiment.

Ｓ３０１からＳ３０５までの処理は、実施の形態１と同様であるため、説明を省略する。 Since the processing from S301 to S305 is the same as that in the first embodiment, description thereof is omitted.

ピーク検出部１０４１は、図１４（ａ）に示すように、特徴抽出部１０３で抽出した輪郭特徴量のピーク位置を検出する（Ｓ１３０１）。具体的には、輪郭特徴量を輪郭点に沿って差分をとることで、極大値を持つ点を選択する。すなわち、差分値が正から負に変化した点をピーク位置とする。図１４の黒丸で示した部分は、輪郭特徴量におけるピーク検出結果１４０２とそれに対応する被写体シルエット１４０１の輪郭点上の代表点である。 As shown in FIG. 14A, the peak detection unit 1041 detects the peak position of the contour feature amount extracted by the feature extraction unit 103 (S1301). Specifically, a point having a maximum value is selected by taking a difference in the contour feature amount along the contour point. That is, the point where the difference value changes from positive to negative is set as the peak position. A portion indicated by a black circle in FIG. 14 is a representative point on the contour point of the peak detection result 1402 and the corresponding object silhouette 1401 in the contour feature value.

なお、Ｓ１３０１では、実施の形態１と同様、輪郭特徴量に対してローパスフィルタをかけた後にピーク位置を検出することで、不要な凹凸を減らす事ができる。 In S1301, as in the first embodiment, unnecessary irregularities can be reduced by detecting the peak position after applying a low-pass filter to the contour feature quantity.
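The peak detection in S1301 (points where the difference changes from positive to negative) can be sketched as below. The function name `find_peaks` is hypothetical. The sketch does not wrap around the ends of the vector, on the assumption that the start and end both lie at the head, and it does not handle flat-topped peaks.

```python
def find_peaks(signature):
    """Indices where the first difference changes from positive to negative."""
    peaks = []
    for i in range(1, len(signature) - 1):
        rising = signature[i] - signature[i - 1] > 0
        falling = signature[i + 1] - signature[i] < 0
        if rising and falling:
            peaks.append(i)
    return peaks
```

Applying a low-pass filter before this step, as noted above, reduces the number of spurious peaks produced by small bumps in the silhouette.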

次に、四肢判定部１０４２は、Ｓ１３０１で検出したそれぞれのピーク位置に手または足のラベル付けを行う（Ｓ１３０２）。ここで、図１４（ｂ）に示すようにそれぞれのピーク位置に対応する輪郭特徴量１４０３をｌ_p1，ｌ_p2，…，ｌ_pMとする。 Next, the limb determination unit 1042 labels each peak position detected in S1301 as a hand or a foot (S1302). Here, as shown in FIG. 14(b), the contour feature quantities 1403 corresponding to the respective peak positions are denoted l_p1, l_p2, ..., l_pM.

具体的には、図１５に示すように、Ｓ１３０１で検出したピーク数に応じて場合分けを行い、手と足のラベル付け方法を決定することができる。 Specifically, as shown in FIG. 15, cases can be classified according to the number of peaks detected in S1301, and a labeling method for hands and feet can be determined.

ピーク数が１個の場合は、ピーク位置に足を対応付ける。
ピーク数が２個の場合は、ピーク位置のすべてに足を対応付ける場合と、ピーク位置にそれぞれ手と足を対応付ける場合とに分けることができる。 When the number of peaks is one, a foot is associated with the peak position.
When the number of peaks is two, it can be divided into a case where feet are associated with all peak positions and a case where hands and feet are associated with peak positions.

ピーク数が３個の場合は、ピーク位置にそれぞれ、足を２個と手を１個対応付ける場合と、ピーク位置にそれぞれ足を１個と手を２個対応付ける場合とに分けることができる。 When the number of peaks is three, it can be divided into a case where two feet and one hand are associated with each peak position, and a case where one foot and two hands are associated with each peak position.

ピーク数が４個の場合は、ピーク位置にそれぞれ、四肢のいずれかを対応付ける。
以上のように、ピーク数に応じて四肢のいずれを対応させるかを決定する。 When the number of peaks is 4, any one of the limbs is associated with each peak position.
As described above, it is determined which of the four limbs corresponds to the number of peaks.

次に、各ピーク位置への四肢のラベル付け方法を説明する。
ピーク数が１個の場合は、図１５に示すように、頭を始点とすると終点も頭となる。そのため、ピークの存在する位置は足として一意に決定できる。なお、手に相当する位置にピークが存在する場合は、２個以上のピークが存在することになる。 Next, a method for labeling limbs to each peak position will be described.
When the number of peaks is 1, as shown in FIG. 15, if the head is the start point, the end point is also the head. Therefore, the position where the peak exists can be uniquely determined as a foot. When a peak is present at a position corresponding to the hand, two or more peaks are present.

ピーク数が２個以上の場合は、ピーク位置の輪郭特徴量ｌ_p1，ｌ_p2，…，ｌ_pMを得る。ここでＭはピーク数である。以下、説明のため、ピーク位置の輪郭特徴量をｌ_p1＞ｌ_p2＞…＞ｌ_pMとする。 When the number of peaks is two or more, the contour feature quantities l_p1, l_p2, ..., l_pM at the peak positions are obtained, where M is the number of peaks. In the following, for the sake of explanation, the contour feature quantities at the peak positions are assumed to satisfy l_p1 > l_p2 > ... > l_pM.

ピーク数が２個の場合、ピーク位置の輪郭特徴量の比を用いて、

のように、決定する。すなわち、ｌ_p1／ｌ_p2の値が所定の閾値αよりも大きい場合には、ｌ_p1を足の輪郭特徴量と判断し、ｌ_p2を手の輪郭特徴量と判断する。それ以外の場合には、ｌ_p1およびｌ_p2がともに足の輪郭特徴量であると判断する。 When the number of peaks is 2, using the ratio of the contour feature quantity at the peak position,

the labels are determined as in (Equation 11). That is, when the value of l_p1 / l_p2 is larger than a predetermined threshold α, l_p1 is judged to be the contour feature quantity of a foot and l_p2 that of a hand. Otherwise, l_p1 and l_p2 are both judged to be contour feature quantities of feet.

ピーク数が３個の場合、ピーク位置における最大の輪郭特徴量ｌ_p1は、足の輪郭特徴量であると判断し、最小の輪郭特徴量ｌ_p3は手の輪郭特徴量であると判断する。また、輪郭特徴量の中間値ｌ_p2が、以下のいずれに対応するかは（数１２）に従い決定される。すなわち、最大輪郭特徴量ｌ_p1と輪郭特徴量の中間値ｌ_p2との比（ｌ_p1／ｌ_p2）が所定の閾値αよりも大きければ、ｌ_p2を足の輪郭特徴量と判断し、それ以外の場合には、ｌ_p2を手の輪郭特徴量と判断する。

When the number of peaks is three, the largest contour feature quantity l_p1 at the peak positions is judged to be that of a foot, and the smallest contour feature quantity l_p3 is judged to be that of a hand. Whether the intermediate contour feature quantity l_p2 corresponds to a foot or a hand is determined according to (Equation 12). That is, if the ratio (l_p1 / l_p2) between the largest contour feature quantity l_p1 and the intermediate contour feature quantity l_p2 is larger than the predetermined threshold α, l_p2 is judged to be the contour feature quantity of a foot; otherwise, l_p2 is judged to be that of a hand.

ピーク数が４個の場合は、ｌ_p1およびｌ_p2を足の輪郭特徴量であると判断し、ｌ_p3およびｌ_p4を手の輪郭特徴量であると判断する。 When the number of peaks is four, it is determined that l _p1 and l _p2 are the contour feature amounts of the feet, and l _p3 and l _p4 are determined as the contour feature amounts of the hand.

ここで、（数１１）、（数１２）において、実験的にα＝１．５とした。ただし、閾値αの値はこれに限定されるものではなく、カメラの設置位置や対象とするシーン等によっても異なる。 Here, in (Equation 11) and (Equation 12), α = 1.5 was experimentally set. However, the value of the threshold value α is not limited to this, and varies depending on the installation position of the camera, the target scene, and the like.
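The case analysis for one to four peaks can be summarized in the sketch below, with α = 1.5 as the default. Equations 11 and 12 are not reproduced in this text, so each branch follows the prose description only. In particular, the three-peak branch is transcribed as worded in the text, where a ratio above α labels the middle peak as a foot; this runs in the opposite direction to the two-peak rule and should be checked against Equation 12. The function name `label_peaks` and the string labels are hypothetical.

```python
def label_peaks(peak_values, alpha=1.5):
    """Label peak values as "foot" or "hand"; returns (value, label) pairs
    sorted from the largest value to the smallest."""
    ordered = sorted(peak_values, reverse=True)
    m = len(ordered)
    if m == 1:
        labels = ["foot"]
    elif m == 2:
        # Large ratio: the smaller peak is a hand; otherwise both are feet.
        big_ratio = ordered[0] / ordered[1] > alpha
        labels = ["foot", "hand"] if big_ratio else ["foot", "foot"]
    elif m == 3:
        # Transcribed as worded in the text (see the note above).
        middle = "foot" if ordered[0] / ordered[1] > alpha else "hand"
        labels = ["foot", middle, "hand"]
    elif m == 4:
        labels = ["foot", "foot", "hand", "hand"]
    else:
        raise ValueError("expected between 1 and 4 peaks")
    return list(zip(ordered, labels))
```

As the text notes, a suitable value of α depends on the camera placement and the scene, so it is left as a parameter.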

また、輪郭特徴量のピーク位置への手足の対応付け方法は、上記の例に限られない。
例えば、ピーク位置の輪郭特徴量の大小に基づいて、手および足を振り分けても良い。また、被写体が地面に立っている状況を仮定できる場合であれば、図５の例のように、輪郭特徴量の各ピーク位置について、被写体シルエットの輪郭点上の対応点を求め、その対応点が地面に近い場合は足、画像上の縦軸方向において、重心位置に近い場合は手としてラベル付けを行うことも可能である。 Also, the method of associating the limb with the peak position of the contour feature quantity is not limited to the above example.
For example, hands and feet may be distributed based on the size of the contour feature amount at the peak position. If it can be assumed that the subject is standing on the ground, a corresponding point on the contour point of the subject silhouette is obtained for each peak position of the contour feature amount as in the example of FIG. It is also possible to label as a foot when the hand is close to the ground and as a hand when it is close to the center of gravity in the vertical axis direction on the image.

次に、凹凸度判定部１０４３は、Ｓ１３０２で対応付けられたラベル情報を用いて被写体シルエット形状の凹凸度を算出する（Ｓ１３０３）。 Next, the unevenness degree determination unit 1043 calculates the unevenness degree of the subject silhouette shape using the label information associated in S1302 (S1303).

すなわち、凹凸度判定部１０４３は、カメラｃで得られた画像ごとにラベル付けされたピーク位置の輪郭特徴量を用いて、数１３に従い被写体シルエット形状の凹凸度ｄ_cを計算する。

That is, for each image obtained by the camera c, the unevenness degree determination unit 1043 calculates the degree of unevenness d_c of the subject silhouette shape according to (Equation 13), using the contour feature quantities at the labeled peak positions.

ここで、Ｍは輪郭特徴量のピーク数、すなわちラベル付けされた輪郭特徴量のピークの数に相当する。 Here, M corresponds to the number of peaks of the outline feature quantity, that is, the number of peaks of the labeled outline feature quantity.

また、監視システムにおける万引き動作など、手の動作に着目する必要がある場合や、スポーツ中継等でも手の動きを重視する場合は、手としてラベル付けされた輪郭特徴量に重みをかけることによって、被写体シルエット形状のうち手を重要視した凹凸度ｄを定義することができる。

In addition, when it is necessary to pay attention to hand movements such as shoplifting movements in a surveillance system, or when importance is attached to movements of hands even in sports relaying etc., by weighting the contour feature amount labeled as a hand, It is possible to define the degree of unevenness d that places importance on the hand in the subject silhouette shape.

ここで、Ｋは手としてラベル付けされた輪郭特徴量のピーク数、Ｌは足としてラベル付けされた輪郭特徴量のピーク数を示す。また、

は、手としてラベル付けされた輪郭特徴量であり、

は足としてラベル付けされた輪郭特徴量である。このため、手の見えを重視した重み付け緒を行なうためには、１＞β＞０とすればよい。また、β＞１とすれば、足の見えを重視した凹凸度となる。 Here, K represents the number of peaks of the contour feature value labeled as a hand, and L represents the peak number of the contour feature value labeled as a foot. Also,

is the contour feature quantity labeled as a hand, and

is the contour feature quantity labeled as a foot. Therefore, to weight the appearance of the hands more heavily, β may be set so that 0 < β < 1. Conversely, setting β > 1 yields a degree of unevenness that emphasizes the appearance of the feet.

このようなβの値をユーザが設定できるようなモードを、映像表示装置に用意しても良い。 A mode in which the user can set such a value of β may be prepared in the video display device.
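A weighted measure of this kind might be sketched as follows. Equations 13 and 14 are not reproduced in this text, so the form used here (the sum of hand peak values plus β times the sum of foot peak values) is an assumption, chosen only because it matches the stated behaviour: 0 < β < 1 favours hands and β > 1 favours feet. With β = 1 it reduces to a plain sum over all labeled peaks, which is likewise only an assumed reading of Equation 13. The function name is hypothetical.

```python
def labeled_unevenness(labeled_peaks, beta=1.0):
    """Unevenness from labeled peaks, with foot peaks scaled by beta.

    labeled_peaks : list of (value, label) pairs, label being "hand" or "foot"
    beta          : weight on foot peaks (assumed placement of the weight)
    """
    hand_total = sum(v for v, label in labeled_peaks if label == "hand")
    foot_total = sum(v for v, label in labeled_peaks if label == "foot")
    return hand_total + beta * foot_total
```

A user-adjustable mode, as suggested above, would then amount to exposing `beta` as a setting.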

さらに、（数９）のように、カメラｃで得られたＴ枚の画像列に対して凹凸度の時間平均を計算することも有効である。 Further, as shown in (Equation 9), it is also effective to calculate the time average of the unevenness for the T image rows obtained by the camera c.

上記のように得られた被写体シルエット形状の凹凸度を用いて、実施の形態１と同様にＳ３０７およびＳ３０８の処理が実行される。 Using the unevenness degree of the subject silhouette shape obtained as described above, the processing of S307 and S308 is executed as in the first embodiment.

以上説明したように、本実施の形態に係る映像表示装置によると、人物を含むシーンを複数のカメラで同時に撮像した映像から、人物の四肢の形状をユーザが認識しやすい映像を自動的に選択することができる。そのため、監視員やディレクターは、複数のカメラから得られた映像すべてに注視する必要がなく、注視すべきカメラ映像の数を減らすことが可能となる。このため、監視員やディレクターの映像監視の負担を軽減することができる。 As described above, according to the video display apparatus according to the present embodiment, a video that allows a user to easily recognize the shape of a person's extremities is automatically selected from videos obtained by simultaneously capturing scenes including a person with a plurality of cameras can do. Therefore, it is not necessary for the supervisor or director to watch all the videos obtained from the plurality of cameras, and the number of camera videos to be watched can be reduced. For this reason, it is possible to reduce the burden of video surveillance for the supervisor and director.

さらに、実施の形態１に加えて、手または足の見えに重みをかけた映像選択が可能となるため、スポーツ中継や監視等、用途に応じたカメラ映像の選択が可能である。 Further, in addition to the first embodiment, since it is possible to select a video with a weight on the appearance of hands or feet, it is possible to select a camera video according to the application, such as sports broadcasting or monitoring.

（実施の形態３）
次に、本発明の実施の形態３に係る映像表示装置について説明する。 (Embodiment 3)
Next, a video display apparatus according to Embodiment 3 of the present invention will be described.

実施の形態３に係る映像表示装置は、実施の形態１および２に示した映像表示装置の映像表示方法に加えて、顔画像を含むカメラ映像を同時に表示させるものである。 In addition to the video display method of the video display device shown in the first and second embodiments, the video display device according to the third embodiment simultaneously displays a camera video including a face image.

図１６は、実施の形態３に係る映像表示装置の構成を示すブロック図である。
映像表示装置は、人物を含む映像における被写体シルエットの形状の凹凸度を判定することによって、人物の四肢の形状をユーザが認識しやすい映像を選択すると共に、顔画像検出処理を行い、検出した顔画像を含むカメラ映像と前記選択したカメラ映像とを同時に表示する装置であり、撮像部１０１と、画像保持部１０２と、特徴抽出部１０３と、形状判定部１０４と、顔画像検出部１６０１と、カメラ選択部１６０２と、表示部１６０３とを備えている。 FIG. 16 is a block diagram showing a configuration of the video display apparatus according to Embodiment 3.
The video display device determines the degree of unevenness of the shape of the subject silhouette in an image including a person, thereby selecting an image that allows the user to easily recognize the shape of the limb of the person, performing face image detection processing, and detecting the detected face An apparatus that simultaneously displays a camera image including an image and the selected camera image, and includes an imaging unit 101, an image holding unit 102, a feature extraction unit 103, a shape determination unit 104, a face image detection unit 1601, A camera selection unit 1602 and a display unit 1603 are provided.

撮像部１０１、画像保持部１０２、特徴抽出部１０３および形状判定部１０４は、実施の形態１または２で示したものと同様である。また、特徴抽出部１０３の詳細な構成は図２に示したものと同様である。 The imaging unit 101, the image holding unit 102, the feature extraction unit 103, and the shape determination unit 104 are the same as those described in the first or second embodiment. The detailed configuration of the feature extraction unit 103 is the same as that shown in FIG.

顔画像検出部１６０１は、人物領域抽出部１０３１で抽出された映像から、顔画像の検出を行う。 The face image detection unit 1601 detects a face image from the video extracted by the person area extraction unit 1031.

カメラ選択部１６０２は、複数台のカメラの中から形状判定部１０４で判定した被写体シルエットの形状の凹凸度に基づいて、人物の四肢の形状をユーザが認識しやすい映像を撮像しているカメラ映像を選択する。それとともに、カメラ選択部１６０２は、顔画像検出部１６０１での顔画像検出結果に従い、顔画像を含む映像を撮像しているカメラ映像を選択する。 The camera selection unit 1602 captures an image that allows the user to easily recognize the shape of the person's limb based on the degree of unevenness of the shape of the subject silhouette determined by the shape determination unit 104 from among a plurality of cameras. Select. At the same time, the camera selection unit 1602 selects a camera image capturing an image including the face image according to the face image detection result in the face image detection unit 1601.

表示部１６０３は、カメラ選択部１６０２で選択された映像をモニタ等に表示する。
これにより、人物の四肢の形状と顔とをユーザが認識しやすい映像を選択し、表示することが可能である。もちろん、前記選択した映像を記録装置（図示せず）に記録することも可能である。 A display unit 1603 displays the video selected by the camera selection unit 1602 on a monitor or the like.
Thereby, it is possible to select and display an image in which the user can easily recognize the shape and face of the person's limbs. Of course, it is also possible to record the selected video on a recording device (not shown).

以下に、本実施の形態に係る映像表示装置による映像表示方法について、図１７のフローチャートを用いて説明する。ここでは、人物の四肢の形状をユーザが認識しやすい映像の選択方法は、実施の形態２と同様であるものとして説明を行なうが、実施の形態１と同様にして、当該映像の選択を行うようにしてもよい。 The video display method used by the video display apparatus according to the present embodiment is described below with reference to the flowchart of FIG. 17. The description assumes that the method for selecting a video in which the user can easily recognize the shape of a person's limbs is the same as in the second embodiment, but the video may instead be selected in the same manner as in the first embodiment.

Ｓ３０１〜Ｓ３０５およびＳ１３０１〜Ｓ１３０３の処理は、実施の形態２と同様である。このため、その詳細な説明は繰り返さない。 The processes of S301 to S305 and S1301 to S1303 are the same as in the second embodiment. Therefore, detailed description thereof will not be repeated.

顔画像検出部１６０１は、Ｓ３０２で抽出された人物領域に対して、顔画像検出処理を行う（Ｓ１７０１）。顔画像検出処理は、K.K.Sung and T.Poggio, "Example-Based Learning for View-Based Human Face Detection",IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.20, No.1, pp.39-51, 1998に開示されている技術等を用いることができる。この手法では、顔画像を予め学習させることにより、顔画像のモデルを構築する。 The face image detection unit 1601 performs face image detection processing on the person region extracted in S302 (S1701). For the face image detection processing, a technique such as the one disclosed in K. K. Sung and T. Poggio, "Example-Based Learning for View-Based Human Face Detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, pp. 39-51, 1998, can be used. In this technique, a face image model is constructed by training on face images in advance.

An example using the above technique is described here, but other face image detection techniques may be used. When the aim is mainly to select camera video that captures a frontal or nearly frontal face image, face images captured from the front and prepared in advance are used for model learning.

Next, the camera selection unit 1602 selects camera video based on the unevenness degree of the subject silhouette shape calculated in S1303 and the face image detection result obtained in S1701 (S1702).

The description here assumes that the time average of the unevenness degree of the subject silhouette shape is calculated using (Equation 9), but the same applies when the unevenness degree is calculated based on (Equation 8).

As in Embodiment 1, the description uses an example in which a fixed area is imaged by a plurality of cameras, as shown in FIG. 7.

The camera selection unit 1602 also selects, at the same time, the camera video in which a face image was detected (S1702).

When the camera video selected based on the unevenness degree of the subject silhouette shape calculated in S1303 differs from the camera video selected based on the face image detection result obtained in S1701, the camera selection unit 1602 selects both videos. The selection is not limited to this example; all camera videos in which a face image was detected may be selected. However, when the video communication capacity is limited, as in surveillance, and face images are detected in several camera videos, the camera video may be selected according to the size of the detected face image.

If the camera video selected based on the unevenness degree of the subject silhouette shape is the same as the camera video selected based on the face image detection result, the camera selection unit 1602 selects that single camera video, which satisfies both criteria.
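The selection rule of S1702 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `CameraResult`, `select_cameras`, `face_area`, and `max_videos` are hypothetical, the unevenness degree (S1303) and the face detection result (S1701) are assumed to be already computed, and a larger detected face is assumed to be preferred when capacity is limited.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CameraResult:
    camera_id: int
    unevenness: float            # unevenness degree from S1303 (assumed precomputed)
    face_area: Optional[float]   # detected face size, None if no face was found (S1701)


def select_cameras(results: List[CameraResult], max_videos: int = 2) -> List[int]:
    """Return camera IDs chosen by unevenness degree and by face detection."""
    if not results:
        return []
    # Camera whose subject silhouette has the largest unevenness degree.
    by_shape = max(results, key=lambda r: r.unevenness)
    selected = [by_shape.camera_id]
    # Cameras that detected a face, largest face first (assumed preference
    # when the number of videos that can be sent is limited).
    with_face = sorted(
        (r for r in results if r.face_area is not None),
        key=lambda r: r.face_area,
        reverse=True,
    )
    for r in with_face:
        if len(selected) >= max_videos:
            break
        # If the same camera satisfies both criteria, it is listed only once.
        if r.camera_id not in selected:
            selected.append(r.camera_id)
    return selected
```

With the default limit of two videos, the result is one video chosen for limb shape and, if a different camera detected a face, one video chosen for the face. Raising `max_videos` corresponds to the variation in which every camera video containing a face image is selected.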

Next, the display unit 1603 displays the video selected in S1702, as shown in FIG. 18 (S1703). Specifically, the monitor 1801 displays the video 1802 selected in S1702 based on the unevenness degree of the subject silhouette shape and the camera video 1803 in which a face image was detected. Based on the face detection result, the face position may be marked with a rectangle or the like, and the face image may also be enlarged for display. If the installed camera has a zoom function, it may zoom in on the face when capturing.

Here, the N videos with the highest unevenness degrees may be displayed, or all videos in which a face image was detected may be displayed. If the camera video selected based on the unevenness degree of the subject silhouette shape is the same as the camera video selected based on the face image detection result, the single camera video that satisfies both criteria may be displayed. In addition, the camera arrangement 1806 may be displayed together with the installation positions of the cameras whose video is currently shown, namely the camera 1804 selected based on the unevenness degree of the subject silhouette and the camera 1805 that detected the face image.
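Displaying the N videos with the highest unevenness degrees amounts to a ranking over cameras. A minimal sketch, assuming the unevenness degrees are held in a dictionary keyed by camera ID (the function name `top_n_by_unevenness` and this data layout are hypothetical):

```python
import heapq
from typing import Dict, List


def top_n_by_unevenness(unevenness: Dict[int, float], n: int) -> List[int]:
    """Return up to n camera IDs, highest unevenness degree first."""
    # Iterating a dict yields its keys (camera IDs); each is ranked by its value.
    return heapq.nlargest(n, unevenness, key=unevenness.get)
```

If fewer than N cameras are available, all of them are returned in descending order of unevenness degree.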

The video selected and displayed as described above can also be recorded on a recording device (not shown) at the same time.

The video display method is not limited to the above example; it is sufficient that the video selected in S1702 based on the unevenness degree of the subject silhouette shape and the face image detection result is displayed.

As described above, the video display device according to the present embodiment can automatically select, from videos of a scene including a person captured simultaneously by a plurality of cameras, video in which the user can easily recognize both the shape of the person's limbs and the person's face. Surveillance staff and directors therefore do not need to watch all of the videos obtained from the cameras; the number of camera videos requiring attention is reduced, which lightens their workload. In surveillance applications in particular, grasping a person's actions and face image at the same time is also useful for proving a crime.

The video display device according to the embodiments of the present invention has been described above, but the present invention is not limited to these embodiments.

For example, in the embodiments described above the unevenness degree of the subject's silhouette shape is calculated according to (Equation 8) or (Equation 9), but it may instead be calculated according to (Equation 17).

That is, for the sequence of T images obtained by camera c, the absolute values of the differences in unevenness degree between temporally adjacent images may be calculated and summed, and the sum may be used as a new unevenness degree. FIG. 19 schematically shows an example of obtaining the new unevenness degree D_c from a sequence of five images obtained by camera c. The graph in FIG. 19 plots the unevenness degree d_c(t) obtained from the five images from time t = 1 to time t = 5. The absolute value of the difference between d_c(t+1) and d_c(t) is calculated for each pair of adjacent images, and the sum of these values is the new unevenness degree D_c.
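Read literally, this description gives D_c as the sum over t = 1 to T-1 of |d_c(t+1) - d_c(t)|. That expression is reconstructed from the text, not copied from (Equation 17); in particular, no normalization factor is assumed. A minimal sketch of the calculation (the function name `temporal_unevenness` is hypothetical):

```python
from typing import Sequence


def temporal_unevenness(d_c: Sequence[float]) -> float:
    """Sum of absolute differences between consecutive unevenness degrees.

    d_c holds the per-frame unevenness degrees d_c(1), ..., d_c(T) of one camera.
    """
    # Pair each frame with its successor and accumulate |d_c(t+1) - d_c(t)|.
    return sum(abs(later - earlier) for earlier, later in zip(d_c, d_c[1:]))
```

For a five-frame sequence such as `[1.0, 3.0, 2.0, 2.0, 5.0]`, the four adjacent differences are 2, 1, 0, and 3, so the new unevenness degree is 6.0. A camera whose silhouette changes shape over time scores high even if its average unevenness degree is moderate.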

According to the present invention, video in which the user can easily recognize the shape of a person's limbs can be selected. Furthermore, by using temporal information on the unevenness degree of the subject silhouette shape, video in which the user can easily recognize the movement of the person's limbs can be selected. The user therefore no longer needs to watch all of the videos obtained from the cameras, and the number of camera videos requiring attention is reduced, which lightens the user's monitoring workload. The present invention is thus applicable to, for example, a monitoring device that uses a plurality of cameras to watch for shoplifting, and a video selection device for choosing a suitable scene from among a plurality of cameras.

FIG. 1 is a diagram showing the configuration of the video display device according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing the configuration of the feature extraction unit according to Embodiment 1 of the present invention.
FIG. 3 is a flowchart of the processing executed by the video display device according to Embodiment 1 of the present invention.
FIG. 4 is a diagram for explaining the contour feature amount extraction method according to Embodiment 1 of the present invention.
FIG. 5 is a diagram for explaining the contour feature amount extraction method according to Embodiment 1 of the present invention.
FIG. 6 is a diagram showing the representative points of the contour feature amount according to Embodiment 1 of the present invention.
FIG. 7 is a diagram showing an installation example of the plurality of cameras according to Embodiment 1 of the present invention.
FIG. 8 is a diagram showing a combination example of the plurality of cameras according to Embodiment 1 of the present invention.
FIG. 9 is a diagram showing an example of video display according to Embodiment 1 of the present invention.
FIG. 10 is a diagram for explaining the contour feature amount extraction method according to a modification of Embodiment 1 of the present invention.
FIG. 11 is a diagram showing a contour extraction result according to the modification of Embodiment 1 of the present invention.
FIG. 12 is a diagram showing the configuration of the shape determination unit according to Embodiment 2 of the present invention.
FIG. 13 is a flowchart of the processing executed by the video display device according to Embodiment 2 of the present invention.
FIG. 14 is a diagram showing an example of peak detection in the contour feature amount according to Embodiment 2 of the present invention.
FIG. 15 is a diagram showing an example of limb association according to Embodiment 2 of the present invention.
FIG. 16 is a diagram showing the configuration of the video display device according to Embodiment 3 of the present invention.
FIG. 17 is a flowchart of the processing executed by the video display device according to Embodiment 3 of the present invention.
FIG. 18 is a diagram showing an example of video display according to Embodiment 3 of the present invention.
FIG. 19 is a diagram schematically showing another method of calculating the unevenness degree according to Embodiments 1 to 3 of the present invention.

Explanation of symbols

101 Imaging unit
102 Image holding unit
103 Feature extraction unit
104 Shape determination unit
105, 1602 Camera selection unit
106, 1603 Display unit
1031 Person region extraction unit
1032 Silhouette extraction unit
1033 Contour feature extraction unit
1041 Peak detection unit
1042 Limb determination unit
1043 Unevenness degree determination unit
1601 Face image detection unit

Claims

A video selection method for selecting at least one video from a plurality of videos captured by a plurality of cameras,
A video acquisition step of acquiring a plurality of videos including a person imaged by a plurality of cameras;
A feature amount extraction step of extracting a feature amount obtained from a plurality of representative points provided on a contour line of a person's subject silhouette included in each of the plurality of videos;
An unevenness degree determining step for determining an unevenness degree indicating the degree of unevenness of the shape of the subject silhouette from the feature amount for each of the plurality of images,
And a video selection step of selecting at least one video for recognizing the shape of the person's hand or foot from the plurality of videos based on the degree of unevenness of the plurality of videos.

The video selection method according to claim 1, wherein in the video selection step, a video having the maximum unevenness is selected.

The video selection method according to claim 1, wherein in the feature amount extraction step, a distance between a predetermined reference point and the plurality of representative points is extracted as a feature amount.

The video selection method according to claim 3, wherein the unevenness degree determining step includes:
A curve generation step of generating a curve by connecting the feature amounts at the plurality of representative points;
A local maximum detection step of detecting local maxima in the curve; and
An unevenness degree calculation step of calculating the unevenness degree based on the local maxima.

The video selection method according to claim 3, wherein the unevenness degree determining step includes:
A curve generation step of generating a curve by connecting the feature amounts at the plurality of representative points;
A local maximum detection step of detecting local maxima in the curve;
An association step of associating the person's hands or feet with the local maxima based on the local maxima and the number of local maxima; and
An unevenness degree calculation step of calculating the unevenness degree based on the result of the association and the local maxima.

The video selection method according to claim 5, wherein the unevenness degree calculation step calculates the unevenness degree by weighting either the local maximum associated with a hand or the local maximum associated with a foot, based on the result of the association.

2. The unevenness determination step further calculates, for each of the plurality of videos, a time average of the unevenness of a plurality of temporally continuous images, and sets the calculation result as the unevenness of the subject silhouette shape. The video selection method according to any one of -6.

In the unevenness degree determining step, for each of the plurality of videos, a difference value in the time direction of the unevenness degree of the plurality of temporally continuous images is calculated, and the calculation result is set as the unevenness degree of the subject silhouette shape. The video selection method according to claim 1.

The video selection method according to any one of claims 1 to 8, wherein in the video selection step, with the selected video as a reference, a video captured from the direction opposite to that of the selected video is simultaneously selected.

9. The video selection step simultaneously selects, based on the selected video, images capturing the subject from a direction orthogonal to the optical axis direction of the camera capturing the video. 9. Item 2. The video selection method according to item 1.

The video selection method further includes a face image detection step of detecting a video including a face image of a person from the plurality of videos captured by the plurality of cameras.
The video selection step further selects at least one video for recognizing the person's face from the plurality of videos based on a detection result of the video including a face image. The video selection method according to item.

A video selection device that selects at least one video from a plurality of videos captured by a plurality of cameras,
Video acquisition means for acquiring a plurality of videos including a person captured by a plurality of cameras;
Feature amount extraction means for extracting feature amounts obtained from a plurality of representative points provided on the contour line of the subject silhouette of a person included in each of the plurality of videos;
For each of the plurality of videos, an unevenness degree determining means for determining an unevenness degree indicating the unevenness degree of the shape of the subject silhouette from the feature amount;
A video selection device comprising: video selection means for selecting at least one video for recognizing the shape of the person's hand or foot from the plurality of videos based on the degree of unevenness of the plurality of videos.

A program of a video selection method for selecting at least one video from a plurality of videos captured by a plurality of cameras,
A video acquisition step of acquiring a plurality of videos including a person imaged by a plurality of cameras;
A feature amount extraction step of extracting a feature amount obtained from a plurality of representative points provided on a contour line of a person's subject silhouette included in each of the plurality of videos;
An unevenness degree determining step for determining an unevenness degree indicating the degree of unevenness of the shape of the subject silhouette from the feature amount for each of the plurality of images,
A program that causes a computer to execute an image selection step of selecting at least one image for recognizing the shape of the hand or foot of the person from the plurality of images based on the degree of unevenness of the plurality of images.