JP5376403B2

JP5376403B2 - Video display device and program

Info

Publication number: JP5376403B2
Application number: JP2009202933A
Authority: JP
Inventors: 勇児糟谷; 禎史荒木; 慶二大村
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2009-09-02
Filing date: 2009-09-02
Publication date: 2013-12-25
Anticipated expiration: 2029-09-02
Also published as: JP2011055291A

Description

The present invention relates to a video display device that displays, in the main display area of a screen, video of persons cut out from video shot by a video camera at an angle of view that takes in a plurality of persons. More specifically, it relates to a video display device having a function of switching the display, in either direction, between screens showing individual persons and a screen showing everyone, according to the screen the user selects, and to a program used by the computer of the video display device.

Conventionally, when displaying video of participants in a video conference or the like, techniques are known for creating the screens needed to run the conference by extracting the part of the camera video in which a person appears and either zooming in on the extracted person or cutting out and displaying the person's part of the video.
Patent Document 1 can be cited as prior art that creates and controls such display screens for video conferencing. Patent Document 1 describes control in which, on a screen displaying video of a scene containing a plurality of persons (everyone), one specific person selected by the user with a mouse pointer or the like is cut out and displayed.

However, the video display of Patent Document 1 does not indicate what user operation switches the display screen after a specific person has been displayed. It therefore cannot meet a user's request to return the screen to the video of everyone or to switch directly to the video of another person.
The present invention was made in view of this problem of the prior art in video display devices that detect persons in real time using a wide-angle video camera capable of shooting at an angle of view that takes in a plurality of persons, and that cut out and display video of the detected persons. Its object is to make it possible to switch quickly from a screen displaying an individual to a screen displaying another member or everyone, as selected by the user.

The video display device of the present invention comprises: person detection means for detecting person candidates from video shot by a video camera at an angle of view that takes in a plurality of persons; display target determination means for determining the persons to be displayed, based on the person candidates detected by the person detection means from the time-series captured video; region division means for dividing out the regions of the captured video in which the persons determined by the display target determination means fit; video cutout means for cutting the video of the divided regions out of the captured video; image display means for displaying images on a screen based on video; display data processing means for turning all of the video cut out by the video cutout means into thumbnails for the selection screen of the image display means, and for turning video cut out by the video cutout means into images for the main screen of the image display means; region selection means for selecting from among the regions by an operation on the thumbnails of the selection screen of the image display means; and display control means for controlling the display on the image display means using data that the display data processing means has processed from the cutout video of the region selected by the region selection means.

According to the present invention, in a video display that detects persons in real time using a wide-angle video camera capable of shooting at an angle of view that takes in a plurality of persons, and that cuts out and displays video of the detected persons, the display can be switched quickly, in either direction, between screens showing individual persons and a screen showing everyone, according to the screen the user selects.

FIG. 1 is a diagram showing one configuration example of the selection screen for the video region to be displayed in the video display device according to an embodiment of the present invention.
FIG. 2 is a diagram explaining the screen transitions caused by selection operations that the user performs via the selection screen (FIG. 1).
FIG. 3 is a diagram showing another configuration example of the selection screen for the video region to be displayed in the video display device according to an embodiment of the present invention.
FIG. 4 is a functional block diagram showing the schematic configuration of the image processing system of the video display device according to the present invention.
FIG. 5 is an explanatory diagram (A) of the target image and the detection concept of face detection and motion detection, and explanatory diagrams (B) and (C) of the tracking methods for face detection and motion detection.
FIG. 6 is a flowchart showing the procedure for creating the person-candidate list based on face or motion detection results.
FIG. 7 is a flowchart showing the procedure for creating the detection list.
FIG. 8 is a flowchart showing the procedure of the seek process performed in step S201 of the detection list creation flow (FIG. 7).
FIG. 9 is a flowchart showing the procedure of the reverse seek process performed in step S202 of the detection list creation flow (FIG. 7).
FIG. 10 is a flowchart showing the procedure of the tracking performed in steps S302 and S306 of the seek process flow (FIG. 8).
FIG. 11 is a conceptual diagram explaining the relationship between the persons (person candidates) in the captured image, the person-candidate list, and the detection list when the detection list is created.
FIG. 12 is a flowchart showing the procedure for creating the display data.

Embodiments of the video display device of the present invention are described below.
The present invention concerns a video display device that displays, on the main screen of a display device, video of persons cut out from video shot by a video camera at an angle of view that takes in a plurality of persons; it is effective for displaying video of participants in a video conference or the like. An example applied to video conferencing and similar uses is given below as the embodiment.
This embodiment is characterized by diversifying the regions cut out as captured video in which persons fit (regions that fit an individual, several people (two, three, and so on), or everyone, plus the full-size captured video region) and by providing means that let the user select the desired video from among them and display it on screen.

"Display video selection screen"
So that the user can select the desired video region from the various video regions available for display, the video display device presents those regions to the user on the display screen and accepts a user operation designating the selected region.
Two different methods of presenting the displayable video regions on the display screen and letting the user designate one are shown here as embodiments.
In the first, thumbnails of the screens of the displayable video regions are created as a selection screen, and this selection screen is displayed separately from the main screen (the screen that displays the selected video region and so on). The function of selecting a video region by this method is hereinafter called the “first selection function” or “thumbnail selection function”.
In the second, a selection screen is created by embedding display elements for selecting a video region in the captured video (image) displayed on the main screen, and this selection screen is displayed. The function of selecting a video region by this method is hereinafter called the “second selection function” or “embedded element selection function”.

FIG. 1 shows a configuration example of the video region selection screen of the first selection function (thumbnail selection function).
In FIG. 1, the screen of the image display unit of the video display device (see FIG. 4, described later) consists of a main screen 110 and a selection screen 120 for choosing the screen to be displayed on the main screen 110. The main screen 110 in the figure shows, at full size, video shot with, for example, a wide-angle video camera; the image contains three persons.
The selection screen 120 arranges all of the selectable screens as thumbnails and, in the embodiment of FIG. 1, is composited onto the main screen 110 and displayed at all times.
In the example of FIG. 1, the thumbnails arranged on the selection screen 120 are: screen [1], the full-size screen; screen [2], a screen that fits everyone; screen [3], a screen that fits only “Mr. A”; screen [4], a screen that fits only “Mr. B”; and screen [5], a screen that fits only “Mr. C”. “Mr. A”, “Mr. B”, and “Mr. C” are shown in the order in which the persons appear in the captured video (from the left).
In the example of FIG. 1 the selection screen 120 forms a single screen together with the main screen 110, but it may instead be composited temporarily onto part of the main screen 110 in response to an external operation (see the operation means 210 in FIG. 2, described later), or the display may switch from the main screen to show the selection screen alone.
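The set of selectable screens [1] to [5] can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the patented implementation: the `Region` type, the `build_selection_regions` function, the `(left, top, right, bottom)` box format, and the `margin` parameter are all hypothetical names introduced here. The sketch orders the individual regions from the left and takes the bounding box of all persons as the region that fits everyone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One selectable screen: a label plus a rectangle in frame coordinates."""
    label: str
    left: int
    top: int
    right: int
    bottom: int


def build_selection_regions(frame_w, frame_h, person_boxes, margin=0):
    """Return [1] full size, [2] everyone, then one region per person (left to right).

    person_boxes: list of (left, top, right, bottom) tuples, one per person.
    margin: hypothetical padding added around each cutout, clamped to the frame.
    """
    regions = [Region("[1] full size", 0, 0, frame_w, frame_h)]
    if not person_boxes:
        return regions

    def clamp(left, top, right, bottom):
        # Pad the box by the margin without leaving the captured frame.
        return (max(0, left - margin), max(0, top - margin),
                min(frame_w, right + margin), min(frame_h, bottom + margin))

    # Persons are named in the order they appear in the frame, from the left.
    ordered = sorted(person_boxes, key=lambda box: box[0])

    # The "everyone" region is the bounding box of all person boxes.
    everyone = clamp(min(b[0] for b in ordered), min(b[1] for b in ordered),
                     max(b[2] for b in ordered), max(b[3] for b in ordered))
    regions.append(Region("[2] everyone", *everyone))

    for i, box in enumerate(ordered):
        name = chr(ord("A") + i)  # A, B, C, ... (enough for this illustration)
        regions.append(Region(f"[{i + 3}] person {name}", *clamp(*box)))
    return regions
```

With three person boxes, the function returns five regions, matching the five thumbnails of the FIG. 1 example.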

Screen transitions caused by the user's selection operations on the video region selection screen shown on the image display unit are described with reference to the operation example in FIG. 2, which uses the selection screen of FIG. 1.
In each of the display screens (A), (B), and (C) of FIG. 2, the main screen 110 shows the screen selected by the user's operation, and the selection screen 120 always shows thumbnails of all the selectable screens.
In the operation example of FIG. 2, when the video display device starts up, it displays the full-size screen [1] of FIG. 2(A) as the initial main screen. If, in this state, the user operates the operation means 210 (a remote control, mouse, or the like) to select screen [3] (a specific individual) from the thumbnails offered on the selection screen 120, the video display device accepts the instruction and changes the main screen 110 from screen [1] to screen [3]. The selection can be made by pointing at a thumbnail on the selection screen 120 with a pointer, by assigning identification numbers to the thumbnails and designating one by number, or by stepping through the thumbnails in their arranged order, one per operation.
Because the selection screen 120 always shows thumbnails of all the selectable screens, the display can move immediately to any of them, for example from screen [3] (a specific individual) to screen [2] (fitted to everyone) or screen [1] (full size).
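The three selection methods described for the operation means 210 (pointer, identification number, stepping in order) can be sketched as a small state holder. This is a minimal sketch under assumptions: the class name `ScreenSelector`, its method names, and the thumbnail rectangle format are hypothetical and do not come from the patent.

```python
class ScreenSelector:
    """Tracks which thumbnail is currently shown on the main screen."""

    def __init__(self, thumbnails):
        # thumbnails: list of (label, (left, top, right, bottom)) in layout order,
        # where the rectangle is the thumbnail's position on the selection screen.
        if not thumbnails:
            raise ValueError("at least one thumbnail is required")
        self.thumbnails = list(thumbnails)
        self.current = 0  # the initial main screen is screen [1], full size

    @property
    def current_label(self):
        return self.thumbnails[self.current][0]

    def select_by_pointer(self, x, y):
        """Select the thumbnail under the pointer; keep the current one on a miss."""
        for index, (_, (left, top, right, bottom)) in enumerate(self.thumbnails):
            if left <= x < right and top <= y < bottom:
                self.current = index
                break
        return self.current_label

    def select_by_number(self, number):
        """Select by 1-based identification number; ignore out-of-range numbers."""
        if 1 <= number <= len(self.thumbnails):
            self.current = number - 1
        return self.current_label

    def select_next(self):
        """Step to the next thumbnail in layout order, wrapping around."""
        self.current = (self.current + 1) % len(self.thumbnails)
        return self.current_label
```

Because every thumbnail stays available, any screen can be reached in a single operation from any other screen, which is the property the text attributes to the always-visible selection screen 120.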

FIG. 3 shows a configuration example of the video region selection screen of the second selection function (embedded element selection function).
In FIG. 3, the selection screen is formed on the screen of the image display unit by embedding, in the captured video (image) being displayed, display elements for selecting a video region; operating one of these elements switches the display to the selected screen. The screen 110 illustrated in FIG. 3 shows, at full size, video shot with, for example, a wide-angle video camera and contains three persons. Frames marking regions [1] to [5] serve as the display elements, so that the selectable regions are indicated directly on this screen.
This selection screen can be presented to the user either by always drawing the region frames on the displayed screen with inconspicuous lines, or by switching to a selection screen with the region frames only when needed, in response to a user operation.

Screen transitions caused by the user's selection operations on the selection screen of FIG. 3 can be handled in basically the same way as for the first selection function described with reference to FIG. 2, in that the user designates a region with the operation means 210 (a remote control, mouse, or the like) and the video of the designated region is displayed on the main screen.
However, unlike the first selection function in the example of FIG. 2, the second selection function, which embeds the selection elements in the video, cannot keep a full-size thumbnail on the selection screen 120 at all times so that the full-size screen can always be selected. To allow a transition to the full-size display screen of FIG. 3 even when the displayed screen is not full size (for example, a region that fits one individual), the device can be given an additional means, such as a separate key input, by which the full-size display screen can be selected at any time.
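A possible way to handle input for the second selection function is sketched below in Python. Several points are assumptions made for illustration and are not stated in the patent: the frames overlap, so a click is resolved to the smallest frame containing the pointer; the dedicated key is named `"home"`; and click coordinates are given in full-size frame coordinates. The names `pick_region` and `handle_input` are hypothetical.

```python
FULL_SIZE_LABEL = "[1]"
FULL_SIZE_KEY = "home"  # hypothetical name for the separate full-size key


def box_area(box):
    left, top, right, bottom = box
    return (right - left) * (bottom - top)


def pick_region(frames, x, y):
    """Return the label of the smallest frame containing (x, y), or None.

    frames: dict mapping a label to a (left, top, right, bottom) box.
    Choosing the smallest frame is an assumed tie-break for nested frames.
    """
    hits = [
        (box_area(box), label)
        for label, box in frames.items()
        if box[0] <= x < box[2] and box[1] <= y < box[3]
    ]
    return min(hits)[1] if hits else None


def handle_input(frames, current, event):
    """Return the label of the screen to show after one input event.

    event is ("key", name) or ("click", x, y).
    """
    if event[0] == "key" and event[1] == FULL_SIZE_KEY:
        # The separate key always returns to the full-size screen.
        return FULL_SIZE_LABEL
    if event[0] == "click":
        picked = pick_region(frames, event[1], event[2])
        return picked if picked is not None else current
    return current
```

The separate key plays the role that the always-visible full-size thumbnail plays in the first selection function: it guarantees a path back to screen [1] from any displayed region.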

"Configuration of the image processing system"
Next, the configuration of the image processing system is described. This system processes captured images into the display data used by the image display unit of the video display device (the data for displaying the screens described under "Display video selection screen").
FIG. 4 is a functional block diagram showing the schematic configuration of the image processing system according to this embodiment.
The image processing system shown in FIG. 4 operates under the control of a control unit (not shown) of the video display device and has a face detection unit 201, a motion detection unit 202, a merge unit 203, a detection list creation unit 204, an image cutout unit 205, an image display unit 206, and a screen selection instruction unit 207.

The control unit (not shown) of the video display device controls the entire image processing device: it takes in the video shot by the video camera, processes it into display data, and controls the display means (display) with the processed data. For this purpose it includes a computer made up of a CPU (Central Processing Unit), a RAM (Random Access Memory) that temporarily stores the programs and data needed for CPU processing, and a storage unit such as a ROM (Read Only Memory) that stores the programs the CPU runs to perform computation and processing. In this embodiment, a program that causes this computer to function as each processing unit of the image processing system of FIG. 4 is stored (recorded) in the storage unit such as the ROM, and the CPU runs the program using the RAM as work memory, so that the image processing system is realized by the control unit's computer. The image processing system may of course instead be built from a dedicated image processing IC.

The functional units of the image processing system (FIG. 4) have the following processing functions.
The face detection unit 201 and the motion detection unit 202 detect, from the captured video, faces and motion respectively, each being one of the features by which something is regarded as a person (details are given later with reference to FIG. 5).
The merge unit 203 merges the detection results of the face detection unit 201 and the motion detection unit 202 according to predetermined processing conditions and compiles the images that can be regarded as persons (hereinafter “person candidates”) into a candidate list (the list of detected person candidates), which is its output (details are given later with reference to FIG. 6).
The detection list creation unit 204 determines, according to predetermined criteria and based on the candidate list received from the merge unit 203, which persons are to be displayed, and compiles the result into a detection list (details are given later with reference to FIGS. 7 to 10).
The image cutout unit 205 cuts out of the captured video the display target regions in which the persons in the detection list received from the detection list creation unit 204 fit, and passes them to the image display unit 206.
The image display unit 206 performs display data processing: it turns the captured video of the display target regions (the full-size video and the cutout video) into images for the main screen, and creates the images for the selection screen (by creating thumbnails or by embedding display elements for selecting a video region). It also outputs the display data of the screen selected according to the instruction from the screen selection instruction unit 207 as a display control signal for the display.
The screen selection instruction unit 207 receives the user's screen selection operation and instructs the image display unit 206 to display the selected screen.

"Data processing by the image processing system"
Next, the data processing that the image processing system (FIG. 4) performs on the captured video is described in detail. This processing creates the selection screen presented to the user for choosing, from the full-size captured video and the person-region video cut out of it, the video to show on the main screen, and displays the video the user selects on the main screen.
The processing first detects person candidates in the captured video. Since the aim is a person-centered display, this detection must capture the persons who appear in the captured video and may be selected for display, and must correctly capture each individual even as they change over time.

“Face and motion detection”
To capture each person correctly, the face and motion, which are features of the persons appearing in the captured video, are detected. The face detection unit 201 and the motion detection unit 202 of the image processing system (FIG. 4) perform this detection on the input captured video (images).
The face detection and motion detection performed by the face detection unit 201 and the motion detection unit 202 are described with reference to FIG. 5.
For a full-size captured image such as that on screen 110 in FIG. 5(A), the three persons appearing in the image are the detection targets.
In face detection, as shown in FIGS. 5(A) and 5(B), the face regions enclosed by squares are the detection targets, and the face images contained in the captured image are detected by pattern recognition. This can be implemented with, for example, a method known as a pattern matching technique that uses a classifier combining feature detection based on Haar-like features with AdaBoost learning (see www.2ken.no-ip.com/publication/2008/P-06026.pdf).
In motion detection, as shown in FIGS. 5(A) and 5(C), the motion indicated schematically by white arrows is the detection target, and the motion of moving bodies in the captured image is detected using optical flow (motion on the screen between frames, expressed as vectors). This can be implemented with, for example, an existing method that builds a motion model and a color model of the tracked person and integrates them probabilistically to identify the pixels belonging to the tracked person (the person candidate region) and determine the person region (see www-cv.mech.eng.osaka-u.ac.jp/~kubo/research/index-j.html).

“Creating the candidate list”
Each time the face detection unit 201 and the motion detection unit 202 perform detection, their results are passed to the merge unit 203, which merges the detected faces and motion under predetermined processing conditions and compiles the detected person candidates into a candidate list (the person-candidate detection list). The candidate list has, for each person candidate, at least the entries (1) to (3) exemplified below.
Candidate list:
Entry (1) Detection position (coordinates of a vertex of the rectangle circumscribing the face, or of the center point of the motion, etc.)
Entry (2) Detection size (height and width of the rectangle circumscribing the face or of the rectangle inscribed in the motion, etc.)
Entry (3) Detection type (the kind of feature detected: face, motion, etc.)
The data in the candidate list only records person candidates, that is, images regarded as persons, that appear in the current captured image; they may turn out not to be persons. The output of the merge unit 203 is therefore held in this candidate list form until later processing determines that a candidate is a person to be displayed and registers it in the detection list (described in detail later).
Because a person is captured through two features, face and motion, the results can overlap when the face and motion of the same person are detected at the same time. The processing condition for merging is therefore one that eliminates this duplication (see the description of S101 in the processing flow of FIG. 6, below).

マージ部２０３が行う検出リストの作成処理を図６に示すフロー図にもとづいて説明する。
マージ部２０３は、顔検知部２０１及び動き検知部２０２からそれぞれ顔、動きの検知結果を受け取ると、同一人物の検知結果と考えられる検知結果の重複を排除する。ここでは、動き検知で得られる検出位置（動きの中心点の座標）が顔検知で得られる検出位置（顔の外接矩形の頂点の座標）の周囲の一定範囲にある場合には、動き検知の結果を破棄する（ステップＳ１０１）。つまり、同一人物の顔、動きの検出位置の関係は一定の関係にあるので、予めこの条件を検証することにより得られる位置関係から重複排除の上記範囲を定め、定めた範囲内にある動きの検知結果を破棄し、顔の検知結果を残す。
次に、顔の検知結果と破棄しなかった動きの検知結果をマージする（ステップＳ１０２）。この処理は、人物候補として検出された顔、動きにそれぞれ識別子を付け、ラベリングすることにより、各検知単位で得られる検出結果としての人物候補をまとめる処理である。
この後、各検知単位でマージされた検出結果を検出リストに登録し（ステップＳ１０３）、検出リストの作成処理を終了する。検出リストの作成は、ラベリングした人物候補ごとに上記検出リストのエントリ（１）〜（３）にそれぞれデータを登録する処理である。
The process by which the merge unit 203 creates the candidate detection list (検出リスト) will be described with reference to the flowchart shown in FIG. 6.
When the merge unit 203 receives the face and motion detection results from the face detection unit 201 and the motion detection unit 202, respectively, it eliminates duplicate detection results that are considered to belong to the same person. Specifically, if the detection position obtained by motion detection (the coordinates of the center point of the motion) lies within a certain range around the detection position obtained by face detection (the coordinates of a vertex of the circumscribed rectangle of the face), the motion detection result is discarded (step S101). In other words, because the detection positions of the face and the motion of the same person stand in a fixed relationship, the deduplication range is determined in advance from the positional relationship obtained by verifying this condition; motion detection results inside that range are discarded and the face detection results are kept.
Next, the face detection results and the motion detection results that were not discarded are merged (step S102). This step attaches an identifier to each face and motion detected as a person candidate and labels it, thereby gathering the person candidates obtained as detection results in each detection unit.
Thereafter, the detection results merged for each detection unit are registered in the candidate detection list (step S103), and the list creation process ends. Creating the list means registering data in entries (1) to (3) of the candidate detection list for each labeled person candidate.
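The deduplication and merging of steps S101 to S103 can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the `Candidate` class, the function name `merge_detections`, the square test region, and the `dedup_range` value are assumptions, and the face position is simplified to a single point.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    kind: str        # "face" or "motion"
    x: float         # detection position, x coordinate
    y: float         # detection position, y coordinate
    label: int = -1  # identifier assigned when the results are merged


def merge_detections(faces, motions, dedup_range):
    """Return the merged list of person candidates for one detection unit."""
    # S101: discard a motion result that lies within dedup_range of any face
    # (a square neighbourhood is assumed here for simplicity).
    kept_motions = [
        m for m in motions
        if not any(abs(m.x - f.x) <= dedup_range and abs(m.y - f.y) <= dedup_range
                   for f in faces)
    ]
    # S102: merge the face results with the surviving motion results
    # and label each candidate with an identifier.
    merged = list(faces) + kept_motions
    for label, candidate in enumerate(merged):
        candidate.label = label
    # S103: the labelled candidates form the list for this detection unit.
    return merged
```

With one face at (100, 50) and motions at (105, 60) and (300, 80), a `dedup_range` of 30 drops the first motion and keeps the second as a separate person candidate.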

“検知リストの作成”
上記マージ部２０３で作成された検出リストに登録されたデータは、人物候補（人物とみなされるもの）の検出データであり、人物ではない可能性もある。
そこで、時系列にこれまで行ってきた人物検知で得た結果と今回（現時点で）得た検出結果との関係で人物らしいと判断される人物候補を今回の検知結果として求めることにより、人物の検知精度を保証し、安定した検知結果を得る。
手法としては、今回行おうとしていると同様の判断プロセスで前回までに行って得られた履歴を勘案して、検出された人物候補から新規に検出された人物候補を含め、人物らしいと判断される人物候補のみを選択し、今回の検知結果とする（詳細は図７〜１０を参照し後述）。
“Creation of the person detection list”
The data registered in the candidate detection list created by the merge unit 203 is detection data for person candidates (things regarded as persons), and a candidate may not actually be a person.
Therefore, the current detection result is taken to be those person candidates that are judged likely to be persons from the relationship between the results of the person detection performed so far in time series and the detection result obtained this time (at the present moment). This ensures the accuracy of person detection and yields stable detection results.
As a technique, the history obtained up to the previous round by the same judgment process as the current one is taken into account, and only the person candidates judged likely to be persons, including newly detected ones, are selected from the detected candidates as the current detection result (details are described later with reference to FIGS. 7 to 10).

検知リスト作成部２０４は、マージ部２０３によって作成された今回の検出リスト（人物候補の検出結果）を受け取り、上記の人物検知を行う処理部であり、各回の処理により検知結果として得られる人物は、検知リストの形式でまとめられる。ここで作成する検知リストは、人物ごとに以下に例示するエントリ（１）〜（７）を持つものを用いる。
検出リスト：
エントリ（１）検知位置（顔の外接矩形の頂点或いは動きの中心点の座標等）
エントリ（２）検知サイズ（顔の外接矩形或いは動きの内接矩形の縦横サイズ等）
エントリ（３）トラッキング対象範囲（原点に対し予め対象として定めたトラッキングの範囲、図１０の説明、参照）
エントリ（４）トラッキング用ヒストグラム（ヒストグラムの要素、ヒストグラムが求められる対象領域等、図１０の説明、参照）
エントリ（５）検知カウンタ（カウント値を画像切り出し部２０５が切り出し範囲を決めるファクタとして使用）
エントリ（６）検知終了カウンタ（カウント値が０以下のエントリの削除に使用）
エントリ（７）検出タイプ（検知対象の顔、動き等の種類）
この検知リストのデータは、人物候補が表示対象となる人物であると判定され結果と、表示画像として切り出される人物の画像領域を定めるためのであり、この検知リストの作成後に、画像表示部２０６に渡され、又次回の検知リストを作成するために用いることができるように管理される。
The detection list creation unit 204 is a processing unit that receives the current candidate detection list (the person-candidate detection results) created by the merge unit 203 and performs the person detection described above. The persons obtained as the detection result of each round are compiled in the form of a person detection list (検知リスト). The person detection list created here has, for each person, the entries (1) to (7) exemplified below.
Person detection list:
Entry (1) Detection position (coordinates of a vertex of the circumscribed rectangle of the face, or of the center point of the motion, etc.)
Entry (2) Detection size (height and width of the circumscribed rectangle of the face or of the inscribed rectangle of the motion, etc.)
Entry (3) Tracking target range (the tracking range predetermined as a target relative to the origin; see the description of FIG. 10)
Entry (4) Tracking histogram (histogram elements, the target region for which the histogram is obtained, etc.; see the description of FIG. 10)
Entry (5) Detection counter (the count value is used by the image cutout unit 205 as a factor in deciding the cutout range)
Entry (6) Detection end counter (used to delete entries whose count value is 0 or less)
Entry (7) Detection type (the kind of detection target: face, motion, etc.)
The data in this person detection list represents the result of judging that a person candidate is a person to be displayed, and serves to define the image region of the person to be cut out as a display image. After the list is created, it is passed to the image display unit 206 and is also managed so that it can be used to create the next person detection list.
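As an illustration of entries (1) to (7), one record of the managed list of persons might be represented as below. The class name `PersonEntry`, the field names, the field types, and the (left, top, right, bottom) form of the tracking range are assumptions made for this sketch; the description does not prescribe a data layout.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PersonEntry:
    position: Tuple[int, int]                  # entry (1): detection position
    size: Tuple[int, int]                      # entry (2): detection size (width, height)
    tracking_range: Tuple[int, int, int, int]  # entry (3): assumed (left, top, right, bottom)
    histogram: List[int] = field(default_factory=list)  # entry (4): tracking histogram
    detection_counter: int = 0                 # entry (5): used when deciding the cutout
    end_counter: int = 0                       # entry (6): entry is deleted at 0 or less
    detection_type: str = "face"               # entry (7): "face" or "motion"

    @property
    def expired(self) -> bool:
        """True when the end counter has fallen to 0 or less."""
        return self.end_counter <= 0
```

Keeping both counters on the record lets the later steps raise or lower them per person without any separate bookkeeping.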

“検知リスト作成の処理フロー”
検知リスト作成部２０４が行う検知リストの作成処理を図７〜１０に示すフロー図にもとづいて詳細に説明する。
なお、以下の説明では、検知リストを作成する際の撮影画像上の人物（人物候補）と検出リスト及び検知リストとの関係を説明する図１１の概念図を参照する。図１１において、左側の画面１１０は、今回求めた検出リストに対応し、登録された各人物候補とそれぞれの検出リスト３０６，３０１〜３０３が矢印で結ばれている。また、右側の画面１１０は、前回の検知リスト作成時に対応し、登録された各人物とそれぞれの検知リスト４０１〜４０３，４０５が矢印で結ばれている。
“Processing flow for creating the person detection list”
The process by which the detection list creation unit 204 creates the person detection list will be described in detail with reference to the flowcharts shown in FIGS. 7 to 10.
The following description refers to the conceptual diagram of FIG. 11, which illustrates the relationship between the persons (person candidates) on a captured image and the candidate detection list and person detection list at the time the person detection list is created. In FIG. 11, the screen 110 on the left corresponds to the candidate detection list obtained this time, and each registered person candidate is connected by an arrow to its candidate detection list entry 306, 301 to 303. The screen 110 on the right corresponds to the previous creation of the person detection list, and each registered person is connected by an arrow to its person detection list entry 401 to 403, 405.

図７の処理フローによると、検知リスト作成部２０４は、マージ部２０３によって作成された今回の検出リスト（人物候補の検出結果）を受け取ると、先ず、この検出リストに登録された人物候補と現在管理している（前回作成された）検知リストの人物との対応付けを検出結果（検知結果）にもとづいて行う（ステップＳ２０１）。このステップＳ２０１の処理は、このステップのサブシーケンスを示す図８のシーク処理を行う。このシーク処理は、検知リスト側から検出リスト側に対応する人物（人物候補）の検出データを探しにいく処理である。なお、図８のシーク処理の詳細は、後述する。
次に、ステップＳ２０１の処理とは逆に、検出リスト側から検知リスト側に対応する人物の検出データを探しにいき、現在検知リストにない人物候補のリストへの追加登録を行う（ステップＳ２０２）。このステップＳ２０２の処理は、このステップのサブシーケンスを示す図９の逆シーク処理を行う。なお、図９の逆シーク処理の詳細は、後述する。
この後、ステップＳ２０１及びＳ２０１の処理を行った結果、作成しようとしている検知リストのエントリである検知終了カウンタの値が「０以下」になる人物に関するデータを検知リストから削除する（ステップＳ２０３）。なお、ステップＳ２０１のシーク処理で対応付けができた人物のデータは、検知リストに登録し続ける。
上記の処理を行った後、検知リストの作成を終了する。
According to the processing flow of FIG. 7, when the detection list creation unit 204 receives the current candidate detection list (the person-candidate detection results) created by the merge unit 203, it first associates the person candidates registered in that list with the persons in the person detection list it currently manages (the one created last time), on the basis of their detection results (step S201). Step S201 is carried out by the seek process of FIG. 8, which shows the subsequence of this step. The seek process searches, starting from the person detection list, for the detection data of the corresponding person (person candidate) in the candidate detection list. Details of the seek process of FIG. 8 are given later.
Next, in the opposite direction to step S201, the process searches, starting from the candidate detection list, for the detection data of the corresponding person in the person detection list, and additionally registers person candidates that are not yet in the person detection list (step S202). Step S202 is carried out by the reverse seek process of FIG. 9, which shows the subsequence of this step. Details of the reverse seek process of FIG. 9 are given later.
Thereafter, data on any person whose detection end counter, an entry of the person detection list being created, has become “0 or less” as a result of steps S201 and S202 is deleted from the person detection list (step S203). The data of persons that were successfully associated in the seek process of step S201 remains registered in the person detection list.
After the above processing, creation of the person detection list ends.
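The three steps of FIG. 7 can be outlined as a small driver function. This is a sketch under stated assumptions: `update_person_list` is a hypothetical name, persons are plain dictionaries with an `end_counter` key, and the `seek` and `reverse_seek` arguments are placeholder callables standing in for the subsequences of FIG. 8 and FIG. 9.

```python
def update_person_list(person_list, candidates, seek, reverse_seek):
    """Run one update round over the managed list of persons (FIG. 7)."""
    # S201: match each managed person against the current candidates.
    seek(person_list, candidates)
    # S202: register candidates that no managed person accounts for.
    reverse_seek(person_list, candidates)
    # S203: delete every person whose end counter is 0 or less.
    person_list[:] = [p for p in person_list if p["end_counter"] > 0]
    return person_list
```

Passing the two subsequences in as callables keeps the outline independent of how matching and tracking are actually carried out.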

ここで、上記した図７の処理フローにおけるステップＳ２０１のシーク処理を、このステップのサブシーケンスを示す図８を参照して詳述する。
このシーク処理は、検知リスト側から検出リスト側に対応する人物（人物候補）の検出データを探しにいく処理である。図１１の例を参照すると、検知リスト４０１〜４０３，４０５それぞれが検出リスト３０６，３０１〜３０３中のいずれかと所定の対応関係を有することを確認する処理である。
図８のフローによると、始めに、検知リスト中の処理対象とする人物の検知位置から予め定めた一定の範囲内に検出リストの人物候補が存在するか否かを、それぞれのリストに登録された検知位置をもとに確認し、存在する場合にその個数を求め、求めた個数が「０個」「１個」「２個以上」によって、それぞれ処理を分岐する（ステップＳ３０１）。 Here, the seek process in step S201 in the process flow of FIG. 7 will be described in detail with reference to FIG. 8 showing a subsequence of this step.
This seek process searches, starting from the person detection list, for the detection data of the corresponding person (person candidate) in the candidate detection list. In the example of FIG. 11, it confirms whether each of the person detection list entries 401 to 403 and 405 has a predetermined correspondence with any of the candidate detection list entries 306 and 301 to 303.
According to the flow of FIG. 8, the process first checks, on the basis of the detection positions registered in the two lists, whether any person candidates in the candidate detection list lie within a predetermined range of the detection position of the person being processed in the person detection list. If so, it counts them, and the processing branches according to whether the count is “0”, “1”, or “2 or more” (step S301).

検出リストの人物候補が存在しない場合（ステップＳ３０１−０個）、検知リスト中の処理対象とする人物のトラッキングを行い、トラッキング結果を得る（ステップＳ３０２）。なお、このステップＳ３０２のトラッキングは、このステップのサブシーケンスを示す図１０のフローによる（図１０のトラッキングの詳細は、後述）。
トラッキング結果を得た後、検知リストで管理している終了カウンタを所定値（−ｄ０）ダウンカウントし（ステップＳ３０３）、処理フローを終了する。
図１１の例を参照すると、上記ステップＳ３０２及びＳ３０３の処理は、検知リスト４０５に対する処理に相当し、この場合、検出リストの人物候補が対応しないので、終了カウンタを−１（：−ｄ０）ダウンカウントしている。なお、終了カウンタのカウント値は、図7でステップＳ２０３の処理として先に述べたように、０以下になると検知リストのエントリから削除され、画面に表示されなくなる。従って、終了カウンタに設定する所定値（−ｄ０）は、人物の検知漏れと誤検知との兼ね合いで適当な重みを付けるようにする。
If no person candidate in the candidate detection list is found (step S301: 0), the person being processed in the person detection list is tracked and a tracking result is obtained (step S302). The tracking in step S302 follows the flow of FIG. 10, which shows the subsequence of this step (details of the tracking of FIG. 10 are given later).
After the tracking result is obtained, the end counter managed in the person detection list is counted down by a predetermined value (−d0) (step S303), and the processing flow ends.
In the example of FIG. 11, steps S302 and S303 correspond to the processing of the person detection list entry 405. Because no person candidate in the candidate detection list corresponds to it, the end counter is counted down by 1 (−d0 = −1). As described above for step S203 of FIG. 7, when the count value of the end counter becomes 0 or less, the entry is deleted from the person detection list and is no longer displayed on the screen. The predetermined value (−d0) applied to the end counter should therefore be weighted appropriately, balancing missed detections of persons against false detections.

検出リストの人物候補が１個だけ存在する場合（ステップＳ３０１−１個）、検出リストに登録されている検出位置を新規に作成する検知リストに登録する検出位置とする（ステップＳ３０４）。
この後、検知リストで管理している検知カウンタを所定値（＋ｄ１）アップカウントし（ステップＳ３０５）、処理フローを終了する。
図１１の例を参照すると、上記ステップＳ３０４及びＳ３０５の処理は、検知リスト４０２及び４０３に対する処理に相当し、この場合、それぞれ検出リストの３０２及び３０３のデータで管理する人物候補が対応するので、検知カウンタを＋２（：＋ｄ１）アップカウントしている。
If exactly one person candidate in the candidate detection list is found (step S301: 1), the detection position registered in the candidate detection list is adopted as the detection position to be registered in the newly created person detection list (step S304).
Thereafter, the detection counter managed in the person detection list is counted up by a predetermined value (+d1) (step S305), and the processing flow ends.
In the example of FIG. 11, steps S304 and S305 correspond to the processing of the person detection list entries 402 and 403. Because the person candidates managed by the candidate detection list entries 302 and 303 correspond to them, respectively, the detection counter is counted up by 2 (+d1 = +2).

検出リストの人物候補が２個以上存在する場合（ステップＳ３０１−２個以上）、検知リスト中の処理対象とする人物のトラッキングを行い、検出リストに２個以上存在する人物候補それぞれに対するトラッキング位置を求め、求めた中の尤もらしい位置（最も近い位置）をトラッキング結果として得る（ステップＳ３０６）。なお、このステップＳ３０６のトラッキングは、このステップのサブシーケンスを示す図１０のフローによる（図１０のトラッキングの詳細は、後述）。
この後、検知リストで管理している検知カウンタを所定値（＋ｄ２）アップカウントし（ステップＳ３０７）、処理フローを終了する。
図１１の例を参照すると、上記ステップＳ３０６及びＳ３０７の処理は、検知リスト４０１に対する処理に相当し、この場合、検出リストの３０１及び３０２のデータで管理する２人の人物候補が対応するが、その中の検出リスト３０１のトラッキング位置が近いので、検出リスト３０１の方を検知結果とし、検知カウンタを＋１（：＋ｄ２）アップカウントしている。
If two or more person candidates in the candidate detection list are found (step S301: 2 or more), the person being processed in the person detection list is tracked, a tracking position is obtained for each of the two or more candidates, and the most plausible (closest) of these positions is taken as the tracking result (step S306). The tracking in step S306 follows the flow of FIG. 10, which shows the subsequence of this step (details of the tracking of FIG. 10 are given later).
Thereafter, the detection counter managed in the person detection list is counted up by a predetermined value (+d2) (step S307), and the processing flow ends.
In the example of FIG. 11, steps S306 and S307 correspond to the processing of the person detection list entry 401. Two person candidates, managed by the candidate detection list entries 301 and 302, correspond to it; because the tracking position for entry 301 is the closer of the two, entry 301 is taken as the detection result and the detection counter is counted up by 1 (+d2 = +1).
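The three-way branch of the seek process (steps S301 to S307) can be sketched for a single managed person as follows. The weights `D0`, `D1`, and `D2` take the values of the FIG. 11 example; everything else is an assumption made for illustration: the name `seek_one`, the dictionary layout, the circular `MATCH_RANGE`, the `track` callable standing in for FIG. 10, updating the position in the zero-candidate branch, and the reading of "closest" as the candidate nearest to the tracked position.

```python
import math

D0, D1, D2 = 1, 2, 1  # counter weights taken from the FIG. 11 example
MATCH_RANGE = 40      # hypothetical matching radius in pixels


def seek_one(person, candidates, track):
    """Update one managed person from the current candidate positions.

    person:     dict with "pos", "detection_counter", "end_counter"
    candidates: list of (x, y) candidate positions
    track:      callable(person) -> (x, y), a stand-in for FIG. 10
    """
    # S301: count the candidates within the predetermined range.
    near = [c for c in candidates if math.dist(person["pos"], c) <= MATCH_RANGE]
    if len(near) == 0:
        # S302, S303: fall back to tracking and count the end counter down.
        person["pos"] = track(person)
        person["end_counter"] -= D0
    elif len(near) == 1:
        # S304, S305: adopt the candidate position and count up by d1.
        person["pos"] = near[0]
        person["detection_counter"] += D1
    else:
        # S306, S307: choose the candidate nearest the tracked position.
        tracked = track(person)
        person["pos"] = min(near, key=lambda c: math.dist(tracked, c))
        person["detection_counter"] += D2
    return near
```

Giving the single-candidate case a larger weight than the ambiguous case, as in the example values, makes an unambiguous match reach the display threshold sooner.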

次いで、上記した図８の処理フローにおけるステップＳ３０２又はＳ３０６のトラッキングを、このステップのサブシーケンスを示す図１０を参照して詳述する。
このトラッキングは、検知タイプが顔検知か動き検知かによってトラッキング領域を異にする。顔検知の場合、図５（Ｂ）に示すように、顔がある領域の下の顔と同じ大きさの領域をトラッキング領域とする。また、動き検知の場合、図５（Ｃ）に示すように、検知された動き（フレーム間差分等）の中心を含む内接矩形をトラッキング領域とする。
顔検知、動き検知いずれの場合も、上記のようにして定められたトラッキング領域におけるヒストグラムを求める。求めるヒストグラムは、ここでは、領域内の画素ごとの色を検出し、色に対する画素数の分布を求めることにより得る。
図１０のフローによると、検知タイプによって処理を分岐するので、先ず、顔検知か動き検知かを確認する（ステップＳ５０１）。 Next, tracking in step S302 or S306 in the processing flow of FIG. 8 will be described in detail with reference to FIG. 10 showing a subsequence of this step.
The tracking region differs depending on whether the detection type is face detection or motion detection. For face detection, as shown in FIG. 5(B), the tracking region is a region of the same size as the face, located below the face region. For motion detection, as shown in FIG. 5(C), the tracking region is the inscribed rectangle containing the center of the detected motion (inter-frame difference, etc.).
In both cases, a histogram is obtained for the tracking region determined as described above. Here, the histogram is obtained by detecting the color of each pixel in the region and computing the distribution of the number of pixels over color.
According to the flow of FIG. 10, the processing branches on the detection type, so the process first checks whether the type is face detection or motion detection (step S501).

検知タイプが顔の場合、顔がある領域の下の顔と同じ大きさの領域をトラッキング領域として設定し（ステップＳ５０２）、他方、検知タイプが動きの場合、動きの中心を含む内接矩形をトラッキング領域として設定して（ステップＳ５０３）、それぞれ設定した領域の画像を検知リストがもとにした撮影画像から取り出して、この領域の画素値のヒストグラム（本実施形態では画素値を色値とするので、色に対する画素数の分布）を求める（ステップＳ５０４）。
次いで、検出リストがもとにした撮影画像に対してトラッキングを実行する（ステップＳ５０５）。このトラッキングは、ステップＳ５０４でヒストグラムを求めた時の画像上の位置の周囲の一定範囲をトラッキング対象範囲とし、かつステップＳ５０４でヒストグラムを求めた時と同様の方法で検出リストがもとにした撮影画像を処理し、ヒストグラムを求める。
ステップＳ５０５では、トラッキング対象範囲内を走査してヒストグラムを求める。また、このトラッキングは、所定数のフレームについて各フレームで繰り返し実行される（ステップＳ５０５〜Ｓ５０７）。
トラッキングの結果は、ステップＳ５０５〜Ｓ５０７で得た検出リストのヒストグラム群の中からステップＳ５０４で得た検出リストのヒストグラムに最も近いヒストグラムを求め、そのヒストグラムのもとになった領域として得る（ステップＳ５０６）。
上記のようにして、トラッキングの結果として得た領域を、上記したシーク処理（図８）のステップＳ３０２及びステップＳ３０６のトラッキング位置に用いる。
If the detection type is face, a region of the same size as the face, below the face region, is set as the tracking region (step S502); if the detection type is motion, the inscribed rectangle containing the center of the motion is set as the tracking region (step S503). In either case, the image of the set region is extracted from the captured image on which the person detection list was based, and a histogram of the pixel values in this region is obtained (in this embodiment the pixel value is the color value, so this is the distribution of the number of pixels over color) (step S504).
Next, tracking is performed on the captured image on which the candidate detection list was based (step S505). The tracking target range is a certain range around the position on the image at which the histogram was obtained in step S504, and the captured image underlying the candidate detection list is processed by the same method as in step S504 to obtain histograms.
In step S505, the tracking target range is scanned to obtain the histograms. This tracking is repeated in each frame for a predetermined number of frames (steps S505 to S507).
As the tracking result, the histogram closest to the histogram obtained in step S504 is found among the group of histograms obtained in steps S505 to S507, and the region from which that histogram was computed is taken as the result (step S506).
The region obtained as the tracking result in this way is used as the tracking position in steps S302 and S306 of the seek process (FIG. 8) described above.
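The histogram-based tracking of FIG. 10 can be illustrated with a small single-frame sketch. It is not the implementation of the embodiment: the image is simplified to a nested list of color indices, the function names are hypothetical, and the L1 distance is an assumed measure of which histogram is "closest", since the description does not name one.

```python
from collections import Counter


def region_histogram(image, x, y, w, h):
    """Count pixels per color value in the w-by-h region whose top-left is (x, y)."""
    return Counter(image[r][c] for r in range(y, y + h) for c in range(x, x + w))


def histogram_distance(h1, h2):
    """L1 distance between two histograms (assumed similarity measure)."""
    return sum(abs(h1[k] - h2[k]) for k in set(h1) | set(h2))


def track(image, ref_hist, x0, y0, w, h, search):
    """Scan windows within +/-search pixels of (x0, y0) and return the
    top-left corner of the window whose histogram is closest to ref_hist."""
    rows, cols = len(image), len(image[0])
    best = None
    for y in range(max(0, y0 - search), min(rows - h, y0 + search) + 1):
        for x in range(max(0, x0 - search), min(cols - w, x0 + search) + 1):
            d = histogram_distance(ref_hist, region_histogram(image, x, y, w, h))
            if best is None or d < best[0]:
                best = (d, x, y)
    return best[1], best[2]
```

Here `ref_hist` plays the role of the histogram obtained in step S504, and the scan corresponds to step S505 for one frame; repeating it over several frames, as in steps S505 to S507, would call `track` once per frame.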

ここで、上記した図７の処理フローにおけるステップＳ２０２の逆シーク処理を、このステップのサブシーケンスを示す図９を参照して詳述する。
この逆シーク処理は、検出リスト側から検知リスト側に対応する人物の検出データを探しにいく処理である。図１１の例を参照すると検出リスト３０６，３０１〜３０３それぞれが、検知リスト４０１〜４０３，４０５中のいずれかと所定の対応関係を有することを確認する処理である。ただ、この実施形態では、図８を参照して説明したように、既にシーク処理を行っているので、現在検知リストにない人物候補を探し、新規にリストへの追加登録を行うために行う。 Here, the reverse seek process in step S202 in the process flow of FIG. 7 will be described in detail with reference to FIG. 9 showing a subsequence of this step.
This reverse seek process searches, starting from the candidate detection list, for the detection data of the corresponding person in the person detection list. In the example of FIG. 11, it confirms whether each of the candidate detection list entries 306 and 301 to 303 has a predetermined correspondence with any of the person detection list entries 401 to 403 and 405. In this embodiment, however, the seek process has already been performed as described with reference to FIG. 8, so the reverse seek is performed only to find person candidates that are not yet in the person detection list and to register them newly in that list.

図９のフローによると、検知リストにない人物候補が対象になるので、先ず、検出リスト中の人物候補で検知リストに対応しなかったものを探す（ステップＳ４０１）。
次に、ステップＳ４０１で得られる人物候補の検出位置が、全検知リストに登録されているトラッキング対象範囲の外側にあるか否かを確認する（ステップＳ４０２）。なお、この実施形態では、トラッキング対象範囲内にあれば、シーク処理の対象になることを前提にしている。
ステップＳ４０２でトラッキング対象範囲の外側にあれば（ステップＳ４０２-YES）、この人物候補がどの検知リストにもない人物であるとみなし、検知リストで管理している検知カウンタを所定値（＋ｄ３）アップカウントするとともに、終了カウンタを所定値（＋ｄ４）アップカウントする（ステップＳ４０３）。
図１１の例を参照すると、上記ステップＳ４０３の処理は、検出リスト３０６に対する処理に相当し、この場合、検知リストに対応する人物がないので、新規に検知リスト４０６を追加し、検知リストに登録する検知カウンタを＋１（：＋ｄ３）アップカウントし、かつ終了カウンタを＋３（：＋ｄ４）アップカウントしている。
他方、ステップＳ４０２でトラッキング対象範囲内にあれば（ステップＳ４０２-NO）、逆シークの対象外であるから、ステップＳ４０３の処理をパスする。
ステップＳ４０１〜Ｓ４０３は、全検出リスト中の人物候補で全検知リストに対応しなかったもの全部が終了するまで、１つずつ繰り返し行う。
全検出リスト中の全人物候補の逆シーク処理を済ませたことを確認した後、この処理フローを終了する。
According to the flow of FIG. 9, the targets are person candidates that are not in the person detection list, so the process first looks for a person candidate in the candidate detection list that was not matched to the person detection list (step S401).
Next, it checks whether the detection position of the person candidate obtained in step S401 lies outside the tracking target ranges registered in all entries of the person detection list (step S402). This embodiment assumes that a candidate inside a tracking target range is handled by the seek process.
If the candidate is outside the tracking target ranges in step S402 (step S402: YES), it is regarded as a person who is in none of the person detection list entries; the detection counter managed in the person detection list is counted up by a predetermined value (+d3), and the end counter is counted up by a predetermined value (+d4) (step S403).
In the example of FIG. 11, step S403 corresponds to the processing of the candidate detection list entry 306. Because no person in the person detection list corresponds to it, a new entry 406 is added to the person detection list, the detection counter registered in it is counted up by 1 (+d3 = +1), and the end counter is counted up by 3 (+d4 = +3).
If, on the other hand, the candidate is inside a tracking target range in step S402 (step S402: NO), it is outside the scope of the reverse seek, and step S403 is skipped.
Steps S401 to S403 are repeated one candidate at a time until all person candidates in the candidate detection list that were not matched to the person detection list have been processed.
After confirming that the reverse seek process has been completed for all person candidates in the candidate detection list, this processing flow ends.
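The reverse seek of FIG. 9 can be sketched as below. The weights `D3` and `D4` follow the FIG. 11 example; the function name, the dictionary layout, the (left, top, right, bottom) form of the tracking range, and the `MARGIN` used to give a new person a tracking range are assumptions made for illustration only.

```python
D3, D4 = 1, 3  # counter weights taken from the FIG. 11 example
MARGIN = 40    # hypothetical half-size of a new person's tracking range


def reverse_seek(person_list, unmatched_candidates):
    """Register unmatched candidates that lie outside every tracking range."""
    # Only the ranges of persons already managed are examined.
    ranges = [p["tracking_range"] for p in person_list]
    for cx, cy in unmatched_candidates:  # S401: one unmatched candidate at a time
        inside = any(x0 <= cx <= x1 and y0 <= cy <= y1
                     for (x0, y0, x1, y1) in ranges)
        if inside:
            continue  # S402: NO, left to the seek process
        # S402: YES, then S403: add a new person with counters d3 and d4.
        person_list.append({
            "pos": (cx, cy),
            "tracking_range": (cx - MARGIN, cy - MARGIN, cx + MARGIN, cy + MARGIN),
            "detection_counter": D3,
            "end_counter": D4,
        })
    return person_list
```

Starting the end counter above the detection counter, as in the example values, lets a newly added person survive a few missed detections before being deleted.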

“表示用データの処理”
画像切り出し部２０５は、検知リスト作成部２０４から受け取る検知リストをもとに検知した人物が納まる表示対象領域を撮影映像から切り出す。
検知リスト作成部２０４から受け取る新しい検知リストには、“検知リスト作成の処理フロー”で図１１を参照して説明したように、シーク処理（図８、参照）の結果として前回の検知リストから引き続き捉えられる人物及び逆シーク処理（図９、参照）の結果として新たに捉えられた人物が載せられ、これらが表示の対象になる。
新たに作成された検知リストに載った各人物が納まる領域を撮影映像から切り出す際、検知リストに載った各人物のエントリにある検知カウンタのカウント値が予め定めた一定の値（閾値）以上であることを条件に表示対象として、その領域の画像を切り出す。
なお、検知カウンタのカウント値は、シーク処理（図８）のステップＳ３０５及びＳ３０７、並びに逆シーク処理（図９）のＳ４０３で述べたように、それぞれの人物の検知状態によって加算する所定値（ｄ１〜ｄ３）を変えることを可能にしている．よって、表示対象とするか否かの判断を左右する検知カウンタに設定する所定値（ｄ１〜ｄ３）は、人物の検知漏れと誤検知との兼ね合いで適当な重みを付けるようにする。また、同様の観点で上記閾値も適当な値を設定することが必要になる。
“Processing of display data”
The image cutout unit 205 cuts out from the captured video the display target regions that contain the detected persons, on the basis of the person detection list received from the detection list creation unit 204.
As described with reference to FIG. 11 in the explanation of the list creation processing flow, the new person detection list received from the detection list creation unit 204 contains the persons who continue to be captured from the previous list as a result of the seek process (see FIG. 8) and the persons newly captured as a result of the reverse seek process (see FIG. 9); these are the display targets.
When the region containing each person in the newly created person detection list is cut out from the captured video, a person is treated as a display target, and the image of that region is cut out, only if the count value of the detection counter in that person's entry is greater than or equal to a predetermined value (threshold).
As described for steps S305 and S307 of the seek process (FIG. 8) and step S403 of the reverse seek process (FIG. 9), the predetermined value added to the detection counter (d1 to d3) can be varied according to the detection state of each person. The predetermined values (d1 to d3), which govern whether a person becomes a display target, should therefore be weighted appropriately, balancing missed detections of persons against false detections. From the same viewpoint, the threshold must also be set to an appropriate value.

画像切り出し部２０５及び画像表示部２０６が行う表示用データの作成処理を図１２に示すフロー図にもとづいて詳細に説明する。
図１２の処理フローによると、先ず、新たに作成された検知リストにある個々の人物の切り出し範囲をそれぞれ求める（ステップＳ６０１）。この切り出し範囲の求め方は、検知リストにある人物の中で、検知カウンタのカウント値が予め定めた一定値以上のエントリを持つ人物の検知リストを個々に取り出し、各人物の検知リストのエントリにある検知位置を中心にこの人物が納まる領域を切り出し範囲としてそれぞれ求める。
次に、同検知リストをもとに人物全員が納まる領域の切り出し範囲を求める（ステップＳ６０２）。この切り出し範囲の求め方は、検知リストにある人物の中で、検知カウンタのカウント値が予め定めた一定値以上のエントリを持つ人物全員の検知リストを取り出し、各人物の検知リストのエントリにある検知位置を中心にこの人物が納まる領域を求め、さらに求めた各領域を包含する最小の矩形に一定の余白を設けた領域を切り出し範囲として求める。
なお、この処理フローでは、個人と全員の切り出し範囲を求めたが、ステップＳ６０２と同様の方法によって、２人以上の任意の人数の切り出し範囲を求めるようにしてもよい。
The process by which the image cutout unit 205 and the image display unit 206 create display data will be described in detail with reference to the flowchart shown in FIG. 12.
According to the processing flow of FIG. 12, the cutout range of each individual person in the newly created person detection list is obtained first (step S601). To obtain these ranges, the entries of persons whose detection counter is greater than or equal to a predetermined value are taken out of the person detection list one by one, and for each such person the region that contains the person, centered on the detection position in that person's entry, is obtained as the cutout range.
Next, the cutout range of a region that contains all the persons is obtained from the same person detection list (step S602). To obtain this range, the entries of all persons whose detection counter is greater than or equal to the predetermined value are taken out, the region containing each person, centered on the detection position in that person's entry, is obtained, and the smallest rectangle enclosing all of these regions, enlarged by a fixed margin, is taken as the cutout range.
Although this processing flow obtains the cutout ranges for individuals and for everyone, the cutout range for any number of persons, two or more, may be obtained by the same method as in step S602.
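Steps S601 and S602 can be sketched as two small functions. The values of `THRESHOLD` and `BODY_SCALE`, the function names, the dictionary layout, and the clamping of the group rectangle to the frame are assumptions for illustration; the description only requires a region that contains each person and a smallest enclosing rectangle with a fixed margin.

```python
THRESHOLD = 3   # hypothetical display threshold for the detection counter
BODY_SCALE = 3  # hypothetical: a person's box is 3x the detection size


def individual_ranges(person_list):
    """S601: one box (left, top, right, bottom) per person at or above the threshold."""
    boxes = []
    for p in person_list:
        if p["detection_counter"] < THRESHOLD:
            continue  # not yet a display target
        (cx, cy), (w, h) = p["pos"], p["size"]
        half_w, half_h = w * BODY_SCALE // 2, h * BODY_SCALE // 2
        boxes.append((cx - half_w, cy - half_h, cx + half_w, cy + half_h))
    return boxes


def group_range(boxes, margin, frame_w, frame_h):
    """S602: smallest rectangle enclosing all boxes, enlarged by a fixed margin
    and clamped to the frame."""
    left = max(0, min(b[0] for b in boxes) - margin)
    top = max(0, min(b[1] for b in boxes) - margin)
    right = min(frame_w, max(b[2] for b in boxes) + margin)
    bottom = min(frame_h, max(b[3] for b in boxes) + margin)
    return left, top, right, bottom
```

Passing any subset of the boxes from `individual_ranges` to `group_range` gives the cutout range for that group of two or more persons.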

次に、ステップＳ６０１及びＳ６０２で求めた撮影画像領域の切り出し範囲にもとづいて画像を切り出し、画面表示を行う（ステップＳ６０３）。
ここで行う画像の切り出しにおいては、少なくともステップＳ６０１及びＳ６０２で求めた切り出し範囲を含む領域の画像が、表示する画面のサイズとの関係で適切な画像として納まるように領域の切り出し範囲を調整する。
また、画像表示部２０６は、表示対象領域の撮影画像（フルサイズの映像及び切り出された画像）を表示画面に用いる画像とする処理を行う。この実施形態では、画面選択指示部２０７の指示に従い選択された画面を表示する主画面１１０に用いる画面もしくは図１の選択用画面１２０に用いる画像サムネイルに用いる画面に合わせた画像表示を行うための倍率変換等の処理を施す。また、図３を参照して説明した画面選択に用いる表示要素を埋め込む処理を行い、同図の選択用画面を作成する表示用データ処理を行う。 Next, an image is cut out based on the cutout range of the photographed image area obtained in steps S601 and S602, and screen display is performed (step S603).
In the image cutout performed here, the region cutout range is adjusted so that at least the image of the region including the cutout range obtained in steps S601 and S602 is stored as an appropriate image in relation to the size of the screen to be displayed.
In addition, the image display unit 206 processes the captured images of the display target regions (the full-size video and the cut-out images) into images for use on the display screen. In this embodiment, it applies processing such as magnification conversion so that each image fits either the main screen 110, which displays the screen selected according to the instruction of the screen selection instruction unit 207, or the thumbnails used in the selection screen 120 of FIG. 1. It also embeds the display elements used for screen selection described with reference to FIG. 3, performing the display data processing that creates the selection screen shown in that figure.
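One possible way to adjust a cutout range to the display and derive the magnification is sketched below. The description does not specify the adjustment rule, so the symmetric expansion to the screen aspect ratio, the function name `fit_to_aspect`, and the returned scale factor are all assumptions.

```python
def fit_to_aspect(box, screen_w, screen_h):
    """Expand box (left, top, right, bottom) symmetrically until its aspect
    ratio matches the screen, then return the box and the scale factor."""
    left, top, right, bottom = box
    w, h = right - left, bottom - top
    target = screen_w / screen_h
    if w / h < target:
        # Box is too narrow for the screen: widen it around its center.
        new_w = h * target
        cx = (left + right) / 2
        left, right = cx - new_w / 2, cx + new_w / 2
    else:
        # Box is too flat for the screen: heighten it around its center.
        new_h = w / target
        cy = (top + bottom) / 2
        top, bottom = cy - new_h / 2, cy + new_h / 2
    scale = screen_w / (right - left)  # magnification applied for display
    return (left, top, right, bottom), scale
```

Expanding rather than cropping keeps the whole person-containing region inside the displayed image; a real implementation would also need to clamp the expanded box to the captured frame.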

１１０・・主画面、１２０・・選択用画面、２０１・・顔検知部、２０２・・動き検知部、２０３・・マージ部、２０４・・検知リスト作成部、２０５・・画像切り出し部、２０６・・画像表示部、２０７・・画面選択指示部、２１０・・操作手段。
110 ·· Main screen, 120 ·· Selection screen, 201 ·· Face detection unit, 202 ·· Motion detection unit, 203 ·· Merge unit, 204 ·· Detection list creation unit, 205 ·· Image cutout unit, 206 ·· Image display unit, 207 ·· Screen selection instruction unit, 210 ·· Operation means.

特開平１０−５１７５５号公報 JP-A-10-51755

Claims

A person detecting means for detecting a person candidate from a video shot by a video camera at an angle of view where a plurality of persons enter;
Display target determination means for determining a person to be displayed based on a detection result of a person candidate detected by the person detection means from a time-series captured video;
Area classifying means for classifying an area on the captured video in which the person determined by the display target determining means is stored;
Video cutout means for cutting out the video of the area classified by the area classifying means from the captured video;
Image display means for displaying an image on the screen based on the video;
Display data processing means for processing all of the videos cut out by the video cutout means into thumbnails for use in the selection screen of the image display means, and for processing a video cut out by the video cutout means into an image for use in the main screen of the image display means;
Area selection means for selecting a part of the area by an operation on the thumbnail of the selection screen of the image display means;
Display control means for controlling display on the image display means using data processed by the display data processing means on the basis of the cut-out video of the area selected by the area selection means; a video display device comprising the foregoing means.

The video display device according to claim 1,
wherein the area classifying means classifies an area in which all persons are accommodated and areas in which individual persons are accommodated.

The video display device according to claim 1 or 2,
wherein the display data processing means further adds the entire captured video as data for use in the display area for screen selection, and processes the entire captured video as data for use in the main screen.

The video display device according to any one of claims 1 to 3,
wherein the person detection means detects a face and a motion from the captured video and detects a person candidate based on the detection results.

The video display device according to claim 4,
wherein the person detection means discards the motion detection result when the detected motion is within a certain range around the detected face, and provides the face detection results together with the motion detection results that were not discarded as the person-candidate detection result for each unit of captured video.

The video display device according to any one of claims 1 to 5,
wherein the display target determination means determines, according to a predetermined determination condition based on both detection results, whether the person candidate obtained as the current detection result and the person previously determined as a display target can be regarded as the same person.

The video display device according to claim 6,
wherein the display target determination means uses, as the predetermined determination condition, the condition that the two are within a predetermined range of each other.

The video display device according to any one of claims 1 to 7,
wherein the display target determination means comprises means for managing the detection data of the person determined as a display target last time and the detection data of the person candidate detected by the person detection means from the current captured video, each used as determination reference data, in a form in which data common to both is registered under the same entries.

A program for causing a computer to function as each of the person detection means, display target determination means, area classifying means, video cutout means, display data processing means, area selection means, and display control means of the video display device according to claim 1.

A computer-readable recording medium on which the program according to claim 9 is recorded.