JP2022054247A

JP2022054247A - Information processing device and information processing method

Info

Publication number: JP2022054247A
Application number: JP2020161325A
Authority: JP
Inventors: 寛高木; Hiroshi Takagi; 裕輔御手洗; Hirosuke Mitarai; 俊太舘; Shunta Tachi
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-09-25
Filing date: 2020-09-25
Publication date: 2022-04-06

Abstract

To provide a technique for enabling a user to select the desired subject among subjects in a captured image.

SOLUTION: An information processing device acquires an image captured by an imaging unit. The information processing device selects one or more subjects as selected subjects on the basis of the movement of the subject in the acquired captured image and information pertaining to the field of view of the imaging unit or information pertaining to a viewing operation on the captured image.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for selecting a subject in an image.

Techniques that extract a specific subject image from images supplied in time series and track the subject are used, for example, to identify human face regions and human body regions in moving images. Subject tracking can be applied in many fields, such as teleconferencing, man-machine interfaces, security, monitoring systems for tracking arbitrary subjects, and image compression.

For digital still cameras and digital video cameras, techniques have been proposed that extract and track a subject image designated in an image by an operation on a touch panel or the like, and that optimize the focus state and exposure state for that subject.

Patent Document 1 discloses an imaging device that calculates motion vectors of subjects in an image and, among the subjects whose movement amount is equal to or greater than a predetermined value, selects the subject with the highest contrast as the tracking target.

Patent Document 1: Japanese Patent No. 5589527

Non-Patent Document 1: Dalal and Triggs, "Histograms of Oriented Gradients for Human Detection," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005.
Non-Patent Document 2: B. K. P. Horn and B. G. Schunck, "Determining Optical Flow," Artificial Intelligence, vol. 17, pp. 185-203, 1981.
Non-Patent Document 3: Philipp Fischer, et al., "FlowNet: Learning Optical Flow with Convolutional Networks," 2015.
Non-Patent Document 4: Joseph Redmon, et al., "You Only Look Once: Unified, Real-Time Object Detection," 2015.

The subject selection method disclosed in Patent Document 1 selects a subject based on its movement amount and contrast. However, when there are a plurality of subjects and the subject that the photographer wants to select has a small movement amount or does not have the highest contrast, the selection accuracy decreases. The present invention provides a technique for enabling a user to select a desired subject from among the subjects in a captured image.

One aspect of the present invention is characterized by comprising: acquisition means for acquiring a captured image captured by an imaging unit; and selection means for selecting one or more subjects as selected subjects based on the movement of a subject in the captured image acquired by the acquisition means and on information relating to the field of view of the imaging unit or to a viewing operation on the captured image.

According to the present invention, it is possible to provide a technique for enabling a user to select a desired subject from among the subjects in a captured image.

FIG. 1: (a) is a block diagram showing an example hardware configuration of a computer device, and (b) is a block diagram showing an example system configuration.
FIG. 2: (a) is a flowchart showing the operation of the information processing device 100, (b) is a flowchart of the processing for updating (learning) the operating parameters of the attention level calculation unit 125, and (c) is a diagram showing an example of a sequence of captured images.
FIG. 3: (a) is a diagram showing a specific example of the NN model, (b) is a diagram showing a configuration in which the functions of the information acquisition unit 122, the detection unit 123, the motion calculation unit 124, and the attention level calculation unit 125 are implemented using CNNs, and (c) is a diagram showing a display example of a captured image.
FIG. 4: (a) is a diagram showing an example of the appearance of an imaging device, and (b) is a diagram showing a display example on the display unit 451.
FIG. 5: (a) is a block diagram showing an example functional configuration of a system, (b) is a flowchart showing the operation of the information processing device 500, and (c) is a diagram showing an example of a captured image.
FIG. 6: (a) is a block diagram showing an example functional configuration of a system, (b) is a diagram showing a display example of the partial image 650 on the display screen of the operation unit 601, and (c) is a diagram showing a display example of the enlarged image 660.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. The following embodiments do not limit the invention according to the claims. Although a plurality of features are described in the embodiments, not all of these features are essential to the invention, and the features may be combined arbitrarily. Further, in the accompanying drawings, the same or similar configurations are given the same reference numbers, and duplicate descriptions are omitted.

[First Embodiment]
First, a configuration example of the system according to the present embodiment will be described with reference to the block diagram of FIG. 1(b). As shown in FIG. 1(b), the system according to the present embodiment has an imaging unit 150 and an information processing device 100, which are configured to exchange data with each other via a wired and/or wireless network. Here, the imaging unit 150 is assumed to be a network camera.

First, the imaging unit 150 will be described. The imaging unit 150 captures a moving image and outputs the image of each frame of the moving image as a captured image. The imaging unit 150 may instead be a device that captures still images periodically or irregularly and outputs each still image as a captured image. In either case, the imaging unit 150 outputs a sequence of captured images. Under control from the information processing device 100, the imaging unit 150 can change its imaging direction in the pan direction and/or the tilt direction. The imaging unit 150 is thus an imaging device whose field of view can be changed. In FIG. 1(b) the imaging unit 150 is a device separate from the information processing device 100, but the two may be integrated to form a single information processing device with an imaging function.

Next, the information processing device 100 will be described. The image acquisition unit 121 acquires the captured images output from the imaging unit 150. The information acquisition unit 122 acquires, as field-of-view control information, information on the field of view of the imaging unit 150 at the time of imaging. The detection unit 123 detects subjects (people, in this embodiment) in the captured image acquired by the image acquisition unit 121. The motion calculation unit 124 obtains the motion, in the captured image, of each subject detected by the detection unit 123. The attention level calculation unit 125 calculates a degree of attention for each subject based on the field-of-view control information acquired by the information acquisition unit 122 and the motion of that subject in the captured image calculated by the motion calculation unit 124. The learning unit 127 updates (learns) the operating parameters of the attention level calculation unit 125. The selection unit 126 selects one or more subjects from among the subjects detected by the detection unit 123, based on the degree of attention calculated for each of them by the attention level calculation unit 125.

Next, an example of the sequence of captured images acquired by the image acquisition unit 121 will be described with reference to FIG. 2(c). The captured image 250 is a captured image captured by the imaging unit 150 at time t1, and it includes a subject 251 and a subject 252, both of whom are walking. (x11, y11) is the centroid position (image coordinates) of the subject 251 in the captured image 250, and (x21, y21) is the centroid position (image coordinates) of the subject 252 in the captured image 250.

The captured image 260 is a captured image that the imaging unit 150 captured at time t2 (a time later than time t1) after capturing the captured image 250 and changing its imaging direction (that is, after the imaging unit 150 was rotated). The captured image 260 also includes the walking subject 251 and subject 252. (x12, y12) is the centroid position (image coordinates) of the subject 251 in the captured image 260, and (x22, y22) is the centroid position (image coordinates) of the subject 252 in the captured image 260.

Next, the operation of the information processing device 100 will be described according to the flowchart of FIG. 2(a).

<Step S201>
The image acquisition unit 121 acquires a captured image output from the imaging unit 150. The image acquisition unit 121 is not limited to acquiring captured images directly from the imaging unit 150. For example, it may acquire a captured image from a storage device that holds a group of captured images captured by the imaging unit 150.

<Step S202>
The information acquisition unit 122 acquires the rotation amount of the imaging unit 150 at the time the captured image acquired in step S201 was captured, and uses it as the field-of-view control information of the imaging unit 150 for that captured image. The rotation amount of the imaging unit 150 can be calculated from the angular velocity of the imaging unit 150 measured by a gyro sensor provided in the imaging unit 150. Letting w(t) be the angular velocity of the imaging unit 150 at time t, the rotation amount Δθ of the imaging unit 150 between time t1 and time t2 can be calculated by the following equation.

Δθ = ∫_{t1}^{t2} w(t) dt

The information acquisition unit 122 therefore obtains the rotation amount of the imaging unit 150 at the time of capturing the captured image acquired in step S201 by evaluating the above equation with the angular velocity at each time output from the gyro sensor of the imaging unit 150.
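As a rough illustration of the rotation-amount calculation in step S202, the sketch below integrates sampled angular velocities over time with the trapezoidal rule. The function name `rotation_amount`, the sample timestamps, and the choice of the trapezoidal rule are assumptions made for illustration; the description only states that the rotation amount between time t1 and time t2 is computed from the angular velocity w(t) reported by the gyro sensor.

```python
def rotation_amount(times, angular_velocities):
    """Approximate the integral of w(t) from times[0] to times[-1].

    times: sample timestamps in seconds, in increasing order (hypothetical input).
    angular_velocities: gyro readings w(t) in rad/s at those timestamps.
    """
    if len(times) != len(angular_velocities):
        raise ValueError("times and angular_velocities must have equal length")
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        # Trapezoidal rule: average of two neighbouring readings times the interval.
        total += 0.5 * (angular_velocities[i] + angular_velocities[i - 1]) * dt
    return total


# Example: a constant 0.2 rad/s pan sampled every 0.1 s between t1 = 0.0 and t2 = 0.5.
sample_times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
sample_w = [0.2] * len(sample_times)
delta_theta = rotation_amount(sample_times, sample_w)
```

A real gyro reports three axes; the same integration would be applied per axis (for example, pan and tilt separately).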

The field-of-view control information of the imaging unit 150 is not limited to its rotation amount. For example, it may be a measurement result from an acceleration sensor of the imaging unit 150, a measurement result from a distance measurement sensor of the imaging unit 150, aperture information of the imaging unit 150, or the pan and tilt of the pan head on which the imaging unit 150 is mounted. When the imaging unit 150 is a network camera (a camera that can be remotely operated from the information processing device 100 via a network), the field-of-view control information may be the pan, tilt, and zoom control information of the imaging unit 150.

The information acquisition unit 122 may also acquire the field-of-view control information attached to the captured image acquired in step S201 and treat it as the field-of-view control information of the imaging unit 150 at the time that image was captured. Thus, the method by which the information acquisition unit 122 acquires the field-of-view control information is not limited to any specific method, and the information usable as field-of-view control information is not limited to the examples above.

<Step S203>
The detection unit 123 detects subjects in the captured image acquired in step S201. One method of detecting a subject in a captured image is the method described in Non-Patent Document 1. In that method, Histograms of Oriented Gradients (HOG) features are extracted from the image, and a model trained with a support vector machine on such features is used to classify each region as a person or not. The method for detecting a subject in a captured image is not limited to the one disclosed in Non-Patent Document 1. For example, the extracted features are not limited to HOG features; Haar-like features, LBPH (Local Binary Pattern Histogram) features, or the like may be used, alone or in combination. Likewise, the model for identifying a person is not limited to a support vector machine; an AdaBoost classifier, randomized trees, or the like may be used.

For each subject detected in the captured image, the detection unit 123 then obtains position information of the subject's image region (subject region) and a likelihood for that subject region. When the subject region is rectangular, its position information may be the image coordinates of its four corners or the image coordinates of two diagonally opposite vertices. Alternatively, regardless of the shape of the subject region, its position information may be the centroid position (image coordinates) of the region. The likelihood for a subject region represents the degree of agreement between the features extracted from the region and the model used to identify the subject (that is, the confidence that the region is a person). The detection unit 123 then treats as candidate subjects those subjects whose subject regions have a likelihood equal to or greater than a predetermined value.
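The candidate-subject filtering described for step S203 can be sketched as follows. The `Detection` class, the box layout `(x_min, y_min, x_max, y_max)`, and the default threshold of 0.5 are hypothetical choices for illustration; the description only requires that subjects whose likelihood is at least a predetermined value become candidate subjects, and that a region's position may be given by corner coordinates or by its centroid.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One detected subject region (hypothetical container)."""
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in image coordinates
    likelihood: float                       # confidence that the region is a person


def centroid(box: Tuple[float, float, float, float]) -> Tuple[float, float]:
    """Centroid of a rectangular subject region given two opposite corners."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def candidate_subjects(detections: List[Detection], min_likelihood: float = 0.5) -> List[Detection]:
    """Keep only detections whose likelihood is at least the predetermined value."""
    return [d for d in detections if d.likelihood >= min_likelihood]


detections = [
    Detection(box=(0.0, 0.0, 10.0, 20.0), likelihood=0.9),
    Detection(box=(5.0, 5.0, 15.0, 15.0), likelihood=0.3),
]
candidates = candidate_subjects(detections)
```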

<Step S204>
The motion calculation unit 124 calculates, for each candidate subject, the motion of that candidate subject in the captured image. This motion is obtained, for example, as the displacement of the candidate subject's centroid position between captured images. Taking FIG. 2(c) as an example, the motion of the subject 251 between the captured image 250 and the captured image 260 can be obtained with the following equation.

V(x, y) = ((x12 - x11) / (t2 - t1), (y12 - y11) / (t2 - t1))

V(x, y) is a vector (motion vector) representing the motion of the subject 251 between the captured image 250 and the captured image 260. The method of obtaining the motion of a subject in a captured image is not limited to any specific method. For example, a per-pixel velocity vector may be obtained using optical flow or the like, as disclosed in Non-Patent Document 2.
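The motion vector of step S204 is the centroid displacement divided by the elapsed time. A minimal sketch follows; the function name `motion_vector` and the numeric centroid positions are made up for illustration.

```python
def motion_vector(pos_t1, pos_t2, t1, t2):
    """Return V = ((x12 - x11) / (t2 - t1), (y12 - y11) / (t2 - t1)).

    pos_t1, pos_t2: centroid positions (x, y) of the same subject at times t1 and t2.
    """
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    dt = t2 - t1
    return ((pos_t2[0] - pos_t1[0]) / dt, (pos_t2[1] - pos_t1[1]) / dt)


# Example with hypothetical centroids for subject 251 in images 250 (t1) and 260 (t2).
v_251 = motion_vector((100.0, 200.0), (130.0, 190.0), t1=0.0, t2=0.5)
```

Note that this is motion in image coordinates, so it mixes the subject's own movement with the apparent movement caused by rotating the imaging unit; that is why the field-of-view control information is supplied alongside it in step S205.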

<Step S205>
The attention level calculation unit 125 calculates a degree of attention for each candidate subject based on the field-of-view control information acquired in step S202 and the motion calculated for that candidate subject in step S204.

For example, the attention level calculation unit 125 uses a trained model obtained in advance by machine learning. The trained model can be configured, for example, as an NN (Neural Network) model. In this case, an NN model is trained in advance so that, given the motion of a candidate subject and the field-of-view control information of the imaging unit 150 as inputs, it outputs the degree of attention for that candidate subject. The attention level calculation unit 125 then inputs the field-of-view control information acquired in step S202 and the motion calculated for each candidate subject in step S204 into the trained model, and takes the output of the trained model as the degree of attention for that candidate subject.

A specific example of the NN model used as the trained model will now be described with reference to FIG. 3(a). The NN model of FIG. 3(a) has three fully connected layers, namely an input layer 302, an intermediate layer 303, and an output layer 304, and each layer has a plurality of distinct neurons 305. The rotation amount of the imaging unit 150 and the motion of a candidate subject are input to the neurons of the input layer 302. This information is propagated through the intermediate layer 303 to the output layer 304, and the output layer 304 outputs the degree of attention for the candidate subject (estimated degree of attention 306). Although FIG. 3(a) shows only one intermediate layer, there may be a plurality of them. The NN model may also have no intermediate layer and consist of only an input layer and an output layer.
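To make the data flow of FIG. 3(a) concrete, here is a minimal sketch of a forward pass through a small fully connected network. Everything specific in it is an assumption: the three-element input (rotation amount plus the two motion components), the two hidden neurons, the tanh and sigmoid activations, and the placeholder weights. The description does not specify layer sizes, activations, or parameter values, and in this sketch the input layer is simply the input vector.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ inputs + b), written with plain lists."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def estimate_attention(rotation, motion, params):
    """Map (rotation amount, motion vector) to an estimated degree of attention in (0, 1)."""
    x = [rotation, motion[0], motion[1]]                                      # input layer
    hidden = dense(x, params["w_hidden"], params["b_hidden"], math.tanh)      # intermediate layer
    (attention,) = dense(hidden, params["w_out"], params["b_out"], sigmoid)   # output layer
    return attention


# Placeholder parameters; a trained model would supply learned values.
params = {
    "w_hidden": [[0.5, 0.01, 0.01], [-0.3, 0.02, -0.01]],
    "b_hidden": [0.0, 0.0],
    "w_out": [[0.8, -0.4]],
    "b_out": [0.1],
}
attention = estimate_attention(rotation=0.05, motion=(60.0, -20.0), params=params)
```

The sigmoid output is a convenient choice here because the correct degree of attention used during learning is set to 0 or 1.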

As the trained model, a general-purpose trained model capable of calculating the degree of attention for a variety of inputs may be used, or different trained models may be used depending on the input. For example, the trained model may be switched according to the number of candidate subjects in the captured image. The trained model is not limited to an NN model and may be another machine learning model such as SVR (Support Vector Regression). It is also possible to calculate, as the degree of attention, the correlation between the temporal change in the position of the candidate subject and the temporal change in the position of the imaging unit 150. Alternatively, a recurrent NN model such as an RNN (Recurrent Neural Network) or an LSTM (Long Short-Term Memory) may be used.

<Step S206>
Based on the degree of attention of each candidate subject calculated in step S205, the selection unit 126 selects one or more of the candidate subjects as selected subjects. For example, the selection unit 126 selects the candidate subject with the highest degree of attention. As another example, it selects every candidate subject whose degree of attention is equal to or greater than a threshold. If none of the degrees of attention calculated in step S205 reaches the threshold, the selection unit 126 may select no candidate subject at all. As yet another example, the selection unit 126 may compute the average degree of attention of each candidate subject over a plurality of frames and select the candidate subject with the highest average.
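The selection rules listed for step S206 can be sketched as three small functions. The dictionary layout (subject identifier mapped to degree of attention) and the function names are assumptions for illustration only.

```python
def select_max(attention):
    """Select the single candidate with the highest degree of attention."""
    if not attention:
        return []
    return [max(attention, key=attention.get)]


def select_above(attention, threshold):
    """Select every candidate whose degree of attention is at least the threshold.

    Returns an empty list when no candidate reaches the threshold.
    """
    return [subject_id for subject_id, value in attention.items() if value >= threshold]


def select_by_frame_average(per_frame_attention):
    """Average each candidate's degree of attention over several frames, then take the maximum."""
    totals, counts = {}, {}
    for frame in per_frame_attention:
        for subject_id, value in frame.items():
            totals[subject_id] = totals.get(subject_id, 0.0) + value
            counts[subject_id] = counts.get(subject_id, 0) + 1
    averages = {subject_id: totals[subject_id] / counts[subject_id] for subject_id in totals}
    return select_max(averages)


# Hypothetical degrees of attention for subjects 251 and 252.
single_frame = {251: 0.8, 252: 0.3}
two_frames = [{251: 0.2, 252: 0.9}, {251: 0.9, 252: 0.4}]
```

Averaging over frames trades responsiveness for stability: a subject that scores highest in one frame (251 in the second frame above) can still lose to one that scores well consistently.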

After selecting the selected subject, the selection unit 126 outputs information about it. Various forms are conceivable for the information that is output and for its output destination, and neither is limited to a specific form.

Next, the processing that the information processing device 100 performs to update (learn) the operating parameters of the attention level calculation unit 125 will be described according to the flowchart of FIG. 2(b). In FIG. 2(b), processing steps that are the same as those shown in FIG. 2(a) are given the same step numbers, and their description is omitted.

When selecting a selected subject from a captured image, the information processing device 100 performs processing according to the flowchart of FIG. 2(a). When updating (learning) the operating parameters of the attention level calculation unit 125, it performs processing according to the flowchart of FIG. 2(b). The information processing device 100 may also proceed to step S1211 of FIG. 2(b) after the processing of step S206 described above.

<Step S1211>
The learning unit 127 calculates the difference (error) between the degree of attention of a candidate subject calculated in step S205 and the correct degree of attention set for that candidate subject. The way the error is computed is not limited to any specific method. Various measures can be used, such as the square of the difference between the degree of attention and the correct degree of attention, or the absolute value of that difference.

The setting of the correct degree of attention of a candidate subject will now be described. For example, the learning unit 127 displays the captured image acquired in step S201 on a display screen.

FIG. 3(c) shows an example of a captured image displayed on the display screen. The captured image 350 of FIG. 3(c) includes a candidate subject 351 and a candidate subject 352. In FIG. 3(c), in order to select the candidate subject 351 as the desired candidate subject, the user operates a mouse to move a cursor 353 near the candidate subject 351 and performs a drag operation there to adjust a frame 354 so that the candidate subject 351 fits inside it. When, as shown in FIG. 3(c), the user operates the mouse to input a confirmation instruction while the candidate subject 351 is inside the frame 354, the learning unit 127 sets the correct degree of attention of the candidate subject 351 to "1" (the initial value of the correct degree of attention of every candidate subject is "0"). By selecting a plurality of candidate subjects as desired candidate subjects, the correct degree of attention of each of them can be set to "1".

If a reselection operation is received within a predetermined time after a subject has been selected, the correct degree of attention of the subject selected first may be set to "0". Further, when the desired candidate subject is selected in the first captured image of the sequence acquired by the image acquisition unit 121, template matching is performed between the desired candidate subject and the candidate subjects detected in each subsequent captured image. The correct degree of attention of any candidate subject whose degree of match with the desired candidate subject is equal to or greater than a threshold is then set to "1". A captured image that contains no candidate subject whose degree of match with the desired candidate subject reaches the threshold need not be used for learning.

<Step S1212>
The learning unit 127 updates (learns) the operating parameters of the attention level calculation unit 125 based on the error calculated in step S1211. In the present embodiment, this means updating the operating parameters of the trained model used by the attention level calculation unit 125.

For example, when the attention level calculation unit 125 uses a trained model having the configuration shown in FIG. 3(a), the learning unit 127 obtains the error between the estimated degree of attention 306 output from the output layer 304 for a candidate subject and the correct degree of attention 316 of that candidate subject. The learning unit 127 then backpropagates this error through the fully connected layers in order (output layer 304, intermediate layer 303, input layer 302) and updates the weights between neurons.
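Steps S1211 and S1212 together amount to one training step: compute the error against the correct degree of attention, backpropagate it, and update the weights. The sketch below does this for a small network with one tanh hidden layer and a sigmoid output, using the squared error and plain gradient descent. The network size, activations, learning rate, initial weights, and sample input are all assumptions for illustration; the description names backpropagation and a choice of error measures but does not fix these details.

```python
import math


def forward(x, p):
    """Forward pass: returns hidden activations and the estimated degree of attention."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["w_hidden"], p["b_hidden"])]
    z = sum(w * hj for w, hj in zip(p["w_out"], h)) + p["b_out"]
    return h, 1.0 / (1.0 + math.exp(-z))


def train_step(x, target, p, lr=0.5):
    """One gradient-descent update on the squared error; returns the error before the update."""
    h, y = forward(x, p)
    error = (y - target) ** 2                        # step S1211: squared difference
    # Step S1212: backpropagate from the output layer towards the input layer.
    delta_out = 2.0 * (y - target) * y * (1.0 - y)   # d(error)/d(output pre-activation)
    delta_hidden = [delta_out * w * (1.0 - hj * hj)  # uses the output weights before updating them
                    for w, hj in zip(p["w_out"], h)]
    p["w_out"] = [w - lr * delta_out * hj for w, hj in zip(p["w_out"], h)]
    p["b_out"] -= lr * delta_out
    for j, dj in enumerate(delta_hidden):
        p["w_hidden"][j] = [w - lr * dj * xi for w, xi in zip(p["w_hidden"][j], x)]
        p["b_hidden"][j] -= lr * dj
    return error


# Hypothetical initial parameters and one training sample with correct degree of attention 1.
params = {
    "w_hidden": [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]],
    "b_hidden": [0.0, 0.0],
    "w_out": [0.5, -0.5],
    "b_out": 0.0,
}
sample = [0.1, 0.6, -0.2]  # (rotation amount, motion x, motion y), made-up values
errors = [train_step(sample, 1.0, params) for _ in range(50)]
```

In practice the inputs would be normalized and the update would be averaged over many labeled samples; a single repeated sample is used here only to show the error shrinking.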

なお、撮像画像のレーティングの評価値に基づいてパラメータの更新量に重み付けしても良い。例えば、撮像画像のレーティングの評価値が高いほどパラメータの更新量を大きくする方法がある。 The parameter update amount may be weighted based on the evaluation value of the rating of the captured image. For example, there is a method of increasing the amount of parameter update as the evaluation value of the rating of the captured image is higher.
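The rating-weighted update can be sketched as follows. This is a simplified illustration, not the network of FIG. 3A: a single sigmoid neuron stands in for the fully connected layers, and the squared-error loss, the linear scaling of the step size by the rating, and the names `rating`, `max_rating`, and `base_lr` are all assumptions.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def update_weights(weights: List[float], features: List[float],
                   correct_attention: float, rating: float,
                   max_rating: float = 5.0, base_lr: float = 0.1) -> List[float]:
    """One gradient step for a single sigmoid neuron (illustrative stand-in).

    The update amount is scaled by rating / max_rating, so images with a
    higher rating change the parameters more (assumed linear scaling).
    """
    # Forward pass: estimated attention level for this candidate.
    estimated = sigmoid(sum(w * x for w, x in zip(weights, features)))
    # Error between estimated and correct attention level.
    error = estimated - correct_attention
    # Gradient of 0.5 * error**2 with respect to the pre-activation.
    delta = error * estimated * (1.0 - estimated)
    # Rating-weighted step size.
    lr = base_lr * (rating / max_rating)
    return [w - lr * delta * x for w, x in zip(weights, features)]
```

With identical inputs, a call with `rating=5` moves each weight five times as far as a call with `rating=1`.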

＜変形例１＞ <Modification 1>
第１の実施形態では、情報取得部１２２、検出部１２３、動き算出部１２４、注目度算出部１２５のそれぞれは独立した機能部としていたが、これらの機能部を１つの機能部に統合しても良い。例えば、これらの機能部の動作をＣＮＮ（ＣｏｎｖｏｌｕｔｉｏｎａｌＮｅｕｒａｌＮｅｔｗｏｒｋ）を用いて一括で実施するようにしても良い。 In the first embodiment, the information acquisition unit 122, the detection unit 123, the motion calculation unit 124, and the attention level calculation unit 125 are independent functional units, but they may be integrated into a single functional unit. For example, the operations of these functional units may be performed collectively using a CNN (Convolutional Neural Network).

情報取得部１２２、検出部１２３、動き算出部１２４、注目度算出部１２５の機能をＣＮＮを用いて実施する構成について、図３（ｂ）を用いて説明する。図３（ｂ）に示す構成を用いて、情報取得部１２２、検出部１２３、動き算出部１２４、注目度算出部１２５の機能を一括で実施する。 A configuration in which the functions of the information acquisition unit 122, the detection unit 123, the motion calculation unit 124, and the attention level calculation unit 125 are performed by using the CNN will be described with reference to FIG. 3 (b). Using the configuration shown in FIG. 3B, the functions of the information acquisition unit 122, the detection unit 123, the motion calculation unit 124, and the attention level calculation unit 125 are collectively performed.

図３（ｂ）に示す構成のＮＮモデルは、３種類のＣＮＮ（ＣＮＮ３２１、３２２、３２３）を有する。ＣＮＮ３２１は、画像取得部１２１が取得した撮像画像３３１から候補被写体の位置と大きさを示す特徴量３３２を推定するモデルである。ＣＮＮ３２２は上記の非特許文献３に開示されているような複数の画像からオプティカルフローを算出するモデルである。さらにＣＮＮ３２２は、オプティカルフローから候補被写体の動きと背景の動きを算出し、背景の動きに基づいて視野制御情報を特徴量３３３として算出する。連結器３２４は、特徴量３３２と特徴量３３３とを連結した特徴量３３４を算出する。ＣＮＮ３２３は、特徴量３３４から撮像画像３３１における各領域の注目度のマップ（注目度マップ３３５）を算出するモデルである。そして本変形例では、選択部１２６は、注目度マップ３３５におけるそれぞれの候補被写体の注目度に基づいて、該それぞれの候補被写体から１以上の選択被写体を選択する。このようにすることで、学習可能なモデルで被写体を検出することができる。 The NN model having the configuration shown in FIG. 3B has three types of CNNs (CNN321, 322, 323). The CNN 321 is a model that estimates a feature amount 332 indicating the position and size of a candidate subject from the captured image 331 acquired by the image acquisition unit 121. CNN322 is a model for calculating an optical flow from a plurality of images as disclosed in Non-Patent Document 3 above. Further, CNN322 calculates the movement of the candidate subject and the movement of the background from the optical flow, and calculates the visual field control information as the feature amount 333 based on the movement of the background. The coupler 324 calculates the feature amount 334 by connecting the feature amount 332 and the feature amount 333. The CNN 323 is a model for calculating a map of the degree of attention of each region in the captured image 331 (attention map 335) from the feature amount 334. Then, in this modification, the selection unit 126 selects one or more selected subjects from the respective candidate subjects based on the degree of attention of each candidate subject in the attention level map 335. By doing so, it is possible to detect the subject with a learnable model.
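How the selection unit 126 might read per-candidate attention levels from the attention map 335 can be sketched as below. The description does not specify how a candidate's attention level is derived from the map, so averaging the map inside each candidate's bounding box is an assumption, as are the function names and the box format.

```python
from typing import List, Tuple

# (left, top, right, bottom) in map cells; right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def candidate_attention(attention_map: List[List[float]], box: Box) -> float:
    """Mean attention inside a candidate's box (assumed pooling rule)."""
    left, top, right, bottom = box
    values = [attention_map[y][x]
              for y in range(top, bottom)
              for x in range(left, right)]
    return sum(values) / len(values)


def select_from_map(attention_map: List[List[float]], boxes: List[Box]) -> int:
    """Return the index of the candidate with the highest pooled attention."""
    scores = [candidate_attention(attention_map, box) for box in boxes]
    return max(range(len(boxes)), key=lambda i: scores[i])
```

Selecting a single candidate is only one option; the modification allows one or more selected subjects.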

＜変形例２＞ <Modification 2>
着目フレームの撮像画像において選択被写体を選択し、該着目フレーム以降、該選択被写体を追尾するような場合、該着目フレーム以降のあるフレームの撮像画像において該選択被写体の注目度が閾値未満になれば、該選択被写体以外の他候補被写体の注目度に基づき、該他候補被写体から改めて選択被写体を選択するようにしても良い。このようにすることで、一度選択した被写体を追尾する場合に、途中で他の被写体に誤って追尾が移った場合に、正しい被写体を再選択することができる。 When a selected subject is selected in the captured image of a frame of interest and is tracked from that frame onward, if the attention level of the selected subject falls below a threshold in the captured image of some later frame, a selected subject may be selected anew from the other candidate subjects based on their attention levels. In this way, when tracking a previously selected subject, the correct subject can be reselected if tracking has mistakenly shifted to another subject along the way.
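The reselection rule of this modification can be sketched as follows. The dictionary of per-frame attention levels, the function name, and the choice of the highest-attention other candidate are assumptions; the description only says that reselection is based on the attention levels of the other candidates.

```python
from typing import Dict, Hashable


def update_selection(selected_id: Hashable,
                     attention: Dict[Hashable, float],
                     threshold: float) -> Hashable:
    """Keep the tracked subject, or reselect when its attention drops.

    attention maps each candidate id to its attention level in the current
    frame. A selected subject missing from the frame is treated as having
    attention 0.0 (an assumption of this sketch).
    """
    if attention.get(selected_id, 0.0) >= threshold:
        return selected_id
    others = {cid: a for cid, a in attention.items() if cid != selected_id}
    if not others:
        return selected_id  # nothing else to switch to
    # Assumed rule: pick the other candidate with the highest attention.
    return max(others, key=others.get)
```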

＜変形例３＞ <Modification 3>
第１の実施形態では、選択部１２６は、それぞれの候補被写体の注目度に基づいて、該それぞれの候補被写体から１以上の選択被写体を選択していたが、選択被写体の選択はユーザ操作に応じて行っても良い。本変形例では、情報処理装置１００が撮像部１５０を含む撮像装置であるケースを例にとり説明する。 In the first embodiment, the selection unit 126 selects one or more selected subjects from the candidate subjects based on the attention level of each candidate subject, but the selected subject may instead be selected in response to a user operation. This modification is described using an example in which the information processing apparatus 100 is an image pickup apparatus that includes the image pickup unit 150.

本実施形態に係る撮像装置の外観例を図４（ａ）に示す。本実施形態に係る撮像装置４５０は、各種の表示を行う表示部４５１と、ユーザが各種の操作入力を行うために操作するスイッチ４５２およびダイアル４５３と、を有する。 FIG. 4A shows an example of the appearance of the image pickup apparatus according to the present embodiment. The image pickup apparatus 450 according to the present embodiment includes a display unit 451 that displays various displays, a switch 452 and a dial 453 that are operated by the user to perform various operation inputs.

表示部４５１における表示例を図４（ｂ）に示す。以下では、表示部４５１における表示制御は選択部１２６が行うものとする。図４（ｂ）に示す如く、表示部４５１には、候補被写体４０１および候補被写体４０２を含む撮像画像４００が表示されている。また、撮像装置４５０は、図４（ｂ）の右側に示す如く、候補被写体４０１について求めた注目度「０．６」および候補被写体４０２について求めた注目度「０．４」が登録されたテーブル４２０を保持している。 A display example on the display unit 451 is shown in FIG. 4B. In the following, display control on the display unit 451 is assumed to be performed by the selection unit 126. As shown in FIG. 4B, the display unit 451 displays a captured image 400 that includes the candidate subject 401 and the candidate subject 402. Further, as shown on the right side of FIG. 4B, the image pickup apparatus 450 holds a table 420 in which the attention level "0.6" obtained for the candidate subject 401 and the attention level "0.4" obtained for the candidate subject 402 are registered.

選択部１２６は、候補被写体４０１を包含する枠４１１および候補被写体４０２を包含する枠４１２を表示しているが、枠４１１および枠４１２のそれぞれは、候補被写体４０１の注目度および候補被写体４０２の注目度に応じた表示形態となっている。図４（ｂ）では、候補被写体４０２よりも注目度が高い候補被写体４０１の枠４１１は実線の枠として表示されており、候補被写体４０１よりも注目度が低い候補被写体４０２の枠４１２は点線の枠として表示されている。なお、注目度に応じた枠の表示形態には様々な表示形態があり、特定の表示形態に限らない。例えば、注目度に応じて枠の色、線の太さ等を制御しても良い。また、枠ではなく、半透明色の矩形を候補被写体に重ねて表示しても良く、その場合、注目度に応じて矩形の表示形態を制御する。このように、枠以外に、「候補被写体をユーザに通知するための情報」であれば、いかなる情報を表示しても良い。 The selection unit 126 displays a frame 411 enclosing the candidate subject 401 and a frame 412 enclosing the candidate subject 402, and each of the frames 411 and 412 is displayed in a form that reflects the attention level of the corresponding candidate subject. In FIG. 4B, the frame 411 of the candidate subject 401, which has a higher attention level than the candidate subject 402, is displayed as a solid-line frame, and the frame 412 of the candidate subject 402, which has a lower attention level than the candidate subject 401, is displayed as a dotted-line frame. The display form of a frame according to the attention level may take various forms and is not limited to any specific one. For example, the color of the frame, the thickness of its line, and the like may be controlled according to the attention level. Further, instead of a frame, a semi-transparent rectangle may be superimposed on the candidate subject, in which case the display form of the rectangle is controlled according to the attention level. In this way, any information other than a frame may be displayed as long as it is "information for notifying the user of the candidate subject".

ユーザがスイッチ４５２を押下すると、選択部１２６は、実線の枠である枠４１１で囲まれている候補被写体４０１（選択対象）を選択被写体として選択する。ユーザがダイアル４５３を一方側に回すと、枠４１１は点線の枠として表示され、「現在実線の枠で囲まれている候補被写体の次に注目度が高い候補被写体」である候補被写体４０２の枠４１２が実線で表示される。つまりユーザはダイアル４５３を操作することで、選択対象を注目度に応じた順序（注目度が高い順でも良いし、低い順でも良い）で切り替えることができる。このような選択被写体の選択により、注目度が高い候補被写体が複数ある場合に、ユーザの所望の候補被写体を選択被写体として選択することができる。 When the user presses the switch 452, the selection unit 126 selects the candidate subject 401 (the selection target), which is enclosed by the solid-line frame 411, as the selected subject. When the user turns the dial 453 in one direction, the frame 411 changes to a dotted-line frame, and the frame 412 of the candidate subject 402, which is "the candidate subject with the next highest attention level after the candidate subject currently enclosed by the solid-line frame", is displayed as a solid line. That is, by operating the dial 453, the user can switch the selection target in an order based on the attention level (either descending or ascending). With this way of selecting the selected subject, when there are a plurality of candidate subjects with high attention levels, the candidate subject the user wants can be selected as the selected subject.
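The dial behavior can be sketched with the values of table 420. The function names and the wrap-around at the end of the list are assumptions; the description only states that the dial steps through the candidates in attention order.

```python
from typing import Dict, List


def ordered_candidates(attention: Dict[int, float],
                       descending: bool = True) -> List[int]:
    """Candidate ids sorted by attention level (descending by default)."""
    return sorted(attention, key=attention.get, reverse=descending)


def turn_dial(order: List[int], current_index: int, steps: int = 1) -> int:
    """Index of the selection target after turning the dial by `steps`.

    Wrapping around at the ends of the list is an assumption of this sketch.
    """
    return (current_index + steps) % len(order)


table_420 = {401: 0.6, 402: 0.4}        # attention levels from table 420
order = ordered_candidates(table_420)   # solid frame starts on order[0]
target = order[turn_dial(order, 0)]     # candidate framed after one step
```

Here the solid-line frame starts on candidate 401 and moves to candidate 402 after one dial step; pressing the switch would then select the framed candidate.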

＜第１の実施形態の効果＞ <Effect of the first embodiment>
第１の実施形態によれば、撮像画像中に候補被写体が複数あり、ユーザが選択したい被写体の移動量が小さい場合や、コントラストが最大でない場合においても、候補被写体の動きと撮像部の視野制御情報に基づいて所望の被写体を選択することができる。さらに、第１の実施形態によれば、多様な候補被写体の動きや撮像部の視野制御情報に対して被写体を選択することができる。 According to the first embodiment, even when there are a plurality of candidate subjects in the captured image and the subject the user wants to select moves only a small amount or does not have the highest contrast, the desired subject can be selected based on the movement of the candidate subjects and the visual field control information of the image pickup unit. Further, according to the first embodiment, a subject can be selected under a wide variety of candidate subject movements and visual field control information of the image pickup unit.

なお、ネットワークカメラとして例示した撮像部１５０において、視野により注目すべき被写体の位置は変わることが考えられる。例えば、商店の防犯カメラとして撮像部１５０を設定した場合を考える。この場合、撮像部１５０が出入り口を撮像方向としている場合と、商品棚の近くを撮像方向としている場合とで注目すべき被写体の位置は変わるため、撮像方向と注目すべき被写体の位置とは相関関係がある。このように、撮像方向である視野制御情報を入力とすることでより高精度に注目すべき被写体を選択するための出力を得ることが可能となる。 In the image pickup unit 150 exemplified as a network camera, the position of the subject that deserves attention may change depending on the field of view. For example, consider a case where the image pickup unit 150 is installed as a security camera in a store. In this case, the position of the subject that deserves attention differs between when the image pickup unit 150 is directed at the entrance and when it is directed near the product shelves, so the image pickup direction and the position of the subject that deserves attention are correlated. Thus, by taking the visual field control information, that is, the image pickup direction, as an input, it is possible to obtain an output for selecting the subject that deserves attention with higher accuracy.

［第２の実施形態］ [Second Embodiment]
本実施形態を含む以下の各実施形態では、第１の実施形態との差分について説明し、以下で特に触れない限りは、第１の実施形態と同様であるものとする。第１の実施形態では、候補被写体の動きと視野制御情報とに基づいて選択被写体を選択した。このような第１の実施形態では、注目度が高い特定の被写体の近くにある被写体を選択する場合、該特定の被写体の近くにある被写体が変わるたびに、被写体の再選択が必要になる。 In each of the following embodiments, including this one, differences from the first embodiment are described, and everything not specifically mentioned below is the same as in the first embodiment. In the first embodiment, the selected subject is selected based on the movement of the candidate subjects and the visual field control information. In the first embodiment, when the aim is to select a subject near a specific subject with a high attention level, the subject must be reselected every time the subject near that specific subject changes.

本実施形態では、それぞれの候補被写体の注目度に基づいて、該それぞれの候補被写体から特定の候補被写体を注目点として特定し、それぞれの候補被写体について、注目点との相対的な位置の変化に基づいて追尾度を算出する。追尾度とは、候補被写体が注目点に接近している度合いである。 In the present embodiment, a specific candidate subject is specified as a point of interest from each candidate subject based on the degree of attention of each candidate subject, and the position of each candidate subject changes relative to the point of interest. The tracking degree is calculated based on this. The tracking degree is the degree to which the candidate subject is close to the point of interest.

本実施形態では、サッカーの試合を撮像した撮像画像のシーケンスが情報処理装置１００に入力されるケースを例にとり説明する。この場合、情報処理装置１００は、該撮像画像中のボールを注目点として選択し、該撮像画像中のそれぞれの候補被写体（注目点以外の候補被写体）について、ボールとの相対的な位置の変化に基づいて追尾度を算出し、該追尾度に基づいて選択被写体を選択する。これにより、特定の注目点の近くで注目点に近接している被写体を選択しやすくなる。 In the present embodiment, a case where a sequence of captured images of a soccer game is input to the information processing apparatus 100 will be described as an example. In this case, the information processing apparatus 100 selects a ball in the captured image as a point of interest, and changes in the relative position of each candidate subject (candidate subject other than the point of interest) in the captured image with respect to the ball. The tracking degree is calculated based on the above, and the selected subject is selected based on the tracking degree. This makes it easier to select a subject that is close to a particular point of interest.

次に、撮像部１５０が撮像した撮像画像の一例について、図５（ｃ）を用いて説明する。撮像部１５０により撮像された撮像画像５５０には、矢印の方向に移動している被写体５５１および被写体５５２と、矢印の方向に移動しているボール５５３と、が含まれている。（ｘ１，ｙ１）は撮像画像５５０における被写体５５１の重心位置（画像座標）を示し、（ｘ２，ｙ２）は撮像画像５５０における被写体５５２の重心位置（画像座標）を示している。また、（ｘｂ，ｙｂ）は撮像画像５５０におけるボール５５３の重心位置（画像座標）を示している。 Next, an example of the captured image captured by the imaging unit 150 will be described with reference to FIG. 5 (c). The captured image 550 captured by the image pickup unit 150 includes a subject 551 and a subject 552 moving in the direction of the arrow, and a ball 553 moving in the direction of the arrow. (X1, y1) indicates the position of the center of gravity (image coordinates) of the subject 551 in the captured image 550, and (x2, y2) indicates the position of the center of gravity (image coordinates) of the subject 552 in the captured image 550. Further, (xb, yb) indicates the position of the center of gravity (image coordinates) of the ball 553 in the captured image 550.

次に、本実施形態に係るシステムの機能構成例について、図５（ａ）のブロック図を用いて説明する。情報処理装置５００は、上記の情報処理装置１００に注目点算出部５０１および追尾度算出部５０２を加えた構成を有する。 Next, an example of the functional configuration of the system according to the present embodiment will be described with reference to the block diagram of FIG. 5A. The information processing apparatus 500 has a configuration in which the attention point calculation unit 501 and the tracking degree calculation unit 502 are added to the information processing apparatus 100.

注目点算出部５０１は、注目度算出部１２５が求めたそれぞれの候補被写体の注目度に基づいて、該それぞれの候補被写体から特定の候補被写体を注目点として選択する。追尾度算出部５０２は、注目点に対する候補被写体（注目点以外の候補被写体）の相対的な位置の変化に基づいて、該候補被写体に対して追尾度を算出する。 The attention point calculation unit 501 selects a specific candidate subject from the respective candidate subjects as the attention point based on the attention level of each candidate subject obtained by the attention degree calculation unit 125. The tracking degree calculation unit 502 calculates the tracking degree for the candidate subject based on the change in the relative position of the candidate subject (candidate subject other than the attention point) with respect to the attention point.

次に、情報処理装置５００の動作について、図５（ｂ）のフローチャートに従って説明する。図５（ｂ）において、図２（ａ）の処理ステップと同じ処理ステップには同じステップ番号を付しており、該処理ステップに係る説明は省略する。図５（ｂ）のフローチャートでは、ステップＳ２０５における処理の後、ステップＳ１５０１に進む。 Next, the operation of the information processing apparatus 500 will be described with reference to the flowchart of FIG. 5 (b). In FIG. 5B, the same processing steps as those in FIG. 2A are assigned the same step numbers, and the description of the processing steps will be omitted. In the flowchart of FIG. 5B, after the processing in step S205, the process proceeds to step S1501.

＜ステップＳ１５０１＞ <Step S1501>
注目点算出部５０１は、ステップＳ２０５で算出したそれぞれの候補被写体の注目度のうち最も値が大きい注目度を特定し、該特定した注目度の候補被写体を注目点として選択する。以下では具体的な説明を行うために、図５（ｃ）の被写体５５１、被写体５５２、ボール５５３の３つの候補被写体のそれぞれの注目度のうち、ボール５５３の注目度が最も値が大きいとし、その結果、注目点としてボール５５３が選択されたものとする。 The attention point calculation unit 501 identifies the largest of the attention levels calculated for the candidate subjects in step S205 and selects the candidate subject having that attention level as the point of interest. For concreteness, it is assumed below that, among the three candidate subjects in FIG. 5C (the subject 551, the subject 552, and the ball 553), the ball 553 has the highest attention level and is therefore selected as the point of interest.

＜ステップＳ１５０２＞ <Step S1502>
選択部１２６は、注目点を選択被写体として選択するか否かを判断する。注目点を選択被写体として選択するか否かを判断する判断方法には様々な判断方法があり、特定の判断方法に限らない。本実施形態では、検出部１２３は、上記の非特許文献２に記載の物体検出（被写体の検出）と並行して物体の属性分類（該被写体の属性の分類）を行う。そして選択部１２６は、注目点の属性が「人物」であれば、該注目点を選択被写体として選択するものと判断し、処理はステップＳ２０６に進む。一方、選択部１２６は、注目点の属性が「人物」ではない場合には、該注目点は選択被写体として選択しないものと判断し、処理はステップＳ１５０３に進む。 The selection unit 126 determines whether or not to select the point of interest as the selected subject. Various methods can be used for this determination, and it is not limited to any specific one. In the present embodiment, the detection unit 123 performs attribute classification of each object (classification of the subject's attribute) in parallel with the object detection (subject detection) described in Non-Patent Document 2 above. If the attribute of the point of interest is "person", the selection unit 126 determines that the point of interest is to be selected as the selected subject, and the process proceeds to step S206. If the attribute of the point of interest is not "person", the selection unit 126 determines that the point of interest is not to be selected as the selected subject, and the process proceeds to step S1503.

なお、注目点を選択被写体として選択するか否かは、ユーザに判断させてもよい。例えば、注目点を選択被写体として選択するか否かを示す設定情報をユーザが情報処理装置５００に事前に設定しておいても良い。また例えば、ステップＳ１５０２において、注目点を選択被写体として選択するか否かをユーザに問い合わせても良い。そして、ユーザから「注目点を選択被写体として選択する」旨の操作入力があれば、処理はステップＳ２０６に進み、ユーザから「注目点を選択被写体として選択しない」旨の操作入力があれば、処理はステップＳ１５０３に進む。 The user may decide whether or not to select the point of interest as the selected subject. For example, the user may set in advance setting information indicating whether or not to select the attention point as the selected subject in the information processing apparatus 500. Further, for example, in step S1502, the user may be inquired whether or not to select the attention point as the selected subject. Then, if there is an operation input to "select the attention point as the selected subject" from the user, the process proceeds to step S206, and if there is an operation input to "do not select the attention point as the selected subject" from the user, the process proceeds. Goes to step S1503.

＜ステップＳ１５０３＞ <Step S1503>
追尾度算出部５０２は、被写体５５１および被写体５５２のそれぞれについて、ボール５５３との相対的な位置の変化に基づいて、追尾度を算出する。ボール５５３に対する被写体５５１の相対的な位置は（ｘ１－ｘｂ，ｙ１－ｙｂ）であり、ボール５５３に対する被写体５５２の相対的な位置は（ｘ２－ｘｂ，ｙ２－ｙｂ）である。 The tracking degree calculation unit 502 calculates the tracking degree for each of the subject 551 and the subject 552 based on the change in its position relative to the ball 553. The relative position of the subject 551 with respect to the ball 553 is (x1-xb, y1-yb), and the relative position of the subject 552 with respect to the ball 553 is (x2-xb, y2-yb).

よって、追尾度算出部５０２は、撮像画像間における（ｘ１－ｘｂ，ｙ１－ｙｂ）の差分を、「ボール５５３に対する被写体５５１の相対的な位置の変化」として求める。また、追尾度算出部５０２は、撮像画像間における（ｘ２－ｘｂ，ｙ２－ｙｂ）の差分を、「ボール５５３に対する被写体５５２の相対的な位置の変化」として求める。 Therefore, the tracking degree calculation unit 502 obtains the difference of (x1-xb, y1-yb) between the captured images as "change in the relative position of the subject 551 with respect to the ball 553". Further, the tracking degree calculation unit 502 obtains the difference of (x2-xb, y2-yb) between the captured images as "change in the relative position of the subject 552 with respect to the ball 553".

そして追尾度算出部５０２は、一定期間内の「ボール５５３に対する被写体５５１の相対的な位置の変化」を求める。そして追尾度算出部５０２は、一定期間内の変化の平均（ボール５５３に対する被写体５５１の平均的な相対位置）と、ボール５５３の位置と、の間の距離の逆数を、被写体５５１に対する追尾度として求める。 The tracking degree calculation unit 502 then obtains the "change in the relative position of the subject 551 with respect to the ball 553" over a fixed period. It then obtains, as the tracking degree for the subject 551, the reciprocal of the distance between the average of the changes over that period (the average relative position of the subject 551 with respect to the ball 553) and the position of the ball 553.

同様に追尾度算出部５０２は、一定期間内の「ボール５５３に対する被写体５５２の相対的な位置の変化」を求める。そして追尾度算出部５０２は、一定期間内の変化の平均（ボール５５３に対する被写体５５２の平均的な相対位置）と、ボール５５３の位置と、の間の距離の逆数を、被写体５５２に対する追尾度として求める。 Similarly, the tracking degree calculation unit 502 obtains the "change in the relative position of the subject 552 with respect to the ball 553" over a fixed period. It then obtains, as the tracking degree for the subject 552, the reciprocal of the distance between the average of the changes over that period (the average relative position of the subject 552 with respect to the ball 553) and the position of the ball 553.
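The tracking degree calculation can be sketched as follows, under one reading of the description: the relative position (x - xb, y - yb) is averaged over the frames of the period, and because the ball sits at the origin of these relative coordinates, the distance is the length of the averaged vector. This reading, the function name, and the small constant `eps` that avoids division by zero are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def tracking_degree(subject_positions: List[Point],
                    ball_positions: List[Point],
                    eps: float = 1e-6) -> float:
    """Reciprocal of the distance from the ball to the subject's average
    relative position over a period (one frame per list entry)."""
    relative = [(sx - bx, sy - by)
                for (sx, sy), (bx, by) in zip(subject_positions, ball_positions)]
    mean_x = sum(r[0] for r in relative) / len(relative)
    mean_y = sum(r[1] for r in relative) / len(relative)
    # In ball-relative coordinates the ball is at (0, 0).
    distance = math.hypot(mean_x, mean_y)
    return 1.0 / (distance + eps)
```

A subject that stays close to the ball throughout the period receives a larger tracking degree than one that stays far away.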

なお、追尾度算出部５０２は注目度算出部１２５と同様に、学習済みモデルを用いて候補被写体に対する追尾度を求めるようにしても良い。この場合、追尾度算出部５０２は、「注目点に対する候補被写体の相対的な位置の変化」を入力した学習済みモデルの出力を、該候補被写体に対する追尾度として求める。この学習済みモデルは、「注目点に対する候補被写体の相対的な位置の変化」が入力されると、該候補被写体に対する追尾度を出力するように学習されたＮＮモデルである。 Note that the tracking degree calculation unit 502 may obtain the tracking degree for the candidate subject by using the trained model in the same manner as the attention degree calculation unit 125. In this case, the tracking degree calculation unit 502 obtains the output of the trained model in which the "change in the relative position of the candidate subject with respect to the point of interest" is input as the tracking degree for the candidate subject. This trained model is an NN model trained to output the tracking degree for the candidate subject when "change in the relative position of the candidate subject with respect to the point of interest" is input.

そして、本実施形態では、ステップＳ１５０１からステップＳ２０６に処理が進んだ場合には、ステップＳ２０６では第１の実施形態と同様の処理が行われる。一方、ステップＳ１５０３からステップＳ２０６に処理が進んだ場合、ステップＳ２０６では、選択部１２６は、注目点以外のそれぞれの候補被写体の追尾度に基づいて、該それぞれの候補被写体から１以上の候補被写体を選択被写体として選択する。注目度の代わりに追尾度を用いること以外は、第１の実施形態におけるステップＳ２０６と同様である。 Then, in the present embodiment, when the process proceeds from step S1501 to step S206, the same process as in the first embodiment is performed in step S206. On the other hand, when the process proceeds from step S1503 to step S206, in step S206, the selection unit 126 selects one or more candidate subjects from the respective candidate subjects based on the tracking degree of each candidate subject other than the point of interest. Select as the selected subject. It is the same as step S206 in the first embodiment except that the tracking degree is used instead of the attention degree.
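The branch through steps S1501, S1502, S1503, and S206 can be summarized in a short sketch. The candidate dictionaries, their keys, and the `tracking_degree_of` callback are hypothetical, and returning a single subject is a simplification, since step S206 may select one or more.

```python
from typing import Any, Callable, Dict, List

Candidate = Dict[str, Any]  # hypothetical keys: "id", "attention", "attribute"


def select_subject(candidates: List[Candidate],
                   tracking_degree_of: Callable[[Candidate, Candidate], float]
                   ) -> Candidate:
    """Select one subject following the flow of the second embodiment."""
    # S1501: the candidate with the highest attention level is the point of interest.
    point = max(candidates, key=lambda c: c["attention"])
    # S1502: a point of interest with the attribute "person" is selected directly.
    if point["attribute"] == "person":
        return point
    # S1503: otherwise score the remaining candidates by tracking degree.
    others = [c for c in candidates if c is not point]
    # S206: select by tracking degree instead of attention level.
    return max(others, key=lambda c: tracking_degree_of(c, point))
```

In the soccer example, the ball becomes the point of interest, is rejected in step S1502 because it is not a person, and the player with the highest tracking degree relative to the ball is selected.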

＜第２の実施形態の効果＞ <Effect of the second embodiment>
第２の実施形態によれば、特定の物体の注目度が高い場合に、該特定の物体の近くの物体を選択被写体として選択することができる。 According to the second embodiment, when a specific object has a high attention level, an object near that specific object can be selected as the selected subject.

［第３の実施形態］ [Third Embodiment]
第１の実施形態では、撮像時の撮像部１５０の視野制御情報に基づいて候補被写体の注目度を算出し、注目度に基づいて選択被写体を選択する方法について説明した。第１の実施形態では、選択被写体を選択する撮像画像を撮像した撮像部１５０の視野制御情報を取得しているので、該撮像画像から選択被写体を選択することができる。しかし、選択被写体を選択する撮像画像が過去に撮像されたものであり、該撮像画像の撮像時における撮像部１５０の視野制御情報が得られていない場合、第１の実施形態では、該撮像画像から選択被写体を選択することができない。本実施形態では、撮像部１５０の視野制御情報として、過去に撮像した撮像画像に対するユーザの閲覧操作に係る情報を用いる。 In the first embodiment, a method was described in which the attention level of each candidate subject is calculated based on the visual field control information of the image pickup unit 150 at the time of imaging, and the selected subject is selected based on the attention level. In the first embodiment, the visual field control information of the image pickup unit 150 that captured the image is available, so the selected subject can be selected from that captured image. However, if the captured image was captured in the past and the visual field control information of the image pickup unit 150 at the time of capture is not available, the first embodiment cannot select a selected subject from that captured image. In the present embodiment, information on the user's viewing operations on previously captured images is used as the visual field control information of the image pickup unit 150.

本実施形態に係るシステムの機能構成例について、図６（ａ）のブロック図を用いて説明する。図６（ａ）に示す如く、本実施形態に係るシステムは、情報処理装置６００と、ユーザが情報処理装置６００に対して各種の操作入力を行うために操作する操作部６０１と、過去に撮像された撮像画像群を保持する外部記憶装置６０２と、を有する。 An example of the functional configuration of the system according to the present embodiment will be described with reference to the block diagram of FIG. 6A. As shown in FIG. 6A, the system according to the present embodiment includes an information processing apparatus 600, an operation unit 601 operated by a user to perform various operation inputs to the information processing apparatus 600, and an image taken in the past. It has an external storage device 602 that holds the captured image group.

まず、操作部６０１について説明する。操作部６０１は、キーボード、マウス、タッチパネルなどのユーザインターフェースを有し、ユーザが操作することで各種の指示を情報処理装置６００に対して入力することができる。また、操作部６０１は、撮像画像などを表示可能な表示画面を有する。 First, the operation unit 601 will be described. The operation unit 601 has a user interface such as a keyboard, a mouse, and a touch panel, and the user can operate it to input various instructions to the information processing apparatus 600. The operation unit 601 also has a display screen capable of displaying captured images and the like.

次に、外部記憶装置６０２について説明する。外部記憶装置６０２は、ハードディスクドライブ装置やＵＳＢメモリなどのメモリ装置であり、該外部記憶装置６０２には、過去に撮像された撮像画像群が保存されている。 Next, the external storage device 602 will be described. The external storage device 602 is a memory device such as a hard disk drive device or a USB memory, and the external storage device 602 stores a group of captured images captured in the past.

次に、情報処理装置６００について説明する。情報処理装置６００は、上記の情報処理装置１００と同様の構成を有しているが、情報取得部１２２が視野制御情報を操作部６０１から取得する点、画像取得部１２１が撮像画像を外部記憶装置６０２から取得する点、が第１の実施形態と異なる。ユーザは操作部６０１を操作して、外部記憶装置６０２に保存されている撮像画像に対して各種の閲覧操作を行っており、情報取得部１２２は、この閲覧操作に係る情報を視野制御情報として操作部６０１から取得する。撮像画像に対する閲覧操作に係る情報には、例えば、ユーザによる該撮像画像の拡大操作や、ユーザによる該撮像画像内の閲覧範囲の移動操作（上下左右方向のスクロール操作）などの操作に係る情報が含まれている。 Next, the information processing apparatus 600 will be described. The information processing apparatus 600 has the same configuration as the information processing apparatus 100 described above, but differs from the first embodiment in that the information acquisition unit 122 acquires the visual field control information from the operation unit 601 and the image acquisition unit 121 acquires captured images from the external storage device 602. The user operates the operation unit 601 to perform various viewing operations on the captured images stored in the external storage device 602, and the information acquisition unit 122 acquires information on these viewing operations from the operation unit 601 as the visual field control information. The information on viewing operations on a captured image includes, for example, information on operations such as the user enlarging the captured image and the user moving the viewing range within the captured image (scrolling up, down, left, or right).

情報処理装置６００は、情報取得部１２２が操作部６０１から視野制御情報を取得すること、画像取得部１２１が外部記憶装置６０２から撮像画像を取得すること、視野制御情報として撮像画像の閲覧操作に係る情報を用いること、を除けば、情報処理装置１００と同様に動作する。 The information processing apparatus 600 operates in the same manner as the information processing apparatus 100, except that the information acquisition unit 122 acquires the visual field control information from the operation unit 601, the image acquisition unit 121 acquires captured images from the external storage device 602, and information on viewing operations on the captured image is used as the visual field control information.

図６（ｂ）は、画像取得部１２１が外部記憶装置６０２から読み出した撮像画像の部分画像６５０（撮像画像内の閲覧範囲）の、操作部６０１の表示画面における表示例を示しており、該部分画像６５０には、被写体６５１，６５２，６５３が含まれている。ユーザは、撮像画像において閲覧したい範囲がウィンドウ内に収まるように、操作部６０１を操作してウィンドウ内で撮像画像をスクロールさせることができる。また、ユーザは、ウィンドウ内に表示されている部分画像６５０の部分領域６５５を拡大した拡大画像６６０（図６（ｃ））をウィンドウ内に表示させるべく、操作部６０１を操作してカーソル６５４を用いてドラッグ操作を行うことで部分領域６５５を指定する。これによりウィンドウ内には図６（ｃ）に示す如く、部分領域６５５内の画像を拡大した拡大画像６６０が表示される。このようなユーザ操作に応じた画像の表示制御は、操作部６０１によって行われる。そして情報取得部１２２は、このようなユーザ操作に係る情報を視野制御情報として操作部６０１から取得する。本実施形態では、ユーザが操作部６０１を操作して行ったスクロール操作が、撮像画像のどの部分領域をウィンドウ内に表示させるためのスクロール操作であったのか、を示すスクロール情報を視野制御情報として用いるものとする。 FIG. 6B shows an example of how a partial image 650 (the viewing range within the captured image) of a captured image read from the external storage device 602 by the image acquisition unit 121 is displayed on the display screen of the operation unit 601. The partial image 650 includes subjects 651, 652, and 653. The user can operate the operation unit 601 to scroll the captured image within the window so that the range to be viewed fits in the window. Further, to display in the window an enlarged image 660 (FIG. 6C) of a partial region 655 of the partial image 650 shown in the window, the user operates the operation unit 601 and designates the partial region 655 by a drag operation with the cursor 654. As a result, as shown in FIG. 6C, the enlarged image 660, obtained by enlarging the image in the partial region 655, is displayed in the window. Image display control in response to such user operations is performed by the operation unit 601. The information acquisition unit 122 then acquires information on such user operations from the operation unit 601 as the visual field control information. In the present embodiment, scroll information indicating which partial region of the captured image the user's scroll operation on the operation unit 601 was intended to bring into the window is used as the visual field control information.

なお、候補被写体ごとに、撮像画像のシーケンスにおける注目度の平均（平均注目度）を求め、該求めた平均注目度に応じて選択被写体の選択を行うようにしても良い。その際、撮像画像の部分領域６５５内の画像を拡大した拡大画像６６０をウィンドウ内に表示させた場合には、拡大画像６６０の拡大率（部分領域６５５内の画像に対する拡大率）に応じて、該撮像画像内の候補被写体の重みを制御する。よって、平均注目度は、撮像画像ごとに注目度を重み付けした重み付け注目度の平均である。 Note that, for each candidate subject, the average of its attention levels over the sequence of captured images (the average attention level) may be obtained, and the selected subject may be selected according to this average attention level. In that case, when the enlarged image 660, obtained by enlarging the image in the partial region 655 of a captured image, is displayed in the window, the weight of the candidate subjects in that captured image is controlled according to the magnification of the enlarged image 660 (the magnification relative to the image in the partial region 655). The average attention level is therefore the average of the weighted attention levels obtained by weighting the attention level for each captured image.
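The weighted average can be sketched as follows. Using the magnification itself as the weight is an assumption; the description only says that the weight is controlled according to the magnification. The function name and the input format are likewise illustrative.

```python
from typing import List, Tuple


def average_attention(per_frame: List[Tuple[float, float]]) -> float:
    """Average of weighted attention levels for one candidate subject.

    per_frame holds one (attention, magnification) pair per captured image.
    The magnification is used directly as the weight (assumed), with 1.0
    meaning the image was viewed without enlargement.
    """
    weighted = [attention * magnification
                for attention, magnification in per_frame]
    return sum(weighted) / len(weighted)
```

For instance, a candidate with attention 0.5 in two frames, the second viewed at 2x magnification, gets an average attention level of 0.75.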

＜変形例１＞ <Modification 1>
第３の実施形態では、外部記憶装置６０２には、１台のみの撮像部１５０による撮像画像群が保存されているものとしたが、２台以上の撮像部１５０による撮像画像群が保存されていても良い。その場合、ユーザは任意の撮像部１５０による撮像画像を閲覧することができる。 In the third embodiment, the external storage device 602 is assumed to store a group of images captured by a single image pickup unit 150, but it may store groups of images captured by two or more image pickup units 150. In that case, the user can view the images captured by any of the image pickup units 150.

現在、撮像部Ａによる撮像画像の閲覧中にユーザが操作部６０１を操作して、撮像部Ｂの撮像画像を閲覧する指示を入力したとする。このとき情報取得部１２２は、撮像部Ａの撮像範囲から撮像部Ｂの撮像範囲への移動量ＭＡを求める。 It is assumed that the user operates the operation unit 601 while viewing the image captured by the image pickup unit A and inputs an instruction to view the image captured by the image pickup unit B. At this time, the information acquisition unit 122 obtains the amount of movement MA from the image pickup range of the image pickup unit A to the image pickup range of the image pickup unit B.

検出部１２３は、撮像画像内におけるそれぞれの被写体を認識する。動き算出部１２４は、撮像部Ａによる撮像画像における被写体Ｔの位置から撮像部Ｂによる撮像画像における被写体Ｔの位置への移動量ＭＢを求める。 The detection unit 123 recognizes each subject in the captured image. The motion calculation unit 124 obtains the amount of movement MB from the position of the subject T in the image captured by the image pickup unit A to the position of the subject T in the image captured by the image pickup unit B.

The attention degree calculation unit 125 inputs the movement amount MA and the movement amount MB into a trained model that has been trained to output the "attention level of the subject T" corresponding to the movement amounts MA and MB, and obtains the output of the trained model as the "attention level of the subject T". The method of obtaining the "attention level of the subject T" from the movement amounts MA and MB is not limited to a specific method. For example, as in the first embodiment, a method that obtains the attention level of the subject T from the correlation between the movement amount of the imaging range and the movement amount of the subject T may be adopted. In this way, the subject that the viewer is paying attention to can be selected from the viewing information of a plurality of captured images captured by a plurality of image pickup units.
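As a rough illustration of the correlation-based alternative mentioned above, the sketch below scores a subject by how closely its movement MB follows the movement MA of the imaging range. It rests on assumptions not stated in the text: both movement amounts are treated as two-dimensional displacement vectors in a common coordinate system, and cosine similarity clipped at zero stands in for the "correlation". The function name `attention_from_movements` is hypothetical, and the trained-model variant is not shown.

```python
import math
from typing import Tuple

Vector = Tuple[float, float]


def attention_from_movements(ma: Vector, mb: Vector) -> float:
    """Score in [0, 1] for how well the subject movement MB matches MA.

    Assumption: cosine similarity between the two displacement vectors is
    used as a stand-in for the correlation; opposite or unrelated movement
    yields 0.0.
    """
    norm_a = math.hypot(ma[0], ma[1])
    norm_b = math.hypot(mb[0], mb[1])
    if norm_a == 0.0 or norm_b == 0.0:
        # No movement on one side: no evidence that the viewer followed T.
        return 0.0
    cosine = (ma[0] * mb[0] + ma[1] * mb[1]) / (norm_a * norm_b)
    return max(0.0, cosine)


# Hypothetical values: the imaging range moved right; subject T also moved right.
score = attention_from_movements((1.0, 0.0), (2.0, 0.0))
```

A subject whose movement points the same way as the change of imaging range receives a high score, while one moving sideways or the opposite way receives zero. Any monotonic measure of agreement could be substituted for cosine similarity.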

<Effect of the third embodiment>
According to the third embodiment, even when the field-of-view control information that can be acquired from the image pickup unit is limited, as with recorded data from a surveillance camera, the attention level can be calculated based on the viewer's image viewing operations, and a subject can be selected based on the attention level.

<Example of application to other fields>
For example, the above embodiments and modifications can likewise be applied when a subject of interest is selected based on a viewing operation on a virtual viewpoint image. A virtual viewpoint image is an image generated from multi-viewpoint images obtained by synchronous imaging from different positions with a plurality of image pickup devices. This technique makes it possible to obtain an image from an arbitrary viewpoint in three-dimensional space. The above embodiments and modifications can also be applied to a goggle-type video playback device such as VR goggles, where the attention level of a candidate subject is calculated based on the user's line-of-sight information in addition to the motion information of the goggles. This makes it possible to display annotations about a person or object with a high attention level, or to play back a tracking video.

[Fourth Embodiment]
The functional units of the information processing device 100 shown in FIG. 1(b), of the information processing device 500 shown in FIG. 5(a), and of the information processing device 600 shown in FIG. 6(a) may be implemented in hardware or in software (a computer program). In the latter case, a computer device capable of executing the software can be applied to the information processing device 100, the information processing device 500, or the information processing device 600.

An example of the hardware configuration of such a computer device will be described with reference to the block diagram of FIG. 1(a). The configuration shown in FIG. 1(a) is one example of the configuration of a computer device applicable to the information processing device 100, the information processing device 500, and the information processing device 600, and can be changed or modified as appropriate.

The CPU 101 executes various processes using computer programs and data stored in the ROM 102 and the RAM 103. The CPU 101 thereby controls the operation of the entire computer device, and executes or controls the various processes described above as being performed by the information processing device 100, the information processing device 500, and the information processing device 600.

The ROM 102 stores setting data for the information processing device 100, the information processing device 500, and the information processing device 600. The ROM 102 also stores computer programs and data related to the startup of these devices, as well as computer programs and data related to their basic operations.

The RAM 103 has an area for storing computer programs and data loaded from the ROM 102 and the external storage device 104. The RAM 103 also has an area for storing data received from the outside (the image pickup unit 150, the operation unit 601, the external storage device 602, and the like) via the I/F 107, and a work area used by the CPU 101 when executing various processes. In this way, the RAM 103 can provide various areas as needed.

The external storage device 104 is a non-volatile memory such as a hard disk drive, a USB memory, or a flash memory. The external storage device 104 stores an OS (operating system) as well as computer programs and data for causing the CPU 101 to execute or control the various processes described above as being performed by the information processing device 100, the information processing device 500, and the information processing device 600. The computer programs and data stored in the external storage device 104 are loaded into the RAM 103 as needed under the control of the CPU 101, and are then processed by the CPU 101. The external storage device 104 can also be used as the external storage device 602 described above.

The external storage device 104 can also be realized by a medium (recording medium) and an external storage drive for accessing the medium. Known examples of such media include flexible disks (FD), CD-ROMs, DVDs, and MOs. The external storage device 104 may also be a computer device, such as a server device, connected to the computer device via a network.

The display unit 105 has a liquid crystal screen, an organic EL screen, a touch panel screen, or the like, and displays the results of processing by the CPU 101 as images, characters, and so on. The display unit 105 may be a projection device, such as a projector, that projects images and characters. The display unit 105 may also be a display device connected to the computer device via a network. The display unit 105 can be used as the device that performs display (the display screen) mentioned in the above description.

The operation unit 106 is a user interface such as a keyboard, a mouse, or a touch panel screen, and the user can operate it to input various instructions to the CPU 101. The operation unit 106 can also be used as the operation unit 601 described above.

The configurations of the display unit 105 and the operation unit 106 are not limited to these, and other configurations may be used. For example, the operation unit 106 may be a pen tablet, or the display unit 105 and the operation unit 106 may be combined into a tablet.

The I/F 107 is a communication interface for data communication with the outside (the image pickup unit 150, the operation unit 601, the external storage device 602, and the like), and the computer device can communicate data with the outside via the I/F 107.

The CPU 101, the ROM 102, the RAM 103, the external storage device 104, the display unit 105, the operation unit 106, and the I/F 107 are all connected to the system bus 108.

The numerical values, processing timings, processing orders, data (information) acquisition sources and output destinations, screen configurations, and operation methods used in the above embodiments and modifications are given as examples for the sake of concrete explanation, and are not intended to limit the invention to those examples.

Part or all of the embodiments and modifications described above may be used in combination as appropriate. Part or all of the embodiments and modifications described above may also be used selectively.

(Other embodiments)
The present invention can also be realized by a process in which a program that realizes one or more functions of the above embodiments is supplied to a system or device via a network or storage medium, and one or more processors in a computer of the system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that realizes one or more functions.

The invention is not limited to the above embodiments, and various changes and modifications can be made without departing from the spirit and scope of the invention. Accordingly, the claims are attached to make the scope of the invention public.

100: Information processing device
121: Image acquisition unit
122: Information acquisition unit
123: Detection unit
124: Motion calculation unit
125: Attention degree calculation unit
126: Selection unit
127: Learning unit
150: Image pickup unit

Claims

An information processing device comprising:
an acquisition means for acquiring a captured image captured by an image pickup unit; and
a selection means for selecting one or more subjects as a selected subject based on the movement of a subject in the captured image acquired by the acquisition means and on information related to the field of view of the image pickup unit or to a viewing operation on the captured image.

The information processing device according to claim 1, further comprising a calculation means for obtaining a degree of attention for each subject in the captured image based on the movement and the information,
wherein the selection means selects one or more subjects from among the subjects in the captured image as the selected subject based on the degree of attention obtained by the calculation means for each subject.

The information processing device according to claim 2, wherein the calculation means obtains, as the degree of attention, the output of a trained model to which the movement and the information are input.

The information processing device according to claim 2 or 3, wherein, when the degree of attention of the selected subject falls below a threshold value, the selection means reselects the selected subject based on the degree of attention of subjects other than the selected subject.

The information processing device according to any one of claims 2 to 4, further comprising a display control means for displaying information for notifying the user of a subject based on the degree of attention obtained by the calculation means for each subject.

The information processing apparatus according to claim 5, wherein the display control means displays information for notifying the user of the subject in a display form according to the degree of attention of the subject.

The information processing apparatus according to claim 5, wherein the selection means switches the selection target among the respective subjects in an order according to the degree of attention, and selects the selection target as the selected subject.

The information processing device according to claim 2, wherein the selection means selects a specific subject from among the respective subjects as a point of interest based on the degree of attention obtained for each subject by the calculation means, and selects, from the subjects other than the point of interest, one or more subjects as selected subjects based on the relative change in position between the point of interest and the subjects other than the point of interest.

The information processing device according to any one of claims 1 to 8, wherein the information related to the field of view of the image pickup unit includes the rotation amount of the image pickup unit, the measurement result of an acceleration sensor of the image pickup unit, the measurement result of a distance measuring sensor of the image pickup unit, the aperture information of the image pickup unit, the pan and tilt of a pan head on which the image pickup unit is mounted, and the pan, tilt, and zoom control information of the image pickup unit.

The information processing apparatus according to claim 2, wherein the attention level obtained for each subject by the calculation means is the average of the attention levels of the subject in a plurality of captured images.

An information processing device comprising:
an acquisition means for acquiring a movement amount from the imaging range of an image pickup unit A to the imaging range of an image pickup unit B, and a movement amount from the position of a subject in an image captured by the image pickup unit A to the position of the subject in an image captured by the image pickup unit B; and
a selection means for selecting one or more subjects as a selected subject based on the movement amount from the imaging range of the image pickup unit A to the imaging range of the image pickup unit B and the movement amount from the position of the subject in the image captured by the image pickup unit A to the position of the subject in the image captured by the image pickup unit B.

An information processing method performed by an information processing device, the method comprising:
acquiring a captured image captured by an image pickup unit; and
selecting one or more subjects as a selected subject based on the movement of a subject in the acquired captured image and on information related to the field of view of the image pickup unit or to a viewing operation on the captured image.

An information processing method performed by an information processing device, the method comprising:
acquiring a movement amount from the imaging range of an image pickup unit A to the imaging range of an image pickup unit B, and a movement amount from the position of a subject in an image captured by the image pickup unit A to the position of the subject in an image captured by the image pickup unit B; and
selecting one or more subjects as a selected subject based on the movement amount from the imaging range of the image pickup unit A to the imaging range of the image pickup unit B and the movement amount from the position of the subject in the image captured by the image pickup unit A to the position of the subject in the image captured by the image pickup unit B.

A computer program for causing a computer to function as each means of the information processing device according to any one of claims 1 to 11.