JP2023013821A

JP2023013821A - Image processing device, image reproducing device, and program

Info

Publication number: JP2023013821A
Application number: JP2021118256A
Authority: JP
Inventors: 清晴相澤; Kiyoharu Aizawa; 裕紀澤邉; Yuki Sawabe
Original assignee: University of Tokyo NUC
Current assignee: University of Tokyo NUC
Priority date: 2021-07-16
Filing date: 2021-07-16
Publication date: 2023-01-26

Abstract

To provide an image processing device, an image reproducing device, and a program capable of detecting attractive regions and performing processing corresponding to wide-angle image data. SOLUTION: An image processing device includes a control unit, a storage unit, an operation unit, a display unit, and an interface unit. The control unit includes: an acquisition unit 22 that acquires image data to be processed; a candidate region extraction unit 23 that extracts a plurality of candidate regions, which are candidates for attractive regions that satisfy a predetermined condition from the acquired image data; a saliency map estimation unit 25 that estimates, based on the acquired image data, saliency map information related to the image data; and a region determination unit 26 that determines attractive regions in the acquired image data, based on the saliency map and information on the candidate regions. SELECTED DRAWING: Figure 2

Description

The present invention relates to an image processing device, an image reproducing device, and a program.

In recent years, opportunities to view various kinds of image data (whether still images or moving images) on computers have been increasing. Against this background, there is a demand to guide users to the regions that deserve attention in each piece of image data. This demand is particularly pronounced for wide-angle image data, such as omnidirectional images, in which only part of the entire image is displayed at a time, and for moving image data.

Daniel Martin, Ana Serrano, and Belen Masia. 2020. Panoramic convolutions for 360° single-image saliency prediction. In CVPR Workshop on Computer Vision for Augmented and Virtual Reality.

Non-Patent Document 1 discloses a technique for generating a saliency map, which detects the characteristic portions of input image data. Using such a technique, it is possible, at least to some degree, to find the portions of image data that deserve attention.

However, using the conventional saliency map described above involves the following problems. First, because a saliency map shows the distribution of saliency within the image data, it does not necessarily extract regions that are meaningful to the viewer. Second, when a saliency map is generated for wide-angle image data such as an omnidirectional image, the data is first converted into a rectangular image such as an equirectangular image and then processed. During conversion to an equirectangular image, however, a significant portion may be split between the left and right edges or between the top and bottom edges, so that the intended saliency map is not obtained, which makes it difficult to detect the region of interest.

The present invention has been made in view of the above circumstances, and one of its objects is to provide an image processing device, an image reproducing device, and a program that can detect a region of interest and can perform processing suited to wide-angle image data.

One aspect of the present invention that solves the problems of the conventional example is an image processing device including: acquisition means for acquiring image data to be processed; candidate region extraction means for extracting, from the acquired image data, a plurality of candidate regions that are candidates for a region of interest satisfying a predetermined condition; saliency map estimation means for estimating, based on the acquired image data, saliency map information related to the image data; and determination means for determining a region of interest in the acquired image data based on the saliency map and information on the candidate regions.

According to this image processing device, a region of interest can be detected, and processing suited to wide-angle image data can be performed.

The drawings show, in order: a block diagram showing a configuration example of an image processing device according to an embodiment of the present invention; a functional block diagram showing an example of the image processing device; an explanatory diagram showing an example of the virtual celestial sphere used by the image processing device and its coordinate system; three flowcharts, each showing an operation example of the image processing device; and an explanatory diagram showing an example of the output of information generated by the image processing device.

An embodiment of the present invention will be described with reference to the drawings. As illustrated in FIG. 1, an image processing device 1 according to an embodiment of the present invention includes a control unit 11, a storage unit 12, an operation unit 13, a display unit 14, and an interface unit 15.

The control unit 11 is a program-controlled device such as a CPU and operates according to a program stored in the storage unit 12. In this embodiment, the control unit 11 acquires image data to be processed and extracts, from the acquired image data, a plurality of candidate regions that are candidates for a region of interest satisfying a predetermined condition. The control unit 11 also executes two kinds of processing: machine learning processing, which takes the acquired image data as input and trains a neural network for estimating saliency map information related to that image data; and estimation processing, which uses the neural network, once it has been trained by the machine learning processing to estimate the saliency map information corresponding to input image data, to estimate the saliency map information for the acquired image data.

Here, the saliency map information is image data having the same shape as the image data to be processed, in which the pixel value (here, the luminance) of each pixel is a scalar value representing the saliency of the image portion represented by the corresponding pixel (or pixel group) of the image data to be processed.

The control unit 11 then determines a region of interest in the acquired image data based on the saliency map obtained by the estimation processing and the information on the candidate regions. The control unit 11 stores the information on the determined region of interest in the storage unit 12 in association with the image data. These operations of the control unit 11 are described in detail later.

The storage unit 12 is a memory device or a disk device and holds the program executed by the control unit 11. This program may be provided stored on a computer-readable, non-transitory recording medium and then stored in the storage unit 12. The storage unit 12 also operates as a work memory for the control unit 11.

The operation unit 13 is a keyboard, a mouse, or the like; it receives operations from the viewer of the image data and outputs the content of each operation to the control unit 11. The display unit 14 is a display or the like and displays information according to instructions input from the control unit 11.

The interface unit 15 includes a network interface, a USB (Universal Serial Bus) interface, and the like; it receives various information such as image data from external computer equipment and outputs it to the control unit 11. The interface unit 15 also outputs information to a designated output destination (computer equipment, a storage device, or the like) according to instructions input from the control unit 11.

Next, the operation of the control unit 11 of this embodiment will be described. By executing the program stored in the storage unit 12, the control unit 11 is functionally configured, as illustrated in FIG. 2, to include a receiving unit 21, an acquisition unit 22, a candidate region extraction unit 23, a saliency map learning processing unit 24, a saliency map estimation unit 25, a region determination unit 26, and an output unit 27.

The receiving unit 21 receives the data to be subjected to machine learning processing or estimation processing. The target of the machine learning processing is the saliency map estimation unit 25, and the data used for machine learning processing consists of image data associated with the information of the saliency map that is the correct answer when that image data is input (hereinafter called teacher map data). The data subjected to estimation processing is image data. When the image data included in the data for machine learning processing is moving image data, that data is assumed to include teacher map data corresponding to each frame of the moving image data.

The image data included in the data for machine learning processing or in the data for estimation processing is assumed to be one of the following kinds of data: moving image data, still image data of an omnidirectional image, moving image data of omnidirectional images, and so on.

The acquisition unit 22 acquires at least one piece of image data or the like to be processed from the data received by the receiving unit 21. The processing of the acquisition unit 22 may differ depending on the type of data received by the receiving unit 21 and on the kind of processing (machine learning processing or estimation processing).

[For wide-angle image data]
For example, if the image data received by the receiving unit 21 is wide-angle image data projected onto a range exceeding at least a hemisphere of the celestial sphere (including an omnidirectional image projected onto the entire celestial sphere), such as still image data of an omnidirectional image, the acquisition unit 22 performs the following processing on the image projected onto the celestial sphere. For wide-angle image data that does not cover the entire celestial sphere, the wide-angle image data is projected onto a virtual celestial sphere, and the portions with no image may be set to a predetermined pixel value.

That is, in either kind of processing, the acquisition unit 22 treats the virtual celestial sphere onto which the wide-angle image data is projected, as illustrated in FIG. 3, as follows. With the center of the celestial sphere as the origin (the point where the X, Y, and Z axes intersect), it takes as the central meridian C, for example, the semicircle in the positive X direction where the YZ plane and the celestial sphere intersect. Using as parameters this central meridian C and a predetermined standard parallel D, it converts the wide-angle image data projected onto the celestial sphere into a rectangular equirectangular image. The standard parallel D is the circle formed by cutting the celestial sphere with a plane Z = z parallel to the XY plane, where z is a predetermined value in the range 0 ≤ z ≤ r and r is the radius of the celestial sphere (the same applies in the examples below). When performing estimation processing, for example, the acquisition unit 22 may determine the standard parallel and central meridian parameters according to the viewer's operation. As one example, the acquisition unit 22 sets the standard parallel and central meridian parameters so that the line-of-sight direction specified by the viewer becomes the center of the converted equirectangular image, and then performs the conversion to the equirectangular image.
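As a rough illustration of the conversion described above, the following Python sketch maps a direction on the celestial sphere to pixel coordinates in an equirectangular image centered on a chosen central meridian, and back. The function names, the longitude/latitude convention (longitude measured in the XY plane from the positive X axis, latitude measured toward positive Z), and the image size are assumptions made for illustration. For simplicity the standard parallel is taken to be the equator (z = 0); the patent text does not prescribe these formulas.

```python
import math

def direction_to_lonlat(x, y, z):
    """Convert a 3D viewing direction to (longitude, latitude) in radians."""
    r = math.sqrt(x * x + y * y + z * z)
    return math.atan2(y, x), math.asin(z / r)

def sphere_to_equirect(lon, lat, lon0, width, height):
    """Map longitude/latitude to pixel coordinates (u, v).

    lon0 is the longitude of the central meridian, which lands on the
    horizontal center of the image. The standard parallel is assumed to
    be the equator, so both axes are linear in angle.
    """
    # Wrap the longitude difference into [-pi, pi).
    dlon = (lon - lon0 + math.pi) % (2.0 * math.pi) - math.pi
    u = (dlon / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v

def equirect_to_sphere(u, v, lon0, width, height):
    """Inverse mapping: pixel coordinates back to longitude/latitude."""
    dlon = (u / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - v / height) * math.pi
    lon = (dlon + lon0 + math.pi) % (2.0 * math.pi) - math.pi
    return lon, lat
```

To center the image on a viewer-specified line of sight under these assumptions, one would pass the longitude of that direction as `lon0`; the inverse function reflects the statement that pixel coordinates and coordinates on the celestial sphere are mutually convertible.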

At least when performing machine learning processing, the acquisition unit 22 may also rotate the direction of the central meridian of the virtual celestial sphere around the Z axis (the direction from the center of the celestial sphere toward the zenith) by angles θ1, θ2, … determined randomly from a uniform distribution. At each angle, it takes as the central meridian the meridian in the direction rotated from the positive X direction and, using each of these central meridians together with the predetermined standard parallel as parameters, converts the wide-angle image data into a rectangular equirectangular image, thereby obtaining a plurality of equirectangular images.

This processing corresponds to acquiring a plurality of pieces of image data to be processed while rotating the celestial sphere onto which the wide-angle image data is projected.

Furthermore, at least when performing machine learning processing, the acquisition unit 22 need not limit the rotation to the Z axis (horizontal rotation) as described above. It may also rotate the celestial sphere around each of the coordinate axes (which intersect at the origin, since the center of the celestial sphere is the origin) by mutually different angle sets (φ1, ψ1, θ1), (φ2, ψ2, θ2), …. Taking as the central meridian the meridian in the positive X direction after rotation by each angle set, and using this central meridian and the predetermined standard parallel after rotation by each angle set as parameters, it converts the wide-angle image data into rectangular equirectangular images, thereby obtaining a plurality of equirectangular images, each centered on a different line-of-sight direction.
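The three-axis rotation can be sketched as follows. The code draws angle sets (φ, ψ, θ) from a uniform distribution and computes where the positive X direction (the direction that becomes the image center) ends up after rotation. The rotation order (X, then Y, then Z), the angle range, and all function names are assumptions for illustration; the text does not specify them.

```python
import math
import random

def rot_x(v, a):
    """Rotate vector v about the X axis by angle a (radians)."""
    x, y, z = v
    c, s = math.cos(a), math.sin(a)
    return (x, c * y - s * z, s * y + c * z)

def rot_y(v, a):
    """Rotate vector v about the Y axis by angle a (radians)."""
    x, y, z = v
    c, s = math.cos(a), math.sin(a)
    return (c * x + s * z, y, -s * x + c * z)

def rot_z(v, a):
    """Rotate vector v about the Z axis by angle a (radians)."""
    x, y, z = v
    c, s = math.cos(a), math.sin(a)
    return (c * x - s * y, s * x + c * y, z)

def center_direction(phi, psi, theta):
    """Direction of the +X axis after rotating about X, Y, then Z.

    The order X -> Y -> Z is an assumption made for this sketch.
    """
    return rot_z(rot_y(rot_x((1.0, 0.0, 0.0), phi), psi), theta)

def sample_angle_sets(count, seed=None):
    """Draw `count` angle sets (phi, psi, theta), each uniform in [0, 2*pi)."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(0.0, 2.0 * math.pi) for _ in range(3))
            for _ in range(count)]
```

Each sampled angle set yields one center direction, and therefore one equirectangular view of the same wide-angle image.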

At least when performing machine learning processing, the acquisition unit 22 outputs the plurality of equirectangular images obtained by the above method as the image data to be processed. When performing estimation processing, the acquisition unit 22 may output only one rectangular equirectangular image obtained from the wide-angle image data. The coordinates of each pixel of an equirectangular image obtained here and the coordinates on the celestial sphere onto which the wide-angle image data is projected are mutually convertible.

Furthermore, when performing machine learning processing, the acquisition unit 22 also projects the teacher map data associated with the received image data, in the same manner as the image data, onto the same range of the virtual celestial sphere as the range onto which the associated image data is projected.

The acquisition unit 22 then uses the rotation angles θ1, θ2, … (or the angle sets (φ1, ψ1, θ1), (φ2, ψ2, θ2), …) of the celestial sphere onto which the image data was projected when the plurality of equirectangular images were obtained, and rotates the celestial sphere onto which the teacher map data is projected around each coordinate axis by the respective angles. Taking as the central meridian the meridian in the positive X direction after rotation by each angle (or each angle set), and using this central meridian and the predetermined standard parallel as parameters, the acquisition unit 22 converts the teacher map data into rectangular equirectangular images (hereinafter called teacher images, to distinguish them from the equirectangular images of the image data), thereby obtaining a plurality of teacher images. As already described, the standard parallel is the circle formed by cutting the celestial sphere with a plane Z = z parallel to the XY plane, where z is a predetermined value in the range 0 ≤ z ≤ r and r is the radius of the celestial sphere. When the sphere is additionally rotated by φ1, φ2, … around the X axis and by ψ1, ψ2, … around the Y axis, the standard parallel corresponding to each angle set is defined by Z = z after rotation by that angle set.

In this way, the acquisition unit 22 obtains a plurality of teacher images, one corresponding to each of the plurality of equirectangular images (each centered on a different line-of-sight direction) obtained from the wide-angle image data.
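For the horizontal-only case, the pairing of equirectangular images with teacher images can be sketched as below. The sketch relies on the fact that, for an equirectangular image whose standard parallel is the equator, rotating the sphere about the Z axis is equivalent to circularly shifting pixel columns. Images are represented as plain lists of rows, and the names `shift_columns` and `augment_pair` are hypothetical; this is an illustrative sketch, not the patent's implementation.

```python
import math
import random

def shift_columns(image, theta):
    """Circularly shift an equirectangular image (list of rows) by angle theta.

    The column that was theta radians to the right of the old center
    (rounded to the nearest pixel) moves to the center of the result.
    """
    width = len(image[0])
    k = round(theta / (2.0 * math.pi) * width) % width
    return [row[k:] + row[:k] for row in image]

def augment_pair(image, teacher_map, count, seed=None):
    """Generate `count` (theta, image, teacher image) triples.

    Each theta is drawn from a uniform distribution, and the same angle
    is applied to the image and to its teacher map so they stay aligned.
    """
    rng = random.Random(seed)
    triples = []
    for _ in range(count):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        triples.append((theta,
                        shift_columns(image, theta),
                        shift_columns(teacher_map, theta)))
    return triples
```

Applying the identical angle to both inputs is what keeps each teacher image valid as the correct answer for its rotated equirectangular image.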

[For moving image data]
When the image data included in the data received by the receiving unit 21 is moving image data, the acquisition unit 22 generates a series of still image data from the moving image data, regardless of whether machine learning processing or estimation processing is performed. This still image data may be obtained by extracting the key frame data (for example, I frames) included in the moving image data, or it may be the still image data of each frame obtained by playing back the moving image data. The acquisition unit 22 outputs the still image data (frames) at the plurality of playback points obtained in this way as the image data to be processed.

When performing machine learning processing, the acquisition unit 22 also extracts, from the data received by the receiving unit 21, the teacher map data corresponding to the frame at each playback point used as image data to be processed.

[For wide-angle moving image data]
Furthermore, when the data received by the receiving unit 21 is wide-angle image data projected onto a range exceeding at least a hemisphere of the celestial sphere, such as omnidirectional image data, and is also moving image data, the acquisition unit 22 may perform processing as follows.

When performing machine learning processing, the acquisition unit 22 in this example determines in advance, randomly from a uniform distribution, the rotation angles of the celestial sphere: θ1, θ2, … (when rotating only in the horizontal direction), or the angle sets (φ1, ψ1, θ1), (φ2, ψ2, θ2), ….

The acquisition unit 22 then either extracts the key frame data (for example, I frames) of the wide-angle images included in the moving image data to acquire still wide-angle image data, or acquires the (still) wide-angle image data of each frame at a plurality of playback points obtained by playing back the moving image data.

When performing estimation processing, the acquisition unit 22 projects each piece of still wide-angle image data thus obtained onto a virtual celestial sphere and sets the central meridian and standard parallel parameters so that a predetermined direction of the virtual celestial sphere (for example, a direction set by the viewer) becomes the center of the converted equirectangular image. It then converts the wide-angle image data projected onto the celestial sphere into a rectangular equirectangular image, and thereby obtains and outputs an equirectangular image corresponding to each key frame. When estimation processing is executed during playback of the moving image data, the acquisition unit 22, when generating an image based on a key frame, sets the central meridian and standard parallel parameters so that the direction the viewer has set at that time (for example, the direction designated as the viewer's front) becomes the center of the converted equirectangular image. It then converts the wide-angle image data projected onto the celestial sphere into a rectangular equirectangular image and outputs the converted image.

When performing machine learning processing, the acquisition unit 22 projects each piece of still wide-angle image data thus obtained onto a virtual celestial sphere and rotates that celestial sphere around the corresponding axes by the rotation angles determined above. Taking as the central meridian the meridian in the positive X direction after rotation by each angle (or each angle set), and using this central meridian and the predetermined standard parallel as parameters, the acquisition unit 22 converts the wide-angle image data projected onto the celestial sphere into rectangular equirectangular images, thereby obtaining a plurality of equirectangular images. As already described, the standard parallel is the circle formed by cutting the celestial sphere with a plane Z = z parallel to the XY plane, where z is a predetermined value in the range 0 ≤ z ≤ r and r is the radius of the celestial sphere. When the sphere is additionally rotated by φ1, φ2, … around the X axis and by ψ1, ψ2, … around the Y axis, the standard parallel corresponding to each angle set is defined by Z = z after rotation by that angle set. This processing corresponds to converting the wide-angle image data into rectangular image data while rotating the celestial sphere onto which it is projected, so as to acquire a plurality of pieces of image data to be processed.

The acquisition unit 22 outputs, as the image data to be processed, the plurality of equirectangular images obtained by the above method for each frame at the plurality of playback points. In this example as well, the coordinates of each pixel of an obtained equirectangular image and the coordinates on the celestial sphere from which it was derived are mutually convertible.

Furthermore, when performing machine learning processing, the acquisition unit 22 also projects the teacher map data associated with the received moving image data, in the same manner as the moving image data, onto the same range of the virtual celestial sphere as the range onto which the associated moving image data is projected.

The acquisition unit 22 then uses the rotation angles θ1, θ2, … (or the angle sets (φ1, ψ1, θ1), (φ2, ψ2, θ2), …) of the celestial sphere onto which the image data of each frame was projected when the plurality of equirectangular images were obtained for each frame, and rotates the celestial sphere onto which the teacher map data corresponding to that frame is projected around each coordinate axis by the respective angles. Taking as the standard meridian the meridian in the positive X direction after rotation by each angle (or each angle set), and using this standard meridian and the predetermined standard parallel as parameters, the acquisition unit 22 converts the teacher map data corresponding to the frame into rectangular equirectangular images (teacher images), thereby obtaining, for each frame, a plurality of teacher images corresponding to the image data to be processed. As already described, the standard parallel is the circle formed by cutting the celestial sphere with a plane Z = z parallel to the XY plane, where z is a predetermined value in the range 0 ≤ z ≤ r and r is the radius of the celestial sphere. When the sphere is additionally rotated by φ1, φ2, … around the X axis and by ψ1, ψ2, … around the Y axis, the standard parallel corresponding to each angle set is defined by Z = z after rotation by that angle set.

In this way, for each of the plurality of frames obtained from the wide-angle moving image data, the acquisition unit 22 obtains a plurality of teacher images, one corresponding to each of the plurality of equirectangular images (each centered on a different line-of-sight direction) of that frame.

[When the received image data is non-wide-angle still image data]
When the data received by the receiving unit 21 is still image data that has a relatively small angle of view and is not wide-angle, the acquisition unit 22 outputs the received data as it is as the image data to be processed.

In this example, when performing machine learning processing, the acquisition unit 22 outputs the teacher map data associated with the received image data.

[Processing after the image data to be processed is obtained]
The candidate region extraction unit 23 operates when the image processing device 1 performs estimation processing. From each piece of image data to be processed, it extracts a plurality of candidate regions Ik (k = 1, 2, …, n) that satisfy a predetermined condition, and outputs information representing the extracted candidate regions. The candidate regions Ik are extracted with overlap between regions allowed. For example, a candidate region Ii and a candidate region Ij (i ≠ j) may overlap each other.

As one example, the candidate region extraction unit 23 extracts the plurality of candidate regions using the method known as Selective Search (J.R. Uijlings et al., "Selective Search for object recognition", International journal of computer vision, Vol.104, No.2, pp.154-171 (2013)). The information representing a candidate region may include, for example, treating the candidate region as a rectangle, the coordinates (x, y) of its upper-left vertex on the image data to be processed and the height and width (h, w) of the candidate region.
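The candidate region information described above, together with one way of quantifying how much two candidates overlap, can be sketched as follows. The `Candidate` type and the helper names are hypothetical, and intersection-over-union (IoU) is used here only as one common overlap measure; the text states that overlap is allowed but does not name a specific measure.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    """Rectangular candidate region: upper-left vertex plus height and width."""
    x: int
    y: int
    h: int
    w: int

def intersection_area(a: Candidate, b: Candidate) -> int:
    """Area of the overlap between two rectangles (0 if they are disjoint)."""
    left = max(a.x, b.x)
    top = max(a.y, b.y)
    right = min(a.x + a.w, b.x + b.w)
    bottom = min(a.y + a.h, b.y + b.h)
    return max(0, right - left) * max(0, bottom - top)

def iou(a: Candidate, b: Candidate) -> float:
    """Intersection-over-union of two rectangles, in the range [0, 1]."""
    inter = intersection_area(a, b)
    union = a.h * a.w + b.h * b.w - inter
    return inter / union if union > 0 else 0.0
```

A value of 0 means the two candidate regions do not overlap at all, and a value of 1 means they coincide.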

This method is only one example, however, and the candidate region extraction unit 23 may extract candidate regions by other processing. An example of such other processing is described later.

The saliency map learning processing unit 24 operates during learning processing. Using the image data output by the acquisition unit 22 and the corresponding teacher map data, it performs machine learning on a neural network such as the convolutional network described in D. Martin, et al., "Panoramic convolutions for 360° single-image saliency prediction", CVPR Workshop on Computer Vision for Augmented and Virtual Reality, 2020. Through this machine learning, the neural network is brought into a state in which it has been trained to receive image data as input and to estimate and output its salient regions.

この機械学習の方法は広く知られた方法を採用できるので、ここでの詳しい説明を省略するが、本実施の形態において特徴的なことの一つは、全天球画像等の広角画像データとそれに対応する教師マップデータとを、広角画像データや教師マップデータを投影した天球を回転させて中心方向を互いに異ならせた複数の正距円筒画像へ変換することで、学習データの豊富化を図ったことである。 Widely known methods can be used for this machine learning, so a detailed description is omitted here. One characteristic feature of the present embodiment, however, is that the training data is enriched: wide-angle image data such as omnidirectional images, and the teacher map data corresponding to it, are converted into a plurality of equirectangular images whose center directions differ from one another, by rotating the celestial sphere onto which the wide-angle image data and the teacher map data are projected.

顕著性マップ推定部２５は、推定処理時に動作し、顕著性マップ学習処理部２４により機械学習したニューラルネットワークに、取得部２２が出力する、処理の対象となった各画像データを入力する。そして顕著性マップ推定部２５は、当該ニューラルネットワークが出力する、各画像データに対応する顕著性マップ情報を推定する。 The saliency map estimation unit 25 operates during estimation processing, and inputs each image data to be processed output by the acquisition unit 22 to the neural network machine-learned by the saliency map learning processing unit 24 . The saliency map estimation unit 25 then estimates saliency map information corresponding to each image data output by the neural network.

領域決定部２６は、推定処理時に動作し、顕著性マップ推定部２５が推定した顕著性マップ情報と、候補領域抽出部２３が生成した候補領域の情報とに基づいて、処理対象となった画像データ内の注目領域を決定する。一例としてこの領域決定部２６は、候補領域同士の重なり合いの程度と、顕著性マップ情報から得られる、候補領域内の顕著性の情報とに基づいて候補領域の集合を評価しつつ、候補領域を削減するなどして集合を補正し、注目領域を決定する。 The region determination unit 26 operates during the estimation process and determines the regions of interest in the image data to be processed, based on the saliency map information estimated by the saliency map estimation unit 25 and the candidate region information generated by the candidate region extraction unit 23. As an example, the region determination unit 26 evaluates a set of candidate regions based on the degree of overlap between candidate regions and on the saliency within each candidate region, obtained from the saliency map information. While doing so it corrects the set, for example by reducing the number of candidate regions, and thereby determines the regions of interest.

具体的に、顕著性マップ情報上で、候補領域Ｉkの内部に対応する顕著性を表す値の和をｇ（Ｉk）と表し、候補領域Ｉiと、Ｉjとの重複率（重なり合いの程度）をＩｏＵ（Ｉi，Ｉj）と表すとき、領域候補Ｉkの集合Ｓの評価値ＳＩｏＵを次のように規定しておく。 Specifically, let g(Ik) denote the sum, on the saliency map information, of the saliency values corresponding to the inside of the candidate region Ik, and let IoU(Ii, Ij) denote the overlap rate (degree of overlap) between the candidate regions Ii and Ij. The evaluation value SIoU of a set S of candidate regions Ik is then defined as follows.

領域決定部２６は、領域候補抽出部２３が抽出した候補領域の集合Ｒに含まれる候補領域Ｉiのそれぞれについて、当該候補領域Ｉiの内部に対応する顕著性を表す値の和ｇ（Ｉi）を求めておく。 For each candidate region Ii included in the set R of candidate regions extracted by the candidate region extraction unit 23, the region determination unit 26 first computes the sum g(Ii) of the saliency values corresponding to the inside of that candidate region Ii.
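The two quantities used above, g(I) and IoU(Ii, Ij), can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the saliency map is a 2-D NumPy array indexed as [row, column], that a candidate region is the tuple (x, y, h, w) described earlier, and that IoU has its usual meaning of intersection area divided by union area. The function names are hypothetical.

```python
import numpy as np

def saliency_sum(saliency_map, region):
    """g(I): sum of the saliency values inside a rectangle (x, y, h, w)."""
    x, y, h, w = region
    return float(saliency_map[y:y + h, x:x + w].sum())

def iou(a, b):
    """Intersection over union of two rectangles given as (x, y, h, w)."""
    ax, ay, ah, aw = a
    bx, by, bh, bw = b
    # Width and height of the overlap; zero when the rectangles are disjoint.
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```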

領域決定部２６は、候補領域の集合Ｒから、予め定めた条件に従って所定の数ｎだけの候補領域Ｉ1，Ｉ2，…Ｉnを抽出する。ここで条件は例えばｇ（Ｉi）の大きい順にｎ個の候補領域を抽出する条件などとする。領域決定部２６は、抽出したｎ個の候補領域Ｉ1，Ｉ2，…Ｉnの集合Ｓについて（１）式により、当該集合Ｓの評価値ＳＩｏＵ（Ｓ）を求める。 The region determination unit 26 extracts a predetermined number n of candidate regions I1, I2, ..., In from the candidate region set R according to a predetermined condition. The condition is, for example, that the n candidate regions are extracted in descending order of g(Ii). For the set S of the n extracted candidate regions I1, I2, ..., In, the region determination unit 26 obtains the evaluation value SIoU(S) of the set S using equation (1).

以下、領域決定部２６は、集合Ｓに抽出されていない候補領域Ｉjを、候補領域の集合Ｒから順次取り出して次の処理を実行する。すなわち領域決定部２６は、集合Ｓに含まれる候補領域Ｉ1，Ｉ2，…Ｉnのいずれかに置き換えて集合Ｓ′を生成する。ここで集合Ｓに含まれる候補領域のうち、置き換えの対象となる候補領域は、例えば一様乱数によりランダムに決定すればよい。 The region determination unit 26 then takes, one at a time, each candidate region Ij that has not been extracted into the set S from the candidate region set R, and performs the following process. That is, the region determination unit 26 generates a set S' by substituting Ij for one of the candidate regions I1, I2, ..., In included in the set S. The candidate region in the set S that is to be replaced may be chosen at random, for example using uniform random numbers.

領域決定部２６は、当該置き換えにより得られた集合Ｓ′について、（１）により評価値ＳＩｏＵ（Ｓ′）を求める。そして領域決定部２６は、集合Ｓの評価値ＳＩｏＵ（Ｓ）と、集合Ｓ′の評価値ＳＩｏＵ（Ｓ′）とを比較し、ＳＩｏＵ（Ｓ′）＜ＳＩｏＵ（Ｓ）となっていれば、集合Ｓ′を集合Ｓに置き換えて、集合Ｓを更新する。また領域決定部２６は、ＳＩｏＵ（Ｓ′）＜ＳＩｏＵ（Ｓ）でなければ、集合Ｓをそのままとする。 The region determination unit 26 obtains the evaluation value SIoU(S') from equation (1) for the set S' obtained by the replacement. It then compares the evaluation value SIoU(S) of the set S with the evaluation value SIoU(S') of the set S'. If SIoU(S') < SIoU(S), it updates the set S by replacing S with S'. If SIoU(S') < SIoU(S) does not hold, the region determination unit 26 leaves the set S as it is.

領域決定部２６は、候補領域の集合Ｒから次の候補領域Ｉjを取り出して、別の集合Ｓ′を生成し、上記処理を繰り返す。領域決定部２６は集合Ｒに含まれる（当初の集合Ｓに含まれる候補領域以外の）候補領域のすべてについて上記の処理を実行した後、当該実行後の集合Ｓに含まれる候補領域が表すｎ個の領域を注目領域として当該ｎ個の領域を特定する情報を出力する。 The region determination unit 26 takes the next candidate region Ij from the candidate region set R, generates another set S', and repeats the above process. After performing the process for all candidate regions in the set R (other than those included in the initial set S), the region determination unit 26 takes the n regions represented by the candidate regions in the resulting set S as the regions of interest, and outputs information specifying those n regions.
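The replacement search described above can be sketched as follows. This is a hedged illustration under stated assumptions: equation (1) is not reproduced in this text, so the evaluation function is passed in as the hypothetical parameter `score` rather than guessed at; the remaining candidates are visited in descending order of g, which the description does not prescribe; and the function and parameter names are illustrative only.

```python
import random

def select_regions(candidates, g_values, n, score, rng=None):
    """Pick n regions of interest by trial replacement.

    candidates : list of regions, e.g. (x, y, h, w) tuples
    g_values   : saliency sum g(I) for each candidate, same order
    score      : callable returning the evaluation value of a list of
                 regions; smaller is treated as better (stand-in for
                 equation (1), which is not given here)
    """
    rng = rng or random.Random()
    # Initial set S: the n candidates with the largest g(I).
    order = sorted(range(len(candidates)), key=lambda i: g_values[i], reverse=True)
    selected = order[:max(n, 0)]
    if not selected:
        return []
    best = score([candidates[i] for i in selected])
    # Try each remaining candidate Ij in place of a randomly chosen member of S.
    for j in order[len(selected):]:
        trial = list(selected)
        trial[rng.randrange(len(trial))] = j
        trial_score = score([candidates[i] for i in trial])
        if trial_score < best:  # keep S' only when it evaluates lower
            selected, best = trial, trial_score
    return [candidates[i] for i in selected]
```

Because the member of S to be replaced is chosen at random, repeated runs can return different sets; passing a seeded `random.Random` makes a run reproducible.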

出力部２７は、処理対象となった画像データまたはその元となったデータに関連付けて、領域決定部２６が出力する領域を特定する情報を記録する。具体的にこの出力部２７は、受入部２１が受け入れたデータの種類に応じて次のように処理を行う。 The output unit 27 records information specifying the area output by the area determining unit 26 in association with the image data to be processed or the original data thereof. Specifically, the output section 27 performs the following processing according to the type of data received by the receiving section 21 .

［広角画像データの場合］ [For wide-angle image data]
例えば受入部２１が受け入れたデータが広角画像データであれば、この出力部２７は、処理対象となった画像データごとに領域決定部２６が出力する、領域を特定する情報を用いて次のように処理を行う。この例では、領域決定部２６が出力する領域を特定する情報は、処理対象となった画像データ上の座標によって特定される。そこで出力部２７は、領域決定部２６が出力する領域を特定する情報を、広角画像データを投影する仮想的な天球上の領域の情報に変換する。この変換は天球面の画像を正距円筒画像に変換したときと逆の変換を行えばよい。 For example, if the data received by the receiving unit 21 is wide-angle image data, the output unit 27 performs the following process using the region-specifying information that the region determination unit 26 outputs for each piece of image data to be processed. In this example, the region-specifying information output by the region determination unit 26 is expressed in coordinates on the image data to be processed. The output unit 27 therefore converts this information into information on a region on the virtual celestial sphere onto which the wide-angle image data is projected. This conversion is simply the inverse of the conversion used to turn the image on the celestial sphere into an equirectangular image.

そして出力部２７は、当該変換後の領域の情報を、受入部２１が受け入れた広角画像データに関連付けて記録する。出力部２７は、処理対象となった画像データのそれぞれについて上記の処理を実行し、当該広角画像データを上記仮想的な天球に投影したときの当該天球上での各注目領域の情報を記録する。 Then, the output unit 27 records the information of the converted area in association with the wide-angle image data received by the receiving unit 21 . The output unit 27 executes the above processing for each of the image data to be processed, and records information of each attention area on the virtual celestial sphere when the wide-angle image data is projected onto the virtual celestial sphere. .
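As an illustration of the inverse conversion mentioned above, the sketch below maps equirectangular pixel coordinates to longitude and latitude on the sphere. It assumes, as a convention not stated in the description, that the image spans longitude −π to π from left to right with the central meridian at the middle column, and latitude π/2 to −π/2 from top to bottom. The function names are hypothetical, and a region is simplified to its two corner points.

```python
import math

def pixel_to_sphere(u, v, width, height):
    """Map equirectangular pixel coordinates (u, v) to (longitude, latitude) in radians."""
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    return lon, lat

def region_to_sphere(region, width, height):
    """Convert a rectangle (x, y, h, w) to the spherical coordinates of its corners."""
    x, y, h, w = region
    top_left = pixel_to_sphere(x, y, width, height)
    bottom_right = pixel_to_sphere(x + w, y + h, width, height)
    return top_left, bottom_right
```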

［動画像データの場合］ [For moving image data]
また、受入部２１が受け入れたデータが動画像データである場合、出力部２７は、処理対象となった画像データごとに領域決定部２６が出力する、領域を特定する情報と、各処理対象の画像データのフレーム番号（再生時の先頭フレームを「１」とした再生順を表す番号、再生時刻を表す情報となる）の情報とを関連付けた情報を生成する。そして出力部２７は、当該生成した情報を、受入部２１が受け入れた動画像データに関連付けて記録する。 When the data received by the receiving unit 21 is moving image data, the output unit 27 generates information that associates the region-specifying information output by the region determination unit 26 for each piece of image data to be processed with the frame number of that image data (a number indicating the reproduction order, with the first frame at reproduction numbered "1"; it serves as information indicating the reproduction time). The output unit 27 then records the generated information in association with the moving image data received by the receiving unit 21.

［動画の広角画像データの場合］ [For wide-angle moving image data]
さらに、受入部２１が受け入れたデータが全天球画像の静止画像データなど、天球の少なくとも半球を超える範囲に投影される広角画像データであって、かつ動画像データである場合は、この出力部２７は、次のように処理を行う。 Furthermore, when the data received by the receiving unit 21 is wide-angle image data projected onto a range exceeding at least a hemisphere of the celestial sphere, such as still image data of an omnidirectional image, and is also moving image data, the output unit 27 performs the following process.

この例では出力部２７は、処理対象となった画像データごとに領域決定部２６が出力する、領域を特定する情報を、静止画の広角画像データの場合と同様に、仮想的な天球上の領域の情報に変換する。そして出力部２７は、当該処理対象となった画像データのフレーム番号を表す情報と当該変換して得た仮想的な天球上の領域の情報とを関連付けた情報を生成する。出力部２７は、ここで生成した情報を、受入部２１が受け入れたデータに関連付けて記録する。 In this example, the output unit 27 outputs the information specifying the area output by the area determination unit 26 for each image data to be processed, as in the case of wide-angle image data of a still image, on a virtual celestial sphere. Convert to area information. Then, the output unit 27 generates information that associates the information representing the frame number of the image data to be processed with the information on the virtual celestial sphere obtained by the conversion. The output unit 27 records the information generated here in association with the data received by the receiving unit 21 .

［広角でない静止画像データである場合］ [For non-wide-angle still image data]
また受入部２１が受け入れたデータが、比較的画角の小さい、広角でない静止画像データである場合は、出力部２７は、処理対象となった画像データに基づいて領域決定部２６が出力する、領域を特定する情報をそのまま、受入部２１が受け入れた静止画像のデータに関連付けて記録する。 If the data received by the receiving unit 21 is still image data with a relatively small angle of view that is not wide-angle, the output unit 27 records the region-specifying information that the region determination unit 26 outputs based on the image data to be processed, as it is, in association with the still image data received by the receiving unit 21.

［動作］ [Operation]
本実施の形態の画像処理装置１の動作例を、入力されるデータが広角画像の動画のデータである場合を例として以下、機械学習処理を行う際の動作と、推定処理時の動作とに分けて説明する。 An example of the operation of the image processing apparatus 1 of the present embodiment is described below, taking as an example the case where the input data is moving image data of wide-angle images, and treating the operation during the machine learning process and the operation during the estimation process separately.

［学習処理時］ [During the learning process]
画像処理装置１は、学習処理時には、機械学習処理の対象となるデータとして、画像データである広角画像データ（機械学習の際には動画である必要は必ずしもないが、以下ではこの画像データは広角画像の動画データであるものとする）と、この画像データを入力したときの正解となる顕著性マップの情報（以下、教師マップデータと呼ぶ）とを関連付けたデータの入力を受ける。 During the learning process, the image processing apparatus 1 receives, as the data subject to machine learning, data that associates wide-angle image data (which need not be moving images for machine learning, but is assumed below to be moving image data of wide-angle images) with the saliency map information that is the correct answer when that image data is input (hereinafter called teacher map data).

画像処理装置１は、広角画像を投影する天球の回転角度を複数、一様乱数によりランダムに決定する。この天球の回転角度は図３に例示した各軸周りの角度を組として（φ1，ψ1，θ1），（φ2，ψ2，θ2），…といったように定めておくものとする。 The image processing apparatus 1 randomly determines, using uniform random numbers, a plurality of rotation angles for the celestial sphere onto which the wide-angle image is projected. Each rotation of the celestial sphere is defined as a set of angles about the axes illustrated in FIG. 3, such as (φ1, ψ1, θ1), (φ2, ψ2, θ2), and so on.

画像処理装置１は、受け入れたデータのうち、動画像データに含まれるキーフレームのデータ（例えばＩフレーム）を抽出し、静止画像の広角画像データを複数取得する。画像処理装置１は、取得した静止画像の広角画像データのそれぞれについて、仮想的な天球に投影した天球画像データを得る。 The image processing apparatus 1 extracts key frame data (for example, I frame) included in the moving image data from the received data, and acquires a plurality of wide-angle image data of still images. The image processing apparatus 1 obtains celestial image data projected onto a virtual celestial sphere for each of the acquired wide-angle image data of still images.

また画像処理装置１は、受け入れたデータのうち、動画像データから抽出した各キーフレームの静止画像の広角画像に対応する教師マップデータをそれぞれ取得する。画像処理装置１は、取得した教師マップデータのそれぞれについて、広角画像データと同様に、仮想的な天球に投影したデータを得る。 The image processing apparatus 1 also acquires teacher map data corresponding to the wide-angle still images of the key frames extracted from the moving image data, among the received data. The image processing apparatus 1 obtains data projected onto a virtual celestial sphere for each acquired teacher map data, similarly to the wide-angle image data.

画像処理装置１は、画像データと、教師マップデータとを投影した仮想的な天球をそれぞれ、先に決定した各回転角度で、対応する軸周りに回転し、当該角度（または当該角度の組）で回転した後のＸ軸正の方向の子午線を中央子午線とし、この中央子午線と所定の標準緯線（各角度または各角度の組で回転した後の天球を、Ｚ＝ｚであるＸＹ面に平行な面（ただしｚは０≦ｚ≦ｒの範囲で予め定めた値であり、ここでｒは天球の半径とする）で切ってできる円）とをパラメータとして、当該天球に投影された広角画像データや教師マップデータを、矩形状の正距円筒画像に変換する。なお、正距円筒画像の各画素の座標と、そのもととなった天球上の座標とは相互に変換可能となっているものとする。 The image processing apparatus 1 rotates each of the virtual celestial spheres onto which the image data and the teacher map data are projected by each of the previously determined rotation angles about the corresponding axes. Taking the meridian in the positive X-axis direction after rotation by that angle (or set of angles) as the central meridian, and using this central meridian and a predetermined standard parallel as parameters, it converts the wide-angle image data and the teacher map data projected onto the celestial sphere into rectangular equirectangular images. The standard parallel here is the circle obtained by cutting the celestial sphere, after rotation by each angle or set of angles, with a plane parallel to the XY plane at Z = z, where z is a predetermined value in the range 0 ≤ z ≤ r and r is the radius of the celestial sphere. The coordinates of each pixel of the equirectangular image and the coordinates on the celestial sphere from which it originates are assumed to be mutually convertible.

画像処理装置１は、この正距円筒画像を得る処理を、先に決定した各回転角度について実行し、広角画像と対応する教師マップデータについて、それぞれ複数の正距円筒画像を得る。 The image processing apparatus 1 executes the process of obtaining this equirectangular image for each of the previously determined rotation angles, and obtains a plurality of equirectangular images for each of the wide-angle image and the corresponding teacher map data.

画像処理装置１は、この処理を、キーフレームの広角画像及び対応する教師マップデータのそれぞれについて繰り返して実行する。これによりキーフレームごとの広角画像データを天球に投影し、当該天球に投影したキーフレームごとの広角画像データを、互いに異なるパラメータで変換して得た複数の正距円筒画像と、対応する教師マップデータの正距円筒画像とを得る。なお、ここでは各キーフレームを投影した天球を、同じ回転角度の集合を用いて回転させているが、キーフレームごとに互いに異なる回転角度の集合を用いて回転させることとしてもよい。 The image processing apparatus 1 repeatedly executes this process for each of the wide-angle image of the key frame and the corresponding teacher map data. As a result, the wide-angle image data for each keyframe is projected onto the celestial sphere, and the wide-angle image data for each keyframe projected onto the celestial sphere is transformed with mutually different parameters to obtain a plurality of equirectangular images and the corresponding teacher map. Obtain an equirectangular image of the data. Here, the celestial sphere onto which each keyframe is projected is rotated using the same set of rotation angles, but it may be rotated using a different set of rotation angles for each keyframe.
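The rotation-based enrichment of training data described above can be sketched with NumPy as follows. Several points are assumptions made only for illustration: φ, ψ and θ are taken as rotations about the X, Y and Z axes applied in that order (FIG. 3 is not reproduced here), the equirectangular image spans the full sphere with the standard parallel at the equator, and nearest-neighbour sampling is used for brevity. The function names are hypothetical.

```python
import numpy as np

def rotation_matrix(phi, psi, theta):
    """Rotation about X by phi, then Y by psi, then Z by theta (assumed order)."""
    cx, sx = np.cos(phi), np.sin(phi)
    cy, sy = np.cos(psi), np.sin(psi)
    cz, sz = np.cos(theta), np.sin(theta)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return rz @ ry @ rx

def rotate_equirect(img, rot):
    """Resample an equirectangular image as seen after rotating the sphere by rot."""
    h, w = img.shape[:2]
    # Longitude and latitude of every output pixel centre.
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi
    lon, lat = np.meshgrid(lon, lat)
    vec = np.stack([np.cos(lat) * np.cos(lon),
                    np.cos(lat) * np.sin(lon),
                    np.sin(lat)], axis=-1)
    # Row-vector product applies the inverse rotation: where each output
    # direction was located on the sphere before it was rotated.
    src = vec @ rot
    src_lon = np.arctan2(src[..., 1], src[..., 0])
    src_lat = np.arcsin(np.clip(src[..., 2], -1.0, 1.0))
    u = np.floor((src_lon + np.pi) / (2.0 * np.pi) * w).astype(int) % w
    v = np.clip(np.floor((np.pi / 2.0 - src_lat) / np.pi * h).astype(int), 0, h - 1)
    return img[v, u]
```

To keep each image aligned with its teacher map, the same rotation matrix would be applied to both, for example with angles drawn from `np.random.default_rng().uniform(0.0, 2.0 * np.pi, size=3)`.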

そして画像処理装置１は、ここで得た画像データと、対応する教師マップデータとの組を用いて、例えばD.Martin, et al., “Panoramic convolutions for 360 single-image saliency prediction”, CVPR Workshop on Computer Vision for Augmented and Virtual Reality, 2020にある畳み込みネットワークなどのニューラルネットワークを機械学習する。そしてこの機械学習により、上記ニューラルネットワークを、広角画像データの入力を受けて、その顕著な領域を推定した顕著性マップ情報を出力するよう機械学習した状態とする。 Using the pairs of image data obtained here and the corresponding teacher map data, the image processing apparatus 1 then trains a neural network, for example the convolutional network described in D. Martin, et al., “Panoramic convolutions for 360 single-image saliency prediction”, CVPR Workshop on Computer Vision for Augmented and Virtual Reality, 2020. Through this training, the neural network is brought into a state in which it receives wide-angle image data as input and outputs saliency map information estimating the salient regions of that data.

この機械学習処理の後、画像処理装置１は、推定処理を実行可能となる。推定処理を行う際には、画像処理装置１は、広角画像の動画のデータとして、例えば全天球画像の動画像データを受け入れる。画像処理装置１は、当該受け入れたデータから、処理対象となる画像データを次のようにして取得する。 After this machine learning process, the image processing apparatus 1 can execute the estimation process. When performing the estimation process, the image processing apparatus 1 receives moving image data of, for example, omnidirectional images as data of moving images of wide-angle images. The image processing apparatus 1 acquires image data to be processed from the received data as follows.

画像処理装置１は、図４に例示するように、動画像データに含まれる広角画像のキーフレームのデータ（例えばＩフレーム）を抽出して、静止画像の広角画像データを取得し（Ｓ１１）、得られた静止画像の広角画像データ（ここでは全天球画像としている）のそれぞれについて、仮想的な天球に投影する（Ｓ１２）。 As illustrated in FIG. 4, the image processing apparatus 1 extracts wide-angle image key frame data (for example, I frame) included in the moving image data to acquire wide-angle image data of a still image (S11), Each of the obtained wide-angle image data (here, omnidirectional image) of the still image is projected onto a virtual celestial sphere (S12).

画像処理装置１は、全天球画像を投影した仮想的な天球のＸ軸正の方向の子午線を中央子午線とし、この中央子午線と所定の標準緯線とをパラメータとして上記天球に投影された広角画像データを、矩形状の正距円筒画像に変換する（Ｓ１３）。なお、正距円筒画像の各画素の座標と、そのもととなった天球上の座標とは相互に変換可能となっているものとする。 The image processing apparatus 1 uses the central meridian as the meridian in the positive direction of the X axis of the virtual celestial sphere on which the omnidirectional image is projected, and uses this central meridian and a predetermined standard parallel as parameters to create a wide-angle image projected onto the celestial sphere. The data is converted into a rectangular equirectangular image (S13). It is assumed that the coordinates of each pixel of the equirectangular image and the original coordinates on the celestial sphere can be mutually converted.

画像処理装置１は、ステップＳ１３で得た正距円筒画像を、処理対象の画像データとして、処理Ｓ１１で天球画像データとしたキーフレームを特定する情報（例えばフレーム番号など）とともに出力する（Ｓ１４）。 The image processing apparatus 1 outputs the equirectangular image obtained in step S13 as image data to be processed together with information (for example, a frame number) specifying the key frame used as the celestial image data in step S11 (S14). .

画像処理装置１は、処理Ｓ１１で抽出したキーフレームのデータのそれぞれについて、処理Ｓ１３，Ｓ１４の処理を繰り返して実行する。これによりキーフレームごとに生成した正距円筒画像を、処理対象の画像データとして得る。 The image processing apparatus 1 repeatedly executes the processes S13 and S14 for each of the key frame data extracted in the process S11. Thus, an equirectangular image generated for each key frame is obtained as image data to be processed.

また画像処理装置１は、図５に例示するように、処理の対象となった複数のフレームに対応する正距円筒画像のそれぞれを順次選択し（Ｓ２１）、既に述べたセレクティブサーチの方法を用いて、選択した正距円筒画像から候補領域Ｉk（ｋ＝１，２，…ｎ）を複数抽出する（Ｓ２２）。また画像処理装置１は、当該選択した正距円筒画像に係る顕著性マップ情報を、機械学習済みのニューラルネットワーク等を用いて推定する（Ｓ２３）。 Further, as illustrated in FIG. 5, the image processing apparatus 1 sequentially selects each of the equirectangular images corresponding to the plurality of frames to be processed (S21), and uses the selective search method described above. Then, a plurality of candidate areas Ik (k=1, 2, . . . n) are extracted from the selected equirectangular image (S22). The image processing apparatus 1 also estimates saliency map information related to the selected equirectangular image using a machine-learned neural network or the like (S23).

そして画像処理装置１は、処理Ｓ２２，Ｓ２３で得られた候補領域の情報と顕著性マップとに基づいて、処理対象となった画像データのそれぞれから注目領域を決定する（Ｓ２４）。 Then, the image processing apparatus 1 determines a region of interest from each of the image data to be processed, based on the candidate region information and the saliency map obtained in steps S22 and S23 (S24).

この処理Ｓ２４の処理では、画像処理装置１は、図６に例示するように、処理Ｓ２２で抽出した候補領域の集合Ｒに含まれる候補領域Ｉiのそれぞれについて、当該候補領域Ｉiの内部に対応する、処理Ｓ２３で求められた顕著性を表す値の和ｇ（Ｉi）を求める（Ｓ２４１）。 In step S24, as illustrated in FIG. 6, for each candidate region Ii included in the set R of candidate regions extracted in step S22, the image processing apparatus 1 computes the sum g(Ii) of the saliency values obtained in step S23 that correspond to the inside of that candidate region Ii (S241).

また画像処理装置１は、候補領域の集合Ｒから、ｇ（Ｉi）の大きい順に所定の数ｎだけの候補領域Ｉ1，Ｉ2，…Ｉnを抽出する（Ｓ２４２）。画像処理装置１は、抽出したｎ個の候補領域Ｉ1，Ｉ2，…Ｉnの集合Ｓについて（１）式： The image processing apparatus 1 also extracts a predetermined number n of candidate regions I1, I2, ..., In from the candidate region set R in descending order of g(Ii) (S242). For the set S of the n extracted candidate regions I1, I2, ..., In, the image processing apparatus 1 uses equation (1):

により、当該集合Ｓの評価値ＳＩｏＵ（Ｓ）を求め（Ｓ２４３）、現在の集合Ｓの評価値ＳＩｏＵ（Ｓ）とする。 to obtain the evaluation value SIoU(S) of the set S (S243), which it takes as the current evaluation value SIoU(S) of the set S.

次に画像処理装置１は、候補領域の集合Ｒに属する候補領域のうち、集合Ｓ（当初の集合Ｓ）に属していない候補領域Ｉjを順次選択して（Ｓ２４４）、当該選択した候補領域Ｉjを、集合Ｓに含まれる候補領域Ｉ1，Ｉ2，…Ｉnのいずれかに置き換えて集合Ｓ′を生成する（Ｓ２４５）。ここで集合Ｓに含まれる候補領域のうち、置き換えの対象となる候補領域は、例えば一様乱数によりランダムに決定する。 Next, from among the candidate regions belonging to the candidate region set R, the image processing apparatus 1 selects in turn each candidate region Ij that does not belong to the set S (the initial set S) (S244), and generates a set S' by substituting the selected candidate region Ij for one of the candidate regions I1, I2, ..., In included in the set S (S245). The candidate region in the set S that is to be replaced is chosen at random, for example using uniform random numbers.

画像処理装置１は、当該置き換えにより得られた集合Ｓ′について、（１）式により評価値ＳＩｏＵ（Ｓ′）を求める。そして画像処理装置１は、現在の集合Ｓの評価値ＳＩｏＵ（Ｓ）と、処理Ｓ２４５で求めた集合Ｓ′の評価値ＳＩｏＵ（Ｓ′）とを比較して、集合Ｓ′の評価値ＳＩｏＵ（Ｓ′）がより小さいか否かを判断する（Ｓ２４６）。ここでＳＩｏＵ（Ｓ′）＜ＳＩｏＵ（Ｓ）となっていれば（Ｓ２４６：Ｙｅｓ）、集合Ｓ′を集合Ｓとして置き換えて、集合Ｓを更新する（Ｓ２４７）。また、処理Ｓ２４５で求めた集合Ｓ′の評価値ＳＩｏＵ（Ｓ′）を、新たに現在の集合Ｓの評価値ＳＩｏＵ（Ｓ）とする（Ｓ２４８：評価値の更新）。 The image processing apparatus 1 obtains the evaluation value SIoU(S') from equation (1) for the set S' obtained by the replacement. It then compares the current evaluation value SIoU(S) of the set S with the evaluation value SIoU(S') of the set S' generated in step S245, and judges whether SIoU(S') is the smaller of the two (S246). If SIoU(S') < SIoU(S) (S246: Yes), it updates the set S by adopting the set S' as the new set S (S247). It also takes the evaluation value SIoU(S') of the set S' as the new current evaluation value SIoU(S) of the set S (S248: update of the evaluation value).

一方、画像処理装置１は、処理Ｓ２４６において、ＳＩｏＵ（Ｓ′）＜ＳＩｏＵ（Ｓ）でなければ（Ｓ２４６：Ｎｏ）、集合Ｓをそのままとする。 On the other hand, if SIoU(S')<SIoU(S) is not satisfied in step S246 (S246: No), the image processing apparatus 1 leaves the set S as it is.

画像処理装置１は、処理Ｓ２４４からＳ２４８までの処理を候補領域の集合Ｒに属する候補領域のうち、当初の集合Ｓに属していない候補領域Ｉjのそれぞれについて繰り返し実行し、すべての当該候補領域Ｉjについて実行した後で得られている集合Ｓに含まれる候補領域が表すｎ個の領域を注目領域として当該ｎ個の領域を特定する情報を得る。 The image processing apparatus 1 repeatedly executes the processes from S244 to S248 for each of the candidate areas Ij that do not belong to the initial set S among the candidate areas belonging to the set R of the candidate areas. The n regions represented by the candidate regions included in the set S obtained after the above are set as the regions of interest, and information specifying the n regions is obtained.

画像処理装置１は、図５の処理に戻り、処理対象となった画像データについて定めた注目領域の情報を記録する（Ｓ２５）。このとき画像処理装置１は、当該処理対象となった画像データのもととなったキーフレームを特定する情報を、注目領域の情報に関連付けて記録しておく。 The image processing apparatus 1 returns to the process of FIG. 5 and records the information of the attention area determined for the image data to be processed (S25). At this time, the image processing apparatus 1 records the information specifying the key frame that is the source of the image data to be processed in association with the information of the attention area.

画像処理装置１は、処理Ｓ２１からＳ２５の処理を繰り返して実行し、すべての処理対象の画像データについての処理を終了すると、記録したキーフレームを特定する情報ごとに、当該情報に関連付けて記録した注目領域の情報を取り出す。情報処理装置１は、当該取り出した注目領域の情報を、キーフレームを天球に投影したときの天球上の座標の情報へ変換し、当該天球上の座標で表される注目領域の情報を、キーフレームを特定する情報（例えばキーフレームのフレーム番号などであり、キーフレームを再生する再生時刻に関わる情報となる）に関連付けて、注目領域データベースとして記憶部１２に格納する（Ｓ２６）。 The image processing apparatus 1 repeatedly executes the processing from S21 to S25, and when the processing for all image data to be processed is completed, each piece of information specifying the recorded key frame is recorded in association with the information. Extract the information of the region of interest. The information processing apparatus 1 converts the extracted information of the attention area into information of the coordinates on the celestial sphere when the key frame is projected onto the celestial sphere, and converts the information of the attention area represented by the coordinates on the celestial sphere to the key. It is stored in the storage unit 12 as a region-of-interest database in association with information specifying a frame (for example, the frame number of a keyframe, which is information related to the reproduction time of the keyframe) (S26).

そして画像処理装置１は、この注目領域データベースと、入力された全天球画像の動画像のデータとを関連付けて出力する（Ｓ２７）。 Then, the image processing apparatus 1 associates this attention area database with the input moving image data of the omnidirectional image and outputs them (S27).

［再生処理］ [Reproduction process]
画像のデータを再生する情報処理装置（パーソナルコンピュータ等であり、本発明の画像再生装置に相当する）は、このようにして画像処理装置１が生成した注目領域の情報（注目領域データベース）を用いて、次のような処理を行う。情報処理装置は、全天球画像の動画のデータに基づいて、逐次的に表示する一連の静止画像（フレーム）を再生する。 An information processing apparatus that reproduces image data (a personal computer or the like, corresponding to the image reproducing device of the present invention) performs the following process using the region-of-interest information (region-of-interest database) generated in this way by the image processing apparatus 1. The information processing apparatus reproduces a series of still images (frames) to be displayed in sequence, based on the moving image data of omnidirectional images.

ここで各フレームは全天球画像となっているので、情報処理装置は、各フレームの全天球画像を逐次的に仮想的な天球上に投影しつつ、当該天球の中心に配した仮想的なカメラから見た画像をレンダリングして表示する。この処理は、全天球画像を表示する処理として広く知られた方法を採用できるので、ここでの詳しい説明は省略する。 Here, since each frame is an omnidirectional image, the information processing device sequentially projects the omnidirectional image of each frame onto a virtual celestial sphere and creates a virtual image centered on the celestial sphere. Render and display the image seen from the camera. Since this process can employ a method widely known as a process for displaying an omnidirectional image, a detailed description thereof will be omitted here.

また情報処理装置は、各フレームの投影像をレンダリングする際に、注目領域データベースに、当該フレームのフレーム番号に関連付けられた注目領域の情報があれば、当該情報で表される天球上の領域を囲む枠の図形画像（図７（Ｘ））を、レンダリングしたフレームの画像に重ね合わせて描画して、表示する。図７では、道路走行中の車両から撮像した動画の天球画像データにおいて、左側に並ぶ建物のうち、一つの建物の入り口付近に注目領域（Ｘ）が設定された例を示している。 Further, when rendering the projected image of each frame, if the attention area database has information on the attention area associated with the frame number of the frame, the information processing apparatus renders the area on the celestial sphere represented by the information. The graphic image of the enclosing frame (FIG. 7(X)) is superimposed on the rendered image of the frame and displayed. FIG. 7 shows an example in which a region of interest (X) is set near the entrance of one of the buildings lined up on the left side in celestial image data of a moving image taken from a vehicle traveling on the road.

なお、広角画像データを表示する情報処理装置は、仮想的な天球の中心に配したカメラの視線方向など、画角を、視聴者の指示を受けて変更することとしてよい。これにより視聴者は天球上の種々の箇所を参照できるようになる。 Note that the information processing device that displays wide-angle image data may change the angle of view, such as the line-of-sight direction of a camera placed at the center of the virtual celestial sphere, in response to an instruction from the viewer. This allows the viewer to refer to different points on the celestial sphere.

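上述の例では、キーフレームについてのみ、注目領域の情報が関連付けられているので、動画の再生を行う情報処理装置は、最後に再生したキーフレーム（再生時刻ｔiの時点で再生したフレームとする）に関連付けられていた注目領域の情報に基づいて、当該情報で表される天球上の領域を囲む枠の図形画像を、レンダリングした当該キーフレーム以降に再生する各フレーム（再生時刻ｔi+1，ｔi+2，…で再生される各フレーム）の画像に重ね合わせて描画してもよい。 In the above example, region-of-interest information is associated only with key frames. The information processing apparatus that reproduces the moving image may therefore use the region-of-interest information associated with the most recently reproduced key frame (taken to be the frame reproduced at reproduction time ti), and draw the graphic image of a frame surrounding the region on the celestial sphere represented by that information, superimposed on the rendered image of each frame reproduced after that key frame (each frame reproduced at reproduction times ti+1, ti+2, ...).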

このようにすると、あるキーフレームＫjが再生されてから次のキーフレームＫj+1が再生されるまでの間は、キーフレームＫjを特定する情報に関連付けられていた注目領域の情報に基づく表示が行われることとなる。 In this way, the display based on the information of the attention area associated with the information specifying the keyframe Kj is displayed from the time when a certain keyframe Kj is played until the time when the next keyframe Kj+1 is played. It will be done.
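The lookup of the most recently reproduced key frame can be sketched as follows. This is an illustrative assumption about the data layout, not something the description specifies: the region-of-interest database is modelled as a dictionary keyed by key frame number, alongside a sorted list of those numbers. The names are hypothetical.

```python
from bisect import bisect_right

def regions_for_frame(frame_no, keyframe_nos, regions_by_keyframe):
    """Return the regions of the latest key frame at or before frame_no.

    keyframe_nos        : sorted list of key frame numbers
    regions_by_keyframe : dict mapping key frame number -> list of regions
    """
    idx = bisect_right(keyframe_nos, frame_no) - 1
    if idx < 0:
        return []  # no key frame has been reproduced yet
    return regions_by_keyframe[keyframe_nos[idx]]
```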

また、注目領域の情報の表示方法はこの例だけに限られない。広角画像データを表示する情報処理装置（本発明の画像再生装置に相当する）は、各キーフレームを特定する情報に関連付けられた注目領域の情報（各キーフレームの再生時点における注目領域の情報）を参照する。情報処理装置は、当該参照した情報で特定される注目領域のそれぞれについて、当該注目領域に係るフレームを再生するべき時点（ここではその注目領域に関連付けられた情報で特定されるキーフレームの再生時刻）ｔより前の、所定の方法で定めた時点ｔ－Δｔ（ただしΔｔ＞０）で再生されるべきフレーム（キーフレームには限られない）を再生する際に、上記参照した情報で特定される注目領域に関する情報を表示してもよい。この例では、注目領域となるべき画像が撮像されているキーフレームの表示より時刻Δｔだけ前（あるいはフレーム番号が所定の数だけ前のフレームを再生する時点）に、当該注目領域の位置を表す枠の図形画像等が描画される。これにより、視聴者は注目領域が現れるより前に注目領域の位置を知ることができるようになる。 The method of displaying the region-of-interest information is not limited to this example. The information processing apparatus that displays wide-angle image data (corresponding to the image reproducing device of the present invention) refers to the region-of-interest information associated with the information specifying each key frame (that is, the region-of-interest information at the time each key frame is reproduced). For each region of interest specified by the referenced information, let t be the time at which the frame associated with that region of interest should be reproduced (here, the reproduction time of the key frame specified by the information associated with the region of interest). When reproducing the frame (not necessarily a key frame) that should be reproduced at a time t−Δt (where Δt > 0) determined by a predetermined method, the information processing apparatus may display information about that region of interest. In this example, a graphic image such as a frame indicating the position of the region of interest is drawn a time Δt before the display of the key frame in which the image that becomes the region of interest is captured (or when reproducing the frame whose frame number is a predetermined number earlier). This allows the viewer to know the position of the region of interest before it appears.

なお、ここでは広角画像の動画像を再生する例を示したが、注目領域を表す情報が関連付けられた静止画像データを表示する場合、当該静止画像データを表示する情報処理装置は、静止画像データが表す静止画像を表示出力するとともに、関連付けられている注目領域の情報に従い、当該情報で表される領域を囲む枠の図形画像を、表示した画像に重ね合わせて描画すればよい。 Although an example of reproducing a moving image of a wide-angle image is shown here, when still image data associated with information representing an attention area is displayed, an information processing device that displays the still image data A still image represented by is displayed and output, and a graphic image of a frame surrounding the area represented by the information is superimposed and drawn on the displayed image according to the information of the associated attention area.

なお、広角画像の静止画像データを表示する場合は、情報処理装置は、当該広角画像を、仮想的な所定の天球に投影して、当該天球の中心に配した仮想的なカメラから見た画像をレンダリングして表示する。そして情報処理装置は、関連付けられた注目領域の情報で表される天球上の領域を囲む枠の図形画像を、レンダリングした画像に重ね合わせて描画して、表示する。この場合も情報処理装置は、仮想的な天球の中心に配したカメラの視線方向など、画角を、視聴者の指示を受けて変更することとしてよい。これにより視聴者は天球上の種々の箇所を参照できるようになる。 When displaying still image data of a wide-angle image, the information processing device projects the wide-angle image onto a predetermined virtual celestial sphere, and an image viewed from a virtual camera placed at the center of the celestial sphere. is rendered and displayed. Then, the information processing apparatus draws and displays a graphic image of a frame surrounding the area on the celestial sphere represented by the information of the associated attention area so as to be superimposed on the rendered image. In this case as well, the information processing apparatus may change the angle of view, such as the line-of-sight direction of the camera placed at the center of the virtual celestial sphere, in response to the viewer's instruction. This allows the viewer to refer to different points on the celestial sphere.

また広角画像でない動画像データの場合、既に説明したように、注目領域を表す情報は対応するフレーム番号の情報に関連付けて記録される。そこでこの画像のデータを再生する情報処理装置は、当該記録を読み出して、画像のデータに基づいて動画像データの再生を開始し、各フレームごとに、当該フレームのフレーム番号の情報に関連付けて記録されている注目領域を表す情報があれば、当該情報で表される注目領域を囲む枠の図形画像を、表示した画像（各フレームの画像）に重ね合わせて描画する。また、当該フレームのフレーム番号の情報に関連付けて記録されている注目領域を表す情報がなければ、現在重ね合わせて描画している注目領域を囲む枠の図形画像の描画を続けることとしてもよい。 In the case of moving image data that is not a wide-angle image, as already explained, the information representing the attention area is recorded in association with the information of the corresponding frame number. Therefore, an information processing apparatus that reproduces the image data reads the record, starts reproducing the moving image data based on the image data, and records each frame in association with the frame number information of the frame. If there is information representing a focused area, a graphic image of a frame surrounding the focused area represented by the information is superimposed on the displayed image (image of each frame) and drawn. Also, if there is no information representing the attention area recorded in association with the information of the frame number of the frame, drawing of the graphic image of the frame enclosing the attention area currently superimposed and drawn may be continued.

Also, when reproducing moving image data, it is not always necessary to present to the viewer every attention area specified by the associated information. For example, the information processing device may define an integer P0 of 1 or more and an integer P of 2 or more, and display the frame graphic surrounding the attention area specified by the information associated with the frame numbers P0, P0+P, P0+2P, and so on. While the frame number f satisfies P0+(k−1)P < f < P0+kP (where k = 1, 2, ...), the device displays the frame graphic surrounding the attention area specified by the information associated with frame number P0+(k−1)P or P0+kP. In this example, the attention-area information is updated once every P frames reproduced.
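
As a rough illustration of the frame-number arithmetic above, the following Python sketch maps a frame number f to the frame number whose attention-area information would be shown. The function name `source_frame` and the flag `use_next` are assumptions made for this sketch. The text does not say what happens for frames before P0, so the sketch returns `None` there unless the later sample point is requested.

```python
from typing import Optional


def source_frame(f: int, p0: int, p: int, use_next: bool = False) -> Optional[int]:
    """Return the sampled frame number whose attention area is shown at frame f.

    Sampled frame numbers are p0, p0 + p, p0 + 2p, ...
    use_next=False picks P0 + (k-1)P (the preceding sample point);
    use_next=True picks P0 + kP (the following sample point).
    """
    if p0 < 1 or p < 2:
        raise ValueError("P0 must be >= 1 and P must be >= 2")
    if f < p0:
        # Before the first sample point (behavior assumed, not from the text).
        return p0 if use_next else None
    # Largest sample point that does not exceed f.
    base = p0 + ((f - p0) // p) * p
    if f == base or not use_next:
        return base
    return base + p
```

For example, with P0 = 1 and P = 5, frames 2 to 5 reuse the information associated with frame 1 (or frame 6 when `use_next` is set), so the displayed attention area changes once every five frames.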

[Example of determining the attention area during playback]
When the image data is moving image data, or moving image data of a wide-angle image, the attention area may be determined while the image data is being reproduced (that is, during estimation processing). In this example, when reproducing moving image data of a wide-angle image, the image processing device 1, in place of step S11 in the processing of FIG. 4, determines whether the image about to be rendered for reproduction is a key frame. If it is not a key frame, the device projects the image of that frame onto the virtual celestial sphere and renders and displays the image seen from a virtual camera placed at the center of the sphere, looking in the direction specified by the viewer.

On the other hand, when rendering a key-frame wide-angle image for reproduction, the image processing device 1 executes step S12 and then performs the following in step S13. In this example, the image processing device 1 converts the wide-angle image data into a rectangular equirectangular image such that, among the points on the virtual celestial sphere onto which the key-frame wide-angle image is projected, the direction specified by the user during reproduction (for example, the line-of-sight direction that should be in front of a viewer standing at the center of the sphere) becomes the center of the converted equirectangular image. The image processing device 1 then uses this equirectangular image to execute the processing for estimating the attention area.
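
The re-centering step can be sketched in Python under a strong simplification: the sketch handles only the horizontal (yaw) component of the viewing direction, for which re-centering an equirectangular image reduces to a cyclic shift of its columns. A vertical (pitch) component would require a full rotation on the sphere and is not covered. The function name `recenter_yaw`, the list-of-rows image representation, and the convention that columns span longitudes from -180 to 180 degrees are all assumptions of this sketch.

```python
from typing import List, Sequence, TypeVar

Pixel = TypeVar("Pixel")


def recenter_yaw(rows: Sequence[Sequence[Pixel]], yaw_deg: float) -> List[List[Pixel]]:
    """Cyclically shift an equirectangular image so that the column at
    longitude yaw_deg moves to the horizontal center.

    rows: image as a sequence of equal-length pixel rows; columns are
    assumed to cover longitudes [-180, 180) from left to right.
    """
    width = len(rows[0])
    # Column index that currently holds longitude yaw_deg.
    src_col = int(((yaw_deg + 180.0) % 360.0) / 360.0 * width)
    # Offset that brings that column to the center column.
    shift = src_col - width // 2
    return [[row[(x + shift) % width] for x in range(width)] for row in rows]
```

The shifted image could then be passed to the attention-area estimation in place of the original equirectangular image.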

[Another example of candidate region extraction]
In the description so far, the image processing device 1 executes selective search when extracting candidate regions, but the present embodiment is not limited to this example.

For example, the image processing device 1 may use a neural network or the like that has been machine-trained in advance to take image data as input, detect image portions such as store signboards and store entrances in that image data, and output information representing their extents. When such a neural network is used, the image to be processed is input and the extents of the output image portions are used as candidate regions. In this example, store signboards, entrances, and the like are identified as attention areas.

[Modified example of the attention area determination processing]
Furthermore, during at least one of machine learning processing and estimation processing, the control unit 11 of the image processing device 1 may proceed as follows when the processing target is an equirectangular image obtained by converting image data projected onto the celestial sphere. For example, when performing the processing of the region determination unit 26 during estimation processing, the control unit 11 may exclude, from the initial set of candidate regions, any candidate region for which the overlap with the relatively high-latitude range of the equirectangular coordinates being processed (a predetermined range from the upper or lower edge) occupies at least a predetermined proportion of that candidate region's area. For example, the region determination unit 26 treats a band of 5% of the height of the equirectangular image, measured from each of its upper and lower edges, as the high-latitude range, and identifies for each candidate region the portion that overlaps this range. If the area of the identified portion is at least a predetermined proportion (for example, 70%) of the candidate region's area, the region determination unit 26 removes that candidate region from the set of candidate regions.

The region determination unit 26 takes the set of candidate regions obtained in this way as the initial set S. It then evaluates the set based on the degree of overlap between the candidate regions in S and on the saliency information within each candidate region in S, obtained from the saliency map, while correcting the set, for example by reducing the number of candidate regions, and thereby determines the attention areas.

This example excludes information from the high-latitude portions, where the distortion introduced by conversion to an equirectangular image is relatively large. This prevents regions from being erroneously detected as candidate regions because of that relatively large distortion.
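
The high-latitude exclusion described in this modified example can be sketched in Python as follows. The sketch assumes candidate regions are axis-aligned rectangles in pixel coordinates with the origin at the top-left corner. The names `Box`, `high_latitude_overlap`, and `drop_high_latitude` are hypothetical, and the default values 0.05 and 0.7 are taken from the 5% and 70% examples in the text.

```python
from typing import Iterable, List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned candidate region (hypothetical representation)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def high_latitude_overlap(box: Box, image_height: float, band_fraction: float = 0.05) -> float:
    """Area of the box lying inside the top or bottom high-latitude band."""
    band = image_height * band_fraction
    top, bottom = box.y, box.y + box.h
    # Vertical overlap with the top band [0, band].
    in_top = max(0.0, min(bottom, band) - max(top, 0.0))
    # Vertical overlap with the bottom band [image_height - band, image_height].
    in_bottom = max(0.0, min(bottom, image_height) - max(top, image_height - band))
    return (in_top + in_bottom) * box.w


def drop_high_latitude(
    boxes: Iterable[Box],
    image_height: float,
    band_fraction: float = 0.05,
    max_ratio: float = 0.7,
) -> List[Box]:
    """Keep only boxes whose high-latitude overlap is below max_ratio of their area."""
    kept = []
    for box in boxes:
        area = box.w * box.h
        if area <= 0:
            continue  # degenerate boxes are discarded (a choice made for this sketch)
        if high_latitude_overlap(box, image_height, band_fraction) / area >= max_ratio:
            continue  # mostly inside the high-latitude band: remove from the set
        kept.append(box)
    return kept
```

The list returned by `drop_high_latitude` plays the role of the initial set S that is subsequently evaluated using overlap and saliency information.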

[Frame with which an attention area is associated]
The description so far has shown an example in which, when moving image data (whether or not it is a wide-angle image) is reproduced, a graphic representing the extent of an attention area is displayed at a point before the frame associated with that attention area's information is reproduced. However, the processing to obtain this result may be performed not at reproduction time but at the time the attention-area information is generated.

In this example, when the image processing device 1 determines an attention area, it may record the information of the determined attention area in association with the frame number of a predetermined reproduction time earlier than the reproduction time (frame number) of the frame that was being processed when the attention area was determined.

For example, when the image processing device 1 determines an attention area with the frame of frame number F as the processing target, it records the attention-area information in association with frame number min(F−ΔF, 1). Here, min(a, b) denotes the smaller of a and b. F and ΔF are both positive integers, and ΔF is determined in advance.
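
A minimal Python sketch of this recording step follows. It rests on an interpretation that should be read as an assumption: the text writes min(F−ΔF, 1), but since the goal is to associate the attention area with an earlier frame whose number is still valid, the sketch clamps F−ΔF at the first frame, which in code is `max(f - delta_f, 1)`. The names `association_frame` and `record_attention_area`, and the dictionary used as the record store, are hypothetical.

```python
from typing import Dict, Hashable, List


def association_frame(f: int, delta_f: int) -> int:
    """Frame number with which an attention area found at frame f is associated.

    Assumed interpretation: shift back by delta_f frames, never below frame 1.
    """
    if f < 1 or delta_f < 1:
        raise ValueError("F and delta F must be positive integers")
    return max(f - delta_f, 1)


def record_attention_area(
    records: Dict[int, List[Hashable]],
    f: int,
    delta_f: int,
    area: Hashable,
) -> int:
    """Store the attention area under the shifted frame number and return that number."""
    target = association_frame(f, delta_f)
    records.setdefault(target, []).append(area)
    return target
```

Under this reading, an attention area determined at frame 100 with ΔF = 30 is stored under frame 70, so a player that simply draws whatever is recorded for the current frame shows the frame graphic 30 frames before the attention area's own frame.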

1 image processing device, 11 control unit, 12 storage unit, 13 operation unit, 14 display unit, 15 interface unit, 21 reception unit, 22 acquisition unit, 23 candidate region extraction unit, 24 saliency map learning processing unit, 25 saliency map estimation unit, 26 region determination unit, 27 output unit.

Claims

An image processing device including:
acquisition means for acquiring image data to be processed;
candidate region extracting means for extracting, from the acquired image data, a plurality of candidate regions that satisfy a predetermined condition and serve as attention area candidates;
saliency map estimation means for estimating, based on the acquired image data, saliency map information related to the image data; and
determining means for determining an attention area in the acquired image data based on the saliency map and the information of the candidate regions.

The image processing device according to claim 1,
wherein the determining means determines the attention area based on the saliency map and information on overlap between the candidate regions.

The image processing device according to claim 1 or 2,
wherein the saliency map estimation means performs the estimation using a neural network that has undergone machine learning so as to receive input of image data and to estimate and output saliency map information related to the image data,
the acquisition means receives input of wide-angle image data projected onto a range exceeding at least a hemisphere of a celestial sphere, and obtains a plurality of pieces of image data to be processed by rotating the wide-angle image data projected onto the celestial sphere about at least one axis by mutually different angles or sets of angles and then converting it into rectangular image data, and
the plurality of pieces of image data to be processed are supplied to the machine learning processing of the saliency map estimation means.

The image processing device according to any one of claims 1 to 3,
wherein the acquisition means receives input of moving image data, extracts still images at a plurality of reproduction times from the moving image data, and acquires image data of the extracted still images.

The image processing device according to claim 4,
wherein the determining means, when it determines an attention area, records information of the determined attention area in association with image data at a predetermined reproduction time before the reproduction time of the image data related to the attention area.

An image reproducing device including:
means for accepting input of moving image data that represents still image data reproduced in chronological order and that includes information specifying a predetermined attention area in the still image data reproduced at some time point t (where t>0),
wherein, when the moving image data is reproduced, the image reproducing device refers to the information specifying the attention area and, when reproducing the still image data reproduced at a time point t−Δt (where Δt>0) that is determined by a predetermined method and precedes the time point t at which the still image data related to the attention area specified by the information is to be reproduced, displays information about the attention area specified by the information.

A program that causes a computer to function as:
acquisition means for acquiring image data to be processed;
candidate region extracting means for extracting, from the acquired image data, a plurality of candidate regions that satisfy a predetermined condition and serve as attention area candidates;
saliency map estimation means for estimating, based on the acquired image data, saliency map information related to the image data; and
determining means for determining an attention area in the acquired image data based on the saliency map and the information of the candidate regions.