JP2024045532A

JP2024045532A - Judgment device

Info

Publication number: JP2024045532A
Application number: JP2024020183A
Authority: JP
Inventors: 俊明井上
Original assignee: Pioneer Corp
Current assignee: Pioneer Corp
Priority date: 2019-11-13
Filing date: 2024-02-14
Publication date: 2024-04-02
Also published as: JP2021077249A

Abstract

[Problem] To detect objects that may be visually overlooked. [Solution] In a determination device 1, a visual saliency extraction means 3 generates a visual saliency map by estimating the level of visual saliency from image data of the outside captured from a moving body such as a vehicle. Meanwhile, the type of object to be detected is set in an object region detection means 4, which detects objects of the set type from the image data. An oversight determination means 5 then performs an oversight determination for each object detected by the object region detection means 4 based on the visual saliency map. [Selected Figure] Figure 1

Description

The present invention relates to a determination device that determines the possibility of oversight based on an image of the outside captured from a moving body.

BACKGROUND ART

Conventionally, various ways of alerting the driver of a moving body, such as the host vehicle, have been proposed. For example, Patent Document 1 describes a road sign visibility determination system that includes a visibility estimation unit, a visual recognition completion time calculation unit, and a visual recognition determination unit.

In more detail, in Patent Document 1 the visibility estimation unit calculates the complexity of a road sign based on its content and estimates the visibility of the road sign according to that complexity. Next, the visual recognition completion time calculation unit calculates the distance from the driver to the road sign. It then uses the vehicle speed, the visibility of the road sign, and this distance to calculate the visual recognition completion time, i.e., the time the driver needs to visually recognize the road sign. Finally, the visual recognition determination unit calculates the gaze time during which the driver continuously gazes at the road sign, based on the vehicle position information, the road sign information, and line-of-sight information including the direction of the driver's gaze. It then determines whether the driver has recognized the content of the road sign based on the gaze time and the visual recognition completion time.

Patent Document 1: Japanese Patent Application Publication No. 2017-111469

The road sign visibility determination system described in Patent Document 1 targets only road signs and gives no consideration to moving objects such as other vehicles or pedestrians. Other moving objects (cars, motorcycles, bicycles, pedestrians, etc.) are usually present on the road as well, so it is desirable to warn the driver about overlooking them too. Furthermore, the system of Patent Document 1 must detect the driver's line of sight, so equipment for detecting the line of sight has to be installed in the vehicle.

One example of the problem to be solved by the present invention is to detect objects that may be visually overlooked.

To solve the above problem, the invention is characterized by comprising:

- a generation unit that generates visual saliency distribution information obtained by estimating the level of visual saliency based on an image of the outside captured from a moving body;
- a setting unit that sets the type of object to be detected;
- an object detection unit that detects objects of the set type from the image; and
- a determination unit that performs an oversight possibility determination for an object detected by the object detection unit. The determination is based on a value obtained by dividing the sum of the visual saliency values within the region of the object in the visual saliency distribution information by the area of that region.
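The quantity in this claim is the mean saliency over the object's region: the sum of the saliency values inside the region divided by its area. Below is a minimal illustrative sketch in plain Python; it is not the applicant's implementation. The following choices are assumptions not found in the source:

- object regions are axis-aligned pixel boxes in saliency-map coordinates;
- the helper names (`Detection`, `mean_saliency`, `judge_oversight`) are hypothetical;
- an object is flagged when its mean falls below a hypothetical `threshold`.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel ranges (assumed format)


@dataclass
class Detection:
    label: str  # object type reported by the detector, e.g. "pedestrian"
    box: Box    # object region in saliency-map coordinates


def mean_saliency(saliency: Sequence[Sequence[float]], box: Box) -> float:
    """Sum of the saliency values inside the region divided by the region's area."""
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        raise ValueError("object region must have a positive area")
    total = sum(saliency[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return total / area


def judge_oversight(saliency, detections: List[Detection], target_types: Set[str],
                    threshold: float):
    """Score detections of the configured types; flag those whose mean is below threshold."""
    results = []
    for det in detections:
        if det.label not in target_types:  # setting unit: only the configured types
            continue
        score = mean_saliency(saliency, det.box)
        results.append((det.label, score, score < threshold))
    return results


# Toy 4x4 saliency map: the top-left corner is salient, the bottom-right is not.
saliency = [
    [0.9, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.0],
    [0.1, 0.1, 0.0, 0.0],
]
detections = [
    Detection("car", (0, 0, 2, 2)),
    Detection("pedestrian", (2, 2, 4, 4)),
    Detection("sign", (0, 2, 2, 4)),  # not a configured type, so it is skipped
]
results = judge_oversight(saliency, detections, {"car", "pedestrian"}, threshold=0.3)
```

Dividing by the area keeps large and small objects comparable. A raw sum would favour large objects simply because they cover more pixels.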

The invention according to claim 7 is characterized by comprising:

- a generation unit that generates visual saliency distribution information obtained by estimating the level of visual saliency based on an image of the outside captured from a moving body;
- a setting unit that sets the type of object to be detected;
- an object detection unit that detects, within a region including the imaging range of the image, objects of the type set in the setting unit; and
- a determination unit that performs an oversight possibility determination for an object detected by the object detection unit. The determination is based on a value obtained by dividing the sum of the visual saliency values within the region of the object in the visual saliency distribution information by the area of that region.

The invention according to claim 8 is a determination method executed by a determination device that performs an oversight possibility determination based on an image of the outside captured from a moving body. The method includes:

- a generation step of generating visual saliency distribution information obtained by estimating the level of visual saliency from the image;
- a setting step of setting the type of object to be detected;
- an object detection step of detecting objects of the set type from the image; and
- a determination step of performing an oversight possibility determination for an object detected in the object detection step. The determination is based on a value obtained by dividing the sum of the visual saliency values within the region of the object in the visual saliency distribution information by the area of that region.

The invention according to claim 9 is characterized by causing a computer to execute the determination method according to claim 8.

The invention according to claim 10 is characterized by storing the determination program according to claim 9.

The invention according to claim 11 is a determination method executed by a determination device that performs an oversight possibility determination based on an image of the outside captured from a moving body. The method includes:

- a generation step of generating visual saliency distribution information obtained by estimating the level of visual saliency from the image;
- a setting step of setting the type of object to be detected;
- an object detection step of detecting, within a region including the imaging range of the image, objects of the type set in the setting step; and
- a determination step of performing an oversight possibility determination for an object detected in the object detection step. The determination is based on a value obtained by dividing the sum of the visual saliency values within the region of the object in the visual saliency distribution information by the area of that region.

The invention according to claim 12 is characterized by causing a computer to execute the determination method according to claim 11.

The invention according to claim 13 is characterized by storing the determination program according to claim 12.

FIG. 1 is a functional configuration diagram of a determination device according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the configuration of the visual saliency extraction means shown in FIG. 1.
FIG. 3(a) illustrates an image input to the determination device, and FIG. 3(b) illustrates the visual saliency map estimated for the image of FIG. 3(a).
FIG. 4 is a flowchart illustrating the processing method of the visual saliency extraction means shown in FIG. 1.
FIG. 5 illustrates the configuration of the nonlinear mapping unit in detail.
FIG. 6 illustrates the configuration of an intermediate layer.
FIGS. 7(a) and 7(b) each show an example of the convolution processing performed by a filter.
FIG. 8(a) explains the processing of the first pooling unit, FIG. 8(b) explains the processing of the second pooling unit, and FIG. 8(c) explains the processing of the unpooling unit.
FIG. 9 is a flowchart of the operation of the determination means shown in FIG. 1.
FIG. 10 illustrates an example of region information output from the object region detection means.

A determination device according to an embodiment of the present invention is described below. In this determination device, a generation unit generates visual saliency distribution information by estimating the level of visual saliency based on an image of the outside captured from a moving body. Meanwhile, a setting unit sets the type of object to be detected, and an object detection unit detects objects of the set type from the image. A determination unit then performs an oversight possibility determination for each object detected by the object detection unit based on the visual saliency distribution information. Combining the visual saliency distribution information with object detection in this way makes it possible to determine the possibility of oversight, and therefore to detect objects that may be visually overlooked. In addition, the determination needs only an image of the outside captured from the moving body. It can therefore be made from, for example, drive recorder images, and no line-of-sight detection is required.

The determination unit may also perform the oversight possibility determination for an object detected by the object detection unit by comparing the object with the visual saliency distribution information. In this way, the possibility of oversight can be determined by comparing the visual saliency distribution with the captured image.

The determination unit may also determine that, among the objects detected by the object detection unit, an object that does not overlap any region judged to have high visual saliency is likely to be overlooked. In this way, objects located in parts of the image whose visual saliency is not high can be judged as easy to overlook.
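The overlap criterion above can be sketched as follows, under stated assumptions; this is not the applicant's implementation. The source does not say how "high visual saliency" is decided. Here it is assumed to mean saliency at or above a hypothetical cut-off `level`, and object regions are assumed to be pixel boxes. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel ranges (assumed format)


def high_saliency_mask(saliency: Sequence[Sequence[float]], level: float) -> List[List[bool]]:
    """Mark pixels whose saliency is at or above `level` as high saliency."""
    return [[value >= level for value in row] for row in saliency]


def overlaps(mask: Sequence[Sequence[bool]], box: Box) -> bool:
    """True if any pixel of the object region falls on a high-saliency pixel."""
    x0, y0, x1, y1 = box
    return any(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))


def likely_overlooked(saliency, box: Box, level: float = 0.5) -> bool:
    """An object that touches no high-saliency pixel is judged easy to overlook."""
    return not overlaps(high_saliency_mask(saliency, level), box)


saliency = [
    [0.9, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.0],
    [0.1, 0.1, 0.0, 0.0],
]
print(likely_overlooked(saliency, (2, 2, 4, 4)))  # True: the region is entirely low-saliency
print(likely_overlooked(saliency, (1, 1, 3, 3)))  # False: it touches the salient corner
```

This binary test differs from the mean-saliency value in the claims. A single salient pixel is enough to clear an object, so the two criteria can disagree for large objects.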

The generation unit may also include:

- an input unit that converts the image into intermediate data that can be subjected to mapping processing;
- a nonlinear mapping unit that converts the intermediate data into mapping data; and
- an output unit that generates saliency estimation information indicating a saliency distribution based on the mapping data.

The nonlinear mapping unit may include a feature extraction unit that extracts features from the intermediate data and an upsampling unit that upsamples the data generated by the feature extraction unit. In this way, visual saliency can be estimated at low computational cost.

The device may also include a presentation unit that presents the result of the determination made by the determination unit. In this way, the result can be presented to the driver as a warning of a possible oversight.

In a determination device according to another embodiment of the present invention, the generation unit generates visual saliency distribution information by estimating the level of visual saliency based on an image of the outside captured from a moving body. Meanwhile, the setting unit sets the type of object to be detected, and the object detection unit detects objects of the set type within a region that includes the imaging range of the image. The determination unit then performs an oversight possibility determination for each object detected by the object detection unit based on the visual saliency distribution information. Combining the visual saliency distribution information with object detection in this way makes it possible to determine the possibility of oversight, and therefore to detect objects that may be visually overlooked. Furthermore, object detection need not be based on the image: for example, object detection results from other sensors such as LiDAR (Light Detection and Ranging) can be used.

In a determination method according to an embodiment of the present invention, a generation step generates visual saliency distribution information by estimating the level of visual saliency based on an image of the outside captured from a moving body. Meanwhile, a setting step sets the type of object to be detected, and an object detection step detects objects of the set type from the image. A determination step then performs an oversight possibility determination for each object detected in the object detection step based on the visual saliency distribution information. Combining the visual saliency distribution information with object detection in this way makes it possible to determine the possibility of oversight, and therefore to detect objects that may be visually overlooked. In addition, the determination needs only an image of the outside captured from the moving body. It can therefore be made from, for example, drive recorder images, and no line-of-sight detection is required.

The determination method described above is executed by a computer. In this way, a computer can determine the possibility of oversight by combining visual saliency distribution information with object detection, and therefore detect objects that may be visually overlooked.

The determination program described above may also be stored in a computer-readable storage medium. In this way, the program can be distributed on its own as well as built into a device, and version upgrades and the like are easy to perform.

In a determination method according to another embodiment of the present invention, a generation step generates visual saliency distribution information by estimating the level of visual saliency based on an image of the outside captured from a moving body. Meanwhile, a setting step sets the type of object to be detected, and an object detection step detects objects of the set type within a region that includes the imaging range of the image. A determination step then performs an oversight possibility determination for each object detected in the object detection step based on the visual saliency distribution information. Combining the visual saliency distribution information with object detection in this way makes it possible to determine the possibility of oversight, and therefore to detect objects that may be visually overlooked. Furthermore, object detection need not be based on the image: for example, object detection results from other sensors such as LiDAR can be used.

The determination method described above is likewise executed by a computer. In this way, a computer can determine the possibility of oversight by combining visual saliency distribution information with object detection, and therefore detect objects that may be visually overlooked.

The determination program described above may likewise be stored in a computer-readable storage medium. In this way, the program can be distributed on its own as well as built into a device, and version upgrades and the like are easy to perform.

A determination device according to one embodiment of the present invention is described with reference to FIGS. 1 to 10. The determination device according to this embodiment is installed in a moving body such as an automobile. However, not every component shown in FIG. 1 needs to be mounted on the moving body. As long as at least the information presentation means 6 (described later) is installed in the moving body, the other means may be implemented as, for example, a server device that communicates with the moving body.

As shown in FIG. 1, the determination device 1 includes an input means 2, a visual saliency extraction means 3, an object region detection means 4, an oversight determination means 5, and an information presentation means 6.

The input means 2 receives an image (a still image or a moving image) captured by, for example, a camera, and outputs it as image data. If the input is a moving image, it is output as image data broken down into a time series, for example frame by frame. The image input to the input means 2 is typically an image of the vehicle's direction of travel. It may, however, also cover other directions, as in a so-called panoramic image spanning 180° or 360° horizontally. Furthermore, the input is not limited to images captured by a camera: it may also be an image read from a recording medium such as a hard disk drive or memory card.

The visual saliency extraction means 3 receives image data from the input means 2 and outputs a visual saliency map as the visual saliency estimation information described later. In other words, the visual saliency extraction means 3 functions as a generation unit. It generates a visual saliency map (visual saliency distribution information) by estimating the level of visual saliency based on an image of the outside captured from a moving body.

FIG. 2 is a block diagram illustrating the configuration of the visual saliency extraction means 3. The visual saliency extraction means 3 according to this embodiment includes an input unit 310, a nonlinear mapping unit 320, an output unit 330, and a storage unit 390.

- The input unit 310 converts an image into intermediate data that can be subjected to mapping processing.
- The nonlinear mapping unit 320 converts the intermediate data into mapping data. It includes a feature extraction unit 321 that extracts features from the intermediate data, and an upsampling unit 322 that upsamples the data generated by the feature extraction unit 321.
- The output unit 330 generates saliency estimation information indicating a saliency distribution based on the mapping data.
- The storage unit 390 holds the image data input from the input means 2, the coefficients of the filters described later, and the like.

These components are described in detail below.

FIG. 3(a) illustrates an image input to the visual saliency extraction means 3, and FIG. 3(b) illustrates an image showing the visual saliency distribution estimated for FIG. 3(a). The visual saliency extraction means 3 according to this embodiment estimates the visual saliency of each part of an image. Visual saliency means, for example, how conspicuous a part is or how readily it draws the viewer's gaze. Concretely, visual saliency is expressed as a probability or the like, where a larger probability corresponds, for example, to a higher chance that the gaze of a person viewing the image is directed at that position.
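If saliency is read as the probability that a viewer's gaze lands at each position, one simple way to turn a non-negative saliency map into such a distribution is to divide by its total, so that the values sum to 1. The sketch below only illustrates that reading; the source does not specify any normalization, and `to_fixation_probability` is a hypothetical name.

```python
from typing import List, Sequence


def to_fixation_probability(saliency: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale a non-negative saliency map so that its values sum to 1."""
    total = sum(sum(row) for row in saliency)
    if total <= 0:
        raise ValueError("saliency map must contain a positive value")
    return [[value / total for value in row] for row in saliency]


prob = to_fixation_probability([[1, 3], [0, 4]])
# prob == [[0.125, 0.375], [0.0, 0.5]]
```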

Positions in FIG. 3(a) and FIG. 3(b) correspond to each other: the higher the visual saliency at a position in FIG. 3(a), the higher the luminance at that position in FIG. 3(b). An image showing a visual saliency distribution, such as FIG. 3(b), is an example of the visual saliency map output by the output unit 330. In this example, visual saliency is visualized as luminance values with 256 gradations. Examples of the visual saliency map output by the output unit 330 are described in detail later.
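One plausible way to render a saliency map with 256 luminance gradations is min-max scaling to the integers 0 to 255. The source does not say how its visualization was produced, so the scaling choice and the name `to_8bit_luminance` are assumptions.

```python
from typing import List, Sequence


def to_8bit_luminance(saliency: Sequence[Sequence[float]]) -> List[List[int]]:
    """Min-max scale a saliency map to integer luminance values 0..255 (256 levels)."""
    flat = [value for row in saliency for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat map: nothing stands out, render it black
        return [[0 for _ in row] for row in saliency]
    scale = 255 / (hi - lo)
    return [[round((value - lo) * scale) for value in row] for row in saliency]


image = to_8bit_luminance([[0.0, 0.2], [0.6, 1.0]])
# image == [[0, 51], [153, 255]]
```

Min-max scaling preserves the ordering of saliency values, which is all the bright-means-salient display needs. It does not preserve their absolute probabilities.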

FIG. 4 is a flowchart illustrating the operation of the visual saliency extraction means 3 according to this embodiment. The flowchart is part of a determination method executed by a computer and includes three steps:

- In the input step S110, the image is converted into intermediate data that can be subjected to mapping processing.
- In the nonlinear mapping step S120, the intermediate data is converted into mapping data. This step includes a feature extraction step S121 that extracts features from the intermediate data, and an upsampling step S122 that upsamples the data generated in the feature extraction step S121.
- In the output step S130, visual saliency estimation information indicating the saliency distribution is generated based on the mapping data.

Returning to FIG. 2, each component of the visual saliency extraction means 3 is described. In the input step S110, the input unit 310 acquires image data from the input means 2 and converts the acquired image into intermediate data. The intermediate data may be any data the nonlinear mapping unit 320 can accept; it is, for example, a high-dimensional tensor. It may be, for example, the acquired image with its luminance normalized, or the acquired image with each pixel converted into a luminance gradient. In the input step S110, the input unit 310 may also perform noise removal, resolution conversion, and similar processing on the image.
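The two example forms of intermediate data (normalized luminance and per-pixel luminance gradient) can be sketched as follows. The source specifies neither the luminance formula nor the gradient operator. The ITU-R BT.601 luma weights, the forward differences with edge replication, and all function names are therefore assumptions, chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B) in 0..255


def luminance(rgb: Sequence[Sequence[Pixel]]) -> List[List[float]]:
    """Per-pixel luminance using ITU-R BT.601 luma weights (an assumed choice)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def normalize(lum: Sequence[Sequence[float]], max_value: float = 255.0) -> List[List[float]]:
    """Variant 1: luminance scaled into [0, 1]."""
    return [[value / max_value for value in row] for row in lum]


def gradient_magnitude(lum: Sequence[Sequence[float]]) -> List[List[float]]:
    """Variant 2: forward-difference luminance gradient magnitude (edges replicated)."""
    h, w = len(lum), len(lum[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = lum[y][min(x + 1, w - 1)] - lum[y][x]
            gy = lum[min(y + 1, h - 1)][x] - lum[y][x]
            out[y][x] = math.hypot(gx, gy)
    return out


grad = gradient_magnitude([[0.0, 3.0], [4.0, 0.0]])
# grad == [[5.0, 3.0], [4.0, 0.0]]
```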

In the nonlinear mapping step S120, the nonlinear mapping unit 320 acquires the intermediate data from the input unit 310 and converts it into mapping data. The mapping data is, for example, a high-dimensional tensor. The mapping process that the nonlinear mapping unit 320 applies to the intermediate data can be controlled by, for example, parameters. It is preferably a process using a function, a functional, or a neural network.

FIG. 5 illustrates the configuration of the nonlinear mapping unit 320 in detail, and FIG. 6 illustrates the configuration of an intermediate layer 323. As described above, the nonlinear mapping unit 320 includes the feature extraction unit 321 and the upsampling unit 322. The feature extraction unit 321 performs the feature extraction step S121, and the upsampling unit 322 performs the upsampling step S122. In the example of this figure, at least one of the feature extraction unit 321 and the upsampling unit 322 includes a neural network made up of multiple connected intermediate layers 323.

The neural network is preferably a convolutional neural network. Specifically, each of the intermediate layers 323 includes one or more convolutional layers 324. In each convolutional layer 324, the input data is convolved with multiple filters 325, and activation processing is applied to the outputs of the filters 325.
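A single convolutional layer as just described (several filters, each followed by an activation) can be sketched in plain Python. This is a simplified illustration rather than the network of FIG. 5, and it makes several assumptions:

- the input is single-channel;
- "valid" padding is used;
- ReLU is the activation, whereas the source only says "activation processing";
- the function names are hypothetical.

As is usual in CNNs, the "convolution" is computed as cross-correlation, without flipping the kernel.

```python
from typing import List, Sequence

Map = List[List[float]]


def conv2d_valid(image: Sequence[Sequence[float]], kernel: Sequence[Sequence[float]]) -> Map:
    """2-D 'valid' convolution as used in CNNs (cross-correlation, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for y in range(h - kh + 1):
        row = []
        for x in range(w - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def relu(fmap: Map) -> Map:
    """Activation processing: clamp negative responses to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def conv_layer(image, filters, biases) -> List[Map]:
    """One convolutional layer: each filter yields one activated output channel."""
    return [relu([[v + b for v in row] for row in conv2d_valid(image, k)])
            for k, b in zip(filters, biases)]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
filters = [
    [[1, 0], [0, -1]],  # diagonal difference: negative everywhere on this image
    [[0, 0], [0, 1]],   # picks the bottom-right pixel of each window
]
channels = conv_layer(image, filters, biases=[0.0, 0.0])
# channels == [[[0.0, 0.0], [0.0, 0.0]], [[5.0, 6.0], [8.0, 9.0]]]
```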

In the example of FIG. 5, the feature extraction unit 321 includes a neural network with multiple intermediate layers 323 and has a first pooling unit 326 between those intermediate layers. The upsampling unit 322 likewise includes a neural network with multiple intermediate layers 323 and has an unpooling unit 328 between those intermediate layers. The feature extraction unit 321 and the upsampling unit 322 are connected to each other via a second pooling unit 327 that performs overlap pooling.
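The three units named above can be illustrated with max pooling, overlapping max pooling, and index-based unpooling. This is a hedged sketch only. The source's own definitions appear in FIG. 8 and are not reproduced here, and the following are all assumptions:

- max pooling with a k×k window and stride s, where windows overlap whenever k > s;
- unpooling that scatters each pooled value back to the position of its maximum;
- the kernel sizes and strides used in the example.

```python
from typing import List, Sequence, Tuple

Index = Tuple[int, int]


def max_pool(fmap: Sequence[Sequence[float]], k: int, s: int):
    """k x k max pooling with stride s; windows overlap whenever k > s.

    Returns the pooled map and, for each output cell, the (y, x) position of
    its maximum so that an unpooling step can put values back in place.
    """
    h, w = len(fmap), len(fmap[0])
    pooled: List[List[float]] = []
    where: List[List[Index]] = []
    for y in range(0, h - k + 1, s):
        prow, wrow = [], []
        for x in range(0, w - k + 1, s):
            value, pos = max(
                ((fmap[y + i][x + j], (y + i, x + j)) for i in range(k) for j in range(k)),
                key=lambda item: item[0],
            )
            prow.append(value)
            wrow.append(pos)
        pooled.append(prow)
        where.append(wrow)
    return pooled, where


def unpool(pooled, where, shape: Index):
    """Scatter pooled values back to their recorded positions; other cells stay 0."""
    h, w = shape
    out = [[0.0] * w for _ in range(h)]
    for prow, wrow in zip(pooled, where):
        for value, (y, x) in zip(prow, wrow):
            out[y][x] = value
    return out


fmap = [
    [1, 2, 5, 0],
    [3, 4, 1, 1],
    [0, 1, 2, 8],
    [6, 1, 3, 4],
]
pooled, where = max_pool(fmap, k=2, s=2)  # non-overlapping: [[4, 5], [6, 8]]
overlapped, _ = max_pool(fmap, k=3, s=1)  # overlapping:     [[5, 8], [6, 8]]
restored = unpool(pooled, where, shape=(4, 4))
```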

In the example of this figure, each intermediate layer 323 consists of two or more convolutional layers 324, although some of the intermediate layers 323 may consist of only one. Adjacent intermediate layers 323 are separated by one of the first pooling unit 326, the second pooling unit 327, and the unpooling unit 328. When an intermediate layer 323 includes two or more convolutional layers 324, those convolutional layers 324 preferably have the same number of filters 325.

In this figure, an intermediate layer 323 labeled "A×B" consists of B convolutional layers 324, each of which includes A convolution filters for each channel. Such an intermediate layer 323 is hereinafter also called an "A×B intermediate layer". For example, a 64×2 intermediate layer 323 consists of two convolutional layers 324, each including 64 convolution filters for each channel.

In this example, the feature extraction unit 321 includes, in this order, a 64×2, a 128×2, a 256×3, and a 512×3 intermediate layer 323. The upsampling unit 322 includes, in this order, a 512×3, a 256×3, a 128×2, and a 64×2 intermediate layer 323. The second pooling unit 327 connects the two 512×3 intermediate layers 323 to each other. The number of intermediate layers 323 constituting the nonlinear mapping unit 320 is not particularly limited and can be chosen according to, for example, the number of pixels in the image data.
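
To make the FIG. 5 stack concrete, the sketch below lists the layer configuration and traces the feature-map size through it. It is an illustration under stated assumptions, not the patented implementation: convolutions are assumed to be size-preserving ("same" padding), each first pooling unit 326 is assumed to halve height and width (2×2 non-overlapping pooling), the overlap pooling of the second pooling unit 327 keeps the size, and each unpooling unit 328 doubles it (2×2 duplication). The names `ENCODER`, `DECODER`, and `trace_shapes` are hypothetical.

```python
# Hypothetical encoding of the FIG. 5 stack as (filters_per_layer, num_conv_layers).
ENCODER = [(64, 2), (128, 2), (256, 3), (512, 3)]   # feature extraction unit 321
DECODER = [(512, 3), (256, 3), (128, 2), (64, 2)]   # upsampling unit 322


def trace_shapes(height, width, in_channels=3):
    """Return (stage, height, width, channels) after each stage.

    Assumes size-preserving convolutions, a 2x2 non-overlapping first pooling
    unit between encoder layers (halves H and W), a size-preserving overlap
    pooling unit between encoder and decoder, and a 2x2 duplicating unpooling
    unit between decoder layers (doubles H and W).
    """
    h, w, c = height, width, in_channels
    trace = []
    for k, (filters, layers) in enumerate(ENCODER):
        if k > 0:  # a first pooling unit sits between consecutive encoder layers
            h, w = h // 2, w // 2
        c = filters
        trace.append((f"enc {filters}x{layers}", h, w, c))
    # second pooling unit: overlap pooling, element count unchanged
    trace.append(("overlap pool", h, w, c))
    for k, (filters, layers) in enumerate(DECODER):
        if k > 0:  # an unpooling unit sits between consecutive decoder layers
            h, w = h * 2, w * 2
        c = filters
        trace.append((f"dec {filters}x{layers}", h, w, c))
    return trace


for row in trace_shapes(96, 160):
    print(row)
```

With three pooling and three unpooling steps, an input whose sides are divisible by 8 comes back out at its original resolution, which is convenient when the saliency map is compared pixel by pixel with the camera image.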

This figure shows one example of the configuration of the nonlinear mapping unit 320; the nonlinear mapping unit 320 may have other configurations. For example, a 64×1 intermediate layer 323 may be used instead of the 64×2 intermediate layer 323; reducing the number of convolutional layers 324 in an intermediate layer 323 can lower the computational cost. Likewise, a 32×2 intermediate layer 323 may be used instead of the 64×2 intermediate layer 323; reducing the number of channels in an intermediate layer 323 can also lower the computational cost. Both the number of convolutional layers 324 and the number of channels in an intermediate layer 323 may be reduced.
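
A rough way to compare these variants is to count convolution weights, which for stride-1 convolutions also equals the multiply-accumulates needed per spatial position. The sketch below assumes 3×3 kernels, ignores biases, and assumes an RGB (3-channel) input to the block; `block_weights` is a hypothetical helper, not something defined in the patent.

```python
def block_weights(in_channels, filters, num_layers, kernel=3):
    """Count convolution weights (biases ignored) in an 'A x B' intermediate layer.

    Each convolutional layer maps c_in channels to `filters` channels using
    kernel x kernel filters, so it holds kernel*kernel*c_in*filters weights.
    With stride 1 this also equals the multiply-accumulates per output position.
    """
    total, c_in = 0, in_channels
    for _ in range(num_layers):
        total += kernel * kernel * c_in * filters
        c_in = filters  # the next layer in the block sees `filters` channels
    return total


for name, (a, b) in {"64x2": (64, 2), "64x1": (64, 1), "32x2": (32, 2)}.items():
    print(name, block_weights(3, a, b))
# 64x2 38592
# 64x1 1728
# 32x2 10080
```

Most of the 64×2 cost sits in its second layer (64 to 64 channels), which is why dropping that layer or halving the channel count helps so much. The sketch counts only the block itself; a 32×2 block also halves the input channels of the following 128×2 block, a further saving it does not model.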

Among the intermediate layers 323 of the feature extraction unit 321, the number of filters 325 preferably increases each time the data passes through a first pooling unit 326. Specifically, suppose a first intermediate layer 323a and a second intermediate layer 323b are consecutive via a first pooling unit 326, with the second intermediate layer 323b located after the first intermediate layer 323a. The first intermediate layer 323a is composed of convolutional layers 324 having N1 filters 325 for each channel, and the second intermediate layer 323b is composed of convolutional layers 324 having N2 filters 325 for each channel. In this case, N2 > N1 preferably holds, and N2 = N1 × 2 more preferably holds.

Among the intermediate layers 323 of the upsampling unit 322, the number of filters 325 preferably decreases each time the data passes through an unpooling unit 328. Specifically, suppose a third intermediate layer 323c and a fourth intermediate layer 323d are consecutive via an unpooling unit 328, with the fourth intermediate layer 323d located after the third intermediate layer 323c. The third intermediate layer 323c is composed of convolutional layers 324 having N3 filters 325 for each channel, and the fourth intermediate layer 323d is composed of convolutional layers 324 having N4 filters 325 for each channel. In this case, N4 < N3 preferably holds, and N3 = N4 × 2 more preferably holds.
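
The two preferred conditions can be checked mechanically for any proposed configuration. The helper below is a hypothetical sketch: it takes the filter counts of consecutive intermediate layers in processing order and reports whether the "preferred" (strictly increasing or decreasing) and "more preferred" (exact doubling or halving) rules hold.

```python
def check_filter_progression(encoder, decoder):
    """Check the preferred filter-count rules for consecutive intermediate layers.

    encoder, decoder: filter counts per intermediate layer, in processing order.
    Returns (preferred, more_preferred):
      encoder pairs (N1, N2) across a first pooling unit: N2 > N1, N2 == 2 * N1
      decoder pairs (N3, N4) across an unpooling unit:    N4 < N3, N3 == 2 * N4
    """
    enc_pairs = list(zip(encoder, encoder[1:]))
    dec_pairs = list(zip(decoder, decoder[1:]))
    preferred = (all(n2 > n1 for n1, n2 in enc_pairs)
                 and all(n4 < n3 for n3, n4 in dec_pairs))
    more_preferred = (all(n2 == 2 * n1 for n1, n2 in enc_pairs)
                      and all(n3 == 2 * n4 for n3, n4 in dec_pairs))
    return preferred, more_preferred


print(check_filter_progression([64, 128, 256, 512], [512, 256, 128, 64]))  # (True, True)
print(check_filter_progression([32, 64, 256, 512], [512, 256, 128, 64]))  # (True, False)
```

The FIG. 5 configuration satisfies both rules; the second call shows a stack that still grows monotonically but skips a doubling step.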

The feature extraction unit 321 extracts, from the intermediate data obtained from the input unit 310, image features at multiple levels of abstraction, such as gradients and shapes, as the channels of the intermediate layers 323. FIG. 6 illustrates the configuration of a 64×2 intermediate layer 323, and processing in an intermediate layer 323 is described with reference to it. In this example, the intermediate layer 323 is composed of a first convolutional layer 324a and a second convolutional layer 324b, each having 64 filters 325. In the first convolutional layer 324a, each channel of the data input to the intermediate layer 323 is convolved with the filters 325. For example, when the image input to the input unit 310 is an RGB image, each of the three channels h0_i (i = 1..3) is processed. In this example, the filters 325 are 64 kinds of 3×3 filters for each channel, i.e., 64×3 filters in total. The convolution yields 64 results h0_{i,j} (i = 1..3, j = 1..64) for each channel i.

Next, the activation unit 329 applies activation processing to the outputs of the filters 325. Specifically, for each filter index j, the corresponding results of all channels are summed element by element, and activation processing is applied to that sum. This yields the 64-channel result h1_i (i = 1..64), i.e., the output of the first convolutional layer 324a, as image features. The activation processing is not particularly limited, but processing using at least one of a hyperbolic function, a sigmoid function, and a rectified linear function is preferable.
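
The per-channel convolution followed by a cross-channel sum and activation can be sketched in plain Python as follows. This is a minimal illustration, not the device's implementation: it assumes zero padding so the output keeps the input size, uses cross-correlation (the convention in most CNN libraries), omits bias terms, and uses a rectified linear function as the activation. `conv3x3_same`, `relu`, and `conv_layer` are hypothetical names.

```python
def conv3x3_same(channel, kernel):
    """3x3 cross-correlation of one channel (list of rows) with zero padding."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # outside the image counts as 0
                        acc += kernel[dy + 1][dx + 1] * channel[yy][xx]
            out[y][x] = acc
    return out


def relu(v):
    """Rectified linear activation."""
    return max(0.0, v)


def conv_layer(inputs, filters, activation=relu):
    """One convolutional layer: inputs is a list of channels h0_i, and
    filters[j][i] is the 3x3 filter applied to input channel i for output j.
    The per-channel results h0_{i,j} are summed over i element by element,
    then passed through the activation to give output channel j.
    """
    outputs = []
    for per_input in filters:  # one entry per output channel j
        partial = [conv3x3_same(ch, k) for ch, k in zip(inputs, per_input)]
        h, w = len(partial[0]), len(partial[0][0])
        outputs.append([[activation(sum(p[y][x] for p in partial)) for x in range(w)]
                        for y in range(h)])
    return outputs
```

For example, with an identity filter on the first input channel and a negated identity filter on the second, the layer computes ReLU(channel 1 minus channel 2). Passing `math.tanh` as `activation` would give the hyperbolic-function variant instead.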

The output of the first convolutional layer 324a is then used as the input to the second convolutional layer 324b, which performs the same processing as the first convolutional layer 324a, yielding the 64-channel result h2_i (i = 1..64), i.e., the output of the second convolutional layer 324b, as image features. This output of the second convolutional layer 324b is the output data of the 64×2 intermediate layer 323.

The structure of the filters 325 is not particularly limited, but 3×3 two-dimensional filters are preferable. The coefficients of each filter 325 can be set independently. In this embodiment, the coefficients of each filter 325 are held in a storage unit 390, from which the nonlinear mapping unit 320 reads them for processing. The coefficients of the filters 325 may be determined based on correction information generated and corrected by machine learning; for example, the correction information includes the coefficients of the filters 325 as a plurality of correction parameters. The nonlinear mapping unit 320 can further use this correction information to convert the intermediate data into mapping data. The storage unit 390 may be provided inside or outside the visual saliency extraction means 3. The nonlinear mapping unit 320 may also acquire the correction information from outside via a communication network.

FIGS. 7(a) and 7(b) each show an example of the convolution performed by a filter 325; both are 3×3 convolutions. FIG. 7(a) shows convolution using the nearest neighboring elements, and FIG. 7(b) shows convolution using neighboring elements at a distance of two or more. Convolution using neighboring elements at a distance of three or more is also possible. The filters 325 preferably perform convolution using neighboring elements at a distance of two or more, because features can then be extracted over a wider range, which further improves the accuracy of visual saliency estimation.
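
Convolution over neighbors at distance two or more corresponds to what CNN libraries call dilated convolution. The sketch below, assuming zero padding and a single channel (`dilated_conv3x3` is a hypothetical name), shows how the same nine weights cover a (2d+1)×(2d+1) neighborhood for dilation d.

```python
def dilated_conv3x3(channel, kernel, dilation=1):
    """3x3 cross-correlation that samples neighbours `dilation` elements apart.

    dilation=1 uses the nearest elements (as in FIG. 7(a)); dilation=2 uses
    elements two apart (as in FIG. 7(b)), so the nine weights span a 5x5
    neighbourhood. Positions outside the channel are treated as zero.
    """
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy = y + (ky - 1) * dilation
                    xx = x + (kx - 1) * dilation
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[ky][kx] * channel[yy][xx]
            out[y][x] = acc
    return out


impulse = [[1 if (y, x) == (2, 2) else 0 for x in range(5)] for y in range(5)]
ones = [[1] * 3 for _ in range(3)]
for row in dilated_conv3x3(impulse, ones, dilation=2):
    print(row)
```

Feeding a single bright pixel through an all-ones kernel makes the sampling pattern visible: with dilation 1 the response fills the 3×3 block around the pixel, while with dilation 2 it appears at nine points spaced two apart across the whole 5×5 grid, without adding any weights.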

The operation of the 64×2 intermediate layer 323 has been described above. The other intermediate layers 323 (the 128×2, 256×3, and 512×3 intermediate layers 323, etc.) operate in the same way, except for the number of convolutional layers 324 and the number of channels. The intermediate layers 323 in the feature extraction unit 321 and those in the upsampling unit 322 also operate as described above.

FIG. 8(a) illustrates the processing of the first pooling unit 326, FIG. 8(b) the processing of the second pooling unit 327, and FIG. 8(c) the processing of the unpooling unit 328.

In the feature extraction unit 321, the data output from an intermediate layer 323 is pooled channel by channel in the first pooling unit 326 and then input to the next intermediate layer 323. The first pooling unit 326 performs, for example, non-overlapping pooling. FIG. 8(a) shows, for the elements of each channel, a process that maps four elements 30 in a 2×2 block to one element 30. The first pooling unit 326 performs this mapping for all elements 30, selecting the 2×2 blocks so that they do not overlap one another. In this example, the number of elements in each channel is reduced to one quarter. As long as the first pooling unit 326 reduces the number of elements, the numbers of elements 30 before and after the mapping are not particularly limited.

The data output from the feature extraction unit 321 is input to the upsampling unit 322 via the second pooling unit 327, which applies overlap pooling to it. FIG. 8(b) shows a process that maps four elements 30 in a 2×2 block to one element 30 while letting some elements 30 overlap; that is, in successive mappings, some of the four elements 30 used in one mapping are also included in the four elements 30 of the next mapping. A second pooling unit 327 such as the one in this figure does not reduce the number of elements. The numbers of elements 30 before and after the mapping in the second pooling unit 327 are not particularly limited.

The processing method of the first pooling unit 326 and the second pooling unit 327 is not particularly limited; examples include mapping the maximum of the four elements 30 to one element 30 (max pooling) and mapping the average of the four elements 30 to one element 30 (average pooling).
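
Both pooling units can be sketched with one function that slides a 2×2 window with stride 2 (non-overlapping, as in the first pooling unit 326) or stride 1 (overlapping, as in the second pooling unit 327). This is an illustrative sketch, not the patented implementation: the name `pool2x2` is hypothetical, and for the overlapping case it assumes the bottom and right edges are padded by repeating the last row and column, which is one way to keep the element count unchanged.

```python
def pool2x2(channel, stride=2, mode="max"):
    """2x2 pooling of one channel (list of rows).

    stride=2: non-overlapping pooling; H and W are halved (an odd trailing
              row or column is dropped).
    stride=1: overlap pooling; neighbouring windows share elements. The last
              row and column are repeated as padding (an assumption) so the
              element count stays the same.
    mode: "max" for max pooling, "avg" for average pooling.
    """
    combine = max if mode == "max" else (lambda vals: sum(vals) / len(vals))
    h, w = len(channel), len(channel[0])
    if stride == 1:
        padded = [row + [row[-1]] for row in channel]
        padded.append(list(padded[-1]))
        out_h, out_w = h, w
    else:
        padded = channel
        out_h, out_w = h // stride, w // stride
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            y, x = oy * stride, ox * stride
            window = [padded[y][x], padded[y][x + 1],
                      padded[y + 1][x], padded[y + 1][x + 1]]
            row.append(combine(window))
        out.append(row)
    return out


m = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(pool2x2(m, stride=2, mode="max"))  # [[6, 8], [14, 16]]
print(pool2x2(m, stride=2, mode="avg"))  # [[3.5, 5.5], [11.5, 13.5]]
```

With stride 2 the 4×4 channel shrinks to 2×2 (one quarter of the elements), matching FIG. 8(a); with stride 1 the output stays 4×4 and each window shares a row or column with its neighbor, matching FIG. 8(b).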

The data output from the second pooling unit 327 is input to an intermediate layer 323 of the upsampling unit 322. The output of each intermediate layer 323 of the upsampling unit 322 is unpooled channel by channel in an unpooling unit 328 and then input to the next intermediate layer 323. FIG. 8(c) shows a process that expands one element 30 into a plurality of elements 30. The expansion method is not particularly limited; one example is to duplicate one element 30 into four elements 30 in a 2×2 block.
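
The duplication example can be sketched as nearest-neighbor upsampling by an integer factor. `unpool_duplicate` is a hypothetical name, and the factor defaults to 2 to match the 2×2 example.

```python
def unpool_duplicate(channel, factor=2):
    """Expand each element into a factor x factor block of copies."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(factor)]  # repeat each value horizontally
        for _ in range(factor):                         # repeat the widened row vertically
            out.append(list(wide))
    return out


print(unpool_duplicate([[1, 2], [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Each unpooling step undoes the size reduction of one first pooling step, although not its information loss; recovering detail is left to the convolutional layers that follow.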

The output data of the last intermediate layer 323 of the upsampling unit 322 is output from the nonlinear mapping unit 320 as mapping data and input to the output unit 330. In output step S130, the output unit 330 generates and outputs a visual saliency map by applying, for example, normalization or resolution conversion to the data obtained from the nonlinear mapping unit 320. The visual saliency map is, for example, an image (image data) visualizing visual saliency as luminance values, as illustrated in FIG. 3(b). It may also be an image color-coded according to visual saliency, such as a heat map, or an image in which visually salient regions, whose visual saliency exceeds a predetermined reference, are marked so as to be distinguishable from other positions. Furthermore, the visual saliency estimation information is not limited to map information presented as an image or the like; it may be, for example, a table listing information that indicates the visually salient regions.
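
One plausible form of the output unit's post-processing is sketched below: min-max normalization of the mapping data to 0..255 luminance values, like the grayscale map of FIG. 3(b), followed by nearest-neighbor resizing to a target resolution. Both choices, and the name `to_saliency_map`, are assumptions standing in for the unspecified "normalization or resolution conversion".

```python
def to_saliency_map(mapped, out_h, out_w):
    """Min-max normalise mapping data to 0..255 and resize by nearest neighbour."""
    flat = [v for row in mapped for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    in_h, in_w = len(mapped), len(mapped[0])
    return [[round(255 * (mapped[y * in_h // out_h][x * in_w // out_w] - lo) / span)
             for x in range(out_w)]
            for y in range(out_h)]


for row in to_saliency_map([[0, 5], [1, 4]], 4, 4):
    print(row)
# [0, 0, 255, 255]
# [0, 0, 255, 255]
# [51, 51, 204, 204]
# [51, 51, 204, 204]
```

Normalizing per frame makes the map's brightest point always 255, which suits visualization; a fixed scale would be needed instead if absolute saliency values are to be compared against a fixed threshold across frames.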

The object region detection means 4 detects a designated object in the image data based on the image data input from the input means 2 and on object designation information that specifies the type of object to be detected. The object detection method used by the object region detection means 4 is not particularly limited; a well-known method such as SSD (Single Shot MultiBox Detector) may be used. The object designation information can be, for example, the label of the object to be detected, and may be held as internal information of the object region detection means 4. Labels specified in the object designation information include, for example, moving objects such as cars, motorcycles, bicycles, and pedestrians, and may also include road markings and road signs.

The object region detection means 4 outputs the detection result as region information indicating a region in the image (detection region). The region indicated by this region information need not follow the shape of the object; it may be, for example, a rectangular or circular region containing the object. That is, the object region detection means 4 functions both as a setting unit that sets the type of object to be detected and as an object detection unit that detects objects of the set type in the image.

The oversight determination means 5 compares the visual saliency map output by the visual saliency extraction means 3 with the object region information detected by the object region detection means 4, determines, based on predetermined determination criterion information, which objects may be overlooked, and outputs the result as overlooked-object information. The determination criterion information serves as the criterion for judging an oversight and may be, for example, a specific threshold (a scalar or vector value); it may also be held as internal information of the oversight determination means 5. The overlooked-object information output by the oversight determination means 5 may be, for example, information on a rectangular region containing the overlooked object or information indicating its pixel coordinates. That is, the oversight determination means 5 functions as a determination unit that determines, based on the visual saliency map (visual saliency distribution information), the possibility that an object detected by the object region detection means 4 (object detection unit) will be overlooked.

The information presentation means 6 presents the overlooked-object information output by the oversight determination means 5. It can be configured as a display device that displays this information, preferably installed where the driver can easily see it, such as a head-up display or inside the instrument cluster.

Next, the operation (determination method) of the determination device 1 configured as above is described with reference to the flowchart of FIG. 9. Implementing this flowchart as a program executed by a computer functioning as the determination device 1 yields a determination program. The determination program is not limited to being stored in a memory of the determination device 1; it may also be stored on a storage medium such as a memory card or an optical disc.

First, the input means 2 outputs the input image as image data to the visual saliency extraction means 3 and the object region detection means 4 (step S210). In this step, if the image data input to the input means 2 is a moving image, it is decomposed into a time series of frames before being passed to the visual saliency extraction means 3 and the object region detection means 4. Image processing such as noise removal or geometric transformation may also be performed in this step.

Next, the visual saliency extraction means 3 extracts a visual saliency map (step S220), outputting a map such as the one shown in FIG. 3(b) by the method described above.

In parallel with step S220, the object region detection means 4 outputs region information (step S230). Based on the object designation information, the object region detection means 4 detects the regions of objects present in the image data input from the input means 2 and outputs them as region information. FIG. 10 shows an example of region information, added to the image data shown in FIG. 3(a). As shown in FIG. 10, the region information consists of region portions 41 and 42, which indicate the regions containing the detected objects, and label names 43 and 44, which indicate the type, name, etc. of the detected objects.

The region portion 41 is shown as a rectangular frame that, in FIG. 10, encloses the "dog" detected as an object. The region portion 42 is likewise shown as a rectangular frame that, in FIG. 10, encloses the "car" detected as an object. The region portions 41 and 42 are not limited to rectangles and may be circles, ellipses, or the like.

The label name 43 is shown adjacent to the region portion 41 and, in FIG. 10, reads "dog", the label of the detected object. The label name 44 is shown adjacent to the region portion 42 and, in FIG. 10, reads "car".

Returning to FIG. 9, an oversight determination is performed next (step S240). Based on the visual saliency map output by the visual saliency extraction means 3, the region information output by the object region detection means 4, and the determination criterion information, the oversight determination means 5 selects, from the objects included in the region information, those that may be visually overlooked, and outputs them as oversight determination information.

For example, suppose that in the image of FIG. 3 a "dog" and a "car" are detected in FIG. 3(a), and that in the visual saliency map of FIG. 3(b) the high-luminance portion at the center right exceeds the determination criterion and is judged to have high visual saliency. When this high-saliency region is superimposed on FIG. 3(a), the "dog", which lies in the overlapping region, is unlikely to be overlooked, whereas the "car", which lies outside it, is judged likely to be overlooked. That is, the oversight determination means 5 (determination unit) judges that, among the objects detected by the object region detection means 4 (object detection unit), an object that does not overlap a region judged to have high visual saliency in the visual saliency map (visual saliency distribution information) is likely to be overlooked.
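
The overlap criterion of this example can be sketched as follows, assuming a saliency map given as a list of rows and detections given as (label, box) pairs with half-open pixel boxes (x0, y0, x1, y1). The box format, the threshold value, and the function names are hypothetical illustrations, not taken from the patent.

```python
def overlaps_salient_region(saliency, box, threshold):
    """True if any pixel inside box = (x0, y0, x1, y1) exceeds the threshold."""
    x0, y0, x1, y1 = box
    return any(saliency[y][x] > threshold
               for y in range(y0, y1) for x in range(x0, x1))


def likely_overlooked(saliency, detections, threshold):
    """Labels of detected objects whose boxes touch no high-saliency pixel."""
    return [label for label, box in detections
            if not overlaps_salient_region(saliency, box, threshold)]


# Toy 4x4 map whose salient part sits at the centre right, as in FIG. 3(b).
sal = [[0.1, 0.1, 0.2, 0.3],
       [0.1, 0.2, 0.9, 0.8],
       [0.1, 0.1, 0.8, 0.7],
       [0.0, 0.1, 0.2, 0.1]]
detections = [("dog", (2, 1, 4, 3)), ("car", (0, 2, 2, 4))]
print(likely_overlooked(sal, detections, threshold=0.5))  # ['car']
```

A single salient pixel is enough to count as an overlap here, so a large object that only grazes a salient region is treated as noticed; the averaging rule of formula (1) is less sensitive to such edge cases.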

The determination method used by the oversight determination means 5 is described in detail below. For each object detected by the object region detection means 4, the object is judged possibly overlooked when its average visual saliency, given by the left-hand side of formula (1), is equal to or less than the threshold obtained as the determination criterion information:

$$\frac{1}{\mathrm{area}(obj_i)} \sum_{(x,y) \in obj_i} \mathrm{sal}(x,y) \le Th \qquad (1)$$

In formula (1), area(obj_i) is the area of object i, (x, y) ∈ obj_i ranges over the coordinates of all pixels in object i, sal(x, y) is the visual saliency value at coordinates (x, y), and Th is the threshold.
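
Formula (1) translates directly into code: sum the saliency values over the object's pixels, divide by the number of pixels, and compare with Th. The sketch below assumes, as a simplification, that each object region is an axis-aligned, non-empty box (x0, y0, x1, y1) with half-open bounds, so area(obj_i) is the box's pixel count; the function names and the example threshold are hypothetical.

```python
def mean_saliency(saliency, box):
    """Left-hand side of formula (1): sum of sal(x, y) over the box / its area."""
    x0, y0, x1, y1 = box
    values = [saliency[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values)  # len(values) == area(obj_i); box must be non-empty


def oversight_candidates(saliency, detections, th):
    """Labels of objects whose mean saliency is <= th, i.e. possibly overlooked."""
    return [label for label, box in detections if mean_saliency(saliency, box) <= th]


sal = [[0.1, 0.1, 0.2, 0.3],
       [0.1, 0.2, 0.9, 0.8],
       [0.1, 0.1, 0.8, 0.7],
       [0.0, 0.1, 0.2, 0.1]]
detections = [("dog", (2, 1, 4, 3)), ("car", (0, 2, 2, 4))]
print(oversight_candidates(sal, detections, th=0.3))  # ['car']
```

Dividing by the area makes the score independent of object size, so a large, uniformly dull object is not rescued merely by containing many pixels; a non-rectangular region could be handled the same way by iterating over a pixel mask instead of a box.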

The overlooked objects are then presented (step S250). In this embodiment, for example, the region portion and label name corresponding to an object that may be overlooked are displayed in a color or typeface, such as red, that stands out more than those corresponding to other objects. For example, if the "car" shown in FIG. 10 is judged to be an object that may be overlooked, its region portion 42 and label name 44 are displayed in red. Alternatively, the region portion and label name may be displayed only for objects that may be overlooked.

According to this embodiment, in the determination device 1, the visual saliency extraction means 3 generates a visual saliency map by estimating the level of visual saliency from image data of the outside captured from a moving body such as a vehicle. Meanwhile, the type of object to be detected is set in the object region detection means 4, which detects objects of the set type in the image data. The oversight determination means 5 then performs an oversight determination on each object detected by the object region detection means 4 based on the visual saliency map. Combining the visual saliency map with object detection in this way makes it possible to determine oversights and thus to detect objects that may be visually overlooked. Moreover, because the determination requires only images of the outside captured from the moving body, it can be made from images captured by, for example, a drive recorder or an in-vehicle camera for ADAS (advanced driver assistance systems), and gaze detection or the like is unnecessary.

The oversight determination means 5 also performs the oversight determination by comparing each object detected by the object region detection means 4 with the visual saliency map. Oversights can thus be determined by comparing the visual saliency map with the captured image.

In addition, the oversight determination means 5 judges that, among the objects detected by the object region detection means 4, objects that do not overlap a region judged to have high visual saliency are likely to be overlooked. This makes it possible to judge that objects located in parts of the image with low visual saliency are easily overlooked.

The visual saliency extraction means 3 includes an input unit 310 that converts an image into intermediate data that can be subjected to mapping, a nonlinear mapping unit 320 that converts the intermediate data into mapping data, and an output unit 330 that generates, based on the mapping data, saliency estimation information indicating a saliency distribution. The nonlinear mapping unit 320 includes a feature extraction unit 321 that extracts features from the intermediate data and an upsampling unit 322 that upsamples the data generated by the feature extraction unit 321. This allows visual saliency to be estimated at a low computational cost.

The device further includes the information presentation means 6, which presents the determination result of the oversight determination means 5. The determination result can thus be presented to the driver to warn of a possible oversight.

In the embodiment described above, the object region detection means 4 need not detect objects based on the image data from the input means 2; for example, detection results from another sensor such as a lidar may be used. In this case, the object detection range of the other sensor is preferably the same as the imaging range of the image data and must include at least the imaging range of the image data.

The present invention is not limited to the above embodiment. Those skilled in the art can make various modifications based on conventionally known knowledge without departing from the gist of the present invention. Such modifications are, of course, included within the scope of the present invention as long as they still include the determination device of the present invention.

1 Determination device
2 Input means
3 Visual saliency extraction means (generation unit)
4 Object region detection means (setting unit, object detection unit)
5 Oversight determination means (determination unit)
6 Information presentation means (presentation unit)

Claims

1. A determination device comprising:
a generation unit that generates visual saliency distribution information obtained by estimating the level of visual saliency based on an image of the outside captured from a moving object;
a setting unit that sets the type of object to be detected;
an object detection unit that detects an object of the set type from the image; and
a determination unit that determines a possibility of overlooking the object detected by the object detection unit, based on a value obtained by dividing the sum of the visual saliency values within the region of the object in the visual saliency distribution information by the area of the region.

2. The determination device according to claim 1, wherein the determination unit determines the possibility of overlooking the object by comparing the value with a predetermined threshold value.
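The value used in claims 1 and 2 is the mean saliency over the object region. A minimal sketch, assuming the region is an axis-aligned bounding box on the saliency map and that a mean below the threshold indicates a possible oversight (the claims only say the value is compared with a predetermined threshold; the direction of the comparison, the value 0.3, and the function names are assumptions):

```python
from typing import List, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1): half-open pixel bounds


def region_saliency(saliency: Grid, box: Box) -> float:
    """Sum of saliency values inside the object region divided by the region's area."""
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        raise ValueError("object region must have positive area")
    total = sum(saliency[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return total / area


def possibly_overlooked(saliency: Grid, box: Box, threshold: float = 0.3) -> bool:
    """Assumed rule: a region whose average saliency is below the threshold may be overlooked."""
    return region_saliency(saliency, box) < threshold


saliency = [[1.0, 1.0, 0.0, 0.0],
            [1.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0]]

print(possibly_overlooked(saliency, (0, 0, 2, 2)))  # False: mean 1.0
print(possibly_overlooked(saliency, (1, 1, 3, 3)))  # True: mean 0.25
```

Dividing by the area keeps the value independent of object size, so a large object is not judged conspicuous merely because its region contains more pixels.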

3. The determination device according to claim 1, wherein the determination unit determines the possibility of overlooking the object detected by the object detection unit by comparing the object with the visual saliency distribution information.

4. The determination device according to claim 3, wherein, among the objects detected by the object detection unit, the determination unit determines that an object that does not overlap a region determined to have high visual saliency in the visual saliency distribution information is likely to be overlooked.
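Claim 4 uses overlap instead of an average. The sketch below assumes the high-saliency region is the set of pixels whose value is at least a hypothetical cutoff `high` (0.7 here) and that object regions are bounding boxes; an object with no pixel in that set is reported as likely to be overlooked. Function names are illustrative.

```python
from typing import List, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1): half-open pixel bounds


def overlaps_high_saliency(saliency: Grid, box: Box, high: float = 0.7) -> bool:
    """True if any pixel of the object region lies in the high-saliency region (value >= high)."""
    x0, y0, x1, y1 = box
    return any(saliency[y][x] >= high for y in range(y0, y1) for x in range(x0, x1))


def likely_overlooked(detections: List[Box], saliency: Grid, high: float = 0.7) -> List[Box]:
    """Return the detected objects that do not overlap the high-saliency region."""
    return [box for box in detections if not overlaps_high_saliency(saliency, box, high)]


saliency = [[0.9, 0.8, 0.1, 0.0],
            [0.8, 0.6, 0.1, 0.0],
            [0.1, 0.1, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.2]]
objects = [(0, 0, 2, 2), (2, 2, 4, 4), (0, 1, 2, 3)]
print(likely_overlooked(objects, saliency))  # [(2, 2, 4, 4)]
```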

5. The determination device according to claim 1, wherein the generation unit comprises:
an input unit that converts the image into intermediate data that can be subjected to mapping processing;
a nonlinear mapping unit that converts the intermediate data into mapping data; and
an output unit that generates saliency estimation information indicating a saliency distribution based on the mapping data,
and wherein the nonlinear mapping unit includes a feature extraction unit that extracts features from the intermediate data, and an upsampling unit that upsamples data generated by the feature extraction unit.

6. The determination device according to any one of claims 1 to 5, further comprising a presentation unit that presents the determination result of the determination unit.

7. A determination device comprising:
a generation unit that generates visual saliency distribution information obtained by estimating the level of visual saliency based on an image of the outside captured from a moving object;
a setting unit that sets the type of object to be detected;
an object detection unit that detects an object of the type set by the setting unit in an area including the imaging range of the image; and
a determination unit that determines a possibility of overlooking the object detected by the object detection unit, based on a value obtained by dividing the sum of the visual saliency values within the region of the object in the visual saliency distribution information by the area of the region.

8. A determination method executed by a determination device that determines the possibility of oversight based on an image of the outside captured from a moving object, the method comprising:
a generation step of generating visual saliency distribution information obtained by estimating the level of visual saliency from the image;
a setting step of setting the type of object to be detected;
an object detection step of detecting an object of the set type from the image; and
a determination step of determining, for the object detected in the object detection step, a possibility of overlooking based on a value obtained by dividing the sum of the visual saliency values within the region of the object in the visual saliency distribution information by the area of the region.
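The method of claim 8 can be sketched end to end as follows. The saliency estimator and detector are passed in as functions, and the setting step is modelled, as an assumption, by filtering detections to a set of target types; the function name `determine`, the threshold value, and the toy detections are hypothetical, and regions are assumed to have positive area.

```python
from typing import Callable, List, Set, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1): half-open pixel bounds
Detection = Tuple[str, Box]       # (object type, region)


def determine(
    image: Grid,
    generate: Callable[[Grid], Grid],              # generation step: saliency estimator
    detect: Callable[[Grid], List[Detection]],     # any detector returning typed regions
    target_types: Set[str],                        # setting step: types to detect
    threshold: float = 0.3,                        # hypothetical decision threshold
) -> List[Tuple[str, Box, bool]]:
    """Run the four steps and return (type, region, possibly_overlooked) per object."""
    saliency = generate(image)                                        # generation step
    objects = [d for d in detect(image) if d[0] in target_types]      # detect set types only
    results = []
    for label, (x0, y0, x1, y1) in objects:                           # determination step
        area = (x1 - x0) * (y1 - y0)                                  # assumed positive
        total = sum(saliency[y][x] for y in range(y0, y1) for x in range(x0, x1))
        results.append((label, (x0, y0, x1, y1), total / area < threshold))
    return results


# Toy usage: the "image" is already a saliency-like grid, so generation is the identity,
# and the detector returns fixed, made-up detections.
image = [[1.0, 1.0, 0.0, 0.0],
         [1.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
fake_detections = [("pedestrian", (0, 0, 2, 2)), ("sign", (2, 2, 4, 4)), ("bicycle", (2, 0, 4, 2))]
print(determine(image, lambda img: img, lambda img: fake_detections, {"pedestrian", "bicycle"}))
# [('pedestrian', (0, 0, 2, 2), False), ('bicycle', (2, 0, 4, 2), True)]
```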

9. A determination program that causes a computer to execute the determination method according to claim 8.

10. A computer-readable storage medium storing the determination program according to claim 9.

11. A determination method for determining a possibility of overlooking an object based on an image captured from a moving object, the method comprising:
a generation step of generating visual saliency distribution information obtained by estimating the level of visual saliency from the image;
a setting step of setting the type of object to be detected;
an object detection step of detecting an object of the type set in the setting step in an area including the imaging range of the image; and
a determination step of determining a possibility of overlooking the object detected in the object detection step, based on a value obtained by dividing the sum of the visual saliency values within the region of the object in the visual saliency distribution information by the area of the region.

12. A determination program that causes a computer to execute the determination method according to claim 11.

13. A computer-readable storage medium storing the determination program according to claim 12.