JP2024045551A

JP2024045551A - Information Output Device

Info

Publication number: JP2024045551A
Application number: JP2024020867A
Authority: JP
Inventors: 渉小野寺; 俊明井上
Original assignee: Pioneer Corp
Current assignee: Pioneer Corp
Priority date: 2020-03-10
Filing date: 2024-02-15
Publication date: 2024-04-02
Also published as: JP2021144310A

Abstract

[Problem] To detect a tendency toward inattentiveness (looking aside) with a simple configuration. [Solution] In an information processing device 1, a visual saliency processing unit 3 acquires, in time series, visual saliency maps obtained by estimating the level of visual saliency within an image captured of the outside from a moving body, and a visual saliency peak detection unit 4 detects, in time series, at least one peak position in each visual saliency map. An inattentive tendency determination unit 5 sets a gaze area G in the image, and if the peak position stays outside the gaze area G continuously for a predetermined time or longer, an inattentiveness warning unit 6 outputs information indicating a tendency toward inattentiveness. [Selected Figure] FIG. 1

Description

The present invention relates to an information output device that outputs predetermined information based on an image of the outside captured from a moving body.

Detecting driver inattentiveness is one approach to reducing traffic accidents. For example, Patent Document 1 describes detecting, from an image G captured by an in-vehicle camera 1, visual feature points (for example, a preceding vehicle 44 and a traffic light 45) that are objects to be visually recognized ahead of the vehicle, and determining whether the person being evaluated is in an inattentive state based on those visual feature points and that person's gaze point.

Patent Document 1: JP 2017-224067 A

In the method of Patent Document 1, the driver's line of sight, face direction, posture, and the like are detected from images of the driver, and where the driver is looking is determined by comparing this with the driving image. This requires images of the driver in addition to the driving image, and therefore multiple cameras. Matching the driver's line of sight against the driving image also requires a large amount of computation.

One example of the problem to be solved by the present invention is to detect a tendency toward inattentiveness with a simple configuration.

To solve the above problem, the invention according to claim 1 comprises: an acquisition unit that acquires, in time series, visual saliency distribution information obtained by estimating the level of visual saliency within an image based on the image captured of the outside from a moving body; a peak position detection unit that detects at least one peak position in the visual saliency distribution information; a gaze range setting unit that sets a range in the image at which the driver of the moving body should gaze; and an inattentiveness output unit that outputs information indicating a tendency toward inattentiveness when the peak position is continuously outside the range to be gazed at.

FIG. 1 is a functional configuration diagram of an information output device according to a first embodiment of the present invention.
FIG. 2 is a block diagram illustrating the configuration of the visual saliency processing unit shown in FIG. 1.
FIG. 3(a) illustrates an image input to the determination device, and FIG. 3(b) illustrates a visual saliency map estimated for FIG. 3(a).
FIG. 4 is a flowchart illustrating the processing method of the visual saliency processing unit shown in FIG. 1.
FIG. 5 illustrates the configuration of the nonlinear mapping unit in detail.
FIG. 6 illustrates the configuration of an intermediate layer.
FIGS. 7(a) and 7(b) each illustrate an example of convolution processing performed by a filter.
FIG. 8(a) explains the processing of the first pooling unit, FIG. 8(b) explains the processing of the second pooling unit, and FIG. 8(c) explains the processing of the unpooling unit.
FIG. 9 is an explanatory diagram of a method for setting the gaze area.
FIG. 10 is an explanatory diagram of an inattentiveness detection area.
FIG. 11 is an explanatory diagram of another inattentiveness detection area.
FIG. 12 is a flowchart of the operation of the information output device shown in FIG. 1.
FIG. 13 is a configuration diagram of a system including an information processing device according to a second embodiment of the present invention.
FIG. 14 is a functional configuration diagram of the information processing device shown in FIG. 13.
FIG. 15 is a flowchart of the operation of the information processing device shown in FIG. 14.

An information output device according to an embodiment of the present invention is described below. In this information output device, an acquisition unit acquires, in time series, visual saliency distribution information obtained by estimating the level of visual saliency within an image based on the image captured of the outside from a moving body, and a peak position detection unit detects, in time series, at least one peak position in the visual saliency distribution information. A gaze range setting unit then sets a range in the image at which the driver of the moving body should gaze, and an inattentiveness output unit outputs information indicating a tendency toward inattentiveness if the peak position stays outside that range continuously for a predetermined time or longer. The visual saliency distribution information expresses how likely human gaze statistically is to gather at each location, so its peak marks the position statistically most likely to attract gaze. Using this information therefore makes it possible to detect a tendency toward inattentiveness with a simple configuration, without measuring the actual driver's line of sight.
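
The detection logic in the preceding paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: each saliency map is a 2-D list of floats, the single highest-valued pixel is taken as the peak, the gaze area is an axis-aligned rectangle, and the "predetermined time" is expressed as a hypothetical `min_duration_s` with a fixed frame interval. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

SaliencyMap = Sequence[Sequence[float]]  # rows of per-pixel saliency values


@dataclass(frozen=True)
class GazeArea:
    """Axis-aligned rectangle in pixel coordinates (inclusive bounds)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def peak_position(saliency: SaliencyMap) -> Tuple[int, int]:
    """Return (x, y) of the highest-saliency pixel (first one found on ties)."""
    best_val, best_xy = float("-inf"), (0, 0)
    for y, row in enumerate(saliency):
        for x, value in enumerate(row):
            if value > best_val:
                best_val, best_xy = value, (x, y)
    return best_xy


def detect_inattentive_tendency(
    maps: Sequence[SaliencyMap],
    gaze_area: GazeArea,
    frame_interval_s: float,
    min_duration_s: float,
) -> List[bool]:
    """Per frame, report whether the peak has stayed outside the gaze area
    continuously for at least min_duration_s (counting the current frame)."""
    flags: List[bool] = []
    outside_frames = 0
    for saliency in maps:
        x, y = peak_position(saliency)
        # A peak inside the gaze area resets the run of "looking aside" frames.
        outside_frames = 0 if gaze_area.contains(x, y) else outside_frames + 1
        flags.append(outside_frames * frame_interval_s >= min_duration_s)
    return flags
```

With a 3×3 map whose gaze area is the centre pixel, a frame interval of 0.5 s and a 1.0 s threshold, one centred frame followed by three frames peaking in a corner yields `[False, False, True, True]`; any frame whose peak returns to the gaze area resets the count.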

The gaze range setting unit may set the range to be gazed at based on the vanishing point of the image. This makes it easy to set that range without detecting, for example, a preceding vehicle.
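
The source does not say how the vanishing point is found. As one possible approach, assumed here for illustration only, the sketch below intersects two already-detected lane lines (each given by two points) and centres a gaze rectangle on the result. The half-width and half-height are hypothetical tuning values, and all function names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]
Rect = Tuple[int, int, int, int]  # (left, top, right, bottom); y grows downward


def line_intersection(p1: Point, p2: Point, p3: Point, p4: Point) -> Point:
    """Intersection of the line through p1, p2 with the line through p3, p4."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        raise ValueError("lines are parallel; no single vanishing point")
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / denom,
            (a * (y3 - y4) - (y1 - y2) * b) / denom)


def gaze_area_around(vp: Point, half_w: float, half_h: float,
                     img_w: int, img_h: int) -> Rect:
    """Rectangle centred on the vanishing point, clipped to the image bounds."""
    vx, vy = vp
    return (max(0, round(vx - half_w)), max(0, round(vy - half_h)),
            min(img_w - 1, round(vx + half_w)), min(img_h - 1, round(vy + half_h)))
```

For a 640×480 frame with lane lines through (0, 400)–(200, 200) and (640, 400)–(440, 200), the vanishing point is (320.0, 80.0) and a 100×50 half-size gives the area (220, 30, 420, 130).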

The inattentiveness output unit may also output information indicating a tendency toward inattentiveness caused by a fixed object if the peak position stays above or below the range to be gazed at continuously for a predetermined time or longer. The region above that range is generally where fixed objects such as buildings, traffic signals, signs, and street lights appear, and the region below it is generally where road surface markings appear. Therefore, when the peak position falls in either region, the object drawing the driver's attention away can be identified as a fixed object.
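
A minimal sketch of the above/below rule, assuming the gaze range is a rectangle in image coordinates with y increasing downward and the "predetermined time" is expressed as a number of consecutive frames. Peaks that are above or below the range take priority over a sideways offset, and runs that mix "above" and "below" frames are counted together; both choices are assumptions, as are the function names.

```python
from typing import Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom); y grows downward


def classify_peak(x: int, y: int, area: Rect) -> str:
    """Locate a peak relative to the gaze range."""
    left, top, right, bottom = area
    if left <= x <= right and top <= y <= bottom:
        return "inside"
    if y < top:
        return "above"  # typically buildings, traffic signals, signs, street lights
    if y > bottom:
        return "below"  # typically road surface markings
    return "side"       # left or right of the range; cause not classified here


def fixed_object_tendency(peaks: Sequence[Tuple[int, int]], area: Rect,
                          min_frames: int) -> bool:
    """True if the last min_frames peaks all lie above or below the gaze range."""
    if min_frames < 1:
        raise ValueError("min_frames must be at least 1")
    if len(peaks) < min_frames:
        return False
    recent = peaks[-min_frames:]
    return all(classify_peak(x, y, area) in ("above", "below") for x, y in recent)
```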

The device may further include a speed acquisition unit that acquires the moving speed of the moving body, and the gaze range setting unit may change the range to be gazed at based on that speed. For example, a driver's field of view is known to narrow at high speed, and the range can be set with this in mind. The speed acquisition unit may obtain speed information from a vehicle speed sensor, an acceleration sensor, or the like, or may derive the speed from the captured images.

The device may also include a position acquisition unit that acquires position information of the moving body, and the gaze range setting unit may change the range to be gazed at based on that position information. This allows the range to be set according to where the vehicle is traveling, such as a residential area, a main road, or a downtown area.
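
The two adjustments above (speed and location) might be combined as in the sketch below. The source gives no formula or values, so the linear narrowing with speed, the clamp at `min_factor`, and the per-area scale factors are all hypothetical, chosen only to show where such tuning would plug in.

```python
# Hypothetical scale factors per road environment; the source names the
# environments but gives no values. Unknown types fall back to 1.0.
AREA_TYPE_SCALE = {"residential": 1.2, "main_road": 1.0, "downtown": 1.3}


def scaled_half_width(
    base_half_width: float,
    speed_kmh: float,
    area_type: str,
    narrowing_per_kmh: float = 0.005,  # hypothetical
    min_factor: float = 0.5,           # hypothetical lower bound
) -> float:
    """Half-width of the gaze range after speed and location adjustments.

    The range narrows linearly with speed (reflecting the narrower field of
    view at high speed) but never below min_factor of its base size.
    """
    speed_factor = max(min_factor, 1.0 - narrowing_per_kmh * speed_kmh)
    return base_half_width * speed_factor * AREA_TYPE_SCALE.get(area_type, 1.0)
```

With these placeholder values, a 100-pixel half-width shrinks to about 70 pixels at 60 km/h on a main road and bottoms out at 50 pixels at very high speed.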

The acquisition unit may include an input unit that converts the image into intermediate data that can be mapped, a nonlinear mapping unit that converts the intermediate data into mapping data, and an output unit that generates, based on the mapping data, saliency estimation information indicating a saliency distribution. The nonlinear mapping unit may include a feature extraction unit that extracts features from the intermediate data and an upsampling unit that upsamples the data generated by the feature extraction unit. This allows visual saliency to be estimated at low computational cost, and the visual saliency estimated in this way reflects the contextual state of attention.

In an information output method according to an embodiment of the present invention, an acquisition step acquires, in time series, visual saliency distribution information obtained by estimating the level of visual saliency within an image based on the image captured of the outside from a moving body, and a peak position detection step detects, in time series, at least one peak position in the visual saliency distribution information. A gaze range setting step then sets a range in the image at which the driver of the moving body should gaze, and an inattentiveness output step outputs information indicating a tendency toward inattentiveness if the peak position stays outside that range continuously for a predetermined time or longer. Because the visual saliency distribution information expresses how likely human gaze statistically is to gather at each location, and its peak marks the position most likely to attract gaze, this method can detect a tendency toward inattentiveness with a simple configuration, without measuring the driver's actual line of sight.

The information output method described above may also be executed by a computer as an information output program. In this way, a computer can detect a tendency toward inattentiveness with a simple configuration, without measuring the driver's actual line of sight.

The information output program described above may also be stored in a computer-readable storage medium. This allows the program to be distributed on its own, not only built into a device, and makes version upgrades easy.

An information output device according to the first embodiment of the present invention is described with reference to FIGS. 1 to 12. The information output device according to this embodiment is not limited to being installed in a moving body such as an automobile; it may also be configured as a server device or the like installed at a business site. In other words, the analysis need not be performed in real time and may be performed after driving, for example.

As shown in FIG. 1, the information output device 1 includes a driving image acquisition unit 2, a visual saliency processing unit 3, a visual saliency peak detection unit 4, an inattentive tendency determination unit 5, and an inattentiveness warning unit 6.

The driving image acquisition unit 2 receives images (for example, moving images) captured by, for example, a camera, and outputs them as image data. An input moving image is output as image data decomposed into a time series, for example frame by frame. Still images may also be input to the driving image acquisition unit 2, but input as a group of multiple still images in time series is preferable.

The images input to the driving image acquisition unit 2 are, for example, images captured in the vehicle's direction of travel, that is, images of the outside captured continuously from the moving body. They may be so-called panoramic images, or images acquired with multiple cameras, that also cover directions other than the direction of travel, such as 180° or 360° horizontally. The input is also not limited to images captured by a camera; images read from a recording medium such as a hard disk drive or memory card may be used.

The visual saliency processing unit 3 receives image data from the driving image acquisition unit 2 and outputs a visual saliency map as the visual saliency estimation information described below. In other words, the visual saliency processing unit 3 functions as an acquisition unit that acquires a visual saliency map (visual saliency distribution information) obtained by estimating the level of visual saliency within an image captured of the outside from a moving body.

FIG. 2 is a block diagram illustrating the configuration of the visual saliency processing unit 3. The visual saliency processing unit 3 according to this embodiment includes an input unit 310, a nonlinear mapping unit 320, an output unit 330, and a storage unit 390. The input unit 310 converts the image into intermediate data that can be mapped. The nonlinear mapping unit 320 converts the intermediate data into mapping data. The output unit 330 generates, based on the mapping data, saliency estimation information indicating a saliency distribution. The nonlinear mapping unit 320 includes a feature extraction unit 321 that extracts features from the intermediate data and an upsampling unit 322 that upsamples the data generated by the feature extraction unit 321. The storage unit 390 holds the image data input from the driving image acquisition unit 2, the filter coefficients described later, and the like. Each component is described in detail below.

FIG. 3(a) illustrates an image input to the visual saliency processing unit 3, and FIG. 3(b) illustrates an image showing the visual saliency distribution estimated for FIG. 3(a). The visual saliency processing unit 3 according to this embodiment estimates the visual saliency of each part of an image. Visual saliency means, for example, how conspicuous a location is or how readily it attracts gaze. Specifically, visual saliency is expressed as a probability or the like, where a higher probability corresponds, for example, to a higher probability that the gaze of a person viewing the image turns to that position.

FIG. 3(a) and FIG. 3(b) correspond to each other in position: the higher the visual saliency of a position in FIG. 3(a), the higher the luminance at which it is displayed in FIG. 3(b). An image showing the visual saliency distribution, as in FIG. 3(b), is one example of the visual saliency map output by the output unit 330. In this example, visual saliency is visualized with 256 luminance levels. Examples of the visual saliency map output by the output unit 330 are described in detail later.
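
The 256-level visualization can be sketched as below. The source does not say how saliency values are mapped to luminance; min-max scaling to integer levels 0..255 is an assumption made for this illustration, and the function name is hypothetical.

```python
from typing import List


def to_luminance_map(saliency: List[List[float]], levels: int = 256) -> List[List[int]]:
    """Min-max scale saliency values to integer gray levels 0..levels-1."""
    flat = [v for row in saliency for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat map: no location stands out
        return [[0 for _ in row] for row in saliency]
    scale = (levels - 1) / (hi - lo)
    # Brighter pixels correspond to higher estimated saliency.
    return [[round((v - lo) * scale) for v in row] for row in saliency]
```

For example, the map `[[0.0, 0.25], [0.75, 1.0]]` becomes `[[0, 64], [191, 255]]`.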

FIG. 4 is a flowchart illustrating the operation of the visual saliency processing unit 3 according to this embodiment. The flowchart in FIG. 4 is part of an information output method executed by a computer and includes an input step S115, a nonlinear mapping step S120, and an output step S130. In input step S115, the image is converted into intermediate data that can be mapped. In nonlinear mapping step S120, the intermediate data is converted into mapping data. In output step S130, visual saliency estimation information (visual saliency distribution information) indicating the saliency distribution is generated based on the mapping data. Nonlinear mapping step S120 includes a feature extraction step S121 that extracts features from the intermediate data and an upsampling step S122 that upsamples the data generated in feature extraction step S121.
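
The order of steps S115, S120 (S121 then S122), and S130 can be illustrated with the toy pipeline below. Every step body is a stand-in chosen only to make the data flow runnable: min-max normalization for S115, 2×2 max pooling for S121, nearest-neighbour upsampling for S122, and rescaling to a sum of 1 for S130. None of these is the trained network of the embodiment, and all names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def input_step(image: Grid) -> Grid:
    """S115 (toy): min-max normalize luminance to [0, 1]."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat image
    return [[(v - lo) / span for v in row] for row in image]


def feature_extraction_step(data: Grid) -> Grid:
    """S121 (toy): 2x2 max pooling; assumes even height and width."""
    return [
        [max(data[y][x], data[y][x + 1], data[y + 1][x], data[y + 1][x + 1])
         for x in range(0, len(data[0]), 2)]
        for y in range(0, len(data), 2)
    ]


def upsample_step(data: Grid) -> Grid:
    """S122 (toy): nearest-neighbour upsampling by a factor of 2."""
    out: Grid = []
    for row in data:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def nonlinear_mapping_step(data: Grid) -> Grid:
    """S120: feature extraction followed by upsampling."""
    return upsample_step(feature_extraction_step(data))


def output_step(mapped: Grid) -> Grid:
    """S130 (toy): rescale so values sum to 1, giving a probability-like map."""
    total = sum(v for row in mapped for v in row)
    if total == 0:
        n = len(mapped) * len(mapped[0])
        return [[1.0 / n for _ in row] for row in mapped]
    return [[v / total for v in row] for row in mapped]


def estimate_saliency(image: Grid) -> Grid:
    return output_step(nonlinear_mapping_step(input_step(image)))
```

The output has the same size as the input because the upsampling step undoes the resolution loss of the pooling step, which mirrors the encoder/decoder split between units 321 and 322.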

Returning to FIG. 2, each component of the visual saliency processing unit 3 is described. In input step S115, the input unit 310 acquires image data from the driving image acquisition unit 2 and converts the acquired image into intermediate data. The intermediate data is not particularly limited as long as the nonlinear mapping unit 320 can accept it; it is, for example, a high-dimensional tensor. The intermediate data is, for example, the acquired image with its luminance normalized, or data in which each pixel of the acquired image has been converted into a luminance gradient. In input step S115, the input unit 310 may further perform noise removal, resolution conversion, and the like on the image.
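
The two example forms of intermediate data named above, normalized luminance and per-pixel luminance gradient, can be sketched as follows. The BT.601 luma weights, zero-mean/unit-variance normalization, and forward differences are assumptions; the source does not specify any of them. Function names are hypothetical.

```python
import math
from typing import List, Tuple

Plane = List[List[float]]


def luminance(rgb: List[List[Tuple[float, float, float]]]) -> Plane:
    """Per-pixel luminance using ITU-R BT.601 luma weights (an assumed choice)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def normalize(plane: Plane) -> Plane:
    """Zero-mean, unit-variance normalization of luminance."""
    flat = [v for row in plane for v in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat)) or 1.0
    return [[(v - mean) / std for v in row] for row in plane]


def gradient_magnitude(plane: Plane) -> Plane:
    """Forward-difference luminance gradient magnitude; in the last row and
    column the difference is taken against the pixel itself (i.e. zero)."""
    h, w = len(plane), len(plane[0])
    out: Plane = []
    for y in range(h):
        row = []
        for x in range(w):
            gx = plane[y][min(x + 1, w - 1)] - plane[y][x]
            gy = plane[min(y + 1, h - 1)][x] - plane[y][x]
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out
```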

In nonlinear mapping step S120, the nonlinear mapping unit 320 obtains the intermediate data from the input unit 310 and converts it into mapping data, which is, for example, a high-dimensional tensor. The mapping applied to the intermediate data by the nonlinear mapping unit 320 is, for example, one that can be controlled by parameters or the like, and is preferably implemented by a function, a functional, or a neural network.

FIG. 5 illustrates the configuration of the nonlinear mapping unit 320 in detail, and FIG. 6 illustrates the configuration of an intermediate layer 323. As described above, the nonlinear mapping unit 320 includes the feature extraction unit 321 and the upsampling unit 322; the feature extraction unit 321 performs feature extraction step S121, and the upsampling unit 322 performs upsampling step S122. In the example of FIG. 5, at least one of the feature extraction unit 321 and the upsampling unit 322 includes a neural network in which multiple intermediate layers 323 are connected.

The neural network is preferably a convolutional neural network. Specifically, each of the intermediate layers 323 includes one or more convolutional layers 324. In a convolutional layer 324, the input data is convolved with multiple filters 325, and activation processing is applied to the outputs of those filters.

In the example of FIG. 5, the feature extraction unit 321 includes a neural network with multiple intermediate layers 323 and has a first pooling unit 326 between those intermediate layers. Likewise, the upsampling unit 322 includes a neural network with multiple intermediate layers 323 and has an unpooling unit 328 between them. The feature extraction unit 321 and the upsampling unit 322 are connected to each other via a second pooling unit 327 that performs overlap pooling.
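
The window sizes of the pooling units are not given in this section, so the sketch below only contrasts the general ideas under assumed parameters: ordinary max pooling (window equal to stride, e.g. 2×2 stride 2), overlap pooling (window larger than stride, so neighbouring windows share pixels; AlexNet, for instance, used 3×3 windows with stride 2), and a simple nearest-neighbour unpooling. Index-based unpooling is another common choice; which one unit 328 uses is not stated here. Function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def max_pool(data: Grid, size: int, stride: int) -> Grid:
    """Max pooling without padding; windows overlap when size > stride."""
    h, w = len(data), len(data[0])
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    return [
        [max(data[y * stride + dy][x * stride + dx]
             for dy in range(size) for dx in range(size))
         for x in range(out_w)]
        for y in range(out_h)
    ]


def unpool_nearest(data: Grid, factor: int) -> Grid:
    """Nearest-neighbour unpooling: repeat each value factor x factor times."""
    out: Grid = []
    for row in data:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out
```

On a 4×4 grid holding 1..16 row by row, `max_pool(data, 2, 2)` gives `[[6, 8], [14, 16]]`, while the overlapping `max_pool(data, 3, 1)` gives `[[11, 12], [15, 16]]`.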

In the example of FIG. 5, each intermediate layer 323 consists of two or more convolutional layers 324, although at least some intermediate layers 323 may consist of a single convolutional layer 324. Adjacent intermediate layers 323 are separated by the first pooling unit 326, the second pooling unit 327, or the unpooling unit 328. When an intermediate layer 323 includes two or more convolutional layers 324, those convolutional layers preferably have equal numbers of filters 325.

In FIG. 5, an intermediate layer 323 labeled "A×B" consists of B convolutional layers 324, each containing A convolution filters for each channel. Such an intermediate layer 323 is also called an "A×B intermediate layer" below. For example, a 64×2 intermediate layer 323 consists of two convolutional layers 324, each containing 64 convolution filters for each channel.

In the example of FIG. 5, the feature extraction unit 321 includes a 64×2, a 128×2, a 256×3, and a 512×3 intermediate layer 323, in that order. The upsampling unit 322 includes a 512×3, a 256×3, a 128×2, and a 64×2 intermediate layer 323, in that order. The second pooling unit 327 connects the two 512×3 intermediate layers 323 to each other. The number of intermediate layers 323 in the nonlinear mapping unit 320 is not particularly limited and can be determined, for example, according to the number of pixels in the image data.

FIG. 5 shows only one example configuration of the nonlinear mapping unit 320; other configurations are possible. For example, a 64×1 intermediate layer 323 may be used instead of the 64×2 intermediate layer 323; reducing the number of convolutional layers 324 in an intermediate layer may further reduce computational cost. Likewise, a 32×2 intermediate layer 323 may be used instead of the 64×2 one; reducing the number of channels in an intermediate layer may further reduce computational cost. Both the number of convolutional layers 324 and the number of channels in an intermediate layer 323 may also be reduced.
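
The "A×B" layout of FIG. 5 can be written down as data and used to compare the variants just described. The sketch counts only 3×3 kernel weights, assuming every filter spans all input channels (as in the channel-summing convolution described for FIG. 6) and an RGB input; biases, the final output layer, and the pooling units (which carry no weights) are excluded. All names are hypothetical.

```python
from typing import List, Tuple

# (A, B) for an "A x B" intermediate layer: B conv layers with A filters each.
Layer = Tuple[int, int]

ENCODER: List[Layer] = [(64, 2), (128, 2), (256, 3), (512, 3)]  # feature extraction unit 321
DECODER: List[Layer] = [(512, 3), (256, 3), (128, 2), (64, 2)]  # upsampling unit 322


def conv_layer_count(layers: List[Layer]) -> int:
    return sum(b for _, b in layers)


def conv_weight_count(layers: List[Layer], in_channels: int, kernel: int = 3) -> int:
    """Kernel weights over all conv layers (biases excluded)."""
    total, channels = 0, in_channels
    for filters, repeats in layers:
        for _ in range(repeats):
            total += channels * filters * kernel * kernel
            channels = filters  # the next conv layer sees this layer's outputs
    return total
```

Under these assumptions the encoder has 10 convolutional layers and 7,632,576 kernel weights (the same widths as the first ten convolutional layers of VGG-16), and replacing the 64×2 layer with 32×2 or 64×1 lowers this to 7,567,200 or 7,595,712. Weight count is only a proxy: the early layers run at full image resolution, so their share of the arithmetic is much larger than their share of the weights.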

In the intermediate layers 323 of the feature extraction unit 321, the number of filters 325 preferably increases each time the data passes through a first pooling unit 326. Specifically, a first intermediate layer 323a and a second intermediate layer 323b are connected in sequence via a first pooling unit 326, with the second intermediate layer 323b following the first intermediate layer 323a. The first intermediate layer 323a consists of convolutional layers 324 with N1 filters 325 for each channel, and the second intermediate layer 323b consists of convolutional layers 324 with N2 filters 325 for each channel. Here, N2 > N1 preferably holds, and N2 = N1 × 2 more preferably holds.

In the intermediate layers 323 of the upsampling unit 322, the number of filters 325 preferably decreases each time the data passes through an unpooling unit 328. Specifically, a third intermediate layer 323c and a fourth intermediate layer 323d are connected in sequence via an unpooling unit 328, with the fourth intermediate layer 323d following the third intermediate layer 323c. The third intermediate layer 323c consists of convolutional layers 324 with N3 filters 325 for each channel, and the fourth intermediate layer 323d consists of convolutional layers 324 with N4 filters 325 for each channel. Here, N4 < N3 preferably holds, and N3 = N4 × 2 more preferably holds.
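
The two width rules can be checked mechanically, as in this small sketch (hypothetical helper). It takes the filter counts of successive intermediate layers that are separated by first pooling units (encoder) or unpooling units (decoder); the junction through the second pooling unit 327, where both sides have 512 filters, is outside these rules and should not be included in the list.

```python
from typing import List, Tuple


def width_rules(widths: List[int], decoder: bool = False) -> Tuple[bool, bool]:
    """Return (preferred, more_preferred) for consecutive layer widths.

    encoder: N2 > N1 and N2 == 2 * N1 for every consecutive pair
    decoder: N4 < N3 and N3 == 2 * N4 for every consecutive pair
    """
    pairs = list(zip(widths, widths[1:]))
    if decoder:
        preferred = all(after < before for before, after in pairs)
        more_preferred = all(before == 2 * after for before, after in pairs)
    else:
        preferred = all(after > before for before, after in pairs)
        more_preferred = all(after == 2 * before for before, after in pairs)
    return preferred, more_preferred
```

The FIG. 5 widths, 64, 128, 256, 512 in the encoder and 512, 256, 128, 64 in the decoder, satisfy both the preferred and the more preferred condition.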

The feature extraction unit 321 extracts image features at multiple levels of abstraction, such as gradients and shapes, from the intermediate data acquired from the input unit 310, as channels of the intermediate layers 323. FIG. 6 illustrates the configuration of a 64×2 intermediate layer 323, and the processing in an intermediate layer 323 is described with reference to it. In this example, the intermediate layer 323 consists of a first convolutional layer 324a and a second convolutional layer 324b, each with 64 filters 325. In the first convolutional layer 324a, each channel of the data input to the intermediate layer 323 is convolved with the filters 325. For example, if the image input to the input unit 310 is an RGB image, each of the three channels h0_i (i = 1..3) is processed. In this example, the filters 325 are 64 kinds of 3×3 filters for each channel, that is, 64×3 kinds of filters in total. The convolution yields 64 results h0_{i,j} (i = 1..3, j = 1..64) for each channel i.

Next, the activation unit 329 applies activation processing to the outputs of the filters 325. Specifically, for each filter index j, the results h0_i,j are summed element by element over all channels i, and the activation is applied to that sum. This yields 64 channels of results h1_i (i = 1..64), that is, the output of the first convolution layer 324a, as image features. The activation is not particularly limited, but processing that uses at least one of a hyperbolic function (such as tanh), a sigmoid function, and a rectified linear function (ReLU) is preferable.
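A sketch of the channel summation and activation, assuming the per-channel results are stored as nested lists indexed `[i][j][y][x]`; the function names are hypothetical, and the three activations shown are simply the families the text lists.

```python
import math


def relu(v):
    return max(0.0, float(v))


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


ACTIVATIONS = {"tanh": math.tanh, "sigmoid": sigmoid, "relu": relu}


def sum_and_activate(h0, activation="relu"):
    """h0[i][j] is the HxW result of channel i convolved with filter j.

    For each filter j, sum the maps over all channels i element by element,
    then apply the activation; returns the list of output channels h1[j].
    """
    f = ACTIVATIONS[activation]
    n_in, n_filters = len(h0), len(h0[0])
    h, w = len(h0[0][0]), len(h0[0][0][0])
    return [
        [[f(sum(h0[i][j][y][x] for i in range(n_in))) for x in range(w)]
         for y in range(h)]
        for j in range(n_filters)
    ]


# Two input channels, one filter, 1x2 maps: the channel sums are [3, -2].
print(sum_and_activate([[[[1, -3]]], [[[2, 1]]]], "relu"))  # [[[3.0, 0.0]]]
```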

Further, the output data of the first convolution layer 324a is used as the input data of the second convolution layer 324b, which performs the same processing as the first convolution layer 324a, so that 64 channels of results h2_i (i = 1..64), that is, the output of the second convolution layer 324b, are obtained as image features. The output of the second convolution layer 324b is the output data of this 64×2 intermediate layer 323.

The structure of the filters 325 is not particularly limited, but 3×3 two-dimensional filters are preferable. The coefficients of each filter 325 can be set independently. In this embodiment, the coefficients of each filter 325 are stored in the memory unit 390, from which the nonlinear mapping unit 320 reads them for use in processing. The coefficients of the filters 325 may be determined based on correction information generated or updated by machine learning; for example, the correction information contains the coefficients of the filters 325 as a set of correction parameters. The nonlinear mapping unit 320 can further use this correction information when converting the intermediate data into mapping data. The memory unit 390 may be provided inside the visual saliency processing unit 3 or outside it. The nonlinear mapping unit 320 may also obtain the correction information from an external source via a communication network.

FIGS. 7(a) and 7(b) each show an example of the convolution performed by a filter 325; both are 3×3 convolutions. FIG. 7(a) shows convolution using the nearest neighboring elements, and FIG. 7(b) shows convolution using neighboring elements at a distance of two or more. Convolution using neighboring elements at a distance of three or more is also possible. The filters 325 preferably perform convolution using neighboring elements at a distance of two or more, because features over a wider area can then be extracted, further improving the accuracy of visual saliency estimation.
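One way to read FIG. 7(b) is as a dilated (atrous) 3×3 convolution: the kernel still has nine taps, but they are spaced `distance` elements apart. Under that interpretation, which is an assumption rather than something the text states, the sampled offsets, the span they cover, and a zero-padded convolution using them can be sketched as:

```python
def kernel_offsets(distance=1):
    """(dy, dx) offsets read by a 3x3 kernel whose taps are `distance` apart.

    distance=1 is the nearest-neighbour case (FIG. 7(a)); distance >= 2
    spreads the same nine taps over a wider area.
    """
    steps = (-distance, 0, distance)
    return [(dy, dx) for dy in steps for dx in steps]


def receptive_span(distance=1):
    """Side length, in elements, of the square covered by the nine taps."""
    return 2 * distance + 1


def conv3x3_spaced(channel, kernel, distance=1):
    """Zero-padded 3x3 convolution (cross-correlation) with spaced taps."""
    h, w = len(channel), len(channel[0])
    offsets = kernel_offsets(distance)
    weights = [kernel[r][c] for r in range(3) for c in range(3)]  # row-major, like offsets
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for (dy, dx), k in zip(offsets, weights):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    acc += k * channel[yy][xx]
            row.append(acc)
        out.append(row)
    return out


print(receptive_span(1), receptive_span(2))  # 3 5
```

With the same nine coefficients, a spacing of two widens the area each output element sees from 3×3 to 5×5, which is the "wider range of features" the text refers to.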

This concludes the description of the operation of the 64×2 intermediate layer 323. The other intermediate layers 323 (the 128×2, 256×3, and 512×3 intermediate layers 323, etc.) operate in the same way as the 64×2 intermediate layer 323, except for the number of convolution layers 324 and the number of channels. The intermediate layers 323 in the feature extraction unit 321 and those in the upsampling unit 322 also operate as described above.
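As a worked example of the layer sizes, a 64×2 intermediate layer fed with an RGB image has 9 × 3 × 64 = 1,728 filter coefficients in its first convolution layer (the 64 × 3 kinds of 3×3 filters described earlier) plus 9 × 64 × 64 = 36,864 in its second, 38,592 in total, ignoring any bias terms. The sketch below generalizes this count; the ordering of stages in `FEATURE_STAGES` is a hypothetical VGG-style arrangement, not the exact layout of the feature extraction unit 321.

```python
# Hypothetical VGG-style stage list for the feature extraction unit:
# (filters per convolution layer, number of convolution layers).
FEATURE_STAGES = [(64, 2), (128, 2), (256, 3), (512, 3)]


def conv_weight_count(in_channels, stages, kernel_size=3):
    """Number of filter coefficients (bias terms ignored) in a stack of
    stages whose convolution layers all use kernel_size x kernel_size filters."""
    total = 0
    c_in = in_channels
    for filters, layers in stages:
        for _ in range(layers):
            total += kernel_size * kernel_size * c_in * filters
            c_in = filters  # the next layer sees this layer's output channels
    return total


print(conv_weight_count(3, [(64, 2)]))  # 38592
```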

FIG. 8(a) illustrates the processing of the first pooling unit 326, FIG. 8(b) illustrates the processing of the second pooling unit 327, and FIG. 8(c) illustrates the processing of the unpooling unit 328.

In the feature extraction unit 321, the data output from an intermediate layer 323 is pooled channel by channel in the first pooling unit 326 and then input to the next intermediate layer 323. The first pooling unit 326 performs, for example, non-overlapping pooling. FIG. 8(a) shows, for the group of elements in each channel, a process that maps the four elements 30 of a 2×2 block to a single element 30. The first pooling unit 326 performs this mapping over all elements 30, choosing the 2×2 blocks so that they do not overlap. In this example, the number of elements in each channel is reduced to one quarter. As long as the first pooling unit 326 reduces the number of elements, the numbers of elements 30 before and after the mapping are not particularly limited.
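Non-overlapping 2×2 pooling, as in FIG. 8(a), can be sketched as follows. The `mode` switch covers max pooling and average pooling, the two methods the text names as examples; the sketch assumes the height and width are multiples of the block size.

```python
def _mean(values):
    return sum(values) / len(values)


def pool_nonoverlap(grid, size=2, mode="max"):
    """Map each non-overlapping size x size block of `grid` to one element.

    Assumes the height and width are multiples of `size`; with size=2 the
    element count drops to one quarter.
    """
    if mode == "max":
        reduce_fn = max
    elif mode == "average":
        reduce_fn = _mean
    else:
        raise ValueError(f"unknown pooling mode: {mode}")
    h, w = len(grid), len(grid[0])
    return [
        [reduce_fn([grid[y + dy][x + dx] for dy in range(size) for dx in range(size)])
         for x in range(0, w, size)]
        for y in range(0, h, size)
    ]


g = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(pool_nonoverlap(g))                  # [[6, 8], [14, 16]]
print(pool_nonoverlap(g, mode="average"))  # [[3.5, 5.5], [11.5, 13.5]]
```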

The data output from the feature extraction unit 321 is input to the upsampling unit 322 via the second pooling unit 327, which applies overlapping pooling to it. FIG. 8(b) shows a process that maps the four elements 30 of a 2×2 block to a single element 30 while letting some elements 30 overlap. That is, in the repeated mappings, some of the four elements 30 used in one mapping are also included in the four elements 30 used in the next. With the second pooling unit 327 shown in this figure, the number of elements is not reduced. The numbers of elements 30 before and after the mapping in the second pooling unit 327 are not particularly limited.
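Overlapping pooling that keeps the element count, as in FIG. 8(b), can be realized with a 2×2 window moved one element at a time. The sketch below clamps windows at the bottom and right edges; that edge handling is an assumption, since the text does not specify it.

```python
def pool_overlap(grid, mode="max"):
    """Overlapping 2x2 pooling with stride 1 (neighbouring windows share elements).

    Windows that would run past the bottom or right edge are clamped to the
    last row/column (an assumed edge rule), so the output keeps the input size.
    """
    h, w = len(grid), len(grid[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [grid[min(y + dy, h - 1)][min(x + dx, w - 1)]
                      for dy in (0, 1) for dx in (0, 1)]
            if mode == "max":
                row.append(max(window))
            elif mode == "average":
                row.append(sum(window) / 4)
            else:
                raise ValueError(f"unknown pooling mode: {mode}")
        out.append(row)
    return out


g = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(pool_overlap(g))  # [[5, 6, 6], [8, 9, 9], [8, 9, 9]]
```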

The method used in each of the first pooling unit 326 and the second pooling unit 327 is not particularly limited. Examples include mapping the maximum of the four elements 30 to the single element 30 (max pooling) and mapping the average of the four elements 30 to the single element 30 (average pooling).

The data output from the second pooling unit 327 is input to an intermediate layer 323 of the upsampling unit 322. The output data of each intermediate layer 323 of the upsampling unit 322 is unpooled channel by channel in the unpooling unit 328 and then input to the next intermediate layer 323. FIG. 8(c) shows a process that expands one element 30 into multiple elements 30. The expansion method is not particularly limited; one example is duplicating one element 30 into the four elements 30 of a 2×2 block.
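The duplication-based unpooling given as an example for FIG. 8(c) is the simplest case; a minimal sketch:

```python
def unpool_duplicate(grid, factor=2):
    """Expand every element into a factor x factor block of copies."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]  # repeat along x
        out.extend(list(wide) for _ in range(factor))   # repeat along y
    return out


print(unpool_duplicate([[1, 2], [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```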

The output data of the last intermediate layer 323 of the upsampling unit 322 is output from the nonlinear mapping unit 320 as mapping data and input to the output unit 330. In output step S130, the output unit 330 generates and outputs a visual saliency map by applying, for example, normalization or resolution conversion to the data acquired from the nonlinear mapping unit 320. The visual saliency map is, for example, an image (image data) in which visual saliency is visualized as luminance values, as illustrated in FIG. 3(b). The visual saliency map may also be an image color-coded by visual saliency, such as a heat map, or an image in which visually salient regions, whose saliency exceeds a predetermined reference, are marked so as to be distinguishable from other positions. Furthermore, the visual saliency estimation information is not limited to map information presented as an image or the like; it may be, for example, a table listing information that indicates the visually salient regions.
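Normalization and the marking of visually salient regions could look like the following sketch. Min-max normalization and the reference value of 0.5 are illustrative assumptions; the text leaves both the normalization method and the reference value open.

```python
def normalize(grid):
    """Min-max normalise a map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in grid]
    return [[(v - lo) / span for v in row] for row in grid]


def salient_mask(saliency, reference=0.5):
    """Mark elements whose normalised saliency exceeds the reference value."""
    return [[v > reference for v in row] for row in saliency]


s = normalize([[2, 4], [6, 10]])
print(s)                # [[0.0, 0.25], [0.5, 1.0]]
print(salient_mask(s))  # [[False, False], [False, True]]
```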

The visual saliency peak detection unit 4 detects the peak positions (pixels) in the visual saliency map acquired by the visual saliency processing unit 3. In this embodiment, a peak is the pixel with the highest visual saliency, that is, the pixel whose value (luminance) is maximum, and its position is expressed as coordinates. In other words, the visual saliency peak detection unit 4 functions as a peak position detection unit that detects at least one peak position in the visual saliency map (visual saliency distribution information).
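Peak detection as defined here reduces to finding the pixel or pixels holding the maximum value. A minimal sketch that returns every maximal pixel as (x, y) coordinates, so at least one peak is always found:

```python
def find_peaks(saliency):
    """Return (x, y) coordinates of every pixel holding the maximum value."""
    peak = max(v for row in saliency for v in row)
    return [(x, y) for y, row in enumerate(saliency)
            for x, v in enumerate(row) if v == peak]


print(find_peaks([[0.1, 0.9], [0.9, 0.2]]))  # [(1, 0), (0, 1)]
```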

Based on the peak position detected by the visual saliency peak detection unit 4, the inattentive tendency determination unit 5 determines whether the image input from the driving image acquisition unit 2 indicates an inattentive tendency. To do so, the inattentive tendency determination unit 5 first sets a gaze area (the range that should be watched) in the image input from the driving image acquisition unit 2. In other words, the inattentive tendency determination unit 5 functions as a gaze range setting unit that sets the range in the image at which the driver of the moving body should gaze. The method of setting the gaze area is described with reference to FIG. 9.

In the image P shown in FIG. 9, the gaze area G is set around the vanishing point V; that is, the gaze area G (the range to be watched) is set based on the vanishing point of the image. The real-world size of the gaze area G (for example, 3 m wide and 2 m high) is set in advance, and its size in pixels can be calculated from the numbers of horizontal and vertical pixels of the image P, the horizontal and vertical angles of view, the distance to the preceding vehicle, the mounting height of the camera (such as a drive recorder) capturing the image, and so on. The vanishing point may be estimated from white lane lines or the like, or by using optical flow or the like. The distance to the preceding vehicle need not be obtained by actually detecting a preceding vehicle; it may be set as a virtual value.
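The pixel size of the gaze area G can be derived from a pinhole camera model: at distance D, an image W pixels wide with horizontal angle of view θ covers 2·D·tan(θ/2) meters, so a 3 m wide area spans W × 3 / (2·D·tan(θ/2)) pixels. The sketch below applies this in both directions. The 20 m virtual distance, the 1.2 m camera height, and the placement of the bottom edge on the road surface are assumptions for illustration.

```python
import math


def pixels_per_metre(image_px, fov_deg, distance_m):
    """Pixels spanned by 1 m at `distance_m` under a pinhole camera model."""
    visible_m = 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)
    return image_px / visible_m


def gaze_area(vp, img_w, img_h, hfov_deg, vfov_deg,
              distance_m=20.0, width_m=3.0, height_m=2.0, cam_height_m=1.2):
    """Return (left, top, right, bottom) of gaze area G in pixels.

    Assumptions: the optical axis points at the vanishing point vp=(x, y),
    the area is centred horizontally on vp, and its bottom edge lies on the
    road surface, i.e. cam_height_m below the camera, distance_m ahead.
    """
    ppm_x = pixels_per_metre(img_w, hfov_deg, distance_m)
    ppm_y = pixels_per_metre(img_h, vfov_deg, distance_m)
    vx, vy = vp
    half_w = width_m * ppm_x / 2.0
    bottom = vy + cam_height_m * ppm_y  # image y grows downward
    top = bottom - height_m * ppm_y
    return (vx - half_w, top, vx + half_w, bottom)


g = gaze_area((960, 540), 1920, 1080, 90, 60)
print(round(g[2] - g[0]), round(g[3] - g[1]))  # 144 94
```

With a 1920×1080 image, 90° × 60° angles of view, and D = 20 m, the area is about 144 pixels wide and 94 pixels high.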

Next, an inattention detection area is set in the image P based on the gaze area G (the shaded portion in FIG. 10). The inattention detection area consists of an upper area Iu, a lower area Id, a left area Il, and a right area Ir, which are separated by line segments connecting the vanishing point V to the vertices of the gaze area G. Specifically, the upper area Iu and the left area Il are separated by a line segment L1 connecting the vanishing point V to vertex Ga of the gaze area G; the upper area Iu and the right area Ir by a line segment L2 connecting V to vertex Gd; the lower area Id and the left area Il by a line segment L3 connecting V to vertex Gb; and the lower area Id and the right area Ir by a line segment L4 connecting V to vertex Gc.
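Assigning a peak outside G to one of the four areas amounts to checking which pair of boundary rays (from V through the vertices) it falls between. A sketch, assuming image coordinates with y increasing downward, V strictly inside G, and the vertex placement implied by which areas each line separates (Ga top-left, Gb bottom-left, Gc bottom-right, Gd top-right); the function name and labels are hypothetical.

```python
import math


def classify_peak(p, vp, g):
    """Return 'gaze', 'upper', 'lower', 'left' or 'right' for peak p=(x, y).

    g = (left, top, right, bottom); vp = vanishing point, assumed inside g.
    Outside g, the area is decided by the rays from vp through the vertices.
    """
    left, top, right, bottom = g
    x, y = p
    if left <= x <= right and top <= y <= bottom:
        return "gaze"

    def angle(pt):
        # Flip y so that angles increase counter-clockwise, in [0, 2*pi).
        return math.atan2(vp[1] - pt[1], pt[0] - vp[0]) % (2 * math.pi)

    a_d = angle((right, top))     # Gd, on line L2
    a_a = angle((left, top))      # Ga, on line L1
    a_b = angle((left, bottom))   # Gb, on line L3
    a_c = angle((right, bottom))  # Gc, on line L4
    t = angle(p)
    if a_d <= t < a_a:
        return "upper"
    if a_a <= t < a_b:
        return "left"
    if a_b <= t < a_c:
        return "lower"
    return "right"  # wraps around angle 0


print(classify_peak((200, 50), (200, 150), (100, 100, 300, 200)))  # upper
```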

The inattention detection area is not limited to the division shown in FIG. 10; for example, the configuration shown in FIG. 11 may be used, in which the inattention detection area is divided by lines extending the sides of the gaze area G. Because the resulting areas have simpler shapes, the method of FIG. 11 reduces the processing required to divide the inattention detection area.
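With the side-extension division of FIG. 11, the classification needs only coordinate comparisons. In the sketch below the bands above and below G are assumed to span the full image width, so the corner regions count as upper or lower; FIG. 11 may assign them differently.

```python
def classify_peak_simple(p, g):
    """Classify peak p=(x, y) using the extended sides of g=(left, top, right, bottom).

    Assumption: corner regions belong to the upper/lower bands.
    """
    left, top, right, bottom = g
    x, y = p
    if y < top:
        return "upper"
    if y > bottom:
        return "lower"
    if x < left:
        return "left"
    if x > right:
        return "right"
    return "gaze"
```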

Next, the determination of an inattentive tendency by the inattentive tendency determination unit 5 is described. If the peak position detected by the visual saliency peak detection unit 4 remains continuously outside the gaze area G for a predetermined time or longer, an inattentive tendency is determined. The predetermined time can be, for example, 2 seconds, but may be changed as appropriate. In other words, the inattentive tendency determination unit 5 determines whether the peak position has remained continuously outside the range to be watched for the predetermined time or longer.
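The rule above can be checked on a time-stamped sequence of inside/outside decisions. A minimal sketch, assuming each sample is a (timestamp in seconds, is_outside) pair in time order; the function name is hypothetical.

```python
def has_inattentive_tendency(samples, threshold_s=2.0):
    """True if the peak stays outside the gaze area for >= threshold_s
    without interruption; samples are (timestamp_s, is_outside) pairs."""
    run_start = None
    for t, outside in samples:
        if outside:
            if run_start is None:
                run_start = t  # a new run outside the gaze area begins
            if t - run_start >= threshold_s:
                return True
        else:
            run_start = None   # returning to the gaze area resets the run
    return False
```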

The inattentive tendency determination unit 5 may also determine that there is a tendency toward inattention caused by fixed objects when the peak position lies in the upper area Iu or the lower area Id of the inattention detection area. This is because, in an image of the view ahead captured from a vehicle, fixed objects such as buildings, traffic signals, signs, and street lights generally appear in the upper area Iu, and road markings and other paint on the road surface generally appear in the lower area Id. In contrast, moving bodies other than the own vehicle, such as vehicles traveling in other lanes, may appear in the left area Il and the right area Ir, so it is difficult to determine from these areas alone whether the object drawing attention is a fixed object or a moving body.

The inattention warning unit 6 issues a warning or the like based on the determination result of the inattentive tendency determination unit 5. The warning may be given by showing it on a display device visible to the driver or the like, or by outputting it as sound or vibration. In other words, the inattention warning unit 6 functions as an inattention output unit that outputs information indicating a tendency toward inattention. In this embodiment, the warning serves as that information; however, the information may also include, in addition to inattention detection data such as a flag, the time, position, and so on at which it was detected, and may be output to a storage medium or, via communication or the like, to the outside of the information output device 1. Further, based on the determination result of the inattentive tendency determination unit 5, the result may be output as information related to a near-miss incident.

Next, the operation (information output method) of the information output device 1 configured as described above is explained with reference to the flowchart of FIG. 12. By implementing this flowchart as a program executed by a computer functioning as the information output device 1, it can be provided as an information processing program. This information output program is not limited to being stored in the memory or the like of the information output device 1; it may also be stored in a storage medium such as a memory card or an optical disc.

First, the inattention warning unit 6 determines whether the inattention warning switch (SW) is ON or OFF (step S101). The inattention warning SW switches whether the inattention warning unit 6 issues a warning; it is provided in the inattention warning unit 6 and is switched under the control of the inattentive tendency determination unit 5.

If the inattention warning SW is ON (step S101; SW = ON), the inattention warning unit 6 compares the warning timer with the warning timer threshold (step S102). The warning timer measures how long the inattention warning unit 6 has been issuing the warning, and the warning timer threshold defines how long the warning lasts. In other words, the inattention warning unit 6 issues the warning only for the period defined by the warning timer threshold.

If the warning timer threshold has been exceeded (step S102; threshold exceeded), the inattention warning unit 6 turns the inattention warning SW OFF and stops the warning timer (step S103), after which step S104, described below, is executed. If the warning timer threshold has not been exceeded (step S102; threshold not exceeded), step S104 is executed without any further action.

If the inattention warning SW is OFF, or after steps S102 and S103 described above, the driving image acquisition unit 2 acquires a driving image (step S104), and the visual saliency processing unit 3 performs visual saliency image processing, that is, acquires a visual saliency map (step S105). The visual saliency peak detection unit 4 then acquires (detects) the peak position based on the visual saliency map acquired by the visual saliency processing unit 3 in step S105 (step S106).

Next, the inattentive tendency determination unit 5 sets the gaze area G and compares it with the peak position acquired by the visual saliency peak detection unit 4 (step S107). If the peak position is outside the gaze area G (step S107; outside gaze area), the inattentive tendency determination unit 5 determines whether the dwell timer is running or stopped (step S108). The dwell timer measures how long the peak position stays outside the gaze area G. The gaze area G may instead be set when the image is acquired in step S104.

If the dwell timer is stopped (step S108; stopped), the inattentive tendency determination unit 5 starts it (step S109). If the dwell timer is already running (step S108; running), the inattentive tendency determination unit 5 compares it with the dwell timer threshold (step S110). The dwell timer threshold is the threshold for how long the peak position may stay outside the gaze area G and is set to, for example, 2 seconds, as described above.

If the dwell timer exceeds the threshold (step S110; threshold exceeded), the inattentive tendency determination unit 5 turns ON the inattention warning SW of the inattention warning unit 6 and starts the warning timer (step S111), and then stops the dwell timer (step S112). In other words, because the peak position has stayed outside the gaze area G for at least the threshold time, the inattention warning unit 6 starts issuing the warning.

If the dwell timer does not exceed the threshold (step S110; threshold not exceeded), the inattentive tendency determination unit 5 returns to step S101 without any further action.

If the comparison in step S107 shows that the peak position is within the gaze area G (step S107; within gaze area), the inattentive tendency determination unit 5 stops the dwell timer (step S112).
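Steps S101 to S112 can be condensed into a small state machine. In the sketch below the caller supplies, for each frame, the timestamp and whether the peak lies inside G (the outcome of steps S104 to S107); the class name and the 3-second warning duration are hypothetical, and the `>=` comparison at S110 follows the "or longer" wording used for the predetermined time.

```python
class InattentionMonitor:
    """Condensed sketch of steps S101 to S112 of FIG. 12.

    Each timer is represented by its start time (None means stopped).
    """

    def __init__(self, dwell_threshold_s=2.0, warning_duration_s=3.0):
        self.dwell_threshold_s = dwell_threshold_s
        self.warning_duration_s = warning_duration_s
        self.warning_sw = False    # inattention warning SW
        self.warning_start = None  # warning timer
        self.dwell_start = None    # dwell timer

    def step(self, t, peak_in_gaze_area):
        """Process one frame at time t (seconds); return the warning SW state."""
        # S101-S103: end the warning once the warning timer passes its threshold.
        if self.warning_sw and t - self.warning_start > self.warning_duration_s:
            self.warning_sw = False
            self.warning_start = None
        # S104-S107 (image, saliency map, peak, gaze-area test) are done by the caller.
        if peak_in_gaze_area:
            self.dwell_start = None                       # S112
        elif self.dwell_start is None:
            self.dwell_start = t                          # S108 stopped -> S109
        elif t - self.dwell_start >= self.dwell_threshold_s:  # S110
            self.warning_sw = True                        # S111
            self.warning_start = t
            self.dwell_start = None                       # S112
        return self.warning_sw
```

Because the dwell timer keeps running while a warning is active, a new warning can be armed as soon as the current one ends, matching the flow from S102/S103 back into S104.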

As is clear from the above description, step S105 functions as the acquisition step, step S106 as the peak position detection step, step S107 as the gaze range setting step, and steps S107 to S111 as the inattention output step.

According to this embodiment, in the information output device 1, the visual saliency processing unit 3 acquires, in time series, visual saliency maps obtained by estimating the level of visual saliency within images captured of the outside from a moving body, and the visual saliency peak detection unit 4 detects, in time series, at least one peak position in each visual saliency map. The inattentive tendency determination unit 5 then sets the gaze area G in the image, and if the peak position stays continuously outside the gaze area G for a predetermined time or longer, the inattention warning unit 6 outputs information indicating a tendency toward inattention. The visual saliency map represents how likely, statistically, each location is to attract human gaze, so its peak indicates the position where human gaze is statistically most likely to be drawn. Using the visual saliency map therefore makes it possible to detect a tendency toward inattention with a simple configuration, without measuring the driver's actual line of sight.

The inattentive tendency determination unit 5 also sets the gaze area G based on the vanishing point V of the image. This makes it possible to set the gaze area G easily without detecting, for example, a vehicle ahead.

Further, if the inattentive tendency determination unit 5 determines that the peak position has stayed above or below the gaze area G continuously for a predetermined time or longer, the inattention warning unit 6 may output information indicating a tendency toward inattention caused by a fixed object. The region above the gaze area G is generally where fixed objects such as buildings, traffic signals, signs, and street lights appear, and the region below it is generally where road markings and other paint on the road surface appear. Therefore, if the peak position falls within one of these regions, the object that drew the driver's attention can be identified as a fixed object.
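Combining the area classification with the continuity rule, a sketch of the fixed-object variant might look like this. It treats the upper and lower areas as one class, so a peak that moves between them still counts as continuously above or below G; the area labels, return values, and function name are hypothetical.

```python
def detect_tendency(samples, threshold_s=2.0):
    """samples: (timestamp_s, area) in time order, with area one of
    'gaze', 'upper', 'lower', 'left', 'right'.

    Returns 'fixed_object' if an uninterrupted run outside G that reaches
    threshold_s stayed entirely above/below G, 'inattentive' if the run
    also touched the left/right areas, or None if no run reaches threshold_s.
    """
    run_start = None
    vertical_only = True
    for t, area in samples:
        if area == "gaze":
            run_start = None  # back inside G: the run is broken
            continue
        if run_start is None:
            run_start, vertical_only = t, True
        if area not in ("upper", "lower"):
            vertical_only = False
        if t - run_start >= threshold_s:
            return "fixed_object" if vertical_only else "inattentive"
    return None


print(detect_tendency([(0.0, "upper"), (1.0, "lower"), (2.0, "upper")]))  # fixed_object
```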

The visual saliency processing unit 3 includes an input unit 310 that converts an image into intermediate data that can be mapped, a nonlinear mapping unit 320 that converts the intermediate data into mapping data, and an output unit 330 that generates, based on the mapping data, saliency estimation information indicating a saliency distribution. The nonlinear mapping unit 320 includes a feature extraction unit 321 that extracts features from the intermediate data and an upsampling unit 322 that upsamples the data generated by the feature extraction unit 321. This configuration allows visual saliency to be estimated at low computational cost, and the visual saliency estimated in this way reflects the contextual state of attention.

The gaze area G is not limited to a fixed range; it may be changed, for example, according to the moving speed of the moving body. It is known that a driver's field of vision narrows at high speed. Accordingly, the inattentive tendency determination unit 5 may, for example, obtain the vehicle speed from a speed sensor or the like mounted on the vehicle and narrow the gaze area G as the speed increases. Because the appropriate inter-vehicle distance also changes with the moving speed, the range of the gaze area G obtained by the calculation method described with reference to FIG. 9 may also be changed accordingly. The vehicle speed need not come from a speed sensor; it may also be obtained from an acceleration sensor or from captured images.
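A speed-dependent gaze area could be implemented as a scale factor applied about the vanishing point. The speeds, the linear ramp, and the minimum scale below are hypothetical values chosen for illustration; the text only says that the area narrows as speed increases.

```python
def gaze_scale_for_speed(speed_kmh, low_kmh=40.0, high_kmh=100.0, min_scale=0.6):
    """Scale factor for G: 1.0 up to low_kmh, shrinking linearly to
    min_scale at high_kmh and above (all constants hypothetical)."""
    if speed_kmh <= low_kmh:
        return 1.0
    if speed_kmh >= high_kmh:
        return min_scale
    frac = (speed_kmh - low_kmh) / (high_kmh - low_kmh)
    return 1.0 - frac * (1.0 - min_scale)


def scale_about(g, s, anchor):
    """Scale rectangle g=(left, top, right, bottom) by s about anchor=(x, y),
    e.g. the vanishing point V, so V keeps its relative position in G."""
    ax, ay = anchor
    left, top, right, bottom = g
    return (ax + (left - ax) * s, ay + (top - ay) * s,
            ax + (right - ax) * s, ay + (bottom - ay) * s)


print(gaze_scale_for_speed(30), gaze_scale_for_speed(120))  # 1.0 0.6
```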

The gaze area G may also be changed according to the driving location and situation of the vehicle or the like. In situations that require attention to the surroundings, the gaze area G needs to be wider. For example, the range to be watched differs depending on whether the vehicle is traveling in a residential area, on a main road, in a downtown area, and so on. In a residential area there are few pedestrians, but the driver must watch for people suddenly darting out, so the gaze area G cannot be made narrow. On a main road, by contrast, the traveling speed is higher and, as described above, the field of vision narrows.

As specific examples, school routes, parks, and the areas near schools are considered to carry a risk of children running out into the road; the areas near stations and schools, event venues, and tourist sites are considered to have many pedestrians; the areas near bicycle parking lots and schools are considered to have many bicycles; and the areas near entertainment districts are considered to have many intoxicated people. Such locations require attention to the surroundings, so the gaze area G may be widened, narrowing the area in which an inattentive tendency is determined. Conversely, on expressways and in areas with low traffic volume and population density, driving speeds tend to be high, so the gaze area G may be narrowed, widening the area in which an inattentive tendency is determined.

The gaze area G may also be changed according to the time of day, events, and the like. For example, commuting and school hours require attention to the surroundings, so the gaze area G may be made wider than at other times, narrowing the area in which an inattentive tendency is determined. Similarly, from dusk into the night, the gaze area G may be widened to narrow that area. Late at night, on the other hand, the gaze area G may be narrowed, widening that area.

Furthermore, the gaze area G may be changed based on event information. For example, events such as festivals draw heavy pedestrian traffic to particular places and times, so the gaze area G may be made wider than usual to relax the determination of an inattentive tendency.

To change the extent of the gaze area G according to such locations, the inattentive tendency determination unit 5 acquires information from a means that can identify the current position and the area being traveled, such as a GPS receiver or map data, and associates that information with the image data. Time information may be acquired by the information output device 1 from an internal or external source, and event information may be obtained from an external website or a similar source. Whether to change the gaze area may be decided from a combination of location, time, and date, or from any one of them.
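Associating a GPS fix with map information, as described above, can be sketched as a nearest-zone lookup. This is a minimal illustration assuming that map data has been reduced to circular zones with road-context labels; the `Zone` structure, the labels, and `tag_frame` are hypothetical, and the distance uses the standard haversine formula on a spherical Earth.

```python
import math
from dataclasses import dataclass
from typing import Iterable

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


@dataclass(frozen=True)
class Zone:
    """A circular map zone with a road-context label (hypothetical structure)."""
    label: str
    lat: float
    lon: float
    radius_m: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points (spherical model)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def context_at(lat: float, lon: float, zones: Iterable[Zone],
               default: str = "ordinary_road") -> str:
    """Label of the nearest zone containing the position, or `default`."""
    best_label, best_d = default, math.inf
    for z in zones:
        d = haversine_m(lat, lon, z.lat, z.lon)
        if d <= z.radius_m and d < best_d:
            best_label, best_d = z.label, d
    return best_label


def tag_frame(frame_id: int, lat: float, lon: float, zones: Iterable[Zone]) -> dict:
    """Associate one image frame with its GPS fix and road context."""
    return {"frame_id": frame_id, "lat": lat, "lon": lon,
            "context": context_at(lat, lon, zones)}
```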

Furthermore, the dwell timer threshold may be shortened when traveling at high speed, because even a brief glance away can be dangerous at high speed.
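A speed-dependent dwell timer threshold could look like the sketch below. The source only states that the threshold may be shortened at high speed; the linear ramp and every default value (2.0 s base, 0.5 s minimum, 60 to 120 km/h ramp) are hypothetical assumptions.

```python
def dwell_threshold_s(speed_kmh: float,
                      base_s: float = 2.0,
                      min_s: float = 0.5,
                      start_kmh: float = 60.0,
                      full_kmh: float = 120.0) -> float:
    """Dwell timer threshold that shortens linearly with speed.

    All parameter defaults are hypothetical. At or below start_kmh the base
    threshold applies; at or above full_kmh the minimum applies; in between
    the value is interpolated linearly.
    """
    if speed_kmh <= start_kmh:
        return base_s
    if speed_kmh >= full_kmh:
        return min_s
    frac = (speed_kmh - start_kmh) / (full_kmh - start_kmh)
    return base_s - frac * (base_s - min_s)
```

A linear ramp keeps the threshold continuous, so the warning behaviour does not jump abruptly as the vehicle crosses a single speed boundary.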

Next, an information processing device according to a second embodiment of the present invention will be described with reference to FIGS. 13 to 15. Parts identical to those of the first embodiment described above are given the same reference signs, and their description is omitted.

FIG. 13 shows a typical system configuration of this embodiment. The system includes an information processing device 1A and a server device 10. The information processing device 1A is mounted on a vehicle V, and the information processing device 1A and the server device 10 can communicate with each other via a network N such as the Internet.

FIG. 14 shows the information processing device 1A according to this embodiment. The information processing device 1A includes a driving image acquisition unit 2, a visual saliency processing unit 3, a visual saliency peak detection unit 4, an inattentive tendency determination unit 5A, and an output unit 7.

The driving image acquisition unit 2, the visual saliency processing unit 3, and the visual saliency peak detection unit 4 are the same as in the first embodiment. In addition to determining an inattentive tendency, the inattentive tendency determination unit 5A determines whether the object being looked at (the peak position) when an inattentive tendency is determined is a resident object. That is, the inattentive tendency determination unit 5A functions as a determination unit that, when the peak position has remained outside the gaze area G (the range that should be watched) continuously for a predetermined time or longer, determines whether the peak position corresponds to a resident object.

A resident object is a fixed object such as a building, traffic signal, sign, street lamp, or road marking, as described in the first embodiment; that is, an object of inattention that can always be observed from the location where the video is captured (an object, such as a building, that is always present at that position).

The resident-object determination is made not only by checking whether the peak position lies in the upper area Iu or the lower area Id, as described in the first embodiment, but also when the peak position lies in the left area Il or the right area Ir.

When the peak position is in the upper area Iu or the lower area Id, whether the object is a resident object can be determined from the area alone. When the peak position is in the left area Il or the right area Ir, however, the area alone is not enough to tell whether the object of inattention is a resident object, so object recognition is used. Object recognition (also called object detection) may use any well-known algorithm; the specific method is not particularly limited.
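The area-based rule above can be sketched as follows. The region geometry is an assumption, since the first embodiment's exact definitions of Iu, Id, Il, and Ir are not reproduced here: each region is taken to be everything above, below, left of, or right of G, with the vertical regions taking precedence at the corners. Peaks in Iu or Id are assumed to be fixed objects such as signs or road markings, and `recognize` is a hypothetical callable standing in for any object detector.

```python
from typing import Callable, Optional, Tuple

Rect = Tuple[float, float, float, float]  # gaze area G as (left, top, right, bottom)

# Labels treated as resident (fixed) objects, following the examples in the text.
RESIDENT_LABELS = {"building", "traffic_signal", "sign", "street_lamp", "road_paint"}


def classify_region(x: float, y: float, gaze: Rect) -> str:
    """Map a peak position to G, Iu, Id, Il or Ir (image y grows downwards).

    Assumption: where a peak is both above/below and beside G, the vertical
    region takes precedence.
    """
    left, top, right, bottom = gaze
    if y < top:
        return "Iu"
    if y > bottom:
        return "Id"
    if x < left:
        return "Il"
    if x > right:
        return "Ir"
    return "G"


def is_resident(x: float, y: float, gaze: Rect,
                recognize: Callable[[float, float], Optional[str]]) -> Optional[bool]:
    """None if the peak is inside G; otherwise whether it is a resident object."""
    region = classify_region(x, y, gaze)
    if region == "G":
        return None
    if region in ("Iu", "Id"):
        return True  # area alone decides (assumed: upper/lower peaks are fixed objects)
    # Il / Ir: the area alone is ambiguous, so defer to an object recognizer.
    return recognize(x, y) in RESIDENT_LABELS
```

Passing the recognizer in as a callable keeps the area logic independent of whichever detection model is actually used.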

As an alternative to object recognition, relative speed may be used to determine whether an object is a resident object. In this approach, the relative speed is obtained from the speed of the own vehicle and the inter-frame moving speed of the object of inattention, and whether that object is a resident object is determined from the relative speed. The inter-frame moving speed of the object can be taken as the inter-frame moving speed of the peak position. If the relative speed thus obtained is equal to or greater than a predetermined threshold, the object can be determined to be fixed at a certain position (a resident object).
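The source does not give a formula for this relative-speed test, so the following is only a sketch under stated assumptions. The peak's inter-frame displacement is converted to metres with a single hypothetical `metres_per_pixel` factor (a strong simplification, because image scale depends on distance), and the predetermined threshold is assumed to be a fraction `ratio` of the vehicle's own speed. The rationale is that a fixed object sweeps past the camera at roughly the vehicle's own speed, whereas a car travelling alongside appears almost still.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def peak_speed_mps(p0: Point, p1: Point, dt_s: float, metres_per_pixel: float) -> float:
    """Apparent speed of the peak between two frames.

    Assumes a single constant metres-per-pixel factor, which is a strong
    simplification: real image scale depends on the object's distance.
    """
    return math.hypot(p1[0] - p0[0], p1[1] - p0[1]) * metres_per_pixel / dt_s


def is_resident_by_relative_speed(p0: Point, p1: Point, dt_s: float,
                                  own_speed_mps: float, metres_per_pixel: float,
                                  ratio: float = 0.8,
                                  min_own_speed_mps: float = 1.0) -> Optional[bool]:
    """True if the object moves past at close to the vehicle's own speed.

    `ratio` and `min_own_speed_mps` are hypothetical parameters.
    """
    if own_speed_mps < min_own_speed_mps:
        return None  # vehicle (nearly) stopped: relative speed is uninformative
    relative = peak_speed_mps(p0, p1, dt_s, metres_per_pixel)
    return relative >= ratio * own_speed_mps
```

Returning `None` when the vehicle is nearly stopped avoids classifying everything as resident when the own-speed-based threshold collapses toward zero.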

As described above, a resident object is fixed at a certain position and can always be observed from the video capture location (it is always present at that position, like a building). With the determination of this embodiment, therefore, a position at which the object of inattention (peak position) is determined to be a resident object can be regarded as a position where drivers consistently tend to look aside.

When the inattentive tendency determination unit 5 determines that the object of inattention is a resident object, the output unit 7 transmits the determination result to the server device 10. Alternatively, the presence or absence of a resident object may always be transmitted as the determination result. The result may be accompanied by the determination time when the determination is made in real time, or by the capture time and date of the image when an image stored on a memory card or the like is evaluated. Adding time information makes it possible to extract locations where drivers tend to look aside at particular times of day: for example, locations where a building visible only in daytime draws the eye, locations affected by lighting, or locations affected by events such as fireworks. Information on the capture location may also be added.
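A determination result carrying the optional time and location information described above might be serialised as in the sketch below. The JSON field names and the `DeterminationResult` structure are hypothetical; the source does not specify a message format or transport protocol, so only the payload construction is shown.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional


@dataclass
class DeterminationResult:
    """One resident-object determination (field names are hypothetical)."""
    resident: bool
    timestamp: str                 # ISO 8601 time of judgement or of image capture
    time_source: str               # "judged" (real time) or "captured" (stored image)
    lat: Optional[float] = None    # capture location, if available
    lon: Optional[float] = None


def build_payload(resident: bool, when: datetime, realtime: bool,
                  lat: Optional[float] = None, lon: Optional[float] = None) -> str:
    """Serialise a determination result as JSON for transmission to the server."""
    result = DeterminationResult(
        resident=resident,
        timestamp=when.isoformat(),
        time_source="judged" if realtime else "captured",
        lat=lat,
        lon=lon,
    )
    return json.dumps(asdict(result), sort_keys=True)
```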

The server device 10 aggregates the determination results transmitted from the information processing devices 1A. For example, aggregating the results transmitted from information processing devices 1A (vehicles), including location information, makes it possible to extract locations where drivers tend to look aside. Including time information as well makes it possible to extract such locations by time of day.
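Server-side aggregation by location and time of day can be sketched as counting resident-object reports per map grid cell and hour. The report keys, the 0.001-degree grid, and `min_count` are hypothetical choices; a real deployment might instead use road links or a spatial index.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]


def grid_cell(lat: float, lon: float, cell_deg: float = 0.001) -> Cell:
    """Quantise a position to a grid cell (0.001 degrees of latitude is about 111 m)."""
    return (round(lat / cell_deg), round(lon / cell_deg))


def hotspots(reports: Iterable[dict], min_count: int = 3) -> Dict[Tuple[Cell, int], int]:
    """Count resident-object reports per (grid cell, hour of day).

    Each report is assumed to be a dict with "resident", "timestamp" (ISO 8601),
    "lat" and "lon" keys; reports without a location are skipped.
    """
    counts: Counter = Counter()
    for r in reports:
        if not r.get("resident") or r.get("lat") is None or r.get("lon") is None:
            continue
        hour = datetime.fromisoformat(r["timestamp"]).hour
        counts[(grid_cell(r["lat"], r["lon"]), hour)] += 1
    return {key: n for key, n in counts.items() if n >= min_count}
```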

The operation of the information processing device 1A according to this embodiment will be described with reference to the flowchart of FIG. 15. In FIG. 15, steps S104 to S110 and S112 are the same as in FIG. 12. Steps S101 to S103 of FIG. 12 are omitted because warnings are not essential in this embodiment.

In FIG. 15, if the dwell timer exceeds the threshold (step S110; threshold exceeded), the inattentive tendency determination unit 5 determines whether the object is a resident object and transmits the determination result (step S111A). In step S111A, the resident-object determination is made by object recognition or another method described above, and the result is transmitted to the server device 10.
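The dwell-timer branch of the flowchart (step S110 followed by step S111A) can be sketched as a per-frame loop. This is an illustration under assumptions: `judge` and `send` are hypothetical callables standing in for the resident-object determination and the transmission to the server device 10, one result is sent per excursion outside G, and the timer resets as soon as the peak returns inside G. The other steps of FIG. 15 are not modelled.

```python
from typing import Callable, Iterable, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # gaze area G as (left, top, right, bottom)


def inside(p: Point, g: Rect) -> bool:
    """True if peak position p lies inside or on gaze area g."""
    return g[0] <= p[0] <= g[2] and g[1] <= p[1] <= g[3]


def run_dwell_loop(peaks: Iterable[Point], gaze: Rect, fps: float, threshold_s: float,
                   judge: Callable[[Point], bool],
                   send: Callable[[dict], None]) -> None:
    """Per-frame dwell timer; when it exceeds the threshold, judge and send once."""
    dwell_frames = 0
    reported = False
    for i, p in enumerate(peaks):
        if inside(p, gaze):
            dwell_frames, reported = 0, False  # peak back inside G: reset the timer
            continue
        dwell_frames += 1
        # Counting frames avoids accumulating floating-point error in seconds.
        if dwell_frames / fps > threshold_s and not reported:  # step S110
            send({"frame": i, "resident": judge(p)})           # step S111A
            reported = True
```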

Although the warning-related steps are omitted from the flowchart of FIG. 15, a warning may also be issued. In that case, the inattentiveness warning unit 6 shown in FIG. 1 is also provided, steps S101 to S103 are also executed, and step S111 of FIG. 12 is executed before, after, or in parallel with step S111A.

In this embodiment, the information processing device 1A on the vehicle side determines whether an object is a resident object, but the server device 10 may make this determination instead. That is, the server device 10 may acquire driving images and the like, perform the resident-object determination on those images, and output the result to a storage device in the server device 10, to another server device, or the like.

According to this embodiment, in the information processing device 1A, the visual saliency processing unit 3 acquires, in time series, visual saliency maps obtained by estimating the level of visual saliency within images of the outside captured from a moving body, and the visual saliency peak detection unit 4 detects, in time series, at least one peak position in each visual saliency map. The inattentive tendency determination unit 5 then sets a gaze area G in the image and, if the peak position remains outside the gaze area G continuously for a predetermined time or longer, determines whether the peak position corresponds to a resident object. This makes it possible to determine whether the object of inattention detected from the visual saliency map is a resident object that can always be observed from the video capture location (one that is always present at that position, such as a building) or a moving object. The object of inattention can therefore at least be identified as being, or not being, a resident object.

When the peak position lies to the left or right of the gaze area G, the inattentive tendency determination unit 5 may determine by object recognition whether the peak position corresponds to a resident object. Because a peak position to the left or right of the range to be watched corresponds to either a resident object or a moving object, object recognition allows a resident object such as a building to be distinguished accurately from a moving object such as a car.

Alternatively, the inattentive tendency determination unit 5 may acquire the moving speed of the moving body and, when the peak position lies to the left or right of the gaze area G, calculate the relative speed of the object indicated by the peak position based on that moving speed and determine from the relative speed whether the object is a resident object. This allows resident objects to be identified from relative speed alone and reduces the processing load of the determination.

The device also includes an output unit 7 that outputs the determination result. This makes it possible, for example, to transmit results determined in the vehicle to the server device 10 or the like for aggregation.

The present invention is not limited to the above embodiments. Those skilled in the art can make various modifications based on conventionally known knowledge without departing from the gist of the present invention. Such modifications are, of course, within the scope of the present invention as long as they still include the information processing device of the present invention.

1 Information output device
2 Driving image acquisition unit
3 Visual saliency processing unit (acquisition unit)
4 Visual saliency peak detection unit (peak position detection unit)
5 Inattentive tendency determination unit (gaze range setting unit, inattentiveness output unit, speed acquisition unit, position acquisition unit, determination unit)
6 Inattentiveness warning unit
7 Output unit

Claims

An information output device comprising:
an acquisition unit that acquires, in time series, visual saliency distribution information obtained by estimating the level of visual saliency within an image of the outside captured from a moving body;
a peak position detection unit that detects at least one peak position in the visual saliency distribution information;
a gaze range setting unit that sets, in the image, a range at which the driver of the moving body should gaze; and
an inattentiveness output unit that outputs information indicating an inattentive tendency when the peak position remains continuously outside the range to be gazed at.