JP2011250928A

JP2011250928A - Device, method and program for space recognition for visually handicapped person

Info

Publication number: JP2011250928A
Application number: JP2010125662A
Authority: JP
Inventors: Hisashi Suzuki; 寿鈴木; Shuichi Utsugi; 修一宇都木
Original assignee: Chuo University
Current assignee: Chuo University
Priority date: 2010-06-01
Filing date: 2010-06-01
Publication date: 2011-12-15

Abstract

PROBLEM TO BE SOLVED: To provide a novel space recognition device for visually handicapped persons that enables a user to recognize a three-dimensional space with high accuracy through a sensory system (a tactile sensory system, an auditory sensory system, or the like) whose resolution is markedly lower than that of the visual system.

SOLUTION: A partial pixel region of a main image captured by a main camera of a stereo imaging apparatus is defined as a noted region, and a distance value is computed for each pixel composing the noted region by carrying out stereo matching against a sub-image captured by a sub-camera. A statistical central value (e.g. an average value) is derived from the plurality of computed distance values, and a vibrator is driven at a higher output level as the central value becomes smaller. As the stereo imaging apparatus is moved, the output level of the vibrator changes over time. The user perceives the depth of an obstacle from the change over time of the vibration level sensed through the tactile sense.

Description

The present invention relates to a space recognition device for visually impaired persons, and more particularly to a space recognition device for visually impaired persons that uses stereo vision.

Various spatial recognition systems using stereo vision have been studied for the purpose of assisting visually impaired persons. Japanese Patent Laid-Open No. 2002-65721 (Patent Document 1) discloses an environment recognition support device for visually impaired persons that calculates the three-dimensional positions of objects in the space in front of the user from images captured by a stereo camera worn on the user's forehead and, based on the result, drives a tactile display device that the user touches by hand or stereo headphones. Japanese Patent Laid-Open No. 2003-79685 (Patent Document 2) discloses a walking aid for visually impaired persons that converts the three-dimensional information of an obstacle obtained from images captured by a stereo camera into two-dimensional information and, based on that two-dimensional information, drives the protrusions of an actuator array placed against the user's forehead, thereby conveying spatial information to the user through bodily sensation.

However, the information transmission rate of the human auditory system is known to be far lower than that of the visual system, and that of the tactile system is known to be far lower still than that of the auditory system. Consequently, no matter how high the resolution of the information that the systems described above provide to the user on the basis of accurate three-dimensional information acquired by stereo vision, the resolution of the user's own sensory system is insufficient, and in the end the user cannot be made to perceive the depth of the space accurately.

Patent Document 1: JP 2002-65721 A
Patent Document 2: JP 2003-79685 A

The present invention has been made in view of the above problems in the prior art. An object of the present invention is to provide a novel space recognition device for visually impaired persons that enables a user to accurately recognize a three-dimensional space via a sensory system (such as the auditory system or the tactile system) whose resolution is far lower than that of the visual system.

The present inventors earnestly studied a novel space recognition device for visually impaired persons that enables a user to accurately recognize a three-dimensional space via a sensory system whose resolution is far lower than that of the visual system. As a result, they arrived at the idea of extracting, from the enormous amount of three-dimensional information that a space contains, only the information that the user truly needs, and reducing it to one-dimensional information suited to the resolution of sensory organs such as touch and hearing. Based on this idea, the inventors conceived of a system configuration in which a partial pixel region of the main image captured by the main camera of a stereo imaging device is defined as an attention area, a statistical representative value is derived from the plurality of distance values calculated by stereo vision for all the pixels constituting the attention area, and an output device is driven based on that representative value, and thereby arrived at the present invention.

That is, according to the present invention, there is provided a space recognition device for visually impaired persons that includes a stereo imaging device having a main camera and a sub camera, an information processing device, and an output device, wherein the information processing device includes: an attention area setting unit that defines a partial pixel region of the main image captured by the main camera as an attention area; a stereo matching unit that performs stereo matching between each pixel constituting the attention area and the sub-image captured by the sub camera, and detects the pixel in the sub-image that corresponds to each pixel of the attention area; a distance value calculation unit that calculates a distance value for each pixel of the attention area based on the parallax calculated from the coordinates of that pixel and of the detected corresponding pixel in the sub-image; a representative value deriving unit that derives a statistical representative value from the plurality of calculated distance values; and an output signal generation unit that generates an output signal whose output level becomes larger as the representative value becomes smaller, and wherein the output device is driven based on the output signal.

As described above, the present invention provides a novel space recognition device for visually impaired persons that enables a user to accurately recognize a three-dimensional space via a sensory system whose resolution is far lower than that of the visual system.

FIG. 1 is a diagram showing the space recognition device for visually impaired persons of the present embodiment.
FIG. 2 is a conceptual diagram for explaining the function of the attention area setting unit in the present embodiment.
FIG. 3 is a diagram showing the three modes of the attention area.
FIG. 4 is a conceptual diagram for explaining how the three modes of the attention area are used.
FIG. 5 is a diagram showing the stereo imaging device of the space recognition device for visually impaired persons of the present embodiment.
FIG. 6 is a diagram showing a manner of use of the space recognition device for visually impaired persons of the present embodiment.
FIG. 7 is a diagram showing a manner of use of the space recognition device for visually impaired persons of the present embodiment in the "slit mode".
FIGS. 8 to 14 are conceptual diagrams for explaining the effect of the "slit mode" in the present embodiment.

Hereinafter, the present invention will be described with reference to the embodiment shown in the drawings, but the present invention is not limited to the embodiment shown in the drawings. In the drawings referred to below, the same reference numerals are used for common elements, and their description is omitted as appropriate.

FIG. 1 shows a space recognition device 100 for visually impaired persons according to an embodiment of the present invention. The space recognition device 100 for visually impaired persons (hereinafter simply referred to as the space recognition device 100) includes a stereo imaging device 10, an information processing device 20, an output device 30, and setting switching means 40. The stereo imaging device 10 is a means for imaging an object S that lies in the traveling direction of a visually impaired user (hereinafter simply referred to as the user), and includes a main camera 12 and a sub camera 14 installed in a parallel stereo configuration. The main camera 12 and the sub camera 14 are preferably configured as 3CCD cameras capable of capturing digital video or continuous digital still images at high resolution. The images captured by the main camera 12 and the sub camera 14 are transferred to the information processing device 20. The information processing device 20 uses stereo vision to calculate the separation distance between the object S and the stereo imaging device 10 (that is, the user) from the two images captured by the main camera 12 and the sub camera 14, generates an output signal corresponding to that separation distance, and transmits it to the output device 30.

The information processing device 20 in the present embodiment can be configured as a general-purpose computer, and realizes the functional means described below by executing an application program written in a programming language under the management of an appropriate operating system. That is, the information processing device 20 includes an image input interface 21 for receiving the images transferred from the stereo imaging device 10, an attention area setting unit 22, a stereo matching unit 23, a distance value calculation unit 24, a representative value deriving unit 25, an output signal generation unit 26, and a setting switching unit 27. The functions performed by these functional means are described below with reference to FIGS. 1 to 4.

FIG. 2 is a conceptual diagram for explaining the function of the attention area setting unit 22 in the present embodiment. In the present embodiment, the two images of the object S captured by the main camera 12 and the sub camera 14 of the stereo imaging device 10 are first input to the attention area setting unit 22 via the image input interface 21. FIG. 2(a) shows the two images input to the attention area setting unit 22. In FIG. 2, the image captured by the main camera 12 (hereinafter referred to as the main image) is shown on the left side of the page, and the image captured by the sub camera 14 (hereinafter referred to as the sub-image) is shown on the right side. As described above, because the main camera 12 and the sub camera 14 are installed in a parallel stereo configuration, the epipolar lines of the main image and the sub-image coincide, as shown in FIG. 2(a).

Next, as shown in FIG. 2(b), the attention area setting unit 22 sets a partial pixel region of the main image received from the stereo imaging device 10 as the attention area R (shown enclosed by a broken line). The attention area R is preferably defined so that its centroid coincides with the optical axis of the main camera (indicated by "+" in the figure). In the present embodiment, it is preferable to define a plurality of attention areas R that differ in shape and size, and to configure the device so that the form of the attention area R can be switched as appropriate through settings. This point is described in detail later.
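As an illustration of how an attention area centred on the optical axis might be computed, the following is a minimal Python sketch. The names `Region` and `centered_region` are hypothetical, and the sketch assumes that the optical axis of the main camera meets the image at its centre pixel unless another point is supplied; neither detail is specified in the description.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    """Axis-aligned pixel rectangle: columns [x0, x1) and rows [y0, y1)."""
    x0: int
    y0: int
    x1: int
    y1: int


def centered_region(image_w: int, image_h: int,
                    region_w: int, region_h: int,
                    cx: Optional[int] = None,
                    cy: Optional[int] = None) -> Region:
    """Return a region of the requested size centred on (cx, cy).

    (cx, cy) defaults to the image centre, which is ASSUMED here to be the
    point where the optical axis of the main camera meets the image plane.
    """
    if not (0 < region_w <= image_w and 0 < region_h <= image_h):
        raise ValueError("region must be non-empty and fit inside the image")
    cx = image_w // 2 if cx is None else cx
    cy = image_h // 2 if cy is None else cy
    # Clamp the top-left corner so the region never leaves the image.
    x0 = min(max(cx - region_w // 2, 0), image_w - region_w)
    y0 = min(max(cy - region_h // 2, 0), image_h - region_h)
    return Region(x0, y0, x0 + region_w, y0 + region_h)
```

For a 640 x 480 main image, `centered_region(640, 480, 100, 50)` yields a 100 x 50 pixel rectangle around pixel (320, 240).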

The pixel information of all the pixels constituting the attention area R of the main image set by the attention area setting unit 22, together with the pixel information of all the pixels of the sub-image, is sent to the stereo matching unit 23. FIG. 2(c) conceptually shows the pixel information sent to the stereo matching unit 23. Here, pixel information means information that includes the luminance value and the coordinates of each pixel. For every pixel constituting the attention area R, the stereo matching unit 23 executes a stereo matching process against the sub-image, pixel by pixel, and identifies the pixel in the sub-image that corresponds to each pixel included in the attention area R. The stereo matching process can be performed by a known technique such as a method using dynamic programming. As the matching result, the stereo matching unit 23 sends to the distance value calculation unit 24 information that associates the coordinates of each pixel in the attention area R with the coordinates of the corresponding pixel in the sub-image (hereinafter referred to as pixel pair information).

The distance value calculation unit 24 calculates a distance value based on the coordinates of the pixel of the attention area R and the coordinates of the pixel of the sub-image contained in the pixel pair information. In the present embodiment, the distance value can be defined as the length of the perpendicular dropped from the position on the object S that appears in each pixel of the attention area R to the line connecting the lens centers of the main camera 12 and the sub camera 14, or as an approximation of that length. The distance value calculation unit 24 takes as the parallax p the distance between the two pixels, obtained from the coordinates of the pixel of the attention area R and the coordinates of the pixel of the sub-image, minus a deviation. It can then calculate the distance value by dividing the product of the distance between the lens centers of the main camera 12 and the sub camera 14 and the focal length of the cameras by the parallax p, or it can derive an approximate distance value by multiplying the parallax p by a suitable fixed coefficient. The distance value is calculated for every pixel constituting the attention area R, and as many distance values as there are pixels in the attention area R are sent to the representative value deriving unit 25 as the measurement result.
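The matching and distance steps can be sketched in Python as follows. This is only an illustration under stated assumptions: it substitutes a simple sum-of-absolute-differences (SAD) window search along one image row for the dynamic programming method that the description names as one known option; it assumes the corresponding pixel in the sub-image lies at the same or a smaller column index; and it assumes the focal length is expressed in pixels. The function names and the parameters `half_window`, `max_disparity`, and `offset_px` are hypothetical.

```python
def match_pixel(main_row, sub_row, x, half_window=1, max_disparity=16):
    """Find the column in sub_row that best matches main_row[x].

    Both rows are luminance values on the same (shared) epipolar line.
    A small window around x is compared by sum of absolute differences.
    """
    n = len(main_row)
    lo, hi = max(x - half_window, 0), min(x + half_window, n - 1)
    best_x, best_cost = x, float("inf")
    for d in range(max_disparity + 1):
        if lo - d < 0:          # window would leave the sub-image
            break
        cost = sum(abs(main_row[i] - sub_row[i - d]) for i in range(lo, hi + 1))
        if cost < best_cost:
            best_x, best_cost = x - d, cost
    return best_x


def distance_from_pair(x_main, x_sub, baseline_m, focal_px, offset_px=0.0):
    """Distance = (lens-centre distance * focal length) / parallax p.

    p is the pixel distance between the matched columns minus a deviation
    (offset_px, assumed 0 here). Returns infinity when p is not positive.
    """
    p = abs(x_main - x_sub) - offset_px
    if p <= 0:
        return float("inf")
    return baseline_m * focal_px / p


# Toy rows: the bright feature at column 4 of the main row appears
# three columns to the left in the sub row.
main_row = [10, 10, 10, 80, 200, 90, 10, 10, 10, 10]
sub_row = [80, 200, 90, 10, 10, 10, 10, 10, 10, 10]
x_sub = match_pixel(main_row, sub_row, 4)
distance_m = distance_from_pair(4, x_sub, baseline_m=0.12, focal_px=500.0)
```

With a lens-centre distance of 0.12 m, a focal length of 500 pixels, and a parallax of 3 pixels, the sketch gives a distance of about 20 m. These numbers are made up for the example and do not come from the description.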

The representative value deriving unit 25 is a functional unit for condensing the enormous amount of spatial information contained in the attention area R. That is, the representative value deriving unit 25 derives a statistical representative value from the plurality of distance values received from the distance value calculation unit 24. In doing so, per-pixel measurement errors are absorbed automatically. The present embodiment is configured so that any one of the mean, the median, the mode, or the minimum can be set as the representative value. The derived representative value is sent to the output signal generation unit 26.
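A minimal Python sketch of deriving the representative value is shown below. The function name `representative_value` and the `bin_width` parameter are hypothetical. Binning the distances before taking the mode is an assumption added for this sketch, because measured distances rarely repeat exactly; the description does not say how the mode is computed.

```python
import statistics
from collections import Counter


def representative_value(distances, kind="mean", bin_width=0.1):
    """Reduce per-pixel distance values (metres) to one representative value.

    kind is one of "mean", "median", "mode", or "min".
    """
    if not distances:
        raise ValueError("no distance values")
    if kind == "mean":
        return statistics.fmean(distances)
    if kind == "median":
        return statistics.median(distances)
    if kind == "min":
        return min(distances)
    if kind == "mode":
        # ASSUMPTION: quantise to bins of bin_width metres, then take the
        # most frequent bin and report its nominal distance.
        bins = Counter(round(d / bin_width) for d in distances)
        return bins.most_common(1)[0][0] * bin_width
    raise ValueError(f"unknown statistic: {kind}")


sample = [4.0, 4.0, 4.1, 0.5, 3.9]
```

For `sample`, the median and the mode stay near 4 m despite the single outlier at 0.5 m, whereas the minimum reports the outlier itself.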

The output signal generation unit 26 converts the representative value into one-dimensional information (an output level) suited to the resolution of a sensory organ such as touch or hearing. A function for converting the representative value into an output level is prepared in advance in the output signal generation unit 26, and the output level is determined by evaluating that function with the representative value as its parameter. In the present embodiment, the output level can be defined by frequency, amplitude, or both (hereinafter referred to as frequency or the like). When the output level is defined by frequency or the like, preparing a function in which the representative value and the frequency or the like are negatively correlated ensures that a higher frequency (or a larger amplitude) is determined as the representative value becomes smaller. The output signal generation unit 26 generates a periodic signal based on the determined frequency or the like. Instead of the function, an output signal generation table in which representative values and frequencies or the like are associated so as to be negatively correlated may be prepared, and the output level may be determined by referring to that table.
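The two alternatives described here, a conversion function and a lookup table, might be sketched in Python as follows. The linear form of the function, the frequency values, and the table entries are all assumptions chosen for illustration; the description requires only that the relationship be negatively correlated.

```python
import bisect


def output_frequency(rep_m, d_min, d_max, f_min, f_max):
    """Map a representative distance to a frequency, higher when closer.

    ASSUMPTION: a linear, clamped mapping. Any monotonically decreasing
    function would satisfy the negative correlation described in the text.
    """
    clamped = min(max(rep_m, d_min), d_max)
    t = (clamped - d_min) / (d_max - d_min)   # 0 at d_min, 1 at d_max
    return f_max - t * (f_max - f_min)


# Hypothetical output signal generation table:
# (upper distance bound in metres, frequency in Hz), sorted by distance.
FREQUENCY_TABLE = [(0.5, 250.0), (1.0, 200.0), (2.0, 150.0), (4.0, 100.0)]


def lookup_frequency(rep_m, table=FREQUENCY_TABLE):
    """Return the frequency of the first table row whose bound covers rep_m."""
    bounds = [bound for bound, _ in table]
    index = min(bisect.bisect_left(bounds, rep_m), len(table) - 1)
    return table[index][1]
```

A table is convenient when the output device supports only a few discrete levels, while a function gives a continuous response.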

Finally, the output device 30 is driven by the output signal transmitted from the information processing device 20. In the present embodiment, the output level can be defined by frequency or the like, and the output device 30 can be configured as a vibrating element such as a vibrator or a speaker. For example, when a vibrator is adopted as the vibrating element, the vibrator vibrates at a higher frequency (or with a larger amplitude) as the output level becomes larger (that is, as the representative value becomes smaller), so the user can intuitively perceive the distance to the object S through the sense of touch. When a speaker is adopted as the vibrating element, a sound wave of higher frequency (or larger amplitude) is generated as the output level becomes larger (that is, as the representative value becomes smaller), so the user can intuitively perceive the distance to the object S through hearing. Alternatively, the output device 30 can be configured as a pressure applicator and the output level defined by pressure, so that the distance to the object S is conveyed through the user's sense of pressure.

When defining the function (or the output signal generation table) for converting the representative value into an output level, the output range inherent to the output device 30 must be assigned to a range of representative values. If the range of representative values is made wide, the sense of distance from the foreground to the background can be conveyed to the user comprehensively, but the resolution of the information drops accordingly. In the present embodiment, therefore, it is preferable to define a plurality of modes for the range of representative values (for example, a "distant view mode" and a "near view mode") and to make the modes switchable. For example, in the "distant view mode", setting the range of representative values to "1 to 4 m" allows the user to obtain information about the surroundings over a wider range than with a conventional white cane. In the "near view mode", setting the range of representative values to "0 to 2 m" allows the user to obtain high-resolution information about the nearby environment. Switching between these modes is executed by the setting switching unit 27 controlling the output signal generation unit 26 in response to input from the setting switching means 40 operated by the user.
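The trade-off between range and resolution can be illustrated with a short Python sketch. The distance ranges "1 to 4 m" and "0 to 2 m" are taken from the description; the mode keys, the normalized output range of 0 to 1, and the linear mapping are assumptions made for this example.

```python
# Distance ranges (metres) from the description; the keys are hypothetical.
RANGE_MODES = {"far": (1.0, 4.0), "near": (0.0, 2.0)}


def output_level(rep_m, mode, out_min=0.0, out_max=1.0):
    """Map a representative distance onto the device's output range.

    The whole output range [out_min, out_max] is assigned to the distance
    range of the selected mode; distances outside it are clamped.
    """
    d_min, d_max = RANGE_MODES[mode]
    clamped = min(max(rep_m, d_min), d_max)
    t = (clamped - d_min) / (d_max - d_min)
    return out_max - t * (out_max - out_min)   # closer -> larger level


def level_change_per_metre(mode, out_min=0.0, out_max=1.0):
    """How much of the output range one metre of distance occupies."""
    d_min, d_max = RANGE_MODES[mode]
    return (out_max - out_min) / (d_max - d_min)
```

Under these assumptions one metre of distance spans half of the output range in the near mode but only a third of it in the far mode, which is the lower resolution that the paragraph attributes to a wider range.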

The embodiment described so far conveys the sense of distance to an obstacle to the user intuitively, but the present invention is not limited to this. For example, the output device 30 can be configured as an audio output device. By associating representative values with gradations of sound, the sense of distance to the object S can be conveyed to the user conceptually, and by announcing the distance value corresponding to the representative value in speech prepared in advance, the distance to the object S can be conveyed to the user quantitatively. The components of the space recognition device 100 of the present embodiment have been described above. Next, the setting modes of the attention area R in the space recognition device 100 of the present embodiment are described.

In the space recognition device 100 of the present embodiment, as described above, it is preferable to define a plurality of setting modes for the attention area R in advance and to make the modes switchable. In the present embodiment, three setting modes can be defined, for example: a "wide area mode", a "local mode", and a "slit mode". These three modes offer the user different ways of using the space recognition device 100. The three modes are described below with reference to FIGS. 3 and 4.

FIGS. 3(a) to 3(c) show the attention area R in each of the three modes, enclosed by a broken line. In the "wide area mode" shown in FIG. 3(a), the entire main image (or a range close to it) is defined as the attention area R, and the "minimum" is defined as the representative value. In the "local mode" shown in FIG. 3(b), a very narrow region is defined as the attention area R, and any one of the "mean", the "median", or the "mode" can be adopted as the representative value. In the "slit mode" shown in FIG. 3(c), a rectangular region with a large aspect ratio (that is, a slit-shaped region) is defined as the attention area R, and, as in the "local mode", any one of the "mean", the "median", or the "mode" can be adopted as the representative value. In the present embodiment, the attention area R in the "slit mode" can be configured with its longitudinal direction parallel to the epipolar line as shown in FIG. 3(c), perpendicular to the epipolar line, or tilted at an arbitrary angle to the epipolar line. Switching between the modes is executed by the setting switching unit 27 controlling the attention area setting unit 22 and the representative value deriving unit 25 in response to input from the setting switching means 40 operated by the user. In the following description, the representative value in the "local mode" and the "slit mode" is assumed to be the "mean".
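One way to hold these three setting modes in software is a small configuration table, sketched below in Python. The pairing of each mode with a statistic follows the description (minimum for the wide area mode, mean for the other two in the explanation that follows), but the class name, the mode keys, and the size fractions are hypothetical, and only the slit orientation parallel to the epipolar line is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionMode:
    width_frac: float    # region width as a fraction of the image width
    height_frac: float   # region height as a fraction of the image height
    statistic: str       # representative value used in this mode


# Size fractions are illustrative assumptions, not values from the source.
MODES = {
    "wide": RegionMode(1.0, 1.0, "min"),
    "local": RegionMode(0.05, 0.05, "mean"),
    "slit": RegionMode(0.8, 0.05, "mean"),   # long side along the epipolar line
}


def region_for_mode(name, image_w, image_h):
    """Return (x0, y0, x1, y1) of the attention area, centred in the image."""
    mode = MODES[name]
    w = max(1, round(image_w * mode.width_frac))
    h = max(1, round(image_h * mode.height_frac))
    x0 = (image_w - w) // 2
    y0 = (image_h - h) // 2
    return (x0, y0, x0 + w, y0 + h)
```

Switching modes then amounts to selecting a different key, which changes both the region handed to the stereo matching step and the statistic applied to the resulting distance values.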

FIGS. 4(a) to 4(c) are conceptual diagrams for explaining how the three modes are used. The "wide area mode" shown in FIG. 4(a) can be used when the presence of obstacles is to be detected over a wide area. In the example shown in FIG. 4(a), an obstacle 52 (0.5 m ahead), an obstacle 54 (4 m ahead), and other background appear in the attention area R. In the "wide area mode", the "minimum" is set as the representative value, so a distance value (0.5 m) approximating the separation distance to the elongated rod-shaped obstacle 52, which is the nearest object within the user's field of view, is derived as the representative value, and the output level is determined based on that distance value (0.5 m). As a result, the user can perceive the presence of the nearest obstacle 52 within the field of view together with a sense of its distance. In the "wide area mode", however, whether the obstacle is something like a rope or something like a wall, the output level is the same as long as it is at the same distance from the user. Likewise, wherever the obstacle is within the field of view, the output level is the same as long as it is at the same distance from the user. In the "wide area mode", therefore, the user cannot tell what shape the obstacle has, in which direction it lies, or in what manner it is present.
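The behaviour described for the wide area mode can be checked with a toy distance map in Python. The distances 0.5 m and 4 m come from the example in the text; the 6 x 4 grid, the 10 m background, and the helper names are assumptions made for this sketch.

```python
def wide_mode_value(distance_map):
    """Representative value in the wide area mode: the minimum distance."""
    return min(min(row) for row in distance_map)


def make_scene(rod_column, width=6, height=4):
    """Toy per-pixel distance map (metres).

    Background at 10 m (assumed), obstacle 54 at 4 m in the lower right,
    and a thin rod (obstacle 52) at 0.5 m occupying one pixel column.
    """
    scene = [[10.0] * width for _ in range(height)]
    for row in scene[2:]:
        row[width - 1] = 4.0
    for row in scene:
        row[rod_column] = 0.5
    return scene


rod_on_left = wide_mode_value(make_scene(rod_column=1))
rod_on_right = wide_mode_value(make_scene(rod_column=4))
```

Both scenes give 0.5 m, matching the statement that the output level does not depend on where the nearest obstacle lies within the field of view or on how much of the image it occupies.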

The "local mode" shown in FIG. 4(b), on the other hand, can be used when the distance to an object is to be grasped accurately. In the example shown in FIG. 4(b), the obstacle 54 (4 m ahead) appears in most of the pixels in the attention area R. In the "local mode", the "mean" is set as the representative value, so a distance value (4 m) approximating the separation distance from the user to the obstacle 54 is derived as the representative value, and the output level is determined based on that distance value (4 m). As a result, the user can perceive the presence of the obstacle 54 on the optical axis of the main camera 12 together with a sense of its distance. When the output signal generation unit 26 is configured to generate speech (an output signal) stating the distance value corresponding to the representative value, the distance to the obstacle 54 can be conveyed to the user in the "local mode" by speech such as "It is 4 m."

Conventionally, it can be said that each time a white cane touched an object, the user captured that touch as a "point" and perceived the overall picture of the object by integrating the captured "points" over time. In that sense, the "local mode" of the present embodiment can be expected to be used in the same way as a conventional white cane. The "local mode" of the present embodiment, however, has an advantage that the conventional white cane lacks. With a white cane, only objects within reach of the cane can be perceived, whereas with the "local mode" of the present embodiment there is in theory no limit on the applicable distance range. The user should therefore be able to use the space recognition device 100 of the present embodiment as though it were a "telescopic white cane".

Meanwhile, in the "slit mode" shown in FIG. 4C, the "average value" is set as the representative value, so the "average value" of the distance values calculated from all pixels in the attention area R is converted into the output level. In the example shown in FIG. 4C, part of the obstacle 52, part of the obstacle 54, and other background appear in the pixels of the attention area R. The output level converted from this "average value" therefore blends the distance of the obstacle 52, the distance of the obstacle 54, and the distance of the background, and may appear to carry no meaningful information. Nevertheless, the "slit mode" of this embodiment can give the user a more concrete image (a sense of depth) of an obstacle. This point is described below with reference to FIGS. 5 to 14.

FIG. 5 shows the stereo imaging device 10 of the space recognition device 100 of this embodiment. FIG. 5A shows the front of the stereo imaging device 10, and FIG. 5B shows its back. In the example shown in FIG. 5, the stereo imaging device 10 is built into an elongated cylindrical housing 11 on the assumption that the user holds it in one hand. As shown in FIG. 5A, a main camera 12 and a sub camera 14, each configured as an ultra-small 3CCD camera, are installed near the two ends of the front of the housing 11 in a parallel, aligned arrangement at a suitable spacing. In addition, a special surface processing 15 such as an uneven texture is applied to the central portion of the front of the housing 11, so that the user can identify the front of the stereo imaging device 10 (that is, the imaging direction of the cameras) by feeling the surface processing 15.

Meanwhile, as shown in FIG. 5B, a vibrator 30 serving as the output device 30 is integrally fixed to the back of the stereo imaging device 10, so that the user can feel the vibration of the vibrator 30 in the palm while gripping the stereo imaging device 10.

Further, in the example shown in FIG. 5, a switch 40 serving as the setting switching means 40 is formed integrally with the stereo imaging device 10. As shown in FIG. 5C, the user can operate the switch 40 with the thumb as needed to switch among the settings described above, such as "wide area mode", "local mode", "slit mode", "distant view mode", and "near view mode". This embodiment does not particularly limit the specific configuration of the switch 40, but it is preferable to use a multi-stage push switch, a universal switch, or the like so that a visually impaired person can confirm the currently selected mode by touch.
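One way to picture the mode switching is as a small table that pairs each mode with the statistic used as its representative value. The sketch below covers only the three region-related modes; the names `Mode`, `MODES`, `ORDER`, and `next_mode` are hypothetical, the cyclic order of the switch is an assumption, and the "distant view" and "near view" settings are omitted.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence

@dataclass(frozen=True)
class Mode:
    region_shape: str                                   # descriptive label only
    representative: Callable[[Sequence[float]], float]  # statistic applied to the distances

MODES = {
    "wide": Mode("entire main image", min),   # nearest obstacle wins
    "local": Mode("narrow area", mean),
    "slit": Mode("slit-like area", mean),
}
ORDER = ["wide", "local", "slit"]  # assumed cycling order of the switch

def next_mode(current: str) -> str:
    """Cycle to the next mode, as a push switch might on each press."""
    return ORDER[(ORDER.index(current) + 1) % len(ORDER)]

distances = [2.0, 4.0, 6.0]
mode = "wide"
print(mode, MODES[mode].representative(distances))
mode = next_mode(mode)
print(mode, MODES[mode].representative(distances))
```

Keeping the statistic in the table means a mode change only swaps which entry is looked up; the rest of the processing stays the same.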

FIG. 6A shows how the space recognition device 100 employing the stereo imaging device 10 of FIG. 5 is used. In the example shown in FIG. 6A, the user walks holding the stereo imaging device 10 in the right hand with the information processing device 20 attached to a waist belt. The stereo imaging device 10 and the information processing device 20 are connected by a signal line 16 and are configured for bidirectional communication. The bidirectional communication between the two may instead be realized wirelessly. Furthermore, the stereo imaging device 10 and the information processing device 20 may be integrated in a single housing.

Note that the present invention is not limited to the embodiment described above. The output device 30 can be separate from the stereo imaging device 10; for example, as shown in FIG. 6B, the vibrator 30 can be worn on the user's arm or the like. Also as shown in FIG. 6B, the output device 30 can be configured as a speaker device (for example, headphones), or a vibrator and a speaker device can be used together. Furthermore, as shown in FIG. 6B, an operation panel with Braille-labeled setting buttons can be formed on the housing of the information processing device 20 as the setting switching means 40.

Next, the benefit of the "slit mode" in this embodiment is described. Various objects can exist in the living space in front of the user, but what the user particularly needs is information about, for example, a "pillar", a "wall", a "fence", a "doorway", or a "passage". If the user could be made to perceive the distance and depth of such objects with high accuracy, it would be a great help. The present inventors arrived at the "slit mode" by noting that these objects have a spatial boundary in at least one of the horizontal and vertical directions. That is, the "slit mode" can selectively detect the presence of a spatial boundary of an object extending horizontally or vertically, and by using it the user can obtain a more concrete spatial image of the object. This point is described below with specific examples, with reference to FIGS. 7 to 14.

FIG. 7 shows how the stereo imaging device 10 (shown in FIG. 5) is used in the "slit mode". For convenience of explanation, FIG. 7 shows only the stereo imaging device 10 of the space recognition device 100. First, as shown in FIG. 7A, the user moves forward with the right hand, holding the stereo imaging device 10, extended in the direction of travel. With the space recognition device 100 set to the "wide area mode" at this time, the user can recognize through the vibration of the vibrator 30 that some obstacle exists in the direction of travel.

For example, suppose that a narrow passage appears in the user's direction of travel, as shown in FIG. 8. In FIG. 8A, the field of view of the main camera set to the "wide area mode" is indicated by a thick solid frame, and the attention area R by a broken-line frame. In the "wide area mode", as shown in FIG. 8A, the "wall", the "passage", and the "distant view beyond the passage" appear in the attention area R. In the "wide area mode", the "minimum value" of the distance values calculated for the pixels is derived as the representative value, so the distance value corresponding to the "wall", the obstacle closest to the user, is derived as the representative value and converted into vibration of the vibrator 30. Sensing the vibration, the user recognizes that some obstacle lies ahead and stops.
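The "wide area mode" step above can be sketched as taking the minimum distance and mapping it to a vibration level that grows as the distance shrinks. The linear mapping, the limits `near_m` and `far_m`, and all sample distances below are assumptions made for illustration; the text only requires that a smaller representative value give a larger output level.

```python
def output_level(representative_m, near_m=0.5, far_m=5.0):
    """Map a distance to a vibration level in percent: closer means stronger.

    near_m and far_m are assumed limits, and the linear ramp is an assumption.
    """
    clamped = min(max(representative_m, near_m), far_m)
    return 100.0 * (far_m - clamped) / (far_m - near_m)

# Hypothetical per-pixel distances in the attention area of FIG. 8A.
wall = [2.0] * 6            # the walls on either side of the passage
passage = [4.0, 5.0, 6.0]   # surfaces receding into the passage
distant_view = [20.0]       # whatever lies beyond the passage

representative = min(wall + passage + distant_view)  # "wide area mode" uses the minimum
print(representative, round(output_level(representative), 1))
```

Using the minimum makes the nearest surface dominate regardless of how few pixels it occupies, which suits a first warning that something lies ahead.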

At this point, however, the user cannot know the details of the obstacle's form. The user therefore operates the setting switching means 40 to switch the space recognition device 100 to the "slit mode". In FIG. 8B, the field of view of the main camera set to the "slit mode" is indicated by a thick solid frame, and the attention area R by a broken-line frame. Note that the longitudinal direction of the attention area R in the "slit mode" corresponds to the longitudinal direction of the stereo imaging device 10. This correspondence lets the user intuitively recognize the orientation of the "slit" in the "slit mode".

Having switched the space recognition device 100 to the "slit mode", the user moves the stereo imaging device 10 up and down with its longitudinal direction held parallel to the horizontal, as shown in FIG. 7B. FIG. 9A shows, in time series, the attention areas R of the main images captured during this movement, labeled with circled numbers 1 to 7. In FIG. 9A the field of view of the main camera is omitted and only the attention area R is shown as a broken-line frame (the same applies to FIGS. 10 to 14). FIG. 9B shows the change over time in the output level (%) of the vibrator 30. In the example shown in FIG. 9, the space recognition device 100 is assumed to be set to the "distant view mode" described above (the same applies to FIGS. 10 and 11).

In the "slit mode", the average of the distance values calculated for the pixels of each attention area R is computed, and the vibrator 30 vibrates at the output level corresponding to that average. As shown in FIG. 9A, the contents of the attention areas R labeled with circled numbers 1 to 7 do not differ greatly from one another, so the averages derived from them are almost equal. As a result, as shown in FIG. 9B, the output level of the vibrator 30 hardly changes over time. However, because the "distant view beyond the passage" is included, the average in the "slit mode" should be somewhat larger than the minimum value obtained in the "wide area mode" shown in FIG. 8A, so the vibration the user feels is slightly weaker than in the "wide area mode". From this change in sensation, the user recognizes that what lies ahead is not an unbroken wall but an obstacle with some depth.

Next, keeping the space recognition device 100 in the "slit mode", the user rotates the hand by 90° so that the longitudinal direction of the stereo imaging device 10 is parallel to the vertical, and moves the stereo imaging device 10 left and right, as shown in FIG. 7C. FIG. 10A shows, in time series, the attention areas R of the main images captured during this movement, labeled with circled numbers 1 to 5. As shown in FIG. 10A, the contents of the attention areas R change greatly over time. Accordingly, as shown in FIG. 10B, the output level of the vibrator 30 also changes greatly over time. From the change over time in the vibration level of the vibrator 30, the user infers that two walls lie ahead and that a space at least 4 m deep extends between them. That is, the user can recognize that a narrow passage lies ahead.
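The contrast between the two sweeps can be reproduced on a toy depth map. The sketch below is a simplification under stated assumptions: the depth values are invented, the slit is one pixel thick, and each row or column of the map stands in for one slit position during the sweep.

```python
from statistics import mean

# Hypothetical depth map in metres: two walls at 2 m with a passage 6 m deep between them.
ROW = [2.0, 2.0, 2.0, 6.0, 6.0, 2.0, 2.0, 2.0]
depth_map = [ROW[:] for _ in range(5)]

def horizontal_slit_sweep(depth):
    """Average each row: a horizontal slit moved up and down."""
    return [mean(row) for row in depth]

def vertical_slit_sweep(depth):
    """Average each column: a vertical slit moved left and right."""
    return [mean(col) for col in zip(*depth)]

print(horizontal_slit_sweep(depth_map))  # the same value at every slit position
print(vertical_slit_sweep(depth_map))    # near, far, near: a gap between two walls
```

The horizontal slit always spans both walls and the gap, so its average never changes, while the vertical slit alternately covers only wall or only gap. The horizontal average (3 m here) also exceeds the 2 m minimum, which matches the slightly weaker vibration described for the "slit mode".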

Keeping the space recognition device 100 in the "slit mode", the user proceeds through the passage while likewise moving the stereo imaging device 10 left and right, as shown in FIG. 7D. FIG. 11A shows, in time series, the attention areas R of the main images captured during this movement, labeled with circled numbers 1 to 8. As the contents of the attention areas R change as shown in FIG. 11A, the output level of the vibrator 30 changes over time as shown in FIG. 11B. From the change over time in the vibration level of the vibrator 30, the user can recognize the width of the passage and advance safely.

Further benefits of the "slit mode" are described with reference to FIGS. 12 to 14. In the examples shown in FIGS. 12 to 14, the space recognition device 100 is assumed to be set to the "near view mode" described above.

For example, suppose that an obstacle such as that shown in FIG. 12 appears in the user's direction of travel. In this case, whether the user moves the stereo imaging device 10 left and right as shown in FIG. 12A or up and down as shown in FIG. 12B, the output level of the vibrator 30 (that is, the vibration level felt by the user) does not change over time. From this, the user can recognize that a wall blocks the entire way ahead.

Next, suppose that an obstacle such as that shown in FIG. 13 appears in the user's direction of travel. In this case, when the user moves the stereo imaging device 10 left and right, the output level of the vibrator 30 (the vibration level felt by the user) does not change over time, as shown in FIG. 13A. When the user moves the stereo imaging device 10 up and down, however, the output level of the vibrator 30 changes over time in the manner shown in FIG. 13B. From the difference between the change in sensation felt when moving the stereo imaging device 10 left and right and that felt when moving it up and down, the user can recognize that a steep slope stands in the way ahead.

Next, suppose that an obstacle such as that shown in FIG. 14 appears in the user's direction of travel. In this case, when the user moves the stereo imaging device 10 left and right, the output level of the vibrator 30 (the vibration level felt by the user) does not change over time, as shown in FIG. 14A. When the user moves the stereo imaging device 10 up and down, however, the output level of the vibrator 30 changes over time in the manner shown in FIG. 14B. What should be noted here is how the change over time in the output level shown in FIG. 14B differs from that shown in FIG. 13B for the steep slope. By sensing the difference between these two patterns of change, the user will be able to recognize that the obstacle standing ahead is not a steep slope but a staircase.
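To illustrate the kind of difference the user is expected to feel in the three cases of FIGS. 12 to 14, the sketch below labels a sequence of readings taken during a vertical sweep. The figures are not reproduced here, so the profiles are assumptions: a wall is taken to give constant readings, a slope a steadily changing reading, and stairs plateaus separated by jumps. The function `describe_profile` is hypothetical; in the embodiment the distinction is made by the user's own perception, not by software.

```python
def describe_profile(readings, tol=1e-9):
    """Label a sequence of slit readings as 'flat', 'ramp' or 'stepped'."""
    steps = [b - a for a, b in zip(readings, readings[1:])]
    if all(abs(s) <= tol for s in steps):
        return "flat"     # no change at all, as in front of a wall
    if all(abs(s - steps[0]) <= tol for s in steps):
        return "ramp"     # constant change per position, as on a slope
    return "stepped"      # plateaus separated by jumps, as on stairs

# Hypothetical representative distances (metres) as the slit moves upward.
wall = [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
slope = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
stairs = [2.0, 2.0, 2.5, 2.5, 3.0, 3.0]

for name, profile in [("wall", wall), ("slope", slope), ("stairs", stairs)]:
    print(name, describe_profile(profile))
```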

As described above, according to the "slit mode" of this embodiment, the user can recognize the presence of spatial boundaries extending horizontally or vertically, together with a sense of their distance, from the change over time in the vibration level felt by touch as the stereo imaging device 10 is moved up and down and left and right. As a result, the user can recognize the space more accurately.

The present invention has been described above by way of an embodiment, but it is not limited to that embodiment. The embodiment described a configuration in which a plurality of setting modes for the shape and size of the attention area are defined in advance and switched as appropriate. As another embodiment, a dial-type or similar setting switching means may be adopted so that the height and width of the attention area can be adjusted freely. Likewise, the embodiment described a configuration in which a plurality of setting modes for the range of representative values to which the output range of the output device is assigned are defined in advance and switched as appropriate. Here too, a dial-type or similar setting switching means may be adopted so that an arbitrary range of representative values can be set. Other embodiments that a person skilled in the art could conceive are also included in the scope of the present invention as long as they provide the operation and effects of the present invention.
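A freely adjustable attention area could be computed along the following lines. This is a sketch only: the function `attention_area` is hypothetical, the rectangle is centred on the image on the assumption that the optical axis of the main camera passes through the image centre, and the image size and dial values are invented.

```python
def attention_area(image_w, image_h, width, height):
    """Return (left, top, right, bottom) of a centred rectangle, clipped to the image.

    right and bottom are exclusive bounds. Centring on the image assumes the
    optical axis passes through the image centre.
    """
    width = max(1, min(width, image_w))
    height = max(1, min(height, image_h))
    left = (image_w - width) // 2
    top = (image_h - height) // 2
    return left, top, left + width, top + height

print(attention_area(640, 480, 640, 480))  # whole image, as in the wide area mode
print(attention_area(640, 480, 600, 20))   # horizontal slit
print(attention_area(640, 480, 40, 40))    # narrow area, as in the local mode
```

With continuous width and height inputs, the predefined modes become particular dial positions, not separate code paths.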

DESCRIPTION OF SYMBOLS
10 ... Stereo imaging device
11 ... Housing
12 ... Main camera
14 ... Sub camera
15 ... Surface processing
16 ... Signal line
20 ... Information processing device
21 ... Image input interface
22 ... Attention area setting unit
23 ... Stereo matching unit
24 ... Distance value calculation unit
25 ... Representative value deriving unit
26 ... Output signal generation unit
27 ... Setting switching unit
30 ... Output device (vibrator)
40 ... Setting switching means (switch)
52, 54 ... Obstacle
100 ... Space recognition device for visually impaired persons

Claims

A space recognition device for a visually impaired person, comprising a stereo imaging device including a main camera and a sub camera, an information processing device, and an output device, wherein
the information processing device includes:
a region-of-interest setting unit that defines a pixel region that is a part of the main image captured by the main camera as a region of interest;
a stereo matching unit that performs stereo matching with a sub-image captured by the sub camera for each pixel constituting the region of interest, and detects the pixel in the sub-image corresponding to each pixel of the region of interest;
a distance value calculation unit that calculates a distance value for each pixel of the region of interest based on a parallax calculated from the coordinates of that pixel and of the corresponding pixel detected in the sub-image;
a representative value deriving unit that derives a statistical representative value for the plurality of calculated distance values; and
an output signal generation unit that generates an output signal with a higher output level as the representative value becomes smaller, and
the output device is driven based on the output signal.
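For orientation, the distance calculation from parallax can be sketched with the standard relation for two parallel, aligned cameras, in which distance equals focal length times baseline divided by parallax. The claim does not state this formula, so its use here is an assumption, as are the function name `distance_from_parallax` and every numeric value.

```python
def distance_from_parallax(x_main, x_sub, focal_px, baseline_m):
    """Distance in metres for one matched pixel pair on a parallel stereo rig.

    x_main, x_sub: horizontal pixel coordinates of the matched pixels.
    focal_px: focal length in pixels; baseline_m: camera spacing in metres.
    Returns None when there is no measurable parallax.
    """
    parallax = abs(x_main - x_sub)
    if parallax == 0:
        return None  # the point is effectively at infinity
    return focal_px * baseline_m / parallax

# Hypothetical rig: 800 px focal length, cameras 0.125 m apart, 25 px parallax.
print(distance_from_parallax(x_main=320, x_sub=295, focal_px=800.0, baseline_m=0.125))
```

Because distance is inversely proportional to parallax, nearby objects produce large, easily measured shifts, while distant ones approach zero parallax.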

The space recognition device for a visually impaired person according to claim 1, wherein the region of interest is defined as a slit-like region, and the representative value is selected from the group consisting of an average value, a median value, and a mode value.

The space recognition device for a visually impaired person according to claim 1, wherein the region of interest is defined as a narrow region, and the representative value is selected from the group consisting of an average value, a median value, and a mode value.

The space recognition device for a visually impaired person according to claim 1, wherein the region of interest is defined as the entire area of the main image, and the representative value is a minimum value.

The space recognition device for a visually impaired person according to claim 1, further comprising a setting switching unit for switching among three settings consisting of: a slit mode in which the region of interest is defined as a slit-like region and the representative value is selected from the group consisting of an average value, a median value, and a mode value; a local mode in which the region of interest is defined as a narrow region and the representative value is selected from the group consisting of an average value, a median value, and a mode value; and a wide area mode in which the region of interest is defined as the entire area of the main image and the representative value is a minimum value.

The space recognition device for a visually impaired person according to claim 1, wherein the region of interest is defined such that its center of gravity coincides with the optical axis of the main camera.

The space recognition device for a visually impaired person according to any one of claims 1 to 6, wherein the output device is a vibrator.

The space recognition device for a visually impaired person according to claim 7, wherein the vibrator is a vibrator or a speaker device.

The space recognition device for a visually impaired person according to any one of claims 1 to 8, wherein the stereo imaging device and the output device are integrally mounted in a single portable case.

9. The stereo imaging device, the output device, and the information processing device are integrally mounted in a single portable case that can be carried with one hand. Space recognition device for the visually impaired.

A computer-executable method for causing a computer to drive an output device so that a user can recognize space, the method causing the computer to realize:
functional means for defining, as a region of interest, a pixel region that is a part of a main image captured by the main camera of a stereo imaging device including a main camera and a sub camera;
functional means for performing stereo matching with a sub-image captured by the sub camera for each pixel constituting the region of interest, and detecting the pixel in the sub-image corresponding to each pixel of the region of interest;
functional means for calculating a distance value for each pixel of the region of interest based on a parallax calculated from the coordinates of that pixel and of the corresponding pixel detected in the sub-image;
functional means for deriving a statistical representative value for the plurality of calculated distance values;
functional means for generating an output signal having a larger output level as the representative value becomes smaller; and
functional means for driving the output device based on the output signal.

A computer-executable program for realizing each functional unit according to claim 11.