JP6930011B2

JP6930011B2 - Information processing equipment, information processing system, and image processing method

Info

Publication number: JP6930011B2
Application number: JP2020158686A
Authority: JP
Inventors: 良徳大橋
Original assignee: Sony Interactive Entertainment Inc
Current assignee: Sony Interactive Entertainment Inc
Priority date: 2017-04-14
Filing date: 2020-09-23
Publication date: 2021-09-01
Anticipated expiration: 2037-04-14
Also published as: JP2021007231A

Description

The present invention relates to an information processing device that performs processing using captured images, and to an image processing method performed by that device.

Games are known in which a part of the user's body, such as the head, is captured with a video camera, a predetermined region such as the eyes, mouth, or hands is extracted, and that region is replaced with another image and shown on a display (see, for example, Patent Document 1). User interface systems that accept mouth and hand movements captured by a video camera as operation instructions for an application are also known. Technologies that capture the real world and display a virtual world that reacts to its movement, or perform some other information processing, are thus used in a wide range of fields regardless of scale, from small mobile terminals to leisure facilities.

As a technique for identifying information on the state of a real object using captured images, the stereo image method is known: a stereo camera that captures the same space from different left and right viewpoints is introduced, and the distance to a subject is obtained based on the parallax between the images of the same subject in the captured stereo images. This technique may use ordinary color images that detect reflected ambient light, or images that detect the reflection of light in a specific wavelength band, such as infrared light, irradiated onto the subject space.

Meanwhile, in recent years, various technologies for improving image quality in video display, such as television broadcasting and distributed video, have been developed. In addition to technologies for improving resolution and color gamut, technologies for processing signals with an expanded luminance range are becoming widespread. For example, HDR (High Dynamic Range) allows a luminance range about 100 times that of conventional SDR (Standard Dynamic Range), so objects that look dazzling in the real world can be represented more realistically.

Patent Document 1: European Patent Application Publication No. 0999518

The luminance range of a captured image can vary widely depending on the combination of the conditions of the subject space, such as the position, number, color, and pattern of subjects and the state of the light, and the settings on the imaging device side, such as shooting conditions and image correction parameters. Consequently, when a captured image is used to obtain information on a subject or to generate a display image, such uncertain factors may prevent the required accuracy from being obtained, or may prevent the performance of the display device from being fully exploited.

The present invention has been made in view of these problems, and one object is to provide a technique for acquiring information on a real object from captured images with stable accuracy. Another object of the present invention is to provide a technique that uses captured images to realize image representation in a suitable luminance range.

One aspect of the present invention relates to an information processing device. This information processing device includes: an image data acquisition unit that sequentially acquires image data of frames of a moving image being captured; an image addition unit that generates an added image by adding, at corresponding positions, the pixel values of a past frame image acquired earlier to the pixel values of the newly acquired current frame image; an image analysis unit that extracts feature points from the added image and performs a predetermined analysis process, also extracts feature points from the current frame image and performs the same analysis process, and integrates the two results; and an output unit that outputs data representing the integrated result.

Here, the "frame image" may be an image of a frame constituting a moving image captured periodically by one camera, or images of frames constituting moving images captured simultaneously and periodically by a plurality of cameras. The "predetermined analysis process" performed using the added image may be any general process that produces some output from captured images. Examples include processes that involve detecting feature points from a captured image, such as acquisition of position and orientation, object recognition, motion detection, and visual tracking analysis.

Another aspect of the present invention relates to an information processing system. This information processing system includes: a head-mounted display equipped with an imaging device that captures a moving image in a field of view corresponding to the user's line of sight; and an information processing device that generates, based on the moving image, data of a display image to be displayed on the head-mounted display. The information processing device includes: an image data acquisition unit that sequentially acquires image data of frames of the moving image; an image addition unit that generates an added image by adding, at corresponding positions, the pixel values of a past frame image acquired earlier to the pixel values of the newly acquired current frame image; an image analysis unit that extracts feature points from the added image and performs a predetermined analysis process, also extracts feature points from the current frame image and performs the same analysis process, and integrates the two results; and an output unit that generates and outputs the data of the display image using the integrated result.

Yet another aspect of the present invention relates to an image processing method. This image processing method includes the steps of: sequentially acquiring image data of frames of a moving image being captured and storing it in a memory; generating an added image by adding, at corresponding positions, the pixel values of a past frame image acquired earlier and read from the memory to the pixel values of the newly acquired current frame image; extracting feature points from the added image and performing a predetermined analysis process, also extracting feature points from the current frame image and performing the same analysis process, and integrating the two results; and outputting data representing the integrated result.

Any combination of the above components, and any conversion of the expression of the present invention between a method, a device, a system, a computer program, a recording medium on which a computer program is recorded, and the like, are also valid as aspects of the present invention.

According to the present invention, suitable results can be obtained stably when acquiring position information of a real object or displaying images using captured images.

FIG. 1 is a diagram showing a configuration example of the information processing system of Embodiment 1.
FIG. 2 is a diagram showing an example of the external shape when the display device of Embodiment 1 is a head-mounted display.
FIG. 3 is a diagram showing the internal circuit configuration of the information processing device in Embodiment 1.
FIG. 4 is a diagram showing the configuration of the functional blocks of the information processing device in Embodiment 1.
FIG. 5 is a diagram for explaining a method, in Embodiment 1, of irradiating infrared light in a pattern and acquiring the distance to a subject using captured images of the reflected light.
FIG. 6 is a diagram for explaining the effect of adding frames in Embodiment 1.
FIG. 7 is a diagram showing how a plurality of depth images are integrated in Embodiment 1.
FIG. 8 is a flowchart showing a processing procedure in Embodiment 1 in which the information processing device acquires position information using captured images and outputs data.
FIG. 9 is a diagram showing the configuration of the functional blocks of the information processing device in Embodiment 2.
FIG. 10 is a diagram schematically showing how the image addition unit in Embodiment 2 corrects a past frame image and then adds it to the current frame image.
FIG. 11 is a flowchart showing a processing procedure in which the information processing device in Embodiment 2 expands the luminance range of a captured image and displays it.
FIG. 12 is a diagram schematically showing how the addition unit in Embodiment 2 adds images while shifting the pixel region.

Embodiment 1
The present embodiment relates to a technique for acquiring position information of a subject from captured images. Such techniques often involve detecting feature points in the captured images. However, the way feature points appear in an image varies with conditions in the real space, such as the brightness of the subject space and the actual position of the subject. In low-luminance regions in particular, feature points may go undetected, leaving the position information indeterminate or subject to large errors.

In the present embodiment, the luminance range is controlled by adding captured images of a plurality of frames, which improves the accuracy of feature point detection. The following description focuses on a method of acquiring position information of a subject using stereo images that capture the same space from left and right viewpoints. The present embodiment nevertheless applies in the same way to any processing that includes detecting feature points from captured images, and its target is not limited to stereo images. The purpose of detecting feature points is likewise not limited to acquiring position information of a subject, and may be any of various kinds of image analysis such as face detection, face recognition, object detection, and visual tracking.

FIG. 1 shows a configuration example of the information processing system of the present embodiment. The information processing system 1 includes an imaging device 12 that captures the real space, an information processing device 10 that performs information processing based on the captured images, and a display device 16 that displays images output by the information processing device 10. The information processing device 10 may be connectable to a network 18 such as the Internet.

The information processing device 10 may be connected to the imaging device 12, the display device 16, and the network 18 by wired cables, or wirelessly by a wireless LAN (Local Area Network) or the like. Any two, or all, of the imaging device 12, the information processing device 10, and the display device 16 may be combined and provided as a single unit. For example, the information processing system 1 may be realized by a mobile terminal or a head-mounted display equipped with them. In any case, the external shapes of the imaging device 12, the information processing device 10, and the display device 16 are not limited to those illustrated. If the information processing does not require image display, the display device 16 may be omitted.

The imaging device 12 includes a pair of cameras that capture the subject space at a predetermined frame rate from left and right positions separated by a known distance. The data of the pair of images captured by the imaging device 12 from the left and right viewpoints, that is, the stereo images, is sequentially transmitted to the information processing device 10. By analyzing the stereo images, the information processing device 10 acquires position information of the subject in three-dimensional real space, including its distance from the imaging surface. Techniques for acquiring position information of a subject from stereo images are conventionally known.

Specifically, corresponding points representing the image of the same subject are found in the pair of images, and, taking their positional displacement as the parallax, the distance from the cameras to the subject is obtained by the principle of triangulation. The position coordinates of the subject in three-dimensional space are obtained from the position of the image on the image plane and that distance. For example, the information processing device 10 generates, as position information, a depth image in which the distance to the subject obtained by the analysis is represented as the pixel value of the image on the image plane.
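The triangulation step can be illustrated with a minimal sketch. It assumes a rectified, parallel stereo pair and a pinhole camera model, which the text does not state explicitly; the function names and the parameters `focal_length_px`, `baseline_m`, `cx`, and `cy` are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Return the distance Z = f * B / d for a rectified stereo pair (assumed).

    Returns None when the disparity is not positive, i.e. when no usable
    corresponding point was found.
    """
    if disparity_px <= 0:
        return None
    return focal_length_px * baseline_m / disparity_px


def back_project(u, v, depth, focal_length_px, cx, cy):
    """Convert an image position (u, v) and its depth to 3D camera coordinates.

    (cx, cy) is the assumed principal point of the pinhole model.
    """
    x = (u - cx) * depth / focal_length_px
    y = (v - cy) * depth / focal_length_px
    return (x, y, depth)
```

Storing the value returned by `depth_from_disparity` at each pixel position would give a depth image of the kind described above.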

For this purpose, the type of image captured by the imaging device 12 is not limited. For example, the imaging device 12 may capture color images of visible light with a camera having a general image sensor such as a CMOS (Complementary Metal Oxide Semiconductor) sensor. Alternatively, it may detect light in a specific wavelength band, such as infrared light, and capture an image representing the intensity distribution of that light.

In the latter case, the imaging device 12 may be provided with a mechanism for irradiating the subject space with light in the wavelength band to be detected. The method of irradiating spot-shaped, slit-shaped, or patterned light and capturing the reflected light to obtain the distance to a subject is known as the active stereo method. Compared with the so-called passive stereo method, which obtains distance from color stereo images of ambient light, the active stereo method makes it easier to extract corresponding points in the images even for subjects with few feature points.

When invisible light is used to obtain position information, the imaging device 12 may additionally be provided with a camera that captures ordinary color images, which may be used for other purposes such as generating display images. Hereinafter, two-dimensional data of the luminance of light detected by the imaging device 12 is collectively referred to as an "image", regardless of the wavelength band of the detected light. Using the image data transmitted from the imaging device 12, the information processing device 10 acquires position information of the subject at a predetermined rate as described above, performs information processing based on it as appropriate, and generates output data.

The content of the output data is not particularly limited, and may vary depending on the functions the user requires of the system, the content of the application that has been started, and so on. For example, based on the position information of the subject, the information processing device 10 may apply some processing to the captured image, or may advance an electronic game and generate a game screen. Typical examples of such modes include virtual reality (VR) and augmented reality (AR).

The display device 16 includes a display that outputs images, such as a liquid crystal, plasma, or organic EL display, and a speaker that outputs sound, and outputs the output data supplied from the information processing device 10 as images and sound. The display device 16 may be a television receiver, any of various monitors, the display screen of a mobile terminal, or the like, or may be a head-mounted display that is worn on the user's head and displays images in front of the user's eyes.

FIG. 2 shows an example of the external shape when the display device 16 is a head-mounted display. In this example, the head-mounted display 100 consists of an output mechanism unit 102 and a mounting mechanism unit 104. The mounting mechanism unit 104 includes a mounting band 106 that, when worn by the user, wraps around the head and holds the device in place.

The output mechanism unit 102 includes a housing 108 shaped to cover the left and right eyes when the user wears the head-mounted display 100, and has a display panel inside that directly faces the eyes when worn. The housing 108 may further contain lenses that are positioned between the display panel and the user's eyes when the head-mounted display 100 is worn and that widen the user's viewing angle. The head-mounted display 100 may also include speakers or earphones at positions corresponding to the user's ears when worn. In addition, the head-mounted display 100 may incorporate various motion sensors, such as an acceleration sensor, in order to acquire the position and orientation of the user's head.

In this example, the head-mounted display 100 includes, as the imaging device 12, a stereo camera 110 on the front surface of the housing 108, and captures the surrounding real space at a predetermined frame rate in a field of view corresponding to the user's line of sight. Such a head-mounted display 100 can acquire the apparent shape and position information of real objects in the user's field of view. If SLAM (Simultaneous Localization and Mapping) technology is introduced, the position and orientation of the user's head can also be acquired based on that information.

If such information is used to determine the field of view onto a virtual world, and display images for the left eye and the right eye are generated and displayed in the left and right regions of the head-mounted display, a virtual reality can be realized in which the virtual world appears to spread out before the user's eyes. If a virtual object that interacts with a real object serving as the subject is drawn superimposed on color images captured from the left and right viewpoints and displayed, augmented reality can be realized. When color images are used for display and images of a specific wavelength band are used to obtain information on the subject, the head-mounted display 100 may be provided with multiple sets of stereo cameras 110, one for each wavelength band to be detected.

The information processing device 10 may be an external device capable of establishing communication with the head-mounted display 100, or may be built into the head-mounted display 100. Because the information processing system 1 of the present embodiment can thus be applied in various modes, the configuration and external shape of each device may be determined as appropriate. In any of these modes, when the position of the subject or the state of the subject space changes, the way the subject's image appears in the captured image also changes.

For example, even for a subject whose surface has many feature points, a clear image may not be obtained in the captured image in an environment with low illuminance. In a mode where light in a specific wavelength band is irradiated and its reflection is observed, the reflected light from a distant subject may not be obtained with sufficient luminance, depending on the irradiation intensity. As a result, extraction of corresponding points in the stereo images may fail, and position information may be unobtainable or of low accuracy.

Possible countermeasures include adjusting shooting conditions and image correction parameters, such as exposure time and gain value, or adjusting the intensity of the irradiated light. However, the position of the subject and the way light falls on it vary widely, and even within the same subject space there is not necessarily a single optimal condition. For example, if the irradiation intensity is raised to suit a distant subject, the reflected light from a nearby subject may become too strong and its image may become unclear. Similarly, in a color image, lengthening the exposure time or increasing the gain value may cause an entire region that was already bright to wash out. In the first place, it is difficult to optimize the combination of shooting conditions, correction parameters, irradiation intensity, and so on every time the situation changes.

Therefore, in the present embodiment, those conditions are held constant, and the range of pixel values is amplified by adding the images of the immediately preceding frames to the obtained captured image. That is, assuming the imaging device captures frames of a moving image at a period Δt, the pixel values at the same positions in the frames at times t-Δt, t-2Δt, ..., t-NΔt are added to each pixel value of the current frame at time t. Here N is a natural number representing the number of past frames to be added. For example, when N = 3, the images of four frames including the current frame are added.
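The addition of the current frame and N past frames can be sketched as follows. This is a minimal illustration rather than the implementation described in the patent: images are represented as nested lists of pixel values, and the names `add_frames` and `FrameAdder` are hypothetical.

```python
from collections import deque


def add_frames(current, past_frames):
    """Add the pixel values of past frames to the current frame at the same positions."""
    height, width = len(current), len(current[0])
    added = [row[:] for row in current]  # copy so the current frame is not modified
    for frame in past_frames:
        for y in range(height):
            for x in range(width):
                added[y][x] += frame[y][x]
    return added


class FrameAdder:
    """Keeps the most recent N frames and returns the sum of up to N + 1 frames."""

    def __init__(self, n_past):
        # deque(maxlen=N) discards the oldest frame automatically.
        self.past = deque(maxlen=n_past)

    def push(self, frame):
        # Until N past frames have arrived, fewer than N + 1 frames are summed.
        added = add_frames(frame, self.past)
        self.past.append(frame)
        return added
```

With `FrameAdder(3)`, the fourth and later calls to `push` return the sum of four frames, matching the N = 3 example in the text.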

In this way, the pixel values of the added image are approximately N+1 times those of the original image. This amplifies the luminance in portions where a large difference from the surrounding pixels should appear, such as reflection positions of the irradiated light and feature points, making them easier to detect. In addition, adding the images of other frames averages out noise and raises the SN ratio. As a result, even when the subject is far away or the light intensity is low, an image with luminance sufficient for analysis can be obtained, and the position information of the subject can be acquired accurately.
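The effect on the SN ratio can be made concrete with a short derivation. It assumes a static scene in which a pixel has the same true value s in every frame, and independent zero-mean noise n_k with standard deviation σ in each frame; neither assumption is stated in the text.

```latex
\begin{align}
I_{\mathrm{sum}} &= \sum_{k=0}^{N} \bigl(s + n_k\bigr) = (N+1)\,s + \sum_{k=0}^{N} n_k, \\
\operatorname{Var}\!\Bigl(\sum_{k=0}^{N} n_k\Bigr) &= (N+1)\,\sigma^2, \\
\mathrm{SNR}_{\mathrm{sum}} &= \frac{(N+1)\,s}{\sqrt{N+1}\,\sigma} = \sqrt{N+1}\,\frac{s}{\sigma}.
\end{align}
```

Under these assumptions, the signal grows by a factor of N+1 while the noise grows only by the square root of N+1, so N = 3 would double the SN ratio relative to a single frame.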

Because the luminance of the image can thus be secured even for a subject whose reflected light is weak, there is no need to increase the exposure time or gain value or to raise the intensity of the irradiated light. In other words, highly accurate analysis becomes possible without raising the light intensity on the capture side. On the other hand, for a subject that already has sufficient luminance, adding past frames is unnecessary. Therefore, by providing both a processing path that acquires position information after adding past frames and a processing path that acquires position information from the current frame alone without adding past frames, each piece of position information can be acquired accurately while covering a wide variety of subject states. The number of past frames to be added may also be set in two or more ways, giving three or more levels of luminance amplification.
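One way the results of the two processing paths might be combined is sketched below. The merge rule is an assumption made for illustration: the text at this point says only that both paths are provided, not how their results are merged. Here each path yields a depth map in which `None` marks a pixel with no result, and the maps are given in priority order (for example, the current-frame result first, then results from added images). The name `integrate_depth` is hypothetical.

```python
def integrate_depth(depth_maps):
    """Merge depth maps given in priority order; None marks a pixel with no result.

    For each pixel, the first map that holds a value wins (assumed rule).
    """
    height, width = len(depth_maps[0]), len(depth_maps[0][0])
    merged = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for depth in depth_maps:
                if depth[y][x] is not None:
                    merged[y][x] = depth[y][x]
                    break
    return merged
```

Passing more than two maps, one per number of added frames, would correspond to the case of three or more levels of luminance amplification.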

FIG. 3 shows the internal circuit configuration of the information processing device 10. The information processing device 10 includes a CPU (Central Processing Unit) 23, a GPU (Graphics Processing Unit) 24, and a main memory 26. These units are connected to one another via a bus 30. An input/output interface 28 is also connected to the bus 30. Connected to the input/output interface 28 are: a communication unit 32 comprising a peripheral device interface such as USB or IEEE 1394 and a network interface for a wired or wireless LAN; a storage unit 34 such as a hard disk drive or non-volatile memory; an output unit 36 that outputs data to the display device 16; an input unit 38 that receives data from the imaging device 12 and from an input device (not shown); and a recording medium drive unit 40 that drives a removable recording medium such as a magnetic disk, optical disc, or semiconductor memory.

The CPU 23 controls the entire information processing device 10 by executing the operating system stored in the storage unit 34. The CPU 23 also executes various programs that have been read from a removable recording medium and loaded into the main memory 26, or downloaded via the communication unit 32. The GPU 24 has the functions of a geometry engine and a rendering processor, performs drawing processing in accordance with drawing commands from the CPU 23, and outputs the result to the output unit 36. The main memory 26 consists of RAM (Random Access Memory) and stores the programs and data required for processing.

図４は情報処理装置１０の機能ブロックの構成を示している。図４および後述する図９に示す装置の各機能ブロックは、ハードウェア的には、図３で示した各種回路によりで実現でき、ソフトウェア的には、記録媒体からメインメモリにロードした、画像解析機能、情報処理機能、画像描画機能、データ入出力機能などの諸機能を発揮するプログラムで実現される。したがって、これらの機能ブロックがハードウェアのみ、ソフトウェアのみ、またはそれらの組合せによっていろいろな形で実現できることは当業者には理解されるところであり、いずれかに限定されるものではない。 FIG. 4 shows the configuration of the functional block of the information processing device 10. Each functional block of the apparatus shown in FIG. 4 and FIG. 9 to be described later can be realized by various circuits shown in FIG. 3 in terms of hardware, and in terms of software, image analysis loaded from a recording medium into the main memory. It is realized by a program that exerts various functions such as functions, information processing functions, image drawing functions, and data input / output functions. Therefore, it is understood by those skilled in the art that these functional blocks can be realized in various forms by hardware only, software only, or a combination thereof, and is not limited to any of them.

情報処理装置１０は、撮像装置１２から画像のデータを取得する画像データ取得部５２、取得した画像のデータを格納する画像データ格納部５４、所定数の過去フレームの画像を足し合わせる画像加算部５６、画像を解析して被写体の位置情報を得る画像解析部５８、位置情報など解析結果を利用して情報処理を行う情報処理部６０、および、出力すべきデータを出力する出力部６２を含む。 The information processing device 10 includes an image data acquisition unit 52 that acquires image data from the image pickup device 12, an image data storage unit 54 that stores the acquired image data, and an image addition unit 56 that adds a predetermined number of images of past frames. Includes an image analysis unit 58 that analyzes an image to obtain position information of a subject, an information processing unit 60 that processes information using analysis results such as position information, and an output unit 62 that outputs data to be output.

画像データ取得部５２は、図３の入力部３８、ＣＰＵ２３、メインメモリ２６などで実現され、撮像装置１２が所定のフレームレートで撮影する画像のデータを順次取得する。当該データには、可視光のカラーステレオ画像、赤外線など特定の波長帯の光を照射した結果得られた反射光のステレオ画像など、特徴点検出の対象とする画像のデータが含まれる。画像データ取得部５２は、所定のフレームレートで送られる画像のデータを順次、画像データ格納部５４に格納する。画像データ格納部５４には少なくとも、現フレームおよび過去の所定数のフレームの画像データが格納される。 The image data acquisition unit 52 is realized by the input unit 38, the CPU 23, the main memory 26, and the like in FIG. 3, and sequentially acquires image data to be captured by the image pickup apparatus 12 at a predetermined frame rate. The data includes data of an image to be detected as a feature point, such as a color stereo image of visible light and a stereo image of reflected light obtained as a result of irradiating light in a specific wavelength band such as infrared light. The image data acquisition unit 52 sequentially stores image data sent at a predetermined frame rate in the image data storage unit 54. The image data storage unit 54 stores at least image data of the current frame and a predetermined number of past frames.

画像加算部５６は、図３のＣＰＵ２３、ＧＰＵ２４、メインメモリ２６などで実現され、画像データ格納部５４に格納された現フレームの画像データと、その直前に格納された、所定数の過去フレームの画像データを読み出す。そして同じ位置の画素同士で画素値を加算した加算画像を、ステレオ画像の左右の視点それぞれについて生成する。上述のとおり、加算するフレーム数を異ならせた複数の加算画像の対を生成してもよい。加算画像を生成する頻度は、位置情報に求められる時間分解能に基づき決定され、撮像装置１２が画像を撮影するフレームレートと同じでもよいし、それより小さくてもよい。 The image addition unit 56 is realized by the CPU 23, the GPU 24, the main memory 26, and the like in FIG. 3, and reads out the image data of the current frame stored in the image data storage unit 54 together with the image data of a predetermined number of past frames stored immediately before it. It then generates, for each of the left and right viewpoints of the stereo image, an added image in which the pixel values of pixels at the same position are summed. As described above, a plurality of pairs of added images with different numbers of added frames may be generated. The frequency at which added images are generated is determined based on the time resolution required of the position information, and may be equal to or lower than the frame rate at which the image pickup device 12 captures images.

画像解析部５８は、図３のＣＰＵ２３、ＧＰＵ２４、メインメモリ２６などで実現され、画像加算部５６が加算画像のステレオ画像を生成する都度、それらから対応点を特定し、その視差に基づき三角測量の原理で被写体までの距離を取得する。画像解析部５８はそれと並行して、加算前の現フレームのステレオ画像から対応点を特定し、それに基づいても被写体の距離を取得する。そして画像解析部５８は両者の結果を統合し、被写体の状態によらず精度が均一な、最終的な位置情報を所定の頻度で生成する。 The image analysis unit 58 is realized by the CPU 23, the GPU 24, the main memory 26, and the like in FIG. 3. Each time the image addition unit 56 generates a stereo pair of added images, the image analysis unit 58 identifies corresponding points in them and acquires the distance to the subject from their parallax by the principle of triangulation. In parallel, it identifies corresponding points in the stereo image of the current frame before addition, and acquires the distance to the subject from those as well. The image analysis unit 58 then integrates both results and generates, at a predetermined frequency, final position information whose accuracy is uniform regardless of the state of the subject.

情報処理部６０は、図３のＣＰＵ２３、メインメモリ２６などで実現され、画像解析部５８が生成した位置情報を順次取得して、それを用いて所定の情報処理を実施する。上述のとおりここで実施する情報処理の内容は特に限定されない。情報処理部６０は当該情報処理の結果として、表示画像や音声などの出力データを所定の頻度で生成する。この際、必要に応じて画像データ格納部５４に格納された現フレームの画像データを読み出し、出力データの生成に用いる。出力部６２は、図３のＣＰＵ２３、出力部３６などで構成され、生成された出力データを順次、表示装置１６に適したタイミングで出力する。 The information processing unit 60 is realized by the CPU 23, the main memory 26, and the like in FIG. 3, sequentially acquires the position information generated by the image analysis unit 58, and performs predetermined information processing using the position information. As described above, the content of the information processing performed here is not particularly limited. As a result of the information processing, the information processing unit 60 generates output data such as display images and sounds at a predetermined frequency. At this time, if necessary, the image data of the current frame stored in the image data storage unit 54 is read out and used for generating output data. The output unit 62 is composed of the CPU 23, the output unit 36, and the like shown in FIG. 3, and sequentially outputs the generated output data at a timing suitable for the display device 16.

図５は、本実施の形態の一例として、赤外線をパターン照射し、その反射光の撮影画像を用いて被写体の距離を取得する手法を説明するための図である。図５の（ａ）、（ｂ）はそれぞれ、左視点および右視点の撮影画像を模式的に示している。各撮影画像には被写体である２人の人物７０、７２が写っている。人物７０は人物７２より撮像装置１２に近い位置にいる。また（ａ）に示す左視点の画像は、（ｂ）に示す右視点の画像より、被写体の像が右に寄っている。 FIG. 5 is a diagram for explaining a method of irradiating a pattern of infrared rays as an example of the present embodiment and acquiring a distance of a subject by using a photographed image of the reflected light. (A) and (b) of FIG. 5 schematically show captured images of the left viewpoint and the right viewpoint, respectively. Two people 70 and 72 who are the subjects are shown in each photographed image. The person 70 is closer to the image pickup device 12 than the person 72. Further, in the image of the left viewpoint shown in (a), the image of the subject is closer to the right than the image of the right viewpoint shown in (b).

このような状況において、スポット状の赤外線を所定の分布で被写空間に照射すると、人物７０、７２の表面での反射光が撮影画像にスポット状に表れる（例えば像７４ａ、７４ｂ、７６ａ、７６ｂ）。照射した赤外線の分布パターンは既知のため、撮影画像における反射光の像の分布パターンに基づき、左右の視点の画像における対応点が求められる。例えば（ａ）の左視点の画像における像７４ａ、７６ａはそれぞれ、（ｂ）の右視点の画像における像７４ｂ、７６ｂに対応することがわかる。 In such a situation, when the subject space is irradiated with spot-shaped infrared rays in a predetermined distribution, the reflected light on the surfaces of the persons 70 and 72 appears in spots on the captured image (for example, images 74a, 74b, 76a, 76b). ). Since the distribution pattern of the irradiated infrared rays is known, the corresponding points in the images of the left and right viewpoints can be obtained based on the distribution pattern of the image of the reflected light in the captured image. For example, it can be seen that the images 74a and 76a in the left viewpoint image of (a) correspond to the images 74b and 76b in the right viewpoint image of (b), respectively.

ここで、人物７０での同じ反射を表す像７４ａ、７４ｂがそれぞれの画像の横方向の位置ｘ１＿Ｌ、ｘ１＿Ｒで検出されたとすると、視差はｘ１＿Ｌ−ｘ１＿Ｒである。同様に、人物７２での同じ反射を表す像７６ａ、７６ｂがそれぞれの画像の横方向の位置ｘ２＿Ｌ、ｘ２＿Ｒで検出されたとすると、視差はｘ２＿Ｌ−ｘ２＿Ｒである。定性的に被写体までの距離は視差に反比例するため、キャリブレーションにより反比例の定数を求めておけば、視差に基づき距離を導出できる。 Here, assuming that the images 74a and 74b representing the same reflection on the person 70 are detected at the lateral positions x1_L and x1_R of the respective images, the parallax is x1_L-x1_R. Similarly, assuming that the images 76a and 76b representing the same reflection on the person 72 are detected at the lateral positions x2_L and x2_R of the respective images, the parallax is x2_L-x2_R. Since the distance to the subject is qualitatively inversely proportional to the parallax, the distance can be derived based on the parallax by obtaining the inverse proportional constant by calibration.
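The inverse relation between parallax and distance described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the single-reference calibration, and the pixel coordinates are assumptions chosen for the example.

```python
def calibrate_constant(known_distance, x_left, x_right):
    """Return k for distance = k / parallax, from one reference target.

    Assumption: a single target at a known distance is enough to fix k.
    """
    parallax = x_left - x_right
    if parallax <= 0:
        raise ValueError("parallax must be positive")
    return known_distance * parallax


def distance_from_parallax(k, x_left, x_right):
    """Distance to a point seen at x_left / x_right (horizontal pixel positions)."""
    parallax = x_left - x_right
    if parallax <= 0:
        return None  # no usable correspondence
    return k / parallax


# Hypothetical calibration target: 2.0 m away, parallax of 40 pixels.
k = calibrate_constant(known_distance=2.0, x_left=340.0, x_right=300.0)

# A nearby subject (large parallax) and a distant one (small parallax).
near = distance_from_parallax(k, x_left=420.0, x_right=340.0)
far = distance_from_parallax(k, x_left=310.0, x_right=290.0)
```

With these made-up values the nearby subject has four times the parallax of the distant one, so its computed distance is one quarter as large.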

図示する例では人物７０の方が近くにいるため、その視差ｘ１＿Ｌ−ｘ１＿Ｒは人物７２の視差ｘ２＿Ｌ−ｘ２＿Ｒより大きくなっている。このような系において、撮像装置１２の近くにいる人物７０における反射光の像、例えば像７４ａ、７４ｂの輝度が適切に得られるような強度で赤外線を照射した場合、人物７２における反射光の像、例えば像７６ａ、７６ｂの輝度が得られず、場合によってはノイズとの差がなくなってしまうことが考えられる。この場合、画像から像７６ａ、７６ｂを検出できず、人物７２の位置情報が得られなくなってしまう。 In the illustrated example, the person 70 is closer, so the parallax x1_L-x1_R is larger than the parallax x2_L-x2_R of the person 72. In such a system, if infrared rays are emitted at an intensity that yields appropriate brightness for the images of light reflected from the person 70 near the image pickup device 12 (for example, images 74a and 74b), the images of light reflected from the person 72 (for example, images 76a and 76b) may not attain sufficient brightness and, in some cases, may become indistinguishable from noise. In this case, the images 76a and 76b cannot be detected in the captured images, and the position information of the person 72 cannot be obtained.

図６は、図５と同様の状況において、フレームを加算することによる作用を説明するための図である。図６の（ａ）は、現フレームのみを用いて位置情報を生成する処理経路、（ｂ）は加算画像を用いて位置情報を生成する処理経路を模式的に示している。ここで位置情報は、被写体までの距離を画素値とするデプス画像とし、距離が小さいほど高い輝度としている。ただし位置情報をこれに限る主旨ではない。（ａ）の処理経路では、現フレームのステレオ画像７８ａ、７８ｂのみを用いて対応点を検出し、その視差から被写体の距離を導出する。 FIG. 6 is a diagram for explaining the effect of adding frames in the same situation as in FIG. 5. FIG. 6(a) schematically shows a processing path that generates position information using only the current frame, and FIG. 6(b) schematically shows a processing path that generates position information using added images. Here, the position information is a depth image whose pixel values represent the distance to the subject, with smaller distances shown at higher brightness. However, the position information is not intended to be limited to this form. In the processing path (a), corresponding points are detected using only the stereo images 78a and 78b of the current frame, and the distance to the subject is derived from their parallax.

ところが後方にいる人物７２で反射する光の強度が弱く、図示するように本来あるべき反射光の像が画像に明確に表れなかった場合、正確な視差が求められず距離の精度が低下する。その結果、位置情報取得部５８が生成するデプス画像８０において、前方の人物７０の距離値が正確に表される一方、後方の人物７２の距離値は不定となり表されなかったり、誤差を多く含む距離値が表されたりする。 However, when the intensity of the light reflected by the person 72 at the rear is weak and, as illustrated, the reflected-light images that should be present do not appear clearly in the images, an accurate parallax cannot be obtained and the accuracy of the distance decreases. As a result, in the depth image 80 generated by the position information acquisition unit 58, the distance values of the person 70 in front are represented accurately, whereas the distance values of the person 72 at the rear are either undefined and not represented, or represented with large errors.

（ｂ）の処理経路ではまず画像加算部５６が、現フレームのステレオ画像７８ａ、７８ｂのそれぞれに、直前の３フレームの画像８２ａ、８２ｂを加算し、加算ステレオ画像８４ａ、８４ｂを生成する。この加算ステレオ画像では、輝度値が元の画像のおよそ４倍となるため、加算前では明確でなかった人物７２での反射光の像（例えば像８６）が明確になる。一方、加算前の画像で適度な輝度が得られていた人物７０での反射光の像は、加算することにより演算に想定されている輝度の上限を超えてしまうことがあり得る。 In the processing path (b), the image addition unit 56 first adds the images 82a and 82b of the immediately preceding three frames to the stereo images 78a and 78b of the current frame, respectively, to generate the addition stereo images 84a and 84b. In this addition stereo image, the brightness value is about four times that of the original image, so that the image of the reflected light (for example, image 86) of the person 72, which was not clear before the addition, becomes clear. On the other hand, the image of the reflected light on the person 70 in which the appropriate brightness was obtained in the image before the addition may exceed the upper limit of the brightness assumed in the calculation by the addition.

同図では、そのような反射光の像（例えば像８８）を星形で示している。その結果、位置情報取得部５８が生成するデプス画像９０において、後方の人物７２の距離値が正確に表され、前方の人物７０の距離値は不定となり表されなかったり、誤差を多く含む距離値が表されたりする。このように加算の必要性や適切な加算フレーム数は、被写体の位置、被写体にあたる光の強度、撮影露光時間などの撮影条件、画像補正パラメータなどによって様々となる。 In the figure, such an image of reflected light (for example, image 88) is shown in a star shape. As a result, in the depth image 90 generated by the position information acquisition unit 58, the distance value of the person 72 behind is accurately represented, and the distance value of the person 70 in front is indefinite and is not represented, or is a distance value including many errors. Is expressed. As described above, the necessity of addition and the appropriate number of frames to be added vary depending on the position of the subject, the intensity of the light hitting the subject, the shooting conditions such as the shooting exposure time, the image correction parameters, and the like.

図示するように複数の処理経路を設け、独立して解析を行いそれぞれに対し位置情報を生成すれば、被写空間がどのような状態であっても、また照射強度、撮影条件、補正パラメータなどを調整せずとも、いずれかに精度の高い情報が含まれていることになる。そこで位置情報取得部５８は、各位置情報から精度が高いと見込まれる情報を抽出して合成し、最終的な位置情報を１つ生成する。 If a plurality of processing paths are provided as illustrated, and analysis is performed independently on each to generate position information, then highly accurate information is contained in one of the results regardless of the state of the subject space, and without adjusting the irradiation intensity, shooting conditions, correction parameters, or the like. The position information acquisition unit 58 therefore extracts and combines the information expected to be highly accurate from each set of position information, and generates a single final set of position information.

図７は、図６で示した２つのデプス画像８０、９０を統合する様子を示している。図示する例では、デプス画像によって距離値が表されている被写体が明確に分かれている。このような場合、双方のデプス画像を比較し、一方のみに距離値が表されている領域を抽出して、他方のデプス画像における対応する領域の画素値を置き換えればよい。 FIG. 7 shows how the two depth images 80 and 90 shown in FIG. 6 are integrated. In the illustrated example, the subject whose distance value is represented by the depth image is clearly separated. In such a case, both depth images may be compared, a region in which the distance value is represented in only one of them may be extracted, and the pixel value of the corresponding region in the other depth image may be replaced.

例えば基準となるデプス画像８０をラスタ順などで走査していき、距離値として有効な値が格納されていない画素を検出したら、他方のデプス画像９０の同じ画素を参照する。当該画素に有効な値が格納されていれば、その値で元のデプス画像８０の画素値を更新する。この処理をデプス画像８０の全ての画素で行うことにより、２つのデプス画像８０、９０を統合したデプス画像９２を生成できる。 For example, the reference depth image 80 is scanned in raster order or the like, and when a pixel in which a valid value as a distance value is not stored is detected, the same pixel of the other depth image 90 is referred to. If a valid value is stored in the pixel, the pixel value of the original depth image 80 is updated with that value. By performing this process on all the pixels of the depth image 80, it is possible to generate a depth image 92 in which the two depth images 80 and 90 are integrated.
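The raster-order integration described above can be sketched as follows. This is only an illustration under stated assumptions: depth images are represented as nested lists, and the value 0 is assumed to mark a pixel with no valid distance (the source does not specify how invalid pixels are encoded).

```python
INVALID = 0  # assumed marker for "no valid distance stored"


def merge_depth(base, other, invalid=INVALID):
    """Fill invalid pixels of `base` with valid pixels of `other`.

    Both images are lists of rows with identical dimensions.
    A new image is returned; the inputs are left unchanged.
    """
    merged = [row[:] for row in base]
    for y, row in enumerate(merged):          # raster order: row by row
        for x, value in enumerate(row):
            if value == invalid and other[y][x] != invalid:
                row[x] = other[y][x]
    return merged


# Toy data: image 80 holds only the near subject, image 90 only the far one.
depth_80 = [[200, 200, 0],
            [200, 0, 0]]
depth_90 = [[0, 0, 90],
            [0, 90, 0]]
depth_92 = merge_depth(depth_80, depth_90)
```

Pixels that are invalid in both inputs stay invalid in the merged result.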

なお元のデプス画像８０、９０を生成する際、距離値の信頼度を画素ごとに取得しておき、統合時には信頼度の高い方の距離値を採用するようにしてもよい。例えばステレオ画像の一方に定めた微小ブロックに対し、他方の微小ブロックを水平方向に移動させて最も高い類似度が得られる位置を対応点として求めるブロックマッチングの手法では、対応点を決定した際のブロックの類似度の大きさに基づき信頼度を決定できる。距離値の信頼度はこのほか、様々な基準で取得できることは当業者には理解されるところである。 When generating the original depth images 80 and 90, the reliability of the distance value may be acquired for each pixel, and the distance value with the higher reliability may be adopted at the time of integration. For example, in block matching, a small block is defined in one of the stereo images, a small block in the other image is moved horizontally, and the position giving the highest similarity is taken as the corresponding point; with this method, the reliability can be determined from the degree of block similarity at the time the corresponding point is determined. Those skilled in the art will understand that the reliability of the distance value can also be obtained by various other criteria.
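A one-dimensional sketch of block matching with a similarity-based reliability follows. The cost function (sum of absolute differences), the mapping of cost to a reliability in (0, 1], and all names are assumptions made for illustration; the source does not prescribe a particular similarity measure.

```python
def match_block(left_row, right_row, x, block=3, max_parallax=8):
    """Search right_row for the block that starts at left_row[x].

    The candidate block is shifted horizontally; the shift with the lowest
    sum of absolute differences (SAD) is taken as the parallax.
    Returns (parallax, reliability), where reliability = 1 / (1 + SAD).
    """
    ref = left_row[x:x + block]
    best_shift, best_cost = None, None
    for shift in range(max_parallax + 1):
        start = x - shift  # the subject appears further left in the right view
        if start < 0:
            break
        cand = right_row[start:start + block]
        cost = sum(abs(a - b) for a, b in zip(ref, cand))
        if best_cost is None or cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift, 1.0 / (1.0 + best_cost)


# One scanline from each view; the bright spot sits 3 pixels apart.
left = [0, 0, 0, 0, 0, 9, 7, 9, 0, 0]
right = [0, 0, 9, 7, 9, 0, 0, 0, 0, 0]
parallax, reliability = match_block(left, right, x=5)
```

When two depth images are integrated, the per-pixel reliability from each path could then be compared and the distance value with the larger reliability kept.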

図６、図７は、ステレオ画像から複数の処理経路でデプス画像を生成する例を示しているが、上述のとおり本実施の形態は特徴点を検出する処理を含めば出力する情報は限定されず同様の効果を得ることができる。すなわち輝度レンジが異なる複数の被写体のそれぞれについて、異なる処理経路で独立に解析結果を取得し、それらを統合すれば、被写体の状況に関わらず精度が保障された解析結果を出力できる。 FIGS. 6 and 7 show an example of generating depth images from stereo images through a plurality of processing paths. As described above, however, the information output in the present embodiment is not limited as long as processing for detecting feature points is included, and the same effect can be obtained. That is, if analysis results are acquired independently through different processing paths for each of a plurality of subjects having different brightness ranges and are then integrated, analysis results with assured accuracy can be output regardless of the situation of the subject.

次に、以上の構成によって実現される、本実施の形態における情報処理装置の動作を説明する。図８は、情報処理装置１０が撮影画像を用いて位置情報を取得しデータ出力を行う処理手順を示すフローチャートである。このフローチャートは、被写体の位置情報を必要とする電子コンテンツをユーザが選択し、撮像装置１２において撮影がなされている状態での情報処理装置１０の動作を示している。このとき表示装置１６には必要に応じて初期画像が表示されている。 Next, the operation of the information processing apparatus according to the present embodiment realized by the above configuration will be described. FIG. 8 is a flowchart showing a processing procedure in which the information processing apparatus 10 acquires position information using captured images and outputs data. This flowchart shows the operation of the information processing device 10 in a state where the user selects electronic content that requires the position information of the subject and the image pickup device 12 is taking a picture. At this time, an initial image is displayed on the display device 16 as needed.

まず情報処理装置１０の画像データ取得部５２は、撮像装置１２から現在時刻ｔのフレームのステレオ画像データを取得し、画像データ格納部５４に格納する（Ｓ１０）。上述のとおり、特定の波長帯を検出したステレオ画像から位置情報を取得する場合、さらにカラー画像のデータを取得してもよい。位置情報取得部５８は、当該時刻ｔのフレームのステレオ画像を画像データ格納部５４から読み出し、それらの対応点を検出して被写体の距離値を求めることにより、デプス画像を生成する（Ｓ１２）。 First, the image data acquisition unit 52 of the information processing device 10 acquires stereo image data of the frame at the current time t from the image pickup device 12 and stores it in the image data storage unit 54 (S10). As described above, when the position information is acquired from the stereo image in which a specific wavelength band is detected, the data of the color image may be further acquired. The position information acquisition unit 58 generates a depth image by reading the stereo image of the frame at the time t from the image data storage unit 54, detecting the corresponding points thereof, and obtaining the distance value of the subject (S12).

一方、画像加算部５６は、現フレームの直前に取得した所定数の過去フレームのステレオ画像のデータを画像データ格納部５４から読み出し、現フレームの画像とともに同じ位置の画素同士で画素値を足し合わせることにより、加算ステレオ画像のデータを生成する（Ｓ１４）。ただしこの処理は当然、過去フレームの画像が所定数、画像データ格納部５４に格納された時点から実施する。加算する過去フレームの数は、論理的あるいは実験などにより最適値を求めておく。例えば被写体の距離の想定範囲と、照射光の強度や環境光の照度の想定範囲との組み合わせから、ステレオ画像において得られる輝度の想定範囲が判明する。 On the other hand, the image addition unit 56 reads out a predetermined number of stereo image data of the past frame acquired immediately before the current frame from the image data storage unit 54, and adds the pixel values to the pixels at the same position together with the image of the current frame. As a result, the data of the added stereo image is generated (S14). However, as a matter of course, this process is performed from the time when a predetermined number of images of the past frame are stored in the image data storage unit 54. For the number of past frames to be added, find the optimum value logically or experimentally. For example, from the combination of the assumed range of the distance of the subject and the assumed range of the intensity of the irradiation light and the illuminance of the ambient light, the assumed range of the brightness obtained in the stereo image can be found.

その想定範囲のうち最低値近傍の輝度値が、対応点の検出に十分な値となり、かつ対応点取得処理に想定される上限値を十分下回るような倍率を計算することにより、適切な加算フレーム数を求めることができる。典型的には３つの過去フレームを加算することにより輝度レンジを４倍とし、輝度の階調を２ビット分、増加させる。加算数を２、４、８、・・・などと複数種類とし、Ｓ１４、Ｓ１６の処理をそれぞれの加算数で実施してもよい。 An appropriate number of frames to add can be obtained by calculating a magnification such that luminance values near the minimum of the assumed range become sufficient for detecting corresponding points while remaining sufficiently below the upper limit assumed by the corresponding point acquisition process. Typically, three past frames are added, which quadruples the luminance range and increases the luminance gradation by 2 bits. A plurality of addition counts, such as 2, 4, 8, and so on, may be used, with the processing of S14 and S16 performed for each count.
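The choice of magnification described above can be sketched as a small search. The restriction to powers of two, the function names, and the threshold values are assumptions for illustration; in practice the values would come from the assumed luminance range determined theoretically or by experiment.

```python
def frames_to_sum(min_luminance, detect_threshold, upper_limit):
    """Smallest power-of-two count n (current frame included) such that
    n * min_luminance reaches detect_threshold while staying below upper_limit.
    """
    if min_luminance <= 0:
        raise ValueError("min_luminance must be positive")
    n = 1
    while n * min_luminance < detect_threshold:
        n *= 2
    if n * min_luminance >= upper_limit:
        raise ValueError("no count satisfies both constraints")
    return n


def extra_bits(n):
    """Additional gradation bits needed to hold the sum of n frames."""
    return (n - 1).bit_length()


# Hypothetical values: darkest expected spot 10, detection needs 35, limit 255.
total = frames_to_sum(min_luminance=10, detect_threshold=35, upper_limit=255)
past_frames = total - 1   # frames added to the current one
bits = extra_bits(total)
```

With these example numbers the result matches the typical case in the text: three past frames are added to the current frame, and the sum needs two more bits of gradation.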

図６の加算ステレオ画像８４ａ、８４ｂで示したように、元から十分な輝度が得られている像は、加算することにより演算に想定される上限を超えることがあり得る。そのためあらかじめ輝度の上限値を設定しておき、加算結果が上限値を超える場合、画像加算部５６は得られた輝度を当該上限値に置き換える。これにより、対応点の検出に係る演算プログラムを変更することなく、本実施の形態を容易に導入できる。位置情報取得部５８は、過去フレームが加算されたステレオ画像を画像加算部５６から取得し、それらの対応点を検出して被写体の距離値を求めることにより、加算画像に基づくデプス画像を生成する（Ｓ１６）。 As shown in the added stereo images 84a and 84b of FIG. 6, an image in which sufficient brightness is originally obtained may exceed the upper limit assumed in the calculation by adding. Therefore, the upper limit value of the brightness is set in advance, and when the addition result exceeds the upper limit value, the image addition unit 56 replaces the obtained brightness with the upper limit value. As a result, the present embodiment can be easily introduced without changing the arithmetic program related to the detection of the corresponding points. The position information acquisition unit 58 acquires a stereo image to which past frames have been added from the image addition unit 56, detects corresponding points thereof, and obtains a distance value of the subject to generate a depth image based on the added image. (S16).
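The frame addition of S14, including the replacement of over-range sums by the upper limit, can be sketched as follows. The class name, the list-based image representation, and the default limit of 255 are assumptions for illustration only.

```python
from collections import deque


def add_frames(frames, upper_limit=255):
    """Sum pixel values at the same position, replacing sums above the limit."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[min(sum(f[y][x] for f in frames), upper_limit)
             for x in range(width)]
            for y in range(height)]


class FrameAdder:
    """Keeps the current frame plus `num_past` past frames (hypothetical helper)."""

    def __init__(self, num_past=3, upper_limit=255):
        self.buffer = deque(maxlen=num_past + 1)
        self.upper_limit = upper_limit

    def push(self, frame):
        """Store a frame; return the added image once enough frames exist."""
        self.buffer.append(frame)
        if len(self.buffer) < self.buffer.maxlen:
            return None  # not enough past frames stored yet
        return add_frames(list(self.buffer), self.upper_limit)


adder = FrameAdder(num_past=3, upper_limit=255)
frame = [[10, 100]]  # one dim pixel, one already bright pixel
results = [adder.push(frame) for _ in range(4)]
```

Because the output stays within the original value range, the same corresponding-point detection code can be applied to the added image without modification, which is the benefit the text attributes to the upper limit.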

次に位置情報取得部５８は、Ｓ１２で生成した現フレームの画像に基づくデプス画像と、加算画像に基づくデプス画像とを統合し、被写体の様々な状態を網羅した１つのデプス画像を生成する（Ｓ１８）。情報処理部６０は、当該デプス画像を用いて所定の情報処理を実施する（Ｓ２０）。上述のとおり情報処理の内容は特に限定されず、ユーザが選択したアプリケーションなどによって異なってよい。情報処理部６０は情報処理の結果として表示画像や音声などの出力データを生成し、出力部６２がそれを表示装置１６などに出力することで、時刻ｔでの実空間の状況に対応した出力がなされる（Ｓ２２）。 Next, the position information acquisition unit 58 integrates the depth image based on the current frame image generated in S12 with the depth image based on the added image, and generates a single depth image covering the various states of the subjects (S18). The information processing unit 60 performs predetermined information processing using this depth image (S20). As described above, the content of the information processing is not particularly limited and may differ depending on the application selected by the user and the like. The information processing unit 60 generates output data such as display images and sound as a result of the information processing, and the output unit 62 outputs the data to the display device 16 or the like, so that output corresponding to the situation in the real space at time t is produced (S22).

ゲームの進捗やユーザ操作などにより処理を終了させる必要が生じない間は（Ｓ２４のＮ）、次の時刻ｔ＝ｔ＋Δｔのフレームの画像データを取得し（Ｓ２６、Ｓ１０）、Ｓ１２〜Ｓ２２の処理を繰り返す。処理を終了させる必要が生じたら、全ての処理を終了させる（Ｓ２４のＹ）。 As long as there is no need to end the processing due to the progress of the game, a user operation, or the like (N in S24), the image data of the frame at the next time t = t + Δt is acquired (S26, S10), and the processing of S12 to S22 is repeated. When it becomes necessary to end the processing, all processing is terminated (Y in S24).

以上述べた本実施の形態によれば、撮影画像を解析して特徴点を検出し、被写体の位置情報を取得する技術において、現フレームの直前に撮影された所定数の過去フレームの画像を加算したうえで解析を実施する。これにより、輝度が乏しく特徴点として検出が困難な画素の輝度レンジを増幅できるとともに、相対的にノイズのレベルを下げることができる。結果として反射光の像や特徴点の検出精度が向上し、ひいては位置情報を正確に求めることができる。 According to the present embodiment described above, in a technique that analyzes captured images to detect feature points and acquires position information of a subject, the analysis is performed after adding the images of a predetermined number of past frames captured immediately before the current frame. As a result, the luminance range of pixels that have low luminance and are difficult to detect as feature points can be amplified, and the noise level can be relatively lowered. Consequently, the detection accuracy of reflected-light images and feature points improves, and the position information can in turn be obtained accurately.

また、元から適切な輝度が得られている場合にも対応するように、過去フレームを加算せずに解析する処理経路を設ける。あるいは加算する過去フレームの数を異ならせてそれぞれについて画像解析する処理経路を設ける。これらの一方、あるいは組み合わせにより、元の撮影画像における反射光の像や特徴点の輝度レベルによらず、それらの検出精度を保証できる。結果として、光の照射強度や撮影条件を調整せずとも、幅広い距離範囲の被写体の位置を精度よく取得できる。 In addition, a processing path for analysis without adding past frames is provided so as to correspond to the case where appropriate brightness is originally obtained. Alternatively, a processing path is provided in which the number of past frames to be added is different and image analysis is performed for each. By one or a combination of these, the detection accuracy can be guaranteed regardless of the brightness level of the reflected light image or the feature point in the original captured image. As a result, the position of the subject in a wide range of distance can be accurately acquired without adjusting the light irradiation intensity and the shooting conditions.

また、異なる処理経路で取得した複数の位置情報を統合し、１つのデプス画像を生成する。これにより同じ視野に距離や状態の大きく異なる被写体が混在していても、最終的に生成される位置情報の精度は均一となり、それを用いた情報処理において精度のばらつきを考慮する必要がなくなる。結果として、被写体の状況によらず情報処理の精度を容易に維持することができる。 In addition, a plurality of position information acquired by different processing paths are integrated to generate one depth image. As a result, even if subjects with greatly different distances and states coexist in the same field of view, the accuracy of the finally generated position information becomes uniform, and it is not necessary to consider the variation in accuracy in information processing using the same field of view. As a result, the accuracy of information processing can be easily maintained regardless of the situation of the subject.

実施の形態２ Embodiment 2
実施の形態１では、特徴点を抽出する対象として撮影画像を用い、抽出精度を向上させることを主な目的として過去フレームを加算した。本実施の形態では、撮影画像をそのまま表示させたり、撮影画像を加工して表示させたりする態様において、表示画像の輝度レンジを拡張させる目的で画像を加算する。 In the first embodiment, the captured image is used as the target for extracting the feature points, and the past frames are added mainly for the purpose of improving the extraction accuracy. In the present embodiment, images are added for the purpose of expanding the brightness range of the displayed image in a mode in which the captured image is displayed as it is or the captured image is processed and displayed.

この際、撮像面の動きや被写体の動きを考慮して過去フレームの像を補正し、現フレームの時刻の像を生成したうえで加算することにより、加算画像の像が鮮明になるようにする。本実施の形態の情報処理システムの構成や情報処理装置の内部回路構成は、実施の形態１で説明したのと同様でよい。また撮像装置１２および表示装置１６を、図２で示したようなヘッドマウントディスプレイ１００で構成してもよい。 At this time, the images in the past frames are corrected in consideration of the movement of the imaging surface and the movement of the subject so as to generate images as of the time of the current frame, and these are then added, so that the subject appears sharp in the added image. The configuration of the information processing system and the internal circuit configuration of the information processing apparatus of the present embodiment may be the same as those described in the first embodiment. Further, the image pickup device 12 and the display device 16 may be configured by the head-mounted display 100 as shown in FIG. 2.

図９は本実施の形態における情報処理装置の機能ブロックの構成を示している。情報処理装置１５０は、撮像装置１２から画像のデータを取得する画像データ取得部１５２、取得した画像のデータを格納する画像データ格納部１５４、所定数の過去フレームの画像を足し合わせる画像加算部１５６、被写体の状態に係る情報を取得する状態情報取得部１５８、および、出力すべきデータを出力する出力部１６２を含む。 FIG. 9 shows the configuration of the functional blocks of the information processing device according to the present embodiment. The information processing device 150 includes an image data acquisition unit 152 that acquires image data from the image pickup device 12, an image data storage unit 154 that stores the acquired image data, an image addition unit 156 that adds together the images of a predetermined number of past frames, a state information acquisition unit 158 that acquires information related to the state of the subject, and an output unit 162 that outputs the data to be output.

画像データ取得部１５２は、実施の形態１の画像データ取得部５２と同様の機能を有する。ただし本実施の形態における画像データ取得部１５２は、少なくとも表示画像に用いるデータを取得すればよい。画像データ取得部１５２はさらに、撮像面と被写体との相対的な位置や姿勢の変化を得るためのデータを、撮影画像と対応づけて取得する。例えば図２で示したヘッドマウントディスプレイ１００を導入し、筐体１０８の前面に設けたカメラで撮影した画像を用いた表示を行う場合、ヘッドマウントディスプレイ１００に内蔵したジャイロセンサ、加速度センサなどのモーションセンサから計測値を取得することにより、ユーザ頭部の動きが求められる。 The image data acquisition unit 152 has the same function as the image data acquisition unit 52 of the first embodiment. However, the image data acquisition unit 152 in the present embodiment need only acquire at least the data used for the display image. The image data acquisition unit 152 further acquires, in association with the captured images, data for obtaining changes in the relative position and posture between the imaging surface and the subject. For example, when the head-mounted display 100 shown in FIG. 2 is introduced and display is performed using images captured by a camera provided on the front surface of the housing 108, the movement of the user's head can be determined by acquiring measured values from motion sensors such as a gyro sensor and an acceleration sensor built into the head-mounted display 100.

これにより撮影画像平面に対する被写体の相対的な動きを特定できるため、過去フレームの画像に写る被写体の像を、現フレームと同時刻における像に補正できる。なお撮影画像平面に対する被写体の相対的な動きを特定できれば、その根拠とするデータはモーションセンサの計測値に限らず、ひいては本実施の形態の撮像装置１２および表示装置１６をヘッドマウントディスプレイ１００に限る主旨ではない。 As a result, the movement of the subject relative to the captured image plane can be specified, so that the image of the subject appearing in a past frame can be corrected to its image at the same time as the current frame. As long as the movement of the subject relative to the captured image plane can be specified, the data on which it is based is not limited to the measured values of a motion sensor, and accordingly the image pickup device 12 and the display device 16 of the present embodiment are not intended to be limited to the head-mounted display 100.

例えば被写体が既知の形状やサイズを有する場合、その実空間での位置や姿勢は、テンプレート画像やオブジェクトモデルとのマッチングにより、撮影画像を用いて求められる。その他、撮影画像を用いて被写体の実空間での位置や姿勢の変化を追跡したり推定したりする技術には様々なものが提案されており、そのいずれを適用してもよい。 For example, when the subject has a known shape and size, its position and posture in the real space can be obtained by using a photographed image by matching with a template image or an object model. In addition, various techniques for tracking or estimating changes in the position or posture of a subject in real space using captured images have been proposed, and any of them may be applied.

画像データ取得部１５２は、所定のフレームレートで送られる画像のデータを順次、画像データ格納部１５４に格納する。画像データ格納部１５４には、現フレームおよび過去の所定数のフレームの画像データが格納される。ヘッドマウントディスプレイ１００からモーションセンサの計測値を取得する場合、画像データ取得部１５２は、当該データも各時刻の撮影画像と対応づけて画像データ格納部１５４に順次格納する。 The image data acquisition unit 152 sequentially stores image data sent at a predetermined frame rate in the image data storage unit 154. The image data storage unit 154 stores image data of the current frame and a predetermined number of past frames. When acquiring the measured value of the motion sensor from the head-mounted display 100, the image data acquisition unit 152 sequentially stores the data in the image data storage unit 154 in association with the captured image at each time.

状態情報取得部１５８は、図３のＣＰＵ２３、ＧＰＵ２４、メインメモリ２６などで実現され、画像データ格納部１５４に格納されたモーションセンサの計測値または撮影画像のデータを順次読み出し、上述のとおり３次元実空間での被写体の位置や姿勢を各時刻に対し取得する。取得した情報は、各時刻の撮影画像と対応づけて画像データ格納部１５４に順次格納する。画像加算部１５６は、図３のＣＰＵ２３、ＧＰＵ２４、メインメモリ２６などで実現され、補正部１６４および加算部１６６を含む。 The state information acquisition unit 158 is realized by the CPU 23, GPU 24, main memory 26, etc. in FIG. 3, and sequentially reads out the measured values of the motion sensor or the data of the captured image stored in the image data storage unit 154, and three-dimensionally as described above. Acquires the position and orientation of the subject in real space for each time. The acquired information is sequentially stored in the image data storage unit 154 in association with the captured image at each time. The image addition unit 156 is realized by the CPU 23, GPU 24, main memory 26, etc. of FIG. 3, and includes a correction unit 164 and an addition unit 166.

補正部１６４は、状態情報取得部１５８が取得した、各フレームにおける被写体の位置や姿勢の情報に基づき、過去フレームから現フレームまでに生じた回転角や並進量を被写体ごとに取得する。そして３次元空間において仮想的に被写体の位置や姿勢を操作することで、過去フレームの画像における被写体を現フレームの時刻まで進ませたときの像を求める。 The correction unit 164 acquires the rotation angle and translation amount generated from the past frame to the current frame for each subject based on the information on the position and posture of the subject in each frame acquired by the state information acquisition unit 158. Then, by virtually manipulating the position and orientation of the subject in the three-dimensional space, the image when the subject in the image of the past frame is advanced to the time of the current frame is obtained.

加算部１６６は、そのように補正した過去フレームの画像を、現フレームの画像に加算することにより、表示に用いる加算画像を生成する。加算するフレーム数は、元の撮影画像の輝度レンジと、表示装置１６が対応している輝度レンジあるいは画像表現に望まれる輝度レンジと、に基づき決定する。接続された表示装置１６に応じて、加算するフレーム数を適応的に決定してもよい。 The addition unit 166 generates an addition image to be used for display by adding the image of the past frame corrected in this way to the image of the current frame. The number of frames to be added is determined based on the brightness range of the original captured image and the brightness range supported by the display device 16 or the brightness range desired for image expression. The number of frames to be added may be adaptively determined according to the connected display device 16.
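As a rough illustration of how the number of frames could follow from the two luminance ranges: summing N frames multiplies the largest representable pixel value by N, so each doubling of the frame count corresponds to one extra bit. The sketch below assumes both ranges are given as bit depths; the function name `frames_to_add` and the 8-bit/10-bit figures are hypothetical and do not come from the source, which only states that the count is based on the two ranges.

```python
def frames_to_add(captured_bits: int, display_bits: int) -> int:
    """Total number of frames (current + past) whose sum fills the display range.

    Hypothetical helper: assumes both luminance ranges are expressed as bit depths.
    """
    if display_bits < captured_bits:
        raise ValueError("display range is narrower than the captured range")
    # Each extra bit of range doubles the number of frames that can be summed.
    return 2 ** (display_bits - captured_bits)


total = frames_to_add(8, 10)  # 4 frames in total for an 8-bit capture and a 10-bit display
past = total - 1              # 3 past frames are added to the current frame
```

With these assumed values the largest possible sum is 4 × 255 = 1020, which still fits in the 10-bit range of 0 to 1023.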

出力部１６２は、図３のＣＰＵ２３、出力部３６などで構成され、画像加算部１５６が生成した加算画像のデータを順次、表示装置１６に適したタイミングで出力する。出力部１６２は、加算された画像上に仮想オブジェクトを描画するなど所定の加工を行ったうえでデータを出力してもよい。このような加工において、状態情報取得部１５８が取得した被写体の位置や姿勢に係る情報を利用してもよい。出力部１６２はさらに音声のデータも出力してよい。 The output unit 162 includes the CPU 23 and the output unit 36 of FIG. 3, and sequentially outputs the data of the added image generated by the image adding unit 156 at a timing suitable for the display device 16. The output unit 162 may output data after performing predetermined processing such as drawing a virtual object on the added image. In such processing, the information related to the position and posture of the subject acquired by the state information acquisition unit 158 may be used. The output unit 162 may also output audio data.

図１０は、本実施の形態における画像加算部１５６が、過去フレームの画像を補正したうえで現フレームの画像と加算する様子を模式的に示している。同図上段は、各フレームの撮影周期をΔｔ、現フレームの撮影時刻をｔとしたときに、加算の対象となる４フレーム分の撮影時刻と各撮影画像の内容を例示している。同図の例では、時刻ｔ−３Δｔにおいて被写体である円板状の物の側面が見えている状態から、時刻ｔ−２Δｔ、ｔ−Δｔ、ｔ、と時間が経過するにつれ、徐々に円板上面が見えるように変化している。また当該被写体は、視野の左から右に移動している。 FIG. 10 schematically shows how the image addition unit 156 in the present embodiment corrects the images of past frames and then adds them to the image of the current frame. The upper part of the figure illustrates the capture times and image contents of the four frames to be added, where Δt is the capture period of each frame and t is the capture time of the current frame. In the example of the figure, the side surface of a disk-shaped subject is visible at time t−3Δt, and as time passes through t−2Δt, t−Δt, and t, the view changes so that the top surface of the disk gradually becomes visible. The subject is also moving from left to right in the field of view.

このとき状態情報取得部１５８は、各撮影画像の下に示すように、各時刻における被写体の３次元空間での位置および姿勢、またはそれらの変化量に係る情報を取得する。補正部１６４は、過去フレームと現フレームの位置および姿勢の差分、すなわち３軸での回転角と並進量に基づき、過去フレームにおける被写体の像を現フレームの時刻の状態に補正する。被写体の回転角をロールφ、ピッチθ、ヨーψとし、並進量を（Ｔ_ｘ，Ｔ_ｙ，Ｔ_ｚ）とすると、３次元空間において位置座標（ｘ，ｙ，ｚ）にあった被写体表面の点は、下式により、回転、並進後の位置座標（ｘ’，ｙ’，ｚ’）に移動する。 At this time, the state information acquisition unit 158 acquires information on the position and posture of the subject in three-dimensional space at each time, or on the amounts by which they change, as shown below each captured image. The correction unit 164 corrects the image of the subject in a past frame to its state at the time of the current frame based on the difference in position and posture between the past frame and the current frame, that is, the rotation angles about the three axes and the translation amount. Let the rotation angles of the subject be roll φ, pitch θ, and yaw ψ, and let the translation amount be (T_x, T_y, T_z). A point on the subject surface located at position coordinates (x, y, z) in three-dimensional space then moves to the position coordinates (x', y', z') after rotation and translation according to the following equation.
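The equation referred to in the preceding paragraph is not reproduced in this text. One form consistent with the symbols defined there is given below. It rests on an assumption that the source does not state: roll φ, pitch θ, and yaw ψ are taken as rotations about the x, y, and z axes, applied in that order and followed by the translation. The original equation may use a different axis assignment or order.

```latex
\begin{pmatrix} x' \\ y' \\ z' \\ 1 \end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & T_x \\
0 & 1 & 0 & T_y \\
0 & 0 & 1 & T_z \\
0 & 0 & 0 & 1
\end{pmatrix}
R_z(\psi)\, R_y(\theta)\, R_x(\phi)
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix},
\qquad
R_x(\phi) =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \cos\phi & -\sin\phi & 0 \\
0 & \sin\phi & \cos\phi & 0 \\
0 & 0 & 0 & 1
\end{pmatrix},

R_y(\theta) =
\begin{pmatrix}
\cos\theta & 0 & \sin\theta & 0 \\
0 & 1 & 0 & 0 \\
-\sin\theta & 0 & \cos\theta & 0 \\
0 & 0 & 0 & 1
\end{pmatrix},
\qquad
R_z(\psi) =
\begin{pmatrix}
\cos\psi & -\sin\psi & 0 & 0 \\
\sin\psi & \cos\psi & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
```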

上式により求めた位置座標（ｘ’，ｙ’，ｚ’）を、透視変換により画像平面に射影すれば、元の撮影画像における画素の移動先が求められる。この補正処理を、被写体を構成する全ての画素について実施することにより、同図中段に示すように、時刻ｔ−３Δｔ、ｔ−２Δｔ、ｔ−Δｔの各フレームにおける被写体の像から、現フレームの時刻ｔの像を生成できる。加算部１６６は、時刻ｔ−３Δｔ、ｔ−２Δｔ、ｔ−Δｔの過去フレームの像を補正した画像と、時刻ｔの現フレームの撮影画像とを加算することにより、同図下段に示す加算後の画像を生成できる。この画像の色深度は、元の撮影画像から２ビット分、増加している。したがって、これに対応する表示装置１６を用いて表示すれば、よりダイナミックな画像表現が可能となる。 If the position coordinates (x', y', z') obtained by the above equation are projected onto the image plane by perspective transformation, the destination of each pixel in the original captured image is obtained. By performing this correction on all the pixels that make up the subject, the image of the subject at time t of the current frame can be generated from its image in each of the frames at times t−3Δt, t−2Δt, and t−Δt, as shown in the middle part of the figure. The addition unit 166 adds the corrected images of the past frames at times t−3Δt, t−2Δt, and t−Δt to the captured image of the current frame at time t, thereby generating the added image shown in the lower part of the figure. Because four frames are summed, the color depth of this image is 2 bits greater than that of the original captured image. Displaying it on a display device 16 that supports this depth therefore enables more dynamic image expression.
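The correction of a single surface point can be sketched as follows. This is a minimal illustration under two assumptions that are not taken from the source: roll, pitch, and yaw are treated as rotations about the x, y, and z axes applied in that order, and the perspective transformation is a simple pinhole model with a hypothetical focal length in pixels. The function names `move_point` and `project` are likewise hypothetical.

```python
import math


def move_point(p, roll, pitch, yaw, t=(0.0, 0.0, 0.0)):
    """Rotate p about x (roll), then y (pitch), then z (yaw), then translate by t.

    The axis assignment and rotation order are assumptions for illustration.
    """
    x, y, z = p
    # Roll: rotation about the x axis.
    y, z = (y * math.cos(roll) - z * math.sin(roll),
            y * math.sin(roll) + z * math.cos(roll))
    # Pitch: rotation about the y axis.
    x, z = (x * math.cos(pitch) + z * math.sin(pitch),
            -x * math.sin(pitch) + z * math.cos(pitch))
    # Yaw: rotation about the z axis.
    x, y = (x * math.cos(yaw) - y * math.sin(yaw),
            x * math.sin(yaw) + y * math.cos(yaw))
    return (x + t[0], y + t[1], z + t[2])


def project(p, focal_length):
    """Pinhole projection of a camera-space point onto the image plane."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (focal_length * x / z, focal_length * y / z)


# A point translated by one unit along x, then projected.
moved = move_point((1.0, 0.0, 2.0), 0.0, 0.0, 0.0, t=(1.0, 0.0, 0.0))
pixel = project(moved, focal_length=500.0)  # (500.0, 0.0)
```

Applying `move_point` and `project` to every pixel of the subject in a past frame gives the destination of each pixel at the time of the current frame. Leaving `t` at its default corresponds to omitting the translation components.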

次に、以上の構成によって実現される、本実施の形態における情報処理装置の動作を説明する。図１１は、情報処理装置１５０が撮影画像の輝度レンジを拡張して表示させる処理手順を示すフローチャートである。このフローチャートは、撮影画像を用いた表示を伴う電子コンテンツをユーザが選択し、撮像装置１２において撮影がなされている状態での情報処理装置１５０の動作を示している。このとき表示装置１６には必要に応じて初期画像が表示されている。 Next, the operation of the information processing apparatus according to the present embodiment realized by the above configuration will be described. FIG. 11 is a flowchart showing a processing procedure in which the information processing device 150 expands the brightness range of the captured image and displays it. This flowchart shows the operation of the information processing device 150 in a state where the user selects an electronic content accompanied by a display using a captured image and the image pickup device 12 captures the image. At this time, an initial image is displayed on the display device 16 as needed.

まず情報処理装置１５０の画像データ取得部１５２は、撮像装置１２から現在時刻ｔのフレームの撮影画像のデータを取得し、画像データ格納部１５４に格納する（Ｓ３０）。この際、態様によっては、撮像装置１２を兼ねるヘッドマウントディスプレイ１００のモーションセンサから、ユーザ頭部の位置や姿勢に係る計測値を取得し、撮影画像のデータに対応づけて画像データ格納部１５４に格納する。 First, the image data acquisition unit 152 of the information processing device 150 acquires the captured image data of the frame at the current time t from the imaging device 12 and stores it in the image data storage unit 154 (S30). At this time, depending on the mode, it also acquires measured values relating to the position and posture of the user's head from the motion sensor of the head-mounted display 100, which doubles as the imaging device 12, and stores them in the image data storage unit 154 in association with the captured image data.

状態情報取得部１５８は、現在時刻ｔのフレームに写る被写体の位置および姿勢を取得する（Ｓ３２）。当該情報の取得目的は、図１０で示したような画像平面における像の補正にあるため、状態情報取得部１５８は、撮像面と被写体との相対的な位置や角度の関係を導出できる情報を取得する。その限りにおいて当該情報は、ワールド座標系における被写体およびスクリーンの位置および姿勢であっても、カメラ座標系における被写体の位置および姿勢であってもよい。 The state information acquisition unit 158 acquires the position and posture of the subject appearing in the frame at the current time t (S32). Since the purpose of acquiring this information is to correct the image on the image plane as shown in FIG. 10, the state information acquisition unit 158 acquires information from which the relative position and angle between the imaging surface and the subject can be derived. As long as that holds, the information may be the positions and postures of the subject and the screen in the world coordinate system, or the position and posture of the subject in the camera coordinate system.

このような情報は上述のとおり、ヘッドマウントディスプレイ１００のモーションセンサによる計測値から取得してもよいし、撮影画像に写る被写体の像の形状やサイズに基づき取得してもよい。撮像装置１２からステレオ画像のデータを取得し、それに基づき被写体の３次元空間での位置を特定してもよい。この場合、実施の形態１で説明したように、過去フレームの撮影画像を加算した画像を用いて対応点を検出してもよい。モーションセンサに基づく情報と、撮影画像に基づく情報とを統合して、最終的な位置や姿勢の情報を求めてもよい。 As described above, such information may be acquired from the measured value by the motion sensor of the head-mounted display 100, or may be acquired based on the shape and size of the image of the subject appearing in the captured image. The stereo image data may be acquired from the image pickup apparatus 12, and the position of the subject in the three-dimensional space may be specified based on the data. In this case, as described in the first embodiment, the corresponding point may be detected by using an image obtained by adding the captured images of the past frames. The information based on the motion sensor and the information based on the captured image may be integrated to obtain the final position and posture information.

取得した情報は、時刻ｔのフレームの撮影画像のデータと対応づけて画像データ格納部１５４に格納する。続いて画像加算部１５６の補正部１６４は、現フレームの直前に取得した所定数の過去フレームの撮影画像のデータ、およびそれらに対応づけられた、被写体の位置および姿勢の情報を画像データ格納部１５４から読み出し、被写体の像を現時刻ｔの状態に補正した画像を生成する（Ｓ３４）。 The acquired information is stored in the image data storage unit 154 in association with the captured image data of the frame at time t. Subsequently, the correction unit 164 of the image addition unit 156 reads from the image data storage unit 154 the captured image data of a predetermined number of past frames acquired immediately before the current frame, together with the associated position and posture information of the subject, and generates images in which the image of the subject is corrected to its state at the current time t (S34).

具体的には過去フレームの撮影時刻ｔ−ｎΔｔ（１≦ｎ≦Ｎ、Ｎは加算する過去フレームの数）から現フレームの撮影時刻ｔまでに生じた被写体の回転角および並進量から、被写体を構成する各画素の移動先の位置座標を、上式および透視変換により求める。そして補正前の像の画素を移動させることにより、補正後の像を形成する。なお数フレーム分での物体の並進量に対し誤差の割合が大きいと考えられる場合、上式における並進量の成分（Ｔ_ｘ，Ｔ_ｙ，Ｔ_ｚ）は演算に含めなくてもよい。 Specifically, from the rotation angles and translation amount of the subject that arose between the capture time t−nΔt of a past frame (1 ≤ n ≤ N, where N is the number of past frames to be added) and the capture time t of the current frame, the destination position coordinates of each pixel making up the subject are obtained by the above equation and perspective transformation. The corrected image is then formed by moving the pixels of the image before correction. If the error is considered large relative to the translation of the object over a few frames, the translation components (T_x, T_y, T_z) in the above equation may be omitted from the calculation.

また補正部１６４は、画質をより向上させるためにさらなる補正処理を行ってもよい。具体的には、過去フレームおよび現フレームの画像の画素を既存の手法により補間し、解像度を上げてもよい。またノイズ除去フィルタなど、各種補正フィルタを施してもよい。加算部１６６は、そのようにして補正された画像を、現フレームの画像とともに同じ位置の画素同士で加算することにより加算画像を生成する（Ｓ３６）。ここで後述するように、加算する画素の領域をサブピクセル単位でずらすことにより、加算画像の高精細高解像度化を実現してもよい。 The correction unit 164 may also perform further correction to improve image quality. Specifically, the pixels of the past-frame and current-frame images may be interpolated by an existing method to increase the resolution. Various correction filters, such as a noise removal filter, may also be applied. The addition unit 166 generates an added image by adding the images corrected in this way and the image of the current frame, pixel by pixel at the same positions (S36). As described later, the added image may be given higher definition and resolution by shifting the pixel regions to be added in sub-pixel units.

なお実施の形態１と同様、加算することにより輝度値が所定の上限を超える画素が生じた場合、その画素値を当該上限値に置き換える。出力部１６２は、生成された加算画像のデータを表示装置１６などに出力する（Ｓ３８）。これにより時刻ｔの表示画像が高精細に表示される。出力部１６２は適宜音声データも出力してよい。また上述のように、出力部１６２は加算画像に所定の加工を施してもよい。 As in the first embodiment, when a pixel whose luminance value exceeds a predetermined upper limit is generated by addition, the pixel value is replaced with the upper limit value. The output unit 162 outputs the generated data of the added image to the display device 16 or the like (S38). As a result, the display image at time t is displayed in high definition. The output unit 162 may also output audio data as appropriate. Further, as described above, the output unit 162 may perform a predetermined process on the added image.
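The addition with an upper limit can be sketched as a saturating per-pixel sum. This is a minimal illustration that assumes frames are same-sized nested lists of integer luminance values; the function name `add_frames` and the sample values are hypothetical.

```python
def add_frames(frames, upper_limit):
    """Sum same-sized 2-D frames pixel by pixel, clamping each sum to upper_limit."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [min(sum(frame[r][c] for frame in frames), upper_limit) for c in range(width)]
        for r in range(height)
    ]


# Four identical 1x2 frames: the second pixel would reach 800 and is clamped to 700.
frames = [[[100, 200]] for _ in range(4)]
added = add_frames(frames, upper_limit=700)  # [[400, 700]]
```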

ゲームの進捗やユーザ操作などにより処理を終了させる必要が生じない間は（Ｓ４０のＮ）、次の時刻ｔ＝ｔ＋Δｔのフレームの撮影画像データを取得し（Ｓ４２、Ｓ３０）、Ｓ３２〜Ｓ３８の処理を繰り返す。処理を終了させる必要が生じたら、全ての処理を終了させる（Ｓ４０のＹ）。 While there is no need to end the processing due to the progress of the game, a user operation, or the like (N in S40), the captured image data of the frame at the next time t = t + Δt is acquired (S42, S30), and the processing of S32 to S38 is repeated. When it becomes necessary to end the processing, all processing is terminated (Y in S40).
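The loop of S30 to S42 can be outlined as below. This is only a sketch of the control flow: every callable passed to `run` is a hypothetical stand-in for one of the units described above, and the bounded `deque` stands in for the image data storage unit holding a predetermined number of past frames.

```python
from collections import deque


def run(capture, get_pose, correct, add, output, should_stop, num_past):
    """Repeat S30-S38 until should_stop() reports that processing must end (S40)."""
    history = deque(maxlen=num_past)  # past frames with their poses, oldest dropped first
    while True:
        frame = capture()                                        # S30: acquire current frame
        pose = get_pose(frame)                                   # S32: position and posture
        corrected = [correct(f, p, pose) for f, p in history]    # S34: align past frames
        output(add([frame] + corrected))                         # S36, S38: add and output
        history.append((frame, pose))
        if should_stop():                                        # S40: Y ends all processing
            break
        # S42: the next iteration handles the frame at t + Δt


# Toy run with integers standing in for frames and no real correction.
source = iter([1, 2, 3, 4])
shown = []
run(
    capture=lambda: next(source),
    get_pose=lambda frame: None,
    correct=lambda past_frame, past_pose, pose: past_frame,
    add=sum,
    output=shown.append,
    should_stop=lambda: len(shown) == 4,
    num_past=2,
)
# shown is now [1, 3, 6, 9]
```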

図１２は、図１１のＳ３６において、加算部１６６が画素領域をずらして画像を加算する様子を模式的に示している。この例では２つの画像１７０、１７２を加算する際の、２行３列分の画素の位置ずれを例示しており、画像１７０の画素の境界線を実線、画像１７２の画素の境界線を破線で示している。例えば画像１７０における画素１７４が、画像１７２における画素１７６に対応する。図示するように画素の境界を縦横双方向に半画素分ずらすと、一方の画像の画素領域は、他方の画像の画素領域の境界によって４分割される。 FIG. 12 schematically shows how the addition unit 166 shifts the pixel regions when adding images in S36 of FIG. 11. This example illustrates the positional offset of pixels over 2 rows and 3 columns when two images 170 and 172 are added; the pixel boundaries of image 170 are drawn as solid lines and those of image 172 as broken lines. For example, pixel 174 in image 170 corresponds to pixel 176 in image 172. When the pixel boundaries are shifted by half a pixel in both the vertical and horizontal directions as illustrated, each pixel region of one image is divided into four by the pixel boundaries of the other image.

例えば画素１７６は、「Ａ」、「Ｂ」、「Ｃ」、「Ｄ」の４つの領域に分割される。領域「Ａ」は他方の画像１７０の画素１７４の画素値と加算される。その他の領域はそれぞれ、画像１７０の、画素１７４に隣接する異なる画素の画素値と加算される。その結果、両者を加算した画像の解像度は、元の画像の４倍となる。この処理により、画素間を線形補間するのと比較し、より高い精度の高解像度画像を生成できる。 For example, pixel 176 is divided into four regions of "A", "B", "C", and "D". The region "A" is added to the pixel value of pixel 174 of the other image 170. Each of the other regions is added to the pixel value of a different pixel of the image 170 adjacent to the pixel 174. As a result, the resolution of the image obtained by adding both is four times that of the original image. By this processing, a high-resolution image with higher accuracy can be generated as compared with linear interpolation between pixels.
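The half-pixel shifted addition can be sketched on a grid of sub-cells that is twice as fine in each direction. In this minimal illustration the second image is assumed to be offset by half a pixel (one sub-cell) down and to the right, and sub-cells on the top and left edges that it does not cover reuse its nearest edge pixel. The function name `add_with_half_pixel_shift` and the edge handling are assumptions, not taken from the source.

```python
def add_with_half_pixel_shift(a, b):
    """Add image b, offset by half a pixel in both directions, to image a.

    Returns an image with twice the rows and columns of the inputs.
    """
    height, width = len(a), len(a[0])
    out = []
    for i in range(2 * height):
        row = []
        for j in range(2 * width):
            # Pixel of b covering this sub-cell; clamp at the top and left edges.
            b_row = max((i - 1) // 2, 0)
            b_col = max((j - 1) // 2, 0)
            row.append(a[i // 2][j // 2] + b[b_row][b_col])
        out.append(row)
    return out


a = [[1, 2], [3, 4]]
b = [[10, 20], [30, 40]]
fine = add_with_half_pixel_shift(a, b)
# The four sub-cells of b's top-left pixel (value 10) meet four different pixels of a:
# fine[1][1] == 11, fine[1][2] == 12, fine[2][1] == 13, fine[2][2] == 14
```

The output has four times as many pixels as either input, which matches the fourfold resolution described above.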

図示する例では縦方向と横方向のずらし量を同じとしているが、本実施の形態では被写体の回転角や並進量が得られているため、それに応じてずらし量に異方性を持たせてもよい。例えば補正前後の画素の移動量が大きい方向により大きくずらすようにしてもよい。また３フレーム以上の画像を加算する場合は特に、ずらし量は半画素分に限らず、それより小さい単位としてもよい。すなわち加算に際して各画像をずらす方向と量との組み合わせを、加算するフレーム数や被写体の位置および姿勢から導出できるように、あらかじめ規則を設定してもよい。 In the illustrated example, the shift amounts in the vertical and horizontal directions are the same, but since the rotation angles and translation amount of the subject are available in the present embodiment, the shift amount may be made anisotropic accordingly. For example, the shift may be made larger in the direction in which the pixels move more between before and after correction. In addition, particularly when three or more frames are added, the shift amount is not limited to half a pixel and may be a smaller unit. That is, a rule may be set in advance so that the combination of the direction and amount by which each image is shifted at the time of addition can be derived from the number of frames to be added and the position and posture of the subject.

以上述べた本実施の形態によれば、撮影画像を用いて表示を行う技術において、現フレームの直前に撮影された所定数の過去フレームの画像を加算して表示画像を生成する。これにより、被写体の像の輝度を、ノイズを増幅させることなく制御できるため、表示装置が対応可能な輝度レンジに応じて、より表現力のある高精細な画像を表示できる。 According to the present embodiment described above, in the technique of displaying using captured images, a predetermined number of images of past frames captured immediately before the current frame are added to generate a displayed image. As a result, the brightness of the image of the subject can be controlled without amplifying the noise, so that a more expressive and high-definition image can be displayed according to the brightness range that the display device can handle.

また被写体の３次元空間での位置や姿勢を各時刻で取得しておき、加算前の過去フレームの画像における像を、現フレームの時刻に合わせて補正する。これにより、過去フレームからの時間経過によって撮像面や被写体が動いていても、当該微小な動きをも影響させずに鮮明な画像を表示できる。これにより表示画像への影響を最小限としつつ、加算する過去フレームの数を自由に設定でき、望ましい輝度レンジへの変換を容易に実現できる。 Further, the position and posture of the subject in three-dimensional space are acquired at each time, and the image of the subject in each past frame is corrected to match the time of the current frame before addition. As a result, even if the imaging surface or the subject moves as time passes from the past frame, a clear image can be displayed without being affected by even such minute movement. The number of past frames to be added can therefore be set freely while keeping the influence on the displayed image to a minimum, and conversion to a desired luminance range can be realized easily.

なお本実施の形態の補正部１６４の機能を、実施の形態１の画像加算部５６に設けてもよい。この場合、モーションセンサの計測値などに基づき被写体の３次元空間での回転角および並進量を取得する状態情報取得部１５８を、情報処理装置１０にさらに設けてもよいし、画像解析部５８が当該処理を実施し、その結果を画像加算部５６に供給するようにしてもよい。これにより、被写体の微小な動きをも加味して加算画像を生成でき、特徴点の検出、ひいては位置情報を高精度に取得できる。また加算する過去フレーム数に対する自由度が増えるため、より多様な状況にある被写体の特徴点を正確に取得できる。 The function of the correction unit 164 of the present embodiment may be provided in the image addition unit 56 of the first embodiment. In this case, the information processing device 10 may additionally be provided with a state information acquisition unit 158 that acquires the rotation angles and translation amount of the subject in three-dimensional space based on motion sensor measurements or the like, or the image analysis unit 58 may perform that processing and supply the result to the image addition unit 56. This makes it possible to generate the added image while taking even minute movements of the subject into account, and thus to detect feature points, and in turn acquire position information, with high accuracy. In addition, since there is more freedom in choosing the number of past frames to be added, the feature points of subjects in more diverse situations can be acquired accurately.

以上、本発明を実施の形態をもとに説明した。上記実施の形態は例示であり、それらの各構成要素や各処理プロセスの組合せにいろいろな変形例が可能なこと、またそうした変形例も本発明の範囲にあることは当業者に理解されるところである。 The present invention has been described above based on embodiments. Those skilled in the art will understand that the above embodiments are examples, that various modifications are possible in the combinations of their components and processing steps, and that such modifications are also within the scope of the present invention.

例えば実施の形態１において画像解析部５８は、撮影画像を加算しない処理経路と加算する処理経路でそれぞれ独立に位置情報の取得を実施し、それらの結果を統合した。一方画像解析部５８は、それ以外の観点でも処理経路を分離してそれぞれで位置情報を取得し、結果を統合してもよい。例えば環境光を撮影したカラーステレオ画像を用いて位置情報を取得する経路と、特定の波長帯の光を撮影したステレオ画像を用いて位置情報を取得する経路を設けてもよい。 For example, in the first embodiment, the image analysis unit 58 independently acquires the position information in the processing path in which the captured image is not added and the processing path in which the captured image is added, and integrates the results. On the other hand, the image analysis unit 58 may separate the processing paths from other viewpoints, acquire position information for each, and integrate the results. For example, a path for acquiring position information using a color stereo image obtained by photographing ambient light and a path for acquiring position information using a stereo image obtained by photographing light in a specific wavelength band may be provided.

そして各処理経路において、撮影画像の加算／非加算や加算数が異なる処理経路にさらに分岐させてもよい。処理経路が増加するほど、被写体の状況変化に対しより高い頑健性で特徴点検出や位置情報取得を行える。処理経路の設定は、取得する情報に求められる精度や時間分解能、情報処理装置の処理性能、許容される通信帯域などに応じて適宜決定する。 Each processing path may then be further branched into paths that differ in whether the captured images are added and in how many are added. The more processing paths there are, the more robustly feature point detection and position information acquisition can be performed against changes in the subject's situation. The processing paths are set as appropriate according to the accuracy and temporal resolution required of the acquired information, the processing performance of the information processing device, the permissible communication bandwidth, and the like.

１情報処理システム、１０情報処理装置、１２撮像装置、１６表示装置、５２画像データ取得部、５４画像データ格納部、５６画像加算部、５８画像解析部、６０情報処理部、６２出力部、１５２画像データ取得部、１５４画像データ格納部、１５６画像加算部、１５８状態情報取得部、１６２出力部、１６４補正部、１６６加算部。 1 Information processing system, 10 Information processing device, 12 Imaging device, 16 Display device, 52 Image data acquisition unit, 54 Image data storage unit, 56 Image addition unit, 58 Image analysis unit, 60 Information processing unit, 62 Output unit, 152 Image data acquisition unit, 154 image data storage unit, 156 image addition unit, 158 status information acquisition unit, 162 output unit, 164 correction unit, 166 addition unit.

Claims

An information processing device comprising:
an image data acquisition unit that sequentially acquires image data of frames of a captured moving image;
an image addition unit that generates an added image by adding, at corresponding positions, the pixel values of the image of a past frame acquired earlier to the pixel values of the image of the newly acquired current frame;
an image analysis unit that extracts feature points from the added image and performs a predetermined analysis process, extracts feature points from the image of the current frame and performs the same analysis process, and integrates the results of both; and
an output unit that outputs data representing the integrated result.

The information processing apparatus according to claim 1, wherein the image data acquisition unit acquires, as the image data, stereo image data obtained by capturing the same space from left and right viewpoints, and
the image analysis unit obtains a depth image representing the distance to a subject from each of the stereo images of the added image and the stereo images of the current frame, based on the feature points extracted from them, and integrates the two to generate one depth image relating to subjects at different distances.

The image addition unit generates a plurality of the addition images in which the number of the past frames to be added is different.
The information processing apparatus according to claim 1 or 2, wherein the image analysis unit performs analysis processing based on feature points extracted from the plurality of added images.

The information processing apparatus according to claim 2 or 3, wherein the image adding unit determines the number of past frames to be added based on at least an assumed range of a distance to a subject.

The image data acquisition unit acquires image data of a frame in which the reflected light of light in a predetermined wavelength band irradiated in the subject space is captured as a moving image.
The information processing apparatus according to any one of claims 1 to 4, wherein the image analysis unit extracts an image of the reflected light from the added image as the feature point.

An information processing system comprising:
a head-mounted display equipped with an imaging device that captures a moving image in a field of view corresponding to the user's line of sight; and
an information processing device that generates, based on the moving image, data of a display image to be displayed on the head-mounted display,
wherein the information processing device comprises:
an image data acquisition unit that sequentially acquires image data of frames of the moving image;
an image addition unit that generates an added image by adding, at corresponding positions, the pixel values of the image of a past frame acquired earlier to the pixel values of the image of the newly acquired current frame;
an image analysis unit that extracts feature points from the added image and performs a predetermined analysis process, extracts feature points from the image of the current frame and performs the same analysis process, and integrates the results of both; and
an output unit that generates and outputs the data of the display image using the integrated result.

An image processing method performed by an information processing apparatus, comprising:
a step of sequentially acquiring image data of frames of a captured moving image and storing it in a memory;
a step of generating an added image by adding, at corresponding positions, the pixel values of the image of a past frame acquired earlier and read from the memory to the pixel values of the image of the newly acquired current frame;
a step of extracting feature points from the added image and performing a predetermined analysis process, extracting feature points from the image of the current frame and performing the same analysis process, and integrating the results of both; and
a step of outputting data representing the integrated result.

A computer program causing a computer to realize:
a function of sequentially acquiring image data of frames of a captured moving image;
a function of generating an added image by adding, at corresponding positions, the pixel values of the image of a past frame acquired earlier to the pixel values of the image of the newly acquired current frame;
a function of extracting feature points from the added image and performing a predetermined analysis process, extracting feature points from the image of the current frame and performing the same analysis process, and integrating the results of both; and
a function of outputting data representing the integrated result.