JP7376839B2

JP7376839B2 - 3D information estimation method, 3D information estimation device and program

Info

Publication number: JP7376839B2
Application number: JP2022569414A
Authority: JP
Inventors: 志織杉本; 隆行黒住; 英明木全
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2020-12-17
Filing date: 2020-12-17
Publication date: 2023-11-09
Anticipated expiration: 2040-12-17
Also published as: JPWO2022130555A1; WO2022130555A1

Description

The present invention relates to a three-dimensional information estimation method, a three-dimensional information estimation device, and a program.

Event-based vision sensors asynchronously detect the change in brightness at each pixel and, only for pixels whose brightness value has changed, output the coordinates, the time, and the sign of the change as event data. In the following, a camera equipped with a conventional image sensor may be referred to as a frame camera, and a camera equipped with an event-based vision sensor as an event camera.

A frame camera outputs, for each frame, the brightness value integrated at each pixel. In a frame camera, the integration time (that is, the exposure time) and the dynamic range of the signal are the same for all pixels. Therefore, when very bright and very dark pixels coexist, blown-out highlights, crushed shadows, quantization errors, and the like occur. Furthermore, when the brightness of the scene changes drastically, for example under the influence of lighting, the exposure time may not be adjusted in time, again resulting in blown-out highlights or crushed shadows.

In contrast, an event camera outputs event data asynchronously each time the brightness change at a pixel exceeds a certain threshold, so problems such as blown-out highlights, crushed shadows, and quantization errors do not arise. In addition, the event data output by an event camera is far sparser than the images output by a frame camera. Less memory is therefore needed to store the event data and less bandwidth to transmit it, and a very high temporal resolution can be achieved. For the same reason, the computation and power consumption required for capture and data processing can be kept very low compared with a frame camera.

Thus, an event camera makes it possible to perform machine vision stably, at low power, and with very high temporal resolution, even when there is little light or the lighting changes rapidly.

One three-dimensional position estimation method that uses an event camera is SLAM (Simultaneous Localization and Mapping) (see Non-Patent Document 1). In SLAM, a stationary subject is photographed while the camera is moved, and correspondences between frames are found for the local feature points detected in each frame, thereby estimating the camera motion and the three-dimensional positions of the feature points.

When SLAM is performed with a frame camera, features are extracted for every pixel of the image and inter-frame correspondences are estimated for all of them, which incurs a very high computational cost. Moreover, because many correspondences are estimated incorrectly, a large amount of additional computation is needed to estimate three-dimensional positions while accounting for these errors.

When SLAM is performed with an event camera, the amount of data to be processed is small, and the high temporal resolution allows the search range for corresponding points to be made very narrow. Consequently, compared with SLAM using a frame camera, SLAM using an event camera can estimate three-dimensional information quickly and stably.

Non-Patent Document 1: Free encyclopedia "Wikipedia", "SLAM" [retrieved December 2, 2020], Internet (URL: https://ja.wikipedia.org/wiki/SLAM)

However, SLAM cannot estimate three-dimensional information for a moving object. To estimate three-dimensional information for a moving object, a stereo method could be used. Because an event camera has no shutter function, however, it is difficult to synchronize separate event cameras with one another. In addition, an event-based vision sensor has a dormant (refractory) period: for a certain time after a pixel outputs an event, no measurement can be made at that pixel. As a result, if separate event cameras do not start measuring at the same moment, their event data patterns differ, and it is then difficult to find correspondences between the patterns.

Thus, there has been a problem in that it is difficult to estimate three-dimensional information of a moving object using an event camera.

In view of the above circumstances, an object of the present invention is to provide a technique for estimating three-dimensional information of a moving object.

One aspect of the present invention is a three-dimensional information estimation method in a three-dimensional information estimation device that focuses light rays emitted from one point on a subject surface onto a plurality of positions on an event-based vision sensor, the method comprising: an event information acquisition step of acquiring, when the amount of change in the brightness value of a pixel of the event-based vision sensor exceeds a certain value, event information including the position of that pixel and the time at which the amount of change in the brightness value exceeded the certain value; a correspondence step of associating two pieces of the event information, from among the plurality of pieces of event information acquired in the event information acquisition step, based on the times included in the event information; and an estimation step of estimating the distance to the subject based on the pixel positions included in the two pieces of event information associated in the correspondence step.

One aspect of the present invention is a three-dimensional information estimation device that focuses light rays emitted from one point on a subject surface onto a plurality of positions on an event-based vision sensor, the device comprising: an event information acquisition unit that acquires, when the amount of change in the brightness value of a pixel of the event-based vision sensor exceeds a certain value, event information including the position of that pixel and the time at which the amount of change in the brightness value exceeded the certain value; a correspondence unit that associates two pieces of the event information, from among the plurality of pieces of event information acquired by the event information acquisition unit, based on the times included in the event information; and an estimation unit that estimates the distance to the subject based on the pixel positions included in the two pieces of event information associated by the correspondence unit.

One aspect of the present invention is a program for causing a computer to execute the three-dimensional information estimation method.

According to the present invention, it becomes possible to estimate three-dimensional information of a moving object.

FIG. 1 is a block diagram showing the configuration of the three-dimensional information estimation device.
FIG. 2 is a diagram showing a schematic configuration of the imaging unit.
FIG. 3 is a view of the mask unit seen from the optical axis direction.
FIG. 4 is a flowchart showing the flow of the three-dimensional information estimation processing.
FIG. 5 is a flowchart showing the flow of the three-dimensional information estimation processing.

Embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a three-dimensional information estimation device 100 according to the embodiment. The three-dimensional information estimation device 100 comprises an imaging unit 110 and an event processing unit 120. When an event occurs, the imaging unit 110 outputs event data to the event processing unit 120. Here, an event means that the amount of change in the brightness value of a pixel of the event-based vision sensor (hereinafter, "event sensor") has exceeded a certain value. In this embodiment, the amount of change per predetermined time is used as the amount of change.

The event data includes the position of the pixel at which the event occurred, the time at which the amount of change in the brightness value exceeded the certain value (hereinafter also "timestamp"), and sign information indicating whether the brightness value increased or decreased. The sign information is plus when the brightness value increased and minus when it decreased. For typical subjects, events occur at individual pixels without synchronization, so event data is output each time an event occurs.
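As a non-limiting illustration, the three fields of the event data described above could be held in a small record such as the following Python sketch. The names `Event` and `sign_of_change`, the use of seconds for the timestamp, and the encoding of the sign as +1 or -1 are assumptions made for this example and are not specified in the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One piece of event data (hypothetical layout for illustration)."""
    x: int          # pixel column on the event sensor
    y: int          # pixel row on the event sensor
    t: float        # timestamp; seconds are an assumed unit
    polarity: int   # sign information: +1 (brightness increased) or -1 (decreased)


def sign_of_change(brightness_change: float) -> int:
    """Map a brightness change to the sign information carried by an event."""
    return 1 if brightness_change > 0 else -1


# Example: an event at pixel (5, 3) caused by a brightness increase.
example = Event(x=5, y=3, t=0.001, polarity=sign_of_change(+0.3))
```

A frozen dataclass is used here only so that events can be compared and stored safely; any equivalent record type would serve.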

The event processing unit 120 comprises an event information acquisition unit 121, a correspondence unit 122, an estimation unit 123, and an event information storage unit 124. The event information acquisition unit 121 acquires the event information output from the imaging unit 110 and stores it in the event information storage unit 124. The event information acquisition unit 121 may apply any filtering to remove noise.

The correspondence unit 122 associates two pieces of event information, from among the event information stored in the event information storage unit 124 by the event information acquisition unit 121, based on their timestamps. The associated pieces of event information are also called an event set. The association method is described later.

The estimation unit 123 estimates the distance from the three-dimensional information estimation device 100 to the subject based on the pixel positions included in the two associated pieces of event information. The three-dimensional information of the subject can be estimated from this distance. The three-dimensional information estimation device 100 may output the estimated distance to a host device such as a PC (Personal Computer).

FIG. 2 is a diagram showing a schematic configuration of the imaging unit 110. The imaging unit 110 comprises a lens 111, a mask unit 112, and an event sensor 113. FIG. 2 also shows a subject 115 and a focusing plane 116. In FIG. 2 the lens 111 and the mask unit 112 are drawn apart for clarity, but the actual distance between them is approximately zero. The symbols D, D', C, L, L', and R in FIG. 2 are used in the explanation below.

The mask unit 112 has a plurality of (two in FIG. 2) openings 112A and 112B. FIG. 3 is a view of the mask unit 112 seen from the optical axis direction, showing the mask unit 112 and the openings 112A and 112B. The openings 112A and 112B are drawn larger than they actually are; in practice, their diameter relative to the diameter of the mask unit 112 is far smaller than shown in FIG. 3.

In FIG. 2, light reflected at the surface of the subject 115 passes through the lens 111 and the openings 112A and 112B and is focused on the event sensor 113. In this way, by means of the openings 112A and 112B, the imaging unit 110 focuses light rays emitted from one point on the subject surface onto a plurality of positions on the event sensor 113. Light rays emitted from one point on the subject surface include light reflected at the subject surface as well as light emitted by the subject itself. The pixel position included in the event information is expressed in XY coordinates, treating the event sensor 113 as a two-dimensional coordinate plane. For example, the pixel located five pixels to the right of and three pixels above the pixel at the origin has coordinates (5, 3). Given a pixel pitch p, the distance from the origin (0, 0) to, for example, (7, 0) is 7p.

The association method used by the correspondence unit 122 will now be described. As shown in FIG. 2, light rays leaving a single point on the subject each pass through one of the openings 112A and 112B and reach two points on the event sensor 113. If the amount of change in the brightness value exceeds the certain value at the pixels corresponding to these two points, events occur at the same or nearly the same time. The correspondence unit 122 associates two such pieces of event information. The association may be performed whenever new event information is stored in the event information storage unit 124, or at fixed time intervals.

From the stored event information, the correspondence unit 122 groups, among the event information generated at a given time t, the pieces that contain the same sign information, and performs matching within each group to associate event information. Because the sign information takes two values, the event information may be split into two groups, plus and minus. Each such group is also called a candidate event group.

The event information regarded as having occurred at time t may be limited to event information whose timestamp exactly matches time t, or may be event information whose timestamps differ by no more than a predetermined range. An example of the latter is event information whose timestamps fall within a suitable time threshold around time t. If the time threshold is T, the neighborhood of time t extends from t - T to t + T, so the timestamps differ by at most 2T.

Depending on its specifications, the event sensor 113 may output event information line by line, for each horizontal or vertical line of the sensor. Timestamps may therefore be offset between events that occurred on different lines. The time threshold may be set with these offsets in mind, or the openings may be arranged along a horizontal or vertical line so that the event information for corresponding events is output at the same timing.

In this way, event information that satisfies a condition on the time it contains (hereinafter also "time condition") is further grouped by a condition on the sign information (hereinafter also "sign condition"). A candidate event group is therefore obtained when a plurality of pieces of event information satisfy the time condition and, after grouping by the sign condition, a plurality of pieces belong to the same group.
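The time condition and the sign condition described above can be sketched as follows. This is an illustrative example only: the `Event` tuple, the function name `candidate_event_groups`, and the representation of the sign as +1 or -1 are assumptions, and the time condition is taken to be the window from t - T to t + T mentioned above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Event(NamedTuple):
    x: int
    y: int
    t: float        # timestamp (assumed to be in seconds)
    polarity: int   # +1 or -1 (assumed encoding of the sign information)


def candidate_event_groups(events: Iterable[Event], t: float,
                           threshold: float) -> Dict[int, List[Event]]:
    """Return candidate event groups keyed by sign.

    Time condition: the timestamp lies within [t - threshold, t + threshold].
    Sign condition: events are grouped by identical sign information.
    A group is kept only if it contains at least two events.
    """
    groups: Dict[int, List[Event]] = defaultdict(list)
    for ev in events:
        if abs(ev.t - t) <= threshold:
            groups[ev.polarity].append(ev)
    return {sign: evs for sign, evs in groups.items() if len(evs) >= 2}


stored = [
    Event(10, 5, 1.0000, +1),
    Event(14, 5, 1.0001, +1),
    Event(30, 7, 1.0002, -1),   # only one minus event near t: no minus group
    Event(50, 9, 2.0000, +1),   # outside the time window
]
groups = candidate_event_groups(stored, t=1.0, threshold=0.001)
```

In this usage example only the plus group survives, because it is the only group with two or more events inside the window.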

Next, the correspondence unit 122 performs matching to determine event sets within a candidate event group. Event sets may be determined one candidate event group at a time, or an optimization may be performed that determines all event sets simultaneously.

Any matching method may be used. The simplest is to select any two pieces of event information from the candidate event group and take them as the event set. The candidate event group may also be narrowed further by other means. One example is, when matching a given piece of event information, to define an epipolar line based on the arrangement of the openings and to consider as candidates only event information that occurred on that epipolar line. Since only event information on the epipolar line needs to be processed, the amount of processing is reduced. Other possible matching methods include the template matching used in general image processing, cross-correlation, and methods using neural networks.
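As one hedged illustration of narrowing candidates with an epipolar line, the sketch below assumes that the two openings are arranged along a horizontal line, so that the two arrival points of one subject point lie on the same sensor row. Under that assumption the epipolar line of an event is simply its own row. The greedy pairing of neighbors along the row, the function name `match_on_rows`, and the `row_tolerance` parameter are choices made for this example and are not the matching method of the embodiment.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Event(NamedTuple):
    x: int
    y: int
    t: float
    polarity: int


def match_on_rows(candidates: Sequence[Event],
                  row_tolerance: int = 0) -> List[Tuple[Event, Event]]:
    """Greedily pair events that lie on the same row (assumed epipolar line).

    Events are sorted by (row, column); each unpaired event is paired with
    the next unpaired event whose row differs by at most row_tolerance.
    """
    ordered = sorted(candidates, key=lambda e: (e.y, e.x))
    used = set()
    pairs: List[Tuple[Event, Event]] = []
    for i, a in enumerate(ordered):
        if i in used:
            continue
        for j in range(i + 1, len(ordered)):
            if j in used:
                continue
            b = ordered[j]
            if abs(a.y - b.y) <= row_tolerance:
                pairs.append((a, b))
                used.update((i, j))
                break
    return pairs


group = [
    Event(10, 5, 1.0, 1), Event(14, 5, 1.0, 1),   # same row: paired
    Event(30, 7, 1.0, 1), Event(33, 7, 1.0, 1),   # same row: paired
    Event(50, 9, 1.0, 1),                          # no partner on its row
]
event_sets = match_on_rows(group)
```

Restricting the search to one row keeps the number of comparisons small, which is the processing reduction the text above refers to; a real implementation would also need a rule for rows containing more than two events.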

Event information that occurred at times before or after the timestamp included in a piece of event information in the candidate event group, at the same position or a nearby position, may also be used in the matching. Constraints may be imposed on the event information used for matching, based on event sets determined from event information at earlier or later times or on three-dimensional information already estimated. Furthermore, on the assumption that there is a correspondence between plus event sets and minus event sets, matching may be performed across the plus candidate event group and the minus candidate event group. For example, matching may be performed using plus and minus event information whose timestamps coincide.

Matching can also take into account the case in which, when light from a point A and a point B on the subject reaches the event sensor 113, one arrival point of each coincides. Let X be the coinciding point, and X_A and X_B the arrival points that do not coincide. At X, the brightness value changes according to the sum of the two light intensities, so event information may be output from X at a timing different from that at which event information is output from X_A and X_B. In this case, the number of pieces of event data output within a certain time after X outputs event information, and the pattern of brightness changes, are regarded as related to the superposition of the change patterns of the two light rays. X_A and X_B for which this relation holds can thereby be associated. Because the sign information may differ between the brightness change corresponding to A and that corresponding to B, matching may be performed using both the plus candidate event group and the minus candidate event group.

With reference to FIG. 2 described above, the method of estimating the distance from the three-dimensional information estimation device 100 to the subject will be described. First, in FIG. 2, the distance between the lens 111 and the opening 112A is taken to be 0, and likewise the distance between the lens 111 and the opening 112B is taken to be 0. Let f be the focal length of the lens 111, L the distance between the lens 111 and the event sensor 113, D the distance from the lens 111 to the in-focus position, R the distance between the opening 112A and the opening 112B, and D' the distance between the subject 115 and the lens 111.

Let L' be the distance between the lens and the plane in which the rays that passed through the opening 112A and the opening 112B intersect, and let C be the distance between the points at which those rays reach the event sensor 113. As noted above, the openings 112A and 112B are very small, so the blur of each arrival point on the event sensor 113 is very small and the points are assumed not to overlap. Under these conditions, the following (1) holds based on the lens imaging formula.

Furthermore, the following (2) holds among R, C, L', and L.

From (1) and (2) above, D' is obtained as shown in (3) below.

Accordingly, letting p be the pixel pitch and Δx the distance, in sensor coordinates, between the points at which the rays reach the event sensor 113, so that C = pΔx, (3) above becomes (4) below.
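For reference, relations (1) to (4) can be reconstructed from the definitions above under a thin-lens model. The following is a sketch under stated assumptions, and the exact form used in the original drawings may differ: R is taken to be the distance between the openings 112A and 112B, the event sensor is assumed to be focused at distance D, and the subject is assumed to lie beyond the focusing plane (D' > D), so that the two rays cross in front of the sensor (L' < L).

```latex
\begin{align}
\frac{1}{D} + \frac{1}{L} &= \frac{1}{f} = \frac{1}{D'} + \frac{1}{L'} \tag{1} \\
\frac{C}{R} &= \frac{L - L'}{L'} \tag{2} \\
D' &= \frac{D\,R\,L}{R\,L - C\,D} \tag{3} \\
D' &= \frac{D\,R\,L}{R\,L - p\,\Delta x\,D} \tag{4}
\end{align}
```

Here (2) follows from similar triangles between the opening separation R at the lens and the point separation C at the sensor. Solving (2) for 1/L' gives 1/L' = 1/L + C/(RL), and substituting into (1) yields 1/D' = 1/D - C/(RL), which is (3); replacing C with pΔx gives (4). As a consistency check, C = 0 gives D' = D, meaning a subject on the focusing plane produces a single point. For a subject in front of the focusing plane, the same steps give a plus sign in the denominator.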

Using the formula shown in (4), the three-dimensional information estimation device 100 estimates the distance from the three-dimensional information estimation device 100 to the subject.
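As a numerical illustration of this estimation step, the sketch below evaluates a thin-lens relation of the form D' = D·R·L / (R·L - p·Δx·D). This closed form is an assumption reconstructed from the quantities defined above (sensor focused at distance D, opening separation R, lens-to-sensor distance L, pixel pitch p, subject beyond the focusing plane); it is not quoted from the embodiment, and the function and parameter names are hypothetical.

```python
def estimate_distance(delta_x: float, pitch: float, focus_distance: float,
                      opening_separation: float, sensor_distance: float) -> float:
    """Estimate the subject distance D' from the pixel separation delta_x.

    Assumed relation (thin lens, subject beyond the focusing plane):
        D' = D * R * L / (R * L - p * delta_x * D)
    All lengths must use the same unit (meters in the example below).
    """
    c = pitch * delta_x                      # separation on the sensor, C = p * delta_x
    denominator = opening_separation * sensor_distance - c * focus_distance
    if denominator <= 0:
        # The separation is too large for this model (subject at or beyond infinity).
        raise ValueError("pixel separation is outside the valid range of the model")
    return focus_distance * opening_separation * sensor_distance / denominator


# Example values (all assumed): D = 1 m, R = 10 mm, L = 50 mm, p = 10 um.
on_focus = estimate_distance(0, 1e-5, 1.0, 0.01, 0.05)    # delta_x = 0 gives D itself
farther = estimate_distance(10, 1e-5, 1.0, 0.01, 0.05)    # about 1.25 m
```

A separation of zero returns the focusing distance, and a larger separation returns a larger distance, which matches the geometry in which the rays cross in front of the sensor.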

The flow of the processing executed by the three-dimensional information estimation device 100 described above will now be explained using flowcharts. FIGS. 4 and 5 are flowcharts showing the flow of the three-dimensional information estimation processing. They show the flow for the case in which event information is associated whenever new event information is stored in the event information storage unit 124.

In FIG. 4, when an event occurs (step S101: YES), the event information acquisition unit 121 acquires the event information output from the imaging unit 110 (step S102). The event information acquisition unit 121 stores the acquired event information in the event information storage unit 124 (step S103). The event information acquisition unit 121 then transmits an event occurrence MSG (message), indicating that an event has occurred, to the correspondence unit 122 (step S104), and returns to step S101.

Next, the flowchart of FIG. 5 will be described. In the flowchart of FIG. 5, the existence of a plurality of pieces of event information satisfying the time condition described above is expressed as "time condition satisfaction information exists". Likewise, the existence of a plurality of pieces of event information belonging to the same group after grouping by the sign condition is expressed as "sign condition satisfaction information exists".

In FIG. 5, on receiving the event occurrence MSG from the event information acquisition unit 121 (step S201: YES), the correspondence unit 122 determines whether time condition satisfaction information exists (step S202). If no time condition satisfaction information exists (step S202: NO), the correspondence unit 122 ends the processing.

If time condition satisfaction information exists (step S202: YES), the correspondence unit 122 determines whether sign condition satisfaction information exists (step S203). If no sign condition satisfaction information exists (step S203: NO), the correspondence unit 122 ends the processing. If sign condition satisfaction information exists (step S203: YES), the correspondence unit 122 performs the matching described above (step S204). The estimation unit 123 then estimates the distance from the three-dimensional information estimation device 100 to the subject using (4) above (step S205), and the processing ends.
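The flow of steps S202 to S205 can be sketched as a single function, shown below as an illustration only. The names `Event` and `handle_event_message` are hypothetical, the sign condition is checked only for the group of the newly stored event, the matching in step S204 is the simplest possible choice (the same-sign event closest in time), and the distance formula is passed in as a callable `distance_from_dx` rather than fixed, since its exact form is an assumption here.

```python
import math
from typing import Callable, NamedTuple, Optional, Sequence


class Event(NamedTuple):
    x: int
    y: int
    t: float
    polarity: int


def handle_event_message(stored: Sequence[Event], new_event: Event,
                         time_threshold: float,
                         distance_from_dx: Callable[[float], float]) -> Optional[float]:
    """Run steps S202 to S205 for one event occurrence message."""
    # S202: time condition, at least two stored events near the new timestamp.
    near = [e for e in stored if abs(e.t - new_event.t) <= time_threshold]
    if len(near) < 2:
        return None
    # S203: sign condition, at least two of them share the new event's sign.
    group = [e for e in near if e.polarity == new_event.polarity]
    if len(group) < 2:
        return None
    # S204: matching (simplified), pick the same-sign event closest in time.
    partners = [e for e in group if e != new_event]
    if not partners:
        return None
    partner = min(partners, key=lambda e: abs(e.t - new_event.t))
    # S205: estimate the distance from the separation in pixel coordinates.
    delta_x = math.hypot(partner.x - new_event.x, partner.y - new_event.y)
    return distance_from_dx(delta_x)


store = [Event(10, 5, 1.00000, 1), Event(20, 5, 1.00005, 1)]
result = handle_event_message(store, store[1], 1e-4, lambda dx: 2.0 * dx)
```

In the usage example, `lambda dx: 2.0 * dx` is a placeholder for the distance formula; the two stored events are 10 pixels apart, so the placeholder returns 20.0.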

Because the three-dimensional information estimation processing described above is executed each time an event occurs and the distance from the three-dimensional information estimation device 100 to the subject is thereby estimated, the three-dimensional information of the subject can be estimated.

In the embodiment described above, the mask unit 112 having the openings 112A and 112B may instead be located between the lens 111 and the subject. When the lens 111 is composed of a plurality of lenses, the mask unit 112 may be located between lenses. Furthermore, the subject may be located in front of the focusing plane.

A laser-scanning projector may also be used as the illumination directed at the subject. Because a laser-scanning projector moves the laser spot at high speed, only a single point is actually illuminated at any instant, even when a wide area appears illuminated to the naked eye. Consequently, at any time t there are at most two pieces of event information, and these are taken directly as the event set.
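With laser-scanning illumination, the pairing therefore reduces to grouping events by time alone. The following sketch illustrates this under the assumption that the two events caused by one laser spot have timestamps within a small threshold of each other and that spots are separated by more than that threshold; the names `Event` and `pair_scanned_events` are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Event(NamedTuple):
    x: int
    y: int
    t: float
    polarity: int


def pair_scanned_events(events: Sequence[Event],
                        time_threshold: float) -> List[Tuple[Event, Event]]:
    """Pair consecutive events whose timestamps differ by at most time_threshold."""
    ordered = sorted(events, key=lambda e: e.t)
    pairs: List[Tuple[Event, Event]] = []
    i = 0
    while i + 1 < len(ordered):
        a, b = ordered[i], ordered[i + 1]
        if b.t - a.t <= time_threshold:
            pairs.append((a, b))   # the two events form an event set directly
            i += 2
        else:
            i += 1                 # isolated event: no partner, skip it
    return pairs


scan = [
    Event(10, 5, 0.00000, 1), Event(14, 5, 0.00001, 1),   # first laser spot
    Event(40, 8, 0.00100, 1), Event(43, 8, 0.00101, 1),   # second laser spot
    Event(70, 2, 0.00500, 1),                              # only one event observed
]
scanned_sets = pair_scanned_events(scan, time_threshold=1e-4)
```

No spatial search is needed in this case, which is why the text above states that the events are determined as the event set as they are.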

Alternatively, any projector may be used to illuminate the subject with a discrete dot pattern. If the arrangement of the openings and the dot pattern are designed so that the distance between the points that one dot produces on the event sensor through the openings is sufficiently smaller than the distance to the points produced by adjacent dots, matching can be performed without having to consider overlapping arrival points. Moreover, switching such illumination at high speed yields more event information, and hence denser three-dimensional information.

また、本実施形態では、マスク部１１２は、２つの開口部１１２Ａ、１１２Ｂを有しているが、３つ以上の開口部を有するようにしてもよい。開口部の配置位置として、例えば開口部の中心が全て同一直線上となる位置が挙げられる。開口部が３つの場合には、開口部の配置位置として、三角形の頂点が挙げられる。 Further, in this embodiment, the mask section 112 has two openings 112A and 112B, but it may have three or more openings. One example arrangement places the centers of all the openings on the same straight line. When there are three openings, they may instead be placed at the vertices of a triangle.

また、開口部に代えて、マイクロレンズアレイやその他の光学素子を使用して複数の像をイベントセンサ上に到達させるようにしてもよい。 Further, instead of the aperture, a microlens array or other optical element may be used to cause a plurality of images to reach the event sensor.

なお、通常のイメージセンサを備えるカメラ（以下、「フレームカメラ」という）に、本実施形態と同様の開口部を設ける場合、イメージセンサにおいて２つの像が重畳して観測されるため、各開口部を互いに異なるカラーチャンネルに対応させて像を分離するなどしない限り３次元情報推定は不可能である。 Note that when a camera equipped with a normal image sensor (hereinafter referred to as a "frame camera") is provided with openings similar to those of this embodiment, the two images are observed superimposed on the image sensor. Three-dimensional information estimation is therefore impossible unless the images are separated, for example by making each opening correspond to a different color channel.

一方、本実施形態の３次元情報推定装置１００と同様に、イベントセンサを備えるカメラ（以下、「イベントカメラ」という）は被写体、照明、イベントカメラが動いた際に発生する輝度の変化量を観測するため、限られた部分でしかイベントが発生しないことから２つの像に対応するイベントが重畳する可能性が低い。 On the other hand, a camera equipped with an event sensor (hereinafter referred to as an "event camera"), like the three-dimensional information estimation device 100 of this embodiment, observes the amount of change in brightness that occurs when the subject, the illumination, or the event camera itself moves. Events therefore occur only in limited regions, so there is a low possibility that events corresponding to the two images will overlap.

また、フレームカメラの出力画像でマッチングを行う場合、基本的に一画素の輝度情報でのマッチングは困難であるため、空間的に隣接した画素の輝度情報も使用するブロックマッチングを行う。イベントカメラでは一画素について輝度の変化を非常に高い分解能で観測するため、その変化パターンである時系列データを使用することで隣接画素の情報を使用せずにマッチングを行うことができる。したがって、イベントセンサでは、２つの像に対応するイベントの発生位置がごく近くに位置などの理由でブロックマッチングが不可能であっても探索が容易である。 Furthermore, when matching is performed on the output images of a frame camera, matching on the luminance information of a single pixel is generally difficult, so block matching, which also uses the luminance information of spatially adjacent pixels, is performed. An event camera observes changes in brightness at each pixel with very high resolution, so matching can be performed using the time-series data of that change pattern, without using information from adjacent pixels. Therefore, with an event sensor, the search remains easy even when block matching is impossible, for example because the events corresponding to the two images occur very close together.
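The single-pixel time-series matching described above can be sketched as follows. Each pixel is represented by its own sequence of (timestamp, sign) events, and two pixels are compared using only those sequences, with no information from neighbouring pixels. The cost function (mean absolute timestamp difference, with mismatched sign sequences rejected) and the names timeseries_cost and best_match are assumptions for illustration; the embodiment does not specify a cost in this passage.

```python
from math import inf
from typing import Dict, List, Optional, Tuple

Series = List[Tuple[int, int]]  # hypothetical layout: [(t_us, polarity), ...] sorted by time
Pixel = Tuple[int, int]         # (x, y)


def timeseries_cost(a: Series, b: Series) -> float:
    """Mean absolute timestamp difference; inf if the sign sequences differ."""
    if not a or len(a) != len(b):
        return inf
    if any(pa != pb for (_, pa), (_, pb) in zip(a, b)):
        return inf
    return sum(abs(ta - tb) for (ta, _), (tb, _) in zip(a, b)) / len(a)


def best_match(target: Series, candidates: Dict[Pixel, Series]) -> Optional[Pixel]:
    """Return the candidate pixel whose event time series best matches the target."""
    if not candidates:
        return None
    best = min(candidates, key=lambda px: timeseries_cost(target, candidates[px]))
    return best if timeseries_cost(target, candidates[best]) < inf else None
```

A candidate pixel adjacent to the target can still be told apart in this sketch as long as its change pattern differs, which is the situation where block matching on a frame image would fail.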

以上の理由により、イベントカメラに複数の開口部を設けることで、フレームカメラでの同様の構成では不可能な、単一のチャンネルで３次元情報を推定することができる。 For the above reasons, by providing a plurality of apertures in the event camera, three-dimensional information can be estimated using a single channel, which is not possible with a similar configuration of a frame camera.

イベント処理部１２０は、ＣＰＵ（Central Processing Unit）等のプロセッサーとメモリとを用いて構成されてもよい。この場合、イベント処理部１２０は、プロセッサーがプログラムを実行することによって、イベント処理部１２０として機能する。なお、イベント処理部１２０の各機能の全て又は一部は、ＡＳＩＣ（Application Specific Integrated Circuit）やＰＬＤ（Programmable Logic Device）やＦＰＧＡ（Field Programmable Gate Array）等のハードウェアを用いて実現されても良い。上記のプログラムは、コンピュータ読み取り可能な記録媒体に記録されても良い。コンピュータ読み取り可能な記録媒体とは、例えばフレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ－ＲＯＭ、半導体記憶装置（例えばＳＳＤ：Solid State Drive）等の可搬媒体、コンピュータシステムに内蔵されるハードディスクや半導体記憶装置等の記憶装置である。上記のプログラムは、電気通信回線を介して送信されてもよい。 The event processing unit 120 may be configured using a processor such as a CPU (Central Processing Unit) and a memory. In this case, the event processing unit 120 functions as the event processing unit 120 when the processor executes a program. Note that all or some of the functions of the event processing unit 120 may be realized using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The above program may be recorded on a computer-readable recording medium. Computer-readable recording media are, for example, portable media such as flexible disks, magneto-optical disks, ROMs, CD-ROMs, and semiconductor storage devices (for example, SSDs: Solid State Drives), as well as storage devices such as hard disks and semiconductor storage devices built into computer systems. The above program may be transmitted via a telecommunications line.

以上、この発明の実施形態について図面を参照して詳述してきたが、具体的な構成はこの実施形態に限られるものではなく、この発明の要旨を逸脱しない範囲の設計等も含まれる。 Although the embodiment of the present invention has been described above in detail with reference to the drawings, the specific configuration is not limited to this embodiment, and also includes designs and the like within a scope that does not depart from the gist of the present invention.

本発明は、イベントベースビジョンセンサを備えた３次元情報推定装置に適用可能である。 The present invention is applicable to a three-dimensional information estimation device equipped with an event-based vision sensor.

１００…３次元情報推定装置、１１０…撮像部、１１１…レンズ、１１２…マスク部、１１２Ａ、１１２Ｂ…開口部、１１３…イベントセンサ、１１５…被写体、１１６…合焦面、１２０…イベント処理部、１２１…イベント情報取得部、１２２…対応部、１２３…推定部、１２４…イベント情報記憶部 DESCRIPTION OF SYMBOLS 100... Three-dimensional information estimation device, 110... Imaging unit, 111... Lens, 112... Mask section, 112A, 112B... Opening, 113... Event sensor, 115... Subject, 116... Focus plane, 120... Event processing unit, 121... Event information acquisition unit, 122... Correspondence unit, 123... Estimation unit, 124... Event information storage unit

Claims

1. A three-dimensional information estimation method in a three-dimensional information estimation device that focuses light rays emitted from one point on a subject surface onto multiple positions on an event-based vision sensor, the method comprising:
an event information acquisition step of acquiring, when the amount of change in brightness value at a pixel of the event-based vision sensor exceeds a certain value, event information including the position of that pixel and the time at which the amount of change in brightness value exceeded the certain value;
a matching step of associating two pieces of event information among the plurality of pieces of event information acquired in the event information acquisition step, based on the times included in the event information; and
an estimating step of estimating a distance to the subject based on the positions of the pixels included in the two pieces of event information associated in the matching step.

2. The three-dimensional information estimation method according to claim 1, wherein the event information includes sign information indicating an increase or decrease in brightness value, and
the matching step associates two pieces of event information that include the same sign information.

3. The three-dimensional information estimation method according to claim 1, wherein the matching step associates two pieces of event information whose times are the same or whose times differ by an amount within a predetermined range.

4. The three-dimensional information estimation method according to any one of claims 1 to 3, wherein the estimating step estimates the distance to the subject using the distance between the positions of the pixels included in the two pieces of event information.

5. A three-dimensional information estimation device that focuses light rays emitted from one point on a subject surface onto multiple positions on an event-based vision sensor, the device comprising:
an event information acquisition unit that acquires, when the amount of change in brightness value at a pixel of the event-based vision sensor exceeds a certain value, event information including the position of that pixel and the time at which the amount of change in brightness value exceeded the certain value;
a correspondence unit that associates two pieces of event information among the plurality of pieces of event information acquired by the event information acquisition unit, based on the times included in the event information; and
an estimating unit that estimates a distance to the subject based on the positions of the pixels included in the two pieces of event information associated by the correspondence unit.

6. A program for causing a computer to execute the three-dimensional information estimation method according to any one of claims 1 to 4.