GB2521429A - Visual Servoing - Google Patents

Visual Servoing

Info

Publication number
GB2521429A
GB2521429A GB1322595.8A GB201322595A GB2521429A GB 2521429 A GB2521429 A GB 2521429A GB 201322595 A GB201322595 A GB 201322595A GB 2521429 A GB2521429 A GB 2521429A
Authority
GB
United Kingdom
Prior art keywords
image
micro
actuator
imaging device
lens
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1322595.8A
Other versions
GB201322595D0 (en)
Inventor
Benoit Vandame
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to GB1322595.8A priority Critical patent/GB2521429A/en
Publication of GB201322595D0 publication Critical patent/GB201322595D0/en
Publication of GB2521429A publication Critical patent/GB2521429A/en
Withdrawn legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/0014 Image feed-back for automatic industrial control, e.g. robot with camera
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697 Vision controlled systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/97 Determining parameters from multiple pictures
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/30 Nc systems
    • G05B2219/39 Robotics, robotics to robotics hand
    • G05B2219/39391 Visual servoing, track end effector with camera image feedback
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/30 Nc systems
    • G05B2219/39 Robotics, robotics to robotics hand
    • G05B2219/39393 Camera detects projected image, compare with reference image, position end effector
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10052 Images from lightfield camera

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Robotics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mechanical Engineering (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

An apparatus comprising a moveable actuator and an imaging device coupled to the actuator, in which the actuator is controlled by visual servoing, based on images from the imaging device and a reference image. The invention employs an imaging device including a micro-lens array, with each micro-lens having its own polarisation filter, with each filter having a different polarising direction. Current images captured by the imaging device are compared to a reference image to create an error image which is then modified to take into account the effects of specular reflection, which are determined by comparing micro-images corresponding to different polarisation directions for the reference image and/or the current image. The actuator is then controlled using the modified error image. Embodiments of the present invention may be employed in so-called eye-in-hand cameras which are mounted on or embedded in the actuator of a robotic apparatus.

Description

Visual Servoing
The present invention relates to methods and apparatus concerning visual servoing.
Visual servoing methods aim at guiding a robot from a current position to a reference position using a camera. One application of visual servoing is the so-called "eye-in-hand" robot or apparatus, whereby a camera is mounted on the actuator of the apparatus, for instance a robotic arm. Among visual servoing methods, the so-called photometric visual servoing (PVS) algorithm was introduced by the LAGADIC team at INRIA, Rennes, France; see "Visual servoing set free from image processing", Christophe Collewet, Eric Marchand, Francois Chaumette, ICRA, 2008. This algorithm defines a control law which computes the speed of the robot from the subtraction of the current image from the reference image. The control law iteratively updates the speed of the robot until the current image is equal to the reference image.
The PVS algorithm offers very precise guiding: the current position converges to the reference position with great accuracy.
A so-called 4D light-field camera typically refers to a multi-array camera made of a lens array and a single sensor. This type of camera can be used for robot guiding based on PVS. The advantage of a lens-array camera, having for example N x N lenses or micro-lenses, over a single-lens camera is that for the same field of view and sensor, the camera thickness is typically shorter by a factor N, and the depth of field is enlarged by a factor N. Using a multi-array camera for robot guiding based on a PVS algorithm could therefore be advantageous, since it can be made shorter in depth than a single-lens camera. Unfortunately the accuracy of the guiding is typically decreased by a factor N with an N x N multi-array camera, as compared to a single-lens camera.
Light-field cameras record 4D (four-dimensional) light-field data which can be transformed into various reconstructed images, such as re-focused images with a freely selected focal distance, that is, the depth of the image plane which is in focus. A re-focused image is built by projecting the various 4D light-field pixels into a 2D (two-dimensional) image. Unfortunately the resolution of a re-focused image varies with the focal distance.
First let us consider light-field cameras which record a 4D light-field on a single sensor (a 2-dimensional regular array of pixels). The 4D light-field is recorded by a multi-camera array: an array of lenses and the sensor. Figure 1 illustrates a light-field camera with these two elements: the lens array and the sensor. Optionally, spacers may be located between the micro-lens array, around each lens, and the sensor, to prevent light from one lens from overlapping with the light of other lenses at the sensor side. The lenses of the lens array can be designed with a small radius, and hence the lenses which make up an array may be referred to as micro-lenses.
4D Light-Field data
The sensor of a light-field camera records an image which is made of a collection of 2D small images arranged within a 2D image. Each small image is produced by the lens (i,j) from the array of lenses.
Figure 3 illustrates the image which is recorded at the sensor. The sensor of a light-field camera records an image of the scene which is made of a collection of 2D micro-images, also called small images, arranged within a 2D image. Each small image is produced by a lens from the array of lenses. Each small image is represented by a circle, the shape of that small image being a function of the shape of the micro-lens. A pixel of the sensor is located by its coordinates (x, y). p is the distance in pixels between the centres of two contiguous micro-lens images. The micro-lenses are chosen such that p is larger than a pixel width. A micro-lens image is referenced by its coordinates (i, j). Some pixels might not receive any light from any micro-lens; those pixels are discarded. Indeed, the space between the micro-lenses can be masked to prevent photons falling outside of a lens (if the micro-lenses are square or another close-packed shape, no masking is needed). However most of the pixels receive the light from one micro-lens. The pixels are associated with four coordinates (x, y) and (i, j). The centre of the micro-lens image (i, j) on the sensor is labelled (x_{i,j}, y_{i,j}). Figure 3 illustrates the first micro-lens image (0, 0) centred on (x_{0,0}, y_{0,0}). The pixel rectangular lattice and the micro-lens rectangular lattice are not rotated. The coordinate (x_{i,j}, y_{i,j}) can be written as a function of the 3 parameters p and (x_{0,0}, y_{0,0}):

(x_{i,j}, y_{i,j}) = p (i, j) + (x_{0,0}, y_{0,0})    (1)

Figure 3 also illustrates how an object, represented by the black squares 3, in the scene is simultaneously visible on numerous micro-lens images. The distance w between two consecutive imaging points of the same object 3 on the sensor is known as the disparity. The disparity depends on the physical distance between the camera and the object. w converges to p as the object distance z from the camera tends to infinity.
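As a minimal illustration (not part of the original disclosure), equation (1) can be written as a short Python function; the names p, x00 and y00 follow the text, and the numeric values in the example are invented:

    def micro_image_centre(i, j, p, x00, y00):
        # Centre (x_ij, y_ij) of the micro-lens image (i, j), equation (1):
        # a non-rotated square lattice of pitch p pixels, anchored at the
        # centre (x00, y00) of micro-image (0, 0).
        return (p * i + x00, p * j + y00)

    # Example with assumed values: p = 100 pixels, first centre at (50, 50);
    # micro-image (2, 1) is then centred at pixel (250.0, 150.0).
    print(micro_image_centre(2, 1, p=100.0, x00=50.0, y00=50.0))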
Geometrical property of the light-field camera
The previous section introduced w, the disparity of a given observed object, and p, the distance between two consecutive micro-lens images. Both distances are defined in pixel units. They are converted into physical distances (in metres) W and P by multiplying w and p respectively by the pixel size δ of the sensor: W = δw and P = δp. The distances W and P can be computed knowing the characteristics of the light-field camera. Figure 2 gives a schematic view of the light-field camera having the following features.
* The lens array is made of N by N lenses having a focal distance f. The pitch of the micro-lenses is P. The micro-lenses might have any shape, such as circular or square. The diameter of the shape is less than or equal to P.
One can consider the particular case where the lenses are pinholes. With pinholes the following equations remain valid by setting f = d.
* The sensor is made of a square lattice of pixels having a physical size of δ. δ is in units of metres per pixel. The sensor is located at the fixed distance d from the micro-lens array.
* The object is located at the distance z from the lens array. The distance between the lens array and the sensor is d. The disparity of the observed object between two consecutive lenses is equal to W. The distance between two lens image centres is P.
From the Thales law we can derive that:

W / P = (z + d) / z    (2)

Or:

W = P (1 + d/z)    (3)

This equation gives the relation between the physical object located at distance z from the lens array and the disparity W of the corresponding views of that object. This relation is built using geometrical considerations and does not assume that the object is in focus at the sensor side. The focal length f of the micro-lenses and other properties such as the lens apertures determine whether the micro-lens images observed on the sensor are in focus. In practice, one tunes the distance d once and for all using the relation:

1/f = 1/z_{focus} + 1/d    (4)

The micro-lens image of an object located at distance z from the micro-lens array appears in focus as long as the circle of confusion is smaller than the pixel size. In practice the range [z_min, z_max] of distances z which allows observing focused micro-images is large and can be optimised depending on the focal length f and the distance d: for instance one could tune the micro-lens camera to have a range of z from 1 meter to infinity, [1, ∞]. Embodiments of the presently proposed invention however do not adopt this approach.
Variation of the disparity

The light-field camera being designed, the values d and f are tuned and fixed. The disparity W varies with the object distance z. One notes special values of W:

* W_{focus} is the disparity for an object at the distance such that the micro-lens images are exactly in focus; it corresponds to equation (4). Mixing equations (3) and (4) one obtains:

W_{focus} = P d / f    (5)

* W_α is the disparity for an object located at distance α z_{focus} from the lens array. According to equation (3) one obtains:

W_α = P (1 + d / (α z_{focus}))    (6)

The variation of disparity is an important property of the light-field camera. The ratio W_α / W_{focus} is a good indicator of the variation of disparity. Indeed, the micro-lens images of objects located at z_{focus} are sharp, and the light-field camera is designed to observe objects around z_{focus} which are also in focus. The ratio is computed with equations (5) and (6):

W_α / W_{focus} = (d + (α - 1) f) / (α d)    (7)

The ratio is close to 1.0 for α around 1.0. In practice the variations of disparity are within a few percent around W_{focus}.
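For illustration, the relations (3) to (7) can be checked numerically with a short Python sketch; the values of P, f and z_focus below are assumptions, and d is derived from equation (4):

    def disparity(P, d, z):
        # Physical disparity W of an object at distance z from the lens array, eq. (3).
        return P * (1.0 + d / z)

    def disparity_in_focus(P, d, f):
        # Disparity W_focus of an object exactly in focus, eq. (5).
        return P * d / f

    def disparity_ratio(d, f, alpha):
        # Ratio W_alpha / W_focus for an object at distance alpha * z_focus, eq. (7).
        return (d + (alpha - 1.0) * f) / (alpha * d)

    # Assumed example values: micro-lens pitch P = 2 mm, focal length f = 2 mm,
    # camera tuned with eq. (4) so that objects at z_focus = 1 m are in focus.
    P, f, z_focus = 2e-3, 2e-3, 1.0
    d = 1.0 / (1.0 / f - 1.0 / z_focus)      # sensor distance from eq. (4)
    print(disparity_in_focus(P, d, f))        # W_focus, about 2.004 mm
    print(disparity(P, d, 2.0 * z_focus))     # W at alpha = 2, about 2.002 mm
    print(disparity_ratio(d, f, 2.0))         # about 0.999: within a few percent of 1.0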
The present inventor has further brought to light the following aspects.
A potential difficulty faced by visual servoing techniques concerns specular reflections. PVS typically assumes that all visible objects in the scene are reflecting light according to the so-called Lambertian reflectance. Lambertian reflectance is obtained with a material having a surface such that the apparent brightness is the same for all observers, whatever the observer's angle of view. However, certain materials exhibit specular as well as Lambertian reflectance. Specular reflection is observed for instance with glossy materials such as metals and plastics. Specular reflections depend on the observer's angle of view and can adversely affect the PVS algorithm because the specular reflections are not necessarily located at the same 3D position in the reference image and the current image.
According to a first aspect of the invention there is provided an apparatus comprising: a moveable actuator; an imaging device coupled to said actuator; a controller adapted to receive images from said imaging device, and control movement of said actuator; said imaging device comprising: a plurality of micro-lenses, and polarising means arranged such that at least two micro-lenses have different polarisation directions and one or more photo-sensors having an array of pixels, each micro-lens of the array projecting an image of a scene on an associated region of the or each photo-sensor forming a micro-image; wherein said controller is configured to compare a reference image with a current image captured by the imaging device for a current position of the actuator, to derive an error image; compare the micro-images corresponding to different polarisation directions for at least one of the reference image or the current image; modify the error image according to said micro-image comparison; and control said actuator based on said modified error image.
In embodiments, each micro-lens of the array has a polarization filter, each filter having a different direction of polarization. It is noted that the effect of the plurality of micro-lenses is to form multiple micro-images which correspond to substantially the same scene (some variation will result if the optical axes of the microlenses are offset). The multiple images can be captured through polarising means having different directions of polarisation, enabling specular reflections to be identified. A light field camera is one arrangement for achieving this, whereby multiple lenses are arranged in conjunction with a single sensor, and registration between sub images is convenient as a result of the single sensor.
However, it is noted that the imaging device could comprise independent sub-imaging devices, each having a separate lens and sensor, but arranged in cooperation to image the same, or substantially the same scene. In such a case, each separate lens corresponds to each micro-lens of a light field camera producing an equivalent micro-image or sub-image, and accordingly, the polarising means is arranged such that at least two separate lenses have different polarisation directions.
In this way, image regions corresponding to specular reflections can be identified and excluded for the purposes of controlling the actuator. In one embodiment therefore the error image is modified by excluding image portions for which the difference between micro-images corresponding to different polarisation directions is above a threshold.
In embodiments, a micro-image mask is constructed, the value of each pixel or region of the mask being determined by the difference between pairs of micro images having different polarization directions, and wherein said error image is modified by masking using said micro-image mask. Advantageously, a pixel of said micro-image mask is set to be excluded if any difference between compared micro-images is greater than a threshold.
This arrangement further affords the advantage that a sensor or camera having a shorter depth dimension can be provided. In addition, an extended depth of field compared to a single-lens camera can be provided. In embodiments, the image sensor is coupled to the actuator in such a way that movements of the actuator are replicated by the image sensor, i.e. in a rigid manner.
In embodiments the controller is adapted to control the velocity of the actuator, based on the difference between the reference image and the current image.
Preferably the velocity of the actuator is iteratively updated by the controller for each new image acquired by said imaging device, until the current image and the reference image are substantially the same, or equivalently the velocity is substantially zero. In certain embodiments the velocity may be calculated as a function of the difference between the reference image and the current image, the function being determined by an interaction matrix. As will be explained below, in embodiments of the invention the interaction matrix depends on the distance of an object in a captured image from the sensor, and in such embodiments an estimate of said distance can be derived from said imaging device having a micro-lens array. For example, because a light-field camera having such a micro-lens array is able to capture a scene from slightly different angles of view, an estimate of object depth can be made.
In embodiments the plurality of microlenses comprises a square array of NxN micro-lenses. For example the imaging device may include a lens array having a total of 4 lenses, in a 2x2 array pattern, or a total of nine lenses, in a 3x3 array pattern, or a total of 16 lenses, in a 4x4 array pattern.
In embodiments including a square array of NxN micro-lenses, the interaction matrix, which determines the relationship between the velocity of the actuator and the difference between the reference image and the current image, is a combination of NxN decentred matrices.
Embodiments of the present invention may be employed in so-called eye-in-hand cameras which are mounted on or embedded in the actuator of a robotic apparatus. Such cameras are expected to be as small and lightweight as possible. The thickness of a camera is determined by the focal length of the lens.
To make a camera smaller one option is to use a light-field camera made of a micro-lens array mounted on a single sensor. The distance from the micro-lens array to the sensor is equal to the focal length of the micro-lenses. That distance is typically N times smaller than for a single-lens camera, where N x N is the number of micro-lenses covering the sensor.
Each micro-lens forms a small micro-image. The field-of-view of each micro-lens is almost identical to the field-of-view of the single-lens camera assuming same sensor size and N times shorter focal-length. Another advantage of this design is that the depth-of-field of the micro-images is N times larger than the depth-of-field of the single-lens camera. This is a great advantage since the depth-of-field is often quite limited when a single-lens is used with a large aperture to collect more light with short exposure time.
The invention also provides a computer program and a computer program product for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein, and a computer readable medium having stored thereon a program for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein.
The invention extends to methods, apparatus and/or use substantially as herein described with reference to the accompanying drawings. Any feature in one aspect of the invention may be applied to other aspects of the invention, in any appropriate combination. In particular, features of method aspects may be applied to apparatus aspects, and vice versa. Furthermore, features implemented in hardware may generally be implemented in software, and vice versa. Any reference to software and hardware features herein should be construed accordingly.
Embodiments of the invention will now be described, by way of example only, and with reference to the following drawings:
Figure 1 - Schematic view of a multi-array camera made of a lens array and a single sensor.
Figure 2 - Detailed view of a light-field camera made of perfect lenses.
Figure 3 - Schematic view of the 4D light-field data recorded by the 2D image sensor of a light-field camera.
Figure 4 - Projection of a 3D point into the sensor of a single-lens camera.
Figure 5 - Projection of a 3D point into the sensor through a decentred lens.
Figure 6 - Pixel coordinates of the micro-images on the sensor.
Figure 7 - An articulated actuator including an image sensor.
Figure 8 - Control process flow including specular reflection compensation.
Figure 9 - Schematic illustration of variation of polarised light with 2x2 sub-images.
Figure 10 - Schematic illustration of mask.
Introduction to visual servoing
Embodiments will be described relating to the so-called eye-in-hand robot: the camera is mounted on or into the robot arm and rigidly follows the effector (as illustrated in Figure 7). A visual servoing algorithm uses images captured by the camera to control the motion of the robot. The robot is controlled by 3 translations and 3 rotations, the so-called 6 degrees of freedom. Visual servoing algorithms are control laws which drive the robot from an initial position to a reference position. Special features s are observed by the camera (for instance the image coordinates of interest points). s* contains the desired or reference values of the features. The error e(t) is the error between the desired features extracted from the image at the reference position and the features extracted from the current position. The aim of the control law is to move the robot so as to minimise the error e(t). The control law is operated in a closed loop and causes the robot to converge to the reference position. The control law defines the relation between e(t) and the velocity of the robot v:

v = -λ L^+ e = -λ L^+ (s - s*)    (8)

with v = (υ, ω), where υ is the linear camera velocity and ω is its angular velocity.
L is the interaction matrix which describes the relationship between the time variation of s and the camera velocity v (L is a Jacobian matrix). L^+ is the approximation of the pseudo-inverse of L. The various control algorithms which can be used to guide the robot from an initial position to a reference position are grouped into 2 different approaches: Image Based Visual Servoing (IBVS) and Position Based Visual Servoing (PBVS). In the second case, the 2D extracted features s are reverse-projected to estimate the 3D position of the object being tracked. This latter case is actually 3D servoing.
IBVS is performed only on 2D features extracted from the image. At a reference position the features s* are known and serve as a reference for the 2D features extracted from a current position. Figure 4 illustrates a pinhole model of the camera: a point P = (X, Y, Z) in 3D space is projected onto the camera image plane at p = (x, y, -f) through the pinhole located at (0, 0, 0). The coordinate system (X, Y, Z) is relative to the pinhole. The distance f from the camera image plane to the pinhole corresponds to the focal length of a perfect thin lens located at the pinhole. Using the perspective equation one deduces that (x, y) = (fX/Z, fY/Z).
By derivation of the projection equation and using the kinematic relation, one deduces the interaction matrix L_p of a given projected point p:

L_p = [ -f/Z    0      x/Z    xy/f         -(f + x^2/f)    y
         0     -f/Z    y/Z    f + y^2/f    -xy/f          -x ]    (9)
The interaction matrix is a Jacobian matrix which establishes the relation between the derivative of the 2D feature position (2 rows) and the derivative of the robot position (6 columns corresponding to the robot motion). (x, y) are real coordinates (not pixel coordinates), which are equal to zero on the main optical axis.
The interaction matrix L_p of the projected point p depends notably on the distance Z. The value Z is the distance of the point P relative to the pinhole. Therefore, any control algorithm that uses this form of the interaction matrix must estimate or approximate Z.
A light-field camera delivers various viewing angles which can be used to roughly evaluate the distance of the objects visible in the scene. This rough estimation could be used to evaluate the Z parameter of the interaction matrix.
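For illustration only, equation (9) can be written as a small Python function; x and y are metric image-plane coordinates, Z the depth of the point and f the focal length, and the numeric values in the example are assumptions:

    import numpy as np

    def interaction_matrix_point(x, y, Z, f):
        # 2x6 interaction matrix of an image point, equation (9): relates the time
        # derivative of (x, y) to the 6 degrees of freedom (3 translations, 3 rotations).
        return np.array([
            [-f / Z, 0.0,    x / Z, x * y / f,     -(f + x * x / f),  y],
            [0.0,   -f / Z,  y / Z, f + y * y / f, -x * y / f,       -x],
        ])

    # Assumed values; Z could come from the rough depth estimate of the light-field camera.
    L_p = interaction_matrix_point(x=0.01, y=-0.02, Z=1.5, f=0.002)
    print(L_p.shape)   # (2, 6)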
Photometric Visual Servoing

Photometric Visual Servoing (PVS) is an IBVS control law where the features are the luminance of the pixels. An estimation of the interaction matrix is required to control the motion of the actuator. The interaction matrix L_I related to the luminance I(p, t) at a time t is equal to (assuming a Lambertian scene):

L_I = -(∇I_x L_x + ∇I_y L_y)    (10)

where L_x and L_y are respectively the first and second rows of the interaction matrix given in (9), and ∇I_x and ∇I_y are the components along x and y of the gradient ∇I of image I.
Images I recorded by the camera have a size of N_x × N_y pixels with a pixel pitch δ. The interaction matrix L_{I*} is an N_x N_y by 6 matrix. It is built with the reference image I*. The row [k N_x + l] associated with the pixel I*(k, l) is computed as follows:

L_{I*}[k N_x + l] = [ f dx/Z    f dy/Z    -(x dx + y dy)/Z    -xy dx/f - (f + y^2/f) dy    (f + x^2/f) dx + xy dy/f    -y dx + x dy ]    (11)

where dx = (I*(k-1, l) - I*(k+1, l))/δ is the derivative along the k axis and dy = (I*(k, l-1) - I*(k, l+1))/δ is the derivative along the l axis. (k, l) is a pixel coordinate which is converted into the normalised coordinate (x, y) such that (x, y) = (k/N_x - 1/2, l/N_y - 1/2). The value Z is set to the average distance of the objects visible in the image.
Basic control law

At the current position of the actuator, and hence of the camera, an image I is captured. The error between the visual features at the current position and the visual features at the reference position is equal to the image difference: e = I - I*. The velocity v of the robot is computed with equation (8). The pseudo-inverse L_{I*}^+ of the interaction matrix L_{I*} can be computed simply as L_{I*}^+ = (L_{I*}^T L_{I*})^{-1} L_{I*}^T. Equation (8) becomes:

v = -λ (L_{I*}^T L_{I*})^{-1} L_{I*}^T (I - I*)    (12)

By construction, L_{I*} is an N_x N_y by 6 matrix. Thus L_{I*}^T is a 6 by N_x N_y matrix, and L_{I*}^T L_{I*} is a 6 by 6 matrix which is then inverted. With this formulation, -λ (L_{I*}^T L_{I*})^{-1} L_{I*}^T is computed once, knowing the reference image I*. The computation of L_{I*}^T (I - I*) requires just 6 N_x N_y multiplications and additions, which are performed at each guiding iteration. The vector v contains the motion of the robot (6 values). λ is typically chosen equal to 1; larger values often make the robot unstable or oscillate around the reference position. The error e converges to 0 as the actuator converges to the reference position. The cost function c is defined as the energy of the error e. The actuator motion is controlled by v, which is updated at each iteration: guidance is performed in closed loop, a new image is acquired and a new robot velocity is computed and applied. The control algorithm is operated iteratively, converging to a robot velocity substantially equal to zero.
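A hedged sketch (not the patent's implementation) of how equations (10) to (12) fit together is given below in Python; the pixel-to-metric conversion, the boundary handling and the row ordering are simplifications, and Z and delta are assumed constants:

    import numpy as np

    def luminance_interaction_matrix(I_ref, Z, f, delta):
        # One 6-element row per pixel of the reference image, equations (10)-(11).
        Ny, Nx = I_ref.shape
        rows = []
        for l in range(Ny):            # row ordering matches reshape(-1) below
            for k in range(Nx):
                # finite-difference luminance gradients (wrap-around at the borders)
                dx = (I_ref[l, (k - 1) % Nx] - I_ref[l, (k + 1) % Nx]) / delta
                dy = (I_ref[(l - 1) % Ny, k] - I_ref[(l + 1) % Ny, k]) / delta
                x, y = k / Nx - 0.5, l / Ny - 0.5      # normalised pixel coordinates
                Lx = np.array([-f / Z, 0, x / Z, x * y / f, -(f + x * x / f),  y])
                Ly = np.array([0, -f / Z, y / Z, f + y * y / f, -x * y / f,   -x])
                rows.append(-(dx * Lx + dy * Ly))      # equation (10)
        return np.array(rows)                          # (Nx * Ny, 6)

    def pvs_velocity(L, I, I_ref, lam=1.0):
        # Control law, equation (12): v = -lambda (L^T L)^-1 L^T (I - I_ref).
        # np.linalg.pinv(L) could be used instead for extra robustness.
        e = (I - I_ref).reshape(-1).astype(float)
        return -lam * np.linalg.solve(L.T @ L, L.T @ e)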
Interaction matrix dedicated to light-field camera
Let us consider a light-field camera made of N × N micro-lenses covering a sensor. The interaction matrix L_p from equation (9) needs to be updated for this light-field camera. Actually, one should derive N × N interaction matrices: one for each micro-lens. To start with, let us define the interaction matrix of a single decentred micro-lens.
The micro-lens is decentred versus the camera coordinate system. Figure 5 illustrates the pinhole model of a decentred lens. The pinhole is located at a distance of (X_d, Y_d) from the camera centre O. The coordinate system (O, x, y) of the camera imaging plane is decentred by (x_d, y_d) versus the camera centre.
The difference from a centred camera resides in the rotation axes, which do not pass through the middle of the pinhole but through the middle of the camera O. The interaction matrix must therefore include the decentring of the rotation axes versus the pinhole. Knowing this, the projection equation gives: x = f(X - X_d)/Z and y = f(Y - Y_d)/Z. The interaction matrix becomes:

L_p' = [ -f/Z    0      x/Z    xy/f + x Y_d/Z         -(f + x^2/f + x X_d/Z)    y + f Y_d/Z
          0     -f/Z    y/Z    f + y^2/f + y Y_d/Z    -(xy/f + y X_d/Z)        -(x + f X_d/Z) ]    (13)

The light-field camera is made of N × N micro-lenses with N × N corresponding micro-images recorded at the sensor. Actually, N^2 interaction matrices are computed, one for each decentred micro-lens, where (X_{i,j}, Y_{i,j}) is the decentring of the micro-lens (i, j) as illustrated in Figure 6. A global interaction matrix L_p^{N×N} is built by concatenating all the N^2 interaction matrices L_p'(X_{i,j}, Y_{i,j}); L_p^{N×N} becomes a 2N^2 by 6 matrix (this matrix has N^2 times more rows than the L_p matrix defined in equation (9) because a point P forms N^2 points p on the camera image plane).
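As a sketch (not part of the original disclosure), the decentred matrix (13) and its concatenation into L_p^{N×N} might be implemented as follows; the sign conventions follow the reconstruction above and reduce to equation (9) when the decentring (Xd, Yd) is zero:

    import numpy as np

    def interaction_matrix_decentred(x, y, Z, f, Xd, Yd):
        # 2x6 interaction matrix of a point seen through a pinhole decentred by
        # (Xd, Yd) from the camera centre, equation (13).
        return np.array([
            [-f / Z, 0.0,   x / Z,
             x * y / f + x * Yd / Z, -(f + x * x / f + x * Xd / Z),  y + f * Yd / Z],
            [0.0,   -f / Z, y / Z,
             f + y * y / f + y * Yd / Z, -(x * y / f + y * Xd / Z), -(x + f * Xd / Z)],
        ])

    def global_interaction_matrix(x, y, Z, f, decentrings):
        # Concatenate one 2x6 block per micro-lens into the (2 N^2) x 6 matrix L_p^{NxN}.
        return np.vstack([interaction_matrix_decentred(x, y, Z, f, Xd, Yd)
                          for Xd, Yd in decentrings])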
The interaction matrix L_p' of a projected point p being characterised, the interaction matrix L_{I*}^{N×N} of the luminance of the reference image pixels is computed as described by equation (10). A pixel of coordinate (k, l) is located under the micro-lens (i, j) = (⌈kN/N_x⌉, ⌈lN/N_y⌉), where ⌈ ⌉ denotes the integer ceiling function, and is converted into the normalised coordinate (x, y) such that (x, y) = (kN/N_x - 1/2, lN/N_y - 1/2). Images I recorded by the light-field camera have a size of N_x × N_y pixels with a pixel pitch δ. The interaction matrix L_{I*}^{N×N} is an N_x N_y by 6 matrix. It is built with the reference image I*. The row [k N_x + l] associated with the pixel I*(k, l), which belongs to the micro-lens (i, j), is computed as follows:

L_{I*}^{N×N}[k N_x + l] = [ f dx/Z    f dy/Z    -(x dx + y dy)/Z    -(xy/f + x Y_{i,j}/Z) dx - (f + y^2/f + y Y_{i,j}/Z) dy    (f + x^2/f + x X_{i,j}/Z) dx + (xy/f + y X_{i,j}/Z) dy    -(y + f Y_{i,j}/Z) dx + (x + f X_{i,j}/Z) dy ]    (8)

where dx = (I*(k-1, l) - I*(k+1, l))/δ is the derivative along the k axis and dy = (I*(k, l-1) - I*(k, l+1))/δ is the derivative along the l axis. The value Z is set to the average distance of the objects visible in the image.
The actuator motion is controlled by v, which is updated at each iteration:

v = -λ (L_{I*}^{N×N})^+ (I - I*)    (9)

Guidance is performed in closed loop: a new image is acquired and a new actuator velocity is computed and applied. The control algorithm is operated iteratively, converging to a robot velocity substantially equal to zero.
It is proposed, in one embodiment, to mount N × N polarization filters on top of the N × N micro-lenses with various different polarization angles. Ideally the polarization filter (i, j) located on top of the micro-lens (i, j) is positioned to have a polarization angle:

θ_t = (⌊t/N⌋ + N (t mod N)) · 180° / N^2    (10)

where t is the index of the micro-lens (i, j), ⌊t/N⌋ is an integer division, and t mod N is the modulo N of t. With this choice of orientations, the N^2 polarization angles are uniformly distributed on the half circle from 0° to 180°. With this arrangement of polarization filters, it is possible to identify the effects of polarized reflections from various objects. Indeed, if the light received by the camera from a given object is partially or fully polarized, then the light flux recorded on the various sub-images will vary. The pixels affected by the variation of polarization can be identified and discarded, and thus the effect on actuator guidance is reduced or avoided.
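By way of illustration only, one possible assignment of the N^2 polariser angles and of the perpendicular couples is sketched below in Python. Since the exact form of equation (10) is not fully legible in the source, the interleaving used here is an assumption chosen merely to satisfy the stated property of uniform coverage of 0° to 180°:

    import numpy as np

    def polariser_angles(N):
        # Angle of the filter on micro-lens t (t = i*N + j assumed), uniformly
        # covering [0, 180) degrees in steps of 180/N^2.
        t = np.arange(N * N)
        return (t // N + (t % N) * N) * 180.0 / (N * N)

    def perpendicular_pairs(N):
        # Pair each sub-image with the one whose polariser is closest to 90 deg away;
        # floor(N^2 / 2) couples in total, as stated in the text.
        angles = polariser_angles(N)
        order = np.argsort(angles)
        half = (N * N) // 2
        return [(int(order[t]), int(order[(t + half) % (N * N)])) for t in range(half)]

    print(polariser_angles(2))     # [  0.  90.  45. 135.] for a 2x2 array
    print(perpendicular_pairs(2))  # [(0, 1), (2, 3)]: 0/90 deg and 45/135 deg couples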
The proposed embodiment comprises the following steps (as illustrated in Figure 8). In step 801, the actuator is positioned at a given reference position. At that position the reference image I* is acquired. The interaction matrix L_{I*}^{N×N} is computed in step 802 as described by equation (8).
In step 803, the actuator is positioned at an initial position and the robot motion is set to 0. Then, in step 804, the image I at the current position is acquired.
In step 805, polarised reflections are masked out. The current image and the reference image might record polarised light. Polarised light is due to specular reflections on the objects visible in the observed scene. Polarised reflections appear with various intensities on the N^2 sub-images (i, j), thanks to the polarization filters. The goal of this step is to identify the variations of flux between the sub-images and to record them on a mask M having the size of the sensor image (N_x by N_y pixels). This mask will be used to remove the influence of marked pixels when computing the actuator motion by comparing the current image with the reference image.
The mask M which records the variation of flux between the sub-images is computed as follows in the present embodiment: 1. Sub-image mask creation - A sub-image mask M_s is created. It is made of N_x/N by N_y/N pixels, as per the sub-image. By default all pixels of M_s are set to 1, which indicates that the corresponding pixels are retained for computing the actuator motion.
2. Selecting the sub-images in pairs - The variations of intensity on the N^2 sub-images are detected by comparing the sub-images in pairs whose polarization filters are oriented substantially perpendicularly: each sub-image S(i, j) is compared with the sub-image whose polarization angle is offset by approximately 90°. In total ⌊N^2/2⌋ couples of sub-images are compared.
3. Updating the sub-image mask - Each couple of sub-images is compared by subtraction, such that D = |S(i, j) - S(i', j')|, where (i', j') denotes the paired sub-image. D is substantially equal to zero if there is no variation in polarization. Pixels (k, l) of the image D which have values larger than a threshold defined by the user are set to 0 on the sub-image mask: M_s(k, l) = 0. The sub-image mask is updated with the reference image, and also with the current image.
4. Building the complete mask M - The sub-image mask M_s records all variations of flux observed on the various sub-aperture images. The complete mask M having the size of the sensor is built by tiling the sub-image mask M_s N × N times. By doing this, any object affected by polarization is masked out in all the sub-aperture images.
The procedure to compute the mask M can be applied with the reference image; with the current image; or ideally with the reference image and then with the current image such that the polarization reflections are masked out on both the current and reference images.
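A minimal Python sketch of steps 1 to 4 follows; the inputs subimages (a mapping from sub-image index to pixel array), pairs (couples of sub-images with roughly perpendicular polarisers) and threshold are assumed names, not terms from the patent:

    import numpy as np

    def build_sub_image_mask(subimages, pairs, threshold):
        # Steps 1-3: start with an all-ones mask (every pixel retained), then zero out
        # pixels where paired, perpendicularly polarised sub-images differ too much.
        shape = next(iter(subimages.values())).shape
        M_sub = np.ones(shape)
        for a, b in pairs:
            D = np.abs(subimages[a] - subimages[b])    # step 3: subtraction of a couple
            M_sub[D > threshold] = 0.0                 # mark polarised reflections
        return M_sub

    def complete_mask(M_sub, N):
        # Step 4: tile the sub-image mask N x N times up to the full sensor size.
        return np.tile(M_sub, (N, N))

    # The same procedure can be run on the reference image and on the current image,
    # and the two resulting masks combined with an element-wise minimum.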
Figure 9 gives a schematic illustration of a 2 by 2 camera observing specular light which is polarized. The specular light is at a maximum on the bottom-left sub-image, and almost invisible on the top-left sub-image. The polarisation filter directions are shown by solid double-headed arrows. The distance between the specular light observed in 2 consecutive sub-images is equal to the disparity w. The sub-image mask M_s is updated with the variations of flux between sub-images. The complete mask M is computed by padding the sub-image mask M_s as illustrated in Figure 10 (a dark value corresponds to a null value on the mask).
In step 806 the error image between the reference image I* and the current image I is computed, taking into consideration the mask M. For any pixel (k, l), the error image is computed as: e(k, l) = (I*(k, l) - I(k, l)) M(k, l). The error image e is then used to compute the actuator motion v = -λ (L_{I*}^{N×N})^+ e (as illustrated in equation (8)). The actuator motion is controlled to the new speed v.
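Step 806 might be sketched as follows, under the same assumptions as the earlier control-law sketch; L_pinv denotes a pre-computed pseudo-inverse of the light-field interaction matrix and lam the gain λ:

    def masked_error_and_velocity(I_ref, I, M, L_pinv, lam=1.0):
        # Masked error image: e(k, l) = (I*(k, l) - I(k, l)) * M(k, l).
        e = ((I_ref - I) * M).reshape(-1)
        # New actuator speed; the sign of lam must be kept consistent with the
        # convention used when L_pinv was pre-computed (equation (12) writes I - I*).
        return -lam * (L_pinv @ e)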
In step 807 the actuator guidance is stopped if a maximum number of iterations has been reached. In step 808 the actuator reaches the reference position if the energy of the residual image e is sufficiently small, as determined by comparison with a threshold for example. If the actuator does not reach the reference position, it means that the algorithm is not able to converge from the initial position.
It will be understood that the present invention has been described above purely by way of example, and modification of detail can be made within the scope of the invention. Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

Claims (13)

  1. An apparatus comprising: a moveable actuator; an imaging device coupled to said actuator; a controller adapted to receive images from said imaging device, and control movement of said actuator, said imaging device comprising: a plurality of micro-lenses, and polarising means arranged such that at least two micro-lenses have different polarisation directions and a photo-sensor having an array of pixels, each micro-lens of the array projecting an image of a scene on an associated region of the photo-sensor forming a micro-image; wherein said controller is configured to compare a reference image with a current image captured by the imaging device for a current position of the actuator, to derive an error image; compare the micro-images corresponding to different polarisation directions for at least one of the reference image or the current image; modify the error image according to said micro-image comparison; and control said actuator based on said modified error image.
  2. Apparatus according to Claim 1, wherein each micro-lens of the array has a polarization filter, each filter having a different direction of polarization.
  3. Apparatus according to Claim 1 or Claim 2, wherein said error image is modified by excluding image portions for which the difference between micro-images corresponding to different polarisation directions is above a threshold.
  4. Apparatus according to any preceding claim, wherein said controller is configured to construct a micro-image mask, the value of each pixel of the mask being determined by the difference between pairs of micro images having different polarization directions, and wherein said error image is modified by masking using said micro-image mask.
  5. Apparatus according to claim 4, wherein a pixel of said micro-image mask is set to be excluded if any difference between compared micro-images is greater than a threshold.
  6. Apparatus according to claim 4 or claim 5, wherein a complete mask is constructed by padding said micro-image mask.
  7. Apparatus according to any preceding claim, wherein the controller is adapted to control the velocity of the actuator, based on the modified error image.
  8. Apparatus according to Claim 7, wherein the velocity of the actuator is iteratively updated by the controller for each new image acquired by said imaging device.
  9. Apparatus according to Claim 7 or Claim 8, wherein the velocity is calculated as a function of the modified error image, the function being determined by an interaction matrix.
  10. Apparatus according to Claim 9, wherein the interaction matrix depends on the distance of an object in a captured image from the sensor, and wherein an estimate of said distance is derived from said imaging device having a microlens array.
  11. Apparatus according to any preceding claim, wherein the micro-lenses are arranged in a square array of NxN micro-lenses.
  12. Apparatus according to Claim 6 as dependent upon Claim 4, wherein said interaction matrix is a combination of NxN decentred matrices.
  13. A method of controlling an apparatus comprising: an actuator; an imaging device coupled to said actuator; a controller adapted to receive images from said imaging device, and control movement of said actuator, said imaging device comprising: a plurality of micro-lenses; polarising means arranged such that at least two micro-lenses have different polarisation directions; and a photo-sensor having an array of pixels, each micro-lens of the array projecting an image of a scene on an associated region of the photo-sensor forming a micro-image; said method comprising: comparing a reference image with a current image captured by the imaging device for a current position of the actuator; determining effects of specular reflection by comparing the micro-images corresponding to different polarisation directions for at least one of the reference image or the current image; and controlling said actuator in dependence upon differences between the current image and the reference image, wherein said differences are compensated for the determined effects of specular reflection.
GB1322595.8A 2013-12-19 2013-12-19 Visual Servoing Withdrawn GB2521429A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1322595.8A GB2521429A (en) 2013-12-19 2013-12-19 Visual Servoing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1322595.8A GB2521429A (en) 2013-12-19 2013-12-19 Visual Servoing

Publications (2)

Publication Number Publication Date
GB201322595D0 GB201322595D0 (en) 2014-02-05
GB2521429A true GB2521429A (en) 2015-06-24

Family

ID=50071151

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1322595.8A Withdrawn GB2521429A (en) 2013-12-19 2013-12-19 Visual Servoing

Country Status (1)

Country Link
GB (1) GB2521429A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105023249A (en) * 2015-06-26 2015-11-04 清华大学深圳研究生院 Highlight image restoration method and device based on optical field
CN108527360A (en) * 2018-02-07 2018-09-14 唐山英莱科技有限公司 A kind of location position system and method
CN110977987A (en) * 2019-12-25 2020-04-10 浙江省北大信息技术高等研究院 Mechanical arm hand-eye calibration method, device and system
CN111360827A (en) * 2020-03-06 2020-07-03 哈尔滨工业大学 Visual servo switching control method and system
CN111515944A (en) * 2020-03-30 2020-08-11 季华实验室 Automatic calibration method for non-fixed path robot
CN111738941A (en) * 2020-06-05 2020-10-02 大连海事大学 Underwater image optimization method fusing light field and polarization information
CN111890356A (en) * 2020-06-30 2020-11-06 深圳瀚维智能医疗科技有限公司 Mechanical arm coordinate system and camera coordinate system calibration method, device, equipment and medium
US20210252700A1 (en) * 2020-02-18 2021-08-19 Harbin Institute Of Technology Hybrid visual servoing method based on fusion of distance space and image feature space

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117901120A (en) * 2024-03-08 2024-04-19 北京纳通医用机器人科技有限公司 Mechanical arm control method, device, equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120294509A1 (en) * 2011-05-16 2012-11-22 Seiko Epson Corporation Robot control system, robot system and program
US20130123985A1 (en) * 2010-02-15 2013-05-16 Hideaki Hirai Transparent object detection system and transparent flat plate detection system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130123985A1 (en) * 2010-02-15 2013-05-16 Hideaki Hirai Transparent object detection system and transparent flat plate detection system
US20120294509A1 (en) * 2011-05-16 2012-11-22 Seiko Epson Corporation Robot control system, robot system and program

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105023249A (en) * 2015-06-26 2015-11-04 清华大学深圳研究生院 Highlight image restoration method and device based on optical field
CN105023249B (en) * 2015-06-26 2017-11-17 清华大学深圳研究生院 Bloom image repair method and device based on light field
CN108527360A (en) * 2018-02-07 2018-09-14 唐山英莱科技有限公司 A kind of location position system and method
CN110977987B (en) * 2019-12-25 2021-07-20 杭州未名信科科技有限公司 Mechanical arm hand-eye calibration method, device and system
CN110977987A (en) * 2019-12-25 2020-04-10 浙江省北大信息技术高等研究院 Mechanical arm hand-eye calibration method, device and system
US20210252700A1 (en) * 2020-02-18 2021-08-19 Harbin Institute Of Technology Hybrid visual servoing method based on fusion of distance space and image feature space
US11648682B2 (en) * 2020-02-18 2023-05-16 Harbin Institute Of Technology Hybrid visual servoing method based on fusion of distance space and image feature space
CN111360827A (en) * 2020-03-06 2020-07-03 哈尔滨工业大学 Visual servo switching control method and system
CN111360827B (en) * 2020-03-06 2020-12-01 哈尔滨工业大学 Visual servo switching control method and system
CN111515944A (en) * 2020-03-30 2020-08-11 季华实验室 Automatic calibration method for non-fixed path robot
CN111515944B (en) * 2020-03-30 2021-09-17 季华实验室 Automatic calibration method for non-fixed path robot
CN111738941A (en) * 2020-06-05 2020-10-02 大连海事大学 Underwater image optimization method fusing light field and polarization information
CN111738941B (en) * 2020-06-05 2023-08-29 大连海事大学 Underwater image optimization method integrating light field and polarization information
CN111890356A (en) * 2020-06-30 2020-11-06 深圳瀚维智能医疗科技有限公司 Mechanical arm coordinate system and camera coordinate system calibration method, device, equipment and medium

Also Published As

Publication number Publication date
GB201322595D0 (en) 2014-02-05

Similar Documents

Publication Publication Date Title
GB2521429A (en) Visual Servoing
Ihrke et al. Principles of light field imaging: Briefly revisiting 25 years of research
US10348947B2 (en) Plenoptic imaging device equipped with an enhanced optical system
US8290358B1 (en) Methods and apparatus for light-field imaging
JP5929553B2 (en) Image processing apparatus, imaging apparatus, image processing method, and program
JP5831105B2 (en) Imaging apparatus and imaging method
US9100639B2 (en) Light field imaging device and image processing device
US9041770B2 (en) Three-dimensional image capture device
CN103109536B (en) One-eyed stereo camera head, one-eyed stereo camera head shadow correction method
US20180213217A1 (en) Equipment and method for promptly performing calibration and verification of intrinsic and extrinsic parameters of a plurality of image capturing elements installed on electronic device
EP3395059B1 (en) Method and apparatus for computational scheimpflug camera
CN115335768B (en) Imaging system with rotatable reflector
EP3054335B1 (en) Focus adjustment device, imaging device and focus adjustment method
Nayar Computational cameras: approaches, benefits and limits
GB2501936A (en) Micro-lens array with micro-lens subsets displaced from a regular lattice pattern
KR102408142B1 (en) Apparatus and method for taking lightfield image
JP2007517264A (en) Multidimensional imaging apparatus, system and method
Hahne et al. Baseline of virtual cameras acquired by a standard plenoptic camera setup
US20140354777A1 (en) Apparatus and method for obtaining spatial information using active array lens
JP2022514766A (en) A device equipped with a multi-aperture image pickup device for accumulating image information.
KR20200038835A (en) Image sensor and method to sense image
KR101608753B1 (en) Method and apparatus for generating three dimensional contents using focal plane sweeping
JP2007102068A (en) Photographing apparatus for taking all-around stereoscopic image
KR20140140495A (en) Aparatus and method for obtaining spatial information using active lens array
GB2520261A (en) Visual servoing

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)