EP1000318A2

EP1000318A2 - Scanning apparatus and methods

Info

Publication number: EP1000318A2
Application number: EP98937633A
Authority: EP
Inventors: Christopher Peter Flockhart; Guy Richard John Fowler
Original assignee: Tricorder Technology PLC
Current assignee: Tricorder Technology PLC
Priority date: 1997-07-31
Filing date: 1998-07-31
Publication date: 2000-05-17
Also published as: WO1999006950A2; GB9716240D0; GB2328280B; GB9816756D0; GB2328280A; WO1999006950A3; JP2001512241A; AU8636298A; CA2299426A1

Abstract

An arrangement for acquiring the 3D shape of an object (O) comprises a monocular (Figure 9) or a stereoscopic (Figure 8) digital camera arrangement optionally provided with an inertial sensor (11) which indicates the position and/or orientation of the camera(s) in relation to the object either to facilitate the derivation of 3D information from overlapping images (in the former case) or the combination of 3D data sets (in the latter case). In a variant utilising a single freely movable camera (Figures 10, 13 and 14) the inertial sensor is dispensed with or used only to determine relative rotation, and at least the translation of the camera relative to the object (O, O') is determined by tracking a network of points (a, b, c; a', b', c') in the image (I). In other embodiments (Figures 3, 4 and 6) an image processor utilises converging epipolar lines (EPa to EPd) to constrain a search for correlated regions of the images or a projected fractal pattern (Figures 15 and 15A) whose correlated regions (P, P') can be identified from their identical topology in both images.

Description

Scanning apparatus and methods

The present mvent.on relates to a scanning apparatus and method for acquirin_o the 5 three-α.mensⁱonal shape^, size or other three-dimensional surface features of an object such as colour for example. The invention relates particularly but not exclusively to hand-held or other freely movable 3D scanners.

As far as we are aware only two hand-held scanners are known, namely that described _{] ()} ⁱn US 4.627^,734 (Schulz⁾ and its equivalent EP-A-553,226 and that described in our own PCT/GB95/01994 and equivalent granted patent GB 2.292.605B The Schulz scanner arrangement utilises a fixed optical system to determine the instantaneous posⁱtⁱon and orⁱentatⁱon of the scanner and one embodiment of the scanner in our PCT applⁱcatⁱon utⁱlises inertial sensors on the scanner to determine its position and orⁱentatⁱon. Other embodiments utilise software techniques to combine separate 5 ⁱndⁱvⁱdually acquⁱred 3D surface portions into an overall 3D surface descπption. Both the Schulz patents and our own PCT application disclose scanners whose optical arrangements acquire depth information directly by means of an inclined photodetector array which detects an optical pattern projected onto the scanned object.

One scanner which does not require a projected optical pattern and which is freely movable wⁱth respect to the scanned object is disclosed in US 4_,993.836 ₍Furuhashi e_l aϊ\ whⁱch uses a photogrammetric arrangement comprising two spaced apart cameras wⁱth parallel op^tical axes and having fields of view which overlap in the region of the scanned ob^ject. An arbitrary line of points on the scanned object is pro_jected as two ₅ lⁱnes on the respective image planes of the cameras and the geometry of the camera arrangement then enables the 3D shape of the line to be determined. Overlappino lⁱnes obtaⁱned in a similar manner following rotation of the scanned object are combⁱned by correlating their overlapping regions to derive a closed loop which defⁱnes one cross-section of the object's surface and a multiplicity of such cross- sectⁱonal loops are derived and combined to form a wire frame description of the ⁰ object s surface. In the preferred embodiment the ob_ject is a large forging which is rotated and the cameras are fixed and spaced one metre apart. There is no reference to a hand-held scanner.

The prⁱnciple of the geometrical calculation of Furuhashi et al is used in some ₅ embodⁱments of the present invention and is illustrated in Figures 1 and 2 below.

Referring to Figures 1 and 2, which show the two cameras 1 and 2 in the XZ ⁽horizontal ⁾ plane and the ZY ⁽vertical) plane respectively, each camera comprises a focussⁱng lens L of focal length a and a photosensitive imaging plane 1 and the optical 99/06950

2 axis of each camera is spaced apaπ from the Z axis b\ a distance S. An arbitrary point (X,Y.Z) on the object in the field of view of both cameras is imaged onto the image plane 1 of each camera as illustrated by rays 10, 10" and 20, 20^". Camera 1 images the point (X,Y,Z⁾ at a point (xj _t -y) in local coordinates of the camera system and camera 2 images the point at a point (-x2--y) in the local coordinates.

Considering the similar triangles formed by the undeviated rays 10, 10^* and 10":-

(S - X)/Z = x j /a (i)

(X + S)/Z = -x₂ /a (ϋ)

Y/Z = -y/a (iϋ)

1⁽1

Hence Z = 2Sa/(x ) - x2) (iv)

X = Z(x ] ÷ 2)/2a (v)

Y = -yZ/a (vi).

15

Hence the coordinates X, Y and Z can be determined from x ] , X2 and y.

In Furuhashi e\ al it is assumed that the point (X,Y,Z) is relatively distant in comparison with the focal length but in fact the above analysis is applicable irrespective of whether point (X. Y, Z) is distant from the cameras, provided that it is in focus eg as a result of stopping down the lenses L.

20

No correlation of the images is disclosed in Furuhashi.

A problem which arises in the case of objects of complex shapes is that it is difficult to determine whether a particular point on one camera's image plane corresponds to a particular point on the other camera^'s image plane (ie whether the points are both conjugate points of the same point on the object's surface). For example, overhanging -⁵ regions of an object might obscure an underlying region so that it only appears in one camera^'s filed of view. This problem is particularly likely to arise when the spacing of the cameras is relatively large compared to their distance from the object. However if the spacing is reduced relative to the distance from the object, the geometrical accuracy is compromised.

WO91/15732 (Gordon) discloses an arrangement in which a laser scanner projects a ³ series of stripes onto the scanned object and left and right cameras to detect the distorted stripes from the object. It is recognised that a given bright point in the image plane of one camera cannot be simply correlated with an illuminated point on the surface of the object because it is not known which stripe illuminates that point.

Accordingly, an arbitrary pixel in the stripe in one camera's image plane is selected and a line drawn through the centre of the camera lens projecting this line out into ³⁵ space. This line is then projected onto the image plane of the other camera and the resulting epi-polar line in the other camera's image plane cuts a number of stripes also imaged on its image plane. Any one of these points of intersection could in principle correspond to the arbitrary pixel mentioned above. The particular point which corresponds is found by projecting all the points of intersection back into space 99/06950

3 and determining w hich of the resulting lines intersects a laser stripe from the laser projector.

The above arrangement has the disadvantage that a projected pattern is required and that some uncertainty in the correlation of points may arise if any of the last- mentioned projected points cut or nearly cut more than one laser stripe. ^'

This arrangement is also susceptible to the problem of overhang'impaired accuracy mentioned above.

One other type of scanner of potentially high accuracy which may be mentioned in passing is based on the detection of Moire fringe patterns. One such scanner is disclosed in our co-pending patent application PCT/GB95/02431 (Moore).

10 One object of the present invention is to provide scanner arrangements which do not require projected patterns.

Another object of certain embodiments is to provide scanner arrangements in which movement of the scanner relative to the scanned object is determined by processing _{] <} the image of the object and does not require hardware such as inertial sensors (although the output of inertial sensors can be used to supplement such processing).

In one aspect the invention provides an arrangement as claimed in claim 1 for acquiring the 3D shape of an object.

20 In another aspect the invention provides an arrangement as claimed in claim 8 for acquiring the 3D shape of an object.

In another aspect the invention provides a method of processing overlapping images as claimed in claim 1 1.

25

In another aspect the invention provides an image processing arrangement as claimed in claim 17.

In another aspect the invention provides an arrangement for acquiring the 3D shape of an object as claimed in claim 24.

30

The '"viewpoints" of the cameras can be characterised by differences in position and/or in orientation of the camera relative to the object.

Further independent aspects of the invention are claimed in claims 26, 27, 28 and 32.

⁵ Preferred embodiments of the invention are described below by w ay of example only with reference to Figures 1 to 19 of the accompanying drawings, wherein: Figure 1 is a ra\ diagram show ing the camera arrangement of one embodiment in plan view,

Figure 2 is a ray diagram showing the camera arrangement of Figure 1 in elevation,

Figure 3 is a sketch perspective ray diagram of the embodiment of Figures 1 and 2 showing the generation of epi-polar lines of the corners of one image region ABCD and their intersection at a common point adjacent the corresponding image region in the other camera^'s image plane.

Figure 4 is an end elevation of the above embodiment looking through the image planes towards the object region and show ing the geometrical construction of the above epi-polar lines.

Figure 5 is a ray diagram in plan view showing the above embodiment and illustrating the uncertainty the object position w ithout correlation of image points in the left and right cameras' image planes.

Figure 6 is a front ele\ ation of the above ray diagram show ing how epi polar lines can be used to assist in the correlation of image points

Figure 7 is a flow diagram showing the process of correlating the image points in the above embodiment

Figure 8 is a sketch perspective view of a variant of the above embodiment utilising inertial sensors,

Figure 9 is a sketch perspective view of another embodiment utilising onlv a single camera

Figure 10 is a sketch perspective view show ing a rav diagram of the embodiment of Figure 9 being used to track its position relatι\ e to an object

Figure 1 1 is an elevation of a ray diagram showing the correlation of image points between the images of the respective cameras in the first embodiment or of the camera m different known positions in the second embodiment

Figure 12 is a ray diagram which is a section taken on XII-XII of Figure 1 1 and illustrates the derivation of the object position relativ e to the camera from the correlated images,

Figure 13 is a flow diagram showing the process of correlating the image points in the embodiment of Fisures 9 and 10 Figure 14 is a flow diagram showing a process for obtainin g a complete 3d surface description using the embodiment of Figures 10 and 1 1 and image correlation software:

Figure 15A is an illustration of a projected fractal pattern for use in an embodiment of the invention:

Figure 15B is an illustration of the distortion of the fractal pattern by the inclination of the camera relative to the surface;

Figure 16 is a sketch perspective view of a further embodiment of the invention;

Figure 17 is a diagrammatic transverse cross-section of an endoscope head in accordance with another aspect of the invention:

Figure 18 is a diagrammatic side view of the endoscope head of Figure 17, and

Figure 19 is a diagrammatic representation of a further endoscope in accordance with the present invention.

Figures 1 and 2 have already been referred to and are applicable to the optics of the first embodiment. Provided that a given point x ] , y in one camera's image plane 1 can be correlated with an image point x₂, y in the other camera's image plane (ie both points are conjugate points of a common point X, Y, Z on the object^'s surface) then the coordinates of the point X, Y, Z can be found. In general, however, this correlation cannot be performed without some processing of the two images in the respective image planes of the two cameras 1 and 2.

Figure 3 illustrates a preliminary step in searching for points A2, B2. C2 and D2 in the image plane I of camera 2 which correlate with given points A l , B l . C l and DI in the image plane of camera 1. Points A, B, C and D on the surface of the object define a surface region S and are imaged by both cameras as the above sets of points. Undeviated ray lines a, b, c and d connect A to A 1 , B to B 1 , C to C 1 and D to D 1 and when imaged onto image plane I of camera 2 define four epi-polar lines EP_a to EPd respectively which meet at a point X2'. This is the conjugate point of the centre XI of lens L of camera 1. The position of these epi-polar lines is independent of the position and orientation of the object and the vertices of the image region A2, B2. C2, D2 of camera 2 must lie on them.

Figure 4 illustrates the construction of the epi-polar lines. A construction line CONST is constructed so as to join the optical centres X I and X2 of the lenses L. It can then be seen that line a projects onto epi-polar line EP_a w hose points A2 and X2^" define with point X2 a triangle which is geometrically similar to triangle A. X I . X2. The other epi-polar lines can be constructed similarly and it will be noted that they intersect at a point X2^* which is a distance 2S (the spacing between X I and X2) from

X2.

The uncertainty in the position of the object O is illustrated in Figures 5 and 6. If the object is in position O then points 3 and 4 in the image of camera 1 correlate w ith points 3' and 4^' respectively in the image of camera 2. If, however, the object is in a position O^' then they correlate with points 5 and 6 respectively.

Referring now to Figure 5. if it is assumed for the sake of simplicity that the region S of the object O which defines the above points is rectangular (ie there are two points 3 and two points 4 vertically displaced from each other) then in position O the image region 3, 3, 4, 4 of camera 1 correlates with image region 8 of camera 2 whereas in position O' the image region 3. 3, 4, 4 of camera 1 correlates with image region 7 of camera 2. However the vertices of both image regions 7 and 8 lie on epi-polar lines EP of points 3, 3, 4. 4. In general there will be a range of possible positions of the object corresponding to a range of correlated regions (eg 7 and 8) which however will all lie on the set of epi-polar lines EP whose positions are determined entirely by the camera geometry and are independent of the position of the object.

The true position of the object can be found by an algorithm which selects various rectangular image regions whose vertices lie on the respective epi-polar lines EP and compares them with image region 3, 3, 4, 4. The image region of camera 2 lying on these epi-polar lines which gives the best match is taken to be that w hich truly correlates with the image region 3, 3, 4, 4 of camera 1 and thereby defines the position of the object with reference to the camera arrangement and also enables the 3D coordinates of each point on image region S to be determined by the geometrical procedure of Figures 1 and 2.

Suitable algorithms for correlating much less closely constrained image regions of much less closely corresponding images (eg photographs taken during airborne surveys) are already known -eg Gruen^'s algorithm (see Gruen. A W "Adaptive least squares correlation : a powerful image matching technique"5 Afr J of Photogrammetry, remnie sensing and Cartography Vol 14 No 3 ( 1985) and Gruen. A W and Baltsavias, E P "High precision image matching for digital terrain model generation" Int Arch photogrammetry Vol 25 No 3 ( 1986) p254) and particularly the "region-growing" modification thereto which is described in Otto and Chau "Region- growing algorithm for matching terrain images" Image and Vision Computing Vol 7 No 2 May 1989 p83, all of which are incorporated herein by reference.

Essentially, Gruen's algorithm is an adaptive least squares correlation algorithm in which two image patches of typically 15 x 15 to 30 x 30 pixels are correlated (ie selected from larger left and right images in such a manner as to give the most consistent match between patches) by allowing an affine geometric distortion between coordinates in the images (ie stretching or compression in which originally adjacent points remain adjacent in the transformation) and allowing an additive radiometric distortion between the grey levels of the pixels in the image patches, generating an over-constrained set of linear equations representing the discrepancies between the correlated pixels and finding a least squares solution which minimises the discrepancies.

The Gruen algorithm is essentially an iterative algorithm and requires a reasonable approximation for the correlation to be fed in before it will converge to the correct solution. The Otto and Chau region-growing algorithm begins with an approximate match between a point in one image and a point in the other, utilises Gruen^'s algorithm to produce a more accurate match and to generate the geometric and radiometric distortion parameters, and uses the distortion parameters to predict approximate matches for points in the region of the neighbourhood of the initial matching point. The neighbouring points are selected by choosing the four adjacent points on a grid having a grid spacing of eg 5 or 10 pixels in order to avoid running Gruen^'s algorithm for every pixel.

The first pair of points used to generate the initial approximate match can be found eg by searching for patterns or features in each image which match approximately and choosing pairs of clearly defined points within the pairs of matching patterns or features. This can be done by appropriate software or firmware w ithout the need for human intervention.

By constraining the matching image region to lie on epi-polar lines as described herein, the algorithm can be made to converge much more quickly and with less uncertainty. This is an important advantage.

The procedure for deriving the complete 3d surface description of a surface region S in the present embodiment is summarised in Figure 7.

In step 100, a pattern, eg 3, 3, 4, 4 (Figure 6) is selected in one camera^'s image plane and its vertices located and identified. The shape of the pattern boundary is preferably but not necessarily a rectangle, a parallelogram, an equilateral triangle, a regular hexagon or some other figure hich can be repeated to cov er substantial]₎ the entire area of the photodetector array

In step 1 10, the epi polar lines of these vertices (eg EP in Figure 6) are pro_jected onto the other camera^'s image plane These epi-polar lines converge to a point (which is not necessaπly within the photosensitiv e detector of the other camera but whose coordinates in the other camera^'s image plane can be found by extrapolation if necessary) and define a range of pattern boundaπes having a similar shape to the selected pattern boundary of step 100

In step 120 sets of possible corresponding pattern vertices in the other camera's image plane are determined by selecting a pattern shape corresponding to the shape selected in step 100 (eg rectangular in the embodiment described above) and fitting its vertices to the above epi-polar lines A range of eg rectangular patterns of different elongation and size results

In step 130 each of these patterns is compared with the pattern selected in step 100 and the pattern with the closest match (in eg image densitv distribution, colour distribution, contrast distribution (ie distribution of rate of change of image density) or any w eighted average of the above parameters) is selected ( using eg Gruen s algonthm or a similar algonthm) as the pattern which best correlates with the pattern of step 100. The individual pixels within the matching patterns are then correlated ith each other and the 3D coordinates of these pixels relative to the scanner can then be determined, as will be shown below ith reference to Figures 12 and 13

In step 140. the abo e steps 100 to 130 are repeated for all other patterns in the first camera^'s image plane and hence all the pixels in each camera^'s photodetector array are correlated

It should be noted that in some cases none of the patterns selected in step 120 will correspond to the pattern selected in step 100, eg as a result of overhang of a region of the object which prevents a region of the object surface from being viewed by both cameras In such a case the processor will report that no correlation is possible and will select a new pattern (step 100)

The size of the pattern selected in step 100 is not critical but should preferably be smaller than most surface features of the object in order to minimise the possibility of some pixels but not others in the selected pattern correlating w ith a pattern in the other camera^'s field of view, possibly leading to incorrect matching of the tw o patterns The size could be selected by the user

A vanant of the above embodiment is shown in more detail in Figure 8 Hand-held scanner 13 is provided w ith an inertial sensor arrangement comprising a 3-axιs vibratory gyroscope arrangement 1 1 (which detects rate of rotation about mutually perpendicular axes θ_x, θ_v and θ_z) and an accelerometer 12 (which detects acceleration along the corresponding axes X, Y and Z). The output signals from these sensors are processed by a microprocessor arrangement μP which is similar to that shown in Figure 2 of our granted patent GB 2,292,605B (whose entire disclosure is hereb^y incorporated by reference) and the resulting position and orientation information is stored in a memory (such as a miniature hard disc M) which is optionall^y removable) The position and onentation information can also be output from the processor to a computer PC by a wire, radio or optical link via a bidirectional output port Computer PC is provided witTi a conventional RAM, hard disk and microprocessor and is arranged to perform the processing already referred to in connection w ith Figures 3 to 7 as well as the processing descπbed subsequently in connection with Figures 10 to 15. The arrangement is powered from a rechargeable battery B and the acquisition of data from the cameras, accelerometer and gyroscope arrangement is controlled by a hand-operated tngger button TR

The position and onentation data from the microprocessor can be used to guide the software which carnes out the processing of Figure 7, particularlv in the search for a matching pattern in step 130. However it should be noted that once a match has been found and the images in the two cameras correlated, the position and onentation of the object O with reference to the scanner 13 is defined. This position and onentation is likelv to be more accurate than that obtained from the gyroscopes and accelerometers because the latter are subject to drift. Hence it is prei erred to use the position and onentation information from the gyroscopes and accelerometers only for guidance of the image correlation software.

In pnnciple the entire surface of the object could be scanned in one scan but in some cases it may be necessary to combine the surface portions denved from overlapping scans and the processes disclosed in our co-pending PCT7GB95/01994 can be used for this purpose.

Scanner 13 also carnes a light source LS (eg a stroboscopic light) for illuminating the object O and optionally a supplementary laser arrangement LA which can be used to derive additional depth information or other surface coordinate information by 50

10 projecting an appropriate optical pattern onto the surface of object O: for example it could be a triangulation arrangement similar to that disclosed in Figure 1 or Figure 3 of our PCT/GB95/01994 or it could be a Moire arrangement.

In a further major variant of the above embodiment wherein laser arrangement LA is a triangulation arrangement the optical axes of the cameras are arranged to cross at the centre of such a projected pattern, at a distance corresponding to the centre of the depth of field of triangulation. and the cameras are used to acquire a true monochrome or colour representation of the object either simultaneously with triangulation or alternateh . This can be superimposed on the 3D image acquired by triangulation to allow the profile obtained by triangulation to be rendered using the image data from the cameras. In this manner surface features other than profile , eg surface printing and colours can be captured.

The light source LS can be used to provide consistent incident light for the object O in the majoritv of ambient lighting conditions. Jt can be synchronised to image capture by the cameras in order to prevent interference with the triangulation process.

The position and attitude data obtained from the gyroscopes and accelerometers can be tagged to the image data from the cameras and triangulation arrangement to enable the data from the cameras and triangulation aπangement to be processed either in real time or off line, allowing areas which are poorly described by the triangulation signals to be to be corrected by the image data from the cameras, or vice versa.

In addition, surface features detected by the cameras (ie any distinct pattern of pixels) or groups of such surface features can be tracked as the scanner 13 is moved relative to the object O and the trend of movement and/or distortion of such features and/or the trend in the movement or spacing of such surface features can be used to predict the next position and orientation of the scanner and hence to guide the software in correlating the images from the two cameras. This information can also be used to correct acceleration and rate of rotation data from the accelerometers and gyroscopes.

It should be noted that the above combination of stereoscopic imaging using cameras 1 and 2 and triangulation provides a system that is capable of producing range data 950

1 1 for a wide variety of surface topologies in a wide range of lighting conditions. Errors introduced by one method can be largely corrected by data from the other, since the two methods are largely complementary in their applicability in different lighting conditions.

The post processing methods disclosed in our PCT/GB95/01994 can be used to combine separately acquired surface regions obtained from the outputs of either a triangulation arrangement or a Moire anangement or the cameras 1 and 2. Such methods can also be used to provide position and orientation data for use in any of the processes described above requiring such methods.

In one embodiment a common processor is used a) to process stereoscopic data (eg by the process of Figure 7 of the present application) from the two cameras 1 and 2 b) to predict the position and/or orientation of the scanner relative to the object by tracking groups of features (which could be any distinct groups of pixels) between frames and c) to process triangulation data (eg as obtained by the optical arrangement of Figure 1 or Figure 3 of our PCT/GB95/01994). These processes, designated S, F and T respectively, can be carried out sequentially in ratios varying with the frame rate and the processing power of the processor. For example at 60 frames/second the sequence could be:

STTFTTFTT, giving a 1:2:6 ratio

and at 30 frames/second the sequence could be:

STFTFT, giving a 1:2:3 ratio.

Figure 9 shows a further embodiment 13' which utilises only a single camera and a microprocessor μP to track an object O and to determine its shape.

This embodiment utilises stereoscopic image processing analogous to that outlined in Figures 1 and 2 and utilised by the embodiment of Figures 1 to 8, but stereoscopically combines the images acquired at different positions and orientations of the scanner 13' relative to the object, using information on the position and orientation of the scanner acquired by tracking the image in its image plane I. This processing is carried out by a computer PC which receives the output data from the scanner.

This is illustrated in Figure 10. The object O (assumed to be polyhedral for ease of illustration) moves relative to the scanner from position O to position O^' and the face ABC is projected onto image plane 1 first as image abc (position O) and then as image a^'b^'c^' (position O^' ). The undeviated ray lines Aa". Bb' and Cc' can be constructed merely from the position of the image a^'b'c^' and a knowledge of the camera geometry. The triangle ABC of defined size and shape can be fitted to these ray lines in only one position and orientation beyond the optical centre of the lens L. Hence the position and orientation of the object can be tracked.

It should be noted that the above analysis assumes that the size and orientation of the face ABC of the object is initially known. However this is not essential in order to track the movement of the scanner relative to the object. Thus the ray lines Aa, Bb and Cc are consistent with a smaller object having a face A'B'C (as shown) located nearer the camera. However (assuming its initial orientation is as shown) the difference in detected orientation of the object between positions O' and O will be the same and the distance of movement of the scanner will be scaled by the assumed object size. The shape of the object will not be affected.

Of course the initial position O of the object is not the only position consistent with ray lines Aa, Bb and Cc. For example these ray lines would also be consistent with a face A^'BC. However, since the size and shape of the face would be invariant as the object and scanner move relative to each other, the orientation and position of hypothetical face A 'BC in relation to (equally hypothetical ) face ABC would also remain unchanged and the relative movement of the object between positions O and O^' as determined by tracking movement of the image abc/a^'b^'c^' would not be affected.

In the above example the points abc are derived from corners of the object. However in principle any set of three or more clearly defined points or groups of points on the object O can be tracked to track its movement and rotation relative to the scanner 13^'. In practice the scanner would normally move and the object would be stationary. Since the tracked points will gradually move off the edge of the photosensitive array of the camera, it is desirable to track at least four points in order to track the movement of the object relative to the scanner over distances which are large in comparison with the size of the photodetector array.

The above analysis illustrates the tracking of the object but does not enable its shape to be determined.

The shape determination is illustrated in Figures 1 1 and 12. Two patterns P are shown which are distorted versions of each other as illustrated by the correlations 25 between their corresponding pixels. It is assumed that these patterns are each formed on the image plane of the camera of scanner 13 ' as it moves from a first to a second position relative to the object and (since they conespond) are derived from a common surface portion S of the object. The undeviated ray lines shown in Figure 12 passing through the optical centre of the lens L determine the position and orientation of surface region S. given that the position and orientation of the scanner relative to the object are known, from the analysis of Figure 10.

The movement M shown in Figure 12 is a rectilinear movement in the direction of the image plane of the camera, and illustrates the similarity in the analysis of a stationary stereoscopic camera arrangement and a moving monocular camera arrangement. In principle however the position and orientation of the surface could be determined just as easily from a known, non-rectilinear movement of the scanner.

The overall process of determining the shape of the object is illustrated in the flow diagram of Figure 13. In step 200. four or more points in the image plane are selected. Alternatively, in some cases it may not be necessary to select more than three points.

In step 210. adjacent ones of the four or more points are connected to form a network of at least two triangles. (Alternatively it may be possible to rely on only three such points to form one triangle in certain cases). One such triangle abc has already been referred to in connection with Figure 10.

The second triangle enables the network to be tracked as one point moves off the edge of the photosensitive array, but this will not be necessary on certain cses The points are tracked (step 220).

The undeviated ray lines are then projected from the tracked points (step 230).

In step 240 the network of four or more points ( in some cses three or more points) is then fitted to the projected ray lines to define the new position of the object (eg O' in Figure 10).

In step 250 the point(s) which have moved out of the field of view are discarded and a new network is constructed (ie steps 200 and 210 are repeated). This step will not be necessary in certain cases. At the same time, the calculated position and orientation data are output.

In step 260, successive views are compared (cf Figures 1 and 2 and Figure 12) and, using the position and orientation output of step 250, used to obtain the 3D coordinates of the surface portion (eg S in Figure 12) of the object which is common to the two views.

Finally the surface portions of the object are combined to obtain the entire 3D coordinates of the object surface, using either the position and orientation data output from step 250 or using a postprocessing method as disclosed in our co-pending PCT/GB95/01994.

Step 260 (which is illustrated in Figures 1 1 and 12) is elaborated in the flow diagram of Figure 14. This flow diagram is equally applicable to the stereoscopic and monocular camera anangements.

In step 300 a pattern (eg the left hand pattern P in Figure 14) is selected.

In step 310 a corresponding pattern (eg the right hand pattern P in Figure 11) is searched for, using variable scaling factors eg to compress/expand the image in the X and Y directions and possibly also to apply a linear correction to the image density in order to take into account oblique illumination of the object.

In step 320 the above step is repeated for other patterns and the points in the corresponding pa^tterns are correlated (cf the mapping illustrated bv lines 25 in F._oure 1 1).

In step 330 the correlated points are used to construct the 3D surface region of the ob^ject region common to both cameras^' fields of view ₍ cf Figure ^ usin_c the position and/or orientation data of step 250 of Figure 13. Additionall_y gyπTdata and/or acceleratⁱon data and/or geometncal data defining the geometry of the stereoscopⁱc camera arrangement of Figures 1 to 8 (if used₎ can be utilised in this step.

In step 340 the process is repeated for other surface regions and the process then continues with step 270 (Figure 13).

In some cases it may be dⁱfficult to conelate the patterns P in s_tep 310_, eg as a result of lack of contrast in the images.

The aspect of the invention in which a fractal pattern (in which the small-scale structure and the large-scale structure share a common element₎ is provided (eg by optical projectⁱon ⁾ on the scanned object and the pattern is viewed from different angles to derive dⁱfferent images which are correlated to derive the 3-D surface coordinates of the region of the object on which the pattern is formed is illustrated by way of example only in Figures 15 and 15A.

A fractal pattern 500 ⁽obtained bv forming a cross and then supeπmposing a cross of half the size on each tip, and repeating this process in respect of each prev ious cross) is shown in Figure 1 5 after three iterations of the above process ₍in practice more iterations may be used to increase the detail and the density of coverage) and a pattern region P is selected. For the sake of simplicity it is assumed that the pattern is projected orthogonally onto a fiat region of the object so that the image shown in Fⁱgure 15 is an undistorted representation of the original pattern.

If the pattern is viewed from another angle a distorted view (in this case vertically compressed and horizontally elongated ) is obtained as show n in Figure 15A.

However there is only one pattern region P' having the same features ^"as pattern region P ⁽in this case^, having its top and left hand boundaries on a line_, having its „,„,.„ PCT/GB98/02307 06950

16 right hand boundary touching the tip of a line and its bottom boundary crossing two lines and touching the tip on another) and this pattern region can therefore be unambiguously located by searching for the above topological features by means of a suitable algorithm. Hence the entire pattern images (either left and right images as viewed either by a stereoscopic camera arrangement or successive images as viewed by a moving monocular camera) can be coπelated and hence the entire 3-D surface coordinates of the region of the object on which the pattern 500 is projected can be found.

A completely different arrangement for determining depth information, at least approximately, is disclosed in Figure 16. An array of point illumination sources LS projects light spots R onto the surface S of the object and a lens L images the illumination pattern formed by the spots on a photosensitive image plane 700. The size and/or shape of each imaged spot is measured by appropriate image-processing software. The size of a spot R is directly proportional to the distance of the corresponding illuminated surface region from the conesponding light source LS and the shape (in this case the deviation from circularity) gives information about the local inclination and curvature of the surface. Hence at least an approximation to the surface coordinates can be derived from the sizes and/or shapes of the spots R, given the angle of each cone of illumination and other basic geometrical information. The light sources could suitably be optical fibres and the arrangement is suitable for miniaturisation in eg an endoscope. The light sources need not be point sources but could for example be line sources, whereby the width of each illumination stripe on the surface S is related to the distance from the light source. Alternatively, dark region(s) could be projected onto the surface S using appropriate mask(s).

Background information on the geometry of an alternative range sensor using a mask is given by Lorange et α/in Procs. Intl. Conference on Recent Advances in 3-D digital Imaging and Modelling May 12th-15th 1997, pp51-58 at p52(pub. IEEE Computer Society Press, Los Alamitos, California USA ). Background information on constructing a 3-D representation of a surface from silhouettes taken at different angles (these being combined in the correct manner by reconstructing a circular reference pattern initially provided beneath the object) is disclosed in the above Conference Proceedings by Nie et al at ppl73- 180. Both these articles are incorporated by reference and provide further techniques for obtaining at least an initial approximation to the surface coordinates which can be used prior to or in combination with any of the novel techniques disclosed herein, eg to assist in the process of Figure 14. Any other initial approximation could alternatively be used in the process of eg Figure 14.

Refering to Figure 17, an endoscope arranged to provide a 3-D representation of a body canal or cavity is disclosed, having a head having a diameter about 10mm in diameter with a central region (of about 6 mm diameter) in which two CCD photosensitive arrays 701 and 702 are disposed. The endoscope is also provided with four regularly circumferentially distributed channels 704 for accommodating surgical instruments and the like, as is standard. Cables 703 are disposed regularly about the peripheral region of the endoscope and are used for bending the endoscope to guide it through a body canal, as is standard.

Referring to Figure 18 a fibre-optic bundle 710 terminates at the front face of the head 705 and carries a light beam from an illumination source at the proximal end of the endoscope (not shown). The beam is projected onto the surface S of the region to be viewed and appears as an array of overlapping circular areas, only one of which is shown. The size and distortion from circularity of these illuminated areas can be detected by the detector arrays 701 and 702 to provide an initial estimate of the 3-D coordinates of the surface region S (indeed in some cases satisfactory- information could be provided by this technique alone, so that only one photo sensitive array would be needed). However in the stereoscopic arrangement ill ustrated, the illuminated region S is imaged separately by the two photosensitive arrays and the images are correlated as described above with reference to Figure 14 to find the accurate 3-D coordinates of the interior of the body canal.

The image is focussed by a graded index (GRIN) rod or fibre 71 1.

In a variant, the illumination source is coupled to one or more selected fibres and at least these fibres in bundle 710 are provided with lenses 708 (or GRIN portions) which focus an optical beam as a spot on the interior surface of the body canal as shown. The focussed spot can be scanned eg in raster fashion either by coupling the illumination source to adjacent fibres in succession to displace the beam relative to the exit face of the fibre bundle in eg raster fashion or, less preferably, by moving the optic fibre bundle by any suitable means, eg piezoelectrically. The resulting spot can then be tracked and imaged by the photosensitive arrays and the coordinates on the respective arrays used to derive the 3-D coordinates of the surface S. as explained with reference to Figures 1 and 2. Alternatively a single photodetector could be employed, as in our PCT/GB95/01994, preferably satisfying the Scheimpflug condition

Referring to Figure 19, which is based on Figure 1 of WO 95/14952, a conventional monocular rigid endoscope 401 having an objective lens 402 at its distal tip and an ocular 403 at its proximal end is combined with a video camera 404 which focusses light exiting from ocular 403 onto the photosensitive image plane I of a photosensitive detector (such as a CCD for example) by means of a focussing lens L. In practice lens L will normally be a multi-element lens and the exposure will normally be controlled by an iris (not shown). As described thus far the arrangement is conventional. Alternatively the camera 404 may be a cine camera, in which case the light from lens L is focussed onto the photosensitive image plane of cine film.

In another embodiment of the present invention the distal head of the endoscope is provided with miniature inertial sensors 1 1 and 12 similar to those shown in Figure 8 which measure angular rotation about three orthogonal axes and linear acceleration along the three orthogonal axes respectively. The signals from these sensors are integrated as in the embodiment of Figure 8 to derive the instantaneous position and orientation of the endoscope head.

The lens L is used to obtain stereoscopic views by alternately occluding the light exiting from the left and right regions of the ocular 403 with a shutter means 405 preferably at a rapid rate such as 50 times per second (for video), under the control of a signal from processing circuitry 408. The shutter means 405 may be provided in front of lens L as shown, between different lens elements of a multi-element lens (not illustrated) or may be located between the lens L and photosensitive image plane I, for example. In particular the shutter may be a LCD shutter printed on a surface of lens L. The rays blocked by shutter means 405 are preferably parallel as shown but may alternatively converge or diverge. Particularly if the rays are converging, the shutter should preferably be located close to the lens. The stereoscopic image pairs can be tagged w ith position information and orientation information as in the embodiment of Figure 8 in order to obtain a 3-D representation of a body canal in which the endoscope is inserted, as previously described in connection with Figure 8

In a variant, the shutter 405 is omitted and the endoscope head is manipulated to change its position and/or onentation the resulting vie s being tagged w ith position and/or onentation information in order to enable the required 3-D representation to be obtained

10

In a further vaπant the inertial sensors are omitted and the 3-D representation is built up by appropπate processing of monocular images as in the embodiment of Figure 9

Preferably a hood (not shown ) is provided at the interface of camera 404 and ¹⁵ endoscope 401 to prevent stray light enteπng the video camera, or the camera and endoscope are integral The endoscope may be a laparoscope, a borescope, a cystoscope or an arthroscope for example. The user may pull focus or zoom (assuming the lens has this facility ) without affecting the stereoscopic imaging

->, 0 A switching output (synchronised with the shutter 405) and a video output may be fed to a standard stereoscopic v iew ing arrangement as show n in Figure 1 of WO 95/14952

The endoscope aspect of the invention is also applicable to the arrangements shown in GB-A-2,268.283. US 5.222.477 and US 5.12,650 for example It is not limited to any of the image processing methods disclosed herein and can be used in conjunction with eg the scanned pattern and depth detection anangements of our co-pending PCT/GB95/01994 for example, using eg an optic fibre array (not show n) in the endoscope head to project the pattern

It will be apparent that a number of inventions have been disclosed The inventive

30 features disclosed herein are all considered to be independent but to be optionally combinable where indicated Preferred features of the invention are defined in the dependent claims.

35

Claims

1 An arrangement for acquinng the 3D shape of an object (O). the a╧Çangement comp╧Çsmg

a) an image acquisition arrangement ( 13) for acquinng groups of images of overlapping surface regions of the object, the image acquisition arrangement being freely movable relative to the object and compnsing two or more mutually offset cameras (L. I) with ov erlapping fields of view,

b) inertial sensing means ( 1 1 , 12) arranged to detect movement of said image acquisition arrangement relativ e to the object and to generate a motion output signal representati e of such movement

c) image processing means (PC) ananged to correlate features (A 1 B 1C1D1 , A2 B2.C2D2 ) of the respectiv e images of each group denved from a feature (ABCD) of the object common to the group and to derive in respect of each group a set of output data representing the 3D shape of a surface region of the object, and

d) combining means (PC) arranged to combine the sets of output data into a common set of output data in dependence upon both said output data denv ed by said image processing means and said motion output signal generated by said inertial sensing means

2 An arrangement as claimed in claim 1 wherein said inertial sensing means ( 1 1 , 12) is arranged to detect rotation of the image acquisition arrangement ( 13) relativ e to the object (O) and to generate a rotation output signal and said combining means (PC) is arranged to combine said sets of output data in dependence upon both said rotation output signal and translation information denved f rom the image acquisition arrangement

3 An arrangement as claimed in claim 1 or claim 2 wherein said inertial sensing means ( 1 1. 12) is arranged to generate a translation output signal and said combining means (PC) is arranged to combine said sets of output data in dependence upon both said translation output signal and translation information denved from the output data of said processing means (PC).

4. An arrangement as claimed in any preceding claim which comprises an endoscope (401 ) ananged to acquire the 3D shape of a body cavity, the inertial sensing means ( 1 1, 12) being mounted in the head of the endoscope.

5. An arrangement as claimed in any preceding claim wherein the mutually offset cameras (405. L, I) have a common objective lens (402) and their offset is provided by a shutter means (405) which selectively occludes offset ray bundles from the object.

6. An anangement as claimed in any of claims 1 to 4 wherein the respective optical axes of the cameras ( L. I ) are coplanar.

7. An arrangement as claimed in claim 6 wherein said optical axes are parallel.

8. An anangement for acquiring the 3D shape of an object (O). the anangement comprising:

a) a camera ( 13' ) which is freely movable relative to the object so as to acquire overlapping images of the object from different viewpoints:

b) inertial sensing means ( 1 1. 12) arranged to detect movement of said camera relative to the object and to generate a motion output signal representative of such movement.

c) image processing means (PC) arranged to correlate features (A1B 1C1D1; A2B2C2D2) of the respective images derived from a common feature (ABCD) of the object which are acquired by the camera from different viewpoints and to derive output data representing the 3D shape of a surface region of the object: and

d) combining means (PC) ananged to combine sets of such output data into a common set of output data in dependence upon both said output data derived by said processing means and said motion output signal generated by said inertial sensing means.

9. An arrangement as claimed in claim 8 wherein said inertial sensing means ( 1 1, 12) is ananged to detect rotation of the camera (L, I) relative to the object (O) and to generate a rotation output signal and said combining means (PC) is arranged to combine said sets of output data in dependence upon both said rotation output signal and translation information derived from the overlapping images.

10. An anangement as claimed in claim 8 or claim 9 wherein said inertial sensing means ( 1 1. 12) is ananged to generate a translation output signal and said combining

10 means (PC) is ananged to combine said sets of output data in dependence upon both said translation output signal and translation information derived from the output data of said image processing means ( PC).

₁₅ 1 1. A method of processing overlapping 2D images of an object (O) acquired from different spaced apart viewpoints relative to the object, the 2D images being associated with pairs of corresponding regions (A1B 1C1D1: A2B2C2D2), each such pair of conesponding regions comprising a region of one 2D image and a region of another 2D image each derived from region of the object (ABCD) common to the

-0 images, there being a mutual offset between the conesponding regions of each pair which is determined by the relative rotation (if any) and translation between the viewpoints, the method comprising the step of digitally processing the 2D images to correlate the conesponding regions by utilising a plurality of epipolar lines (EP_a -

EPd) conesponding to one such region in one such image to constrain a search for the 25 corresponding region in the other image.

12. A method as claimed in claim 1 1 wherein a 3D reconstruction of the common region of the object (O) is generated by virtually projecting the images in simulated _Q 3D space with an offset between the projected images conesponding to said mutual offset.

13. A method as claimed in claim 1 1 or claim 12 wherein said mutual offset is derived by tracking (220) a group of points (a. b, c ; a", b', c^' ) in the field of view of a

35 camera (L, I) ananged to acquire said overlapping 2D images, virtually projecting (230) into simulated 3D space ray lines from the tracked points as they appear in one of the images and connecting (240) an arbitrary network to the ray lines, virtually projecting (230) into simulated 3D space further ray lines from the tracked points as they appear in the other of the images and offsetting said network to connect it to said further ray lines whereby the offset of the network conesponds to said mutual offset.

14. A method as claimed in any of claims 1 1 to 13 wherein said mutual offset is determined by an inertial sensing means ( 1 1 , 12) arranged to detect the relative motion between the object and the viewpoints.

15. A method as claimed in claim 1 1 or claim 12 wherein said mutual offset is predetermined.

16. A method as claimed in claim 15 wherein said images are acquired by an assembly ( 13) of spaced apart cameras, the assembly being freely movable relative to the object (O).

17. An image processing arrangement for acquiring the 3D shape of an object (O) from overlapping images of the object acquired from mutually offset viewpoints, the anangement comprising:

a) tracking means (PC) arranged to determine the offset between said vie points by optically tracking movement of features between the images:

b) conelation means ( PC) ananged to conelate features ( P/A 1 B 1C 1D1 : P/A2B2C2D2) of the respective images derived from a common feature (ABCD) of the object which are acquired from said offset viewpoints, said conelation means being ananged to derive a group of two or more epipolar lines (EP_a-EP<j) in one image corresponding to a region in another image and to correlate that region with the corresponding region of said one image by utilising said group of epipolar lines to constrain a search for said corresponding region, and

c) 3D reconstruction means (PC) responsive to said tracking means and conelation means and arranged to derive said 3D shape from said offset and conelated features.

18. An arrangement as claimed in claim 17 wherein said 3D reconstruction means (PC) comprises means for virtually projecting the images with an offset conesponding to said mutual offset and with the conelated features (P/A1B 1C 1D1 :

P/A2B2C2D2) intersecting in simulated 3D space

19 An anangement as claimed in claim or claim 18 wherein said tracking means (PC) is arranged to v irtually project into simulated 3D space ray lines from tracked points (a. b, c) as they appear in one of the images and to connect an arbitrary, network to the ray lines to v irtually project into simulated 3D space further ray lines from the tracked points ( a^' b^' c^' ) as they appear in the other of the images and to offset said network to connect it to said further ray lines hereby the offset of the network conesponds to said mutual offset

20 An anangement as claimed in any of claims 17 to 19 hich further compnses an image acquisition anangement ( 13/137401 ) for acquiring said overlapping images, said image acquisition arrangement being freely mov able ith respect to said object (O)

21. An anangement as claimed in any of claims 17 to 20 w hich compnses a camera (1, 1) ananged to acquire said images and wherein a computer (PC) ananged to receive said images is provided with a program w hich implements said tracking means, conelation means and 3D reconstruction means

22. An anangement as claimed in claim 21 w herein said camera ( L. I ) is a digital camera

23. A method as claimed in any of claims 1 1 to 16 or an anangement as claimed in any of claims 17 to 22 wherein said regions (P/A 1B 1C 1 D 1 PA2B2C2D2) are areas

24 An anangement for acquinng the 3D shape of an object, the anangement compnsing

a) an image acquisition anangement ( 13/137401 ) tor acquinng images of overlapping surface regions of the object (O),

b) means for providing a fractal pattern (500/500A) on said surface regions, and c) image processing means (PC) ananged to conelate features (P. P^" ) of the fractal pattern between said images and to derive the 3D surface coordinates of the region of the object on which the pattern is formed.

25. An anangement as claimed in claim 24 herein said image processing means (PC) is arranged to correlate said features by identifying pattern features ( P. P') of common topology in the overlapping images

26. A method of acquiring the 3D shape of an object, the method compnsing.

10 a) providing a fractal pattern (500. 500A ) on a surface region of the object,

b ) acquiring overlapping images of said surf ace region, and

c ) conelatmg features (P. P^' ) of the fractal pattern between said images and denving the 3D surface coordinates of said surface region

27 An anangement for determining depth information, comprising beam-forming 20 means ananged to illuminate a surface with an optical pattern, the size of each of the features of the pattern on the illuminated surface being dependent on the distance of that feature from the beam-forming means, means for detecting the size distnbution of said features and processing means arranged to denve the distance distnbution from said size distnbution and the optical characte╧Çstics of the beam-forming means.

25

28. An anangement for determining shape information, comprising beam-forming means (LS/708)) ananged to illuminate a surface ( S ) w ith an optical pattern (R), the shape of each of the features of the pattern on the illuminated surface being ,₍₎ dependent on the gradient of the corresponding region of the surface relativ e to the beam-forming means, means (PC) for detecting the gradient distnbution of said features and processing means (PC) arranged to derive the shape information from said gradient distnbution and the optical charactenstics of the beam-forming means

35 29. An anangement as claimed in claim 27 or claim 28 wherein the beam-forming means (LS/708) compnses an array of optic fibres (710, 71 1 ).

30. An endoscope (401) comprising an anangement as claimed in any of claims 27 to

29 and canying said beam-forming means (708) in its head (705).

31. An endoscope having an optical anangement for sensing the interior surface (S) of a body canal or cavity, the arrangement comprising an array (710) of optic fibres ananged to form a pattern of illumination on said surface, means (71 1) for imaging said pattern on a photosensitive image plane (701 ), and means (PC) for processing the image to derive depth information in respect of the illuminated surface.

10 32. A endoscope as claimed in claim 31 wherein the pattern of illumination is a scanned pattern formed by coupling illumination to adjacent optical fibres (706) in succession.

33. An endoscope as claimed in any of claims 30 to 32 having inertial sensing means ( 11. 12) ananged to determine the position and/or orientation of the endoscope head.

34. An image processing anangement substantially as described hereinabove with reference to Figures 3 to 7 or Figures 10, 13 and 14 or Figures 15 and 15A of the

20 accompanying drawings.

35. An anangement for acquiring the 3D shape of an object, the anangement being substantially as described hereinabove with reference to Figure 8 or Figure 9 in conjunction with Figures 3 to 7 or in conjunction with Figures 10. 13 and 14 of the

_D accompanying drawings.

36. An anangement for acquiring the 3D shape of an object, substantially as described hereinabove with reference to Figure 16 of the accompanying drawings.

30

37. An endoscope substantially as described hereinabove with reference to Figures 17 and 18 or w ith reference to Figure 19 of the accompanying drawings.

35