GB2544263A - Systems and methods for imaging three-dimensional objects - Google Patents

Systems and methods for imaging three-dimensional objects

Info

Publication number
GB2544263A
GB2544263A
Authority
GB
United Kingdom
Prior art keywords
images
captured
features
feature selection
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1519398.0A
Other versions
GB201519398D0 (en)
Inventor
Rubio Navarro Leonardo
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuel 3d Tech Ltd
Original Assignee
Fuel 3d Tech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuel 3d Tech Ltd filed Critical Fuel 3d Tech Ltd
Priority to GB1519398.0A priority Critical patent/GB2544263A/en
Publication of GB201519398D0 publication Critical patent/GB201519398D0/en
Priority to EP16788779.3A priority patent/EP3371780A1/en
Priority to PCT/GB2016/053368 priority patent/WO2017077277A1/en
Publication of GB2544263A publication Critical patent/GB2544263A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06T 7/586 Depth or shape recovery from multiple images from multiple light sources, e.g. photometric stereo
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06T 7/593 Depth or shape recovery from multiple images from stereo images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2200/00 Indexing scheme for image data processing or generation, in general
    • G06T 2200/08 Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation

Abstract

An apparatus for computing a 3D model of an object successively illuminates the object from at least three illumination directions and captures 101 at least three images of the object with one or more sensors (cameras), different images representing the object when illuminated from one of the illumination directions. Reference features corresponding to object landmarks are identified 102 in one of the images. Corresponding features are identified 103 in at least some of the other captured images. The positions of the reference features in the images are used to estimate 104 image viewpoints from which the motion of the object relative to the cameras may be determined. The estimated motion is used to register the images in a common coordinate system, and thereby correct for the relative motion of the object and imaging system between different times at which the images were captured. The registered images are used to generate 105 a 3D model of the object. The reference features are selected such that they are likely to correspond to landmarks on the object, rather than on a background behind the object.

Description

Systems and Methods For Imaging Three-Dimensional Objects
Summary of the invention
The present invention relates to an imaging system for generating three-dimensional (3D) images of a 3D object, and a method performed by the imaging system. In particular it relates to situations in which the object and imaging system may move relatively during a period in which the imaging system is capturing multiple images of the object.
Background of the invention
Modelling of 3D surfaces using two-dimensional images has been a major research topic for many years. The 3D surface is illuminated by light (or other electromagnetic radiation), and the two-dimensional images are created using the light reflected from it.
Most real objects exhibit two forms of reflectance: specular reflection (particularly exhibited by glass or polished metal) in which, if incident light (visible light or other electromagnetic radiation) strikes the surface of the object in a single direction, the reflected radiation propagates in a very narrow range of angles; and Lambertian reflection (exhibited by diffuse surfaces, such as matte white paint) in which the reflected radiation is isotropic with an intensity according to Lambert’s cosine law (an intensity directly proportional to the cosine of the angle between the direction of the incident light and the surface normal). Most real objects have some mixture of Lambertian and specular reflective properties.
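As a purely illustrative sketch of Lambert's cosine law (not part of the original disclosure; the function and the example values below are assumptions), the reflected intensity of a diffuse surface point can be computed as the albedo scaled by the cosine of the angle between the incident light direction and the surface normal:

```python
import numpy as np

def lambertian_intensity(albedo, light_dir, normal):
    """Reflected intensity under Lambert's cosine law (illustrative only).

    albedo    : diffuse reflectivity of the surface point (0..1)
    light_dir : unit vector from the surface point towards the light source
    normal    : unit surface normal at the point
    """
    cos_theta = max(0.0, float(np.dot(light_dir, normal)))  # no light from behind
    return albedo * cos_theta

# A light at 60 degrees to the normal yields half the head-on intensity.
print(lambertian_intensity(0.8, np.array([0.0, np.sin(np.pi / 3), np.cos(np.pi / 3)]),
                           np.array([0.0, 0.0, 1.0])))  # ~0.4
```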
Recently, great progress has been made in imaging three-dimensional surfaces which exhibit Lambertian reflective properties by means of photometry (the science of measuring the brightness of light). For example, WO 2009/122200, “3D Imaging System” describes a system in which, in preferred embodiments, the object is successively illuminated by at least three directional light sources, and multiple cameras at spatially separated spatial positions capture images of the object.
Particularly if the object is liable to move relative to the imaging system while the images are taken, a localization template, fixed to the object, is provided in the optical fields of all the light sensors, to allow the images to be registered with each other, in a frame of reference in which the object is unmoving. This is critical because, even if the cameras do not have a large translational movement with respect to the object, a small rotation of the cameras in the intervals between the times at which the images are captured produces a very great change in the position of the object within the images. However, using the localization template, it is possible to determine, for each image, the corresponding viewpoint from which the image was captured, in the reference frame of the localization template.
Typically, the object will have a number of “landmarks” which, when imaged, produce features which can be easily recognized in each of the images. Consider two images (a “stereo pair” of images) which are captured simultaneously respectively by two or more of the cameras with a known geometrical relationship, such that the relationship between the viewpoints is known. For each of a number of landmarks, the system determines the corresponding positions in the stereo pair of the corresponding features. Using this data, an initial 3D model of the object is created stereoscopically (i.e. by optical triangulation).
Photometric data is generated from images captured at different times when successive ones of the directional light sources are activated. If the object is moving relative to the cameras during this period, the images are registered using the localization template (i.e. such that the respective viewpoints are known in a common reference frame in which the object is stationary). On the assumption that the object exhibits Lambertian reflection, the photometric data makes it possible to obtain an estimate of the normal direction to the surface of the object with a resolution comparable to individual pixels of the image. The normal directions are then used to refine the initial model of the 3D object.
This permits highly accurate imaging, providing that the registration of the images using the localization template is sufficiently accurate. However, an error of just 0.2 pixels in the registration process can cause a significant degradation in the accuracy of the photometry.
Since one of the applications of the method is for hand-held imaging systems, camera shake is a significant problem.
Typically, the user does not know whether the errors caused by movement of the camera were too large to permit high-quality photometric processing until the photometric processing has been completed. Since the processing is computationally intense, this can lead to a significant delay before the user is warned that the process must be performed again.
In principle, the problem could be solved by arranging for the images to be captured more rapidly, so that there is less time for relative motion of the imaging system and object between the images being captured. However, it has been found that even an imaging rate of 60 images per second is not sufficient to solve the problem, and capturing images more quickly than this without dramatically increasing the cost of the imaging system is a significant engineering challenge.
Summary of the invention
The present invention aims to provide new and useful methods and systems for obtaining three-dimensional (3D) models of a 3D object, and optionally displaying images of the models.
In general terms, the invention proposes that, in a 3D imaging system in which an object is illuminated (preferably successively) from at least three directions (by energy generated by at least one energy source) and at least three respective images of the object are captured, corresponding features in different ones of the images are identified, and the positions of the features in the images are used to estimate motion of the object relative to the energy sensors. The estimated motion is used to register the images in a common coordinate system in which the object is stationary (i.e. the respective positions and directions of the respective viewpoints from which the images were captured are found in the common reference frame), and thereby correct for the relative motion of the object and imaging system between different times at which the images were captured.
This makes it possible for an embodiment of the invention to omit the localization target of WO 2009/122200, and yet perform 3D modeling of the object with high accuracy.
In fact, it is already known in the field of computer vision that when multiple images of a scene are captured by a camera moving relative to the scene, the images can be “stitched together” by a process known as homography. Sophisticated algorithms for homography exist, based on matching features in the images. However, existing homography algorithms assume that the scene is planar or at a very great distance from the camera, whereas when imaging a 3D object located in front of a background, as in the present invention, the images captured by the energy sensors may include images of items which are proportionally much further from the camera than other items in the image. Specifically, the object which is being imaged typically has a fairly small depth range (e.g. in the range 350mm-550mm from the camera) but it is in front of a background which may be at least a meter away from the energy sensors. If the conventional homography algorithms are applied to such images, a very poor result is to be expected, leading to a very poor 3D model of the object.
In such applications, preferred embodiments of the invention make a selection of features within at least one of the images (“feature selection image(s)”) which are likely to be associated with landmarks on the imaged object, rather than the background. Only these features (“reference features”), and the corresponding features in other of the images, are used in the algorithm for determining the motion between the images. A first way of selecting the reference features is based on a knowledge of the scene. For example, if the background has a predetermined color or pattern, areas of the images which show the background may be identified using the color or pattern, and the offset calculation would not use features in those areas. A second way of selecting the reference features is based on the intensities in the images. This is motivated by the observation that objects distant from the energy source(s) and the energy sensors are more likely to appear dark: first, because of the “near/far” effect, according to which energy received from nearby energy sources is greater than that received from distant ones; and secondly, because the user of the system will often have been careful to ensure that the object is illuminated by the directional energy sources. Thus, light generated by the directional energy source(s) (which may be in flashes) typically dominates ambient light falling onto the background. Furthermore, the user may be instructed to ensure that there are no bright light sources in the background.
In a first possibility, the reference features may be selected as features which have an intensity above a threshold.
Alternatively or additionally, the reference features may be selected by identifying areas of the images (e.g. areas which are at least 5x5 pixels in size, or at least 10x10 pixels in size, or at least 20x20 pixels in size) with a relatively high average intensity, and selecting features which are located within those areas (that is, not considering features in areas identified as having a low average intensity). The “average” may be the mean intensity, the median intensity, or any other average value.
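By way of a hedged sketch of this area-based selection (the block size, intensity threshold and function names are illustrative assumptions, not taken from the disclosure), candidate features could be kept only if the surrounding image block has a sufficiently high mean intensity:

```python
import numpy as np

def select_bright_features(image, candidates, block=20, thresh=80):
    """Keep candidate features lying in image blocks whose mean intensity is high.

    image      : (H, W) greyscale image as a NumPy array
    candidates : iterable of (x, y) pixel coordinates of candidate features
    block      : side length of the square area examined around each feature
    thresh     : minimum mean intensity for an area to count as "bright"
    """
    selected = []
    for (x, y) in candidates:
        y0, x0 = int(y) - block // 2, int(x) - block // 2
        patch = image[max(y0, 0):y0 + block, max(x0, 0):x0 + block]
        if patch.size and patch.mean() > thresh:
            selected.append((x, y))
    return selected
```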
Note that this way of selecting features may be carried out even in a system which includes only a single energy sensor. A third way of selecting reference features is by estimating the distance of the corresponding landmarks from the imaging system, and selecting from those landmarks ones within a predetermined distance range from the imaging system, e.g. less than a certain distance from a certain point on the imaging system. The corresponding features are then used as the reference features.
Distance may be calculated in several ways. One way would be using a depth camera, e.g. one using sheet-of-light triangulation, structured light (that is, light having a specially designed light pattern), time-of-flight or interferometry.
Another way of finding the distances of landmarks is stereoscopically, using a stereo pair of images captured at the same time by different respective energy sensors from different respective viewpoints having a known positional relationship. Since the stereo pair was captured at the same time, any motion of the imaging system relative to the object affects them both equally. An approximate distance can be obtained for any landmarks which cause features in both images of the stereo pair. The distance can be used to select corresponding reference features. Corresponding features can then be identified in other of the images captured by the imaging system at other times, to register those images with the stereo pair of images.
Once the reference features in the feature selection image(s) have been identified, and the corresponding features in the other images have been found, a computational algorithm is used to estimate the motion relative to the object of the energy sensors which captured those images. The algorithm may be a known homographic algorithm.
Optionally, the algorithm may incorporate any prior knowledge of motion between the object and the imaging system. For example, if it is known that the object is moving past the imaging system on a conveyor belt at a certain speed, that information may be used to give an initial estimate of the relative motion between times at which two images were captured.
As mentioned above, the object is preferably illuminated successively in individual ones of the at least three directions. If this is done, the energy sources may emit light of the same frequency spectrum (e.g. if the energy is visible light, the directional light sources may each emit white light, and the captured images may be color images). However, in principle, the object could alternatively be illuminated in at least three directions by energy sources which emit energy with different respective frequency spectra (e.g. in the case of visible light, the directional light sources may respectively emit red, green and blue light). In this case, the directional energy sources could be activated simultaneously, if the energy sensors are able to distinguish the energy spectra. For example, the energy sensors might be adapted to record received red, green and blue light separately. That is, the red, green and blue light channels of the captured images would be captured simultaneously, and would respectively constitute the images in which the object is illuminated in a single direction. However, this second possibility is not preferred, because coloration of the object may lead to incorrect photometric imaging.
Optionally, the algorithm may include calculating a quality control index, determining whether the quality control index is above or below a threshold, and if the quality control index is below (or in other embodiments, above) the threshold issuing a warning to a user of the imaging system. For example, the quality control index may simply be a measure of the offset between two or more of the images. In this case, the threshold may be set to warn the user when the offset is sufficiently great that the 3D imaging process may be unreliable. In this way, an embodiment of the invention may be able to issue a warning to the user before the computationally-complex formation of the 3D model of the object is carried out. Optionally, if the warning signal is issued, the imaging system may not form the 3D model.
Optionally, if features are found in at least three images captured at different known times by a single energy sensor, the algorithm may assume that the relative motion of the object and energy sensor was uniform over the period in which the at least three images were captured, and use this assumption to improve the estimation of the respective viewpoints from which the images were captured. For example, if one of the three images is darker than the others, such that the landmarks cannot be identified in that image, the algorithm may use a relative motion of the object and energy sensor calculated from the other two images to estimate the viewpoint from which the dark image was captured. This possibility is particularly useful if the three images include an image which is darker because it was captured at a time when none of the energy sources was illuminating the object (e.g. because it was desired to measure how much ambient light the object reflects).
Once the registration of the images has been performed, the 3D model of the object may be reconstructed from some or all of the images, such as using the methods explained in WO 2009/122200. Specifically, an initial model of the 3D object may be formed stereoscopically from two or more of the images (“stereo pairs”) which were captured, preferably simultaneously, by energy sensors at different spatial locations, and this initial model may be refined using photometric data from at least three of the images which were captured (preferably successively) when the object was illuminated in the respective directions. Note that optionally, for a given energy sensor, one of the latter images may be one of the stereo pair of images.
If the stereo pair of images are captured simultaneously by energy sensors with a fixed, known geometrical relationship, then those images may be registered with each other by knowledge of the geometrical relationship, rather than using the features according to the present inventive concept. However, the present inventive concept is used to register ones of the images which were not captured at the same time into a common coordinate system in which the object is stationary.
In particular, the present inventive concept may be used, for each of the energy sensors, to mutually register the set of images captured at different times by that energy sensor.
Optionally, the inventive concept may also be used to register a set of images taken by one of the cameras with respective set(s) of images taken with other of the camera(s).
However, more typically, if each set of images includes a respective image taken simultaneously (which, as mentioned above, they typically do, since this is preferably true of the stereo pair of images), once each set of images are mutually registered, each set of images may be registered with the other set(s) of images using the known geometrical relationship between the respective viewpoints of the simultaneously taken pair of images.
The energy used is electromagnetic radiation, i.e. light. The term “light” is used in this document to include electromagnetic radiation which is not in the visible spectrum. Various forms of directional energy source may be used in embodiments of the invention. Examples include a standard photographic flash, a high-brightness LED cluster, a Xenon flash bulb, or a 'ring flash' of small diameter (if the diameter is too large, it will not be a directional light source, though the source may still be useful for the stereoscopy). It will be appreciated that the energy need not be in the visible light spectrum. One or more of the energy sources may be configured to generate light in the infrared (IR) spectrum (wavelengths from 700nm to 1 mm) or part of the near infrared spectrum (wavelengths from 700nm to 1100nm). Optionally, the energy may be polarized.
Where visible-light directional energy is applied, then the energy sensors may be two or more standard digital cameras, or video cameras, or CMOS sensors and lenses appropriately mounted. In the case of other types of directional energy, sensors appropriate for the directional energy used are adopted. A discrete energy sensor may be placed at each viewpoint, or in another alternative a single sensor may be located behind a split lens or in combination with a mirror arrangement.
The energy sources and viewpoints preferably have a known positional relationship, which is typically fixed. The energy sensor(s) and energy sources may be incorporated in a portable apparatus, such as a hand-held instrument. Alternatively, the energy sensor(s) and energy sources may be incorporated in an apparatus which is mounted in a building.
Although at least three illumination directions are required for photometric imaging, the number of illumination directions may be higher than this. The timing of the illumination may be controlled by a processor, such as the one which calculates the relative motion of the object and energy sensor(s).
In principle, the energy to illuminate the object could be provided by a single energy source which moves between successive positions in which it illuminates the object in corresponding ones of the directions.
However, more typically at least three energy sources are provided. It would be possible for these sources to be provided as at least three energy outlets from an illumination system in which there are fewer than three elements which generate the energy. For example, there could be a single energy generation unit (light generating unit) and a switching unit which successively transmits energy generated by the single energy generation unit to respective input ends of at least three energy transmission channels (e.g. optical fibers). The energy would be output at the other ends of the energy transmission channels, which would be at three respective spatially separate locations. Thus the output ends of the energy transmission channels would constitute respective energy sources. The light would propagate from the energy sources in different respective directions.
The invention may be expressed as an apparatus for capturing images, including a processor for analyzing the images according to program instructions (which may be stored in non-transitory form on a tangible data storage device). Alternatively, it may be expressed as the method carried out by the apparatus.
Brief description of the drawings
An embodiment of the invention will now be described for the sake of example only with reference to the following figures in which:
Fig. 1 shows a first schematic view of an imaging assembly for use in an embodiment of the present invention to form a 3D model of an object;
Fig. 2 is a flow diagram of a method performed by an embodiment of Fig 1;
Fig. 3 shows, as Figs. 3(a) and 3(b), two images successively captured by one of the cameras of the embodiment of Fig. 1;
Fig. 4 illustrates sub-steps of a first possible implementation of a step of the method of
Fig. 2;
Fig. 5 illustrates sub-steps of a second possible implementation of a step of the method of Fig. 2;
Fig. 6 is composed of Figs. 6(a) which shows reference features identified in the image of Fig. 3(a), and Fig. 6(b) which shows corresponding features in the image of Fig. 3(b); and
Fig. 7 illustrates an embodiment of the invention incorporating the imaging assembly of Fig. 1 and a processor.
Detailed description of the embodiments
Referring firstly to Fig. 1, an imaging assembly is shown which is a portion of an embodiment of the invention. The imaging assembly includes an energy source 1. It further includes units 2, 3 which each include a respective energy sensor 2a, 3a in the form of an image capturing device, and a respective energy source 2b, 3b (note that in variations of the embodiment, the energy sensors 2a, 3a are not part of the same units as the energy sources 2b, 3b). The units 2, 3 are fixedly mounted to each other by a strut 6, and both are fixedly mounted to the energy source 1 by struts 4, 5. The exact form of the mechanical connection between the units 2, 3 and the energy source 1 is different in other forms of the invention, but it is preferable if it maintains the energy source 1 and the units 2, 3 at fixed distances from each other and at fixed relative orientations. The energy sources 1, 2b, 3b and image capturing devices 2a, 3a may be incorporated in a portable, hand-held instrument. In addition to the assembly shown in Fig. 1, the embodiment includes a processor which is in electronic communication with the energy sources 1, 2b, 3b and image capturing devices 2a, 3a. This is described below in detail with reference to Fig. 7.
The energy sources 1, 2b, 3b are each adapted to generate electromagnetic radiation, such as visible light or infra-red radiation. The energy sources 1, 2b, 3b are all controlled by the processor. The output of the image capturing devices 2a, 3a is transmitted to the processor.
Each of the image capturing devices 2a, 3a is arranged to capture an image of an object 7 (in Fig. 1, a dodecahedron) positioned in both the respective fields of view of the image capturing devices 2a, 3a. The image capturing devices 2a, 3a are spatially separated, and preferably also arranged with converging fields of view, so the apparatus is capable of providing two separated viewpoints of the object 7, so that stereoscopic imaging of the object 7 is possible.
The case of two viewpoints is often referred to as a “stereo pair”, although it will be appreciated that in variations of the embodiment more than two spatially-separated image capturing devices may be provided, so that the object 7 is imaged from more than two viewpoints. This may increase the precision and/or visible range of the apparatus. The words “stereo” and “stereoscopic” as used herein are intended to encompass, in addition to the possibility of the subject being imaged from two viewpoints, the possibility of the subject being imaged from more than two viewpoints.
Suitable image capture devices for use in the invention include the 1/3-Inch CMOS Digital Image Sensor (AR0330) provided by ON Semiconductor of Arizona, US.
Turning to Fig. 2, a method 100 according to the invention is shown.
In step 101, the image capturing devices 2a, 3a take a plurality of images of the object 7 over a certain time period, and transmit them to the processor. The plurality of images preferably comprises at least two images taken respectively by the image capturing devices 2a, 3a at the same time (that is, a stereo pair of simultaneously taken images); as discussed below this stereo pair can be used for generating an initial model of the object 7 stereoscopically, and optionally in other ways.
Furthermore, the plurality of images comprises, for each of the energy sources 1, 2b, 3b, at least one image taken by one of the image capturing devices 2a, 3a at a time when the processor controls that energy source to illuminate the object 7, and controls the other energy sources not to illuminate the object 7. These images, which of course are not all taken at the same time, will be used for photometric modelling of the object.
Note that one or more of the images used for the photometric modelling may be one of the stereo pairs used for stereoscopic modelling. Alternatively, the stereo pair of images may be captured at a time when all three of the energy sources 1, 2b, 3b are illuminating the object 7 (and/or the object 7 is being illuminated by other energy sources (not shown) which need not be directional), and the images for the photometric modelling may be captured at times when only one of the energy sources 1, 2b, 3b is illuminating the object 7.
Optionally, the set of images may include at least one “dark image” which is captured by one of the image capture devices 2a, 3a at a time when none of the energy sources is illuminating the object, so that the object is just reflecting ambient light. Such an image may be useful to measure how much ambient light the object reflects. For example, the ambient light reflected from each pixel may optionally be subtracted from the images used to perform photometry.
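A minimal sketch of this optional ambient correction (array names and types are assumptions; the patent does not prescribe an implementation) subtracts the dark frame from each directionally lit frame before photometry:

```python
import numpy as np

def subtract_ambient(lit_images, dark_image):
    """Remove the ambient-light contribution from each directionally lit image.

    lit_images : list of (H, W) uint8 images, each captured with one directional source on
    dark_image : (H, W) uint8 image captured with all directional sources off
    """
    dark = dark_image.astype(np.int32)
    return [np.clip(img.astype(np.int32) - dark, 0, 255).astype(np.uint8)
            for img in lit_images]
```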
Fig. 3 shows two of the images 9, 10 of the object 7 captured by one of image capture devices 2a, 3a at different respective times. Each image shows in the foreground a view of the object 7 from a different respective viewpoint. Each image also includes a portion showing whatever lies behind the object 7, in the background.
In step 102, for at least one of the images (“feature selection image(s)”), the processor seeks features of one or more of the feature selection images which are likely to correspond to (i.e. be images of) landmarks on the object 7. A first way in which step 102 may be carried out is shown in Fig. 4. In sub-step 201, the method identifies bright regions in a feature selection image. These regions are likely to correspond to areas of the object 7 rather than the background. Alternatively, if the background is known to have a certain color and/or pattern, then sub-step 201 may include rejecting all regions of the image which have this color or pattern.
In sub-step 202, the processor identifies features in the bright regions of the feature selection image. This may be done using any standard algorithm for identifying features. Such algorithms are used in the methods for performing stereoscopic modelling disclosed in WO 2009/122200.
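One possible sketch of sub-steps 201 and 202 using OpenCV (the corner detector, the morphological clean-up and the thresholds are assumptions; any standard feature detector could be substituted) builds a mask of bright regions and detects features only inside it:

```python
import cv2
import numpy as np

def detect_reference_features(image, brightness_thresh=80, max_features=200):
    """Detect candidate reference features only inside bright regions.

    image : (H, W) uint8 greyscale feature selection image.
    Sub-step 201: mask out dark regions (assumed to be background).
    Sub-step 202: run a standard corner detector restricted to the mask.
    """
    mask = (image > brightness_thresh).astype(np.uint8) * 255
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    corners = cv2.goodFeaturesToTrack(image, maxCorners=max_features,
                                      qualityLevel=0.01, minDistance=7, mask=mask)
    return np.empty((0, 2)) if corners is None else corners.reshape(-1, 2)
```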
For example, if the feature selection image is the image 9 of Fig. 3, the set of identified features may be at the locations shown in Fig. 6(a). All these are vertices of the dodecahedron 7. They are used as “reference features”.
A second way in which step 102 may be carried out is shown in Fig. 5. In this case, step 102 is performed using two feature selection images, which are a stereo pair of images simultaneously captured by the respective image capturing devices 2a, 3a.
In this case, step 102 includes a sub-step 301 of the processor identifying features in the stereo pair of feature selection images. The algorithm for identifying features may be the same as explained above in relation to sub-step 202, but unlike in sub-step 202 each of the features of one of the feature selection images is matched with a corresponding feature in the other of the feature selection images. In other words, a number of pairs of features are identified, with the features of each pair being in the different respective feature selection images. The two features of each pair correspond to (i.e. are images of) the same landmark on the object 7 or on the background. The system of WO 2009/122200 makes use of well-known algorithms for matching features in this way.
In sub-step 302, the processor uses stereoscopy, and the known positional relationship of the image capturing devices 2a, 3a to determine, for each of the feature pairs, the position of the corresponding landmark in a three-dimensional space defined based on the imaging system. From this, the distance is obtained of each landmark from a position in the imaging system. The processor then rejects all landmarks which are found to be out of a certain distance range from the position in the imaging system, e.g. landmarks with a distance from the position in the imaging system which is outside a certain distance range defined by one or more distance parameters (e.g. greater than 2 meters or less than 50 mm; or greater than 550mm or less than 350mm), and the features corresponding to the remaining landmarks are used as the reference features. The distance range may be chosen to reflect the distance of the object 7 from the imaging system. Optionally, if the number of feature pairs for which the corresponding location is in this range turns out to be high, the process may be repeated using a narrower distance range.
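A hedged sketch of sub-steps 301 and 302 (the projection matrices, the assumption that the first camera defines the origin, and the 350-550 mm window are illustrative) triangulates each matched pair and rejects landmarks outside the expected distance range:

```python
import cv2
import numpy as np

def filter_features_by_depth(P1, P2, pts1, pts2, near=350.0, far=550.0):
    """Triangulate matched feature pairs and keep those within a distance range.

    P1, P2     : 3x4 projection matrices of the calibrated stereo pair (mm units),
                 with the first camera assumed to sit at the origin
    pts1, pts2 : (N, 2) arrays of matched feature positions in the two images
    near, far  : accepted distance range of landmarks from the imaging system
    """
    X_h = cv2.triangulatePoints(P1, P2, pts1.T.astype(np.float64),
                                pts2.T.astype(np.float64))
    X = (X_h[:3] / X_h[3]).T                 # N x 3 landmark positions
    dist = np.linalg.norm(X, axis=1)         # distance from the reference origin
    keep = (dist > near) & (dist < far)
    return pts1[keep], pts2[keep], X[keep]
```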
Returning to Fig. 2, in step 103 the processor finds features in all the other images (i.e. the images other than the feature selection image(s)) which correspond to the features of the feature selection image(s) identified in step 102. For example, the features in the image 10 of Fig. 3(b) which correspond to the reference features shown in Fig. 6(a), are at the locations shown by dots in Fig. 6(b).
In step 104, the system uses a known homography algorithm, using the features identified in steps 102 and 103, to determine the viewpoints of all the images relative to the object 7 in a common coordinate system. In this way, all the images are registered with each other accurately in the common coordinate system.
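A sketch of this step using OpenCV's RANSAC homography estimator (as a stand-in for whichever known homography algorithm is chosen; the reprojection threshold is an assumption) maps the reference features of the feature selection image onto the corresponding features found in another image:

```python
import cv2
import numpy as np

def register_to_reference(ref_pts, other_pts, ransac_thresh=3.0):
    """Estimate the homography mapping reference features into another image.

    ref_pts, other_pts : (N, 2) arrays of corresponding feature positions
                         (N >= 4) in the feature selection image and in one
                         of the other captured images.
    """
    H, mask = cv2.findHomography(ref_pts.astype(np.float32),
                                 other_pts.astype(np.float32),
                                 method=cv2.RANSAC,
                                 ransacReprojThreshold=ransac_thresh)
    if H is None:
        raise ValueError("too few consistent correspondences to register the images")
    return H, mask.ravel().astype(bool)   # 3x3 transform and inlier flags
```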
Optionally, there may be at least one respective feature selection image for each of the image capturing devices 2a, 3a. If so, all the images captured by one of the image capturing devices may be registered together into a first common coordinate system in which the object is stationary (that is, “mutually registered”) based on the respective feature selection image for that image capturing device. Similarly, all the images captured by the other of the image capturing devices may be mutually registered into a second common coordinate system in which the object is stationary, based on the respective feature selection image for the other image capturing device.
The images captured by one of the image capturing devices 2a include at least one image captured at the same time as an image captured by the second image capturing device 3a (i.e. the at least one stereo pair of images). Thus, the fixed geometrical relationship of the positions and imaging directions of the image capturing devices 2a, 3a may be used to register the respective viewpoints of the two simultaneously captured images. Using this information, all the mutually registered images captured by one of the image capturing devices 2a are registered with all the mutually registered images captured by the other image capturing device 3a, so that all the images captured by both the image capturing devices 2a, 3a have known viewpoints in a common reference frame in which the object is stationary.
The homography algorithm may incorporate any prior knowledge of motion between the object and the imaging system. For example, if it is known that the object is moving past the imaging system on a conveyor belt at a certain speed, that information may be used to give an initial estimate of the relative motion between times at which two images were captured. The initial estimate may be refined using the identified features.
Optionally, the homography algorithm includes calculating a quality control index, determining whether the quality control index is above or below a threshold, and if the quality control index is below the threshold issuing a warning to a user of the imaging system. For example, the quality control index may simply be a measure of the offset between two or more of the images. In this case, the threshold may be set to warn the user when the offset is sufficiently great that the 3D imaging process may be unreliable. In this way, the embodiment may be able to issue a warning to the user before the computationally-complex formation of the 3D computer model is carried out (see step 105 below).
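As a minimal sketch only (the index definition and the pixel threshold are assumptions; the patent leaves both open), the quality control index could be taken as the mean displacement of the matched features between two images:

```python
import numpy as np

def motion_quality_index(ref_pts, other_pts, max_offset_px=2.0):
    """Return a simple offset-based quality index and whether to warn the user.

    ref_pts, other_pts : (N, 2) matched feature positions in two of the images
    max_offset_px      : offset above which the 3D imaging may be unreliable
    """
    offset = float(np.mean(np.linalg.norm(other_pts - ref_pts, axis=1)))
    return offset, offset > max_offset_px

offset, warn = motion_quality_index(np.zeros((5, 2)), np.full((5, 2), 3.0))
if warn:
    print(f"Warning: offset of {offset:.1f} px between images; re-capture recommended")
```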
Optionally, if features are found in at least three images captured at different known times by a single energy sensor, the algorithm may assume that the relative motion of the object and energy sensor was uniform over the period in which the at least three images were captured, and use this assumption to improve the estimation of the motion. For example, if one of the images is darker than the others, such that the reference features cannot be identified in that image, the algorithm may use a relative motion calculated from the other images to estimate the viewpoint from which the dark image was captured. This possibility is particularly useful if the images include a “dark image”, in which the reference features of the object 7 are hard to identify: the viewpoint from which the dark image was captured may be inferred from the respective determined viewpoints of two images captured at neighboring times. For example, the viewpoint of a dark image taken by one of the image capture devices 2a, 3a may be assumed to be an average of the viewpoints of an immediately preceding image and an immediately succeeding image taken by the same image capture device (i.e. by interpolation). Or, the viewpoint of a dark image may be inferred by extrapolation from the viewpoints of two or more preceding images, or two or more succeeding images. Since interpolation is usually more accurate than extrapolation, the former possibility is preferred, which suggests that the dark image captured by a certain image capture device should be captured in the middle of the sequence of images captured by that image capture device.
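A sketch of the preferred interpolation (the rotation-plus-position pose representation and the use of SciPy's spherical linear interpolation are assumptions; the patent only requires averaging the neighbouring viewpoints) could look like this:

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_viewpoint(t_prev, pose_prev, t_next, pose_next, t_dark):
    """Estimate the viewpoint of a dark image from its temporal neighbours.

    pose_prev, pose_next : (Rotation, position) pairs determined for the images
                           captured immediately before and after the dark image
    t_prev, t_next, t_dark : capture times; uniform relative motion is assumed
    """
    alpha = (t_dark - t_prev) / (t_next - t_prev)
    rots = Rotation.from_quat(np.vstack([pose_prev[0].as_quat(),
                                         pose_next[0].as_quat()]))
    R_dark = Slerp([0.0, 1.0], rots)([alpha])[0]                # interpolated orientation
    p_dark = (1 - alpha) * pose_prev[1] + alpha * pose_next[1]  # interpolated position
    return R_dark, p_dark
```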
In step 105, the method uses the images to form a 3D model of the object. This can be by the method described in WO 2009/122200. In brief, two acquisition techniques are used to construct the 3D model.
One technique of acquisition is passive stereoscopic reconstruction, which calculates surface depth based on optical triangulation. This is based around known principles of optical parallax. This technique generally provides good unbiased low-frequency information (the coarse underlying shape of the surface of the object), but is noisy or lacks high frequency detail. The other technique is photometric reconstruction, in which surface orientation is calculated from the observed variation in reflected energy against the known angle of incidence of the directional source. This provides a relatively high-resolution surface normal map alongside a map of relative surface reflectance (or illumination-free color), which may be integrated to provide depth, or range, information which specifies the 3D shape of the object surface. Inherent to this method of acquisition is output of good high-frequency detail, but there is also the introduction of low-frequency drift, or curvature, rather than absolute metric geometry because of the nature of the noise present in the imaging process. Thus the two methods can be seen to be complementary. The model may be formed by forming an initial model of the shape of the object 7 using stereoscopic reconstruction, and then refining the model using the photometric data.
The stereoscopic reconstruction uses optical triangulation, by geometrically correlating pairs of features in the respective stereo pair of images captured by the image capture devices 2a, 3a to give the positions of each of the corresponding landmarks in a three-dimensional space defined based on the imaging system. If step 102 was performed in the way shown in Fig. 5, then these steps have already been performed, and need not be repeated. The positions of the landmarks are then used to form the initial model of the object 7. Note that in variations of the embodiment the initial model of the object 7 may be formed in other ways, such as using a depth camera.
The photometric reconstruction requires an approximating model of the surface material reflectivity properties. In the general case this may be modelled (at a single point on the surface) by the Bidirectional Reflectance Distribution Function (BRDF). A simplified model is typically used in order to render the problem tractable. One example is the Lambertian Cosine Law model. In this simple model the intensity of the surface as observed by the camera depends only on the quantity of incoming irradiant energy from the energy source and foreshortening effects due to surface geometry on the object. This may be expressed as: I = P ρ (L · N) (Eqn 1), where I represents the intensity observed by the image capture devices 2a, 3a at a single point on the object, P the incoming irradiant light energy at that point, N the object-relative surface normal vector, L the normalized object-relative direction of the incoming lighting and ρ the Lambertian reflectivity of the object at that point. Typically, variation in P and L is pre-known from a prior calibration step, or from knowledge of the position of the energy sources 1, 2b, 3b, and this (plus the knowledge that N is normalized) makes it possible to recover both N and ρ at each pixel. Since there are three degrees of freedom (two for N and one for ρ), intensity values I are needed for at least three directions L in order to uniquely determine both N and ρ. This is why three energy sources 1, 2b, 3b are provided.
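A compact sketch of recovering N and ρ from Eqn 1 (assuming calibrated unit lighting directions of equal power, so that P can be folded into the measured intensities; variable names are illustrative):

```python
import numpy as np

def photometric_stereo(intensities, light_dirs):
    """Recover per-pixel surface normals and albedo from >= 3 directionally lit images.

    intensities : (K, H, W) stack of images, one per lighting direction
    light_dirs  : (K, 3) unit vectors pointing towards each directional source

    Solves I = rho * (L . N) per pixel in the least-squares sense.
    """
    K, H, W = intensities.shape
    I = intensities.reshape(K, -1).astype(np.float64)    # K x (H*W)
    G, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)   # G = rho * N, shape 3 x (H*W)
    rho = np.linalg.norm(G, axis=0)                      # Lambertian reflectivity
    N = np.divide(G, rho, out=np.zeros_like(G), where=rho > 0)
    return N.reshape(3, H, W), rho.reshape(H, W)
```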
The data obtained by the photometric and stereoscopic reconstructions is fused by treating the stereoscopic reconstruction as a low-resolution skeleton providing a gross-scale shape of the object, and using the photometric data to provide high-frequency geometric detail and material reflectance characteristics.
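The following is a deliberately naive sketch of such a fusion (the two-path gradient integration and the Gaussian low/high-frequency split are illustrative stand-ins for the more careful fusion described in WO 2009/122200; the smoothing scale is an assumption):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fuse_depth(stereo_depth, normals, sigma=8.0):
    """Blend a coarse stereo depth map with detail from photometric normals.

    stereo_depth : (H, W) depth map from stereoscopic triangulation (coarse shape)
    normals      : (H, W, 3) unit surface normals from photometric stereo
    sigma        : scale (in pixels) separating low and high spatial frequencies
    """
    nz = np.clip(normals[..., 2], 1e-3, None)
    p = -normals[..., 0] / nz                     # surface gradient dz/dx
    q = -normals[..., 1] / nz                     # surface gradient dz/dy
    z_pm = 0.5 * (np.cumsum(p, axis=1) + np.cumsum(q, axis=0))  # crude integration
    # Low frequencies come from stereo, high frequencies from photometry.
    return gaussian_filter(stereo_depth, sigma) + (z_pm - gaussian_filter(z_pm, sigma))
```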
Fig. 7 is a block diagram showing a technical architecture of the overall system 200 for performing the method.
The technical architecture includes a processor 322 (which may be referred to as a central processor unit or CPU) that is in communication with the cameras 2a, 3a, for controlling when they capture images and receiving the images. The processor 322 is further in communication with, and able to control, the energy sources 1, 2b, 3b.
The processor 322 is also in communication with memory devices including secondary storage 324 (such as disk drives or memory cards), read only memory (ROM) 326, random access memory (RAM) 328. The processor 322 may be implemented as one or more CPU chips.
The system 200 includes a user interface (UI) 330 for controlling the processor 322. The UI 330 may comprise a touch screen, keyboard, keypad or other known input device. If the UI 330 comprises a touch screen, the processor 322 is operative to generate an image on the touch screen. Alternatively, the system may include a separate screen (not shown) for displaying images under the control of the processor 322.
The system 200 optionally further includes a unit 332 for forming 3D objects designed by the processor 322; for example the unit 332 may take the form of a 3D printer. Alternatively, the system 200 may include a network interface for transmitting instructions for production of the objects to an external production device.
The secondary storage 324 is typically comprised of a memory card or other storage device and is used for non-volatile storage of data and as an over-flow data storage device if RAM 328 is not large enough to hold all working data. Secondary storage 324 may be used to store programs which are loaded into RAM 328 when such programs are selected for execution.
In this embodiment, the secondary storage 324 has an order generation component 324a, comprising non-transitory instructions operative by the processor 322 to perform various operations of the method of the present disclosure. The ROM 326 is used to store instructions and perhaps data which are read during program execution. The secondary storage 324, the RAM 328, and/or the ROM 326 may be referred to in some contexts as computer readable storage media and/or non-transitory computer readable media.
The processor 322 executes instructions, codes, computer programs, scripts which it accesses from hard disk, floppy disk, optical disk (these various disk based systems may all be considered secondary storage 324), flash drive, ROM 326, RAM 328, or the network connectivity devices 332. While only one processor 322 is shown, multiple processors may be present. Thus, while instructions may be discussed as executed by a processor, the instructions may be executed simultaneously, serially, or otherwise executed by one or multiple processors.
Whilst the foregoing description has described exemplary embodiments, it will be understood by those skilled in the art that many variations of the embodiment can be made within the scope of the attached claims.

Claims (22)

Claims
1. An apparatus for computing a three-dimensional (3D) model of an object, comprising: at least one directional energy source arranged to directionally illuminate the object in at least three directions; an imaging sensing assembly having at least one energy sensor arranged to capture images of the object, the captured images including images respectively captured when the object is illuminated in a corresponding one of the directions; a processor arranged to analyze the captured images, by: identifying reference features in at least one feature selection image among the captured images; finding features in other of the captured images corresponding to the reference features; using the respective positions of the reference features in the at least one feature selection image and corresponding features in the other images, to determine the difference between (i) the viewpoint relative to the object from which the at least one feature selection image was captured, and (ii) the viewpoint relative to the object from which the other captured images were captured, thereby registering the captured images in a common coordinate system in which the object is at rest; and using the registered images to obtain a 3D model of the object.
2. An apparatus according to claim 1 in which the processor is adapted to obtain the 3D model of the object from the registered images by: obtaining an initial model of the object; and obtaining photometric data from the images captured respectively when the object is illuminated in a corresponding one of the directions; and refining the initial model using the photometric data.
3. An apparatus according to claim 2 in which the processor is arranged to obtain the initial model of the object by stereoscopic reconstruction using optical triangulation using at least two of said captured images which were captured simultaneously by at least two respective said energy sensors having known positions relative to each other.
4. An apparatus according to claim 1, claim 2 or claim 3 in which the processor is arranged to identify the reference features in the at least one feature selection image, by: deriving a plurality of candidate features in the at least one feature selection image, each of the candidate features being an image of a corresponding landmark in the field of view of energy sensor which captured the feature selection image; determining which of the candidate features meet a criterion; and selecting the reference features from among the candidate features, the reference features comprising the candidate features which meet the criterion.
5. An apparatus according to claim 4 in which the processor is arranged to determine which of the candidate features meet the criterion by: identifying one or more regions of the feature selection image which satisfy a characteristic, the criterion being whether the candidate feature is within one of the identified regions.
6. An apparatus according to claim 5 in which the characteristic is that an average brightness of the region is above a threshold.
7. An apparatus according to claim 4 in which each of the candidate features is associated with a brightness level in the at least one feature selection image, and the criterion is whether the brightness level is above the threshold.
8. An apparatus according to claim 4 in which the selection of the reference features from among the candidate features includes, for each candidate feature, estimating a corresponding distance of the corresponding landmark from the imaging system, and said criterion is whether the corresponding distance is within a distance range.
9. An apparatus according to claim 8 in which the processor is arranged to use at least two said feature selection images, the feature selection images being ones of the captured images which were captured simultaneously by respective spaced apart said energy sensors with a known mutual positional relationship, the processor being arranged to estimate the distance of the corresponding landmark from the imaging system by: identifying the candidate features in the respective feature selection images which are images of the landmark; and geometrically calculating the distance of the corresponding landmark using the positions in the feature selection images of the identified candidate features, and the known mutual positional relationship of the energy sensors which captured the feature selection images.
10. An apparatus according to any preceding claim in which the processor is arranged to: use the determined differences between the viewpoints relative to the object from which the images were captured to calculate a quality control index, determine whether the quality control index is above or below a threshold, and according to the determination of whether the quality control index is above or below the threshold, to issue a warning signal.
11. An apparatus according to any preceding claim in which the processor is arranged, for a set of at least three said captured images captured by a single one of the energy sensors at different respective times, to use the determined difference between (i) the viewpoint relative to the object from which the at least one feature selection image was captured, and (ii) the viewpoint relative to the object from which at least two of the set of captured images were captured, to estimate the viewpoint relative to the object from which another of the set of captured images was captured.
12. A computerized method for forming a three-dimensional (3D) model of an object, the method comprising: (a) illuminating the object in at least three directions; (b) using at least one energy sensor to capture images of the object, the captured images including images respectively captured when the object is illuminated in a corresponding one of the directions; (c) identifying reference features in at least one feature selection image among the captured images; (d) finding features in other of the captured images corresponding to the reference features; (e) using the respective positions of the reference features in the at least one feature selection image and corresponding features in the other images, to determine the difference between (i) the viewpoint relative to the object from which the at least one feature selection image was captured, and (ii) the viewpoint relative to the object from which the other captured images were captured, thereby registering the captured images in a common coordinate system in which the object is at rest; and (f) using the registered images to obtain a 3D model of the object.
13. A method according to claim 12 in which step (f) is performed by: obtaining an initial model of the object; and obtaining photometric data from the images captured respectively when the object is illuminated in a corresponding one of the directions; and refining the initial model using the photometric data.
14. A method according to claim 13 in which the initial model of the object is obtained by stereoscopic reconstruction using optical triangulation using at least two of said captured images which were captured simultaneously by at least two respective said energy sensors having known positions relative to each other.
15. A method according to claim 12, claim 13 or claim 14 in which step (c) is performed by: deriving a plurality of candidate features in the at least one feature selection image, each of the candidate features being an image of a corresponding landmark in the field of view of energy sensor which captured the feature selection image; determining which of the candidate features meet a criterion; and selecting the reference features from among the candidate features, the reference features comprising the candidate features which meet the criterion.
16. A method according to claim 15, said determination of which of the candidate features meet the criterion including identifying one or more regions of the feature selection image which satisfy a characteristic, the criterion being whether the candidate feature is within one of the identified regions.
17. A method according to claim 16 in which the characteristic is that an average brightness of the region is above a threshold.
18. A method according to claim 15 in which each of the candidate features is associated with a brightness level in the at least one feature selection image, and the criterion is whether the brightness level is above the threshold.
19. A method according to claim 15 in which the selection of the reference features from among the candidate features includes, for each candidate feature, estimating a corresponding distance of the corresponding landmark from the imaging system, and said criterion is whether the corresponding distance is within a distance range.
20. A method according to claim 19 in which there are at least two said feature selection images, the feature selection images being ones of the captured images which were captured simultaneously by respective spaced apart energy sensors with a known mutual positional relationship, the estimation of the distance of the corresponding landmark from the imaging system being performed by: identifying the candidate features in the respective feature selection images which are images of the landmark; and geometrically calculating the distance of the corresponding landmark using the positions in the feature selection images of the identified candidate features, and the known mutual positional relationship of the energy sensors which captured the feature selection images.
21. A method according to any of claims 12 to 20, further comprising: using the determined differences between the viewpoints relative to the object from which the images were captured to calculate a quality control index, determining whether the quality control index is above or below a threshold, and according to the determination of whether the quality control index is above or below the threshold, issuing a warning signal.
22. A method according to any of claims 12 to 21, further comprising, for a set of at least three said captured images captured by a single one of the energy sensors at different respective times, using the determined difference between (i) the viewpoint relative to the object at which the at least one feature selection image was captured, and (ii) the viewpoint relative to the object from which at least two of the set of captured images were captured, to estimate the viewpoint relative to the object from which another of the set of captured images was captured.
GB1519398.0A 2015-11-03 2015-11-03 Systems and methods for imaging three-dimensional objects Withdrawn GB2544263A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
GB1519398.0A GB2544263A (en) 2015-11-03 2015-11-03 Systems and methods for imaging three-dimensional objects
EP16788779.3A EP3371780A1 (en) 2015-11-03 2016-10-31 System and methods for imaging three-dimensional objects
PCT/GB2016/053368 WO2017077277A1 (en) 2015-11-03 2016-10-31 System and methods for imaging three-dimensional objects

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1519398.0A GB2544263A (en) 2015-11-03 2015-11-03 Systems and methods for imaging three-dimensional objects

Publications (2)

Publication Number Publication Date
GB201519398D0 GB201519398D0 (en) 2015-12-16
GB2544263A true GB2544263A (en) 2017-05-17

Family ID=55130599

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1519398.0A Withdrawn GB2544263A (en) 2015-11-03 2015-11-03 Systems and methods for imaging three-dimensional objects

Country Status (3)

Country Link
EP (1) EP3371780A1 (en)
GB (1) GB2544263A (en)
WO (1) WO2017077277A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102225321B1 (en) * 2019-04-19 2021-03-09 주식회사 스트리스 System and method for building road space information through linkage between image information and position information acquired from a plurality of image sensors

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2458927B (en) * 2008-04-02 2012-11-14 Eykona Technologies Ltd 3D Imaging system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020006217A1 (en) * 2000-04-28 2002-01-17 Orametrix, Inc. Methods for registration of three-dimensional frames to create three-dimensional virtual models of objects
US20040258309A1 (en) * 2002-12-07 2004-12-23 Patricia Keaton Method and apparatus for apparatus for generating three-dimensional models from uncalibrated views
US20110007072A1 (en) * 2009-07-09 2011-01-13 University Of Central Florida Research Foundation, Inc. Systems and methods for three-dimensionally modeling moving objects
US20140022248A1 (en) * 2012-07-20 2014-01-23 Google Inc. Determining Three-Dimensional (3D) Object Data Models Based On Object Movement

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11850025B2 (en) 2011-11-28 2023-12-26 Aranz Healthcare Limited Handheld skin measuring or monitoring device
US11250945B2 (en) 2016-05-02 2022-02-15 Aranz Healthcare Limited Automatically assessing an anatomical surface feature and securely managing information related to the same
US11923073B2 (en) 2016-05-02 2024-03-05 Aranz Healthcare Limited Automatically assessing an anatomical surface feature and securely managing information related to the same
US11116407B2 (en) 2016-11-17 2021-09-14 Aranz Healthcare Limited Anatomical surface assessment methods, devices and systems
US11903723B2 (en) 2017-04-04 2024-02-20 Aranz Healthcare Limited Anatomical surface assessment methods, devices and systems

Also Published As

Publication number Publication date
WO2017077277A1 (en) 2017-05-11
EP3371780A1 (en) 2018-09-12
GB201519398D0 (en) 2015-12-16

Similar Documents

Publication Publication Date Title
US9367952B2 (en) 3D geometric modeling and 3D video content creation
EP3371779B1 (en) Systems and methods for forming models of three dimensional objects
US10832429B2 (en) Device and method for obtaining distance information from views
JP6456156B2 (en) Normal line information generating apparatus, imaging apparatus, normal line information generating method, and normal line information generating program
CN106643699B (en) Space positioning device and positioning method in virtual reality system
CN104335005B (en) 3D is scanned and alignment system
US8090194B2 (en) 3D geometric modeling and motion capture using both single and dual imaging
EP3371780A1 (en) System and methods for imaging three-dimensional objects
US20100245851A1 (en) Method and apparatus for high-speed unconstrained three-dimensional digitalization
EP3381015B1 (en) Systems and methods for forming three-dimensional models of objects
JP2009288235A (en) Method and apparatus for determining pose of object
EP3069100B1 (en) 3d mapping device
EP3382645B1 (en) Method for generation of a 3d model based on structure from motion and photometric stereo of 2d sparse images
CN107850782A (en) Represent that strengthening depth map represents with reflectance map
US20150281676A1 (en) Optical system, apparatus and method for operating an apparatus using helmholtz reciprocity
EP1941451A2 (en) Reflectance and illumination properties from varying illumination
JP6556013B2 (en) PROCESSING DEVICE, PROCESSING SYSTEM, IMAGING DEVICE, PROCESSING METHOD, PROGRAM, AND RECORDING MEDIUM
JP6282377B2 (en) Three-dimensional shape measurement system and measurement method thereof
CN110260801A (en) Method and apparatus for measuring volume of material
EP3232153B1 (en) Precision hand-held scanner
Langmann Wide area 2D/3D imaging: development, analysis and applications
CN107392955B (en) Depth of field estimation device and method based on brightness
RU2685761C1 (en) Photogrammetric method of measuring distances by rotating digital camera
CN116601455A (en) Three-dimensional scanner with sensors having overlapping fields of view
Xu et al. High-resolution modeling of moving and deforming objects using sparse geometric and dense photometric measurements

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)