WO2011054040A1 - System and method for integration of spectral and 3-dimensional imaging data - Google Patents

System and method for integration of spectral and 3-dimensional imaging data Download PDF

Info

Publication number
WO2011054040A1
WO2011054040A1 PCT/AU2010/001464 AU2010001464W
Authority
WO
WIPO (PCT)
Prior art keywords
imaging device
image
interest
pixel
spectral
Prior art date
Application number
PCT/AU2010/001464
Other languages
French (fr)
Inventor
George V. Poropat
Original Assignee
Commonwealth Scientific And Industrial Research Organisation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2009905367A external-priority patent/AU2009905367A0/en
Application filed by Commonwealth Scientific And Industrial Research Organisation filed Critical Commonwealth Scientific And Industrial Research Organisation
Publication of WO2011054040A1 publication Critical patent/WO2011054040A1/en

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C11/00 Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
    • G01C11/02 Picture taking arrangements specially adapted for photogrammetry or photographic surveying, e.g. controlling overlapping of pictures
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01J MEASUREMENT OF INTENSITY, VELOCITY, SPECTRAL CONTENT, POLARISATION, PHASE OR PULSE CHARACTERISTICS OF INFRARED, VISIBLE OR ULTRAVIOLET LIGHT; COLORIMETRY; RADIATION PYROMETRY
    • G01J3/00 Spectrometry; Spectrophotometry; Monochromators; Measuring colours
    • G01J3/02 Details
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01J MEASUREMENT OF INTENSITY, VELOCITY, SPECTRAL CONTENT, POLARISATION, PHASE OR PULSE CHARACTERISTICS OF INFRARED, VISIBLE OR ULTRAVIOLET LIGHT; COLORIMETRY; RADIATION PYROMETRY
    • G01J3/00 Spectrometry; Spectrophotometry; Monochromators; Measuring colours
    • G01J3/02 Details
    • G01J3/0278 Control or determination of height or angle information for sensors or receivers
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01J MEASUREMENT OF INTENSITY, VELOCITY, SPECTRAL CONTENT, POLARISATION, PHASE OR PULSE CHARACTERISTICS OF INFRARED, VISIBLE OR ULTRAVIOLET LIGHT; COLORIMETRY; RADIATION PYROMETRY
    • G01J3/00 Spectrometry; Spectrophotometry; Monochromators; Measuring colours
    • G01J3/28 Investigating the spectrum
    • G01J3/2823 Imaging spectrometer


Abstract

A system for producing a spatially referenced spectral image of an area of interest, said system comprising: a first imaging device for capturing a set of images of the area of interest; a second imaging device for capturing a spectral image of the area of interest; and at least one processor adapted to produce a spatially referenced spectral image by compiling a 3-dimensional image of the area of interest from the set of images captured by said first imaging device and merging the 3-dimensional image and spectral image captured by the second imaging device, wherein the merging of the 3-dimensional image and spectral image is based on a predetermined relationship between the first and second imaging devices.

Description

TITLE
System and Method for Integration of Spectral and 3-Dimensional Imaging Data
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention relates to a system and method for providing spatial references for spectral data. In particular, although not exclusively, the present invention relates to the integration of hyperspectral or multispectral image data with 3-dimensional spatial data.
Discussion of the Background Art
Image spectroscopy has been utilised across a wide range of fields including agriculture, mining and medicine, as well as in some surveillance and reconnaissance applications. Two specific forms of image spectroscopy which have broad application are multispectral and hyperspectral imaging. Essentially the two techniques form an image of an area or sample of interest by capturing measurements of reflected or emitted electromagnetic radiation from the area or sample of interest. The resultant spectra are then combined to form an image.
The boundary between multispectral and hyperspectral is somewhat difficult to distinguish. One primary means of distinguishing the two methods of imaging is based on the number of spectral bands utilised to produce the images. Multispectral data contains from tens to hundreds of bands. Hyperspectral data contains hundreds to thousands of bands. However, hyperspectral imaging may be best defined by the manner in which the data is collected. Hyperspectral data is collected as a set of contiguous bands (usually by one sensor). By contrast, multispectral data is typically composed of a set of optimally chosen spectral bands that are typically not contiguous and can be collected from multiple sensors.
As mentioned above, the use of such imaging techniques has been employed in a number of industries. One example of the application of multispectral and hyperspectral imaging is in the identification and classification of the mineralogy of an area or sample of interest. As the spectral signatures of the various minerals of interest are known, it is possible to identify the location and quality of mineral deposits within an area or sample of interest.
Due to the dwindling nature of readily accessible ore bodies, the extraction of viable ore is becoming more costly, difficult and hazardous. The increasingly risky nature of mineral extraction has seen a growing trend in the mining sector toward automation of the extraction process. There are a number of challenges in automation of the extraction process, such as ensuring that the automated equipment extracts the appropriate grade of ore in a relatively efficient manner. While the use of multispectral and hyperspectral imaging can assist with the identification of an appropriate ore body, the task of directing the mining equipment to extract the ore is less straightforward. To properly position the extraction equipment, knowledge of the relationship of the machine to objects in the vicinity of the machine and the relationship of the machine to the ore body it is intended to mine is also required (i.e. a degree of spatial awareness is required).
In military surveillance applications the use of spectral fingerprint analysis of data obtained from hyperspectral imaging can be extremely effective in identifying a target within a relatively noisy environment, e.g. suburban environments etc. The notion of spectral fingerprinting provides that any given object has a unique spectral signature within a range of bands. As hyperspectral images provide spectral readings over a large portion of the spectrum for a given area of interest, it is possible to isolate the spectral fingerprint of a particular object within the surveyed area. Once a target is located, its geographical position can be determined and ordnance directed to the target based on its geographical position. Again a degree of spatial awareness is required to accurately guide the ordnance. Simply providing geographic coordinates for the target without taking into account its surrounding environment can lead to disastrous results. Providing additional spatial reference data could aid in guidance and reduce the occurrence of collateral damage to infrastructure in urban warfare environments.
SUMMARY OF THE INVENTION
Disclosure of the Invention
In one aspect of the present invention there is provided a system for producing a spatially referenced spectral image of an area of interest, said system comprising: a first imaging device for capturing a set of images of the area of interest;
a second imaging device for capturing a spectral image of the area of interest; and
at least one processor adapted to produce a spatially referenced spectral image of the area of interest by compiling a 3-dimensional image of the area of interest from the set of images captured by said first imaging device and merging the 3-dimensional image and the spectral image captured by the second imaging device, wherein the merging of the images is based on a predetermined relationship between the first and second imaging devices.
In another aspect of the invention there is provided a method of producing a spatially referenced spectral image of an area of interest, said method comprising the steps of:
generating a 3-dimensional image of the area of interest from a set of images captured via a first imaging device;
capturing a spectral image of the area of interest via a second imaging device; and
merging the spectral image with the 3-dimensional image of the area of interest to produce a 3-dimensional spatially referenced spectral image of the area of interest, wherein the merging of the spectral image and 3-dimensional image of the area of interest is performed based on a predetermined relationship between the first and second imaging devices.
The first imaging device may be a conventional camera, a laser scanning system or similar device. Alternatively, the first imaging device may be a pair of cameras arranged in stereo. The cameras may be paired in a fixed relationship or variable relationship (i.e. the distance between the two may be varied). The second imaging device may be a spectral camera. Preferably the spectral camera has sufficient resolution to capture a multispectral and/or hyperspectral image of the area of interest.
Suitably the merging of the 3-dimensional and spectral images is performed on the basis of the spatial relationship, that is the position and orientation, of the second imaging device relative to the first imaging device. The merging of the 3-dimensional and spectral images may include the step of calculating the spatial location of every pixel based on the spatial relationship of the second imaging device relative to the first imaging device.
In the case where the first imaging device is a pair of cameras arranged in stereo, the relationship between the second imaging device and one or both of the cameras of the first imaging device may be determined by any suitable means. For example the relationship between the second imaging device and the pair of cameras used for the stereo image acquisition may be determined using the methods associated with image triplets e.g. calculation of the trifocal tensor and associated parameters. Once the relationship between the spectral camera and one or other of the stereo camera pair is known, the spectral pixels can then be referenced to the 3-dimensional image created from the stereo image pair using computational geometry to calculate the intersection of the ray associated with each pixel with the 3D surface created from the stereo image pair. Alternatively, calculation of the spatial position of a spectral pixel may be achieved by calculating the spatial intersection of rays from the stereo camera pair and the spectral camera.
Suitably, the 3-dimensional image can be georeferenced such that the resultant combined image includes the georeferences allowing the spectral data to be referenced against the georeferences.
Throughout the specification the term "comprising" shall be understood to have a broad meaning similar to the term "including" and will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps. This definition also applies to variations on the term "comprising" such as "comprise" and "comprises".
BRIEF DETAILS OF THE DRAWINGS
In order that this invention may be more readily understood and put into practical effect, reference will now be made to the accompanying drawings, which illustrate preferred embodiments of the invention, and wherein:
FIG. 1 is a schematic diagram depicting the process of generating a georeferenced hyperspectral image according to one embodiment of the present invention;
FIG. 2 is a schematic diagram depicting the geometry for the determination of the line of sight for a pixel according to one embodiment of the present invention; and
FIG. 3 is a schematic diagram depicting the geometry for the determination of the line of sight for a pixel according to a further embodiment of the present invention.
DESCRIPTION OF EMBODIMENTS OF THE INVENTION
Fig 1 broadly illustrates the process for generating a georeferenced spectral image of a scene of interest according to one embodiment of the present invention. As shown, a series of static images 10 of the scene of interest are obtained 100 via a first imaging device 12. A spectral image 11 (preferably a hyperspectral or multispectral image) of the scene of interest is simultaneously obtained 101 via a second imaging device 3. The static images 10 of the scene of interest are then collated to produce a georeferenced 3-dimensional (3D) image of the scene of interest 102. The spectral information obtained from the spectral image 11 is then integrated into the georeferenced 3D image 103. Each pixel in the resultant 3D image is in essence the integration of the radiant energy from the field of view of the pixel collected by the second imaging device and the spatial position of the particular pixel in relation to the three dimensional surface of the image produced from the set of images obtained from the first imaging device. The resultant georeferenced spectral image is then displayed for further analysis 104.
In most cases the radiant energy received from the scene will be dominated by the primary spectra emitted or reflected from any solid surface within the field of view. It should be appreciated, however, that there are some instances where this is not the case. One example of where the radiant energy received by the spectral imaging device may not be dominated by the primary spectra emitted or reflected by the surface is when the path length from the sensor to the surface is long enough for scattering to occur. For example, when viewing distant objects the objects may be obscured by atmospheric haze.
In order to produce a georeferenced pixel within a given image the spatial location of the pixel being imaged must be determined. The spatial location of each pixel within the image can be determined by firstly identifying the line of sight through the optical system of the spectral imaging device for each pixel in the image. Once this is achieved, the spatial location of each pixel is then determined based on the intersection point of the line of sight with a point on the surface from which the radiant energy was emitted or reflected. This process requires knowledge of the spectral imaging device characteristics (the internal geometry of the spectral imaging device, known as the interior orientation), knowledge of the line of sight for each pixel (derived from the external geometry of the spectral imaging device, known as the exterior orientation of the sensor), and a three dimensional model of the surface intersecting the line of sight. The three dimensional model of the surface intersecting the line of sight must be defined in a manner that supports identification of the spatial location of the intersection of the line of sight with the given surface.
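By way of illustration only, the following Python sketch (numpy assumed; the function and parameter names are illustrative and not taken from the specification) shows the per-pixel georeferencing loop implied by the process described above: a line of sight is obtained for each pixel of the spectral image, intersected with the surface model, and the pixel's spectrum is attached to the resulting spatial location. The helper callables correspond to the interior/exterior orientation and surface intersection steps developed in the remainder of this description.

```python
import numpy as np

def georeference_spectral_image(spectral_cube, ray_for_pixel, intersect_surface):
    """Per-pixel georeferencing loop.

    spectral_cube     -- array of shape (rows, cols, bands) from the spectral imaging device
    ray_for_pixel     -- callable (row, col) -> (v1, v2) giving the line of sight for that pixel
    intersect_surface -- callable (v1, v2) -> 3D intersection point with the surface model, or None
    Returns a list of (world_point, spectrum) pairs.
    """
    rows, cols, _ = spectral_cube.shape
    georeferenced = []
    for r in range(rows):
        for c in range(cols):
            v1, v2 = ray_for_pixel(r, c)        # line of sight through the optics for this pixel
            point = intersect_surface(v1, v2)   # where that line meets the 3D surface model
            if point is not None:
                georeferenced.append((point, spectral_cube[r, c, :]))
    return georeferenced
```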
For the purposes of the following discussion it will be assumed that the spectral imaging device's characteristics (i.e. internal orientation) are determined by independent means. As noted above, the determination of the line of sight for each pixel also requires information on the exterior orientation of the sensor. The exterior orientation and the interior orientation are used in conjunction to estimate the direction of the line of sight of any pixel in the spectral image. The line of sight is calculated using the sensor calibration matrix which may have a number of forms.
For a spectral imaging device oriented horizontally and pointed along the Y axis, the typical alignment used as the starting point for calculations for vehicle based sensors, the calibration matrix is:

K = [ c   px   0 ]
    [ 0   py   c ]
    [ 0   1    0 ]

where c is the distance to the principal point and px and py are the X and Y offsets of the principal point from the centre of the image.
For a spectral imaging device oriented vertically and looking down, that is, pointed along the Z axis, the typical alignment used as the starting point for calculations for aerial photography, the calibration matrix is:

K = [ -c   0   px ]
    [  0  -c   py ]
    [  0   0    1 ]

where, as before, c is the distance to the principal point and px and py are the X and Y offsets of the principal point from the centre of the image.
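As an illustrative sketch only (numpy assumed; the function names and numerical values are examples, not part of the specification), the two calibration matrices above can be constructed and used to map a camera-frame direction to image coordinates as follows:

```python
import numpy as np

def calibration_matrix_horizontal(c, px, py):
    """Calibration matrix for a sensor whose optical axis lies along the Y axis (terrestrial case)."""
    return np.array([[c,   px,  0.0],
                     [0.0, py,  c],
                     [0.0, 1.0, 0.0]])

def calibration_matrix_nadir(c, px, py):
    """Calibration matrix for a sensor looking down along the Z axis (aerial case)."""
    return np.array([[-c,  0.0, px],
                     [0.0, -c,  py],
                     [0.0, 0.0, 1.0]])

# Example: map a camera-frame direction to image coordinates (terrestrial case).
K = calibration_matrix_horizontal(c=1500.0, px=0.4, py=-0.2)   # illustrative values only
direction = np.array([0.1, 1.0, 0.05])                          # camera-frame viewing direction
u = K @ direction
x_img, y_img = u[0] / u[2], u[1] / u[2]                         # divide by the homogeneous term
```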
For the purpose of the following discussion only the case for terrestrial imaging will be considered, that is, the case where the camera reference frame is assumed to be such that the optical axis is aligned to the Y axis (sometimes treated as North) and there is no rotation of the image frame relative to the X and Y axes. It should be appreciated that the mathematical description of other geometries is similar and for clarity of description, discussion of these geometries has been omitted. If a pixel centre has coordinates X and Y in the image frame, the vector from the principal point along the line of sight when expressed in the image frame is as follows:
v = [ X - px ]
    [    c   ]
    [ Y - py ]
This vector points from the principal point of the spectral imaging device somewhere into what might be described as the northern hemisphere of the spectral imaging device's coordinate system as shown in Fig 2. The vector is then aligned to the local world coordinate frame. The reference here to a world coordinate frame is a general reference since coordinate frames are defined in terms of some local datum and, in the case of 3D imaging systems, the local world coordinate frame may in fact be a relative frame aligned to some predefined coordinate system.
The orientation of the spectral imaging device is specified in terms of a tilt angle (the angle of rotation about the Y axis), an elevation angle (the angle of rotation about the X axis) and an azimuth angle (the angle of rotation about the Z axis). The vector is then converted to a world coordinate vector using the sequence of rotations around the Y, then the X, then the Z axis to align the spectral imaging device's coordinate system to the world coordinate system.
The tilt is defined to be consistent with rotations in a standard Right Hand Side (RHS) Cartesian system where the rotations align the world coordinate system to the camera coordinate system consistent with the use of the right handed definition of rotation. When using this convention, the rotation is viewed from the negative Y axis. The rotation to align the world coordinate system to the camera coordinate system is made in a clockwise direction about the Y axis. The coordinate transformation is then:
R_Y = [ cos(φ)   0   -sin(φ) ]
      [   0      1      0    ]
      [ sin(φ)   0    cos(φ) ]

where φ is the tilt angle.
Similarly the rotations around the X axis and the Z axis are implemented using rotation matrices:
R_X = [ 1      0         0     ]
      [ 0    cos(ω)    sin(ω)  ]
      [ 0   -sin(ω)    cos(ω)  ]

R_Z = [  cos(κ)   sin(κ)   0 ]
      [ -sin(κ)   cos(κ)   0 ]
      [    0        0      1 ]

where ω is the elevation angle and κ is the azimuth angle.
To align the spectral imaging device's coordinate system to the world coordinate system, the rotations are reversed and so the vector along the line of sight is defined by:
v_world = R_Z^T R_X^T R_Y^T v
where T indicates the transpose of a matrix.
If the position of the principal point of the camera is Xc, Yc, Zc, the line of sight is then specified by the two vectors v1 (the vector from the world coordinate system origin to the camera principal point) and v2 (the vector along the line of sight). Vectors v1 and v2 are defined as follows.

v1 = [ Xc, Yc, Zc ]^T

v2 = R_Z^T R_X^T R_Y^T [ X - px,  c,  Y - py ]^T
Any point on the line of sight is then given by:
p = v1 + a v2
where a is a variable parameter that effectively moves a point along the line.
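The following Python sketch gathers the steps above into one routine. It assumes numpy, uses illustrative names, and adopts one plausible sign convention for the rotation matrices consistent with the frame-alignment form shown above; it returns the vectors v1 and v2 so that any point on the line of sight is v1 + a * v2.

```python
import numpy as np

def pixel_ray_world(x, y, c, px, py, tilt, elevation, azimuth, camera_position):
    """Line of sight for the pixel centre (x, y) of a horizontally oriented sensor
    pointed along the Y axis.  Angles are in radians; camera_position is (Xc, Yc, Zc).
    Returns (v1, v2) so that any point on the line of sight is v1 + a * v2.
    """
    # Vector from the principal point along the line of sight, expressed in the image frame.
    v_cam = np.array([x - px, c, y - py])

    # Rotations that align the world frame to the camera frame (tilt about Y,
    # elevation about X, azimuth about Z).
    t, w, k = tilt, elevation, azimuth
    Ry = np.array([[np.cos(t), 0.0, -np.sin(t)],
                   [0.0,       1.0,  0.0],
                   [np.sin(t), 0.0,  np.cos(t)]])
    Rx = np.array([[1.0,  0.0,        0.0],
                   [0.0,  np.cos(w),  np.sin(w)],
                   [0.0, -np.sin(w),  np.cos(w)]])
    Rz = np.array([[ np.cos(k), np.sin(k), 0.0],
                   [-np.sin(k), np.cos(k), 0.0],
                   [ 0.0,       0.0,       1.0]])

    # Reverse (transpose) the rotations to express the camera-frame vector in world coordinates.
    v1 = np.asarray(camera_position, dtype=float)
    v2 = Rz.T @ Rx.T @ Ry.T @ v_cam
    return v1, v2

def point_on_ray(v1, v2, a):
    """A point on the line of sight, moved along the line by the parameter a."""
    return v1 + a * v2
```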
The exterior orientation of the spectral imaging device is the position and orientation of the imaging device relative to the predefined coordinate system. Determination of the exterior orientation of the sensor may be achieved by a number of methods. Two methods of determining the exterior orientation of the spectral imaging device are direct and indirect measurement.
Under the direct method, the sensor is mounted on a device such as a theodolite. The position and the rotation angles of the imaging device are then directly measured to determine the imaging device's physical orientation. With the indirect method, a set of at least three non-colinear control points with known positions in the field of view of the spectral imaging device are established. From the control points the position and the rotation angles defining the orientation of the spectral imaging device are then estimated. The estimation of the position and rotation angles is commonly achieved through the use of the colinearity equations that define the relationship between a spatial point and the coordinates of the image formed by the point in the camera coordinate system. The estimation may also be performed using the coplanarity condition that describes the geometrical relationship between the two cameras, the image coordinates and the spatial point, as is shown in Fig 3.
Ignoring distortion, the colinearity equations are commonly expressed as
xa = -f [ m11(X - Xc) + m12(Y - Yc) + m13(Z - Zc) ] / [ m31(X - Xc) + m32(Y - Yc) + m33(Z - Zc) ]

ya = -f [ m21(X - Xc) + m22(Y - Yc) + m23(Z - Zc) ] / [ m31(X - Xc) + m32(Y - Yc) + m33(Z - Zc) ]

where xa, ya are the image coordinates, f is the focal length of the camera, X, Y and Z are the world coordinates of the spatial point, Xc, Yc and Zc are the world coordinates of the camera principal point and the terms mij are the terms of the rotation matrix that describes the orientation of the camera coordinate system relative to the world coordinate system.
These equations are then linearised by standard means and the set of all equations is solved for the world coordinates of the position of the camera principal point and the terms mij, from which the orientation angles are estimated.
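A minimal sketch of the collinearity residuals, assuming numpy and illustrative names, is shown below; in practice these residuals would be linearised and solved by least squares for the camera position and the rotation terms, as described above.

```python
import numpy as np

def collinearity_residuals(point, camera_position, m, f, xa, ya):
    """Residuals of the collinearity equations for one observed image point.

    point           -- (X, Y, Z) world coordinates of the spatial point
    camera_position -- (Xc, Yc, Zc) world coordinates of the camera principal point
    m               -- 3x3 rotation matrix (camera orientation relative to the world frame)
    f               -- focal length
    xa, ya          -- observed image coordinates
    Both residuals are zero when the spatial point, its image point and the
    perspective centre are collinear.
    """
    m = np.asarray(m, dtype=float)
    d = np.asarray(point, dtype=float) - np.asarray(camera_position, dtype=float)
    u, v, w = m @ d                       # rotated offsets; w is the common denominator term
    return np.array([xa + f * u / w,
                     ya + f * v / w])
```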
Alternatively, the orientation of the spectral imaging device may be estimated in a coordinate frame that is defined by another camera. In this case, the coordinate system of the camera, which may be one camera of a stereo pair, may be treated in the same manner as the world coordinate system. When the camera is one camera of a stereo pair, the three cameras (including the two cameras of the stereo pair and a spectral imaging device) create a triplet of images. In either case standard methods may be used to determine the orientations of the cameras in the coordinate frame of one camera, referred to here as the 'control' camera.
When the spectral imaging device is oriented relative to a single camera the relative orientation of the spectral imaging device is determined by any means using either the colinearity condition or the coplanarity condition. The position and orientation so determined are not to scale. The position and orientation of the spectral imaging device relative to the second camera of the stereo pair must then be determined and this data used with the position and orientation of the spectral imaging device relative to the control camera to fix the physical arrangement of all three cameras relative to each other.
Alternatively, the relationship between the second imaging device and the pair of cameras used for the stereo image acquisition may be determined by the calculation of the trifocal tensor from point or line correspondences between the images. Knowledge of the trifocal tensor then allows estimation of the spatial relationship of all three cameras. One such example of the use of the trifocal tensor is discussed in 'Multiple View Geometry in Computer Vision', R. Hartley and A. Zisserman, Cambridge University Press, 2003, which is herein incorporated by reference.
In the case of use with a stereo camera pair, knowledge of the baseline between the stereo cameras provides the scale information necessary to construct data that is accurate in position relative to the control camera, the coordinate system of which may be used as the world coordinate system or registered to the true world coordinate system as required.
The equations describing the relative orientation using the coplanarity condition have a different form from the colinearity equations discussed above. Advantageously, coplanarity equations do not require a-priori estimates of the spatial locations of points seen in the images, the coordinates xa and ya being the image coordinates of the spatial points. The coplanarity condition is expressed using the scalar triple product of the baseline between the two cameras and the lines of sight from the cameras to an object point.
b · (a1 × a2) = 0
When expressed in the form of a determinant this becomes
| bX    bY    bZ  |
| a1X   a1Y   a1Z | = 0
| a2X   a2Y   a2Z |
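The coplanarity condition therefore reduces to a single scalar residual per point correspondence, as in the following sketch (numpy assumed; names illustrative):

```python
import numpy as np

def coplanarity_residual(baseline, ray1, ray2):
    """Scalar triple product b . (a1 x a2), i.e. the determinant above.

    It is zero when the baseline between the two cameras and the two lines of
    sight to a common object point lie in one plane.
    """
    b = np.asarray(baseline, dtype=float)
    a1 = np.asarray(ray1, dtype=float)
    a2 = np.asarray(ray2, dtype=float)
    return float(np.dot(b, np.cross(a1, a2)))
```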
Once the line of sight for each pixel has been determined within the world coordinate system, or the apparent world coordinate system in the case of the use of the spectral imaging device with a stereo camera pair, the spatial location of the surface from which the energy associated with the pixel was reflected or emitted can be determined from the intersection of the vector denoting the line of sight with the 3D model of the surface. The phrase "3D model" as used here refers to any collection of 3D coordinates or specification of 3D functions that defines a surface. These may be in the form of an implicit surface defined by functional relationships or discrete measurements of spatial coordinates which may be tied together by some form of relationship such as a mesh to define the surface. The exact form of the surface model is not critical since any surface model may be decomposed to a set of points that may also be integrated into a mesh.
If the surface model is composed of a discrete set of points, then georeferencing is performed by finding the point within the set of spatial points that lies the closest to the line of sight for a pixel. Standard methods of geometry may be used to achieve this.
One method of geometry used to achieve this uses the fact that the vector between a point in 3D space and the nearest point on a line in 3D space must be perpendicular to the line. Thus, the dot (inner) product of the vector joining the points and the vector along the line must be zero. The solution of the equations describing this relationship gives the closest point on a line to any point in 3D space.
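A minimal sketch of this closest-point construction, assuming numpy and a surface model given as a discrete set of points (names illustrative):

```python
import numpy as np

def closest_point_on_ray(v1, v2, p):
    """Closest point on the line v1 + a * v2 to the 3D point p.

    The vector from p to the nearest point on the line is perpendicular to v2,
    so (v1 + a * v2 - p) . v2 = 0, which is solved here for a.
    """
    v1, v2, p = (np.asarray(x, dtype=float) for x in (v1, v2, p))
    a = np.dot(p - v1, v2) / np.dot(v2, v2)
    return v1 + a * v2

def nearest_surface_point(v1, v2, points):
    """Point of a discrete surface model that lies closest to the line of sight."""
    pts = np.asarray(points, dtype=float)
    feet = np.array([closest_point_on_ray(v1, v2, p) for p in pts])   # foot of perpendicular per point
    distances = np.linalg.norm(pts - feet, axis=1)
    return pts[np.argmin(distances)]
```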
In the case where the surface model is composed of a set of points that have been integrated into a triangulated mesh, or any other form of mesh, georeferencing is performed by finding the intersection point of the line vector representing the line of sight for the pixel with the mesh. The element within the mesh lying closest to the point of intersection is then taken as the georeference point for the given pixel. As meshes are typically composed of sets of triangular facets (these being the most basic 3D surface shape), the following discussion will focus on meshes consisting of triangular facets. However, it should be appreciated that the process for determining the intersection point of the line of sight with a mesh composed of sets of triangular facets is equally applicable to other mesh structures.
A number of methods may be utilised to determine the intersection point of the line of sight of a pixel with every facet in the surface model. For example, KD trees can be used to restrict the amount of computation by reducing the number of facets for which the intersection point is determined. With any method a number of intersections will be identified. Only facets for which the calculated intersection point lies within the boundaries of the facet will be valid points from which the pixel energy has been collected. Therefore, by testing whether the intersection point is internal to a facet, the true intersection may be identified. In some cases where a part of a surface is occluded, more than one true intersection point may be identified. In these cases the characteristics of the surface, that is whether it is opaque or transparent, may be used to identify the correct point of intersection.
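One possible implementation of the facet intersection and inside-facet test is the barycentric (Moller-Trumbore style) formulation sketched below. It is a brute-force loop over facets, whereas the KD-tree pruning mentioned above would limit the facets tested, and the nearest-hit selection assumes an opaque surface (numpy assumed; names illustrative):

```python
import numpy as np

def ray_triangle_intersection(v1, v2, tri, eps=1e-12):
    """Intersection of the line of sight v1 + a * v2 with one triangular facet.

    tri is a (3, 3) array of vertex coordinates.  The barycentric coordinates u, v
    provide the inside-facet test; None is returned if the intersection point
    falls outside the facet or the line is parallel to its plane.
    """
    p0, p1, p2 = np.asarray(tri, dtype=float)
    e1, e2 = p1 - p0, p2 - p0
    h = np.cross(v2, e2)
    det = np.dot(e1, h)
    if abs(det) < eps:                       # line of sight parallel to the facet plane
        return None
    s = v1 - p0
    u = np.dot(s, h) / det
    q = np.cross(s, e1)
    v = np.dot(v2, q) / det
    if u < 0.0 or v < 0.0 or u + v > 1.0:    # intersection lies outside the facet boundaries
        return None
    a = np.dot(e2, q) / det
    return v1 + a * v2

def nearest_valid_intersection(v1, v2, facets):
    """Valid intersection closest to the sensor position, assuming an opaque surface."""
    hits = [p for tri in facets if (p := ray_triangle_intersection(v1, v2, tri)) is not None]
    if not hits:
        return None
    return min(hits, key=lambda p: np.linalg.norm(p - v1))
```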
In cases where the surface is defined by an implicit surface model, one method of georeferencing the pixel data is to create a triangulated mesh from the implicit surface model and then apply the technique described previously. Creating a triangulated mesh from an implicit surface model is used, for example, in computer graphics and other disciplines.
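As a sketch only, and assuming the scikit-image library is available and the implicit surface can be sampled on a regular grid, a triangulated mesh can be produced by marching cubes and then fed to the facet-intersection routine above (names illustrative):

```python
import numpy as np
from skimage import measure   # assumes scikit-image is available

def facets_from_implicit_surface(f, bounds, n=64, level=0.0):
    """Triangulated facets of the implicit surface f(x, y, z) = level.

    The function f is sampled on a regular n x n x n grid over the given
    ((x0, x1), (y0, y1), (z0, z1)) bounds and marching cubes extracts the mesh.
    Returns an (n_facets, 3, 3) array usable by the facet-intersection sketch above.
    """
    (x0, x1), (y0, y1), (z0, z1) = bounds
    xs = np.linspace(x0, x1, n)
    ys = np.linspace(y0, y1, n)
    zs = np.linspace(z0, z1, n)
    X, Y, Z = np.meshgrid(xs, ys, zs, indexing="ij")
    volume = f(X, Y, Z)
    verts, faces, _, _ = measure.marching_cubes(volume, level=level)
    # Marching cubes returns vertices in grid-index units; rescale to world coordinates.
    spacing = np.array([(x1 - x0) / (n - 1), (y1 - y0) / (n - 1), (z1 - z0) / (n - 1)])
    verts = verts * spacing + np.array([x0, y0, z0])
    return verts[faces]
```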
The processes discussed above for determining the line of sight of a pixel assume that the pixel is represented by a line from the sensor principal point through the centre of the pixel. In practice the pixel will have a finite spatial extent. The georeferencing of the true shape of the pixel is achieved by undertaking the calculations described for the points defining the pixel limits. Typically, pixels are square structures on a piece of silicon or other detector material such as InGaAs, but they may have other shapes such as hexagons. The pixel shape may be well defined or approximated by a set of vertices such as the corners of a square and the calculations replicated for each vertex, thus defining the true spatial extent of the source of the radiant energy collected to create the pixel signal.
In the latter case, the line of sight to each vertex of the pixel structure can be used to estimate the apparent spatial location of the pixel vertex in the world coordinate system thus defining the distribution of the source of the radiant energy in the world coordinate system.
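A short sketch of this per-vertex footprint computation, reusing the line-of-sight and georeferencing helpers sketched earlier (numpy assumed; names illustrative):

```python
import numpy as np

def pixel_footprint(corners, ray_for_vertex, georeference_point):
    """Apparent world-frame footprint of a pixel of finite spatial extent.

    corners            -- image-frame coordinates of the pixel vertices (e.g. its four corners)
    ray_for_vertex     -- callable (x, y) -> (v1, v2), e.g. a wrapper around pixel_ray_world()
    georeference_point -- callable (v1, v2) -> 3D point, e.g. the closest-point or
                          facet-intersection routines sketched earlier
    Returns one georeferenced point per vertex, outlining the source of the collected energy.
    """
    footprint = []
    for (x, y) in corners:
        v1, v2 = ray_for_vertex(x, y)
        footprint.append(georeference_point(v1, v2))
    return np.array(footprint)
```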
The georeferencing of spectral data as described above may be used to integrate such data with any form of 3D image characterised by a surface model. Such 3D images may be acquired using techniques such as, but not limited to, lidar scanning, photogrammetry or radar. Examples of these are the lidar systems manufactured by companies such as Riegl (Austria), photogrammetry systems such as Sirovision created by CSIRO (Australia), or radar systems such as those manufactured by Groundprobe (Australia) or Reutech Radar Systems (South Africa). Other forms of 3D imaging systems are also known.
The methods described above are used to determine the orientation of the sensor relative to the 3D surface model. In the case where the 3D surface model is created from a stereo camera system, the spectral sensor may be integrated with the stereo cameras such that the exterior orientation of the spectral sensor relative to any 3D surface model is directly known. A similar approach may be used if the spectral sensor is collocated in a fixed arrangement with a lidar system or a radar system. Such systems may be used for guidance and control of machinery in activities such as mining.
It is to be understood that the above embodiments have been provided only by way of exemplification of this invention, and that further modifications and improvements thereto, as would be apparent to persons skilled in the relevant art, are deemed to fall within the broad scope and ambit of the present invention described herein.

Claims

1. A system for producing a spatially referenced spectral image of an area of interest, said system comprising:
a first imaging device for capturing a set of images of the area of interest;
a second imaging device for capturing a spectral image of the area of interest; and
at least one processor adapted to produce a spatially referenced spectral image by compiling a 3-dimensional image of the area of interest from the set of images captured by said first imaging device and merging the 3-dimensional image and spectral image captured by the second imaging device, wherein the merging of the 3-dimensional image and spectral image is based on a predetermined relationship between the first and second imaging devices.
2. The system of claim 1 , wherein the first imaging device is a camera.
3. The system of claim 1 , wherein the first imaging device is a laser scanning system and the set of images includes multiple measurements of range and angle of the area of interest relative to the laser scanning device.
4. The system of claim 1 , wherein the first imaging device includes a pair of cameras arranged in a stereo configuration.
5. The system of claim 4, wherein the cameras are paired in a fixed relationship.
6. The system of any one of claims 1 to 3, wherein the predetermined relationship is a fixed spatial relationship between the first and second imaging devices.
7. The system of claim 6, wherein the merging of the 3-dimensional image and spectral image further comprises calculating for each pixel its spatial location based on the spatial relationship between the second imaging device relative to the first imaging device.
8. The system of claim 4 or 5, wherein the predetermined relationship between the first and second imaging devices is determined by calculating a trifocal tensor of the system.
9. The system of claim 8, wherein each pixel within the spectral image is referenced to the 3-dimensional image by calculating a point of intersection for a ray associated with each pixel within the spectral image with the 3-dimensional image.
10. The system of claim 9, wherein the ray associated with each pixel is a vector representing the line of sight through the second imaging device to the centre of the pixel.
11. The system of claim 8, further comprising calculating the spatial position of a spectral pixel by calculating the spatial intersection of rays projected from the first and second imaging device.
12. A method of producing a spatially referenced spectral image of an area of interest, said method comprising the steps of:
generating a 3-dimensional image of the area of interest from an image captured by a first imaging device;
capturing a spectral image of the area of interest via a second imaging device; and
merging the spectral image with the 3-dimensional image of the area of interest to produce a 3-dimensional spatially referenced spectral image of the area of interest, wherein the merging of the spectral image and 3-dimensional image of the area of interest is performed based on a predetermined relationship between the first and second imaging devices.
13. The method of claim 12, wherein the set of images captured via the first imaging device is a set of photographs.
14. The method of claim 12, wherein the set of images captured via the first imaging device is a set of optical images including multiple measurements of range and angle of the area of interest relative to the first imaging device.
15. The method of claim 12, wherein the set of images captured via the first imaging device is a set of stereo images.
16. The method of any one of claims 12 to 15, wherein the predetermined relationship is a fixed spatial relationship between the first and second imaging devices.
17. The method of claim 16, wherein the step of merging of the 3-dimensional image and spectral image further comprises calculating for each pixel its spatial location based on the spatial relationship between the second imaging device relative to the first imaging device.
18. The method of claim 15, wherein the predetermined relationship between the first and second imaging devices is determined by calculating a trifocal tensor of the first imaging device.
19. The method of claim 18, wherein the step of merging further includes calculating a point of intersection for a ray associated with each pixel within the spectral image with the 3-dimensional image.
20. The method of claim 19, wherein the ray associated with each pixel is a vector representing the line of sight through the second imaging device to the centre of the pixel.
21. The method of claim 18, further comprising calculating the spatial position of a pixel by calculating the spatial intersection of rays projected from the first and second imaging device.
PCT/AU2010/001464 2009-11-03 2010-11-03 System and method for integration of spectral and 3-dimensional imaging data WO2011054040A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2009905367 2009-11-03
AU2009905367A AU2009905367A0 (en) 2009-11-03 System and method for integration of spectral and 3-dimensional imaging data

Publications (1)

Publication Number Publication Date
WO2011054040A1 true WO2011054040A1 (en) 2011-05-12

Family

ID=43969476

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2010/001464 WO2011054040A1 (en) 2009-11-03 2010-11-03 System and method for integration of spectral and 3-dimensional imaging data

Country Status (1)

Country Link
WO (1) WO2011054040A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103630498A (en) * 2013-11-12 2014-03-12 浙江大学 Method for detecting pesticide residue on surface of navel orange based on hyperspectral imaging technology
CN107727059A (en) * 2017-10-12 2018-02-23 西安天和防务技术股份有限公司 Target location determines system and target bearing determining device
CN111239044A (en) * 2018-11-28 2020-06-05 静宜大学 Cell detection method, device and system
US11800246B2 (en) 2022-02-01 2023-10-24 Landscan Llc Systems and methods for multispectral landscape mapping

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5349378A (en) * 1992-12-21 1994-09-20 Robotic Vision Systems, Inc. Context independent fusion of range and intensity imagery
US20020060784A1 (en) * 2000-07-19 2002-05-23 Utah State University 3D multispectral lidar

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5349378A (en) * 1992-12-21 1994-09-20 Robotic Vision Systems, Inc. Context independent fusion of range and intensity imagery
US20020060784A1 (en) * 2000-07-19 2002-05-23 Utah State University 3D multispectral lidar

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"The Trifocal Tensor", WAYBACK ENGINE, 5 May 2006 (2006-05-05), Retrieved from the Internet <URL:http://web.archive.org/web/20060505024140/http://www.robots.ox.ac.uk/~vgg/hzbook/hzbook2/HZtrifocal.pdf> [retrieved on 20101125] *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103630498A (en) * 2013-11-12 2014-03-12 浙江大学 Method for detecting pesticide residue on surface of navel orange based on hyperspectral imaging technology
CN107727059A (en) * 2017-10-12 2018-02-23 西安天和防务技术股份有限公司 Target location determines system and target bearing determining device
CN107727059B (en) * 2017-10-12 2024-03-19 西安天和防务技术股份有限公司 Target position determining system and target position determining device
CN111239044A (en) * 2018-11-28 2020-06-05 静宜大学 Cell detection method, device and system
US11800246B2 (en) 2022-02-01 2023-10-24 Landscan Llc Systems and methods for multispectral landscape mapping

Similar Documents

Publication Publication Date Title
US20190266396A1 (en) Determination of position from images and associated camera positions
US7751651B2 (en) Processing architecture for automatic image registration
US7376262B2 (en) Method of three dimensional positioning using feature matching
EP2111530B1 (en) Automatic stereo measurement of a point of interest in a scene
KR100647807B1 (en) Method for extraction of 3d building information using shadow analysis
US8059887B2 (en) System and method for providing mobile range sensing
US20060215935A1 (en) System and architecture for automatic image registration
CN106408601B (en) A kind of binocular fusion localization method and device based on GPS
JP2012118666A (en) Three-dimensional map automatic generation device
CN108279677B (en) Rail robot detection method based on binocular vision sensor
US20160180535A1 (en) Georeferencing method and system
WO2018142533A1 (en) Position/orientation estimating device and position/orientation estimating method
WO2011054040A1 (en) System and method for integration of spectral and 3-dimensional imaging data
Yang et al. Laser beams-based localization methods for boom-type roadheader using underground camera non-uniform blur model
Osgood et al. Calibration of laser scanner and camera fusion system for intelligent vehicles using Nelder–Mead optimization
Arroyo et al. A monocular wide-field vision system for geolocation with uncertainties in urban scenes
Chen et al. True orthophoto generation using multi-view aerial images
Zhu et al. Gamma/X-ray linear pushbroom stereo for 3D cargo inspection
Allmen et al. The computation of cloud base height from paired whole-sky imaging cameras
Karaca et al. Ground-based panoramic stereo hyperspectral imaging system with multiband stereo matching
Alshawabkeh et al. Laser scanning and photogrammetry: A hybrid approach for heritage documentation
Ringaby et al. Co-aligning aerial hyperspectral push-broom strips for change detection
Mares et al. Vehicle self-localization in GPS-denied zones by multi-band imaging and analysis of prominent scene features
Jende et al. A Guided Registration Strategy Employing Virtual Planes To Overcome Non-Standard Geometries–Using The Example Of Mobile Mapping And Aerial Oblique Imagery
Hasheminasab Dissertation_Meghdad_revised_2. pdf

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10827713

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10827713

Country of ref document: EP

Kind code of ref document: A1