Image processing system, method and computer program product
The present invention relates to an image processing system, method, and computer program product.
There exists a significant quantity of recorded image data such as, for example, video, television, films, CD-ROM images and computer generated images. Typically once the images have been recorded or generated and then recorded, it is very time consuming and very difficult to give effect to any changes to the images.
It is often the case that a recorded sequence of images may contain artefacts which were not intended to be included within the sequence of images. Alternatively, artefacts may be added to a sequence of images. For example, it may be desirable to add special effects to the images. The removal or addition of such artefacts is referred to generically as post-production processing and involves a significant degree of skill and time.
Producing a mosaic of, or modifying, images which have already been recorded is a very difficult task. The degree of effort required to modify or mosaic images or a sequence of images can be reduced if information is available which relates to at least one of the orientation or the position of a camera at the time when the images were produced. Deriving image camera data from a sequence of images is disclosed in, for example, "Creating Full View Panoramic Image Mosaics and Environment Maps", Szeliski R., Shum, HY, Computer Graphics Proceedings, Annual Conference Series, pp 251-258, 1997. However, the techniques disclosed within the above paper are very complex and require a significant degree of numerically intensive processing in order to solve equations containing eight unknowns.
Often, special effects may be added to an image or other edits may be made either digitally or manually. The placement of such special effects typically relies upon the judgment of the artist or operator giving effect to the edits. Clearly, there is scope for some error in the placement of the special effects as they are made to a clean plate.
It is therefore an object of the present invention to at least mitigate some of the problems of the prior art.
Accordingly the present invention provides a method for determining camera image data for at least one of first and second images of a sequence of images within an image processing system comprising memory for storing said first and second images, said method comprising the steps of storing in said memory the first and second images; and deriving, as a consequence of a comparison between the first and second images, camera image data for at least one of the first and second images.
Advantageously, the camera image data for each image within a sequence thereof can be determined. The determined camera image data can then be used in order to give effect to further image processing such as, for example, producing a mosaic of selected images.
A second aspect of the present invention provides an image processing system for determining camera image data for at least one of first and second images of a sequence of images, the system comprising memory for storing said first and second images, means for storing in said memory the first and second images; and means for deriving, as a consequence of a comparison between the first and second images, camera image data for at least one of the first and second images.
It will be appreciated that the means referred to above and in the claims can be preferably realised using a computer such as that schematically depicted in figure 1 and suitable program code.
Accordingly, a third aspect of the present invention provides a computer program product for determining camera image data for at least one of first and second images of a sequence of images within an image processing system comprising memory for storing said first and second images, the product comprising computer program code means for storing in said memory the first and second images; and computer program code means for deriving, as a consequence of a comparison between the first and second images, camera image data for at least one of the first and second images.
Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings in which:
figure 1 illustrates a computer suitable for implementing an image processing system or method according to the embodiments of the present invention;
figure 2 depicts a sequence of source images from which at least one generated image can be produced;
figure 3 shows a world co-ordinate system having an image i positioned therein according to corresponding image camera data αi, βi, θi and λi;
figure 4 illustrates several equations utilised by the present invention;
figure 5 shows a flow chart for deriving camera image data αi, βi, θ', Λ and μ; and
figure 6 illustrates a further equation utilised by the present invention.
Referring to figure 1 there is shown a computer 100 suitable for implementing embodiments of the present invention. The computer 100 comprises at least one microprocessor 102, for executing computer instructions to process data, and a memory 104, such as ROM and RAM, accessible by the microprocessor via a system bus 106. Mass storage devices 108 are also accessible via the system bus 106 and are used to store, off-line, data used by and instructions for execution by the microprocessor 102. Information is output and displayed to a user via a display device 110 which typically comprises a VDU together with an appropriate graphics controller. Data is input to the computer using at least one of the keyboard 112, with its associated controller, and the mass storage devices 108.
Referring to figure 2, there are illustrated schematically several images 1 to N of a sequence of images 200. The images may be derived from a film, video tape, CD-ROM, video disc or other mass storage medium. Typically, the image data varies as between images as a consequence of a combination of motion of an object within an image and, for example, variations in the position or orientation of a camera during image capture. Typically, once the images are recorded, the camera orientation information is lost.
Referring to figure 3 there is shown schematically a world co-ordinate system 300 comprising three axes 302, 306 and 308 which correspond to the x, y, and z axes respectively. The position of one image, image j, with reference to another image, image i, can be described in terms of the orientation of a unit vector, u, within the world co-ordinate system in conjunction with a degree of rotation, θi, about that vector. The orientation of the unit vector within the world co-ordinate system 300 can be specified using two angles, αi and βi. The combination of αi, βi and θi defines a rotation which maps image i onto image j or which can be used to identify corresponding portions of images i and j. The distance λi represents the distance from the origin to the centre of the image i. Similarly, the distance λj represents the distance from the origin to the centre of image j. Conceptually, the camera is positioned such that the optical centre thereof is as close as possible to the origin. The line of sight 308 of the camera is normal to the centre of the image i, which represents in reality the field of view of the camera at a focal length determined by λi.
It will be appreciated that the camera image data αi, βi, θi and λi are not available when retrieving images from some mass storage media such as, for example, a video disc or CD-ROM or the like.
The relationship between a point Q=(x,y,z), in terms of world co-ordinates, within an image frame or field of view at a first orientation of the camera and the same point Q=(x',y',z') within the field of view at a second orientation of the camera is given by the equations shown in figure 4, where ux, uy and uz are the components of the unit vector u, θ is the degree of rotation about the unit vector, Qr=(xr,yr,zr) represents the rotation of the point Q about the unit vector u, and x', y' and z' represent the location, in terms of world co-ordinates, of the projection of point Qr onto the second field of view. It will be appreciated that f is the scaling required after rotation of Q by R to project the rotated point onto the field of view at the second orientation.
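By way of illustration only, the rotation and projection described above can be sketched as follows. The equations of figure 4 are not reproduced in this text, so the sketch assumes the standard axis-angle (Rodrigues) rotation formula, assumes that α is an azimuth in the x-y plane and β an elevation towards the z axis, and assumes that the second field of view lies in the plane z = λ. The function names are hypothetical.

```python
import math

def unit_vector(alpha, beta):
    # Assumed convention: alpha is the azimuth in the x-y plane,
    # beta is the elevation towards the z axis (both in radians).
    return (math.cos(beta) * math.cos(alpha),
            math.cos(beta) * math.sin(alpha),
            math.sin(beta))

def rotation_matrix(u, theta):
    # 3x3 rotation by theta radians about unit vector u
    # (standard Rodrigues formula, assumed to match figure 4).
    ux, uy, uz = u
    c, s = math.cos(theta), math.sin(theta)
    k = 1.0 - c
    return [
        [c + ux * ux * k,      ux * uy * k - uz * s, ux * uz * k + uy * s],
        [uy * ux * k + uz * s, c + uy * uy * k,      uy * uz * k - ux * s],
        [uz * ux * k - uy * s, uz * uy * k + ux * s, c + uz * uz * k],
    ]

def rotate(R, q):
    # Apply the rotation R to the point q, giving Qr.
    return tuple(sum(R[r][c] * q[c] for c in range(3)) for r in range(3))

def project(q_r, lam):
    # Scale the rotated point by f so that it lies on the plane z = lam
    # (assumed position of the second field of view).
    f = lam / q_r[2]
    return (f * q_r[0], f * q_r[1], lam)
```

For example, rotate(rotation_matrix(unit_vector(a, b), t), Q) followed by project would give the co-ordinates x', y' and z' for a point Q under these assumptions.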
Referring to figure 5 there is shown a flow chart for deriving camera image data, αi, βi, θ' and Λ (= ln(λi)), describing the orientation within a world co-ordinate system of a camera which captured each of the images in the sequence of images shown in figure 2, wherein θ' decouples θ and λ using the equation shown in figure 6.
Essentially, the determination of the above parameters utilises a function which compares the colour differences between pixels of any two given images. The first image of the two images is addressed using conventional pixel row and address information. The second image of the two images is addressed using co-ordinates which have been transformed using the matrix T as shown in figure 4. The second image is preferably addressed using sub-pixels so that colour interpolation can be performed. The function which is used to measure the colour difference and hence derive optimal values for αi, βi, θ' and Λ is as follows:
$$\Delta C = \frac{1}{N}\sum_{N}\left[(\delta r)^2 + (\delta g)^2 + (\delta b)^2\right]$$
where δr = ri - rj, δg = gi - gj and δb = bi - bj represent the differences between the RGB colour components of any given pixels i and j, and N is the number of pixels for which the transformed co-ordinates fall inside the image frame boundary of the second image.
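A minimal sketch of the colour difference measure is given below. It assumes, for illustration only, that an image is held as a list of rows of (r, g, b) tuples and that sub-pixel colours are obtained by bilinear interpolation; the source does not state which interpolation is used. The names sample_bilinear and colour_difference are hypothetical.

```python
import math

def sample_bilinear(image, x, y):
    # Colour at sub-pixel position (x, y); None if outside the frame.
    # Bilinear interpolation is an assumption of this sketch.
    h, w = len(image), len(image[0])
    if x < 0 or y < 0 or x > w - 1 or y > h - 1:
        return None
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0

    def mix(a, b, t):
        return tuple(a[k] + (b[k] - a[k]) * t for k in range(3))

    top = mix(image[y0][x0], image[y0][x1], fx)
    bottom = mix(image[y1][x0], image[y1][x1], fx)
    return mix(top, bottom, fy)

def colour_difference(pixels_i, pixels_j):
    # Mean of squared RGB differences over the N sample pairs whose
    # transformed position fell inside the second image (not None).
    total, n = 0.0, 0
    for p, q in zip(pixels_i, pixels_j):
        if q is None:
            continue
        dr, dg, db = p[0] - q[0], p[1] - q[1], p[2] - q[2]
        total += dr * dr + dg * dg + db * db
        n += 1
    # With no overlapping samples the measure is undefined; returning
    # infinity is a choice made for this sketch.
    return total / n if n else float("inf")
```

Samples whose transformed co-ordinates fall outside the second image are skipped, so that N counts only the overlapping pixels, as in the definition above.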
The minimisation process preferably uses the simplex method as is well known within the art (see, for example, "Numerical Recipes in C: The Art of Scientific Computing", second edition, by Press, Teukolsky, Vetterling and Flannery, CUP, ISBN 0 521 43108 5, the entire content of which is incorporated herein by reference).
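By way of illustration only, and assuming that the simplex method referred to above is the downhill simplex (Nelder-Mead) method for function minimisation, a compact sketch is given below. It is not the routine of the cited reference; the function name, step size and tolerances are hypothetical. In the present context the function f would evaluate ΔC for a given set of camera image data parameters.

```python
def nelder_mead(f, start, step=0.1, tol=1e-8, max_iter=2000):
    n = len(start)
    # Initial simplex: the start point plus one point offset along each axis.
    simplex = [list(start)]
    for k in range(n):
        p = list(start)
        p[k] += step
        simplex.append(p)
    values = [f(p) for p in simplex]
    for _ in range(max_iter):
        # Sort vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        size = max(abs(p[k] - simplex[0][k]) for p in simplex[1:] for k in range(n))
        if size < tol and values[-1] - values[0] < tol:
            break
        centroid = [sum(p[k] for p in simplex[:-1]) / n for k in range(n)]
        worst = simplex[-1]

        def along(t):
            # Point on the line through the centroid and the worst vertex.
            return [centroid[k] + t * (worst[k] - centroid[k]) for k in range(n)]

        reflected = along(-1.0)
        fr = f(reflected)
        if fr < values[0]:
            # Reflection is the new best: try expanding further.
            expanded = along(-2.0)
            fe = f(expanded)
            if fe < fr:
                simplex[-1], values[-1] = expanded, fe
            else:
                simplex[-1], values[-1] = reflected, fr
        elif fr < values[-2]:
            simplex[-1], values[-1] = reflected, fr
        else:
            # Contract the worst vertex towards the centroid.
            contracted = along(0.5)
            fc = f(contracted)
            if fc < values[-1]:
                simplex[-1], values[-1] = contracted, fc
            else:
                # Shrink every vertex towards the best one.
                best = simplex[0]
                simplex = [best] + [[best[k] + 0.5 * (p[k] - best[k]) for k in range(n)]
                                    for p in simplex[1:]]
                values = [values[0]] + [f(p) for p in simplex[1:]]
    best_index = min(range(n + 1), key=lambda k: values[k])
    return simplex[best_index], values[best_index]
```

The method needs only function values, not derivatives, which suits an objective such as ΔC that is computed from sampled pixel colours.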
Referring more particularly to figure 5, two images i and j are selected at step 500 from a sequence of images 200. A portion or sample, P, of pixels from the first image i of the two images is selected at step 502. The sample P preferably comprises 2000 pixels which are expressed in terms of pixel rows and columns.
The initial estimates of the camera image data parameters are set at step 504. Preferably, at least three of the parameters αi, βi, θ', Λ and μ should be initialised. The initialised parameters are used to establish a first estimate of the matrix T illustrated in figure 4.
At step 506, the pixels of the sample P are transformed, using the current estimate of matrix T, into the second image to produce a second set of sample pixels P'. The two images i and j are compared at steps 508 and 510 to determine the degree of similarity or alignment therebetween. Preferably, effect is given to the comparison by extracting image data, such as, for example, RGB colour data, from the pixels at sample points P in image i and at sample points P' in image j and determining the differences between the colour data. Still more preferably, the differences are evaluated using the function given above for calculating ΔC.
A determination is made at step 512 as to whether or not a minimum value of ΔC has been reached. If not, processing continues at step 514 in which the estimates or current values of αi, βi, θ', Λ and μ are altered in accordance with the simplex method and processing continues again at step 506. If a minimum has been reached, the current estimates for αi, βi, θ', Λ and μ are stored for the pair of images i and j at step 516. This defines the relationship between corresponding portions of images i and j.
A determination is made at step 518 as to whether or not all images of the sequence of images have been processed. If not, processing continues at step 500 with the selection of another two images from the sequence of images. If all source images have been processed, a subset of images or eligible images is selected from the sequence of images according to predeterminable criteria at step 520. Preferably, eligible source images are those pairs of consecutive images in which the relative motion, that is distance, therebetween is at least 0.2 in terms of frame co-ordinates and for which there are no closer pairs of frames in the sequence. The estimate of the distance between selected images utilises the estimated values of α, β, θ, λ and μ.
The re-iteration of steps 500 to 518 for selected or all pairs of eligible images is commenced at step 530. Once it is determined that all eligible images have been processed, average values for Λ and μ are determined at step 540. Preferably, the averages are weighted according to the reciprocals of the final colour difference ΔC between the pairs of images.
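The weighted averaging of step 540 can be sketched as follows, under the assumption that each image pair contributes one estimate and one final ΔC value. The function name and the small floor applied to ΔC, which avoids division by zero for a perfect match, are hypothetical additions.

```python
def weighted_average(estimates, colour_diffs, floor=1e-12):
    # Each estimate is weighted by the reciprocal of its final colour
    # difference, so better-aligned pairs count for more.
    weights = [1.0 / max(d, floor) for d in colour_diffs]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
```

The same function would be applied once to the per-pair estimates of Λ and once to those of μ.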
Once estimates for Λ and μ have been obtained, these values are fixed and steps 500 to 518 are repeated for all or a selected portion of the images in the sequence of images using the fixed values of Λ and μ in order to produce refined estimates of α, β and θ at step 550.
After the above iteration of the processing shown in the flow chart, each image pair has a set of camera image data governing the orientation, focal length and lens distortion of the camera at the instant the pair of images were captured.
For each pair of images a (3x3) relative rotation matrix, Rij, is calculated using the estimated values of α, β, θ and λ. A relative rotation matrix governs the relative rotation between the two images. The orientation of the first or a reference image of the sequence of images is selected and the relative rotation matrices are used to produce a set of matrices describing each image relative to the first or reference image by combining or compositing the relative rotation matrices.
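The compositing of relative rotation matrices can be sketched as below. The sketch assumes that relative[k] maps image k onto image k+1 and that matrices are combined by left multiplication; the source does not state the multiplication order, and the function names are hypothetical.

```python
def matmul(a, b):
    # Product of two 3x3 matrices held as nested lists.
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def compose_absolute(relative):
    # absolute[k] is the rotation of image k relative to reference image 0.
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    absolute = [identity]
    for r in relative:
        # Assumed order: the newest relative rotation is applied last.
        absolute.append(matmul(r, absolute[-1]))
    return absolute
```

For a sequence of N images with N-1 relative matrices, compose_absolute returns N matrices, the first being the identity for the reference image.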
The derived rotation matrices or the relative rotation matrices can be used to define the absolute or relative positions in a world co-ordinate system of corresponding images relative to a reference image or relative to an arbitrary common reference.
Preferably, once the relative rotation matrices Rij have been derived for each pair of images i and j, a set of non-contiguous images, which are preferably substantially evenly distributed throughout the sequence of images, is selected such that there is some overlap between pairs of the selected images. For each consecutive pair of selected images i and j an estimate of the camera image data, α, β, θ, λ and μ, is determined as above to produce rotation matrices Sij. The rotation matrix Sij between any two images i and j is calculated from the relative rotation matrices R of the images which are between images i and j in the sequence. The difference between Sij and Rij represents an error matrix Eij such that Sij = Eij·Rij. Therefore, Sij·Rij⁻¹ = Eij. The inverse matrix Rij⁻¹ is calculated by negating the value of θ in the matrix Rij. The matrix Eij, that is the error, is then divided or spread across the relative rotation matrices for all or a selected number of the images which are between the two images i and j. Assuming that the error is to be spread across n images, the error Eij is expressed in terms of α, β and θ, and a rotation defined by α, β and θ/n is applied to each relative rotation matrix for all images between images i and j. The value of n may represent all images which are between the two images i and j, or n may represent a selected number of the images between the two images i and j.
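A minimal sketch of the error spreading is given below. It reads Rij in the relation above as the composite of the relative rotation matrices lying between images i and j, assumes the standard axis-angle formulae, and applies the fractional correction by left multiplication; all of these are assumptions, as are the function names. Because rotations about different axes do not commute, the corrected composite reproduces Sij exactly only when the axes coincide and approximately for small errors otherwise.

```python
import math

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def transpose(m):
    # For a rotation matrix the transpose is the inverse, which is
    # equivalent to negating theta.
    return [[m[c][r] for c in range(3)] for r in range(3)]

def rotation_matrix(u, theta):
    ux, uy, uz = u
    c, s = math.cos(theta), math.sin(theta)
    k = 1.0 - c
    return [
        [c + ux * ux * k,      ux * uy * k - uz * s, ux * uz * k + uy * s],
        [uy * ux * k + uz * s, c + uy * uy * k,      uy * uz * k - ux * s],
        [uz * ux * k - uy * s, uz * uy * k + ux * s, c + uz * uz * k],
    ]

def axis_angle(m):
    # Recover the unit axis and the angle of a rotation matrix.
    cos_t = max(-1.0, min(1.0, (m[0][0] + m[1][1] + m[2][2] - 1.0) / 2.0))
    theta = math.acos(cos_t)
    if theta < 1e-12:
        return (0.0, 0.0, 1.0), 0.0  # identity: the axis is arbitrary
    s = 2.0 * math.sin(theta)
    if abs(s) < 1e-12:
        raise ValueError("rotation of about 180 degrees is not handled in this sketch")
    return ((m[2][1] - m[1][2]) / s,
            (m[0][2] - m[2][0]) / s,
            (m[1][0] - m[0][1]) / s), theta

def spread_error(s_ij, relatives):
    # Composite of the n relative rotations between images i and j.
    composite = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for r in relatives:
        composite = matmul(r, composite)
    # E = S . R^-1, then a rotation of theta/n about the same axis is
    # applied to each relative rotation matrix.
    error = matmul(s_ij, transpose(composite))
    axis, theta = axis_angle(error)
    correction = rotation_matrix(axis, theta / len(relatives))
    return [matmul(correction, r) for r in relatives]
```

For example, if two relative rotations of 10 degrees about a common axis are to agree with a directly estimated rotation of 24 degrees, each is corrected by 2 degrees.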
The resulting matrices can then be used for further image processing, such as producing a mosaic of images, clean plate generation, improving the image quality or resolution and the like.
Image capture includes the capture of images using a camera having, for example, charge coupled devices, or the production of computer generated images within, for example, a virtual reality context or computer animation context using a virtual camera.