WO2005122084A1 - Method of motion correction in sequence of images - Google Patents

Method of motion correction in sequence of images

Info

Publication number
WO2005122084A1
Authority
WO
WIPO (PCT)
Prior art keywords
images
image
motion vector
frames
sequence
Prior art date
Application number
PCT/EP2004/051080
Other languages
French (fr)
Inventor
Andrew Augustine Wajs
Original Assignee
Active Optics Pty Ltd.
Priority date
Filing date
Publication date
Application filed by Active Optics Pty Ltd. filed Critical Active Optics Pty Ltd.
Priority to PCT/EP2004/051080 priority Critical patent/WO2005122084A1/en
Publication of WO2005122084A1 publication Critical patent/WO2005122084A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/21 Intermediate information storage
    • H04N1/2104 Intermediate information storage for one or a few pictures
    • H04N1/2112 Intermediate information storage for one or a few pictures using still video cameras
    • H04N1/215 Recording a sequence of still pictures, e.g. burst mode
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/21 Intermediate information storage
    • H04N1/2104 Intermediate information storage for one or a few pictures
    • H04N1/2112 Intermediate information storage for one or a few pictures using still video cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/70 Circuitry for compensating brightness variation in the scene
    • H04N23/741 Circuitry for compensating brightness variation in the scene by increasing the dynamic range of the image compared to the dynamic range of the electronic image sensors

Definitions

  • the invention relates to an image processing method, including a step of calculating a motion vector representing at least a component indicative of relative movement of at least a part of successive images in a sequence of images, wherein the step of calculating the motion vector includes a step of determining at least a first term in a series expansion representing at least one element of the motion vector, which step includes an estimation process wherein at least the part in each of a plurality of the images is repositioned in accordance with the calculated motion vector and values of corresponding pixels in at least the repositioned parts are summed to form a combined image.
  • the invention also relates to an image processing method, including a step of analysing frames in a sequence of frames of pixel values, each frame representing a corresponding one of a sequence of images associated with successive moments in time, including: determining a plurality of time series of cumulative pixel values, each one associated with a pixel position within the images, and selecting pixel positions at which a deviation of the associated time series fulfils a pre-determined criterion.
  • the invention also relates to a method of processing a plurality of first frames of pixel values, each first frame representing a corresponding one of a sequence of images, such that the pixel values of a first frame, when added to corresponding pixel values of other first frames, form a first frame representing a combined final image.
  • the invention also relates to an image processing system, comprising a processor and memory for storing pixel values.
  • the invention also relates to a digital camera.
  • the invention also relates to a computer program product.
  • An example of the first type of method mentioned above is known in the art as stacking or image combination and has been implemented in software packages designed for astronomers.
  • the known technique involves taking several images of the same stars and combining them into a single image. This is done by identifying stars in the image, aligning the stars and then adding the images. By re-positioning the images before adding them, the combined image shows less blur. This technique works well on the types of images processed by astronomers, as the stars stand out as distinct spots against an otherwise homogeneously dark background.
  • a disadvantage of the known method is that it is not well suited to other types of images. For example, underexposed images taken in daylight using a digital camera cannot be aligned very well. It is an object of the present invention to provide an improved method of the first-mentioned type that results in a motion vector with increased accuracy, ensuring better correspondence between corresponding parts in images re-positioned using the motion vector.
  • This object is achieved by the method according to the invention which is characterised in that the estimation process includes calculation of a measure of energy contained in an upper range of the spatial frequency spectrum of the combined image and the step of determining at least the first term includes at least one further iteration of the estimation process to maximise the energy.
  • the resulting motion vector is that which would lead to a combined image with the highest amount of detail. It is thus chosen as the motion vector which best aligns the repositioned corresponding part of each image, so that the combined image shows the least amount of blur.
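As an illustration of such a sharpness criterion, the sketch below scores a combined image by the energy above a cutoff fraction of the Nyquist frequency. The FFT-based measure and the 0.5 cutoff are assumptions for illustration; the patent only requires that the measured range lie in the upper part of the spatial frequency spectrum.

```python
import numpy as np

def high_frequency_energy(combined: np.ndarray, cutoff: float = 0.5) -> float:
    """Energy contained in the upper range of the spatial frequency
    spectrum of a combined image; larger means sharper alignment."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(combined))) ** 2
    fy = np.fft.fftshift(np.fft.fftfreq(combined.shape[0]))
    fx = np.fft.fftshift(np.fft.fftfreq(combined.shape[1]))
    # Radial frequency of each coefficient, normalised so Nyquist == 1.0.
    radius = np.hypot(*np.meshgrid(fy, fx, indexing="ij")) / 0.5
    return float(spectrum[radius >= cutoff].sum())
```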
  • the term motion vector is used herein to denote a one- or multi-dimensional array of values quantifying one or more components of relative movement of the part of the image concerned. Variants are conceivable wherein the motion vector takes the shape of a tensor.
  • the step of calculating the motion vector includes determining a further term in the series expansion, and the estimation process is iteratively executed using a motion vector with an adjusted value of the further term to maximise the energy.
  • the series expansion is a Fourier series in one embodiment, each element of the motion vector being built up from a DC component and successive frequency components.
  • the step of calculating the motion vector is carried out by manipulating frames of pixel values, each frame representing a corresponding one of the successive images, wherein the pixel values of each frame lie on one of a sequence of scales of discrete values, with increasing absolute maximum, the scales being applicable to respective successive sets of at least one image in the sequence of successive images.
  • the dynamic range of the combined image is increased, resulting in improved resolution of the combined image.
  • the step of calculating the motion vector is carried out by manipulating frames of pixel values, each frame representing a corresponding one of the successive images, and the frames of pixel values are formed by exposing an image capturing device comprising an array of light-sensitive sensors at a certain exposure level; deriving at least one frame of pixel values by reading output values of the light-sensitive sensors; and re-setting the light-sensitive sensors when the certain exposure level has been achieved.
  • This has the effect of reducing the level of shot noise when the frames are combined by adding corresponding pixel values.
  • One variant of the method comprises repeatedly executing the steps of: exposing the image capturing device at a certain exposure level; deriving a frame of pixel values by reading output values of the light-sensitive sensors when the exposure level has substantially been reached; and re-setting the light-sensitive sensors, wherein the certain exposure level is monotonically stepped between each execution of the steps.
  • This variant has the advantage of being easy to implement in existing image capturing devices, such as digital cameras, which need only be suitably programmed.
  • multiple frames of pixel values are formed by taking readings of the output values of the light-sensitive sensors at respective intervals between resets of the light-sensitive sensors.
  • a preferred embodiment comprises receiving user input defining a region within an image in the sequence of images and calculating a motion vector indicative of relative movement of a part of successive images in the sequence of images corresponding to the defined region.
  • this embodiment enables improved panning in action photography (the defined region is a moving object), as well as allowing moving objects to be removed from a still image (the defined region surrounds a moving object).
  • a preferred embodiment comprises calculating a rate of change in magnitude of the motion vector from one of the successive images to a next and providing an output signal to the user if the rate of change exceeds a certain threshold level.
  • This variant is intended for execution by a digital camera, whereby a user captures a sequence of images by swivelling round at one position, which images are then combined into a panoramic image .
  • the method ensures that the user takes a sufficient number of images, i.e. does not swivel round too fast.
  • the step of calculating a motion vector is repeated for at least one further sequence of images, wherein at least a part of each image of the sequence of images is re-positioned in accordance with the respective motion vector for the sequence, and wherein a combined final image is formed as a weighted sum of all images subsequent to re-positioning, each of the sequences being accorded a weighting factor.
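A minimal sketch of this weighted combination, assuming each sequence has already been re-positioned in accordance with its own motion vector and that all frames share one shape; the function and parameter names are illustrative.

```python
import numpy as np

def weighted_final_image(sequences, weights):
    """sequences[s]: list of re-positioned frames for sequence s.
    The combined final image is the weighted sum of all frames,
    each sequence contributing with its own weighting factor."""
    return sum(w * np.sum(frames, axis=0) for frames, w in zip(sequences, weights))
```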
  • an image processing method including a step of analysing frames in a sequence of frames of pixel values, each frame representing a corresponding one of a sequence of images associated with successive moments in time, including: determining a plurality of time series of cumulative pixel values, each one associated with a pixel position within the images, and selecting pixel positions at which a deviation of the associated time series fulfils a pre-determined criterion, characterised by determining at least one region of contiguous selected pixel positions and calculating an associated motion vector local to that region and representative of movement of at least part of that region in the images represented by the frames.
  • This method solves another problem of the known methods of image stacking, namely that they are primarily suited to correcting for 'camera shake'.
  • the method as defined in the preceding paragraph achieves an object of providing a method that allows for separate correction of movement of objects represented in images and movement of a device used to capture the images.
  • the pre-determined criterion characterises the effect of movement of the totality of the image. Deviations from this characterisation are indicative of movement of an object occupying the pixel position to a new pixel position. By determining regions of contiguous selected pixel positions, such objects can be identified.
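One way to determine such regions of contiguous selected pixel positions is connected-component labelling, sketched below with scipy.ndimage.label; the patent does not name a particular grouping algorithm, and the minimum region size is an illustrative guard against isolated noisy pixels.

```python
import numpy as np
from scipy import ndimage

def flagged_regions(flags: np.ndarray, min_size: int = 4):
    """flags: boolean mask of selected ('flagged') pixel positions.
    Returns the bounding box of each contiguous region that is large
    enough to be treated as a moving object rather than noise."""
    labels, count = ndimage.label(flags)  # 4-connectivity by default
    boxes = []
    for lab in range(1, count + 1):
        ys, xs = np.nonzero(labels == lab)
        if ys.size >= min_size:
            boxes.append((ys.min(), ys.max(), xs.min(), xs.max()))
    return boxes
```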
  • US 6,538,593 B2 discloses a method using an analog-to-digital converter with a maximum input signal of S_s for converting a monotonically changing analog signal to a cumulative floating-point, digital representation even if the analog signal has a value greater than S_s at time T.
  • the analog signal is sub-converted at a first time T_1 > 0 to obtain a first digital representation which corresponds to the magnitude of the analog signal at this first time.
  • the analog signal is sub-converted at a subsequent time T_2 > T_1 to obtain a second digital representation which corresponds to the magnitude of the analog signal at this second time.
  • the two digital representations are then combined into an intermediate floating-point, digital representation with greater dynamic range than either of the first two digital representations on their own.
  • the known method does not, however, comprise determining at least one region of contiguous selected pixel positions and calculating an associated motion vector local to that region and representative of movement of at least part of that region in images represented by the frames.
  • the known method is used to increase the dynamic range of the pixel values rather than to characterise relative motion of objects represented in images and moving within each image.
  • a preferred embodiment includes a repositioning step prior to the analysing step, the repositioning step including: deriving the frames by repositioning at least a part of successive images in the sequence of images, which part encompasses the pixel positions associated with the time series, in accordance with a global motion vector, representing at least a component of relative movement of that part in the sequence of images.
  • the local motion vector represents movement of (a part of) the determined region as a perturbation on a global motion vector, which represents movement of a larger part encompassing the region.
  • each can be corrected for separately, depending on user preferences. For example, it may be desirable to keep a region representing a moving object blurred, but to correct for camera shake to achieve a sharp view of the background.
  • a preferred embodiment includes repeating the analysis step on a sequence of arrays of pixel values, each array derived from the pixel values representing the region in a corresponding one of the sequence of images, to calculate at least one motion vector local to a sub-region within the region.
  • a global motion vector may provide information on camera shake, a local motion vector on the movement of a car through the represented scene, and a further local motion vector on the movement of the wheels of the car (each wheel being a sub-region within the region representing the car). It would then be possible to correct each image in the sequence in such a way that, when the corrected images are combined into a final image, the car and the background appear as sharp, whereas the wheels appear blurred.
  • the or each motion vector is preferably calculated by means of the motion vector calculation step in a method according to the first-mentioned image processing method according to the invention.
  • the invention provides a method of processing a plurality of first frames of pixel values, each first frame representing a corresponding one of a sequence of images, such that the pixel values of a first frame, when added to corresponding pixel values of other first frames, form a first frame representing a combined final image, characterised by converting at least some of the first frames into respective second frames having a smaller data size than the first frames and adding corresponding pixel values of the second frames, so as to form a second frame representing a preview image.
  • This method achieves the object of enabling a relatively accurate impression of the combined final image to be gained in an efficient manner. It solves a problem occurring in the application of stacking methods in image capturing devices with limited data processing power.
  • the generation of a combined final image from the captured images requires processing a large amount of data. Because the images in the sequence represented by the first frames are such that the pixel values, when added, form a frame representing a combined final image, they can each of themselves be underexposed. Because a preview image is formed, there is no need to generate the combined final image on the spot to ascertain whether the first frames allow for the generation of a frame representing a combined final image of sufficient intensity. This means that the preview image can be generated in a device with relatively low processing power, such as a camera.
  • the invention provides an image processing system, comprising a processor and memory for storing pixel values, which system is configured to execute a method according to any one of claims 1-20.
  • the invention provides a digital camera, arranged to carry out a method according to any one of claims 1-20.
  • the invention provides a computer program product having thereon means which, when run on a programmable data processing system, enable the programmable data processing system to execute a method according to any one of claims 1-20.
  • the computer program product comprises a series of instructions instructing a programmable processing device to carry out certain steps in the method according to the invention. It may be comprised in image processing software for a personal computer or workstation, but also as embedded software for a digital camera or scanner, for instance.
  • Fig. 1 shows schematically the layout of an example of a digital camera for use in conjunction with the invention
  • Fig. 2 is a schematic overview of an embodiment of an image processing method
  • Fig. 3 is a flow diagram giving an overview of a first embodiment of a method of calculating a motion vector
  • Fig. 4 is a flow diagram giving an overview of a second embodiment of a method of calculating a motion vector
  • Figs. 5A and 5B illustrate in a graphical manner the development of two components of a motion vector over a sequence of eight images
  • Fig. 6 is a very schematic representation of a part of an image-capturing device
  • Fig. 7 illustrates the variation of the exposure of the image-capturing device in order to capture a sequence of images in one embodiment
  • Fig. 8 illustrates the variation of the exposure of the image-capturing device in order to capture a sequence of images in a second embodiment
  • Fig. 9 illustrates the variation of the exposure of the image capturing device in order to capture a sequence of images in a third embodiment
  • Fig. 10 illustrates how an embodiment of the image processing method is used to form a panoramic image
  • Fig. 11 illustrates how an embodiment of the image processing method is used to form a combined image under certain desired lighting conditions;
  • Fig. 12 is a flow diagram schematically illustrating a step of local adjustment in the image processing method of Fig. 1;
  • Fig. 13 illustrates schematically the development of an intensity signal for a pixel;
  • Fig. 14 is a schematic illustration of an embodiment of the image processing method in which regions within images are re-positioned;
  • Fig. 15 is a schematic illustration of a variant of a method of capturing images wherein a preview image is generated in parallel;
  • Fig. 16 is an enhancement to the embodiment illustrated in Fig. 2, wherein a preview image is generated.
  • One example of an image processing system usable in the context of the present invention is a digital camera 1.
  • the images could alternatively be captured with a conventional camera, i.e. by exposing a film with a photographic emulsion; they would then be scanned and digitised using a photo-scanner as known generally in the art.
  • the invention will be described herein using an embodiment with a digital camera as an example.
  • the digital camera comprises a lens system 2 for focussing on one or more objects in a scene.
  • When a shutter 3 is opened, the scene is projected through an aperture 4 onto an image-capturing device 5.
  • the shutter time is controllable, as is the diameter of the aperture.
  • the image capturing device 5 could be electronically controlled to provide the same effect (electronic shutter) , namely to capture a signal from the image capturing device 5 representative of the light to which this image capturing device is exposed for the duration of an exposure time.
  • the image-capturing device 5 can be a device using CMOS (Complementary Metal Oxide Semiconductor) or CCD (Charge-Coupled Device) technology.
  • the output of the image-capturing device 5 is provided in the form of one or more analogue signals to an Analogue to Digital converter (A/D-converter) 6.
  • the A/D-converter 6 samples and quantises the signals received from the image capturing device 5, i.e. converts them into frames of digital pixel values.
  • the digital camera comprises a storage device 8 for storing the image data representative of the captured images.
  • the storage device 8 can be any usual type of storage device.
  • a microprocessor 9 controls the operation of the digital camera, by executing instructions stored in non-volatile memory, in this example a Read-Only Memory (ROM) 10. Indications of the operating condition of the digital camera 1 are provided on an output device 11, for example a Liquid Crystal Display, possibly in combination with a sound-producing device (not shown separately).
  • An input device 12 is shown schematically as being representative of the controls by means of which the user of the digital camera provides commands.
  • the digital camera 1 illustrated as an example comprises a flash driver circuit 13, for providing appropriate driving signals to one or more sources of flash light.
  • the digital camera 1 shown in Fig. 1 also comprises a motion sensor 14, for providing a signal representative of the movement of the digital camera 1, and thus of the image capturing device 5. Furthermore, the digital camera 1 comprises an exposure-metering device 15. The purpose of the exposure-metering device 15 is to measure the strength of light, so that the microprocessor 9 can determine the intensity of light to be emitted by a flash in combination with the correct exposure value as determined by aperture and shutter speed.
  • the camera 1 can be used in a substantially stationary position to capture a sequence of images and to derive a sequence of corresponding frames of pixel values representing the images. Each image is underexposed on purpose.
  • the image capturing system comprising the image capturing device 5 and any controlled sources of light also comprises means for recording pixel intensity values on a certain scale.
  • the dynamic range of the scale is determined by the recording means.
  • the dynamic range of the image sensor 5 and the number of bits of resolution of the A/D converter 6 determine the dynamic range of the scale.
  • the properties of the film determine the dynamic range of the scale.
  • Underexposure in the above sense means that parameters of the image capturing system, comprising the camera 1 and any controlled sources of light, are adjusted, so that each of the pixel values for each frame is recorded in a range occupying a minor part of the scale allowed by the recording means. How this is achieved varies according to the embodiment of the invention chosen.
  • the images are adjusted prior to forming them into a combined final image .
  • the combined final image is formed by summing the values of corresponding pixels in the adjusted images.
  • the combined final image may therefore be formed from underexposed images, but is itself sufficiently bright, as well as having a good resolution.
  • the adjustment is used to prevent the combined final image from being blurred.
  • the adjustment corrects for relative movement of at least a part of successive images in the sequence of captured images due to movement of objects and/or shaking of the digital camera 1.
  • In Fig. 2 a representative embodiment of the method is shown. It should be noted that this embodiment is by no means the only possible embodiment. In particular, the shown embodiment starts with a sequence 16 of frames of pixel values, each frame representing a corresponding one in a sequence of images.
  • This sequence 16 is retrieved from the storage device 8 and manipulated. It is noted that the method could be carried out by the digital camera 1, i.e. executed on the microprocessor 9, but also on a general-purpose computer (not shown) to which the frames in the sequence 16 have been transferred. Also, the schematic illustration of the sequence 16 does not imply a particular type of file format. Indeed, the images could be stored in compressed form, for example by storing a first frame of pixel values representing intensity values for each of the colour components and subsequent frames wherein each pixel value represents a difference in value to the corresponding pixel value in the first frame. Such techniques are well known in the art.
  • the global motion vector is representative of relative movement of at least a part of successive images represented by the sequence 16 of frames of pixel values. Implementations of the invention are possible in which only one component of translation is calculated, i.e. the x- or y-component.
  • the motion vector may comprise a component representing rotation around an axis perpendicular to the image and/or components representing rotation around an x- or y-axis in the plane of the image (i.e. a multiplication factor for the mutual spacing of pixels).
  • the motion vector has two components: translation in the x- and y- direction, both the x- and y-axis lying in the plane of an image.
  • Relative movement means the amount by which corresponding parts of images are displaced relative to one another. This may be displacement relative to the first in a sequence of images or the preceding image.
  • the x- component of the motion vector represents displacement in the x-direction relative to the first of successive images.
  • the y- component is indicative of displacement in the y-direction.
  • the motion vector can be seen as a time-varying vector. This means that each component in fact comprises a sequence of values representing the progression over the sequence of images of that component of movement .
  • the motion vector is indicative of relative movement of at least a part of successive images in a sequence of images to be adjusted.
  • the motion vector may be calculated on the basis of all the frames in the sequence of frames 16 (as in the example illustrated here) , or on only some of them, with interpolation used to derive values for frames in the sequence 16 between two successively used frames.
  • several zones are identified in the image sequence. This can be done on the basis of the following criteria: 1. manual selection; 2. image analysis, detecting areas that appear to have more detail based on frequency analysis of the image; 3. fixed points, for example in the centre and in the centre of each quadrant. For each zone a two-dimensional array of pixels is selected. The size of this array could be in the region of 64 x 64 pixels, for example. Selection of a zone is depicted in step 18 in Fig. 2.
  • a step 19 of calculating a motion vector representative of relative movement of the selected zone is carried out.
  • These two steps 18,19 are repeated for each of the identified zones.
  • a set 20 of motion vectors is calculated.
  • the global motion vector 17 is calculated from the set 20 of motion vectors by finding the norm of the motion vectors in the set 20, removing from the set 20 the motion vectors that vary from the norm by more than some threshold, and then recalculating the norm of the remaining motion vectors.
  • the average for the movement of the digital camera 1 is calculated (step 21) .
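Reading "norm" here as the average of the zone vectors, the step can be sketched as below; the outlier threshold in pixels is an assumed parameter, and the single re-averaging pass is one plausible reading of the text.

```python
import numpy as np

def global_motion_vector(zone_vectors: np.ndarray, threshold: float = 2.0) -> np.ndarray:
    """zone_vectors: (n_zones, 2) array of per-zone (dx, dy) estimates.
    Average the vectors, discard those deviating from the average by
    more than `threshold`, then re-average the remaining vectors."""
    mean = zone_vectors.mean(axis=0)
    keep = np.linalg.norm(zone_vectors - mean, axis=1) <= threshold
    return zone_vectors[keep].mean(axis=0) if keep.any() else mean
```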
  • Alternative embodiments are possible in which only one zone is effectively selected, or only zones within a user- defined region of the image .
  • each image in the sequence of images represented by the sequence 16 of frames of pixel values is re-positioned in accordance with the global motion vector 17.
  • pixel values are assigned to different co-ordinates by - in the present example - translation in accordance with the component values applicable to the image concerned.
  • each component of the global motion vector 17 is formed by a time series. Each image is associated with a point in time. Where the values in the time series do not coincide with the points in time associated with the images, interpolation or approximation is used to derive the appropriate component values for each image.
  • a sequence 23 of adjusted images results.
  • Fig. 3 illustrates a first manner of calculation of a motion vector, i.e. an embodiment of step 19 in Fig. 2.
  • Figs. 5A and 5B show the development of each of the two elements of the motion vector to be arrived at. As can be seen, the part of the image to which the motion vector pertains moves continuously, with slight variations in speed, in the x-direction and around a fixed point in the y-direction in the present example.
  • Alternative functions such as sawtooth functions centred on each value of i can be used instead of sinusoids.
  • Other forms of series expansion may be used, such as perturbation analysis of each element of the motion vector, for instance.
  • each element D_k of the motion vector is calculated individually. Also, a best estimate of the first element D_1 is first determined before the second element D_2 is calculated. Thus, in a first step 28, the DC component of the first element D_1 of the motion vector D is guessed, i.e. f_1^0. The motion vector is then calculated on the basis of that value in a subsequent step 29. The part of the images to which the motion vector is applicable is then re-positioned (step 30). This is done in the manner explained above in relation to step 22 of Fig. 2. Assuming Fig. 3 to be an illustration of step 19 in Fig. 2, the part concerned is the zone selected in step 18. Alternatively, it could be a larger part encompassing this zone.
  • a measure of the energy contained in a range of the spatial frequency spectrum of the combined image is calculated. This range lies at least partly, preferably entirely, within the upper half of the spatial frequency spectrum as determinable on the basis of the resolution of the combined image. It will be realised that the determinable total range of the frequency spectrum depends on the number of pixel values used to represent the combined image.
  • the range will be more limited when the combined image is represented by fewer pixel values than when it is represented by a frame of sixteen by sixteen pixels, for instance.
  • the estimation process comprising the above steps 28-32 is re-iterated at least once with an improved estimate of f_k^j, in order to maximise the energy as calculated in step 32.
  • Known optimisation methods may be used.
  • Suitable criteria for breaking off the iterative process are likewise known in the art.
  • the calculation of the motion vector includes determining a further term in the series expansion, i.e. the next frequency component f_k^j.
  • The value of j is incremented in step 33, and the estimation process comprising a number of iterations of steps 28-32 is repeated whilst varying the further term. Again, the value of f_k^j yielding the highest energy in the pre-selected range of the spatial frequency spectrum is determined. Depending on the desired accuracy and the number of frames on the basis of which the motion vector is calculated, further terms in the series expansion can be calculated.
  • the iterative execution of steps 28-33 can be looked upon as one step of determining an element of the motion vector D. In the variant of Fig. 3, this step is repeated for each element D_k, as represented by step 34. When all elements (in this example two) have been calculated the motion vector D[i] is returned (step 35).
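A sketch of one round of this estimation process (steps 28-32). The exhaustive search over candidate coefficient values stands in for the unspecified "known optimisation methods", and np.roll is used as a crude whole-pixel repositioning; make_vector is an assumed helper that maps a trial coefficient to per-frame displacements via the series expansion.

```python
import numpy as np

def estimate_coefficient(frames, make_vector, energy, candidates):
    """Try candidate values for the current series coefficient; for each,
    reposition every frame by the displacements that value implies, sum
    the repositioned frames into a combined image, and keep the value
    whose combined image contains the most high-frequency energy."""
    best_value, best_energy = None, -np.inf
    for value in candidates:
        shifts = make_vector(value)  # per-frame (dy, dx) from the expansion
        combined = sum(
            np.roll(frame, (round(dy), round(dx)), axis=(0, 1))
            for frame, (dy, dx) in zip(frames, shifts)
        )
        score = energy(combined)  # e.g. the high_frequency_energy sketch above
        if score > best_energy:
            best_value, best_energy = value, score
    return best_value
```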
  • In step 36, the DC component of D is guessed, as determined by f^0.
  • the motion vector is then calculated on the basis of that value in a subsequent step 37.
  • the part of the images to which the motion vector is applicable is then re-positioned. This is done in the manner explained above in relation to step 22 of Fig. 2.
  • After re-positioning (in this case involving a two-dimensional translation) in step 38, at least the re-positioned parts are summed to form a combined image. This is done in step 39.
  • a measure of the energy contained in an upper range of the spatial frequency spectrum of the combined image is calculated.
  • the estimation process comprising the above steps 36-40 is re-iterated at least once with an improved estimate of f^j, in order to maximise the energy as calculated in step 40.
  • the calculation of the motion vector includes determining a further term in the series expansion, i.e. the next frequency component f^j.
  • the value of j is incremented in step 41, and the estimation process comprising a number of iterations of steps 36-40 is repeated whilst varying the further term.
  • the value of f^j yielding the highest energy in the pre-selected range of the spatial frequency spectrum is determined.
  • each frame in the sequence 16 of frames of Fig. 2 comprises pixel values on a corresponding scale in a sequence of scales.
  • the settings include the exposure time (the amount of time for which the light-sensitive cell 43 is exposed to light), as determined, for instance, by the shutter 3 in the digital camera 1 shown in Fig. 1. They further include the duration and/or intensity of artificial lighting, i.e. as controlled by the microprocessor 9 and flash driver circuit 13 of the digital camera 1 (Fig. 1). They also include the area of the aperture 4 through which the light-sensitive cell 43 is exposed. The settings also include the gain of an amplifier 45, used to amplify the analogue signal provided to the A/D converter 44.
  • the criterion for the correct exposure of an image is to achieve the point where the light-sensitive sensor comprising the light-sensitive cell 43 is not saturated, while at the same time the intensity of the images (i.e. of the signal provided as output from the A/D converter 44) is as high as possible relative to the noise associated with the image sensor.
  • the whole of each image in a series of images is re-positioned in accordance with the calculated motion vector and values of corresponding pixels in each of the images subsequent to re-positioning are summed to form a combined final image (step 25 in Fig. 2) .
  • each of the sequence 16 of frames of pixel values (Fig. 2) is actually formed by underexposure.
  • CMOS Complementary Metal Oxide Semiconductor
  • CCD Charge-Coupled Devices
  • the noise of a sensor is generally made up of the following components: - Dark current. This is generally very low. - Shot noise or photon noise. This noise is reduced by ensuring that the number of photons counted is sufficiently high.
  • a re-set switch 46 is used to set the voltage across a capacitor 47 to zero.
  • the capacitor 47 is a representative example of an accumulator for presenting an accumulation of an output signal of the light-sensitive cell 43 as an output signal of the light-sensitive sensor comprising the cell 43 and capacitor 47.
  • a sample switch 48 illustrates how a sample is obtained as a reading of an output value of the light-sensitive sensor.
  • Additional sources of noise include noise added by the A/D converter 44 and fixed pattern noise, arising from differences between the circuits associated with each sensor in the image capturing device 5.
  • Noise that would be introduced due to the fact that a sequence of frames of pixel values representing a sequence of images is used to form a single combined image 24,27 is reduced by means of one or more of various techniques.
  • One manner of reducing noise is to form multiple frames of pixel values by taking readings of the output values of the light-sensitive sensors at respective intervals between re-sets of the light-sensitive sensors.
  • a certain exposure level is determined to be correct for a combined image, given certain user input values provided through the input device 12 and certain environmental conditions as determined by means of exposure metering device 15 of digital camera 1, for example.
  • the exposure level for each frame results in an exposure time, for the duration of which the light-sensitive cell 43 is exposed to light.
  • the light-sensitive sensor is re-set by means of the re-set switch 46.
  • a single reading would be taken at the end of the exposure time, by closing the sample switch 48 once.
  • the readout noise is reduced by closing the sample switch 48 a plurality of times between closings of the re-set switch 46.
  • Each reading is scaled according to the elapsed exposure time, whereupon the multiple readings are averaged to form a single output value. Because the noise increases at a slower rate than the amplitude of the signal, the signal-to-noise ratio is increased.
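A sketch of this scale-and-average readout, assuming each intermediate sample is a full frame read without resetting the sensor; the array shapes and names are illustrative.

```python
import numpy as np

def averaged_reading(samples, sample_times, exposure_time):
    """samples[i]: frame read (without re-set) at elapsed time
    sample_times[i]. Each reading is scaled up to the full exposure
    time and the scaled readings are averaged, so uncorrelated
    readout noise partially cancels while the signal is preserved."""
    samples = np.asarray(samples, dtype=float)
    scale = exposure_time / np.asarray(sample_times, dtype=float)
    return (samples * scale[:, None, None]).mean(axis=0)
```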
  • the exposure level can be stepped from frame to frame, i.e. with each exposure of the light-sensitive cell 43. This means that the light-sensitive cell 43 is exposed at a certain exposure level .
  • a frame of pixel values is derived by reading output values of the light-sensitive sensor at least once, including when the exposure level has substantially been reached (i.e. at the end of the exposure time). Then the re-set switch 46 is closed prior to the next exposure.
  • the exposure level is stepped monotonically, i.e. continuously incremented or decremented from one cycle of execution of these steps to the next .
  • the exposure level is determined by the exposure time, aperture, (flash) lighting intensity, gain of the amplifier 45 and A/D conversion threshold of the A/D converter 44. Stepping may be accomplished by varying any or a combination of these settings with each execution of the steps resulting in a frame of digital pixel values. It is noted that stepping an amplification factor used to amplify an output value of the light-sensitive sensor has the advantage of simple implementation.
  • Figs. 7 and 8 illustrate alternatives, namely variation of the exposure time and variation of the intensity of light admitted onto the image-capturing device 5, respectively.
  • Figs. 7 and 8 are diagrams showing the intensity of light admitted onto the image sensor against time. The area under the curve is the exposure.
  • a plurality of frames is captured.
  • the exposure is 'divided' over a plurality of frames.
  • the size of the aperture 4, as well as lighting conditions are kept constant between exposures. This embodiment is illustrated in Fig. 7. Because the size of the aperture 4 is kept constant, the intensity values are each the same.
  • the total exposure is the sum of the areas of the bars.
  • the number of exposures depends on the time required for a stable image to be captured. The number is selected to keep the exposure time below a certain threshold value. Preferably, this threshold value is pre-determined at 1/60 second as this is considered the lowest shutter speed to capture a steady image for the average photographer.
  • the exposure time is varied randomly between frames.
  • settings of the image capturing system, in this case the exposure time, are adjusted before several further captures of a frame in such a manner that at least a maximum of the scale on which intensity values for each pixel are recorded changes substantially uniformly in value with each adjustment. This has the advantage of resulting in a more accurate capture of the colour and tonal depth in the combined final image 24 (Fig. 2) or locally adjusted final image 27.
  • One algorithm for calculating the exposure time is as follows. The maximum exposure time is chosen to be below the threshold of, for example, 1/60 second. The average exposure time is set to equal half the maximum exposure time. The minimum exposure time is set equal to the maximum exposure time divided by the number of frames.
  • the total exposure time determined to result in the desired exposure is 1 second.
  • the maximum exposure time is chosen to be 1/60 second.
  • the exposure times would be stepped in equal increments from 1/7200 second to 1/60 second.
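The worked example can be reproduced with the short sketch below: 1 second total at a 1/60 second maximum gives 120 frames stepped from 1/7200 to 1/60 second. The frame count is derived from the stated average (half the maximum); note that the resulting arithmetic series only approximates the metered total.

```python
def stepped_exposure_times(total=1.0, t_max=1 / 60):
    """Exposure-time stepping per the algorithm in the text: the average
    exposure is half the maximum, which fixes the frame count; the
    minimum is the maximum divided by the frame count; times step up
    in equal increments from minimum to maximum."""
    n = round(total / (t_max / 2))  # 1 s at 1/60 s max -> 120 frames
    t_min = t_max / n               # -> 1/7200 s
    step = (t_max - t_min) / (n - 1)
    return [t_min + i * step for i in range(n)]
```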
  • a different parameter of the image capturing system, in this case the size of the aperture 4, is adjusted before several further captures of a frame in such a manner that at least a maximum of the scale on which intensity values for each pixel are recorded changes substantially uniformly in value with each adjustment.
  • This embodiment is illustrated in Fig. 8.
  • the exposure time is the same for each successive frame.
  • the maximum intensity that can be captured and recorded decreases uniformly with each successive exposure.
  • Aperture is the ratio of the focal length to the diameter of the opening of the lens system 2.
  • Aperture area is the area of the opening, i.e. proportional to the square of the diameter.
  • the aperture area is stepped down in equal increments in the embodiment illustrated in Fig. 8.
  • the aperture area needed for a given sum of exposure times is determined using the exposure-metering device 15, or a default value is taken for the sum of exposure times.
  • the thus determined aperture area will be referred to as the metered aperture area.
  • the average aperture area is calculated as the metered aperture area divided by the number of frames.
  • the maximum aperture area is set to equal twice the average aperture area.
  • the minimum aperture area is set equal to the maximum aperture area divided by the number of frames.
  • the flash driver circuit 13 is controlled by the microprocessor 9 in such a way that the intensity of light emitted by a connected flash lighting source is increased or decreased in steps. This may be achieved by varying the duration of each flash or the intensity. The result is a graph similar to that of Fig. 8, in case the intensity of the flashlight is decreased uniformly.
  • An approach used in one embodiment of this variant is to drive a flash connected to the camera 1 as a test shot without capturing a frame.
  • the longest required exposure for the flash is determined, from which the camera calculates the exposure times for each of the frames to be captured with the flash light driven.
  • a range finding system provided with the digital camera 1 is used to determine the distance between the flash and/or the camera 1 and the furthest object of interest. This can be used to configure the flash for the camera 1.
  • a simpler variant involves the use of a fixed mode of operation, where the maximum range of a flash light source connected to the camera 1 is used to determine the breakdown into multiple frames. The variant with stepped intensities of flashlight simplifies the capture of correct exposure without requiring multiple flashes and multiple measuring points.
  • Fig. 9 shows the (analogue) output signal of the image sensor 5 plotted against the intensity of light to which it is exposed.
  • the scale shown schematically along the V-axis comprises a number of discrete levels, determined by the number of bits of resolution of the A/D converter 6. The number of bits of resolution sets the tonal depth for the image.
  • Stepping the exposure time or aperture area between exposures has the effect of stretching (or shrinking) the scale. That is to say, the settings of the image capturing system are adjusted in such a manner that image data for each pixel are recorded on a scale varying in range between two successive frames. It is observed that this is also true in embodiments of the invention in which photographic film is used to capture frames and in which images are subsequently scanned to yield a frame of pixel values.
  • the advantage of varying at least the maximum of the scale is that the image resulting from the combination of captured frames has increased resolution, because the number of possible intensity values resulting from summing intensity values for the individual frames (after repositioning) increases.
  • Another way to achieve this effect is to adjust the threshold of the A/D converter 6,44.
  • for a converter with one bit of resolution, the threshold for determining the value would be 0.5. All values above 0.5 are converted to a 1 and all values below that to a 0.
  • the threshold is first adjusted to 0.2 to capture and re- cord a frame, then to 0.4, then to 0.6 and then to 0.8.
  • the resolution is increased from one to two bits. Thus, this increase occurs in step 25 of Fig. 2.
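A toy version of this threshold stepping, assuming the analogue pixel value stays constant over the four captures; summing the four 1-bit conversions yields five possible levels, i.e. roughly two bits per pixel.

```python
def combined_level(analogue_value, thresholds=(0.2, 0.4, 0.6, 0.8)):
    """Convert one pixel through a 1-bit A/D converter whose threshold
    is stepped between frames, then sum the 1-bit results."""
    return sum(1 if analogue_value > t else 0 for t in thresholds)

# An analogue intensity of 0.55 gives bits 1, 1, 0, 0 -> level 2 of 4.
print(combined_level(0.55))  # 2
```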
  • the digital camera 1 or image processing system executing the method receives user input defining a region within an image in the sequence of images represented by the sequence 16 of frames. Subsequently, a motion vector indicative of relative movement of a part of successive images corresponding to the defined region is calculated in step 19. In other words, a sub-section of the total area of an image is used to calculate the motion vector.
  • An example is illustrated in Fig. 10 for three captured images 49-51. A rectangular part 52 of the first captured image 49 corresponds to the user-defined region, as does a second rectangular part 53 in the second captured image 50 and a third rectangular part in the third captured image 51.
  • Information representative of its rotation and translation from the first captured image 49 to the third captured image 51 is determined (step 19 and/or 21 of Fig. 2).
  • the global motion vector 17 is thus representative of relative movement of the rectangular part.
  • the second and third captured images 50,51 are re-positioned in accordance with the global motion vector 17.
  • In step 25 they are combined into a single combined final image 24, which is a panorama picture.
  • the rate of change in magnitude of the global motion vector 17 is calculated from one of the successive frames of pixel values representing the images 49-51 to the next.
  • An output signal is provided to the user if the rate of change exceeds a certain threshold level.
  • the threshold level is that calculated to provide sufficient exposure for each of the images 49-51, so that the combined final image 24 has sufficient exposure over its entire area.
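A sketch of the warning test, assuming the per-frame global motion vector is available as (dx, dy) pairs; the threshold is taken as given, corresponding to the panning speed above which parts of the panorama would be underexposed.

```python
import numpy as np

def panning_too_fast(motion_vectors, threshold):
    """motion_vectors: per-image (dx, dy) values of the global motion
    vector. True if the frame-to-frame change in magnitude ever
    exceeds the threshold, i.e. the user is swivelling too fast."""
    magnitudes = np.linalg.norm(np.asarray(motion_vectors, dtype=float), axis=1)
    return bool(np.any(np.abs(np.diff(magnitudes)) > threshold))
```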
  • an alternative method of determining the rate of change in magnitude of the global motion vector 17 could be used.
  • the output from the motion sensor 14 could be used to determine this rate of change, without any real-time execution of the steps illustrated in Fig. 2 being necessary.
  • the camera 1 comprises means for providing an indication of the speed of motion relative to a desired speed of motion. The basis for the feedback would be the measurement of the exposure for each of the pixels.
  • the motion of the camera should be slow enough that each part of the scene is exposed eight times to the image-capturing device 5.
  • the camera 1 sums the image data value of substantially each pixel in a first captured frame with the image data values of the respective corresponding pixels in the further captured frames in which a corresponding pixel is present, and compares the sum total with a pre-determined desired exposure value.
  • An indication of the result of the comparison is provided on the output device 11.
  • a set of controls is provided on the camera to indicate the desired direction of movement of the camera 1 relative to the scene to be captured in one such embodiment. For example, the photographer may wish to move the camera 1 sideways, then downwards in the opposite direction.
  • the photographer uses a four-button control system to indicate the desired motion. Bars on a graphical indicator provide feedback to the photographer that the motion is in the correct direction and at the correct speed.
  • a commonly used technique for sports or action photography is panning. This is often used to show motion, where an object is moving against a background. For example, an object may be moving across a screen. Normally, a photographer would use a slow shutter speed and follow the motion of the object. The resulting effect is a sharp object with a blurred background. This technique is difficult to perform and may require many attempts to get the desired result.
  • the object, determined to be in the foreground of the scene, is selected as the user-defined region on the basis of which the global motion vector 17 is to be calculated.
  • Fig. 11 illustrates another advantageous feature.
  • a problem for the photographer is the determination of the right combination of the strengths of multiple light sources 55,56.
  • the light sources 55,56 are controlled in such a manner that the intensity of light provided by a first light source 55 relative to the intensity of light provided by the second light source 56 whilst capturing a first set of images is different from that whilst capturing a second set of images.
  • a single image can later be generated by combining them using a different weighting for the image data values for pixels in frames of the first set than for the image data values for pixels in frames of the second set, when values for pixels at corresponding positions are summed.
  • the photographer has maximum control of the image in the post-production process. Bracketing of the light sources 55,56 is thus unnecessary.
  • Fig. 12 will now be used to illustrate in more detail an embodiment of the step 26 of local adjustment of regions within the adjusted images 23 in Fig. 2.
  • In a step 57, a sequence 58 of frames of pixel values, each frame representative of a corresponding one of a sequence of images associated with successive moments in time, is analysed.
  • This step 57 includes determining a plurality of time series of cumulative pixel values, each one associated with a pixel position within the frames.
  • a representative example of such a time series is shown in Fig. 13 for one pixel position.
  • the sequence 58 comprises six frames.
  • the cumulative pixel (intensity) value increases substantially linearly in time, as evidenced by the continuous line fitted to the curve represented by the values. This is expected where the successive images are of the same, static scene.
  • the value for the sixth frame deviates by more than a certain pre-determined amount from an approximation of the time series determined on the basis of the preceding five frames in the sequence 58.
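A sketch of this deviation test for a whole frame stack. Fitting a line through the origin is an assumption (the cumulative value of a static pixel starts at zero and grows linearly); the tolerance in noise standard deviations is an illustrative criterion.

```python
import numpy as np

def flag_moving_pixels(frames, tolerance=3.0):
    """frames: (n, h, w) pixel values of n successive images. Flags the
    pixel positions where the last cumulative value deviates from a
    linear fit through the earlier cumulative values by more than
    `tolerance` standard deviations of the fit residual."""
    cum = np.cumsum(np.asarray(frames, dtype=float), axis=0)
    n = cum.shape[0]
    t = np.arange(1, n)                 # fit on all but the last frame
    # Least-squares slope per pixel of a line through the origin.
    slope = (t[:, None, None] * cum[:-1]).sum(axis=0) / (t ** 2).sum()
    residual = cum[:-1] - t[:, None, None] * slope
    sigma = residual.std(axis=0) + 1e-9  # avoid division by zero
    deviation = np.abs(cum[-1] - slope * n)
    return deviation > tolerance * sigma
```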
  • Fig. 14 shows a frame 59 in which pixel positions that fulfil the criterion for 'flagging' have been marked with a star.
  • in a subsequent step 60, regions 61-63 of contiguous 'flagged' pixel positions are determined (Fig. 14).
  • associated motion vectors are calculated for the respective regions (step 64). Each local motion vector is representative of movement of at least a part of the region 61-63 in images represented by the frames.
  • the local motion vector is advantageously calculated using a method in accordance with step 19 in Fig. 2, examples of which have been set out above.
  • In a further step 65 at least a part of each of the regions 61-63 within the successive images in the sequence of images represented by the sequence 58 is re-positioned in accordance with its associated local motion vector.
  • the images are corrected for movement of objects within a scene captured in the images.
  • the sequence 58 of frames represents the sequence 23 of adjusted images illustrated in Fig. 2.
  • the analysis step 57 of Fig. 12 is preceded by the steps 22 and 25 of Fig. 2, so that the sequence 58 of frames represents aligned images.
  • Camera shake has been compensated for, so any deviations of a time series will not be due to camera shake, but only to movement of an object represented in the images.
  • This improves the quality of the analysis step 57. Only the 'right' pixel positions are flagged.
  • the re-positioning step 65 is optionally followed by a repeated execution of the steps 57,60,64,65 performed on the sequence of frames, but now performed on sequences of corresponding arrays of pixel values.
  • Each array is derived from the pixel values representing the region in a corresponding one of the sequence of images. In most embodiments, each array will correspond to these pixel values. Variants are possible in which extra pixel values at immediately adjacent pixel positions are included, or in which the arrays of pixel values are derived through interpolation, or comprise only pixel values at every other pixel position in the regions 61-63.
  • the analysis step 57 is repeated on the regions 61-63 in the frame 59 shown in Fig. 14. Note that the analysis step is carried out on a sequence of arrays of pixel values, each derived from a corresponding one of the regions 61-63.
  • the analysis step 57 is preceded by the re-positioning step 65 in which at least a part of successive regions in the sequence of regions is re-positioned in accordance with the local motion vector calculated for that region. It is further noted that each of the steps 57,60,64,65 is carried out independently for each of the respective regions 61-63. Due to the prior re-positioning of each of the regions 61-63, the time series analysed in step 57 selects those pixel positions at which objects moving within the region cause a deviation of the time series. These pixel positions are again 'flagged'. In the situation depicted in Fig. 14, only pixel positions within the third region 63 are 'flagged'.
  • Sub-regions 66,67 of contiguous 'flagged' pixel positions are determined in the subsequent step 60.
  • motion vectors local to the sub-regions 66,67 are calculated (step 64). This is advantageously done using one of the methods also applicable to determine the global motion vector (steps 19 and/or 21 in Fig. 2) .
  • the re-positioning step 65 is applied to the sub-regions 66,67, in order to re-position at least a part of the successive regions 63 in accordance with the motion vector local to one of the sub-regions 66,67. This part, of course, corresponds substantially to the sub-region to which the motion vector is applicable.
  • re-positioning is carried out twice, in that a part corresponding to the first sub-region 66 in each of the sequence of the third regions 63 is re-positioned and a part corresponding to the second sub-region 67 in each of the sequence of the third regions 63 is re-positioned.
  • the values of corresponding pixels in each of the sequence of images represented by the sequence 58 of arrays of pixel values are summed to form the locally adjusted final image 27 (step 68).
  • This locally adjusted final image 27 is corrected for blur due to camera shake as well as blur due to moving objects within the represented scene.
  • the locally adjusted final image 27 has a relatively high resolution and tonal depth, despite having been formed by processing underexposed images. It is noted that features of the overall image processing method depicted in Fig. 2 may also be applied to the local image processing as shown in Fig. 12. In particular, a user may select a region on which the analysis and re-positioning steps 57,65 are to be carried out. This may be implemented, for example, by allowing the user to discard regions selected in step 60, so that no re-positioning is carried out on these regions. Thus blur may deliberately be left in the locally adjusted final image 27, for example to create a dynamic impression. Turning now to Fig. 15,
  • a general overview is given of an image capturing process resulting in the sequence 16 of frames of pixel values. More particularly, to enable a user of the digital camera to assert more control over the image capturing process, a preview image is generated. Such a preview image, when displayed by the output device 11, enables the user to check the settings for image capture. Because each of the images represented by the sequence 16 of frames is underexposed, generation of a preview image is not quite trivial. It is not enough to simply generate a preview image on the basis of one of the frames in the sequence 16 of frames. This will give a relatively poor indication of what the combined final image 24 will look like.
  • a further processing step is carried out as well as the main processing step wherein the pixel values of the frames in the sequence 16 are adjusted and combined into a combined final image.
  • the further processing step includes converting at least some of the frames in the sequence 16 into frames having a smaller data size and adding corresponding pixel values of those frames so as to form a frame of pixel values representing a preview image.
  • the main and further processing steps may be carried out sequentially.
  • a first step 69 an image is captured.
  • the step 69 of capturing an image may make use of any of the techniques set out above in conjunction with Figs. 6-9.
  • the result is a frame 70 representing the captured image.
  • the frame 70 is saved in a step 71, wherein it is added to the sequence 16 of frames of pixel values.
  • a parallel step 72 the frame 70 representing a captured image is converted to a frame 73 of reduced data size.
  • the frame 73 of reduced data size represents the same captured image, but with a reduced amount of detail.
  • the reduction in data size is preferably achieved by a reduction in resolution. That is, the number of pixel values, and thus the number of represented pixel positions in the captured image, is reduced.
  • Known interpolation techniques may be used, or a number of pixel values may simply be dropped. Alternatively, each pixel value could be encoded in a lesser number of bits.
  • Another alternative would be a reduction in the number of sub- pixels, i.e. colour components.
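As an illustrative sketch of the conversion in step 72 - one possibility among those mentioned above; the block-averaging approach and all names are assumptions of the example:

    import numpy as np

    def reduce_frame(frame, factor=2):
        # Reduce resolution by averaging factor x factor blocks of pixels;
        # dropping pixels or truncating bits would serve the same purpose.
        h, w = frame.shape
        h -= h % factor
        w -= w % factor
        blocks = frame[:h, :w].reshape(h // factor, factor, w // factor, factor)
        return blocks.mean(axis=(1, 3))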
A frame 74 representing a preview image is formed each time a frame 73 of reduced data size has been generated. This is done in a step 75, wherein the last generated frame 73 of reduced data size is combined with the current version of the frame 74 representing the preview image to produce a new version of the latter. Combination includes at least the addition of each pixel value of the latest frame 73 of reduced size to the pixel value corresponding to it, i.e. representing a pixel at the same pixel position, in the frame 74 representing the preview image. The step 75 of combining the frames 73,74 optionally includes re-positioning part or all of the image represented by the frame 73 of reduced data size prior to adding the corresponding pixel values. Re-positioning may be carried out in accordance with an embodiment of the methods discussed above in connection with the images represented by the sequence 16 of frames of pixel values with a relatively large data size. Finally, a signal carrying the frame 74 of pixel values representing the preview image is generated and provided to the output device 11, e.g. an LCD screen.
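A minimal sketch of the progressive combination of step 75, again purely illustrative and assuming NumPy arrays:

    import numpy as np

    class PreviewAccumulator:
        """Maintains the frame 74: each reduced frame 73 is added as it
        arrives (step 75), so the preview brightens with every capture."""

        def __init__(self):
            self.preview = None

        def add(self, reduced_frame):
            # An optional re-positioning of reduced_frame would go here.
            if self.preview is None:
                self.preview = reduced_frame.astype(np.int64)
            else:
                self.preview += reduced_frame
            return self.preview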
Figs. 2 and 16 in combination illustrate an alternative, wherein the main processing to generate the combined final image 24 or locally adjusted combined final image 27 and the further processing to generate a preview image are carried out at least partly in parallel. The effect is that the main processing can be carried out at a relatively slow pace in the background, whereas the preview image is available for viewing much sooner. First, a frame 79 is retrieved from the sequence 16 of frames. This frame 79 is of a relatively large data size. It is scaled down in a subsequent step 80, wherein a frame 81 of reduced data size is generated. This step 80 is similar to the step 72 in the method of Fig. 15. Then, a frame 82 of reduced data size representing the preview image is formed in a further step 83. The frame 81 of reduced data size representing the last retrieved image is combined with the current version of the frame 82 representing the preview image in this step. Local or global adjustment of the frame 81 is optionally carried out prior to the addition of pixel values. A last step 84 consists of displaying the preview image.

It will be appreciated that the invention is not limited to the described embodiments, which may be varied within the scope of the accompanying claims. For example, the time series of cumulative pixel values may be analysed on the basis of a stored sequence 58 of arrays of pixel values (post-processing), but the analysis may also be carried out as samples are taken from the light-sensitive sensor shown in Fig. 6, i.e. in real time. Likewise, an image processing system adapted to carry out one or more of the methods set out above could comprise multiple processing devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Studio Devices (AREA)

Abstract

An image processing method includes a step (19) of calculating a motion vector (20) representing at least a component indicative of relative movement of at least a part of successive images in a sequence of images. The step (19) of calculating the motion vector includes a step of determining at least a first term in a series expansion representing at least one element of the motion vector (20), which step includes an estimation process wherein at least the part in each of a plurality of the images is repositioned in accordance with the calculated motion vector (20) and values of corresponding pixels in at least the repositioned parts are summed to form a combined image. The estimation process includes calculation of a measure of energy contained in an upper range of the spatial frequency spectrum of the combined image. The step of determining at least the first term includes at least one further iteration of the estimation process to maximise the energy.

Description

METHOD OF MOTION CORRECTION IN SEQUENCE OF IMAGES
The invention relates to an image processing method, including a step of calculating a motion vector representing at least a component indicative of relative movement of at least a part of successive images in a sequence of images, wherein the step of calculating the motion vector includes a step of determining at least a first term in a series expansion representing at least one element of the motion vector, which step includes an estimation process wherein at least the part in each of a plurality of the images is repositioned in accordance with the calculated motion vector and values of corresponding pixels in at least the repositioned parts are summed to form a combined image.

The invention also relates to an image processing method, including a step of analysing frames in a sequence of frames of pixel values, each frame representing a corresponding one of a sequence of images associated with successive moments in time, including: determining a plurality of time series of cumulative pixel values, each one associated with a pixel position within the images, and selecting pixel positions at which a deviation of the associated time series fulfils a pre-determined criterion.

The invention also relates to a method of processing a plurality of first frames of pixel values, each first frame representing a corresponding one of a sequence of images, such that the pixel values of a first frame, when added to corresponding pixel values of other first frames, form a first frame representing a combined final image.

The invention also relates to an image processing system, comprising a processor and memory for storing pixel values. The invention also relates to a digital camera. The invention also relates to a computer program product.

An example of the first type of method mentioned above is known in the art as stacking or image combination and has been implemented in software packages designed for astronomers. The known technique involves taking several images of the same stars and combining them into a single image. This is done by identifying stars in the image, aligning the stars and then adding the images. By re-positioning the images before adding them, the combined image shows less blur. This technique works well on the types of images processed by astronomers, as the stars stand out as distinct spots against an otherwise homogeneously dark background. A disadvantage of the known method is that it is not well suited to other types of images. For example, underexposed images taken using a digital camera in daylight cannot be aligned very well.

It is an object of the present invention to provide an improved method of the first-mentioned type that results in a motion vector with increased accuracy, ensuring better correspondence between corresponding parts in images re-positioned using the motion vector. This object is achieved by the method according to the invention, which is characterised in that the estimation process includes calculation of a measure of energy contained in an upper range of the spatial frequency spectrum of the combined image and the step of determining at least the first term includes at least one further iteration of the estimation process to maximise the energy. Thus, the resulting motion vector is that which would lead to a combined image with the highest amount of detail. It is thus chosen as the motion vector which best aligns the repositioned corresponding part of each image, so that the combined image shows the least amount of blur.
The term motion vector is used herein to denote a one- or multi-dimensional array of values quantifying one or more components of relative movement of the part of the image concerned. Variants are conceivable wherein the motion vector takes the shape of a tensor.

In a preferred embodiment of the invention, the step of calculating the motion vector includes determining a further term in the series expansion, and the estimation process is iteratively executed using a motion vector with an adjusted value of the further term to maximise the energy. Thus, an even more accurate determination of the motion vector is made possible. This takes account of the fact that the relative movement of (parts of) the successive images is usually the result of movement of objects within the image or disturbances in the position of the device used to capture the image. As the different sources of motion need not necessarily be correlated, calculating further terms in the series expansion enables a better characterisation of the various disturbance factors.

In a preferred embodiment, the series expansion is a Fourier sum representing at least one element of a time-varying motion vector, each image being associated with a point in time. This embodiment is highly suited to images captured using a camera, the motion vector characterising the various sources of camera shake and/or movement of objects represented in the images, which may each have their own characteristic frequency.

In a preferred embodiment of the invention, the step of calculating the motion vector is carried out by manipulating frames of pixel values, each frame representing a corresponding one of the successive images, wherein the pixel values of each frame lie on one of a sequence of scales of discrete values, with increasing absolute maximum, the scales being applicable to respective successive sets of at least one image in the sequence of successive images. Thus, the dynamic range of the combined image is increased, resulting in improved resolution of the combined image.

Preferably, the step of calculating the motion vector is carried out by manipulating frames of pixel values, each frame representing a corresponding one of the successive images, and the frames of pixel values are formed by exposing an image capturing device comprising an array of light-sensitive sensors at a certain exposure level; deriving at least one frame of pixel values by reading output values of the light-sensitive sensors; and re-setting the light-sensitive sensors when the certain exposure level has been achieved. This has the effect of reducing the level of shot noise when the frames are combined by adding corresponding pixel values.

One variant of the method comprises repeatedly executing the steps of: exposing the image capturing device at a certain exposure level; deriving a frame of pixel values by reading output values of the light-sensitive sensors when the exposure level has substantially been reached; and re-setting the light-sensitive sensors, wherein the certain exposure level is monotonically stepped between each execution of the steps. This variant has the advantage of being easy to implement in existing image capturing devices, such as digital cameras, which need only be suitably programmed.

In a further embodiment, multiple frames of pixel values are formed by taking readings of the output values of the light-sensitive sensors at respective intervals between re-sets of the light-sensitive sensors. This has the advantage that the dynamic range of the combined image is increased, and that the amount of noise due to readout and re-set of the light-sensitive sensors is also decreased. Thus, the combined image is sharper.

A preferred embodiment comprises receiving user input defining a region within an image in the sequence of images and calculating a motion vector indicative of relative movement of a part of successive images in the sequence of images corresponding to the defined region. Thus, it is possible to allow for desired blurring effects in the combined image, for example due to movement of objects represented in the scene. In particular, this embodiment enables improved panning in action photography (the defined region is a moving object), as well as allowing moving objects to be removed from a still image (the defined region surrounds a moving object). Furthermore, it allows for the execution of a method of panoramic photography, wherein only edges of images are aligned prior to combination into a panoramic photograph.
A preferred embodiment comprises calculating a rate of change in magnitude of the motion vector from one of the successive images to the next and providing an output signal to the user if the rate of change exceeds a certain threshold level. This variant is intended for execution by a digital camera, whereby a user captures a sequence of images by swivelling round at one position, which images are then combined into a panoramic image. The method ensures that the user takes a sufficient number of images, i.e. does not swivel round too fast.

In a further embodiment of the invention, the step of calculating a motion vector is repeated for at least one further sequence of images, wherein at least a part of each image of the sequence of images is re-positioned in accordance with the respective motion vector for the sequence, and wherein a combined final image is formed as a weighted sum of all images subsequent to re-positioning, each of the sequences being accorded a weighting factor. Thus, it is possible to form a combined final image of a scene under exactly the desired lighting conditions, without having to physically fine-tune the lighting. Each sequence of images need only be captured at one extreme end of a range of lighting conditions. 'Mixing' is achieved by varying the weighting factors. Due to the use of the method of the invention, the combined final image is still relatively free from blurring.

According to another aspect of the invention, there is provided an image processing method, including a step of analysing frames in a sequence of frames of pixel values, each frame representing a corresponding one of a sequence of images associated with successive moments in time, including: determining a plurality of time series of cumulative pixel values, each one associated with a pixel position within the images, and selecting pixel positions at which a deviation of the associated time series fulfils a pre-determined criterion, characterised by determining at least one region of contiguous selected pixel positions and calculating an associated motion vector local to that region and representative of movement of at least part of that region in the images represented by the frames.

This method solves another problem of the known methods of image stacking, namely that they are primarily suited to correcting for 'camera shake'. The method as defined in the preceding paragraph achieves an object of providing a method that allows for separate correction of movement of objects represented in images and movement of a device used to capture the images. The pre-determined criterion characterises the effect of movement of the totality of the image. Deviations from this characterisation are indicative of movement of an object occupying the pixel position to a new pixel position. By determining regions of contiguous selected pixel positions, such objects can be identified.

It is noted that US 6,538,593 B2 discloses a method using an analog-to-digital converter with a maximum input signal of Ss for converting a monotonically changing analog signal to a cumulative floating-point, digital representation even if the analog signal has a value greater than Ss at time T. First, an analog signal is reset to a reference value at an initial time t=0. Then, the analog signal is sub-converted at a first time T1 > 0 to obtain a first digital representation which corresponds to the magnitude of the analog signal at this first time. Next, the analog signal is sub-converted at a subsequent time T2 > T1 to obtain a second digital representation which corresponds to the magnitude of the analog signal at this second time. The two digital representations are then combined into an intermediate floating-point, digital representation with greater dynamic range than either of the first two digital representations on their own. The steps of sub-converting the analog signal and combining each new digital representation with the existing intermediate digital representation to obtain a new intermediate representation are repeated for other subsequent times, in order to produce a cumulative floating-point, digital representation of the analog signal at time t=T. The known method does not, however, comprise determining at least one region of contiguous selected pixel positions and calculating an associated motion vector local to that region and representative of movement of at least part of that region in images represented by the frames. The known method is used to increase the dynamic range of the pixel values rather than to characterise relative motion of objects represented in images and moving within each image.

A preferred embodiment includes a repositioning step prior to the analysing step, the repositioning step including: deriving the frames by repositioning at least a part of successive images in the sequence of images, which part encompasses the pixel positions associated with the time series, in accordance with a global motion vector representing at least a component of relative movement of that part in the sequence of images. Thus, the local motion vector represents movement of (a part of) the determined region as a perturbation on a global motion vector, which represents movement of a larger part encompassing the region. By separating information representative of movement of the region relative to the rest of the image from information representative of movement of the whole scene from one image to the next ('camera shake'), each can be determined more accurately. In addition, each can be corrected for separately, depending on user preferences. For example, it may be desirable to keep a region representing a moving object blurred, but to correct for camera shake to achieve a sharp view of the background.

A preferred embodiment includes repeating the analysis step on a sequence of arrays of pixel values, each array derived from the pixel values representing the region in a corresponding one of the sequence of images, to calculate at least one motion vector local to a sub-region within the region. Thus, more detail can be discerned. For example, a global motion vector may provide information on camera shake, a local motion vector on the movement of a car through the represented scene, and a further local motion vector on the movement of the wheels of the car (each wheel being a sub-region within the region representing the car). It would then be possible to correct each image in the sequence in such a way that, when the corrected images are combined into a final image, the car and the background appear sharp, whereas the wheels appear blurred. The or each motion vector is preferably calculated by means of the motion vector calculation step in a method according to the first-mentioned image processing method according to the invention.
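Purely as an illustrative sketch, the selection of deviating pixel positions and the determination of contiguous regions might be expressed as follows. The particular deviation criterion shown is an assumption of the example - the claim leaves the criterion open - and SciPy's connected-component labelling is assumed to be available.

    import numpy as np
    from scipy import ndimage  # assumed available, for connected-component labelling

    def flag_moving_regions(frames, threshold):
        # frames    : array of shape (T, H, W), one frame per moment in time
        # threshold : deviation level above which a pixel position is 'flagged'
        cumulative = np.cumsum(frames.astype(np.float64), axis=0)
        # One possible deviation criterion: compare each pixel's normalised
        # cumulative series with the average series over the whole image.
        final = np.maximum(cumulative[-1], 1.0)      # avoid division by zero
        normalised = cumulative / final
        mean_series = normalised.mean(axis=(1, 2), keepdims=True)
        deviation = np.abs(normalised - mean_series).max(axis=0)
        flagged = deviation > threshold
        # Group the flagged positions into regions of contiguous pixels.
        labels, count = ndimage.label(flagged)
        return flagged, labels, count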
According to another aspect, the invention provides a method of processing a plurality of first frames of pixel values, each first frame representing a corresponding one of a sequence of images, such that the pixel values of a first frame, when added to corresponding pixel values of other first frames, form a first frame representing a combined final image, characterised by converting at least some of the first frames into respective second frames having a smaller data size than the first frames and adding corresponding pixel values of the second frames, so as to form a second frame representing a preview image.

This method achieves the object of enabling a relatively accurate impression of the combined final image to be gained in an efficient manner. It solves a problem occurring in the application of stacking methods in image capturing devices with limited data processing power. One would like to know whether the captured images are sufficient to generate a combined final image of sufficient quality, or whether new or additional images need to be captured. However, the generation of a combined final image from the captured images requires processing a large amount of data. Because the images in the sequence represented by the first frames are such that the pixel values, when added, form a frame representing a combined final image, they can each of themselves be underexposed. Because a preview image is formed, there is no need to generate the combined final image on the spot to ascertain whether the first frames allow for the generation of a frame representing a combined final image of sufficient intensity. This means that the preview image can be generated in a device with relatively low processing power, such as a camera. The frame representing the combined final image can be generated in a more powerful processing device, or on a processing device of limited capabilities at a low speed. Thus, the method of the invention solves the problem set out in the preceding paragraph.

According to another aspect, the invention provides an image processing system, comprising a processor and memory for storing pixel values, which system is configured to execute a method according to any one of claims 1-20. According to another aspect, the invention provides a digital camera, arranged to carry out a method according to any one of claims 1-20. According to another aspect, the invention provides a computer program product having thereon means which, when run on a programmable data processing system, enable the programmable data processing system to execute a method according to any one of claims 1-20. The computer program product comprises a series of instructions instructing a programmable processing device to carry out certain steps in the method according to the invention. It may be comprised in image processing software for a personal computer or workstation, but also in embedded software for a digital camera or scanner, for instance.

The invention will now be explained in further detail with reference to the accompanying drawings, in which: Fig. 1 shows schematically the layout of an example of a digital camera for use in conjunction with the invention; Fig. 2 is a schematic overview of an embodiment of an image processing method; Fig. 3 is a flow diagram giving an overview of a first embodiment of a method of calculating a motion vector; Fig. 4 is a flow diagram giving an overview of a second embodiment of a method of calculating a motion vector; Figs. 5A and 5B illustrate in a graphical manner the development of two components of a motion vector over a sequence of eight images; Fig. 6 is a very schematic representation of a part of an image-capturing device; Fig. 7 illustrates the variation of the exposure of the image-capturing device in order to capture a sequence of images in one embodiment; Fig. 8 illustrates the variation of the exposure of the image-capturing device in order to capture a sequence of images in a second embodiment; Fig. 9 illustrates the variation of the exposure of the image-capturing device in order to capture a sequence of images in a third embodiment; Fig. 10 illustrates how an embodiment of the image processing method is used to form a panoramic image; Fig. 11 illustrates how an embodiment of the image processing method is used to form a combined image under certain desired lighting conditions; Fig. 12 is a flow diagram schematically illustrating a step of local adjustment in the image processing method of Fig. 2; Fig. 13 illustrates schematically the development of an intensity signal for a pixel; Fig. 14 is a schematic illustration of an embodiment of the image processing method in which regions within images are re-positioned; Fig. 15 is a schematic illustration of a variant of a method of capturing images wherein a preview image is generated in parallel; and Fig. 16 is an enhancement to the embodiment illustrated in Fig. 2, wherein a preview image is generated.

One example of an image processing system usable in the context of the present invention is a digital camera 1. Alternatively, a conventional camera (i.e. one exposing a film with a photographic emulsion) could be used to capture the images. To use such images in the context of the present invention, they would then be scanned and digitised using a photo-scanner as known generally in the art. Thus are formed frames of pixel values, each frame representing a corresponding one of the images. The invention will be described herein using an embodiment with a digital camera as an example.

The digital camera comprises a lens system 2 for focussing on one or more objects in a scene. When a shutter 3 is opened, the scene is projected through an aperture 4 onto an image-capturing device 5. The shutter time is controllable, as is the diameter of the aperture. As an alternative to the shutter, the image capturing device 5 could be electronically controlled to provide the same effect (electronic shutter), namely to capture a signal from the image capturing device 5 representative of the light to which this image capturing device is exposed for the duration of an exposure time. The image-capturing device 5 can be a device using CMOS (Complementary Metal Oxide Semiconductor) technology or, preferably, a CCD (Charge Coupled Device) sensor. Alternatives are possible. It is noted that, for simplicity, this description will not focus on the way in which colour images are captured. It is merely noted that any known type of technology can be used, such as colour filters, a colour-sensitive variant of the image capturing device 5, etc.

The output of the image-capturing device 5 is provided in the form of one or more analogue signals to an Analogue to Digital converter (A/D converter) 6. The A/D converter 6 samples and quantises the signals received from the image capturing device 5, i.e. records them on a scale with discrete levels, the number of which is determined by the number of bits of resolution of the digital words provided as output by the A/D converter 6. These digital words are provided to a digital signal processor (DSP) 7, which performs such features as interpolation between pixels and, optionally, compression of the image (e.g. in accordance with the MPEG-2 standard). The result is a frame of pixel values representing an image. Each exposure of the image-capturing device 5 during an exposure time results in at least one frame, as will be explained in more detail below.

The digital camera comprises a storage device 8 for storing the image data representative of the captured images. The storage device 8 can be any usual type of storage device, e.g. built-in flash memory, inserted flash memory modules, a disk drive with floppy disk, a PCMCIA-format hard disk, or an optical medium writer. A microprocessor 9 controls the operation of the digital camera, by executing instructions stored in non-volatile memory, in this example a Read-Only Memory (ROM) 10. Indications of the operating condition of the digital camera 1 are provided on an output device 11, for example a Liquid Crystal Display, possibly in combination with a sound-producing device (not shown separately). An input device 12 is shown schematically as being representative of the controls by means of which the user of the digital camera provides commands. In addition, the digital camera 1 illustrated as an example comprises a flash driver circuit 13, for providing appropriate driving signals to one or more sources of flash light. The digital camera 1 shown in Fig. 1 also comprises a motion sensor 14, for providing a signal representative of the movement of the digital camera 1, and thus of the image capturing device 5. Furthermore, the digital camera 1 comprises an exposure-metering device 15. The purpose of the exposure-metering device 15 is to measure the strength of light, so that the microprocessor 9 can determine the intensity of light to be emitted by a flash in combination with the correct exposure value as determined by aperture and shutter speed.

The camera 1 can be used in a substantially stationary position to capture a sequence of images and to derive a sequence of corresponding frames of pixel values representing the images. Each image is underexposed on purpose. The image capturing system, comprising the image capturing device 5 and any controlled sources of light, also comprises means for recording pixel intensity values on a certain scale. The dynamic range of the scale is determined by the recording means. For example, in the digital camera, the dynamic range of the image sensor 5 and the number of bits of resolution of the A/D converter 6 determine the dynamic range of the scale.
In a camera using film with a photosensitive surface, the properties of the film determine the dynamic range of the scale. Underexposure in the above sense means that parameters of the image capturing system, comprising the camera 1 and any controlled sources of light, are adjusted so that each of the pixel values for each frame is recorded in a range occupying a minor part of the scale allowed by the recording means. How this is achieved varies according to the embodiment of the invention chosen. By means of the invention, the images are adjusted prior to forming them into a combined final image. The combined final image is formed by summing the values of corresponding pixels in the adjusted images. The combined final image may therefore be formed from underexposed images, but is itself sufficiently bright, as well as having a good resolution. The adjustment is used to prevent the combined final image from being blurred. The adjustment corrects for relative movement of at least a part of successive images in the sequence of captured images due to movement of objects and/or shaking of the digital camera 1.

Turning to Fig. 2, a representative embodiment of the method is shown. It should be noted that this embodiment is by no means the only possible embodiment. In particular, the shown embodiment starts with a sequence 16 of frames of pixel values, each frame representing a corresponding one in a sequence of images. This sequence 16 is retrieved from the storage device 8 and manipulated. It is noted that the method could be carried out by the digital camera 1, i.e. executed on the microprocessor 9, but also on a general-purpose computer (not shown) to which the frames in the sequence 16 have been transferred. Also, the schematic illustration of the sequence 16 does not imply a particular type of file format. Indeed, the images could be stored in compressed form, for example by storing a first frame of pixel values representing intensity values for each of the colour components and subsequent frames wherein each pixel value represents a difference in value to the corresponding pixel value in the first frame. Such techniques are well known in the art. It is noted in particular that various embodiments of the invention could be carried out in the digital camera 1 as the images are being captured, instead of after the entire sequence 16 has been stored. It is further noted that the images represented by the sequence of frames of pixel values need not be the direct output of the image-capturing device, but may be the result of interpolation or other techniques.

To enhance the embodiment shown in Fig. 2, especially when the steps shown in detail are carried out in the digital camera 1, an optional, parallel process may take place, indicated by the start and end points A,B. This parallel process will be explained in more detail in conjunction with Fig. 16.

An object of the method shown in Fig. 2 is to calculate a global motion vector 17. The global motion vector is representative of relative movement of at least a part of successive images represented by the sequence 16 of frames of pixel values. Implementations of the invention are possible in which only one component of translation is calculated, i.e. the x- or y-component. In other implementations, the motion vector may comprise a component representing rotation around an axis perpendicular to the image and/or components representing rotation around an x- or y-axis in the plane of the image (i.e. a multiplication factor for the mutual spacing of pixels). For simplicity, the explanation given here will assume that the motion vector has two components: translation in the x- and y-direction, both the x- and y-axis lying in the plane of an image.

Relative movement means the amount by which corresponding parts of images are displaced relative to one another. This may be displacement relative to the first in a sequence of images or relative to the preceding image. In the illustration given here, the x-component of the motion vector represents displacement in the x-direction relative to the first of successive images. The y-component is indicative of displacement in the y-direction. As will be appreciated from the discussion provided above, the motion vector can be seen as a time-varying vector. This means that each component in fact comprises a sequence of values representing the progression over the sequence of images of that component of movement. In general, the motion vector is indicative of relative movement of at least a part of successive images in a sequence of images to be adjusted. This means that the motion vector may be calculated on the basis of all the frames in the sequence of frames 16 (as in the example illustrated here), or on only some of them, with interpolation used to derive values for frames in the sequence 16 between two successively used frames.

In the embodiment shown in Fig. 2, several zones are identified in the image sequence. This can be done on the basis of the following criteria:
1. Manual selection.
2. Image analysis - detecting areas that appear to have more detail, based on frequency analysis of the image.
3. Fixed points, for example in the centre of the image and in the centre of each quadrant.
For each zone a two-dimensional array of pixels is selected. The size of this array could be in the region of 64 x 64 pixels, for example. Selection of a zone is depicted in step 18 in Fig. 2. Subsequently, a step 19 of calculating a motion vector representative of relative movement of the selected zone is carried out. These two steps 18,19 are repeated for each of the identified zones. In this way, a set 20 of motion vectors is calculated. The global motion vector 17 is calculated from the set 20 of motion vectors by finding the norm of the motion vectors in the set 20, removing from the set 20 the motion vectors that vary from the norm by more than some threshold, and then recalculating the norm of the motion vectors. In this way, the average for the movement of the digital camera 1 is calculated (step 21); a sketch of this calculation is given below. Alternative embodiments are possible in which only one zone is effectively selected, or only zones within a user-defined region of the image.

In a subsequent step 22, substantially the whole of each image in the sequence of images represented by the sequence 16 of frames of pixel values is re-positioned in accordance with the global motion vector 17. This implies that pixel values are assigned to different co-ordinates by - in the present example - translation in accordance with the component values applicable to the image concerned. It will be recalled that each component of the global motion vector 17 is formed by a time series. Each image is associated with a point in time. Where the values in the time series do not coincide with the points in time associated with the images, interpolation or approximation is used to derive the appropriate component values for each image. Thus, a sequence 23 of adjusted images results.
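The outlier-discarding average of step 21 announced above might be sketched as follows. This is an illustrative example only; the array layout and the threshold test are assumptions of the example.

    import numpy as np

    def global_motion_vector(zone_vectors, max_deviation):
        # zone_vectors  : array of shape (Z, T, 2) - per-zone, per-image x/y motion
        # max_deviation : zones differing from the average by more than this
        #                 number of pixels are discarded before re-averaging
        average = zone_vectors.mean(axis=0)                       # first estimate
        distance = np.linalg.norm(zone_vectors - average, axis=-1).max(axis=-1)
        keep = distance <= max_deviation
        # (In practice one would guard against discarding every zone.)
        return zone_vectors[keep].mean(axis=0)                    # step 21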
Subsequent to the re-positioning step 22, the values of corresponding pixels in each of the images in the sequence 23 are summed to form a combined final image 24 in step 25. If an optional step 26 of local adjustment of regions within the adjusted images 23 is not carried out, the combined final image 24 is provided as output, and image processing finishes. Otherwise, correction for movement of objects within the images is carried out to result in a locally adjusted final image 27. This is explained in more detail with reference to Fig. 12.

Fig. 3 illustrates a first manner of calculation of a motion vector, i.e. an embodiment of step 19 in Fig. 2. It will be recalled that in the present example, the motion vector D comprises two elements, Dk, k = 1, 2, representative of relative (horizontal) movement in the x-direction and (vertical) movement in the y-direction, respectively. In the present explanation, it will be assumed that the motion vector D is calculated from a sequence of eight images, and that each element Dk thus comprises eight values Dk[i], i = 0, 1, ..., 7, in a time series. Figs. 5A and 5B show the development of each of the two elements of the motion vector to be arrived at. As can be seen, the part of the image to which the motion vector pertains moves continuously, with slight variations in speed, in the x-direction and around a fixed point in the y-direction in the present example.

In the variant of Fig. 3, each element Dk is represented as a series expansion Dk[i] = a0*i*Dt + a1*sin(2*pi*i/8) + ..., i.e. a Fourier sum. Alternative functions, such as sawtooth functions centred on each value of i, can be used instead of sinusoids. Thus, in the frequency domain, each element Dk may also be represented as a sequence of frequency components fk^j, k = 1, 2, j = 0, 1, .... Determining a frequency component amounts to determining the corresponding term in the series expansion representing the element of the motion vector. Other forms of series expansion may be used, such as a perturbation analysis of each element of the motion vector, for instance.

In the variant shown in Fig. 3, each element Dk of the motion vector is calculated individually. Also, a best estimate of the first element D1 is determined before the second element D2 is calculated. Thus, in a first step 28, the DC component of the first element D1 of the motion vector D is guessed, i.e. f1^0. The motion vector is then calculated on the basis of that value in a subsequent step 29. The part of the images to which the motion vector is applicable is then re-positioned (step 30). This is done in the manner explained above in relation to step 22 of Fig. 2. Assuming Fig. 3 to be an illustration of step 19 in Fig. 2, the part concerned is the zone selected in step 18. Alternatively, it could be a larger part encompassing this zone. After re-positioning, at least the re-positioned parts are summed to form a combined image. This is done in step 31. Note that, where only the zones are re-positioned and summed, the combined image corresponds to only a very small part of the total image area represented by the sequence 16 of frames of pixel values retrieved from the digital camera 1. In a subsequent step 32, a measure of the energy contained in a range of the spatial frequency spectrum of the combined image is calculated. This range lies at least partly, preferably entirely, within the upper half of the spatial frequency spectrum as determinable on the basis of the resolution of the combined image. It will be realised that the determinable total range of the frequency spectrum depends on the number of pixel values used to represent the combined image. Thus, for instance, where the combined image is represented by a frame of eight by eight pixels, the range will be more limited than when represented by a frame of sixteen by sixteen pixels.

The estimation process comprising the above steps 28-32 is re-iterated at least once with an improved estimate of fk^0, in order to maximise the energy as calculated in step 32. Known optimisation methods may be used. Suitable criteria for breaking off the iterative process are likewise known in the art. As demonstrated in Fig. 3, the calculation of the motion vector includes determining a further term in the series expansion, i.e. the next frequency component fk^j. The value of j is incremented in step 33, and the estimation process comprising a number of iterations of steps 28-32 is repeated whilst varying the further term. Again, the value of fk^j yielding the highest energy in the pre-selected range of the spatial frequency spectrum is determined. Depending on the desired accuracy and the number of frames on the basis of which the motion vector is calculated, further terms in the series expansion can be calculated. The iterative execution of steps 28-33 can be looked upon as one step of determining an element of the motion vector D. In the variant of Fig. 3, this step is repeated for each element Dk, as represented by step 34. When all elements (in this example two) have been calculated, the motion vector D[i] is returned (step 35).

Fig. 4 illustrates an alternative, but very similar, embodiment in which the motion vector is optimised in two dimensions for each frequency component f^j in turn. That is, the motion vector D is represented as one series expansion. Thus, in a first step 36, the DC component of D is guessed, as determined by f^0. The motion vector is then calculated on the basis of that value in a subsequent step 37. The part of the images to which the motion vector is applicable is then re-positioned. This is done in the manner explained above in relation to step 22 of Fig. 2. After re-positioning (in this case involving a two-dimensional translation) in step 38, at least the re-positioned parts are summed to form a combined image. This is done in step 39. In a subsequent step 40, a measure of the energy contained in an upper range of the spatial frequency spectrum of the combined image is calculated. The estimation process comprising the above steps 36-40 is re-iterated at least once with an improved estimate of f^j, in order to maximise the energy as calculated in step 40. As demonstrated in Fig. 4, the calculation of the motion vector includes determining a further term in the series expansion, i.e. the next frequency component f^j. The value of j is incremented in step 41, and the estimation process comprising a number of iterations of steps 36-40 is repeated whilst varying the further term. Again, the value of f^j yielding the highest energy in the pre-selected range of the spatial frequency spectrum is determined. Depending on the desired accuracy and the number of frames on the basis of which the motion vector is calculated, further terms in the series expansion can be calculated. Otherwise, the motion vector D is returned in a final step 42.
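By way of illustration, the Fourier-sum representation of a motion vector element and the energy measure of steps 32 and 40 might look as follows. NumPy is assumed, and the radial cut-off delimiting the 'upper range' of the spectrum is a choice made for the example, not prescribed by the method.

    import numpy as np

    def motion_element(coefficients, n_frames=8):
        # Evaluate one element Dk[i] of the motion vector as a truncated
        # Fourier sum: a linear drift term (the DC component f^0) plus
        # sinusoidal components.
        i = np.arange(n_frames)
        d = coefficients[0] * i
        for j, a in enumerate(coefficients[1:], start=1):
            d = d + a * np.sin(2 * np.pi * j * i / n_frames)
        return d

    def high_frequency_energy(combined, lower_bound=0.5):
        # Energy in the upper part of the spatial frequency spectrum of the
        # combined image (steps 32 and 40).
        spectrum = np.abs(np.fft.fftshift(np.fft.fft2(combined))) ** 2
        h, w = spectrum.shape
        y, x = np.ogrid[:h, :w]
        radius = np.hypot((y - h // 2) / (h / 2), (x - w // 2) / (w / 2))
        return spectrum[radius >= lower_bound].sum()

An optimisation routine would evaluate high_frequency_energy repeatedly, adjusting one coefficient at a time and keeping the value that maximises the measure, which corresponds to the best-aligned combined image.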
In advantageous embodiments of the image processing method presented herein, calculations are carried out on frames of pixel values, each frame representing a corresponding image, wherein the pixel values of each frame lie on one of a sequence of scales of discrete values, with increasing absolute maximum, the scales being applicable to respective successive sets of at least one image in the sequence of successive images. For purposes of illustration, it will be assumed that a 'set' comprises but one image. Thus, each of the frames in the sequence 16 of frames of Fig. 2 comprises pixel values on a corresponding scale in a sequence of scales. The absolute maximum increases with each frame (of course, it could also decrease).

To form a sequence 16 of frames with the above-mentioned characteristics, various methods of capturing a sequence of images may be employed by the digital camera 1. Some of these can also be used in other types of image processing systems, such as scanners or computers processing digitised images. This will be illustrated using Fig. 6 as a simple illustration of the manner in which pixel values are derived from signals of the image-capturing device 5. A light-sensitive cell 43 is exposed at a certain exposure level. Exposure level in this case is taken to mean the combination of settings determining the strength of an analogue signal as sampled to provide an input to an analogue/digital converter 44 (A/D converter). The settings include the exposure time (the amount of time for which the light-sensitive cell 43 is exposed to light), as determined, for instance, by the shutter 3 in the digital camera 1 shown in Fig. 1. They further include the duration and/or intensity of artificial lighting, i.e. as controlled by the microprocessor 9 and flash driver circuit 13 of the digital camera 1 (Fig. 1). They also include the area of the aperture 4, through which the light-sensitive cell 43 is exposed. The settings also include the gain of an amplifier 45, used to amplify the analogue signal provided to the A/D converter 44.

The criterion for the correct exposure of an image is to achieve the point where the light-sensitive sensor comprising the light-sensitive cell 43 is not saturated, while at the same time the intensity of the images (i.e. of the signal provided as output from the A/D converter 44) is as high as possible relative to the noise associated with the image sensor. In accordance with the methods set out herein, the whole of each image in a series of images is re-positioned in accordance with the calculated motion vector and values of corresponding pixels in each of the images subsequent to re-positioning are summed to form a combined final image (step 25 in Fig. 2). Thus, each of the sequence 16 of frames of pixel values (Fig. 2) is actually formed by underexposure. The problem with an image with low exposure is that, whilst it is far from reaching the point of saturation, the image signal is small relative to the noise of the light-sensitive sensor (Fig. 6).

Among the types of light-sensitive cell 43 usable in the present context, two stand out: CMOS (Complementary Metal Oxide Semiconductor) and CCD (Charge-Coupled Device) devices. These two types of device have slightly different noise characteristics. The noise of a sensor is generally made up of the following components:
- Dark current. This is generally very low.
- Shot noise or photon noise. This noise is reduced by ensuring that the number of photons counted is sufficiently high, i.e. that the sequence of images captured comprises a sufficient number of images with sufficient exposure levels.
- Readout noise.
- Reset noise. This noise occurs when the light-sensitive cell 43 is re-set. This is figuratively illustrated in Fig. 6 by a re-set switch 46, used to set the voltage across a capacitor 47 to zero. The capacitor 47 is a representative example of an accumulator for presenting an accumulation of an output signal of the light-sensitive cell 43 as an output signal of the light-sensitive sensor comprising the cell 43 and capacitor 47. A sample switch 48 illustrates how a sample is obtained as a reading of an output value of the light-sensitive sensor.

Additional sources of noise include noise added by the A/D converter 44 and fixed pattern noise, arising from differences between the circuits associated with each sensor in the image capturing device 5. Noise that would be introduced due to the fact that a sequence of frames of pixel values representing a sequence of images is used to form a single combined image 24,27 is reduced by means of one or more of various techniques.

One manner of reducing noise is to form multiple frames of pixel values by taking readings of the output values of the light-sensitive sensors at respective intervals between re-sets of the light-sensitive sensors. A certain exposure level is determined to be correct for a combined image, given certain user input values provided through the input device 12 and certain environmental conditions as determined by means of the exposure-metering device 15 of the digital camera 1, for example. This is divided over the frames in the sequence 16 of frames. The exposure level for each frame results in an exposure time, for the duration of which the light-sensitive cell 43 is exposed to light. Upon expiry of the exposure time, the light-sensitive sensor is re-set by means of the re-set switch 46. Normally, a single reading would be taken at the end of the exposure time, by closing the sample switch 48 once. In one variant, the readout noise is reduced by closing the sample switch 48 a plurality of times between two closings of the re-set switch 46. Each reading is scaled according to the elapsed exposure time, whereupon the multiple readings are averaged to form a single output value. Because the noise increases at a slower rate than the amplitude of the signal, the signal-to-noise ratio is thereby increased.

To decrease the noise level of the output signal of the A/D converter 44, the exposure level can be stepped from frame to frame, i.e. with each exposure of the light-sensitive cell 43. This means that the light-sensitive cell 43 is exposed at a certain exposure level. A frame of pixel values is derived by reading output values of the light-sensitive sensor at least once, including when the exposure level has substantially been reached (i.e. at the end of the exposure time). Then the re-set switch 46 is closed prior to the next exposure. The exposure level is stepped monotonically, i.e. continuously incremented or decremented from one cycle of execution of these steps to the next. As mentioned above, the exposure level is determined by the exposure time, aperture, (flash) lighting intensity, gain of the amplifier 45 and A/D conversion threshold of the A/D converter 44. Stepping may be accomplished by varying any one or a combination of these settings with each execution of the steps resulting in a frame of digital pixel values.
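The scaled averaging of multiple readouts described above might be sketched as follows; this is illustrative only, and the array shapes and names are assumptions of the example.

    import numpy as np

    def average_readings(readings, elapsed_times, exposure_time):
        # readings      : array of shape (N, H, W) - non-destructive readouts
        #                 taken between two re-sets of the sensor
        # elapsed_times : exposure time elapsed at the moment of each readout
        # Each reading is scaled up to the full exposure time and the scaled
        # readings are averaged, lowering the readout-noise contribution.
        readings = np.asarray(readings, dtype=np.float64)
        scale = exposure_time / np.asarray(elapsed_times, dtype=np.float64)
        return (readings * scale[:, None, None]).mean(axis=0)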
It is noted that stepping an amplification factor used to amplify an output value of the light-sensitive sensor has the advantage of simple implementation. Figs. 7 and 8 illustrate alternatives, namely variation of the exposure time and variation of the intensity of light admitted onto the image-capturing device 5, respectively. Figs. 7 and 8 are diagrams showing the intensity of light admitted onto the image sensor against time. The area under the curve is the exposure. Instead of capturing one frame with the exposure resulting from the exposure value, a plurality of frames is captured. To this end, the exposure is 'divided' over a plurality of frames.

In a first embodiment, the size of the aperture 4, as well as the lighting conditions, are kept constant between exposures. This embodiment is illustrated in Fig. 7. Because the size of the aperture 4 is kept constant, the intensity values are each the same. The total exposure is the sum of the areas of the bars. The number of exposures depends on the time required for a stable image to be captured. The number is selected to keep the exposure time below a certain threshold value. Preferably, this threshold value is pre-determined at 1/60 second, as this is considered the lowest shutter speed at which the average photographer can capture a steady image. In one embodiment, the exposure time is varied randomly between frames. In another embodiment, settings of the image capturing system, in this case the exposure time, are adjusted before several further captures of a frame in such a manner that at least a maximum of the scale on which intensity values for each pixel are recorded changes substantially uniformly in value with each adjustment. This has the advantage of resulting in a more accurate capture of the colour and tonal depth in the combined final image 24 (Fig. 2) or locally adjusted final image 27.

One algorithm for calculating the exposure times is as follows. The maximum exposure time is chosen to be below the threshold, of 1/60 second for example. The average exposure time is set to equal half the maximum exposure time. The minimum exposure time is set equal to the maximum exposure time divided by the number of frames. Suppose the total exposure time determined to result in the desired exposure is 1 second. The maximum exposure time is chosen to be 1/60 second, so the average exposure time is 1/120 second and 120 frames are captured. The exposure times would then be stepped in equal increments from the minimum of 1/7200 second up to 1/60 second. A sketch of this algorithm is given below.

In another embodiment, a different parameter of the image capturing system, in this case the size of the aperture 4, is adjusted before several further captures of a frame in such a manner that at least a maximum of the scale on which intensity values for each pixel are recorded changes substantially uniformly in value with each adjustment. This embodiment is illustrated in Fig. 8. In this variant the exposure time is the same for each successive frame. However, the maximum intensity that can be captured and recorded decreases uniformly with each successive exposure. Aperture is the ratio of the focal length to the diameter of the opening of the lens system 2. Aperture area is the area of the opening, i.e. proportional to the square of the diameter. The aperture area is stepped down in equal increments in the embodiment illustrated in Fig. 8. In this embodiment, the aperture area needed for a given sum of exposure times is determined using the exposure-metering device 15, or a default value is taken for the sum of exposure times.
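A sketch of the exposure-time algorithm, reproducing the worked example above; this is illustrative only, and the rounding of the number of frames is an assumption.

    def stepped_exposure_times(total_time, max_time=1.0 / 60.0):
        # Implements the algorithm described above: average = max / 2,
        # number of frames = total / average, minimum = max / frames,
        # times stepped in equal increments from minimum to maximum.
        average = max_time / 2.0
        n_frames = round(total_time / average)
        minimum = max_time / n_frames
        step = (max_time - minimum) / (n_frames - 1)
        return [minimum + k * step for k in range(n_frames)]

    # For total_time = 1.0 this yields 120 exposure times stepped evenly
    # from 1/7200 s to 1/60 s, matching the worked example in the text.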
The thus determined aperture area will be referred to as the metered aperture area. Given the number of frames needed to keep the exposure time for each frame below a certain threshold value, the average aperture area is calculated as the metered aperture area divided by the number of frames. The maximum aperture area is set to equal twice the average aperture area. The minimum aperture area is set equal to the maximum aperture area divided by the number of frames.

Fig. 8 could also serve as an illustration of another embodiment of the invention, in which the aperture size is kept constant, but the intensity of artificial light used to illuminate a scene to which the image capturing device 5 is exposed is decreased in steps. For this purpose, the flash driver circuit 13 is controlled by the microprocessor 9 in such a way that the intensity of light emitted by a connected flash lighting source is increased or decreased in steps. This may be achieved by varying the duration of each flash or its intensity. The result is a graph similar to that of Fig. 8, in case the intensity of the flashlight is decreased uniformly. An approach used in one embodiment of this variant is to drive a flash connected to the camera 1 as a test shot without capturing a frame. Using the exposure-metering device 15, for example, the longest required exposure for the flash is determined, from which the camera calculates the exposure times for each of the frames to be captured with the flash light driven. Alternatively, a range-finding system provided with the digital camera 1 is used to determine the distance between the flash and/or the camera 1 and the furthest object of interest. This can be used to configure the flash for the camera 1. A simpler variant involves the use of a fixed mode of operation, where the maximum range of a flash light source connected to the camera 1 is used to determine the breakdown into multiple frames.

The variant with stepped intensities of flashlight simplifies the capture of a correct exposure without requiring multiple flashes and multiple measuring points. In particular for situations where there is an object close to the digital camera 1 (with flash) and a screen at a substantially different distance from the digital camera 1, the variant helps to expose both objects correctly. Existing techniques to overcome problems associated with multiple objects at different distances from the flash are difficult and cumbersome, as they involve multiple flashes in the vicinity of each object, which are switched off once the required light has been received by the camera 1.
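The corresponding aperture-area stepping can be sketched in the same illustrative, non-authoritative fashion:

    def stepped_aperture_areas(metered_area, n_frames):
        # Follows the rules given above: average = metered / frames,
        # maximum = 2 * average, minimum = maximum / frames, stepped down
        # in equal increments from maximum to minimum (cf. Fig. 8).
        average = metered_area / n_frames
        maximum = 2.0 * average
        minimum = maximum / n_frames
        step = (maximum - minimum) / (n_frames - 1)
        return [maximum - k * step for k in range(n_frames)]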
It is observed that this is also true in embodiments of the invention in which photographic film is used to capture frames and in which images are subsequently scanned to yield a frame of pixel values. The advantage of varying at least the maximum of the scale is that the image resulting from the combination of captured frames has increased resolution, because the number of possible intensity values resulting from summing intensity values for the individual frames (after repositioning) increases.

Another way to achieve this effect is to adjust the threshold of the A/D converter 6,44. In the case of a single-bit A/D converter 6 operating on a signal with continuous values between 0 and 1, the threshold for determining the value would be 0.5. All values above 0.5 are converted to a 1 and all values below that to a 0. To increase the resolution to two bits, the threshold is first adjusted to 0.2 to capture and record a frame, then to 0.4, then to 0.6 and then to 0.8. When the output of each A/D conversion is added to create a final value, the resolution is increased from one to two bits. Thus, this increase occurs in step 25 of Fig. 2. A sketch of this threshold-stepping scheme is given at the end of this passage.

Returning to Fig. 2, another advantage of re-positioning parts of multiple images prior to combining them into a single combined final image 24 will be explained. In a preferred embodiment of the invention, the digital camera 1 or image processing system executing the method receives user input defining a region within an image in the sequence of images represented by the sequence 16 of frames. Subsequently, a motion vector indicative of relative movement of a part of successive images corresponding to the defined region is calculated in step 19. In other words, a sub-section of the total area of an image is used to calculate the motion vector.

An example is illustrated in Fig. 10 for three captured images 49-51. A rectangular part 52 of the first captured image 49 corresponds to the user-defined region, as does a second rectangular part 53 in the second captured image 50 and a third rectangular part 54 in the third captured image 51. Information representative of the rotation and translation of this part from the first captured image 49 to the third captured image 51 is determined (step 19 and/or 21 of Fig. 2). The global motion vector 17 is thus representative of relative movement of the rectangular part. In step 22, the second and third captured images 50,51 are re-positioned in accordance with the global motion vector 17. In step 25 they are combined into a single combined final image 24, which is a panorama picture. To actually capture the three images 49-51 using the digital camera 1, the user must swivel round in one position, whilst the image capturing device 5 of the digital camera 1 is repeatedly exposed.

In one favourable embodiment, the rate of change in magnitude of the global motion vector 17 is calculated from one of the successive frames of pixel values representing the images 49-51 to the next. An output signal is provided to the user if the rate of change exceeds a certain threshold level. The threshold level is that calculated to provide sufficient exposure for each of the images 49-51, so that the combined final image 24 has sufficient exposure over its entire area.
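The single-bit threshold-stepping scheme described earlier in this passage can be made concrete. The helper below is hypothetical and assumes, for simplicity, a static scene, so that the same analogue value is presented at each of the four captures; summing the four one-bit conversions then yields one of five levels, i.e. roughly two bits of tonal depth from a one-bit converter.

```python
def quantise_with_stepped_thresholds(value, thresholds=(0.2, 0.4, 0.6, 0.8)):
    """Sum single-bit A/D conversions taken at stepped thresholds.

    Each conversion yields 1 if the continuous value (between 0
    and 1) lies above the threshold and 0 otherwise.
    """
    return sum(1 if value > t else 0 for t in thresholds)

# 0.55 would read simply as 1 with a fixed 0.5 threshold, but the
# stepped thresholds resolve it more finely when the bits are summed:
assert quantise_with_stepped_thresholds(0.15) == 0
assert quantise_with_stepped_thresholds(0.55) == 2
assert quantise_with_stepped_thresholds(0.95) == 4
```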
It is noted that the rate of change in magnitude of the global motion vector 17 could also be determined by an alternative method. For example, the output from the motion sensor 14 could be used to determine this rate of change, without any real-time execution of the steps illustrated in Fig. 2 being necessary.

In an embodiment, the camera 1 comprises means for providing an indication of the speed of motion relative to a desired speed of motion. The basis for the feedback would be the measurement of the exposure for each of the pixels. If an eight-frame mode is selected, the motion of the camera should be slow enough that each part of the scene is exposed eight times to the image-capturing device 5. In other words, the camera 1 sums the image data value of substantially each pixel in a first captured frame with the image data values of the respective corresponding pixels in the further captured frames in which a corresponding pixel is present, and compares the sum total with a pre-determined desired exposure value. An indication of the result of the comparison is provided on the output device 11. A sketch of this per-pixel exposure check is given at the end of this passage. In one such embodiment, a set of controls is provided on the camera to indicate the desired direction of movement of the camera 1 relative to the scene to be captured. For example, the photographer may wish to move the camera 1 sideways, then downwards, then in the opposite direction. In one such embodiment, the photographer uses a four-button control system to indicate the desired motion. Bars on a graphical indicator provide feedback to the photographer that the motion is in the correct direction and at the correct speed.

A commonly used technique for sports or action photography is panning. This is often used to show motion, where an object is moving against a background. For example, an object may be moving across a scene. Normally, a photographer would use a slow shutter speed and follow the motion of the object. The resulting effect is a sharp object with a blurred background. This technique is difficult to perform and may require many attempts to get the desired result. Using the invention, the object determined to be in the foreground of the scene is selected as the user-defined region on the basis of which the global motion vector 17 is to be calculated. This may be done by analysing image data representative of each respective frame and determining the motion of the object relative to the camera 1. Alternatively, the analysis may be carried out by comparing two successive frames. The motion information is used to re-position the images represented by the sequence 16 of frames of pixel values, which are then combined into a single combined final image.

Fig. 11 illustrates another advantageous feature. In studio applications where multiple flash sources are used, a problem for the photographer is the determination of the right combination of the strengths of multiple light sources 55,56. In the embodiment, the light sources 55,56 are controlled in such a manner that the intensity of light provided by the first light source 55 relative to the intensity of light provided by the second light source 56 whilst capturing a first set of images is different from that whilst capturing a second set of images. Thus, a single image can later be generated by combining them, using a different weighting for the image data values for pixels in frames of the first set than for the image data values for pixels in frames of the second set when values for pixels at corresponding positions are summed. Thus, the photographer has maximum control of the image in the post-production process. Bracketing of the light sources 55,56 is thus unnecessary. (Bracketing is a technique involving the taking of successive shots of the same scene with different exposure settings. After the film has been processed, the shot with the best exposure setting is chosen. Another approach to bracketing is to change the intensity of the light sources and shoot successive frames. Again the optimum frame is selected once the film has been processed.)
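The per-pixel exposure check referenced above can be sketched as follows. The NumPy-based accumulation and all names are illustrative assumptions, not the patent's implementation, and the frames are assumed to have been repositioned already.

```python
import numpy as np

def exposure_coverage(frames, desired_exposure, tolerance=0.1):
    """Sum corresponding pixel values over the (repositioned)
    frames and compare the total with a desired exposure value.

    frames: sequence of 2-D arrays of pixel intensity values.
    Returns a per-pixel boolean mask of adequately exposed
    positions and an overall pass/fail flag that could drive an
    indicator on the output device.
    """
    total = np.zeros_like(frames[0], dtype=np.float64)
    for frame in frames:
        total += frame               # sum corresponding pixels
    ok = np.abs(total - desired_exposure) <= tolerance * desired_exposure
    return ok, bool(ok.all())
```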
Fig. 12 will now be used to illustrate in more detail an embodiment of the step 26 of local adjustment of regions within the adjusted images 23 in Fig. 2. In a first step 57, a sequence 58 of frames of pixel values, each frame representative of a corresponding one of a sequence of images associated with successive moments in time, is analysed. This step 57 includes determining a plurality of time series of cumulative pixel values, each one associated with a pixel position within the frames. A representative example of such a time series is shown in Fig. 13 for one pixel position. In the shown example, the sequence 58 comprises six frames; thus, six cumulative pixel values are shown. In the shown example, the cumulative pixel (intensity) value increases substantially linearly in time, as evidenced by the continuous line fitted to the curve represented by the values. This is expected where the successive images are of the same, static scene. However, the value for the sixth frame deviates by more than a certain pre-determined amount from an approximation of the time series determined on the basis of the preceding five frames in the sequence 58. This is an indication that an object in the image has moved position relative to the rest of the image from the fifth to the sixth image. In the course of the analysis step 57, such pixel positions are 'flagged'. A sketch of this flagging criterion is given at the end of this passage. As an example, Fig. 14 shows a frame 59 in which pixel positions that fulfil the criterion for 'flagging' have been marked with a star.

In a subsequent step 60 (Fig. 12), regions 61-63 (Fig. 14) of contiguous selected pixel positions are determined. In a next step 64, associated motion vectors are calculated for the respective regions. Each local motion vector is representative of movement of at least a part of the region 61-63 in images represented by the frames. The local motion vector is advantageously calculated using a method in accordance with step 19 in Fig. 2, examples of which have been set out above. In a further step 65, at least a part of each of the regions 61-63 within the successive images in the sequence of images represented by the sequence 58 is re-positioned in accordance with its associated local motion vector. Thus, the images are corrected for movement of objects within a scene captured in the images.

It is noted that, when applied in the method of Fig. 2, the sequence 58 of frames represents the sequence 23 of adjusted images illustrated in Fig. 2. Thus, the analysis step 57 of Fig. 12 is preceded by the steps 22 and 25 of Fig. 2, in which at least parts of successive images in the sequence of images 23 are re-positioned in accordance with the global motion vector 17. This part advantageously encompasses the pixel positions associated with the time series. Thus, the sequence 58 of frames represents aligned images. Camera shake has been compensated for, so any deviations of a time series will not be due to camera shake, but only to movement of an object represented in the images. This improves the quality of the analysis step 57: only the 'right' pixel positions are flagged.
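A minimal sketch of the flagging criterion follows, assuming a least-squares straight line fitted to the first frames serves as the 'approximation' of the time series (the text does not prescribe a particular fitting method, so this choice is an assumption).

```python
import numpy as np

def flag_deviating_pixels(cumulative, n_fit, threshold):
    """Flag pixel positions whose cumulative time series deviates
    from a linear trend fitted to the first n_fit frames.

    cumulative: array of shape (n_frames, height, width) holding
    the running per-pixel sums over the sequence of frames.
    Returns a boolean (height, width) mask of 'flagged' positions.
    """
    n_frames = cumulative.shape[0]
    t = np.arange(n_fit, dtype=np.float64)
    series = cumulative[:n_fit].reshape(n_fit, -1)
    # Fit value = a * t + b per pixel by least squares.
    a, b = np.polyfit(t, series, deg=1)
    predicted = a * (n_frames - 1) + b   # extrapolate to the last frame
    actual = cumulative[-1].reshape(-1)
    return (np.abs(actual - predicted) > threshold).reshape(cumulative.shape[1:])
```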
In Fig. 12, the re-positioning step 65 is optionally followed by a repeated execution of the steps 57,60,64,65, now performed on sequences of corresponding arrays of pixel values. Each array is derived from the pixel values representing the region in a corresponding one of the sequence of images. In most embodiments, each array will correspond to these pixel values. Variants are possible in which extra pixel values at immediately adjacent pixel positions are included, or in which the arrays of pixel values are derived through interpolation, or comprise only pixel values at every other pixel position in the regions 61-63.

Thus, the analysis step 57 is repeated on the regions 61-63 in the frame 59 shown in Fig. 14. Note that the analysis step is carried out on a sequence of arrays of pixel values, each derived from a corresponding one of the regions 61-63. Furthermore, the analysis step 57 is preceded by the re-positioning step 65, in which at least a part of successive regions in the sequence of regions is re-positioned in accordance with the local motion vector calculated for that region. It is further noted that each of the steps 57,60,64,65 is carried out independently for each of the respective regions 61-63.

Due to the prior re-positioning of each of the regions 61-63, the time series analysed in step 57 selects those pixel positions at which objects moving within the region cause a deviation of the time series. These pixel positions are again 'flagged'. In the situation depicted in Fig. 14, only pixel positions within the third region 63 are 'flagged'. Sub-regions 66,67 of contiguous 'flagged' pixel positions are determined in the subsequent step 60. Then, motion vectors local to the sub-regions 66,67 are calculated (step 64). This is advantageously done using one of the methods also applicable to determine the global motion vector (steps 19 and/or 21 in Fig. 2). Subsequently, the re-positioning step 65 is applied to the sub-regions 66,67, in order to re-position at least a part of the successive regions 63 in accordance with the motion vector local to one of the sub-regions 66,67. This part, of course, corresponds substantially to the sub-region to which the motion vector is applicable. Thus, in the example of Fig. 14, re-positioning is carried out twice, in that a part corresponding to the first sub-region 66 in each of the sequence of the third regions 63 is re-positioned and a part corresponding to the second sub-region 67 in each of the sequence of the third regions 63 is re-positioned.

Finally, the values of corresponding pixels in each of the sequence of images represented by the sequence 58 of arrays of pixel values (now adjusted due to the various executions of the re-positioning step 65) are summed to form the locally adjusted final image 27 (step 68). This locally adjusted final image 27 is corrected for blur due to camera shake as well as blur due to moving objects within the represented scene. Additionally, the locally adjusted final image 27 has a relatively high resolution and tonal depth, despite having been formed by processing underexposed images.
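The recursion through steps 57, 60, 64 and 65 can be summarised in a high-level skeleton. Everything here is illustrative: the four callables are assumed to implement the analysis, region-growing, motion-estimation and re-positioning steps described in the text, each region is assumed to be expressed as a pair of slices (a bounding box), and the final summation of step 68 is left out.

```python
def local_adjust(arrays, flag_deviations, group_regions,
                 estimate_motion, reposition, depth=2):
    """Recursively correct object motion within a sequence of
    pixel-value arrays (sketch of steps 57, 60, 64 and 65)."""
    if depth == 0:
        return arrays
    flagged = flag_deviations(arrays)             # step 57: deviating series
    for region in group_regions(flagged):         # step 60: contiguous regions
        vector = estimate_motion(arrays, region)  # step 64: local motion vector
        arrays = reposition(arrays, region, vector)   # step 65
        # Repeat the analysis within the now-aligned region to find
        # and correct sub-regions such as 66,67.
        sub = [a[region] for a in arrays]         # region: tuple of slices
        corrected = local_adjust(sub, flag_deviations, group_regions,
                                 estimate_motion, reposition, depth - 1)
        for a, c in zip(arrays, corrected):
            a[region] = c
    return arrays
```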
It is noted that features of the overall image processing method depicted in Fig. 2 may also be applied to the local image processing as shown in Fig. 12. In particular, a user may select a region on which the analysis and re-positioning steps 57,65 are to be carried out. This may be implemented, for example, by allowing the user to discard regions selected in step 60, so that no re-positioning is carried out on these regions. Thus blur may deliberately be left in the locally adjusted final image 27, for example to create a dynamic impression.

Turning now to Fig. 15, a general overview is given of an image capturing process resulting in the sequence 16 of frames of pixel values. More particularly, to enable a user of the digital camera to assert more control over the image capturing process, a preview image is generated. Such a preview image, when displayed by the output device 11, enables the user to check the settings for image capture. Because each of the images represented by the sequence 16 of frames is underexposed, generation of a preview image is not quite trivial. It is not enough to simply generate a preview image on the basis of one of the frames in the sequence 16 of frames, as this would give a relatively poor indication of what the combined final image 24 will look like. For this reason, a further processing step is carried out as well as the main processing step wherein the pixel values of the frames in the sequence 16 are adjusted and combined into a combined final image. The further processing step includes converting at least some of the frames in the sequence 16 into frames having a smaller data size and adding corresponding pixel values of those frames so as to form a frame of pixel values representing a preview image. As can be inferred from the fact that the further processing step is carried out during the image capturing process, the main and further processing steps may be carried out sequentially.

In a first step 69 an image is captured. The step 69 of capturing an image may make use of any of the techniques set out above in conjunction with Figs. 6-9. The result is a frame 70 representing the captured image. The frame 70 is saved in a step 71, wherein it is added to the sequence 16 of frames of pixel values. In a parallel step 72, the frame 70 representing a captured image is converted to a frame 73 of reduced data size. The frame 73 of reduced data size represents the same captured image, but with a reduced amount of detail. The reduction in data size is preferably achieved by a reduction in resolution; that is, the number of pixel values, and thus the number of represented pixel positions in the captured image, is reduced. Known interpolation techniques may be used, or a number of pixel values may simply be dropped. Alternatively, each pixel value could be encoded in a smaller number of bits. Another alternative would be a reduction in the number of sub-pixels, i.e. colour components.

In the shown variant, a frame 74 representing a preview image is formed each time a frame 73 of reduced data size has been generated. This is done in a step 75, wherein the last generated frame 73 of reduced data size is combined with the current version of the frame 74 representing the preview image to produce a new version of the latter. Combination includes at least the addition of each pixel value of the latest frame 73 of reduced size to the pixel value corresponding to it, i.e. representing a pixel at the same pixel position, in the frame 74 representing the preview image. The step 75 of combining the frames 73,74 optionally includes re-positioning part or all of the image represented by the frame 73 of reduced data size prior to adding the corresponding pixel values. A sketch of this accumulation is given below.
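A minimal sketch of the preview accumulation, assuming plain decimation (dropping pixels) as the data-size reduction of step 72 and omitting the optional re-positioning; the class and its names are illustrative, not the patent's.

```python
import numpy as np

class PreviewAccumulator:
    """Accumulate downscaled captured frames into a preview image
    (sketch of steps 72 and 75)."""

    def __init__(self, scale=4):
        self.scale = scale    # keep every 4th pixel in each direction
        self.preview = None   # the evolving preview frame (frame 74)

    def add_frame(self, frame):
        """Reduce a captured frame's data size (step 72) and add
        it to the running preview (step 75)."""
        reduced = frame[::self.scale, ::self.scale]   # frame 73
        if self.preview is None:
            self.preview = reduced.astype(np.float64)
        else:
            self.preview += reduced
        return self.preview
```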
Re-positioning may be carried out in accordance with an embodiment of the methods discussed above in connection with the images represented by the sequence 16 of frames of pixel values with a relatively large data size. When the sequence 16 of frames is complete, and all the frames 70 representing a captured image have been converted and combined, a signal carrying the frame 74 of pixel values representing the preview image is generated and provided to the output device 11, e.g. an LCD screen.

Figs. 2 and 16 in combination illustrate an alternative, wherein the main processing to generate the combined final image 24 or locally adjusted combined final image 27 and the further processing to generate a preview image are carried out at least partly in parallel. The effect is that the main processing can be carried out at a relatively slow pace in the background, whereas the preview image is available for viewing much sooner. Thus, more responsive feedback is provided to enable control over the main image processing to be asserted. This is made possible without the use of a very powerful microprocessor 9 or DSP 7. The combined final image 24 or locally adjusted final image 27 need not themselves be shown on the output device 11. Instead, upon completion of the generation thereof (steps 25 and 26, respectively), the frame of pixel values generated to represent them may be committed to a storage medium, for example in the storage device 8 or in a device external to the digital camera 1.

In a first step 78, a frame 79 is retrieved from the sequence 16 of frames. This frame 79 is of a relatively large data size. It is scaled down in a subsequent step 80, wherein a frame 81 of reduced data size is generated. This step 80 is similar to the step 72 in the method of Fig. 15. Next, a frame 82 of reduced data size representing the preview image is formed in a further step 83. The frame 81 of reduced data size representing the last retrieved image is combined with the current version of the frame 82 representing the preview image in this step. Local or global adjustment of the frame 81 is optionally carried out prior to the addition of pixel values. A last step 84 consists of displaying the preview image.

The invention is not limited to the described embodiments, which may be varied within the scope of the accompanying claims. For example, the time series of cumulative pixel values may be analysed on the basis of a stored sequence 58 of arrays of pixel values (post-processing), but the analysis may also be carried out as samples are taken from the light-sensitive sensor shown in Fig. 6, by analysing the output of the A/D converter 44. In addition, parts of the depicted methods may be carried out in the digital camera 1 before completion of the image processing methods in a general-purpose computer system. Thus, an image processing system adapted to carry out one or more of the methods set out above could comprise multiple processing devices.

Claims

1. Image processing method, including a step (19) of calculating a motion vector (20) representing at least a component indicative of relative movement of at least a part of successive images in a sequence of images, wherein the step (19) of calculating the motion vector includes a step of determining at least a first term in a series expansion representing at least one element of the motion vector (20), which step includes an estimation process wherein at least the part in each of a plurality of the images is repositioned in accordance with the calculated motion vector (20) and values of corresponding pixels in at least the repositioned parts are summed to form a combined image, characterised in that the estimation process includes calculation of a measure of energy contained in an upper range of the spatial frequency spectrum of the combined image and the step of determining at least the first term includes at least one further iteration of the estimation process to maximise the energy.
2. Method according to claim 1, wherein the step (19) of calculating the motion vector includes determining a further term in the series expansion, and wherein the estimation process is iteratively executed using a motion vector with an adjusted value of the further term to maximise the energy.
3. Method according to claim 1 or 2, wherein the series expansion is a Fourier sum representing at least one element of a time-varying motion vector, each image being associated with a point in time.
4. Method according to any one of the preceding claims, wherein the step (19) of calculating the motion vector is carried out by manipulating frames (16) of pixel values, each frame representing a corresponding one of the successive images, wherein the pixel values of each frame lie on one of a sequence of scales of discrete values, with increasing absolute maximum, the scales being applicable to respective successive sets of at least one image in the sequence of successive images.
5. Method according to any one of the preceding claims, wherein the step (19) of calculating the motion vector is carried out by manipulating frames (16) of pixel values, each frame representing a corresponding one of the successive images, and wherein the frames (16) of pixel values are formed by exposing an image capturing device (5) comprising an array of light-sensitive sensors (43,46,47) at a certain exposure level; deriving at least one frame of pixel values by reading output values of the light-sensitive sensors; and re-setting the light-sensitive sensors when the certain exposure level has been achieved.
6. Method according to claim 5, comprising repeatedly executing the steps of: exposing the image capturing device (5) at a certain exposure level; deriving a frame of pixel values by reading output values of the light-sensitive sensors when the exposure level has substantially been reached; and re-setting the light-sensitive sensors, wherein the certain exposure level is monotonically stepped between each execution of the steps.
7. Method according to claim 5 or 6, wherein multiple frames (16) of pixel values are formed by taking readings of the output values of the light-sensitive sensors at respective intervals between re-sets of the light-sensitive sensors.
8. Method according to any one of the preceding claims, comprising receiving user input defining a region (52-54) within an image in the sequence of images (49-51) and calculating a motion vector (17) indicative of relative movement of a part of successive images (49-51) in the sequence of images corresponding to the defined region.
9. Method according to claim 8, comprising calculating a rate of change in magnitude of the motion vector (17) from one of the successive images to a next and providing an output signal to the user if the rate of change exceeds a certain threshold level.
10. Method according to any one of the preceding claims, wherein substantially the whole of each image in the sequence is re-positioned in accordance with the motion vector (17) and values of corresponding pixels in each of the images subsequent to re-positioning are summed to form a combined final image (24).
11. Method according to any one of claims 8-10, wherein the step of calculating a motion vector (17) is repeated for at least one further sequence of images, wherein at least a part of each image of the sequence of images is repositioned in accordance with the respective motion vector for the sequence, and wherein a combined final image is formed as a weighted sum of all images subsequent to re-positioning, each of the sequences being accorded a weighting factor.
12. Image processing method, including a step (57) of analysing frames in a sequence (58) of frames of pixel values, each frame (59) representing a corresponding one of a sequence of images associated with successive moments in time, including: determining a plurality of time series of cumulative pixel values, each one associated with a pixel position within the images, and selecting pixel positions at which a deviation of the associated time series fulfils a pre-determined criterion, characterised by determining at least one region (61-63) of contiguous selected pixel positions and calculating an associated motion vector local to that region and representative of movement of at least part of that region in the images represented by the frames (59).
13. Method according to claim 12, including a repositioning step (22;65) prior to the analysing step (57), the repositioning step (22;65) including: deriving the frames by repositioning at least a part of successive images in the sequence of images, which part encompasses the pixel positions associated with the time series, in accordance with a global motion vector (17), representing at least a component of relative movement of that part in the sequence of images.
14. Method according to claim 12 or 13, including repeating the analysis step (57) on a sequence of arrays of pixel values, each array derived from the pixel values representing the region in a corresponding one of the sequence of images, to calculate at least one motion vector local to a sub-region (66,67) within the region (63).
15. Method according to any one of claims 12-14, wherein the or each motion vector is calculated by means of the motion vector calculation step (19) in a method according to any one of claims 1-11.
16. Method of processing a plurality of first frames (16,70;79) of pixel values, each first frame representing a corresponding one of a sequence of images, such that the pixel values of a first frame (70;79), when added to corresponding pixel values of other first frames, form a first frame representing a combined final image (24,27), characterised by converting at least some of the first frames into respective second frames (73;81) having a smaller data size than the first frames (70;79) and adding corresponding pixel values of the second frames (73;81), so as to form a second frame (74;82) representing a preview image.
17. Method according to claim 16, wherein a main processing step, including adding the pixel values of a first frame to the corresponding pixel values of other first frames to form a first frame representing a combined final image (24,27), and a further processing step, including converting at least some of the first frames into respective second frames (81) and adding corresponding pixel values of the second frames (81), are at least partly carried out in parallel.
18. Method according to claim 16 or 17, wherein the second frame (74;82) representing a preview image is converted into a signal for showing the preview image on a display.
19. Method according to any one of claims 16-18, wherein the pixel values of the first frame representing a combined final image are committed to a storage medium.
20. Method according to any one of claims 16-19, including carrying out a method according to any one of claims 1-15.
21. Image processing system, comprising a processor and memory for storing pixel values, which system is configured to execute a method according to any one of claims 1-20.
22. Digital camera, arranged to carry out a method according to any one of claims 1-20.
23. Computer program product having thereon means which, when run on a programmable data processing system, enable the programmable data processing system to execute a method according to any one of claims 1-20.
PCT/EP2004/051080 2004-06-09 2004-06-09 Method of motion correction in sequence of images WO2005122084A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2004/051080 WO2005122084A1 (en) 2004-06-09 2004-06-09 Method of motion correction in sequence of images

Publications (1)

Publication Number Publication Date
WO2005122084A1 true WO2005122084A1 (en) 2005-12-22

Family

ID=34957856

Country Status (1)

Country Link
WO (1) WO2005122084A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020130799A1 (en) * 1997-10-30 2002-09-19 David Yang Method and apparatus for converting a low dynamic range analog signal to a large dynamic range floating-point digital representation
EP1045594A2 (en) * 1999-03-25 2000-10-18 Texas Instruments Incorporated Programmable real-time image processing
US6628845B1 (en) * 1999-10-20 2003-09-30 Nec Laboratories America, Inc. Method for subpixel registration of images

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8619156B2 (en) 2005-05-10 2013-12-31 Silvercrest Investment Holdings Limited Image capturing system and method of controlling the same utilizing exposure control that captures multiple images of different spatial resolutions
EP2082569B1 (en) * 2006-10-11 2015-09-30 Apple Inc. Digital imager with reduced object motion blur
US20100271393A1 (en) * 2009-04-22 2010-10-28 Qualcomm Incorporated Image selection and combination method and device
CN102405482A (en) * 2009-04-22 2012-04-04 高通股份有限公司 Image selection and combination method and device
US8963949B2 (en) * 2009-04-22 2015-02-24 Qualcomm Incorporated Image selection and combination method and device

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

122 Ep: pct application non-entry in european phase