WO2013135962A1 - A method, an apparatus and a computer program for predicting a position of an object in an image of a sequence of images - Google Patents

A method, an apparatus and a computer program for predicting a position of an object in an image of a sequence of images

Info

Publication number
WO2013135962A1
Authority
WO
WIPO (PCT)
Prior art keywords
estimate
pixel positions
image
pixel
boundary
Prior art date
Application number
PCT/FI2013/050278
Other languages
French (fr)
Inventor
Markus KUUSISTO
Original Assignee
Mirasys Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mirasys Oy filed Critical Mirasys Oy
Publication of WO2013135962A1 publication Critical patent/WO2013135962A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30232 - Surveillance
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30241 - Trajectory

Definitions

  • the invention relates to image processing.
  • the invention relates to a method, an apparatus and a computer program for predicting a position of an object in an image of a sequence of images.
  • Image analysis and processing techniques involved in analysis of images to identify an object and track the movement thereof based on images of a sequence of images are typically computationally demanding. Moreover, simultaneous identification and tracking of multiple objects where the objects in an image may be occasionally fully or partially overlapping is a challenging task, for which the existing techniques do not provide good performance at a reasonable computational complexity.
  • According to a first aspect of the invention, a method is provided for predicting a position of an object based on a first image of a sequence of images, wherein in said first image a first object and a second object at least partially overlap to form a combined object, the method comprising determining a first set of pixel positions as the pixel positions of the combined object that are predicted to represent the first object but that are not predicted to represent the second object, determining a vertical reference position in the first image on basis of a pixel of the first set of pixel positions, determining a horizontal reference position in the first image on basis of a pixel of the first set of pixel positions, and predicting the position of the first object in a subsequent image of the sequence of images on basis of the vertical and horizontal reference positions.
  • According to a second aspect of the invention, an apparatus is provided for predicting a position of an object based on a first image of a sequence of images, wherein in said first image a first object and a second object at least partially overlap to form a combined object, the apparatus comprising an image analysis unit configured to determine a first set of pixel positions as the pixel positions of the combined object that are predicted to represent the first object but that are not predicted to represent the second object, an image estimation unit configured to determine a vertical reference position in the first image on basis of a pixel of the first set of pixel positions, and to determine a horizontal reference position in the first image on basis of a pixel of the first set of pixel positions, and an image prediction unit configured to predict the position of the first object in a subsequent image of the sequence of images on basis of the vertical and horizontal reference positions.
  • According to a third aspect of the invention, a computer program is provided comprising one or more sequences of one or more instructions which, when executed by one or more processors, cause an apparatus to at least perform a method in accordance with the first aspect of the invention.
  • the computer program may be embodied on a volatile or a non-volatile computer-readable record medium, for example as a computer program product comprising at least one computer readable non-transitory medium having program code stored thereon, the program code, which when executed by an apparatus, causes the apparatus at least to perform the operations described hereinbefore for the computer program in accordance with the third aspect of the invention.
  • Figure 1 illustrates a coordinate system used to describe an image.
  • Figure 2 illustrates a principle of the concept of estimating a size of an object in an image based on its distance from the bottom of the image.
  • Figure 3 illustrates an apparatus in accordance with an embodiment of the invention.
  • Figures 4a to 4c illustrate an example of processing pixel positions of a combined object in accordance with an embodiment of the invention.
  • Figure 5 illustrates a method in accordance with an embodiment of the invention.
  • Figure 1 illustrates a coordinate system used to describe an image plane 100 and an image 101 in the image plane 100 in this document.
  • the image plane 100 can be considered to comprise a number of pixels, positions of which are determined by coordinates along a u axis and a v axis, and where the origin of the coordinate system determined by the u and v axes is at the center of the image 101 on the image plane 100.
  • the origin and even the directions of the axes could naturally be selected differently; many conventional image processing applications place the origin in the top left corner and make the magnitude of the v coordinate increase downwards.
  • a position along the u axis may be referred to as a horizontal position and a position along the v axis is referred to as a vertical position.
  • Terms left and right may be used to refer to a position in the direction of the u axis, and terms up and down may be used to refer to a position in the direction of the v axis.
  • an extent of an object in the direction of the u axis is referred to as width of the object and an extent of the object along the direction of the v axis is referred to as a height of the object.
  • An image, such as the image 101, may be part of a sequence of images.
  • a sequence of images is considered as a time-ordered set of images, where each image of a sequence of images has its predetermined temporal location within the sequence with known temporal distance to the immediately preceding and following images of the sequence.
  • a sequence of images may originate from an imaging device such as a (digital or analog) still camera, from a (digital or analog) video camera, from a device equipped with a camera or a video camera module etc., configured to capture and provide a number of images at a predetermined rate, i.e. at predetermined time intervals.
  • a sequence of images may comprise still images and/or frames of a video sequence.
  • the images preferably provide a fixed field of view to the environment of the imaging device(s) employed to capture the images.
  • the images of a sequence of images originate from an imaging device that has a fixed position throughout the capture of the images of the sequence of images, thereby providing a fixed or essentially fixed field of view throughout the sequence of images. Consequently, any fixed element or object in the field of view of the imaging device remains at the same position in each image of the sequence of images.
  • objects that are moving in the field of view may be present in only some of the images and may have a varying position in these images.
  • an imaging device may be arranged to overlook a parking lot, where the parking area, driveways to and from the parking area and the surroundings thereof within the field of view of the imaging device are part of the fixed portion of the images of the sequence of images, whereas a changing portion of the images of the sequence of images comprises e.g. people and cars moving within, to and from the parking area.
  • an imaging device may be arranged to overlook a portion of an interior of a building, such as a shop or a store.
  • the fixed portion of the images may comprise shelves and other structures arranged in the store and the items arranged thereon, whereas the changing portion of the images may comprise e.g. the customers moving in the store within the field of view of the imaging device.
  • an imaging device employed to capture the images is preferably positioned in such a way that the camera horizon is in parallel with the plane horizon, consequently resulting in the horizon level in the image plane being an imaginary line that is in parallel with the u axis.
  • the horizon level within the image plane may be considered as an imaginary horizontal line at a certain distance from the u axis - or from an edge of the image, the certain distance being dependent on the vertical orientation of the imaging device.
  • the horizon level may be considered as an imaginary line in the image plane that is in parallel with the u axis but which is outside the image.
  • preprocessing of images of the captured sequence of images may be applied in order to modify the image data to compensate for the said angle to provide a sequence of images where the horizon can be represented as an imaginary line that is in parallel with the u axis of the image plane.
  • an object moving in the field of view may be detected by observing any changes between (consecutive) images of the sequence.
  • An object in an image, in other words, refers to a set of pixel positions in an image.
  • An object in an image may be determined by indicating its position in the image plane together with its shape and/or size in the image plane, all of which may be expressed using the u and v coordinates of the image plane.
  • a data record comprising information on the object may be created.
  • the information may comprise for example the current and previous positions of the object, the current and/or previous shape(s) of the object, the current and previous size(s) of the object, an identifier of the object and/or any further suitable data that can be used to characterize the object.
  • a dedicated data record may be created and/or updated for each of the objects.
  • An object moving within the field of view of the imaging device is typically depicted in two or more images of the sequence of images.
  • An object detected in an image can be identified as the same object already detected in a previous image of the sequence by comparing the characteristics - e.g. with respect to the shape of the object - of the object detected in the image to characteristics of an object detected in a previous image (e.g. as stored in a corresponding data record).
  • the information on the position(s) of the object in a number of images may be stored in the data record comprising information on the object in order to enable subsequent analysis and determination of a movement pattern of the object.
  • a position of an object whose shape or approximation thereof is known may be determined or expressed for example as the position(s) of one or more predetermined parts of the object in the image plane.
  • An example of such a predetermined part is a pixel position indicating a geographic center point of the object, thereby - conceptually - indicating a center of mass of the object (with the assumption that each pixel position representing the object represents an equal 'mass').
  • the geographic center point of an object in an image may be determined for example as the average of the coordinates of the pixel positions representing the object in the image.
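  • As an illustration of the computation described above, a minimal sketch follows; the function name and the representation of pixel positions as (u, v) tuples are illustrative assumptions, not taken from the patent.

```python
def center_point(pixel_positions):
    """Geographic center point of an object: the mean of its pixel coordinates.

    pixel_positions is an iterable of (u, v) tuples in image-plane coordinates.
    """
    us, vs = zip(*pixel_positions)
    return sum(us) / len(us), sum(vs) / len(vs)

# Example: a 2x2 block of pixels has its center point between the four pixels.
print(center_point([(0, 0), (1, 0), (0, 1), (1, 1)]))  # (0.5, 0.5)
```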
  • Another example for using predetermined part(s) of an object to indicate a position of the object in an image involves determining at least one of a lower boundary and an upper boundary together with at least one of a left boundary and a right boundary of an imaginary rectangle enclosing the pixel positions representing the object by touching the lowermost, the uppermost, the leftmost and the rightmost pixel positions representing the object.
  • a rectangle may be referred to as a bounding box.
  • the lower and upper boundaries may be expressed as a v coordinate, i.e. as a position in the v axis
  • the left and right boundaries may be expressed as a u coordinate, i.e. a position in the u axis.
  • the position of an object may be expressed for example by a coordinate of the u axis indicating the left boundary of a bounding box enclosing the object and by a coordinate of the v axis indicating the lower boundary of the bounding box.
  • This is equivalent to expressing the coordinates of the pixel position indicating the lower left corner of the (rectangular) bounding box.
  • the bounding box does not need to have an exactly rectangular shape; it is possible to use e.g. a bounding circle just large enough to enclose all pixels of the object, or a bounding oval with its u and v dimensions selected to match those of the object.
  • a rectangular bounding box is the most common and most easily handled in processing.
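  • A minimal sketch of deriving such a bounding box from a set of pixel positions is given below; the names and the (u, v) tuple representation are illustrative assumptions.

```python
def bounding_box(pixel_positions):
    """Axis-aligned bounding box touching the extreme pixel positions.

    Returns (left, right, lower, upper) coordinates; with the origin at the
    image center and v increasing upwards, 'lower' is the smallest v and
    'upper' the largest v of the object.
    """
    us = [u for u, _ in pixel_positions]
    vs = [v for _, v in pixel_positions]
    return min(us), max(us), min(vs), max(vs)

left, right, lower, upper = bounding_box([(2, -3), (5, -1), (3, 4)])
# The position of the object may then be expressed e.g. as (left, lower),
# its width as right - left and its height as upper - lower.
```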
  • a size of an object in an image may be expressed for example by its dimension(s) along the axis or axes of the image plane.
  • a size of an object in an image may be expressed as its extent in the direction of the v axis, i.e. as the height of the object in the image.
  • a size of an object in an image may be expressed as its extent in the direction of the u axis, i.e. as the width of the object in the image.
  • Such information may be derived for example with the aid of a bounding box, as described hereinbefore.
  • a further alternative for expressing the size of the object is to indicate either the height or the width of the object, e.g. together with a predetermined aspect ratio from which the other dimension can be derived.
  • the data record comprising information on an object may be employed to keep track of the current (or most recent) size of the object and possibly also of the size of the object in a number of previous images.
  • a shape of an object can be expressed for example by a set of pixel positions or as a two-dimensional 'bitmap' indicating the pixel positions forming the object. Such information may be stored in a data record comprising information on the object. The information regarding the shape of the object may include the current or most recent observed shape of the object and/or the shape of the object in a number of preceding images of the sequence.
  • Figure 2 schematically illustrates two images 201, 203 of a sequence of images, the images schematically illustrating a reference object in the real world moving along a plane that is essentially horizontal. Note that only the changing portions of the images are illustrated in the images 201, 203, thereby omitting any possible fixed portion (or background objects) of the images for clarity of illustration.
  • the image 201 illustrates the real-world object as an object 205 having a height h_v1 and a width w_v1 with its lower edge situated at position v_b1 on the v axis of the image plane.
  • the image 203 illustrates the real-world object as an object 205' having a height h_v2 and a width w_v2 with its lower edge situated at position v_b2 on the v axis of the image plane.
  • a level representing the horizon 207 is assumed to be a line that is parallel to the u axis - and also parallel to the lower and upper edges of the images 201 and 203
  • the real-world object in image 201 is closer to the imaging device than in the image 203, and hence the object is depicted in the image 201 as larger than in the image 203.
  • both the height h_v1 of the object 205 in the image 201 is larger than the height h_v2 of the object 205', and the width w_v1 of the object 205 in the image 201 is larger than the width w_v2 of the object 205' in the image 203.
  • the object 205 in the image 201 is closer to the bottom of the image than the object 205' in the image 203.
  • This can be generalized into a rule that a real-world object closer to the imaging device appears closer to the bottom of the image than the same real-world object - or another real-world object of identical or essentially identical size - situated further away from the imaging device.
  • a real- world object closer to the imaging device appears larger in an image than the same real-world object - or another real-world object of identical or essentially identical size - situated further away from the imaging device.
  • the point, either actual or conceptual, where the size of a real-world object in an image would appear zero or essentially zero, represents the point in the image - e.g. the level of a line parallel to the u axis of the image plane - representing the horizon in the image.
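  • The relationship between vertical position and apparent size can be illustrated with a simple linear model in which the apparent height shrinks to zero at the horizon level; this is only a hedged stand-in for the predetermined mapping function referred to in the text, and the function name, the scale parameter and the linearity assumption are illustrative.

```python
def height_from_vertical_position(v_bottom, v_horizon, scale):
    """Illustrative linear mapping: apparent object height as a function of
    the vertical position v_bottom of the object's lower boundary.

    Assumes the height shrinks linearly to zero at the horizon level
    v_horizon; scale is a camera- and scene-dependent constant.
    """
    return max(0.0, scale * (v_horizon - v_bottom))

# An object whose lower edge is far below the horizon appears taller than
# one whose lower edge is close to the horizon.
print(height_from_vertical_position(-40.0, 30.0, 0.5))  # 35.0
print(height_from_vertical_position(10.0, 30.0, 0.5))   # 10.0
```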
  • a real-world object exhibiting movement towards or away from the imaging device - i.e. towards or away from the horizon - is typically depicted as an object of different size and different distance from the bottom of an image in two images of a sequence of images captured using an imaging device arranged to capture a sequence of images with a fixed field of view. Consequently, it is possible to determine a mapping function configured to determine a size of an object, e.g. a height of the object, in an image on basis of a vertical position of the object within the image. The determined position of the object in two or more previous images may be used to predict the position of the object in a subsequent image. In the following a straightforward example on the principle of predicting the position of an object is provided.
  • the position of an object in image n may be expressed as a pair of u and v coordinates (u_n, v_n), and the position of the object in image n+1 may be expressed as (u_{n+1}, v_{n+1}).
  • the change in position of an object used in the prediction may be determined on basis of a number of observed changes in position of the object in a number of pairs of two consecutive images, e.g. as an average of the change in position of the object in a predetermined number of most recent pairs of consecutive images or as a linear combination of the observed change in position of the object in a predetermined number of most recent pairs of consecutive images.
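  • A minimal sketch of such a prediction, using the average observed change in position over the most recent pairs of consecutive images, is shown below; the function name and the window parameter are illustrative assumptions.

```python
def predict_next_position(positions, window=3):
    """Predict the next (u, v) position from the average change observed
    over the most recent pairs of consecutive images.

    positions is a time-ordered list of (u, v) tuples for the object.
    """
    recent = positions[-(window + 1):]
    deltas = [(u2 - u1, v2 - v1)
              for (u1, v1), (u2, v2) in zip(recent, recent[1:])]
    du = sum(d[0] for d in deltas) / len(deltas)
    dv = sum(d[1] for d in deltas) / len(deltas)
    u_n, v_n = positions[-1]
    return u_n + du, v_n + dv

# An object moving one pixel right and one pixel up per image:
print(predict_next_position([(0, 0), (1, 1), (2, 2)]))  # (3.0, 3.0)
```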
  • the exemplifying prediction described hereinbefore jointly predicts the change of position in the directions of the u and v axis.
  • a prediction as described hereinbefore may be applied to a number of pixels representing an object in an image - for example to all pixels representing an object or to a subset thereof - or the prediction may be applied to a single pixel chosen to represent the object (as described hereinbefore). While the discussion in the foregoing describes the concept of prediction by using a straightforward example, it is readily apparent that more elaborate prediction schemes can be determined and applied without departing from the scope of the embodiments of the invention.
  • Figure 3 schematically illustrates an apparatus 300 for predicting a position of an object based on a first image of a sequence of images in accordance with an embodiment of an aspect of the invention.
  • the apparatus 300 comprises an image analysis unit 301 , an image estimation unit 303 and an image prediction unit 305.
  • the image analysis unit 301 may also be referred to as an image analyzer or as an object analyzer
  • the image estimation unit 303 may be also referred to as an image estimator or an object estimator
  • the image prediction unit 305 may be also referred to as an image predictor or object predictor.
  • the image analysis unit 301 is operatively coupled to the image estimation unit 303 and to the image prediction unit 305.
  • the image estimation unit 303 is op- eratively coupled also to the image prediction unit 305.
  • the apparatus 300 may comprise further components, such as a processor, a memory, a user interface, a communication interface, etc.
  • the apparatus 300 may be configured, in particular, to predict a position of an object in a subsequent image of a sequence of images based on an image of the sequence, wherein in said image a first object and a second object at least partially overlap to form a combined object.
  • the image analysis unit 301 may be configured to obtain information indicating the pixel positions of the current image representing a predicted first object and to obtain information indicating the pixel positions of the current image representing a predicted second object.
  • Obtaining such information may comprise directly obtaining sets of pixel positions representing the first object and the second object.
  • obtaining may comprise deriving the sets of pixel positions on basis of predicted positions, predicted sizes and predicted or known shapes of the first and second objects in the current image.
  • the information indicating the pixel positions predicted to represent the first and second objects may be derived from the image prediction unit 305, either directly or via an intermediate processing unit (not shown in Figure 3) that may be available for transforming the information on the predicted positions, predicted sizes and predicted or known shapes of the predicted first and second objects into sets of pixel positions in the current image.
  • the image analysis unit 301 may be configured to obtain indication of a combined object in an image of the sequence of images, where the combined object is an object where the first and second objects, identified as separate objects in a preceding image of the sequence of images, overlap.
  • the overlap may be partial, resulting in a scenario where a part or parts of the first and second objects are visible in the image.
  • the overlap may be complete in such a way that the first object completely obstructs the view to the second object or vice versa.
  • Obtaining indication of a combined object in an image may comprise analysis of the image data of the current image and/or in one or more preceding images of the sequence of images in order to detect presence of a combined object in the image.
  • obtaining indication of a combined object in an image may comprise receiving an indication on the current image comprising a combined object from an intermediate processing unit configured to analyze an image in order to detect presence of a combined object in an image.
  • An object in the current image may be identified as a combined object for example in case the predicted first object and the predicted second object in the current image overlap at least partially, i.e. at least some pixel positions are predicted to represent both the first object and the second object in the current image and objects corresponding to the first and second objects cannot be identified as separate objects in the current image.
  • the analysis may involve search of an object matching or approximating the shape of the union of the pixel positions predicted to represent the first and second objects.
  • the analysis may involve search - and detection - of an object that is substantially different from the (known or predicted) shape of the first object and from the (known or predicted) shape of the second object in or near the portion of an image where the first and second objects are predicted to occur.
  • the analysis may involve analysis of the image as a whole or analysis of a portion of the image.
  • the analysis may concentrate on a portion of the image where, according to an employed prediction scheme, the first and the second objects are likely to overlap and hence likely to result in a combined object.
  • the combined object identified in the current image occupies a certain set of pixel positions.
  • This set of pixel positions may be expressed for example by indicating the coordinates in the image plane occupied by these pixel positions, thereby fully specifying the position, the size and the shape of the combined object within the current image.
  • the image estimation unit 303 is configured to obtain or determine information indicating the pixel positions of the combined object that are predicted to represent the first object only in the current image, i.e. to determine a set of pixel positions of the combined object that are predicted to represent the first object but that are not predicted to represent the second object. This set of pixel positions may be referred to as a first set of pixel positions.
  • the image estimation unit 303 may be configured to determine such information for example by obtaining information indicating the set of pixel positions representing the combined object and the set of pixel positions representing the first object, and identifying the first set of pixel positions as the intersection of these two sets of pixel positions.
  • the image estimation unit 303 may be further configured to obtain or determine information indicating the pixel positions of the combined object that are predicted to represent the second object only in the current image, i.e. to determine a set of the pixel positions of the combined object that are predicted to represent the second object but that are not predicted to represent the first object. This set of pixel positions may be referred to as a second set of pixel positions.
  • the image estimation unit 303 may be configured to determine such information for example by obtaining information indicating the set of pixel positions representing the combined object and the set of pixel positions representing the second object, and identifying the second set of pixel positions as the intersection of these two sets of pixel positions.
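  • Expressed with ordinary set operations (a hedged sketch; the names and the representation of pixel positions as sets of (u, v) tuples are illustrative), the first and second sets can be determined as follows:

```python
def first_and_second_sets(combined, predicted_first, predicted_second):
    """First and second sets of pixel positions of a combined object.

    Arguments are sets of (u, v) tuples: the pixel positions of the combined
    object, and the pixel positions predicted to represent the first and
    second objects in the current image.
    """
    first_set = (combined & predicted_first) - predicted_second
    second_set = (combined & predicted_second) - predicted_first
    return first_set, second_set
```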
  • Figure 4a illustrates an example of a combined object 402 within a portion of an image of a sequence of images.
  • Each square in the example of Figure 4a represents a pixel position, where the pixel positions illustrated as white belong to the background (i.e. to the fixed portion of the image), whereas the pixel positions illustrated as black belong to the combined object 402.
  • Figure 4b illustrates an example of the pixel positions of the combined object 402 predicted to represent the first object only, i.e. the first set of pixel positions 404 and the pixel positions of the combined object 402 predicted to represent the second object only, i.e. the second set of pixel positions 406.
  • the image estimation unit 303 is configured to determine a vertical reference position in the first image on basis of at least one pixel position of the first set of pixel positions.
  • the vertical reference position may be determined, for example, to indicate a lower boundary of the first object, an upper boundary of the first object, or both.
  • the vertical reference position may be determined to indicate the lower boundary of the first object, and the lower boundary may be determined as a vertical position of the lowermost pixel position of the first set of pixel positions.
  • the vertical reference position may be determined to indicate the upper boundary of the first object, and the upper boundary may be determined as a vertical position of the uppermost pixel position of the first set of pixel positions.
  • the image estimation unit 303 is configured to determine a horizontal reference position in the first image on basis of at least one pixel position of the first set of pixel positions.
  • the horizontal reference position may be determined, for example, to indicate a left boundary of the first object, a right boundary of the first object, or both.
  • the horizontal reference position may be determined to indicate the left boundary of the first object, and the left boundary may be determined as a horizontal position of the leftmost pixel position of the first set of pixel positions.
  • the horizontal reference position may be determined to indicate the right boundary of the first object, and the right boundary may be determined as a horizontal position of the rightmost pixel position of the first set of pixel positions.
  • Determining the boundaries of the first object corresponds to using a bounding box to (conceptually) enclose the pixel positions of the first object in the current image, as described hereinbefore. Consequently, the lower, upper, left and right sides of the bounding box correspond to the lower, upper, left and right boundaries of the first object, respectively.
  • the image estimation unit 303 may be configured to determine second vertical and horizontal reference positions for the second object on basis of the second set of pixel positions in a manner similar to that described hereinbefore for the first object.
  • the image prediction unit 305 is configured to predict the position of the first object in a subsequent image of the sequence of images on basis of the vertical and horizontal reference positions.
  • the vertical and horizontal reference positions are considered to represent the estimated position of the first object, for example, for the purpose of predicting the position of the first object in a subsequent image of the sequence.
  • the subsequent image may be, for example, the image immediately following the current image in the sequence.
  • Any prediction method suitable for estimating a position of an object in an image of a sequence of images based on its position in one or more other images of the sequence, for example a prediction method along the lines described hereinbefore, may be employed.
  • the prediction may comprise prediction of the position of the vertical reference position and the horizontal reference position in a subsequent image.
  • the predicted vertical and horizontal reference positions in the subsequent image can be considered to indicate the predicted position of the boundaries of a set of pixel positions representing the first object in the subsequent image. Consequently, the predicted vertical position of the first object may be employed together with a predetermined mapping function configured to determine a height of an object in an image on basis of a vertical position of the object within the image to determine the estimated height of the first object in the subsequent image.
  • the same mapping function may be also configured to determine a width of the object on basis of the vertical position of the object in an image, or a predetermined aspect ratio (stored, for example, in the data record comprising information on the first object) may be used to estimate the width of the first object on basis of the estimated height thereof.
  • Thereby, the prediction of the position and size of the first object in the subsequent image is fully specified.
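  • A hedged sketch of assembling the predicted bounding box from the predicted reference positions is shown below; it assumes, as in the text, that a mapping function from a vertical position to a height and a predetermined aspect ratio (width divided by height) are available, and all names are illustrative.

```python
def predict_bounding_box(u_left_pred, v_lower_pred, height_of, aspect_ratio):
    """Predicted bounding box of the first object in the subsequent image.

    u_left_pred and v_lower_pred are the predicted horizontal and vertical
    reference positions (here: the left and lower boundaries), height_of is
    the predetermined mapping function from a vertical position to an object
    height, and aspect_ratio is width divided by height.
    """
    height = height_of(v_lower_pred)
    width = aspect_ratio * height
    return {"left": u_left_pred,
            "right": u_left_pred + width,
            "lower": v_lower_pred,
            "upper": v_lower_pred + height}
```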
  • a predetermined mapping function configured to determine a size of an object, e.g. a height of the object, as depicted in an image on basis of a vertical position of the object in the image plane and determination thereof is described in the Finnish patent application No. 20125275.
  • the image prediction unit 305 may be configured to predict the position of the second object in a subsequent image of the sequence of images in a manner similar to that described hereinbefore for the first object.
  • the image prediction unit 305 may be further configured to output the predicted position of the first object in a subsequent image of the sequence of images.
  • the image prediction unit 305 may be configured, for example, to provide information indicating the predicted position to the image analysis unit 301 or to another processing unit to facilitate the processing and analysis of subsequent images in the sequence of images.
  • the image estimation unit 303 may be further configured to determine enlarged versions of the predicted first and second objects.
  • the image estimation unit 303 may be configured to enlarge the first set of pixel positions to occupy some of the pixel positions of the combined object that are not predicted to represent the first object.
  • the image estimation unit 303 may be configured to enlarge the second set of pixel positions to occupy some of the pixel positions of the combined object that are not predicted to represent the second object.
  • we may (conceptually) define two further sets of pixel positions associated with the combined object.
  • the image estimation unit 303 may be further configured to obtain information indicating the pixel positions of the combined object that are predicted to represent both the first object and the second object in the current image. This set of pixel positions may be referred to as a set of shared pixel positions.
  • the image estimation unit 303 may be configured to determine such information for example by obtaining information indicating the set of pixel positions representing the combined object, the set of pixel positions representing the first object and the set of pixel positions representing the second object, and identifying the set of shared pixel positions as the intersection of these three sets of pixel positions.
  • the image estimation unit 303 may be further configured to obtain information indicating the pixel positions of the combined object that are not predicted to represent either the first object or the second object in the current image. This set of pixel positions may be referred to as a set of free pixel positions.
  • the image estimation unit 303 may be configured to determine such information for example by obtaining information indicating the set of pixel positions representing the combined object, the first set of pixel positions, the second set of pixel positions and the set of shared pixel positions, and determining the set of free pixel positions as the pixel positions of the combined object not included in the first, the second or the shared set of pixel positions.
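  • Continuing the set-operation sketch given earlier (names and representation again illustrative), the shared and free sets fall out of the same three sets of pixel positions:

```python
def shared_and_free_sets(combined, predicted_first, predicted_second):
    """Shared and free sets of pixel positions of a combined object.

    The shared set holds combined-object pixels predicted to represent both
    the first and the second object; the free set holds combined-object
    pixels predicted to represent neither of them.
    """
    shared_set = combined & predicted_first & predicted_second
    free_set = combined - predicted_first - predicted_second
    return shared_set, free_set
```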
  • the image estimation unit 303 may be configured to enlarge the first set of pixel positions and/or the second set of pixel positions to occupy at least some of the pixel positions of the combined object included in neither the first set nor the second set.
  • the image estimation unit 303 may be configured to enlarge the first and/or second sets of pixel positions to occupy at least some of the pixel positions of the set of free pixel positions and/or some of the pixel positions of the set of shared pixel positions, resulting in a modified first set of pixel positions and a modified second set of pixel positions, respectively.
  • the image estimation unit 303 may be configured to jointly enlarge the first and second sets of pixel positions to occupy at least some of the pixel positions of the set of free pixel positions and/or some of the pixel positions of the set of shared pixel positions.
  • the image estimation unit 303 may be configured to enlarge the first and/or second sets of pixel positions such that all pixel positions of the set of shared pixel positions are included either in the modified first set of pixel positions or in the modified second set of pixel positions. Consequently, the modified first and second sets of pixel positions together with the set of shared pixel positions would fully occupy the pixel positions of the combined object in the current image.
  • the image estimation unit 303 is preferably configured to determine the enlarged version of the predicted first and second objects by extending the first and second sets of pixel positions to occupy pixel positions of the set of shared pixel positions such that the enlarged first and second sets of pixels (continue to) represent objects occupying pixel positions forming continuous areas in the image plane.
  • a first example enlarging algorithm suitable for enlarging the first and/or second sets of pixel positions is provided.
  • a second example enlarging algorithm may be provided by modifying the steps a) and e) to consider both the set of free pixel positions and the set of shared pixel positions instead of only considering the set of free pixel positions. Consequently, the second example enlarging algorithm enlarges the first and second sets of pixel positions to occupy also all pixel positions of the set of shared pixel positions.
  • the process of enlarging the first and second sets of pixels may involve a watersheding process according to a predetermined watersheding algorithm.
  • watersheding is used to refer to an enlarging algorithm where two or more objects competitively extend their current areas to a predetermined area or areas according to a predetermined rule or a set of rules.
  • the predetermined rule(s) are configured to entitle a first object to extend its current area to portions of the predetermined area(s) that are considered to be closer, according to a predetermined distance measure, to the first object than to the other objects competing to enlarge their areas into the predetermined area(s).
  • the first and second example algorithms described hereinbefore may be considered as watersheding (type of) algorithms.
  • the first and second example algorithms described hereinbefore may be considered as binary algorithms that basically consider all 'occupied' pixel positions, i.e. the pixel positions of the first and second sets, to represent the same color, hence corresponding to a binary watersheding applied to a black and white image.
  • a watersheding algorithm or an enlarging algorithm of other type suitable for a grey scale image may be employed instead.
  • An example of such an algorithm is Meyer's flooding algorithm known in the art, while a number of further examples of watersheding algorithms are known in the art.
  • a pixel position of the priority queue has a priority corresponding to the grey level of the pixel position, e.g. the value of the pixel.
  • the pixel positions of the original set of free pixel positions that were not added either to the first set or to the second set (in step B)) form the watershed lines.
  • the process described above may be modified by introducing a further step comprising randomly adding the pixel positions forming the watershed lines either to the first set of pixel positions or to the second set of pixel positions.
  • any suitable process may be employed to extend the first and second sets of pixel positions into the set of free pixel positions and/or the set of shared pixel positions in order to determine modified first and second sets of pixel positions representing enlarged predicted first and second objects in the current image, respectively.
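  • One possible process of this kind is sketched below as a binary, breadth-first competitive region growing in which each growable pixel position is claimed by whichever object's current area reaches it first; this is only an illustrative stand-in for the example enlarging algorithms described in the text, and all names are assumptions.

```python
from collections import deque

def competitive_enlarge(first_set, second_set, growable):
    """Enlarge two pixel-position sets competitively into 'growable' positions.

    All arguments are sets of (u, v) tuples; 'growable' may be e.g. the set
    of free pixel positions, optionally united with the set of shared pixel
    positions.  Positions are claimed in breadth-first order, which roughly
    assigns each position to the closer of the two objects.
    """
    owner = {}
    queue = deque([(p, 1) for p in first_set] + [(p, 2) for p in second_set])
    while queue:
        (u, v), label = queue.popleft()
        for nb in ((u + 1, v), (u - 1, v), (u, v + 1), (u, v - 1)):
            if nb in growable and nb not in owner:
                owner[nb] = label
                queue.append((nb, label))
    enlarged_first = first_set | {p for p, lbl in owner.items() if lbl == 1}
    enlarged_second = second_set | {p for p, lbl in owner.items() if lbl == 2}
    return enlarged_first, enlarged_second
```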
  • While the first example enlarging algorithm described hereinbefore aims to enlarge the first and second sets of pixel positions to occupy all pixel positions of the set of free pixel positions, the enlarging process may alternatively be configured to leave some of the pixel positions of the set of free pixel positions unoccupied.
  • a third example enlarging algorithm may be provided by replacing step e) with the step e') described in the following. e') In case the number of pixel positions remaining in the set of free pixel positions exceeds a predetermined threshold, repeat the steps from a) to d), otherwise the enlarging process is complete.
  • Figures 4b and 4c illustrate a process of enlarging the first and second objects by an example.
  • Figure 4b further illustrates the pixels of the combined object allocated to three portions 408, 408', 408'' representing the set of free pixel positions and to the set of shared pixel positions 410.
  • an enlarging process results in the three uppermost pixel positions of the portion 408 of the set of free pixel positions and all pixel positions of the portion 408'' of the set of free pixel positions being added to the first set of pixel positions 404, thereby resulting in the modified first set of pixel positions 404' illustrated in Figure 4c.
  • the exemplifying enlarging process results in the lowermost pixel position of the portion 408 of the set of free pixel positions and all pixel positions of the portion 408' of the set of free pixel positions being added to the second set of pixel positions 406, thereby resulting in the modified second set of pixel positions 406' illustrated in Figure 4c.
  • the first and second sets of pixel positions or the modified first and second sets of pixel positions and the vertical and horizontal reference positions derivable therefrom can be assumed to be indicative of the predicted positions of the first and second objects in the current image.
  • the position information may not be considered fully reliable for the purpose of accurate prediction of the position of the first and/or the second object in a subsequent image.
  • the predetermined mapping function configured to determine a height of an object in an image on basis of a vertical position of the object within the image, as already referred to hereinbefore, may be employed to refine estimated vertical and/or horizontal reference positions to further improve the accuracy of the prediction.
  • the image estimation unit 303 may be configured to determine a refined estimate of the lower boundary of the first object, which lower boundary may be used as the vertical reference position.
  • the determination of the refined estimate of the lower boundary may comprise determining a first estimate of the position of the lower boundary of the first object as the lowermost pixel position of the first set of pixel positions. Instead of using the lowermost pixel position of the first set, another pixel position representative of the lower boundary of the first object may be used to indicate the lower boundary of the first object. This may be useful for example to exclude outliers in the (modified) first set, caused for example by errors or disturbances in the current image, from unduly distorting the estimate.
  • the determination of the refined estimate of the lower boundary may further comprise determining a first estimate of the position of the upper boundary of the first object as the uppermost pixel position of the first set of pixel positions.
  • the upper boundary of the first object may be based on a pixel position other than the uppermost pixel in case the uppermost pixel position of the first set is not considered a suitable representative of the upper boundary of the first object.
  • the determination of the refined estimate of the lower boundary may further comprise using aforementioned predetermined mapping function to determine a first estimate of the height of the first object on basis of the first estimate of the position of the upper boundary of the first object.
  • the determination of the refined estimate of the lower boundary may further comprise determining a second estimate of the position of the lower boundary of the first object on basis of the first estimate of the position of the upper boundary of the first object and on the first estimate of the height of the first object, e.g. as the v axis coordinate indicating the first estimate of the position of the upper boundary minus the first estimate of the height.
  • the refined estimate may then be determined as an average of the first and second estimates of the position of the lower boundary of the first object, and the refined estimate may be employed for example as the vertical reference position indicative of the position of the lower boundary of the first object in the current image.
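  • A hedged sketch of this refinement is given below; it assumes the second estimate of the lower boundary is obtained by subtracting the estimated height from the upper-boundary estimate, and the names are illustrative.

```python
def refined_lower_boundary(first_set, height_of):
    """Refined estimate of the lower boundary of the first object.

    first_set is the set of (u, v) pixel positions predicted to represent
    the first object only; height_of is the predetermined mapping function
    from a vertical position to an object height.
    """
    v_lower_1 = min(v for _, v in first_set)   # first estimate of the lower boundary
    v_upper_1 = max(v for _, v in first_set)   # first estimate of the upper boundary
    est_height = height_of(v_upper_1)          # first estimate of the height
    v_lower_2 = v_upper_1 - est_height         # second estimate of the lower boundary
    return (v_lower_1 + v_lower_2) / 2.0       # unweighted average of the two estimates
```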
  • the image estimation unit 303 may be configured to determine a refined estimate of the upper boundary of the first object, which may be used as the vertical reference position.
  • the determination of the refined estimate of the upper boundary may comprise determining a first estimate of the position of the lower boundary of the first object as the lowermost pixel position of the first set of pixel positions, and determining a first estimate of the position of the upper boundary of the first object as the uppermost pixel position of the first set of pixel positions.
  • the considerations provided hereinbefore for using a pixel position other than the lowermost or the uppermost one of the first set apply.
  • the determination of the refined estimate of the upper boundary may further comprise using said predetermined mapping function to determine a first estimate of the height of the first object on basis of the first estimate of the position of the lower boundary of the first object.
  • the determination of the refined estimate of the upper boundary may further comprise determining a second estimate of the position of the upper boundary of the first object on basis of the first estimate of the position of the lower boundary of the first object and on the first estimate of the height of the first object, for example as a sum of the v axis coordinate indicating the first estimate of the position of the lower boundary and the first estimate of the height.
  • the refined estimate of the upper boundary may then be determined as an average of the first and second estimates of the position of the upper boundary of the first object, and the refined estimate may be employed for example as the vertical reference position indicative of the position of the upper boundary of the first object in the current image.
  • the image estimation unit 303 may be configured to determine a refined estimate of the right boundary of the first object, which may be used as the horizontal reference position.
  • the determination of the refined estimate of the right boundary may comprise using the aforementioned predetermined mapping function to determine a second estimate of the height of the first object on basis of the vertical reference position.
  • the vertical reference position may be an estimate of the lower or upper boundary of the first object, defined based on the pixel positions of the first set, or the vertical reference position may be a refined estimate of the lower or upper boundary, determined as described hereinbefore.
  • the determination of the refined estimate of the right boundary may further comprise using a predetermined aspect ratio of the first object to determine an estimate of the width of the first object on basis of the second estimate of the height of the first object.
  • the predetermined aspect ratio may be for example determined on basis of the first object as observed in a number of preceding images of the sequence and/or on basis of information thereof stored in a data record comprising information on the first object.
  • the determination of the refined estimate of the right boundary may further comprise determining a first estimate of the position of the left boundary of the first object as the leftmost pixel position of the first set of pixel positions, or on basis of another pixel position considered representative of the left boundary of the first object in consideration of the first set of pixel positions, and determining a first estimate of the position of the right boundary of the first object as the rightmost pixel position of the first set of pixel positions, or on basis of another pixel position considered representative of the right boundary of the first object in consideration of the first set of pixel positions.
  • the determination of the refined estimate of the right boundary may further comprise determining a second estimate of the position of the right boundary of the first object on basis of the first estimate of the position of the left boundary of the first object and on the estimate of the width of the first object, for example as a sum of the u axis coordinate indicating the first estimate of the position of the left boundary and the first estimate of the width.
  • the refined estimate of the right boundary may then be determined as an average of the first and second estimates of the position of the right boundary of the first object, and the refined estimate may be employed for example as the horizontal reference position indicative of the position of the right boundary of the first object in the current image.
  • the image estimation unit 303 may be configured to determine a refined estimate of the left boundary of the first object, which may be used as the horizontal reference position.
  • the determination of the refined estimate of the left boundary may comprise using the aforementioned predetermined mapping function to determine a second estimate of the height of the first object on basis of the vertical reference position.
  • the vertical reference position may be an estimate of the lower or upper boundary of the first object, defined based on the pixel positions of the first set, or the vertical reference position may be a refined estimate of the lower or upper boundary, determined as described hereinbefore.
  • the determination of the refined estimate of the left boundary may further comprise using the predetermined aspect ratio of the first object to determine an estimate of the width of the first object on basis of the second estimate of the height of the first object.
  • the determination of the refined estimate of the left boundary may further comprise determining a first estimate of the position of the left boundary of the first object as the leftmost pixel position of the first set of pixel positions, or on basis of another pixel position considered representative of the left boundary of the first object in consideration of the first set of pixel positions, and determining a first estimate of the position of the right boundary of the first object as the rightmost pixel position of the first set of pixel positions, or on basis of another pixel position considered representative of the right boundary of the first object in consideration of the first set of pixel positions.
  • the determination of the refined estimate of the left boundary may further comprise determining a second estimate of the position of the left boundary of the first object on basis of the first estimate of the position of the right boundary of the first object and on the estimate of the width of the first object, for example by subtracting the first estimate of the width from the u axis coordinate indicating the first estimate of the position of the right boundary.
  • the refined estimate of the left boundary may then be determined as an average of the first and second estimates of the position of the left boundary of the first object, and the refined estimate may be employed for example as the horizontal reference position indicative of the position of the left boundary of the first object in the current image.
  • the averaging operation involved in determination of the refined estimate of the lower, upper, right and/or left boundary of the first object may involve computing an arithmetic mean of the coordinate(s) of the respective positions.
  • the averaging operation may involve using a weighted average.
  • the image estimation unit 303 may be configured to determine a reference position representative of an estimated position of the first object in said first image.
  • the reference position may be determined as a center position of the first set of pixel positions, computed for example as an arithmetic mean of the u and v coordinates of the pixel positions of the first set or the modified first set of pixel positions.
  • the weighting may be arranged to be indicative of the assumption that the closer an estimated position of a boundary of the first object is to the reference position, the more reliable the respective estimated position can be considered to be.
  • the image estimation unit 303 may be configured to use a weighted average where the first estimate of the position of the lower boundary of the first object is weighted by a factor having a value that is inversely proportional to the distance between the first estimate of the position of the lower boundary of the first object and said reference position, and the second estimate of the position of the lower boundary of the first object is weighted by a factor having a value that is inversely proportional to the distance between the first estimate of the position of the upper boundary of the first object and said reference position.
  • the weighting may involve further weighting factors in addition to the ones described hereinbefore, for example a weighting factor that takes the predicted direction of movement into account.
  • the image estimation unit 303 may be configured to determine the average as a weighted average where the first estimate of the position of the upper/right/left boundary of the first object is weighted by a factor having a value that is inversely proportional to the distance between the first estimate of the position of the upper/right/left boundary of the first object and said reference position, and the second estimate of the position of the upper/right/left boundary of the first object is weighted by a factor having a value that is inversely proportional to the distance between the first estimate of the position of the lower/left/right boundary of the first object and said reference position.
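  • A hedged sketch of such an inverse-distance weighted average is shown below; the epsilon guard and the one-dimensional treatment of the distances are illustrative assumptions.

```python
def weighted_boundary_estimate(est_1, est_2, anchor_1, anchor_2, reference):
    """Weighted average of two boundary estimates.

    Each estimate is weighted inversely to the distance between its
    anchoring coordinate (anchor_1, anchor_2) and the reference coordinate,
    so that estimates anchored closer to the reference position are trusted
    more.  A small epsilon avoids division by zero.
    """
    eps = 1e-9
    w1 = 1.0 / (abs(anchor_1 - reference) + eps)
    w2 = 1.0 / (abs(anchor_2 - reference) + eps)
    return (w1 * est_1 + w2 * est_2) / (w1 + w2)
```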
  • the process described hereinbefore involved determination of the first set, the second set and the shared set of pixel positions of the combined object in the current image, consequently enabling determination of the set of free pixel positions.
  • It may, however, happen that the first object fully covers the view to the second object in the current image (or vice versa; discussed here only for the first object covering the view to the second object without losing the generality of description).
  • the second set of pixel positions, i.e. the pixel positions of the combined object that are predicted to represent the second object only in the current image, may be an empty set. In case this happens in a number of consecutive images of the sequence of images, the prediction of the position of the second object in an image may become redundant.
  • the data record comprising information on the first object may be updated or complemented to include an indication that the second object is - at least temporarily - merged into the first object, while still keeping the data record comprising information on the second object in memory for subsequent use in case the second object departs from the first object in a later image of the sequence of images to enable further tracking and prediction of position of the second object based on the information already available for the second object.
  • the operations, procedures and/or functions assigned to the image analysis unit 301 , the image estimation unit 303 and the image prediction unit 305 described hereinbefore may be divided between the units in a different manner, or there may be further units to perform some of the operations, procedures and/or functions described hereinbefore for the above-mentioned units.
  • the operations, procedures and/or functions the image analysis unit 301, the image estimation unit 303 and the image prediction unit 305 are configured to perform may be assigned to a single processing unit within the apparatus 300 instead.
  • the apparatus 300 may comprise means for determining a first set of pixel positions as the pixel positions of the combined object that are predicted to represent the first object but that are not predicted to represent the second object, means for determining a vertical reference position in the first image on basis of at least one pixel position of the first set of pixel positions, means for determining a horizontal reference position in the first image on basis of at least one pixel position of the first set of pixel positions, and means for predicting the position of the first object in a subsequent image of the sequence of images on basis of the vertical and horizontal reference positions.
  • the apparatus 300 may further comprise means for outputting the predicted position of the first object in a subsequent image of the sequence of images.
  • a method 500 in accordance with an embodiment of an aspect of the invention is illustrated in Figure 5.
  • the method 500 may be arranged to predict a position of an object based on a first image of a sequence of images, wherein in said first image a first object and a second object at least partially overlap to form a combined object.
  • the method 500 may comprise obtaining image data of a first image of a sequence of images, wherein in the first image a first object and a second object at least partially overlap, thereby forming a combined object in the image plane.
  • the method 500 comprises determining a first set of pixel positions as the pixel positions of the combined object that are predicted to represent the first object but that are not predicted to represent the second object, as indicated in step 502.
  • the method 500 further comprises determining a vertical reference position in the first image on basis of at least one pixel position of the first set of pixel positions and determining a horizontal reference position in the first image on basis of at least one pixel position of the first set of pixel positions, as indicated in steps 504 and 506, respectively.
  • the method 500 further comprises predicting the position of the first object in a subsequent image of the sequence of images on basis of the vertical and horizontal reference positions, as indicated in step 508.
  • the method 500 may further comprise outputting the predicted position of the first object in a subsequent image of the sequence of images.
  • the apparatus 300 may be implemented as hardware alone, for example as an electric circuit, as a programmable or non-programmable processor, as a microcontroller, etc.
  • the apparatus 300 may have certain aspects implemented as software alone or can be implemented as a combination of hardware and software.
  • the apparatus 300 may be implemented using instructions that enable hardware functionality, for example, by using executable computer program instructions in a general-purpose or special-purpose processor that may be stored on a computer readable storage medium to be executed by such a processor.
  • the apparatus 300 may further comprise a memory as the computer readable storage medium the processor is configured to read from and write to.
  • the memory may store a computer program comprising computer-executable instructions that control the operation of the apparatus 300 when loaded into the processor.
  • the processor is able to load and execute the computer program by reading the computer-executable instructions from memory.
  • the processor may comprise one or more processors or processing units and the memory may comprise one or more memories or memory units. Consequently, the computer program may comprise one or more sequences of one or more instructions that, when executed by the one or more processors, cause an apparatus to perform steps implementing the operations, procedures and/or functions described in context of the apparatus 300.
  • Reference to a processor or a processing unit should not be understood to encompass only programmable processors, but also dedicated circuits such as field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), signal processors, etc.

Abstract

An arrangement for predicting a position of an object based on a first image of a sequence of images, wherein in said first image a first object and a second object at least partially overlap to form a combined object is provided. The arrangement comprises determining a first set of pixel positions as the pixel positions of the combined object that are predicted to represent the first object but that are not predicted to represent the second object, determining a vertical reference position in the first image on basis of a pixel of the first set of pixel positions, determining a horizontal reference position in the first image on basis of a pixel of the first set of pixel positions, and predicting the position of the first object in a subsequent image of the sequence of images on basis of the vertical and horizontal reference positions.

Description

A METHOD, AN APPARATUS AND A COMPUTER PROGRAM FOR PREDICTING A POSITION OF AN OBJECT IN AN IMAGE OF A SEQUENCE OF IMAGES
FIELD OF THE INVENTION
The invention relates to image processing. In particular, the invention relates to a method, an apparatus and a computer program for predicting a position of an object in an image of a sequence of images.
BACKGROUND OF THE INVENTION
Image analysis and processing techniques involved in analysis of images to identify an object and track the movement thereof based on images of a sequence of images are typically computationally demanding. Moreover, simultaneous identification and tracking of multiple objects where the objects in an image may be occasionally fully or partially overlapping is a challenging task, for which the existing techniques do not provide good performance at a reasonable computational complexity.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide a method, an apparatus and a computer program that facilitate computationally efficient yet accurate tracking of an object in images of a sequence of images.
The objects of the invention are reached by a method, an apparatus and a computer program as defined by the respective independent claims.
According to a first aspect of the invention, a method for predicting a position of an object based on a first image of a sequence of images, wherein in said first image a first object and a second object at least partially overlap to form a combined object is provided, the method comprising determining a first set of pixel positions as the pixel positions of the combined object that are predicted to represent the first object but that are not predicted to represent the second object, determining a vertical reference position in the first image on basis of a pixel of the first set of pixel positions, determining a horizontal reference position in the first image on basis of a pixel of the first set of pixel positions, and predicting the position of the first object in a subsequent image of the sequence of images on basis of the vertical and horizontal reference positions.
According to a second aspect of the invention, an apparatus for predicting a position of an object based on a first image of a sequence of images, wherein in said first image a first object and a second object at least partially overlap to form a combined object is provided, the apparatus comprising an image analysis unit configured to determine a first set of pixel positions as the pixel positions of the combined object that are predicted to represent the first object but that are not predicted to represent the second object, an image estimation unit configured to determine a vertical reference position in the first image on basis of a pixel of the first set of pixel positions, and to determine a horizontal reference position in the first image on basis of a pixel of the first set of pixel positions, and an image prediction unit configured to predict the position of the first object in a subsequent image of the sequence of images on basis of the vertical and horizontal reference positions.
According to a third aspect of the invention, a computer program is provided, the computer program comprising one or more sequences of one or more instructions which, when executed by one or more processors, cause an apparatus to at least perform a method in accordance with the first aspect of the invention.
The computer program may be embodied on a volatile or a non-volatile computer-readable record medium, for example as a computer program product comprising at least one computer readable non-transitory medium having program code stored thereon, the program code, which when executed by an apparatus, causes the apparatus at least to perform the operations described hereinbefore for the computer program in accordance with the third aspect of the invention.
The exemplifying embodiments of the invention presented in this patent application are not to be interpreted to pose limitations to the applicability of the appended claims. The verb "to comprise" and its derivatives are used in this patent application as an open limitation that does not exclude the existence of also unrecited features. The features described hereinafter are mutually freely combinable unless explicitly stated otherwise.

The novel features which are considered as characteristic of the invention are set forth in particular in the appended claims. The invention itself, however, both as to its construction and its method of operation, together with additional objects and advantages thereof, will be best understood from the following detailed description of specific embodiments when read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 illustrates a coordinate system used to describe an image.
Figure 2 illustrates a principle of the concept of estimating a size of an object in an image based on its distance from the bottom of the image.
Figure 3 illustrates an apparatus in accordance with an embodiment of the invention.
Figures 4a to 4c illustrate an example of processing pixel positions of a combined object in accordance with an embodiment of the invention.
Figure 5 illustrates a method in accordance with an embodiment of the invention.
DETAILED DESCRIPTION
Figure 1 illustrates a coordinate system used to describe an image plane 100 and an image 101 in the image plane 100 in this document. The image plane 100 can be considered to comprise a number of pixels, positions of which are determined by coordinates along a u axis and a v axis, and where the origin of the coordinate system determined by the u and v axes is at the center of the image 101 on the image plane 100. The origin, and even the directions of the axes, could naturally be selected differently; many conventional image processing applications place the origin in the top left corner and make the magnitude of the v coordinate increase downwards. For brevity and clarity of description, without losing generality, in the following a position along the u axis may be referred to as a horizontal position and a position along the v axis is referred to as a vertical position. Terms left and right may be used to refer to a position in the direction of the u axis, and terms up and down may be used to refer to a position in the direction of the v axis. Moreover, an extent of an object in the direction of the u axis is referred to as a width of the object and an extent of the object along the direction of the v axis is referred to as a height of the object.
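Since many image processing libraries index pixels from the top left corner with the row index increasing downwards, a small conversion between that convention and the center-origin (u, v) convention used in this document may be helpful. The following sketch is illustrative only; the function names, the rounding choice and the treatment of image dimensions are assumptions, not part of the disclosed arrangement.

```python
def uv_to_row_col(u, v, width, height):
    """Convert a center-origin (u, v) position (v increasing upwards) to a
    top-left-origin (row, col) index (row increasing downwards).
    Rounding and the handling of even image dimensions are assumptions."""
    col = int(round(u + width / 2))
    row = int(round(height / 2 - v))
    return row, col

def row_col_to_uv(row, col, width, height):
    """Inverse conversion, under the same assumptions."""
    return col - width / 2, height / 2 - row
```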
An image, such as the image 101, may be part of a sequence of images. A sequence of images is considered as a time-ordered set of images, where each image of a sequence of images has its predetermined temporal location within the sequence with known temporal distance to the immediately preceding and following images of the sequence. A sequence of images may originate from an imaging device such as a (digital or analog) still camera, from a (digital or analog) video camera, from a device equipped with a camera or a video camera module etc., configured to capture and provide a number of images at a predetermined rate, i.e. at predetermined time intervals. Hence, a sequence of images may comprise still images and/or frames of a video sequence.
For the purposes of efficient analysis and prediction of movement within images of a sequence of images, the images preferably provide a fixed field of view to the environment of the imaging device(s) employed to capture the images. Preferably, the images of a sequence of images originate from an imaging device that has a fixed position throughout the capture of the images of the sequence of images, thereby providing a fixed or essentially fixed field of view throughout the sequence of images. Consequently, any fixed element or object in the field of view of the imaging device remains at the same position in each image of the sequence of images. On the other hand, objects that are moving in the field of view may be present in only some of the images and may have a varying position in these images. However, it is also possible to generate a sequence of images representing a fixed field of view on basis of an original sequence of images captured by an imaging device that is not completely fixed but whose movement with respect to its position or orientation is known, thereby providing a field of view that may vary from one image to another. Assuming that the orientation and position of the imaging device for each of the images of the original sequence of images is known, it is possible to apply pre-processing to modify images of the original sequence of images in order to create a series of images having a fixed field of view. As an example, an imaging device may be arranged to overlook a parking lot, where the parking area, driveways to and from the parking area and the surroundings thereof within the field of view of the imaging device are part of the fixed portion of the images of the sequence of images, whereas a changing portion of the images of the sequence of images comprises e.g. people and cars moving within, to and from the parking area. As another example, an imaging device may be arranged to overlook a portion of an interior of a building, such as a shop or a store. In this other example the fixed portion of the images may comprise shelves and other structures arranged in the store and the items arranged thereon, whereas the changing portion of the images may comprise e.g. the customers moving in the store within the field of view of the imaging device.
For the purposes of efficient analysis and prediction of movement within images of a sequence of images, an imaging device employed to capture the images is preferably positioned in such a way that the camera horizon is in parallel with the plane horizon, consequently resulting in the horizon level in the image plane being an imaginary line that is in parallel with the u axis. Hence, the horizon level within the image plane may be considered as an imaginary horizontal line at a certain distance from the u axis - or from an edge of the image, the certain distance being dependent on the vertical orientation of the imaging device. In case the vertical orientation of the imaging device does not enable representing the horizon level within the captured image, the horizon level may be considered as an imaginary line in the image plane that is in parallel with the u axis but which is outside the image. In case an imaging device is positioned such that there is a non-zero angle of known value between the camera horizon and the plane horizon, preprocessing of images of the captured sequence of images may be applied in order to modify the image data to compensate for the said angle to provide a sequence of images where the horizon can be represented as an imaginary line that is in parallel with the u axis of the image plane.
With the assumption that the images of a sequence of images represent a fixed field of view, an object moving in the field of view may be detected by observing any changes between (consecutive) images of the sequence. As an example, an object in an image, i.e. a set of pixel positions in an image, may be identified by comparing the image to a reference image comprising only the fixed portion of the field of view of the images of a sequence of images and identifying the set of pixels that is not present in the reference image. An object in an image may be determined by indicating its position in the image plane together with its shape and/or size in the image plane, all of which may be expressed using the u and v coordinates of the image plane.
Once an object is detected in an image of a sequence of images, a data record comprising information on the object may be created. The information may comprise for example the current and previous positions of the object, the current and/or previous shape(s) of the object, the current and previous size(s) of the object, an identifier of the object and/or any further suitable data that can be used to characterize the object.
In case multiple objects are detected in an image, a dedicated data record may be created and/or updated for each of the objects.
An object moving within the field of view of the imaging device is typically depicted in two or more images of the sequence of images. An object detected in an image can be identified as the same object already detected in a previous image of the sequence by comparing the characteristics - e.g. with respect to the shape of the object - of the object detected in the image to the characteristics of an object detected in a previous image (e.g. as stored in a corresponding data record).
Hence, it is possible to track the movement of the object by determining its position in a number of images of the sequence of images and by characterizing the movement on basis of the change in its position over a number of images. In this regard, the information on the position(s) of the object in a number of images may be stored in the data record comprising information on the object in order to enable subsequent analysis and determination of a movement pattern of the object.
Due to the movement, the positions of two objects detected or identified as separate objects in a previous image of the sequence of images may overlap, fully or in part, in an image of the sequence. Consequently, such two objects may merge into a combined object for one or more images of the sequence, while they may again separate as individually identifiable first and second objects in a subsequent image of the sequence. In a similar manner, an object initially identified as a single individual object, e.g. at or near a border of an image of the sequence, may in a subsequent image separate as two individual objects spawned from the initial single object. Information indicating merging of two objects into a combined object and/or separation of a (combined) object into two separate objects may be kept in the data record comprising information on the object in order to facilitate analysis of the evolution of the object(s) within the sequence of images.
While it would be possible to separately determine a position of each pixel in an image representing a given object, in case of an object whose shape or approximation thereof is known, e.g. based on a data record comprising information on the object, it is sufficient to determine the position of the group of pixels representing the object in an image as a single position. Such determination of position is applicable, in particular, to objects having a fixed shape or having a shape that only slowly evolves in the image plane, resulting in only a small change in shape of the object from one image to another. A position of an object whose shape or approximation thereof is known may be determined or expressed for example as the position(s) of one or more predetermined parts of the object in the image plane. An example of such a predetermined part is a pixel position indicating a geographic center point of the object, thereby - conceptually - indicating a center of mass of the object (with the assumption that each pixel position representing the object represents an equal 'mass'). The geographic center point of an object in an image may be determined for example as the average of the coordinates of the pixel positions representing the object in the image. Another example for using predetermined part(s) of an object to indicate a position of the object in an image involves determining at least one of a lower boundary and an upper boundary together with at least one of a left boundary and a right boundary of an imaginary rectangle enclosing the pixel positions representing the object by touching the lowermost, the uppermost, the leftmost and the rightmost pixel positions representing the object. Such a rectangle may be referred to as a bounding box. The lower and upper boundaries may be expressed as a v coordinate, i.e. as a position in the v axis, whereas the left and right boundaries may be expressed as a u coordinate, i.e. a position in the u axis. Consequently, the position of an object may be expressed for example by a coordinate of the u axis indicating the left boundary of a bounding box enclosing the object and by a coordinate of the v axis indicating the lower boundary of the bounding box. This is equivalent to expressing the coordinates of the pixel position indicating the lower left corner of the (rectangular) bounding box. In principle the bounding box does not need to have an exactly rectangular shape; it is possible to use e.g. a bounding circle just large enough to enclose all pixels of the object, or a bounding oval with its u and v dimensions selected to match those of the object. However, a rectangular bounding box is the most common and most easily handled in processing.
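To illustrate the two ways of expressing a position described above, the sketch below computes the geographic center point and a rectangular bounding box from a set of pixel positions. It is an illustrative assumption that an object is represented as a collection of (u, v) coordinate tuples; the function names are hypothetical.

```python
def center_point(pixels):
    """Geographic center point of an object: the average of the pixel
    coordinates, conceptually its center of mass when every pixel is
    assigned an equal 'mass'."""
    n = len(pixels)
    return (sum(u for u, _ in pixels) / n,
            sum(v for _, v in pixels) / n)

def bounding_box(pixels):
    """Rectangular bounding box touching the leftmost, rightmost,
    lowermost and uppermost pixel positions of the object."""
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    return {"left": min(us), "right": max(us),
            "lower": min(vs), "upper": max(vs)}
```

The position of the object could then be reported, for example, as the pair formed by the left and lower boundaries of the bounding box, i.e. its lower left corner.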
A size of an object in an image may be expressed for example by its dimension(s) along the axis or axes of the image plane. Thus, a size of an object in an image may be expressed as its extent in the direction of the v axis, i.e. as the height of the object in the image. Alternatively or additionally, a size of an object in an image may be expressed as its extent in the direction of the u axis, i.e. as the width of the object in the image. Such information may be derived for example with the aid of a bounding box, as described hereinbefore. A further alternative for expressing the size of the object is to indicate either the height or the width of an object, e.g. as a height or width of a bounding box enclosing the object, together with an aspect ratio determining the relationship between the height and width of the object. Since the size of an object as represented in an image may vary over time, the data record comprising information on an object may be employed to keep track of the current (or most recent) size of the object and possibly also of the size of the object in a number of previous images.
A shape of an object can be expressed for example by a set of pixel positions or as a two-dimensional 'bitmap' indicating the pixel positions forming the object. Such information may be stored in a data record comprising information on the object. The information regarding the shape of the object may include the current or most recent observed shape of the object and/or the shape of the object in a number of preceding images of the sequence.
Figure 2 schematically illustrates two images 201, 203 of a sequence of images, the images schematically illustrating a real-world reference object moving along a plane that is essentially horizontal. Note that only changing portions of the images are illustrated in the images 201, 203, thereby omitting any possible fixed portion (or background objects) of the images for clarity of illustration.
The image 201 illustrates the real-world object as an object 205 having a height hv1 and a width wv1 with its lower edge situated at position vb1 on the v axis of the image plane. The image 203 illustrates the real-world object as an object 205' having a height hv2 and a width wv2 with its lower edge situated at position vb2 on the v axis of the image plane. Moreover, a level representing the horizon 207 is assumed to be a line that is parallel to the u axis - and also parallel to the lower and upper edges of the images 201 and 203. The real-world object in image 201 is closer to the imaging device than in the image 203, and hence the object is depicted in the image 201 as larger than in the image 203. In particular, the height hv1 of the object 205 in the image 201 is larger than the height hv2 of the object 205', and the width wv1 of the object 205 in the image 201 is larger than the width wv2 of the object 205' in the image 203. Moreover, since the real-world object depicted in the images 201 and 203 was moving along an essentially horizontal plane, due to the object 205 in the image 201 being closer to the imaging device than the corresponding object 205' in the image 203, the object 205 in the image 201 is closer to the bottom of the image than the object 205' in the image 203. This can be generalized into a rule that a real-world object closer to the imaging device appears closer to the bottom of the image than the same real-world object - or another real-world object of identical or essentially identical size - situated further away from the imaging device. In a similar manner, a real-world object closer to the imaging device appears larger in an image than the same real-world object - or another real-world object of identical or essentially identical size - situated further away from the imaging device. Moreover, the point, either actual or conceptual, where the size of a real-world object in an image would appear zero or essentially zero, represents the point in the image - e.g. a level of a line parallel to the u axis of the image plane - representing a horizon in the image.
Therefore, a real-world object exhibiting movement towards or away from the imaging device - i.e. towards or away from the horizon - is typically depicted as an object of different size and different distance from the bottom of an image in two images of a sequence of images captured using an imaging device arranged to capture a sequence of images with a fixed field of view. Consequently, it is possible to determine a mapping function configured to determine a size of an object, e.g. a height of the object, in an image on basis of a vertical position of the object within the image. The determined position of the object in two or more previous images may be used to predict the position of the object in a subsequent image. In the following, a straightforward example of the principle of predicting the position of an object is provided. The position of an object in image n may be expressed as a pair of u and v coordinates (un, vn), and the position of the object in image n+1 may be expressed as (un+1, vn+1). Hence, the change in position between the images n and n+1 can be expressed as (ud, vd) = (un+1 - un, vn+1 - vn), thereby indicating the motion of the object in the image plane between two consecutive images of the sequence of images. Consequently, the position of the object in image n+2 may be predicted based on the position of the object in image n+1 and the above-mentioned change in position as (u'n+2, v'n+2) = (un+1 + ud, vn+1 + vd). Since the change in position is not typically fully constant and hence the prediction may not be fully accurate, the actual position of the object in image n+2, expressed as (un+2, vn+2), may be different from the predicted one, resulting in a prediction error of (ue2, ve2) = (un+2 - u'n+2, vn+2 - v'n+2). Moreover, the change in position of the object between two consecutive images may be updated into (ud, vd) = (un+2 - un+1, vn+2 - vn+1) to enable prediction based on the most recently observed change of position between two consecutive images. As a further example, the change in position of an object used in the prediction may be determined on basis of a number of observed changes in position of the object in a number of pairs of two consecutive images, e.g. as an average of the change in position of the object in a predetermined number of most recent pairs of consecutive images or as a linear combination of the observed change in position of the object in a predetermined number of most recent pairs of consecutive images. The exemplifying prediction described hereinbefore jointly predicts the change of position in the directions of the u and v axes. Alternatively, it is possible to predict the change of position in the direction of the u axis and the change of position in the direction of the v axis separately from each other, thereby allowing a straightforward use of different prediction schemes for the predictions in the directions of the two axes. A prediction as described hereinbefore may be applied to a number of pixels representing an object in an image - for example to all pixels representing an object or to a subset thereof - or the prediction may be applied to a single pixel chosen to represent the object (as described hereinbefore). While the discussion in the foregoing describes the concept of prediction by using a straightforward example, it is readily apparent that more elaborate prediction schemes can be determined and applied without departing from the scope of the embodiments of the invention.
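The straightforward prediction scheme described above may be sketched in code as follows; the averaging over a number of most recent displacements corresponds to one of the alternatives mentioned in the text, and the function name and the history parameter are illustrative assumptions.

```python
def predict_next_position(positions, history=1):
    """Hedged sketch of the prediction principle described above.

    positions -- list of (u, v) positions of the object in images ..., n-1, n
    history   -- number of most recent displacements to average (assumed)
    """
    if len(positions) < 2:
        return positions[-1]  # no motion information available yet
    # Displacements between consecutive images of the sequence.
    deltas = [(u2 - u1, v2 - v1)
              for (u1, v1), (u2, v2) in zip(positions[:-1], positions[1:])]
    recent = deltas[-history:]
    ud = sum(d[0] for d in recent) / len(recent)
    vd = sum(d[1] for d in recent) / len(recent)
    u_n, v_n = positions[-1]
    return u_n + ud, v_n + vd
```

For example, predict_next_position([(10, 4), (12, 5), (14, 6)]) would return (16.0, 7.0), i.e. the last observed position shifted by the most recently observed displacement.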
Figure 3 schematically illustrates an apparatus 300 for predicting a position of an object based on a first image of a sequence of images in accordance with an embodiment of an aspect of the invention.
The apparatus 300 comprises an image analysis unit 301, an image estimation unit 303 and an image prediction unit 305. The image analysis unit 301 may also be referred to as an image analyzer or as an object analyzer, the image estimation unit 303 may also be referred to as an image estimator or an object estimator, and the image prediction unit 305 may also be referred to as an image predictor or an object predictor.
The image analysis unit 301 is operatively coupled to the image estimation unit 303 and to the image prediction unit 305. The image estimation unit 303 is operatively coupled also to the image prediction unit 305.
The apparatus 300 may comprise further components, such as a processor, a memory, a user interface, a communication interface, etc.
The apparatus 300 may be configured, in particular, to predict a position of an object in a subsequent image of a sequence of images based on an image of the sequence, wherein in said image a first object and a second object at least partially overlap to form a combined object. In the following, for clarity and brevity of description, we simply refer to such an image illustrating two objects at least partially overlapping as the current image (of the sequence of images).
In this regard, the image analysis unit 301 may be configured to obtain information indicating the pixel positions of the current image representing a predicted first object and to obtain information indicating the pixel positions of the current image representing a predicted second object. Obtaining such information may comprise directly obtaining sets of pixel positions representing the first object and the second object. Alternatively, obtaining may comprise deriving the sets of pixel positions on basis of predicted positions, predicted sizes and predicted or known shapes of the first and second objects in the current image. The information indicating the pixel positions predicted to represent the first and second objects may be derived from the image prediction unit 305, either directly or via an intermediate processing unit (not shown in Figure 3) that may be available for transforming the information on the predicted positions, predicted sizes and predicted or known shapes of the predicted first and second objects into sets of pixel positions in the current image.
Obtaining indication of a combined object in an image may comprise analysis of the image data of the current image and/or in one or more preceding images of the sequence of images in order to detect presence of a combined object in the image. Alternatively, obtaining indication of a combined object in an image may comprise receiving an indication on the current image comprising a combined object from an intermediate processing unit configured to analyze an image in order to detect presence of a combined object in an image.
The current image may be identified as an image comprising a combined object for example in case the predicted first object and the predicted second object in the current image overlap at least partially, i.e. at least some pixel positions are predicted to represent both the first object and the second object in the current image and objects corresponding to the first and second objects cannot be identified as separate objects in the current image. As another example, the analysis may involve a search for an object matching or approximating the shape of the union of the pixel positions predicted to represent the first and second objects. As a further example, the analysis may involve a search for - and detection of - an object that is substantially different from the (known or predicted) shape of the first object and from the (known or predicted) shape of the second object in or near the portion of an image where the first and second objects are predicted to occur.
The analysis may involve analysis of the image as a whole or analysis of a portion of the image. In particular, the analysis may concentrate on a portion of the image where, according to an employed prediction scheme, the first and the second objects are likely to overlap and hence likely to result in a combined object.
The combined object identified in the current image occupies a certain set of pixel positions. This set of pixel positions may be expressed for example by indicating the coordinates in the image plane occupied by these pixel positions, thereby fully specifying the position, the size and the shape of the combined object within the current image.
The image estimation unit 303 is configured to obtain or determine information indicating the pixel positions of the combined object that are predicted to represent the first object only in the current image, i.e. to determine a set of pixel positions of the combined object that are predicted to represent the first object but that are not predicted to represent the second object. This set of pixel positions may be referred to as a first set of pixel positions. The image estimation unit 303 may be configured to determine such information for example by obtaining information indicating the set of pixel positions representing the combined object and the set of pixel positions representing the first object, and identifying the first set of pixel positions as the intersection of these two sets of pixel positions. The image estimation unit 303 may be further configured to obtain or determine information indicating the pixel positions of the combined object that are predicted to represent the second object only in the current image, i.e. to determine a set of the pixel positions of the combined object that are predicted to represent the second object but that are not predicted to represent the first object. This set of pixel positions may be referred to as a second set of pixel positions.
The image estimation unit 303 may be configured to determine such information for example by obtaining information indicating the set of pixel positions representing the combined object and the set of pixel positions representing the second object, and identifying the second set of pixel positions as the intersection of these two sets of pixel positions.
Figure 4a illustrates an example of a combined object 402 within a portion of an image of a sequence of images. Each square in the example of Figure 4a represents a pixel position, where the pixel positions illustrated as white belong to the background (i.e. to the fixed portion of the image), whereas the pixel positions illustrated as black belong to the combined object 402.
Figure 4b illustrates an example of the pixel positions of the combined object 402 predicted to represent the first object only, i.e. the first set of pixel positions 404 and the pixel positions of the combined object 402 predicted to represent the second object only, i.e. the second set of pixel positions 406.
The image estimation unit 303 is configured to determine a vertical reference position in the first image on basis of at least one pixel position of the first set of pixel positions. The vertical reference position may be determined, for example, to indicate a lower boundary of the first object, an upper boundary of the first object, or both. As an example, the vertical reference position may be determined to indicate the lower boundary of the first object, and the lower boundary may be determined as a vertical position of the lowermost pixel position of the first set of pixel positions. In a similar manner, as another example, the vertical reference position may be determined to indicate the upper boundary of the first object, and the upper boundary may be determined as a vertical position of the uppermost pixel position of the first set of pixel positions.
The image estimation unit 303 is configured to determine a horizontal reference position in the first image on basis of at least one pixel position of the first set of pixel positions. The horizontal reference position may be determined, for example, to indicate a left boundary of the first object, a right boundary of the first object, or both. As an example, the horizontal reference position may be determined to indicate the left boundary of the first object, and the left boundary may be determined as a horizontal position of the leftmost pixel position of the first set of pixel positions. In a similar manner, as another example, the horizontal reference position may be determined to indicate the right boundary of the first object, and the right boundary may be determined as a horizontal position of the rightmost pixel position of the first set of pixel positions. Determining the boundaries of the first object corresponds to using a bounding box to (conceptually) enclose the pixel positions of the first object in the current image, as described hereinbefore. Consequently, the lower, upper, left and right sides of the bounding box correspond to the lower, upper, left and right boundaries of the first object, respectively.
The image estimation unit 303 may be configured to determine second vertical and horizontal reference positions for the second object on basis of the second set of pixel positions in a manner similar to that described hereinbefore for the first object. The image prediction unit 305 is configured to predict the position of the first object in a subsequent image of the sequence of images on basis of the vertical and horizontal reference positions. In other words, the vertical and horizontal reference positions are considered to represent the estimated position of the first object, for example, for the purpose of predicting the position of the first object in a subsequent image of the sequence. The subsequent image may be, for example, the image immediately following the current image in the sequence. Any prediction method suitable for estimating a position of an object in an image of a sequence of images based on its position in one or more other images of the sequence, for example a prediction method along the lines described hereinbefore, may be employed.
The prediction may comprise prediction of the position of the vertical reference position and the horizontal reference position in a subsequent image. The predicted vertical and horizontal reference positions in the subsequent image can be considered to indicate the predicted position of the boundaries of a set of pixel positions representing the first object in the subsequent image. Consequently, the predicted vertical position of the first object may be employed together with a predetermined mapping function configured to determine a height of an object in an image on basis of a vertical position of the object within the image to determine the estimated height of the first object in the subsequent image. The same mapping function, or a separate mapping function of similar type, may also be configured to determine a width of the object on basis of the vertical position of the object in an image, or a predetermined aspect ratio (stored, for example, in the data record comprising information on the first object) may be used to estimate the width of the first object on basis of the estimated height thereof. Together with the known or estimated shape of the object (found, for example, in the data record comprising information on the first object) the prediction of the first object in the subsequent image is fully specified.
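As an illustration of how a predicted vertical reference position, a mapping function and a predetermined aspect ratio could together specify the predicted extent of the object, consider the following sketch. The mapping function height_from_v and the aspect ratio are assumed to be supplied by the caller; the specific mapping function referred to above is not reproduced here.

```python
def predicted_extent(u_ref_pred, v_ref_pred, height_from_v, aspect_ratio):
    """Hedged sketch: estimate the extent of the predicted first object.

    u_ref_pred, v_ref_pred -- predicted horizontal/vertical reference
                              positions (here: left and lower boundary)
    height_from_v          -- assumed mapping: vertical position -> height
    aspect_ratio           -- stored width/height ratio of the object
    """
    height = height_from_v(v_ref_pred)   # estimated height in the subsequent image
    width = aspect_ratio * height        # estimated width via the aspect ratio
    # Bounding box of the predicted object, anchored at the predicted
    # reference positions; v increases upwards in this coordinate system.
    return {"left": u_ref_pred, "right": u_ref_pred + width,
            "lower": v_ref_pred, "upper": v_ref_pred + height}
```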
An example of a predetermined mapping function configured to determine a size of an object, e.g. a height of the object, as depicted in an image on basis of a vertical position of the object in the image plane and determination thereof is described in the Finnish patent application No. 20125275.
The image prediction unit 305 may be configured to predict the position of the second object in a subsequent image of the sequence of images in a manner similar to that described hereinbefore for the first object.
The image prediction unit 305 may be further configured to output the predicted position of the first object in a subsequent image of the sequence of images. The image prediction unit 305 may be configured, for example, to provide information indicating the predicted position to the image analysis unit 301 or to another processing unit to facilitate the processing and analysis of subsequent images in the sequence of images.
The image estimation unit 303 may be further configured to determine enlarged versions of the predicted first and second objects. In particular, the image estimation unit 303 may be configured to enlarge the first set of pixel positions to occupy some of the pixel positions of the combined object that are not predicted to represent the first object. Additionally or alternatively, the image estimation unit 303 may be configured to enlarge the second set of pixel positions to occupy some of the pixel positions of the combined object that are not predicted to represent the second object. In order to describe examples of enlarging an object, we may (conceptually) define two further sets of pixel positions associated with the combined object.
The image estimation unit 303 may be further configured to obtain information indicating the pixel positions of the combined object that are predicted to represent both the first object and the second object in the current image. This set of pixel positions may be referred to as a set of shared pixel positions. The image estimation unit 303 may be configured to determine such information for example by obtaining information indicating the set of pixel positions representing the combined object, the set of pixel positions representing the first object and the set of pixel positions representing the second object, and identifying the set of shared pixel positions as the intersection of these three sets of pixel positions.
The image estimation unit 303 may be further configured to obtain information indicating the pixel positions of the combined object that are not predicted to represent either the first object or the second object in the current image. This set of pixel positions may be referred to as a set of free pixel positions. The image estimation unit 303 may be configured to determine such information for example by obtaining information indicating the set of pixel positions representing the combined object, the first set of pixel positions, the second set of pixel positions and the set of shared pixel positions, and determining the set of free pixel positions as the pixel positions of the combined object not included in the first, the second or the shared set of pixel positions.
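The first, second, shared and free sets of pixel positions described above can be expressed compactly with ordinary set operations, as in the following sketch; it assumes, for illustration only, that the pixel positions are available as Python sets of (u, v) tuples.

```python
def partition_combined_object(combined, pred_first, pred_second):
    """Hedged sketch: split the pixel positions of a combined object into
    the first, second, shared and free sets described in the text.

    All arguments are sets of (u, v) coordinate tuples (an assumption
    made for illustration)."""
    shared = combined & pred_first & pred_second          # both objects
    first_set = (combined & pred_first) - pred_second     # first object only
    second_set = (combined & pred_second) - pred_first    # second object only
    free = combined - first_set - second_set - shared     # neither object
    return first_set, second_set, shared, free
```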
As an example, the image estimation unit 303 may be configured to enlarge the first set of pixel positions and/or the second set of pixel positions to occupy at least some of the pixel positions of the combined object included neither in the first set nor in the second set. In particular, the image estimation unit 303 may be configured to enlarge the first and/or second sets of pixel positions to occupy at least some of the pixel positions of the set of free pixel positions and/or some of the pixel positions of the set of shared pixel positions to result in a modified first set of pixel positions and in a modified second set of pixel positions, respectively.
As another example, the image estimation unit 303 may be configured to jointly enlarge the first and second sets of pixel positions to occupy at least some of the pixel positions of the set of free pixel positions and/or some of the pixel positions of the set of shared pixel positions.
The image estimation unit 303 may be configured to enlarge the first and/or second sets of pixel positions such that all pixel positions of the set of shared pixel positions are included either in the modified first set of pixel positions or in the modified second set of pixel positions. Consequently, the modified first and second sets of pixel positions together with the set of shared pixel positions would fully occupy the pixel positions of the combined object in the current image. The image estimation unit 303 is preferably configured to determine the enlarged versions of the predicted first and second objects by extending the first and second sets of pixel positions to occupy pixel positions of the set of shared pixel positions such that the enlarged first and second sets of pixels (continue to) represent objects occupying pixel positions forming continuous areas in the image plane.
In the following, a first example enlarging algorithm suitable for enlarging the first and/or second sets of pixel positions is provided.
a) Determine an intermediate set of pixel positions as the pixel positions of the set of free pixel positions that are adjacent to a pixel position of the first set or to a pixel position of the second set, and remove the pixel positions of the intermediate set from the set of free pixel positions;
b) Add pixel positions of the intermediate set adjacent to a pixel position of the first set but not adjacent to a pixel position of the second set to the first set and remove the added pixel positions from the intermediate set;
c) Add pixel positions of the intermediate set adjacent to a pixel position of the second set but not adjacent to a pixel position of the first set to the second set and remove the added pixel positions from the intermediate set;
d) Randomly add the remaining pixel positions of the intermediate set either to the first set or to the second set;
e) In case there are pixel positions remaining in the set of free pixel positions, repeat the steps from a) to d); otherwise the enlarging process is complete.
The first example described hereinbefore results in all pixel positions of the set of free pixel positions being added either to the first set or to the second set of pixel positions.
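A minimal sketch of the first example enlarging algorithm is given below. The 4-neighbour adjacency, the layer-wise snapshot of the sets and the random tie-breaking of step d) are implementation assumptions made for illustration; an actual implementation may differ in these respects.

```python
import random

def _neighbours(p):
    """4-neighbourhood of a pixel position (u, v); an illustrative choice."""
    u, v = p
    return {(u + 1, v), (u - 1, v), (u, v + 1), (u, v - 1)}

def enlarge_binary(first_set, second_set, free_set):
    """Hedged sketch of the first example enlarging algorithm, steps a) to e)."""
    first, second, free = set(first_set), set(second_set), set(free_set)
    while free:
        # a) free pixel positions adjacent to the first or the second set
        intermediate = {p for p in free
                        if _neighbours(p) & first or _neighbours(p) & second}
        if not intermediate:
            break  # remaining free pixels touch neither set; stop growing
        free -= intermediate
        # adjacency is evaluated against the sets as they were at step a)
        first_ref, second_ref = set(first), set(second)
        for p in intermediate:
            near_first = bool(_neighbours(p) & first_ref)
            near_second = bool(_neighbours(p) & second_ref)
            if near_first and not near_second:        # step b)
                first.add(p)
            elif near_second and not near_first:      # step c)
                second.add(p)
            else:                                     # step d): random assignment
                (first if random.random() < 0.5 else second).add(p)
        # step e): repeat while free pixel positions remain
    return first, second
```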
As a modification of the first example enlarging algorithm described above, a second example enlarging algorithm may be provided by modifying the steps a) and e) to consider both the set of free pixel positions and the set of shared pixel positions instead of only considering the set of free pixel positions. Consequently, the second example enlarging algorithm enlarges the first and second sets of pixel positions to occupy also all pixel positions of the set of shared pixel positions. The process of enlarging the first and second sets of pixels may involve a watersheding process according to a predetermined watersheding algorithm. In this document the term watersheding is used to refer to an enlarging algorithm where two or more objects competitively extend their current areas to a predetermined area or areas according to a predetermined rule or a set of rules. Such predetermined rule(s) are configured to make a first object entitled to extend its current area to portions of the predetermined area(s) that is/are considered to be closer, according to a predetermined distance measure, to the first object than to the other objects competing to enlarge their areas to the predetermined area(s). The first and second example algorithms described hereinbefore may be considered as watersheding (type of) algorithms.
In particular, the first and second example algorithms described hereinbefore may be considered as binary algorithms that basically consider all 'occupied' pixel positions, i.e. the pixel positions of the first and second sets, to represent the same color, hence corresponding to a binary watersheding applied to a black and white image.
Alternatively, for example a watersheding algorithm or an enlarging algorithm of other type, suitable for a grey scale image, may be employed instead. An example of such an algorithm is Meyer's flooding algorithm known in the art, while a number of further examples of watersheding algorithms are known in the art. With reference to the first and second sets of pixel positions and the set of free pixel positions, an algorithm operating along the lines of Meyer's algorithm is outlined in the following.
A) Insert pixel positions of the set of free pixel positions adjacent to the first or second set of pixel positions to a priority queue and remove the inserted pixel positions from the set of free pixel positions. A pixel position of the priority queue has a priority corresponding to the grey level of the pixel position, e.g. the value of the pixel.
B) Extract the pixel position with the highest priority from the priority queue.
In case all pixel positions adjacent to the extracted pixel position belong to the first set of pixel positions the extracted pixel position is added to the first set of pixel positions, whereas in case all pixel positions adjacent to the extracted pixel position belong to the second set of pixel positions the extracted pixel position is added to the second set of pixel positions. C) In case there are pixel positions remaining in the set of free pixel positions, repeat the steps A) and B), otherwise the enlarging process is complete.
At the end of the process described above, the pixel positions of the original set of free pixel positions that were not added either to the first set or to the second set (in step B)) form the watershed lines. The process described above may be modified by introducing a further step comprising randomly adding the pixel positions forming the watershed lines either to the first set of pixel positions or to the second set of pixel positions. Instead of watersheding, any suitable process may be employed to extend the first and second sets of pixel positions into the set of free pixel positions and/or the set of shared pixel positions in order to determine modified first and second sets of pixel positions representing enlarged predicted first and second objects in the current image, respectively. While for example the first example enlarging algorithm described hereinbefore aims to enlarge the first and second sets of pixel positions to occupy all pixel positions of the set of free pixel positions, the enlarging process may be configured to leave some of the pixel positions of the set of free pixel positions unoccupied. As a modification of the first or second example enlarging algorithm described above, a third example enlarging algorithm may be provided by replacing the step e) with the step e') described in the following.
e') In case the number of pixel positions remaining in the set of free pixel positions exceeds a predetermined threshold, repeat the steps from a) to d); otherwise the enlarging process is complete.
Consequently, the enlarging process is continued only until at least a certain percentage of the pixels of the set of free pixel positions closest to the pixel positions of the first or second set have been added either to the first set or to the second set of pixel positions. As a fourth example enlarging algorithm, the steps a) to e) of the first, the second or the third example are repeated for a predetermined number of times.
Figures 4b and 4c illustrate a process of enlarging the first and second objects by an example. Figure 4b further illustrates pixels of the combined object allocated to three portions of the pixels 408, 408', 408" representing the set of free pixel positions and the set of shared pixel positions 410. In this example an enlarging process results in the three uppermost pixel positions of the portion 408 of the set of free pixel positions and all pixel positions of the portion 408" of the set of free pixel positions being added to the first set of pixel positions 404, thereby resulting in the modified first set of pixel positions 404' illustrated in Figure 4c. Similarly, the exemplifying enlarging process results in the lowermost pixel position of the portion 408 of the set of free pixel positions and all pixel positions of the portion 408' of the set of free pixel positions being added to the second set of pixel positions 406, thereby resulting in the modified second set of pixel positions 406' illustrated in Figure 4c.
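The grey-scale flooding outlined in steps A) to C) above could be sketched along the following lines; the use of a min-heap (lower grey values extracted first), the 4-neighbour adjacency and the labelling test are illustrative assumptions, and the sketch is not intended as a faithful reproduction of Meyer's algorithm.

```python
import heapq

def flood_greyscale(first_set, second_set, free_set, grey):
    """Hedged sketch of the priority-queue flooding of steps A) to C).

    grey maps a pixel position (u, v) to its grey level; a min-heap is
    used here, so lower grey values are extracted first (an assumed
    priority convention)."""
    first, second = set(first_set), set(second_set)
    free = set(free_set)
    heap = []

    def neighbours(p):
        u, v = p
        return [(u + 1, v), (u - 1, v), (u, v + 1), (u, v - 1)]

    def enqueue_free_neighbours(p):
        for q in neighbours(p):
            if q in free:
                free.discard(q)
                heapq.heappush(heap, (grey[q], q))

    # Step A: queue the free pixel positions adjacent to either set.
    for p in list(first | second):
        enqueue_free_neighbours(p)

    # Steps B and C: extract queued pixels in priority order and label a
    # pixel only when every already-labelled neighbour carries one label.
    while heap:
        _, p = heapq.heappop(heap)
        touches_first = any(q in first for q in neighbours(p))
        touches_second = any(q in second for q in neighbours(p))
        if touches_first and not touches_second:
            first.add(p)
            enqueue_free_neighbours(p)
        elif touches_second and not touches_first:
            second.add(p)
            enqueue_free_neighbours(p)
        # pixels touching both sets are left unlabelled and correspond
        # to the watershed lines mentioned in the text
    return first, second
```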
The first and second sets of pixel positions, or the modified first and second sets of pixel positions, and the vertical and horizontal reference positions derivable therefrom can be assumed to be indicative of the predicted positions of the first and second objects in the current image. However, due to the overlap of the first and second objects in the current image, the position information may not be considered fully reliable for the purpose of accurate prediction of the position of the first and/or the second object in a subsequent image.
In this regard, the predetermined mapping function configured to determine a height of an object in an image on basis of a vertical position of the object within the image, as already referred to hereinbefore, may be employed to refine estimated vertical and/or horizontal reference positions to further improve the accuracy of the prediction.
The image estimation unit 303 may be configured to determine a refined estimate of the lower boundary of the first object, which lower boundary may be used as the vertical reference position.
The determination of the refined estimate of the lower boundary may comprise determining a first estimate of the position of the lower boundary of the first object as the lowermost pixel position of the first set of pixel positions. Instead of using the lowermost pixel position of the first set, another pixel position representative of the lower boundary of the first object may be used to indicate the lower boundary of the first object. This may be useful for example to exclude outliers in the (modified) first set, caused for example by errors or disturbances in the current image, from unduly distorting the estimate. The determination of the refined estimate of the lower boundary may further comprise determining a first estimate of the position of the upper boundary of the first object as the uppermost pixel position of the first set of pixel positions. Like in the case of estimation of the lower boundary, also the upper boundary of the first object may be based on a pixel position other than the uppermost pixel in case the uppermost pixel position of the first set is not considered a suitable representative of the upper boundary of the first object. The determination of the refined estimate of the lower boundary may further comprise using the aforementioned predetermined mapping function to determine a first estimate of the height of the first object on basis of the first estimate of the position of the upper boundary of the first object. The determination of the refined estimate of the lower boundary may further comprise determining a second estimate of the position of the lower boundary of the first object on basis of the first estimate of the position of the upper boundary of the first object and on the first estimate of the height of the first object, e.g. by subtracting the first estimate of the height from the v axis coordinate indicating the first estimate of the position of the upper boundary. The refined estimate may then be determined as an average of the first and second estimates of the position of the lower boundary of the first object, and the refined estimate may be employed for example as the vertical reference position indicative of the position of the lower boundary of the first object in the current image.
Alternatively or additionally, the image estimation unit 303 may be configured to determine a refined estimate of the upper boundary of the first object, which may be used as the vertical reference position.
The determination of the refined estimate of the upper boundary may comprise determining a first estimate of the position of the lower boundary of the first object as the lowermost pixel position of the first set of pixel positions, and determining a first estimate of the position of the upper boundary of the first object as the uppermost pixel position of the first set of pixel positions. The considerations provided hereinbefore for using a pixel position other than the lowermost or the uppermost one of the first set apply. The determination of the refined estimate of the upper boundary may further comprise using said predetermined mapping function to determine a first estimate of the height of the first object on basis of the first estimate of the position of the lower boundary of the first object. The determination of the refined estimate of the upper boundary may further comprise determining a second estimate of the position of the upper boundary of the first object on basis of the first estimate of the position of the lower boundary of the first object and on the first estimate of the height of the first object, for example as a sum of the v axis coordinate indicating the first estimate of the position of the lower boundary and the first estimate of the height. The refined estimate of the upper boundary may then be determined as an average of the first and second estimates of the position of the upper boundary of the first object, and the refined estimate may be employed for example as the vertical reference position indicative of the position of the upper boundary of the first object in the current image.
The image analysis unit 303 may be configured to determine a refined estimate of the right boundary of the first object, which may be used as the horizontal reference position. The determination of the refined estimate of the right boundary may comprise using the aforementioned predetermined mapping function to determine a second estimate of the height of the first object on basis of the vertical reference position. The vertical reference position may be an estimate of the lower or upper boundary of the first object, defined based on the pixel positions of the first set, or the vertical reference position may be a refined estimate of the lower or upper boundary, determined as described hereinbefore.
The determination of the refined estimate of the right boundary may further comprise using a predetermined aspect ratio of the first object to determine an estimate of the width of the first object on basis of the second estimate of the height of the first object. The predetermined aspect ratio may be for example determined on basis of the first object as observed in a number of preceding images of the sequence and/or on basis of information thereof stored in a data record comprising information on the first object.
The determination of the refined estimate of the right boundary may further comprise determining a first estimate of the position of the left boundary of the first object as the leftmost pixel position of the first set of pixel positions, or on basis of another pixel position considered representative of the left boundary of the first object in consideration of the first set of pixel positions, and determining a first estimate of the position of the right boundary of the first object as the rightmost pixel position of the first set of pixel positions, or on basis of another pixel position considered representative of the right boundary of the first object in consideration of the first set of pixel positions.
The determination of the refined estimate of the right boundary may further comprise determining a second estimate of the position of the right boundary of the first object on basis of the first estimate of the position of the left boundary of the first object and on the estimate of the width of the first object, for example as a sum of the u axis coordinate indicating the first estimate of the position of the left boundary and the first estimate of the width. The refined estimate of the right boundary may then be determined as an average of the first and second estimates of the position of the right boundary of the first object, and the refined estimate may be employed for example as the horizontal reference position indicative of the position of the right boundary of the first object in the current image.
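A minimal sketch of the right-boundary refinement, assuming a u axis that increases to the right, a predetermined aspect ratio expressed as width divided by height, and the estimate_height mapping sketched earlier, could look as follows; all names are illustrative.

```python
def refine_right_boundary(first_set, vertical_reference, estimate_height,
                          aspect_ratio):
    """Refined right-boundary estimate from the first set of pixel positions."""
    us = [u for (v, u) in first_set]            # u coordinates of the first set
    left_1 = min(us)                            # first estimate of the left boundary
    right_1 = max(us)                           # first estimate of the right boundary
    height_2 = estimate_height(vertical_reference)  # second estimate of the height
    width = aspect_ratio * height_2             # width from the predetermined aspect ratio
    right_2 = left_1 + width                    # second estimate of the right boundary
    return 0.5 * (right_1 + right_2)            # refined estimate: arithmetic mean
```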
The image analysis unit 303 may be configured to determine a refined estimate of the left boundary of the first object, which may be used as the horizontal reference position. The determination of the refined estimate of the left boundary may comprise using the aforementioned predetermined mapping function to determine a second estimate of the height of the first object on basis of the vertical reference position. The vertical reference position may be an estimate of the lower or upper boundary of the first object, defined based on the pixel positions of the first set, or the vertical reference position may be a refined estimate of the lower or upper boundary, determined as described hereinbefore. The determination of the refined estimate of the left boundary may further comprise using the predetermined aspect ratio of the first object to determine an estimate of the width of the first object on basis of the second estimate of the height of the first object.
The determination of the refined estimate of the left boundary may further comprise determining a first estimate of the position of the left boundary of the first object as the leftmost pixel position of the first set of pixel positions, or on basis of another pixel position considered representative of the left boundary of the first object in consideration of the first set of pixel positions, and determining a first estimate of the position of the right boundary of the first object as the rightmost pixel position of the first set of pixel positions, or on basis of another pixel position considered representative of the right boundary of the first object in consideration of the first set of pixel positions.
The determination of the refined estimate of the left boundary may further comprise determining a second estimate of the position of the left boundary of the first object on basis of the first estimate of the position of the right boundary of the first object and on the estimate of the width of the first object, for example by subtracting the first estimate of the width from the u axis coordinate indicating the first estimate of the position of the right boundary. The refined estimate of the left boundary may then be determined as an average of the first and second estimates of the position of the left boundary of the first object, and the refined estimate may be employed for example as the horizontal reference position indicative of the position of the left boundary of the first object in the current image.

The averaging operation involved in determination of the refined estimate of the lower, upper, right and/or left boundary of the first object may involve computing an arithmetic mean of the coordinate(s) of the respective positions.
Alternatively, the averaging operation may involve using a weighted average. For the purpose of determining a weight, the image analysis unit 303 may be configured to determine a reference position representative of an estimated position of the first object in said first image. As an example, the reference position may be determined as a center position of the first set of pixel positions, computed for example as an arithmetic mean of the u and v coordinates of the pixel positions of the first set or the modified first set of pixel positions. The weighting may be arranged to be indicative of the assumption that the closer an estimated position of a boundary of the first object is to the reference position, the more reliable the respective estimated position can be considered to be. As an example, for determination of the refined estimate of the lower boundary of the first object, the image analysis unit 303 may be configured to use a weighted average where the first estimate of the position of the lower boundary of the first object is weighted by a factor having a value that is inversely proportional to the distance between the first estimate of the position of the lower boundary of the first object and said reference position, and the second estimate of the position of the lower boundary of the first object is weighted by a factor having a value that is inversely proportional to the distance between the first estimate of the position of the upper boundary of the first object and said reference position. The weighting may involve further weighting factors in addition to the ones described hereinbefore, for example a weighting factor that takes the predicted direction of movement into account.
In a similar manner, as further examples, the image analysis unit 303 may be configured to determine the average as a weighted average where the first estimate of the position of the upper/right/left boundary of the first object is weighted by a factor having a value that is inversely proportional to the distance between the first estimate of the position of the upper/right/left boundary of the first object and said reference position, and the second estimate of the position of the upper/right/left boundary of the first object is weighted by a factor having a value that is inversely proportional to the distance between the first estimate of the position of the lower/left/right boundary of the first object and said reference position.
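A minimal sketch of such a weighted average, using inverse-distance weights relative to a reference position (for example the center of the first set of pixel positions), is given below; estimate_1 and estimate_2 are the two boundary estimates, anchor_1 and anchor_2 are the positions whose distances to the reference determine the weights, and eps is an illustrative guard against division by zero.

```python
def weighted_boundary(estimate_1, anchor_1, estimate_2, anchor_2,
                      reference, eps=1e-6):
    """Weighted average of two boundary estimates with weights inversely
    proportional to the distance of their anchors from the reference."""
    w1 = 1.0 / (abs(anchor_1 - reference) + eps)
    w2 = 1.0 / (abs(anchor_2 - reference) + eps)
    return (w1 * estimate_1 + w2 * estimate_2) / (w1 + w2)

# Example for the lower boundary: the first estimate is weighted by its own
# distance to the reference, the second by the distance of the upper-boundary
# estimate to the reference.
# refined_lower = weighted_boundary(lower_1, lower_1, lower_2, upper_1, v_ref)
```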
In the foregoing, the process involved determination of the first set, the second set and the shared set of pixel positions of the combined object in the current image, consequently enabling determination of the set of free pixel positions. However, it is possible that the first object fully covers the view to the second object in the current image (or vice versa; without loss of generality, only the case of the first object covering the view to the second object is discussed here). Consequently, the second set of pixel positions, i.e. the pixel positions of the combined object that are predicted to represent the second object only in the current image, may be an empty set. In case this happens in a number of consecutive images of the sequence of images, the prediction of the position of the second object in an image may become redundant. In such a case the data record comprising information on the first object may be updated or complemented to include an indication that the second object is - at least temporarily - merged into the first object, while still keeping the data record comprising information on the second object in memory for subsequent use in case the second object departs from the first object in a later image of the sequence of images, to enable further tracking and prediction of the position of the second object based on the information already available for the second object.
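Purely as an illustration of this bookkeeping, and not as part of the described method, such a data record could be sketched as follows; the TrackRecord structure, the merge_threshold and all field names are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class TrackRecord:
    object_id: int
    last_position: Tuple[float, float]   # e.g. (vertical_ref, horizontal_ref)
    merged_into: Optional[int] = None    # id of the occluding object, if any
    empty_count: int = 0                 # consecutive images with an empty pixel set

def update_after_frame(record: TrackRecord, pixel_set, occluder_id,
                       merge_threshold=5):
    """Mark the record as merged into the occluding object when its pixel set
    has been empty for merge_threshold consecutive images; the record itself
    is retained so tracking can resume if the object reappears."""
    if pixel_set:
        record.empty_count = 0
        record.merged_into = None
    else:
        record.empty_count += 1
        if record.empty_count >= merge_threshold:
            record.merged_into = occluder_id
    return record
```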
Even though the process of handling a combined object in the current image was described using an example of two partially or fully overlapping objects, the description can be generalized for processing and prediction of three or more fully or partially overlapping objects forming a combined object in an image of a sequence of images.
The operations, procedures and/or functions assigned to the image analysis unit 301, the image estimation unit 303 and the image prediction unit 305 described hereinbefore may be divided between the units in a different manner, or there may be further units to perform some of the operations, procedures and/or functions described hereinbefore for the above-mentioned units. On the other hand, the operations, procedures and/or functions the image analysis unit 301, the image estimation unit 303 and the image prediction unit 305 are configured to perform may be assigned to a single processing unit within the apparatus 300 instead. In particular, in accordance with an aspect of the invention, the apparatus 300 may comprise means for determining a first set of pixel positions as the pixel positions of the combined object that are predicted to represent the first object but that are not predicted to represent the second object, means for determining a vertical reference position in the first image on basis of at least one pixel position of the first set of pixel positions, means for determining a horizontal reference position in the first image on basis of at least one pixel position of the first set of pixel positions, and means for predicting the position of the first object in a subsequent image of the sequence of images on basis of the vertical and horizontal reference positions. The apparatus 300 may further comprise means for outputting the predicted position of the first object in a subsequent image of the sequence of images.
The operations, procedures and/or functions described hereinbefore in context of the apparatus 300 may also be expressed as steps of a method implementing the corresponding operation, procedure and/or function.
As an example, a method 500 in accordance with an embodiment of an aspect of the invention is illustrated in Figure 5. The method 500 may be arranged to predict a position of an object based on a first image of a sequence of images, wherein in said first image a first object and a second object at least partially overlap to form a combined object. The method 500 may comprise obtaining image data of a first image of a sequence of images, wherein in the first image a first object and a second object at least partially overlap, thereby forming a combined object in the image plane. The method 500 comprises determining a first set of pixel positions as the pixel positions of the combined object that are predicted to represent the first object but that are not predicted to represent the second object, as indicated in step 502. The method 500 further comprises determining a vertical reference position in the first image on basis of at least one pixel position of the first set of pixel positions and determining a horizontal reference position in the first image on basis of at least one pixel position of the first set of pixel positions, as indicated in steps 504 and 506, respectively. The method 500 further comprises predicting the position of the first object in a subsequent image of the sequence of images on basis of the vertical and horizontal reference positions, as indicated in step 508. The method 500 may further comprise outputting the predicted position of the first object in a subsequent image of the sequence of images.
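A minimal end-to-end sketch of steps 504 to 508 is shown below, assuming step 502 has already produced the first set of pixel positions as (v, u) pairs. The reference positions are taken simply as the lowermost and leftmost pixel positions of the first set, and the prediction is a constant-velocity extrapolation of the reference point; the description leaves the particular prediction model open, so this is only one possible choice, and all names are illustrative.

```python
def predict_next_position(first_set, previous_reference):
    """Predict the (v, u) reference position of the first object in the next
    image from the current first set and the previous reference position."""
    # Steps 504 and 506: vertical and horizontal reference positions derived
    # from the first set of pixel positions.
    v_ref = min(v for (v, u) in first_set)
    u_ref = min(u for (v, u) in first_set)
    if previous_reference is None:
        return (v_ref, u_ref)
    prev_v, prev_u = previous_reference
    # Step 508: constant-velocity extrapolation of the reference point.
    return (v_ref + (v_ref - prev_v), u_ref + (u_ref - prev_u))
```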
The apparatus 300 may be implemented as hardware alone, for example as an electric circuit, as a programmable or non-programmable processor, as a microcontroller, etc. The apparatus 300 may have certain aspects implemented as software alone, or it may be implemented as a combination of hardware and software.
The apparatus 300 may be implemented using instructions that enable hardware functionality, for example, by using executable computer program instructions in a general-purpose or special-purpose processor that may be stored on a computer readable storage medium to be executed by such a processor. The apparatus 300 may further comprise a memory as the computer readable storage medium the processor is configured to read from and write to. The memory may store a computer program comprising computer-executable instructions that control the operation of the apparatus 300 when loaded into the processor. The processor is able to load and execute the computer program by reading the computer-executable instructions from memory.
While the processor and the memory are hereinbefore referred to as single components, the processor may comprise one or more processors or processing units and the memory may comprise one or more memories or memory units. Consequently, the computer program may comprise one or more sequences of one or more instructions that, when executed by the one or more processors, cause an apparatus to perform steps implementing operations, procedures and/or functions described in context of the apparatus 300. Reference to a processor or a processing unit should not be understood to encompass only programmable processors, but also dedicated circuits such as field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), signal processors, etc.

Features described in the preceding description may be used in combinations other than the combinations explicitly described. Although functions have been described with reference to certain features, those functions may be performable by other features whether described or not. Although features have been described with reference to certain embodiments, those features may also be present in other embodiments whether described or not.

Claims

1. A method for predicting a position of an object based on a first image of a sequence of images, wherein in said first image a first object and a second object at least partially overlap to form a combined object, the method comprising determining a first set of pixel positions as the pixel positions of the combined object that are predicted to represent the first object but that are not predicted to represent the second object, determining a vertical reference position in the first image on basis of at least one pixel position of the first set of pixel positions, determining a horizontal reference position in the first image on basis of at least one pixel position of the first set of pixel positions, and predicting the position of the first object in a subsequent image of the sequence of images on basis of the vertical and horizontal reference positions.
2. A method according to claim 1 , further comprising enlarging the first set of pixel positions to occupy some of the pixel positions of the combined object that are not predicted to represent the first object prior to determining the vertical and horizontal reference positions.
3. A method according to claim 2, further comprising determining a set of free pixel positions as the pixel positions of the combined object that are not predicted to represent either the first object or the second object, and enlarging the first set of pixel positions to occupy some of the pixel positions of the set of free pixel positions.
4. A method according to claim 3, further comprising determining a second set of pixel positions as the pixel positions of the combined object that are predicted to represent the second object but that are not predicted to represent the first object, and jointly enlarging the first and second sets of pixel positions to occupy some of the pixel positions of the set of free pixel positions.
5. A method according to claim 4, wherein said joint enlarging comprises using a watersheding algorithm.
6. A method according to any of claims 1 to 5, wherein the vertical reference position is determined as an estimated position of the lower boundary of the first object and wherein determining the vertical reference position comprises determining a first estimate of the position of the lower boundary of the first object as the lowermost pixel position of the first set of pixel positions, determining a first estimate of the position of the upper boundary of the first object as the uppermost pixel position of the first set of pixel positions, using a predetermined mapping function configured to determine a height of an object in an image on basis of a vertical position of the object within the image to determine a first estimate of the height of the first object on basis of the first estimate of the position of the upper boundary of the first object, determining a second estimate of the position of the lower boundary of the first object on basis of the first estimate of the position of the upper boundary of the first object and on the first estimate of the height of the first object, determining the vertical reference position as an average of the first and second estimates of the position of the lower boundary of the first object.
7. A method according to any of claims 1 to 5, wherein the vertical reference position is determined as an estimated position of the upper boundary of the first object and wherein determining the vertical reference position comprises determining a first estimate of the position of the lower boundary of the first object as the lowermost pixel position of the first set of pixel positions, determining a first estimate of the position of the upper boundary of the first object as the uppermost pixel position of the first set of pixel positions, using a predetermined mapping function configured to determine a height of an object in an image on basis of a vertical position of the object within the image to determine a first estimate of the height of the first object on basis of the first estimate of the position of the lower boundary of the first object, determining a second estimate of the position of the upper boundary of the first object on basis of the first estimate of the position of the lower boundary of the first object and on the first estimate of the height of the first object, determining the vertical reference position as an average of the first and second estimates of the position of the upper boundary of the first object.
8. A method according to claim 6 or 7, wherein the horizontal reference position is determined as an estimated position of the right boundary of the first object and wherein determining the horizontal reference position comprises using said predetermined mapping function to determine a second estimate of the height of the first object on basis of the vertical reference position, using a predetermined aspect ratio to determine an estimate of the width of the first object on basis of the second estimate of the height of the first object, determining a first estimate of the position of the left boundary of the first object as the leftmost pixel position of the first set of pixel positions, determining a first estimate of the position of the right boundary of the first object as the rightmost pixel position of the first set of pixel positions, determining a second estimate of the position of the right boundary of the first object on basis of the first estimate of the position of the left boundary of the first object and on the estimate of the width of the first object, determining the horizontal reference position as an average of the first and second estimates of the position of the right boundary of the first object.
9. A method according to claim 6 or 7, wherein the horizontal reference position is determined as an estimated position of the left boundary of the first object and wherein determining the horizontal reference position comprises using said predetermined mapping function to determine a second estimate of the height of the first object on basis of the vertical reference position, using a predetermined aspect ratio to determine an estimate of the width of the first object on basis of the second estimate of the height of the first object, determining a first estimate of the position of the left boundary of the first object as the leftmost pixel position of the first set of pixel positions, determining a first estimate of the position of the right boundary of the first object as the rightmost pixel position of the first set of pixel positions, determining a second estimate of the position of the left boundary of the first object on basis of the first estimate of the position of the right boundary of the first object and on the estimate of the width of the first object, determining the horizontal reference position as an average of the first and second estimates of the position of the left boundary of the first object.
10. A method according to any of claims 6 to 9, further comprising determining a reference position representative of an estimated position of the first object in said first image as a center position of the first set of pixel positions, and wherein said average is a weighted average, where the first estimate of the position of the lower/upper/right/left boundary of the first object is weighted by a factor having a value that is inversely proportional to the distance between the first estimate of the position of the lower/upper/right/left boundary of the first object and said reference position, and the second estimate of the position of the lower/upper/right/left boundary of the first object is weighted by a factor having a value that is inversely proportional to the distance between the first estimate of the position of the upper/lower/left/right boundary of the first object and said reference position.
11. A method according to any of claims 1 to 5, wherein the vertical reference position is determined as one of the vertical position of the lowermost pixel position of the first set of pixel positions and the vertical position of the uppermost pixel position of the first set of pixel positions, and the horizontal reference position is determined as one of the horizontal position of the leftmost pixel position of the first set of pixel positions and the horizontal position of the rightmost pixel position of the first set of pixel positions.
12. A computer program comprising one or more sequences of one or more instructions which, when executed by one or more processors, cause an apparatus to at least perform the method of any of claims 1 to 11.
13. An apparatus for predicting a position of an object based on a first image of a sequence of images, wherein in said first image a first object and a second object at least partially overlap to form a combined object, the apparatus comprising an image analysis unit configured to determine a first set of pixel positions as the pixel positions of the combined object that are predicted to represent the first object but that are not predicted to represent the second object, an image estimation unit configured to determine a vertical reference position in the first image on basis of at least one pixel position of the first set of pixel positions, and to determine a horizontal reference position in the first image on basis of at least one pixel position of the first set of pixel positions, and an image prediction unit configured to predict the position of the first object in a subsequent image of the sequence of images on basis of the vertical and horizontal reference positions.
14. An apparatus according to claim 13, wherein the image estimation unit is configured to enlarge the first set of pixel positions to occupy some of the pixel positions of the combined object that are not predicted to represent the first object prior to determining the vertical and horizontal reference positions.
15. An apparatus according to claim 14, wherein the image analysis unit is configured to determine a set of free pixel positions as the pixel positions of the combined object that are not predicted to represent either the first object or the second object, and the image estimation unit is configured to enlarge the first set of pixel positions to occupy some of the pixel positions of the set of free pixel positions.
16. An apparatus according to claim 15, wherein the image analysis unit is configured to determine a second set of pixel positions as the pixel positions of the combined object that are predicted to represent the second object but that are not predicted to represent the first object, and the image estimation unit is configured to jointly enlarge the first and second sets of pixel positions to occupy some of the pixel positions of the set of free pixel positions.
17. An apparatus according to claim 16, wherein said joint enlarging comprises using a watersheding algorithm.
18. An apparatus according to any of claims 13 to 17, wherein the image estimation unit is configured to determine the vertical reference position as an estimated position of the lower boundary of the first object, the determination of the vertical reference position comprising determining a first estimate of the position of the lower boundary of the first object as the lowermost pixel position of the first set of pixel positions, determining a first estimate of the position of the upper boundary of the first object as the uppermost pixel position of the first set of pixel positions, using a predetermined mapping function configured to determine a height of an object in an image on basis of a vertical position of the object within the image to determine a first estimate of the height of the first object on basis of the first estimate of the position of the upper boundary of the first object, determining a second estimate of the position of the lower boundary of the first object on basis of the first estimate of the position of the upper boundary of the first object and on the first estimate of the height of the first object, determining the vertical reference position as an average of the first and second estimates of the position of the lower boundary of the first object.
19. An apparatus according to any of claims 13 to 17, wherein the image estimation unit is configured to determine the vertical reference position as an estimated position of the upper boundary of the first object, the determination of the vertical reference position comprising determining a first estimate of the position of the lower boundary of the first object as the lowermost pixel position of the first set of pixel positions, determining a first estimate of the position of the upper boundary of the first object as the uppermost pixel position of the first set of pixel positions, using a predetermined mapping function configured to determine a height of an object in an image on basis of a vertical position of the object within the image to determine a first estimate of the height of the first object on basis of the first estimate of the position of the lower boundary of the first object, determining a second estimate of the position of the upper boundary of the first object on basis of the first estimate of the position of the lower boundary of the first object and on the first estimate of the height of the first object, and determining the vertical reference position as an average of the first and second estimates of the position of the upper boundary of the first object.
20. An apparatus according to claim 18 or 19, wherein the image estimation unit is configured to determine the horizontal reference position as an estimated position of the right boundary of the first object, the determination of the horizontal reference position comprising using said predetermined mapping function to determine a second estimate of the height of the first object on basis of the vertical reference position, using a predetermined aspect ratio to determine an estimate of the width of the first object on basis of the second estimate of the height of the first object, determining a first estimate of the position of the left boundary of the first object as the leftmost pixel position of the first set of pixel positions, determining a first estimate of the position of the right boundary of the first object as the rightmost pixel position of the first set of pixel positions, determining a second estimate of the position of the right boundary of the first object on basis of the first estimate of the position of the left boundary of the first object and on the estimate of the width of the first object, and determining the horizontal reference position as an average of the first and second estimates of the position of the right boundary of the first object.
21. An apparatus according to claim 18 or 19, wherein the image estimation unit is configured to determine the horizontal reference position as an estimated position of the left boundary of the first object, the determination of the horizontal reference position comprising using said predetermined mapping function to determine a second estimate of the height of the first object on basis of the vertical reference position, using a predetermined aspect ratio to determine an estimate of the width of the first object on basis of the second estimate of the height of the first object, determining a first estimate of the position of the left boundary of the first object as the leftmost pixel position of the first set of pixel positions, determining a first estimate of the position of the right boundary of the first object as the rightmost pixel position of the first set of pixel positions, determining a second estimate of the position of the left boundary of the first object on basis of the first estimate of the position of the right boundary of the first object and on the estimate of the width of the first object, determining the horizontal reference position as an average of the first and second estimates of the position of the left boundary of the first object.
22. An apparatus according to any of claims 18 to 21, wherein the image estimation unit is configured to determine a reference position representative of an estimated position of the first object in said first image as a center position of the first set of pixel positions, and wherein said average is a weighted average, where the first estimate of the position of the lower/upper/right/left boundary of the first object is weighted by a factor having a value that is inversely proportional to the distance between the first estimate of the position of the lower/upper/right/left boundary of the first object and said reference position, and the second estimate of the position of the lower/upper/right/left boundary of the first object is weighted by a factor having a value that is inversely proportional to the distance between the first estimate of the position of the upper/lower/left/right boundary of the first object and said reference position.
23. An apparatus according to any of claims 13 to 17, wherein the image estimation unit is configured to determine the vertical reference position as one of the vertical position of the lowermost pixel position of the first set of pixel positions and the vertical position of the uppermost pixel position of the first set of pixel positions, and determine the horizontal reference position as one of the horizontal position of the leftmost pixel position of the first set of pixel positions and the horizontal position of the rightmost pixel position of the first set of pixel positions.
PCT/FI2013/050278 2012-03-14 2013-03-13 A method, an apparatus and a computer program for predicting a position of an object in an image of a sequence of images WO2013135962A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20125276A FI20125276L (en) 2012-03-14 2012-03-14 Method, device and computer program for predicting the position of an object in an image
FI20125276 2012-03-14

Publications (1)

Publication Number Publication Date
WO2013135962A1 true WO2013135962A1 (en) 2013-09-19

Family

ID=48326325

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2013/050278 WO2013135962A1 (en) 2012-03-14 2013-03-13 A method, an apparatus and a computer program for predicting a position of an object in an image of a sequence of images

Country Status (2)

Country Link
FI (1) FI20125276L (en)
WO (1) WO2013135962A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FI20135275A (en) 2013-03-22 2014-09-23 Meontrust Oy Transaction authorization method and system

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
DANIEL ROTH ET AL: "Bayesian Pixel Classification for Human Tracking", 2005 SEVENTH IEEE WORKSHOPS ON APPLICATIONS OF COMPUTER VISION (WACV/MOTION'05) - 5-7 JAN. 2005 - BRECKENRIDGE, CO, USA, IEEE, LOS ALAMITOS, CALIF., USA, 1 January 2005 (2005-01-01), pages 78 - 83, XP031059150, ISBN: 978-0-7695-2271-5 *
GOO JUN ET AL: "Tracking and Segmentation of Highway Vehicles in Cluttered and Crowded Scenes", APPLICATIONS OF COMPUTER VISION, 2008. WACV 2008. IEEE WORKSHOP ON, IEEE, PISCATAWAY, NJ, USA, 7 January 2008 (2008-01-07), pages 1 - 6, XP031273498, ISBN: 978-1-4244-1913-5 *
HONG LU ET AL: "An occlusion tolerent method for multi-object tracking", INTELLIGENT CONTROL AND AUTOMATION, 2008. WCICA 2008. 7TH WORLD CONGRESS ON, IEEE, PISCATAWAY, NJ, USA, 25 June 2008 (2008-06-25), pages 5105 - 5110, XP031301724, ISBN: 978-1-4244-2113-8 *
KAMIJO S ET AL: "Occlusion robust tracking utilizing spatio-temporal markov random field model", PATTERN RECOGNITION, 2000. PROCEEDINGS. 15TH INTERNATIONAL CONFERENCE ON SEPTEMBER 3-7, 2000; [PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION. (ICPR)], LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, vol. 1, 3 September 2000 (2000-09-03), pages 140 - 144, XP010533519, ISBN: 978-0-7695-0750-7, DOI: 10.1109/ICPR.2000.905292 *
TAO YANG ET AL: "Real-Time Multiple Objects Tracking with Occlusion Handling in Dynamic Scenes", COMPUTER VISION AND PATTERN RECOGNITION, 2005 IEEE COMPUTER SOCIETY CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, vol. 1, 20 June 2005 (2005-06-20), pages 970 - 975, XP010817450, ISBN: 978-0-7695-2372-9, DOI: 10.1109/CVPR.2005.568 *
TIM ELLIS AND MING XU: "Object detection and tracking in an open and dynamic world", IEEE CVPR WORKSHOP ON PERFORMANCE EVALUATION OF TRACKING AND SURVEILLANCE (2001), 9 December 2001 (2001-12-09), XP002700328 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210049420A1 (en) * 2017-05-05 2021-02-18 Dassault Systemes Forming a dataset for fully-supervised learning
US11763550B2 (en) * 2017-05-05 2023-09-19 Dassault Systemes Forming a dataset for fully-supervised learning
CN110838125A (en) * 2019-11-08 2020-02-25 腾讯医疗健康(深圳)有限公司 Target detection method, device, equipment and storage medium of medical image
CN110838125B (en) * 2019-11-08 2024-03-19 腾讯医疗健康(深圳)有限公司 Target detection method, device, equipment and storage medium for medical image

Also Published As

Publication number Publication date
FI20125276L (en) 2013-09-15

Similar Documents

Publication Publication Date Title
CN111024040B (en) Distance estimation method and device
CA3035298C (en) Predicting depth from image data using a statistical model
TWI536318B (en) Depth measurement quality enhancement
US8369609B2 (en) Reduced-complexity disparity map estimation
JP7272024B2 (en) Object tracking device, monitoring system and object tracking method
JP2014529922A (en) Apparatus and method for digital microscope imaging
JP6584123B2 (en) Image processing apparatus, image processing method, and program
KR20140109790A (en) Device and method for image processing
CN114365195A (en) Structural annotation
JP7032871B2 (en) Image processing equipment and image processing methods, programs, storage media
EP3376470B1 (en) Moving body tracking method, moving body tracking device, and program
Lo et al. Depth map super-resolution via Markov random fields without texture-copying artifacts
KR102295183B1 (en) object tracking method for CCTV video by use of CCTV projection model
KR101834512B1 (en) Super-resolution image restoration apparatus and method based on consecutive-frame
Funde et al. Object detection and tracking approaches for video surveillance over camera network
WO2013135962A1 (en) A method, an apparatus and a computer program for predicting a position of an object in an image of a sequence of images
JP2009182624A (en) Target tracking device
KR20180127185A (en) Method and apparatus for processing 360 degree image
JP6851163B2 (en) Image processing equipment, image processing methods, and programs
CN112001949A (en) Method and device for determining moving speed of target point, readable storage medium and equipment
KR101682137B1 (en) Method and apparatus for temporally-consistent disparity estimation using texture and motion detection
KR102190297B1 (en) Method for tracking air-targets using correlation filter and system thereof
US20220180529A1 (en) Image processing apparatus, image processing method, and non-transitory computer-readable medium
US11227166B2 (en) Method and device for evaluating images, operating assistance method, and operating device
Zhang et al. A traffic flow detection system combining optical flow and shadow removal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13721359

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13721359

Country of ref document: EP

Kind code of ref document: A1