WO2013135963A1

WO2013135963A1 - A method, an apparatus and a computer program for determination of an image parameter

Info

Publication number: WO2013135963A1
Application number: PCT/FI2013/050279
Authority: WO
Inventors: Markus KUUSISTO; Jussi SAINIO
Original assignee: Mirasys Oy
Priority date: 2012-03-14
Filing date: 2013-03-13
Publication date: 2013-09-19
Also published as: FI20125281L

Abstract

An arrangement for estimating a position representing a reference level in an image plane in a sequence of images is provided. The arrangement comprises obtaining information indicating positions and sizes of two or more objects in the image plane in one or more images of the sequence of images, wherein said sizes of two or more objects in the image plane correspond to a real-world object having a first size, determining a mapping between a position of an object in the image plane and a size of the object in the image plane on basis of said positions and sizes of the objects in the image plane in said one or more images, and using the mapping to determine an estimate of a position representing the reference level in the image plane in said sequence of images as a position in the image plane where a size of an object maps to a reference size.

Description

A method, an apparatus and a computer program for determination of an image parameter

FIELD OF THE INVENTION The invention relates to image analysis and image processing. In particular, the invention relates to a method, an apparatus and a computer program for determining an imaging parameter or parameters associated with images of a sequence of images.

BACKGROUND OF THE INVENTION Information regarding a position and an orientation of an imaging device with respect to its surroundings at the time of capture of an image or images may provide useful information for analysis and processing of the captured images. For example the height of the imaging device from a ground level, orientation of the resulting image plane with respect to the ground level, etc. may be pa- rameters that facilitate efficient analysis of images.

While some such parameters may be determined by rather straightforward measurements at the location of the imaging device, such straightforward approach is not always available. For example, the location of the imaging device may be inaccessible or even unknown. Moreover, the imaging device may move or be moved e.g. periodically, hence requiring repeated measurements which may be impractical or even impossible.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide a method, an apparatus and a computer program that enables determination of one or more parameters as- sociated with a position and/or orientation of an imaging device with respect to its surroundings based on one or more images captured with the imaging device in a computationally efficient but yet accurate manner.

The objects of the invention are reached by a method, an apparatus and a computer program as defined by the respective independent claims. According to a first aspect of the invention, a method for estimating a position representing a reference level in an image plane in a sequence of images is provided, the method comprising obtaining information indicating positions and sizes of two or more objects in the image plane in one or more images of the sequence of images, wherein said sizes of two or more objects in the image plane correspond to a real-world object having a first size, determining a mapping between a position of an object in the image plane and a size of the object in the image plane on basis of said positions and sizes of the objects in the image plane in said one or more images, and using the mapping to determine an estimate of a position representing the reference level in the image plane in said sequence of images as a position in the image plane where a size of an object maps to a reference size.

According to a second aspect of the invention, an apparatus for estimating a position representing a reference level in an image plane in a sequence of im- ages is provided, the apparatus comprising an image analysis unit and a reference level determination unit, wherein the image analysis unit is configured to obtain information indicating positions and sizes of two or more objects in the image plane in one or more images of the sequence of images, wherein said sizes of two or more objects in the image plane correspond to a real-world ob- ject having a first size, and wherein the reference level determination unit configured to determine a mapping between a position of an object in the image plane and a size of the object in the image plane on basis of said positions and sizes of the objects in the image plane in said one or more images and to use the mapping to determine an estimate of a position representing the reference level in the image plane in said sequence of images as a position in the image plane where a size of an object maps to a reference size.

According to a third aspect of the invention, a computer program is provided, the computer program comprising one or more sequences of one or more instructions which, when executed by one or more processors, cause an appa- ratus to at least perform a method in accordance with the first aspect of the invention.

The computer program may be embodied on a volatile or a non-volatile computer-readable record medium, for example as a computer program product comprising at least one computer readable non-transitory medium having pro- gram code stored thereon, the program code, which when executed by an ap- paratus, causes the apparatus at least to perform the operations described hereinbefore for the computer program in accordance with the third aspect of the invention.

The exemplifying embodiments of the invention presented in this patent appli- cation are not to be interpreted to pose limitations to the applicability of the appended claims. The verb "to comprise" and its derivatives are used in this patent application as an open limitation that does not exclude the existence of also unrecited features. The features described hereinafter are mutually freely combinable unless explicitly stated otherwise. The novel features which are considered as characteristic of the invention are set forth in particular in the appended claims. The invention itself, however, both as to its construction and its method of operation, together with additional objects and advantages thereof, will be best understood from the following detailed description of specific embodiments when read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 a illustrates a coordinate system used to describe an image plane.

Figure 1 b illustrates a coordinate system used to describe a real world.

Figure 2 illustrates a principle of the concept of estimating a size of an object in an image based on its distance from the bottom of the image.

Figure 3 schematically illustrates an apparatus in accordance with an embodiment of the invention.

Figure 4 illustrates the principle of linear fitting for determination of a mapping function. Figure 5 provides a flowchart illustrating a method in accordance with an embodiment of the invention.

DETAILED DESCRIPTION

Figure 1 a illustrates a coordinate system used to describe an image plane 100 and an image 101 in the image plane 100 in this document. The image plane 100 can be considered to comprise a number of pixels, positions of which are determined by coordinates along a u axis and a v axis, and where the origin of the coordinate system determined by the u and v axes is at the center of the image 101 on the image plane 100. The origin could and even the directions of the axes could naturally be selected differently; many conventional image pro- cessing applications place the origin in the top left corner and make the magnitude of the v coordinate increase downwards. For brevity and clarity of description, without losing generality, in the following a position along the u axis may be referred to as a horizontal position and a position along the v axis is referred to as a vertical position. Terms left and right may be used to refer to a position in the direction of the u axis, and terms up and down may be used to refer to a position in the direction of the v axis. Moreover, an extent of an object in the direction of the u axis is referred to as width of the object and an extent of the object along the direction of the v axis is referred to as a height of the object. Figure 1 b illustrates a coordinate system 1 10 used to describe a real world, projection of which is mapped on an image on the image plane upon capture of an image. A position in the real world may be expressed by coordinates in the x, y and z axes, as illustrated in Figure 1 b. A coordinate in direction of x, y and/or z axes may be expressed as a distance from the origin, for example in meters. The x and z axes can be considered to represent a plane that approximates the ground level. While this may not be exactly accurate representation of the ground, which locally may comprise hills and slopes and which in the large scale is actually a geoid, it provides sufficient modeling accuracy. Consequently, the y axis can be considered as the height from the ground level - or from the plane approximating the ground level.

Figure 1 c schematically illustrates a relationship between the real world coordinate system 1 10 and the image plane 101 . Figure 1 c shows the x, y and z axes of the real world coordinate system 1 10 such that the x axis is perpendicular to the figure. The illustration of the image plane 100 in Figure 1 c explicitly indicates the direction of the v axis, whereas the u axis is assumed to be perpendicular to the figure. The parameter y_c indicates the height of the focal point of an imaging device from the ground level represented by the x and z axes, f denotes a focal length of the imaging device along an imaginary line perpendicular to the image plane 100, and θ_χ denotes the angle between the imagi- nary line perpendicular to the image plane 100 and the horizon plane 121 , i.e. the tilt angle of the imaging device. In the following, the relationship between a point in the real world coordinate system 1 10 and the corresponding point in the image plane 100 is described.

Since the coordinate system 1 10 employed hereinbefore to model the real world is a left-handed coordinate system, a minus sign is added in front of the u and v coordinates representing the corresponding point - or pixel position - of the image plane 100. Let's make the following definitions.

0

K / (1 )

0

uw

U —vw (3) w

where K is a projection matrix of the imaging device, R is a rotation matrix, U denotes the projection of the real-world point in the image plane in homogenous coordinates, and X denotes the real-world coordinates of a point to be projected. Note that the imaging device projects a point in the real world coor- dinate system into the two-dimensional image plane. The vector U introduces an additional dimension w as the third dimension of the projected point. The actual projected point may be recovered by dividing the components of the vector U by w.

Consequently, the projection of a point (x, y, z) in the real world coordinate sys- tern 1 10 on the image plane 100 is

U = WKRX - KRX

z (5) and hence the position in the image plane may be expressed as uw fWx

—vw W(f(y_c - y) cos θ_χ - fz sin θ_χ) (6) w W{z cos θ_χ + (y_c - y) sin θ_χ) An image, such as the image 101 , may be part of a sequence of images. A sequence of images is considered as a time-ordered set of images, where each image of a sequence of images has its predetermined position within the sequence. Moreover, each image of the sequence preferably has a predeter- mined temporal location within the sequence with known temporal distance to the immediately preceding and immediately following images of the sequence. A sequence of images may originate from an imaging device such as a (digital or analog) still camera, from a (digital or analog) video camera, from a device equipped with a camera or a video camera module etc., configured to capture and provide a number of images at a predetermined rate, i.e. at predetermined time intervals. Hence, a sequence of images may comprise still images and/or frames of a video sequence.

For the purposes of efficient analysis and prediction of movement within images of a sequence of images, the images preferably provide a fixed field of view to the environment of the imaging device(s) employed to capture the images. Preferably, the images of a sequence of images originate from an imaging device that has a fixed position throughout the capture of the images of the sequence of images, thereby providing a fixed or essentially fixed field of view throughout the sequence of images. Consequently, any fixed element or object in the field of view of the imaging device remains at the same position in each image of the sequence of images. On the other hand, objects that are moving in the field of view may be present in only some of the images and may have a varying position in these images.

However, it is also possible to generate a sequence of images representing a fixed field of view on basis of an original sequence of images captured by an imaging device that is not completely fixed but whose movement with respect its position or orientation is known, thereby providing a field of view that may vary from one image to another. Assuming that the orientation and position of the imaging device for each of the images of the original sequence images is known, it is possible to apply pre-processing to modify images of the original sequence of images in order to create a sequence of images having a fixed field of view.

As an example, an imaging device may be arranged to overlook a parking lot, where the parking area, driveways to and from the parking area and the sur- roundings thereof within the field of view of the imaging device are part of the fixed portion of the images of the sequence of images, whereas a changing portion of the images of the sequence of images comprises e.g. people and cars moving within, to and from the parking area. As another example, an imaging device may be arranged to overlook a portion of an interior of a building, such as a shop or a store. In this another example the fixed portion of the images may comprise shelves, racks and other structures arranged in the store and the items arranged thereon, whereas the changing portion of the images may comprise e.g. the customers moving in the store within the field of view of the imaging device. For the purposes of efficient analysis and prediction of movement within images of a sequence of images, an imaging device employed to capture the images is preferably positioned in such a way that the camera horizon is in parallel with the plane horizon, consequently resulting in a horizon level in the image plane to be an imaginary line that is in parallel with the u axis. Hence, the hori- zon level within the image plane may be considered as an imaginary horizontal line at a certain distance from the u axis - or from an edge of the image, the certain distance being dependent on the vertical orientation of the imaging device. In case the vertical orientation of the imaging device does not enable representing the horizon level within the captured image, the horizon level may be considered as an imaginary line in the image plane that is in parallel with the u axis but which is outside the image.

In case an imaging device positioned such that there is a non-zero angle of known magnitude between the camera horizon and the plane horizon, preprocessing of images of the captured sequence of images may be applied in order to modify the image data to compensate for the said angle, e.g. by rotating the images, to provide a sequence of images where the horizon can be represented as an imaginary line that is in parallel with the u axis of the image plane.

With the assumption that the images of a sequence of images represent a fixed field of view, an object moving in the field of view may be detected by observing any changes between (consecutive) images of the sequence. As example, an object in an image, i.e. a set of pixel positions in an image, may be identified by comparing the image to a reference image comprising only the fixed portion of the field of view of the images of a sequence of images and identifying the set of pixels that is not present in the reference image. An object in an image may be determined by indicating its position in the image plane together with its shape and/or size in the image plane, all of which may be expressed using the u and v coordinates of the image plane.

Once an object is detected in an image of a sequence of images, a data record comprising information on the object may be created. The information may comprise for example the current and/or previous positions of the object, the current and/or previous shape(s) of the object, the current and/or previous size(s) of the object, an identifier of the object and/or any further suitable data that can be used to characterize the object. In case multiple objects are detected in an image, a dedicated data record may be created and/or updated for each of the objects.

An object moving within the field of view of the imaging device is typically depicted in two or more images of the sequence of images. An object detected in an image can be identified as the same object already detected in a previous image of the sequence by comparing the characteristics - e.g. with respect to the shape of the object - of the object detected in the image to characteristics of an image detected in a previous image (e.g. as stored in a corresponding data record).

Hence, it is possible to track the movement of the object by determining its po- sition in a number of images of the sequence of images and by characterizing the movement on basis of the change in its position over a number of images. In this regard, the information on the position(s) of the object in a number of images may be stored in the data record comprising information on the object in order to enable subsequent analysis and determination of a movement pat- tern of the object.

Due to the movement the positions of two objects detected or identified in a previous image of the sequence of images as separate objects may overlap, fully or in part, in an image of the sequence. Consequently, such two objects may merge into a combined object for one or more images of the sequence, while they may again separate as individually identifiable two objects in a subsequent image of the sequence. In a similar manner, an object initially identified as a single individual object, e.g. at or near a border of an image of the sequence, may in a subsequent image separate as two individual objects spawn from the initial single object. Information indicating merging of two ob- jects into a combined object and/or separation of a (combined) object into two separate objects may be kept in the data record comprising information on the object in order to facilitate analysis of the evolution of the object(s) within the sequence of images. While it would be possible to separately determine a position of each pixel in an image representing a given object, in case of an object whose shape or approximation thereof is known, e.g. based on a data record comprising information on the object, it is sufficient to determine the position of the group of pixels representing the object in an image as a single position in the image plane. Such determination of position is applicable, in particular, to objects having a fixed shape or having a shape that only slowly evolves in the image plane, resulting in only a small change in shape of the object from one image to another.

A position of an object whose shape or approximation thereof is known may be determined or expressed for example as the position(s) of one or more predetermined parts of the object in the image plane. An example of such a predetermined part is a pixel position indicating a geographic center point of the object, thereby - conceptually - indicating a center of mass of the object (with the assumption that each pixel position of the modified first set of pixel positions representing the enlarged first object represents an equal 'mass'). The geographic center point of an object in an image may be determined for example as the average of the coordinates of the pixel positions representing the object in the image.

Another example for using predetermined part(s) of an object to indicate a po- sition of the object in an image involves determining at least one of a lower boundary and an upper boundary together with at least one of a left boundary and a right boundary of an imaginary rectangle enclosing the pixel positions representing the object by touching the lowermost, the uppermost, the leftmost and the rightmost pixel positions representing the object in the image plane. Such a rectangle may be referred to as a bounding box. The lower and upper boundaries may be expressed as a v coordinate, i.e. as a position in the v axis, whereas the left and right boundaries may be expressed as a u coordinate, i.e. a position in the u axis. Consequently, the position of an object may be expressed for example by a coordinate of the u axis indicating the left boundary of a bounding box enclosing the object and by a coordinate of the v axis indicating the lower boundary of the bounding box. This is equivalent of expressing the coordinates of the pixel position indicating the lower left corner of the (rectangular) bounding box. In principle the bounding box does not need to have an exactly rectangular shape; it is possible to use e.g. a bounding circle just large enough to enclose all pixels of the object, or a bounding oval with its u and v dimensions selected to match those of the object. However, a rectangular bounding box is the most common and most easily handled in processing.

A size of an object in an image may be expressed for example by its dimension^) along the axis or axes of the image plane. Thus, a size of an object in an image may be expressed as its extent in the direction of the v axis, i.e. as the height of the object in the image. Alternatively or additionally, a size of an object in an image may be expressed as its extent in the direction of the u axis, i.e. as the width of the object in the image. A height and/or a width may be expressed for example as a number of pixel positions corresponding to the height/width in the image plane. Such information may be derived for example with the aid of a bounding box, as described hereinbefore. A further alternative for expressing the size of the object is to indicate either the height or the width of an object, e.g. as a height or width of a bounding box enclosing the object, together with an aspect ratio determining the relationship between the height and width of the object. Since the size of an object as depicted in an image may vary over time, the data record comprising information on an object may be employed to keep track of the current (and/or most recent) size of the object and possibly also of the size of the object in a number of previous images.

A shape of an object can be expressed for example by a set of pixel positions or as a two-dimensional 'bitmap' indicating the pixel positions forming the object. Such information may be stored in a data record comprising information on the object. The information regarding the shape of the object may include the current or most recent observed shape of the object and/or the shape of the object in a number of preceding images of the sequence.

Figure 2 schematically illustrates two images 201 , 203 of a sequence of images, the images schematically illustrating a reference object in real-world mov- ing along a plane that is essentially horizontal, for example the plane deter- mined by the x and z axes of the real world coordinate system 1 10 described hereinbefore. Note that only changing portions of images are illustrated in the images 201 , 203 thereby omitting any possible fixed portion (or background objects) of the images for clarity of illustration. The image 201 illustrates the real-world object as an object 205 having a height h_vi and a width w_vi with its lower edge situated at position v_w in the v axis of the image plane. The image 203 illustrates the real-world object as an object 205' having a height h_v2 and a width w_v2 with its lower edge situated at position v_b2 of the v axis of the image plane. Moreover, a level representing the horizon 207 is assumed to be a line that is parallel to the u axis - and also parallel to the lower and upper edges of the images 201 and 203.

The real-world object in image 201 is closer to the imaging device than in the image 203, and hence the object is depicted in the image 201 as larger than in the image 203. In particular, both the height h_vi of the object 205 in the image 201 is larger than the height h_v2 of the object 205' and width w_vi of the object 205 in the image 201 is larger than the width w_v2 of the object 205' in the image 203. Moreover, since the real-world object depicted in the images 201 and 203 was moving along an essentially horizontal plane, due to the object 205 in the image 201 being closer to the imaging device than the corresponding ob- ject 205' in the image 203, the object 205 in the image 201 is closer to the bottom of the image than the object 205' in the image 203.

This can be generalized into a rule that a real-world object closer to the imaging device appears closer to the bottom of the image than the same real-world object - or another real-world object of identical or essentially identical size - situated further away from the imaging device. In a similar manner, a real- world object closer to the imaging device appears larger in an image than the same real-world object - or another real-world object of identical or essentially identical size - situated further away from the imaging device. Moreover, the point, either actual or conceptual, where the size of a real-world object in an image would appear zero or essentially zero, represents the point in the image - e.g. a level of a line parallel to the u axis of the image plane - representing a horizon in the image.

Therefore, a real-world object exhibiting movement towards or away from the imaging device - i.e. towards or away from the horizon - is typically depicted as an object of different size and different distance from the bottom of an image in two images of a sequence of images captured using an imaging device arranged to capture a sequence of images with a fixed field of view. Consequently, it is possible to determine a mapping function configured to determine a size, .e.g. a height, of an object in an image on basis of a vertical position of the object within the image.

Figure 3 schematically illustrates an apparatus 300 for estimating a position representing a reference level in an image plane in an image of a sequence of images. The apparatus 300 comprises an image analysis unit 301 and a refer- ence level determination unit 303. The image analysis unit 301 may also be referred to as an image analyzer or as an object analyzer, and the reference level determination unit 303 may be also referred to as a reference level estimator or reference level determiner.

The image analysis unit 301 is operatively coupled to the reference level de- termination unit 303. The apparatus 300 may comprise further processing units and components, such as a processor, a memory, a user interface, a communication interface, etc. In particular, the apparatus 300 may receive input from one or more external processing units and/or apparatuses and the apparatus 300 may provide output to one or more external processing units and/or appa- ratuses.

The reference level may be expressed, for example, as a v coordinate of the image plane, hence determining an imaginary line parallel to the v axis in the image plane.

The image analysis unit 301 is configured to obtain information indicating posi- tions and sizes of two or more objects in an image plane in one or more images of the sequence of images, wherein said sizes of two or more objects in the image plane correspond to a real-world object having a first size. Said two or more objects may depict a single real-world object of the first size or two or more real-world objects of similar or essentially similar size, said two or more real-world objects hence having a size matching or essentially matching the first size. Furthermore, said two or more objects may comprise real-world objects of different size, for example an object or objects having a first size and an object or objects having a second size, where the size of the second object as depicted in the image plane is scaled with a suitable scaling factor such that the scaled size corresponds to the first size.

In order to enable determination of a mapping between a position of an object in the image plane and a size of the object as depicted in the image plane on basis of observed positions and sizes in the image plane, at least two position - size pairs are needed. Having more than two observed position - size pairs improves the accuracy of mapping, thereby improving the reliability of the estimate of a position in the image plane representing the reference level. Typically, the higher the number of observed position - size pairs, the better the re- liability of mapping. The observations may originate from a single real-world object depicted in the image plane in two or more images of the sequence, or the observations may originate from two or more real-world objects of the same, similar or essentially similar size depicted in the image plane in one or more images of the sequence. Moreover, the observation may originate from two or more real-world objects of different size, e.g. a first size and a second size, depicted in the image plane in one or more images of the sequence, wherein the sizes of the objects in image plane depicting the real-world object having the second size are scaled by a scaling factor indicative of the ratio between the first and second sizes. The set of images of the sequence images applied in determination of a mapping between a position of an object in the image plane and a size of the object as depicted in the image plane on basis of observed positions and sizes in the image plane may comprise a predetermined number of observations or at least a predetermined number of observations in order to ensure reliable enough estimate. This set of images may, consequently, comprise a subset of images of the sequence in which a real-world object of given size is depicted or all images of the sequence in which the real-world object of given size is depicted.

The image analysis unit 301 may be configured to obtain information indicating positions of and sizes of two or more objects in the image plane depicting the same real-world object in two or more images of the sequence of images. In other words, the two or more images depict a single real-world object moving within the field of view represented by the images of the sequence and, consequently, depict the real-world object in at least two different positions in the image plane. The image analysis unit 301 may be configured to obtain information indicating positions and sizes of two or more objects in the image plane depicting two or more real-world objects of essentially identical size in one or more images of the sequence of images. In other words, the one or more images depict two or more real-world objects of essentially identical size within the field of view represented by the images of the sequence and, consequently, depict a real-world object of essentially identical size in at least two different positions in the image plane.

Information indicating or identifying the two to or more objects in the image plane to depict two or more real-world objects of essentially identical size may be obtained for example as input from a user via a suitable user interface, e.g. by the user indicating the two or more objects in the image plane that are considered to represent real-world objects of essentially identical size. As another example, Information indicating or identifying the two to or more objects in the image plane to depict two or more real-world objects of essentially identical size may be obtained by analysis of image data of an image indicating two objects at a similar distance from a reference level in the image plane exhibiting essentially similar size. In case the reference level is assumed to be a level that is in parallel to the u axis of the image plane, there is no need to have an indication of the position of the reference level but it is sufficient to identify two objects of essentially identical size in the image plane at essentially the same position in the direction of the v axis of the image plane.

The image analysis unit 301 may be configured to obtain information indicating positions and sizes of two or more objects in the image plane depicting two or more real-world objects having different sizes. In particular, the two or more objects may comprise a first object having a first size in the real-world and a second object having a second size in the real-world, wherein the information indicating size of the second object in the image plane is scaled, e.g. multiplied, by a scaling factor indicative of the ratio between the first size and the second size. The scaling converts the size of the second object as observed in the image plane in such a way that it corresponds to a size the first object would have in the current position of the second object, hence enabling determination of the mapping between a position of an object in the image plane and a size of the object as depicted in the image plane on basis of observed positions and sizes of real-world objects of different size. The terms essentially similar size and essentially identical size as used herein refer to two - or several - real-world objects to have sizes that differ by a few percent at most. While the actual tolerance to deviation in size of the two - or several - real-world objects considered to represent an identical size depends on the distance of the real-world object from the focal point of the imaging device, a difference in size of up to 5 percent does not typically unduly degrade the accuracy of the mapping between a position in the image plane and a size of an object as depicted in the image plane. In general, for real-world objects further away from the focal point of the imaging device a larger difference in size may be tolerated without unduly affecting the accuracy of the mapping.

Similar considerations also apply to a size of a single real-world object that may exhibit subtle changes in size as depicted in the image plane even when the real-world object does not move in relation to the imaging device. An example of such real-world object is a person moving or standing within the field of view of the imaging device, where the subtle changes in size as depicted in the image plane may occur e.g. due to change in posture, change in orientation with respect to the image plane, etc.

The image analysis unit 301 may be configured to obtain information indicating a position of an object in the image plane and/or the size of the object for ex- ample by performing an analysis of image data of a number of images of the sequence of images in order to identify an object of predetermined characteristics, its position in the image plane and its size in the image plane. Image analysis techniques for detecting and identifying an object of predetermined characteristics in an image known in the art may be used for this purpose. The output of such analysis may comprise indication of pixel positions in the image plane indicating a position of the object in the image plane and/or indication of the size of the object.

Alternatively, the image analysis unit 301 may be configured to receive information indicating a position of an object in the image plane and/or the size of the object by receiving an indication of a pixel position or pixel positions of the image plane indicating a position of the object in the image plane and/or an indication of the size of the object. Such information may be received, for example, from another processing unit of the apparatus 300 or from a processing unit outside the apparatus 300, such processing unit configured to apply image analysis in order to determine a presence of an object of predetermined char- acteristics and a position and size thereof in the image plane. As another alternative, the information indicating a position and a size of an object in the image plane may be received, for example, based on input from a user. The user may indicate an object of interest in an image via a suitable user interface (such as display & pointing device, a touchscreen, etc.), for example by indicating a lower and upper boundaries of the object in the image plane and/or a left and right boundaries of the object in the image plane. As a particular further example, the user may be involved in initial detection of an object, whereas the image analysis unit 301 may be configured to track the object indicated by the user in the subsequent (and/or preceding) images of the sequence of images.

The information indicating a position of an object in the image plane may comprise, for example, a position indicating a lower boundary of the object in the image plane and/or a position indicating an upper boundary of the object in the image plane. Additionally or alternatively, the information indicating a position of an object may comprise for example a position indicating a left boundary of the object and/or a position indicating a right boundary of the object, as described hereinbefore. The information indicating a size of an object in the image plane may comprise, for example, a height of the object in the image plane and/or a width of the object in the image, as described hereinbefore. The height and/or the width in the image plane may be expressed e.g. as number of pixel positions.

The reference level determination unit 303 is configured to determine a mapping between a position of an object in the image plane and a size of the ob- ject in the image plane on basis of said positions and sizes of the objects in the image plane in said one or more images. In particular, the reference level determination unit 303 may be configured to determine such mapping for a real- world object of the first size, depicted in the image plane as an object of different size in one or more images of a sequence of images, where the size of the object in the image plane varies in dependence of a distance of the real-world object from the focal point of the imaging device used to capture the sequence of images.

The mapping may be determined as a function taking a position of the object in the image plane as an input argument and providing a corresponding size in the image plane as an output argument. A respective inverse function may al- so be determined, hence taking a size of an object in the image plane as an input argument and providing a corresponding position of the object in the image plane as an output argument. The position(s) and size(s) may be expressed as described hereinbefore. The reference level determination unit 303 may be configured to determine the mapping between a position of an object in the image plane and a size of the object in the image plane is determined as a linear function. Any suitable linear model may be employed. As an example, the reference level determination unit 303 may be configured to apply a function of the form h_v = av_b + b, (7) where h_v represents a size of the object as a height of an object in the image plane, v_b represents a position of the object as a position of a lower boundary of the object in the image plane, while a and b represent mapping parameters to be determined. The mapping may be determined for example using a least squares fit to an equation system comprising a number of equations of the form indicated by the equation (7), each equation of the system representing a pair of an observed position of a lower boundary of an object v¾, and a corresponding observed height of the object h_vi in the image plane. Consequently, the fitting involves determining the parameters a and b such that the overall er- ror in an equation system (8) is minimized (by using methods known in the art)

Figure 4 illustrates the principle of linear fitting according to the equations (7) and (8) by an example. The black dots represent observed pairs of position of a lower boundary of an object in the image plane and the respective height of the object in the image plane in a coordinate system where the position of an object in the image plane is indicated as the position along the v axis and the height of an object in the image plane is indicated by the position in the h axis, which may also be referred to as a 'size axis'. Note that the observed positions and sizes are explicitly indicated only for some of the observed pairs for clarity of illustration. The line 402 crossing the v axis at v_h = -bla and crossing the h axis at point h_ref = b illustrates the fitted line providing a minimum error in consideration of the observed position - height pairs. In particular, v_h indicates an estimate of a position (along the v axis of the image plane) representing a level where the height of the object is zero, whereas h_ret indicates an estimated height of the object at the bottom of the image.

The exemplifying mapping function illustrated by the equations (7) and (8) may be modified to employ a parameter different from the observed height to indicate a size of the object in the image plane and/or a parameter different from the observed position of a lower boundary of the object to indicate a position of the object in the image plane. As an example of such a modification, the exemplifying process of determining the mapping function may be modified by replacing the height of the object h_v in equations (7) and (8) by a width of the object (w_v, Wvl) to represent a size of the object in the image plane and/or the position of the lower boundary of the object v_b in equations (7) and (8) by the position of an upper boundary of the object (v_t, v¾) to represent a position of the object in the image plane. Instead of applying a linear function, the mapping may be determined by using a parabolic function or a 'skewed' parabolic function, i.e. a second order function. Consequently, the mapping between a position of an object in the image plane and a size of the object in the image plane is determined using a parabolic fit. As an example of a parabolic fit, we may consider determination of the mapping on basis of observed positions of the lower and upper boundaries of an object in the image plane, v_b and v_t, respectively. These positions may be expressed as

fsz-cfy_c

V_b = (9)

cz+sy_c and

cfy-cfy_c+fsz

(10) cz—sy+sy_c where s = sin θ_χ and c = cos θ_χ. The equations (9) and (10) enable solving the projected object height h_v in the image plane (in pixels), as indicated by equations (1 1 ) and (12), respectively. _ y(c²fv_t-cf²s+csv^ -fs²v_t)

fy-c fy_c+csv_ty-fs y_c

The equations (1 1 ) and (12) essentially provide mapping between a position of an object in the image plane, expressed by the position of the lower boundary of the object v_b and the position of the upper boundary of the object v_t and a size of the object, expressed as the size of the object as 'skewed' parabolic curves.

The equations (1 1 ) and (12) may be written as

Av +BV†+C ,

— — = h_v (14)

-Av_t+E ^v ' Thus, the mapping may be determined for example using a least squares fit to an equation system comprising equations of the form indicated by the equations (13) and/or (14), each equation of the system representing a pair of an observed position of a lower or upper boundary of an object v¾, or v¾, respectively, and a corresponding observed height of the object h_vi in the image plane. Consequently, the fitting involves determining the parameters A, B and C together with D and/or E such that the overall error in the equation system is minimized (by using methods known in the art).

Consequently, in case the reference level of interest is a horizon level in the image plane where the height of an object, representing the reference size, can be assumed to be zero. Hence, if the projected object height h_v in the image plane in the equations (13) and/or (14) is set to zero, once the mapping parameters A, B and C together with D and/or E have been estimate the horizon level may be determined as

B+ lB² - AC

ν» = ^ — ^{· (15)} While a parabolic or a 'skewed' parabolic function may be considered to provide a model that is theoretically more accurate than a linear one, it is more sensitive to errors in observed positions and sizes in the image plane and also requires slightly more complex computation. Hence, a parabolic or a 'skewed' parabolic function may not be applicable to all scenarios. Furthermore, a map- ping function of any order and any kind may be employed without departing from the scope of the present invention

The reference level determination unit 303 is further configured to use the mapping to determine an estimate of a position representing the reference lev- el in the image plane in said sequence of images as a position in the image plane where a size of an object maps to a predetermined reference size. Such predetermined reference size is preferably zero, or a size that maps to zero in consideration of the available image resolution. Consequently, the reference level represents a horizon in the image plane. An estimate of a position or level representing a horizon in the image plane may be useful for example in determination of parameters associated with the imaging device employed to capture the sequence of images and its position and/or orientation with respect to the real-world. As another example, an estimate of a position or level representing a horizon in the image plane may be useful for image analysis, in par- ticular in analysis of objects, their positions and changes thereof in images of the sequence of images.

. In a scenario where the plane in the field of view of the imaging device is not a horizontal one but an upward or a downward slope of essentially constant ascent/descent with respect to the image plane, the reference level determined on basis of a position in the image plane where a size of an object maps to a predetermined reference size may not represent a 'real' horizon in the image plane but rather a virtual horizon with respect to the non-horizontal plane in the field of view of the imaging device.

Alternatively, instead of using a zero-size to determine an estimate of a posi- tion of a horizon in the image plane, a non-zero reference size may be used to determine a reference level different from the horizon level.

As an example, an estimate of a position representing the reference level in the image plane in images of the sequence of images may be expressed, i.e. determined, as a distance from a predetermined reference point in the direc- tion of the v axis of the image plane. As an example, the reference level may be expressed as a distance in number of pixel positions from the origin of the image plane, thereby directly indicating the v axis coordinate v_h of the image plane estimating the position of the reference level, as illustrated by an example in Figure 4. As another example, an estimate of a position representing the reference level in the image plane may be expressed as an angle corresponding to a slope of the mapping function (or the inverse mapping function) together with a second reference size. For example in case of a linear mapping on basis of a function according to the equation (7), a slope of the mapping function may be determined on basis of the parameter a. The corresponding angle φ, which is the angle between the h axis, i.e. the 'size axis', of the example of Figure 4 and the fitted line 402 representing the mapping function, may be determined as

(16)

The angle φ may be used together with a second reference size h_re which may be for example the (estimated) height of the object at a predetermined position of the image, for example at the origin of the image plane or at the bottom of the image href. , or the height of the object at any other suitable posi- tion in the image plane, to indicate an estimate of a position representing the reference level in the image plane. In case of a linear mapping on basis of a function according to the equation (7), the (estimated) height of the object at the origin of the image plane can be rather conveniently obtained by setting the position of the object in the image plane v_b in equation (7) to zero, resulting in the second reference height h_ret = b as the second reference size. Consequently, in case the reference level of interest is a horizon level in the image plane, an estimated of a position representing horizon v_h may be computed as v_h = h_ref tan <p. (17)

In the foregoing, the angle between the fitted line 402 and the h axis (i.e. the 'size axis') were used as parameters descriptive of the estimate of a position representing the reference level. As another example, the angle between the fitted line 402 and the v axis, i.e. β = arctan {h_ref I v_h) = arctan (-a), together with href as the second reference size. Consequently, in case the reference level of interest is a horizon level in the image plane, this may be estimated by v_h = tan β I h_ref.

In case a size parameter different from the (observed) height of a depicted object in the image plane and/or a position parameter different from the (observed) position of a lower boundary of the depicted object in the image plane are employed to determine the mapping, similar considerations with respect to expressing, or determining, the estimate of a position representing the reference level apply.

Determination of an estimate of a position representing the reference level in the image plane in images of the sequence of images described hereinbefore may be applied to determine a single estimate of the reference level position in the image plane. Consequently, the reference level determination unit 303 may be configured to determine a final, or refined, estimate of a position representing the reference level in the image plane on basis of a single estimate of a position representing the reference level.

While the refined estimate of a position representing the reference level may be reliably determined based on a single estimate, the accuracy of the refined estimate may be improved by determining a number of (initial) estimates of a position representing the reference level in the image plane and by deriving the refined estimate. Hence, the reference level determination unit 303 may be configured to determine the refined estimate of a position representing the reference level on basis of one or more (initial) estimates of a position representing the reference level. The size of the real-world object, referred to hereinbefore also as the first size, used as a basis for determination the number of (ini- tial) estimates need not be the same, but a refined estimate of a position representing the reference level in the image plane may be determined on basis of a number of (initial) estimates that are based on real-world objects of different size.

As an example, the reference level determination unit 303 may be configured to determine the refined estimate as an average of two or more estimates of a position representing the reference level in the image plane, for example as an average of two or more estimates of a v axis coordinate v_h in the image plane estimating the position of the reference level or as an average of two or more estimates of the angle φ indicating the position of the reference level together with the second reference size h_ref.

While averaging of a number of v axis coordinates v_h indicating an estimated position of the reference level in the image plane may be realized in a straightforward manner, some further processing is required for averaging of a number of angles φ indicating the position of the reference level together with the se- cond reference size h_ref. In particular, assuming there is an initial estimate of the angle φ and the second reference size h_re one may determine another mapping according to the equations (7) and (8) to find another estimates of the parameters a and b, denoted as Ά_Λ and £>i , based on which one may determine another pair of parameters v_ren = -£>i and φι = arctan (-1 /a-i) estimating the position of the reference level in the image plane at v_M = In order to combine the another estimate with the initial estimate, one may use the another estimate of the position of the reference level v_hu which in this example is the estimated horizon level in the image plane, to find the adjusted angle as φΊ = arctan {v_Mlh_rei). The adjusted angle compensates for the difference between the sizes of the real-world objects - and hence the corresponding sizes thereof as depicted on the image plane - such that a combined angle q>_avg = (φ₀ + φ'ϊ) I 2 may be computed as the arithmetic mean, enabling computation of the combined estimate of the horizon level as v_avg = h_ref ^* tan q>_avg. Any possible further estimates of the parameters a and b may be incorporated into the combined estimate in a similar manner as the another estimate discussed in the foregoing.

The average may be an arithmetic mean or, alternatively, a weighted average may be employed. The weighting may involve making use the fitting error that may be derived as part of a least squares fit applied to a group of equations according to the equations (7) and (8), for example such that a given (initial) estimate of a position representing the reference level is multiplied by a weight that has a value increasing with decreasing value of the respective fitting error.

The reference level estimation unit 305, or the apparatus 300 in general, may be further configured to output an estimate of a position representing the reference level or the refined estimate of a position representing the reference level. The reference level estimation unit 305 may be configured to output a number of (initial) estimates of a position representing the reference level, which may be determined e.g. as part of a process determining a refined estimate. The reference level determination unit 305 may be configured to provide the one or more estimates and/or the refined estimate to another processing unit within or outside the apparatus 305, for example, to facilitate image analysis, determination of parameters associated with the imaging device employed to capture the sequence of images, etc. The operations, procedures and/or functions assigned to the image analysis unit 301 and the reference level determination unit 303 described hereinbefore may be divided between the respective units in a different manner, or there may be further units to perform some of the operations, procedures and/or functions described hereinbefore for the above-mentioned units. On the other hand, the operations, procedures and/or functions the image analysis unit 301 and the reference level determination unit 303 are configured to perform may be assigned to a single processing unit within the apparatus 300 instead. In particular, in accordance with an aspect of the invention, the apparatus 300 may comprise means for obtaining information indicating positions and sizes of two or more objects in an image plane in one or more images of a sequence of images, wherein said two or more objects in the image plane depict a real- world object having a first size, means for determining a mapping between a position of an object in the image plane and a size of the object in the image plane on basis of said positions and sizes of the objects in the image plane in said one or more images, and means for using the mapping to determine an estimate of a position representing the reference level in the image plane in said sequence of images as a position in the image plane where a size of an object maps to a reference size. The apparatus 300 may further comprise means for outputting the estimate of a position representing the reference level in the image plane.

The operations, procedures and/or functions described hereinbefore in context of the apparatus 300 may also be expressed as steps of a method implementing the corresponding operation, procedure and/or function. As an example, Figure 5 provides a flowchart illustrating a method 500. The method 500 may be arranged to estimate a position representing a reference level in a sequence of images. The method 500 comprises obtaining information indicating positions and sizes of two or more objects in an image plane in one or more images of the sequence of images, wherein said sizes of two or more objects in the image plane correspond to a real-world object having a first size, as indicated in step 502. The method 500 further comprises determining a mapping between a position of an object in the image plane and a size of the object in the image plane on basis of said positions and sizes of the objects in the image plane in said one or more images, as indicated in step 504. The method 500 further comprises using the mapping to determine an estimate of a position representing the reference level in the image plane in said sequence of images as a position in the image plane where a size of an object maps to a reference size, as indicated in step 506. The method 500 may further comprise outputting the estimate of a position representing the reference level in the image plane. An estimate, or a refined estimate, of a position representing the reference level in the image plane may be employed in further determination of parameters associated with the imaging device employed to capture the sequence of images and its position and/or orientation with respect to the real-world, e.g. with respect to objects of real-world within the field of view of the imaging de- vice. In particular, an estimate or a refined estimate of a position representing a horizon in the image plane, both of which are referred to as an estimated horizon in the image plane in the following, may be employed.

A z coordinate of the real-world, for example, may be defined as z = az, (1 8) where a is a scaling factor. Furthermore, z may be calculated such that k ^vb ~^vh

= (1 9) and

^ = kv_b— kv_h. (20)

While this may be considered as a linear simplification, since the z coordinate of the real world is parabolically 'skewed', the linear model according to equations (1 9) and (20) results in modeling error small enough for a number of practical applications.

Coordinates of the image plane corresponding to a position in the real world coordinates may be further expressed as

az -s c l az. (y - y_c) + caz where s = sin θ_χ and c = cos θ_χ.

The v coordinate of the image plane may be obtained from wv by dividing it by w, i.e. by normalizing the homogenous coordinates (wv, w): WV _f cy-cy_c+saz

(22)

W * -sy+sy_c+c z

Multiplication of both sides of the equation (22) with its denominator yields

(— sy + sy_c + caz)v = f(cy— cy_c + saz) (23)

The equation (23), in turn, may be employed to formulate linear equations for the positions of the lower and upper boundaries of an object in the image plane v_b and, v_t, respectively, with the further assumption that the depicted real-world object resides on the plane determined by the x and z axes of the real-world coordinate system, implying that y = 0:

+ sy_cv_t + cazv_t— fey + fcy_c = 0

(24) sy_cv_b + cazv_b + fcy_c— fsaz

The equation (24) provides a possibility to substitute the known values with their respective observed values and use for example a QR decomposition or a singular value decomposition (SVD) to solve the rest of the variables of the equations (24).

In case of known positions of the lower boundary v¾, and the upper boundary v_ti of an object in the image plane at least for three objects, a homogenous linear equation system of a format Ax = 0 can be formulated:

= 0 (25)

The equation (25), in turn, may be solved using a SVD to decompose the matrix A into an m-by-n unitary matrix U, an m-by-n diagonal matrix D and an n- by-n unitary matrix V, i.e. into a format:

A = UDV^T (26) such that an estimate of the vector is the last column of matrix V of the equation (26). Note that the expression m-by-n matrix is used herein to indicate a matrix having m rows and n columns. Based on the estimate of the vector x it is possible to estimate a value of parameter β of the equation (1 8) by making use of the identity sin²x + cos²x = 1 : ttfsaXs) + {fc){ca)) (27)

The rest of the parameters of the equation (25), i.e. the tilt angle of the imaging device θ_χ, the scaling factor cr, the focal length of the imaging device and the height of the focal point of an imaging device y_c may be solved using the equations (28) to (31 ).

₍₅₎

Θ_Χ = arcsin(— ) (28)

(ca)

a = (29) β cos θ_χ

(fc) cos θ_χ

f = (30) β

(fcy_c) cos

Vc = (31 ) f

In case of known positions of the lower boundary v¾, and the upper boundary v¾ of an object in the image plane at least for two objects, a known focal length of the imaging device f, and a known estimate of a position v_h representing a horizon in the image plane, the tilt angle of the imaging device θ_χ may be determined:

Θ_Ύ = arctan (32)

Consequently, the remaining variables of the equation(s) (24) having unknown values are the height of the focal point of an imaging device y_c and the scaling factor a. With this information it is possible to determine a linear equation system as a product of a matrix A and a vector x, resulting in a vector b, i.e. as Ax = b.

czv_tl — sfz^' syv_tl + cfy- czv_bl — sfz 0

sv_t2 + cf czv_t2 — sfz syv_t2 + cfy (33) czv_b2 — sfz 0

Note that in the exemplifying linear equation system of the equation (33) the pairs of rows in matrix A and in vector b are repeated as many times as there are observed positions of the lower boundary

and the upper boundary v¾ of an object in the image plane.

A linear system of the format indicated in the equation (33) may be solved for example by using a least squares fit approach. A QR matrix decomposition (as known in the art) may be applied to the m-by-n matrix A to decompose it into a m-by-m unitary matrix Q and to a m-by-n upper triangular matrix R, i.e. into

such that an estimate x of the vector x may be solved using the equation (35) below.

X = R^~1Q^Tb. (36)

As can be seen from the equation (26), the estimated height of the focal point of an imaging device y_c may be obtained as the first element of the vector x, whereas the estimated scaling factor a may be obtained as the second element of the vector x. In case the positions of the lower boundary v_bi and the upper boundary v_ti of an object in the image plane are known at least for three objects, it is also possible to estimate the error involved in the least squares fit approach using the QR decomposition described hereinbefore, for example to enable analysis of reliability of the estimated parameter values obtained as elements of the vector x. The estimated error E_x may found for example using the following equations.

\\b-Ax\\²

MSE = (37)

m—n

S = {A^TA)-¹MSE (38)

E_x = Vdiag(S) (39)

In the equations (37 to 39), m denotes the number of rows in the matrix A and n denotes the number of columns in the matrix A, and diag(S) denotes a vector containing the diagonal elements of the matrix S.

The apparatus 300 may further comprise an imaging parameter estimation unit 305, operatively coupled to the image analysis unit 301 and/or to the reference level estimation unit 303. The imaging parameter estimation unit 305 may be also referred to as an image parameter estimator or parameter estimator.

The imaging parameter estimation unit 305 may be configured to obtain information indicating positions of a number of objects in the image plane in one or more images of the sequence of images and to determine one or more parameters associated with the imaging device employed to capture the sequence of images and its position and/or orientation with respect to the real-world, as described hereinbefore.

As an example, the imaging parameter estimation unit 305 may be configured to obtain information indicating positions of lower and upper boundaries of three or more objects in the image plane and use for example one or more of the equations (25) to (31 ) to solve one or more parameters associated with the imaging device and/or its orientation within its environment in the real-world.

As another example, the imaging parameter estimation unit 305 may be con- figured to obtain information indicating positions of lower and upper boundaries of two or more objects in the image plane and use for example one or more of the equations (32) to (39) to solve one or more parameters associated with the imaging device and/or its orientation within its environment in the real-world.

The apparatus 300 may be implemented as hardware alone, for example as an electric circuit, as a programmable or non-programmable processor, as a microcontroller, etc. The apparatus 300 may have certain aspects implemented as software alone or can be implemented as a combination of hardware and software.

The apparatus 300 may be implemented using instructions that enable hard- ware functionality, for example, by using executable computer program instructions in a general-purpose or special-purpose processor that may be stored on a computer readable storage medium to be executed by such a processor. The apparatus 300 may further comprise a memory as the computer readable storage medium the processor is configured to read from and write to. The memory may store a computer program comprising computer-executable instructions that control the operation of the switch control apparatus 300 when loaded into the processor. The processor is able to load and execute the computer program by reading the computer-executable instructions from memory. While the processor and the memory are hereinbefore referred to as single components, the processor may comprise one or more processors or processing units and the memory may comprise one or more memories or memory units. Consequently, the computer program, comprising one or more sequences of one or more instructions that, when executed by the one or more processors, cause an apparatus to perform steps implementing operations, procedures and/or functions described in context of the apparatus 300.

Reference to a processor or a processing unit should not be understood to encompass only programmable processors, but also dedicated circuits such as field-programmable gate arrays (FPGA), application specific circuits (ASIC), signal processors, etc. Features described in the preceding description may be used in combinations other than the combinations explicitly described. Although functions have been described with reference to certain features, those functions may be performable by other features whether described or not. Alt- hough features have been described with reference to certain embodiments, those features may also be present in other embodiments whether described or not.

Claims

1 . A method for estimating a position representing a reference level in an image plane in a sequence of images, the method comprising obtaining information indicating positions and sizes of two or more objects in the image plane in one or more images of the sequence of images, wherein said sizes of two or more objects in the image plane correspond to a real-world object having a first size, determining a mapping between a position of an object in the image plane and a size of the object in the image plane on basis of said positions and sizes of the objects in the image plane in said one or more images, and using the mapping to determine an estimate of a position representing the reference level in the image plane in said sequence of images as a position in the image plane where a size of an object maps to a reference size.

2. A method according to claim 1 , wherein said two or more objects in the image plane depict the same real-world object in two or more images of the sequence of images.

3. A method according to claim 1 , wherein said two or more objects in the image plane depict two or more real-world objects of essentially identical size in one or more images of the sequence of images.

4. A method according to claim 1 , wherein said two or more objects in the image plane depict at least a first real-world object having the first size and a second real-world object having a second size in one or more images of the sequence of images, wherein the information indicating size of the second object in the image plane is scaled by a scaling factor indicative of the ratio be- tween the first size and the second size.

5. A method according to any of claims 1 to 4, wherein a position of an object in the image plane is determined by a position of a lower boundary of an object or by a position of an upper boundary of an object.

6. A method according to any of claims 1 to 5, wherein a size of an object in the image plane is determined by a height of an object or by a width of an object.

7. A method according to any of claims 1 to 6, wherein the mapping between a position of an object in the image plane and a size of the object in the image plane is determined as a linear function.

8. A method according to claim 7, wherein the linear function is determined by applying a least squares fit for the equation h_v = av_b + b, where h_v represents a height of an object in the image plane, v_b represents a position of a lower boundary of the object in the image plane, and a and b represent mapping parameters to be determined by the least squares fit.

9. A method according to claim 8, wherein the estimate of a position representing the reference level is determined as an angle corresponding to the slope determined by the parameter a and a second reference size at a second reference level in the image plane.

10. A method according to any of claims 1 to 9, wherein the estimate of a po- sition of the reference level is determined as a distance from a second reference level.

1 1 . A method according to claim 9 or10, wherein the second reference level is the bottom of an image.

12. A method according to any of claims 1 to 1 1 , wherein the reference level represents a horizon in the image plane and wherein the reference size is zero or essentially zero.

13. A method according to any of claims 1 to 12, further comprising determining a refined estimate of a position representing the reference level on basis of one or more estimates of a position representing the reference level.

14. A method according to claim 13, wherein the refined estimate is determined as an average of two or more estimates of a position representing the reference level.

15. A computer program comprising one or more sequences of one or more instructions which, when executed by one or more processors, cause an appa- ratus to at least perform the method of any of claims 1 to 14.

16. An apparatus for estimating a position representing a reference level in an image plane in a sequence of images, the apparatus comprising an image analysis unit configured to obtain information indicating positions and sizes of two or more objects in the image plane in one or more imag- es of the sequence of images, wherein said sizes of two or more objects in the image plane correspond to a real-world object having a first size, and a reference level determination unit configured to determine a mapping between a position of an object in the image plane and a size of the object in the image plane on basis of said positions and sizes of the objects in the image plane in said one or more images, and use the mapping to determine an estimate of a position representing a reference level in the image plane in said sequence of images as a position in the image plane where a size of an object maps to a reference size.

17. An apparatus according to claim 16, wherein said two or more objects in the image plane depict the same real-world object in two or more images of the sequence of images.

18. An apparatus according to claim 16, wherein said two or more objects in the image plane depict two or more real-world objects of essentially identical size in one or more images of the sequence of images.

19. An apparatus according to claim 16, wherein said two or more objects in the image plane depict at least a first real-world object having the first size and a second real-world object having a second size in one or more images of the sequence of images, wherein the information indicating size of the second object in the image plane is scaled by a scaling factor indicative of the ratio be- tween the first size and the second size.

20. An apparatus according to any of claims 16 to 1 9, wherein a position of an object in the image plane is determined by a position of a lower boundary of an object or by a position of an upper boundary of an object.

21 . An apparatus according to any of claims 16 to 20, wherein a size of an object in the image plane is determined by a height of an object or by a width of an object.

22. An apparatus according to any of claims 16 to 21 , wherein the mapping between a position of an object in the image plane and a size of the object in the image plane is determined as a linear function.

23. An apparatus according to claim 22, wherein the linear function is deter- mined by applying a least squares fit for the equation h_v = av_b + b, where h_v represents a height of an object in the image plane, v_b represents a position of a lower boundary of the object in the image plane, and a and b represent mapping parameters to be determined by the least squares fit.

24. An apparatus according to any of claims 16 to 23, wherein the estimate of a position representing the reference level is determined as an angle corresponding to the slope determined by the parameter a and a second reference size at a second reference level in the image plane.

25. An apparatus according to any of claims 16 to 24, wherein the estimate of a position representing the reference level is determined as a distance from a second reference level.

26. An apparatus according to claim 25 or 26, wherein the second reference level is the bottom of an image.

27. An apparatus according to any of claims 16 to 26, wherein the reference level represents a horizon in the image plane and wherein the reference size is zero or essentially zero.

28. An apparatus according to any of claims 16 to 27, wherein the reference level determination unit is further configured to determine a refined estimate of a position representing the reference level on basis of one or more estimates of a position representing the reference level.

29. An apparatus according to claim 28, wherein the refined estimate is determined as an average of two or more estimates of a position representing the reference level.