WO2007072370A2 - Method and apparatus for estimating object speed - Google Patents

Method and apparatus for estimating object speed

Info

Publication number
WO2007072370A2
Authority
WO
WIPO (PCT)
Prior art keywords
motion vector
motion
images
background
vector
Prior art date
Application number
PCT/IB2006/054874
Other languages
French (fr)
Other versions
WO2007072370A3 (en)
Inventor
Harold G.P.H. Benten
Gerd Lanfermann
Original Assignee
Koninklijke Philips Electronics, N.V.
U.S. Philips Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics, N.V. and U.S. Philips Corporation
Publication of WO2007072370A2
Publication of WO2007072370A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/215 Motion-based segmentation
    • H ELECTRICITY
    • H05 ELECTRIC TECHNIQUES NOT OTHERWISE PROVIDED FOR
    • H05B ELECTRIC HEATING; ELECTRIC LIGHT SOURCES NOT OTHERWISE PROVIDED FOR; CIRCUIT ARRANGEMENTS FOR ELECTRIC LIGHT SOURCES, IN GENERAL
    • H05B47/00 Circuit arrangements for operating light sources in general, i.e. where the type of light source is not relevant
    • H05B47/10 Controlling the light source
    • H05B47/175 Controlling the light source by remote control

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

A motion estimation device and method having an image acquisition device or a source of images, and a processor for receiving a plurality of images, determining a motion vector for a background depicted in the plurality of images, and determining a motion vector of an object depicted in the plurality of images determined in part on the motion vector of the background.

Description

METHOD AND APPARATUS FOR ESTIMATING OBJECT SPEED
The present system relates to motion estimation of objects using a video processing method and apparatus that detects the speed and direction of motion of moving objects in a video sequence.
U.S. Patent No. 6,130,706, which issued to Arthur Clifford Hart, Jr., et al. on October 10, 2000, discloses an optical monitoring process in which successive images from one or more cameras are analyzed for monitoring movement of features of a surface over which a vehicle is traveling to determine vehicle dynamic parameters, such as slip angle.
U.S. Patent No. 6,211,912, which issued to Behzad Shahraray on April 3, 2001, discloses a method for automatically detecting scene changes in video image sequences from camera-induced motion resulting from camera operations such as panning, tilting, and zooming.
U.S. Patent Publication No. 2002/0036692A1, published March 28, 2002 by Ryuzo Okada, discloses a method and apparatus for compensating for video shake in which a region of a video image is selected and analyzed to determine a motion vector for each pixel in the selected region. The amount of shake is computed and compensated by analyzing the motion vector. International Publication No. WO 2004/066624A1 published on
August 5, 2004 to Rimmert B. Wittebrood, et al., discloses a selector for selecting a background motion vector of a pixel in an occlusion region of an image.
Techniques for motion estimation have been developed by Koninklijke Philips
Electronics N.V. ("Philips"). Motion estimation techniques developed by Philips are applied in video processing solutions, such as high-end television sets. Motion estimation is a key technology for realizing improved picture quality. A well-established motion estimation algorithm is known as the 3D Recursive Search ("3D-RS") algorithm.
Philips has several real-time implementations of the 3D-RS algorithm available, both in hardware and software. Examples of hardware implementations are found in integrated circuits for video applications, such as TV applications. The software versions execute on different platforms, including Intel Pentium™ processors and Trimedia processors (e.g., Jaguar platform for TV applications featuring Software Natural Motion). The 3D-RS technology is described by Gerard de Haan, Paul Biezen cs, True-Motion Estimation with 3-D Recursive Search Block Matching, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 3, No. 5, October 1993 and Gerard de Haan and Paul Biezen, Sub-pixel motion estimation with 3-D recursive search block-matching, Signal Processing: Image Communication 6, Elsevier 1994, pages 229-239.
The entire disclosure of each of the above-cited patents and publications is hereby incorporated herein by reference in its entirety.
It is known to determine motion vectors for objects in a scene. Known MPEG processors employ this technique so that the processors can compress the amount of information transmitted in video images. In MPEG, an object description may be sent together with a motion vector for how the object moves in a series of video frames. In this way, the object information for each frame does not need to be transmitted. Two methods of image processing for speed estimation of vehicles are known in the prior art. One method employs two cameras separated by a fixed known distance. The license plates of vehicles are recognized at the first and second camera, and the time interval that it takes the recognized vehicle to travel between the cameras is used to calculate the average speed of the vehicle.
In a second known method, several frames (images) taken by one camera are processed. After vehicle recognition is applied to the frames, the position of a certain vehicle is known in at least two images. Since the time interval between the images is also known, the speed of the vehicle can be calculated using a change in the object's position between the images.
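As a hedged illustration of this second prior-art method, the minimal sketch below computes an average speed from two detected positions. The function name, the pixel-coordinate inputs, and the pixels-per-meter factor are assumptions for illustration, not details taken from any cited document.

```python
import math

def speed_from_two_frames(pos1, pos2, dt_s, pixels_per_meter):
    """Average speed of a vehicle detected at pixel positions pos1 and pos2.

    pos1, pos2:       (x, y) positions of the recognized vehicle, in pixels.
    dt_s:             time interval between the two images, in seconds.
    pixels_per_meter: assumed conversion factor at the vehicle's image location.
    """
    distance_px = math.hypot(pos2[0] - pos1[0], pos2[1] - pos1[1])
    return distance_px / pixels_per_meter / dt_s  # meters per second
```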
It is an object of the present system to overcome the disadvantages in the prior art.
The present system in one embodiment involves speed measurement of obstacles, e.g., for a motor vehicle obstacle detection system. The present system measures a global motion vector (background motion) that is related to movement of a capturing device, such as a camera, and a motion vector for an object, such as a car, person, etc., to determine the speed of the object. The speed of the object is determined through sign-based subtraction of the global motion vector from the object motion vector and by compensating for the positioning of the object in the image. The present system provides a method and apparatus for compensating for the global movement, to apply motion estimation to applications including automotive applications such as obstacle recognition and speed measurement. In another aspect of the present system, motion compensation is accomplished using global motion extracted from video material shot by a moving camera. The present system contemplates that the accuracy of motion estimation may be improved by using data from a vehicle in which the camera is positioned. Vehicle data may include vehicle speed, positioning of a steering wheel of the vehicle, vehicle Global Positioning Satellite (GPS) information, etc.
The proposed approach for speed estimation of road traffic is based on image processing, and in one embodiment, video processing. The requirements of such a system include a camera and electronics for performing the image processing. The camera may include a CMOS image sensor. The electronics may be based on off-the-shelf components including a processing unit (e.g., a CPU) suitable for image processing, a dedicated processing unit (e.g., a reduced instruction set computer, RISC), a video image processing unit such as those utilized for MPEG compression calculations, etc.
One advantage as compared to prior art vehicle speed measuring apparatus and methods is the potential for replacement of expensive parts (radar, laser gun, infrared camera, etc.) or difficult-to-install parts (inductive loops in the road) by cheaper, potentially off-the-shelf components.
Compared to other systems based on image processing, the unique feature of the proposed system is the use of existing motion estimation technology, such as 3D-RS. Since the technology for motion estimation is designed for consumer products, and therefore for cost-effectiveness, the implementation is more efficient in terms of processing power requirements (operations per second) than other known solutions. The speed estimation method according to the present system may be implemented as a real-time embedded system inside a camera, without the use of expensive remote computers to perform the video processing.
It should be expressly understood that the drawings are included for illustrative purposes and do not represent the scope of the present system. In the accompanying drawings, like reference numbers in different drawings designate similar elements.
FIG. 1 shows an illustrative speed estimation device according to an embodiment of the present system.
FIGs. 2A-2C illustrate an effect of motion of an image acquisition device on a motion vector of an object in accordance with the present system; particularly, FIG. 2A illustrates the object vector when the image acquisition device is stationary and the object is moving to the right; FIG. 2B illustrates the object vector when both the image acquisition device and the object are moving to the right; and FIG. 2C illustrates a case when the image acquisition device and the moving object are moving in opposite directions.
FIG. 3 is a block diagram of processing actions which illustrate a vector processing method according to the present system.
FIG. 1 shows an illustrative speed estimation device 100 according to an embodiment of the present system. The device has a processor 110 operationally coupled to a memory 120, a display 130, a user input 140, and an image capture device (e.g., camera) 150. The memory 120 may be any type of device for storing application data and image data from the camera 150. The application data is received by the processor 110 for configuring the processor 110 to perform operation acts in accordance with the present system. The operation acts include controlling the camera 150 to capture images of an object 170. The images may be stored in the memory 120. The images are also operated on by the processor 110 for determining a motion vector 175 of the object 170, that is, a motion vector of the object 170 relative to the camera 150, and a motion vector 155 that is a motion vector of the camera 150 relative to background features of the images.
As shown, the camera 150 has a field of view 180 that is sufficient to capture two or more images of the object 170 and background objects (not shown) for determining the motion vectors 155, 175. The user input 140 operates on the processor 110 together with the application to facilitate calibration of the device 100 and for further operation as discussed further herein. The device 100 is operationally coupled to the display 130 to facilitate the user input.
In one embodiment, the camera 150 may also be operable to capture identifying information of the object 170, such as a license plate for an automobile object to facilitate an identification of the object 170. The processor 110 may operate on the images from the camera 150 to recognize details of the object 170 using automated recognizing techniques to further facilitate identification of the object. The recognizing techniques may include object character recognition of identifying characteristics of the object, computer vision and object matching, etc. In one embodiment, the device 100 has a camera 160 to further facilitate identification of the object 170. The camera 160 is shown having a narrower field of view 190 than the camera 150 to facilitate capturing identifying details of the object 170. In another embodiment, the camera 160 may not be present and the camera 150 may simply be controlled by the processor 110 to capture a more detailed image of the object 170, for example by controlling a zoom feature of the camera 150 subsequent to image vectors 155, 175 being determined. In other embodiments, image identification may not be a feature of the device 100.
Operation acts of the processor 110 include determining the motion vectors 155, 175. One illustrative embodiment of the present system employs operation acts of the processor 110 to cause the processor 110 to operate as a motion estimator. Numerous motion estimators may be utilized in accordance with the present system, including video processing techniques utilized for determining motion estimation related to MPEG video compression. In one embodiment, the processor 110 may operate to perform a 3D-Recursive Search (3D-RS) algorithm on the images to determine the motion vectors 155,
175. As stated above, there is no need for object (e.g., vehicle) recognition in order to obtain motion vectors, because an algorithm, such as 3D-RS, may process the complete frame in blocks of pixels (e.g. 8x8 pixels per block). The 3D-RS motion estimation algorithm is very efficient and robust, and may require only 7 to 10 operations per pixel depending on the actual implementation and requirements. The output of the motion estimator (e.g., processor 110) in this illustrative embodiment is one motion vector for each 8x8 block of pixels. Every vector has an x- and y-component, indicating horizontal and vertical motion respectively. The vector value represents the motion measured in pixels or in fractions of pixels, such as in quarter pixels, over the images.
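For illustration, a block-based vector field of this kind can be represented as in the sketch below (Python/NumPy). The frame size, array layout, and quarter-pel storage convention are assumptions consistent with the text, not details from the patent.

```python
import numpy as np

BLOCK = 8                      # 8x8 pixels per block, as in the text
frame_h, frame_w = 576, 720    # assumed PAL-like frame, for illustration only
blocks_y, blocks_x = frame_h // BLOCK, frame_w // BLOCK

# One motion vector per block; [..., 0] is the x-component, [..., 1] the
# y-component, stored here in quarter-pel units (fractions of pixels).
vector_field_qpel = np.zeros((blocks_y, blocks_x, 2), dtype=np.int16)

def to_pixels_per_frame(field_qpel):
    """Convert quarter-pel vector components to pixels per frame."""
    return field_qpel.astype(np.float64) / 4.0
```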
The conversion of motion vectors into actual speeds is executed by the processor performing algebraic operations on the motion vector components, for example taking the median or average motion vector of a group of pixel blocks (e.g., a group of 8x8 pixel blocks) of the object 170. The calculated (e.g., average) x- and y-components are then utilized to calculate the length of the vector in the direction of the motion, which yields the motion vector of the object 170, given by:
veclength = √(vx² + vy²)    (1)
wherein vx and vy are the average or median vector components.
Utilizing a known frame frequency (a speed of acquiring images), such as 25 frames per second (Hz), the speed of the object 170 in pixels per second (pps) may be calculated as follows:
Speed_of_Object_pps = veclength * frame_freq    (2)
The speed in pixels per second is converted into the actual speed in meters per second (mps) by dividing by a conversion factor:
speed_mps = speed_pps / conv_factor    (3)
A speed in meters per second may be expressed in km/h or miles/h for easier interpretation.
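Equations (1) through (3) can be collected into a short sketch (Python/NumPy). The use of the median over the object's blocks and the function and variable names are illustrative assumptions consistent with the text, not the patent's actual implementation.

```python
import numpy as np

def object_speed_mps(object_vectors, frame_freq, conv_factor):
    """Speed of an object from the motion vectors of its pixel blocks.

    object_vectors: (N, 2) array of (vx, vy) per block, in pixels per frame.
    frame_freq:     frame frequency in Hz, e.g. 25.0.
    conv_factor:    pixels per meter at the object's image location.
    """
    v = np.median(object_vectors, axis=0)   # median (or mean) per component
    veclength = np.hypot(v[0], v[1])        # equation (1)
    speed_pps = veclength * frame_freq      # equation (2), pixels per second
    return speed_pps / conv_factor          # equation (3), meters per second

# Multiply the result by 3.6 to express the speed in km/h.
```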
The conversion factor depends on the location of the object in the images. Each location within the image has its own conversion factor. The conversion factor for a specific location within the image may be extracted from information present in the image itself, for example by measuring, in pixels, a known length, size, distance, etc. of other objects positioned within the image. Examples of objects that may be used for this purpose are stripes on the road, the distance between two objects next to the road, the objects themselves, or any other object having known size characteristics.
Once the conversion factor for a specific image location is measured, the conversion factor may be translated to other locations by taking the height and viewing angle of the camera 150 into account to calibrate the camera 150 with the acquired images. Typically, this may require a transformation from the real three-dimensional (3D) world into the 2D image. The mathematics required for the conversion factor transformation may be quite complex, requiring several processing cycles; however, the calibration only has to be performed once, during positioning of the camera 150.
To simplify the calibration, the conversion factor may only be required at positions where the actual speed of the objects is important. For example, the speed estimation of objects may be performed when an object enters a particular portion of the image. This enables the present system to limit the calibration and calculation of conversion factors to just a few, such as one per lane of a road or per portion of the image that is being monitored. In a vehicle application, this may be an image view of objects positioned in front, to the side, to the back, etc. of a vehicle where the image acquisition device is positioned.
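As a sketch of such a calibration at one image location, the road-stripe example and the numbers below are assumptions for illustration:

```python
def conversion_factor(measured_pixels, known_length_m):
    """Pixels-per-meter conversion factor at one image location."""
    return measured_pixels / known_length_m

# A lane stripe known to be 3.0 m long that spans 150 pixels in the image
# gives a factor of 50.0 pixels per meter at that location (assumed values).
conv_factor = conversion_factor(150.0, 3.0)
```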
The accuracy of the speed estimation is determined by the estimation error in the motion vectors ε:
speed_error = (frame_freq · √2 · ε) / conv_factor    (4)
Since the frame frequency and conversion factor are proportional to each other, reducing the frame_frequency does not reduce the speed_error. The speed_error is introduced in the algorithm utilized for estimating the motion vectors (e.g., 3D-RS).
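As an illustrative numerical example (the values are assumed, not from the patent): with frame_freq = 25 Hz, a quarter-pel estimation error ε = 0.25 pixel, and conv_factor = 50 pixels per meter, equation (4) gives speed_error = 25 · √2 · 0.25 / 50 ≈ 0.18 m/s, i.e. roughly 0.6 km/h.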
However, for motion estimation applied to an image sequence produced by a non-stationary camera 150, the motion vector field (motion vectors for the pixel blocks) also reflects the apparent motion of the background caused by movement of the camera 150. In this case, the estimated motion vectors cannot be applied directly for identifying the object speed, because the vectors indicate not just the motion of the object but also the apparent motion of the background relative to the moving camera 150.
Illustratively, the camera 150 may be a video camera, such as a Panasonic NV-DX110EG consumer video camera. The illustrative video camera has a ¼-inch CCD image sensor; an auto-iris lens with an aperture of F1.6; a focal length of 4.0-48 mm; a 12:1 power zoom capability; and a digital interface for interfacing with the processor 110 for controlling operation of the video camera.
For an illustrative embodiment that utilizes the 3D-RS motion estimation algorithm, which operates on luminance (i.e., color information is not used), the 3D-RS motion estimation algorithm performs well with a standard video camera in night-shot mode, with an infrared camera, etc.
This problem with calculating the motion vectors when using a non-stationary camera is illustrated in more detail with reference to FIGs. 2A-2C. A relatively larger arrow in each figure represents a motion vector indicating the motion (speed) of the object 170. The set of motion vectors for each block of pixels across the image is termed a vector field. With reference to FIG. 2A, the vector field is determined as described above utilizing a camera that is stationary. In principle, the vector field may be used for obstacle recognition and speed estimation since the object (obstacle) may be found from the vector field, and the motion vectors from the image reflect the true motion of the object.
FIG. 2B illustrates the object vector when both the camera and the object are moving to the right. In the resulting images, a background motion vector indicates movement to the left, represented by the small arrow pointing to the left, and the speed of the moving object is estimated too low. FIG. 2C illustrates a case when the camera and the moving object are moving in opposite directions. In FIG. 2C, the camera is moving to the left and the object moves to the right. As a result, the apparent background motion is to the right, as represented by the small arrow, and the speed of the moving object is overestimated.
FIG. 3 is a block diagram 300 of processing actions which illustrate a vector processing method according to the present system that overcomes the above-mentioned problems encountered when using a moving camera for speed estimation. The present system uses a global motion vector (motion related to movement of the image acquiring device, e.g., a camera) and a motion vector for an object (e.g., a car, person, etc.) to determine the speed of the object through simple addition/subtraction of the global motion vector from the object motion vector plus, in some embodiments, compensation for where an object is in the image. In one aspect of the present system, a post-processing method is applied on the vector fields generated by the motion estimator. The post-processing method of the present system employs two steps: (1) the elimination of the background vectors from the vector field; and (2) restoring the true motion of the object by applying motion compensation.
The first act includes a comparison of every vector in the vector field to a scaled version of a global motion vector during act 310. The global motion vector may be determined as the most frequently occurring motion vector amongst the vector field. Another approach is to calculate the so-called pan-zoom vector by sampling vectors at certain positions and then calculating an average value. Any vector in the vector field that is close to this pan-zoom vector may be assigned as a background vector. Other ways of discriminating between background and object vectors would be readily appreciated by a person of ordinary skill and should be considered within the scope of the claims that follow. If a motion vector amongst the plurality of blocks of pixels of the vector field resembles the scaled global motion vector closely enough (e.g., the difference between the vector and the scaled global vector is less than a predetermined threshold value, such as five percent of the scaled global motion vector; this threshold is a configurable factor, so that fine-tuning is possible to account for variations in the hardware), the vector is reset to zero during act 320, indicating that this vector is attributable to a background motion vector. This provides a background motion vector that indicates that there is no motion of the background, thereby providing a reference for estimation of the object motion vector.
Act 330 comprises the subtraction of the scaled global motion vector from all vectors that are not classified as background vectors. The subtraction is carried out with regard to positive and negative values (e.g., a signed motion vector may be utilized) to take the direction of motion properly into account. Therefore the subtraction may in fact sometimes behave as an addition, such as when the camera and object are moving in a similar direction. The result of this vector processing is the production of an output vector
(vector_out) which accurately reflects the true motion of the moving object, compensated for any camera motion.
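The two post-processing acts can be sketched as follows (a minimal illustration in Python/NumPy; the mode-based global vector estimate, the five-percent threshold, and the per-block scaling array are assumptions based on the text, not the patent's actual implementation):

```python
import numpy as np

def postprocess_vector_field(vector_field, scale, rel_threshold=0.05):
    """Eliminate background vectors, then motion-compensate the rest.

    vector_field:  (H, W, 2) signed block motion vectors, in pixels/frame.
    scale:         (H, W) per-block scaling factors (all ones if no perspective).
    rel_threshold: configurable closeness criterion, e.g. five percent.
    """
    # Global motion vector: the most frequently occurring vector (act 310).
    flat = vector_field.reshape(-1, 2)
    candidates, counts = np.unique(flat, axis=0, return_counts=True)
    global_mv = candidates[counts.argmax()].astype(np.float64)

    scaled_global = global_mv * scale[..., None]
    diff = np.linalg.norm(vector_field - scaled_global, axis=-1)
    tolerance = rel_threshold * np.linalg.norm(scaled_global, axis=-1)

    out = vector_field.astype(np.float64)
    background = diff <= tolerance
    out[background] = 0.0                           # act 320: zero background vectors
    out[~background] -= scaled_global[~background]  # act 330: signed subtraction
    return out                                      # vector_out
```

Because the subtraction is signed, it behaves as an addition when camera and object move in similar directions, exactly as the text describes.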
In a further embodiment, if there is an undesirable amount of perspective in a scene captured by the camera, a scaling factor may be applied to the motion vectors, including the global motion vector, as discussed above. The scaling factor accounts for the fact that across the images, far away objects move slower than nearby objects, even if their actual speeds are comparable. For this functionality, a grid of conversion factors covering the image area may be previously determined for the translation of motion vectors into real speeds. Application of the foregoing steps on an estimated vector field produces a useful new vector field, particularly in applications using a non-stationary camera.
The present system, using for example the 3D-RS algorithm, will start to estimate the speed of every object as soon as it enters the image. In one embodiment, the proper positions of objects in the image for speed identification are selected as positions that are not too close to the borders of the image and not too far into the background. The algorithm of the present system utilizes some period of time to converge on the correct speed and direction of motion of an object, so the location within the image used for checking object speed should be selected not too close to the borders of the image, where objects such as vehicles are entering or leaving the picture. This helps ensure that objects are present in the images for a sufficient period of time to determine corresponding motion vectors. As mentioned before, the 3D-RS algorithm works on groups of pixels, called blocks. For proper operation, it is required that objects (vehicles) are larger than blocks. Therefore these positions may be selected to be not too far into the background. Another option is to pick smaller groups of pixels for the blocks.
Object recognition (e.g., license plate recognition) may also operate in the system on the same image frames as the motion estimation algorithm. Accordingly, an object may be identified, as discussed above, while continuously monitoring the object's speed.
An illustrative device in accordance with the present system may include a video camera and a processor, such as an Intel Pentium processor, a Trimedia processor, etc., running a 3D-RS motion estimation algorithm. Additionally, the processor may support a user interface on the display 130 to enable installation and calibration of the motion estimation. The processor may be a multi-application processor running within a portable computer. The camera may be located in a remote location and communicate via a wireless connection or a wired network connection to the present speed measuring processor, although no remote computing resources are required.
The present system contemplates that the accuracy of motion estimation may be improved by using data from a vehicle wherein the camera is positioned. Vehicle data may include vehicle speed, positioning of a steering wheel of the vehicle, vehicle Global
Positioning Satellite (GPS) information, etc. that may be received by the processor 110 for correction factors, such as may be attributable to camera positioning, etc.
The present system affords several advantages over the prior art including requiring only one camera and a simple motion estimation algorithm that is robust. The device may be implemented as a real-time embedded system within a camera device. The device may work in real-time and accordingly, there may be no need to store large amounts of data, for example for off-line processing.
The present system may estimate vector fields at resolutions ranging from pixel-wise (or fractions thereof) to block-wise. The present system may be utilized in an obstacle detection system and object collision warning system of an automobile, since the magnitude and sign of the motion vectors are utilized. Objects may be classified as approaching or receding to facilitate accident avoidance systems or an obstacle detection system. Further, the present system may utilize any type of camera, including a video camera having CMOS, CCD, infrared, etc., sensors.
Having described embodiments of the invention with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments, and that various changes and modifications may be effected therein by one having ordinary skill in the art without departing from the scope or spirit as defined in the appended claims.
In interpreting the appended claims, it should be understood that:
a) the word "comprising" does not exclude the presence of other elements or acts than those listed in a given claim;
b) the word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements;
c) any reference signs in the claims do not limit their scope;
d) several "means" may be represented by the same item or hardware or software implemented structure or function;
e) any of the disclosed elements may be comprised of hardware portions (e.g., including discrete and integrated electronic circuitry), software portions (e.g., computer programming), and any combination thereof;
f) hardware portions may be comprised of one or both of analog and digital portions;
g) any of the disclosed devices or portions thereof may be combined together or separated into further portions unless specifically stated otherwise; and
h) no specific sequence of acts or steps is intended to be required unless specifically indicated.

Claims

CLAIMS:
1. A motion estimation device comprising:
an image acquisition device; and
a processor configured to receive a plurality of images from the image acquisition device, configured to determine a motion vector for a background depicted in the plurality of images and configured to determine a motion vector for an object depicted in the plurality of images determined in part on the motion vector of the background.
2. The device of Claim 1, wherein the processor is configured to identify the background depicted in the image as a largest number of occurring motion vectors that are substantially similar amongst a vector field of the image.
3. The device of Claim 2, wherein the processor is configured to subtract each motion vector amongst the vector field from a global motion vector and if the result of any subtraction is less than a predetermined number, that motion vector is set to zero.
4. The device of Claim 1, wherein three-dimensional recursive searching (3D-RS) is performed to determine the motion vectors.
5. The device of Claim 1, wherein the image acquisition device is a camera and the device is implemented as a real-time embedded system inside the camera.
6. The device of Claim 1, wherein the processor is further configured to identify the object in the plurality of images.
7. The device of Claim 6, wherein the processor uses one of character recognition of identifying characteristics of the object, computer vision, and object matching to identify the object.
8. The device of Claim 7, wherein the processor is configured to control a setting of the image acquisition device to identify the object.
9. The device of Claim 1, wherein the device comprises a portion of an obstacle detection system in a vehicle.
10. The device of Claim 1, wherein the processor is configured to divide motion vectors of a motion vector field of the plurality of images by a conversion factor to determine the background motion vector and the motion vector of the object.
11. An application embodied on a computer readable medium, the application configured to determine a motion vector for an object depicted in a plurality of images, the application comprising:
a portion configured to receive number representations of the plurality of images;
a portion configured to segment the plurality of images into pixel blocks;
a portion configured to determine motion vectors for each of the pixel blocks;
a portion configured to identify which of the pixel blocks is attributable to a background; and
a portion configured to determine a motion vector for the background and configured to determine a motion vector for an object determined in part on the motion vector of the background.
12. The application of Claim 11, wherein the portion configured to identify which of the pixel blocks is attributable to the background is configured to identify the background as a largest number of occurring motion vectors that are substantially similar amongst the pixel blocks.
13. The application of Claim 12, wherein the portion configured to determine a motion vector for the background is configured to determine an average motion vector of the motion vectors attributable to the largest number of occurring motion vectors, wherein the average motion vector is the background motion vector.
14. The application of Claim 12, wherein the portion configured to determine a motion vector for the background is configured to determine a median motion vector of the motion vectors attributable to the largest number of occurring motion vectors, wherein the median motion vector is the background motion vector.
15. The application of Claim 11, wherein three-dimensional recursive searching (3D- RS) is performed to determine the motion vectors.
16. The application of Claim 11, comprising a portion configured to identify the object in the plurality of images.
17. The application of Claim 16, comprising a portion configured to control a setting of an image acquisition device that is a source of the plurality of images to identify the object.
18. The application of Claim 11, comprising a portion configured to divide motion vectors of a motion vector field of the plurality of images by a conversion factor to determine the background motion vector and the motion vector of the object.
19. A method for motion estimation of an object depicted in a plurality of images, the method comprising:
receiving a plurality of images;
determining a motion vector for a background depicted in the plurality of images; and
determining a motion vector of an object depicted in the plurality of images determined in part on the motion vector of the background.
20. The method of Claim 19, comprising the act of determining a motion vector for the background as one of an average motion vector of the motion vectors attributable to the largest number of occurring motion vectors, and a median motion vector of the motion vectors attributable to the largest number of occurring motion vectors.
PCT/IB2006/054874 2005-12-20 2006-12-14 Method and apparatus for estimating object speed WO2007072370A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US75209305P 2005-12-20 2005-12-20
US60/752,093 2005-12-20

Publications (2)

Publication Number Publication Date
WO2007072370A2 true WO2007072370A2 (en) 2007-06-28
WO2007072370A3 WO2007072370A3 (en) 2007-12-13

Family

ID=38189054

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2006/054874 WO2007072370A2 (en) 2005-12-20 2006-12-14 Method and apparatus for estimating object speed

Country Status (1)

Country Link
WO (1) WO2007072370A2 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999059116A1 (en) * 1998-05-08 1999-11-18 Primary Image Limited Method and apparatus for detecting motion across a surveillance area

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
DINESH NAIR ET AL: "Moving Obstacle Detection From a Navigating Robot" IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION, IEEE INC, NEW YORK, US, vol. 14, no. 3, June 1998 (1998-06), XP011053299 ISSN: 1042-296X *
ENKELMANN W: "OBSTACLE DETECTION BY EVALUATION OF OPTICAL FLOW FIELDS FROM IMAGE SEQUENCES" IMAGE AND VISION COMPUTING, GUILDFORD, GB, vol. 9, no. 3, June 1991 (1991-06), pages 160-168, XP009033182 ISSN: 0262-8856 *
JACHALSKY J ET AL: "A core for ambient and mobile intelligent imaging applications" MULTIMEDIA AND EXPO, 2003. PROCEEDINGS. 2003 INTERNATIONAL CONFERENCE ON 6-9 JULY 2003, PISCATAWAY, NJ, USA,IEEE, vol. 2, 6 July 2003 (2003-07-06), pages 1-4, XP010650729 ISBN: 0-7803-7965-9 *
KYONGIL YOON ET AL: "Event detection from MPEG video in the compressed domain" PATTERN RECOGNITION, 2000. PROCEEDINGS. 15TH INTERNATIONAL CONFERENCE ON SEPTEMBER 3-7, 2000, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 3 September 2000 (2000-09-03), pages 819-822, XP010533673 ISBN: 0-7695-0750-6 *
TAN ET AL: "Inverse perspective mapping and optic flow: A calibration method and a quantitative analysis" IMAGE AND VISION COMPUTING, GUILDFORD, GB, [Online] vol. 24, no. 2, 28 November 2005 (2005-11-28), pages 153-165, XP005282005 ISSN: 0262-8856 Retrieved from the Internet: URL:http://dx.doi.org/10.1016/j.imavis.2005.09.023> [retrieved on 2007-09-28] *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012128745A (en) * 2010-12-16 2012-07-05 Canon Inc Motion vector detection device, motion vector detection method, correction device and program
WO2013050188A1 (en) * 2011-10-04 2013-04-11 Robert Bosch Gmbh Device and method for the geometric calibration of sensor data formed by means of a vehicle sensor system
CN103843035A (en) * 2011-10-04 2014-06-04 罗伯特·博世有限公司 Device and method for the geometric calibration of sensor data formed by means of a vehicle sensor system

Also Published As

Publication number Publication date
WO2007072370A3 (en) 2007-12-13

Similar Documents

Publication Publication Date Title
US7321386B2 (en) Robust stereo-driven video-based surveillance
US7747075B2 (en) Salient motion detection system, method and program product therefor
CN108111818B (en) Moving target actively perceive method and apparatus based on multiple-camera collaboration
Bensrhair et al. A cooperative approach to vision-based vehicle detection
KR101647370B1 (en) road traffic information management system for g using camera and radar
Borges et al. Practical infrared visual odometry
EP2632160A1 (en) Method and apparatus for image processing
KR101551026B1 (en) Method of tracking vehicle
EP3113108B1 (en) Detection of lens contamination using expected edge trajectories
JP7209115B2 (en) Detection, 3D reconstruction and tracking of multiple rigid objects moving in relatively close proximity
US20090297036A1 (en) Object detection on a pixel plane in a digital image sequence
WO2008020598A1 (en) Subject number detecting device and subject number detecting method
US9098750B2 (en) Gradient estimation apparatus, gradient estimation method, and gradient estimation program
US20210248757A1 (en) Method of detecting moving objects via a moving camera, and related processing system, device and computer-program product
Shukla et al. Speed determination of moving vehicles using Lucas-Kanade algorithm
JP6617150B2 (en) Object detection method and object detection apparatus
Rao et al. Real-time speed estimation of vehicles from uncalibrated view-independent traffic cameras
JP2005217883A (en) Method for detecting flat road area and obstacle by using stereo image
Yang Estimation of vehicle's lateral position via the Lucas-Kanade optical flow method
JP6886136B2 (en) Alignment device, alignment method and computer program for alignment
WO2007072370A2 (en) Method and apparatus for estimating object speed
JP4946897B2 (en) Distance measuring device
Sincan et al. Moving object detection by a mounted moving camera
CN104616320A (en) Method for detecting vehicle in low-altitude aerial video based on gradient inhibition and epipolar constraint
JPWO2020244717A5 (en)

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 06842545

Country of ref document: EP

Kind code of ref document: A2