EP1110397A1 - Real-time tracking of an object of interest using a hybrid optical and virtual zooming mechanism - Google Patents

Real-time tracking of an object of interest using a hybrid optical and virtual zooming mechanism

Info

Publication number
EP1110397A1
EP1110397A1 (application EP00954423A)
Authority
EP
European Patent Office
Prior art keywords
interest
camera
image
zooming operation
zoom
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00954423A
Other languages
German (de)
English (en)
French (fr)
Inventor
Eric Cohen-Solal
Mi-Suen Lee
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV filed Critical Koninklijke Philips Electronics NV
Publication of EP1110397A1 publication Critical patent/EP1110397A1/en
Withdrawn legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S3/00Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
    • G01S3/78Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using electromagnetic waves other than radio waves
    • G01S3/782Systems for determining direction or deviation from predetermined direction
    • G01S3/785Systems for determining direction or deviation from predetermined direction using adjustment of orientation of directivity characteristics of a detector or detector system to give a desired condition of signal derived from that detector or detector system
    • G01S3/786Systems for determining direction or deviation from predetermined direction using adjustment of orientation of directivity characteristics of a detector or detector system to give a desired condition of signal derived from that detector or detector system the desired condition being maintained automatically
    • G01S3/7864 T.V. type tracking systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems

Definitions

  • the present invention relates generally to the field of video signal processing, and more particularly to techniques for tracking persons or other objects of interest using a video camera such that a desired video output can be achieved.
  • Tracking a person or other object of interest is an important aspect of video- camera-based systems such as video conferencing systems and video surveillance systems. For example, in a video conferencing system, it is often desirable to frame the head and shoulders of a particular conference participant in the resultant output video signal, while in a video surveillance system, it may be desirable to frame the entire body of, e.g., a person entering or leaving a restricted area monitored by the system.
  • Such systems generally utilize one of two distinct approaches to implement tracking of an object of interest. The first approach uses a pan-tilt-zoom (PTZ) camera that allows the system to position and optically zoom the camera to perform the tracking task.
  • a problem with this approach is that, in some cases, the tracking mechanism is not sufficiently robust to sudden changes in the position of the object of interest. This may be due to the fact that the camera is often being zoomed-in too far to react to the sudden changes. For example, it is not uncommon in a video conferencing system for participants to move within their seats, e.g., to lean forward or backward, or to one side or the other. If a PTZ camera is zoomed-in too far on a particular participant, a relatively small movement of the participant may cause the PTZ camera to lose track of that participant, necessitating zoom-out and re-track operations that will be distracting to a viewer of the resultant output video signal.
  • The second approach is referred to as "virtual zoom" or "electronic zoom."
  • video information from one or more cameras is processed electronically such that the object of interest remains visible in a desired configuration in the output video signal despite the fact that the object may not be centered in the field of view of any particular camera.
  • U.S. Patent No. 5,187,574 discloses an example of such an approach, in which an image of an arriving guest is picked up by a fixed television camera of a surveillance system. The image is processed using detection, extraction and interpolation operations to ensure that the head of the guest is always displayed at the center of the monitor screen.
  • This approach ensures that the video output has a desired form, e.g., is centered on an object of interest, without the need for pan, tilt or zoom operations.
  • this approach can operate with fixed cameras, which are generally significantly less expensive than the above-noted PTZ cameras.
  • this approach fails to provide the output image quality required in many applications. For example, the extraction and interpolation operations associated with virtual zooming will generally result in a decreased resolution and image quality in the resultant output video signal, and therefore may not be suitable for video conferencing or other similar applications.
  • the invention provides methods and apparatus for real-time tracking of an object of interest in a video processing system, using a hybrid combination of (i) optical zooming by a pan-tilt-zoom (PTZ) camera, and (ii) virtual zooming on an image generated by that camera.
  • the object of interest is initially detected in an image generated by the camera.
  • An optical zooming operation then adjusts pan and tilt settings to frame the object of interest, and zooms in on the object of interest until one or more designated stopping criteria are met.
  • a virtual zooming operation processes the resulting optically-zoomed image to identify and extract a particular region of interest, and then interpolates the extracted region of interest to generate a virtually-zoomed image.
  • the designated stopping criteria may indicate, e.g., that the optical zooming continues until the object of interest occupies a fixed or dynamic percentage of the resulting optically-zoomed image.
  • the percentage may vary as a function of a detected quality associated with the object of interest. Examples of such detected qualities include a level of apparent motion, a use of a particular audibly-detectable key word or other cue, and a change in intensity, pitch or other voice quality.
  • the virtual zooming operation may be repeated on the resulting optically-zoomed image, using the same pan, tilt and zoom settings established in the optical zooming operation, if a level of movement of the object of interest exceeds a first designated threshold.
  • the optical zooming operation itself may be repeated in order to establish new pan, tilt and zoom settings for the camera if the level of movement of the object of interest exceeds a second designated threshold higher than the first threshold.
  • the hybrid optical and virtual zoom mechanism of the invention provides a number of significant advantages over conventional approaches. For example, the hybrid mechanism accommodates a certain amount of movement of the object of interest without the need to determine new optical pan, tilt and zoom settings, while also providing a desired output image quality level.
  • the invention ensures that the PTZ camera settings are adjusted less frequently, and the computational load on the system processor is thereby reduced relative to that required by a conventional optical zoom approach.
  • the hybrid mechanism of the invention can provide an improved compression rate for image transmission.
  • Fig. 1 is a block diagram of a video processing system in accordance with an illustrative embodiment of the invention.
  • Fig. 2 is a functional block diagram illustrating hybrid real-time tracking video processing operations implemented in the system of Fig. 1.
  • Fig. 1 shows a video processing system 10 in accordance with an illustrative embodiment of the invention.
  • the system 10 includes a processor 12, a memory 14, an input/output (I/O) device 15 and a controller 16, all connected to communicate over a system bus 17.
  • the system 10 further includes a pan-tilt-zoom (PTZ) camera 18 which is coupled to the controller 16 as shown.
  • the PTZ camera 18 is employed in a video conferencing application in which a table 20 accommodates a number of conference participants 22-1, ..., 22-k, ..., 22-N.
  • In operation, the PTZ camera 18, as directed by the controller 16 in accordance with instructions received from the processor 12, tracks an object of interest, which in this example application corresponds to a particular participant 22-k.
  • the PTZ camera 18 performs this real-time tracking function using a hybrid optical and virtual zooming mechanism to be described in greater detail below in conjunction with Fig. 2.
  • the video processing system 10 can be used in a wide variety of other applications.
  • the portion 24 of the system 10 can be used in video surveillance applications, and in other types of video conferencing applications, e.g., in applications involving congress-like seating arrangements, circular or rectangular table arrangements, etc.
  • the portion 24 of system 10 can be used in any application which can benefit from the improved tracking function provided by a hybrid optical and virtual zoom mechanism.
  • the portion 26 of the system 10 may therefore be replaced with, e.g., other video conferencing arrangements, video surveillance arrangements, or any other arrangement of one or more objects of interest to be tracked using the portion 24 of the system 10.
  • the invention can be used with image capture devices other than PTZ cameras.
  • elements or groups of elements of the system 10 may represent corresponding elements of an otherwise conventional desktop or portable computer, as well as portions or combinations of these and other processing devices.
  • some or all of the functions of the processor 12, controller 16 or other elements of the system 10 may be combined into a single device.
  • one or more of the elements of system 10 may be implemented as an application specific integrated circuit (ASIC) or circuit card to be incorporated into a computer, television, set-top box or other processing device.
  • The term "processor" as used herein is intended to include a microprocessor, central processing unit, microcontroller or any other data processing element that may be utilized in a given data processing device.
  • Fig. 2 is a functional block diagram illustrating a hybrid optical and virtual zoom mechanism 30 implemented in the system 10 of Fig. 1.
  • the hybrid optical and virtual zoom mechanism 30 includes a detection and tracking operation 32, an optical zooming operation 34, and a virtual zooming operation 36. These operations will be described with reference to images 40, 42, 44 and 46 which correspond to images generated for the exemplary video conferencing application in portion 26 of system 10.
  • the operations 32, 34 and 36 may be implemented in system 10 by processor 12 and controller 16, utilizing one or more software programs stored in the memory 14 or accessible via the I/O device 15 from a local or remote storage device.
  • In operation, PTZ camera 18 generates image 40, which includes an object of interest, i.e., video conference participant 22-k, and an additional object, i.e., another participant 22-k+1 adjacent to the object of interest.
  • the image 40 is supplied as a video input to the detection and tracking operation 32, which detects and tracks the object of interest 22-k using well-known conventional detection and tracking techniques.
  • the object of interest 22-k may correspond to the current speaker.
  • the detection and tracking operation 32 may detect and track the object of interest 22-k using techniques such as audio location to determine which conference participant is the current speaker, motion detection to determine which conference participant is talking, gesturing, shaking his or her head, moving in a particular manner, speaking in a particular manner, etc.
  • the object of interest may be a person taking a particular action, e.g., entering or leaving a restricted area or engaging in suspicious behavior, a child moving about in a room of a home, a vehicle entering or leaving a parking garage, etc.
  • the output of the detection and tracking operation 32 includes information identifying the particular object of interest 22-k, which is shown as shaded in the image 42.
  • the particular type of detection and tracking mechanisms used in operation 32 will generally vary depending upon the application. Conventional detection and tracking techniques which may be used in operation 32 include those described in, e.g., C. Wren, A. Azarbayejani, T. Darrell, A. Pentland, "Pfinder: Real-Time Tracking of the Human Body," IEEE Trans. PAMI 19(7):780-785, July 1997; H. Rowley, S. Baluja, T. Kanade, "Rotation Invariant Neural Network-Based Face Detection," Proc. IEEE Conf. on Computer Vision, pp. 38-44, June 1998; and A. Lipton, H. Fujiyoshi, R. Patil, "Moving Target Classification and Tracking from Real-Time Video," Proc. IEEE Workshop on Applications of Computer Vision, pp. 8-14, Oct. 1998.
  • the optical zooming operation 34 of Fig. 2 provides a sufficient amount of zooming to ensure that a desired output image quality can be achieved, while also allowing for a certain amount of movement of the object of interest.
  • the optical zooming operation 34 includes a framing portion with pan and tilt operations for framing the object of interest 22-k, followed by a zooming portion with a zooming operation that continues until designated stopping criteria are satisfied.
  • the following approach can be used to estimate the required amount of pan and tilt in the framing portion of operation 34.
  • the object of interest 22-k is detected in operation 32 as being located at a pixel coordinate position (x, y) in image 42.
  • the framing portion of operation 34 adjusts the pan and tilt of camera 18 such that the object of interest appears in the center (c_x, c_y) of the image.
  • Let Z be the current zoom factor, αP_C the current camera pan angle, and αT_C the current camera tilt angle.
  • The new pan angle αP_N and new tilt angle αT_N are then given by: αP_N = αP_C + D*((x - c_x)/Z) and αT_N = αT_C + D*((y - c_y)/Z), where D is a factor converting a pixel offset into an angular adjustment.
  • Other techniques may also be used to determine the appropriate pan and tilt adjustments for the framing portion of operation 34.
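As an illustrative sketch (not taken from the patent itself), the pan/tilt update for the framing portion can be written directly in code; the function name and the `deg_per_pixel` argument standing in for the conversion factor D are assumptions:

```python
def new_pan_tilt(x, y, cx, cy, pan_c, tilt_c, zoom, deg_per_pixel):
    """Estimate camera pan/tilt angles (in degrees) that move the object
    detected at pixel (x, y) onto the image center (cx, cy).

    deg_per_pixel stands in for the conversion factor D: the angular
    change per pixel of offset at zoom factor 1.  Dividing the pixel
    offset by the zoom factor accounts for the narrower field of view
    when the camera is zoomed in."""
    pan_n = pan_c + deg_per_pixel * (x - cx) / zoom
    tilt_n = tilt_c + deg_per_pixel * (y - cy) / zoom
    return pan_n, tilt_n
```

For example, an object detected 100 pixels right of center at zoom factor 2 requires half the pan correction it would at zoom factor 1, since the same pixel offset spans half the angle.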
  • techniques for determining pan and tilt in the presence of radial distortion of the camera lens will be apparent to those skilled in the art.
  • this portion of operation 34 involves an optical zooming which continues until one or more designated stopping criteria are satisfied.
  • There are a number of different types of stopping criteria which may be used.
  • the optical zooming continues until the object of interest occupies a fixed percentage of the image.
  • the optical zooming may continue until the head of the current speaker occupies between about 25% and 35% of the vertical size of the image.
  • the specific percentages used will vary depending upon the tracking application. The specific percentages suitable for a particular application can be determined in a straightforward manner by those of ordinary skill in the art.
  • the optical zooming again continues until the object of interest reaches a designated percentage of the image, but the percentage in this approach is a function of another detected quality associated with the object of interest.
  • the percentage may vary as a function of qualities such as level of apparent motion, use of particular key words or other audio or speech cues, change in intensity, pitch or other voice quality, etc.
  • the specific percentages and the manner in which they vary based on the detected qualities will generally depend upon the particular tracking application, and can be determined in a straightforward manner by those skilled in the art.
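The fixed-percentage and dynamic stopping criteria described above can be sketched as follows; the function names, the 30% default target (within the 25-35% range mentioned for a speaker's head), and the normalized motion measure are illustrative assumptions, not values from the patent:

```python
def should_stop_zoom(object_height_px, image_height_px,
                     target_fraction=0.30, tolerance=0.05):
    """Fixed-percentage criterion: stop optical zooming once the tracked
    object occupies roughly the target fraction of the image's vertical
    size, within a small tolerance."""
    fraction = object_height_px / image_height_px
    return abs(fraction - target_fraction) <= tolerance


def target_fraction_for(motion_level, base=0.30, slack=0.10):
    """Dynamic criterion: reduce the target fraction (i.e. zoom in less)
    as the object's apparent motion increases, leaving headroom for
    movement.  motion_level is assumed normalized to [0, 1]."""
    clamped = min(max(motion_level, 0.0), 1.0)
    return base - slack * clamped
```

A fidgety speaker (high motion level) thus gets a wider framing, while a still speaker can be framed more tightly without risking loss of track.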
  • the result of the optical zooming operation 34 is an optically-zoomed image 44, in which the object of interest 22-k is centered within the image and occupies a desired percentage of the image as determined based on the above-described fixed or dynamic stopping criteria.
  • the image 44 may be stored by the system 10, e.g., in memory 14.
  • the virtual zooming operation 36 is then performed on the optically-zoomed image 44.
  • This virtual zooming operation first extracts a region of interest from the image 44.
  • a region of interest 47 may be identified as the head and shoulders of the current object of interest 22-k.
  • the region of interest may be the hands, feet, head, body or other designated portion of the object of interest.
  • the identification of the region of interest may be a dynamic process, e.g., it may be selected by an operator based on the current tracking objectives.
  • the region of interest may be identified and extracted using known techniques, e.g., the techniques described in the references cited above in conjunction with detection of the object of interest.
  • the extracted region of interest is then interpolated using well-known image interpolation techniques to generate a video output which includes the virtually-zoomed image 46.
  • the image 46 thus represents a virtual zoom of the optically-zoomed image 44.
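A minimal sketch of the extract-and-interpolate step, operating on a plain 2-D array and using nearest-neighbour sampling for brevity (a real system would use a higher-quality interpolation such as bilinear or bicubic); the function name and the (x, y, w, h) ROI convention are assumptions:

```python
def virtual_zoom(image, roi, out_w, out_h):
    """Extract the region of interest roi = (x, y, w, h) from a 2-D
    image (a list of rows) and interpolate it up to out_w x out_h
    using nearest-neighbour sampling."""
    x, y, w, h = roi
    # Extraction: crop the region of interest out of the source image.
    crop = [row[x:x + w] for row in image[y:y + h]]
    # Interpolation: map each output pixel back to its nearest source pixel.
    return [[crop[j * h // out_h][i * w // out_w]
             for i in range(out_w)] for j in range(out_h)]
```

This is also where the resolution loss noted for pure virtual zooming comes from: the output has more pixels than the extracted region, so detail is interpolated rather than captured, which is why the hybrid mechanism optically zooms first.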
  • the virtual zooming operation 36 may be performed in a different system than the detection and tracking operation 32 and optical zooming operation 34.
  • the image 44 may be compressed and then transmitted from the system 10 via the I/O device 15, with the virtual zooming operation being performed in signal processing elements of a corresponding receiver.
  • the hybrid mechanism 30 allows for a certain amount of movement on the part of the object of interest, while preserving a desired level of image quality in the video output.
  • the virtual zooming operation 36 can be repeated using the same pan, tilt and zoom settings determined in the optical zooming operation 34.
  • the extraction and interpolation operations of the virtual zoom can result in an output image in which the object of interest 22-k remains substantially centered in the image.
  • the hybrid mechanism 30 can incorporate multiple thresholds for determining when the virtual zooming and optical zooming operations should be repeated. For example, if a given amount of movement of the object of interest exceeds a first threshold, the virtual zooming operation 36 may be repeated with the pan, tilt and zoom settings of the camera unchanged. If the given amount of movement exceeds a second, higher threshold, the optical zooming step 34 may be repeated to determine new pan, tilt and zoom settings, and then the virtual zooming operation 36 is repeated to obtain the desired output image 46.
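The two-threshold policy above can be sketched as a simple decision function; the threshold values and the pixel-displacement movement measure are illustrative assumptions:

```python
def next_action(movement, virtual_threshold=5.0, optical_threshold=20.0):
    """Decide which zooming operation, if any, to repeat for a given
    amount of object movement (e.g. pixels of displacement between
    frames).  Movement above the higher threshold forces new optical
    pan/tilt/zoom settings; movement above only the lower threshold
    repeats the virtual zoom with camera settings unchanged."""
    if movement > optical_threshold:
        return "repeat optical zoom"   # re-run operation 34, then 36
    if movement > virtual_threshold:
        return "repeat virtual zoom"   # re-run operation 36 only
    return "no action"
```

Keeping the camera settings fixed for small and moderate movements is what reduces the frequency of PTZ adjustments, and with them the processor load, relative to a purely optical approach.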
  • a feedback path 48 is included between the optical zooming operation 34 and the detection and tracking operation 32 such that the detection and tracking operation can be repeated if necessary, e.g., in the event that the optical zooming operation detects a substantial movement of the object of interest such that it can no longer track that object.
  • the hybrid optical and virtual zoom mechanism of the invention provides a number of significant advantages over conventional approaches.
  • the hybrid mechanism accommodates some movement of the object of interest without the need to determine new optical pan, tilt and zoom settings, while also providing a desired output image quality level.
  • the invention ensures that the PTZ camera settings are adjusted less frequently, and the computational load on the system processor is thereby reduced relative to that required by a conventional optical zoom approach.
  • the hybrid mechanism of the invention can provide an improved compression rate for image transmission.
  • the virtual zoom operation can be performed after an image is transmitted from the system 10 to a receiver via the I/O device 15. Consequently, the object occupies a smaller proportion of the transmitted image than it would under a conventional approach, allowing an improved compression rate.
  • the above-described embodiment of the invention is intended to be illustrative only.
  • the invention can be used to implement real-time tracking of any desired object of interest, and in a wide variety of applications, including video conferencing systems, video surveillance systems, and other camera-based systems.
  • the invention is also applicable to systems with multiple PTZ cameras, and to systems with other types and arrangements of image capture devices.
  • the invention can utilize many different types of techniques to detect and track an object of interest, and to extract and interpolate a region of interest.
  • the invention can also be implemented at least in part in the form of one or more software programs which are stored on an electronic, magnetic or optical storage medium and executed by a processing device, e.g., by the processor 12 of system 10.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Electromagnetism (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Studio Devices (AREA)
  • Lens Barrels (AREA)
EP00954423A 1999-06-29 2000-06-27 Real-time tracking of an object of interest using a hybrid optical and virtual zooming mechanism Withdrawn EP1110397A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US34364699A 1999-06-29 1999-06-29
US343646 1999-06-29
PCT/EP2000/005951 WO2001001685A1 (en) 1999-06-29 2000-06-27 Real-time tracking of an object of interest using a hybrid optical and virtual zooming mechanism

Publications (1)

Publication Number Publication Date
EP1110397A1 true EP1110397A1 (en) 2001-06-27

Family

ID=23346978

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00954423A Withdrawn EP1110397A1 (en) 1999-06-29 2000-06-27 Real-time tracking of an object of interest using a hybrid optical and virtual zooming mechanism

Country Status (4)

Country Link
EP (1) EP1110397A1 (ko)
JP (1) JP2003503910A (ko)
KR (1) KR100711950B1 (ko)
WO (1) WO2001001685A1 (ko)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004095044A1 (en) 2003-04-22 2004-11-04 Philips Intellectual Property & Standards Gmbh Multiscale localization procedure
KR100585822B1 (ko) * 2004-04-26 2006-06-01 Illisis Co., Ltd. Surveillance system using real-time panoramic video images and method of controlling the system
US8803978B2 (en) 2006-05-23 2014-08-12 Microsoft Corporation Computer vision-based object tracking system
CN101534413B (zh) * 2009-04-14 2012-07-04 Huawei Device Co., Ltd. Remote presenting system, device, and method
US8860775B2 (en) 2009-04-14 2014-10-14 Huawei Device Co., Ltd. Remote presenting system, device, and method
CN102611872B (zh) * 2011-01-19 2014-07-02 Ricoh Co., Ltd. Scene image conversion system and method based on dynamic detection of regions of interest
US9100572B2 (en) 2013-05-24 2015-08-04 Xerox Corporation Methods and systems for confidence-based image processing
US11430084B2 (en) 2018-09-05 2022-08-30 Toyota Research Institute, Inc. Systems and methods for saliency-based sampling layer for neural networks
CN112347924A (zh) * 2020-11-06 2021-02-09 Hangzhou Arcvideo Technology Co., Ltd. Improved virtual broadcast-directing method based on face tracking

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5187674A (en) * 1989-12-28 1993-02-16 Honeywell Inc. Versatile, overpressure proof, absolute pressure sensor
JPH0771288B2 (ja) * 1990-08-24 1995-07-31 Kanda Tsushin Kogyo Co., Ltd. Automatic visual-field adjustment method and apparatus
US5200818A (en) * 1991-03-22 1993-04-06 Inbal Neta Video imaging system with interactive windowing capability
US5185667A (en) * 1991-05-13 1993-02-09 Telerobotics International, Inc. Omniview motionless camera orientation system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0101685A1 *

Also Published As

Publication number Publication date
KR20010079719A (ko) 2001-08-22
KR100711950B1 (ko) 2007-05-02
WO2001001685A1 (en) 2001-01-04
JP2003503910A (ja) 2003-01-28

Similar Documents

Publication Publication Date Title
US10339386B2 (en) Unusual event detection in wide-angle video (based on moving object trajectories)
US6766035B1 (en) Method and apparatus for adaptive position determination video conferencing and other applications
US6850265B1 (en) Method and apparatus for tracking moving objects using combined video and audio information in video conferencing and other applications
EP2311256B1 (en) Communication device with peripheral viewing means
US8004565B2 (en) System and method for using motion vectors for object tracking
KR101530255B1 (ko) CCTV system equipped with an automatic object tracking device
JP2004515982A (ja) Method and apparatus for predicting events in video conferencing and other applications
CN108198199B (zh) Moving-object tracking method, moving-object tracking apparatus, and electronic device
EP1739966A1 (en) System for videoconferencing
CN106470313B (zh) Image generation system and image generation method
CN1980384A (zh) Device and method for locking onto and seeking a moving object in space
KR100711950B1 (ko) Real-time tracking of an object of interest using a hybrid optical and virtual zooming apparatus
US20030044083A1 (en) Image processing apparatus, image processing method, and image processing program
US20190158731A1 (en) Method and device for capturing a video with a front camera according to an image of the user captured by a rear camera
US20120075467A1 (en) Image capture device and method for tracking moving object using the same
Huang et al. Networked omnivision arrays for intelligent environment
Zhang et al. Semantic saliency driven camera control for personal remote collaboration
JP2889410B2 (ja) Image recognition apparatus
JP2004128648A (ja) Intruding-object tracking method
US20220198620A1 (en) Camera system and method for determining a viewing frustum
JP3832362B2 (ja) Image processing apparatus, image processing method, and image processing program
JP5398359B2 (ja) Information processing apparatus, imaging apparatus, and control method
JP2001008191A (ja) Apparatus equipped with a person detection function
CN117037271A (zh) Speaker tracking method, system, and storage medium for a conference camera
JPH0568195A (ja) Television door phone

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

17P Request for examination filed

Effective date: 20010704

RBV Designated contracting states (corrected)

Designated state(s): DE ES FR GB IT

17Q First examination report despatched

Effective date: 20080305

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20080716