EP1310102A2 - Utilisation d'un detecteur secondaire pour des communications video optimisees - Google Patents

Utilisation d'un detecteur secondaire pour des communications video optimisees

Info

Publication number
EP1310102A2
Authority
EP
European Patent Office
Prior art keywords
image
video
encoding
parameter
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP01969495A
Other languages
German (de)
English (en)
Inventor
Michael Bakhmutsky
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV filed Critical Koninklijke Philips Electronics NV
Publication of EP1310102A2 publication Critical patent/EP1310102A2/fr
Withdrawn legal-status Critical Current

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167 Position within a video image, e.g. region of interest [ROI]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/37 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability with arrangements for assigning different transmission priorities to video input data or to video coded data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/10 Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from different wavelengths
    • H04N23/11 Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from different wavelengths for generating image signals from visible and infrared light wavelengths
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/30 Transforming light or analogous information into electric information
    • H04N5/33 Transforming infrared radiation

Definitions

  • This invention relates to the field of video communications, and in particular to a method and system that facilitates an optimized transmission of images based on a coupling of a video camera with a secondary sensor, such as a heat sensor mosaic.
  • An MPEG encoding of a stream of images uses a variety of techniques to reduce the amount of data that needs to be transmitted or stored.
  • The term "bandwidth" is used herein to include the amount of encoded data required to either store or transmit video images.
  • A discrete cosine transform (DCT) is used to reduce the size of the encoded information spatially within each image frame, or portion of a frame.
  • Motion estimation techniques are used to reduce the size of the encoded information temporally, based on the amount of difference, or movement, between successive images.
  • Quantization is used to reduce the size of the encoded information based on the degree of detail required, or to reduce the size, and thus the detail, based on available bandwidth.
  • Each of these techniques is intended to optimize the allocation of bandwidth to different characteristics of the image, without introducing noticeable visible anomalies when the received image is decoded and displayed.
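To make the quantization tradeoff concrete, the short sketch below (an editorial illustration, not part of the patent) uniformly quantizes a hypothetical 8x8 block of DCT coefficients with increasingly coarse step sizes; the coefficient values and step sizes are arbitrary assumptions, chosen only to show that a larger step size leaves fewer non-zero coefficients to encode.

```python
import numpy as np

def quantize_block(dct_coeffs: np.ndarray, step: float) -> np.ndarray:
    """Uniformly quantize an 8x8 block of DCT coefficients with the given step size."""
    return np.round(dct_coeffs / step).astype(int)

# Hypothetical 8x8 block of DCT coefficients (values chosen only for illustration).
rng = np.random.default_rng(0)
block = rng.normal(0.0, 20.0, size=(8, 8))
block[0, 0] = 900.0  # large DC term, as is typical for natural images

for step in (4, 16, 64):  # finer to coarser quantization
    q = quantize_block(block, step)
    nonzero = int(np.count_nonzero(q))
    print(f"step={step:3d}: {nonzero:2d} non-zero coefficients to encode")
```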
  • Even with the bandwidth-optimizing techniques of MPEG encoding, some compromises are required for low-bandwidth systems.
  • Video images communicated over the Internet are typically constrained to small image sizes, providing substantially less resolution than a full-resolution DVD version of the same stream of images.
  • Video images communicated for videoconferencing are typically encoded at less than half the frame rate of conventional television broadcasts, producing delayed and discontinuous images on the display.
  • MPEG-4 allows for the separation of an object from its background, and thereby allows the object to be encoded at a different, typically finer, level of detail than the background.
  • This encoding technique is expected to be particularly well suited for videoconferencing, wherein the majority of the limited bandwidth is allocated to the human 'objects' in the scene, with minimal bandwidth being allocated to background scenes. In this manner, although movements in the background may appear staggered and potentially blurred, the human objects in the scene will appear clearly, and potentially at a higher frame rate that reduces delays and discontinuities.
  • Object-dependent encoding techniques are also expected to facilitate graphic art effects, wherein select objects can be encoded with a different emphasis than the background scene, or other objects.
  • These advanced techniques for allocating bandwidth, or providing graphic art effects, to objects of interest in an encoded image require the recognition of each object in the image.
  • Object recognition is a complex processing task that currently requires processing equipment that is beyond the feasible cost range for consumer devices. The high cost and relatively low accuracy of current object recognition devices preclude their use in most applications that could benefit from an optimized encoding, such as video conferencing and Internet video communications.
  • A preferred secondary sensor for detecting animate objects, such as humans in a videoconference scene, is a conventional infrared heat sensor matrix.
  • The secondary image may also be used as a "front end filter" to conventional object recognition applications, thereby increasing the efficiency and accuracy of these applications.
  • Fig. 1 illustrates an example block diagram of an encoding system in accordance with this invention.
  • Fig. 2 illustrates an example camera system in accordance with this invention.
  • Fig. 3 illustrates an example flow diagram of an encoding system in accordance with this invention.
  • FIG. 1 illustrates an example block diagram of an encoding system 100 in accordance with this invention.
  • The encoding system 100 includes a source of a video image 110, a corresponding secondary image 120, and an encoder 150.
  • The term "image" is used herein to define an array of values corresponding to items within a field of view of a collection device.
  • The video image 110 generally corresponds to an array of values associated with the collection of visible light within the field of view of a video camera.
  • This array of values may be in any of a variety of formats, and although represented as an array of values in the figures, may be a serial stream of values.
  • The secondary image 120 in accordance with this invention is not a derivative of the video image 110, but is a representation of substantially the same scene as the video image, collected via an alternative sensor to the sensor that is used to collect the video image 110.
  • The secondary image 120 is a representation of the scene collected via an infrared heat sensor, although other secondary sensing devices may be used as well.
  • The secondary sensor captures a characteristic of the scene that facilitates the recognition of potential objects of interest 101 in the video image 110.
  • An infrared sensor is particularly well suited for detecting animate objects, such as humans, even when the object is fully clothed.
  • The resolution of the secondary image 120 may be different from that of the video image 110.
  • The secondary image 120 may be, for example, a 64x64 array of thermal values, whereas the video image 110 may be a 330x485, or larger, array of luminance and chrominance values.
  • The resolution of the secondary image 120 is selected based on a cost/performance tradeoff.
  • The resolution of the secondary image 120 determines the accuracy of determining the shape of the object of interest 101, and thereby the degree of encoding optimization that can be achieved, but the cost of a sensor that produces a high-resolution image 120 may be substantially higher than the cost of a sensor that produces a low-resolution image 120.
  • Such a high cost may be warranted, for example, in a professional system that is used to identify a newscaster in a scene, and substitutes appropriate background images based on the news content.
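The correspondence implied by this resolution difference can be pictured with a small sketch such as the following, which assumes a simple linear mapping between a 64x64 thermal array and a higher-resolution video frame divided into 16x16-pixel macroblocks; the frame size, function name, and macroblock size are illustrative assumptions, not details taken from the patent.

```python
def thermal_cell_for_macroblock(mb_row: int, mb_col: int,
                                video_size=(480, 640),     # video frame height, width in pixels
                                thermal_size=(64, 64),     # secondary image height, width in cells
                                mb_size=16):
    """Map a video macroblock index to the thermal-array cell covering its center,
    assuming the two fields of view substantially correspond (linear mapping)."""
    v_h, v_w = video_size
    t_h, t_w = thermal_size
    # Center of the macroblock in video-pixel coordinates.
    cy = mb_row * mb_size + mb_size / 2
    cx = mb_col * mb_size + mb_size / 2
    # Scale into thermal-array coordinates and clamp to valid indices.
    ty = min(int(cy * t_h / v_h), t_h - 1)
    tx = min(int(cx * t_w / v_w), t_w - 1)
    return ty, tx

# Example: which thermal cell corresponds to macroblock (10, 20)?
print(thermal_cell_for_macroblock(10, 20))
```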
  • This invention is presented using the paradigm of an identification of potential objects or regions of interest in the video image based on thermal emissions, and an adjustment of the level of detail provided in the encoding of these objects or regions of interest.
  • Other characteristics of a secondary image can be used to control the encoding of the video images, such as an identification of objects based on a particular color.
  • Other encoding parameters, such as brightness, color intensity, frame rate, and so on, may be adjusted in dependence upon the detected characteristics.
  • Any parameter or characteristic that affects the encoding of an image is termed an "encoding parameter" herein.
  • The luminance and chrominance values within regions of lesser interest may be set to a constant value, thereby minimizing the information content that needs to be encoded for these regions.
  • The characteristics of the secondary image 120 are used to control the encoding parameters 160 that are used by the encoder 150 for encoding the video image 110.
  • The object 101 is illustrated in FIG. 1 as being overlaid on the images 110, 120.
  • The sensor regions of the infrared image 120 that are overlaid by the infrared emitting object 101 will have higher sensed values than the surrounding regions.
  • Regions that are partially overlaid by the infrared emitting object 101 will have an average sensed value that is lower than the regions that are overlaid completely by the infrared emitting object 101, but higher than the regions that do not contain a source of infrared emissions. If, as illustrated, a region 121 of the secondary image 120 contains a characteristic (high thermal sensed value) that corresponds to the presence of an animate object (warm body) in that region 121, the encoder 150 encodes the regions 111 of the video image 110 corresponding to this region 121 at a finer level of detail than regions in the secondary image 120 that do not exhibit the presence of a warm body.
  • This level of detail can be changed, for example, by modifying the quantization step size used in the quantization of DCT values in an MPEG encoding.
  • Other encoding parameters 160 may be adjusted in addition to, or in lieu of, the quantization parameter.
  • For example, the perception of a higher frame rate can be achieved by transmitting frames containing the regions of interest more often than frames that contain the other regions.
  • The characteristic of the region 121 may also be one of many parameters that are used to determine the encoding parameters 160.
  • In a copending US patent application, 09/220,292, discussed further below, a "static" block in an image is progressively encoded in finer and finer detail, subject to bandwidth availability, so that any potential "lulls" in bandwidth utilization can be used to improve the picture quality.
  • A preferred combination of this invention with the copending invention would favor the progressively finer encoding of the regions of interest identified by the secondary image 120, rather than potentially less interesting regions. That is, for example, identified regions of interest would be given higher priority for allocation of the available bandwidth, and the regions of less interest would be allocated bandwidth after the interesting regions are rendered at a predefined acceptable level of detail.
  • The secondary image 120 can be used as a "front-end" filter to a conventional object-recognition application.
  • In such an embodiment, the object-recognition application is configured to prioritize the search for potential objects to the areas of interest identified by the characteristics of the regions of the secondary image 120. Similarly, if the object-recognition application is designed to find objects that are known to correspond to a minimum size area relative to the secondary image 120, the search can be restricted to the areas of the secondary image that contain contiguous blocks having the desired characteristic that occupy the minimum size area.
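As an illustration of this front-end-filter behavior, the sketch below (not from the patent) thresholds a hypothetical 2-D array of thermal measures, finds contiguous above-threshold areas with scipy's connected-component labelling, and keeps only those occupying a minimum number of cells; the threshold and minimum-size values are assumptions.

```python
import numpy as np
from scipy import ndimage

def candidate_regions(thermal: np.ndarray, threshold: float, min_cells: int):
    """Return bounding boxes of contiguous above-threshold areas of the secondary
    image that are large enough to be worth passing to an object recognizer."""
    mask = thermal > threshold
    labels, count = ndimage.label(mask)           # connected components of "warm" cells
    boxes = []
    for region in range(1, count + 1):
        cells = np.argwhere(labels == region)
        if len(cells) >= min_cells:               # ignore areas below the minimum size
            (r0, c0), (r1, c1) = cells.min(axis=0), cells.max(axis=0)
            boxes.append((r0, c0, r1, c1))
    return boxes

# An object-recognition search can then be restricted to these boxes
# (after scaling them into video-image coordinates).
thermal = np.zeros((64, 64)); thermal[20:30, 15:25] = 37.0   # hypothetical warm body
print(candidate_regions(thermal, threshold=30.0, min_cells=16))
```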
  • The encoder 150 can encode the individual regions of the video image 110 at a finer level of detail, or, if the encoding directly supports object-dependent encoding, such as an MPEG-4 encoding, the encoder 150 encodes the identified regions as an explicit object, with an associated quantization parameter.
  • The specific details of the encoding and its associated level-of-detail dependencies will be dependent upon the particular encoding scheme employed, and other techniques for optimizing the level of detail based on an identification of an object or region of interest will be evident to one of ordinary skill in the art in view of this disclosure.
  • FIG. 2 illustrates an example camera system 200 in accordance with this invention.
  • The camera system 200 includes a camera 210 for collecting video images (110 in FIG. 1), and a secondary sensor 220 for collecting secondary images (120 in FIG. 1).
  • The field of view 215 of the camera 210 and the field of view 225 of the sensor 220 should substantially correspond.
  • Ideally, the same optic system that is used by the camera to produce the video image 110 would be used to produce the secondary image 120, via a sensor 220 that is integral to the camera 210, so that an exact correspondence can be achieved.
  • An exact correspondence is not required, however.
  • FIG. 2 illustrates a secondary sensor 220 that is adjacent to the camera 210, illustrative of a configuration for a sensor 220 that is provided as an "option" to a conventional video camera 210, or as a removable item on a camera 210 that includes an integral encoder (150 of FIG. 1) in accordance with this invention.
  • The correspondence between the images 110, 120 is substantially linear, as illustrated in FIG. 1.
  • A mapping between the images 110, 120 in regions beyond the substantially corresponding region 275 can be defined in terms of a more complex coordinate transformation, using approximation techniques common in the art.
  • As the zoom setting of the camera 210 is changed, the field of view 215 will contract or expand accordingly. In an ideal embodiment, the change of zoom in the camera 210 will effect a corresponding change of the field of view 225 of the secondary sensor.
  • Alternatively, the field of view 225 may be fixed.
  • In such an embodiment, the field of view 225 is set to a "typical" field, within which objects of interest are likely to appear.
  • The regions of the video image 110 in the field of view 215 of the camera 210 that are beyond the field of view 225 of the secondary sensor 220, because of a zoomed-out setting of the camera 210, are set in this embodiment to a default, coarse level-of-detail setting.
  • Conversely, regions of the secondary image 120 that are beyond the field of view 215 of the camera 210, because of a zoomed-in setting of the camera 210, are ignored, except as necessary to effect the aforementioned interpolation of characteristic values to prevent edge discontinuities.
  • Ancillary methods for improving the correlation between the images 110, 120 may also be used.
  • For example, the appropriate coordinate transformation may be determined by comparing characteristics of the images 110, 120 and using least-square-error curve-fitting techniques, common in the art, to determine the parameters of the coordinate transformation between the images 110, 120.
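One conventional way to realize such a least-square-error fit is to estimate an affine coordinate transformation from a few corresponding points in the two images, as in the sketch below; this is a generic illustration of the technique, not the patent's specific procedure, and the sample points are invented.

```python
import numpy as np

def fit_affine(video_pts: np.ndarray, secondary_pts: np.ndarray) -> np.ndarray:
    """Least-squares fit of an affine map taking video-image coordinates (x, y)
    to secondary-image coordinates, from N >= 3 corresponding point pairs."""
    ones = np.ones((len(video_pts), 1))
    A = np.hstack([video_pts, ones])              # N x 3 design matrix [x, y, 1]
    # Solve A @ M ~= secondary_pts for the 3x2 affine parameter matrix M.
    M, *_ = np.linalg.lstsq(A, secondary_pts, rcond=None)
    return M

def apply_affine(M: np.ndarray, pt) -> np.ndarray:
    x, y = pt
    return np.array([x, y, 1.0]) @ M

# Hypothetical corresponding points (e.g. from matched image characteristics).
video_pts = np.array([[0, 0], [640, 0], [640, 480], [0, 480]], dtype=float)
secondary_pts = np.array([[2, 1], [62, 1], [62, 61], [2, 61]], dtype=float)
M = fit_affine(video_pts, secondary_pts)
print(apply_affine(M, (320, 240)))               # approx. center of the thermal array
```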
  • Thermal imaging arrays are commonly available, and typically provide images (120 in FIG. 1) having 64x64 regions (121); larger and smaller arrays are also available.
  • US patent 6,031,231 "INFRARED FOCAL PLANE ARRAY”, issued 29 February 2000 to Kimata et al, and incorporated by reference herein, provides an overview of two-dimensional infrared focal plane arrays of temperature detecting units that are arranged on semiconductor substrates.
  • US patent 4,868,391 "INFRARED LENS ARRAYS”, issued 19 September 1989 to Antoine Y.
  • Each of the lenses has a common focal point, energizing a single temperature detecting unit.
  • An array of Fresnel lenses is arranged to direct thermal energy to a plurality of temperature detecting units on a semiconductor substrate. The output from the temperature detecting units corresponds to the image 120 of FIG. 1.
  • The fields of view of the individual detecting units within the sensor 220 need not be uniform. That is, for example, in a preferred embodiment of this invention, the Fresnel lenses corresponding to the perimeter regions of the image 120 have a wider field of view than the Fresnel lenses corresponding to the center region of the image 120, because it is likely that objects or regions of interest will generally be located near the center of the video image 110.
  • Alternatively, the sensor 220 may correspond to a conventional infrared camera. In such an embodiment, the infrared camera 220 and the video camera 210 are mounted on a common carrier, and controlled by a common control system. Each of the cameras 210, 220 provides its corresponding images 110, 120 to an encoder 150 for processing as discussed above.
  • The encoder 150 may be located in a device that reads the images 110, 120 directly from the camera 210 and sensor 220, and may be embedded within either the camera 210 or the sensor 220. In like manner, the encoder 150, camera 210, and sensor 220 may be embodied as a single device.
  • The encoder 150 may also be an independent device that acquires the images 110, 120 from recordings or transmissions from the camera 210 and sensor 220.
  • A time-stamp is provided for each image 110, 120, to facilitate synchronization between the video images 110 and the secondary images 120.
  • The frame rates of the camera 210 and the sensor 220 need not be identical, provided only that the secondary images 120 can be substantially correlated in time with the video images 110.
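A minimal sketch of such time-stamp correlation, assuming each frame carries a timestamp in seconds: pairing each video frame with the nearest-in-time secondary image is one straightforward (illustrative) way to tolerate differing frame rates; the frame rates and helper name are assumptions.

```python
import bisect

def nearest_secondary(video_ts: float, secondary_ts: list[float]) -> int:
    """Return the index of the secondary image whose timestamp is closest to video_ts.
    secondary_ts must be sorted in increasing order."""
    i = bisect.bisect_left(secondary_ts, video_ts)
    if i == 0:
        return 0
    if i == len(secondary_ts):
        return len(secondary_ts) - 1
    # Choose whichever neighbour is closer in time.
    return i if secondary_ts[i] - video_ts < video_ts - secondary_ts[i - 1] else i - 1

# Video at ~30 fps, secondary sensor at ~10 fps (hypothetical rates).
video_times = [k / 30.0 for k in range(6)]
secondary_times = [k / 10.0 for k in range(3)]
print([nearest_secondary(t, secondary_times) for t in video_times])
```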
  • FIG. 3 illustrates an example flow diagram of an encoding system in accordance with this invention.
  • This flow diagram is presented with reference to the objects of FIGs. 1 and 2, and in the context of a straightforward MPEG encoding, without the details of alternative embodiments presented above.
  • The invention is not limited to this example.
  • First, a default quantization factor is determined.
  • This default quantization factor corresponds to a quantization step size in a conventional MPEG encoding that produces a relatively coarse level of detail. This default factor may be determined based on available bandwidth, prior image quality, overall complexity or dynamics of prior images, and so on. For convenience, this default quantization factor is allocated to each region of the video image 110, at 330, and then selectively modified, via the loop 340-360, based on the characteristics of the secondary image 120, such as a thermal-object-outline derived from the secondary image 120.
  • Each region 121 of the secondary image 120 is successively processed in the loop 340-360.
  • A simple threshold test, at 345, is used to determine whether each region corresponds to a "region of interest".
  • Each region 121 of the secondary image 120 has an associated characteristic, such as a resistance or a voltage corresponding to the detected heat within the region 121, and a measure of this characteristic is used to determine whether or not the region is a "region of interest". If the measure exceeds the threshold, the quantization factor of the corresponding regions 111 of the video image 110 is adjusted so as to effect an encoding at a finer level of detail, at 350.
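Roughly, the flow just described (default factor at 330, loop 340-360, threshold test at 345, adjustment at 350) could look like the sketch below; for simplicity the loop here runs over video macroblocks rather than over the regions 121 of the secondary image, and the grid sizes, quantization values, and threshold are illustrative assumptions rather than values from the patent.

```python
import numpy as np

def quantizer_map(thermal: np.ndarray, mb_grid=(30, 40),
                  default_q=31, fine_q=8, threshold=30.0) -> np.ndarray:
    """Build a per-macroblock quantization factor map for one video frame.
    Every macroblock starts at a coarse default (330); macroblocks whose
    corresponding thermal cell exceeds the threshold (345) are switched to a
    finer quantizer (350)."""
    rows, cols = mb_grid
    qmap = np.full((rows, cols), default_q, dtype=int)        # step 330
    t_h, t_w = thermal.shape
    for r in range(rows):                                      # loop 340-360 (over video
        for c in range(cols):                                  # macroblocks, for simplicity)
            ty = min(r * t_h // rows, t_h - 1)
            tx = min(c * t_w // cols, t_w - 1)
            if thermal[ty, tx] > threshold:                    # threshold test, 345
                qmap[r, c] = fine_q                            # finer detail, 350
    return qmap

thermal = np.full((64, 64), 22.0)          # ambient temperature everywhere...
thermal[25:40, 20:35] = 36.5               # ...except a hypothetical warm body
qmap = quantizer_map(thermal)
print(np.count_nonzero(qmap == 8), "of", qmap.size, "macroblocks encoded finely")
```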
  • The loop 340-360 may be replaced by a continuous determination of an appropriate quantization factor for each region 111 of the video image 110 based on an interpolation of the measures of each region 121.
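That interpolation-based alternative might be sketched as follows, assuming the thermal measures are bilinearly upsampled to the macroblock grid (here with scipy.ndimage.zoom) and then mapped linearly onto a range of quantization factors; all of these choices are assumptions for illustration.

```python
import numpy as np
from scipy import ndimage

def interpolated_quantizers(thermal: np.ndarray, mb_grid=(30, 40),
                            coarse_q=31, fine_q=8) -> np.ndarray:
    """Continuous per-macroblock quantization factors: the warmer the interpolated
    thermal measure, the finer (smaller) the quantization factor."""
    zoom = (mb_grid[0] / thermal.shape[0], mb_grid[1] / thermal.shape[1])
    upsampled = ndimage.zoom(thermal, zoom, order=1)           # bilinear interpolation
    # Normalize measures to [0, 1], then blend between coarse and fine quantizers.
    lo, hi = upsampled.min(), upsampled.max()
    weight = (upsampled - lo) / (hi - lo) if hi > lo else np.zeros_like(upsampled)
    return np.round(coarse_q - weight * (coarse_q - fine_q)).astype(int)

thermal = np.full((64, 64), 22.0); thermal[30:36, 30:36] = 36.5
print(interpolated_quantizers(thermal)[15, 20])   # finer factor near the warm area
```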
  • The loop 340-360 may also be replaced or augmented by a fuzzy logic system, as discussed in US patent 5,475,433, or by a progressive approach, as discussed in the copending application 09/220,292 mentioned above.
  • The loop 340-360 may likewise be replaced by a conventional object-recognition system that uses the measures of the characteristics of the image 120 to facilitate an efficient object search, also discussed above.
  • The video image 110 is then encoded, using the quantization factors determined above based on the secondary image 120.
  • The encoding and quantization factors may also be dependent on other parameters, such as available bandwidth, degree of complexity and movement, and so on, using techniques common in the art, or as disclosed in the copending US patent application 09/220,292.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The invention concerns a secondary sensor that detects the same scene as a video camera. The image from the secondary sensor is used to identify areas of the video image that correspond to objects of interest. The identified areas of interest can then be encoded at a finer level of detail than the other areas of the video image. A conventional matrix of infrared heat detectors is a preferred secondary sensor for detecting animate objects, such as human beings, in a videoconference scene. By encoding the areas of the video image that correspond to ambient-temperature areas of the heat-detector matrix at a generally lower level of detail, the available bandwidth can be used to transmit the higher-temperature areas at a finer level of detail or at a higher frame rate. The secondary image can also serve as a "front-end filter" for conventional object-recognition applications, thereby increasing the efficiency and accuracy of those applications.
EP01969495A 2000-08-08 2001-07-23 Utilisation d'un detecteur secondaire pour des communications video optimisees Withdrawn EP1310102A2 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US63468200A 2000-08-08 2000-08-08
US634682 2000-08-08
PCT/EP2001/008538 WO2002013535A2 (fr) 2000-08-08 2001-07-23 Utilisation d'un detecteur secondaire pour des communications video optimisees

Publications (1)

Publication Number Publication Date
EP1310102A2 (fr) 2003-05-14

Family

ID=24544796

Family Applications (1)

Application Number Title Priority Date Filing Date
EP01969495A Withdrawn EP1310102A2 (fr) 2000-08-08 2001-07-23 Utilisation d'un detecteur secondaire pour des communications video optimisees

Country Status (5)

Country Link
EP (1) EP1310102A2 (fr)
JP (1) JP2004506354A (fr)
KR (1) KR20020064794A (fr)
CN (1) CN1393111A (fr)
WO (1) WO2002013535A2 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5535626B2 (ja) * 2006-07-28 2014-07-02 Koninklijke Philips N.V. Private screens automatically distributed along a shop window
US8994845B2 (en) 2012-04-27 2015-03-31 Blackberry Limited System and method of adjusting a camera based on image data
EP2658245B1 (fr) * 2012-04-27 2016-04-13 BlackBerry Limited Système et procédé de réglage de données d'image de caméra
EP3477941B1 (fr) * 2017-10-27 2020-11-25 Axis AB Procédé et dispositif de commande pour commander une unité de traitement vidéo basée sur la détection de nouveaux arrivants dans un premier environnement

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5764803A (en) * 1996-04-03 1998-06-09 Lucent Technologies Inc. Motion-adaptive modelling of scene content for very low bit rate model-assisted coding of video sequences
AUPP340798A0 (en) * 1998-05-07 1998-05-28 Canon Kabushiki Kaisha Automated video interpretation system
US6496607B1 (en) * 1998-06-26 2002-12-17 Sarnoff Corporation Method and apparatus for region-based allocation of processing resources and control of input image formation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0213535A2 *

Also Published As

Publication number Publication date
JP2004506354A (ja) 2004-02-26
WO2002013535A2 (fr) 2002-02-14
CN1393111A (zh) 2003-01-22
WO2002013535A3 (fr) 2002-06-13
KR20020064794A (ko) 2002-08-09

Similar Documents

Publication Publication Date Title
US8605185B2 (en) Capture of video with motion-speed determination and variable capture rate
US20080129857A1 (en) Method And Camera With Multiple Resolution
US5751378A (en) Scene change detector for digital video
EP1431912B1 (fr) Méthode et système pour déterminer une region d'importance dans une image archivée
US6961083B2 (en) Concurrent dual pipeline for acquisition, processing and transmission of digital video and high resolution digital still photographs
US20100141770A1 (en) Imaging apparatus and imaging method
EP0725536A2 (fr) Procédé et appareil pour la prise de vue avec expansion de la gamme dynamique
US6873727B2 (en) System for setting image characteristics using embedded camera tag information
US7889265B2 (en) Imaging apparatus, control method for the imaging apparatus, and storage medium storing computer program which causes a computer to execute the control method for the imaging apparatus
US20090290645A1 (en) System and Method for Using Coded Data From a Video Source to Compress a Media Signal
CN1212018C (zh) 和摄像机组合的视频记录/再现装置及其记录控制方法
TW200939779A (en) Intelligent high resolution video system
WO2008107713A1 (fr) Capture contrôlée de sous-image haute résolution avec flux de données vidéo de référence à champ de visée global haute vitesse multiplexé dans le domaine temporel pour des applications biométriques basées sur des images
JPH05191718A (ja) 撮像装置
CN106888355B (zh) 比特率控制器和用于限制输出比特率的方法
US8120675B2 (en) Moving image recording/playback device
EP1310102A2 (fr) Utilisation d'un detecteur secondaire pour des communications video optimisees
JP2009284208A (ja) 動画像符号化装置及び動画像記録装置
CN110557532B (zh) 摄像装置、客户端装置和控制方法
CN112788364B (zh) 码流动态调整装置、方法及计算机可读存储介质
JPH03230691A (ja) ディジタル電子スチルカメラ
JP2002300607A (ja) 符号化装置及び復号化装置
KR100457302B1 (ko) 영상 처리를 이용한 다채널 자동 트랙킹 및 자동 줌 방법
JP2005109757A (ja) 画像撮像装置、画像処理装置、画像撮像方法、及びプログラム
US7483059B2 (en) Systems and methods for sampling an image sensor

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20030310

AK Designated contracting states

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20040426