WO2002073517A1 - Image processing devices and methods - Google Patents

Image processing devices and methods

Info

Publication number: WO2002073517A1
Application number: PCT/EP2002/002791
Authority: WIPO (PCT)
Prior art keywords: image data; user; head; current user; devices
Other languages: German (de); English (en)
Inventor: Wes Bell
Original Assignee: Voxar Ag
Application filed by Voxar Ag
Priority date: 2001-03-13
Filing date: 2002-03-13
Publication of WO2002073517A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals

Definitions

  • The present invention relates to image processing devices and methods.
  • The invention finds application in the field of so-called videophones or video telephones and in video conference systems.
  • Video communication devices, systems and methods, including those for video telephones and video conferences, which in addition to the auditory channel also offer a visual medium or channel for the transmission of sound and image information, are known, but have not yet found wide use among the general public.
  • A major disadvantage of the prior art is that the associated transmission of image information to at least one other communication participant often tends to interfere with the privacy of the user, regardless of whether the user and/or his communication partner actually wishes certain visual information to be transmitted.
  • Ideally, the communication participants would prefer to transmit an optimal “desired appearance” tailored to the respective communication partner. This includes not only a suitable background, but also suitable clothing and an otherwise advantageous appearance.
  • The invention builds on the existence of audiovisual communication media.
  • General features of audiovisual communication media are a microphone and loudspeaker, a video camera and screen, a control unit, an output processing unit for processing outgoing audio and video signals, an input processing unit for processing incoming audio and video signals, and preferably a compression unit for optimal use of the available line bandwidth, e.g. over analog and digital telephone networks, packet-switched communication via the Internet, internal computer networks, etc.
  • The invention relates to image processing devices, for example in telecommunication, video communication or video conference devices, with user image data input devices for entering current user image data, image data editing devices for generating edited user image data from the current user image data, and image data output devices for outputting the edited user image data, for example to at least one other communication participant.
  • The present invention is likewise based on an image processing method, e.g. a tele- or video communication method, in which current user image data are input into user image data input devices, image data editing devices generate edited user image data from the current user image data, and the edited user image data are finally output by means of image data output devices.
  • It is likewise based on a correspondingly equipped and functioning telecommunication or video communication system.
  • The present invention relates to possibilities for determining the face position, as part of the design of the image data editing devices and the generation of edited user image data, particularly and preferably in videophone and video conferencing applications, but also in other comparable applications. Additional features of the devices and methods of this invention are disclosed below.
  • The head or face of a video communication participant is of particular, though possibly not exclusive, interest. It is therefore necessary that the image data editing devices or the editing method perform or include a detection or calculation of the location or position of a head or face. Such face position detection is an important step in many applications, such as facial recognition to identify a person.
  • The present invention therefore aims to provide devices and methods for simple and uncomplicated face position detection.
  • The invention thus provides image processing devices with user image data input devices for entering current user image data, image data editing devices for generating edited user image data from the current user image data, and image data output devices for outputting edited user image data, wherein the image data editing devices contain image component localization devices for capturing the position of head or face data within the current user image data, in order to enable separate editing of the face data within the current user image data. It is further provided that the image component localization devices contain a memory for all or part of the current user image data, as well as processing devices that determine the position of head or face data within the current user image data or the stored part thereof, so that the position of the head or face data is available for further processing within the image data editing devices.
  • The invention also provides an image processing method in which current user image data are input into user image data input devices, image data editing devices generate edited user image data from the current user image data, and the edited user image data are finally output using image data output devices, wherein the image data editing devices detect the position of head or face data within the current user image data or a part thereof, in order to enable separate editing of the head or face data within the current user image data or the part thereof. It is further provided that the current user image data or the part thereof is searched for a structure from which the position of the head or face data within the current user image data or the part thereof results, and that this position is used for further processing within the image data editing devices.
  • The invention takes advantage of the special circumstances of and in video communication, such as a video conference or a videophone connection. It has been recognized that specific problems of the prior art are much easier to solve, or do not arise at all, in video communication as the specific target application.
  • The key idea is to take advantage of an additional structure that is produced by the upper half of a user's body, such as the body of a video conference participant.
  • The new technology enables a very effective implementation that easily detects a face in real time.
  • Applications for face detection in this target application are graphic effects for image processing, but also automatic face recognition.
  • The present invention can work with the entire head and therefore does not depend on specific rotation angles of the neck or body.
  • The present invention operates in real time using a particular face detection algorithm.
  • The present invention is a simple approach that works well for constrained scenes. This approach has proven to be an effective solution not only for desktop video conferencing or communication applications, but also for the emerging market of handheld video conferencing or communication devices and other similar applications. In all tested applications, cheap desktop video cameras, including so-called “web cams”, were used as user image data input devices for entering current user image data, or a digital camera card in connection with a specific hand-held device.
  • The image data editing devices, and in particular the image component localization devices, are designed to separate person image data from the entirety of the current user image data.
  • The image component localization devices can be designed to detect a structure within the current user image data or, where applicable, within the separated person image data.
  • Such a structure can be a top or edge or an uppermost point of the user's head, at least one left or right (preferably one left and one right) neck point or neck area, and/or at least one left or right (preferably one left and one right) maximum lateral extent of the user's head within the current user image data or, where applicable, the separated person image data.
  • The structure can further comprise edge information of the user's head following the top or edge or uppermost point of the user's head, and a circle fit based on the top or edge or uppermost point of the user's head and this further edge information. The further edge information of the user's head preferably relates only to an area reaching from the top or edge or uppermost point of the user's head down to about 10 to 40%, in particular 20 to 35% and preferably 30%, below the top or edge or uppermost point of the user's head. The circle fit, too, can preferably be restricted, within the current user image data or, where applicable, the separated person image data, to the region from the top or edge or uppermost point of the user's head down to about 10 to 40%, in particular 20 to 35% and preferably 30%, below it.
  • A particularly good performance of the image processing devices according to the invention is obtained if the user image data input devices are designed to capture a color image of the user and his background, and if the image data editing devices, and in particular the image component localization devices, are designed to convert the color image data of the current user image data or, where applicable, at least of the person image data into grayscale image data in order to determine the position of head or face data within the current user image data or the stored part thereof (a grayscale-conversion sketch follows this list).
  • In the image processing method according to the invention, it can preferably also be provided that person image data are separated from the entirety of the current user image data.
  • A structure can then be detected within the current user image data or, where applicable, within the separated person image data.
  • This structure can in particular be a top or edge or an uppermost point of the user's head, at least one left or right (preferably one left and one right) neck point or neck area, and/or at least one left or right (preferably one left and one right) maximum lateral extent of the user's head within the current user image data or, where applicable, the separated person image data.
  • If the structure contains further edge information of the user's head following the top or edge or uppermost point of the user's head, and a circle fit based on the top or edge or uppermost point of the user's head and this further edge information, the further edge information can range from the top or edge or uppermost point of the user's head down to only about 10 to 40%, in particular 20 to 35% and preferably 30%, below the top or edge or uppermost point of the user's head. The circle fit within the current user image data or, where applicable, the separated person image data can likewise be restricted to the region from the top or edge or uppermost point of the user's head down to about 10 to 40%, in particular 20 to 35% and preferably 30%, below it. To determine the position of head or face data, the color image data within the current user image data or, where applicable, the separated person image data can be converted into grayscale image data.
  • Detection. There are two different types of detection in the present invention, each based on a corresponding algorithm, in particular a face detection algorithm. Both run in real time with a video feed or, more generally, with input of current user image data by means of user image data input devices, such as a suitably equipped cheap desktop camera or a hand-held video camera.
  • Although a color image of the user and his background is preferably recorded, it is advantageous if the image data editing devices convert the color image data of the current user image data, or at least the color image data of the relevant person image data, into simpler and, above all, less extensive grayscale image data.
  • For the localization of the user's head or face within the user or person image data, processing of the grayscale image data is sufficient. Further edits, such as replacements or cosmetic edits (see PCT/DE 00/00442), can then be carried out on the color image data in the area of the localized head or face of the user, and ultimately lead to the output of edited color image data to the image data output devices.
  • The image data editing devices include image component localization devices and image component editing devices. Whether the conversion of color image data into grayscale image data takes place in the image component localization devices themselves or elsewhere in the image data editing devices is not relevant and can be designed in an optimized manner according to the technical equipment of the respective devices. Depending on the data format, the neck area detection, like any other type of detection, is applied to grayscale image data, but it can also be applied directly to the color image data.
  • Both the color image data of the current user image data and any grayscale image data generated therefrom are available in a pixel format.
  • The present invention can, however, also be used without restriction if this image data is available from the user image data input devices in another, for example vector-oriented or vector-based, format.
  • The first type of detection is the neck area detection.
  • The image data editing devices are designed to use this type of detection to scan the input current user image data or, more precisely, the currently separated person image data (the separation having already taken place in the image data editing devices) on the basis of the given gray level data; a sketch of this scanning follows the list below.
  • A current video frame, which contains the current snapshot of the person of the user in the form of the person image data and forms or represents the input data of the image component localization devices, is scanned line by line from top to bottom until a non-zero pixel (a shape pixel) is found. Since the video frame contains no further data apart from the separated person image data, all pixels or storage locations, with the exception of those belonging to the separated person image data, are zero.
  • Such scanning or browsing is well known in the art and, in terms of apparatus and method, is done in a memory in which the video frame is held and made available for editing.
  • Each memory cell or memory location of the memory is checked with regard to its contents, and in the present case the "coordinates" or the exact location that contains the information corresponding to said first non-zero pixel are identified.
  • The corresponding work is carried out by processing devices, such as a processor.
  • This first non-zero pixel, found when scanning, identifies the top or edge or the uppermost point of the user's head within the separated person image data.
  • Next, scanning proceeds column by column from left to right, starting from the left side of the video frame, with each column scanned from the bottom of the video frame up to a level 30% below the previously identified top point of the user's head.
  • The entire area is scanned while keeping track of the x-scan minimum and maximum, i.e. the minimum and maximum coordinates in the x direction (in the present example from left to right in the video frame) at which non-zero pixels were found.
  • The minimum x value is the left side of the user's neck.
  • The maximum x value above the neck scan line is the leftmost side of the user's head.
  • In a second step this scan is repeated from right to left, column by column, starting from the right side of the video frame and again reaching from the bottom up to the level 30% below the previously identified top point of the user's head.
  • The entire area is scanned while keeping track of the -x scan minimum and maximum, i.e. the minimum and maximum coordinates in the -x direction (in the present example, the positive x direction still runs from left to right in the video frame) at which non-zero pixels were found.
  • The minimum -x value is the right side of the user's neck.
  • The maximum -x value above the minimum -x value (or, in other words, above the neck scan line) is the rightmost side of the user's head. This enables the rightmost side of the neck and head of the user to be located.
  • the "head top contour” detection starts just like the “neck region” detection.
  • the first step is to line up the input data in the current video frame
  • the top or edge or top of the head is used and the best fit circle is calculated using the stored shape boundary pixels as estimates of the circumference of this best fit circle. Note that the resulting best-fit circle need not pass through any of the stored shape boundary pixels, but it is the circle that minimizes the radius measurement error.
  • the height of the circle is increased by a value known as the elongation factor.
  • the left side of the user head is located by scanning or scanning from left to right, from top to bottom within the area bounded in the vertical direction, multiplied by the top or edge or the top point of the user head and the elongation factor with the calculated circle radius.
  • The invention finds use particularly in the field of video communication and conferencing applications.
  • This information is used for face detection and recognition, or this data serves as constraints for facial features which, considered as input, can be used to hide, alter, or replace the user's head or face. Both of these applications have been implemented and tested in a real-time video communication or conferencing application.
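
The grayscale conversion used throughout the detection pipeline can be illustrated with a short sketch. The patent does not specify a conversion formula, so the ITU-R BT.601 luma weights and all names below are assumptions for illustration, not the method of the patent.

```python
import numpy as np

def to_grayscale(rgb_frame: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB frame to 8-bit grayscale.

    The text only states that color image data are converted into
    grayscale image data; the BT.601 luma weights used here are an
    assumption, not taken from the patent.
    """
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = rgb_frame[..., :3].astype(np.float32) @ weights
    return gray.astype(np.uint8)
```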
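
The neck area detection described above can be sketched as follows. The per-column scan is expressed here with vectorized NumPy operations that yield the same extrema as the described left-to-right and right-to-left scans; the function name, the reference span of the 30% level, and the returned fields are assumptions for illustration.

```python
import numpy as np

def neck_region_detection(frame: np.ndarray):
    """Sketch of the neck area detection on a segmented frame.

    `frame` is a 2-D grayscale array in which, as described in the
    text, every pixel outside the separated person image data is zero.
    """
    h, w = frame.shape
    person = frame > 0

    # Step 1: scan line by line from top to bottom; the first row
    # containing a non-zero (shape) pixel is the top of the head.
    rows = np.flatnonzero(person.any(axis=1))
    if rows.size == 0:
        return None  # no person pixels in this frame
    top_y = int(rows[0])

    # Level 30% below the top of the head (here measured over the
    # image height below the head top, an assumed reference span).
    neck_level = top_y + int(0.30 * (h - top_y))

    head_band = person[top_y:neck_level, :]  # head region
    neck_band = person[neck_level:, :]       # neck/shoulder region

    head_cols = np.flatnonzero(head_band.any(axis=0))
    neck_cols = np.flatnonzero(neck_band.any(axis=0))
    if head_cols.size == 0 or neck_cols.size == 0:
        return None

    return {
        "top_of_head_y": top_y,
        "neck_left_x": int(neck_cols.min()),   # minimum x below the level
        "neck_right_x": int(neck_cols.max()),  # minimum -x from the right
        "head_left_x": int(head_cols.min()),   # leftmost side of the head
        "head_right_x": int(head_cols.max()),  # rightmost side of the head
    }
```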
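
The best-fit circle of the "head top contour" detection can be sketched with a standard least-squares circle fit. The text names only a circle that minimizes the radius measurement error; the algebraic (Kåsa) fit below is a common stand-in for that criterion, and the default elongation factor of 1.2 is an assumed example value, not taken from the patent.

```python
import numpy as np

def fit_head_circle(boundary_xy: np.ndarray, elongation: float = 1.2):
    """Fit a circle to stored shape boundary pixels and stretch its
    height by an elongation factor.

    `boundary_xy` is an N x 2 array of (x, y) boundary pixels collected
    from the top of the head down to roughly 30% below it.
    """
    x = boundary_xy[:, 0].astype(float)
    y = boundary_xy[:, 1].astype(float)

    # Kasa algebraic fit: solve x^2 + y^2 + a*x + b*y + c = 0
    # for (a, b, c) in the least-squares sense.
    A = np.column_stack([x, y, np.ones_like(x)])
    rhs = -(x ** 2 + y ** 2)
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)

    cx, cy = -a / 2.0, -b / 2.0                     # circle center
    radius = float(np.sqrt(cx ** 2 + cy ** 2 - c))  # circle radius

    # The vertical search area for the head sides is bounded by the
    # circle height multiplied by the elongation factor.
    head_height = 2.0 * radius * elongation
    return (float(cx), float(cy)), radius, head_height
```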

Abstract

The invention relates to image processing devices comprising user image data input devices for the input of current user image data, image data editing devices for generating edited user image data from the current user image data, and image data output devices for outputting the edited user image data, characterized in that the image data editing devices contain image component localization devices for detecting the position of head or face data within the current user image data, in order to enable separate editing of the face data within the current user image data. The invention is further characterized in that the image component localization devices contain a memory for all or part of the current user image data, as well as processing devices that determine the position of the head or face data within the current user image data or the stored part thereof, so that the position of the head or face data is available for further processing within the image data editing devices. The invention further relates to an image processing method characterized in that current user image data are input into user image data input devices, in that the image data editing devices generate edited user image data from the current user image data, and in that the edited user image data are finally output by means of image data output devices, wherein a detection of the position of head or face data within the current user image data or a part thereof is carried out by means of the image data editing devices, in order to enable separate editing of the head or face data within the current user image data or a part thereof. It is further provided that the current user image data or a part thereof is searched for a structure from which the position of the head or face data within the current user image data or a part thereof results, and that the position of the head or face data within the current user image data or a part thereof is used for further processing within the image data editing devices.
PCT/EP2002/002791 2001-03-13 2002-03-13 Image processing devices and methods WO2002073517A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP01106182 2001-03-13
EP01106182.7 2001-03-13

Publications (1)

Publication Number Publication Date
WO2002073517A1 (fr)

Family

ID=8176766

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2002/002791 WO2002073517A1 (fr) 2001-03-13 2002-03-13 Image processing devices and methods

Country Status (1)

Country Link
WO (1) WO2002073517A1 (fr)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5852669A (en) * 1994-04-06 1998-12-22 Lucent Technologies Inc. Automatic face and facial feature location detection for low bit rate model-assisted H.261 compatible coding of video
US5870138A (en) * 1995-03-31 1999-02-09 Hitachi, Ltd. Facial image processing
EP0990416A1 (fr) * 1998-10-01 2000-04-05 Mitsubishi Denki Kabushiki Kaisha Classification system for the viewing directions of a person

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HU J ET AL: "Locating head and face boundaries for head-shoulder images", PATTERN RECOGNITION, PERGAMON PRESS INC. ELMSFORD, N.Y, US, vol. 32, no. 8, August 1999 (1999-08-01), pages 1317 - 1333, XP004169481, ISSN: 0031-3203 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009056919A1 (fr) * 2007-10-30 2009-05-07 Sony Ericsson Mobile Communications Ab System and method for rendering and selecting a discrete portion of a digital image for manipulation

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry into the European phase
NENP Non-entry into the national phase

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP