WO2014001095A1 - Method for audiovisual content dubbing - Google Patents

Method for audiovisual content dubbing

Info

Publication number
WO2014001095A1
Authority
WO
WIPO (PCT)
Prior art keywords
zones
traceable
objects
video shot
traceable objects
Prior art date: 2012-06-26
Application number
PCT/EP2013/062243
Other languages
French (fr)
Inventor
Pierre Hellier
Lionel Oisel
Patrick Perez
Original Assignee
Thomson Licensing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2012-06-26
Filing date: 2013-06-13
Publication date: 2014-01-03
Application filed by Thomson Licensing filed Critical Thomson Licensing
Publication of WO2014001095A1 publication Critical patent/WO2014001095A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/034 Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4318 Generation of visual interfaces for content selection or interaction; Content or additional data rendering by altering the content in the rendering process, e.g. blanking, blurring or masking an image region
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H04N23/611 Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body


Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A method for processing a video shot comprising a set of traceable objects is described. One or more traceable objects of the set of traceable objects are selected. One or more zones respectively encompassing the one or more selected traceable objects are positioned such that the respective zones are kept at the same respective position between subsequent frames of the video shot. Content of the video shot outside the positioned zones is blurred.

Description

METHOD FOR AUDIOVISUAL CONTENT DUBBING
FIELD OF THE INVENTION
The present invention relates to a solution for processing a video shot. In particular, the invention relates to a method for ergonomic and secure dubbing of audiovisual content.
BACKGROUND OF THE INVENTION
When a movie meets success in a given country, it is generally exported to other countries. But the other countries do not necessarily speak the same language. A first solution to this issue is to add subtitles in the language of the destination country. Another, sometimes preferred, solution is to replace the original audio track in the original language with an audio track in the language of the destination country. In this second solution, a particular step of replacing the voices in the original language with the voices in the destination language, called dubbing, is required. In order to perform this dubbing in good conditions, the dubbers advantageously follow with their eyes the faces on a screen, and especially the lips, related to the dubbed voices, such that the added voices are well synchronized with the actions occurring in the movie.
A problem is that it is cumbersome for the dubber to follow with his eyes the face related to the voice he wants to dub.
SUMMARY OF THE INVENTION
It is an object of the present invention to solve the aforementioned problem and propose an improved solution for dubbing of audiovisual content. According to the invention, a method for processing a video shot comprising a set of traceable objects comprises the steps of:
- selecting one or more traceable objects of the set of traceable objects;
- positioning one or more zones respectively encompassing the one or more selected traceable objects such that the respective zones are kept at the same respective position between subsequent frames of the video shot; and
- blurring content of the video shot outside the positioned zones.
In this way, when displayed on a screen, the objects of interest for the dubber do not move, and it is easier for him to follow, for example, a face with his eyes. In addition, blurring the video content outside the positioned zones, i.e. the geographical complement of the positioned zones, keeps the visual attention of the dubber centered on the objects of interest displayed on the screen.
Another benefit is that the processed video is smaller in terms of memory, since the blurred regions contain little high-frequency detail and therefore compress well. The processed video can thus be transmitted more easily, for example via a network. The video quality of the processed video may be kept maximal inside the selected zones, allowing the dubbers to better follow in particular the movement of the lips when human faces appear.
Another benefit of the blurring is that, if the processed video is intercepted by movie pirates, it has no commercial value in comparison to the original video. As a result, thanks to the described method, the dubbing can be performed in both a secure and quick manner.
Advantageously, the method also allows inputting coordinates for defining the positioning of the one or more positioned zones, e.g. rectangles. In this way an overlap of zones is avoidable. Advantageously, the method also comprises a step of resizing the one or more selected traceable objects. In this way, when a traceable object, for example a face, is tracked, resizing by zooming makes it easier to follow the lips of a talking face. Reducing the size may also avoid the overlap of two talking faces.
BRIEF DESCRIPTION OF THE DRAWINGS
For a better understanding the invention shall now be explained in more detail in the following description with reference to the figures. It is understood that the invention is not limited to this exemplary embodiment and that specified features can also expediently be combined and/or modified without departing from the scope of the present invention as defined in the appended claims. In the figures:
Fig. 1 illustrates a step of selecting a traceable object according to the present invention; and Fig. 2 illustrates the processing of a video shot according to the present invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
A video shot comprising a set of traceable objects is provided. An example of such traceable objects is human faces appearing in videos. Their detection is well documented in the state of the art, for example in international application PCT/GB2003/005186. For the sake of clarity, in the following it will be considered that the traceable objects are human faces. As illustrated in Fig. 1, the user is provided with a set of traceable objects A, B, C and D, from which the user, thanks to a dedicated user interface, selects the traceable objects he wants to track. He selects A, B and C, for example. The set of traceable objects may be extracted from the first frame of the video shot to process.
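The patent does not prescribe a particular detector or toolkit. As a minimal sketch of this selection step, the following Python fragment uses OpenCV's stock Haar cascade on the first frame; the detector choice and the function name detect_candidate_faces are assumptions made purely for illustration.

```python
import cv2

def detect_candidate_faces(video_path):
    """Return labelled candidate face rectangles from the first frame."""
    cap = cv2.VideoCapture(video_path)
    ok, first_frame = cap.read()
    cap.release()
    if not ok:
        raise IOError("could not read the first frame of the shot")

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
    # Each detection is an (x, y, w, h) rectangle in frame coordinates.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Label the candidates A, B, C, ... so a user interface can offer
    # them for selection, as in Fig. 1.
    return {chr(ord("A") + i): tuple(face) for i, face in enumerate(faces)}
```

The labelled rectangles would then be presented to the user, who keeps, say, A, B and C and discards D.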
Then, once the objects have been selected by the user, zones encompassing the selected objects are positioned at the same position between frame t, appearing at time t, and frame t+dt, appearing at time t+dt, of the video shot to process, as illustrated in Fig. 2. The zones are rectangles in the provided example. Everything outside the zones is displayed with a blur on it; it may, for example, even be made totally black. Optionally, a user may define the positioning of one or more selected traceable objects with the aid of the dedicated user interface.
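A minimal sketch of the zoning and blurring steps follows, assuming rectangular zones fixed at the coordinates chosen on the first frame; the function name, the Gaussian blur (rather than full blanking), and the mp4v codec are assumptions, not requirements of the method.

```python
import cv2
import numpy as np

def blur_outside_zones(video_path, out_path, zones):
    """Keep the fixed rectangular zones sharp; blur everything else."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path,
                             cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

    # Binary mask: 1 inside the positioned zones, 0 elsewhere. It is
    # computed once, since the zones keep the same position between frames.
    mask = np.zeros((h, w, 1), dtype=np.uint8)
    for x, y, zw, zh in zones:
        mask[y:y + zh, x:x + zw] = 1

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        degraded = cv2.GaussianBlur(frame, (51, 51), 0)
        # To blank the outside totally black instead, as the text also
        # allows, use: degraded = np.zeros_like(frame)
        writer.write(frame * mask + degraded * (1 - mask))

    cap.release()
    writer.release()
```

A call such as blur_outside_zones("shot.mp4", "shot_dubbing.mp4", selected_zones) would produce the working copy for the dubber: only the zones remain at full quality, which is also what deprives an intercepted copy of commercial value.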
Advantageously, the selected traceable objects are resized. This makes it easier to follow the lips of a talking face and avoids the overlap of two talking faces.
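A sketch of this optional resizing step, under the same assumptions (the default zoom factor of 2 and the function name are illustrative only):

```python
import cv2

def resize_zone(frame, zone, factor=2.0):
    """Crop a selected zone and scale it, e.g. to enlarge a talking face."""
    x, y, w, h = zone
    crop = frame[y:y + h, x:x + w]
    # factor > 1 zooms in on the lips; factor < 1 shrinks a face so that
    # two talking faces no longer overlap on screen.
    return cv2.resize(crop, None, fx=factor, fy=factor,
                      interpolation=cv2.INTER_CUBIC)
```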

Claims

1. A method for processing a video shot comprising a set of traceable objects, the method comprising the steps of:
- selecting (101) one or more traceable objects of the set of traceable objects;
- positioning one or more zones respectively encompassing the one or more selected traceable objects such that the respective zones are kept at the same respective position (102) between subsequent frames of the video shot; and
- blurring content of the video shot outside the positioned zones.
2. The method according to claim 1, further comprising the step of inputting coordinates for defining the positioning of the one or more positioned zones.
3. The method according to claim 1 or 2, further comprising the step of resizing the one or more selected traceable objects.
4. The method according to one of claims 1 to 3, wherein the one or more positioned zones are rectangles.
PCT/EP2013/062243 2012-06-26 2013-06-13 Method for audiovisual content dubbing WO2014001095A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP12305743.2 2012-06-26
EP12305743 2012-06-26

Publications (1)

Publication Number Publication Date
WO2014001095A1 true WO2014001095A1 (en) 2014-01-03

Family

ID=48607288

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2013/062243 WO2014001095A1 (en) 2012-06-26 2013-06-13 Method for audiovisual content dubbing

Country Status (1)

Country Link
WO (1) WO2014001095A1 (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6297846B1 (en) * 1996-05-30 2001-10-02 Fujitsu Limited Display control system for videoconference terminals
US20050265603A1 (en) * 2004-05-28 2005-12-01 Porter Robert M S Image processing
US20080240563A1 (en) * 2007-03-30 2008-10-02 Casio Computer Co., Ltd. Image pickup apparatus equipped with face-recognition function
US20080259154A1 (en) * 2007-04-20 2008-10-23 General Instrument Corporation Simulating Short Depth of Field to Maximize Privacy in Videotelephony
JP2009288945A (en) * 2008-05-28 2009-12-10 Canon Inc Image display unit and image display method
FR2951605A1 (en) * 2009-10-15 2011-04-22 Thomson Licensing METHOD FOR ADDING SOUND CONTENT TO VIDEO CONTENT AND DEVICE USING THE METHOD
US20110216158A1 (en) * 2010-03-05 2011-09-08 Tessera Technologies Ireland Limited Object Detection and Rendering for Wide Field of View (WFOV) Image Acquisition Systems
US20120051658A1 (en) * 2010-08-30 2012-03-01 Xin Tong Multi-image face-based image processing

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109819313A (en) * 2019-01-10 2019-05-28 腾讯科技(深圳)有限公司 Method for processing video frequency, device and storage medium
CN109819313B (en) * 2019-01-10 2021-01-08 腾讯科技(深圳)有限公司 Video processing method, device and storage medium
CN111666831A (en) * 2020-05-18 2020-09-15 武汉理工大学 Decoupling representation learning-based speaking face video generation method
CN111666831B (en) * 2020-05-18 2023-06-20 武汉理工大学 Method for generating face video of speaker based on decoupling expression learning

Similar Documents

Publication Publication Date Title
KR101527672B1 (en) System and method for video caption re-overlaying for video adaptation and retargeting
EP2893700B1 (en) Generating and rendering synthesized views with multiple video streams in telepresence video conference sessions
CN109074404B (en) Method and apparatus for providing content navigation
US20090189912A1 (en) Animation judder compensation
US20140240472A1 (en) 3d subtitle process device and 3d subtitle process method
EP2404452A1 (en) 3d video processing
US10021433B1 (en) Video-production system with social-media features
US20200112712A1 (en) Placement And Dynamic Rendering Of Caption Information In Virtual Reality Video
WO2013088688A1 (en) Image processing device and image processing method
JP2011097470A (en) Stereoscopic image reproduction apparatus, stereoscopic image reproduction method, and stereoscopic image reproduction system
JP2011216937A (en) Stereoscopic image display device
US9111363B2 (en) Video playback apparatus and video playback method
JP5599063B2 (en) Display control apparatus, display control method, and program
US9426445B2 (en) Image processing apparatus and image processing method and program using super-resolution and sharpening
WO2014001095A1 (en) Method for audiovisual content dubbing
US20120008692A1 (en) Image processing device and image processing method
US8169541B2 (en) Method of converting frame rate of video signal
JP2011244328A (en) Video reproduction apparatus and video reproduction apparatus control method
JP5066041B2 (en) Image signal processing apparatus and image signal processing method
US20120127265A1 (en) Apparatus and method for stereoscopic effect adjustment on video display
US10764655B2 (en) Main and immersive video coordination system and method
KR20110018523A (en) Apparatus and method for compensating of image in image display device
JP2012004653A (en) Image processing system and its control method
CN110121032B (en) Method, device and equipment for displaying animation special effect and storage medium
KR20100065318A (en) Method and device for creating a modified video from an input video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13728213

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13728213

Country of ref document: EP

Kind code of ref document: A1