WO2014001095A1 - Method for audiovisual content dubbing - Google Patents
- Publication number
- WO2014001095A1 (PCT/EP2013/062243)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- zones
- traceable
- objects
- video shot
- traceable objects
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/034—Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4318—Generation of visual interfaces for content selection or interaction; Content or additional data rendering by altering the content in the rendering process, e.g. blanking, blurring or masking an image region
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/61—Control of cameras or camera modules based on recognised objects
- H04N23/611—Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Image Processing (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
A method for processing a video shot comprising a set of traceable objects is described. One or more traceable objects of the set of traceable objects are selected. One or more zones respectively encompassing the one or more selected traceable objects are positioned such that the respective zones are kept at the same respective position between subsequent frames of the video shot. Content of the video shot outside the positioned zones is blurred.
Description
METHOD FOR AUDIOVISUAL CONTENT DUBBING
FIELD OF THE INVENTION
The present invention relates to a solution for processing a video shot. In particular, the invention relates to a method for ergonomic and secure dubbing of audiovisual content.
BACKGROUND OF THE INVENTION
When a movie meets success in a given country, it is generally exported to other countries, which do not necessarily speak the same language. A first solution to this issue is to add subtitles in the language of the destination country. Another, sometimes preferred, solution is to replace the original audio track in the original language with an audio track in the language of the destination country. In this second solution, a particular step of replacing the voices in the original language with the voices in the destination language, called dubbing, is required. In order to perform this dubbing in good conditions, the dubbers advantageously follow with their eyes the faces on a screen, and especially the lips, related to the dubbed voices, such that the added voices are well synchronized with the actions occurring in the movie.
A problem is that it is cumbersome for the dubber to follow with his eyes the face related to the voice he wants to dub.
SUMMARY OF THE INVENTION
It is an object of the present invention to solve the aforementioned problem and to propose an improved solution for the dubbing of audiovisual content.
According to the invention, a method for processing a video shot comprising a set of traceable objects comprises the steps of:
- selecting one or more traceable objects of the set of traceable objects;
- positioning one or more zones respectively encompassing the one or more selected traceable objects such that the respective zones are kept at the same respective position between subsequent frames of the video shot; and
- blurring content of the video shot outside the positioned zones.
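The three claimed steps can be sketched in code. This is a hedged, minimal illustration, not the patent's implementation: the grayscale list-of-rows frame format, the inclusive rectangle coordinates and the all-black treatment outside the zones are assumptions chosen for clarity.

```python
# Hypothetical sketch of the claimed steps. The zone coordinates, the tiny
# frame size and the "outside pixels set to black" choice are illustrative
# assumptions, not taken from the patent text.

def mask_outside_zones(frame, zones):
    """Return a copy of a grayscale frame (list of rows) with every pixel
    outside the given zones set to 0 (black)."""
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for (x0, y0, x1, y1) in zones:          # zone = inclusive rectangle
        for y in range(max(0, y0), min(h, y1 + 1)):
            for x in range(max(0, x0), min(w, x1 + 1)):
                out[y][x] = frame[y][x]     # keep original pixels inside zones
    return out

# The zones are kept at the same position for every frame of the shot:
zones = [(1, 1, 2, 2)]                      # one fixed zone around a "face"
shot = [[[9] * 4 for _ in range(4)] for _ in range(2)]   # two 4x4 frames
processed = [mask_outside_zones(f, zones) for f in shot]
print(processed[0][1][1], processed[0][0][0])  # inside stays 9, outside is 0
```

Because the same `zones` list is applied to every frame, the displayed object never moves on screen, which is the ergonomic point made above.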
In this way, when displayed on a screen, the objects of interest for the dubber do not move, making it easier for him to follow, for example, a face with his eyes. In addition, blurring the video content outside the positioned zones, i.e. the geographical complement of the positioned zones, keeps the visual attention of the dubber centered on the objects of interest displayed on the screen.
Another benefit is that the processed video is lighter in terms of memory size. Therefore, the processed video can be transmitted more easily, for example via a network. The video quality of the processed video may be kept maximal in the selected zones, allowing the dubbers to better follow, in particular, the movement of the lips when human faces appear.
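The memory-size benefit can be illustrated with standard lossless compression: a frame that is uniform (here, black) outside small zones compresses far better than a frame full of varied detail. The frame size, zone position and the use of `zlib` are illustrative assumptions; the patent does not specify a codec.

```python
import random
import zlib

# Illustrative only: compare the compressed size of a "busy" frame with the
# same frame masked to black outside one small zone.
random.seed(0)

w = h = 64
detailed = bytes(random.randrange(256) for _ in range(w * h))   # busy original
masked = bytearray(w * h)                                       # all black
for y in range(8, 24):                                          # one 16x16 zone
    for x in range(8, 24):
        masked[y * w + x] = detailed[y * w + x]                 # keep zone pixels

print(len(zlib.compress(detailed)), len(zlib.compress(bytes(masked))))
# the masked frame yields a much smaller compressed size
```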
Another benefit of the blurring is that, if the processed video is intercepted by movie pirates, this processed video has no commercial value in comparison to the original video to be processed. As a result, the dubbing can, thanks to the described method, be performed in both a secure and quick manner.
Advantageously, the method also allows inputting coordinates for defining the positioning of the one or more positioned zones, e.g. rectangles. In this way an overlap of zones is avoidable.

Advantageously, the method also comprises a step of resizing the one or more selected traceable objects. In this way, when a traceable object, for example a face, is tracked, resizing by zooming makes it easier to follow the lips of a talking face. Reducing the size may also avoid the overlap of two talking faces.
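The coordinate-input refinement implies a validation step so that user-entered zones do not overlap. A hedged sketch follows; the `(x0, y0, x1, y1)` rectangle format and the rejection-by-exception behaviour are assumptions, not part of the claims.

```python
# Hypothetical overlap check for user-entered zone coordinates.

def zones_overlap(a, b):
    """True if two rectangles (x0, y0, x1, y1) share any area."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

def validate_zones(zones):
    """Reject coordinate input that would make any two zones overlap."""
    for i in range(len(zones)):
        for j in range(i + 1, len(zones)):
            if zones_overlap(zones[i], zones[j]):
                raise ValueError(f"zones {i} and {j} overlap")
    return zones

validate_zones([(0, 0, 10, 10), (12, 0, 20, 10)])   # disjoint: accepted
```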
BRIEF DESCRIPTION OF THE DRAWINGS
For a better understanding the invention shall now be explained in more detail in the following description with reference to the figures. It is understood that the invention is not limited to this exemplary embodiment and that specified features can also expediently be combined and/or modified without departing from the scope of the present invention as defined in the appended claims. In the figures:
Fig. 1 illustrates a step of selecting a traceable object according to the present invention; and
Fig. 2 illustrates the processing of a video shot according to the present invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
A video shot comprising a set of traceable objects is provided. An example of such traceable objects is human faces appearing in videos. Their detection is well documented in the state of the art, for example in international application PCT/GB2003/005186.
For the sake of clarity, in the following it will be considered that traceable objects are human faces. As illustrated in Fig. 1, the user is provided with a set of traceable objects A, B, C and D, from which the user, thanks to a dedicated user interface, selects the traceable objects he wants to track. He selects A, B and C for example. The set of traceable objects may be extracted from the first frame of the video shot to process.
Then, once the objects have been selected by the user, zones encompassing the selected objects are positioned at the same position between frame t, appearing at time t, and frame t+dt, appearing at time t+dt, of the video shot to process, as illustrated in Fig. 2. The zones are rectangles in the provided example. Everything outside the zones is displayed with a blur on it; for example, it is made totally black. Optionally, a user may define the positioning of one or more selected traceable objects with the aid of the dedicated user interface.
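Keeping a zone at the same position between frame t and frame t+dt, even though the tracked face drifts, can be done by choosing a static rectangle that covers the tracked bounding box in every frame of the shot. The union-of-boxes rule and the sample tracker output below are illustrative assumptions.

```python
# Hedged sketch: derive one static zone from per-frame tracked boxes, so the
# displayed face never leaves the zone. The box values are invented data.

def fixed_zone(tracked_boxes):
    """Union of per-frame (x0, y0, x1, y1) boxes -> one static zone."""
    x0 = min(b[0] for b in tracked_boxes)
    y0 = min(b[1] for b in tracked_boxes)
    x1 = max(b[2] for b in tracked_boxes)
    y1 = max(b[3] for b in tracked_boxes)
    return (x0, y0, x1, y1)

# Face A drifts slightly between frame t and frame t+dt:
boxes_a = [(10, 10, 30, 30), (12, 11, 32, 31), (11, 12, 31, 32)]
print(fixed_zone(boxes_a))   # (10, 10, 32, 32) covers the face in all frames
```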
Advantageously, the selected traceable objects are resized. This makes it easier to follow the lips of a talking face and to avoid the overlap of two talking faces.
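The resizing-by-zooming step can be sketched as a nearest-neighbour enlargement of the pixels inside a zone. The 2x factor, the crop semantics and the tiny frame are illustrative assumptions, not details from the description.

```python
# Hypothetical zoom of a zone, so the lips of a tracked face appear larger.

def zoom_zone(frame, zone, factor):
    """Crop a (x0, y0, x1, y1) zone from a grayscale frame (list of rows)
    and enlarge it by an integer factor using nearest-neighbour sampling."""
    x0, y0, x1, y1 = zone
    crop = [row[x0:x1] for row in frame[y0:y1]]
    return [[crop[y // factor][x // factor]
             for x in range(len(crop[0]) * factor)]
            for y in range(len(crop) * factor)]

face = [[1, 2],
        [3, 4]]
print(zoom_zone(face, (0, 0, 2, 2), 2))
# each source pixel becomes a 2x2 block
```

A factor below one (implemented analogously by subsampling) would shrink a zone instead, which is how the overlap of two talking faces could be avoided.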
Claims
1. A method for processing a video shot comprising a set of traceable objects, the method comprising the steps of:
- selecting (101) one or more traceable objects of the set of traceable objects;
- positioning one or more zones respectively encompassing the one or more selected traceable objects such that the respective zones are kept at the same respective position (102) between subsequent frames of the video shot; and
- blurring content of the video shot outside the positioned zones.
2. The method according to claim 1, further comprising the step of inputting coordinates for defining the positioning of the one or more positioned zones.
3. The method according to claim 1 or 2, further comprising the step of resizing the one or more selected traceable objects.
4. The method according to one of claims 1 to 3, wherein the one or more positioned zones are rectangles.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP12305743.2 | 2012-06-26 | ||
EP12305743 | 2012-06-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014001095A1 true WO2014001095A1 (en) | 2014-01-03 |
Family
ID=48607288
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2013/062243 WO2014001095A1 (en) | 2012-06-26 | 2013-06-13 | Method for audiovisual content dubbing |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2014001095A1 (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6297846B1 (en) * | 1996-05-30 | 2001-10-02 | Fujitsu Limited | Display control system for videoconference terminals |
US20050265603A1 (en) * | 2004-05-28 | 2005-12-01 | Porter Robert M S | Image processing |
US20080240563A1 (en) * | 2007-03-30 | 2008-10-02 | Casio Computer Co., Ltd. | Image pickup apparatus equipped with face-recognition function |
US20080259154A1 (en) * | 2007-04-20 | 2008-10-23 | General Instrument Corporation | Simulating Short Depth of Field to Maximize Privacy in Videotelephony |
JP2009288945A (en) * | 2008-05-28 | 2009-12-10 | Canon Inc | Image display unit and image display method |
FR2951605A1 (en) * | 2009-10-15 | 2011-04-22 | Thomson Licensing | METHOD FOR ADDING SOUND CONTENT TO VIDEO CONTENT AND DEVICE USING THE METHOD |
US20110216158A1 (en) * | 2010-03-05 | 2011-09-08 | Tessera Technologies Ireland Limited | Object Detection and Rendering for Wide Field of View (WFOV) Image Acquisition Systems |
US20120051658A1 (en) * | 2010-08-30 | 2012-03-01 | Xin Tong | Multi-image face-based image processing |
- 2013-06-13: WO PCT/EP2013/062243 patent/WO2014001095A1/en, active Application Filing
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109819313A (en) * | 2019-01-10 | 2019-05-28 | 腾讯科技(深圳)有限公司 | Method for processing video frequency, device and storage medium |
CN109819313B (en) * | 2019-01-10 | 2021-01-08 | 腾讯科技(深圳)有限公司 | Video processing method, device and storage medium |
CN111666831A (en) * | 2020-05-18 | 2020-09-15 | 武汉理工大学 | Decoupling representation learning-based speaking face video generation method |
CN111666831B (en) * | 2020-05-18 | 2023-06-20 | 武汉理工大学 | Method for generating face video of speaker based on decoupling expression learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101527672B1 (en) | System and method for video caption re-overlaying for video adaptation and retargeting | |
EP2893700B1 (en) | Generating and rendering synthesized views with multiple video streams in telepresence video conference sessions | |
CN109074404B (en) | Method and apparatus for providing content navigation | |
US20090189912A1 (en) | Animation judder compensation | |
US20140240472A1 (en) | 3d subtitle process device and 3d subtitle process method | |
EP2404452A1 (en) | 3d video processing | |
US10021433B1 (en) | Video-production system with social-media features | |
US20200112712A1 (en) | Placement And Dynamic Rendering Of Caption Information In Virtual Reality Video | |
WO2013088688A1 (en) | Image processing device and image processing method | |
JP2011097470A (en) | Stereoscopic image reproduction apparatus, stereoscopic image reproduction method, and stereoscopic image reproduction system | |
JP2011216937A (en) | Stereoscopic image display device | |
US9111363B2 (en) | Video playback apparatus and video playback method | |
JP5599063B2 (en) | Display control apparatus, display control method, and program | |
US9426445B2 (en) | Image processing apparatus and image processing method and program using super-resolution and sharpening | |
WO2014001095A1 (en) | Method for audiovisual content dubbing | |
US20120008692A1 (en) | Image processing device and image processing method | |
US8169541B2 (en) | Method of converting frame rate of video signal | |
JP2011244328A (en) | Video reproduction apparatus and video reproduction apparatus control method | |
JP5066041B2 (en) | Image signal processing apparatus and image signal processing method | |
US20120127265A1 (en) | Apparatus and method for stereoscopic effect adjustment on video display | |
US10764655B2 (en) | Main and immersive video coordination system and method | |
KR20110018523A (en) | Apparatus and method for compensating of image in image display device | |
JP2012004653A (en) | Image processing system and its control method | |
CN110121032B (en) | Method, device and equipment for displaying animation special effect and storage medium | |
KR20100065318A (en) | Method and device for creating a modified video from an input video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13728213; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 13728213; Country of ref document: EP; Kind code of ref document: A1 |