WO2021009298A1 - Lip sync management device - Google Patents

Lip sync management device

Info

Publication number
WO2021009298A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
acoustic signal
rendering device
lip sync
brightness change
Prior art date
Application number
PCT/EP2020/070173
Other languages
French (fr)
Inventor
Stefan KRÄGELOH
Thomas Heller
Original Assignee
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. filed Critical Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Publication of WO2021009298A1 publication Critical patent/WO2021009298A1/en

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Definitions

  • The present application is concerned with the synchronization of video signals and audio signals, i.e., lip sync management.
  • Lip sync is also written as lipsync.
  • Lip sync technology is used, for example, in a home cinema system, which is composed of a set of devices comprising at least rendering devices to render the audio/video signal and source devices such as a Set-Top-Box (STB), a Blu-Ray player or a video game console.
  • Video rendering devices such as televisions, projectors and head-mounted displays display the images corresponding to the video signal.
  • Audio rendering devices such as audio amplifiers connected to sets of loudspeakers, sound bars, and headphones output the sound waves corresponding to the audio signal. Many device topologies are possible and different types of connections are applicable.
  • Each rendering device introduces a latency for processing the signal.
  • This latency varies with the type of signal, audio or video, varies between devices and also depends on the rendering mode chosen for the same device.
  • For example, a television has video rendering modes with minimal processing for low-latency applications such as games, leading to a video latency of about 30 ms.
  • As a further example, if the STB delays the video signal by, e.g., 180 ms and the TV needs 70 ms, this adds up to 250 ms in total, the same figure as the MPEG-H audio latency.
  • The difference in latency between the audio and the video signal generates a so-called lip sync issue, noticeable by the viewer when the delay between the image and the sound is too large: when the sound is ahead of the video by more than 45 ms or behind the video by more than 125 ms, according to Recommendation BT.1359-1 of the International Telecommunication Union (ITU).
  • A lip sync issue can severely impair the viewing experience.
  • Up to now, lip sync has always been considered in the case where video processing takes longer than audio processing. This might change with modern audio codecs, where the decoding time of audio exceeds the decoding time of video, for example, with the advent of 3D audio, where more complex processing is required on the audio signal, potentially causing audio latencies of up to 250 ms.
  • The STB needs to adjust the lip sync; therefore, in the present invention, the lip sync data is transmitted to the STB.
  • It is the object of the present invention to provide a concept for accurate and simplified lip sync management.
  • A video rendering device for displaying a video comprises a flash detector which detects an abrupt brightness change of the video, e.g., a flash light or an optical signal, and a signalling device which provides an acoustic signal in response to the detection of the brightness change. That is, the video signal is converted to an acoustic signal and, therefore, only one measurement approach is required to synchronize the video signals and the audio signals. Hence, the measurement approach regarding lip sync is simplified, and the lip sync accuracy can be improved by using the same measuring approach for the video and the audio.
  • Since the optical signal of the video is converted to an acoustic signal, a human can adjust the synchronization between the video (image) and the audio, because the adjustment can be made by listening to the acoustic signal, i.e., the converted sound.
  • A lip sync unit, e.g., a processing unit of the video content (source device), determines a time difference between two acoustic signals, one of which is a first acoustic signal provided in response to an abrupt brightness change and the other of which is an audio signal serving as a second acoustic signal, or vice versa, and issues a command to adjust a delay of an imaging device, e.g., a display for rendering an image, and/or of a sound reproduction device. That is, the delay-adjust command is determined based on the two acoustic signals.
  • The command is preferably issued to adjust a delay difference between the audio content and the video content, i.e., the command indicates a lip sync point.
  • With the lip sync unit of the present application, it is possible to synchronize the video and the audio in a simplified way.
  • A lip sync management system for synchronizing audio signals on a sound rendering device, e.g., a soundbar, a loudspeaker or any device that provides sound content, and video signals on a video rendering device, e.g., a TV or any device that displays video content, comprises: a convertor circuit for converting an optical signal to an acoustic signal, wherein the convertor circuit comprises a flash detector which detects an abrupt brightness change of the video from the video rendering device and a signalling device which provides a first acoustic signal in response to the detection of the brightness change; a microphone, e.g., any device suitable for recording acoustic signals, configured to capture the first acoustic signal and a second acoustic signal from the sound rendering device; and the lip sync unit, which determines a time difference between the two acoustic signals, one of which is the first acoustic signal provided in response to the abrupt brightness change and the other of which is an audio signal serving as the second acoustic signal.
  • The audio signals and the video signals are synchronized based on the time difference between the first and second acoustic signals.
  • The convertor circuit need not be included in the video rendering device; it can instead be incorporated or integrated into another device, e.g., a remote controller of the video rendering device, the lip sync unit or the sound rendering device, which reduces the modifications and costs required to introduce the convertor circuit.
  • Alternatively, the convertor circuit may be a stand-alone device that is not part of another device; in that case, the distance between the microphone and the convertor circuit can be adjusted to equal the distance between the sound rendering device and the microphone, so that the first and second acoustic signals arrive at the microphone at the same time. Hence, it is possible to simplify the measurement approach and effectively improve the accuracy of the synchronization.
  • A method for synchronizing audio signals on a sound rendering device and video signals on a video rendering device comprises: detecting an abrupt brightness change of the video, providing a first acoustic signal in response to the detection of the brightness change, capturing the first acoustic signal and a second acoustic signal which is an audio signal provided by the sound rendering device, determining a time difference between the captured first and second acoustic signals, and issuing a command to adjust a delay of the video rendering device and/or of the sound rendering device.
  • The abrupt brightness change, i.e., the optical signal or flash of the video content, is converted to the first acoustic signal.
  • The first acoustic signal and the second acoustic signal provided by the sound rendering device are measured, and the command for the delay adjustment that synchronizes the video content and the audio content is issued. Thus, the accuracy of the synchronization of the video content and the audio content can be improved with a simplified measurement approach.
  • Fig. 1 shows a block diagram illustrating an example of a video rendering device for displaying a video according to embodiments of the present application
  • Fig. 2 shows a block diagram illustrating examples of a lip sync unit according to embodiments of the present application
  • Fig. 3 shows a block diagram illustrating examples of a lip sync management system for synchronizing audio signals and video signals according to embodiments of the present application
  • Fig. 4 shows a block diagram illustrating another example of a lip sync management system for synchronizing audio signals and video signals according to embodiments of the present application
  • Fig. 5 shows a table indicating examples of device combination according to embodiments of the present application.
  • Fig. 6 shows a flowchart explaining an example of a method for synchronizing audio signals and video signals according to an embodiment of the present application.
  • Fig. 1 illustrates an embodiment of a video rendering device (for example, a television, TV) 2 comprising a video processing unit 4, a display unit 6, a flash detector 12 and a signalling device 14.
  • The flash detector 12 detects an abrupt brightness change of the video, i.e., of an image signal or optical signal of an image of the video.
  • The flash detector 12 may be any kind of device that detects the abrupt brightness change, for example, a sensor for detecting the optical signal (flash) of the video.
  • The signalling device 14 provides an acoustic signal in response to the detection of the abrupt brightness change of the video by the flash detector 12.
  • The signalling device 14 may be any kind of device that provides an acoustic signal in response to the detection of the abrupt brightness change, i.e., that converts the optical signal to the acoustic signal.
  • The optical signal is converted to the acoustic signal using any suitable technology, for example, a sound generator controlled by a photodiode.
  • Together, the flash detector 12 and the signalling device 14 form a convertor circuit 10 that converts the optical signal (flash) to the acoustic signal.
  • Fig. 2 depicts examples of the lip sync unit according to the present application.
  • Fig. 2 (a) illustrates the lip sync unit 30 comprising a processing unit 32 and a memory 34.
  • The lip sync unit 30 is an information appliance, a so-called source device, such as a Set-Top-Box, a Set-Top-Unit, a Blu-Ray player or a video game console.
  • The lip sync unit 30 receives acoustic signals, e.g., a first acoustic signal which is provided in response to the abrupt brightness change of the video from an imaging device (e.g., a TV, a projector or any video rendering device appropriate to display images) and a second acoustic signal which is provided by the sound reproduction device (e.g., a sound rendering device, a soundbar, a loudspeaker or any other device appropriate to provide sound), and the received signals are stored in the memory 34.
  • The processing unit 32 retrieves the first and second acoustic signals from the memory 34, determines a time difference between the first and second acoustic signals, and issues a command to adjust a delay of the imaging device and/or of the sound reproduction device.
  • The command is issued to adjust a delay difference between the audio content and the video content, i.e., the command indicates a delay period, e.g., how many microseconds or milliseconds the device should delay.
  • In case the decoding time of the audio signals exceeds the decoding time of the video signals, the command indicates the delay period (lip sync point) of the video signals at the lip sync unit 30.
  • Fig. 2 (b) illustrates the lip sync unit 30’ comprising a microphone 36 in addition to the processing unit 32 and the memory 34 shown in Fig. 2 (a).
  • The microphone 36 captures the first and second acoustic signals.
  • The processing unit 32 and the memory 34 work as explained above.
  • Fig. 3 depicts examples of a lip sync management system according to embodiments of the present application.
  • Fig. 3 (a) shows an example of the lip sync management system 100 comprising a video rendering device (an imaging device) 102, a sound rendering device (a sound reproduction device) 104, a convertor circuit 10, a microphone 108 and the lip sync unit 30 as shown in Fig. 2 (a).
  • The convertor circuit 10 is configured to convert an optical signal to an acoustic signal and comprises a flash detector and a signalling device as shown in Fig. 1.
  • The video rendering device 102 is connected, e.g., via HDMI, to the sound rendering device 104 and the lip sync unit 30.
  • The microphone 108 is connected, e.g., via wire, Bluetooth or WLAN, to the lip sync unit 30; as the microphone, a device like a smartphone or tablet, i.e., any device able to record sounds (acoustic signals) and transmit the recorded sound, may be used.
  • The convertor circuit 10 is positioned to be able to detect an abrupt brightness change of the video, i.e., the position of the convertor circuit 10 is adjusted so that it captures the optical signal (a flash) from the video rendering device 102.
  • The optical signal from the video rendering device 102 is detected by the convertor circuit 10, and the detected optical signal is converted to a first acoustic signal (first beep tone).
  • The first acoustic signal is then signalled to the microphone 108.
  • At the same time, a second acoustic signal (second beep tone) from the sound rendering device 104 is signalled to the microphone 108.
  • The first acoustic signal and the second acoustic signal have different frequencies, e.g., the frequency of the first acoustic signal is higher than the frequency of the second acoustic signal, or vice versa.
  • The microphone 108 records the signalled acoustic signal, which now contains two tones at different frequencies: the frequency of the first acoustic signal and the frequency of the second acoustic signal.
  • The microphone 108 is positioned so that its distance (d) from the convertor circuit 10 and from the sound rendering device 104 is the same. That is, the distance (d) between the convertor circuit 10 and the microphone 108 and the distance (d) between the sound rendering device 104 and the microphone 108 are adjusted to be the same.
  • The recorded signal is provided to the lip sync unit 30.
  • The distance (d) is a sound distance, which need not equal the physical distance between the microphone and the convertor circuit or between the microphone and the sound rendering device. That is, when the soundwaves of both beeps, emitted at the same time, arrive at the microphone at the same time, the sound distances are the same, even though the physical distances may differ.
  • The lip sync unit 30 receives the provided signal from the microphone 108 and stores the received signal in the memory 34 as shown in Fig. 2. Then, the lip sync unit 30 determines a time difference between the first and second acoustic signals based on the stored acoustic signal (stored beep tones). That is, it is now possible via signal processing to calculate the time difference between the first and second beep tones. This can be done via simple methods like band-pass filtering and correlation; for example, with an error < 0.3 ms for 1 kHz and 3.2 kHz beep tones. The lip sync unit 30 then issues a command to adjust a delay of the video rendering device 102. That is, the command is generated based on the determined time difference, and the generated command is issued to the video rendering device 102, so that the lip sync unit 30 is able to adjust to the lip sync point by delaying the video.
  • Fig. 3 (b) shows another example of the lip sync management system 100’, comprising a video rendering device (an imaging device) 2 as shown in Fig. 1, a sound rendering device (a sound reproduction device) 104, a microphone 108 and the lip sync unit 30 as shown in Fig. 2 (a).
  • The lip sync management system 100’ comprises the video rendering device 2 of Fig. 1, i.e., the convertor circuit 10 is included in the video rendering device 2; this is the only difference from the lip sync management system 100 shown in Fig. 3 (a).
  • The video rendering device 2 could have, e.g., a special lip sync mode in which a flash results in a beep on the loudspeakers of the video rendering device 2. This mode needs to be implemented and calibrated, e.g., by triggering the beep when high-brightness values occur.
  • Fig. 4 shows a further example of a lip sync management system 120.
  • The lip sync management system 120 comprises a TV 122 as a video rendering device, a soundbar (audio sink) 124 as a sound rendering device, a Set-Top-Box (STB, source device) 30’ as a lip sync unit, and a remote controller 126 of the STB 30’.
  • The TV 122 is connected to the soundbar 124 and the STB 30’ via HDMI, and the remote controller 126 is positioned to be able to detect an optical signal (a flash) of the TV 122. That is, the convertor circuit that converts an optical signal to an acoustic signal is incorporated into the remote controller 126.
  • A microphone is incorporated into the STB 30’; therefore, the distance (sound distance) (d) between the remote controller and the STB 30’ and the distance (d) between the soundbar 124 and the STB 30’ are the same, as also shown in Fig. 3.
  • To summarize, the convertor circuit can be an external device, incorporated into the video rendering device (TV) or incorporated into the remote controller of the lip sync unit (STB).
  • Fig. 5 shows possible device combinations of a lip sync management system: for each row, one box from the left column is combined with one box from the right column.
  • RC indicates a remote controller of the Set-Top-Box.
  • STB indicates a Set-Top-Box.
  • DataC indicates a data connection to the Set-Top-Box. That is, for example, in case an extra device is used to convert light (an optical signal) to sound (an acoustic signal), the STB, the STB with the remote controller, or the remote controller (with a data connection to the STB) is used as the device for recording audio (microphone) and as the device running the video latency adjustment algorithm (lip sync unit).
  • As an alternative, the remote controller can work as a lip sync unit having a microphone (but without a convertor circuit).
  • The result, in the form of data, is then transmitted to the source device via an appropriate technology such as Bluetooth.
  • Such a remote control would be a hybrid of known technology and a modern smart remote control, which already has a microphone and a Bluetooth connection installed.
  • Fig. 6 shows a flowchart indicating an example of a method for synchronizing audio signals and video signals according to the present application.
  • An abrupt brightness change of the video is detected (S10).
  • The abrupt brightness change, i.e., a flash or an optical signal from the video rendering device, is detected by a flash detector of a convertor circuit in the video rendering device, in the remote controller or in the lip sync unit.
  • A first acoustic signal is provided in response to the detection of the brightness change (S12). That is, a signalling device of the convertor circuit provides the first acoustic signal in response to the detected brightness change, i.e., the optical signal is converted into the first acoustic signal.
  • The first acoustic signal (first beep tone) from the signalling device (the convertor circuit) and a second acoustic signal (second beep tone) from the sound rendering device are emitted to the microphone at the same time.
  • The frequency of the first acoustic signal is different from the frequency of the second acoustic signal, e.g., the frequency of the first acoustic signal is higher than that of the second, or vice versa.
  • The sound including the first and second acoustic signals is captured by the microphone (S14).
  • The distance between the convertor circuit and the microphone and the distance between the sound rendering device and the microphone are the same; therefore, the microphone captures a sound including both the first and second acoustic signals.
  • The captured sound is recorded by the microphone and transmitted to the lip sync unit.
  • A time difference between the first and second acoustic signals is determined (S16).
  • The sound transmitted from the microphone is stored in the memory of the lip sync unit.
  • The lip sync unit determines the time difference between the first and second acoustic signals.
  • A command for a delay adjustment is issued (S18).
  • The lip sync unit issues the command indicating a lip sync point, e.g., the delay period of the video content.
  • The lip sync unit is a so-called source device which stores the video content to be displayed at the video rendering device.
  • The video content is transmitted to the video rendering device based on the issued command, i.e., the transmission timing is controlled by the command.
  • For example, the video content is transmitted with a 30 ms delay to synchronize with the sound of the sound rendering device.
  • The first acoustic signal and the second acoustic signal are emitted to the microphone at the same time, and the microphone captures a sound including the first and second acoustic signals. That is, compared with the technical standard, the proposed method transforms the old 'two measurements' approach (separate audio and video measurements, i.e., an acoustic signal from the sound rendering device and an optical signal from the video rendering device) into a new 'one conversion and one measurement' approach; the new approach saves complexity and avoids problems like the synchronization of the measurements.
  • The optical signal is converted to the acoustic signal and, therefore, a human can adjust the audio/video synchronization via hearing alone, which is more precise than the combination of hearing (acoustic signal) and seeing (optical signal).
  • An operator can adjust the audio/video synchronization by hearing, since the optical signal from the video is converted into the acoustic signal.
  • The lip sync unit, e.g., the source device, does not need any optical sensor to capture the optical signal from the video rendering device. If an optical sensor were required, it would need a line of sight or an extension, since light, in contrast to sound, does not travel around corners.
  • The video rendering device comprises a flash detector and a signalling device, i.e., a convertor circuit. Therefore, no extra device is required for synchronizing audio/video.
  • The convertor circuit is incorporated into the remote controller of the lip sync unit. Therefore, the extra costs can be reduced to a minimum, since no extra device is needed and buttons to control the process and a power source already exist in the remote controller, which reduces the convertor circuit cost. Furthermore, the remote controller is flexible enough to adjust its distance to the source device, so it is easily possible to ensure that the loudspeakers and the remote controller have the same distance (d) to the microphone, or to the microphone in the lip sync unit, so that the soundwaves of both beeps arrive at the same time.
  • A smartphone can be used as the microphone for recording the beep. This is implemented by a smartphone app which could automate the process of measuring, calculating and reporting to the source.
  • The lip sync unit may comprise a microphone, i.e., recording of the beep is done at the lip sync unit.
  • Although some aspects are described in the context of an apparatus, these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
  • Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
  • The inventive data stream can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
  • Embodiments of the application can be implemented in hardware or in software.
  • The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
  • Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
  • Embodiments of the present application can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer.
  • The program code may, for example, be stored on a machine-readable carrier.
  • Other embodiments comprise a computer program for performing one of the methods described herein, stored on a machine-readable carrier.
  • An embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
  • A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
  • The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
  • A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein.
  • The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.
  • A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
  • A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
  • A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver.
  • The receiver may, for example, be a computer, a mobile device, a memory device or the like.
  • The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
  • A programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein.
  • A field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein.
  • Generally, the methods are preferably performed by any hardware apparatus.
  • The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
  • The apparatus described herein, or any components of the apparatus described herein, may be implemented at least partially in hardware and/or in software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A video rendering device for displaying a video, comprising: a flash detector which detects an abrupt brightness change of the video, and a signalling device which provides an acoustic signal in response to the detection of the brightness change.

Description

LIP SYNC MANAGEMENT DEVICE
The present application is concerned with the synchronization of video signals and audio signals, i.e., lip sync management.
It is well known to synchronize audio signals and video signals; this is referred to as lip sync (also written as lipsync).
Lip sync technology is used, for example, in a home cinema system, which is composed of a set of devices comprising at least rendering devices to render the audio/video signal and source devices such as a Set-Top-Box (STB), a Blu-Ray player or a video game console. Video rendering devices such as televisions, projectors and head-mounted displays display the images corresponding to the video signal. Audio rendering devices such as audio amplifiers connected to sets of loudspeakers, sound bars, and headphones output the sound waves corresponding to the audio signal. Many device topologies are possible and different types of connections are applicable.
Each rendering device introduces a latency for processing the signal. This latency varies with the type of signal, audio or video, varies between devices and also depends on the rendering mode chosen for the same device. For example, a television has video rendering modes with minimal processing for low-latency applications such as games, leading to a video latency of about 30 ms. As a further example, if the STB delays the video signal by, e.g., 180 ms and the TV needs 70 ms, this adds up to 250 ms in total, the same figure as the MPEG-H audio latency. The difference in latency between the audio and the video signal generates a so-called lip sync issue, noticeable by the viewer when the delay between the image and the sound is too large: when the sound is ahead of the video by more than 45 ms or behind the video by more than 125 ms, according to Recommendation BT.1359-1 of the International Telecommunication Union (ITU).
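To make the latency arithmetic and the cited BT.1359-1 thresholds concrete, the following minimal sketch (not part of the patent text; it simply reuses the hypothetical figures from the example above) checks a latency budget against the detectability window:

    # Hypothetical latency budget from the example above (all values in ms).
    stb_video_delay = 180        # video delay introduced by the STB
    tv_video_delay = 70          # video processing delay of the TV
    audio_delay = 250            # assumed MPEG-H audio decoding latency

    video_delay = stb_video_delay + tv_video_delay  # 250 ms in total
    skew_ms = video_delay - audio_delay             # >0: sound leads the image

    # Detectability window cited above from Recommendation BT.1359-1:
    # sound at most 45 ms ahead of, or 125 ms behind, the video.
    if skew_ms > 45:
        print(f"lip sync issue: sound {skew_ms} ms ahead of the video")
    elif skew_ms < -125:
        print(f"lip sync issue: sound {-skew_ms} ms behind the video")
    else:
        print(f"skew of {skew_ms} ms is within the detectability window")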
A lip sync issue can severely impair the viewing experience. However, up to now, lip sync has always been considered in the case where video processing takes longer than audio processing. This might change with modern audio codecs, where the decoding time of audio exceeds the decoding time of video, for example, with the advent of 3D audio, where more complex processing is required on the audio signal, potentially causing audio latencies of up to 250 ms.
A known way to measure audio/video synchronization in a laboratory is to play a stream with a beep and a flash at the same time and to measure both via suitable sensors on an oscilloscope. Two different simultaneous sensor measurements are needed. In addition, the known standard way to ensure lip sync at home is a slider in the menu of the source device that is manually adjusted, normally based on the impression of a human. There is a commercial device for lip sync, for example, “Sync-One2” (registered trademark), which has a light sensor and a microphone. If a special stream is played in which an audio beep and a flash occur at the same time, the known commercial device calculates the difference between audio and video and shows it on a display. Thereby manual adjustment, again via the slider, is possible, but now more precise than human judgement alone.
According to the standard measurement approach, it is necessary to detect the optical signal of the video and the acoustic signal of the sound. That is, two different types of signals must be measured, and the measurement of the video signal and of the acoustic signal must be synchronized to accurately adjust the delay of the video or the sound. The key information here is, for example, that the STB needs to adjust the lip sync; therefore, in the present invention, the lip sync data is transmitted to the STB.
It is thus the object of the present invention to provide a concept for accurate and simplified lip sync management.
This object is achieved by the subject-matter of a video rendering device according to claim 1, a lip sync unit according to claim 2, a lip sync system according to claim 5, a method for synchronizing audio signals and video signals according to claim 8, and a computer program according to claim 9 of the present application.
According to an embodiment of the present application, a video rendering device for displaying a video comprises a flash detector which detects an abrupt brightness change of the video, e.g., a flash light or an optical signal, and a signalling device which provides an acoustic signal in response to the detection of the brightness change. That is, the video signal is converted to an acoustic signal and, therefore, only one measurement approach is required to synchronize the video signals and the audio signals. Hence, the measurement approach regarding lip sync is simplified, and the lip sync accuracy can be improved by using the same measuring approach for the video and the audio. Furthermore, since the optical signal of the video is converted to an acoustic signal, a human can adjust the synchronization between the video (image) and the audio, because the adjustment can be made by listening to the acoustic signal, i.e., the converted sound.
In accordance with embodiments of the present application, a lip sync unit, e.g., a processing unit of the video content (source device), determines a time difference between two acoustic signals, one of which is a first acoustic signal provided in response to an abrupt brightness change and the other of which is an audio signal serving as a second acoustic signal, or vice versa, and issues a command to adjust a delay of an imaging device, e.g., a display for rendering an image, and/or of a sound reproduction device. That is, the delay-adjust command is determined based on the two acoustic signals. In addition, the command is preferably issued to adjust a delay difference between the audio content and the video content, i.e., the command indicates a lip sync point. With the lip sync unit of the present application, it is possible to synchronize the video and the audio in a simplified way.
According to the embodiments of the present application, a lip sync management system for synchronizing audio signals on a sound rendering device, e.g., a soundbar, a loudspeaker or any device that provides sound content, and video signals on a video rendering device, e.g., a TV or any device that displays video content, comprises: a convertor circuit for converting an optical signal to an acoustic signal, wherein the convertor circuit comprises a flash detector which detects an abrupt brightness change of the video from the video rendering device and a signalling device which provides a first acoustic signal in response to the detection of the brightness change; a microphone, e.g., any device suitable for recording acoustic signals, configured to capture the first acoustic signal and a second acoustic signal from the sound rendering device; and the lip sync unit, which determines a time difference between the two acoustic signals, one of which is the first acoustic signal provided in response to the abrupt brightness change and the other of which is an audio signal serving as the second acoustic signal, and issues a command to adjust a delay of an imaging device, e.g., a display for rendering an image, and/or of a sound reproduction device, wherein the captured first and second acoustic signals are transmitted from the microphone to the lip sync unit. That is, the audio signals and the video signals are synchronized based on the time difference between the first and second acoustic signals. In addition, the convertor circuit need not be included in the video rendering device; it can instead be incorporated or integrated into another device, e.g., a remote controller of the video rendering device, the lip sync unit or the sound rendering device, which reduces the modifications and costs required to introduce the convertor circuit. Furthermore, if the convertor circuit is not part of another device, the distance between the microphone and the convertor circuit can be adjusted to equal the distance between the sound rendering device and the microphone, so that the first and second acoustic signals arrive at the microphone at the same time. Hence, it is possible to simplify the measurement approach and effectively improve the accuracy of the synchronization.
According to the embodiments of the present application, a method for synchronizing audio signals on a sound rendering device and video signals on a video rendering device comprises: detecting an abrupt brightness change of the video, providing a first acoustic signal in response to the detection of the brightness change, capturing the first acoustic signal and a second acoustic signal which is an audio signal provided by the sound rendering device, determining a time difference between the captured first and second acoustic signals, and issuing a command to adjust a delay of the video rendering device and/or of the sound rendering device. The abrupt brightness change, i.e., the optical signal or flash of the video content, is converted to the first acoustic signal. The first acoustic signal and the second acoustic signal provided by the sound rendering device are measured, and the command for the delay adjustment that synchronizes the video content and the audio content is issued. Thus, the accuracy of the synchronization of the video content and the audio content can be improved with a simplified measurement approach.
Advantageous aspects of the present application are the subject of dependent claims. Preferred embodiments of the present application are described below with respect to the figures, among which: Fig. 1 shows a block diagram illustrating an example of a video rendering device for displaying a video according to embodiments of the present application;
Fig. 2 shows a block diagram illustrating examples of a lip sync unit according to embodiments of the present application; Fig. 3 shows a block diagram illustrating examples of a lip sync management system for synchronizing audio signals and video signals according to embodiments of the present application; Fig. 4 shows a block diagram illustrating another example of a lip sync management system for synchronizing audio signals and video signals according to embodiments of the present application;
Fig. 5 shows a table indicating examples of device combination according to embodiments of the present application; and
Fig. 6 shows a flowchart explaining an example of a method for synchronizing audio signals and video signals according to an embodiment of the present application.
Equal or equivalent elements or elements with equal or equivalent functionality are denoted in the following description by equal or equivalent reference numerals.
In the following description, a plurality of details is set forth to provide a more thorough explanation of embodiments of the present application. However, it will be apparent to one skilled in the art that embodiments of the present application may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form rather than in detail in order to avoid obscuring embodiments of the present application. In addition, features of the different embodiments described hereinafter may be combined with each other, unless specifically noted otherwise.
Fig. 1 illustrates an embodiment of a video rendering device (for example, a television, TV) 2 comprising a video processing unit 4, a display unit 6, a flash detector 12 and a signalling device 14. The flash detector 12 detects an abrupt brightness change of the video, i.e., of an image signal or optical signal of an image of the video. The flash detector 12 may be any kind of device that detects the abrupt brightness change, for example, a sensor for detecting the optical signal (flash) of the video. The signalling device 14 provides an acoustic signal in response to the detection of the abrupt brightness change of the video by the flash detector 12. The signalling device 14 may be any kind of device that provides an acoustic signal in response to the detection of the abrupt brightness change, i.e., that converts the optical signal to the acoustic signal. The optical signal is converted to the acoustic signal using any suitable technology, for example, a sound generator controlled by a photodiode. Together, the flash detector 12 and the signalling device 14 form a convertor circuit 10 that converts the optical signal (flash) to the acoustic signal. Fig. 2 depicts examples of the lip sync unit according to the present application. Fig. 2 (a) illustrates the lip sync unit 30 comprising a processing unit 32 and a memory 34. The lip sync unit 30 is an information appliance, a so-called source device, such as a Set-Top-Box, a Set-Top-Unit, a Blu-Ray player or a video game console. The lip sync unit 30 receives acoustic signals, e.g., a first acoustic signal which is provided in response to the abrupt brightness change of the video from an imaging device (e.g., a TV, a projector or any video rendering device appropriate to display images) and a second acoustic signal which is provided by the sound reproduction device (e.g., a sound rendering device, a soundbar, a loudspeaker or any other device appropriate to provide sound), and the received signals are stored in the memory 34. Then, the processing unit 32 retrieves the first and second acoustic signals from the memory 34, determines a time difference between the first and second acoustic signals, and issues a command to adjust a delay of the imaging device and/or of the sound reproduction device.
The command is issued to adjust a delay difference between the audio content and the video content, i.e., the command indicates a delay period, e.g., how many microseconds or milliseconds the device should delay. In case the decoding time of the audio signals exceeds the decoding time of the video signals, the command indicates the delay period (lip sync point) of the video signals at the lip sync unit 30. Fig. 2 (b) illustrates the lip sync unit 30’ comprising a microphone 36 in addition to the processing unit 32 and the memory 34 shown in Fig. 2 (a). The microphone 36 captures the first and second acoustic signals. The processing unit 32 and the memory 34 work as explained above. Fig. 3 depicts examples of a lip sync management system according to embodiments of the present application. Fig. 3 (a) shows an example of the lip sync management system 100 comprising a video rendering device (an imaging device) 102, a sound rendering device (a sound reproduction device) 104, a convertor circuit 10, a microphone 108 and the lip sync unit 30 as shown in Fig. 2 (a). As shown in Fig. 3 (a), the convertor circuit 10 is configured to convert an optical signal to an acoustic signal and comprises a flash detector and a signalling device as shown in Fig. 1. The video rendering device 102 is connected, e.g., via HDMI, to the sound rendering device 104 and the lip sync unit 30. The microphone 108 is connected, e.g., via wire, Bluetooth or WLAN, to the lip sync unit 30; as the microphone, a device like a smartphone or tablet, i.e., any device able to record sounds (acoustic signals) and transmit the recorded sound, may be used. The convertor circuit 10 is positioned to be able to detect an abrupt brightness change of the video, i.e., the position of the convertor circuit 10 is adjusted so that it captures the optical signal (a flash) from the video rendering device 102. The optical signal from the video rendering device 102 is detected by the convertor circuit 10, and the detected optical signal is converted to a first acoustic signal (first beep tone). The first acoustic signal is then signalled to the microphone 108. At the same time, a second acoustic signal (second beep tone) from the sound rendering device 104 is signalled to the microphone 108. The first acoustic signal and the second acoustic signal have different frequencies, e.g., the frequency of the first acoustic signal is higher than the frequency of the second acoustic signal, or vice versa.
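As an illustration of the resulting two-tone recording, the following sketch simulates what the microphone 108 might capture: two short beeps at the 1 kHz and 3.2 kHz frequencies used as an example further below, separated by a hypothetical 60 ms lip sync error (sample rate, burst length and noise level are illustrative assumptions, not values from the patent):

    import numpy as np

    RATE = 48000  # assumed sampling rate

    def beep(freq_hz, start_s, dur_s=0.02, total_s=0.5):
        """A freq_hz tone burst starting at start_s within a total_s buffer."""
        t = np.arange(int(total_s * RATE)) / RATE
        window = (t >= start_s) & (t < start_s + dur_s)
        return np.sin(2 * np.pi * freq_hz * t) * window

    # First beep (convertor circuit) at 1 kHz, second beep (sound rendering
    # device) at 3.2 kHz, offset by a hypothetical 60 ms lip sync error:
    rng = np.random.default_rng(0)
    mix = beep(1000.0, 0.100) + beep(3200.0, 0.160)
    mix = mix + 0.01 * rng.standard_normal(mix.size)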
The microphone 108 records the signalled acoustic signal, which now contains two tones at different frequencies: the frequency of the first acoustic signal and the frequency of the second acoustic signal. The microphone 108 is positioned so that its distance (d) from the convertor circuit 10 and from the sound rendering device 104 is the same. That is, the distance (d) between the convertor circuit 10 and the microphone 108 and the distance (d) between the sound rendering device 104 and the microphone 108 are adjusted to be the same. The recorded signal is provided to the lip sync unit 30. The distance (d) is a sound distance, which need not equal the physical distance between the microphone and the convertor circuit or between the microphone and the sound rendering device. That is, when the soundwaves of both beeps, emitted at the same time, arrive at the microphone at the same time, the sound distances are the same, even though the physical distances may differ.
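To quantify how much a mismatch between the two sound distances matters, a one-line calculation suffices (assuming sound travels at roughly 343 m/s in air; the figure is a physical constant, not taken from the patent):

    SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C

    def timing_error_ms(distance_mismatch_m):
        """Extra delay picked up by the beep travelling the longer path."""
        return distance_mismatch_m / SPEED_OF_SOUND * 1000.0

    print(timing_error_ms(0.10))  # a 10 cm mismatch costs only about 0.29 ms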
The lip sync unit 30 receives the provided signal from the microphone 108 and stores the received signal in the memory 34 as shown in Fig. 2. Then, the lip sync unit 30 determines a time difference between the first and second acoustic signals based on the stored acoustic signal (stored beep tones). That is, it is now possible via signal processing to calculate the time difference between the first and second beep tones. This can be done via simple methods like band-pass filtering and correlation; for example, with an error < 0.3 ms for 1 kHz and 3.2 kHz beep tones. The lip sync unit 30 then issues a command to adjust a delay of the video rendering device 102. That is, the command is generated based on the determined time difference, and the generated command is issued to the video rendering device 102, so that the lip sync unit 30 is able to adjust to the lip sync point by delaying the video.
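The text names only band-pass filtering and correlation; one plausible realization is to band-pass each tone, compute its envelope and compare onset times. The sketch below follows that idea, reusing the simulated recording mix and RATE from the earlier sketch (the filter order, the ±10 % bandwidth and the half-peak onset criterion are assumptions, not taken from the patent):

    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def onset_s(x, freq_hz, rate=RATE):
        """Time at which the band-passed envelope first exceeds half its peak."""
        sos = butter(4, [0.9 * freq_hz, 1.1 * freq_hz], btype="bandpass",
                     fs=rate, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))
        return np.argmax(env > 0.5 * env.max()) / rate

    t_first = onset_s(mix, 1000.0)    # beep derived from the video flash
    t_second = onset_s(mix, 3200.0)   # beep from the sound rendering device
    delta_ms = (t_second - t_first) * 1000.0  # positive: audio beep arrived later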
Fig. 3 (b) shows another example of the lip sync management system 100’, comprising a video rendering device (an imaging device) 2 as shown in Fig. 1, a sound rendering device (a sound reproduction device) 104, a microphone 108 and the lip sync unit 30 as shown in Fig. 2 (a). The lip sync management system 100’ comprises the video rendering device 2 of Fig. 1, i.e., the convertor circuit 10 is included in the video rendering device 2; this is the only difference from the lip sync management system 100 shown in Fig. 3 (a). In the lip sync management system 100’, the video rendering device 2 could have, e.g., a special lip sync mode in which a flash results in a beep on the loudspeakers of the video rendering device 2. This mode needs to be implemented and calibrated, e.g., by triggering the beep when high-brightness values occur.
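How such a trigger might be calibrated is left open by the text; as one illustration, a minimal sketch of a detector that fires once per flash, using a brightness threshold with simple hysteresis (the normalized light-level input and the threshold value are assumptions, not part of the patent):

    FLASH_THRESHOLD = 0.8  # assumed normalized brightness level counting as a flash

    def flash_indices(levels, threshold=FLASH_THRESHOLD):
        """Yield sample indices at which the brightness first rises above
        the threshold; the detector re-arms once the flash has decayed."""
        armed = True
        for i, level in enumerate(levels):
            if armed and level > threshold:
                yield i          # the signalling device would emit a beep here
                armed = False
            elif level < threshold / 2:
                armed = True

On each yielded index, the signalling device would emit a short tone burst such as the beeps simulated above.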
Fig. 4 shows a further example of a lip sync management system 120. The lip sync management system 120 comprises a TV 122 as a video rendering device, a soundbar (audio sink) 124 as a sound rendering device, a Set-Top-Box (STB, source device) 30’ as a lip sync unit, and a remote controller 126 of the STB 30’. The TV 122 is connected to the soundbar 124 and the STB 30’ via HDMI, and the remote controller 126 is positioned to be able to detect an optical signal (a flash) of the TV 122. That is, the convertor circuit that converts an optical signal to an acoustic signal is incorporated into the remote controller 126. In addition, a microphone is incorporated into the STB 30’; therefore, the distance (sound distance) (d) between the remote controller and the STB 30’ and the distance (d) between the soundbar 124 and the STB 30’ are the same, as also shown in Fig. 3.
To summarize, the convertor circuit can be an external device, incorporated into the video rendering device (TV) or incorporated into the remote controller of the lip sync unit (STB).
Fig. 5 shows possible device combinations of a lip sync management system: for each row, one box from the left column is combined with one box from the right column. In Fig. 5, RC indicates a remote controller of the Set-Top-Box, STB indicates a Set-Top-Box, and DataC indicates a data connection to the Set-Top-Box. That is, for example, in case an extra device is used to convert light (an optical signal) to sound (an acoustic signal), the STB, the STB with the remote controller, or the remote controller (with a data connection to the STB) is used as the device for recording audio (microphone) and as the device running the video latency adjustment algorithm (lip sync unit).
As an alternative, a slightly different approach could be considered: a light sensor and a microphone are installed in the remote controller and the signal processing is done there, i.e., the remote controller works as a lip sync unit having a microphone (but without a convertor circuit). The result, in the form of data, is then transmitted to the source device via an appropriate technology such as Bluetooth. Such a remote control would be a hybrid of known technology and a modern smart remote control, which already has a microphone and a Bluetooth connection installed.
Fig. 6 shows a flowchart indicating an example of a method for synchronizing audio signals and video signals according to the present application.
As depicted in Fig. 6, an abrupt brightness change of the video is detected (S10). For example, the abrupt brightness change, i.e., a flash or an optical signal from the video rendering device, is detected by a flash detector of a convertor circuit in the video rendering device, in the remote controller or in the lip sync unit.
Then, a first acoustic signal is provided in response to the detection of the brightness change (S12). That is, a signalling device of the convertor circuit provides the first acoustic signal in response to the detected brightness change, i.e., the optical signal is converted into the first acoustic signal. The first acoustic signal (first beep tone) from the signalling device (the convertor circuit) and a second acoustic signal (second beep tone) from the sound rendering device are emitted to the microphone at the same time. It should be noted that the frequency of the first acoustic signal is different from the frequency of the second acoustic signal, e.g., the frequency of the first acoustic signal is higher than that of the second acoustic signal, or vice versa.
The sound including the first and second acoustic signals is captured by the microphone (S14). The distance between the convertor circuit and the microphone and the distance between the sound rendering device and the microphone are the same; therefore, the microphone captures a sound including both the first and second acoustic signals. The captured sound is recorded by the microphone and transmitted to the lip sync unit. A time difference between the first and second acoustic signals is determined (S16). The sound transmitted from the microphone is stored in the memory of the lip sync unit. The lip sync unit then determines the time difference between the first and second acoustic signals.
A command for a delay adjustment is issued (S18). The lip sync unit issues the command indicating a lip sync point, e.g., the delay period of the video content. The lip sync unit is a so-called source device which stores the video content to be displayed at the video rendering device. Hence, the video content is transmitted to the video rendering device based on the issued command, i.e., the transmission timing is controlled by the command. For example, the video content is transmitted with a 30 ms delay to synchronize with the sound of the sound rendering device.
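Putting steps S16 and S18 together, a measured offset could be turned into a delay-adjust command along the following lines; the command format here is purely hypothetical, since the patent does not specify one:

    def delay_command(delta_ms):
        """Map a measured offset onto a hypothetical delay-adjust command.

        delta_ms > 0 means the audio beep arrived late, so the video should
        be delayed; delta_ms < 0 means the audio side should be delayed.
        """
        if delta_ms >= 0:
            return {"target": "video_rendering_device",
                    "extra_delay_ms": round(delta_ms)}
        return {"target": "sound_rendering_device",
                "extra_delay_ms": round(-delta_ms)}

    # A measured 30 ms audio lag yields a command to delay the video by
    # 30 ms, matching the example above:
    print(delay_command(30.0))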
In accordance with the embodiments of the present application, the first acoustic signal and the second acoustic signal are emitted to the microphone at the same time, and the microphone captures a sound including the first and second acoustic signals. That is, compared with the technical standard, the proposed method transforms the old 'two measurements' approach (separate audio and video measurements, i.e., an acoustic signal from the sound rendering device and an optical signal from the video rendering device) into a new 'one conversion and one measurement' approach; the new approach saves complexity and avoids problems like the synchronization of the measurements.
In accordance with the embodiments of the present application, the optical signal is converted to the acoustic signal and, therefore, a human can adjust the audio/video synchronization via hearing alone, which is more precise than the combination of hearing (acoustic signal) and seeing (optical signal). For example, an operator can adjust the audio/video synchronization by hearing, since the optical signal from the video is converted into the acoustic signal. In accordance with the embodiments of the present application, the lip sync unit, e.g., the source device, does not need any optical sensor to capture the optical signal from the video rendering device. If an optical sensor were required, it would need a line of sight or an extension, since light, in contrast to sound, does not travel around corners. Also, sound is much slower than light, which leads to a measurement offset. The requirements for the microphone and the sampling are very low, since it is only necessary to distinguish between two frequencies and their states: on or off. Hence, it is possible to improve the accuracy of the lip sync with a simplified feature configuration of the lip sync unit.
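Distinguishing two known frequencies and their on/off states is exactly what the Goertzel algorithm is suited for, which illustrates why the sampling requirements are so low. A sketch of such a detector follows (block length, sample rate and power threshold are assumed calibration values, not taken from the patent):

    import numpy as np

    def goertzel_power(block, freq_hz, rate):
        """Signal power at freq_hz in one block (Goertzel algorithm)."""
        n = len(block)
        k = round(n * freq_hz / rate)  # nearest DFT bin
        coeff = 2.0 * np.cos(2.0 * np.pi * k / n)
        s_prev = s_prev2 = 0.0
        for x in block:
            s = x + coeff * s_prev - s_prev2
            s_prev2, s_prev = s_prev, s
        return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

    def tone_is_on(block, freq_hz, rate=48000, threshold=1.0):
        """True while the given tone is 'on' in the current block."""
        return goertzel_power(np.asarray(block, dtype=float), freq_hz, rate) > threshold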
In accordance with the embodiments of the present application, the video rendering device comprises a flash detector and a signalling device, i.e., a convertor circuit. Therefore, no extra device for synchronizing audio/video is required.
In accordance with the embodiments of the present application, the convertor circuit is incorporated into the remote controller of the lip sync unit. Therefore, it is possible to reduce extra costs to a minimum, since no extra device is needed, and buttons to control the process and a power source already exist in the remote controller, which reduces the cost of the convertor circuit. Furthermore, the remote controller can easily be repositioned to adjust its distance to the source device. Thereby it is easily possible to ensure that the loudspeakers and the remote controller have the same distance (d) to the microphone, or to the microphone in the lip sync device, so that the sound waves of both beeps arrive at the same time. For example, a distance mismatch of 1 m would already introduce an offset of roughly 3 ms at a speed of sound of about 343 m/s.
In accordance with the embodiments of the present application, it is possible to use a smartphone as the microphone for recording the beeps. This can be implemented by a smartphone app which automates the process of measuring, calculating and reporting the result to the source.
In accordance with the embodiments of the present application, the lip sync unit comprises a microphone, i.e., the recording of the beeps is done at the lip sync unit. In this case, there are fewer points where something could go wrong, i.e., the possibility of errors is reduced. In particular, manual entry of numbers by a human is not necessary.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus. The inventive data stream can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the application can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present application can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine readable carrier.
Other embodiments comprise a computer program for performing one of the methods described herein, stored on a machine readable carrier.

In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.

A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.
The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
The apparatus described herein, or any components of the apparatus described herein, may be implemented at least partially in hardware and/or in software.

Claims

1. A video rendering device for displaying a video comprising:
a flash detector which detects an abrupt brightness change of the video, and
a signalling device which provides an acoustic signal in response to the detection of the brightness change.
2. A lip sync unit which determines a time difference between different acoustic signals, one of which is a first acoustic signal provided in response to an abrupt brightness change, and another of which is an audio signal as a second acoustic signal, and issues a command for a delay adjustment of a delay of an imaging device and/or of a sound reproduction device.
3. The lip sync unit according to claim 2, wherein the command is issued to adjust a delay difference between an audio content and a video content.
4. The lip sync unit according to claim 2 or 3 comprises a microphone configured to capture the first and the second acoustic signal.
5. A lip sync management system for synchronizing audio signals on a sound rendering device and video signals on a video rendering device comprising:
a convertor circuit for converting an optical signal to an acoustical signal, wherein the convertor circuit comprises a flash detector which detects an abrupt brightness change of the video from the video rendering device, and a signalling device which provides a first acoustic signal in response to the detection of the brightness change,
a microphone configured to capture the first acoustic signal and a second acoustic signal from the sound rendering device, and
the lip sync unit according to claim 2 or 3, wherein the captured first and second acoustic signals are transmitted from the microphone to the lip sync unit.
6. The system according to claim 5, wherein the converter circuit is configured to convert the abrupt brightness change of the video to the first acoustic signal at a different frequency from the second acoustic signal.
7. The system according to claim 5 or 6, wherein the distance between the converter circuit and the microphone, and the distance between the sound rendering device and the microphone are the same distance.
8. A method for synchronizing audio signals on a sound rendering device and video signals on a video rendering device comprising:
detecting an abrupt brightness change of the video,
providing a first acoustic signal in response to the detection of the brightness change,
capturing the first acoustic signal and a second acoustic signal which is an audio signal provided from the sound rendering device,
determining a time difference between the captured first and second acoustic signals, and
issuing a command for a delay adjustment of a delay of the video rendering device and/or of the sound rendering device.
9. Computer program having a program code for performing, when running on a computer, a method according to claim 8.
10. A circuit used for a video rendering device for displaying a video comprising:
a flash detector which detects an abrupt brightness change of the video, and
a signalling device which provides an acoustic signal in response to the detection of the brightness change.
PCT/EP2020/070173 2019-07-17 2020-07-16 Lip sync management device WO2021009298A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP19186830 2019-07-17
EP19186830.6 2019-07-17

Publications (1)

Publication Number Publication Date
WO2021009298A1 true WO2021009298A1 (en) 2021-01-21

Family

ID=67437868

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/070173 WO2021009298A1 (en) 2019-07-17 2020-07-16 Lip sync management device

Country Status (1)

Country Link
WO (1) WO2021009298A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060127053A1 (en) * 2004-12-15 2006-06-15 Hee-Soo Lee Method and apparatus to automatically adjust audio and video synchronization
US20110013085A1 (en) * 2008-03-19 2011-01-20 Telefonaktiebolaget Lm Ericsson (Publ) Method and Apparatus for Measuring Audio-Video Time skew and End-to-End Delay

Similar Documents

Publication Publication Date Title
EP2599296B1 (en) Methods and apparatus for automatic synchronization of audio and video signals
KR102140612B1 (en) A/v receiving apparatus and method for delaying output of audio signal and a/v signal processing system
CN101204081B (en) Automatic audio and video synchronization
US8997169B2 (en) System, method, and infrastructure for synchronized streaming of content
US7970222B2 (en) Determining a delay
US20140376873A1 (en) Video-audio processing device and video-audio processing method
US9742964B2 (en) Audio/visual device and control method thereof
WO2016105322A1 (en) Simultaneously viewing multiple camera angles
US20120182387A1 (en) 3d motion picture processing device
KR20080006486A (en) Information-processing device, information-processing method and computer program
KR20120105497A (en) Optimizing content calibration for home theaters
US9838584B2 (en) Audio/video synchronization using a device with camera and microphone
KR101311463B1 (en) remote video transmission system
JP2012191583A (en) Signal output device
JP2018207152A (en) Synchronization controller and synchronization control method
WO2021009298A1 (en) Lip sync management device
JP2006250638A (en) Video camera provided with clock synchronization function
JP4669854B2 (en) Video processor and video delay measuring method
JP2013143706A (en) Video audio processing device and program therefor
KR100677162B1 AV system for adjusting audio/video lip synchronization
JP2008066931A (en) Video audio reproduction system, and video audio synchronization reproducer composing it
JP5051159B2 (en) Electronic device, display control method, and computer program
JP2013081134A (en) Communication device
JP2013207307A (en) Sound signal processor
JP2012175339A (en) Three-dimensional video signal processing apparatus and processing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 20739416
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 20739416
Country of ref document: EP
Kind code of ref document: A1