WO2021009298A1 - Lip sync management device - Google Patents

Lip sync management device

Info

Publication number
WO2021009298A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
acoustic signal
rendering device
lip sync
brightness change
Prior art date
Application number
PCT/EP2020/070173
Other languages
French (fr)
Inventor
Stefan KRÄGELOH
Thomas Heller
Original Assignee
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. filed Critical Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Publication of WO2021009298A1 publication Critical patent/WO2021009298A1/en

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Definitions

  • The present application is concerned with the synchronization of video signals and audio signals, i.e., lip sync management.
  • Lip sync is also written as lipsync.
  • Lip sync technology is used, for example, in a home cinema system, which is composed of a set of devices comprising at least rendering devices to render the audio/video signal and source devices such as a Set-Top-Box (STB), a Blu-Ray player or a video game console.
  • Video rendering devices such as televisions, projectors and head-mounted displays display the images corresponding to the video signal.
  • Audio rendering devices such as audio amplifiers connected to sets of loudspeakers, sound bars, and headphones output the sound waves corresponding to the audio signal. Many device topologies are possible and different types of connections are applicable.
  • Each rendering device introduces a latency for processing the signal.
  • This latency varies with the type of signal, audio or video, varies between devices and also depends on the rendering mode chosen for the same device.
  • For example, a television has video rendering modes with minimal processing for low-latency applications such as games, leading to a video latency of about 30 ms.
  • As a further example, if the STB delays the video signal by, e.g., 180 ms and the TV needs 70 ms, this adds up to 250 ms in total, the same figure as the MPEG-H audio latency.
  • The difference in latency between the audio and the video signal generates a so-called lip sync issue, noticeable by the viewer when the delay between the image and the sound is too large: when the sound is ahead of the video by more than 45 ms or behind the video by more than 125 ms, according to Recommendation BT.1359-1 of the International Telecommunication Union (ITU).
  • A lip sync issue can severely impair the viewing experience.
  • Up to now, lip sync has always been considered in the case where video processing takes longer than audio processing. This might change with modern audio codecs, where the decoding time of audio exceeds the decoding time of video, for example, with the advent of 3D audio, where more complex processing is required on the audio signal, potentially causing audio latencies of up to 250 ms.
  • The STB needs to adjust the lip sync; therefore, in the present invention, the lip sync data is transmitted to the STB.
  • It is the object of the present invention to provide a concept for accurate and simplified lip sync management.
  • A video rendering device for displaying a video comprises a flash detector which detects an abrupt brightness change of the video, e.g., a flash light or an optical signal, and a signalling device which provides an acoustic signal in response to the detection of the brightness change. That is, the video signal is converted to an acoustic signal and, therefore, only one measurement approach is required to synchronize the video signals and the audio signals. Hence, the measurement approach regarding lip sync is simplified, and the lip sync accuracy can be improved by using the same measuring approach for the video and the audio.
  • Since the optical signal of the video is converted to an acoustic signal, a human can adjust the synchronization between the video (image) and the audio, because the adjustment can be made by listening to the acoustic signal, i.e., the converted sound.
  • A lip sync unit, e.g., a processing unit of the video content (source device), determines a time difference between two acoustic signals, one of which is a first acoustic signal provided in response to an abrupt brightness change and the other of which is an audio signal serving as a second acoustic signal, or vice versa, and issues a command to adjust a delay of an imaging device, e.g., a display for rendering an image, and/or of a sound reproduction device. That is, the delay-adjust command is determined based on the two acoustic signals.
  • The command is preferably issued to adjust a delay difference between the audio content and the video content, i.e., the command indicates a lip sync point.
  • With the lip sync unit of the present application, it is possible to synchronize the video and the audio in a simplified way.
  • A lip sync management system for synchronizing audio signals on a sound rendering device, e.g., a soundbar, a loudspeaker or any device that provides sound content, and video signals on a video rendering device, e.g., a TV or any device that displays video content, comprises: a convertor circuit for converting an optical signal to an acoustic signal, wherein the convertor circuit comprises a flash detector which detects an abrupt brightness change of the video from the video rendering device and a signalling device which provides a first acoustic signal in response to the detection of the brightness change; a microphone, e.g., any device suitable for recording acoustic signals, configured to capture the first acoustic signal and a second acoustic signal from the sound rendering device; and the lip sync unit, which determines a time difference between the two acoustic signals, one of which is the first acoustic signal provided in response to the abrupt brightness change and the other of which is an audio signal serving as the second acoustic signal.
  • The audio signals and the video signals are synchronized based on the time difference between the first and second acoustic signals.
  • The convertor circuit need not be included in the video rendering device; it can instead be incorporated or integrated into another device, e.g., a remote controller of the video rendering device, the lip sync unit or the sound rendering device, which reduces the modifications and costs required to introduce the convertor circuit.
  • Alternatively, the convertor circuit may be a stand-alone device that is not part of another device; in that case, the distance between the microphone and the convertor circuit can be adjusted to equal the distance between the sound rendering device and the microphone, so that the first and second acoustic signals arrive at the microphone at the same time. Hence, it is possible to simplify the measurement approach and effectively improve the accuracy of the synchronization.
  • A method for synchronizing audio signals on a sound rendering device and video signals on a video rendering device comprises: detecting an abrupt brightness change of the video, providing a first acoustic signal in response to the detection of the brightness change, capturing the first acoustic signal and a second acoustic signal which is an audio signal provided by the sound rendering device, determining a time difference between the captured first and second acoustic signals, and issuing a command to adjust a delay of the video rendering device and/or of the sound rendering device.
  • The abrupt brightness change, i.e., the optical signal or flash of the video content, is converted to the first acoustic signal.
  • The first acoustic signal and the second acoustic signal provided by the sound rendering device are measured, and the command for the delay adjustment that synchronizes the video content and the audio content is issued. Thus, the accuracy of the synchronization of the video content and the audio content can be improved with a simplified measurement approach.
  • Fig. 1 shows a block diagram illustrating an example of a video rendering device for displaying a video according to embodiments of the present application
  • Fig. 2 shows a block diagram illustrating examples of a lip sync unit according to embodiments of the present application
  • Fig. 3 shows a block diagram illustrating examples of a lip sync management system for synchronizing audio signals and video signals according to embodiments of the present application
  • Fig. 4 shows a block diagram illustrating another example of a lip sync management system for synchronizing audio signals and video signals according to embodiments of the present application
  • Fig. 5 shows a table indicating examples of device combination according to embodiments of the present application.
  • Fig. 6 shows a flowchart explaining an example of a method for synchronizing audio signals and video signals according to an embodiment of the present application.
  • Fig. 1 illustrates an embodiment of a video rendering device (for example, a television, TV) 2 comprising a video processing unit 4, a display unit 6, a flash detector 12 and a signalling device 14.
  • The flash detector 12 detects an abrupt brightness change of the video, i.e., of an image signal or optical signal of an image of the video.
  • The flash detector 12 may be any kind of device that detects the abrupt brightness change, for example, a sensor for detecting the optical signal (flash) of the video.
  • The signalling device 14 provides an acoustic signal in response to the detection of the abrupt brightness change of the video by the flash detector 12.
  • The signalling device 14 may be any kind of device that provides an acoustic signal in response to the detection of the abrupt brightness change, i.e., that converts the optical signal to the acoustic signal.
  • The optical signal is converted to the acoustic signal using any suitable technology, for example, a sound generator controlled by a photodiode.
  • Together, the flash detector 12 and the signalling device 14 form a convertor circuit 10 that converts the optical signal (flash) to the acoustic signal.
  • Fig. 2 depicts examples of the lip sync unit according to the present application.
  • Fig. 2 (a) illustrates the lip sync unit 30 comprising a processing unit 32 and a memory 34.
  • The lip sync unit 30 is an information appliance, a so-called source device, such as a Set-Top-Box, a Set-Top-Unit, a Blu-Ray player or a video game console.
  • The lip sync unit 30 receives acoustic signals, e.g., a first acoustic signal which is provided in response to the abrupt brightness change of the video from an imaging device (e.g., a TV, a projector or any video rendering device appropriate to display images) and a second acoustic signal which is provided by the sound reproduction device (e.g., a sound rendering device, a soundbar, a loudspeaker or any other device appropriate to provide sound), and the received signals are stored in the memory 34.
  • The processing unit 32 retrieves the first and second acoustic signals from the memory 34, determines a time difference between the first and second acoustic signals, and issues a command to adjust a delay of the imaging device and/or of the sound reproduction device.
  • The command is issued to adjust a delay difference between the audio content and the video content, i.e., the command indicates a delay period, e.g., how many microseconds or milliseconds the device should delay.
  • In case the decoding time of the audio signals exceeds the decoding time of the video signals, the command indicates the delay period (lip sync point) of the video signals at the lip sync unit 30.
  • Fig. 2 (b) illustrates the lip sync unit 30’ comprising a microphone 36 in addition to the processing unit 32 and the memory 34 shown in Fig. 2 (a).
  • The microphone 36 captures the first and second acoustic signals.
  • The processing unit 32 and the memory 34 work as explained above.
  • Fig. 3 depicts examples of a lip sync management system according to embodiments of the present application.
  • Fig. 3 (a) shows an example of the lip sync management system 100 comprising a video rendering device (an imaging device) 102, a sound rendering device (a sound reproduction device) 104, a convertor circuit 10, a microphone 108 and the lip sync unit 30 as shown in Fig. 2 (a).
  • The convertor circuit 10 is configured to convert an optical signal to an acoustic signal and comprises a flash detector and a signalling device as shown in Fig. 1.
  • The video rendering device 102 is connected, e.g., via HDMI, to the sound rendering device 104 and the lip sync unit 30.
  • The microphone 108 is connected, e.g., via wire, Bluetooth or WLAN, to the lip sync unit 30; as the microphone, a device like a smartphone or tablet, i.e., any device able to record sounds (acoustic signals) and transmit the recorded sound, may be used.
  • The convertor circuit 10 is positioned to be able to detect an abrupt brightness change of the video, i.e., the position of the convertor circuit 10 is adjusted so that it captures the optical signal (a flash) from the video rendering device 102.
  • The optical signal from the video rendering device 102 is detected by the convertor circuit 10, and the detected optical signal is converted to a first acoustic signal (first beep tone).
  • The first acoustic signal is then signalled to the microphone 108.
  • At the same time, a second acoustic signal (second beep tone) from the sound rendering device 104 is signalled to the microphone 108.
  • The first acoustic signal and the second acoustic signal have different frequencies, e.g., the frequency of the first acoustic signal is higher than the frequency of the second acoustic signal, or vice versa.
  • The microphone 108 records the signalled acoustic signal, which now contains two tones at different frequencies: the frequency of the first acoustic signal and the frequency of the second acoustic signal.
  • The microphone 108 is positioned so that its distance (d) from the convertor circuit 10 and from the sound rendering device 104 is the same. That is, the distance (d) between the convertor circuit 10 and the microphone 108 and the distance (d) between the sound rendering device 104 and the microphone 108 are adjusted to be the same.
  • The recorded signal is provided to the lip sync unit 30.
  • The distance (d) is a sound distance, which need not equal the physical distance between the microphone and the convertor circuit or between the microphone and the sound rendering device. That is, when the soundwaves of both beeps, emitted at the same time, arrive at the microphone at the same time, the sound distances are the same, even though the physical distances may differ.
  • The lip sync unit 30 receives the provided signal from the microphone 108 and stores the received signal in the memory 34 as shown in Fig. 2. Then, the lip sync unit 30 determines a time difference between the first and second acoustic signals based on the stored acoustic signal (stored beep tones). That is, it is now possible via signal processing to calculate the time difference between the first and second beep tones. This can be done via simple methods like band-pass filtering and correlation; for example, with an error < 0.3 ms for 1 kHz and 3.2 kHz beep tones. The lip sync unit 30 then issues a command to adjust a delay of the video rendering device 102. That is, the command is generated based on the determined time difference, and the generated command is issued to the video rendering device 102, so that the lip sync unit 30 is able to adjust to the lip sync point by delaying the video.
  • Fig. 3 (b) shows another example of the lip sync management system 100’, comprising a video rendering device (an imaging device) 2 as shown in Fig. 1, a sound rendering device (a sound reproduction device) 104, a microphone 108 and the lip sync unit 30 as shown in Fig. 2 (a).
  • The lip sync management system 100’ comprises the video rendering device 2 of Fig. 1, i.e., the convertor circuit 10 is included in the video rendering device 2; this is the only difference from the lip sync management system 100 shown in Fig. 3 (a).
  • The video rendering device 2 could have, e.g., a special lip sync mode in which a flash results in a beep on the loudspeakers of the video rendering device 2. This mode needs to be implemented and calibrated, e.g., by triggering the beep when high-brightness values occur.
  • Fig. 4 shows a further example of a lip sync management system 120.
  • The lip sync management system 120 comprises a TV 122 as a video rendering device, a soundbar (audio sink) 124 as a sound rendering device, a Set-Top-Box (STB, source device) 30’ as a lip sync unit, and a remote controller 126 of the STB 30’.
  • The TV 122 is connected to the soundbar 124 and the STB 30’ via HDMI, and the remote controller 126 is positioned to be able to detect an optical signal (a flash) of the TV 122. That is, the convertor circuit that converts an optical signal to an acoustic signal is incorporated into the remote controller 126.
  • A microphone is incorporated into the STB 30’; therefore, the distance (sound distance) (d) between the remote controller and the STB 30’ and the distance (d) between the soundbar 124 and the STB 30’ are the same, as also shown in Fig. 3.
  • To summarize, the convertor circuit can be an external device, incorporated into the video rendering device (TV) or incorporated into the remote controller of the lip sync unit (STB).
  • Fig. 5 shows possible device combinations of a lip sync management system: for each row, one box from the left column is combined with one box from the right column.
  • RC indicates a remote controller of the Set-Top-Box.
  • STB indicates a Set-Top-Box.
  • DataC indicates a data connection to the Set-Top-Box. That is, for example, in case an extra device is used to convert light (an optical signal) to sound (an acoustic signal), the STB, the STB with the remote controller, or the remote controller (with a data connection to the STB) is used as the device for recording audio (microphone) and as the device running the video latency adjustment algorithm (lip sync unit).
  • As an alternative, the remote controller can work as a lip sync unit having a microphone (but without a convertor circuit).
  • The result, in the form of data, is then transmitted to the source device via an appropriate technology such as Bluetooth.
  • Such a remote control would be a hybrid of known technology and a modern smart remote control, which already has a microphone and a Bluetooth connection installed.
  • Fig. 6 shows a flowchart indicating an example of a method for synchronizing audio signals and video signals according to the present application.
  • An abrupt brightness change of the video is detected (S10).
  • The abrupt brightness change, i.e., a flash or an optical signal from the video rendering device, is detected by a flash detector of a convertor circuit in the video rendering device, in the remote controller or in the lip sync unit.
  • A first acoustic signal is provided in response to the detection of the brightness change (S12). That is, a signalling device of the convertor circuit provides the first acoustic signal in response to the detected brightness change, i.e., the optical signal is converted into the first acoustic signal.
  • The first acoustic signal (first beep tone) from the signalling device (the convertor circuit) and a second acoustic signal (second beep tone) from the sound rendering device are emitted to the microphone at the same time.
  • The frequency of the first acoustic signal is different from the frequency of the second acoustic signal, e.g., the frequency of the first acoustic signal is higher than that of the second, or vice versa.
  • The sound including the first and second acoustic signals is captured by the microphone (S14).
  • The distance between the convertor circuit and the microphone and the distance between the sound rendering device and the microphone are the same; therefore, the microphone captures a sound including both the first and second acoustic signals.
  • The captured sound is recorded by the microphone and transmitted to the lip sync unit.
  • A time difference between the first and second acoustic signals is determined (S16).
  • The sound transmitted from the microphone is stored in the memory of the lip sync unit.
  • The lip sync unit determines the time difference between the first and second acoustic signals.
  • A command for a delay adjustment is issued (S18).
  • The lip sync unit issues the command indicating a lip sync point, e.g., the delay period of the video content.
  • The lip sync unit is a so-called source device which stores the video content to be displayed at the video rendering device.
  • The video content is transmitted to the video rendering device based on the issued command, i.e., the transmission timing is controlled by the command.
  • For example, the video content is transmitted with a 30 ms delay to synchronize with the sound of the sound rendering device.
  • The first acoustic signal and the second acoustic signal are emitted to the microphone at the same time, and the microphone captures a sound including the first and second acoustic signals. That is, compared with the technical standard, the proposed method transforms the old 'two measurements' approach (separate audio and video measurements, i.e., an acoustic signal from the sound rendering device and an optical signal from the video rendering device) into a new 'one conversion and one measurement' approach; the new approach saves complexity and avoids problems like the synchronization of the measurements.
  • The optical signal is converted to the acoustic signal and, therefore, a human can adjust the audio/video synchronization via hearing alone, which is more precise than the combination of hearing (acoustic signal) and seeing (optical signal).
  • An operator can adjust the audio/video synchronization by hearing, since the optical signal from the video is converted into the acoustic signal.
  • The lip sync unit, e.g., the source device, does not need any optical sensor to capture the optical signal from the video rendering device. If an optical sensor were required, it would need a line of sight or an extension, since light, in contrast to sound, does not travel around corners.
  • The video rendering device comprises a flash detector and a signalling device, i.e., a convertor circuit. Therefore, no extra device is required for synchronizing audio/video.
  • The convertor circuit is incorporated into the remote controller of the lip sync unit. Therefore, the extra costs can be reduced to a minimum, since no extra device is needed and buttons to control the process and a power source already exist in the remote controller, which reduces the convertor circuit cost. Furthermore, the remote controller is flexible enough to adjust its distance to the source device, so it is easily possible to ensure that the loudspeakers and the remote controller have the same distance (d) to the microphone, or to the microphone in the lip sync unit, so that the soundwaves of both beeps arrive at the same time.
  • A smartphone can be used as the microphone for recording the beep. This is implemented by a smartphone app which could automate the process of measuring, calculating and reporting to the source.
  • The lip sync unit may comprise a microphone, i.e., recording of the beep is done at the lip sync unit.
  • Although some aspects are described in the context of an apparatus, these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
  • Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
  • The inventive data stream can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
  • Embodiments of the application can be implemented in hardware or in software.
  • The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
  • Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
  • Embodiments of the present application can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer.
  • The program code may, for example, be stored on a machine-readable carrier.
  • Other embodiments comprise a computer program for performing one of the methods described herein, stored on a machine-readable carrier.
  • An embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
  • A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
  • The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
  • A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein.
  • The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.
  • A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
  • A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
  • A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver.
  • The receiver may, for example, be a computer, a mobile device, a memory device or the like.
  • The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
  • A programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein.
  • A field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein.
  • Generally, the methods are preferably performed by any hardware apparatus.
  • The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
  • The apparatus described herein, or any components of the apparatus described herein, may be implemented at least partially in hardware and/or in software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A video rendering device for displaying a video, comprising: a flash detector which detects an abrupt brightness change of the video, and a signalling device which provides an acoustic signal in response to the detection of the brightness change.

Description

LIP SYNC MANAGEMENT DEVICE
The present application is concerned with the synchronization of video signals and audio signals, i.e., lip sync management.
It is well known to synchronize audio signals and video signals; this is referred to as lip sync (also written as lipsync).
Lip sync technology is used, for example, in a home cinema system, which is composed of a set of devices comprising at least rendering devices to render the audio/video signal and source devices such as a Set-Top-Box (STB), a Blu-Ray player or a video game console. Video rendering devices such as televisions, projectors and head-mounted displays display the images corresponding to the video signal. Audio rendering devices such as audio amplifiers connected to sets of loudspeakers, sound bars, and headphones output the sound waves corresponding to the audio signal. Many device topologies are possible and different types of connections are applicable.
Each rendering device introduces a latency for processing the signal. This latency varies with the type of signal, audio or video, varies between devices and also depends on the rendering mode chosen for the same device. For example, a television has video rendering modes with minimal processing for low-latency applications such as games, leading to a video latency of about 30 ms. As a further example, if the STB delays the video signal by, e.g., 180 ms and the TV needs 70 ms, this adds up to 250 ms in total, the same figure as the MPEG-H audio latency. The difference in latency between the audio and the video signal generates a so-called lip sync issue, noticeable by the viewer when the delay between the image and the sound is too large: when the sound is ahead of the video by more than 45 ms or behind the video by more than 125 ms, according to Recommendation BT.1359-1 of the International Telecommunication Union (ITU).
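To make the latency arithmetic and the cited BT.1359-1 thresholds concrete, the following minimal sketch (not part of the patent text; it simply reuses the hypothetical figures from the example above) checks a latency budget against the detectability window:

    # Hypothetical latency budget from the example above (all values in ms).
    stb_video_delay = 180        # video delay introduced by the STB
    tv_video_delay = 70          # video processing delay of the TV
    audio_delay = 250            # assumed MPEG-H audio decoding latency

    video_delay = stb_video_delay + tv_video_delay  # 250 ms in total
    skew_ms = video_delay - audio_delay             # >0: sound leads the image

    # Detectability window cited above from Recommendation BT.1359-1:
    # sound at most 45 ms ahead of, or 125 ms behind, the video.
    if skew_ms > 45:
        print(f"lip sync issue: sound {skew_ms} ms ahead of the video")
    elif skew_ms < -125:
        print(f"lip sync issue: sound {-skew_ms} ms behind the video")
    else:
        print(f"skew of {skew_ms} ms is within the detectability window")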
A lip sync issue can severely impair the viewing experience. However, up to now, lip sync has always been considered in the case where video processing takes longer than audio processing. This might change with modern audio codecs, where the decoding time of audio exceeds the decoding time of video, for example, with the advent of 3D audio, where more complex processing is required on the audio signal, potentially causing audio latencies of up to 250 ms.
A known way to measure audio/video synchronization in a laboratory is to play a stream with a beep and a flash at the same time and to measure both via suitable sensors on an oscilloscope. Two different simultaneous sensor measurements are needed. In addition, the known standard way to ensure lip sync at home is a slider in the menu of the source device that is manually adjusted, normally based on the impression of a human. There is a commercial device for lip sync, for example, “Sync-One2” (registered trademark), which has a light sensor and a microphone. If a special stream is played in which an audio beep and a flash occur at the same time, the known commercial device calculates the difference between audio and video and shows it on a display. Thereby manual adjustment, again via the slider, is possible, but now more precise than human judgement alone.
According to the standard measurement approach, it is necessary to detect the optical signal of the video and the acoustic signal of the sound. That is, two different types of signals must be measured, and the measurement of the video signal and of the acoustic signal must be synchronized to accurately adjust the delay of the video or the sound. The key information here is, for example, that the STB needs to adjust the lip sync; therefore, in the present invention, the lip sync data is transmitted to the STB.
It is thus the object of the present invention to provide a concept for accurate and simplified lip sync management.
This object is achieved by the subject-matter of a video rendering device according to claim 1, a lip sync unit according to claim 2, a lip sync system according to claim 5, a method for synchronizing audio signals and video signals according to claim 8, and a computer program according to claim 9 of the present application.
According to an embodiment of the present application, a video rendering device for displaying a video comprises a flash detector which detects an abrupt brightness change of the video, e.g., a flash light or an optical signal, and a signalling device which provides an acoustic signal in response to the detection of the brightness change. That is, the video signal is converted to an acoustic signal and, therefore, only one measurement approach is required to synchronize the video signals and the audio signals. Hence, the measurement approach regarding lip sync is simplified, and the lip sync accuracy can be improved by using the same measuring approach for the video and the audio. Furthermore, since the optical signal of the video is converted to an acoustic signal, a human can adjust the synchronization between the video (image) and the audio, because the adjustment can be made by listening to the acoustic signal, i.e., the converted sound.
In accordance with embodiments of the present application, a lip sync unit, e.g., a processing unit of the video content (source device), determines a time difference between two acoustic signals, one of which is a first acoustic signal provided in response to an abrupt brightness change and the other of which is an audio signal serving as a second acoustic signal, or vice versa, and issues a command to adjust a delay of an imaging device, e.g., a display for rendering an image, and/or of a sound reproduction device. That is, the delay-adjust command is determined based on the two acoustic signals. In addition, the command is preferably issued to adjust a delay difference between the audio content and the video content, i.e., the command indicates a lip sync point. With the lip sync unit of the present application, it is possible to synchronize the video and the audio in a simplified way.
According to the embodiments of the present application, a lip sync management system for synchronizing audio signals on a sound rendering device, e.g., a soundbar, a loudspeaker or any device that provides sound content, and video signals on a video rendering device, e.g., a TV or any device that displays video content, comprises: a convertor circuit for converting an optical signal to an acoustic signal, wherein the convertor circuit comprises a flash detector which detects an abrupt brightness change of the video from the video rendering device and a signalling device which provides a first acoustic signal in response to the detection of the brightness change; a microphone, e.g., any device suitable for recording acoustic signals, configured to capture the first acoustic signal and a second acoustic signal from the sound rendering device; and the lip sync unit, which determines a time difference between the two acoustic signals, one of which is the first acoustic signal provided in response to the abrupt brightness change and the other of which is an audio signal serving as the second acoustic signal, and issues a command to adjust a delay of an imaging device, e.g., a display for rendering an image, and/or of a sound reproduction device, wherein the captured first and second acoustic signals are transmitted from the microphone to the lip sync unit. That is, the audio signals and the video signals are synchronized based on the time difference between the first and second acoustic signals. In addition, the convertor circuit need not be included in the video rendering device; it can instead be incorporated or integrated into another device, e.g., a remote controller of the video rendering device, the lip sync unit or the sound rendering device, which reduces the modifications and costs required to introduce the convertor circuit. Furthermore, if the convertor circuit is not part of another device, the distance between the microphone and the convertor circuit can be adjusted to equal the distance between the sound rendering device and the microphone, so that the first and second acoustic signals arrive at the microphone at the same time. Hence, it is possible to simplify the measurement approach and effectively improve the accuracy of the synchronization.
According to the embodiments of the present application, a method for synchronizing audio signals on a sound rendering device and video signals on a video rendering device comprises: detecting an abrupt brightness change of the video, providing a first acoustic signal in response to the detection of the brightness change, capturing the first acoustic signal and a second acoustic signal which is an audio signal provided by the sound rendering device, determining a time difference between the captured first and second acoustic signals, and issuing a command to adjust a delay of the video rendering device and/or of the sound rendering device. The abrupt brightness change, i.e., the optical signal or flash of the video content, is converted to the first acoustic signal. The first acoustic signal and the second acoustic signal provided by the sound rendering device are measured, and the command for the delay adjustment that synchronizes the video content and the audio content is issued. Thus, the accuracy of the synchronization of the video content and the audio content can be improved with a simplified measurement approach.
Advantageous aspects of the present application are the subject of dependent claims. Preferred embodiments of the present application are described below with respect to the figures, among which: Fig. 1 shows a block diagram illustrating an example of a video rendering device for displaying a video according to embodiments of the present application;
Fig. 2 shows a block diagram illustrating examples of a lip sync unit according to embodiments of the present application; Fig. 3 shows a block diagram illustrating examples of a lip sync management system for synchronizing audio signals and video signals according to embodiments of the present application; Fig. 4 shows a block diagram illustrating another example of a lip sync management system for synchronizing audio signals and video signals according to embodiments of the present application;
Fig. 5 shows a table indicating examples of device combination according to embodiments of the present application; and
Fig. 6 shows a flowchart explaining an example of a method for synchronizing audio signals and video signals according to an embodiment of the present application.
Equal or equivalent elements or elements with equal or equivalent functionality are denoted in the following description by equal or equivalent reference numerals.
In the following description, a plurality of details is set forth to provide a more thorough explanation of embodiments of the present application. However, it will be apparent to one skilled in the art that embodiments of the present application may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form rather than in detail in order to avoid obscuring embodiments of the present application. In addition, features of the different embodiments described hereinafter may be combined with each other, unless specifically noted otherwise.
Fig. 1 illustrates an embodiment of a video rendering device (for example, a television, TV) 2 comprising a video processing unit 4, a display unit 6, a flash detector 12 and a signalling device 14. The flash detector 12 detects an abrupt brightness change of the video, i.e., of an image signal or optical signal of an image of the video. The flash detector 12 may be any kind of device that detects the abrupt brightness change, for example, a sensor for detecting the optical signal (flash) of the video. The signalling device 14 provides an acoustic signal in response to the detection of the abrupt brightness change of the video by the flash detector 12. The signalling device 14 may be any kind of device that provides an acoustic signal in response to the detection of the abrupt brightness change, i.e., that converts the optical signal to the acoustic signal. The optical signal is converted to the acoustic signal using any suitable technology, for example, a sound generator controlled by a photodiode. Together, the flash detector 12 and the signalling device 14 form a convertor circuit 10 that converts the optical signal (flash) to the acoustic signal. Fig. 2 depicts examples of the lip sync unit according to the present application. Fig. 2 (a) illustrates the lip sync unit 30 comprising a processing unit 32 and a memory 34. The lip sync unit 30 is an information appliance, a so-called source device, such as a Set-Top-Box, a Set-Top-Unit, a Blu-Ray player or a video game console. The lip sync unit 30 receives acoustic signals, e.g., a first acoustic signal which is provided in response to the abrupt brightness change of the video from an imaging device (e.g., a TV, a projector or any video rendering device appropriate to display images) and a second acoustic signal which is provided by the sound reproduction device (e.g., a sound rendering device, a soundbar, a loudspeaker or any other device appropriate to provide sound), and the received signals are stored in the memory 34. Then, the processing unit 32 retrieves the first and second acoustic signals from the memory 34, determines a time difference between the first and second acoustic signals, and issues a command to adjust a delay of the imaging device and/or of the sound reproduction device.
The command is issued to adjust a delay difference between the audio content and the video content, i.e., the command indicates a delay period, e.g., how many microseconds or milliseconds the device should delay. In case the decoding time of the audio signals exceeds the decoding time of the video signals, the command indicates the delay period (lip sync point) of the video signals at the lip sync unit 30. Fig. 2 (b) illustrates the lip sync unit 30’ comprising a microphone 36 in addition to the processing unit 32 and the memory 34 shown in Fig. 2 (a). The microphone 36 captures the first and second acoustic signals. The processing unit 32 and the memory 34 work as explained above. Fig. 3 depicts examples of a lip sync management system according to embodiments of the present application. Fig. 3 (a) shows an example of the lip sync management system 100 comprising a video rendering device (an imaging device) 102, a sound rendering device (a sound reproduction device) 104, a convertor circuit 10, a microphone 108 and the lip sync unit 30 as shown in Fig. 2 (a). As shown in Fig. 3 (a), the convertor circuit 10 is configured to convert an optical signal to an acoustic signal and comprises a flash detector and a signalling device as shown in Fig. 1. The video rendering device 102 is connected, e.g., via HDMI, to the sound rendering device 104 and the lip sync unit 30. The microphone 108 is connected, e.g., via wire, Bluetooth or WLAN, to the lip sync unit 30; as the microphone, a device like a smartphone or tablet, i.e., any device able to record sounds (acoustic signals) and transmit the recorded sound, may be used. The convertor circuit 10 is positioned to be able to detect an abrupt brightness change of the video, i.e., the position of the convertor circuit 10 is adjusted so that it captures the optical signal (a flash) from the video rendering device 102. The optical signal from the video rendering device 102 is detected by the convertor circuit 10, and the detected optical signal is converted to a first acoustic signal (first beep tone). The first acoustic signal is then signalled to the microphone 108. At the same time, a second acoustic signal (second beep tone) from the sound rendering device 104 is signalled to the microphone 108. The first acoustic signal and the second acoustic signal have different frequencies, e.g., the frequency of the first acoustic signal is higher than the frequency of the second acoustic signal, or vice versa.
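As an illustration of the resulting two-tone recording, the following sketch simulates what the microphone 108 might capture: two short beeps at the 1 kHz and 3.2 kHz frequencies used as an example further below, separated by a hypothetical 60 ms lip sync error (sample rate, burst length and noise level are illustrative assumptions, not values from the patent):

    import numpy as np

    RATE = 48000  # assumed sampling rate

    def beep(freq_hz, start_s, dur_s=0.02, total_s=0.5):
        """A freq_hz tone burst starting at start_s within a total_s buffer."""
        t = np.arange(int(total_s * RATE)) / RATE
        window = (t >= start_s) & (t < start_s + dur_s)
        return np.sin(2 * np.pi * freq_hz * t) * window

    # First beep (convertor circuit) at 1 kHz, second beep (sound rendering
    # device) at 3.2 kHz, offset by a hypothetical 60 ms lip sync error:
    rng = np.random.default_rng(0)
    mix = beep(1000.0, 0.100) + beep(3200.0, 0.160)
    mix = mix + 0.01 * rng.standard_normal(mix.size)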
The microphone 108 records the signalled acoustic signal, which now contains two tones at different frequencies: the frequency of the first acoustic signal and the frequency of the second acoustic signal. The microphone 108 is positioned so that its distance (d) from the convertor circuit 10 and from the sound rendering device 104 is the same. That is, the distance (d) between the convertor circuit 10 and the microphone 108 and the distance (d) between the sound rendering device 104 and the microphone 108 are adjusted to be the same. The recorded signal is provided to the lip sync unit 30. The distance (d) is a sound distance, which need not equal the physical distance between the microphone and the convertor circuit or between the microphone and the sound rendering device. That is, when the soundwaves of both beeps, emitted at the same time, arrive at the microphone at the same time, the sound distances are the same, even though the physical distances may differ.
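To quantify how much a mismatch between the two sound distances matters, a one-line calculation suffices (assuming sound travels at roughly 343 m/s in air; the figure is a physical constant, not taken from the patent):

    SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C

    def timing_error_ms(distance_mismatch_m):
        """Extra delay picked up by the beep travelling the longer path."""
        return distance_mismatch_m / SPEED_OF_SOUND * 1000.0

    print(timing_error_ms(0.10))  # a 10 cm mismatch costs only about 0.29 ms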
The lip sync unit 30 receives the provided signal from the microphone 108 and stores the received signal in the memory 34 as shown in Fig. 2. Then, the lip sync unit 30 determines a time difference between the first and second acoustic signals based on the stored acoustic signal (stored beep tones). That is, it is now possible via signal processing to calculate the time difference between the first and second beep tones. This can be done via simple methods like band-pass filtering and correlation; for example, with an error < 0.3 ms for 1 kHz and 3.2 kHz beep tones. The lip sync unit 30 then issues a command to adjust a delay of the video rendering device 102. That is, the command is generated based on the determined time difference, and the generated command is issued to the video rendering device 102, so that the lip sync unit 30 is able to adjust to the lip sync point by delaying the video.
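The text names only band-pass filtering and correlation; one plausible realization is to band-pass each tone, compute its envelope and compare onset times. The sketch below follows that idea, reusing the simulated recording mix and RATE from the earlier sketch (the filter order, the ±10 % bandwidth and the half-peak onset criterion are assumptions, not taken from the patent):

    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def onset_s(x, freq_hz, rate=RATE):
        """Time at which the band-passed envelope first exceeds half its peak."""
        sos = butter(4, [0.9 * freq_hz, 1.1 * freq_hz], btype="bandpass",
                     fs=rate, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))
        return np.argmax(env > 0.5 * env.max()) / rate

    t_first = onset_s(mix, 1000.0)    # beep derived from the video flash
    t_second = onset_s(mix, 3200.0)   # beep from the sound rendering device
    delta_ms = (t_second - t_first) * 1000.0  # positive: audio beep arrived later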
Fig. 3 (b) shows another example of the lip sync management system 100’, comprising a video rendering device (an imaging device) 2 as shown in Fig. 1, a sound rendering device (a sound reproduction device) 104, a microphone 108 and the lip sync unit 30 as shown in Fig. 2 (a). The lip sync management system 100’ comprises the video rendering device 2 of Fig. 1, i.e., the convertor circuit 10 is included in the video rendering device 2; this is the only difference from the lip sync management system 100 shown in Fig. 3 (a). In the lip sync management system 100’, the video rendering device 2 could have, e.g., a special lip sync mode in which a flash results in a beep on the loudspeakers of the video rendering device 2. This mode needs to be implemented and calibrated, e.g., by triggering the beep when high-brightness values occur.
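How such a trigger might be calibrated is left open by the text; as one illustration, a minimal sketch of a detector that fires once per flash, using a brightness threshold with simple hysteresis (the normalized light-level input and the threshold value are assumptions, not part of the patent):

    FLASH_THRESHOLD = 0.8  # assumed normalized brightness level counting as a flash

    def flash_indices(levels, threshold=FLASH_THRESHOLD):
        """Yield sample indices at which the brightness first rises above
        the threshold; the detector re-arms once the flash has decayed."""
        armed = True
        for i, level in enumerate(levels):
            if armed and level > threshold:
                yield i          # the signalling device would emit a beep here
                armed = False
            elif level < threshold / 2:
                armed = True

On each yielded index, the signalling device would emit a short tone burst such as the beeps simulated above.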
Fig. 4 shows a further example of a lip sync management system 120. The lip sync management system 120 comprises a TV 122 as a video rendering device, a soundbar (audio sink) 124 as a sound rendering device, a Set-Top-Box (STB, source device) 30’ as a lip sync unit, and a remote controller 126 of the STB 30’. The TV 122 is connected to the soundbar 124 and the STB 30’ via HDMI, and the remote controller 126 is positioned to be able to detect an optical signal (a flash) of the TV 122. That is, the convertor circuit that converts an optical signal to an acoustic signal is incorporated into the remote controller 126. In addition, a microphone is incorporated into the STB 30’; therefore, the distance (sound distance) (d) between the remote controller and the STB 30’ and the distance (d) between the soundbar 124 and the STB 30’ are the same, as also shown in Fig. 3.
To summarize, the convertor circuit can be an external device, incorporated into the video rendering device (TV) or incorporated into the remote controller of the lip sync unit (STB).
Fig. 5 shows possible device combinations of a lip sync management system: for each row, one box from the left column is combined with one box from the right column. In Fig. 5, RC indicates a remote controller of the Set-Top-Box, STB indicates a Set-Top-Box, and DataC indicates a data connection to the Set-Top-Box. That is, for example, in case an extra device is used to convert light (an optical signal) to sound (an acoustic signal), the STB, the STB with the remote controller, or the remote controller (with a data connection to the STB) is used as the device for recording audio (microphone) and as the device running the video latency adjustment algorithm (lip sync unit).
As an alternative, a slightly different approach could be considered: a light sensor and a microphone are installed in the remote controller and the signal processing is done there, i.e., the remote controller works as a lip sync unit having a microphone (but without a convertor circuit). The result, in the form of data, is then transmitted to the source device via an appropriate technology such as Bluetooth. Such a remote control would be a hybrid of known technology and a modern smart remote control, which already has a microphone and a Bluetooth connection installed.
Fig. 6 shows a flowchart indicating an example of a method for synchronizing audio signals and video signals according to the present application.
As depicted in Fig. 6, an abrupt brightness change of the video is detected (S10). For example, the abrupt brightness change, i.e., a flash or an optical signal from the video rendering device, is detected by a flash detector of a convertor circuit in the video rendering device, in the remote controller or in the lip sync unit.
Then, a first acoustic signal is provided in response to the detection of the brightness change (S12). That is, a signalling device of the convertor circuit provides the first acoustic signal in response to the detected brightness change, i.e., the optical signal is converted into the first acoustic signal. The first acoustic signal (first beep tone) from the signalling device (the convertor circuit) and a second acoustic signal (second beep tone) from the sound rendering device are emitted to the microphone at the same time. It should be noted that the frequency of the first acoustic signal is different from the frequency of the second acoustic signal, e.g., the frequency of the first acoustic signal is higher than that of the second acoustic signal, or vice versa.
The sound including the first and second acoustic signals is captured by the microphone (S14). The distance between the convertor circuit and the microphone and the distance between the sound rendering device and the microphone are the same; therefore, the microphone captures a sound including both the first and second acoustic signals. The captured sound is recorded by the microphone and transmitted to the lip sync unit. A time difference between the first and second acoustic signals is determined (S16). The sound transmitted from the microphone is stored in the memory of the lip sync unit. The lip sync unit then determines the time difference between the first and second acoustic signals.
A command for a delay adjustment is issued (S18). The lip sync unit issues the command indicating a lip sync point, e.g., the delay period of the video content. The lip sync unit is a so-called source device which stores the video content to be displayed at the video rendering device. Hence, the video content is transmitted to the video rendering device based on the issued command, i.e., the transmission timing is controlled by the command. For example, the video content is transmitted with a 30 ms delay to synchronize with the sound of the sound rendering device.
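Putting steps S16 and S18 together, a measured offset could be turned into a delay-adjust command along the following lines; the command format here is purely hypothetical, since the patent does not specify one:

    def delay_command(delta_ms):
        """Map a measured offset onto a hypothetical delay-adjust command.

        delta_ms > 0 means the audio beep arrived late, so the video should
        be delayed; delta_ms < 0 means the audio side should be delayed.
        """
        if delta_ms >= 0:
            return {"target": "video_rendering_device",
                    "extra_delay_ms": round(delta_ms)}
        return {"target": "sound_rendering_device",
                "extra_delay_ms": round(-delta_ms)}

    # A measured 30 ms audio lag yields a command to delay the video by
    # 30 ms, matching the example above:
    print(delay_command(30.0))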
In accordance with the embodiments of the present application, the first acoustic signal and the second acoustic signal are emitted to the microphone at the same time, and the microphone captures a sound including the first and second acoustic signals. That is, compared with the technical standard, the proposed method transforms the old 'two measurements' approach (separate audio and video measurements, i.e., an acoustic signal from the sound rendering device and an optical signal from the video rendering device) into a new 'one conversion and one measurement' approach; the new approach saves complexity and avoids problems like the synchronization of the measurements.
In accordance with the embodiments of the present application, the optical signal is converted to the acoustic signal and, therefore, a human can adjust the audio/video synchronization via hearing alone, which is more precise than the combination of hearing (acoustic signal) and seeing (optical signal). For example, an operator can adjust the audio/video synchronization by hearing, since the optical signal from the video is converted into the acoustic signal. In accordance with the embodiments of the present application, the lip sync unit, e.g., the source device, does not need any optical sensor to capture the optical signal from the video rendering device. If an optical sensor were required, it would need a line of sight or an extension, since light, in contrast to sound, does not travel around corners. Also, sound is much slower than light, which leads to a measurement offset. The requirements for the microphone and the sampling are very low, since it is only necessary to distinguish between two frequencies and their states: on or off. Hence, it is possible to improve the accuracy of the lip sync with a simplified feature configuration of the lip sync unit.
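Distinguishing two known frequencies and their on/off states is exactly what the Goertzel algorithm is suited for, which illustrates why the sampling requirements are so low. A sketch of such a detector follows (block length, sample rate and power threshold are assumed calibration values, not taken from the patent):

    import numpy as np

    def goertzel_power(block, freq_hz, rate):
        """Signal power at freq_hz in one block (Goertzel algorithm)."""
        n = len(block)
        k = round(n * freq_hz / rate)  # nearest DFT bin
        coeff = 2.0 * np.cos(2.0 * np.pi * k / n)
        s_prev = s_prev2 = 0.0
        for x in block:
            s = x + coeff * s_prev - s_prev2
            s_prev2, s_prev = s_prev, s
        return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

    def tone_is_on(block, freq_hz, rate=48000, threshold=1.0):
        """True while the given tone is 'on' in the current block."""
        return goertzel_power(np.asarray(block, dtype=float), freq_hz, rate) > threshold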
In accordance with the embodiments of the present application, the video rendering device comprises a flash detector and a signalling device, i.e., a convertor circuit. Therefore, no extra device for synchronizing audio/video is required.
In accordance with the embodiments of the present application, the convertor circuit is incorporated into the remote controller of the lip sync unit. Therefore, it is possible to reduce extra costs to a minimum, since no extra device is needed, and buttons to control the process and a power source already exist in the remote controller, which reduces the cost of the convertor circuit. Furthermore, the remote controller can easily be repositioned to adjust its distance to the source device. Thereby it is easily possible to ensure that the loudspeakers and the remote controller have the same distance (d) to the microphone, or to the microphone in the lip sync device, so that the sound waves of both beeps arrive at the same time. For example, a distance mismatch of 1 m would already introduce an offset of roughly 3 ms at a speed of sound of about 343 m/s.
In accordance with the embodiments of the present application, it is possible to use a smartphone as the microphone for recording the beeps. This can be implemented by a smartphone app which automates the process of measuring, calculating and reporting the result to the source.
In accordance with the embodiments of the present application, the lip sync unit comprises a microphone, i.e., the recording of the beeps is done at the lip sync unit. In this case, there are fewer points where something could go wrong, i.e., the possibility of errors is reduced. In particular, manual entry of numbers by a human is not necessary.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus. The inventive data stream can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the application can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present application can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine readable carrier.
Other embodiments comprise a computer program for performing one of the methods described herein, stored on a machine readable carrier.

In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.

A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.
The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
The apparatus described herein, or any components of the apparatus described herein, may be implemented at least partially in hardware and/or in software.

Claims

1. A video rendering device for displaying a video comprising:
a flash detector which detects an abrupt brightness change of the video, and
a signalling device which provides an acoustic signal in response to the detection of the brightness change.
2. A lip sync unit which determines a time difference between different acoustic signals, one of which is a first acoustic signal provided in response to an abrupt brightness change, and another of which is an audio signal as a second acoustic signal, and issues a command for a delay adjustment of a delay of an imaging device and/or of a sound reproduction device.
3. The lip sync unit according to claim 2, wherein the command is issued to adjust a delay difference between an audio content and a video content.
4. The lip sync unit according to claim 2 or 3 comprises a microphone configured to capture the first and the second acoustic signal.
5. A lip sync management system for synchronizing audio signals on a sound rendering device and video signals on a video rendering device comprising:
a convertor circuit for converting an optical signal to an acoustical signal, wherein the convertor circuit comprises a flash detector which detects an abrupt brightness change of the video from the video rendering device, and a signalling device which provides a first acoustic signal in response to the detection of the brightness change,
a microphone configured to capture the first acoustic signal and a second acoustic signal from the sound rendering device, and
the lip sync unit according to claim 2 or 3, wherein the captured first and second acoustic signals are transmitted from the microphone to the lip sync unit.
6. The system according to claim 5, wherein the converter circuit is configured to convert the abrupt brightness change of the video to the first acoustic signal at a different frequency from the second acoustic signal.
7. The system according to claim 5 or 6, wherein the distance between the converter circuit and the microphone, and the distance between the sound rendering device and the microphone are the same distance.
8. A method for synchronizing audio signals on a sound rendering device and video signals on a video rendering device comprising:
detecting an abrupt brightness change of the video,
providing a first acoustic signal in response to the detection of the brightness change,
capturing the first acoustic signal and a second acoustic signal which is an audio signal provided from the sound rendering device,
determining a time difference between the captured first and second acoustic signals, and
issuing a command for a delay adjustment of a delay of the video rendering device and/or of the sound rendering device.
9. Computer program having a program code for performing, when running on a computer, a method according to claim 8.
10. A circuit used for a video rendering device for displaying a video comprising:
a flash detector which detects an abrupt brightness change of the video, and
a signalling device which provides an acoustic signal in response to the detection of the brightness change.
PCT/EP2020/070173 2019-07-17 2020-07-16 Lip sync management device WO2021009298A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP19186830 2019-07-17
EP19186830.6 2019-07-17

Publications (1)

Publication Number Publication Date
WO2021009298A1 true WO2021009298A1 (en) 2021-01-21

Family

ID=67437868

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/070173 WO2021009298A1 (en) 2019-07-17 2020-07-16 Lip sync management device

Country Status (1)

Country Link
WO (1) WO2021009298A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060127053A1 (en) * 2004-12-15 2006-06-15 Hee-Soo Lee Method and apparatus to automatically adjust audio and video synchronization
US20110013085A1 (en) * 2008-03-19 2011-01-20 Telefonaktiebolaget Lm Ericsson (Publ) Method and Apparatus for Measuring Audio-Video Time skew and End-to-End Delay

Similar Documents

Publication Publication Date Title
EP2599296B1 (en) Methods and apparatus for automatic synchronization of audio and video signals
KR102140612B1 (en) A/v receiving apparatus and method for delaying output of audio signal and a/v signal processing system
CN101204081B (en) Automatic audio and video synchronization
US8997169B2 (en) System, method, and infrastructure for synchronized streaming of content
US7970222B2 (en) Determining a delay
US20140376873A1 (en) Video-audio processing device and video-audio processing method
US9742964B2 (en) Audio/visual device and control method thereof
WO2016105322A1 (en) Simultaneously viewing multiple camera angles
US20120182387A1 (en) 3d motion picture processing device
KR20080006486A (en) Information-processing device, information-processing method and computer program
KR20120105497A (en) Optimizing content calibration for home theaters
US9838584B2 (en) Audio/video synchronization using a device with camera and microphone
KR101311463B1 (en) remote video transmission system
JP2012191583A (en) Signal output device
JP2018207152A (en) Synchronization controller and synchronization control method
WO2021009298A1 (en) Lip sync management device
JP2006250638A (en) Video camera provided with clock synchronization function
JP4669854B2 (en) Video processor and video delay measuring method
JP2013143706A (en) Video audio processing device and program therefor
KR100677162B1 AV system for adjusting audio/video lip synchronization
JP2008066931A (en) Video audio reproduction system, and video audio synchronization reproducer composing it
JP5051159B2 (en) Electronic device, display control method, and computer program
JP2013081134A (en) Communication device
JP2013207307A (en) Sound signal processor
JP2012175339A (en) Three-dimensional video signal processing apparatus and processing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 20739416
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 20739416
Country of ref document: EP
Kind code of ref document: A1