EP2356817B1 - Device and method for synchronizing received audio data with video data - Google Patents

Device and method for synchronizing received audio data with video data

Info

Publication number
EP2356817B1
Authority
EP
European Patent Office
Prior art keywords
audio data
segment
audio
video data
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
EP08878786.6A
Other languages
German (de)
French (fr)
Other versions
EP2356817A1 (en)
EP2356817A4 (en)
Inventor
Håkan OLOFSSON
Fredrik Gustav ÅBERG
Arto Juhani Mahkonen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telefonaktiebolaget LM Ericsson AB
Original Assignee
Telefonaktiebolaget LM Ericsson AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget LM Ericsson AB
Publication of EP2356817A1
Publication of EP2356817A4
Application granted
Publication of EP2356817B1
Legal status: Active (current)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147 Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/236 Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N21/2368 Multiplexing of audio and video streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/414 Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
    • H04N21/41407 Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434 Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4341 Demultiplexing of audio and video streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting

Definitions

  • the present invention generally relates to methods, devices and systems capable of processing audio and video data and, more particularly, to techniques and methods for compensating for a time delay associated with reproducing video data transmitted with audio data.
  • communication using multiple media, which includes both video data and associated audio data, is becoming increasingly important in the communications industry, both for fixed and mobile access.
  • traditional speech telephony is more and more often being upgraded to include a video component (i.e., video data), resulting in the opportunity for users to communicate using so-called "video telephony."
  • the video data associated with a video telephony call is typically created by a video camera in the sending device.
  • the sending device may be a portable device, such as a mobile phone.
  • the user orients the sending device so that the camera is positioned to show the speaker's face.
  • the camera may be used to show other things, which the user finds relevant for the conversation, for example a view that the user wants to share with the person that she or he is talking to.
  • the video data and the audio data are usually generated having a logical connection, e.g., a speech of a user is associated with a video of the face of the user that corresponds to the user generating the speech.
  • the audio and video data are synchronized so that the user experiences a good coordination between the sound and the video.
  • the lip movements of the user shall normally be in sync with the sound from the device's speakerphone to achieve the good coordination. This provides a connection between the lip movements and the heard words, as it would be in a normal discussion between two people at short distance. This is referred to herein as lip-sync or logically related audio and video data.
  • RTP Real-time Transport Protocol
  • RTCP RTP Control Protocol
  • some existing multimedia communication services do not provide any media synchronization, resulting in a poor user experience when lip-synchronization is needed.
  • the systems that are synchronizing the audio with the video typically delay the audio data by a certain amount of time until the video data is decoded, and then both data are played simultaneously to achieve the desired lip-synchronization.
  • this synchronizing method is unpleasant for users due to the increased delay causing long response times and problems for the conversation.
  • the video data typically has a longer delay from the camera to the screen than the speech has from the microphone to the speakerphone.
  • the longer delay for video data is caused by longer algorithmic delay for encoding and decoding, often a slower frame rate (compared to audio data), and in some cases also by longer transfer delay due to the higher bit rate. Assuming that the receiving device synchronizes audio and video, the device has to delay the audio data flow before playing it out.
  • a method for synchronizing video data with audio data received by a communication device, the video data including a plurality of segments of video data and the audio data including a plurality of segments of audio data.
  • the method includes receiving a first segment of audio data at the communication device; receiving a first segment of video data at the communication device, at the same time or later in time than the first segment of audio data, the first segment of video data being logically related to the first segment of audio data; and applying a synchronization mechanism between the first segment of audio data and the first segment of video data based on a predetermined indicator.
  • a communication device for synchronizing received video data with received audio data, the video data including a plurality of segments of video data and the audio data including a plurality of segments of audio data.
  • the communication device includes an input/output unit configured to receive a first segment of audio data and to receive a first segment of video data, at the same time or later in time than the first segment of audio data, the first segment of video data being logically related to the first segment of audio data; and a processor configured to apply a synchronization mechanism between the first segment of audio data and the first segment of video data based on a predetermined indicator.
  • a general system 10 includes first 12 and second 14 communication devices connected via a communication network 16 to each other.
  • the devices 12 and 14 may be a desktop, a laptop, a mobile phone, a traditional phone, a personal digital assistant, a digital camera, a video camera, etc.
  • the two devices may be connected to each other via a wireline or a wireless interface.
  • the two devices may be directly connected to each other or via one or more base stations (not shown) that are part of the communication network.
  • base station is used herein as a generic term for any device that facilitates an exchange of data between connecting devices, as for example, a modem, a station in a telecommunication system, a whole network etc.
  • a structure of the device 12 or 14 includes an input/output port 18 that is configured to receive/transmit an audio-video signal AVS.
  • the audio-video signal AVS may include audio data and video data.
  • Each of the audio or video data may include a plurality of segments.
  • a segment may include a number of frames that correspond to a certain time. However, this definition of the segment may be further qualified depending on the specific environment. Examples are provided later for specific embodiments.
  • the plurality of segments of audio and/or video data may include a first segment, a last segment and also may include other segments between the first and last segments.
  • a segment of audio data may correspond to a segment of video data, e.g., a user that records an audio message while video recording his face.
  • the input/output port 18 may be connected via a bus 20 to an antenna 22 or to a wireline (not shown) to receive the audio-video signal AVS.
  • the antenna 22 may be a single antenna or a multiple antenna and may be configured to receive the audio-video signal AVS via an infrared, radio frequency or other known wireless interfaces.
  • the input/output port 18 is also connected to a processor 24 that receives the audio-video signal AVS for processing.
  • the processor 24 may be connected via the bus 20 to a memory 26.
  • the memory 26 may store the audio-video signal AVS and other data necessary for the processor 24.
  • the device 12 may have a display 28 configured to display an image corresponding to the received audio-video signal AVS.
  • the display 28 may be a screen and may also include one or more LEDs or any other known light-emitting device.
  • the display 28 may be a combination of the screen and the LED.
  • the device 12 may have in another exemplary embodiment an input/output interface 30, e.g., a keyboard, a mouse, a microphone, a video camera, etc., which is capable of inputting commands and/or data from a user.
  • the device 12 may have a processing unit 32, connected to the bus 20, which is capable of measuring various indicators of the received audio-video signal AVS, or capable of analyzing the video data of the AVS to extract a face of a user, or capable of reproducing the audio data of the AVS at a different speed (higher or lower than the recording speed).
  • the device 12 may have a sound unit 34 configured to produce a sound based on audio data received by the device. Also, the sound unit 34 may emit a sound as instructed by the processor 24 or may record a sound. In one exemplary embodiment, the sound unit may include a speakerphone and a microphone.
  • the device 14 shown in Figure 1 may have the same structure as the device 12 shown in Figure 2.
  • the device 12 (see Figure 1) is considered to be the sender and the device 14 (also see Figure 1) is considered to be the receiver. However, both devices 12 and 14 may act as a sender and/or as a receiver.
  • when a user 1 of the device 12 transmits video and audio data to a user 2 of the device 14, the actions shown in Figure 3 take place initially in the device 12 and then in the device 14. More specifically, the device 12 receives audio data S1 from user 1 or another source and also video data V1 from the user 1 or another source at a time t1. Both the audio data S1 and the video data V1 are encoded by device 12 and then sent via the input/output unit 18 or the antenna 22 to the user 2.
  • the encoded audio data S2 is sent at a time t2, later than t1 but earlier than a time t3 when the encoded video data V2 is sent.
  • Figure 3 shows that the video data is already delayed by t3-t2 relative to the audio data. This delay in sending the encoded video data V2 is due to the longer encoding process required by the video data.
  • the encoded audio data S2 is received at a time t4 and the encoded video data V2 is received at a later time t6 by the device 14 of the user 2.
  • because of the delay of the encoded video data V2, it may happen that the receiving device 14 starts to decode the encoded audio data S2 at a time t5, later than the time t4 but prior to the time t6, when the encoded video data V2 is received by the device 14. However, in one exemplary embodiment, the time t5 may be later than time t6. The device 14 also decodes the encoded video data V2 at a time t7, later than the time t6.
  • the earliest the device 14 can play both the decoded video data V3 and the decoded audio data S3 in a synchronized manner is at time t8.
  • the device 14 delays the audio data from time t5 to time t8 and starts to play both the decoded audio data S3 and the decoded video data V3 at time t8. This delay between t5 and t8 creates the problems discussed in the Background section in the conventional devices.
  • Figure 3 also shows timings and encoded/decoded data when the user 2 replies to the user 1 and how a reaction time T1 of the user 2 is experienced by user 1 as an experienced reaction time T.
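  • The conventional receiver timing described above can be illustrated with a minimal Python sketch. The field names `audio_ready` and `video_ready` are assumptions introduced here; they stand for the times at which the decoded audio and the decoded video become available at the receiving device. The sketch only shows why the audio has to be held back until the video is ready.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ReceiverTimes:
    """Receiver-side times, in seconds, for one logically related audio/video pair."""
    audio_ready: float  # hypothetical name: decoded audio could start playing here
    video_ready: float  # hypothetical name: decoded video could start playing here


def conventional_playout(times: ReceiverTimes) -> Tuple[float, float]:
    """Return (playout_time, audio_hold) for delay-based lip synchronization.

    playout_time is the earliest moment both streams can start together;
    audio_hold is how long the audio must wait for the video.
    """
    playout_time = max(times.audio_ready, times.video_ready)
    audio_hold = playout_time - times.audio_ready
    return playout_time, audio_hold
```

With audio ready at 0.25 s and video ready at 0.75 s, the sketch returns a common playout time of 0.75 s and an audio hold of 0.5 s, which corresponds to the added speech delay that the description identifies as the problem.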
  • the receiving device of the receiving user may inform the receiving user that the sending user has stopped talking. By having this information, the receiving user may avoid starting to talk while his device is still processing the received data.
  • the receiving device may provide an indication to the user 2 that speech of the user 1 will stop shortly, so that the user 2 can start to talk sooner than otherwise, thus reducing the reaction time T1.
  • the indication may be a visual signal (e.g., a turned on LED or a symbol on a screen of the device) that lasts as long as the speech is active.
  • the signal may be other visual or audible signs.
  • a flow of video and audio data may include a speech pre-notification "a" to user 2. More specifically, user 2 may receive (generates) the pre-notification "a" that user 1 has stopped sending audio data. The pre-notification "a" may be generated at t4 or shortly thereafter and the receiving user becomes aware of the incoming audio data at t4 and not at t7 when the audio data is played out. This pre-notification reduces the reaction time T1 of user 2.
  • the gain i.e., the reduction in the time delay of the audio data
  • in Figure 4, the timing and the symbols used are similar to those used in Figure 3 and their explanation is not repeated herein.
  • the receiving device may generate a pre-notification informing the user 2 that audio data from user 1 is not detected.
  • This pre-notification may be generated and displayed at t8, which is earlier than a time t9 when conventionally user 2 determines that no speech is coming from user 1.
  • the time difference t9 - t8 may be another gain of user 2.
  • an end of a last segment of audio data is determined and the pre-notification is generated based on the end of the last segment.
  • User 1 determines from the speech pre-notification "b" to avoid starting to talk again until the information from user 2 is presented synchronized with other media.
  • the speech pre-notification "b" may be implemented similar to pre-notification "a." By using the speech pre-notification, the risk of both parties talking at the same time is thus substantially reduced and a reaction time of each party is also reduced.
  • both the pre-notification "a" and "b" may be implemented in each of the communication devices 12 or 14.
  • the user is alerted by his/her own device that audio data from another user has started and is also alerted when that audio data has stopped prior to the audio data being played out.
  • the total gain (i.e., reduction in the time delay of the audio data) when both the pre-notifications "a" and "b" are used in this exemplary embodiment is an actual shortened round-trip delay due to user 2 being notified of the talk burst end, thus shortening his reaction time, combined with an additional reduced risk of cross talking due to user 1 being pre-notified of audio data coming from user 2.
  • the total gain is shown as "B" in Figure 4.
  • a device configured to generate speech pre-notification reduces the risk of cross talking (users 1 and 2 talking at the same time), and/or achieves a faster response of the user (because the user can better decide when it is his or her turn to speak).
  • Another advantage of one or all of the exemplary embodiments discussed above is the simplicity of implementation, because the device uses already available information in the terminal, i.e., no terminal-external signaling is required.
  • Figure 5 shows a method for synchronizing video data with audio data received by a communication device.
  • the video data includes a plurality of segments of video data and the audio data includes a plurality of segments of audio data.
  • the method includes a step 50 of receiving a first segment of audio data at the communication device, a step 52 of receiving a first segment of video data at the communication device, at the same time or later in time than the first segment of audio data, the first segment of video data being logically related to the first segment of audio data, a step 54 of generating a pre-notification at the communication device related to the first segment of audio data, and a step 56 of processing the pre-notification to generate visual or audio information indicative of the pre-notification.
  • step 54 may include a step 54-1 of generating a pre-notification at the communication device related to a beginning of the first segment of audio data, and a step 54-2 of displaying visual information or reproducing audio information indicative of the pre-notification prior to playing the beginning of the first segment of audio data.
  • step 54 may include a step 54-3 of generating a pre-notification at the communication device related to an end of the first segment of audio data and a step 54-4 of displaying visual information or reproducing audio information indicative of the pre-notification prior to playing the end of the first segment of audio data.
  • step 54 may include all of steps 54-1 to 54-4.
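  • The pre-notification steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `AudioSegment` fields, the `is_silence` flag and the `PreNotifier` class are hypothetical names introduced here. The sketch emits a "start" event when the first speech segment of a talk spurt arrives and a "stop" event when silence follows, each together with the lead time before that segment would be played out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Event = Tuple[str, float, float]  # (kind, arrival_time, lead before playout)


@dataclass
class AudioSegment:
    arrival_time: float       # when the segment reached the device
    playout_time: float       # when the segment is scheduled to be played
    is_silence: bool = False  # hypothetical flag marking the end of a talk spurt


@dataclass
class PreNotifier:
    talking: bool = False
    events: List[Event] = field(default_factory=list)

    def on_segment(self, seg: AudioSegment) -> Optional[Event]:
        """Return a pre-notification event when the received speech state changes."""
        kind = None
        if not seg.is_silence and not self.talking:
            self.talking, kind = True, "start"
        elif seg.is_silence and self.talking:
            self.talking, kind = False, "stop"
        if kind is None:
            return None
        event = (kind, seg.arrival_time, seg.playout_time - seg.arrival_time)
        self.events.append(event)  # a real device would light an LED or show a symbol
        return event
```

Because the event is produced at arrival time rather than at playout time, the user can be informed ahead of the audio itself, which is the gain the description attributes to pre-notification.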
  • a communication between devices 12 and 14 may be set up using state-of-the-art session setup protocols.
  • the exemplary embodiments referring to the technique including face analysis are discussed based on an RTP/User Datagram Protocol (UDP)/Internet Protocol (IP) communication system, using RTCP as the enabling synchronization protocol.
  • UDP User Datagram Protocol
  • IP Internet Protocol
  • the exemplary embodiments may be applied to other systems and protocols.
  • the receiving device is configured to apply the synchronization function if needed.
  • the synchronization function may include the time delay of the audio data relative to the video data, novel techniques that are discussed in this disclosure, or a combination thereof.
  • the synchronization function may be implemented in the processor 24 or in the processing unit 32 shown in Figure 2.
  • At least one of the communication devices 12 and 14 includes the synchronization function according to this exemplary embodiment. In another exemplary embodiment, both communication devices 12 and 14 have the synchronization function.
  • the communication device may be configured to initially have the synchronization function switched on or off. Throughout the communication, the sending device transmits audio and video data along with standard protocol tools to enable synchronization.
  • the synchronization function may be one discussed in the exemplary embodiments or one known by those skilled in the art.
  • the audio data may be synchronized with the video data based on a novel approach that is described next. No pre-notification, face detection or user input is necessary for the following exemplary embodiments.
  • a synchronization process at the start of the speech was described with reference to Figure 3. That synchronization process is undesirable because of the large time delay of the audio data.
  • the time delay of the audio data is reduced such that the delay does not become annoying for the users of the communication system.
  • the reduction in the time delay may be achieved, according to an exemplary embodiment, by time scaling the audio data.
  • one or more segments of the audio data are played during a first part of the speech at a different speed than during a second part of the speech, which is later than the first part.
  • the first part of the speech may include a first segment and one or more subsequent segments.
  • the first segment, which was defined earlier in a generic manner, may be further defined, for this exemplary embodiment, as lasting from a time indicative of a beginning of a talk spurt, when less delayed audio data may be played out with a slower speed than normal, ahead of more delayed video data, until a time when the audio data catches up with the video data, i.e., the audio and video are synchronized.
  • a last segment of the first part of the speech is related to an end of a talk spurt and may last between the play out time of the talk spurt and a current time, when a beginning of a silence period is detected.
  • each part of the speech may correspond to a talk spurt.
  • the audio data may start with a reduced time delay and then, during the first seconds of the speech, more delay is added by time scaling segments of the audio data (audio in "slow-motion") in order to achieve the synchronization of the audio data with the video data.
  • Appendix I for ITU-T's Recommendation G.711, the entire content of which is incorporated here by reference, refers to Waveform Shift Overlap Add (WSOLA), which is such a method.
  • WSOLA Waveform Shift Overlap Add
  • At least a first segment of the audio data may be "dilated" by playing the first segment of the audio data at a slower speed than normal.
  • more segments may be played at a lower speed to achieve the synchronization between the audio data and the video data.
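  • To make the idea of time scaling concrete, the following Python sketch stretches or compresses a block of audio samples with a plain overlap-add scheme. It is deliberately simpler than WSOLA: it does not search for the best-matching waveform position, so it can introduce audible artifacts on real speech. The function name, frame length and hop size are assumptions chosen for illustration.

```python
import math
from typing import List


def ola_time_stretch(samples: List[float], stretch: float,
                     frame_len: int = 256, synth_hop: int = 128) -> List[float]:
    """Plain overlap-add time scaling (no waveform-similarity search).

    stretch > 1 lengthens the audio (slower playback); stretch < 1 shortens it.
    """
    if stretch <= 0:
        raise ValueError("stretch must be positive")
    if len(samples) < frame_len:
        return list(samples)  # too short to process; returned unchanged
    # Frames are read every analysis_hop samples and written every synth_hop samples.
    analysis_hop = max(1, round(synth_hop / stretch))
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]  # Hann window
    n_frames = 1 + (len(samples) - frame_len) // analysis_hop
    out_len = frame_len + (n_frames - 1) * synth_hop
    out = [0.0] * out_len
    norm = [0.0] * out_len
    for k in range(n_frames):
        src = k * analysis_hop
        dst = k * synth_hop
        for n in range(frame_len):
            out[dst + n] += window[n] * samples[src + n]
            norm[dst + n] += window[n]
    # Divide by the accumulated window weight so the output level is preserved.
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(out, norm)]
```

Calling the function with a stretch of 1.25 on the first segments of a talk spurt yields more output samples than input samples, i.e., "slow-motion" audio; a stretch below 1 yields the "fast-motion" case.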
  • the reaction delay of the other user may be reduced by using again the time scaling of the received audio data (speed up at least a last segment of the audio data, i.e., "fast-motion" of the audio data).
  • although the audio data is no longer in sync with the video data, the user is able to reduce his or her reaction time and answer the other user with a shorter delay.
  • the scaling at the end of the speech, which is discussed in more detail later, may be implemented in a device without implementing the scaling at the beginning of the speech. However, in one exemplary embodiment, both scaling methods are implemented for at least one of the users.
  • Figure 6 shows the user 1 sending audio and video data to user 2 and user 2 also sending audio and video data to user 1 in response to the received audio and video data.
  • the input of audio data A1 and video data V1, the encoding and receiving of audio data A2 and video data V2, and the decoding of audio data A3 and video data V3 have been discussed with reference to Figure 3 .
  • Playing the decoded audio data A3 and the decoded video data V3 differs from what is shown in Figure 3 and is novel. This part is discussed next in detail with regard to Figure 6.
  • the decoded audio data A3 is played (reproduced) after this audio data is decoded.
  • the starting time t_Astart of the decoded audio data A3 is earlier than the starting time t_Vstart of the decoded video data V3.
  • the time delay of the speech is reduced compared with traditional delay methods.
  • at least a first portion "A" of the audio data is played at a slower speed than a normal speed (a predetermined speed) until the decoded video data V3 becomes available.
  • the audio data may still be played at a slower speed in order for the video data to "catch up" with the audio data.
  • the audio data may be played at the slower speed for a period "a" after the video data has started to be reproduced at time t_Vstart, to achieve the synchronization between the audio and video data.
  • the time period "a" has a predetermined value, for example 2 seconds.
  • the audio and video data may be synchronized after a certain time t_catch-up (e.g., 1 s), and "a" is defined as t_catch-up minus "A".
  • the audio speed is increased to the normal speed so that at time t_s, the synchronization between the audio and video data is achieved.
  • the audio data speed is slowly (continuously and/or monotonically) increased during the time period "a" to the normal speed.
  • the audio data speed is suddenly (step-like manner) increased from the low speed to the normal speed.
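  • The relation between the slow playback speed and the time needed to reach synchronization can be worked out with a short Python sketch. It assumes, for illustration only, that the video plays at normal speed from its first frame, starting `lead` seconds after the audio starts, and that `slow_rate` is the audio speed as a fraction of normal speed. After a wall-clock time T the audio has advanced slow_rate × T and the video T − lead, so the two meet when T = lead / (1 − slow_rate). The function names are hypothetical.

```python
def step_catch_up(lead: float, slow_rate: float) -> float:
    """Time from audio start until sync when the audio runs at a constant slow
    rate and then steps to normal speed."""
    if not 0.0 < slow_rate < 1.0:
        raise ValueError("slow_rate must be between 0 and 1")
    return lead / (1.0 - slow_rate)


def rate_for_catch_up(lead: float, catch_up: float) -> float:
    """Constant slow rate needed to reach sync exactly `catch_up` seconds
    after the audio starts."""
    if catch_up <= lead:
        raise ValueError("catch_up must exceed the audio lead")
    return 1.0 - lead / catch_up


def ramp_period(lead: float, slow_rate: float) -> float:
    """Length of the period after video start when the audio speed rises
    linearly from slow_rate to normal speed and sync is reached at its end."""
    if not 0.0 < slow_rate < 1.0:
        raise ValueError("slow_rate must be between 0 and 1")
    return 2.0 * slow_rate * lead / (1.0 - slow_rate)
```

For an audio lead of 0.25 s and a slow rate of 0.75, the step-like variant reaches synchronization 1.0 s after the audio starts (0.75 s after the video starts), while the gradual variant needs 1.5 s after the video starts, since the audio spends less time at the lowest speed.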
  • Speeding up at the end of a talk spurt requires some method to detect in advance when a silence period is going to start. One way could be to peek at the packet(s) at the end of the speech buffer as soon as possible in order to enable speeding up.
  • the silence is visible with certain audio codecs (e.g., AMR) from the different size and rate of the frames during silence.
  • the reaction delay of the other user may be reduced by applying the time scaling to at least a last segment of the audio data (speed up the audio data, i.e., "fast-motion" of the audio data).
  • the reproduction speed of the audio data is increased above the normal speed in order to have the audio data A3 presented to the user earlier than the decoded video data V3.
  • the audio data ends, in one exemplary embodiment, a time interval B earlier than the decoded video data V3.
  • although the audio is no longer in sync with the video, the user is able to reduce his or her reaction time and answer with a shorter delay T1 (shorter by B) to the other user.
  • the scaling at the end of the speech B or D may be implemented in a device without implementing the scaling at the beginning of the speech (A or C).
  • User 1 may similarly start the audio sooner than without time scaling, both because user 2 started sending information earlier, and because the audio data in the communication device is started earlier. Again, the synchronization of the audio data with the video data is achieved after some time (after part C is played), since the audio data is played in the beginning with a slower speed.
  • This approach prevents user 1 from starting to send information, e.g., to start talking while information is being received from user 2. This approach also decreases the level of disturbance, since the experienced reaction time is shorter than in conventional processes.
  • the end of the speech burst can be determined from the already received speech frames, e.g., when silence is detected. Based on this detection, the end of the audio data may be played at a higher speed so that the video data is played for time interval D without the audio data (as the audio data has already been played), to allow the other user to reduce her or his response time.
  • the audio is synchronized with the video for most of the time (except, for example, for periods A, B, C, and D), with a minimum impact on the conversation quality because of the reduced delay of the speech.
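  • The end-of-talk-spurt speed-up can be sketched in Python as follows. The sketch peeks at the sizes of the frames waiting in the speech buffer, treats a small frame as a silence indicator, and computes the playback rate that finishes the remaining speech a chosen interval earlier. The 20 ms frame duration matches AMR; the byte threshold, the rate cap and the function names are assumptions for illustration and would have to be adapted to the codec mode actually in use.

```python
from typing import List, Optional

FRAME_MS = 20      # each AMR frame covers 20 ms of audio
SID_MAX_BYTES = 7  # assumed threshold: frames this small are treated as silence


def frames_until_silence(frame_sizes: List[int]) -> Optional[int]:
    """Index of the first silence frame in the buffer, or None if none is queued."""
    for i, size in enumerate(frame_sizes):
        if size <= SID_MAX_BYTES:
            return i
    return None


def end_speedup_rate(frame_sizes: List[int], gain_ms: float,
                     max_rate: float = 1.5) -> float:
    """Playback rate (1.0 = normal) that ends the buffered speech gain_ms earlier.

    Returns 1.0 when no upcoming silence is visible; the rate is capped at
    max_rate to keep the speech intelligible (the cap is an assumed value).
    """
    n = frames_until_silence(frame_sizes)
    if n is None or n == 0:
        return 1.0
    remaining_ms = n * FRAME_MS
    if gain_ms >= remaining_ms:
        return max_rate
    return min(max_rate, remaining_ms / (remaining_ms - gain_ms))
```

With ten 32-byte speech frames followed by one 6-byte frame in the buffer, 200 ms of speech remain; asking for a 40 ms gain gives a rate of 1.25, so the audio ends 40 ms ahead of the video and the listener can respond sooner.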
  • This exemplary method is a method for synchronizing video data with audio data received by a communication device, the video data including a plurality of segments of video data and the audio data including a plurality of segments of audio data.
  • the method includes a step 110 of receiving a first segment of audio data at the communication device, a step 112 of receiving a first segment of video data at the communication device, at the same time or later in time than the first segment of audio data, the first segment of video data being logically related to the first segment of audio data, a step 114 of scaling the first segment of audio data, and a step 116 of reproducing the scaled first segment of audio data prior to receiving or decoding the first segment of video data.
  • the method includes a step 120 of receiving a last segment of audio data at the communication device, a step 122 of receiving a last segment of video data at the communication device, at the same time or later in time than the last segment of audio data, a step 124 of scaling the last segment of audio data, and a step 126 of reproducing the scaled last segment of audio data prior to receiving or decoding the last segment of the video data.
  • the steps shown in Figure 8 may be performed in conjunction with the steps shown in Figure 7 or may be performed independently of the steps shown in Figure 7.
  • Figure 9 is a flow chart that shows steps of a method for synchronizing video data with audio data received by a communication device, the video data including a plurality of segments of video data and the audio data including a plurality of segments of audio data.
  • the method includes a step 130 of receiving a first segment of audio data at the communication device, a step 132 of receiving a first segment of video data at the communication device, at the same time or later in time than the first segment of audio data, the first segment of video data being logically related to the first segment of audio data, and a step 134 of applying a synchronization mechanism between the first segment of audio data and the first segment of video data based on a predetermined indicator.
  • the synchronization mechanism may be one of the above discussed novel synchronization mechanisms.
  • the disclosed exemplary embodiments provide a communication device, a system, a method and a computer program product for sending audio and video data from a sending device to a receiving device and for synchronizing the audio and video data at the receiving device. It should be understood that this description is not intended to limit the invention. On the contrary, the exemplary embodiments are intended to cover alternatives, modifications and equivalents, which are included in the spirit and scope of the invention as defined by the appended claims. Further, in the detailed description of the exemplary embodiments, numerous specific details are set forth in order to provide a comprehensive understanding of the claimed invention. However, one skilled in the art would understand that various embodiments may be practiced without such specific details.
  • the exemplary embodiments may be embodied in a wireless communication device, a wired communication device, in a telecommunication network, as a method or in a computer program product. Accordingly, the exemplary embodiments may take the form of an entirely hardware embodiment or an embodiment combining hardware and software aspects. Further, the exemplary embodiments may take the form of a computer program product stored on a computer-readable storage medium having computer-readable instructions embodied in the medium. Any suitable computer-readable medium may be utilized, including hard disks, CD-ROMs, digital versatile discs (DVDs), optical storage devices, or magnetic storage devices such as a floppy disk or magnetic tape. Other non-limiting examples of computer-readable media include flash-type memories or other known memories.
  • the exemplary embodiments may also be implemented in an application specific integrated circuit (ASIC), or a digital signal processor.
  • Suitable processors include, by way of example, a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), and/or a state machine.
  • a processor in association with software may be used to implement a radio frequency transceiver for use in the user terminal, the base station or any host computer.
  • the user terminal may be used in conjunction with modules, implemented in hardware and/or software, such as a camera, a video camera module, a videophone, a speakerphone, a vibration device, a speaker, a microphone, a television transceiver, a hands free headset, a keyboard, a Bluetooth module, a frequency modulated (FM) radio unit, a liquid crystal display (LCD) display unit, an organic light-emitting diode (OLED) display unit, a digital music player, a media player, a video game player module, an Internet browser, and/or any wireless local area network (WLAN) module.

Description

    TECHNICAL FIELD
  • The present invention generally relates to methods, devices and systems capable of processing audio and video data and, more particularly, to techniques and methods for compensating for a time delay associated with reproducing video data transmitted with audio data.
    BACKGROUND
  • Communication using multiple media, which includes both video data and associated audio data, is becoming increasingly important in the communications industry, for both fixed and mobile access. Traditional speech telephony is increasingly being upgraded to include a video component (i.e., video data), giving users the opportunity to communicate using so-called "video telephony."
  • The video data associated with a video telephony call is typically created by a video camera in the sending device. The sending device may be a portable device, such as a mobile phone. Sometimes the user orients the sending device so that the camera is positioned to show the speaker's face. However, the camera may be used to show other things, which the user finds relevant for the conversation, for example a view that the user wants to share with the person that she or he is talking to. Thus, what is shown during a communication session can change. In this context, the video data and the audio data are usually generated having a logical connection, e.g., a speech of a user is associated with a video of the face of the user that corresponds to the user generating the speech.
  • When the speaking user is also shown on the listening user's screen, it is desirable that the audio and video data are synchronized so that the user experiences a good coordination between the sound and the video. The lip movements of the user shall normally be in synch with the sound from the device's speakerphone to achieve the good coordination. This provides a connection between the lip movements and the heard words, as it would be in a normal discussion between two people at short distance. This is referred to herein as lip-sync or logically related audio and video data.
  • Hence, in existing services, such as 3G circuit-switched video telephony (see, for example, 3GPP TS 26.111, from the 3GPP standards group, ETSI Mobile Competence Centre, 650 route des Lucioles, 06921 Sophia-Antipolis Cedex, France), and in emerging IP multimedia services such as IMS Multimedia Telephony (see, for example, 3GPP TS 22.173 and ETSI TS 181 002 from ETSI), support of inter-media synchronization is desired. The traditional methods for achieving synchronization between audio and video are discussed next. For Circuit Switched Multimedia, an indication can be provided of how much the audio shall be delayed in order to be synchronized with the video (see ITU-T H.324). For services that are transported on the Real-time Transport Protocol (RTP, see IETF RFC 3550), RTP timestamps together with RTP Control Protocol (RTCP) sender reports can be used as input to achieve the synchronization (see IETF RFC 3550). However, some existing multimedia communication services do not provide any media synchronization, resulting in a poor user experience when lip-synchronization is needed.
  • Systems that synchronize the audio with the video typically delay the audio data by a certain amount of time until the video data is decoded, and then both are played simultaneously to achieve the desired lip-synchronization. However, this synchronizing method is unpleasant for users because the increased delay causes long response times and problems for the conversation. For example, the video data typically has a longer delay from the camera to the screen than the speech has from the microphone to the speakerphone. The longer delay for video data is caused by a longer algorithmic delay for encoding and decoding, often a slower frame rate (compared to audio data), and in some cases also by a longer transfer delay due to the higher bit rate. Assuming that the receiving device synchronizes audio and video, the device has to delay the audio data flow before playing it out. This naturally degrades the user experience of the speech, which in turn hampers the conversational quality. For example, when the delay of the audio data exceeds a certain limit (about 200 ms), it starts to impact the conversational quality. First, the user may be annoyed because the other speaker seems to react slowly, and sometimes both speakers start to talk simultaneously (because they will notice this problem only after some time delay). If the delay is large (e.g., over 500 ms), it becomes difficult to keep up a normal conversation. Thus, one cause of dissatisfaction among speakers using video telephony is that the response time of the other speaker is too long, unlike in a normal face-to-face or speech telephony conversation.
  • Accordingly, it would be desirable to provide devices, systems and methods for audio and video communications that avoid the afore-described problems and drawbacks.
  • Related art within this technical field is disclosed, e.g., in US 5,818,514, which describes a video conferencing system in which the problem caused by the difference between the processing delay of the video signal and the processing delay of the audio signal, e.g., during dialogue between two speakers, is reduced by providing other users with an immediate indication when a first user begins to speak.
    SUMMARY
  • According to an exemplary embodiment, there is a method for synchronizing video data with audio data received by a communication device, the video data including a plurality of segments of video data and the audio data including a plurality of segments of audio data. The method includes receiving a first segment of audio data at the communication device; receiving a first segment of video data at the communication device, at the same time or later in time than the first segment of audio data, the first segment of video data being logically related to the first segment of audio data; and applying a synchronization mechanism between the first segment of audio data and the first segment of video data based on a predetermined indicator.
  • According to another exemplary embodiment, there is a communication device for synchronizing received video data with received audio data, the video data including a plurality of segments of video data and the audio data including a plurality of segments of audio data. The communication device includes an input/output unit configured to receive a first segment of audio data and to receive a first segment of video data, at the same time or later in time than the first segment of audio data, the first segment of video data being logically related to the first segment of audio data; and a processor configured to apply a synchronization mechanism between the first segment of audio data and the first segment of video data based on a predetermined indicator.
    BRIEF DESCRIPTION OF THE DRAWINGS
  • The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate one or more embodiments and, together with the description, explain these embodiments. In the drawings:
    • Figure 1 is a schematic diagram of a communication system including a sending device, a receiving device and a communication network according to an exemplary embodiment;
    • Figure 2 is a schematic diagram of the sending or receiving device according to an exemplary embodiment;
    • Figure 3 is a schematic diagram showing the timing of audio and video data exchanged between the sending device and the receiving device;
    • Figure 4 is a schematic diagram showing the timing of audio and video data exchanged between the sending device and the receiving device using pre-notification according to an exemplary embodiment;
    • Figure 5 is a flow chart indicating steps performed for transmitting the pre-notification according to an exemplary embodiment;
    • Figure 6 is a schematic diagram showing the timing of audio and video data exchanged between the sending device and the receiving device with time scaling;
    • Figure 7 is a flow chart illustrating steps for applying the time scaling to a first segment of audio data according to an exemplary embodiment;
    • Figure 8 is a flow chart illustrating steps for applying the time scaling to a last segment of audio data according to an exemplary embodiment; and
    • Figure 9 is a flow chart illustrating steps of a method for synchronizing video data with audio data.
    LIST OF ABBREVIATIONS
    RTP - Real-Time Transport Protocol;
    RTCP - RTP Control Protocol;
    AVS - Audio-video signal;
    LED - Light Emitting Diode;
    UDP - User Datagram Protocol;
    IP - Internet Protocol;
    AMR - Adaptive Multi-Rate;
    DVD - Digital Versatile Disc;
    ASIC - Application Specific Integrated Circuit;
    DSP - Digital Signal Processor;
    FPGA - Field Programmable Gate Array;
    IC - Integrated Circuit;
    FM - Frequency Modulated;
    LCD - Liquid Crystal Display;
    OLED - Organic Light-Emitting Diode; and
    WLAN - Wireless Local Area Network.
    DETAILED DESCRIPTION
  • The following description of the exemplary embodiments refers to the accompanying drawings. The same reference numbers in different drawings identify the same or similar elements. The following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims. The following embodiments are discussed, for simplicity, with regard to a user that uses a mobile phone to communicate with another user that also uses a mobile phone. However, the embodiments to be discussed next are not limited to this system but may be applied to other existing audio and video transmission systems.
  • Reference throughout the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment is included in at least one embodiment of the present invention. Thus, the appearance of the phrases "in one embodiment" or "in an embodiment" in various places throughout the specification are not necessarily all referring to the same embodiment. Further, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments.
  • As shown in Figure 1, according to an exemplary embodiment, a general system 10 includes first 12 and second 14 communication devices connected via a communication network 16 to each other. The devices 12 and 14 may be a desktop, a laptop, a mobile phone, a traditional phone, a personal digital assistant, a digital camera, a video camera, etc. The two devices may be connected to each other via a wireline or a wireless interface. The two devices may be directly connected to each other or via one or more base stations (not shown) that are part of the communication network. The term "base station" is used herein as a generic term for any device that facilitates an exchange of data between connecting devices, as for example, a modem, a station in a telecommunication system, a whole network etc.
  • As shown in Figure 2, a structure of the device 12 or 14 includes an input/output port 18 that is configured to receive/transmit an audio-video signal AVS. The audio-video signal AVS may include audio data and video data. Each of the audio or video data may include a plurality of segments. A segment may include a number of frames that correspond to a certain time. However, this definition of the segment may be further qualified depending on the specific environment. Examples are provided later for specific embodiments. The plurality of segments of audio and/or video data may include a first segment, a last segment and also may include other segments between the first and last segments. A segment of audio data may correspond to a segment of video data, e.g., a user that records an audio message while video recording his face. The input/output port 18 may be connected via a bus 20 to an antenna 22 or to a wireline (not shown) to receive the audio-video signal AVS. The antenna 22 may be a single antenna or a multiple antenna and may be configured to receive the audio-video signal AVS via an infrared, radio frequency or other known wireless interface. The input/output port 18 is also connected to a processor 24 that receives the audio-video signal AVS for processing. The processor 24 may be connected via the bus 20 to a memory 26. The memory 26 may store the audio-video signal AVS and other data necessary for the processor 24. In an exemplary embodiment, the device 12 may have a display 28 configured to display an image corresponding to the received audio-video signal AVS. The display 28 may be a screen and may also include one or more LEDs or any other known light-emitting device. The display 28 may be a combination of the screen and the LED. In another exemplary embodiment, the device 12 may have an input/output interface 30, e.g., a keyboard, a mouse, a microphone, a video camera, etc., which is capable of inputting commands and/or data from a user. The device 12 may have a processing unit 32, connected to the bus 20, which is capable of measuring various indicators of the received audio-video signal AVS, or capable of analyzing the video data of the AVS to extract a face of a user, or capable of reproducing the audio data of the AVS at a different speed (higher or lower than the recording speed). The device 12 may have a sound unit 34 configured to produce a sound based on audio data received by the device. Also, the sound unit 34 may emit a sound as instructed by the processor 24 or may record a sound. In one exemplary embodiment, the sound unit may include a speakerphone and a microphone. The device 14 shown in Figure 1 may have the same structure as the device 12 shown in Figure 2.
  • In the following, for simplicity, the device 12 (see Figure 1) is considered to be the sender and the device 14 (also see Figure 1) is considered to be the receiver. However, both devices 12 and 14 may act as a sender and/or as a receiver. When a user 1 of the device 12 transmits video and audio data to a user 2 of the device 14, the actions shown in Figure 3 take place initially in the device 12 and then in the device 14. More specifically, the device 12 receives audio data S1 from user 1 or another source and also video data V1 from the user 1 or another source at a time t1. Both the audio data S1 and the video data V1 are encoded by device 12 and then sent via the input/output port 18 or the antenna 22 to the user 2. The encoded audio data S2 is sent at a time t2, later than t1 but earlier than a time t3 when the encoded video data V2 is sent. Figure 3 shows that the video data is already delayed by t3-t2 relative to the audio data. This delay in sending the encoded video data V2 is due to the longer encoding process required by the video data. The encoded audio data S2 is received at a time t4 and the encoded video data V2 is received at a later time t6 by the device 14 of the user 2. Because of the delay of the encoded video data V2, it may happen that the receiving device 14 starts to decode the encoded audio data S2 at a time t5, later than the time t4 but prior to the time t6, when the encoded video data V2 is received by the device 14. However, in one exemplary embodiment, the time t5 may be later than time t6. The device 14 also decodes the encoded video data V2 at a time t7, later than the time t6.
  • The earliest the device 14 can play both the decoded video data V3 and the decoded audio data S3 in a synchronized manner is at time t8. Thus, in traditional devices, the device 14 delays the audio data from time t5 to time t8 and starts to play both the decoded audio data S3 and the decoded video data V3 at time t8. This delay between t5 and t8 creates the problems discussed in the Background section in the conventional devices. Figure 3 also shows timings and encoded/decoded data when the user 2 replies to the user 1 and how a reaction time T1 of the user 2 is experienced by user 1 as an experienced reaction time T.
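  • As a minimal illustration of the conventional behavior described above, the following sketch computes how long a receiver holds the decoded audio when it waits for synchronized playout at t8. The class name, function name and all millisecond values are hypothetical and chosen only for illustration; only the relation between t5 and t8 is taken from the description.

```python
from dataclasses import dataclass


@dataclass
class ReceiveTimeline:
    """Receiver-side instants of Figure 3, in milliseconds (values are hypothetical)."""
    t4_audio_received: float
    t5_audio_decode_start: float
    t6_video_received: float
    t7_video_decode_start: float
    t8_synchronized_playout: float


def conventional_audio_hold_ms(timeline: ReceiveTimeline) -> float:
    """Time the audio is held back (t5 to t8) so it can start together with the video."""
    return timeline.t8_synchronized_playout - timeline.t5_audio_decode_start


# Illustrative numbers only: audio is ready long before the video can be shown.
timeline = ReceiveTimeline(100.0, 110.0, 300.0, 310.0, 400.0)
hold_ms = conventional_audio_hold_ms(timeline)
```

    With these assumed values the audio is held for 290 ms, which is already above the roughly 200 ms limit mentioned in the Background section.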
  • According to an exemplary embodiment, the receiving device of the receiving user may inform the receiving user that the sending user has stopped talking. By having this information, the receiving user may avoid starting to talk while his device is still processing the received data. In this regard, it is noted that in the conventional devices there is a delay between (i) the time the receiving device has received the last fragment of audio data from the sending user, and (ii) the time the receiving user becomes aware of this fact, due to the internal processing of the receiving device. However, according to this embodiment, this delay is reduced or eliminated. According to another exemplary embodiment, the receiving device may provide an indication to the user 2 that speech of the user 1 will stop shortly, so that the user 2 can start to talk sooner than otherwise, thus reducing the reaction time T1. The indication may be a visual signal (e.g., a turned on LED or a symbol on a screen of the device) that lasts as long as the speech is active. The signal may be other visual or audible signs.
  • According to an exemplary embodiment shown in Figure 4, a flow of video and audio data may include a speech pre-notification "a" to user 2. More specifically, user 2 may receive (generates) the pre-notification "a" that user 1 has stopped sending audio data. The pre-notification "a" may be generated at t4 or shortly thereafter and the receiving user becomes aware of the incoming audio data at t4 and not at t7 when the audio data is played out. This pre-notification reduces the reaction time T1 of user 2. The gain (i.e., the reduction in the time delay of the audio data) is shown in Figure 4 as "A". In this regard, the timing and the symbols used in Figure 4 are similar to those used in Figure 3 and their explanation is not repeated herein.
  • According to another exemplary embodiment, the receiving device may generate a pre-notification informing the user 2 that audio data from user 1 is not detected. This pre-notification may be generated and displayed at t8, which is earlier than a time t9 when conventionally user 2 determines that no speech is coming from user 1. Thus, the time difference t9 - t8 may be another gain of user 2. In this exemplary embodiment, an end of a last segment of audio data is determined and the pre-notification is generated based on the end of the last segment.
  • In another exemplary embodiment, there is an indication "b" (generated in user 1's equipment when a start of audio data from user 2 is received) to user 1 that user 2 has started to send audio data. User 1 determines from the speech pre-notification "b" to avoid starting to talk again until the information from user 2 is presented synchronized with other media. The speech pre-notification "b" may be implemented similarly to pre-notification "a." By using the speech pre-notification, the risk of both parties talking at the same time is thus substantially reduced and a reaction time of each party is also reduced.
  • In one exemplary embodiment, both the pre-notification "a" and "b" may be implemented in each of the communication devices 12 or 14. In this embodiment, the user is alerted by his/her own device that audio data from another user has started and is also alerted when that audio data has stopped prior to the audio data being played out.
  • The total gain (i.e., reduction in the time delay of the audio data) when both the pre-notifications "a" and "b" are used in this exemplary embodiment is an actual shortened round-trip delay due to user 2 being notified of the talk burst end, thus shortening his reaction time, combined with an additional reduced risk of cross talking due to user 1 being pre-notified of audio data coming from user 2. The total gain is shown as "B" in Figure 4. Thus, according to the discussed exemplary embodiments, a device configured to generate speech pre-notification reduces the risk of cross talking (talking at the same time of users 1 and 2), and/or achieves a faster response of the user (because the user can better decide when it is his time to speak). Another advantage of one or all of the exemplary embodiments discussed above is the simplicity of implementation, because the device uses already available information in the terminal, i.e., no terminal-external signaling is required.
  • According to an exemplary method that implements the above discussed exemplary embodiments, Figure 5 shows a method for synchronizing video data with audio data received by a communication device. The video data includes a plurality of segments of video data and the audio data includes a plurality of segments of audio data. The method includes a step 50 of receiving a first segment of audio data at the communication device, a step 52 of receiving a first segment of video data at the communication device, at the same time or later in time than the first segment of audio data, the first segment of video data being logically related to the first segment of audio data, a step 54 of generating a pre-notification at the communication device related to the first segment of audio data, and a step 56 of processing the pre-notification to generate visual or audio information indicative of the pre-notification.
  • In more detail, step 54 may include a step 54-1 of generating a pre-notification at the communication device related to a beginning of the first segment of audio data, and a step 54-2 of displaying visual information or reproducing audio information indicative of the pre-notification prior to playing the beginning of the first segment of audio data. Alternatively, step 54 may include a step 54-3 of generating a pre-notification at the communication device related to an end of the first segment of audio data and a step 54-4 of displaying visual information or reproducing audio information indicative of the pre-notification prior to playing the end of the first segment of audio data. In still another embodiment, step 54 may include all of steps 54-1 to 54-4.
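  • The steps of Figure 5 can be sketched as follows. This is only an illustrative sketch, not the claimed implementation: the names AudioFrame, PreNotifier and collect_events are hypothetical, and the is_speech flag assumes that some silence detector is already available in the device. The point shown is that the indications are produced when frames arrive, i.e., before the corresponding audio is played out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AudioFrame:
    arrival_ms: int   # time the frame reached the device (hypothetical field)
    is_speech: bool   # output of a silence detector, assumed to exist


class PreNotifier:
    """Emits indications on frame arrival, before the frames are played out."""

    def __init__(self) -> None:
        self.in_talk_spurt = False

    def on_frame_received(self, frame: AudioFrame) -> Optional[str]:
        if frame.is_speech and not self.in_talk_spurt:
            self.in_talk_spurt = True
            return "speech_started"   # corresponds to steps 54-1 and 54-2
        if not frame.is_speech and self.in_talk_spurt:
            self.in_talk_spurt = False
            return "speech_ended"     # corresponds to steps 54-3 and 54-4
        return None


def collect_events(frames: List[AudioFrame]) -> List[Tuple[int, str]]:
    """Run the notifier over received frames and list (arrival time, event) pairs."""
    notifier = PreNotifier()
    events = []
    for frame in frames:
        event = notifier.on_frame_received(frame)
        if event is not None:
            events.append((frame.arrival_ms, event))
    return events


frames = [AudioFrame(0, False), AudioFrame(20, True),
          AudioFrame(40, True), AudioFrame(60, False)]
events = collect_events(frames)
```

    In a device, each returned event would drive the visual or audible indication of step 56, for example turning an LED on at "speech_started" and off at "speech_ended".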
  • A communication between devices 12 and 14 may be set up using state-of-the-art session setup protocols. For simplicity, the exemplary embodiments referring to the technique including face analysis are discussed based on an RTP/User Datagram Protocol (UDP)/Internet Protocol (IP) communication system, using RTCP as the enabling synchronization protocol. However, the exemplary embodiments may be applied to other systems and protocols.
  • The receiving device is configured to apply the synchronization function if needed. The synchronization function may include the time delay of the audio data relative to the video data, novel techniques that are discussed in this disclosure, or a combination thereof. The synchronization function may be implemented in the processor 24 or in the processing unit 32 shown in Figure 2. At least one of the communication devices 12 and 14 includes the synchronization function according to this exemplary embodiment. In another exemplary embodiment, both communication devices 12 and 14 have the synchronization function.
  • The communication device may be configured to initially have the synchronization function switched on or off. Throughout the communication, the sending device transmits audio and video data along with standard protocol tools to enable synchronization.
  • The synchronization function may be one discussed in the exemplary embodiments or one known by those skilled in the art.
  • According to the following exemplary embodiments, the audio data may be synchronized with the video data based on a novel approach that is described next. No pre-notification, face detection or user input is necessary for the following exemplary embodiments. A synchronization process at the start of the speech was described with reference to Figure 3. That synchronization process is undesirable because of the large time delay of the audio data. However, according to the present novel approach, the time delay of the audio data is reduced such that the delay does not become annoying for the users of the communication system. The reduction in the time delay may be achieved, according to an exemplary embodiment, by time scaling the audio data.
  • More specifically, one or more segments of the audio data are played during a first part of the speech at a different speed than during a second part of the speech, which is later than the first part. The first part of the speech may include a first segment and one or more subsequent segments. In this context, the first segment, which was defined earlier in a generic manner, may be further defined, for this exemplary embodiment, as lasting from a time indicative of a beginning of a talk spurt, when less delayed audio data may be played out with a slower speed than normal, ahead of more delayed video data, until a time when the audio data catches up with the video data, i.e., the audio and video are synchronized. One way to monitor and decide when the audio and video data are synchronized is to monitor a timestamp of frames of the audio and video data. A last segment of the first part of the speech is related to an end of a talk spurt and may last between the play out time of the talk spurt and a current time, when a beginning of a silence period is detected. According to an exemplary embodiment, each part of the speech may correspond to a talk spurt.
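  • One possible way to act on the monitored timestamps is sketched below. It assumes that the audio and video timestamps have already been mapped to a common timeline in milliseconds (for RTP, such a mapping could be derived from RTCP sender reports); the function name, the slow speed of 0.85 and the tolerance of 20 ms are assumptions made for illustration and are not values from this description.

```python
from typing import Optional

NORMAL_SPEED = 1.0


def choose_audio_speed(audio_ts_ms: float,
                       video_ts_ms: Optional[float],
                       slow_speed: float = 0.85,
                       tolerance_ms: float = 20.0) -> float:
    """Pick the audio playout speed from the timestamps of the frames being played.

    video_ts_ms is None while no decoded video frame is available yet.
    """
    if video_ts_ms is None:
        return slow_speed                      # video not started: stretch the audio
    if audio_ts_ms - video_ts_ms > tolerance_ms:
        return slow_speed                      # audio still ahead of the video
    return NORMAL_SPEED                        # timestamps match: synchronized


speed_before_video = choose_audio_speed(500.0, None)
speed_while_ahead = choose_audio_speed(500.0, 300.0)
speed_when_in_sync = choose_audio_speed(500.0, 490.0)
```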
  • Thus, the audio data may start with a reduced time delay and then, during the first seconds of the speech, more delay is added by time scaling segments of the audio data (audio in "slow-motion") in order to achieve the synchronization of the audio data with the video data. There are various methods for accomplishing the time scaling of the audio data so that its perceptual quality is not degraded too much. For instance, Appendix I of ITU-T Recommendation G.711, the entire content of which is incorporated here by reference, refers to Waveform Similarity Overlap-Add (WSOLA), which is such a method. When the synchronization is achieved, the audio and video data are played at normal speed until just before the end. In other words, because the audio data is played earlier than the video data and because the two types of data have the same original length, at least a first segment of the audio data may be "dilated" by playing the first segment of the audio data at a slower speed than normal. According to an exemplary embodiment, more segments (the first segment and subsequent segments of the audio data) may be played at a lower speed to achieve the synchronization between the audio data and the video data.
  • At the end of the audio data received from a user, the reaction delay of the other user may be reduced by using again the time scaling of the received audio data (speed up at least a last segment of the audio data, i.e., "fast-motion" of the audio data). Although the audio data is not in sync any longer with the video data, the user is able to reduce his or her reaction time and answer with a shorter delay to the other user. The scaling at the end of the speech, which is discussed in more detail later, may be implemented in a device without implementing the scaling at the beginning of the speech. However, in one exemplary embodiment, both scaling methods are implemented at least at one of the users. These novel processes may make the conversational interaction between the users better, while still achieving the synchronization of video and audio data for most of the duration of the conversation.
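  • The effect of both kinds of time scaling on the playout delay can be illustrated with simple arithmetic: a segment of media duration d played at speed s occupies d/s of wall-clock time. The sketch below uses hypothetical function names and illustrative speeds (0.8 and 1.25) that are not taken from this description.

```python
def playout_duration_s(media_duration_s: float, speed: float) -> float:
    """Wall-clock time needed to play a segment at the given speed (1.0 is normal)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return media_duration_s / speed


def delay_change_s(media_duration_s: float, speed: float) -> float:
    """Positive when playout falls further behind, negative when it catches up."""
    return playout_duration_s(media_duration_s, speed) - media_duration_s


# "Slow-motion" at the start of a talk spurt adds delay to the audio.
slow_motion_added_s = delay_change_s(1.0, 0.8)
# "Fast-motion" at the end of a talk spurt removes delay from the audio.
fast_motion_saved_s = -delay_change_s(1.0, 1.25)
```

    Under these assumed speeds, one second of audio played in slow-motion adds about 0.25 s of delay, while one second played in fast-motion finishes about 0.2 s earlier, which is the time the listening user gains for his or her reply.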
  • According to an exemplary embodiment, Figure 6 shows user 1 sending audio and video data to user 2, and user 2 also sending audio and video data to user 1 in response to the received audio and video data. The input of audio data A1 and video data V1, the encoding and receiving of audio data A2 and video data V2, and the decoding of audio data A3 and video data V3 have been discussed with reference to Figure 3. Playing the decoded audio data A3 and the decoded video data V3 differs from what is shown in Figure 3 and is novel. This part is discussed next in detail with regard to Figure 6. Instead of delaying the decoded audio data A3 until the decoded video data V3 becomes available, according to an exemplary embodiment, the decoded audio data A3 is played (reproduced) after this audio data is decoded. Thus, as shown in Figure 6, the starting time t_Astart of the decoded audio data A3 is earlier than the starting time t_Vstart of the decoded video data V3. Thus, the time delay of the speech is reduced compared to traditional delay methods. However, to achieve the synchronization between the decoded audio data A3 and the decoded video data V3, at least a first portion "A" of the audio data is played at a slower speed than a normal speed (a predetermined speed) until the decoded video data V3 becomes available. When the decoded video data V3 is available, the audio data may still be played at a slower speed in order for the video data to "catch up" with the audio data. The audio data may be played at the slower speed for a period "a" after the video data has started to be reproduced at time t_Vstart, to achieve the synchronization between the audio and video data. In one exemplary embodiment, the time period "a" has a predetermined value, for example 2 seconds. According to another exemplary embodiment, the audio and video data may be synchronized after a certain time t_catch-up (e.g., 1 s), and "a" is defined as being t_catch-up - "A".
At the end of the time period "a," the audio speed is increased to the normal speed so that at time t_s, the synchronization between the audio and video data is achieved. In one exemplary embodiment, the audio data speed is slowly (continuously and/or monotonically) increased during the time period "a" to the normal speed. In another embodiment, the audio data speed is suddenly (in a step-like manner) increased from the low speed to the normal speed.
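The relationship between the audio lead "A", the catch-up period "a", and the required slow-down can be sketched with a short calculation. The sketch assumes the step-like embodiment, in which a single constant slow rate is used from the start of the audio until synchronization, and that the video plays at normal speed throughout. Under those assumptions the audio must output only "a" seconds of content during "A" + "a" seconds of wall-clock time. The function name `catch_up_rate` and its parameter names are hypothetical.

```python
def catch_up_rate(audio_lead_s, catch_up_s):
    """Constant audio playback rate (<= 1.0) that lets the video catch up.

    audio_lead_s: how much earlier the audio starts than the video (period "A").
    catch_up_s:   how long the audio stays slow after the video starts (period "a").
    """
    if audio_lead_s < 0 or catch_up_s <= 0:
        raise ValueError("audio_lead_s must be >= 0 and catch_up_s must be > 0")
    # Audio plays for A + a seconds but may only advance by a seconds of content.
    return catch_up_s / (audio_lead_s + catch_up_s)


# Usage: audio starts 0.5 s before the video and stays slow for 2 s afterwards.
rate = catch_up_rate(audio_lead_s=0.5, catch_up_s=2.0)
print(f"play audio at {rate:.2f}x until synchronization")
```

A longer period "a" gives a rate closer to 1.0 and therefore a less noticeable slow-down, at the cost of a longer interval during which audio and video are out of sync.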
  • Speeding up at the end of a talk spurt requires a method for detecting in advance when a silence period is going to start. One way could be to peek at the packet(s) at the end of the speech buffer as soon as possible in order to enable speeding up. With certain audio codecs (e.g., AMR), the silence is visible from the different size and rate of the frames during silence. At the end of the audio data, the reaction delay of the other user may be reduced by applying the time scaling to at least a last segment of the audio data (speeding up the audio data, i.e., "fast-motion" of the audio data). As shown in Figure 6, just before the end of the decoded video data V3, the reproduction speed of the audio data is increased above the normal speed in order to have the audio data A3 presented to the user earlier than the decoded video data V3. The audio data ends, in one exemplary embodiment, a time interval B earlier than the decoded video data V3. Although the audio is no longer in sync with the video, the user is able to reduce his or her reaction time and answer the other user with a shorter delay T1 (shorter by B). The scaling at the end of the speech (B or D) may be implemented in a device without implementing the scaling at the beginning of the speech (A or C).
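Peeking at the buffered packets to find the start of a silence period might look like the following sketch. It assumes that each buffered frame's size in bytes is available and that silence frames are clearly smaller than speech frames; the function name `find_silence_onset`, the threshold, and the byte counts in the usage example are illustrative assumptions, not values taken from this document or from a codec specification.

```python
def find_silence_onset(frame_sizes, speech_min_bytes=12):
    """Return the index where the trailing run of small (silence) frames starts.

    frame_sizes: sizes in bytes of the buffered, not yet played frames, oldest first.
    Returns None when the newest buffered frame still looks like speech.
    """
    onset = None
    for index, size in enumerate(frame_sizes):
        if size < speech_min_bytes:
            if onset is None:
                onset = index      # first small frame of a possible silence run
        else:
            onset = None           # speech resumed, so the earlier run was only a pause
    return onset


# Usage: four speech-sized frames followed by small silence frames.
buffered = [32, 32, 32, 32, 6, 1, 1]
onset = find_silence_onset(buffered)
if onset is not None:
    print(f"{onset} speech frames remain before silence; speed-up can begin")
```

Only the trailing run is reported, so a short pause in the middle of the buffer does not trigger the speed-up.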
  • User 1 may similarly start the audio sooner than without time scaling, both because user 2 started sending information earlier and because the audio data in the communication device is started earlier. Again, the synchronization of the audio data with the video data is achieved after some time (after part C is played), since the audio data is initially played at a slower speed. This approach prevents user 1 from starting to send information, e.g., starting to talk, while information is being received from user 2. This approach also decreases the level of disturbance, since the experienced reaction time is shorter than in conventional processes.
  • The end of the speech burst can be determined from the already received speech frames, e.g., when silence is detected. Based on this detection, the end of the audio data may be played at a higher speed so that the video data is played for time interval D without the audio data (as the audio data has already been played), to allow the other user to reduce her or his response time. Thus, according to these exemplary embodiments, the audio is synchronized with the video for most of the time (except, for example, for periods A, B, C, and D), with a minimum impact on the conversation quality because of the reduced delay of the speech.
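The speed-up needed to finish the audio a given interval ahead of the video can be estimated with a similar calculation. The sketch assumes that audio and video are in sync when the speed-up begins, that the video keeps playing at normal speed, and that a single constant fast rate is used; the function name `fast_motion_rate` and its parameters are hypothetical.

```python
def fast_motion_rate(remaining_audio_s, lead_s):
    """Constant audio playback rate (>= 1.0) for the end of a talk spurt.

    remaining_audio_s: duration of the audio still to be played at normal speed.
    lead_s:            how much earlier than the video the audio should end
                       (interval B or D in the description).
    """
    if lead_s < 0 or lead_s >= remaining_audio_s:
        raise ValueError("lead_s must satisfy 0 <= lead_s < remaining_audio_s")
    # The remaining content must fit into a window that is shorter by lead_s.
    return remaining_audio_s / (remaining_audio_s - lead_s)


# Usage: one second of speech is left and the audio should end 0.2 s early.
rate = fast_motion_rate(remaining_audio_s=1.0, lead_s=0.2)
print(f"play the last segment at {rate:.2f}x")
```

The earlier the end of the talk spurt is detected, the larger `remaining_audio_s` is and the milder the required speed-up for the same lead.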
  • An exemplary method that scales the first segment is shown in Figure 7. This exemplary method is a method for synchronizing video data with audio data received by a communication device, the video data including a plurality of segments of video data and the audio data including a plurality of segments of audio data. The method includes a step 110 of receiving a first segment of audio data at the communication device, a step 112 of receiving a first segment of video data at the communication device, at the same time or later in time than the first segment of audio data, the first segment of video data being logically related to the first segment of audio data, a step 114 of scaling the first segment of audio data, and a step 116 of reproducing the scaled first segment of audio data prior to receiving or decoding the first segment of video data.
  • Another exemplary method that scales the last segment of audio data is discussed with reference to Figure 8. The method includes a step 120 of receiving a last segment of audio data at the communication device, a step 122 of receiving a last segment of video data at the communication device, at the same time or later in time than the last segment of audio data, a step 124 of scaling the last segment of audio data, and a step 126 of reproducing the scaled last segment of audio data prior to receiving or decoding the last segment of the video data. The steps shown in Figure 8 may be performed in conjunction with the steps shown in Figure 7 or may be performed independently of the steps shown in Figure 7.
  • Figure 9 is a flow chart that shows steps of a method for synchronizing video data with audio data received by a communication device, the video data including a plurality of segments of video data and the audio data including a plurality of segments of audio data. The method includes a step 130 of receiving a first segment of audio data at the communication device, a step 132 of receiving a first segment of video data at the communication device, at the same time or later in time than the first segment of audio data, the first segment of video data being logically related to the first segment of audio data, and a step 134 of applying a synchronization mechanism between the first segment of audio data and the first segment of video data based on a predetermined indicator. The synchronization mechanism may be one of the above discussed novel synchronization mechanisms.
  • The various exemplary embodiments have been discussed above in isolation. However, any combination of these exemplary embodiments may be used as would be appreciated by those skilled in the art.
  • The disclosed exemplary embodiments provide a communication device, a system, a method and a computer program product for sending audio and video data from a sending device to a receiving device and for synchronizing the audio and video data at the receiving device. It should be understood that this description is not intended to limit the invention. On the contrary, the exemplary embodiments are intended to cover alternatives, modifications and equivalents, which are included in the spirit and scope of the invention as defined by the appended claims. Further, in the detailed description of the exemplary embodiments, numerous specific details are set forth in order to provide a comprehensive understanding of the claimed invention. However, one skilled in the art would understand that various embodiments may be practiced without such specific details.
  • As will also be appreciated by one skilled in the art, the exemplary embodiments may be embodied in a wireless communication device, a wired communication device, in a telecommunication network, as a method or in a computer program product. Accordingly, the exemplary embodiments may take the form of an entirely hardware embodiment or an embodiment combining hardware and software aspects. Further, the exemplary embodiments may take the form of a computer program product stored on a computer-readable storage medium having computer-readable instructions embodied in the medium. Any suitable computer-readable medium may be utilized, including hard disks, CD-ROMs, digital versatile discs (DVDs), optical storage devices, or magnetic storage devices such as a floppy disk or magnetic tape. Other non-limiting examples of computer-readable media include flash-type memories or other known memories. Although the features and elements of the present exemplary embodiments are described in the embodiments in particular combinations, each feature or element can be used alone without the other features and elements of the embodiments or in various combinations with or without other features and elements disclosed herein. The methods or flow charts provided in the present application may be implemented in a computer program, software, or firmware tangibly embodied in a computer-readable storage medium for execution by a general purpose computer or a processor.
  • The exemplary embodiments may also be implemented in an application specific integrated circuit (ASIC) or a digital signal processor. Suitable processors include, by way of example, a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), and/or a state machine. A processor in association with software may be used to implement a radio frequency transceiver for use in the user terminal, the base station or any host computer. The user terminal may be used in conjunction with modules, implemented in hardware and/or software, such as a camera, a video camera module, a videophone, a speakerphone, a vibration device, a speaker, a microphone, a television transceiver, a hands-free headset, a keyboard, a Bluetooth module, a frequency modulated (FM) radio unit, a liquid crystal display (LCD) unit, an organic light-emitting diode (OLED) display unit, a digital music player, a media player, a video game player module, an Internet browser, and/or any wireless local area network (WLAN) module.

Claims (4)

  1. A method for synchronizing video data with audio data received by a communication device (12), the video data including a plurality of segments of video data and the audio data including a plurality of segments of audio data, the method comprising:
    receiving a first segment of audio data at the communication device (12);
    receiving a first segment of video data at the communication device (12), at the same time or later in time than the first segment of audio data, the first segment of video data being logically related to the first segment of audio data;
    applying a synchronization mechanism between the first segment of audio data and the first segment of video data based on a predetermined indicator;
    generating a pre-notification at the communication device related to a beginning of the first segment of audio data and displaying visual information or reproducing audio information indicative of the pre-notification prior to playing the beginning of the first segment of audio data;
    generating a pre-notification at the communication device related to an end of a last segment of audio data; and
    displaying visual information or reproducing audio information indicative of the pre-notification prior to playing the end of the last segment of audio data.
  2. The method of Claim 1, further comprising:
    displaying the visual information as a picture or a character on a screen of the communication device or as a light produced by a light source or processing the audio information to emit a sound.
  3. A communication device (12) for synchronizing received video data with received audio data, the video data including a plurality of segments of video data and the audio data including a plurality of segments of audio data, the communication device (12) comprising:
    an input/output unit (18) configured to receive a first segment of audio data and to receive a first segment of video data, at the same time or later in time than the first segment of audio data, the first segment of video data being logically related to the first segment of audio data; and
    a processor (24) configured to apply a synchronization mechanism between the first segment of audio data and the first segment of video data based on a predetermined indicator; and further to generate a pre-notification related to a beginning of the first segment of audio data and display visual information or reproduce audio information indicative of the pre-notification prior to playing the beginning of the first segment of audio data, and to generate a pre-notification related to an end of a last segment of audio data; and
    display visual information or reproduce audio information indicative of the pre-notification prior to playing the end of the last segment of audio data.
  4. The device of Claim 3, further comprising:
    a display unit configured to display the visual information as a picture or a character on a screen of the communication device or as a light generated by a light source; or
    a sound reproducing unit configured to reproduce the audio information as a sound.
EP08878786.6A 2008-12-08 2008-12-08 Device and method for synchronizing received audio data with video data Active EP2356817B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/SE2008/051420 WO2010068151A1 (en) 2008-12-08 2008-12-08 Device and method for synchronizing received audio data with video data

Publications (3)

Publication Number Publication Date
EP2356817A1 (en) 2011-08-17
EP2356817A4 (en) 2014-04-09
EP2356817B1 (en) 2017-04-12

Family

ID=42242945

Family Applications (1)

Application Number Title Priority Date Filing Date
EP08878786.6A Active EP2356817B1 (en) 2008-12-08 2008-12-08 Device and method for synchronizing received audio data with video data

Country Status (4)

Country Link
US (1) US9392220B2 (en)
EP (1) EP2356817B1 (en)
JP (1) JP5363588B2 (en)
WO (1) WO2010068151A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8965026B2 (en) * 2011-06-10 2015-02-24 Canopy Co. Method and apparatus for remote capture of audio in a handheld device
US9808724B2 (en) * 2010-09-20 2017-11-07 Activision Publishing, Inc. Music game software and input device utilizing a video player
US20130166769A1 (en) * 2010-12-20 2013-06-27 Awind, Inc. Receiving device, screen frame transmission system and method
US8495236B1 (en) 2012-02-29 2013-07-23 ExXothermic, Inc. Interaction of user devices and servers in an environment
US20150296247A1 (en) * 2012-02-29 2015-10-15 ExXothermic, Inc. Interaction of user devices and video devices
WO2015002586A1 (en) * 2013-07-04 2015-01-08 Telefonaktiebolaget L M Ericsson (Publ) Audio and video synchronization
WO2015107909A1 (en) * 2014-01-20 2015-07-23 パナソニックIpマネジメント株式会社 Reproduction device and data reproduction method
CN107690055A (en) * 2016-08-05 2018-02-13 中兴通讯股份有限公司 The control method of video calling, apparatus and system
KR20180068069A (en) * 2016-12-13 2018-06-21 삼성전자주식회사 Electronic apparatus and controlling method thereof
US10629223B2 (en) 2017-05-31 2020-04-21 International Business Machines Corporation Fast playback in media files with reduced impact to speech quality
US10872115B2 (en) 2018-03-19 2020-12-22 Motorola Mobility Llc Automatically associating an image with an audio track
US11197054B2 (en) 2018-12-05 2021-12-07 Roku, Inc. Low latency distribution of audio using a single radio
KR20220042893A (en) * 2020-09-28 2022-04-05 삼성전자주식회사 An electronic device that performs synchronization of video data and audio data and control method thereof

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04137987A (en) * 1990-09-28 1992-05-12 Matsushita Electric Ind Co Ltd Information processor
US5608839A (en) * 1994-03-18 1997-03-04 Lucent Technologies Inc. Sound-synchronized video system
US5818514A (en) * 1994-12-01 1998-10-06 Lucent Technologies Inc. Video conferencing system and method for providing enhanced interactive communication
US5953049A (en) * 1996-08-02 1999-09-14 Lucent Technologies Inc. Adaptive audio delay control for multimedia conferencing
US6359656B1 (en) * 1996-12-20 2002-03-19 Intel Corporation In-band synchronization of data streams with audio/video streams
US20020089602A1 (en) * 2000-10-18 2002-07-11 Sullivan Gary J. Compressed timing indicators for media samples
FR2823038B1 (en) * 2001-03-29 2003-07-04 Eads Defence & Security Ntwk METHOD OF MANAGING INTERNSHIP FOR HALF-DUPLEX COMMUNICATION THROUGH A PACKET SWITCHED TRANSPORT NETWORK
WO2003039086A1 (en) * 2001-10-29 2003-05-08 Mpnet International, Inc. Data structure, method, and system for multimedia communications
US6693663B1 (en) * 2002-06-14 2004-02-17 Scott C. Harris Videoconferencing systems with recognition ability
US7212248B2 (en) * 2002-09-09 2007-05-01 The Directv Group, Inc. Method and apparatus for lipsync measurement and correction
US7142250B1 (en) * 2003-04-05 2006-11-28 Apple Computer, Inc. Method and apparatus for synchronizing audio and video streams
WO2005011281A1 (en) * 2003-07-25 2005-02-03 Koninklijke Philips Electronics N.V. Method and device for generating and detecting fingerprints for synchronizing audio and video
US7170545B2 (en) 2004-04-27 2007-01-30 Polycom, Inc. Method and apparatus for inserting variable audio delay to minimize latency in video conferencing
EP1751956B1 (en) * 2004-05-13 2011-05-04 Qualcomm, Incorporated Delivery of information over a communication channel
US7471337B2 (en) * 2004-06-09 2008-12-30 Lsi Corporation Method of audio-video synchronization
JP4182437B2 (en) * 2004-10-04 2008-11-19 ソニー株式会社 Audio video synchronization system and monitor device
KR20060067053A (en) * 2004-12-14 2006-06-19 삼성전자주식회사 Method for controlling time to talk for poc user having the right to speak and system thereof
US20080273116A1 (en) * 2005-09-12 2008-11-06 Nxp B.V. Method of Receiving a Multimedia Signal Comprising Audio and Video Frames
US7764713B2 (en) * 2005-09-28 2010-07-27 Avaya Inc. Synchronization watermarking in multimedia streams
JP5273042B2 (en) * 2007-05-25 2013-08-28 日本電気株式会社 Image sound section group association apparatus, method, and program
US8576922B2 (en) * 2007-06-10 2013-11-05 Apple Inc. Capturing media in synchronized fashion
US20140096167A1 (en) * 2012-09-28 2014-04-03 Vringo Labs, Inc. Video reaction group messaging with group viewing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None *

Also Published As

Publication number Publication date
JP5363588B2 (en) 2013-12-11
EP2356817A1 (en) 2011-08-17
US20120169837A1 (en) 2012-07-05
EP2356817A4 (en) 2014-04-09
US9392220B2 (en) 2016-07-12
JP2012511279A (en) 2012-05-17
WO2010068151A1 (en) 2010-06-17

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20110607

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR

DAX Request for extension of the european patent (deleted)
REG Reference to a national code

Ref country code: DE

Ref legal event code: R079

Ref document number: 602008049779

Country of ref document: DE

Free format text: PREVIOUS MAIN CLASS: H04N0007520000

Ipc: H04N0021430000

A4 Supplementary search report drawn up and despatched

Effective date: 20140312

RIC1 Information provided on ipc code assigned before grant

Ipc: H04N 21/43 20110101AFI20140306BHEP

Ipc: H04N 21/4788 20110101ALI20140306BHEP

Ipc: H04N 21/414 20110101ALI20140306BHEP

Ipc: H04N 7/14 20060101ALI20140306BHEP

Ipc: H04N 21/434 20110101ALI20140306BHEP

Ipc: H04N 21/2368 20110101ALI20140306BHEP

17Q First examination report despatched

Effective date: 20150204

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

INTG Intention to grant announced

Effective date: 20170209

RIN1 Information on inventor provided before grant (corrected)

Inventor name: ABERG, FREDRIK GUSTAV

Inventor name: MAHKONEN, ARTO JUHANI

Inventor name: OLOFSSON, HAKAN

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE PATENT HAS BEEN GRANTED

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: AT

Ref legal event code: REF

Ref document number: 884898

Country of ref document: AT

Kind code of ref document: T

Effective date: 20170515

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602008049779

Country of ref document: DE

REG Reference to a national code

Ref country code: NL

Ref legal event code: MP

Effective date: 20170412

REG Reference to a national code

Ref country code: LT

Ref legal event code: MG4D

REG Reference to a national code

Ref country code: AT

Ref legal event code: MK05

Ref document number: 884898

Country of ref document: AT

Kind code of ref document: T

Effective date: 20170412

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170712

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170713

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170712

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170812

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602008049779

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

26N No opposition filed

Effective date: 20180115

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

REG Reference to a national code

Ref country code: IE

Ref legal event code: MM4A

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20171208

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20171208

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20180831

REG Reference to a national code

Ref country code: BE

Ref legal event code: MM

Effective date: 20171231

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20171208

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20180102

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20171231

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20171231

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20171231

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO

Effective date: 20081208

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CY

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20170412

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20221227

Year of fee payment: 15

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20221228

Year of fee payment: 15