WO2015002586A1 - Audio and video synchronization - Google Patents

Audio and video synchronization

Info

Publication number
WO2015002586A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
multimedia stream
video content
synchronization
rendered
Prior art date
Application number
PCT/SE2013/050863
Other languages
French (fr)
Inventor
Stefan HÅKANSSON
Original Assignee
Telefonaktiebolaget L M Ericsson (Publ)
Priority date
Filing date
Publication date
Application filed by Telefonaktiebolaget L M Ericsson (Publ) filed Critical Telefonaktiebolaget L M Ericsson (Publ)
Priority to PCT/SE2013/050863 priority Critical patent/WO2015002586A1/en
Publication of WO2015002586A1 publication Critical patent/WO2015002586A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4392 Processing of audio elementary streams involving audio buffer management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072 Synchronising the rendering of multiple content streams or additional data on devices, of multiple content streams on the same device
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

There is provided handling of audio and video synchronization. A multimedia stream is acquired. The multimedia stream comprises synchronized audio and video content. It is determined whether at least video content of the multimedia stream is rendered or not by a user equipment. If determined as rendered, synchronization of audio and video content is enabled during play-out of the multimedia stream. If determined as not rendered, synchronization of the audio and video content is enabled to be disabled during play-out of the multimedia stream. If determined as not rendered, possible time delays are removed from the audio content before play-out of the audio content.

Description

AUDIO AND VIDEO SYNCHRONIZATION
TECHNICAL FIELD
Embodiments presented herein relate to audio and video synchronization, and particularly to a method, a user equipment, a computer program, and a computer program product for handling audio and video synchronization.
BACKGROUND
In communications systems, there is a challenge to obtain good performance and capacity for a given communications protocol, its parameters and the physical environment in which the communications is deployed. Recent advancements in research and development of communications systems have resulted in increased throughput and quality of the communications systems. This increased throughput and quality has enabled the range of services offered by communications systems to evolve. From offering services relating to voice communication and text communication, current communications systems enable communication of multimedia streams comprising audio and video content. The multimedia stream may represent a broadcast, a video conference, a web content streaming service, etc.
One factor that may influence the user experience of these services is the network latency as a whole, and particularly latency in applications involving communication of multimedia streams. Perceptible latency may even have a strong effect on user satisfaction and usability. Hence, for communication of multimedia streams comprising audio and video content, the round-trip mouth-to-ear delay for audio may be regarded as an important parameter for the quality of the communication. If the delay in a video conference is too long, there is a tendency that two users may talk over each other. This may happen if the delay is in the range of 200-300 ms or more. However, it may also be important to synchronize the audio content with the video content during play-out of the same. For example, if lip movement as being part of the video content is clearly out of synchronization with corresponding speech of the audio content, the quality of the communication is considered as degraded by the user. However, such synchronization requirements are usually conflicting because video processing and audio processing have different delays; video processing commonly has a higher delay than audio processing. Known mechanisms work in one of a few different ways. According to one proposal, play-out of audio content and video content is not synchronized for the duration of the multimedia stream. This does not solve the aforementioned issues relating to synchronization. According to one proposal, play-out of audio content and video content is synchronized for the duration of the multimedia stream. This does not solve the aforementioned issues relating to audio latency. According to one proposal, play-out of audio content and video content is synchronized if the delay (transport, camera-to-screen, round-trip; different measuring points may be used) of the video content is below a threshold, and otherwise not synchronized so as to limit the audio delay. However, there is still a need for an improved handling of audio and video synchronization.
SUMMARY
An object of embodiments herein is to provide improved handling of audio and video synchronization. The inventors of the enclosed embodiments have realized that existing mechanisms do not take into account that often the video content of the multimedia stream is actually not displayed, or is displayed on such a small screen surface that a user may have difficulty distinguishing certain features of the displayed visual multimedia components. The inventors of the enclosed embodiments have realized that this fact could be utilized during handling of audio and video synchronization.
A particular object is therefore to provide improved handling of audio and video synchronization based on at least video content of the multimedia stream. According to a first aspect there is presented a method for handling audio and video synchronization. The method comprises acquiring a multimedia stream, the multimedia stream comprising synchronized audio and video content. The method comprises determining whether at least video content of the multimedia stream is rendered or not. The method comprises, if determined to be rendered, enabling synchronization of audio and video content during play-out of the multimedia stream. The method comprises, if determined not to be rendered, enabling synchronization of the audio and video content to be disabled during play-out of the multimedia stream. The method comprises, if determined not to be rendered, removing possible time delays from the audio content before play-out of the audio content.
Advantageously this enables improved handling of audio and video synchronization.
Advantageously this enables improved handling of audio and video synchronization, particularly in the context of multimedia communication.
Advantageously this enables improved perceived communication quality for the user since the audio delay - which is one important factor for how the user experiences a communication session - is minimized.
According to one embodiment the method is performed by a user equipment. The method may then comprise determining whether at least video content of the multimedia stream is rendered or not by the user equipment.
According to a second aspect there is presented a user equipment for handling audio and video synchronization. The user equipment comprises a processing unit. The processing unit is arranged to acquire a multimedia stream, the multimedia stream comprising synchronized audio and video content. The processing unit is arranged to determine whether at least video content of the multimedia stream is rendered or not by the user equipment. The processing unit is arranged to, if determined to be rendered, enable synchronization of audio and video content during play-out of the multimedia stream. The processing unit is arranged to, if determined not to be rendered, enable synchronization of the audio and video content to be disabled during play-out of the multimedia stream. The processing unit is arranged to, if determined not to be rendered, remove possible time delays from the audio content before play-out of the audio content.
According to a third aspect there is presented a vehicle. The vehicle comprises a user equipment according to the second aspect.
According to a fourth aspect there is presented a computer program for handling audio and video synchronization, the computer program comprising computer program code which, when run on a user equipment, causes the user equipment to perform a method according to the first aspect.
According to a fifth aspect there is presented a computer program product comprising a computer program according to the fourth aspect and a computer readable means on which the computer program is stored.
It is to be noted that any feature of the first, second, third, fourth and fifth aspects may be applied to any other aspect, wherever appropriate. Likewise, any advantage of the first aspect may equally apply to the second, third, fourth, and/or fifth aspect, respectively, and vice versa. Other objectives, features and advantages of the enclosed embodiments will be apparent from the following detailed disclosure, from the attached dependent claims as well as from the drawings.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the element, apparatus, component, means, step, etc." are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, step, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.
BRIEF DESCRIPTION OF THE DRAWINGS
The inventive concept is now described, by way of example, with reference to the accompanying drawings, in which:
Fig 1 is a schematic diagram illustrating a communications system according to embodiments;
Fig 2 is a schematic diagram showing functional modules of a user equipment according to an embodiment;
Fig 3 is a schematic diagram showing functional units of a user equipment according to an embodiment;
Fig 4 shows one example of a computer program product comprising computer readable means according to an embodiment;
Figs 5 and 6 are flowcharts of methods according to embodiments;
Fig 7 schematically illustrates a determining unit according to an embodiment;
Fig 8 schematically illustrates a user equipment according to an embodiment; and
Fig 9 schematically illustrates a vehicle according to an embodiment.
DETAILED DESCRIPTION
The inventive concept will now be described more fully hereinafter with reference to the accompanying drawings, in which certain embodiments of the inventive concept are shown. This inventive concept may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and will fully convey the scope of the inventive concept to those skilled in the art. Like numbers refer to like elements throughout the description.
Figure 1 shows a schematic overview of an exemplifying communications system 31 where embodiments presented herein can be applied. The wireless communications system 31 comprises a network node (NN) 32 providing network coverage over cells (not shown). A user equipment (UE) 11a positioned in a particular cell is thus provided network service by the network node 32 serving that particular cell. As the skilled person understands, the communications system 31 may comprise a plurality of network nodes 32 and a plurality of UEs 11a operatively connected to at least one of the plurality of network nodes 32. The network node 32 is operatively connected to a core network 33. The core network 33 may provide services and data to the UE 11a operatively connected to the network node 32 from an external Internet Protocol (IP) packet switched data network 34. At least parts of the communications system 31 may generally comply with any one or a combination of W-CDMA (Wideband Code Division Multiplex), LTE (Long Term Evolution), EDGE (Enhanced Data Rates for GSM Evolution, Enhanced GPRS (General Packet Radio Service)), CDMA2000 (Code Division Multiple Access 2000), WiFi, microwave radio links, etc., as long as the principles described hereinafter are applicable. A user equipment (UE) 11b may further have a wired connection to the external IP packet switched data network 34. Examples of UEs 11a, 11b include, but are not limited to, mobile phones, tablet computers, laptop computers, and stationary computers. In general terms, a UE 11a, 11b as herein disclosed may have either a wireless connection, or a wired connection, or both a wireless connection and a wired connection to the IP packet switched network 34. Hence the communications system 31 may comprise any combinations of purely wirelessly connected UEs, purely wired connected UEs, and UEs with both wireless and wired connections.
One example of services and data which may be communicated through the communications system 31 is multimedia communications. In multimedia communications, multimedia streams are communicated between two UEs (such as from UE 11a to UE 11b, or vice versa) or between a server of the IP network 34 and at least one UE 11a, 11b (such as from the server to at least one UE 11a, 11b, or from at least one UE 11a, 11b to the server). The multimedia stream may comprise payload data in the form of audio and video content. The audio and video content may be synchronized. As the skilled person understands, the multimedia streams may comprise further payload data. The embodiments disclosed herein relate to handling audio and video synchronization. In order to handle audio and video synchronization there is provided a user equipment, a method performed by the user equipment, and a computer program comprising code, for example in the form of a computer program product, that when run on a user equipment, causes the user equipment to perform the method.
Fig 2 schematically illustrates, in terms of a number of functional modules, the components of a user equipment 11a, 11b according to an embodiment. A processing unit 12 is provided using any combination of one or more of a suitable central processing unit (CPU), multiprocessor, microcontroller, digital signal processor (DSP), application specific integrated circuit (ASIC), field programmable gate array (FPGA), etc., capable of executing software instructions stored in a computer program product 41 (as in Fig 4), e.g. in the form of a (non-volatile) storage medium 14. Thus the processing unit 12 is thereby arranged to execute methods as herein disclosed. The storage medium 14 may also comprise persistent storage, which, for example, can be any single one or combination of magnetic memory, optical memory, solid state memory or even remotely mounted memory.
The user equipment 11a, 11b may further comprise a communications interface 13 for receiving and providing information to a user, and/or for receiving and providing information to other devices. The communications interface 13 may thus comprise a user interface, one or more transmitters and receivers, comprising analogue and digital components and a suitable number of antennae for radio communication. The user equipment 11a, 11b may further comprise a multimedia player 15 arranged to render a multimedia stream comprising at least video and audio content. Thus the multimedia player 15 may comprise a suitable number of video and audio codecs. The processing unit 12 controls the general operation of the user equipment 11a, 11b, e.g. by sending control signals to the storage medium 14, the communications interface 13 and the multimedia player 15, and receiving reports and data from the storage medium 14, the communications interface 13 and the multimedia player 15. Other components, as well as the related functionality, of the user equipment 11a, 11b are omitted in order not to obscure the concepts presented herein.
Fig 3 schematically illustrates, in terms of a number of functional units, the components of a user equipment 11a, 11b according to an embodiment. The user equipment 11a, 11b of Fig 3 comprises a number of functional units: an acquiring unit 21a, a determining unit 21b, an enabling synchronization unit 21c, and a removing unit 21d. The user equipment 11a, 11b of Fig 3 may further comprise a number of optional functional units, such as any of a face detecting unit 21e, a disabling synchronization unit 21f, a detecting unit 21g, a switching unit 21h, a receiving unit 21j, and an overriding unit 21k. The functionality of each functional unit 21a-k will be further disclosed below in the context of which the functional units may be used. In general terms, each functional unit 21a-k may be implemented in hardware or in software. The processing unit 12 may thus be arranged to fetch, from the storage medium 14, instructions as provided by a functional unit 21a-k and to execute these instructions, thereby performing any steps as will be disclosed hereinafter.
The UE 11a, 11b may be provided as a standalone device or as a part of a further device. For example, the UE 11a, 11b may be provided in a vehicle 91. Fig 9 illustrates a vehicle 91 comprising at least one UE 11a, 11b as herein disclosed. The UE 11a, 11b may be provided as an integral part of the vehicle 91. That is, the components of the UE 11a, 11b may be integrated with other components of the vehicle 91; some components of the vehicle 91 and the UE 11a, 11b may be shared. For example, if the vehicle 91 as such comprises a processing unit, this processing unit may be arranged to perform the actions of the processing unit 12 associated with the UE 11a, 11b. Alternatively the UE 11a, 11b may be provided as a separate unit in the vehicle 91. The vehicle 91 may be a vehicle for land transportation, such as a car, a truck, a motorcycle, or the like; a vehicle for water transportation, such as a boat, a ship, a vessel, or a submarine, or the like; or a vehicle for aerial transportation, such as an aeroplane, a helicopter, or the like.
Figs 5 and 6 are flow charts illustrating embodiments of methods for handling audio and video synchronization. The methods are performed by the user equipment 11a, 11b. The methods are advantageously provided as computer programs 42. Fig 4 shows one example of a computer program product 41 comprising computer readable means 43. On this computer readable means 43, a computer program 42 can be stored, which computer program 42 can cause the processing unit 12 and thereto operatively coupled entities and devices, such as the storage medium 14, the communications interface 13, and/or the multimedia player 15, to execute methods according to embodiments described herein. The computer program 42 and/or computer program product 41 may thus provide means for performing any steps as herein disclosed. In the example of Fig 4, the computer program product 41 is illustrated as an optical disc, such as a CD (compact disc) or a DVD (digital versatile disc) or a Blu-Ray disc. The computer program product 41 could also be embodied as a memory, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or an electrically erasable programmable read-only memory (EEPROM), and more particularly as a non-volatile storage medium of a device in an external memory such as a USB (Universal Serial Bus) memory. Thus, while the computer program 42 is here schematically shown as a track on the depicted optical disc, the computer program 42 can be stored in any way which is suitable for the computer program product 41.
Returning now to Fig 1, the enclosed embodiments relate to a UE 11a, 11b that is arranged to play out audio and video content, such as audiovisual conversational data, of a multimedia stream to a user of the UE 11a, 11b.
The multimedia stream may originate from a peer UE 11a, 11b with which a user interacts to record audio and video content in a multimedia stream by speaking into a microphone and where the video content is captured by a video camera of the UE 11a, 11b recording a video signal. The audio signal captured by the microphone and the video signal captured by the camera are transformed into at least one multimedia stream and sent over the communications system 31. The processes for transforming audio and video signals into at least one multimedia stream and for sending the at least one multimedia stream over the communications system 31 are as such known in the art and further description thereof is therefore omitted. The enclosed embodiments for handling audio and video synchronization are based on whether at least video content of a multimedia stream as acquired by the UE 11a, 11b is rendered by the UE 11a, 11b or not.
A method for handling audio and video synchronization as performed by a user equipment 11a, 11b will now be disclosed. The processing unit 12 of the UE 11a, 11b is arranged to, in a step S102, acquire a multimedia stream. The acquiring may be performed by executing functionality of the acquiring unit 21a. The computer program 42 and/or computer program product 41 may thus provide means for this acquiring. The multimedia stream comprises synchronized audio and video content. The fact that the rendering application (as for example being part of the multimedia player 15) in the UE 11a, 11b knows whether or not it is actually rendering the video part of the multimedia stream is now utilized. The processing unit 12 of the UE 11a, 11b is arranged to, in a step S106, determine whether at least video content of the multimedia stream is rendered or not by the UE 11a, 11b. The determining may be performed by executing functionality of the determining unit 21b. The computer program 42 and/or computer program product 41 may thus provide means for this determining. Hence, the decision from this determination is that either at least video content of the multimedia stream "is rendered", or at least video content of the multimedia stream "is not rendered".
If at least the video content of the multimedia stream is rendered, then audio and video content should be played out in synchronization. Particularly, the processing unit 12 of the UE 11a, 11b is arranged to, in a step S108a, enable synchronization of audio and video content during play-out of the multimedia stream. The enabling may be performed by executing functionality of the enabling unit 21c. The computer program 42 and/or computer program product 41 may thus provide means for this enabling.
If at least the video content of the multimedia stream is not rendered, then audio and video content should not be played out in synchronization. Particularly, the processing unit 12 of the UE 11a, 11b is arranged to, in a step S108b, enable synchronization of the audio and video content to be disabled during play-out of the multimedia stream. The enabling may be performed by executing functionality of the enabling unit 21c. The computer program 42 and/or computer program product 41 may thus provide means for this enabling. This is particularly advantageous if the video content has longer delay than the audio content. Removal of the condition to enable synchronization allows the audio content to be played out with minimum delay. Particularly, the processing unit 12 of the UE 11a, 11b is arranged to, in a step S110, remove possible time delays from the audio content before play-out of the audio content. Any intended audio delay (as inserted e.g. for enabling synchronization with the video content) may thereby be removed so as to minimize the audio delay. The removing may be performed by executing functionality of the removing unit 21d. The computer program 42 and/or computer program product 41 may thus provide means for this removing.
As noted above, in step S108b synchronization of the audio and video content is enabled to be disabled during play-out of the multimedia stream. The synchronization of the audio and video content may even be disabled during play-out of the multimedia stream. According to an embodiment the processing unit 12 of the UE 11a, 11b is therefore arranged to, in an optional step S108c, disable synchronization of the audio and video content during play-out of the multimedia stream in case the at least video content of the multimedia stream is determined as not rendered. The disabling may be performed by executing functionality of the disabling unit 21f. The computer program 42 and/or computer program product 41 may thus provide means for this disabling. Embodiments and considerations relating to further details of handling of audio and video synchronization will now be disclosed.
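Before turning to those details, the following minimal sketch pulls steps S102-S110 and the optional step S108c together. Python is used purely for illustration; the class, attribute and method names are assumptions, as the embodiments define decision logic rather than any concrete programming interface.

    # Hypothetical names throughout; only the decision logic mirrors
    # steps S106, S108a, S108b/S108c and S110 disclosed above.
    INTENTIONAL_AUDIO_DELAY_MS = 120  # example delay inserted for lip sync

    class Player:
        def __init__(self, video_rendered: bool):
            self.video_rendered = video_rendered      # outcome of step S106
            self.av_sync_enabled = True
            self.audio_delay_ms = INTENTIONAL_AUDIO_DELAY_MS

        def handle_playout(self) -> None:
            if self.video_rendered:
                # S108a: video is rendered, so play out audio and video in sync.
                self.av_sync_enabled = True
            else:
                # S108b/S108c: video is not rendered; synchronization is
                # enabled to be disabled (or disabled) during play-out ...
                self.av_sync_enabled = False
                # S110: ... and any delay inserted into the audio path only
                # to match the slower video path is removed before play-out.
                self.audio_delay_ms = 0

    player = Player(video_rendered=False)
    player.handle_playout()
    print(player.av_sync_enabled, player.audio_delay_ms)  # False 0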
There may be different ways to determine whether at least the video content of the multimedia stream is rendered or not by the UE 11a, 11b, as in step S106. Different embodiments relating thereto will now be described in turn with reference to Fig 8. Fig 8 schematically illustrates an exemplary exterior of a UE 11a, 11b according to an embodiment. The exterior of the UE 11a, 11b comprises a display screen 82 for displaying visual multimedia components, such as video content, images, text, graphics, etc. The exterior of the UE 11a, 11b comprises a microphone 83 for recording audio. The exterior of the UE 11a, 11b comprises a loudspeaker 84 for playing out audio. On the display screen 82 a first visual multimedia component 85, a second visual multimedia component 86, and a third visual multimedia component 87 are illustrated. As the skilled person understands, these are just examples of visual multimedia components which may be displayed by the display screen 82. On the first visual multimedia component 85 a face is schematically identified at reference numeral 88. It is for illustrative purposes assumed that the first visual multimedia component 85 represents a foreground object, that the second visual multimedia component 86 represents a background object, and that the third visual multimedia component 87 represents a thumbnail object. It is for illustrative purposes assumed that each visual multimedia component represents video content of a multimedia stream. D denotes the diameter of the display 82.
According to a first general embodiment the rendering condition is based on at least one property of the display screen 82 on which the video content of the multimedia stream is displayed. Often the video content of the multimedia stream is actually not displayed, or is displayed on such a small screen surface that a user may have difficulty distinguishing certain features of the displayed visual multimedia components 85, 86, 87. In general terms the multimedia stream may be determined as rendered in a case the video content is displayed on a reasonably large display, on a reasonably large part of the display, and/or with a reasonably high resolution. A reasonably large display may be a display with diameter D equal to or larger than 3 inches. A reasonably large part of the display may be defined by a relation between the length l of the visual multimedia component and the total length L of the display, where l/L ≥ 0.5. A reasonably high resolution may be defined as at least 150 pixels per inch (PPI), preferably at least 200 PPI, and most preferably at least 300 PPI.
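As a hedged illustration, these thresholds translate directly into code. The function below is a sketch only: its names are invented here, and requiring all three conditions simultaneously is just one reading of the "and/or" combination in the text.

    def video_counts_as_rendered(display_diameter_inch: float,
                                 component_length: float,
                                 display_length: float,
                                 pixels_per_inch: float) -> bool:
        large_display = display_diameter_inch >= 3.0             # D >= 3 inches
        large_part = (component_length / display_length) >= 0.5  # l/L >= 0.5
        high_resolution = pixels_per_inch >= 150.0               # >= 150 PPI
        # One possible combination; the text also permits "or" combinations.
        return large_display and large_part and high_resolution

    # A 5-inch display showing the video over 60 % of its length at 300 PPI:
    print(video_counts_as_rendered(5.0, 0.6, 1.0, 300.0))  # True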
According to a second general embodiment the rendering condition is based on how the video content of the multimedia stream is displayed. According to one embodiment the multimedia stream is determined as rendered in a case the video content is displayed in a full screen mode. For example, the video content could be displayed in a communications application that is in the foreground on the display screen 82, such as the first visual multimedia component 85. According to one embodiment the multimedia stream is determined as rendered in a case the video content is displayed as a foreground object. Conversely, according to one embodiment the multimedia stream is determined as not rendered in a case the video content is displayed in a reduced screen mode, such as a thumbnail object, minimized, not as a foreground object, or hidden. For example, the video content could be displayed in a tab of a browser other than the tab the user presently has in the foreground; the browser may be a web browser. For example, the video content could be displayed in a communications application that is not in the foreground on the display screen 82, such as the second visual multimedia component 86. For example, the video content could have been minimized on the display screen 82, such as the third visual multimedia component 87.
According to a third general embodiment the rendering condition is based on at least one property of the video content itself. One example of such a property considers whether or not a human face is present in the currently displayed video content. Therefore, according to an embodiment the processing unit 12 of the UE 11a, 11b is arranged to, in an optional step S104, perform face detection to detect if a human face is present in currently displayed video content. The face detection may be performed by executing functionality of the face detection unit 21e. The computer program 42 and/or computer program product 41 may thus provide means for this face detection. The processing unit 12 of the UE 11a, 11b may then be arranged to, in an optional step S106a, determine the multimedia stream as rendered as long as the human face is present in the currently displayed video content. The determining may be performed by executing functionality of the determining unit 21b. The computer program 42 and/or computer program product 41 may thus provide means for this determining. As the skilled person understands, there are further examples of such properties than human face detection. One such further example could consider whether objects of the video content have movements on the display screen, or other appearance properties, that are correlated with the audio content of the multimedia stream.
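By way of example only, steps S104 and S106a could be realized with an off-the-shelf detector such as OpenCV's Haar cascade; the embodiments do not prescribe any particular face-detection algorithm, so the sketch below is one assumption among many.

    import cv2  # OpenCV, used here as one possible face detector

    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def rendered_while_face_present(frame) -> bool:
        # S104: detect whether a human face is present in the displayed frame.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                              minNeighbors=5)
        # S106a: treat the stream as rendered as long as a face is present.
        return len(faces) > 0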
According to a fourth general embodiment the rendering condition is based on time delay properties of the audio and video content. According to one embodiment, determining whether the multimedia stream is rendered or not is based on a time delay of the video content. The multimedia stream may then be determined as not rendered in a case the time delay is longer than a predetermined threshold value. The predetermined threshold value may be in the range 100-400 ms, preferably in the range 100-300 ms, and more preferably in the range 100-150 ms. Additionally or alternatively, any (non-intentional) delay of the audio content may also be considered.
The first general embodiment, the second general embodiment, the third general embodiment, and the fourth general embodiment may be readily combined. Thereby a multitude of rendering conditions may be considered when determining whether at least video content of the multimedia stream is rendered or not by the UE 11a, 11b, as in step S106. Fig 7 schematically illustrates a determining unit 21b according to an embodiment. The determining unit 21b is configured to receive input 72 representing rendering conditions, such as rendering conditions according to any of the first general embodiment, the second general embodiment, the third general embodiment, and the fourth general embodiment. Hence the rendering conditions may be received from any of the communications interface 13, the multimedia player 15, and the face detection unit 21e. The determining unit 21b is configured to, based on the received input, determine whether at least the video content of the multimedia stream is rendered or not by the UE 11a, 11b, as in step S106. The determining unit 21b is configured to provide output 73 representing the decision of the determination step (i.e. "is rendered" or "is not rendered"). The decision may be provided to any of the multimedia player 15, the enabling synchronization unit 21c, the disabling synchronization unit 21f, and the removing unit 21d.
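A combined check feeding the determining unit 21b might then look as follows. How the individual conditions are weighted is an assumption, since the text only states that the general embodiments may be combined.

    VIDEO_DELAY_THRESHOLD_MS = 150  # within the preferred 100-150 ms range

    def is_rendered(display_ok: bool, displayed_in_foreground: bool,
                    face_present: bool, video_delay_ms: float) -> bool:
        # Fourth general embodiment: an excessive video delay alone yields
        # the decision "is not rendered".
        if video_delay_ms > VIDEO_DELAY_THRESHOLD_MS:
            return False
        # First to third general embodiments, combined here by conjunction.
        return display_ok and displayed_in_foreground and face_present

    print(is_rendered(True, True, True, 120.0))  # True  -> "is rendered"
    print(is_rendered(True, True, True, 400.0))  # False -> "is not rendered"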
There may be different ways to determine when to switch from enabling the synchronization to enabling the synchronization to be disabled, or vice versa. Different considerations relating thereto will now be described. In general terms, pauses, such as speech pauses, in the audio content may be used to switch between synchronizing and not synchronizing the audio content with the video content, thereby enabling a smooth transition between synchronizing and not synchronizing the audio content with the video content. According to an embodiment the processing unit 12 of the UE 11a, 11b is therefore arranged to, in an optional step S112, detect a speech pause in the audio content. The detecting may be performed by executing functionality of the detecting unit 21g. The computer program 42 and/or computer program product 41 may thus provide means for this detecting. The processing unit 12 of the UE 11a, 11b may then be arranged to, in an optional step S114, switch from enabling the synchronization to enabling the synchronization to be disabled, or vice versa, only after the speech pause has been detected. The switching may be performed by executing functionality of the switching unit 21h. The computer program 42 and/or computer program product 41 may thus provide means for this switching. Hence, although determined not to be rendered, the synchronization of the audio content with the video content may remain until a pause has been detected in the audio content. A sketch of such pause-gated switching is given below, after the transport considerations.
There may be different ways to transport the multimedia stream to the UE 11a, 11b. Different considerations relating thereto will now be described. As noted above, the UE 11a, 11b may either have a wireless connection to the provider of the multimedia stream (as is the case for UE 11a), a wired connection to the provider of the multimedia stream (as is the case for UE 11b), or any combination thereof. In general terms, commonly the RTP (real-time transport protocol) protocol is used to transport the multimedia stream over the network 34. However, the embodiments disclosed herein are not limited to the use of the RTP protocol. In general terms, the multimedia stream may be transported by any communications protocol that allows the audio and video to be played out in synchronization. The RTP protocol has such ability. According to one embodiment the multimedia stream is thus transported using a protocol comprising synchronization information enabling playing out audio and video content in synchronization.
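As flagged above, the pause-gated switching of steps S112 and S114 can be sketched as follows. The simple energy-threshold pause detector stands in for the detecting unit 21g and is an assumption; any voice activity detector would do, and the player object is assumed to expose the av_sync_enabled flag from the earlier sketch.

    PAUSE_ENERGY_THRESHOLD = 1e-4  # illustrative threshold on mean energy

    def is_speech_pause(samples) -> bool:
        # S112: a frame whose mean energy falls below the threshold is
        # treated as a speech pause.
        energy = sum(s * s for s in samples) / max(len(samples), 1)
        return energy < PAUSE_ENERGY_THRESHOLD

    def maybe_switch_sync(player, target_sync_enabled: bool,
                          audio_frame) -> None:
        # S114: only flip between "sync enabled" and "sync disabled" once a
        # speech pause has been detected, giving a smooth transition.
        if (player.av_sync_enabled != target_sync_enabled
                and is_speech_pause(audio_frame)):
            player.av_sync_enabled = target_sync_enabled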
There may be different ways to play out the audio and video content by the UE 11a, 11b. Different considerations relating thereto will now be described. As noted above, the UE 11a, 11b may comprise a multimedia player 15 for playing out audio and video content. The multimedia player 15 may be configured to freely select if the audio and video content is to be played out in synchronization or if the audio and video content is to be played out as received by the multimedia player 15 without any synchronization being applied - assuming that synchronization of the audio and video content has been enabled to be disabled during play-out of the multimedia stream, as in step S108b, or even has been disabled, as in step S108c. Further, the communications interface 13 of the UE 11a, 11b may be configured to receive user input relating to properties of how audio and video content of the multimedia stream is going to be played out. Hence the UE 11a, 11b may be configured to allow the user of the UE 11a, 11b to determine how the audio and video content should be played out by the multimedia player 15. How a user may control a multimedia player 15 is as such known in the art and further description is therefore omitted. Particularly, as disclosed herein, the user may control whether or not synchronization between played out audio and video content should be enabled. It may be that the user desires the synchronization to be maintained although the video content is not rendered, or the synchronization to be (enabled to be) disabled although the video content is rendered. Particularly, according to an embodiment the processing unit 12 of the UE 11a, 11b is arranged to, in an optional step S116, receive user input relating to enabling and disabling of the synchronization. The receiving may be performed by executing functionality of the receiving unit 21j. The computer program 42 and/or computer program product 41 may thus provide means for this receiving. The processing unit 12 of the UE 11a, 11b may then be arranged to, in an optional step S118, override the step of determining whether the multimedia stream is rendered or not (as in step S106) according to the received user input. The overriding may be performed by executing functionality of the overriding unit 21k. The computer program 42 and/or computer program product 41 may thus provide means for this overriding.
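Steps S116 and S118 can be sketched as a simple override of the automatic decision. The three-state user_override value below is an illustrative design choice, not something the text mandates.

    from typing import Optional

    def effective_sync(auto_rendered: bool,
                       user_override: Optional[str]) -> bool:
        # S118: explicit user input overrides the automatic S106 decision ...
        if user_override == "force_sync":
            return True
        if user_override == "force_no_sync":
            return False
        # ... otherwise fall back to the automatic determination.
        return auto_rendered

    print(effective_sync(auto_rendered=False, user_override="force_sync"))  # True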
In summary, a mechanism for handling audio and video synchronization may involve, in a case it has been determined that video content is rendered: enabling synchronization between played out audio and video content (unless synchronization should be disabled for another reason (e.g., timing reasons; the audio delay or video delay may as such be too large)), and in a case it has been determined that video content is not rendered: enabling synchronization between played out audio and video content to be disabled. As all processing according to the herein disclosed steps is performed by the UE 11a, 11b itself, no device-external signaling is needed.
The inventive concept has mainly been described above with reference to a few embodiments. However, as is readily appreciated by a person skilled in the art, other embodiments than the ones disclosed above are equally possible within the scope of the inventive concept, as defined by the appended patent claims.

Claims

CLAIMS
1. A method for handling audio and video synchronization, comprising the steps of:
acquiring (S102) a multimedia stream, the multimedia stream
comprising synchronized audio and video content;
determining (S106) whether at least video content of the multimedia stream is rendered or not;
and if rendered:
enabling (S108a) synchronization of audio and video content during play-out of the multimedia stream;
and if not rendered:
enabling (S108b) synchronization of said audio and video content to be disabled during play-out of the multimedia stream; and
removing (S110) possible time delays from said audio content before play-out of said audio content.
2. The method according to claim 1, further comprising:
disabling (S108c) synchronization of said audio and video content during play-out of the multimedia stream in case the at least video content of the multimedia stream is determined as not rendered.
3. The method according to claim 1 or 2, wherein the multimedia stream is determined as rendered in a case the video content is displayed in a full screen mode.
4. The method according to any of the preceding claims, wherein the multimedia stream is determined as rendered in a case the video content is displayed as a foreground object.
5. The method according to any of the preceding claims, wherein the multimedia stream is determined as not rendered in a case the video content is displayed in a reduced screen mode, such as a thumbnail object, minimized, not as a foreground object, or hidden.
6. The method according to any of the preceding claims, further comprising:
detecting (S112) a speech pause in said audio content; and
switching (S114) from enabling said synchronization to enabling said synchronization to be disabled, or vice versa, only after said speech pause has been detected.
7. The method according to any of the preceding claims, further comprising:
performing (S104) face detection to detect if a human face is present in currently displayed video content; and
determining (S106a) the multimedia stream as rendered as long as said human face is present in said currently displayed video content.
8. The method according to any of the preceding claims, wherein the multimedia stream is transported using a protocol comprising synchronization information enabling playing out audio and video content in synchronization.
9. The method according to any of the preceding claims, further comprising:
receiving (S116) user input relating to enabling and disabling of said synchronization, and
overriding (S118) the step of determining whether the multimedia stream is rendered or not according to the received user input.
10. The method according to any of the preceding claims, wherein determining whether the multimedia stream is rendered or not is further based on a time delay of said video content, and wherein the multimedia stream is determined as not rendered in a case said time delay is longer than a predetermined threshold value.
11. The method according to claim 10, wherein the predetermined threshold value is in the range 100-400 ms, preferably 100-300 ms, more preferably 100-150 ms.
12. A user equipment (11a, 11b) for handling audio and video synchronization, the user equipment comprising a processing unit (12) arranged to:
acquire a multimedia stream, the multimedia stream comprising synchronized audio and video content;
determine whether at least video content of the multimedia stream is rendered or not by the user equipment;
and if rendered:
enable synchronization of audio and video content during play-out of the multimedia stream;
and if not rendered:
enable synchronization of said audio and video content to be disabled during play-out of the multimedia stream; and
remove possible time delays from said audio content before play-out of said audio content.
13. The user equipment (11a, 11b) according to claim 12, wherein the processing unit (12) is further arranged to disable synchronization of said audio and video content during play-out of the multimedia stream in case the at least video content of the multimedia stream is determined as not rendered.
14. The user equipment (11a, 11b) according to claim 12, wherein the processing unit (12) is further arranged to detect a speech pause in said audio content; and to switch from enabling said synchronization to enabling said synchronization to be disabled, or vice versa, only after said speech pause has been detected.
15. The user equipment (11a, 11b) according to claim 12, wherein the processing unit (12) is further arranged to perform face detection to detect if a human face is present in currently displayed video content; and to determine the multimedia stream as rendered as long as said human face is present in said currently displayed video content.
16. The user equipment (11a, 11b) according to claim 12, wherein the processing unit (12) is further arranged to receive user input relating to enabling and disabling of said synchronization, and to override the determination whether the multimedia stream is rendered or not according to the received user input.
17. A vehicle (91) comprising a user equipment (11a, 11b) according to any one of claims 12 to 16.
18. A computer program (42) for handling audio and video synchronization, the computer program comprising computer program code which, when run on a user equipment (11a, 11b), causes the user equipment to:
acquire (S102) a multimedia stream, the multimedia stream comprising synchronized audio and video content;
determine (S106) whether at least video content of the multimedia stream is rendered or not by the user equipment;
and if rendered:
enable (S108a) synchronization of audio and video content during play-out of the multimedia stream;
and if not rendered:
enable (S108b) synchronization of said audio and video content to be disabled during play-out of the multimedia stream; and
remove (S110) possible time delays from said audio content before play-out of said audio content.
19. A computer program product (41) comprising a computer program (42) according to claim 18, and computer readable means (43) on which the computer program is stored.
PCT/SE2013/050863 2013-07-04 2013-07-04 Audio and video synchronization WO2015002586A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/SE2013/050863 WO2015002586A1 (en) 2013-07-04 2013-07-04 Audio and video synchronization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/SE2013/050863 WO2015002586A1 (en) 2013-07-04 2013-07-04 Audio and video synchronization

Publications (1)

Publication Number Publication Date
WO2015002586A1 true WO2015002586A1 (en) 2015-01-08

Family

ID=48856919

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE2013/050863 WO2015002586A1 (en) 2013-07-04 2013-07-04 Audio and video synchronization

Country Status (1)

Country Link
WO (1) WO2015002586A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050237378A1 (en) * 2004-04-27 2005-10-27 Rodman Jeffrey C Method and apparatus for inserting variable audio delay to minimize latency in video conferencing
WO2007023378A2 (en) * 2005-08-26 2007-03-01 Nokia Corporation Method for signaling a device to perform no synchronization or include a syncronization delay on multimedia streams
WO2007031918A2 (en) * 2005-09-12 2007-03-22 Nxp B.V. Method of receiving a multimedia signal comprising audio and video frames
EP1978741A1 (en) * 2007-04-03 2008-10-08 Samsung Electronics Co., Ltd. Video data display system and method for mobile terminal
WO2010068151A1 (en) * 2008-12-08 2010-06-17 Telefonaktiebolaget L M Ericsson (Publ) Device and method for synchronizing received audio data with video data

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107872605A (en) * 2016-09-26 2018-04-03 青柠优视科技(北京)有限公司 A kind of UAS and unmanned plane audio/video processing method
CN109819303A (en) * 2019-03-06 2019-05-28 Oppo广东移动通信有限公司 Data output method and relevant device
CN109819303B (en) * 2019-03-06 2021-04-23 Oppo广东移动通信有限公司 Data output method and related equipment
CN112272327A (en) * 2020-10-26 2021-01-26 腾讯科技(深圳)有限公司 Data processing method, device, storage medium and equipment
CN114554277A (en) * 2020-11-24 2022-05-27 腾讯科技(深圳)有限公司 Multimedia processing method, device, server and computer readable storage medium
CN114554277B (en) * 2020-11-24 2024-02-09 腾讯科技(深圳)有限公司 Multimedia processing method, device, server and computer readable storage medium
CN116230003A (en) * 2023-03-09 2023-06-06 湖北雅派文化传播有限公司 Audio and video synchronization method and system based on artificial intelligence


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13740075

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13740075

Country of ref document: EP

Kind code of ref document: A1