WO2015131934A1

WO2015131934A1 - System and method for live video streaming

Info

Publication number: WO2015131934A1
Application number: PCT/EP2014/054234
Authority: WO
Inventors: Johannes SCHRIEWER; E. Servet MUTLU
Original assignee: 2Kb Beteiligungs Gmbh
Priority date: 2014-03-05
Filing date: 2014-03-05
Publication date: 2015-09-11

Abstract

A system for enabling HTTP live streaming of video films is described. A handheld electronic device, e.g. a smartphone, generates a video film in segments and uploads each media segment to a server when that segment is completed, while a subsequent segment is recorded. The server generates a playlist file listing uploaded segments of a video film and provides the playlist and the segments for downloading.

Description

System and method for live video streaming

The invention relates to a system and a corresponding method for video streaming, in particular for enabling live video streaming wherein a handheld electronic device uploads a segmented video to a server via a wireless link. The server in turn processes the segments of the video and provides the video segments to a plurality of down-streaming or down-loading viewer clients thus enabling live-streaming of video films.

Modern smartphones, so-called tablet computers or similar handheld electronic devices not only are equipped with a camera means for taking still images, but also with video functionality for producing high-quality video clips and video films extending to nearly endless duration. Since smartphones typically are capable of uploading recordings, it has become popular to upload the videos to a network server in order to publish them. A plurality of internet services is known that allow uploading of complete videos. Once the file containing the video is uploaded it is accessible by a plurality of users, wherein the video may be downloaded either as a stream for immediate viewing or may be downloaded as a file thus enabling storing of the video file for offline viewing.

Furthermore there are services that enable so-called live-streaming of videos. These services aim at providing the video data to a download user for viewing as soon as the data is provided to the service. In one example TV live transmissions can be received via smartphones or handheld devices that are communicatively coupled to the internet using software applications that allow reception of a TV program via a radio connection, i.e. a data connection according to a well-known telecommunications standard such as UMTS or LTE or via wireless LAN. Video cameras deployed for generating the video data such as for TV live broadcasting typically are coupled to a wired transport network exhibiting a fast and reliable upload connection enabling the transmission of the generated video data with a sufficient upload transmission rate. By providing a sufficiently fast upload connection to the network any transmission delay in the upload channel is avoided.

However, smartphones typically are capable of generating video films and uploading these once the video film is completed, i.e. the file containing the video and audio data of the film is stored on a storage device comprised in or communicatively coupled to the handheld electronic device. These consumer devices typically differ from TV live cameras in that they, at least initially, are not configured for live video streaming. In other words a video film cannot be uploaded to a server unless the file is completed. As a consequence live-streaming of a video is precluded by a conventional consumer device.

This problem is solved by a method and corresponding device as described in more detail and with reference to the accompanying figures wherein

Fig. 1 depicts an arrangement for uploading a video from a handheld electronic device to a server;

Fig. 2A depicts a sketch illustrating a first embodiment of functional blocks of the handheld electronic device;

Fig. 2B depicts a sketch illustrating a second embodiment of functional blocks of the handheld electronic device.

Figure 1 depicts an arrangement 100 of a handheld electronic device 110 coupled to a network server 130 via a radio connection 140 provided by a radio access network 160 for uploading video film data. A viewing client 150 is communicatively coupled to server 130 via network 170 and is configured for downloading the video data generated by device 110.

The handheld electronic device is capable of generating segments of a video film, called media segments in the following, wherein a media segment contains video and audio data. The device comprises a camera means adapted for generating video data, i.e. motion picture data. Similarly the device comprises a means for recording audio data, e.g. a microphone means. As described in more detail below, device 110 is capable of generating media segments of a video film.

Furthermore device 110 comprises a radio interface and is capable of uploading data and particularly the media segments via a radio connection 140 to a server 130, wherein the radio connection can be established via WLAN, a cellular telecommunication network, a wide area network WAN, WIMAX protocol or any other suitable radio access system 160, and wherein the cellular communication system can be a system according to the GSM/CSD/HSCSD/GPRS or UMTS or LTE or PDC or any other system allowing wireless data file transmission. Radio access system 160 may comply with any of the above exemplifying standards. Radio access system 160 is communicatively coupled to network system 170 and in particular to a server 130 comprised in network system 170. The network system may comprise a plurality of networks, i.e. sub-networks or sub-nets. In one embodiment network system 170 may comprise the internet and a smaller network, e.g. the network of a service provider, wherein said smaller network comprises server 130.

Server 130 is configured and adapted for receiving, storing and enabling download of files, particularly video films or segmented video films, i.e. media segments, including additional files associated with the film segments. Server 130 furthermore may be adapted for processing the uploaded files, i.e. the media segments. In one embodiment the video film may be provided for downloading according to the "HTTP Live Streaming" media communications protocol as specified by APPLE.

As indicated in figure 1, a user may generate a film of an arbitrary object using handheld electronic device 110, which in one embodiment may be a smartphone comprising a camera means capable of generating motion picture data and audio data. The device further comprises a control means, e.g. a software application, enabling the method steps and control of the smartphone, particularly of the camera means, the microphone means and the radio interface as described in the following.

When the user starts filming the object using the control means, i.e. the application, the application controls operation of the camera and microphone means and produces the film as a temporal sequence of segments of video data and audio data, i.e. media segments. Each segment, except for the last one, exhibits a predefined finite duration defined by the number of video and audio frames contained in the media segment. However, other varying durations may be chosen and the described system shall not be limited by the duration of the segments. The duration of a segment typically is set to a value of a few seconds, which in one embodiment may be between one and sixty seconds, preferably between 5 and 10 seconds. A video film that lasts a couple of minutes thus is segmented into a plurality of media segments.

In order to enable nearly live streaming of the recorded motion picture and sound, the data is segmented. When a user starts recording, the video film including audio data is produced in segments, thus producing a sequence of media segments. As soon as a media segment is completed, i.e. a chunk of the video film has been completed as a separate file, the handheld electronic device uploads that segment to server 130, while the next media segment is generated by the handheld electronic device. When said next media segment has reached its predefined duration or the recording has been intentionally stopped by the user, i.e. the file containing said next media segment is completed, it is uploaded to server 130. In this way the entire film is segmented into a sequence of media segments that are uploaded to server 130 as soon as possible. The media server 130, which may actually comprise a plurality of physical machines, processes the uploaded media segments and provides the processed segments of the film and optionally an associated playlist file for download as soon as possible, i.e. after having processed the uploaded media segments to comply with the above mentioned "HTTP Live Streaming" protocol.
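The overlap of recording and uploading described above can be sketched as a producer/consumer pair. This is a minimal illustration only: the function names, the segment labels and the use of threads and a queue are assumptions made for the sketch and are not taken from the described device.

```python
import queue
import threading


def record_segments(count, out_queue):
    """Stand-in for the recorder: hands over each segment once it is complete."""
    for index in range(count):
        segment = f"segment-{index}"  # hypothetical placeholder for a finished file
        out_queue.put(segment)
    out_queue.put(None)  # sentinel: the user stopped recording


def upload_segments(in_queue, uploaded):
    """Stand-in for the uploader: sends every segment as soon as it is available."""
    while True:
        segment = in_queue.get()
        if segment is None:
            break
        uploaded.append(segment)  # a real client would transmit the file here


def run_pipeline(count):
    """Record `count` segments while a second thread uploads completed ones."""
    pending = queue.Queue()
    uploaded = []
    uploader = threading.Thread(target=upload_segments, args=(pending, uploaded))
    uploader.start()
    record_segments(count, pending)
    uploader.join()
    return uploaded
```

Because the queue preserves insertion order, the segments arrive at the server side of the sketch in recording order even though both activities run concurrently.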

Figures 2A and 2B depict schematics of a first and a second embodiment of functional blocks comprised in a handheld electronic device 110 deployed for generating at least one media segment of a video film.

Note that, in addition to the blocks and functions described below, the handheld electronic device may comprise further functional blocks, e.g. a CPU for running an operating system that in turn enables execution of software applications, optionally a digital signal processor for performing particular processing steps such as video and/or audio encoding, means for storing data, and a radio interface for implementing data communication via any arbitrary wireless communication connection to a server.

Figure 2A depicts functional blocks of a first embodiment 200 comprised in a handheld electronic device adapted for generating the video and audio data for at least one media segment.

The operating system of device 110 runs an application 210 that provides a graphical user interface for receiving control input from a user, thus enabling a user to control the application. Application 210 is communicatively coupled to at least a camera means, i.e. a control block 221 controlling a video camera 220, at least one video buffer 222 for buffering raw video data frames, a microphone means comprising a microphone control block 231 for controlling a microphone 230, at least one audio buffer 232 for buffering audio samples, a media segment writer 240 for generating segments of a video film, means for manipulating the generated media segments and means 250 for uploading the media segments via the radio interface of the handheld device. The camera 220 comprised in the device 110 is capable of generating video data, i.e. motion pictures. The camera is communicatively coupled to a video control module 221 that is capable of controlling the camera, i.e. at least controlling start and stop of video generation. Optionally control module 221 may be capable of configuring the pixel resolution of the camera. Furthermore module 221 is capable of receiving the generated video data from camera 220 and of outputting raw video data in frames. In one embodiment the raw video data frames exhibit the highest possible pixel resolution, i.e. the camera means comprising camera 220 and camera control block 221 provides raw video data frames having the maximum pixel resolution of the camera, wherein one video frame is provided in one buffer. Camera 220 and camera control module 221 may be considered as a camera means.

Accordingly, when application 210 controls the camera 220 via video control module 221 a sequence of raw, i.e. uncompressed, video frames of the configured pixel resolution is provided in buffers, wherein in one embodiment each video frame is provided in a separate buffer. In particular the camera means may be configured to the highest pixel resolution allowed by the hardware, thus providing video frames having the highest possible pixel resolution.

The handheld electronic device furthermore comprises a means for recording audio data, e.g. a microphone unit 230, which may be a conventional microphone. Said microphone unit is communicatively coupled to and controlled by audio control module 231 that in turn is controlled by the application 210. Audio control module 231 controls the operation of the microphone, i.e. starts and stops recording and optionally may control the sampling rate of the microphone unit. In one embodiment audio control module 231 may specify a buffer for temporarily storing audio samples, wherein the amount of audio samples per audio buffer may be specified. Microphone 230 and microphone control module 231 may be considered as a microphone means. In one embodiment a buffer size adapted for storing 1024 audio samples may be specified.

Though microphone 230 may be configured to any arbitrary sampling rate, we assume a sampling rate of 48 kHz, thus there are 48000 audio samples per second. Audio control 231 thus outputs audio samples according to the configured sampling rate in buffers 232 having the predefined size, e.g. 1024 samples per buffer. Note that the invention shall be limited neither by this particular sampling rate nor by the specific audio buffer size. The raw video data as buffered in video buffers 222 and the audio samples as buffered in audio buffers 232 are provided to media segment writer block 240. Block 240 may be implemented as a software module taking a number of raw video frames and a number of raw audio samples contained in audio buffers and outputting a media segment containing the provided video and audio frames in a media file in a format as controlled by application module 210. Said media segment typically complies with a standardized video format, e.g. an mp4 container. In other words application 210 controls the settings of block 240 regarding the encoder to be used for encoding the raw video frames, wherein the type of the encoder and furthermore various parameters may be provided to block 240 at the beginning of an encoding session, i.e. prior to starting the process of encoding. In this way the pixel resolution and the encoding format to be used for the output media file may be controlled by application 210. Similarly application 210 may control the parameters for encoding audio frames regarding the encoder type and the audio resolution. In this way the video resolution and audio resolution of the output media file may be lower than that of the provided raw video frames and audio samples.

Since the pixel resolution of the contained video frames and the audio sampling rate of the audio frames contained in the media file as output by the media segment writer have an impact on the file size of the media segment, this feature may be useful for controlling the size of the generated media segment.

The amount of raw video data provided to media segment writer 240 for one media segment corresponds to an integer number of video frames, and the integer number of audio samples provided to the media segment writer corresponds to the number of audio samples contained in an integer number of audio frames. The generated media segments shall contain an integer number of video frames and an integer number of audio frames while at the same time any padding of the media segment shall be avoided. Consequently a media segment advantageously contains a number of video frames and a number of audio frames so that the total duration of the video frames comprised in the segment equals the duration of the audio frames. Since the duration of a video frame differs from that of an audio frame, an amount of video and audio data corresponding to the least common multiple of the video frame duration and the audio frame duration is provided to the media segment writer. In this way the segment writer does not fill in any padding to a media segment.

Note that the number of raw video frames and the number of raw audio samples are provided to media segment writer 240, wherein writer 240 compresses the audio samples to audio frames containing a predefined number of audio samples.

However, one audio frame basically may contain an arbitrary number of audio samples and the invention shall not be limited in this regard.

An audio frame typically is considered as the smallest access unit that is processed in a processing queue, for example at a video decoder.

Subsequently we assume that one audio frame contains 1024 audio samples.

Assuming now a sampling rate of 48 kHz, i.e. 48000 audio samples are generated per second, one audio frame is of duration $T_{AudioFrame}$:

$$T_{AudioFrame} = \frac{1}{48000/\mathrm{s}} \cdot 1024 \approx 0.0213\ \text{seconds}$$

Note that different sampling rates and audio frames containing a different number of audio samples will result in different frame durations.
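The dependence of the audio frame duration on the sampling rate and on the number of samples per frame can be checked with exact rational arithmetic. The following is a small sketch; the function name is an assumption chosen for illustration.

```python
from fractions import Fraction


def audio_frame_duration(samples_per_frame, sample_rate_hz):
    """Duration of one audio frame in seconds, as an exact fraction."""
    return Fraction(samples_per_frame, sample_rate_hz)


# Values assumed in the text: 1024 samples per frame at 48 kHz.
duration_48k = audio_frame_duration(1024, 48000)

# A different sampling rate (44.1 kHz, chosen here only as an example)
# yields a different frame duration for the same frame size.
duration_44k = audio_frame_duration(1024, 44100)
```

Using fractions rather than floating point keeps the later comparison of audio and video durations exact.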

Considering now that one second of video film consists of a fixed number of video frames, i.e. typically 24 video frames yield one second of video film, one video frame is of duration 1/24 of a second.

Media segment writer 240 is controlled by application 210 to generate a media segment based on the provided raw audio samples and the provided raw video frames and to finalize the generated media segment, thus providing a completed file for further processing. As mentioned above, and in order to avoid any padding of a media segment generated by media segment writer block 240 and to enable fast and smooth processing of the video frames and audio samples, the number of video frames and the number of audio frames are chosen such that their borders match in time when replayed. Consequently, when considering a video frame length $T_{VideoFrame} = 1\,\mathrm{s}/24$ and an audio frame length of $T_{AudioFrame} \approx 0.0213\,\mathrm{s}$, the integer number n of provided video frames and the integer number m of audio frames are chosen so that their durations match the smallest common multiple, i.e. $n \cdot T_{VideoFrame} = m \cdot T_{AudioFrame}$, wherein $n, m \in \mathbb{N}$ with $\mathbb{N}$ denoting the natural numbers. In one embodiment this equation leads to n = 64 and m = 125. As a consequence a media segment has a duration of a multiple of $T_{MediaSegment} = 64 \cdot 1\,\mathrm{s}/24 \approx 2.67$ seconds.
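The smallest frame counts n and m for which video and audio borders coincide can be derived by reducing the ratio of the two frame durations to lowest terms. The sketch below assumes an integer video frame rate; the function name and signature are illustrative only.

```python
from fractions import Fraction


def matching_frame_counts(video_fps, samples_per_audio_frame, sample_rate_hz):
    """Return (n, m, duration) with n video frames == m audio frames in time.

    n and m are the smallest positive integers satisfying
    n * T_video = m * T_audio; duration is the common length in seconds.
    """
    t_video = Fraction(1, video_fps)
    t_audio = Fraction(samples_per_audio_frame, sample_rate_hz)
    # n / m = T_audio / T_video; Fraction stores this in lowest terms.
    ratio = t_audio / t_video
    n, m = ratio.numerator, ratio.denominator
    return n, m, n * t_video


n, m, segment_duration = matching_frame_counts(24, 1024, 48000)
```

For 24 video frames per second, 1024 samples per audio frame and 48 kHz, this gives n = 64, m = 125 and a common duration of 8/3 seconds, i.e. roughly 2.67 seconds.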

Application 210 thus may provide the n raw video frames and m audio sample buffers to media segment writer 240, each audio sample buffer containing 1024 raw audio samples, and may control the media segment writer 240 to process the provided video and audio frames according to the configuration set by application 210 prior to triggering the encoding of the video and audio frames. In this way the user controllable application 210 may configure media segment writer 240 regarding the video encoder to deploy for compressing the provided raw video frames, the desired pixel resolution of the video data contained in the generated media segment and other parameters needed or adjustable. In this way application 210 may control the pixel resolution of the video data contained in the media segment.

In one embodiment the video encoder may comply with the H.264 encoding standard. However, other video encoders may be used and the invention shall not be limited in this regard.

Similarly application 210 controls block 240 regarding the audio encoder and the respective parameters for adjusting the audio encoder regarding the audio sampling rate of the audio data contained in the media segment prior to triggering the encoding.

In one embodiment the audio encoder used in block 240 may be the Advanced Audio Coding, AAC, as standardized by ISO and IEC as part of the MPEG-2 and MPEG-4 specifications.

When block 240 has completed the processing of the provided audio and video frames, i.e. a media segment has been completed and the corresponding file can be processed, application 210 is notified, wherein block 240 may issue a notification to application 210 directly or application 210 may be notified by any other suitable means provided for example by the operating system of the handheld device.

As soon as any optional processing performed on the media segment has been completed, application 210 may then control upload module 250 to upload the media segment to a server 130 via a wireless communication interface 260 as described above and comprised in device 110.

While a current media segment is generated by block 240 based on the provided audio samples and video frames and uploaded to server 130, subsequent video frames and audio samples are provided by camera 220 / camera control 221 and microphone 230 / microphone control 231. As soon as the current media segment is completed by media segment writer 240, generation of a subsequent media segment may be started by application 210 based on the subsequent audio samples and video frames. Said subsequent media segment can be provided to upload module 250 as soon as completed. In this way application 210 controls the continuous recording of video and audio data that are uploaded to a server 130 in chunks of media segments that form a seamless video film, wherein a current media segment is uploaded while a subsequent media segment is generated.

Note that more than one instance of media segment writer block 240 may be operated by application 210. This enables application 210 to configure a second instance of media segment writer 240 to produce a media segment having different properties, i.e. generated using a different configuration, than a media segment currently generated by a first instance of media segment writer 240 using a first configuration. Application 210 thus may control the generation of media segments having different properties. Taking into account that a higher pixel resolution of the video frames yields bigger file sizes, application 210 may at least roughly control the size of the media segment files by controlling the configurations of media segment writer 240. Accordingly, if during the process of producing and uploading media segments to server 130 it is found that, due to a reduced upload rate, the upload of a current media segment cannot be completed before a subsequent media segment is completed and ready for upload, application 210 may adapt the configuration of media segment writer 240 to produce media segments of smaller file size.

In this way application 210 may tune the file size of the media segments so that these can be uploaded in time before a subsequent media segment is provided for upload via the radio interface comprised in the handheld device 110, thus keeping the flow of media segments running.
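One way to picture this tuning is a selection among preset writer configurations based on the measured upload rate. The sketch below is an assumption-laden illustration: the configuration names, the bitrates, the safety factor and the selection rule are all hypothetical and are not specified in the description. It relies on the observation that a segment of duration D encoded at bitrate B has a size of about B·D and uploads within D only if B does not exceed the upload rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriterConfig:
    name: str
    bitrate_bps: int  # assumed combined audio and video bitrate


# Hypothetical presets, highest quality first.
LADDER = (
    WriterConfig("1080p", 6_000_000),
    WriterConfig("720p", 3_000_000),
    WriterConfig("480p", 1_200_000),
    WriterConfig("240p", 400_000),
)


def choose_config(upload_rate_bps, safety=0.8, ladder=LADDER):
    """Pick the best preset whose bitrate fits within the usable upload rate."""
    budget = upload_rate_bps * safety  # keep headroom for rate fluctuations
    for config in ladder:
        if config.bitrate_bps <= budget:
            return config
    return ladder[-1]  # nothing fits: fall back to the smallest preset
```

An application could call `choose_config` with the rate observed for the previous upload before configuring the writer instance for the next segment.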

Figure 2B depicts an alternative implementation comprising functional blocks similar to those described with reference to figure 2A. Note that similar functional blocks are denoted by the same reference numerals as used in figure 2A. As described above for the first embodiment, an application is communicatively coupled to a camera control 221 for controlling a camera 220 and a microphone control 231 for controlling microphone 230. Furthermore the application similarly controls media segment writer block 240 and an upload block 250 coupled to radio interface 260.

In contrast to the embodiment described above the camera and/or camera control 220/221 provides a stream of raw video frames instead of buffers, and the microphone 230 and/or microphone control 231 provides a stream of raw audio samples instead of audio samples in buffered audio frames. The stream of video frames and the stream of audio samples are directed to media segment writer 240, wherein said directing can be implemented by conventional means such as socket connections or software pipes.

As described above, media segment writer 240 is controlled and configured by the application to encode the provided audio samples into audio frames and to encode the provided raw video frames into encoded video frames according to the configuration of the media segment writer as set up by application 210. Note that in this embodiment media segment writer 240 provides a first file containing encoded video data only and a second file containing encoded audio data only.

Said first file may be a media file according to a standardized container format, e.g. the above mentioned mp4 format. The contained video data may be encoded by a video encoder as configured by application 210, wherein said configuration may determine the pixel resolution of the video frames. The video encoder thus generates the encoded video frames based on the provided raw video frames, wherein the encoder may reduce the pixel resolution of the video frames relative to the pixel resolution of the raw video frames. In one embodiment the video encoder may comply with the H.264 encoding standard.

The second file may contain audio data only, wherein the audio data may be encoded by an audio encoder complying with the above mentioned Advanced Audio Coding, AAC. The audio encoder may be configured by application 210. The audio encoder thus generates encoded audio samples based on the raw audio samples as provided, wherein the audio encoder optionally may group the audio samples into frames and wherein the audio sampling rate may differ from the audio sampling rate of the provided raw audio samples.

The first and second files as generated by media segment writer 240 thus contain the information of a media segment, said media segment thus comprising two physical files. When media segment writer 240 has completed the generation of the media segment, application 210 controls upload module 250 to upload the media segment, i.e. the two files, via a radio interface 260 comprised in the handheld electronic device to a server 130 within one upload action.

Furthermore, when a first media segment has been completed, the generation of a second media segment may be triggered, wherein the generation of said second media segment may be performed by the media segment writer instance deployed for generating the first media segment. Alternatively, application 210 may control and configure a second instance of the media segment writer, wherein the configuration of the second media segment writer program instance may differ from the configuration provided by application 210 to the first instance. In other words application 210 may control a second program instance of media segment writer 240 having a different configuration. By redirecting the stream of raw video frames as output by the camera means and the stream of raw audio samples as provided by the microphone means to the second instance, application 210 may control the configuration of the media segment writer for each media segment separately. In this way application 210 may determine to generate media segments by media segment writer instances having different configurations, thereby producing media segments having for example different pixel resolutions or different audio sampling rates.

As described above with reference to figure 2A, application 210 may tune the file size of the media segments by deploying media segment writer instances 240 having different configurations. Likewise, application 210 may control upload module 250 to transfer the media segment, i.e. the two files containing the video data and the audio data, via a wireless interface 260 to server 130 while at the same time generating a subsequent media segment, thus uploading chunks of a video film to server 130 as soon as these are completed.

Note that in one optional embodiment application 210 may provide a finite number of arbitrary audio samples, for example silent audio samples, to the media segment writer before providing audio samples generated by microphone means 230. Media segment writer 240, i.e. the audio encoder comprised in media segment writer 240, encodes the audio samples in the sequence as provided, i.e. the arbitrary or silent samples are encoded into the leading audio frames. In this way application 210 may ensure that each media segment comprises a predefined number of arbitrary audio samples at the beginning of the media segment. Taking into account that conventional audio encoders may provide silent audio samples at the beginning of a generated media segment, e.g. the AAC audio encoder at least in some embodiments generates 2112 silent audio samples at the beginning of an MP4 file, application 210 may provide additional silent or arbitrary audio samples in order to fully fill a predetermined number of audio frames with ignorable audio samples in order to simplify processing at the server side. In one embodiment, e.g. if the audio encoder encodes 1024 audio samples into one audio frame and starts an MP4 file with 2112 silent audio samples, application 210 may provide another 960 ignorable audio samples at the beginning of a media segment, so that there are exactly 3 audio frames containing ignorable audio samples in the generated media segment.
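The number of additional ignorable samples needed to fill up whole audio frames follows from simple modular arithmetic. The helper below is a sketch with an assumed name; the values 2112 and 1024 are those given in the text.

```python
def extra_priming_samples(encoder_priming, samples_per_frame):
    """Return (extra_samples, total_frames) for frame-aligned priming.

    extra_samples is how many ignorable samples the application adds so that
    the encoder's own priming samples plus the added ones fill an integer
    number of audio frames; total_frames is that number of frames.
    """
    remainder = encoder_priming % samples_per_frame
    extra = (samples_per_frame - remainder) % samples_per_frame
    total_frames = (encoder_priming + extra) // samples_per_frame
    return extra, total_frames


extra, frames = extra_priming_samples(2112, 1024)
```

With 2112 encoder priming samples and 1024 samples per frame, 960 samples are added and the ignorable part spans exactly 3 frames, since 2112 + 960 = 3072 = 3 · 1024.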

Note that in one embodiment each upload of a media segment comprises uploading of a digital token that has been requested from and delivered by a server, e.g. server 130, of the system in order to authorize the upload action.

Server 130 accepts media segments uploaded by handheld device 110 if the uploading handheld electronic device has been authorized for uploading previously. Once an upload of a media segment has been completed media server 130 starts processing the uploaded media segment.
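The token-based authorization of uploads might be pictured as follows. This is a hypothetical sketch only: the class, its methods, the token format and the in-memory storage are assumptions, since the description does not specify how tokens are issued or checked.

```python
import secrets


class UploadAuthorizer:
    """Hypothetical server-side registry of upload tokens."""

    def __init__(self):
        self._tokens = {}  # token -> device identifier

    def issue_token(self, device_id):
        """Create a random token on request and remember which device owns it."""
        token = secrets.token_hex(16)
        self._tokens[token] = device_id
        return token

    def accept_upload(self, device_id, token):
        """Accept a segment only if the token was issued to this device."""
        return self._tokens.get(token) == device_id
```

In this sketch a device requests a token once and then attaches it to each segment upload; an upload carrying an unknown token, or a token issued to another device, is rejected.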

The processing of an uploaded media segment may comprise a plurality of steps including checking the integrity of the media segment to verify that an uploaded media segment does not comprise any malware or virus or any other objectionable data. If objectionable data is identified in the media segment, further processing of the media segment is stopped and the segment is deleted in order to protect the system.

Further optionally the processing of a received media segment may comprise the aligning of time stamps of audio and video data comprised in a media segment. In one embodiment the video and/or the audio frames contained in a media segment are provided with a time stamp that indicates the sequence of frames relative to one another upon playback of the frames. Since media segment writer 240 in some embodiments may handle each media segment as a new, separate video film, the time stamps provided to the frames in each segment start from zero. This may confuse an application that sequentially replays the media segments. To prevent such problems in any further processing, each media segment may undergo postprocessing at the server side that replaces the time stamps in a media segment to reflect the correct time of recording relative to the first audio sample and video frame of the first media segment of the video film.
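The time stamp replacement can be sketched as adding a running offset to the zero-based time stamps of each segment. The data layout below (a list of time stamps plus a duration per segment) is an assumption made for illustration; a real implementation would rewrite the time stamps inside the media container.

```python
from fractions import Fraction


def rebase_timestamps(segments):
    """Shift each segment's zero-based time stamps onto one common timeline.

    segments: list of (timestamps, duration) tuples in recording order, where
    timestamps are seconds relative to the start of that segment.
    Returns one list of rebased time stamps per segment.
    """
    rebased = []
    offset = Fraction(0)
    for timestamps, duration in segments:
        rebased.append([offset + t for t in timestamps])
        offset += duration  # the next segment starts where this one ends
    return rebased


# Two segments of 8/3 seconds each, showing only the first two video frames.
frame = Fraction(1, 24)
timeline = rebase_timestamps([
    ([Fraction(0), frame], Fraction(8, 3)),
    ([Fraction(0), frame], Fraction(8, 3)),
])
```

After rebasing, the first frame of the second segment carries the time stamp 8/3 seconds instead of zero.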

Furthermore the processing of a received media segment may comprise removing of leading ignorable audio samples, i.e. if the audio encoder deployed upon encoding the audio information inserts ignorable so-called priming samples. In particular the ignorable priming samples of any media segment except for the first of a sequence of media segments may be removed in order to enable seamless transition between media segments upon replay. In one embodiment the removing of silent or ignorable audio samples can be achieved by deleting full audio frames from a media segment in case the generating handheld device provided an integer number of audio frames containing ignorable audio samples only as described above.
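When the ignorable samples fill whole audio frames, their removal reduces to dropping a fixed number of leading frames from every segment except the first. The sketch below models each segment as a plain list of audio frames, which is an assumption for illustration; the default of 3 frames matches the example given for 1024 samples per frame.

```python
def strip_priming_frames(segments, priming_frames=3):
    """Drop leading ignorable audio frames from all segments but the first.

    segments: list of lists of audio frames in recording order.
    """
    return [
        frames if index == 0 else frames[priming_frames:]
        for index, frames in enumerate(segments)
    ]


# 'p' marks a priming frame, 'a' a frame with recorded audio.
cleaned = strip_priming_frames([
    ["p", "p", "p", "a0", "a1"],
    ["p", "p", "p", "a2", "a3"],
])
```

The first segment keeps its priming frames, as stated above, while later segments start directly with recorded audio.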

In case a media segment has been uploaded comprising separate files for video and audio data as described above with reference to figure 2B, the server may generate a single media segment file containing audio and video data. Said single media segment file may be a container containing audio and video data as uploaded.

Furthermore said media segment file may comply with a standard media container format, e.g. an mp4 file format.

Optionally server 130 may generate at least one media segment based on an uploaded media segment, wherein the at least one generated media segment exhibits a lower pixel resolution and/or a lower audio sampling rate, thus providing the media segment in reduced video and/or audio quality, but at a smaller file size.

Furthermore server 130 may generate a so-called playlist file that lists the media segments forming part of a video film. In one embodiment said playlist file may list the sequence of already uploaded media segments of a video film. Accordingly the playlist file is updated whenever another media segment of a video film has been received and successfully processed on the server in order to provide an updated playlist.
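A playlist of this kind could be regenerated after each processed segment roughly as follows. The sketch uses basic HTTP Live Streaming playlist tags; the segment file names, the chosen protocol version and the function signature are assumptions for illustration and are not taken from the description.

```python
import math


def build_playlist(segments, finished=False):
    """Render a media playlist listing the segments received so far.

    segments: list of (uri, duration_seconds) tuples in upload order.
    finished: append the end marker once the recording has stopped.
    """
    # The target duration is an integer upper bound on the segment durations.
    target = max((math.ceil(duration) for _, duration in segments), default=1)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    if finished:
        lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


# Hypothetical file names for two uploaded segments.
playlist = build_playlist([("segment0.ts", 8 / 3), ("segment1.ts", 8 / 3)])
```

While the recording is ongoing the end marker is omitted, so a viewer client keeps reloading the playlist to discover newly listed segments.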

The playlist as well as the uploaded media segments are provided for download on a network attached storage system, wherein a webserver may provide download of the files from the storage system. A viewer may use any arbitrary device, e.g. any digital device communicatively coupled to the network enabling access to server 130, for downloading the playlist and segments of the video film. While the viewer downloads and/or watches a first chunk, i.e. a first media segment of a sequence of media segments forming a video film, the generating handheld electronic device may generate another segment of the video film and upload it to the server when said segment is completed. Thus the delay between generating a video film and watching said film is reduced.
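On the viewer side, determining which segments still have to be fetched can be sketched as follows. The sketch assumes that the playlist is a plain-text file with one segment URI per line and that tag or comment lines start with `#`; the function name `pending_segments` is hypothetical.

```python
from typing import List, Set


def pending_segments(playlist_text: str, downloaded: Set[str]) -> List[str]:
    """Return the segment URIs listed in the playlist that the viewer
    has not downloaded yet, in playlist order."""
    uris = []
    for raw_line in playlist_text.splitlines():
        line = raw_line.strip()
        # Skip empty lines and tag/comment lines.
        if line and not line.startswith("#"):
            uris.append(line)
    return [uri for uri in uris if uri not in downloaded]
```

A viewer client would reload the playlist periodically and call this function with the set of segments it has already fetched, so that newly uploaded segments are picked up while earlier ones are being watched.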

Claims


1. A system (100) comprising at least one handheld electronic device (110) comprising

- a camera means (220, 221) for generating motion pictures and

- a microphone means (230, 231) for generating audio samples, and

- a wireless interface means (260) capable of uploading a file to a remote server (130), and

- a digital signal processor adapted for executing an application (210) that is adapted to control the camera means (220, 221), the microphone means (230, 231) and the wireless interface (260) and a media segment writer module (240) to generate a media segment of a video film based on video frames provided by the camera means (220, 221) and audio samples provided by the microphone means (230, 231), and to control the media segment writer (240) to encode the audio samples into at least one audio frame, and wherein the duration of the number of video frames equals the duration of the number of audio samples provided for one media segment, and

wherein the application (210) controls the radio interface to upload the media segment to a server (130).

2. The handheld electronic device (110) wherein the application is adapted to control the camera means (220, 221) to generate motion pictures and to control the microphone means (230, 231) to generate audio samples for a subsequent media segment of a video film while simultaneously uploading a current media segment of the video film via the radio interface (260) to the server.

3. The system of claim 1 wherein the media segment comprises one file containing video data and one file containing audio data.

4. The system of any preceding claim further comprising a server (130) wherein the server generates a single media segment file based on audio and video data of an uploaded media segment.

5. The system of claim 1 wherein the application (210) is adapted to control the media segment writer (240) to generate an integer number of leading ignorable audio frames.

6. The system of claim 5 wherein the server deletes an integer number of ignorable audio frames from an uploaded media segment.

7. The system of any preceding claim wherein the server (130) is adapted to generate a playlist file listing at least one media segment of a sequence of uploaded media segments, and to provide the playlist file and the at least one media segment on a network attached storage for downloading.

8. A method for providing a media segment of a video film comprising the following steps

- generating video frames by a camera means (220, 221) comprised in a handheld device (110); and

- generating audio samples by a microphone means (230, 231) comprised in the handheld device (110); and

- controlling the camera means, the microphone means and a media segment writer (240) by an application comprised in the handheld device to generate a media segment of a video film based on video frames provided by the camera means and audio samples provided by the microphone means,

wherein the media segment writer (240) encodes the audio samples into at least one audio frame, and wherein the duration of the number of video frames equals the duration of the number of audio samples provided for one media segment, and

- uploading the media segment to a server (130) via a radio interface (260) comprised in the handheld electronic device (110).

9. The method of claim 8 wherein the application (210) controls the camera means (220, 221) to generate motion pictures and the microphone means (230, 231) to generate audio samples for a subsequent media segment while simultaneously uploading a current media segment of the video film via the radio interface (260) to the server (130).

10. The method of claim 8 wherein the media segment comprises one file containing video data and one file containing audio data.

11. The method of claim 10 further comprising generating a single media segment file based on the file containing video data and the file containing audio data.

12. The method of claim 8 wherein the application controls the media segment writer (240) to generate an integer number of leading ignorable audio frames.

13. The method of claim 12 further comprising the step of deleting the integer number of leading ignorable audio frames from a media segment at server (130).

14. The method of any preceding claim 8 to 13 further comprising the step of generating a playlist file listing at least one media segment of a sequence of uploaded media segments, and providing the playlist file and the at least one media segment on a network attached storage for downloading.