CN106331871B

CN106331871B - Method and device for realizing fast forward or fast backward of video stream

Info

Publication number: CN106331871B
Application number: CN201510397463.0A
Authority: CN
Inventors: 黄敦笔
Original assignee: Alibaba Group Holding Ltd
Current assignee: Youku Culture Technology Beijing Co ltd
Priority date: 2015-07-08
Filing date: 2015-07-08
Publication date: 2020-01-17
Anticipated expiration: 2035-07-08
Also published as: WO2017005098A1; CN106331871A

Abstract

The invention provides a method and a device for implementing fast forward or fast backward of a video stream. The method comprises: receiving a fast forward or fast backward request triggered by a user, and locating the timestamp of the fast forward or fast backward request position; acquiring the video metadata stream corresponding to the fast forward or fast backward request segment according to the timestamp of the current playing position and the timestamp of the fast forward or fast backward request position; and selecting video frames to be played from the video metadata stream according to the playback magnification, and playing them. The method and the device do not ignore the remaining video frames of the current sequence group and are not affected by the I-frame interval, so they avoid unsmooth playback and improve the user experience.

Description

Method and device for realizing fast forward or fast backward of video stream

Technical Field

The present invention relates to the field of multimedia processing technologies, and in particular, to a method and an apparatus for implementing fast forward or fast backward of a video stream.

Background

When watching a video, a user can quickly locate a playing position of interest through a fast forward or fast backward operation. The playing system responds to the operation by displaying the corresponding fast forward or fast backward picture on the playing interface, and whether that picture is smooth during fast forward or fast backward directly affects the user's viewing experience.

In the prior art, most playing systems implement fast forward or fast backward of a video stream in one of the following ways. In one method, the playing system fetches the synchronization frame (I frame) adjacent to the fast forward or fast backward request point, then decodes and plays the remaining video stream starting from that I frame. This method ignores the video frames of the other sequence groups in the fast forward or fast backward request segment, so playback is prone to stutter and jumps.

In another method, the playing system obtains the I frames in the fast forward or fast backward request segment and plays them in a frame-skipping mode. This method places strict requirements on the playback magnification and on the interval between I frames: when the I-frame intervals are not constant and equal, or when there is no integer-multiple relation between the playback magnification and the I-frame interval, playback is prone to stutter and the user experience is poor.

Disclosure of Invention

The technical problem to be solved by the present invention is to provide a method for implementing fast forward or fast backward of a video stream, so as to solve the problem in the prior art that unsmooth playing is likely to occur during fast forward or fast backward, thereby improving user experience.

The invention also provides a device for realizing fast forward or fast backward of the video stream, which is used for ensuring the realization and the application of the method in practice.

In one aspect, the present invention provides a method for implementing fast forward or fast backward of a video stream, where the method includes:

receiving a fast forward or fast backward request triggered by a user, and positioning a timestamp of the position of the fast forward or fast backward request;

acquiring a video metadata stream corresponding to the fast forward or fast backward request segment according to the timestamp of the current playing position and the timestamp of the fast forward or fast backward request position;

and selecting a video frame to be played from the video metadata stream according to the playing magnification, and playing the video frame to be played.

In another aspect, the present invention provides an apparatus for fast forwarding or fast rewinding a video stream, the apparatus comprising:

a timestamp positioning unit, configured to receive a fast forward or fast backward request triggered by a user and locate the timestamp of the fast forward or fast backward request position;

a metadata stream acquiring unit, configured to acquire a video metadata stream corresponding to the fast forward or fast backward request segment according to a timestamp of a current play position and a timestamp of the fast forward or fast backward request position;

and a playing unit, configured to select video frames to be played from the video metadata stream according to the playback magnification and play them.

Compared with the prior art, the embodiment of the invention has the following beneficial effects:

The method first locates the timestamp of the fast forward or fast backward request position according to the request triggered by the user; it then acquires the video metadata stream corresponding to the fast forward or fast backward request segment according to the timestamp of the current playing position and the timestamp of the request position; and it finally selects video frames to be played from the video metadata stream according to the playback magnification and plays them. The invention abandons the prior-art idea of starting fast forward or fast backward from an adjacent I frame or relying on I frames alone. Instead, it focuses on the video frames that match the playback magnification during fast forward or fast backward, and these frames can be decoded and played whatever their type. The invention therefore does not ignore the sequence groups in the fast forward or fast backward request segment and is not affected by the I-frame interval, which avoids unsmooth playback and improves the user experience.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a flowchart of embodiment 1 of the method for implementing fast forward or fast backward of a video stream according to the present invention;

Fig. 2 is a structural diagram of the buffer working in first-in first-out mode according to the present invention;

Fig. 3 is a structural diagram of a segment of a video stream with hierarchical prediction references according to the present invention;

Fig. 4 is a schematic diagram of the frame-skipping playing mode according to the present invention;

Fig. 5 is a flowchart of embodiment 2 of the method for implementing fast forward or fast backward of a video stream according to the present invention;

Fig. 6 is a structural diagram of embodiment 1 of the apparatus for implementing fast forward or fast backward of a video stream according to the present invention;

Fig. 7 is a structural diagram of embodiment 2 of the apparatus for implementing fast forward or fast backward of a video stream according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention is operational with numerous general purpose or special purpose computing device environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multi-processor apparatus, distributed computing environments that include any of the above devices or equipment, and the like.

The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

Referring to Fig. 1, which is a flowchart of embodiment 1 of the method for implementing fast forward or fast backward of a video stream, the method may include the following steps:

S101: receiving a fast forward or fast backward request triggered by a user, and locating the timestamp of the fast forward or fast backward request position.

In practical applications, when watching a video on a terminal, the user can trigger a fast forward or fast backward request by clicking a fast forward or fast backward button, by dragging the playing progress bar on the playing page, or by pressing a shortcut key on the keyboard.

For the software-button and hardware-key trigger modes, the playing system predefines the operation duration of one click of the fast forward or fast backward button or key, for example 2 s, and also predefines the duration of the single fast forward or fast backward request that such a click corresponds to, for example 10 s, so that the timestamp of the request position triggered by the user can be located easily. In practical applications, both durations may be adjusted as required, and the single fast forward request duration may be the same as or different from the single fast backward request duration. For the drag trigger mode, the playing system records the time the user spends dragging as the operation duration, and locates the fast forward or fast backward request position from the position in the video stream at which the user stops dragging.

To meet users' individual fast forward or fast backward needs, the playing system may also let the user set the single fast forward or fast backward request duration before triggering the operation. The playing system may provide a permitted range, within which the user can set the value freely.
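
As a minimal sketch of how step S101 might locate the request timestamp, the following Python assumes timestamps in seconds and uses the 2 s and 10 s example values from the text as defaults. The names SeekRequest, locate_button_request and locate_drag_request, and the clamping to the stream bounds, are illustrative assumptions and not part of the described method.

```python
from dataclasses import dataclass


@dataclass
class SeekRequest:
    """A fast forward or fast backward request (hypothetical structure)."""
    direction: int             # +1 for fast forward, -1 for fast backward
    operation_duration: float  # seconds the user spent on the operation
    target_ts: float           # timestamp of the request position, in seconds


def locate_button_request(current_ts, direction, total_duration,
                          single_request_duration=10.0,
                          operation_duration=2.0):
    """Locate the request position for one click of a button or key."""
    target = current_ts + direction * single_request_duration
    # Assumption: clamp the target to the valid range of the stream.
    target = max(0.0, min(total_duration, target))
    return SeekRequest(direction, operation_duration, target)


def locate_drag_request(current_ts, drop_ts, drag_duration, total_duration):
    """Locate the request position where the user stops dragging."""
    target = max(0.0, min(total_duration, drop_ts))
    direction = 1 if target >= current_ts else -1
    # The time spent dragging is recorded as the operation duration.
    return SeekRequest(direction, drag_duration, target)
```

With these defaults, a fast forward click at 30 s would target 40 s, and a drag released at 12.5 s would be treated as a fast backward request.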

S102: acquiring the video metadata stream corresponding to the fast forward or fast backward request segment according to the timestamp of the current playing position and the timestamp of the fast forward or fast backward request position.

In practical application, the playing system may send a request carrying the fast forward or fast backward service type, the timestamp of the current playing position, and the timestamp of the fast forward or fast backward request position to the server, and the server returns a corresponding video metadata stream to the playing system according to the request.
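
The text does not specify a wire format for this request. Purely as an illustration, the three items it carries could be serialized as follows; the JSON encoding, the field names and the service type strings are all hypothetical.

```python
import json

# Hypothetical service type identifiers; the source only says that the
# request carries "the fast forward or fast backward service type".
SERVICE_TYPES = ("fast_forward", "fast_backward")


def build_seek_request(service_type, current_ts, request_ts):
    """Serialize the request sent from the playing system to the server."""
    if service_type not in SERVICE_TYPES:
        raise ValueError(f"unknown service type: {service_type!r}")
    return json.dumps({
        "service_type": service_type,
        "current_ts": current_ts,   # timestamp of the current playing position
        "request_ts": request_ts,   # timestamp of the request position
    })
```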

However, the inventor found that frequent interaction between the playing system and the server sometimes causes network loading delays that disturb viewing. To avoid this, the invention provides another implementation that lets the playing system acquire the video metadata stream quickly: the video metadata stream corresponding to the period from the timestamp of the current playing position to the timestamp of the fast forward or fast backward request position is acquired from a local buffer that works in first-in first-out mode. The buffer comprises a forward buffer and a backward buffer; the forward buffer stores data before the current playing position, and the backward buffer stores data after the current playing position.

The first-in first-out buffer provided by the invention is explained below by an example. Referring to Fig. 2, which is a structural diagram of the buffer, the buffer stores M seconds forward and N seconds backward. At time T0, the current playing position is A0. At time T1 (T1 > T0), the buffered data shown with a black background at time T0 has been removed, the buffered data shown with a white background at time T1 has been newly added, and the playing position is A1.

With this buffer structure of a forward buffer and a backward buffer, the playing system obtains the corresponding video metadata stream locally, which avoids sending frequent requests to the server and the frequent network loading that results.
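
The first-in first-out buffer can be sketched as a sliding window around the playing position. The following Python is a minimal illustration under assumed conventions: frames are (timestamp, payload) pairs appended in ascending timestamp order, and a request that falls outside the window returns None so the caller can fall back to the server. The class and method names are hypothetical.

```python
from collections import deque


class PlaybackCache:
    """FIFO cache holding M seconds before and N seconds after the position."""

    def __init__(self, forward_seconds, backward_seconds):
        self.forward_seconds = forward_seconds    # M: data before the position
        self.backward_seconds = backward_seconds  # N: data after the position
        self.frames = deque()                     # (timestamp, payload) pairs
        self.position = 0.0

    def append(self, timestamp, payload):
        """Add new data at the tail if it lies inside the N-second window."""
        if timestamp <= self.position + self.backward_seconds:
            self.frames.append((timestamp, payload))
            return True
        return False

    def advance(self, new_position):
        """Move the playing position and evict data older than M seconds."""
        self.position = new_position
        oldest_allowed = new_position - self.forward_seconds
        while self.frames and self.frames[0][0] < oldest_allowed:
            self.frames.popleft()

    def segment(self, ts_a, ts_b):
        """Return cached frames between two timestamps, or None on a miss."""
        lo, hi = min(ts_a, ts_b), max(ts_a, ts_b)
        if not self.frames or lo < self.frames[0][0] or hi > self.frames[-1][0]:
            return None
        return [frame for frame in self.frames if lo <= frame[0] <= hi]
```

Evicting from the head while appending at the tail mirrors the black and white regions described for Fig. 2: old data leaves as the playing position moves from A0 to A1, and new data enters.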

S103: selecting video frames to be played from the video metadata stream according to the playback magnification, and playing them.

In practical applications, the playing system can process according to a preset fixed playing magnification. However, the preset fixed playing magnification cannot adapt to the actual requirement of each user, and in order to meet the actual requirement of the user, the invention provides a mode of adaptively adjusting the playing magnification according to the operation triggered by the user, and the mode specifically comprises the following steps:

calculating the playback magnification according to the formula

$$\mathrm{PlayMul} = 2^{\left\lceil \log_2\left(Dur_{actual} / Dur_{request}\right) \right\rceil}$$

where PlayMul represents the playback magnification; the ceiling operator (written CeilLog2 in the original notation) rounds the result of $\log_2(Dur_{actual}/Dur_{request})$ up to the nearest integer; $Dur_{actual}$ denotes the duration of the fast forward or fast backward request segment; and $Dur_{request}$ denotes the operation duration of the fast forward or fast backward request.

This mode is explained below by an example.

Suppose the operation duration of the fast forward or fast backward request is $Dur_{request} = 2$ seconds and the duration of the fast forward or fast backward request segment is $Dur_{actual} = 7$ seconds. Since $\log_2(7/2) \approx 1.81$, which rounds up to 2, the formula gives a playback magnification of $\mathrm{PlayMul} = 2^2 = 4$.
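
The calculation can be checked with a few lines of Python. This is a sketch only: the function name is illustrative, and clamping the exponent at zero (so that the magnification never drops below 1 when the segment is shorter than the operation) is an added assumption that the text does not address.

```python
import math


def playback_magnification(dur_actual, dur_request):
    """PlayMul = 2 ** ceil(log2(dur_actual / dur_request))."""
    if dur_actual <= 0 or dur_request <= 0:
        raise ValueError("durations must be positive")
    exponent = math.ceil(math.log2(dur_actual / dur_request))
    # Assumption: never return a magnification below normal speed.
    return 2 ** max(exponent, 0)


print(playback_magnification(7, 2))  # the example from the text
```

Because the result is always a power of two, a 10 s segment requested with a 2 s operation would be played at 8x rather than 5x.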

Suppose that, as shown in Fig. 3, the playback magnification is 4. The video frames selected from the video metadata stream of the fast forward or fast backward request segment are then the 1st frame and the frames at every 4th position counted from the 1st frame, that is, the video frames marked by black triangles in Fig. 3.
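
The selection rule can be sketched with 0-based frame indices, where index 0 is the 1st frame of the request segment. Walking from the last frame for fast backward is an assumption made for illustration; the text only describes the selection from the 1st frame.

```python
def select_frame_indices(frame_count, magnification, backward=False):
    """Return the 0-based indices of the frames to play, in playing order."""
    if magnification < 1:
        raise ValueError("magnification must be at least 1")
    if backward:
        # Assumption: fast backward starts at the last frame of the segment.
        return list(range(frame_count - 1, -1, -magnification))
    return list(range(0, frame_count, magnification))
```

For a 13-frame segment at magnification 4, this selects indices 0, 4, 8 and 12, that is, the 1st, 5th, 9th and 13th frames.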

The inventor considers that video streams take various structures in practice and, to accommodate them, proposes more specific implementations. The inventor divides video stream structures into two broad categories: video streams with a layered structure and video streams without one. Taking both into account, the inventor proposes an implementation that suits both categories, specifically comprising the following steps:

A1: parsing the video metadata stream to obtain the I-frame offset position of each group of pictures, and decoding the groups of pictures in parallel according to these offset positions to obtain the data of each group of pictures; and

A2: selecting video frames to be played from the video metadata stream according to the playback magnification, and selecting the data corresponding to those frames from the data of each group of pictures for playing.

This implementation takes the group of pictures (GOP) as its basic unit. Viewed at the GOP level, processing is the same whether or not the stream has a layered structure. Parsing the I-frame offset position of each GOP first lays the foundation for parallel decoding, and decoding the GOPs in parallel ensures that data is available in time during the subsequent fast forward or fast backward, avoiding buffering, stalls and similar phenomena.
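
Steps A1 and A2 can be sketched as follows. The decoder here is a placeholder that returns labelled strings, frame types are given as a list of letters, and the index of each I frame stands in for its byte offset; all of these, and the function names, are assumptions for illustration. A real decoder would need to release the interpreter lock or run in separate processes for the parallelism to pay off in Python.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_gops(frame_types):
    """A1 (parse): return one (start, end) index range per GOP."""
    starts = [i for i, kind in enumerate(frame_types) if kind == "I"]
    ends = starts[1:] + [len(frame_types)]
    return list(zip(starts, ends))


def decode_gop(frame_types, start, end):
    """Placeholder decoder: maps each frame index in the GOP to its data."""
    return {i: f"decoded-{frame_types[i]}{i}" for i in range(start, end)}


def decode_and_select(frame_types, magnification):
    """A1 (decode GOPs in parallel), then A2 (pick the frames to play)."""
    decoded = {}
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(decode_gop, frame_types, start, end)
                   for start, end in split_into_gops(frame_types)]
        for future in futures:
            decoded.update(future.result())
    wanted = range(0, len(frame_types), magnification)
    # Frames before the first I frame cannot be decoded and are skipped.
    return [decoded[i] for i in wanted if i in decoded]
```

Each GOP starts at an I frame and can be decoded independently of the others, which is what makes the parallel split safe.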

The inventor considers that the video frames in a video stream with a layered structure have reference dependencies, the layers including, for example, full frame rate, 1/2 frame rate and 1/4 frame rate. For video streams with such inter-layer dependencies, the inventor further proposes an implementation comprising:

B1: parsing the video metadata stream to obtain the structure information of each video frame, the structure information comprising the video frame sequence number, the payload offset and the reference frames; and

B2: selecting video frames to be played from the video metadata stream according to the playback magnification, decoding those frames according to their structure information, and playing the decoded data.

This implementation starts from the reference dependencies between video frames: based on them, only the video frames to be played are decoded, rather than all video frames.
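
One way to read steps B1 and B2 is as a walk over the reference graph: decode each selected frame after the frames it references, and nothing else. The sketch below assumes the structure information has already been parsed into FrameInfo records; the record layout and function name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FrameInfo:
    seq: int             # video frame sequence number
    payload_offset: int  # offset of the frame payload in the stream
    refs: list = field(default_factory=list)  # sequence numbers referenced


def frames_to_decode(infos, selected):
    """Return, in decode order, the selected frames plus their references."""
    by_seq = {info.seq: info for info in infos}
    order, seen = [], set()

    def visit(seq):
        if seq in seen:
            return
        seen.add(seq)
        for ref in by_seq[seq].refs:
            visit(ref)       # a reference must be decoded before its user
        order.append(seq)

    for seq in selected:
        visit(seq)
    return order
```

Frames that are neither selected nor referenced by a selected frame never appear in the returned order, so they are never decoded.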

In addition, for video streams with a layered structure, the inventor provides a more specific implementation to further avoid unsmooth playback. When the video frames to be played are selected, the video frames with higher layer priority are selected first; during playback, frames are played with a starting frame skip followed by skips at equal intervals.

For example, suppose the video stream has the structure shown in Fig. 3, the user triggers the fast forward or fast backward request at the 2nd frame from the left, and the playback magnification is 2. With a starting frame skip followed by equal intervals, the video frames are played as shown in Fig. 4. As Fig. 4 shows, the skipped B frames do not need to be decoded, which further ensures timely video playback during fast forward or fast backward. Still based on the video stream structure of Fig. 3, the equally spaced decoded output is used directly for playback; when a video frame is selected for output but a reference frame it depends on is not, that reference frame must also be decoded so that the selected frame can be output.
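
The starting frame skip followed by equal intervals can be sketched as follows. This assumes a dyadic layered structure in which frames whose 0-based index is a multiple of the playback magnification belong to the higher-priority layers; that alignment, the indexing and the function name are assumptions, since Fig. 3 and Fig. 4 are not reproduced here.

```python
def select_with_start_skip(current_index, last_index, magnification):
    """Skip to the next higher-priority frame, then step at equal intervals."""
    if magnification < 1:
        raise ValueError("magnification must be at least 1")
    # Starting skip: first grid-aligned frame strictly after the position.
    first = (current_index // magnification + 1) * magnification
    # Subsequent skips: equal intervals of `magnification` frames.
    return list(range(first, last_index + 1, magnification))
```

With the request at the 2nd frame (index 1) and a magnification of 2, the starting skip covers one frame and every later skip covers two.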

In addition, with a starting frame skip followed by equal intervals, the distance between the starting frame-skip position and the current playing position differs from the subsequent regular frame-skip distance. In particular, when the starting skip distance is small and the subsequent skip interval is large, playback is unsmooth in the starting stage of fast forward or fast backward. For this case, the inventor further proposes calculating the playing interval of the starting stage from the starting skip distance and the subsequent skip interval: the playing interval of the starting stage equals the average of the two, and the first skipped-to video frame is played with a corresponding delay after it is decoded, which alleviates the unsmooth start.
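
As a small worked illustration of the averaging rule, the following sketch expresses intervals in units of source frames. How an interval is converted into an actual display delay is not specified in the text and is left out; the function names are hypothetical.

```python
def start_stage_interval(start_skip, subsequent_skip):
    """Playing interval of the starting stage: the average of the two skips."""
    return (start_skip + subsequent_skip) / 2


def playback_intervals(start_skip, subsequent_skip, count):
    """Intervals preceding each of the first `count` played frames."""
    if count <= 0:
        return []
    first = start_stage_interval(start_skip, subsequent_skip)
    return [first] + [float(subsequent_skip)] * (count - 1)
```

With a starting skip of 1 frame and a subsequent skip of 4 frames, the first played frame is scheduled after an interval of 2.5 frames instead of 1, which softens the jump from the starting skip to the regular skip.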

In addition, since fast forward or fast backward is also applied to multimedia streams in practice, the invention further provides a solution for multimedia streams. Referring to Fig. 5, which is a flowchart of embodiment 2 of the method for implementing fast forward or fast backward of a video stream provided by the present invention, the method includes:

S501: receiving a fast forward or fast backward request triggered by a user, and locating the timestamp of the fast forward or fast backward request position.

S502: acquiring the video metadata stream corresponding to the fast forward or fast backward request segment according to the timestamp of the current playing position and the timestamp of the fast forward or fast backward request position.

S503: selecting video frames to be played from the video metadata stream according to the playback magnification.

S504: selecting audio frames to be played from the audio metadata stream according to the video frame sequence numbers and timestamps of the video frames to be played, performing audio and video synchronization on the audio frames and the video frames to be played, and playing them.

The main difference between the method shown in fig. 5 and the method shown in fig. 1 is that before playing a video frame, an audio frame to be played is selected according to a timestamp of the video frame to be played, then the video frame and the audio frame to be played are synchronously processed, and finally the audio frame and the video frame are respectively output to corresponding playing devices to be played, so that the multimedia stream can be played in a fast forward or fast backward manner.
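
The audio selection in S504 can be sketched as a nearest-timestamp lookup. Matching each selected video frame to the closest audio frame is an assumption, as the text only says that selection uses the video frame sequence number and timestamp; the function name and the integer millisecond timestamps in the usage note are likewise illustrative.

```python
from bisect import bisect_left


def select_audio_frames(audio_ts, video_ts):
    """For each video timestamp, return the index of the nearest audio frame.

    `audio_ts` must be sorted in ascending order and must not be empty.
    """
    if not audio_ts:
        raise ValueError("audio_ts must not be empty")
    chosen = []
    for ts in video_ts:
        pos = bisect_left(audio_ts, ts)
        # Only the neighbours on either side of `ts` can be the nearest.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(audio_ts)]
        chosen.append(min(candidates, key=lambda i: abs(audio_ts[i] - ts)))
    return chosen
```

For audio frames at 0, 23, 46, 69, 92, 116, 139 and 162 ms and selected video frames at 0, 40, 80 and 120 ms, the lookup picks the audio frames at 0, 46, 69 and 116 ms.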

Corresponding to the method, the invention provides an apparatus for implementing fast forward or fast backward of a video stream. Referring to Fig. 6, which is a structural diagram of embodiment 1 of the apparatus, the apparatus includes:

a timestamp positioning unit 601, configured to receive a fast forward or fast backward request triggered by a user, and position a timestamp of a location of the fast forward or fast backward request;

a metadata stream obtaining unit 602, configured to obtain a video metadata stream corresponding to the fast forward or fast rewind request segment according to a timestamp of a current play position and a timestamp of the fast forward or fast rewind request position;

a playing unit 603, configured to select a video frame to be played from the video metadata stream according to a playing magnification, and play the video frame to be played.

Optionally, the playing unit includes:

a structure information parsing subunit, configured to parse the video metadata stream to obtain the structure information of each video frame, the structure information comprising the video frame sequence number, the payload offset and the reference frames;

and a decoding and playing subunit, configured to select video frames to be played from the video metadata stream according to the playback magnification, decode those frames according to their structure information, and play the decoded data.

Optionally, the playing unit includes:

a parallel decoding subunit, configured to parse the video metadata stream to obtain the I-frame offset position of each group of pictures, and to decode the groups of pictures in parallel according to these offset positions to obtain the data of each group of pictures;

and a selecting and playing subunit, configured to select video frames to be played from the video metadata stream according to the playback magnification and to select the data corresponding to those frames from the data of each group of pictures for playing.

Optionally, the apparatus further comprises:

a playback magnification calculation unit, configured to calculate the playback magnification according to the formula

$$\mathrm{PlayMul} = 2^{\left\lceil \log_2\left(Dur_{actual} / Dur_{request}\right) \right\rceil}$$

where PlayMul represents the playback magnification; the ceiling operator (written CeilLog2 in the original notation) rounds the result of $\log_2(Dur_{actual}/Dur_{request})$ up to the nearest integer; $Dur_{actual}$ denotes the duration of the fast forward or fast backward request segment; and $Dur_{request}$ denotes the operation duration of the fast forward or fast backward request;

the playing unit is configured to select video frames to be played from the video metadata stream according to the playback magnification calculated by the playback magnification calculation unit, and to play them.

Optionally, the metadata stream obtaining unit includes:

a first obtaining subunit, configured to obtain, from a buffer area in a local first-in first-out working mode, a video metadata stream corresponding to a time period from a timestamp of the current play position to a timestamp of the fast-forward or fast-backward request position; the buffer area comprises a forward buffer area and a backward buffer area, the forward buffer area is used for storing data before the current playing position, and the backward buffer area is used for storing data after the current playing position; or,

a second obtaining subunit, configured to send a request carrying the fast forward or fast backward service type, the timestamp of the current playing position and the timestamp of the fast forward or fast backward request position to the server, and to receive the video metadata stream returned by the server according to the request.

Referring to Fig. 7, which is a structural diagram of embodiment 2 of the apparatus for implementing fast forward or fast backward of a video stream, the apparatus optionally further includes, on the basis of the structure of Fig. 6:

an audio and video synchronization unit 604, configured to select audio frames to be played from the audio metadata stream according to the video frame sequence numbers and timestamps of the video frames to be played, perform audio and video synchronization on the audio frames and the video frames to be played, and play them.

It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be referred to each other. Since the apparatus embodiments are basically similar to the method embodiments, their description is brief; for the relevant points, refer to the corresponding description of the method embodiments.

Finally, it should also be noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The method and apparatus for implementing fast forward or fast backward of a video stream provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method and core idea of the invention. Meanwhile, a person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A method for fast-forwarding or fast-rewinding a video stream, the method comprising:

selecting a video frame to be played from the video metadata stream according to a playback magnification, and playing the video frame to be played, wherein the playback magnification is adaptively adjusted according to the operation triggered by the user, and the operation triggered by the user comprises: the duration of the fast forward or fast backward request segment, and the operation duration of the fast forward or fast backward request.

2. The method according to claim 1, wherein said selecting a video frame to be played from said video metadata stream according to a playback magnification, and playing said video frame to be played comprises:

analyzing the video metadata stream to obtain the structure information of each video frame, wherein the structure information comprises: video frame sequence number, payload offset and reference frame;

and selecting a video frame to be played from the video metadata stream according to the playing magnification, decoding the video frame to be played according to the structural information of the video frame to be played, and playing the data obtained by decoding.

3. The method according to claim 1, wherein said selecting a video frame to be played from said video metadata stream according to a playback magnification, and playing said video frame to be played comprises:

analyzing the video metadata stream to obtain the I frame offset position of each image group, and decoding the image groups in parallel according to the I frame offset position of each image group to obtain the data of each image group;

and selecting a video frame to be played from the video metadata stream according to the playing magnification, and selecting data corresponding to the video frame to be played from the data of each image group for playing.

4. The method of claim 1, wherein the playback magnification is calculated by:

calculating the playback magnification according to the formula $\mathrm{PlayMul} = 2^{\mathrm{CeilLog2}(Dur_{actual}/Dur_{request})}$;

wherein PlayMul represents the playback magnification; $\mathrm{CeilLog2}(Dur_{actual}/Dur_{request})$ represents the result of $\log_2(Dur_{actual}/Dur_{request})$ rounded up to the nearest integer; $Dur_{actual}$ represents the duration of the fast forward or fast backward request segment; and $Dur_{request}$ represents the operation duration of the fast forward or fast backward request.
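The formula of claim 4 can be computed as in the following minimal sketch. The function name `play_multiplier` is hypothetical, and clamping the result to at least 1x when the request segment is shorter than the operation duration is an assumption; the claim does not address that case.

```python
import math


def play_multiplier(dur_actual, dur_request):
    """PlayMul = 2 ** ceil(log2(dur_actual / dur_request)).

    dur_actual:  duration of the fast forward or fast backward request segment
    dur_request: operation duration of the fast forward or fast backward request
    Both must use the same time unit.
    """
    if dur_actual <= 0 or dur_request <= 0:
        raise ValueError("durations must be positive")
    exponent = math.ceil(math.log2(dur_actual / dur_request))
    # Assumption: never play slower than 1x, so negative exponents are clamped to 0.
    return 2 ** max(0, exponent)
```

For example, a 60-second segment requested with a 5-second operation gives a ratio of 12, whose base-2 logarithm rounds up to 4, so the magnification is 16. Rounding the exponent up keeps the magnification a power of two that is at least the raw ratio.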

5. The method of claim 1, wherein before playing the video frame to be played, the method further comprises:

selecting an audio frame to be played from the audio metadata stream according to the video frame sequence number and the timestamp of the video frame to be played, performing audio and video synchronization processing on the audio frame to be played and the video frame to be played, and playing them.
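One way to picture the audio selection of claim 5 is the sketch below. It matches on timestamps only and picks, for each selected video frame, the latest audio frame that starts at or before it; the claim also mentions the video frame sequence number, and the actual matching rule is not given, so this rule and the name `match_audio` are assumptions.

```python
import bisect


def match_audio(audio_timestamps, video_timestamps):
    """Return, for each video timestamp, the index of the matching audio frame.

    audio_timestamps must be sorted in ascending order.
    """
    indices = []
    for ts in video_timestamps:
        # Rightmost audio frame whose timestamp is <= ts.
        i = bisect.bisect_right(audio_timestamps, ts) - 1
        # If the video frame precedes all audio, fall back to the first audio frame.
        indices.append(max(i, 0))
    return indices
```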

6. The method according to claim 1, wherein the obtaining the video metadata stream corresponding to the fast forward or fast backward request segment according to the timestamp of the current playing position and the timestamp of the fast forward or fast backward request position comprises:

acquiring, from a local buffer operating in first-in first-out mode, a video metadata stream corresponding to the time period from the timestamp of the current playing position to the timestamp of the fast forward or fast backward request position, wherein the buffer comprises a forward buffer and a backward buffer, the forward buffer being used for storing data before the current playing position, and the backward buffer being used for storing data after the current playing position; or,

sending to a server a request carrying the fast forward or fast backward service type, the timestamp of the current playing position and the timestamp of the fast forward or fast backward request position, and receiving a video metadata stream returned by the server in response to the request.
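The two acquisition paths of claim 6 can be sketched as a cache lookup with a server fallback. Everything concrete here is an assumption made for illustration: frames are modeled as `(timestamp, payload)` tuples, the request is a plain dictionary with hypothetical field names, and the fallback to the server when the buffer does not reach the requested position is one possible policy (the claim lists the two paths only as alternatives).

```python
from collections import deque


class FifoCache:
    """Local first-in first-out cache with a forward and a backward buffer."""

    def __init__(self, capacity):
        # Naming follows the claim: "forward" holds data before the current
        # playing position, "backward" holds data after it.
        self.forward = deque(maxlen=capacity)
        self.backward = deque(maxlen=capacity)

    def segment(self, cur_ts, target_ts):
        """Return cached frames between the two timestamps, or None if not covered."""
        if target_ts >= cur_ts:
            buf = self.backward
            covered = bool(buf) and buf[-1][0] >= target_ts
        else:
            buf = self.forward
            covered = bool(buf) and buf[0][0] <= target_ts
        if not covered:
            return None
        lo, hi = sorted((cur_ts, target_ts))
        return [frame for frame in buf if lo <= frame[0] <= hi]


def build_request(service, cur_ts, target_ts):
    """Hypothetical request body carrying the three items named in the claim."""
    return {"service": service, "current_ts": cur_ts, "request_ts": target_ts}


def acquire(cache, cur_ts, target_ts):
    """Use the local cache when it covers the segment, else prepare a server request."""
    service = "fast_forward" if target_ts >= cur_ts else "fast_backward"
    frames = cache.segment(cur_ts, target_ts)
    if frames is not None:
        return ("cache", frames)
    return ("server", build_request(service, cur_ts, target_ts))
```

A `deque` with `maxlen` discards its oldest entry when a new one is appended to a full buffer, which gives the first-in first-out behavior without extra bookkeeping.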

7. An apparatus for fast forward or fast reverse of a video stream, the apparatus comprising:

a playing unit, configured to select a video frame to be played from the video metadata stream according to a playback magnification, and to play the video frame to be played, wherein the playback magnification is adaptively adjusted according to an operation triggered by the user, and the operation triggered by the user comprises: the duration of the fast forward or fast backward request segment, and the operation duration of the fast forward or fast backward request.

8. The apparatus of claim 7, wherein the playback unit comprises:

9. The apparatus of claim 7, wherein the playback unit comprises:

10. The apparatus of claim 7, further comprising:

a playback magnification calculation unit, configured to calculate the playback magnification according to the formula $\mathrm{PlayMul} = 2^{\mathrm{CeilLog2}(Dur_{actual}/Dur_{request})}$;

wherein PlayMul represents the playback magnification; $\mathrm{CeilLog2}(Dur_{actual}/Dur_{request})$ represents the result of $\log_2(Dur_{actual}/Dur_{request})$ rounded up to the nearest integer; $Dur_{actual}$ represents the duration of the fast forward or fast backward request segment; and $Dur_{request}$ represents the operation duration of the fast forward or fast backward request; and the playing unit is configured to select a video frame to be played from the video metadata stream according to the playback magnification calculated by the playback magnification calculation unit, and to play the video frame to be played.

11. The apparatus of claim 7, further comprising:

an audio and video synchronization unit, configured to select an audio frame to be played from the audio metadata stream according to the video frame sequence number and the timestamp of the video frame to be played, to perform audio and video synchronization processing on the audio frame to be played and the video frame to be played, and to play them.

12. The apparatus according to claim 7, wherein said metadata stream acquiring unit includes: