CN113873271B - Video stream playing method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN113873271B
CN113873271B
Authority
CN
China
Prior art keywords
video
frame
target video
difference
target
Prior art date
Legal status
Active
Application number
CN202111052506.3A
Other languages
Chinese (zh)
Other versions
CN113873271A (en)
Inventor
刘海涛
Current Assignee
Guangzhou Fanxing Huyu IT Co Ltd
Original Assignee
Guangzhou Fanxing Huyu IT Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Fanxing Huyu IT Co Ltd filed Critical Guangzhou Fanxing Huyu IT Co Ltd
Priority to CN202111052506.3A priority Critical patent/CN113873271B/en
Publication of CN113873271A publication Critical patent/CN113873271A/en
Application granted granted Critical
Publication of CN113873271B publication Critical patent/CN113873271B/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/438Interfacing the downstream path of the transmission network originating from a server, e.g. retrieving encoded video stream packets from an IP network
    • H04N21/4382Demodulation or channel decoding, e.g. QPSK demodulation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a video stream playing method and device, a storage medium and electronic equipment. Wherein the method comprises the following steps: acquiring a first video playing request, wherein the first video playing request is used for requesting to play a target video from a first moment, and the target video is associated with a plurality of reference frames; responding to a first video playing request, and determining a first video frame set and a second video frame set from a plurality of reference frames; acquiring a first video difference frame matched with a first moment in a first video frame set, and a first type of target video difference frame and a target video key frame; and decoding the first video difference frame according to the target video key frame and the first type of target video difference frame to obtain a target video stream corresponding to the target video at the first moment. The invention solves the technical problem that the video stream playing mode has lower utilization rate of the code stream.

Description

Video stream playing method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of computers, and in particular, to a video stream playing method and apparatus, a storage medium, and an electronic device.
Background
In a video stream playing scenario, the related art often uses a GOP (Group of Pictures) to encode a group of video frames. Since a GOP is an IPPPPPPP...... structure, in which a single key frame (I frame) is followed by a series of difference frames (P frames), a longer GOP contains proportionally fewer I frames and therefore yields a higher utilization rate of the code stream.
However, an overly long GOP causes a delay problem. For example, if the GOP is 10 seconds long and a viewer pulls the video stream at the 9-second mark, then for decoding integrity the viewer side must pull the stream starting from 9 seconds earlier before decoding can begin, producing a delay of about 9 seconds. The GOP length is therefore limited, which means the video stream playing mode in the related art has a low utilization rate of the code stream.
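The delay described in the background can be sketched numerically. The following is a toy model only (the function name and the modulo assumption are illustrative, not from the patent): it assumes decoding can only start from the key frame that opens the current GOP, so a viewer joining mid-GOP must pull back to that key frame.

```python
def startup_delay(join_time_s: float, gop_length_s: float) -> float:
    """Seconds of earlier stream data a viewer must pull before decoding.

    Toy model: decoding must start at the key frame opening the current
    GOP, so the backlog is the offset of the join time within the GOP.
    """
    return join_time_s % gop_length_s

# With a 10-second GOP, joining at second 9 means pulling ~9 s of backlog.
```

This is why simply lengthening the GOP to save I frames worsens the worst-case join delay.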
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the invention provides a video stream playing method and device, a storage medium and electronic equipment, which are used for at least solving the technical problem that the utilization rate of a code stream is low in a video stream playing mode.
According to an aspect of an embodiment of the present invention, there is provided a video stream playing method, including: acquiring a first video playing request, wherein the first video playing request is used for requesting to play a target video from a first moment, and the target video is associated with a plurality of reference frames; responding to the first video playing request, determining a first video frame set and a second video frame set from the plurality of reference frames, wherein the second video frame set is a previous video frame set of the first video frame set, the first video frame set comprises a first type target video difference frame and a second type target video difference frame, the first type target video difference frame is a difference frame determined based on a target video key frame in the second video frame set, and the second type target video difference frame is a difference frame determined based on the first type target video difference frame; acquiring a first video difference frame matched with the first moment in the first video frame set, and the first type target video difference frame and the target video key frame, wherein the first video difference frame belongs to the second type target video difference frame; and decoding the first video difference frame according to the target video key frame and the first type target video difference frame to obtain a target video stream corresponding to the target video at the first moment.
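As a rough illustration of the decoding step in this method, the sketch below models a key frame as a full picture and a difference frame as a residual over its reference frame. All class and variable names are hypothetical, and pictures are simplified to short lists of pixel values; this is not the patent's actual codec.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class KeyFrame:
    picture: list                       # full picture data (target video key frame)

@dataclass
class DiffFrame:
    residual: list                      # per-pixel difference from the reference
    ref: "Union[KeyFrame, DiffFrame]"   # a key frame or another difference frame

def decode(frame) -> list:
    """Resolve a frame to a full picture by walking its reference chain."""
    if isinstance(frame, KeyFrame):
        return list(frame.picture)
    base = decode(frame.ref)
    return [b + r for b, r in zip(base, frame.residual)]

# Second video frame set contains key frame I1. In the first video frame set,
# the first-type difference frame PI1 references I1 directly, and the
# second-type difference frame P5 references PI1.
I1  = KeyFrame([10, 10, 10])
PI1 = DiffFrame([5, 5, 5], ref=I1)
P5  = DiffFrame([1, 0, -1], ref=PI1)

# Playing from P5's moment needs only I1, PI1 and P5 - no second full I frame.
picture_at_t1 = decode(P5)              # [16, 15, 14]
```

The point of the sketch: the first-type difference frame PI1 stands in for a second key frame, so the decode chain for any frame in the first set is I1 → PI1 → … rather than requiring a fresh I frame.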
According to another aspect of the embodiment of the present invention, there is also provided a video stream playing device, including: a first obtaining unit, configured to obtain a first video playing request, where the first video playing request is used to request to play a target video from a first moment, and the target video is associated with a plurality of reference frames; a first determining unit, configured to determine, in response to the first video play request, a first video frame set and a second video frame set from the plurality of reference frames, where the second video frame set is a previous video frame set of the first video frame set, the first video frame set includes a first type target video difference frame and a second type target video difference frame, the first type target video difference frame is a difference frame determined based on a target video key frame in the second video frame set, and the second type target video difference frame is a difference frame determined based on the first type target video difference frame; the second obtaining unit is configured to obtain a first video difference frame that is matched with the first time in the first video frame set, and the first type target video difference frame and the target video key frame, where the first video difference frame belongs to the second type target video difference frame; and the first decoding unit is used for decoding the first video difference frame according to the target video key frame and the first type target video difference frame to obtain a target video stream corresponding to the target video at the first moment.
As an alternative, the first decoding unit includes: the first decoding module is configured to decode the first type of target video difference frame according to the target video key frame to obtain first video frame data corresponding to a second moment in the target video, where the first type of target video difference frame is used to represent a difference between the first video frame data and video frame data corresponding to the target video key frame; the second decoding module is used for decoding the first video difference frame according to the first type of target video difference frame to obtain second video frame data corresponding to the first moment in the target video; and the integration module is used for integrating the first video frame data and the second video frame data to obtain the target video stream.
As an alternative, the second decoding module includes: a first decoding submodule, configured to decode the first video difference frame according to the first video frame data to obtain first sub-video frame data corresponding to the first time in the target video when the first time is a time next to the second time, where the first video difference frame is used to represent a difference between the first sub-video frame data and video frame data corresponding to the first class of target video difference frame; or, a second decoding submodule, configured to decode, when a third time is a time next to the second time and the first time is a time next to the third time, a second video difference frame corresponding to the third time in the plurality of reference frames according to the first video frame data, to obtain second sub-video frame data corresponding to the third time in the target video, where the second video difference frame is used to represent a difference between the second sub-video frame data and video frame data corresponding to the first type of target video difference frame; decoding the first video difference frame according to the second video difference frame to obtain third sub-video frame data corresponding to the second moment in the target video, wherein the first video difference frame is used for representing the difference between the third sub-video frame data and the video frame data corresponding to the second video difference frame; and integrating the second sub-video frame data and the third sub-video frame data to obtain the second video frame data.
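The two sub-module cases above differ only in how many intermediate difference frames sit between the already-decoded picture and the requested moment. A hypothetical helper (the name and list-of-residuals representation are illustrative assumptions) that applies residuals in order and keeps every intermediate picture, matching the "integrate the sub-video frame data" step:

```python
def decode_chain(key_picture, residuals):
    """Apply difference-frame residuals in order to a decoded picture,
    returning the picture at every intermediate moment along the chain."""
    pictures, current = [], list(key_picture)
    for residual in residuals:
        current = [c + r for c, r in zip(current, residual)]
        pictures.append(current)
    return pictures

# First sub-module case: the requested moment directly follows the decoded
# picture (one residual). Second sub-module case: an intermediate second
# video difference frame must be decoded first (two residuals).
```

For example, `decode_chain([0, 0], [[1, 1], [2, 2]])` yields the pictures at both intermediate moments, which are then integrated into the second video frame data.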
As an alternative, the first determining unit includes: a first determining module, configured to determine, from the plurality of reference frames, a target video frame set in which the first video difference frame is located as a first target video frame set, where the first target video frame set includes the first video frame set and the second video frame set, each of the target video frame sets includes a video frame of a key class and a plurality of video frames of a non-key class, the target video key frame is a video frame of the key class, and the first type target video difference frame and the second type target video difference frame are video frames of the non-key class; and the second determining module is used for determining the first video frame set and the second video frame set from the first target video frame set.
As an alternative, the apparatus further includes: a third obtaining unit, configured to obtain a second video playing request, where the second video playing request is used to request playing of the target video from a fourth time; a second determining unit, configured to determine a second target video frame set from the plurality of reference frames in response to the second video play request, where the second target video frame set includes a third video frame set and a fourth video frame set, the fourth video frame set is a previous video frame set of the third video frame set, the third video frame set includes a third category target video difference frame and a fourth category target video difference frame, the third category target video difference frame is a difference frame determined based on a video frame belonging to the key category in the fourth video frame set, and the fourth category target video difference frame is a difference frame determined based on the third category target video difference frame; a fourth obtaining unit, configured to obtain a third video difference frame that matches the fourth time in the third video frame set, and the third category target video difference frame and a video frame that belongs to the key class in the fourth video frame set, where the third video difference frame belongs to the fourth category target video difference frame; and the second decoding unit is used for decoding the third video difference frame according to the video frames belonging to the key class and the third category target video difference frame in the fourth video frame set to obtain a video stream corresponding to the target video at the fourth time.
As an alternative, the apparatus includes: a fifth obtaining unit, configured to obtain first video data corresponding to a first time in the second video frame set of the target video before the first video playing request is obtained, where the first video data is video frame data corresponding to a key frame of the target video; a sixth obtaining unit, configured to obtain a data difference between the first video data and second video data at a time next to the first time before the first video playing request is obtained, where the data difference is video frame data corresponding to a target video difference frame; and the integrating unit is used for integrating the target video key frames and the target video difference frames according to a time sequence before the first video playing request is acquired to obtain the second video frame set.
As an alternative, the method is applied to the target live broadcast application, and the device further includes: a seventh obtaining unit, configured to obtain the first video playing request triggered on the live client; a third determining unit, configured to determine a playing time of the target video in response to the first video playing request; a first playing unit, configured to play the target video stream when the playing time is the first time; and the second playing unit is used for playing the target video stream corresponding to the target video at the second moment when the playing moment is the second moment corresponding to the first type target video difference frame.
According to yet another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to perform the above-described video stream playing method when run.
According to still another aspect of the embodiment of the present invention, there is further provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the video stream playing method described above through the computer program.
In the embodiment of the invention, a first video playing request is acquired, where the first video playing request is used to request to play a target video from a first moment, and the target video is associated with a plurality of reference frames; in response to the first video playing request, a first video frame set and a second video frame set are determined from the plurality of reference frames, where the second video frame set is the previous video frame set of the first video frame set, the first video frame set includes a first type target video difference frame and a second type target video difference frame, the first type target video difference frame is a difference frame determined based on a target video key frame in the second video frame set, and the second type target video difference frame is a difference frame determined based on the first type target video difference frame; a first video difference frame matched with the first moment in the first video frame set is acquired, together with the first type target video difference frame and the target video key frame, where the first video difference frame belongs to the second type target video difference frame; and the first video difference frame is decoded according to the target video key frame and the first type target video difference frame to obtain a target video stream corresponding to the target video at the first moment. The first video frame set and the second video frame set divide what were originally the same plurality of reference frames into two (or more) sets; by using the first type target video difference frame directly as a difference frame of the target video key frame, each set is regarded as a GOP substructure, and the multiple GOP substructures together form a complete GOP structure. That is, the original GOP structure IPPPP…… is adjusted to IPPP and P(I)PP……, combining the high code stream utilization rate of a long GOP structure with the low delay of a short GOP structure, thereby achieving the technical effect of improving the utilization rate of the code stream in the video stream playing process and solving the technical problem that the video stream playing mode has a low utilization rate of the code stream.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of an application environment of an alternative video streaming method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a flow of an alternative video streaming method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an alternative video streaming method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an alternative video streaming method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an alternative video streaming method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an alternative video streaming method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an alternative video streaming method according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an alternative video stream playback device in accordance with an embodiment of the application;
FIG. 9 is a schematic diagram of an alternative video stream playback device in accordance with an embodiment of the present application;
Fig. 10 is a schematic structural view of an alternative electronic device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements, but may include other steps or elements not expressly listed or inherent to such a process, method, article, or apparatus.
According to an aspect of the embodiment of the present invention, a video stream playing method is provided. Optionally, as an optional implementation manner, the video stream playing method may be applied, but is not limited to, to the environment shown in fig. 1, which includes, but is not limited to, a user device 102, a network 110, and a server 112, where the user device 102 may include, but is not limited to, a display 108, a processor 106, and a memory 104.
The specific process comprises the following steps:
step S102, user equipment 102 obtains a first video playing request;
steps S104-S106, the user equipment 102 sends the first video playing request to the server 112 through the network 110;
step S108, the server 112 searches the database 114 for frame data corresponding to the plurality of reference frames requested to be played by the first video playing request, and processes the frame data by the processing engine 116, thereby generating a target video stream;
in steps S110-S112, the server 112 sends the target video stream to the user device 102 via the network 110, the processor 106 in the user device 102 displays the target video stream in the display 108, and stores the target video stream in the memory 104.
In addition to the example shown in fig. 1, the above steps may be performed independently by the user equipment 102, i.e., the steps of generating the target video stream, etc., are performed by the user equipment 102, thereby reducing the processing pressure of the server. The user device 102 includes, but is not limited to, a handheld device (e.g., a mobile phone), a notebook computer, a desktop computer, a vehicle-mounted device, etc., and the invention is not limited to a particular implementation of the user device 102.
Optionally, as an optional implementation manner, as shown in fig. 2, the video stream playing method includes:
s202, a first video playing request is obtained, wherein the first video playing request is used for requesting to play a target video from a first moment, and the target video is associated with a plurality of reference frames;
s204, responding to a first video playing request, determining a first video frame set and a second video frame set from a plurality of reference frames, wherein the second video frame set is a previous video frame set of the first video frame set, the first video frame set comprises a first type target video difference frame and a second type target video difference frame, the first type target video difference frame is a difference frame determined based on a target video key frame in the second video set, and the second type target video difference frame is a difference frame determined based on the first type target video difference frame;
s206, acquiring a first video difference frame matched with a first moment in a first video frame set, a first type target video difference frame and a target video key frame, wherein the first video difference frame belongs to a second type target video difference frame;
and S208, decoding the first video difference frame according to the target video key frame and the first type of target video difference frame to obtain a target video stream corresponding to the target video at the first moment.
Optionally, in this embodiment, the video stream playing method may be, but is not limited to being, used in a playing scene of a live stream. For example, in the related art, a video frame combination of the form I, P, P, P, P, P, P, P is generally adopted for playing a live stream, but an overly long video frame combination produces an obvious playing delay at the viewer end. Therefore, in this embodiment, the video frame combination is adjusted to a long video frame combination formed by a plurality of video frame sets; this mode of jointly forming one long video frame combination out of several short video frame combinations provides both low playing delay and a high code stream utilization rate.
Further by way of example, assume there is currently an 8-frame live stream. The related art would often store it as a video frame combination of video key frame I1 and video difference frames P2, P3, P4, P5, P6, P7 and P8. In this embodiment, the same live stream is instead stored as nine frames split into two (or more) video frame sets by means of a special difference frame (a first type target video difference frame). For example, as shown in fig. 3, video key frame I1 and video difference frames P2, P3 and P4 are located in the second video frame set 302, and video difference frames PI1, P5, P6, P7 and P8 are located in the first video frame set 304, where the video difference frame PI1 is the special difference frame (the first type target video difference frame);
Optionally, in the present embodiment as shown in fig. 3, solid arrows indicate the sequence relationship between video frames, for example, the moment of video key frame I1 immediately precedes that of video difference frame P2; dashed arrows represent the reference relationship between video frames, for example, video difference frame PI1 is a difference frame of video key frame I1. In this way, although the live stream is split into two or more video frame sets in this embodiment, the special difference frame (the video difference frame PI1) associates the different video frame sets with each other, so the live stream is still stored as a whole. The effect is equivalent, in code stream utilization rate, to the video frame combination of video key frame I1 and video difference frames P2 through P8, while using the video difference frame PI1 as another kind of video key frame overcomes the delay defect of that type of video frame combination.
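Under this split layout, the set of frames a server must deliver for a given join point can be sketched as follows. Frame labels follow the fig. 3 layout described above; the helper function itself is an illustrative assumption, not something described in the patent.

```python
second_set = ["I1", "P2", "P3", "P4"]          # key frame + ordinary P frames
first_set  = ["PI1", "P5", "P6", "P7", "P8"]   # special frame + ordinary P frames

def frames_to_send(join_frame: str) -> list:
    """Frames needed so the client can decode starting from join_frame."""
    if join_frame in second_set:
        # Ordinary case: the chain from the key frame up to the join frame.
        return second_set[: second_set.index(join_frame) + 1]
    # First set: prepend I1 so the special frame PI1 can be decoded,
    # then the chain from PI1 up to the join frame.
    return ["I1"] + first_set[: first_set.index(join_frame) + 1]

frames_to_send("P6")   # ['I1', 'PI1', 'P5', 'P6']
```

Joining anywhere in the first set costs at most one extra frame (I1) beyond a short GOP, instead of the full backlog of a long GOP.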
Alternatively, in this embodiment, the target video key frames may be, but not limited to, I frames, and the first type target video difference frames and the second type target video difference frames may be, but not limited to, P frames.
An I frame may be, but is not limited to, an intra-coded frame: an independent frame that carries all of its own information and can be encoded independently without reference to other images, i.e. entirely intra-frame. It can be simply understood as a still picture. The first frame in a video sequence is typically an I frame.
p-frames may be understood, but are not limited to, predictively encoded frames, requiring reference to previous frames for encoding. The difference between the current frame picture and the previous frame (the previous frame may be an I frame or a P frame). When decoding, the difference defined by the frame is overlapped by the picture cached before, and a final picture is generated. P frames typically occupy fewer data bits than I frames, but are very sensitive to transmission errors due to the complex dependence of P frames on previous P and I reference frames.
Alternatively, in the present embodiment, the second type of target video difference frame may be understood as, but not limited to, representing the difference of the current frame picture from the previous frame, and the first type of target video difference frame may be understood as, but not limited to, representing the difference of the current frame picture from the I frame (target video key frame).
It should be noted that, in the related art, the IPPPP combination is often limited in length; if more frames need to be stored, more I frames must be set in the video sequence, i.e. I1PPPP, I2PPPP……, and since I frames usually occupy more data bits, this naturally lowers the utilization rate of the code stream.
Optionally, in this embodiment, a special P frame is used to replace some of the I frames. As shown in fig. 4, the original video sequence stores video key frame I1, video difference frame P2, video key frame I2, video difference frame P4 and video difference frame P5, i.e. the number of video key frames is 2. With the video stream playing mode of this embodiment, a difference frame of video key frame I1 (the video difference frame PI1) replaces video key frame I2, so that the replaced video sequence retains the convenience of the short video key frame combination of the original sequence while reducing the number of video key frames, thereby further improving the utilization rate of the code stream in the video stream playing process.
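The bitstream saving from replacing a key frame with a special difference frame can be illustrated with made-up frame costs. The 10:1 I-to-P size ratio below is an assumption chosen for illustration only; real ratios vary with content and encoder settings.

```python
I_COST, P_COST = 10, 1   # assumed size units per frame type

def stream_cost(layout: str) -> int:
    """Total cost of a sequence given as 'I'/'P' markers, e.g. 'IPIPP'."""
    return sum(I_COST if marker == "I" else P_COST for marker in layout)

original  = stream_cost("IPIPP")   # I1 P2 I2 P4 P5, as in fig. 4
optimized = stream_cost("IPPPP")   # I2 replaced by the special frame PI1
# optimized < original: fewer key frames, higher code stream utilization
```

Under these assumed costs the five-frame sequence drops from 23 units to 14 while keeping a key-frame-equivalent entry point at the third frame.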
Optionally, in this embodiment, when the encoding layer encodes the special P frame, it is forced to take the inter-frame residual with the previous IDR frame (the first I frame) as the reference frame, so that the special P frame together with the previous I frame can reproduce a second key frame equivalent to the one in the normal mode; for the pull layer, the CDN or the interface layer may issue only the first IDR frame and then download the data starting from the special P frame, achieving a low-latency effect.
As a further example, assume the ordinary encoded stream is I1, P2, P3, P4, I2, P5, P6, P7, P8; if the user pulls from P5, the CDN issues the two frames I2 and P5 for rendering. With the optimized encoded stream of this embodiment, I1, P2, P3, P4, SP1, P5, P6, P7, P8, if the user pulls from P5, the CDN issues the three frames I1, SP1 and P5 for rendering; the decoding layer first decodes I1 and SP1 to generate a video frame equivalent to I2, and P5 then continues to be decoded using that equivalent I2 frame data.
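The pull logic in this example can be sketched by walking each frame's reference chain back to a self-contained frame; the `deps` maps and the `frames_to_issue` helper are hypothetical names modeling the example streams above, not a real CDN API:

```python
# Each frame maps to the frame it references (None for a self-contained
# IDR/I frame). To serve a mid-stream pull, the CDN must issue the pulled
# frame plus every frame on its reference chain.

def frames_to_issue(deps, pull_frame):
    """Return the minimal frames to send so pull_frame can be decoded."""
    chain = []
    frame = pull_frame
    while frame is not None:
        chain.append(frame)
        frame = deps[frame]  # follow the reference chain backwards
    return list(reversed(chain))

# ordinary stream: I1 P2 P3 P4 I2 P5 ...; pulling from P5 needs I2, P5
ordinary = {"I1": None, "P2": "I1", "P3": "P2", "P4": "P3",
            "I2": None, "P5": "I2"}
print(frames_to_issue(ordinary, "P5"))   # -> ['I2', 'P5']

# optimized stream: the special P frame SP1 replaces I2 and references
# I1 directly, so pulling from P5 needs I1, SP1, P5
optimized = {"I1": None, "P2": "I1", "P3": "P2", "P4": "P3",
             "SP1": "I1", "P5": "SP1"}
print(frames_to_issue(optimized, "P5"))  # -> ['I1', 'SP1', 'P5']
```

The optimized stream issues one extra frame at pull time, but the cheap SP1 replaces the expensive I2 in the stored sequence, which is where the code-stream savings come from.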
It should be noted that the first video frame set and the second video frame set divide the same plurality of original reference frames into two (or more) sets; by directly using the first type of target video difference frame as the difference frame of the target video key frame, each set is regarded as a GOP substructure, and the plurality of GOP substructures together form a complete GOP structure. That is, the method is equivalent to adjusting the original GOP structure IPPPP … to IPPP and P(I)PP …, combining the low play delay of the short GOP structure with the high code stream utilization rate of the long GOP structure.
As a further example, as shown in fig. 5, a first video playing request is optionally acquired on the interface of the live room list, as shown in (a) of fig. 5, where the first video playing request is used to request playing of a target video (i.e. the live view of the selected live room) from a first moment, and the target video is associated with a plurality of reference frames. In response to the first video playing request, a first video frame set and a second video frame set are determined from the plurality of reference frames, where the second video frame set is the video frame set preceding the first video frame set, the first video frame set includes a first type of target video difference frame and a second type of target video difference frame, the first type of target video difference frame is a difference frame determined based on the target video key frame in the second video frame set, and the second type of target video difference frame is a difference frame determined based on the first type of target video difference frame. A first video difference frame matching the first moment in the first video frame set is acquired, together with the first type of target video difference frame and the target video key frame, where the first video difference frame belongs to the second type of target video difference frame. The first video difference frame is decoded according to the target video key frame and the first type of target video difference frame to obtain the target video stream corresponding to the target video at the first moment; the target video stream is then displayed, as shown in (b) of fig. 5.
According to the embodiment provided by the application, a first video playing request is obtained, where the first video playing request is used to request playing of a target video from a first moment, and the target video is associated with a plurality of reference frames; in response to the first video playing request, a first video frame set and a second video frame set are determined from the plurality of reference frames, where the second video frame set is the video frame set preceding the first video frame set, the first video frame set includes a first type of target video difference frame and a second type of target video difference frame, the first type of target video difference frame is a difference frame determined based on the target video key frame in the second video frame set, and the second type of target video difference frame is a difference frame determined based on the first type of target video difference frame; a first video difference frame matching the first moment in the first video frame set is acquired, together with the first type of target video difference frame and the target video key frame, where the first video difference frame belongs to the second type of target video difference frame; and the first video difference frame is decoded according to the target video key frame and the first type of target video difference frame to obtain the target video stream corresponding to the target video at the first moment, thereby combining the high code stream utilization rate of the long GOP structure with the low delay of the short GOP structure, and realizing the technical effect of improving the utilization rate of the code stream during video stream playing.
As an alternative, decoding the first video difference frame according to the target video key frame and the first type target video difference frame to obtain a target video stream corresponding to the target video at the first time, including:
s1, decoding the first type of target video difference frame according to the target video key frame to obtain first video frame data corresponding to a second moment in the target video, wherein the first type of target video difference frame is used for representing the difference between the first video frame data and the video frame data corresponding to the target video key frame;
s2, decoding the first video difference frame according to the first type of target video difference frame to obtain second video frame data corresponding to the first moment in the target video;
and S3, integrating the first video frame data and the second video frame data to obtain a target video stream.
It should be noted that the first type of target video difference frame is decoded according to the target video key frame to obtain first video frame data corresponding to the second moment in the target video, where the first type of target video difference frame is used to represent the difference between the first video frame data and the video frame data corresponding to the target video key frame; the first video difference frame is decoded according to the first type of target video difference frame to obtain second video frame data corresponding to the first moment in the target video; and the first video frame data and the second video frame data are integrated to obtain the target video stream.
As a further example, as shown in fig. 3, the video difference frame PT1 (the first type of target video difference frame) is optionally decoded according to the video key frame T1 (the target video key frame) to obtain the first video frame data corresponding to the second moment in the target video; the first video difference frame (e.g. the video difference frame P5) is decoded according to the first type of target video difference frame to obtain the second video frame data corresponding to the first moment in the target video; and the first video frame data and the second video frame data are integrated to obtain the target video stream.
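Steps S1-S3 above can be sketched with an additive toy model of residual decoding; the frame data values and the `apply_diff` helper are illustrative assumptions, not the actual residual arithmetic of a real codec:

```python
# S1: decode the first-type difference frame (special P frame) against
#     the key frame; S2: decode the requested P frame against the result;
# S3: integrate both decoded pictures into the target video stream.

def apply_diff(base, diff):
    """Overlay a difference frame onto a decoded base picture."""
    return [b + d for b, d in zip(base, diff)]

key_frame = [10, 10, 10]  # target video key frame (I frame data)
sp_diff   = [2, -1, 0]    # first-type difference frame, vs the key frame
p_diff    = [0, 1, 1]     # second-type difference frame, vs the SP frame

first_video_frame_data = apply_diff(key_frame, sp_diff)                # S1
second_video_frame_data = apply_diff(first_video_frame_data, p_diff)   # S2
target_video_stream = [first_video_frame_data, second_video_frame_data]  # S3
print(target_video_stream)  # -> [[12, 9, 10], [12, 10, 11]]
```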
As an alternative, decoding the first video difference frame according to the first type of target video difference frame to obtain second video frame data corresponding to the first time in the target video, including:
s1, under the condition that the first moment is the next moment of the second moment, decoding the first video difference frame according to the first video frame data to obtain first sub-video frame data corresponding to the first moment in the target video, wherein the first video difference frame is used for representing the difference between the first sub-video frame data and the video frame data corresponding to the first type of target video difference frame; or
s2, decoding a second video difference frame corresponding to the third moment in the plurality of reference frames according to the first video frame data under the condition that the third moment is the next moment of the second moment and the first moment is the next moment of the third moment to obtain second sub-video frame data corresponding to the third moment in the target video, wherein the second video difference frame is used for representing the difference between the second sub-video frame data and the video frame data corresponding to the first type of target video difference frame; decoding a first video difference frame according to the second video difference frame to obtain third sub-video frame data corresponding to a second moment in the target video, wherein the first video difference frame is used for representing the difference between the third sub-video frame data and the video frame data corresponding to the second video difference frame; and integrating the second sub-video frame data and the third sub-video frame data to obtain second video frame data.
Under the condition that the first moment is the next moment of the second moment, decoding the first video difference frame according to the first video frame data to obtain first sub-video frame data corresponding to the first moment in the target video, wherein the first video difference frame is used for representing the difference between the first sub-video frame data and the video frame data corresponding to the first type of target video difference frame.
For further illustration, as shown in fig. 3, if the first video difference frame is the video difference frame P5, the time corresponding to the video difference frame P5 is the next time of the time corresponding to the video difference frame PI1 (the first type of target video difference frame), and the video difference frame P5 is therefore decoded according to the video difference frame PI1 to obtain the first sub-video frame data corresponding to the first time in the target video.
When the third time is the next time of the second time and the first time is the next time of the third time, decoding a second video difference frame corresponding to the third time in the plurality of reference frames according to the first video frame data to obtain second sub-video frame data corresponding to the third time in the target video, wherein the second video difference frame is used for representing a difference between the second sub-video frame data and video frame data corresponding to the first type of target video difference frame; decoding a first video difference frame according to the second video difference frame to obtain third sub-video frame data corresponding to a second moment in the target video, wherein the first video difference frame is used for representing the difference between the third sub-video frame data and the video frame data corresponding to the second video difference frame; and integrating the second sub-video frame data and the third sub-video frame data to obtain second video frame data.
For further illustration, as shown in fig. 3, if the first video difference frame is the video difference frame P6, the time corresponding to the video difference frame P6 is the next time of the time corresponding to the video difference frame P5 (the second video difference frame), and the time corresponding to the video difference frame P5 is the next time of the time corresponding to the video difference frame PI1 (the first type of target video difference frame); the video difference frame P5 is therefore first decoded according to the video difference frame PI1 to obtain the second sub-video frame data in the target video, and the video difference frame P6 is then decoded according to the video difference frame P5 to obtain the third sub-video frame data in the target video.
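The two cases above can be sketched as follows, using an additive difference model as a stand-in for real residual decoding; all frame data values and helper names are made up for illustration:

```python
# Case 1: the requested frame (P5) directly follows the special P frame,
#         so a single difference is applied to the decoded SP frame data.
# Case 2: the requested frame (P6) is one step further, so the
#         intermediate frame P5 is decoded first, then P6 from P5,
#         and both results are integrated.

def apply_diff(base, diff):
    """Overlay a difference frame onto a decoded base picture."""
    return [b + d for b, d in zip(base, diff)]

pi1_data = [12, 9, 10]  # first video frame data (decoded special P frame)
p5_diff  = [1, 1, 0]
p6_diff  = [0, 0, 2]

# case 1: pull from P5 (first moment is the next moment of the second)
p5_data = apply_diff(pi1_data, p5_diff)
print(p5_data)  # -> [13, 10, 10]

# case 2: pull from P6, so P5 is decoded as an intermediate step
p6_data = apply_diff(p5_data, p6_diff)
second_video_frame_data = [p5_data, p6_data]  # integrated result
print(second_video_frame_data)  # -> [[13, 10, 10], [13, 10, 12]]
```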
As an alternative, determining the first video frame set and the second video frame set from the plurality of reference frames includes:
s1, determining a target video frame set where a first video difference frame is located from a plurality of reference frames as a first target video frame set, wherein the first target video frame set comprises a first video frame set and a second video frame set, each target video frame set comprises a video frame of a key type and a plurality of video frames of a non-key type, the target video key frames are video frames of the key type, and the first target video difference frames and the second target video difference frames are video frames of the non-key type;
S2, determining a first video frame set and a second video frame set from the first target video frame set.
Alternatively, in this embodiment, since P frames (non-key video frames) have complex dependencies on previous P and I reference frames, they are very sensitive to transmission errors; if there is only one I frame and many P frames in the video sequence, the P frames occupy fewer data bits, but the fault tolerance of video stream playback to transmission errors is also reduced. In this regard, the first video frame set and the second video frame set are further organized into a higher-dimension set, i.e. the target video frame set, and each target video frame set is limited to include only one I frame; the video sequence, however, may include multiple target video frame sets, i.e. multiple I frames, so as to maintain a balance between fault tolerance to transmission errors and utilization of the code stream.
It should be noted that, determining, from a plurality of reference frames, a target video frame set in which a first video difference frame is located as a first target video frame set, where the first target video frame set includes a first video frame set and a second video frame set, each target video frame set includes a video frame of a key class and a plurality of video frames of a non-key class, the target video key frame is a video frame of a key class, and the first type target video difference frame and the second type target video difference frame are video frames of a non-key class; a first set of video frames and a second set of video frames are determined from the first set of target video frames.
Further by way of example, and optionally based on the scenario shown in fig. 3, continuing with the scenario shown in fig. 6, determining, from a plurality of reference frames, a set of target video frames in which a first video difference frame (e.g., video difference frame P5) is located as a first set of target video frames 602; a first set of video frames 302 and a second set of video frames 304 are determined from the first set of target video frames 602.
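The set lookup described above (locate the target video frame set, i.e. GOP, containing the requested difference frame, then split it at the special P frame into the second and first video frame sets) might be sketched like this; the frame names, the two-GOP layout, and the `startswith("SP")` split rule are assumptions for illustration:

```python
# Each target video frame set (GOP) holds exactly one key frame (I...)
# plus non-key frames; the special P frame (SP...) marks the boundary
# between the second video frame set and the first video frame set.

gops = [
    ["I1", "P2", "P3", "P4", "SP1", "P5", "P6", "P7", "P8"],
    ["I2", "P10", "P11", "P12", "SP2", "P13", "P14", "P15", "P16"],
]

def locate_sets(gops, frame):
    """Find the GOP containing `frame` and split it into the two sets."""
    for gop in gops:
        if frame in gop:
            split = next(i for i, f in enumerate(gop) if f.startswith("SP"))
            return gop[:split], gop[split:]  # second set, first set
    raise KeyError(frame)

second_set, first_set = locate_sets(gops, "P5")
print(second_set)  # -> ['I1', 'P2', 'P3', 'P4']
print(first_set)   # -> ['SP1', 'P5', 'P6', 'P7', 'P8']
```

Note that each GOP contains exactly one I frame, matching the fault-tolerance balance discussed above.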
As an alternative, the method further comprises:
s1, acquiring a second video playing request, wherein the second video playing request is used for requesting to play a target video from a fourth moment;
s2, responding to a second video playing request, determining a second target video frame set from a plurality of reference frames, wherein the second target video frame set comprises a third video frame set and a fourth video frame set, the fourth video frame set is a previous video frame set of the third video frame set, the third video frame set comprises a third category target video difference frame and a fourth category target video difference frame, the third category target video difference frame is a difference frame determined based on a video frame belonging to a key category in the fourth video frame set, and the fourth category target video difference frame is a difference frame determined based on the third category target video difference frame;
S3, obtaining a third video difference frame matched with a fourth time in a third video frame set, a third category target video difference frame and a video frame belonging to a key category in a fourth video frame set, wherein the third video difference frame belongs to a fourth category target video difference frame;
and S4, decoding the third video difference frame according to the video frames belonging to the key class and the third category target video difference frame in the fourth video frame set to obtain a video stream corresponding to the target video at the fourth time.
It should be noted that, a second video playing request is obtained, where the second video playing request is used to request to play the target video from the fourth moment; responding to a second video playing request, determining a second target video frame set from a plurality of reference frames, wherein the second target video frame set comprises a third video frame set and a fourth video frame set, the fourth video frame set is a previous video frame set of the third video frame set, the third video frame set comprises a third category target video difference frame and a fourth category target video difference frame, the third category target video difference frame is a difference frame determined based on a video frame belonging to a key category in the fourth video frame set, and the fourth category target video difference frame is a difference frame determined based on the third category target video difference frame; acquiring a third video difference frame matched with a fourth time in a third video frame set, and video frames belonging to a key class in a third category target video difference frame and a fourth video frame set, wherein the third video difference frame belongs to a fourth category target video difference frame; and decoding the third video difference frame according to the video frames belonging to the key class and the third category target video difference frame in the fourth video frame set to obtain a video stream corresponding to the target video at the fourth moment.
As a further example, based on the scenario shown in fig. 6 and continuing with fig. 7, a second video playing request may optionally be obtained, where the second video playing request is used to request playing of the target video from the fourth time; in response to the second video playing request, a second target video frame set 702 is determined from the plurality of reference frames, where the second target video frame set 702 includes a third video frame set 706 and a fourth video frame set 704, the fourth video frame set 704 is the video frame set preceding the third video frame set 706, the third video frame set 706 includes a third category target video difference frame (the video difference frame PI2) and fourth category target video difference frames (the video difference frames P13, P14, P15 and P16), and the third category target video difference frame is a difference frame determined based on the video frame belonging to the key category (the video key frame I2) in the fourth video frame set 704; a third video difference frame (e.g. the video difference frame P13) matching the fourth time in the third video frame set 706 is obtained, together with the third category target video difference frame (the video difference frame PI2) and the video frame belonging to the key category (the video key frame I2) in the fourth video frame set; and the third video difference frame (e.g. the video difference frame P13) is decoded according to the third category target video difference frame (the video difference frame PI2) and the video frame belonging to the key category (the video key frame I2) in the fourth video frame set, obtaining the video stream corresponding to the target video at the fourth moment.
As an alternative, before the first video playing request is acquired, the method includes:
s1, acquiring first video data corresponding to a target video at the first moment in a second video frame set, wherein the first video data is video frame data corresponding to a target video key frame;
s2, acquiring a data difference between the first video data and second video data at a time next to the first time, wherein the data difference is video frame data corresponding to a target video difference frame;
and S3, integrating the target video key frames and the target video difference frames according to the time sequence to obtain a second video frame set.
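The encoder-side steps S1-S3 above can be sketched as follows; the flat-list pictures and per-pixel subtraction are an illustrative stand-in for real residual computation, and `encode_set` is a hypothetical helper name:

```python
# S1: take the picture at the first moment as the key-frame data;
# S2: compute the data difference between each picture and the one
#     before it (the difference-frame data);
# S3: integrate key frame and difference frames in time order.

def encode_set(pictures):
    """Build a (key frame + difference frames) set from raw pictures."""
    key = list(pictures[0])                  # S1: key-frame data
    frames = [("I", key)]
    prev = key
    for pic in pictures[1:]:                 # S2: successive differences
        frames.append(("P", [c - p for c, p in zip(pic, prev)]))
        prev = pic
    return frames                            # S3: time-ordered set

pics = [[10, 10, 10], [11, 10, 8], [11, 13, 8]]
print(encode_set(pics))
# -> [('I', [10, 10, 10]), ('P', [1, 0, -2]), ('P', [0, 3, 0])]
```

Decoding is the inverse: superimposing each difference onto the previously decoded picture recovers the original pictures exactly.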
As an alternative, the method is applied to the target live broadcast application, and is characterized in that the method further comprises:
s1, acquiring a first video playing request triggered on a live client;
s2, responding to the first video playing request, and determining the playing time of the target video;
s3, playing the target video stream under the condition that the playing time is the first time;
and S4, playing the target video stream corresponding to the target video at the second moment under the condition that the playing moment is the second moment corresponding to the first type of target video difference frame.
Optionally, in this embodiment, the playing time of the target video may be determined in at least two ways: the first is to determine the candidate time closest to the current time at which the first video playing request is triggered, and play the target video stream from the video frame corresponding to that candidate time; the second is to determine the candidate key time closest to the current time at which the first video playing request is triggered, and play the target video stream corresponding to the target video at the second time, i.e. from the video key frame corresponding to that candidate key time.
It should be noted that, a first video playing request triggered on a live client is obtained; responding to the first video playing request, and determining the playing time of the target video; playing the target video stream under the condition that the playing time is the first time; and under the condition that the playing time is a second time corresponding to the first type of target video difference frame, playing the target video stream corresponding to the target video at the second time.
Further by way of example, as shown in fig. 5, optionally, a first video play request triggered on the live client is obtained, as shown in (a) in fig. 5; responding to the first video playing request, and determining the playing time of the target video; in the case where the play time is the first time, the target video stream is played as shown in (b) of fig. 5; in the case where the playing time is the second time corresponding to the first type of target video difference frame, the target video stream corresponding to the target video at the second time is played, as shown in (b) of fig. 5.
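The two ways of determining the playing time described above can be sketched as follows; the timestamps and the `nearest` helper are illustrative assumptions:

```python
# Way 1: snap to the candidate time closest to the request time.
# Way 2: snap to the closest candidate key time (a time that carries
#        a key frame), so playback can start without extra references.

def nearest(times, t):
    """Return the timestamp in `times` closest to `t`."""
    return min(times, key=lambda x: abs(x - t))

frame_times = [0, 1, 2, 3, 4, 5, 6, 7, 8]  # one entry per reference frame
key_times   = [0, 4]                        # times carrying key frames

request_time = 5.4
print(nearest(frame_times, request_time))   # -> 5 (closest candidate time)
print(nearest(key_times, request_time))     # -> 4 (closest key time)
```

Way 1 starts playback closer to the requested moment but may need a longer reference chain; way 2 starts from a key frame at the cost of a small offset.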
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
According to another aspect of the embodiment of the present invention, there is also provided a video stream playing device for implementing the video stream playing method. As shown in fig. 8, the apparatus includes:
a first obtaining unit 802, configured to obtain a first video playing request, where the first video playing request is used to request to play a target video from a first moment, and the target video is associated with a plurality of reference frames;
a first determining unit 804, configured to determine, in response to a first video playing request, a first video frame set and a second video frame set from a plurality of reference frames, where the second video frame set is a previous video frame set of the first video frame set, the first video frame set includes a first type of target video difference frame and a second type of target video difference frame, the first type of target video difference frame is a difference frame determined based on a target video key frame in the second video set, and the second type of target video difference frame is a difference frame determined based on the first type of target video difference frame;
a second obtaining unit 806, configured to obtain a first video difference frame that is matched with the first time in the first video frame set, and a first type of target video difference frame and a target video key frame, where the first video difference frame belongs to a second type of target video difference frame;
The first decoding unit 808 is configured to decode the first video difference frame according to the target video key frame and the first type target video difference frame, so as to obtain a target video stream corresponding to the target video at the first time.
Optionally, in this embodiment, the video stream playing device may be used, but is not limited to being used, in a playing scene of a live stream. For example, in the related art, a video frame combination of I, P, P, P, P, P, … is generally adopted for playing a live stream, but an overlong video frame combination causes obvious playing delay at the viewer end; therefore, in this embodiment, the video frame combination is adjusted into a plurality of video frame sets, and a long video frame combination is formed from a plurality of short video frame combinations, giving consideration to both low playing delay and high code stream utilization rate.
It should be noted that, in the related art, the IPPPP combination is often limited in length; if more frames need to be stored, more I frames need to be set in the video sequence, i.e. I1PPPP, I2PPPP, and so on, and since I frames usually occupy more data bits, this naturally reduces the utilization rate of the code stream.
It should be noted that the first video frame set and the second video frame set divide the same plurality of original reference frames into two (or more) sets; by directly using the first type of target video difference frame as the difference frame of the target video key frame, each set is regarded as a GOP substructure, and the plurality of GOP substructures together form a complete GOP structure. That is, the method is equivalent to adjusting the original GOP structure IPPPP … to IPPP and P(I)PP …, combining the low play delay of the short GOP structure with the high code stream utilization rate of the long GOP structure.
For specific embodiments, reference may be made to the examples described in the video stream playing method; details are not repeated here.
According to the embodiment provided by the application, a first video playing request is obtained, wherein the first video playing request is used for requesting to play a target video from a first moment, and the target video is associated with a plurality of reference frames; responding to a first video playing request, determining a first video frame set and a second video frame set from a plurality of reference frames, wherein the second video frame set is a previous video frame set of the first video frame set, the first video frame set comprises a first type target video difference frame and a second type target video difference frame, the first type target video difference frame is a difference frame determined based on a target video key frame in the second video set, and the second type target video difference frame is a difference frame determined based on the first type target video difference frame; acquiring a first video difference frame matched with a first moment in a first video frame set, and a first type target video difference frame and a target video key frame, wherein the first video difference frame belongs to a second type target video difference frame; and decoding the first video difference frame according to the target video key frame and the first type of target video difference frame to obtain a target video stream corresponding to the target video at the first moment, so that the aim of taking into account the advantages of high code stream utilization rate of a long GOP structure and low delay of a short GOP structure is fulfilled, and the technical effect of improving the utilization rate of the code stream in the video stream playing process is realized.
As an alternative, the first decoding unit 808 includes:
the first decoding module is used for decoding first-class target video difference frames according to the target video key frames to obtain first video frame data corresponding to second moment in the target video, wherein the first-class target video difference frames are used for representing differences between the first video frame data and the video frame data corresponding to the target video key frames;
the second decoding module is used for decoding the first video difference frame according to the first type of target video difference frame to obtain second video frame data corresponding to the first moment in the target video;
and the integration module is used for integrating the first video frame data and the second video frame data to obtain a target video stream.
For specific embodiments, reference may be made to the examples described in the video stream playing method; details are not repeated here.
As an alternative, the second decoding module includes:
the first decoding sub-module is used for decoding a first video difference frame according to the first video frame data under the condition that the first moment is the next moment of the second moment to obtain first sub-video frame data corresponding to the first moment in the target video, wherein the first video difference frame is used for representing the difference between the first sub-video frame data and the video frame data corresponding to the first type of target video difference frame; or
The second decoding sub-module is used for decoding a second video difference frame corresponding to the third moment in the plurality of reference frames according to the first video frame data under the condition that the third moment is the next moment of the second moment and the first moment is the next moment of the third moment to obtain second sub-video frame data corresponding to the third moment in the target video, wherein the second video difference frame is used for representing the difference between the second sub-video frame data and the video frame data corresponding to the first type of target video difference frame; decoding a first video difference frame according to the second video difference frame to obtain third sub-video frame data corresponding to a second moment in the target video, wherein the first video difference frame is used for representing the difference between the third sub-video frame data and the video frame data corresponding to the second video difference frame; and integrating the second sub-video frame data and the third sub-video frame data to obtain second video frame data.
For specific embodiments, reference may be made to the examples described in the video stream playing method; details are not repeated here.
As an alternative, the first determining unit 804 includes:
the first determining module is used for determining a target video frame set where a first video difference frame is located from a plurality of reference frames as a first target video frame set, wherein the first target video frame set comprises a first video frame set and a second video frame set, each target video frame set comprises a video frame of a key class and a plurality of video frames of a non-key class, the target video key frames are video frames of the key class, and the first target video difference frames and the second target video difference frames are video frames of the non-key class;
And the second determining module is used for determining a first video frame set and a second video frame set from the first target video frame set.
For specific embodiments, reference may be made to the examples described in the video stream playing method above; details are not repeated here.
As an alternative, the apparatus further includes:
a third obtaining unit, configured to obtain a second video playing request, where the second video playing request is used to request playing of the target video from a fourth time;
the second determining unit is used for responding to a second video playing request and determining a second target video frame set from a plurality of reference frames, wherein the second target video frame set comprises a third video frame set and a fourth video frame set, the fourth video frame set is a previous video frame set of the third video frame set, the third video frame set comprises a third category target video difference frame and a fourth category target video difference frame, the third category target video difference frame is a difference frame determined based on a video frame belonging to a key category in the fourth video frame set, and the fourth category target video difference frame is a difference frame determined based on the third category target video difference frame;
a fourth obtaining unit, configured to obtain a third video difference frame that is matched with a fourth time in a third video frame set, and a third category target video difference frame and a video frame belonging to a key category in a fourth video frame set, where the third video difference frame belongs to a fourth category target video difference frame;
And the second decoding unit is used for decoding the third video difference frame according to the video frames belonging to the key class and the third category target video difference frame in the fourth video frame set to obtain a video stream corresponding to the target video at the fourth time.
For specific embodiments, reference may be made to the examples described in the video stream playing method above; details are not repeated here.
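The second determining unit's task of locating which target video frame set covers a requested playing time can be sketched as below. The representation of a set by its start time is an assumption made for illustration; the patent does not prescribe how sets are indexed.

```python
from bisect import bisect_right

def select_frame_set(set_start_times: list[int], seek_time: int) -> int:
    """Return the index of the frame set whose time span covers seek_time.
    Assumes set_start_times is sorted ascending and the first set starts
    at or before every valid seek_time."""
    return max(bisect_right(set_start_times, seek_time) - 1, 0)
```

Once the covering set is found, its predecessor supplies the key frame needed to decode the matched difference frame.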
As an alternative, as shown in fig. 9, the apparatus includes:
a fifth obtaining unit 902, configured to obtain, before obtaining a first video play request, first video data corresponding to a first moment of a target video in a second video frame set, where the first video data is video frame data corresponding to a target video key frame;
a sixth obtaining unit 904, configured to obtain, before obtaining the first video play request, a data difference between the first video data and second video data at a time next to the first time, where the data difference is video frame data corresponding to the target video difference frame;
the integrating unit 906 is configured to integrate the target video key frame and the target video difference frame according to the time sequence before the first video playing request is acquired, so as to obtain a second video frame set.
For specific embodiments, reference may be made to the examples described in the video stream playing method above; details are not repeated here.
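The encoding side described by units 902, 904 and 906 (store one key frame, then per-step data differences, assembled in time order) might look like the following sketch. The tuple layout and the byte-wise XOR difference are assumptions for illustration, not the patent's actual format.

```python
def build_frame_set(raw_frames: list[bytes]) -> list[tuple[str, bytes]]:
    """Encode the first frame as a key frame and each later frame as the
    byte-wise XOR difference from its predecessor, in time order."""
    frame_set = [("key", raw_frames[0])]
    for prev, cur in zip(raw_frames, raw_frames[1:]):
        frame_set.append(("diff", bytes(p ^ c for p, c in zip(prev, cur))))
    return frame_set
```

Each "diff" entry carries only the data difference between adjacent moments, which is what allows the player to start decoding from a nearby key frame rather than the stream head.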
As an alternative, applied to the target live broadcast application, the apparatus further includes:
a seventh obtaining unit, configured to obtain a first video playing request triggered on the live client;
a third determining unit, configured to determine a playing time of the target video in response to the first video playing request;
the first playing unit is used for playing the target video stream under the condition that the playing time is the first time;
and the second playing unit is used for playing the target video stream corresponding to the target video at the second moment under the condition that the playing moment is the second moment corresponding to the first type of target video difference frame.
For specific embodiments, reference may be made to the examples described in the video stream playing method above; details are not repeated here.
According to a further aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the video stream playing method described above, as shown in fig. 10, the electronic device comprising a memory 1002 and a processor 1004, the memory 1002 having stored therein a computer program, the processor 1004 being arranged to perform the steps of any of the method embodiments described above by means of the computer program.
Alternatively, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of the computer network.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
S1, acquiring a first video playing request, wherein the first video playing request is used for requesting to play a target video from a first moment, and the target video is associated with a plurality of reference frames;
S2, responding to a first video playing request, determining a first video frame set and a second video frame set from a plurality of reference frames, wherein the second video frame set is a previous video frame set of the first video frame set, the first video frame set comprises a first type target video difference frame and a second type target video difference frame, the first type target video difference frame is a difference frame determined based on a target video key frame in the second video set, and the second type target video difference frame is a difference frame determined based on the first type target video difference frame;
S3, acquiring a first video difference frame matched with a first moment in a first video frame set, a first type target video difference frame and a target video key frame, wherein the first video difference frame belongs to a second type target video difference frame;
and S4, decoding the first video difference frame according to the target video key frame and the first type of target video difference frame to obtain a target video stream corresponding to the target video at the first moment.
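Steps S1 through S4 above can be condensed into a toy decoder. The index-based seek and the byte-wise XOR difference are assumptions for illustration only; the patent leaves the concrete diff encoding open.

```python
def play_from(seek_index: int, key_frame: bytes, diffs: list[bytes]) -> bytes:
    """S1-S4 in miniature: starting from the key frame, apply each difference
    frame in order up to the requested position and return that frame.
    diffs[i] is assumed to turn frame i into frame i+1 via byte-wise XOR."""
    frame = key_frame
    for diff in diffs[:seek_index]:
        frame = bytes(a ^ b for a, b in zip(frame, diff))
    return frame
```

Note that only the key frame and the difference frames up to the seek position are touched, which is the point of grouping reference frames into sets.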
Alternatively, it will be understood by those skilled in the art that the structure shown in fig. 10 is only schematic, and the electronic device may also be a terminal device such as a smart phone (e.g. an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, a mobile internet device (Mobile Internet Devices, MID), a PAD, etc. Fig. 10 does not limit the structure of the electronic device described above. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in fig. 10, or have a different configuration than shown in fig. 10.
The memory 1002 may be configured to store software programs and modules, such as program instructions/modules corresponding to the video stream playing method and apparatus in the embodiment of the present invention, and the processor 1004 executes the software programs and modules stored in the memory 1002 to perform various functional applications and data processing, that is, implement the video stream playing method described above. The memory 1002 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory. In some examples, the memory 1002 may further include memory located remotely from the processor 1004, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1002 may be used for storing information such as, but not limited to, a first set of video frames, a second set of video frames, a target video key frame, and a target video stream. As an example, as shown in fig. 10, the memory 1002 may include, but is not limited to, the first acquisition unit 802, the first determination unit 804, the second acquisition unit 806, and the first decoding unit 808 in the video stream playing device. In addition, other module units in the video stream playing device may be included but not limited to the above, and will not be described in detail in this example.
Optionally, the transmission device 1006 is configured to receive or transmit data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission device 1006 includes a network adapter (Network Interface Controller, NIC) that can be connected to other network devices and routers via a network cable to communicate with the internet or a local area network. In one example, the transmission device 1006 is a Radio Frequency (RF) module for communicating with the internet wirelessly.
In addition, the electronic device further includes: a display 1008, configured to display the first set of video frames, the second set of video frames, the target video key frame, the target video stream, and other information; and a connection bus 1010 for connecting the respective module parts in the above-described electronic apparatus.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting a plurality of nodes through network communication. The nodes may form a Peer-To-Peer (P2P) network, and any type of computing device, such as a server or a terminal, may become a node in the blockchain system by joining the peer-to-peer network.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. A processor of a computer device reads the computer instructions from the computer readable storage medium and executes them, causing the computer device to perform the video stream playing method described above, wherein the computer program is arranged to execute the steps of any of the method embodiments described above when run.
Alternatively, in the present embodiment, the above-described computer-readable storage medium may be configured to store a computer program for executing the steps of:
S1, acquiring a first video playing request, wherein the first video playing request is used for requesting to play a target video from a first moment, and the target video is associated with a plurality of reference frames;
S2, responding to a first video playing request, determining a first video frame set and a second video frame set from a plurality of reference frames, wherein the second video frame set is a previous video frame set of the first video frame set, the first video frame set comprises a first type target video difference frame and a second type target video difference frame, the first type target video difference frame is a difference frame determined based on a target video key frame in the second video set, and the second type target video difference frame is a difference frame determined based on the first type target video difference frame;
S3, acquiring a first video difference frame matched with a first moment in a first video frame set, a first type target video difference frame and a target video key frame, wherein the first video difference frame belongs to a second type target video difference frame;
and S4, decoding the first video difference frame according to the target video key frame and the first type of target video difference frame to obtain a target video stream corresponding to the target video at the first moment.
Alternatively, in this embodiment, it will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be performed by a program instructing hardware related to a terminal device, where the program may be stored in a computer readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and the like.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing one or more computer devices (which may be personal computers, servers or network devices, etc.) to perform all or part of the steps of the method described in the embodiments of the present invention.
In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In several embodiments provided by the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary; the division of the units is merely a logical function division, and there may be other divisions in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed between the components may be through some interfaces, units or modules, and may be in electrical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and improvements may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and improvements shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A video stream playing method, comprising:
acquiring a first video playing request, wherein the first video playing request is used for requesting to play a target video from a first moment, and the target video is associated with a plurality of reference frames;
responding to the first video playing request, determining a first video frame set and a second video frame set from the plurality of reference frames, wherein the second video frame set is a previous video frame set of the first video frame set, the first video frame set comprises a first type target video difference frame and a second type target video difference frame, the first type target video difference frame is a difference frame determined based on a target video key frame in the second video set, and the second type target video difference frame is a difference frame determined based on the first type target video difference frame;
Acquiring a first video difference frame matched with the first moment in the first video frame set, and the first type target video difference frame and the target video key frame, wherein the first video difference frame belongs to the second type target video difference frame;
and decoding the first video difference frame according to the target video key frame and the first type target video difference frame to obtain a target video stream corresponding to the target video at the first moment.
2. The method according to claim 1, wherein decoding the first video difference frame according to the target video key frame and the first type target video difference frame to obtain a target video stream corresponding to the target video at the first time includes:
decoding the first type of target video difference frames according to the target video key frames to obtain first video frame data corresponding to the second moment in the target video decoded by the first type of target video difference frames, wherein the first type of target video difference frames are used for representing differences between the first video frame data and the video frame data corresponding to the target video key frames;
Decoding the first video difference frame according to the first type of target video difference frame to obtain second video frame data corresponding to the first moment in the target video;
and integrating the first video frame data and the second video frame data to obtain the target video stream.
3. The method according to claim 2, wherein decoding the first video difference frame according to the first type of target video difference frame to obtain second video frame data corresponding to the first time in the target video includes:
decoding the first video difference frame according to the first video frame data under the condition that the first time is the next time of the second time to obtain first sub-video frame data corresponding to the first time in the target video, wherein the first video difference frame is used for representing the difference between the first sub-video frame data and video frame data corresponding to the first type of target video difference frame; or
decoding a second video difference frame corresponding to the third moment in the plurality of reference frames according to the first video frame data when the third moment is the next moment in the second moment and the first moment is the next moment in the third moment, so as to obtain second sub-video frame data corresponding to the third moment in the target video, wherein the second video difference frame is used for representing the difference between the second sub-video frame data and the video frame data corresponding to the first type of target video difference frame; decoding the first video difference frame according to the second video difference frame to obtain third sub-video frame data corresponding to the second moment in the target video, wherein the first video difference frame is used for representing the difference between the third sub-video frame data and the video frame data corresponding to the second video difference frame; and integrating the second sub-video frame data and the third sub-video frame data to obtain the second video frame data.
4. The method of claim 1, wherein determining a first set of video frames and a second set of video frames from the plurality of reference frames comprises:
determining a target video frame set in which the first video difference frame is located as a first target video frame set from the plurality of reference frames, wherein the first target video frame set comprises the first video frame set and the second video frame set, each target video frame set comprises one video frame of a key class and a plurality of video frames of a non-key class, the target video key frame is a video frame of the key class, and the first type target video difference frame and the second type target video difference frame are video frames of the non-key class;
and determining the first video frame set and the second video frame set from the first target video frame set.
5. The method according to claim 4, wherein the method further comprises:
acquiring a second video playing request, wherein the second video playing request is used for requesting to play the target video from a fourth moment;
determining a second target video frame set from the plurality of reference frames in response to the second video playing request, wherein the second target video frame set comprises a third video frame set and a fourth video frame set, the fourth video frame set is a previous video frame set of the third video frame set, the third video frame set comprises a third category target video difference frame and a fourth category target video difference frame, the third category target video difference frame is a difference frame determined based on a video frame belonging to the key category in the fourth video frame set, and the fourth category target video difference frame is a difference frame determined based on the third category target video difference frame;
acquiring a third video difference frame matched with the fourth time in the third video frame set, and the third category target video difference frame and the video frame belonging to the key class in the fourth video frame set, wherein the third video difference frame belongs to the fourth category target video difference frame;
and decoding the third video difference frame according to the video frames belonging to the key class and the third category target video difference frame in the fourth video frame set to obtain a video stream corresponding to the target video at the fourth time.
6. The method according to any one of claims 1 to 5, comprising, prior to said obtaining a first video play request:
acquiring first video data corresponding to the first moment of the target video in the second video frame set, wherein the first video data is video frame data corresponding to the target video key frame;
acquiring a data difference between the first video data and second video data at a time next to the first time, wherein the data difference is video frame data corresponding to a target video difference frame;
and integrating the target video key frame and the target video difference frame according to a time sequence to obtain the second video frame set.
7. The method according to any of claims 1 to 5, applied to a target live application, wherein the method further comprises:
acquiring the first video playing request triggered on the live client;
responding to the first video playing request, and determining the playing time of the target video;
playing the target video stream under the condition that the playing time is the first time;
and under the condition that the playing time is a second time corresponding to the first type target video difference frame, playing the target video stream corresponding to the target video at the second time.
8. A video stream playback device, comprising:
the first acquisition unit is used for acquiring a first video playing request, wherein the first video playing request is used for requesting to play a target video from a first moment, and the target video is associated with a plurality of reference frames;
a first determining unit, configured to determine, in response to the first video play request, a first video frame set and a second video frame set from the plurality of reference frames, where the second video frame set is a previous video frame set of the first video frame set, the first video frame set includes a first type target video difference frame and a second type target video difference frame, the first type target video difference frame is a difference frame determined based on a target video key frame in the second video set, and the second type target video difference frame is a difference frame determined based on the first type target video difference frame;
The second obtaining unit is used for obtaining a first video difference frame matched with the first moment in the first video frame set, the first type target video difference frame and the target video key frame, wherein the first video difference frame belongs to the second type target video difference frame;
and the first decoding unit is used for decoding the first video difference frame according to the target video key frame and the first type target video difference frame to obtain a target video stream corresponding to the target video at the first moment.
9. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored program, wherein the program when run performs the method of any of the preceding claims 1 to 7.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of the claims 1 to 7 by means of the computer program.
CN202111052506.3A 2021-09-08 2021-09-08 Video stream playing method and device, storage medium and electronic equipment Active CN113873271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111052506.3A CN113873271B (en) 2021-09-08 2021-09-08 Video stream playing method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111052506.3A CN113873271B (en) 2021-09-08 2021-09-08 Video stream playing method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN113873271A CN113873271A (en) 2021-12-31
CN113873271B true CN113873271B (en) 2023-08-11

Family

ID=78995008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111052506.3A Active CN113873271B (en) 2021-09-08 2021-09-08 Video stream playing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113873271B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112040233A (en) * 2020-11-04 2020-12-04 北京金山云网络技术有限公司 Video encoding method, video decoding method, video encoding device, video decoding device, electronic device, and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6961377B2 (en) * 2002-10-28 2005-11-01 Scopus Network Technologies Ltd. Transcoder system for compressed digital video bitstreams
US10142707B2 (en) * 2016-02-25 2018-11-27 Cyberlink Corp. Systems and methods for video streaming based on conversion of a target key frame

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112040233A (en) * 2020-11-04 2020-12-04 北京金山云网络技术有限公司 Video encoding method, video decoding method, video encoding device, video decoding device, electronic device, and storage medium

Also Published As

Publication number Publication date
CN113873271A (en) 2021-12-31

Similar Documents

Publication Publication Date Title
JP6979035B2 (en) How to Improve Streaming of Virtual Reality Media Content, Devices and Computer Programs
KR101467430B1 (en) Method and system for providing application based on cloud computing
US10516903B2 (en) Method and apparatus for transmitting video data
KR102099357B1 (en) Device, system and method for providing screen shot
CN110392269B (en) Media data processing method and device and media data playing method and device
US10645425B2 (en) Method and device for managing multimedia data
US10277927B2 (en) Movie package file format
WO2018014691A1 (en) Method and device for acquiring media data
US10104143B1 (en) Manifest segmentation
US10116719B1 (en) Customized dash manifest
TWI786572B (en) Immersive media providing method and acquiring method, device, equipment and storage medium
US9390274B2 (en) Media data processing method and apparatus
CN113141522B (en) Resource transmission method, device, computer equipment and storage medium
CN109040786A (en) Transmission method, device, system and the storage medium of camera data
CN111193936B (en) Video streaming transmission method and device, electronic equipment and computer readable storage medium
CN112423140A (en) Video playing method and device, electronic equipment and storage medium
CN116567228A (en) Encoding method, real-time communication method, apparatus, device and storage medium
CN116546210A (en) Live broadcast data processing method, device, computer equipment and storage medium
US10524017B2 (en) Dynamic generation of trick mode data
CN113873271B (en) Video stream playing method and device, storage medium and electronic equipment
CN112291591A (en) Video data playback method, electronic equipment and storage medium
US11356722B2 (en) System for distributing an audiovisual content
JP5739079B1 (en) Movie compression apparatus and movie compression / decompression system
CN112351276B (en) Video encoding method and device and video decoding method and device
WO2022100742A1 (en) Video encoding and video playback method, apparatus and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant