CN113873271A - Video stream playing method and device, storage medium and electronic equipment - Google Patents

Video stream playing method and device, storage medium and electronic equipment Download PDF

Info

Publication number
CN113873271A
CN113873271A CN202111052506.3A CN202111052506A CN113873271A CN 113873271 A CN113873271 A CN 113873271A CN 202111052506 A CN202111052506 A CN 202111052506A CN 113873271 A CN113873271 A CN 113873271A
Authority
CN
China
Prior art keywords
video
frame
difference
target video
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111052506.3A
Other languages
Chinese (zh)
Other versions
CN113873271B (en
Inventor
刘海涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Fanxing Huyu IT Co Ltd
Original Assignee
Guangzhou Fanxing Huyu IT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Fanxing Huyu IT Co Ltd filed Critical Guangzhou Fanxing Huyu IT Co Ltd
Priority to CN202111052506.3A priority Critical patent/CN113873271B/en
Publication of CN113873271A publication Critical patent/CN113873271A/en
Application granted granted Critical
Publication of CN113873271B publication Critical patent/CN113873271B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/438Interfacing the downstream path of the transmission network originating from a server, e.g. retrieving MPEG packets from an IP network
    • H04N21/4382Demodulation or channel decoding, e.g. QPSK demodulation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream

Abstract

The invention discloses a video stream playing method, a video stream playing device, a storage medium and electronic equipment. Wherein, the method comprises the following steps: acquiring a first video playing request, wherein the first video playing request is used for requesting to play a target video from a first moment, and the target video is associated with a plurality of reference frames; responding to a first video playing request, and determining a first video frame set and a second video frame set from a plurality of reference frames; acquiring a first video difference frame matched with a first moment in a first video frame set, a first type of target video difference frame and a target video key frame; and decoding the first video difference frame according to the target video key frame and the first type of target video difference frame to obtain a target video stream corresponding to the target video at the first moment. The invention solves the technical problem of low utilization rate of the code stream in the video stream playing mode.

Description

Video stream playing method and device, storage medium and electronic equipment
Technical Field
The invention relates to the field of computers, in particular to a video stream playing method, a video stream playing device, a storage medium and electronic equipment.
Background
In a video stream playing scene, related technologies often adopt GOP (group of pictures) to encode a video frame group, because GOP is a video frame combination of the ippppp.
However, an excessively long GOP may cause a delay problem, for example, if the GOP takes 10 seconds, the viewer end pulls the video stream for 9 seconds, and for the integrity of decoding, the viewer end before 9 seconds needs to be pulled to start decoding, that is, a delay of about 9 seconds is generated. That is, the GOP length is limited, which further causes a problem that the utilization rate of the video stream playing mode in the related art to the code stream is low.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a video stream playing method and device, a storage medium and electronic equipment, which at least solve the technical problem that the utilization rate of a video stream playing mode to a code stream is low.
According to an aspect of the embodiments of the present invention, there is provided a video stream playing method, including: acquiring a first video playing request, wherein the first video playing request is used for requesting to play a target video from a first moment, and the target video is associated with a plurality of reference frames; determining a first video frame set and a second video frame set from the plurality of reference frames in response to the first video playing request, wherein the second video frame set is a previous video frame set of the first video frame set, the first video frame set comprises a first type of target video difference frame and a second type of target video difference frame, the first type of target video difference frame is a difference frame determined based on target video key frames in the second video set, and the second type of target video difference frame is a difference frame determined based on the first type of target video difference frame; acquiring a first video difference frame matched with the first time in the first video frame set, the first type of target video difference frame and the target video key frame, wherein the first video difference frame belongs to the second type of target video difference frame; and decoding the first video difference frame according to the target video key frame and the first type of target video difference frame to obtain a target video stream corresponding to the target video at the first moment.
According to another aspect of the embodiments of the present invention, there is also provided a video stream playing apparatus, including: a first obtaining unit, configured to obtain a first video playing request, where the first video playing request is used to request that a target video starts to be played at a first time, and the target video is associated with multiple reference frames; a first determining unit, configured to determine, in response to the first video playing request, a first video frame set and a second video frame set from the plurality of reference frames, where the second video frame set is a previous video frame set of the first video frame set, the first video frame set includes a first type of target video difference frame and a second type of target video difference frame, the first type of target video difference frame is a difference frame determined based on a target video key frame in the second video set, and the second type of target video difference frame is a difference frame determined based on the first type of target video difference frame; a second obtaining unit, configured to obtain a first video difference frame, the first type of target video difference frame, and the target video key frame, which are matched with the first time point, in the first video frame set, where the first video difference frame belongs to the second type of target video difference frame; a first decoding unit, configured to decode the first video difference frame according to the target video key frame and the first-class target video difference frame, so as to obtain a target video stream corresponding to the target video at the first time.
As an alternative, the first decoding unit includes: a first decoding module, configured to decode the first-class target video differential frame according to the target video key frame to obtain first video frame data corresponding to a second moment when the first-class target video differential frame is decoded in the target video, where the first-class target video differential frame is used to represent a difference between the first video frame data and video frame data corresponding to the target video key frame; a second decoding module, configured to decode the first video difference frame according to the first type of target video difference frame, so as to obtain second video frame data corresponding to the first time in the target video; and the integration module is used for integrating the first video frame data and the second video frame data to obtain the target video stream.
As an alternative, the second decoding module includes: a first decoding submodule, configured to decode the first video difference frame according to the first video frame data to obtain first sub-video frame data corresponding to the first time in the target video when the first time is a time next to the second time, where the first video difference frame is used to indicate a difference between the first sub-video frame data and video frame data corresponding to the first type of target video difference frame; or, a second decoding submodule, configured to, when a third time is a time next to the second time and the first time is a time next to the third time, decode a second video difference frame corresponding to the third time in the plurality of reference frames according to the first video frame data, to obtain second sub-video frame data corresponding to the third time in the target video, where the second video difference frame is used to indicate a difference between the second sub-video frame data and video frame data corresponding to the first type of target video difference frame; decoding the first video difference frame according to the second video difference frame to obtain third sub-video frame data corresponding to the second time in the target video, wherein the first video difference frame is used for representing a difference between the third sub-video frame data and video frame data corresponding to the second video difference frame; and integrating the second sub-video frame data and the third sub-video frame data to obtain the second video frame data.
As an optional solution, the first determining unit includes: a first determining module, configured to determine, from the multiple reference frames, that a target video frame set in which the first video difference frame is located is a first target video frame set, where the first target video frame set includes the first video frame set and the second video frame set, each of the target video frame sets includes a key-type video frame and multiple non-key-type video frames, the target video key frame is a video frame of the key type, and the first-type target video difference frame and the second-type target video difference frame are video frames of the non-key type; a second determining module, configured to determine the first video frame set and the second video frame set from the first target video frame set.
As an optional solution, the apparatus further includes: a third obtaining unit, configured to obtain a second video playing request, where the second video playing request is used to request that the target video starts to be played at a fourth time; a second determining unit, configured to determine a second target video frame set from the plurality of reference frames in response to the second video playing request, where the second target video frame set includes a third video frame set and a fourth video frame set, the fourth video frame set is a previous video frame set of the third video frame set, the third video frame set includes a third type target video difference frame and a fourth type target video difference frame, the third type target video difference frame is a difference frame determined based on a video frame belonging to the key class in the fourth video frame set, and the fourth type target video difference frame is a difference frame determined based on the third type target video difference frame; a fourth obtaining unit, configured to obtain a third video difference frame in the third video frame set, where the third video difference frame matches the fourth time, and video frames in the third type target video difference frame and the fourth video frame set that belong to the key type, where the third video difference frame belongs to the fourth type target video difference frame; a second decoding unit, configured to decode the third video difference frame according to the video frame belonging to the key class in the fourth video frame set and the third type target video difference frame, so as to obtain a video stream corresponding to the target video at the fourth time.
As an alternative, the method comprises the following steps: a fifth obtaining unit, configured to obtain, before the first video playing request is obtained, first video data corresponding to a first moment of the target video in the second video frame set, where the first video data is video frame data corresponding to the target video key frame; a sixth obtaining unit, configured to obtain, before the obtaining of the first video playing request, a data difference between the first video data and second video data at a time next to the first time, where the data difference is video frame data corresponding to a target video difference frame; and an integrating unit, configured to integrate the target video key frames and the target video difference frames according to a time sequence before the first video playing request is obtained, so as to obtain the second video frame set.
As an optional solution, the apparatus is applied to a target live broadcast application, and is characterized in that: a seventh obtaining unit, configured to obtain the first video playing request triggered on the live client; a third determining unit, configured to determine a playing time of the target video in response to the first video playing request; a first playing unit, configured to play the target video stream when the playing time is the first time; and a second playing unit, configured to play a target video stream of the target video corresponding to a second time corresponding to the first-class target video difference frame when the playing time is the second time.
According to still another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above video stream playing method when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the video stream playing method through the computer program.
In the embodiment of the present invention, a first video playing request is obtained, where the first video playing request is used to request that a target video is played from a first time, and the target video is associated with a plurality of reference frames; determining a first video frame set and a second video frame set from the plurality of reference frames in response to the first video playing request, wherein the second video frame set is a previous video frame set of the first video frame set, the first video frame set comprises a first type of target video difference frame and a second type of target video difference frame, the first type of target video difference frame is a difference frame determined based on target video key frames in the second video set, and the second type of target video difference frame is a difference frame determined based on the first type of target video difference frame; acquiring a first video difference frame matched with the first time in the first video frame set, the first type of target video difference frame and the target video key frame, wherein the first video difference frame belongs to the second type of target video difference frame; decoding the first video differential frame according to the target video key frame and the first-class target video differential frame to obtain a target video stream corresponding to the target video at the first time, segmenting a plurality of original reference frames into two (or more) sets through a first video frame set and a second video frame set, regarding each set as a GOP substructure by using a mode that the first-class target video differential frame is directly used as a differential frame of the target video key frame, and then forming a complete GOP structure by a plurality of GOP substructures, namely, adjusting the original GOP structure IPPPPPP … … into IPPP and P (I) PP … …, thereby achieving the purpose of taking into account the advantages of high code 
stream utilization rate of a long GOP structure and low delay of a short GOP structure, and further achieving the technical effect of improving the code stream utilization rate in the video stream playing process, and then the technical problem that the utilization rate of the code stream is low in a video stream playing mode is solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a schematic diagram of an application environment of an alternative video stream playing method according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating a flow of an alternative video stream playing method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an alternative video stream playing method according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating an alternative video stream playing method according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating an alternative video stream playing method according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating an alternative video stream playing method according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating an alternative video stream playing method according to an embodiment of the present invention;
fig. 8 is a schematic diagram of an alternative video stream playing apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of an alternative video stream playback device according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the accompanying drawings in the description and claims of the present invention and the embodiments of the above drawings are included to clearly and completely describe the technical solutions in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. The size and/or location of the area is measured. A process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiments of the present invention, a video stream playing method is provided, and optionally, as an optional implementation manner, the video stream playing method may be applied, but not limited, to the environment shown in fig. 1. The system may include, but is not limited to, a user equipment 102, a network 110, and a server 112, wherein the user equipment 102 may include, but is not limited to, a display 108, a processor 106, and a memory 104.
The specific process comprises the following steps:
step S102, user equipment 102 acquires a first video playing request;
step S104-S106, the user equipment 102 sends a first video playing request to the server 112 through the network 110;
step S108, the server 112 searches, through the database 114, frame data corresponding to a plurality of reference frames requested to be played by the first video playing request, and processes the frame data through the processing engine 116, thereby generating a target video stream;
steps S110-S112, the server 112 sends the target video stream to the user device 102 via the network 110, and the processor 106 in the user device 102 displays the target video stream on the display 108 and stores the target video stream in the memory 104.
In addition to the example shown in fig. 1, the above steps may be performed by the user device 102 independently, that is, the user device 102 performs the steps of generating the target video stream, and the like, so as to relieve the processing pressure of the server. The user equipment 102 includes, but is not limited to, a handheld device (e.g., a mobile phone), a notebook computer, a desktop computer, a vehicle-mounted device, and the like, and the specific implementation manner of the user equipment 102 is not limited in the present invention.
Optionally, as an optional implementation manner, as shown in fig. 2, the video stream playing method includes:
s202, a first video playing request is obtained, wherein the first video playing request is used for requesting that a target video is played from a first moment, and the target video is associated with a plurality of reference frames;
s204, responding to a first video playing request, determining a first video frame set and a second video frame set from a plurality of reference frames, wherein the second video frame set is a previous video frame set of the first video frame set, the first video frame set comprises a first type of target video difference frames and a second type of target video difference frames, the first type of target video difference frames are difference frames determined based on target video key frames in the second video set, and the second type of target video difference frames are difference frames determined based on the first type of target video difference frames;
s206, acquiring a first video difference frame matched with a first moment in a first video frame set, a first type of target video difference frame and a target video key frame, wherein the first video difference frame belongs to a second type of target video difference frame;
s208, decoding the first video difference frame according to the target video key frame and the first type of target video difference frame to obtain a target video stream corresponding to the target video at the first moment.
Optionally, in this embodiment, the video stream playing method may be, but is not limited to, used in a playing scene of a live stream, for example, I, P, P, P, P, P, P, p.
To further illustrate, if it is assumed that there are currently 8 live streams, the related art would perform storage of video frames with video frame combinations of the types of video key frames I1, video difference frames P1, video difference frames P2, video difference frames P3, video difference frames P4, video difference frames P5, video difference frames P6, video difference frames P7, and video difference frames P8. In this embodiment, a special difference frame (first type target video difference frame) is used to split the 9-frame live broadcast stream into two or more video frame sets, for example, as shown in fig. 3, a video key frame I1, a video difference frame P1, a video difference frame P2, a video difference frame P3, a video difference frame P4 are located in the second video frame set 302, a video difference frame PI1, a video difference frame P5, a video difference frame P6, a video difference frame P7, and a video difference frame P9 are located in the first video frame set 304, where the video difference frame PI1 is the above-mentioned special difference frame (first type target video difference frame);
alternatively, in the present embodiment shown in fig. 3, solid arrows are used to indicate the sequence relationship between video frames, such as the time of the video key frame T1 being the last time of the video difference frame P1; the dashed arrows are used to indicate the reference relationship between video frames, such as the video difference frame PI1 being the difference frame of the video key frame I1. In this way, in the embodiment, although the 9-frame live stream is divided into two or more video frame sets, different video frame sets are associated by using a special difference frame (such as the video difference frame PT1), so that the 9-frame live stream can still be stored as a whole, the effect of the video frame set is equivalent to the utilization rate of the video frame combination pair code stream of the video key frame I1, the video difference frame P1, the video difference frame P2, the video difference frame P3, the video difference frame P4, the video difference frame P5, the video difference frame P6, the video difference frame P7 and the video difference frame P8, and the video difference frame PT1 is used as a video key frame of another form, thereby making up the defect of the video frame combination of the type on the delay problem.
Optionally, in this embodiment, the target video key frame may be, but is not limited to, an I frame, and the first type target video difference frame and the second type target video difference frame may be, but is not limited to, a P frame.
The I frame may be, but not limited to, an intra-coded frame, which is an independent frame with all information, and may be independently coded without referring to other pictures, that is, all intra-coded frames. Which can be simply understood as a still picture. The first frame in the video sequence may always be, but is not limited to, an I-frame;
a P-frame may be understood, but not limited to, a predictively coded frame that requires reference to a previous frame for coding. The difference between the current frame picture and the previous frame (the previous frame may be an I frame or a P frame) is shown. When decoding, the difference defined by the frame needs to be overlapped by the picture buffered before, and a final picture is generated. P-frames generally occupy fewer data bits than I-frames, but are very sensitive to transmission errors due to their complex dependencies on previous P and I-reference frames.
Alternatively, in the present embodiment, the second type of target video difference frame may be understood, but not limited, as representing the difference between the current frame picture and the previous frame, and the first type of target video difference frame may be understood, but not limited, as representing the difference between the current frame picture and the I frame (target video key frame).
It should be noted that, in the related art, the combination mode of IPPPP is often limited by the length, if more frames are needed to be stored, more I frames, i.e. I1PPPP and I2PPPP … …, need to be set in the video sequence, and the I frame usually occupies more data bits, which naturally affects the utilization rate of the code stream.
Optionally, in this embodiment, a special P frame is used to replace part of the I frame, as shown in fig. 4, a video key frame I1, a video difference frame P1, a video difference frame P2, a video key frame I2, a video difference frame P4, and a video difference frame P5 are stored in the original video sequence, where the number of the video key frames is 2; by adopting the video stream playing method of the embodiment, the video key frame I2 is replaced by the difference frame (video difference frame PT1) of the video key frame T1, so that the convenience of the combination of the short video key frames is still maintained in the replaced video sequence compared with the original video sequence, and meanwhile, the number of the video key frames is reduced, thereby improving the utilization rate of the code stream in the video stream playing process.
Optionally, in this embodiment, when the coding layer encodes the special P frame, it is forced to use the previous IDR frame (the first I frame) as the reference frame when computing the inter-frame residual, so that the special P frame together with the previous I frame produces an effect equivalent to generating a second key frame in the normal mode; for the stream-pulling layer, the CDN or the interface layer may issue only the first IDR and then download the data after the special P frame, achieving low latency.
For further example, optionally, assume the ordinary encoded code stream is I1, P1, P2, P3, P4, I2, P5, P6, P7, P8; if the user pulls the stream from P5, the CDN delivers two frames, I2 and P5, for rendering. If instead the encoded code stream optimized by this embodiment is I1, P1, P2, P3, P4, SP1, P5, P6, P7, P8 and the user again pulls the stream from P5, the CDN delivers three frames, I1, SP1, and P5, for rendering; the decoding layer first decodes I1 and SP1 to generate a video frame equivalent to I2, and P5 is then decoded against that frame data equivalent to I2.
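The pull-stream example above can be sketched as follows. This is a simplified model, and `delivery_chain` is a hypothetical helper rather than part of the embodiment: an ordinary P frame references the frame immediately before it, while a special P frame (SP) references the most recent I frame.

```python
def delivery_chain(stream, pull):
    """Frames the CDN must send, oldest first, so the client can decode stream[pull]."""
    chain = [pull]
    i = pull
    while not stream[i].startswith("I"):
        if stream[i].startswith("SP"):
            # a special P frame references the most recent I frame, skipping ordinary P frames
            i -= 1
            while not stream[i].startswith("I"):
                i -= 1
        else:
            i -= 1   # an ordinary P frame references the frame right before it
        chain.append(i)
    return [stream[j] for j in reversed(chain)]

plain = ["I1", "P1", "P2", "P3", "P4", "I2", "P5", "P6", "P7", "P8"]
opt   = ["I1", "P1", "P2", "P3", "P4", "SP1", "P5", "P6", "P7", "P8"]
assert delivery_chain(plain, 6) == ["I2", "P5"]          # ordinary stream: two frames
assert delivery_chain(opt, 6) == ["I1", "SP1", "P5"]     # optimized stream: three frames
```

Note that both chains start from a self-decodable I frame, so pulling mid-stream never requires waiting for the next full key frame.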
It should be noted that the original multiple reference frames are divided into two (or more) sets, the first video frame set and the second video frame set. By using the first type of target video difference frame directly as a difference frame of the target video key frame, each set is treated as a GOP substructure, and multiple GOP substructures together form a complete GOP structure; that is, the original GOP structure IPPPP… is adjusted to IPPP, P(I)PP, …, which combines the low play-delay advantage of the short GOP structure with the high code-stream-utilization advantage of the long GOP structure.
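A minimal sketch of the adjusted GOP structure (frame names assumed, not taken from the embodiment): each I frame or special P (SP) frame opens a new GOP substructure.

```python
def gop_substructures(stream):
    """Split a frame sequence into GOP substructures, each opened by an I or special P (SP) frame."""
    subs = []
    for f in stream:
        if f.startswith("I") or f.startswith("SP"):
            subs.append([f])       # a key or special frame starts a new substructure
        else:
            subs[-1].append(f)     # ordinary P frames extend the current substructure
    return subs

sequence = ["I1", "P1", "P2", "P3", "SP1", "P5", "P6", "P7"]
assert gop_substructures(sequence) == [["I1", "P1", "P2", "P3"], ["SP1", "P5", "P6", "P7"]]
```

Each substructure behaves like a short GOP for random access, yet only the first carries the cost of a full I frame.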
Further by way of example, optionally, as shown in fig. 5, a first video playing request is obtained on the interface of a live broadcast room list, as shown in (a) of fig. 5, where the first video playing request is used to request that a target video (i.e., the live broadcast picture of the selected live broadcast room) be played from a first time, and the target video is associated with multiple reference frames; in response to the first video playing request, a first video frame set and a second video frame set are determined from the multiple reference frames, where the second video frame set is the video frame set preceding the first video frame set, the first video frame set includes a first type of target video difference frame and a second type of target video difference frame, the first type of target video difference frame is a difference frame determined based on a target video key frame in the second video frame set, and the second type of target video difference frame is a difference frame determined based on the first type of target video difference frame; a first video difference frame matched with the first time in the first video frame set, the first type of target video difference frame, and the target video key frame are acquired, where the first video difference frame belongs to the second type of target video difference frame; the first video difference frame is decoded according to the target video key frame and the first type of target video difference frame to obtain a target video stream corresponding to the target video at the first time; the target video stream is then displayed, as shown in (b) of fig. 5.
According to the embodiment provided by the application, a first video playing request is obtained, where the first video playing request is used to request that a target video be played from a first time, and the target video is associated with multiple reference frames; in response to the first video playing request, a first video frame set and a second video frame set are determined from the multiple reference frames, where the second video frame set is the video frame set preceding the first video frame set, the first video frame set includes a first type of target video difference frame and a second type of target video difference frame, the first type of target video difference frame is a difference frame determined based on a target video key frame in the second video frame set, and the second type of target video difference frame is a difference frame determined based on the first type of target video difference frame; a first video difference frame matched with the first time in the first video frame set, the first type of target video difference frame, and the target video key frame are acquired, where the first video difference frame belongs to the second type of target video difference frame; and the first video difference frame is decoded according to the target video key frame and the first type of target video difference frame to obtain a target video stream corresponding to the target video at the first time, thereby achieving both the high code-stream utilization of a long GOP structure and the low delay of a short GOP structure, and improving the code stream utilization rate during video stream playing.
As an optional scheme, decoding the first video difference frame according to the target video key frame and the first type of target video difference frame to obtain the target video stream corresponding to the target video at the first time includes:
S1, decoding the first type of target video difference frame according to the target video key frame to obtain first video frame data corresponding, in the target video, to a second time at which the first type of target video difference frame is decoded, where the first type of target video difference frame is used to represent the difference between the first video frame data and the video frame data corresponding to the target video key frame;
S2, decoding the first video difference frame according to the first type of target video difference frame to obtain second video frame data corresponding to the first time in the target video;
and S3, integrating the first video frame data and the second video frame data to obtain the target video stream.
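Steps S1-S3 can be walked through with a toy model. The frame names follow the description above, but the pixel values and the list-of-ints representation of a picture are invented for illustration:

```python
def superimpose(reference, diff):
    """Decode a difference frame by superimposing it on an already-decoded picture."""
    return [r + d for r, d in zip(reference, diff)]

key_frame = [10, 20, 30]     # target video key frame (full decoded picture)
sp_diff   = [1, -2, 0]       # first type of target video difference frame
p5_diff   = [0, 3, 1]        # first video difference frame (second type)

# S1: decode the first type of difference frame against the key frame
first_video_frame_data = superimpose(key_frame, sp_diff)            # picture at the second time
# S2: decode the first video difference frame against that result
second_video_frame_data = superimpose(first_video_frame_data, p5_diff)
# S3: integrate both pictures, in time order, into the target video stream
target_video_stream = [first_video_frame_data, second_video_frame_data]

assert target_video_stream == [[11, 18, 30], [11, 21, 31]]
```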
It should be noted that the first type of target video difference frame is decoded according to the target video key frame to obtain first video frame data corresponding, in the target video, to a second time at which the first type of target video difference frame is decoded, where the first type of target video difference frame is used to represent the difference between the first video frame data and the video frame data corresponding to the target video key frame; the first video difference frame is decoded according to the first type of target video difference frame to obtain second video frame data corresponding to the first time in the target video; and the first video frame data and the second video frame data are integrated to obtain the target video stream.
For further example, optionally, as shown in fig. 3, the video difference frame PT1 (the first type of target video difference frame) is decoded according to the video key frame T1 (the target video key frame) to obtain the first video frame data corresponding, in the target video, to the second time at which the first type of target video difference frame is decoded; the first video difference frame (e.g., the video difference frame P5) is decoded according to the first type of target video difference frame to obtain the second video frame data corresponding to the first time in the target video; and the first video frame data and the second video frame data are integrated to obtain the target video stream.
As an optional scheme, decoding the first video difference frame according to the first type of target video difference frame to obtain the second video frame data corresponding to the first time in the target video includes:
S1, when the first time is the time next to the second time, decoding the first video difference frame according to the first video frame data to obtain first sub-video frame data corresponding to the first time in the target video, where the first video difference frame is used to represent the difference between the first sub-video frame data and the video frame data corresponding to the first type of target video difference frame; or,
S2, when a third time is the time next to the second time and the first time is the time next to the third time, decoding a second video difference frame corresponding to the third time among the multiple reference frames according to the first video frame data to obtain second sub-video frame data corresponding to the third time in the target video, where the second video difference frame is used to represent the difference between the second sub-video frame data and the video frame data corresponding to the first type of target video difference frame; decoding the first video difference frame according to the second video difference frame to obtain third sub-video frame data corresponding to the first time in the target video, where the first video difference frame is used to represent the difference between the third sub-video frame data and the video frame data corresponding to the second video difference frame; and integrating the second sub-video frame data and the third sub-video frame data to obtain the second video frame data.
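Both cases reduce to decoding a chain of second-type difference frames after the first-type frame, sketched here with a hypothetical helper and invented toy pixel values:

```python
def decode_from_sp(first_video_frame_data, diffs):
    """Decode the chain of second-type difference frames that follow the special frame.
    diffs[k] is the difference frame for the (k+1)-th time after the second time."""
    frames, ref = [], first_video_frame_data
    for d in diffs:
        ref = [r + x for r, x in zip(ref, d)]   # superimpose each difference on the last picture
        frames.append(ref)
    return frames

sp_picture = [11, 18, 30]   # first video frame data (invented values)

# Case S1: the first time directly follows the second time -- one difference frame
assert decode_from_sp(sp_picture, [[0, 3, 1]]) == [[11, 21, 31]]
# Case S2: an intermediate third time sits in between -- decode both frames and integrate
assert decode_from_sp(sp_picture, [[0, 3, 1], [2, 0, -1]]) == [[11, 21, 31], [13, 21, 30]]
```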
It should be noted that, when the first time is a next time of the second time, the first video difference frame is decoded according to the first video frame data to obtain first sub-video frame data corresponding to the first time in the target video, where the first video difference frame is used to represent a difference between the first sub-video frame data and video frame data corresponding to the first type of target video difference frame.
For further example, as shown in fig. 3, if the first video difference frame is the video difference frame P5, the time corresponding to the video difference frame P5 is the time next to the time corresponding to the video difference frame PI1 (the first type of target video difference frame), and the video difference frame P5 is decoded according to the video difference frame PI1 to obtain the first sub-video frame data corresponding to the first time in the target video.
It should be noted that, when the third time is the time next to the second time and the first time is the time next to the third time, the second video difference frame corresponding to the third time among the multiple reference frames is decoded according to the first video frame data to obtain second sub-video frame data corresponding to the third time in the target video, where the second video difference frame is used to represent the difference between the second sub-video frame data and the video frame data corresponding to the first type of target video difference frame; the first video difference frame is decoded according to the second video difference frame to obtain third sub-video frame data corresponding to the first time in the target video, where the first video difference frame is used to represent the difference between the third sub-video frame data and the video frame data corresponding to the second video difference frame; and the second sub-video frame data and the third sub-video frame data are integrated to obtain the second video frame data.
For further example, as shown in fig. 3, assume the first video difference frame is the video difference frame P6; then the time corresponding to the video difference frame P6 is the time next to the time corresponding to the video difference frame P5 (the second type of target video difference frame), and the time corresponding to the video difference frame P5 is the time next to the time corresponding to the video difference frame PI1 (the first type of target video difference frame). The video difference frame P5 is first decoded according to the video difference frame PI1 to obtain the second sub-video frame data in the target video, and the video difference frame P6 is then decoded according to the video difference frame P5 to obtain the third sub-video frame data in the target video.
As an alternative, determining the first set of video frames and the second set of video frames from a plurality of reference frames includes:
s1, determining a target video frame set where a first video difference frame is located from a plurality of reference frames as a first target video frame set, wherein the first target video frame set comprises a first video frame set and a second video frame set, each target video frame set comprises a video frame of a key class and a plurality of video frames of non-key classes, the target video key frame is a video frame of a key class, and the first target video difference frame and the second target video difference frame are video frames of non-key classes;
s2, a first set of video frames and a second set of video frames are determined from the first set of target video frames.
Alternatively, in this embodiment, since P frames (video frames of non-key classes) have complex dependencies on preceding P and I reference frames, they are very sensitive to transmission errors; if a video sequence contains only one I frame and many P frames, the P frames occupy fewer data bits, but the tolerance of video stream playing to transmission errors is reduced. In this regard, a higher-dimensional set, the target video frame set, is configured over the first video frame set and the second video frame set, and each target video frame set is limited to containing only one I frame, while the video sequence may contain multiple target video frame sets, so as to maintain a balance between transmission-error tolerance and code stream utilization.
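The per-set constraint can be checked with a small sketch; `valid_target_sets` is a hypothetical helper and the frame names are assumed, not taken from the embodiment:

```python
def valid_target_sets(sets):
    """Check that every target video frame set contains exactly one key-class (I) frame."""
    return all(sum(f.startswith("I") for f in s) == 1 for s in sets)

# two target video frame sets, each with one I frame plus non-key P/PI frames
assert valid_target_sets([["I1", "P1", "PI1", "P5"], ["I2", "P9", "PI2", "P13"]])
# a set with no I frame at all would break the error-tolerance balance
assert not valid_target_sets([["P1", "P2"]])
```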
It is to be noted that a target video frame set where a first video difference frame is located is determined from a plurality of reference frames as a first target video frame set, where the first target video frame set includes a first video frame set and a second video frame set, each target video frame set includes a video frame of a key class and a plurality of video frames of a non-key class, the target video key frame is a video frame of a key class, and the first type target video difference frame and the second type target video difference frame are video frames of a non-key class; a first set of video frames and a second set of video frames are determined from the first set of target video frames.
To further illustrate, optionally based on the scenario shown in fig. 3, and continuing as shown in fig. 6, a set of target video frames in which a first video difference frame (e.g., the video difference frame P5) is located is determined from the plurality of reference frames as a first set of target video frames 602; a first set of video frames 302 and a second set of video frames 304 are determined from a first set of target video frames 602.
As an optional solution, the method further comprises:
s1, acquiring a second video playing request, wherein the second video playing request is used for requesting that the target video is played from the fourth time;
s2, responding to a second video playing request, determining a second target video frame set from a plurality of reference frames, wherein the second target video frame set comprises a third video frame set and a fourth video frame set, the fourth video frame set is a previous video frame set of the third video frame set, the third video frame set comprises a third type target video difference frame and a fourth type target video difference frame, the third type target video difference frame is a difference frame determined based on video frames belonging to a key class in the fourth video frame set, and the fourth type target video difference frame is a difference frame determined based on the third type target video difference frame;
S3, acquiring a third video difference frame matched with the fourth time in the third video frame set, the third type of target video difference frame, and the video frame belonging to the key class in the fourth video frame set, where the third video difference frame belongs to the fourth type of target video difference frame;
and S4, decoding the third video difference frame according to the video frames belonging to the key class in the fourth video frame set and the third class target video difference frame to obtain a video stream corresponding to the target video at the fourth time.
It should be noted that a second video playing request is obtained, where the second video playing request is used to request that the target video be played from the fourth time; in response to the second video playing request, a second target video frame set is determined from the multiple reference frames, where the second target video frame set includes a third video frame set and a fourth video frame set, the fourth video frame set is the video frame set preceding the third video frame set, the third video frame set includes a third type of target video difference frame and a fourth type of target video difference frame, the third type of target video difference frame is a difference frame determined based on the video frame belonging to the key class in the fourth video frame set, and the fourth type of target video difference frame is a difference frame determined based on the third type of target video difference frame; a third video difference frame matched with the fourth time in the third video frame set, the third type of target video difference frame, and the video frame belonging to the key class in the fourth video frame set are acquired, where the third video difference frame belongs to the fourth type of target video difference frame; and the third video difference frame is decoded according to the video frame belonging to the key class in the fourth video frame set and the third type of target video difference frame to obtain a video stream corresponding to the target video at the fourth time.
For further example, optionally based on the scenario shown in fig. 6, continuing to obtain a second video playing request, for example, as shown in fig. 7, where the second video playing request is used to request that the target video is played from the fourth time; determining a second target video frame set 702 from the plurality of reference frames in response to the second video playing request, wherein the second target video frame set 702 comprises a third video frame set 706 and a fourth video frame set 704, the fourth video frame set 704 is a previous video frame set of the third video frame set 706, the third video frame set 706 comprises a third type target video difference frame (video difference frame PI2) and a fourth type target video difference frame (video difference frame P13, video difference frame P14, video difference frame P15, video difference frame P16), and the third type target video difference frame is a difference frame determined based on a video frame (video key frame I2) belonging to a key class in the fourth video frame set 704; acquiring a third video difference frame (such as a video difference frame P13) matched with the fourth time in the third video frame set 706, a third type target video difference frame (a video difference frame PI2) and a video frame (a video key frame I2) belonging to a key class in the fourth video frame set; and decoding the third video difference frame (such as the video difference frame P13) according to the third type target video difference frame (the video difference frame PI2) and the video frame (the video key frame I2) belonging to the key class in the fourth video frame set to obtain a video stream corresponding to the target video at the fourth moment.
As an optional scheme, before obtaining the first video playing request, the method includes:
s1, acquiring first video data corresponding to the first moment of the target video in the second video frame set, wherein the first video data are video frame data corresponding to the key frame of the target video;
s2, acquiring a data difference between the first video data and second video data at a next moment of the first moment, wherein the data difference is video frame data corresponding to a target video difference frame;
and S3, integrating the target video key frames and the target video difference frames according to the time sequence to obtain a second video frame set.
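The encoder-side steps S1-S3 above can be sketched as follows; `build_frame_set` is a hypothetical helper, and the pictures are toy pixel lists with invented values:

```python
def build_frame_set(pictures):
    """S1-S3: keep the first picture as the key frame, then store each later picture
    as its difference from the picture one time earlier, integrated in time order."""
    frames = [("I", pictures[0])]                     # S1: key-frame video data
    for prev, cur in zip(pictures, pictures[1:]):
        # S2: data difference between adjacent times becomes a difference frame
        frames.append(("P", [c - p for c, p in zip(cur, prev)]))
    return frames                                     # S3: integrated in time order

pictures = [[10, 20, 30], [11, 18, 30], [11, 21, 31]]
assert build_frame_set(pictures) == [
    ("I", [10, 20, 30]),
    ("P", [1, -2, 0]),
    ("P", [0, 3, 1]),
]
```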
As an optional solution, the method is applied to a target live broadcast application, and is characterized in that the method further includes:
s1, acquiring a first video playing request triggered on the live client;
s2, responding to the first video playing request, and determining the playing time of the target video;
s3, playing the target video stream under the condition that the playing time is the first time;
and S4, when the playing time is the second time corresponding to the first type of target video difference frame, playing the target video stream corresponding to the target video at the second time.
Optionally, in this embodiment, the manner of determining the playing time of the target video may be divided into at least two: one is to determine the candidate time closest to the current time at which the first video playing request is triggered and determine the video frame corresponding to that candidate time, that is, to play the target video stream; the other is to determine the candidate key time closest to the current time at which the first video playing request is triggered and determine the video key frame corresponding to that candidate key time, that is, to play the target video stream corresponding to the target video at the second time.
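Both strategies reduce to a nearest-time lookup over different candidate lists, sketched here with invented times and key-frame positions:

```python
def nearest_time(candidates, now):
    """Choose the candidate time closest to the moment the play request was triggered."""
    return min(candidates, key=lambda t: abs(t - now))

frame_times = [0, 1, 2, 3, 4, 5]   # one entry per reference frame (assumed)
key_times   = [0, 3]               # times whose frames are key (or special) frames (assumed)

assert nearest_time(frame_times, 3.4) == 3   # mode 1: nearest frame time
assert nearest_time(key_times, 4.9) == 3     # mode 2: nearest key time
```

Mode 2 may start playback slightly earlier than requested, but the chosen frame decodes without any extra reference data.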
It should be noted that, a first video playing request triggered on a live client is acquired; responding to the first video playing request, and determining the playing time of the target video; under the condition that the playing time is the first time, playing the target video stream; and under the condition that the playing time is the second time corresponding to the first type of target video difference frame, playing the target video stream corresponding to the target video at the second time.
For further example, optionally, for example, as shown in fig. 5, a first video playing request triggered on the live client is obtained, as shown in (a) in fig. 5; responding to the first video playing request, and determining the playing time of the target video; in the case where the play time is the first time, playing the target video stream, as shown in (b) of fig. 5; and in the case that the playing time is the second time corresponding to the difference frame of the first type of target video, playing the target video stream corresponding to the target video at the second time, as shown in (b) in fig. 5.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiments of the present invention, there is also provided a video stream playing apparatus for implementing the video stream playing method. As shown in fig. 8, the apparatus includes:
a first obtaining unit 802, configured to obtain a first video playing request, where the first video playing request is used to request that a target video starts to be played from a first time, and the target video is associated with multiple reference frames;
a first determining unit 804, configured to determine, in response to a first video playing request, a first video frame set and a second video frame set from a plurality of reference frames, where the second video frame set is a previous video frame set of the first video frame set, the first video frame set includes a first type of target video difference frame and a second type of target video difference frame, the first type of target video difference frame is a difference frame determined based on a target video key frame in the second video set, and the second type of target video difference frame is a difference frame determined based on the first type of target video difference frame;
a second obtaining unit 806, configured to obtain the first video difference frame matched with the first time in the first video frame set, the first type of target video difference frame, and the target video key frame, where the first video difference frame belongs to the second type of target video difference frame;
the first decoding unit 808 is configured to decode the first video difference frame according to the target video key frame and the first type of target video difference frame, so as to obtain a target video stream corresponding to the target video at the first time.
Optionally, in this embodiment, the video stream playing apparatus may be, but is not limited to being, used in a playing scene of a live stream, for example, a live stream encoded as a sequence of key frames and difference frames such as I, P, P, P, P, ….
It should be noted that, in the related art, the IPPPP combination is often limited by its length: if more frames need to be stored, more I frames (I1PPPP, I2PPPP, and so on) must be set in the video sequence, and since an I frame usually occupies more data bits, this naturally reduces the utilization rate of the code stream.
It should be noted that the original multiple reference frames are divided into two (or more) sets, the first video frame set and the second video frame set. By using the first type of target video difference frame directly as a difference frame of the target video key frame, each set is treated as a GOP substructure, and multiple GOP substructures together form a complete GOP structure; that is, the original GOP structure IPPPP… is adjusted to IPPP, P(I)PP, …, which combines the low play-delay advantage of the short GOP structure with the high code-stream-utilization advantage of the long GOP structure.
For a specific embodiment, reference may be made to the example shown in the video stream playing method, which is not described herein again in this example.
According to the embodiment provided by the application, a first video playing request is obtained, where the first video playing request is used to request that a target video be played from a first time, and the target video is associated with multiple reference frames; in response to the first video playing request, a first video frame set and a second video frame set are determined from the multiple reference frames, where the second video frame set is the video frame set preceding the first video frame set, the first video frame set includes a first type of target video difference frame and a second type of target video difference frame, the first type of target video difference frame is a difference frame determined based on a target video key frame in the second video frame set, and the second type of target video difference frame is a difference frame determined based on the first type of target video difference frame; a first video difference frame matched with the first time in the first video frame set, the first type of target video difference frame, and the target video key frame are acquired, where the first video difference frame belongs to the second type of target video difference frame; and the first video difference frame is decoded according to the target video key frame and the first type of target video difference frame to obtain a target video stream corresponding to the target video at the first time, thereby achieving both the high code-stream utilization of a long GOP structure and the low delay of a short GOP structure, and improving the code stream utilization rate during video stream playing.
As an alternative, the first decoding unit 808 includes:
the first decoding module is configured to decode the first type of target video difference frame according to the target video key frame to obtain first video frame data corresponding, in the target video, to a second time at which the first type of target video difference frame is decoded, where the first type of target video difference frame is used to represent the difference between the first video frame data and the video frame data corresponding to the target video key frame;
the second decoding module is used for decoding the first video difference frame according to the first type of target video difference frame to obtain second video frame data corresponding to the first moment in the target video;
and the integration module is used for integrating the first video frame data and the second video frame data to obtain the target video stream.
For a specific embodiment, reference may be made to the example shown in the video stream playing method, which is not described herein again in this example.
As an alternative, the second decoding module includes:
the first decoding submodule is configured to, when the first time is the time next to the second time, decode the first video difference frame according to the first video frame data to obtain first sub-video frame data corresponding to the first time in the target video, where the first video difference frame is used to represent the difference between the first sub-video frame data and the video frame data corresponding to the first type of target video difference frame; or,
the second decoding submodule is configured to, when a third time is the time next to the second time and the first time is the time next to the third time, decode a second video difference frame corresponding to the third time among the multiple reference frames according to the first video frame data to obtain second sub-video frame data corresponding to the third time in the target video, where the second video difference frame is used to represent the difference between the second sub-video frame data and the video frame data corresponding to the first type of target video difference frame; decode the first video difference frame according to the second video difference frame to obtain third sub-video frame data corresponding to the first time in the target video, where the first video difference frame is used to represent the difference between the third sub-video frame data and the video frame data corresponding to the second video difference frame; and integrate the second sub-video frame data and the third sub-video frame data to obtain the second video frame data.
For a specific embodiment, reference may be made to the example shown in the video stream playing method, which is not described herein again in this example.
As an optional solution, the first determining unit 804 includes:
the first determining module is used for determining a target video frame set where a first video difference frame is located from a plurality of reference frames as a first target video frame set, wherein the first target video frame set comprises a first video frame set and a second video frame set, each target video frame set comprises a key video frame and a plurality of non-key video frames, the target video key frame is a key video frame, and the first type target video difference frame and the second type target video difference frame are non-key video frames;
and the second determining module is used for determining the first video frame set and the second video frame set from the first target video frame set.
For a specific embodiment, reference may be made to the example shown in the video stream playing method, which is not described herein again in this example.
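The lookup performed by the first determining module can be sketched as a search over the start times of the target video frame sets: the set containing the requested play time is the first video frame set, and its predecessor, whose key frame the first type of target video difference frames depend on, is the second video frame set. The data layout and names below are assumptions for illustration only:

```python
from bisect import bisect_right

# Illustrative assumption: each target video frame set covers a contiguous time
# range starting at its key frame, so the set containing a play time can be
# found by binary search over the sets' start times.

def locate_frame_sets(set_start_times: list[float], play_time: float) -> tuple[int, int]:
    """Return (first_set_index, second_set_index) for a requested play time.

    first_set_index: index of the set containing play_time (first video frame set).
    second_set_index: index of the previous set (second video frame set).
    """
    idx = bisect_right(set_start_times, play_time) - 1
    if idx < 1:
        raise ValueError("play time falls in the first set, which has no predecessor")
    return idx, idx - 1
```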
As an optional scheme, the apparatus further comprises:
a third obtaining unit, configured to obtain a second video playing request, where the second video playing request is used to request that the target video be played from a fourth time;
a second determining unit, configured to determine a second target video frame set from the multiple reference frames in response to a second video playing request, where the second target video frame set includes a third video frame set and a fourth video frame set, the fourth video frame set is a previous video frame set of the third video frame set, the third video frame set includes a third type target video difference frame and a fourth type target video difference frame, the third type target video difference frame is a difference frame determined based on a video frame belonging to a key class in the fourth video frame set, and the fourth type target video difference frame is a difference frame determined based on the third type target video difference frame;
the fourth acquiring unit is used for acquiring a third video difference frame matched with the fourth time in the third video frame set, the third type target video difference frame, and a video frame belonging to the key class in the fourth video frame set, wherein the third video difference frame belongs to the fourth type target video difference frame;
and the second decoding unit is used for decoding the third video difference frame according to the video frames belonging to the key class in the fourth video frame set and the third class target video difference frame to obtain a video stream corresponding to the target video at the fourth moment.
For a specific embodiment, reference may be made to the example shown in the video stream playing method, which is not described herein again in this example.
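The decoding described above for the second decoding unit is a chain: the key-class video frame of the previous set seeds the third type target video difference frame, which in turn seeds each later difference frame up to the one matched with the requested time. A minimal sketch, under the same illustrative byte-wise delta assumption as before (names are hypothetical):

```python
from functools import reduce

# Illustrative assumption: frames are byte arrays and a difference frame is a
# byte-wise delta (modulo 256) against the previously decoded frame. The key
# frame seeding the chain comes from the previous video frame set, as described
# above.

def decode_chain(key_frame: bytes, difference_frames: list[bytes]) -> bytes:
    """Decode a chain of difference frames, each referencing the prior result."""
    def apply_diff(reference: bytes, diff: bytes) -> bytes:
        return bytes((r + d) % 256 for r, d in zip(reference, diff))
    return reduce(apply_diff, difference_frames, key_frame)
```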
As an optional solution, as shown in fig. 9, the apparatus further includes:
a fifth obtaining unit 902, configured to obtain, before obtaining the first video playing request, first video data corresponding to a first moment of the target video in the second video frame set, where the first video data is video frame data corresponding to a key frame of the target video;
a sixth obtaining unit 904, configured to obtain, before obtaining the first video playing request, a data difference between the first video data and second video data at a time next to the first time, where the data difference is video frame data corresponding to a target video difference frame;
an integrating unit 906, configured to integrate the target video key frames and the target video difference frames according to the time sequence before the first video playing request is obtained, so as to obtain a second video frame set.
For a specific embodiment, reference may be made to the example shown in the video stream playing method, which is not described herein again in this example.
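The preparation performed by the fifth obtaining unit, the sixth obtaining unit, and the integrating unit — keep the frame at the set's first moment as the key frame, store each later frame as a data difference from its predecessor, and integrate the entries in time order — can be sketched as follows. The list-of-tuples layout and names are illustrative assumptions:

```python
# Illustrative encode-side sketch: build a video frame set as a time-ordered
# sequence of one key frame followed by byte-wise difference frames, each delta
# taken against the immediately preceding frame (modulo 256).

def build_frame_set(frames: list[bytes]) -> list[tuple[str, bytes]]:
    """Return [('key', data), ('diff', delta), ...] in time order."""
    if not frames:
        return []
    out: list[tuple[str, bytes]] = [("key", frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        delta = bytes((c - p) % 256 for p, c in zip(prev, cur))
        out.append(("diff", delta))
    return out
```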
As an optional solution, the apparatus is applied to a target live broadcast application and further includes:
a seventh obtaining unit, configured to obtain a first video playing request triggered on a live client;
a third determining unit configured to determine a play time of the target video in response to the first video play request;
the first playing unit is used for playing the target video stream under the condition that the playing time is the first time;
and the second playing unit is used for playing the target video stream corresponding to the target video at the second moment under the condition that the playing moment is the second moment corresponding to the first type of target video difference frame.
For a specific embodiment, reference may be made to the example shown in the video stream playing method, which is not described herein again in this example.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the video stream playing method, as shown in fig. 10, the electronic device includes a memory 1002 and a processor 1004, the memory 1002 stores a computer program, and the processor 1004 is configured to execute the steps in any one of the method embodiments through the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
s1, acquiring a first video playing request, wherein the first video playing request is used for requesting to play a target video from a first moment, and the target video is associated with a plurality of reference frames;
s2, in response to a first video playing request, determining a first video frame set and a second video frame set from a plurality of reference frames, wherein the second video frame set is a previous video frame set of the first video frame set, the first video frame set comprises a first type of target video difference frames and a second type of target video difference frames, the first type of target video difference frames are difference frames determined based on target video key frames in the second video frame set, and the second type of target video difference frames are difference frames determined based on the first type of target video difference frames;
s3, acquiring a first video difference frame matched with a first moment in a first video frame set, a first type of target video difference frame and a target video key frame, wherein the first video difference frame belongs to a second type of target video difference frame;
s4, decoding the first video difference frame according to the target video key frame and the first type of target video difference frame to obtain a target video stream corresponding to the target video at the first moment.
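Steps S1–S4 above can be combined into one end-to-end sketch: locate the frame set containing the requested time (S2), seed decoding with the previous set's key frame (S3), and apply difference frames in time order up to the one matched with the play time (S4). The set layout, byte-wise delta model, and names are illustrative assumptions; a real implementation would continue applying subsequent difference frames to assemble the full target video stream rather than stopping at one frame:

```python
from bisect import bisect_right

# Illustrative end-to-end sketch of steps S1-S4. Each set dict models one
# target video frame set: its start time, its key frame, and its time-stamped
# difference frames. Frame payloads are byte arrays with byte-wise deltas.

def play_from(frame_sets: list[dict], play_time: float) -> bytes:
    """frame_sets: [{'start': t, 'key': bytes, 'diffs': [(t, bytes), ...]}, ...]"""
    starts = [s["start"] for s in frame_sets]
    idx = bisect_right(starts, play_time) - 1          # S2: locate the first set
    if idx < 1:
        raise ValueError("play time falls in the first set, which has no predecessor")
    first_set, second_set = frame_sets[idx], frame_sets[idx - 1]
    # S3/S4: seed with the previous set's key frame, then apply each difference
    # frame in time order up to the one matched with play_time.
    frame = second_set["key"]
    for t, diff in first_set["diffs"]:
        frame = bytes((f + d) % 256 for f, d in zip(frame, diff))
        if t >= play_time:
            break
    return frame
```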
Alternatively, it can be understood by those skilled in the art that the structure shown in fig. 10 is only an illustration, and the electronic device may also be a terminal device such as a smart phone (e.g., an Android phone or an iOS phone), a tablet computer, a handheld computer, or a Mobile Internet Device (MID) or PAD. Fig. 10 does not limit the structure of the electronic device. For example, the electronic device may include more or fewer components (e.g., network interfaces) than shown in fig. 10, or have a different configuration from that shown in fig. 10.
The memory 1002 may be used to store software programs and modules, such as program instructions/modules corresponding to the video stream playing method and apparatus in the embodiments of the present invention, and the processor 1004 executes various functional applications and data processing by running the software programs and modules stored in the memory 1002, that is, implements the video stream playing method described above. The memory 1002 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1002 may further include memory located remotely from the processor 1004, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1002 may be specifically, but not limited to, used for storing information such as the first video frame set, the second video frame set, the target video key frame, and the target video stream. As an example, as shown in fig. 10, the memory 1002 may include, but is not limited to, the first obtaining unit 802, the first determining unit 804, the second obtaining unit 806, and the first decoding unit 808 of the video stream playing apparatus. In addition, the video stream playing apparatus may further include, but is not limited to, other module units, which are not described again in this example.
Optionally, the above-mentioned transmission device 1006 is used for receiving or sending data via a network. Examples of the network may include wired and wireless networks. In one example, the transmission device 1006 includes a Network Interface Controller (NIC), which can be connected via a network cable to a router or other network devices so as to communicate with the internet or a local area network. In another example, the transmission device 1006 is a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In addition, the electronic device further includes: a display 1008, configured to display information such as the first video frame set, the second video frame set, the target video key frame, and the target video stream; and a connection bus 1010 for connecting the respective module parts in the above-described electronic apparatus.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting a plurality of nodes through a network communication. The nodes may form a Peer-To-Peer (P2P) network, and any type of computing device, such as a server, a terminal, and other electronic devices, may become a node in the blockchain system by joining the Peer-To-Peer network.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. A processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the video stream playing method, wherein the computer program is configured to execute the steps in any of the method embodiments described above.
Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for executing the steps of:
s1, acquiring a first video playing request, wherein the first video playing request is used for requesting to play a target video from a first moment, and the target video is associated with a plurality of reference frames;
s2, in response to a first video playing request, determining a first video frame set and a second video frame set from a plurality of reference frames, wherein the second video frame set is a previous video frame set of the first video frame set, the first video frame set comprises a first type of target video difference frames and a second type of target video difference frames, the first type of target video difference frames are difference frames determined based on target video key frames in the second video frame set, and the second type of target video difference frames are difference frames determined based on the first type of target video difference frames;
s3, acquiring a first video difference frame matched with a first moment in a first video frame set, a first type of target video difference frame and a target video key frame, wherein the first video difference frame belongs to a second type of target video difference frame;
s4, decoding the first video difference frame according to the target video key frame and the first type of target video difference frame to obtain a target video stream corresponding to the target video at the first moment.
Alternatively, in this embodiment, those skilled in the art will understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing the relevant hardware of the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and these modifications and improvements should also fall within the protection scope of the present invention.

Claims (10)

1. A method for playing a video stream, comprising:
acquiring a first video playing request, wherein the first video playing request is used for requesting to play a target video from a first moment, and the target video is associated with a plurality of reference frames;
determining a first video frame set and a second video frame set from the plurality of reference frames in response to the first video playing request, wherein the second video frame set is a previous video frame set of the first video frame set, the first video frame set comprises a first type of target video difference frames and a second type of target video difference frames, the first type of target video difference frames are difference frames determined based on target video key frames in the second video frame set, and the second type of target video difference frames are difference frames determined based on the first type of target video difference frames;
acquiring a first video difference frame matched with the first time in the first video frame set, the first type of target video difference frame and the target video key frame, wherein the first video difference frame belongs to the second type of target video difference frame;
and decoding the first video difference frame according to the target video key frame and the first type of target video difference frame to obtain a target video stream corresponding to the target video at the first moment.
2. The method of claim 1, wherein the decoding the first video difference frame according to the target video key frame and the first type of target video difference frame to obtain a target video stream corresponding to the target video at the first time comprises:
decoding the first type of target video differential frame according to the target video key frame to obtain first video frame data corresponding to a second moment of the decoding of the first type of target video differential frame in the target video, wherein the first type of target video differential frame is used for representing the difference between the first video frame data and the video frame data corresponding to the target video key frame;
decoding the first video difference frame according to the first type of target video difference frame to obtain second video frame data corresponding to the first time in the target video;
and integrating the first video frame data and the second video frame data to obtain the target video stream.
3. The method of claim 2, wherein the decoding the first video difference frame according to the first type of target video difference frame to obtain second video frame data corresponding to the first time in the target video comprises:
when the first time is the next time of the second time, decoding the first video difference frame according to the first video frame data to obtain first sub-video frame data corresponding to the first time in the target video, wherein the first video difference frame is used for representing the difference between the first sub-video frame data and the video frame data corresponding to the first type of target video difference frame; or
when a third moment is a moment next to the second moment and the first moment is the moment next to the third moment, decoding a second video difference frame corresponding to the third moment in the multiple reference frames according to the first video frame data to obtain second sub-video frame data corresponding to the third moment in the target video, wherein the second video difference frame is used for representing a difference between the second sub-video frame data and video frame data corresponding to the first type of target video difference frame; decoding the first video difference frame according to the second video difference frame to obtain third sub-video frame data corresponding to the second moment in the target video, wherein the first video difference frame is used for representing the difference between the third sub-video frame data and the video frame data corresponding to the second video difference frame; and integrating the second sub-video frame data and the third sub-video frame data to obtain the second video frame data.
4. The method of claim 1, wherein determining a first set of video frames and a second set of video frames from the plurality of reference frames comprises:
determining a target video frame set in which the first video difference frame is located from the plurality of reference frames as a first target video frame set, wherein the first target video frame set comprises the first video frame set and the second video frame set, each target video frame set comprises a video frame of a key class and a plurality of video frames of a non-key class, the target video key frame is a video frame of the key class, and the first type of target video difference frame and the second type of target video difference frame are video frames of the non-key class;
determining the first set of video frames and the second set of video frames from the first set of target video frames.
5. The method of claim 4, further comprising:
acquiring a second video playing request, wherein the second video playing request is used for requesting to play the target video from a fourth time;
determining a second target video frame set from the plurality of reference frames in response to the second video playing request, wherein the second target video frame set comprises a third video frame set and a fourth video frame set, the fourth video frame set is a previous video frame set of the third video frame set, the third video frame set comprises a third type target video difference frame and a fourth type target video difference frame, the third type target video difference frame is a difference frame determined based on a video frame belonging to the key class in the fourth video frame set, and the fourth type target video difference frame is a difference frame determined based on the third type target video difference frame;
acquiring a third video difference frame matched with the fourth time in the third video frame set, the third class target video difference frame, and a video frame belonging to the key class in the fourth video frame set, wherein the third video difference frame belongs to the fourth class target video difference frame;
and decoding the third video difference frame according to the video frames belonging to the key class in the fourth video frame set and the third class target video difference frame to obtain a video stream corresponding to the target video at the fourth moment.
6. The method according to any one of claims 1 to 5, comprising, before said obtaining the first video playback request:
acquiring first video data corresponding to the first moment of the target video in the second video frame set, wherein the first video data is video frame data corresponding to the key frame of the target video;
acquiring a data difference between the first video data and second video data at a time next to the first time, wherein the data difference is video frame data corresponding to a target video difference frame;
and integrating the target video key frame and the target video difference frame according to a time sequence to obtain the second video frame set.
7. The method of any one of claims 1 to 5, applied to a target live application, further comprising:
acquiring the first video playing request triggered on a live client;
responding to the first video playing request, and determining the playing time of the target video;
under the condition that the playing time is the first time, playing the target video stream;
and under the condition that the playing time is a second time corresponding to the first type of target video difference frame, playing a target video stream corresponding to the target video at the second time.
8. A video stream playback apparatus, comprising:
the video playing method comprises a first obtaining unit, a second obtaining unit and a third obtaining unit, wherein the first obtaining unit is used for obtaining a first video playing request, the first video playing request is used for requesting that a target video is played from a first moment, and the target video is associated with a plurality of reference frames;
a first determining unit, configured to determine, in response to the first video playing request, a first video frame set and a second video frame set from the plurality of reference frames, where the second video frame set is a previous video frame set of the first video frame set, the first video frame set includes a first type of target video difference frame and a second type of target video difference frame, the first type of target video difference frame is a difference frame determined based on a target video key frame in the second video frame set, and the second type of target video difference frame is a difference frame determined based on the first type of target video difference frame;
a second obtaining unit, configured to obtain a first video difference frame, the first type of target video difference frame, and the target video key frame, which are matched with the first time point, in the first video frame set, where the first video difference frame belongs to the second type of target video difference frame;
and the first decoding unit is used for decoding the first video difference frame according to the target video key frame and the first type of target video difference frame to obtain a target video stream corresponding to the target video at the first moment.
9. A computer-readable storage medium, comprising a stored program, wherein the program is operable to perform the method of any one of claims 1 to 7.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 7 by means of the computer program.
CN202111052506.3A 2021-09-08 2021-09-08 Video stream playing method and device, storage medium and electronic equipment Active CN113873271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111052506.3A CN113873271B (en) 2021-09-08 2021-09-08 Video stream playing method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111052506.3A CN113873271B (en) 2021-09-08 2021-09-08 Video stream playing method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN113873271A true CN113873271A (en) 2021-12-31
CN113873271B CN113873271B (en) 2023-08-11

Family

ID=78995008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111052506.3A Active CN113873271B (en) 2021-09-08 2021-09-08 Video stream playing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113873271B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040081237A1 (en) * 2002-10-28 2004-04-29 Roger Kingsley Transcoder system for compressed digital video bitstreams
US20170251284A1 (en) * 2016-02-25 2017-08-31 Cyberlink Corp. Systems and methods for video streaming based on conversion of a target key frame
CN112040233A (en) * 2020-11-04 2020-12-04 北京金山云网络技术有限公司 Video encoding method, video decoding method, video encoding device, video decoding device, electronic device, and storage medium


Also Published As

Publication number Publication date
CN113873271B (en) 2023-08-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant