CN111212025B - Method and device for transmitting network self-adaptive video stream - Google Patents

Method and device for transmitting network self-adaptive video stream

Info

Publication number
CN111212025B
CN111212025B (application CN201911144329.4A)
Authority
CN
China
Prior art keywords
frames
target
frame
sequence
designated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911144329.4A
Other languages
Chinese (zh)
Other versions
CN111212025A (en)
Inventor
李志成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201911144329.4A
Publication of CN111212025A
Application granted
Publication of CN111212025B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80 Responding to QoS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/65 Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/75 Media network packet handling
    • H04L65/765 Media network packet handling intermediate
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/177 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a method and an apparatus for transmitting a network-adaptive video stream. The method includes: encoding a target video stream to be transmitted to obtain a target frame sequence; determining some frames in the target frame sequence as designated frames, where the reference frames of a designated frame are frames located before and after it in the target frame sequence, and the designated frame is not used as a reference frame by any other frame in the sequence; and transmitting the target frame sequence to a target device through a target network, where some or all of the designated frames in the target frame sequence are discarded if the target network is congested. The invention solves the technical problem that the adaptive schemes adopted in the prior art when network transmission is congested reduce the video coding bitrate and at the same time degrade the user's viewing quality and interactive experience.

Description

Method and device for transmitting network self-adaptive video stream
Technical Field
The invention relates to the field of computers, in particular to a method and a device for transmitting network self-adaptive video streams.
Background
In the existing mainstream video coding algorithms (H.264/H.265/H.266/AV1/VP9), frames within a GOP sequence (I/P/B frames) serve by default as references for other frame types in the sequence. This mutual referencing improves compression efficiency, because a frame only needs to store its difference from the frames it references, but it also causes error propagation: if frame x is corrupted, any frame y that references x is also corrupted, any frame z that references y is corrupted in turn, and so on. If frames are dropped during video transmission and playback, decoding of the coded frames that reference the dropped frames fails, producing anomalies such as a black screen, missing blocks, or frame skipping.
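The reference-chain error propagation described above can be illustrated with a small sketch (the frame names and the `propagate_errors` helper are ours, for illustration only, not part of the patent):

```python
# Sketch: error propagation through a chain of reference frames.
# If a frame is lost, every frame that references it, directly or
# transitively, also fails to decode correctly.

def propagate_errors(references, lost):
    """references: dict mapping a frame to the frames it references.
    lost: set of frames initially lost. Returns all undecodable frames."""
    undecodable = set(lost)
    changed = True
    while changed:
        changed = False
        for frame, refs in references.items():
            if frame not in undecodable and any(r in undecodable for r in refs):
                undecodable.add(frame)
                changed = True
    return undecodable

# Frames x, y, z: y references x, z references y (as in the text above).
refs = {"x": [], "y": ["x"], "z": ["y"]}
print(sorted(propagate_errors(refs, {"x"})))  # → ['x', 'y', 'z']
```

Losing the tail of the chain (z) affects only z itself, which is exactly why a frame that nothing references is safe to drop.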
In video CDN (Content Delivery Network) distribution, network-adaptive schemes such as HLS (HTTP Live Streaming) and DASH (Dynamic Adaptive Streaming over HTTP) multi-bitrate adaptation exist for users on poor networks. However, these adaptive schemes all work by reducing the video coding bitrate at the network transport layer: when network transmission is congested, the bitrate is forcibly lowered, which in the prior art also degrades the user's viewing quality and interactive experience.
In view of the above problems in the related art, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the invention provide a method and an apparatus for transmitting a network-adaptive video stream, which at least solve the technical problem that the adaptive schemes adopted in the related art when network transmission is congested reduce the video coding bitrate and thereby also degrade the user's viewing quality and interactive experience.
According to an aspect of the embodiments of the present invention, there is provided a method for transmitting a network-adaptive video stream, including: encoding a target video stream to be transmitted to obtain a target frame sequence; determining some frames in the target frame sequence as designated frames, where the reference frames of a designated frame are frames located before and after it in the target frame sequence, and the designated frame is not used as a reference frame by any other frame in the sequence; and transmitting the target frame sequence to a target device through a target network, where some or all of the designated frames in the target frame sequence are discarded if the target network is congested.
According to another aspect of the embodiments of the present invention, there is also provided an apparatus for transmitting a network-adaptive video stream, including: an encoding module, configured to encode a target video stream to be transmitted to obtain a target frame sequence; a determining module, configured to determine some frames in the target frame sequence as designated frames, where the reference frames of a designated frame are frames located before and after it in the target frame sequence, and the designated frame is not used as a reference frame by any other frame in the sequence; and a transmission module, configured to transmit the target frame sequence to a target device through a target network, where some or all of the designated frames in the target frame sequence are discarded if the target network is congested.
According to a further aspect of the embodiments of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is configured to perform the above method when executed.
According to another aspect of the embodiments of the present invention, there is also provided an electronic apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the method by the computer program.
In the embodiments of the present invention, some frames in a target frame sequence are determined as designated frames, and when the target network is congested, some or all of these designated frames are discarded. Because the reference frames of a designated frame are frames located before and after it in the target frame sequence, and the designated frame is not used as a reference frame by any other frame in the sequence, discarding designated frames does not affect the decoding of the other frames, and the video stream can still be transmitted normally. This solves the technical problem that the adaptive schemes adopted in the related art when network transmission is congested reduce the video coding bitrate and at the same time degrade the user's viewing quality and interactive experience.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a schematic diagram of an application environment of a transmission method of a network adaptive video stream according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating the transmission of an alternative network adaptive video stream according to an embodiment of the present invention;
FIG. 3 is a block diagram of an encoding framework according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a partition of a CU/PU/TU according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an alternative GOP frame sequence in accordance with embodiments of the present invention;
fig. 6 is a schematic structural diagram of a network adaptive video streaming transmission apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Furthermore, the terms appearing in the present application are described below:
IDR frame: in video coding algorithms (H.264/H.265/H.266/AV1, etc.), images are organized in units of sequences. The first picture of a sequence is called an IDR picture (immediate refresh picture), and IDR pictures are all I-frame pictures.
I frame: IDR will cause the DPB (Decoded Picture Buffer reference frame list) to be empty, while I will not. An IDR picture is necessarily an I picture, but an I picture is not necessarily an IDR picture. There may be many I pictures in a sequence and pictures following an I picture may reference pictures between I pictures for motion reference. There may be many I pictures in a sequence and pictures following an I picture may reference pictures between I pictures for motion reference.
P frame: forward predictive coding the frame; the P frame represents the difference between the frame and a previous key frame (or P frame), and the difference defined by the frame needs to be superimposed on the previously buffered picture during decoding to generate a final picture.
B frame: bi-directionally predicting the interpolated encoded frame; the B frame is a bidirectional difference frame, that is, the B frame records the difference between the current frame and previous and next frames, and may or may not be used as a reference frame for other B frames.
Macro block: the basic unit of coding, a coded image is first divided into a plurality of blocks for processing, and obviously, a macro block should be an integer number of blocks.
Slice (Slice): a frame of video image may be encoded in one or more slices, each slice containing an integer number of macroblocks, i.e. at least one macroblock per slice, and at most, macroblocks of the entire image. The purpose of the slice is to limit the spreading and transmission of bit errors, keeping the coded slices independent of each other.
GOP (group Of Pictures): the interval between two I frames.
Flv (flash video): the FLV streaming media format is a video format that has evolved with the introduction of Flash MX. The video file can be watched on the network due to the fact that the file formed by the video file is extremely small and the loading speed is extremely high, and the video file can be effectively watched on the network.
Hls (http Live streaming): apple's dynamic code rate adaptation technique. The method is mainly used for audio and video services of the PC and the Apple terminal. The index file of m3u8, TS media fragment file and key encryption string file are included.
Dash (dynamic Adaptive Streaming over HTTP), which is a dynamic Adaptive Streaming based on HTTP, like the HLS protocol, which enables high quality Streaming media to be transmitted over the HTTP protocol via an Adaptive bit rate Streaming technique.
CDN (Content Delivery Network): the basic idea is to avoid bottlenecks and links possibly influencing data transmission speed and stability on the Internet as far as possible, so that content transmission is faster and more stable. By placing node servers at various positions of the network to form a layer of intelligent virtual network on the basis of the existing internet, the CDN system can redirect the request of a user to a service node closest to the user in real time according to network flow, connection of each node, load condition, distance to the user, response time and other comprehensive information. The method aims to enable the user to obtain the required content nearby, solve the problem of congestion of the Internet network and improve the response speed of the user for accessing the website.
According to an aspect of the embodiments of the present invention, a method for transmitting a network-adaptive video stream is provided. Optionally, this transmission method may be applied, but is not limited, to the application scenario shown in fig. 1. As shown in fig. 1, the terminal 102 acquires, through the network 104, data transmitted by the server 106. The server 106 encodes a target video stream to be transmitted to obtain a target frame sequence and determines some frames in the target frame sequence as designated frames, where the reference frames of a designated frame are frames located before and after it in the target frame sequence, and the designated frame is not used as a reference frame by any other frame in the sequence; it then transmits the target frame sequence to the terminal 102 through the network, where some or all of the designated frames are discarded if the target network is congested.
Optionally, in this embodiment, the terminal may include, but is not limited to, at least one of the following: mobile phones, tablet computers, and the like. Such networks may include, but are not limited to: a wired network, a wireless network, wherein the wired network comprises: a local area network, a metropolitan area network, and a wide area network, the wireless network comprising: bluetooth, WIFI, and other networks that enable wireless communication. The server may include, but is not limited to, at least one of: PCs and other devices used for computing services. The above is only an example, and the present embodiment is not limited to this.
Optionally, in this embodiment, as an optional implementation, as shown in fig. 2, the method for transmitting a network-adaptive video stream may include:
s202, coding a target video stream to be transmitted to obtain a target frame sequence;
s204, determining partial frames in the target frame sequence as designated frames, wherein reference frames of the designated frames are frames positioned in front of and behind the designated frames in the target frame sequence, and the designated frames are not used as reference frames of other frames in the target frame sequence;
s206, transmitting the target frame sequence to the target equipment through the target network, wherein under the condition that the target network is congested, part or all of designated frames in the target frame sequence are discarded.
Optionally, in this embodiment, the target video stream may be a video stream of a film or television work that the user watches through the target device, a game video stream of a game played through the target device, or the like. In addition, the target frame sequence in the present application may be a GOP frame sequence or another type of frame sequence.
Taking the target frame sequence as the GOP frame sequence as an example, the GOP frame sequence in this embodiment includes the following cases:
1) Fixed GOP size and fixed sequence: for example, a fixed GOP of 120 frames, i.e. an I frame is generated every 120 frames, with a fixed GOP frame sequence such as: I B B P … B B I.
2) Fixed GOP size, non-fixed sequence: for example, the GOP is fixed at 120 frames (an I frame every 120 frames), but whether a given frame in the sequence becomes a P frame or a B frame is determined by the picture complexity and the generation weights of the P/B frames.
3) Open GOP (neither GOP size nor sequence is fixed): the sequence is generated automatically based on picture texture, motion complexity, and the I/P/B frame generation strategy and weight configuration.
Therefore, when the target frame sequence is a GOP frame sequence, the designated frame referred to in this embodiment may be a B frame or a P frame in the GOP, or a combination of B frames and P frames.
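As a rough illustration of the idea (the `mark_designated` helper and its every-other-B selection rule are our own assumption; the patent selects designated frames according to coding parameters and scene complexity), a GOP sequence with droppable non-reference b frames might be modeled like this:

```python
# Sketch: mark a subset of B frames in a GOP as non-reference "b" frames.
# A designated b frame references frames before and after it, but is
# never referenced itself, so dropping it cannot break other frames.

def mark_designated(gop, every=2):
    """gop: list of frame types such as ['I', 'B', 'B', 'P', ...].
    Lowercases every `every`-th B frame to 'b' (droppable). This
    selection rule is purely illustrative."""
    out, b_seen = [], 0
    for t in gop:
        if t == "B":
            b_seen += 1
            out.append("b" if b_seen % every == 0 else "B")
        else:
            out.append(t)
    return out

gop = ["I", "B", "B", "P", "B", "B", "P", "B", "B", "I"]
print(mark_designated(gop))  # → ['I', 'B', 'b', 'P', 'B', 'b', 'P', 'B', 'b', 'I']
```

Note that the I and P frames are untouched: only frames that no other frame references become candidates for discarding.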
These different types of GOP frame sequences can be obtained by encoding pictures through an encoding framework as shown in fig. 3. Based on fig. 3, a picture sent to the encoder is first divided into Coding Tree Units (CTUs) of 64x64 blocks, which are then divided in depth to obtain Coding Units (CUs); each CU comprises Prediction Units (PUs) and Transform Units (TUs). Each PU is predicted to obtain a predicted value, which is subtracted from the input data to obtain a residual; the residual undergoes DCT (discrete cosine transform) and quantization to yield residual coefficients, which are sent to the entropy coding module to output the code stream. The residual coefficients are also inverse-quantized and inverse-transformed to obtain the residual of the reconstructed picture, which is added to the predicted value to form the reconstructed picture; after in-loop filtering, the reconstructed picture is placed in the reference frame queue to serve as a reference for the next frame, and encoding proceeds frame by frame. During prediction, starting from the Largest Coding Unit (LCU), each level is divided downward layer by layer according to a quadtree and computed recursively. First, division proceeds top-down: from depth 0, the 64x64 block is divided into four 32x32 sub-CUs; each 32x32 sub-CU is further divided into four 16x16 sub-CUs, and so on, until depth 3, where the CU size is 8x8. Then pruning proceeds bottom-up.
The RD costs of the four 8x8 CUs are summed (denoted cost1) and compared with the RD cost of the corresponding 16x16 CU one level up (denoted cost2). If cost1 is less than cost2, the 8x8 partition is kept; otherwise pruning continues upward, comparing layer by layer, until the optimal CU depth partition is found. PU prediction is divided into intra-frame and inter-frame prediction: first, different PUs of the same prediction type are compared to find the optimal partition mode, and then intra and inter modes are compared to find the optimal prediction mode for the current CU. Meanwhile, an adaptive transform based on a quadtree structure (Residual Quad-tree Transform, RQT) is performed on the CU to find the optimal TU mode. Finally, a picture is divided into CUs and their corresponding PUs and TUs. As shown in fig. 4, a PU has 8 partition modes, while a TU has only 2 partition modes or no partition.
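The bottom-up pruning described above (summing the RD costs of four sub-CUs as cost1 and comparing against the parent CU's cost2) can be sketched as follows; the toy `rd_cost` function is a placeholder, not a real rate-distortion model:

```python
# Sketch: bottom-up quadtree pruning by rate-distortion (RD) cost.
# For each CU, keep it whole or split it into four sub-CUs, whichever
# has the lower total RD cost (cost1 vs. cost2 in the text above).

def best_split(block, rd_cost, min_size=8):
    """block: (x, y, size); rd_cost: callable giving the cost of coding
    a block whole. Returns (optimal cost, partition tree)."""
    x, y, size = block
    whole = rd_cost(x, y, size)
    if size <= min_size:
        return whole, (x, y, size)
    half = size // 2
    subs = [best_split((x + dx, y + dy, half), rd_cost, min_size)
            for dx in (0, half) for dy in (0, half)]
    split_cost = sum(c for c, _ in subs)   # cost1: sum over 4 sub-CUs
    if split_cost < whole:                 # cheaper to split: keep it
        return split_cost, [t for _, t in subs]
    return whole, (x, y, size)             # prune: keep the parent CU

# Toy cost model: blocks touching the frame origin are "complex".
cost = lambda x, y, s: s * s if (x, y) == (0, 0) else s * s // 2
total, tree = best_split((0, 0, 64), cost)
print(total)  # optimal cost is below the unsplit 64x64 cost of 4096
```

With this toy model the recursion splits only around the "complex" corner and prunes everywhere else, mirroring the layer-by-layer comparison in the text.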
Further, based on the encoding framework shown in fig. 3, the optimal partition mode is found by predicting and comparing different PUs of the same type, and the optimal prediction mode for the current CU is found by comparing intra and inter modes; meanwhile, an RQT is performed on the CU to find the optimal TU mode, and a picture is finally divided into CUs with their corresponding PUs and TUs. For each picture, every PU is predicted to obtain a predicted value, the predicted value is subtracted from the input data to obtain a residual, and the residual is transformed (DCT) and quantized into residual coefficients that the entropy coding module outputs as a code stream. Whether a frame is an I, P, or B frame is determined by the coding parameter settings and the rate control strategy, and the different I/P/B frame types have different reference frame queues.
The frame type (I/P/B) of a picture is determined by combining the GOP frame sequence conditions above with the encoding rate control algorithm; the optimal partition mode is then selected with the encoding framework of fig. 3, and intra and inter modes are compared to find the optimal prediction mode for the current CU. According to the intra/inter modes obtained from PU analysis and the frame type configuration, and according to the coding parameter configuration and the scene complexity of the picture, a certain proportion of b frames are selected that reference both the frames before them and the frames after them (in playback order), but are themselves never used as reference frames in the prediction-mode analysis of other frames and are not added to the reference frame sequences used by other frames' coding-mode prediction, as shown in fig. 5.
Because the adaptive schemes adopted in the related art when network transmission is congested reduce the video coding bitrate and with it the user's viewing quality and interactive experience, the method of this embodiment determines some frames in the target frame sequence as designated frames and discards some or all of them when the target network is congested. Since the reference frames of a designated frame are frames before and after it in the target frame sequence, and the designated frame is not used as a reference frame by any other frame, discarding designated frames does not affect the decoding of the remaining frames, and the video stream can be transmitted normally. This solves the technical problem that the related-art adaptive schemes, while reducing the video coding bitrate, also degrade the user's viewing quality and interactive experience.
Optionally, in this embodiment, the manner of discarding some or all of the designated frames in the sequence of target frames referred to in the above step S206 may be further implemented by:
step S206-11, a first group of frames in the designated frames in the target frame sequence are discarded, wherein the frame rate of the target frame sequence after discarding the first group of frames is still larger than the minimum predetermined frame rate.
It should be noted that the first group of frames in the designated frames may be a predetermined number of designated frames that implement the setting, or may be one or more designated frames at random. Taking the target frame sequence as the GOP frame sequence as an example, one GOP frame sequence may be: ib B P B B P … B B B P B B B I, where B is the designated frame referred to in this embodiment, is determined from B-frames and/or P-frames in the sequence of GOP frames. Therefore, the first group of frames in the designated frame (b-frame) may be a plurality of top-ranked b-frames (e.g., top 10 b-frames) or a plurality of frames randomly selected from the b-frames. Of course, the above-mentioned manner is merely an example, and other manners may be used to determine the first group of frames in the specification.
Further, in the event that the target network is still congested after dropping a first set of frames in the designated frames in the sequence of target frames, continuing to drop one or more sets of frames in the designated frames in the sequence of target frames; wherein a frame rate of the sequence of target frames after dropping one or more of the specified frames is still greater than a minimum predetermined frame rate.
That is, after dropping the first group of frames in the designated frames, the target network is still in a congested state, and the designated frames need to be dropped continuously. For example, again taking the target frame sequence as the GOP frame sequence as an example, the first group of frames in the specified frame, such as a plurality of b frames (e.g. the top 10 b frames) in the top sequence, or a plurality of frames randomly selected from the b frames, is discarded in the current plurality of GOP frame sequences; if the current target network is still congested, one or more frames are dropped on a previous basis in the sequence of GOP frames transmitted next, which is a cyclic process. However, no matter how to discard the designated frame, the frame rate of the transmission is still guaranteed to be greater than the minimum predetermined frame rate, which is preferably greater than or equal to 15 frame rates, and when the frame rate is guaranteed to be at least greater than or equal to 15 frame rates, it can be guaranteed that the video stream is not blocked and is basically continuous, although the minimum predetermined frame rate may be other values, and the corresponding setting is performed according to the actual situation.
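The cyclic dropping process described above (drop a batch of designated b frames, re-check congestion, never fall below the minimum frame rate) could be sketched like this; the congestion callback and batch size are illustrative assumptions, not from the patent:

```python
import math

# Sketch: drop designated (non-reference) 'b' frames while the network
# stays congested, but never let the frame rate fall below MIN_FPS.

MIN_FPS = 15  # the minimum predetermined frame rate from the text

def drop_for_congestion(frames, fps, congested, batch=10):
    """frames: list of frame types ('I', 'P', 'B', 'b'); only 'b' frames
    are droppable. congested: zero-argument callable polled before each
    batch. Returns the surviving frame list."""
    frames = list(frames)
    duration = len(frames) / fps              # seconds this sequence spans
    min_frames = math.ceil(MIN_FPS * duration)
    while congested():
        droppable = [i for i, t in enumerate(frames) if t == "b"]
        max_drops = len(frames) - min_frames  # keep rate above the floor
        if not droppable or max_drops <= 0:
            break
        for i in reversed(droppable[:min(batch, max_drops)]):
            del frames[i]
    return frames
```

Because only 'b' frames are ever deleted, the surviving sequence still decodes: no remaining frame referenced a dropped one.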
Optionally, in this embodiment, the target frame sequence may be transmitted to the target device through the target network in step S206 in the following manners:
Manner (1): transmit the designated frames to the target device through a first transmission channel, and transmit the frames other than the designated frames in the target frame sequence to the target device through a second transmission channel, where the first transmission channel is different from the second transmission channel.
Manner (2): receive an indication message sent by the target device when the target network is congested, where the indication message indicates that part or all of the designated frames in the target frame sequence are to be discarded; then transmit the frames of the target frame sequence that are not discarded to the target device through the target network.
As can be seen from manners (1) and (2), in this embodiment the discarding of designated frames can also be implemented by configuring transmission channels: the designated frames are placed on a separate channel, so that the target device simply does not receive that channel when network transmission congestion occurs.
Optionally, based on manners (1) and (2), dropping part or all of the designated frames in the target frame sequence in this embodiment includes: stopping transmission of the designated frames to the target device through the first transmission channel; or, the target device stopping reception of the designated frames through the first transmission channel. That is to say, if a user downloading or watching a video determines that the network is poor, the first-channel data need not be downloaded, which saves the bandwidth the user spends downloading video data and reduces network congestion and stalling.
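The two-channel arrangement of manners (1) and (2) can be sketched as follows, with plain lists standing in for real transmission channels; all names here are illustrative assumptions. Because designated frames are never reference frames, the stream reassembled from channel 2 alone still decodes.

```python
# Minimal sketch of the two-channel scheme. List-backed queues stand in
# for real transmission channels; names are illustrative.

def route(frames, designated=('b',)):
    """Split a frame sequence onto two logical channels.
    Channel 1 carries designated frames; channel 2 carries the rest."""
    ch1, ch2 = [], []
    for seq, ftype in enumerate(frames):
        (ch1 if ftype in designated else ch2).append((seq, ftype))
    return ch1, ch2

def receive(ch1, ch2, congested):
    """Receiver side: under congestion, skip channel 1 entirely;
    otherwise merge both channels back into display order."""
    received = ([] if congested else list(ch1)) + list(ch2)
    return [ftype for _, ftype in sorted(received)]
```

A congested receiver thus gets a valid, lower-frame-rate stream without any renegotiation with the sender.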
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiments of the present invention, there is also provided a network adaptive video stream transmission apparatus for implementing the network adaptive video stream transmission method, as shown in fig. 6, the apparatus includes:
(1) the encoding module 62 is configured to encode a target video stream to be transmitted to obtain a target frame sequence;
(2) a determining module 64, configured to determine a part of frames in the target frame sequence as designated frames, where reference frames of the designated frames are frames before and after the designated frames in the target frame sequence, and the designated frames are not reference frames of other frames in the target frame sequence;
(3) a transmission module 66, configured to transmit the target frame sequence to the target device through the target network, wherein in case of congestion of the target network, part or all of the designated frames in the target frame sequence are discarded.
Optionally, in this embodiment, the target video stream may be a video stream of a film or television work viewed by the user through the target device, or a game video stream of a game played through the target device. In addition, the target frame sequence in the present application may be a GOP frame sequence or another frame sequence.
Taking the target frame sequence as the GOP frame sequence as an example, the GOP frame sequence in this embodiment includes the following cases:
1) Fixed GOP size and fixed sequence: for example, the GOP is fixed at 120 frames, i.e., an I frame is generated every 120 frames, and the GOP frame sequence is fixed, e.g.: I b B P … B I.
2) Fixed GOP size with a non-fixed sequence: for example, the GOP is fixed at 120 frames, i.e., one I frame is generated every 120 frames, while the GOP frame sequence determines whether each frame is a P frame or a B frame according to picture complexity and the generation weights of the related P/B frames.
3) Open GOP (neither GOP size nor sequence is fixed): the sequence is generated automatically based on picture texture, motion complexity, and the I/P/B frame generation strategy and weight configuration.
Therefore, when the target frame sequence is a GOP frame sequence, the designated frame referred to in this embodiment may be a B frame or a P frame in the GOP, or a combination of B frames and P frames.
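The qualifying rule for designated frames — a B or P frame may be designated only if no other frame lists it as a reference, so dropping it cannot break decoding — can be sketched as follows. The explicit reference map is an assumed input format for illustration, not the patent's encoder interface.

```python
# Sketch of selecting designated frames from a GOP: a B/P frame qualifies
# only when no other frame references it. The refs map is illustrative.

def designated_frames(frame_types, refs):
    """frame_types: list such as ['I', 'B', 'B', 'P']
    refs: dict mapping frame index -> indices of its reference frames
    Returns the indices of B/P frames that no other frame references."""
    referenced = {r for rlist in refs.values() for r in rlist}
    return [i for i, t in enumerate(frame_types)
            if t in ('B', 'P') and i not in referenced]
```

For example, in `['I', 'B', 'B', 'P']` where both B frames reference the surrounding I and P frames, only the two B frames qualify as designated frames.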
In this regard, the different types of GOP frame sequences can be obtained by encoding images with the encoding framework shown in fig. 3. Based on fig. 3, a frame of image sent to the encoder is first divided into Coding Tree Units (CTUs) of 64x64 blocks, which are then divided by depth into Coding Units (CUs), where each CU contains prediction units (PUs) and transform units (TUs). Each PU is predicted to obtain a predicted value; the predicted value is subtracted from the input data to obtain a residual, which is DCT-transformed and quantized to obtain residual coefficients; the residual coefficients are sent to the entropy coding module to output the bitstream. The residual coefficients are also inverse-quantized and inverse-transformed to obtain the residual values of a reconstructed image; these are added to the predicted values to form the reconstructed image, which, after in-loop filtering, is placed in the reference frame queue to serve as a reference image for the next frame, and encoding proceeds frame by frame. During prediction, starting from the Largest Coding Unit (LCU), each layer is divided downward layer by layer according to a quadtree and computed recursively. First, division proceeds from top to bottom: starting at depth 0, a 64x64 block is divided into four 32x32 sub-CUs; each 32x32 sub-CU is further divided into four 16x16 sub-CUs, and so on, until depth 3, where the CU size is 8x8. Then, pruning proceeds from bottom to top.
The RD costs of the four 8x8 CUs are summed (denoted cost1) and compared with the RD cost of the corresponding 16x16 CU one level up (denoted cost2). If cost1 is less than cost2, the 8x8 CU partition is kept; otherwise, pruning continues upward, comparing layer by layer, until the optimal CU depth partition is found. PU prediction is divided into intra prediction and inter prediction: first, different PUs of the same prediction type are compared to find the optimal partition mode, and then intra and inter modes are compared to find the optimal prediction mode for the current CU. Meanwhile, a residual quadtree transform (RQT) is performed on the CU to find the optimal TU mode. Finally, a frame of image is divided into CUs and the PUs and TUs corresponding to the CUs. As shown in fig. 4, a PU has 8 partition modes, while a TU has only 2 partition modes or no partition.
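The bottom-up pruning step — keep the four-way split only when the summed sub-CU RD cost (cost1) beats the whole-CU cost (cost2) — can be sketched recursively. Here `rd_cost` is a stand-in for the encoder's real rate-distortion estimate; only the 64x64-to-8x8 sizes follow the text.

```python
# Recursive sketch of bottom-up CU pruning: a split survives only when
# the summed sub-CU RD cost (cost1) is lower than the cost of coding the
# CU whole (cost2). rd_cost is an illustrative stand-in.

def best_partition(x, y, size, rd_cost, min_size=8):
    """Return (cost, tree): tree is a leaf (x, y, size) or a list of the
    four chosen sub-trees."""
    whole = rd_cost(x, y, size)      # cost2: code this CU as one block
    if size <= min_size:             # 8x8 is the smallest CU
        return whole, (x, y, size)
    half = size // 2
    subs = [best_partition(x + dx, y + dy, half, rd_cost, min_size)
            for dy in (0, half) for dx in (0, half)]
    cost1 = sum(c for c, _ in subs)  # summed RD cost of the 4 sub-CUs
    if cost1 < whole:                # keep the split only if it is cheaper
        return cost1, [t for _, t in subs]
    return whole, (x, y, size)
```

With a cost function that grows quickly with block size, the recursion splits all the way down; with a flat cost, it keeps each block whole.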
Further, based on the encoding framework shown in fig. 3, the optimal partition mode can be found by predicting and comparing different PUs of the same type, and the optimal prediction mode for the current CU by comparing intra and inter modes; meanwhile, a residual quadtree transform (RQT) is performed on the CU to find the optimal TU mode. Finally, a frame of image is divided into CUs and the PUs and TUs corresponding to the CUs. For each frame, each PU is predicted to obtain a predicted value, the predicted value is subtracted from the input data to obtain a residual, and DCT transformation and quantization yield residual coefficients that are sent to the entropy coding module to output the bitstream. Whether the frame is an I, P, or B frame is determined according to the coding parameter settings and the rate-control strategy, and the different I/P/B frame types have different reference frame queues.
The frame type (I/P/B) of a picture frame is determined by combining the GOP frame sequence conditions above with the encoding rate-control algorithm; the optimal partition mode is then selected with the encoding framework of fig. 3, and intra and inter modes are compared to find the optimal prediction mode for the current CU. According to the intra and inter modes predicted by PU analysis and the frame type configuration, and according to the coding parameter configuration and the scene complexity of the picture, a certain proportion of b frames is selected; these b frames reference both the frames before them and the frames after them (in playing order), but cannot themselves serve as reference frames in the prediction-mode analysis of other frames, and they are not added to the reference frame queues used for other frames' coding-mode prediction, as shown in fig. 5.
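The idea of demoting a proportion of B frames to non-reference b frames, with the proportion depending on scene complexity, might be sketched as follows. The linear mapping from complexity to demotion ratio and the 0.5 cap are assumptions for illustration, not the patent's coding-parameter or rate-control configuration.

```python
# Hedged sketch: demote an evenly spaced subset of B frames to droppable,
# non-reference b frames. The complexity-to-ratio mapping is assumed.

def pick_b_frames(b_indices, complexity, max_ratio=0.5):
    """b_indices: positions of B frames in the GOP; complexity in [0, 1].
    Complex scenes keep more reference B frames, so fewer are demoted.
    Returns an evenly spaced subset of b_indices to demote."""
    ratio = max_ratio * (1.0 - complexity)
    n = int(len(b_indices) * ratio)
    if n == 0:
        return []
    step = len(b_indices) / n
    return [b_indices[int(i * step)] for i in range(n)]
```

Spacing the demoted frames evenly keeps the droppable frames spread across the GOP, so dropping them degrades the frame rate uniformly rather than in bursts.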
The adaptive schemes adopted in the related art under network transmission congestion reduce the video coding rate and, with it, the user's viewing quality and interactive experience. With the method of this embodiment, some of the B frames in the target frame sequence are determined as designated frames, and when the target network is congested, part or all of the designated frames in the target frame sequence are discarded. Because the reference frames of a designated frame are the frames before and after it in the target frame sequence, and a designated frame never serves as a reference frame for other frames, discarding part or all of the designated frames does not affect the decoding of the other frames, and the video stream can still be transmitted normally. This solves the technical problem that the adaptive schemes of the related art reduce the video coding rate and thereby degrade the user's viewing quality and interactive experience under network transmission congestion.
Optionally, the transmission module 66 in this embodiment is further configured to discard a first frame group of the designated frames in the target frame sequence, wherein the frame rate of the target frame sequence after discarding the first frame group is still greater than the minimum predetermined frame rate.
It should be noted that the first group of the designated frames may be a preset number of designated frames, or one or more designated frames chosen at random. Taking a GOP frame sequence as the target frame sequence, one GOP frame sequence may be: I b B P B B P … B B B P B B B I, where b denotes the designated frames referred to in this embodiment, which are determined from the B frames and/or P frames in the GOP frame sequence. The first group of the designated frames (b frames) may therefore be several b frames at the front of the sequence (e.g., the first 10 b frames) or several frames selected from the b frames at random. Of course, the above is merely an example, and the first group of the designated frames may be determined in other manners.
Further, the transmission module 66 in this embodiment is further configured to: if the target network is still congested after the first group of the designated frames in the target frame sequence has been dropped, continue to drop one or more of the designated frames in the target frame sequence, where the frame rate of the target frame sequence after the drop remains greater than the minimum predetermined frame rate.
That is, if the target network is still congested after the first group of the designated frames has been dropped, designated frames continue to be dropped. For example, taking a GOP frame sequence as the target frame sequence, the first group of the designated frames, such as several b frames at the front of the sequence (e.g., the first 10 b frames) or several frames selected from the b frames at random, is discarded in the current GOP frame sequences; if the target network is still congested, one or more additional frames are dropped, on top of those already dropped, in the GOP frame sequences transmitted next, and this process repeats. However the designated frames are discarded, the transmitted frame rate is still guaranteed to be greater than the minimum predetermined frame rate, preferably at least 15 fps; when the frame rate stays at or above 15 fps, the video stream remains essentially continuous and free of stalls. The minimum predetermined frame rate may of course take other values, set according to the actual situation.
Optionally, the transmission module 66 in this embodiment includes: a first transmission unit, configured to transmit the designated frames to the target device through a first transmission channel and transmit the frames other than the designated frames in the target frame sequence to the target device through a second transmission channel, where the first transmission channel is different from the second transmission channel.
Optionally, the transmission module 66 in this embodiment further includes: a receiving unit, configured to receive an indication message sent by the target device; and a second transmission unit, configured to, in response to the indication message, transmit the designated frames to the target device through a first transmission channel and transmit the frames other than the designated frames in the target frame sequence to the target device through a second transmission channel, where the first transmission channel is different from the second transmission channel.
As can be seen from the first transmission unit and the second transmission unit, in this embodiment the discarding of designated frames can also be implemented by configuring transmission channels: the designated frames are placed on a separate channel, so that the target device simply does not receive that channel when network transmission is congested.
According to a further aspect of embodiments of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above-mentioned method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
s1, encoding a target video stream to be transmitted to obtain a target frame sequence;
s2, determining partial frames in the target frame sequence as designated frames, wherein the reference frames of the designated frames are the frames before and after the designated frames in the target frame sequence, and the designated frames are not the reference frames of other frames in the target frame sequence;
and S3, transmitting the target frame sequence to the target device through the target network, wherein, in the case of congestion of the target network, part or all of the designated frames in the target frame sequence are discarded.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
s1, a first frame group of the designated frames in the target frame sequence is discarded, wherein the frame rate of the target frame sequence after discarding the first frame group is still larger than the minimum predetermined frame rate.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
s1, after discarding the first group of frames in the appointed frames in the target frame sequence, if the target network is still in congestion, continuing to discard one or more groups of frames in the appointed frames in the target frame sequence; wherein the frame rate of the sequence of target frames after dropping one or more of the specified frames is still greater than the minimum predetermined frame rate.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
s1, transmitting the designated frame to the target device through a first transmission channel, and transmitting other frames except the designated frame in the target frame sequence to the target device through a second transmission channel, wherein the first transmission channel is different from the second transmission channel.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
s1, receiving an indication message sent by the target equipment;
and S2, responding to the indication message, transmitting the appointed frame to the target device through a first transmission channel, and transmitting other frames except the appointed frame in the target frame sequence to the target device through a second transmission channel, wherein the first transmission channel is different from the second transmission channel.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
s1, stopping transmitting the designated frame to the target device through the first transmission channel; alternatively, the target device stops receiving the specified frame through the first transmission channel.
Alternatively, in this embodiment, those skilled in the art will understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing the related hardware of a terminal device, and the program may be stored in a computer-readable storage medium, which may include: a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and the like.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the method for transmitting a network adaptive video stream, as shown in fig. 7, the electronic device including: a processor 702, a memory 704, a display 706, a user interface 708, a transmission device 710, and the like. The memory has stored therein a computer program, and the processor is arranged to execute the steps of any of the above method embodiments by means of the computer program.
Optionally, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
s1, encoding a target video stream to be transmitted to obtain a target frame sequence;
s2, determining partial frames in the target frame sequence as designated frames, wherein the reference frames of the designated frames are the frames before and after the designated frames in the target frame sequence, and the designated frames are not the reference frames of other frames in the target frame sequence;
and S3, transmitting the target frame sequence to the target device through the target network, wherein, in the case of congestion of the target network, part or all of the designated frames in the target frame sequence are discarded.
Alternatively, it can be understood by those skilled in the art that the structure shown in fig. 7 is only illustrative, and the electronic device may also be a terminal device such as a smartphone (e.g., an Android or iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, or the like. Fig. 7 does not limit the structure of the electronic device; for example, the electronic device may include more or fewer components (e.g., network interfaces) than shown in fig. 7, or have a different configuration from that shown in fig. 7.
The memory 704 may be used to store software programs and modules, such as program instructions/modules corresponding to the method and apparatus for transmitting a network adaptive video stream in the embodiment of the present invention, and the processor 702 executes various functional applications and data processing by running the software programs and modules stored in the memory 704, so as to implement the above-mentioned method for transmitting a network adaptive video stream. The memory 704 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 704 may further include memory located remotely from the processor 702, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 710 is used to receive or send data via a network. Examples of the network may include wired and wireless networks. In one example, the transmission device 710 includes a network interface controller (NIC), which can be connected to other network devices and a router via a network cable so as to communicate with the internet or a local area network. In another example, the transmission device 710 is a radio frequency (RF) module, which communicates with the internet wirelessly.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.

Claims (9)

1. A method for network adaptive video streaming, comprising:
coding a target video stream to be transmitted to obtain a target frame sequence;
determining a part of frames in the target frame sequence as designated frames, wherein reference frames of the designated frames are frames which are positioned in front of the designated frames and are positioned behind the designated frames in the target frame sequence, and the designated frames are not used as reference frames of other frames in the target frame sequence;
and transmitting the target frame sequence to a target device through a target network, wherein the specified frames are transmitted to the target device through a first transmission channel, and other frames except the specified frames in the target frame sequence are transmitted to the target device through a second transmission channel, the first transmission channel is different from the second transmission channel, in the case of congestion of the target network, part or all of the specified frames in the target frame sequence are discarded, and the frame rate of the target frame sequence after the part or all of the specified frames are discarded is greater than a minimum preset frame rate.
2. The method of claim 1, wherein dropping some or all of the designated frames in the sequence of target frames comprises:
discarding a first set of frames of the designated frames of the sequence of target frames, wherein a frame rate of the sequence of target frames after discarding the first set of frames is still greater than a minimum predetermined frame rate.
3. The method of claim 2, further comprising:
continuing to drop one or more of the designated frames in the sequence of target frames if the target network is still congested after dropping a first set of the designated frames in the sequence of target frames; wherein a frame rate of the sequence of target frames after dropping one or more of the specified frames is still greater than a minimum predetermined frame rate.
4. The method of claim 1, wherein transmitting the sequence of target frames to a target device over a target network comprises:
receiving an indication message sent by target equipment;
and transmitting the designated frame to the target device through a first transmission channel in response to the indication message, and transmitting other frames except the designated frame in the target frame sequence to the target device through a second transmission channel, wherein the first transmission channel is different from the second transmission channel.
5. The method of claim 1 or 4, wherein said dropping some or all of the designated frames in the sequence of target frames comprises:
stopping transmitting the designated frame to the target device through the first transmission channel; or,
the target device stops receiving the designated frame through the first transmission channel.
6. An apparatus for network adaptive transmission of video streams, comprising:
the encoding module is used for encoding a target video stream to be transmitted to obtain a target frame sequence;
a determining module, configured to determine a part of frames in the target frame sequence as designated frames, where reference frames of the designated frames are frames in front of and behind the designated frames in the target frame sequence, and the designated frames are not reference frames of other frames in the target frame sequence;
a transmission module, configured to transmit the target frame sequence to a target device through a target network, where the specified frame is transmitted to the target device through a first transmission channel, and other frames in the target frame sequence except the specified frame are transmitted to the target device through a second transmission channel, where the first transmission channel is different from the second transmission channel, and in a case where congestion occurs in the target network, part or all of the specified frames in the target frame sequence are dropped, and a frame rate of the target frame sequence after dropping part or all of the specified frames is still greater than a minimum predetermined frame rate.
7. The apparatus of claim 6,
the transmission module is further configured to discard a first frame group of the designated frames in the sequence of target frames, where frame rates of the sequence of target frames after discarding the first frame group are still greater than a minimum predetermined frame rate.
8. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 5 when executed.
9. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 5 by means of the computer program.
CN201911144329.4A 2019-11-20 2019-11-20 Method and device for transmitting network self-adaptive video stream Active CN111212025B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911144329.4A CN111212025B (en) 2019-11-20 2019-11-20 Method and device for transmitting network self-adaptive video stream

Publications (2)

Publication Number Publication Date
CN111212025A CN111212025A (en) 2020-05-29
CN111212025B true CN111212025B (en) 2022-02-08

Family

ID=70787971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911144329.4A Active CN111212025B (en) 2019-11-20 2019-11-20 Method and device for transmitting network self-adaptive video stream

Country Status (1)

Country Link
CN (1) CN111212025B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111770331A (en) * 2020-07-17 2020-10-13 广州市奥威亚电子科技有限公司 Video coding method, device and system
CN112804527B (en) * 2021-01-07 2023-01-24 苏州浪潮智能科技有限公司 Image output method, image output device and computer-readable storage medium
CN113038128B (en) * 2021-01-25 2022-07-26 腾讯科技(深圳)有限公司 Data transmission method and device, electronic equipment and storage medium
CN115190080A (en) * 2021-04-02 2022-10-14 维沃移动通信有限公司 Congestion control method and device and communication equipment
CN114189711A (en) * 2021-11-16 2022-03-15 北京金山云网络技术有限公司 Video processing method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1848958A (en) * 2005-04-14 2006-10-18 中兴通讯股份有限公司 Method for transmitting video-frequency flow in network
CN101662680A (en) * 2008-08-28 2010-03-03 华为技术有限公司 Method, device and system for measuring video flow performance
CN102104468A (en) * 2011-02-18 2011-06-22 中兴通讯股份有限公司 Routing agent-based media sensing automatic retransmission request (ARQ) control method and system
CN104735456A (en) * 2015-03-30 2015-06-24 北京奇艺世纪科技有限公司 Video decoding method and device
CN109151612A (en) * 2017-06-27 2019-01-04 华为技术有限公司 A kind of video transmission method, equipment and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120014445A1 (en) * 2010-07-16 2012-01-19 Sharp Laboratories Of America, Inc. System for low resolution power reduction using low resolution data
CN105357592B (en) * 2015-10-26 2018-02-27 山东大学苏州研究院 A kind of streaming media self-adapting transmitting selective frame losing method

Also Published As

Publication number Publication date
CN111212025A (en) 2020-05-29

Similar Documents

Publication Publication Date Title
CN111212025B (en) Method and device for transmitting network self-adaptive video stream
AU2014275405B2 (en) Tuning video compression for high frame rate and variable frame rate capture
RU2633165C2 (en) Video buffering with low delay in video encoding
KR101657073B1 (en) Adaptive streaming aware node, encoder and client enabling smooth quality transition
TWI511544B (en) Techniques for adaptive video streaming
US10356448B2 (en) Multi representation edge server with enhanced open-GOP compression
CN107005698B (en) Metadata hints to support best effort decoding
JP6800747B2 (en) Improved RTP payload format design
KR102286957B1 (en) Method for distributing available bandwidth of a network amongst ongoing traffic sessions run by devices of the network, corresponding device
GB2488830A (en) Encoding and decoding image data
CN113038128B (en) Data transmission method and device, electronic equipment and storage medium
CN103718555A (en) Low latency rate control system and method
US20190174177A1 (en) Channel Change Method and Apparatus
WO2024021772A1 (en) Live streaming media data processing method, system and apparatus, and computer device
US20230017002A1 (en) File encapsulation method, file transmission method, file decoding method, electronic device, and storage medium
KR20140048808A (en) Method and apparatus for video stream encoding according to inter layer prediction for multi-view video, method and apparatus for video stream decoding according to inter layer prediction for multi-view video
CN112351285A (en) Video encoding method and apparatus, video decoding method and apparatus, electronic device, and storage medium
CN112351278B (en) Video encoding method and device and video decoding method and device
KR20130054435A (en) Coding and decoding utilizing context model selection with adaptive scan pattern
KR20100092830A (en) Method for receiving and transmitting scalable peer-to-peer stream
CN113973202A (en) Video encoding method, device, equipment and storage medium
Nightingale et al. Benchmarking real-time HEVC streaming
US20240171741A1 (en) Video compression at scene changes for low latency interactive experience
WO2023085559A1 (en) Image transmission device and method
Jun A fast coding unit mode decision method based on the mode inheritance of upper coding unit for low-complexity compression of smart contents

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant