WO2022262602A1 - Video coding and decoding method and apparatus

Info

Publication number: WO2022262602A1
Application number: PCT/CN2022/097097
Authority: WO
Inventors: 要瑞宵; 张凯明
Original assignee: 百果园技术(新加坡)有限公司; 要瑞宵
Priority date: 2021-06-16
Filing date: 2022-06-06
Publication date: 2022-12-22
Also published as: CN113573063A

Abstract

Disclosed are a video coding and decoding method and apparatus. The video coding and decoding method comprises: acquiring a non-first video frame to be coded, a transmission frame loss rate between a coding end and a decoding end, and a reference frame set, wherein the reference frame set comprises a video frame corresponding to at least one coded video frame, and in the reference frame set, a video frame corresponding to a coded video frame, which can be successfully decoded by the decoding end, is a reliable frame; according to the frame loss rate and a video coding rule, determining, from the reference frame set, a target reference frame corresponding to the video frame to be coded, wherein the video coding rule comprises: the greater the frame loss rate, the more video frames there are in a video to be coded that correspond to target reference frames, which are reliable frames; determining the distance between the video frame to be coded and the target reference frame in a display time sequence to be a reference distance of the video frame to be coded; coding, by using the target reference frame, the video frame to be coded, so as to obtain a coded video frame; and sending the coded video frame and the reference distance to the decoding end.

Description

Video encoding and decoding method and device

This application claims the priority of the Chinese patent application with application number 202110667857.9 submitted to the China Patent Office on June 16, 2021, the entire content of which is incorporated herein by reference.

Technical field

The present application relates to the field of computer technology, for example, to a video encoding and decoding method and device.

Background

To facilitate video transmission, the video sending end usually encodes the multiple video frames in a video stream before sending it, obtains an encoded video stream, and sends the encoded video stream to the video receiving end.

The encoded video stream mainly includes two types of video frames: intra-frame coding frames (also called Intra frames, I frames) and inter-frame predictive coding frames (also called Inter frames, P frames) arranged at intervals. Wherein, the I frame is a frame that can be decoded independently, that is, when the I frame is decoded at the video receiving end, the decoded video frame can be obtained without referring to other frame data. P frames cannot be decoded independently. That is, when a P frame is decoded at the video receiving end, it needs to rely on the decoding of its previous video frame to obtain a decoded video frame. That is, the correct decoding of a P frame depends on the correct decoding of its previous frame.

However, since the P frame can only be decoded depending on its previous video frame, when the network status between the video sending end and the video receiving end is poor, video frames may be lost during transmission. Therefore, it is easy to cause some P frames to be unable to be correctly decoded, which reduces the efficiency of correct decoding of video frames, and increases the probability of problems such as playback freezes at the video receiving end.

Summary of the invention

The present application provides a video encoding and decoding method and device, electronic equipment, and a storage medium.

This application provides a video encoding and decoding method, which is applied to the encoding end, including:

Obtaining a non-first video frame to be encoded, a transmission frame loss rate between the encoding end and the decoding end, and a reference frame set, wherein the reference frame set includes a video frame corresponding to at least one encoded video frame, and, in the reference frame set, a video frame corresponding to an encoded video frame that can be successfully decoded by the decoding end is a reliable frame;

According to the frame loss rate and a video encoding rule, determining, from the reference frame set, the target reference frame corresponding to the video frame to be encoded, wherein the video encoding rule includes: the larger the frame loss rate, the larger the number of video frames in the video to be encoded whose corresponding target reference frames are reliable frames;

determining the distance between the video frame to be encoded and the target reference frame in display timing as the reference distance of the video frame to be encoded;

Encoding the video frame to be encoded by using the target reference frame to obtain an encoded video frame;

sending the coded video frame and the reference distance to the decoding end.

This application provides a video encoding and decoding method, which is applied to the decoding end, including:

Receive the encoded video frame and the reference distance sent by the encoding end according to the above video encoding and decoding method;

Obtain a set of decoded frames, the set of decoded frames includes at least one decoded video frame, and the number of video frames included in the set of decoded frames is greater than or equal to the number of video frames included in the set of reference frames at the decoding end;

When it is determined according to the reference distance that the set of decoded frames includes a target reference frame corresponding to the encoded video frame, acquiring the target reference frame corresponding to the encoded video frame;

Decoding the coded video frame by using the target reference frame to obtain a decoded video frame.

This application provides a video codec device, which is applied to the encoding end, including:

The acquisition module is configured to acquire a non-first video frame to be encoded, a transmission frame loss rate between the encoding end and the decoding end, and a reference frame set, wherein the reference frame set includes a video frame corresponding to at least one encoded video frame, and, in the reference frame set, a video frame corresponding to an encoded video frame that can be successfully decoded by the decoding end is a reliable frame;

The determination module is configured to determine, from the reference frame set, the target reference frame corresponding to the video frame to be encoded according to the frame loss rate and a video encoding rule, wherein the video encoding rule includes: the larger the frame loss rate, the larger the number of video frames in the video to be encoded whose corresponding target reference frames are reliable frames; the determination module is further configured to determine the distance between the video frame to be encoded and the target reference frame in display timing as the reference distance of the video frame to be encoded;

An encoding module, configured to use the target reference frame to encode the video frame to be encoded to obtain an encoded video frame;

A sending module, configured to send the coded video frame and the reference distance to the decoding end.

This application provides a video codec device, which is applied to the decoding end, including:

The receiving module is configured to receive the encoded video frame and the reference distance sent by the encoding end according to the above-mentioned video encoding and decoding method;

An acquisition module, configured to acquire a set of decoded frames, the set of decoded frames includes at least one decoded video frame, and the number of video frames included in the set of decoded frames is greater than or equal to that included in the set of reference frames at the decoding end the number of video frames;

The determining module is configured to obtain the target reference frame corresponding to the encoded video frame when determining that the set of decoded frames includes the target reference frame corresponding to the encoded video frame according to the reference distance;

The decoding module is configured to use the target reference frame to decode the coded video frame to obtain a decoded video frame.

The present application provides an electronic device, including a processor, a memory, and a computer program stored on the memory and operable on the processor. When the computer program is executed by the processor, the above-mentioned video encoding and decoding method is implemented.

The present application provides a computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the above video encoding and decoding method is implemented.

Description of drawings

FIG. 1 is a schematic structural diagram of a video processing system provided by an embodiment of the present application;

FIG. 2 is a schematic structural diagram of another video processing system provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of an IPPP frame structure provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of another IPPP frame structure provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a temporally scalable Scalable Video Coding (SVC) encoded video frame structure provided by an embodiment of the present application;

FIG. 6 is a flowchart of a video encoding and decoding method provided by an embodiment of the present application;

FIG. 7 is a flow chart of another video encoding and decoding method provided by an embodiment of the present application;

FIG. 8 is a flow chart of another video encoding and decoding method provided by an embodiment of the present application;

FIG. 9 is a flow chart of another video encoding and decoding method provided by an embodiment of the present application;

FIG. 10 is a schematic diagram of a set of reference frames provided by an embodiment of the present application;

Fig. 11 is a schematic diagram of the principle of an encoding sub-rule provided by the embodiment of the present application;

Fig. 12 is a schematic diagram of the principle of another coding sub-rule provided by the embodiment of the present application;

Fig. 13 is a schematic diagram of the principle of another coding sub-rule provided by the embodiment of the present application;

FIG. 14 is a schematic diagram of the principle of acquiring a target reference frame provided by an embodiment of the present application;

FIG. 15 is a schematic diagram of another method for obtaining a target reference frame provided by an embodiment of the present application;

Fig. 16 is a schematic diagram of a video frame reference relationship provided by an embodiment of the present application;

Fig. 17 is a block diagram of a video encoding and decoding device provided by an embodiment of the present application;

FIG. 18 is a block diagram of another video codec device provided by an embodiment of the present application;

Fig. 19 is a block diagram of an electronic device provided by an embodiment of the present application.

Detailed description

Reference will now be made to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments are merely examples of apparatuses and methods consistent with aspects of the present application.

Please refer to FIG. 1, which shows a schematic structural diagram of a video processing system provided by an embodiment of the present application. The video processing system is the implementation environment involved in the video encoding and decoding method. As shown in FIG. 1, the video processing system may include an encoding end 101 and at least one decoding end 102. In FIG. 1, one decoding end 102 is taken as an example for illustration. The encoding end 101 and the decoding end 102 may be connected through a wired network or a wireless network. Both the encoding end 101 and the decoding end 102 may be located on an electronic device. The electronic device may be a mobile terminal, such as a mobile phone, a computer, a multimedia player, an electronic reader, or a wearable device. The encoding end and the decoding end can realize their functions through the operating system of the electronic device, or through a client installed on the electronic device.

Optionally, on the basis of the video processing system shown in FIG. 1, please refer to FIG. 2, which shows a schematic structural diagram of another video processing system provided by an embodiment of the present application. As shown in FIG. 2, the encoding end 101 includes a feedback sorting module 1011, an encoding module 1012, a sending module 1013, and a decoding module 1014. The encoding module 1012 is connected to the feedback sorting module 1011 and the sending module 1013 respectively. The decoding end 102 includes a feedback sorting module 1021, an encoding module 1022, a sending module 1023, and a decoding module 1024. The encoding module 1022 is connected to the feedback sorting module 1021 and the sending module 1023 respectively. Modules of the same name in the encoding end 101 and the decoding end 102 have the same functions. The feedback sorting module is configured to sort out the decoding feedback information sent by the opposite end. The encoding module is configured to generate encoded video frames from the video to be encoded to form a video stream. The sending module is configured to send encoded video frames to the decoding module at the opposite end. The decoding module is configured to decode the received encoded video frames to obtain decoded video frames.

In the embodiment of the present application, the video encoding and decoding method provided can be applied to a real-time communication (Real-Time Communication, RTC) scenario. The real-time communication scenario may include a video communication scenario, a live broadcast scenario, and the like. For example, in a live broadcast scenario, when an anchor user is performing live video broadcasting, the encoding end is located at the anchor terminal on which the anchor user performs the live video broadcasting. The anchor terminal generates a video stream corresponding to a video with a certain definition by applying the video encoding method to the video to be encoded, and sends the generated video stream to the audience terminal. The audience terminal refers to the terminal of a user who watches the live video of the anchor user. The decoding end is located at the audience terminal and can decode the encoded video frames in the video stream to obtain decoded video frames, thereby obtaining a video with a certain definition.

In order to facilitate the reader's understanding, some of the professional terms involved in the following are described here in the embodiments of the present application.

1. Intra-frame predictive coding and inter-frame predictive coding

When encoding a video frame, an intra-frame predictive encoding mode or an inter-frame predictive encoding mode may be used. Wherein, when an intra-frame predictive coding mode is used to perform intra-frame predictive coding on a video frame, there is no need to use other video frames to generate an I frame. When the inter-frame predictive coding mode is used to perform inter-frame predictive coding on video frames, the previous video frame in the display sequence can be used as a reference frame to generate a P frame. Inter-frame predictive coding is forward predictive coding.

2. Encoded video frame structure in IPPP mode

In the application scenario of real-time video communication, in order to improve video compression efficiency, the IPPP frame structure is generally used, that is, multiple P frames are encoded after an I frame. FIG. 3 shows the frame reference relationship of multiple video frames under the IPPP frame structure. The video frames are arranged as: I frame, P frame, P frame, ..., P frame, I frame, P frame, P frame, ..., P frame, and so on. In FIG. 3, I represents an I frame and P represents a P frame. The arrows between the video frames in FIG. 3 indicate the reference frame of each video frame. The reference frame of each P frame is only its previous frame.

However, when network conditions are poor (for example, jitter, packet loss, or rate limiting), this frame structure easily causes video freezes. Please refer to FIG. 4, which shows that, under the frame structure shown in FIG. 3, if frame X is lost, none of the P frames after frame X and before the next I frame, that is, the P-failure frame set identified in FIG. 4, can be decoded correctly. In this case, even though only a few frames are lost over the poor network, many frames cannot be decoded correctly at the decoding end, which causes video playback at the decoding end to freeze.
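The propagation effect described for FIG. 4 can be illustrated with a small simulation. This is a minimal sketch for illustration only and is not part of the application; the function name `decodable_frames_ippp` and the representation of the stream as a list of "I" and "P" strings are assumptions.

```python
def decodable_frames_ippp(frame_types, lost):
    """Return the indices of frames that decode correctly in an IPPP stream.

    frame_types: list of "I" or "P", in display order.
    lost: set of indices of frames lost in transmission.
    Each P frame references only its immediately preceding frame.
    """
    decodable = []
    prev_ok = False  # whether the previous frame was decoded correctly
    for i, ftype in enumerate(frame_types):
        received = i not in lost
        if ftype == "I":
            ok = received              # an I frame needs no reference
        else:
            ok = received and prev_ok  # a P frame needs its previous frame
        if ok:
            decodable.append(i)
        prev_ok = ok
    return decodable


stream = ["I", "P", "P", "P", "P", "I", "P", "P"]
# Losing only frame 2 also breaks frames 3 and 4, until the next I frame.
print(decodable_frames_ippp(stream, lost={2}))  # [0, 1, 5, 6, 7]
```

In this example a single lost frame makes three frames undecodable, which is the freeze scenario described for the P-failure frame set.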

3. SVC

SVC is a mainstream video coding technique, standardized, for example, as an extension of H.264. SVC adopts a hierarchical prediction structure and can be divided into three types: temporal scalability SVC, spatial scalability SVC, and quality scalability SVC. Taking the temporal scalability type as an example, SVC coding yields a base layer and multiple enhancement layers of the original video.

Please refer to FIG. 5, which shows a schematic diagram of a temporally scalable SVC encoded video frame structure. As shown in FIG. 5, the video frames included in the video to be encoded can be divided in the temporal domain into a base layer (Layer0) and an enhancement layer (Layer1). The video frames of the base layer may adopt intra-frame predictive coding or inter-frame predictive coding to obtain a periodically arranged IPPP-mode encoded frame structure. That is, in FIG. 5, the encoded video frames corresponding to the base-layer video frames consist of an I frame followed by multiple P frames, then another I frame followed by multiple P frames. A video frame of the enhancement layer (the higher layer) can be encoded by using a video frame of the base layer as a reference frame to obtain an encoded video frame. In FIG. 5, p represents the encoded video frame obtained by encoding an enhancement-layer video frame with a base-layer video frame as its reference frame. The arrows between video frames in FIG. 5 indicate the reference frame of each video frame.

Under the SVC frame structure, if the coded video frame in the enhancement layer is lost, it will not affect the decoding of the coded video frame in the base layer. For example, if frame X is missing in Figure 5, only frame X cannot be decoded correctly. The rest of the encoded video frames can be decoded correctly, which can reduce the probability of video freeze at the decoding end caused by the inability to decode the encoded video frames.
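The isolation property of the enhancement layer can be sketched in the same way. The following is an illustrative model only: it assumes two temporal layers in which even indices (counting from 0) are base-layer frames, odd indices are enhancement-layer frames, frame 0 is the only I frame, each later base-layer frame references the previous base-layer frame, and each enhancement-layer frame references the base-layer frame just before it. The function name `decodable_frames_svc` is hypothetical.

```python
def decodable_frames_svc(num_frames, lost):
    """Return indices of frames that decode correctly with two temporal layers.

    Assumed layout: even index = base layer, odd index = enhancement layer.
    Frame 0 is an I frame. Base frame i (i >= 2) references base frame i - 2.
    Enhancement frame i references base frame i - 1.
    """
    ok = [False] * num_frames
    for i in range(num_frames):
        received = i not in lost
        if i == 0:
            ok[i] = received                # I frame, no reference needed
        elif i % 2 == 0:
            ok[i] = received and ok[i - 2]  # base layer references base layer
        else:
            ok[i] = received and ok[i - 1]  # enhancement references base layer
    return [i for i in range(num_frames) if ok[i]]


# Losing an enhancement-layer frame affects only that frame.
print(decodable_frames_svc(8, lost={3}))  # [0, 1, 2, 4, 5, 6, 7]
# Losing a base-layer frame still propagates to every later frame.
print(decodable_frames_svc(8, lost={2}))  # [0, 1]
```

The second call shows why temporal layering alone is not sufficient: a lost base-layer frame still breaks the chain, which is the case that reliable reference frames are intended to address.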

Please refer to FIG. 6 , which shows a flowchart of a video encoding and decoding method provided by an embodiment of the present application. The video encoding and decoding method is applied to the encoding end shown in Fig. 1 and Fig. 2 . As shown in Figure 6, the video encoding and decoding methods include:

Step 601. Obtain a non-first video frame to be encoded, the transmission frame loss rate between the encoding end and the decoding end, and a reference frame set. The reference frame set includes a video frame corresponding to at least one encoded video frame. In the reference frame set, a video frame corresponding to an encoded video frame that can be successfully decoded by the decoding end is a reliable frame.

In the embodiment of the present application, the video encoding method may be applied to an entire video to be encoded, or may also be applied to an encoding cycle of the video to be encoded. The video to be encoded may include multiple encoding periods, and each encoding period may include multiple video frames. If the video coding method is applied to a whole section of video to be coded, the first video frame to be coded is the first video frame to be coded arranged according to display timing among the multiple video frames to be coded included in the video to be coded. The corresponding non-first video frame to be encoded is a video frame except the first video frame to be encoded among the plurality of video frames to be encoded included in the video to be encoded. If the video coding method is applied in one coding cycle, the first video frame to be coded is the first video frame to be coded arranged according to the display time sequence among the multiple video frames to be coded included in the coding cycle. The corresponding non-first video frame to be encoded is a video frame except the first video frame to be encoded among the plurality of video frames to be encoded included in the encoding cycle.

The transmission frame loss rate between the encoding end and the decoding end refers to the ratio of the number of video frames not received by the decoding end to the number of video frames transmitted from the encoding end to the decoding end within a set time period. The video frames not received by the decoding end are the video frames lost at the decoding end. An encoded video frame refers to a video frame obtained by encoding a video frame to be encoded. The video frame corresponding to an encoded video frame is either the pre-encoding video frame corresponding to that encoded video frame, or the reconstructed frame obtained by reconstructing that encoded video frame. For example, the encoding end may receive, from the decoding end, the number of encoded video frames that the decoding end received within each set time period, and obtain the number of encoded video frames that the encoding end transmitted within the same period. The number of encoded video frames not received by the decoding end is the difference between the number transmitted by the encoding end and the number received by the decoding end within the set time period, and the ratio of this difference to the number of encoded video frames transmitted is used as the transmission frame loss rate between the encoding end and the decoding end.
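The loss-rate calculation described in this paragraph reduces to a single ratio. A minimal sketch follows; the function name `transmission_frame_loss_rate` and the handling of the zero-sent and over-reported cases are assumptions, since the application does not specify them.

```python
def transmission_frame_loss_rate(sent_count, received_count):
    """Loss rate over one set time period.

    sent_count: encoded frames transmitted by the encoding end in the period.
    received_count: encoded frames the decoding end reports having received.
    """
    if sent_count <= 0:
        return 0.0  # assumption: nothing sent means no measurable loss
    # Clamp at zero in case late frames make the report exceed sent_count.
    not_received = max(sent_count - received_count, 0)
    return not_received / sent_count


print(transmission_frame_loss_rate(100, 95))  # 0.05
```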

The reference frame set includes a video frame corresponding to at least one encoded video frame. That is, the reference frame set includes the video frames corresponding to the encoded video frames that have already been obtained by encoding video frames of the video to be encoded. Therefore, after encoding a video frame to be encoded to obtain an encoded video frame, the encoding end can store the video frame corresponding to that encoded video frame to build up the reference frame set. For example, when the encoding end acquires the first non-first video frame to be encoded, the acquired reference frame set includes the video frame corresponding to the first encoded video frame, where the first encoded video frame is obtained by encoding the first video frame to be encoded. When the encoding end acquires the second non-first video frame to be encoded, the acquired reference frame set includes the video frame corresponding to the first encoded video frame and the video frame corresponding to the second encoded video frame, where the second encoded video frame is obtained by encoding the first non-first video frame to be encoded. The reference frame set includes reliable frames, which are video frames corresponding to encoded video frames that can be successfully decoded by the decoding end. An encoded video frame that can be successfully decoded may be one that has already been successfully decoded. Alternatively, it may be one that has been successfully received, for example, a successfully received I frame. Alternatively, it may be an encoded video frame that has been successfully received and whose reference frame has also been successfully received. A reference frame refers to a frame that needs to be referred to when encoding a video frame.

Optionally, the reference frame set, also known as the first decoded picture buffer (Decoded Picture Buffer, DPB), may include video frames respectively corresponding to a first target number of encoded video frames; that is, it may include a first target number of video frames, where the first target number may be greater than 1. The number of video frames that the reference frame set can include is related to the number of short-term reference frames or the number of long-term reference frames set by the encoder. For example, the first target number of video frames included in the reference frame set may be 8, 16, or 32. The encoding end may include a reconstructed frame buffer, which may be used to store the reference frame set.

In this embodiment of the present application, the video frames included in the reference frame set may be reference frames of most video frames to be encoded. For example, when the video to be encoded adopts the IPPP-mode encoded video frame structure, the video frames included in the reference frame set may be video frames corresponding to any encoded video frame. When the video to be encoded adopts temporally scalable SVC encoding, the video frames included in the reference frame set can only be video frames corresponding to encoded video frames obtained by encoding base-layer video frames.
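A bounded reference frame set with a per-frame reliability flag could be modeled as follows. This is a sketch under stated assumptions, not the structure used by the application: the class and method names (`RefFrame`, `ReferenceFrameSet`, `mark_reliable`, `latest_reliable`) are hypothetical, frames are identified by frame number only, and the oldest frame is evicted when the capacity is reached, an eviction policy the application does not specify.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RefFrame:
    frame_number: int       # position in display order
    reliable: bool = False  # set once the decoding end can decode this frame


class ReferenceFrameSet:
    """Holds at most `capacity` frames; the oldest is dropped first (assumed)."""

    def __init__(self, capacity: int = 8):
        self._frames = deque(maxlen=capacity)

    def add(self, frame_number: int) -> None:
        # Called after a frame is encoded and its reconstruction is stored.
        self._frames.append(RefFrame(frame_number))

    def mark_reliable(self, frame_number: int) -> bool:
        # Called when decoding feedback confirms the frame; False if evicted.
        for frame in self._frames:
            if frame.frame_number == frame_number:
                frame.reliable = True
                return True
        return False

    def frame_numbers(self) -> list:
        return [frame.frame_number for frame in self._frames]

    def latest_reliable(self) -> Optional[RefFrame]:
        # Most recently added reliable frame, or None if there is none.
        for frame in reversed(self._frames):
            if frame.reliable:
                return frame
        return None
```

With a capacity of 8, 16, or 32 as mentioned above, a reliable frame that is not refreshed by new feedback would eventually be evicted under this policy, so a real encoder would likely need to retain reliable frames explicitly, for example as long-term reference frames.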

Step 602: Determine, from the reference frame set, the target reference frame corresponding to the video frame to be encoded according to the frame loss rate and the video encoding rule. The video encoding rule includes: the larger the frame loss rate, the larger the number of video frames in the video to be encoded whose target reference frames are reliable frames.

Optionally, the multiple video frames to be encoded included in the video to be encoded may have frame numbers in order of display timing. The video encoding rule may include the following. When the frame loss rate is less than or equal to the first quantity threshold, the target reference frame of each video frame to be encoded may be the video frame nearest to it in display order, that is, the video frame whose frame number is closest to the frame number of the video frame to be encoded. When the frame loss rate is greater than the first quantity threshold, the target reference frame of one video frame to be encoded after every interval of a set number of video frames may be a reliable frame, and the target reference frame of each remaining video frame to be encoded may be the video frame nearest to it in display order, that is, the video frame whose frame number is closest to the frame number of the video frame to be encoded. The larger the frame loss rate, the smaller the set number of the interval.

For example, in the case where the video to be encoded adopts the IPPP-mode encoded video frame structure, assume that the video to be encoded includes five video frames, that is, the first video frame to the fifth video frame. The first and second video frames in display order have been encoded, and the reference frame set includes: video frame A, corresponding to the encoded video frame obtained by encoding the first video frame, and video frame B, corresponding to the encoded video frame obtained by encoding the second video frame. Video frame B is a reliable frame. The first quantity threshold may be 0. In that case, when the frame loss rate is 0, the target reference frame corresponding to the third video frame is video frame B. The target reference frame corresponding to the fourth video frame is the video frame corresponding to the encoded video frame obtained from the third video frame. The target reference frame corresponding to the fifth video frame is the video frame corresponding to the encoded video frame obtained from the fourth video frame.

When the frame loss rate is greater than 0, and assuming that the set number is 1, the target reference frame corresponding to the third video frame is: video frame B. The target reference frame corresponding to the fourth video frame is: the video frame corresponding to the encoded video frame obtained from the third video frame. The target reference frame corresponding to the fifth video frame is: video frame B.
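The IPPP example can be reproduced with a short sketch. It is illustrative only and makes several assumptions not stated in the application: frames are identified by frame number, the reliable frame stays the same for the whole run (no new decoding feedback arrives), and the "set number" is modeled as a counter of frames encoded since the last reliable reference. The name `choose_target_references` and its parameters are hypothetical.

```python
def choose_target_references(frames, last_encoded, reliable, loss_rate,
                             threshold=0.0, interval=1):
    """Map each frame number to the frame number of its target reference.

    frames: frame numbers still to be encoded, in display order.
    last_encoded: frame number of the most recently encoded frame.
    reliable: frame number of the reliable frame in the reference frame set.
    interval: the "set number" of frames between reliable references.
    """
    refs = {}
    skipped = interval  # frames encoded since the last reliable reference
    prev = last_encoded
    for n in frames:
        if loss_rate > threshold and skipped >= interval:
            refs[n] = reliable  # anchor this frame on the reliable frame
            skipped = 0
        else:
            refs[n] = prev      # nearest frame in display order
            skipped += 1
        prev = n
    return refs


# Frames 1 and 2 are encoded; video frame B (frame 2) is reliable.
print(choose_target_references([3, 4, 5], 2, 2, loss_rate=0.0))
# {3: 2, 4: 3, 5: 4}
print(choose_target_references([3, 4, 5], 2, 2, loss_rate=0.1, interval=1))
# {3: 2, 4: 3, 5: 2}
```

Both outputs match the two cases in the text: with no loss every frame references its predecessor, and with loss and a set number of 1 the fifth frame returns to video frame B.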

As another example, when the video to be encoded adopts temporally scalable SVC encoding, the video encoding rule applies to all non-first video frames to be encoded at the base layer. Assume that the video to be encoded includes eight video frames, that is, the first video frame to the eighth video frame, where the odd-numbered video frames are in the base layer and the even-numbered video frames are in the enhancement layer. The first, second, and third video frames in display order have all been encoded, and the reference frame set includes: video frame A, corresponding to the encoded video frame obtained by encoding the first video frame, and video frame C, corresponding to the encoded video frame obtained by encoding the third video frame. Video frame C is a reliable frame.

The first quantity threshold may be 0, that is, when the frame loss rate is 0, the target reference frame corresponding to the fourth video frame is: video frame C. The target reference frame corresponding to the fifth video frame is: video frame C. The target reference frame corresponding to the sixth video frame is: the video frame corresponding to the coded video frame obtained from the fifth video frame. The target reference frame corresponding to the seventh video frame is: the video frame corresponding to the encoded video frame obtained from the fifth video frame. The target reference frame corresponding to the eighth video frame is: the video frame corresponding to the encoded video frame obtained from the seventh video frame.

When the frame loss rate is greater than 0, and assuming that the set number is 0, the target reference frame corresponding to the fourth video frame is: video frame C. The target reference frame corresponding to the fifth video frame is: video frame C. The target reference frame corresponding to the sixth video frame is: the video frame corresponding to the coded video frame obtained from the fifth video frame. The target reference frame corresponding to the seventh video frame is: video frame C. The target reference frame corresponding to the eighth video frame is: the video frame corresponding to the encoded video frame obtained from the seventh video frame.
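The SVC example can be checked with a similar sketch. It is illustrative only and assumes, as in the example, that odd frame numbers are base-layer frames and even frame numbers are enhancement-layer frames, that each enhancement-layer frame references the nearest preceding base-layer frame, and that the reliable frame does not change during the run. The name `choose_svc_references` and its parameters are hypothetical.

```python
def choose_svc_references(frames, last_base, reliable_base, loss_rate,
                          threshold=0.0, interval=0):
    """Map each frame number to the frame number of its target reference.

    frames: frame numbers still to be encoded, in display order.
    last_base: frame number of the most recently encoded base-layer frame.
    reliable_base: frame number of the reliable base-layer frame.
    interval: the "set number" of base-layer frames between reliable references.
    """
    refs = {}
    skipped = interval  # base-layer frames since the last reliable reference
    for n in frames:
        if n % 2 == 0:
            # Enhancement layer: the rule does not apply; use nearest base frame.
            refs[n] = last_base
            continue
        if loss_rate > threshold and skipped >= interval:
            refs[n] = reliable_base
            skipped = 0
        else:
            refs[n] = last_base
            skipped += 1
        last_base = n
    return refs


# Frames 1 to 3 are encoded; video frame C (frame 3) is reliable.
print(choose_svc_references([4, 5, 6, 7, 8], 3, 3, loss_rate=0.0))
# {4: 3, 5: 3, 6: 5, 7: 5, 8: 7}
print(choose_svc_references([4, 5, 6, 7, 8], 3, 3, loss_rate=0.2, interval=0))
# {4: 3, 5: 3, 6: 5, 7: 3, 8: 7}
```

The only difference between the two results is the seventh frame, which switches from the fifth frame to video frame C once the frame loss rate exceeds the threshold.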

Step 603: Determine the distance between the video frame to be encoded and the target reference frame in display timing as the reference distance of the video frame to be encoded.

In the embodiment of the present application, the encoding end may determine the number of video frames between the video frame to be encoded and the target reference frame as the reference distance (RefDelta) of the video frame to be encoded. Optionally, in the case where the multiple video frames to be encoded included in the video to be encoded have frame numbers in order of display timing, the encoding end may determine the difference between the frame number of the video frame to be encoded and the frame number of the target reference frame as the reference distance of the video frame to be encoded. For example, assume that the frame number of the video frame to be encoded is 3, that is, it is the third video frame displayed, and the frame number of the target reference frame is 1. The reference distance is then 2.
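Using the frame-number variant, the reference distance is a simple subtraction. A minimal sketch follows; the function name `reference_distance` and the validation step are assumptions added for illustration.

```python
def reference_distance(frame_number, target_reference_number):
    """RefDelta as the frame-number difference in display order."""
    if target_reference_number >= frame_number:
        # Inter-frame prediction here is forward prediction, so the
        # reference must precede the frame being encoded.
        raise ValueError("target reference must precede the frame to encode")
    return frame_number - target_reference_number


print(reference_distance(3, 1))  # 2
```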

Step 604: Encode the video frame to be encoded by using the target reference frame to obtain the encoded video frame.

Optionally, the coding end may use the target reference frame to obtain the coded video frame by using predictive coding on the video frame to be coded. For example, the coding end may use the target reference frame to obtain the coded video frame by using inter-frame predictive coding on the video frame to be coded.

Step 605: Send the coded video frame and the reference distance to the decoder.

In the embodiment of the present application, the encoding end sends the encoded video frame and the reference distance to the decoding end through a network connected between the encoding end and the decoding end, so that after receiving the encoded video frame and the reference distance, the decoding end obtains the target reference frame corresponding to the encoded video frame according to the reference distance and decodes the encoded video frame by using the target reference frame to obtain the decoded video frame.

To sum up, the video encoding and decoding method provided by the embodiment of the present application obtains the non-first video frame to be encoded, the transmission frame loss rate between the encoding end and the decoding end, and a reference frame set. According to the frame loss rate and the video coding rule, the target reference frame corresponding to the video frame to be encoded is determined from the reference frame set. The distance between the video frame to be encoded and the target reference frame in display timing is determined as the reference distance of the video frame to be encoded. The video frame to be encoded is encoded by using the target reference frame to obtain the encoded video frame, and the encoded video frame and the reference distance are sent to the decoding end. The reference frame set includes a video frame corresponding to at least one encoded video frame, and in the reference frame set, a video frame corresponding to an encoded video frame that can be successfully decoded by the decoding end is a reliable frame. The video coding rule includes: the larger the frame loss rate, the more video frames in the video to be encoded whose corresponding target reference frames are reliable frames. Therefore, when the network status between the video sending end and the video receiving end is poor and video frames are lost during transmission, that is, when the frame loss rate is greater than 0, the larger the frame loss rate, the more video frames in the video to be encoded are encoded with reliable frames. This reduces the probability that some encoded video frames cannot be decoded correctly because of lost video frames, improves the efficiency of correct decoding of video frames, and reduces the probability of problems such as playback freezes at the decoding end.

Please refer to FIG. 7 , which shows a flowchart of a video encoding and decoding method provided by an embodiment of the present application. The video coding and decoding method is applied to the decoding end shown in Fig. 1 and Fig. 2 . As shown in Figure 7, video encoding and decoding methods include:

Step 701: Receive the coded video frame and the reference distance sent by the coder.

The encoded video frame received by the decoding end is an encoded video frame generated by the encoding end according to a video encoding and decoding method provided by an embodiment of the present application.

Step 702: Obtain a decoded frame set, which includes at least one decoded video frame, and the number of video frames included in the decoded frame set is greater than or equal to the number of video frames included in the reference frame set at the encoding end.

In this embodiment of the present application, the set of decoded frames includes at least one decoded video frame. The decoding end decodes the encoded video frames received from the encoding end, and after obtaining the decoded video frames, may store the decoded video frames to obtain the set of decoded frames.

Optionally, the decoded frame set is also called a second decoded picture buffer (Decoded Picture Buffer, DPB), which may include a second target number of decoded video frames, and the value of the second target number may be greater than 1. Exemplarily, the second target number of video frames included in the decoded frame set is the same as the first target number of video frames included in the reference frame set.

Step 703, when it is determined according to the reference distance that the set of decoded frames includes the target reference frame corresponding to the encoded video frame, acquire the target reference frame corresponding to the encoded video frame.

Optionally, when the multiple video frames to be encoded included in the video to be encoded have frame numbers assigned in order of display timing, the reference distance of the video frame to be encoded may be the difference between the frame number of the video frame to be encoded and the frame number of the target reference frame. According to the reference distance, the decoding end may determine the target frame number, which differs from the frame number of the encoded video frame by that difference. When it is determined that the decoded frame set includes the decoded video frame corresponding to the target frame number, the decoding end obtains that decoded video frame from the decoded frame set and uses it as the target reference frame corresponding to the encoded video frame. When it is determined that the decoded frame set does not include the decoded video frame corresponding to the target frame number, the decoding end cannot obtain the decoded video frame corresponding to the target frame number from the decoded frame set, that is, it cannot obtain the target reference frame corresponding to the encoded video frame. In that case, even if the decoding end successfully receives the encoded video frame, the encoded video frame cannot be decoded correctly because its target reference frame is unavailable, so the decoding end discards the encoded video frame.

For example, assume that the frame number of the coded video frame is 3 and the reference distance is 2. Then the target reference frame corresponding to the coded video frame is the video frame whose frame number is 1 in the decoded frame set. The decoding end traverses the frame numbers of the video frames included in the decoded frame set. If it is determined that the decoded frame set includes a video frame whose frame number is 1, that video frame is used as the target reference frame corresponding to the coded video frame. If it is determined that the decoded frame set does not include a video frame whose frame number is 1, the received coded video frame is discarded.
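The lookup described for step 703 might be sketched as follows. This is a minimal illustration under stated assumptions: the decoded frame set is modelled as a dictionary keyed by frame number, decoded pictures are stood in for by byte strings, and the name `find_target_reference` is hypothetical.

```python
from typing import Dict, Optional


def find_target_reference(
    decoded_frames: Dict[int, bytes], frame_number: int, reference_distance: int
) -> Optional[bytes]:
    """Return the target reference frame, or None if the coded frame must be discarded."""
    target_frame_number = frame_number - reference_distance
    # A missing entry means the target reference frame was never decoded here.
    return decoded_frames.get(target_frame_number)


decoded_frames = {1: b"decoded picture 1"}
reference = find_target_reference(decoded_frames, frame_number=3, reference_distance=2)
print("decode" if reference is not None else "discard")
```

A dictionary lookup replaces the traversal of frame numbers mentioned in the text; the outcome is the same, and only the lookup cost differs.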

Step 704: Use the target reference frame to decode the coded video frame to obtain a decoded video frame.

Optionally, the decoding end may use the target reference frame to perform predictive decoding on the coded video frame to obtain the decoded video frame. For example, the decoding end may use the target reference frame to perform inter-frame predictive decoding on the coded video frame to obtain the decoded video frame.

To sum up, in the video encoding and decoding method provided by the embodiment of the present application, the decoding end receives the coded video frame and the reference distance, where the coded video frame is generated by the encoding end according to a video encoding and decoding method provided by the embodiment of the present application. The decoding end acquires a decoded frame set, which includes at least one decoded video frame, obtains the target reference frame corresponding to the coded video frame from the decoded frame set according to the reference distance, and decodes the coded video frame by using the target reference frame to obtain a decoded video frame. Therefore, when the network status between the video sending end and the video receiving end is poor and video frames are lost during transmission, the greater the frame loss rate, the more coded video frames are encoded with reliable frames, that is, with video frames corresponding to coded video frames that the decoding end can successfully decode. This reduces the probability that some coded video frames cannot be correctly decoded because of lost video frames, improves the efficiency of correct decoding of video frames, and reduces the probability of problems such as playback freezes at the decoding end.

In the video encoding and decoding method provided in the embodiment of the present application, the video to be encoded may be encoded in IPPP mode to obtain an encoded video frame structure in IPPP mode, may be encoded by SVC to obtain an SVC encoded video frame structure, or may be encoded in other modes. The embodiments shown in FIG. 8 and FIG. 9 below take temporally scalable SVC encoding of the video to be encoded as an example. In this case, the video to be encoded consists of multiple video frames, and the multiple video frames may include a base layer and at least one enhancement layer.

Please refer to FIG. 8 and FIG. 9 , which show a flowchart of a video encoding and decoding method provided by an embodiment of the present application. The video encoding and decoding method is applied to the video processing system shown in Fig. 1 and Fig. 2 .

As shown in Figure 8, for the first video frame to be encoded, the video encoding and decoding methods include:

Step 801, the encoder obtains the first video frame to be encoded.

In step 802, the encoding end performs intra-frame predictive encoding on the first video frame to be encoded to obtain the first encoded video frame.

In the embodiment of the present application, the encoding end performs intra-frame predictive encoding on the first video frame to be encoded to obtain the first encoded video frame, and the first encoded video frame is an I frame.

Step 803: The encoder adds the video frame corresponding to the first encoded video frame to the reference frame set, and selects the video frame corresponding to the first encoded video frame as a reliable frame.

In the embodiment of the present application, the reference frame set includes a video frame corresponding to at least one coded video frame. That is, the reference frame set includes the video frames corresponding to the coded video frames that have already been obtained by encoding the video to be encoded. The first video frame is a video frame at the base layer. The encoding end adds the video frame corresponding to the first coded video frame, obtained by encoding the first video frame, to the reference frame set. Since the first coded video frame is an I frame, and an I frame can be guaranteed to be decoded successfully once it is received by the decoding end, the video frame corresponding to the first coded video frame can be selected as the first reliable frame in the reference frame set.

Step 804, the encoding end sends the first encoded video frame to the decoding end.

Step 805: The decoding end performs intra-frame predictive decoding on the first coded video frame to obtain the first decoded video frame.

In the embodiment of the present application, after receiving the first coded video frame sent by the encoding end, the decoding end can perform intra-frame predictive decoding on this first I frame to obtain the first decoded video frame corresponding to the first coded video frame.

Step 806, the decoder adds the first decoded video frame to the decoded frame set.

In this embodiment of the present application, the set of decoded frames may include at least one decoded video frame. After decoding the first coded video frame, the decoding end may add the first decoded video frame to the decoded frame set, so that subsequently received coded video frames can be decoded based on the video frames included in the decoded frame set.

As shown in Figure 9, for each non-first video frame to be encoded, the video encoding and decoding method may also include:

Step 901, the encoding end acquires the non-first video frame to be encoded, the transmission frame loss rate between the encoding end and the decoding end, and a set of reference frames.

Optionally, the reference frame set, also known as the first DPB, may include video frames corresponding to a first target number of encoded video frames, that is, it may include the first target number of video frames, and the value of the first target number may be greater than 1. The number of video frames that can be included in the reference frame set is related to the number of short-term reference frames or the number of long-term reference frames set by the encoder. Exemplarily, the first target number of video frames included in the reference frame set may be 8, 16, or 32, and so on. The encoding end may include a reconstructed frame buffer, and the reconstructed frame buffer may be used to store the reference frame set. Exemplarily, as shown in FIG. 10, the reference frame set (the first DPB) may include 16 video frames, that is, the first target number is 16. Assuming that the frame number of the current non-first video frame to be encoded is 29, the frame numbers of the video frames that may be included in the reference frame set are 1, 3...19, 21, 23, 25 and 27 respectively.

In this embodiment of the present application, the video frames included in the reference frame set may serve as reference frames for most video frames to be encoded. For example, when the video to be encoded adopts temporally scalable SVC encoding, the reference frame set may include only the video frames corresponding to the encoded video frames obtained by encoding video frames at the base layer.

Step 902, the encoding end determines the target reference frame corresponding to the video frame to be encoded from the reference frame set according to the frame loss rate and the video encoding rule.

The video coding rule includes: the larger the frame loss rate, the more video frames in the video to be encoded whose corresponding target reference frames are reliable frames. Optionally, there may be multiple frame loss rate intervals, and the video encoding rule may include encoding sub-rules in one-to-one correspondence with the different frame loss rate intervals. When target reference frames are determined for the video frames in the same video to be encoded according to different encoding sub-rules, the number of video frames whose corresponding target reference frame is a reliable frame differs. In this case, determining, by the encoding end, the target reference frame corresponding to the video frame to be encoded according to the frame loss rate and the video encoding rule may include: determining the target encoding sub-rule corresponding to the target frame loss rate interval to which the frame loss rate belongs, and determining the target reference frame corresponding to the video frame to be encoded according to the target encoding sub-rule.

For example, among the video frames to be encoded included in the video to be encoded, the video frames to be encoded that are spaced a set number of video frames apart may each use a reliable frame as the target reference frame. The target reference frame corresponding to each of the remaining video frames to be encoded may be the video frame closest to it in display order, that is, the video frame whose frame number is closest to the frame number of the video frame to be encoded. Under the encoding sub-rules corresponding to different frame loss rate intervals, the value of the set number differs: the larger the frame loss rate of a frame loss rate interval, the smaller the value of the set number.
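The selection of an encoding sub-rule from the frame loss rate interval could be sketched as below. The application gives no numeric interval boundaries or set numbers, so `UPPER_BOUNDS` and `SET_NUMBERS` are hypothetical values chosen only to show the monotonic relationship (a larger frame loss rate maps to a smaller set number); the function name is likewise hypothetical.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical values for illustration only; the application gives no numbers.
UPPER_BOUNDS = (0.05, 0.20)  # intervals: [0, 0.05), [0.05, 0.20), [0.20, 1]
SET_NUMBERS = (3, 1, 0)      # set number of intervening frames per interval


def set_number_for(
    loss_rate: float,
    upper_bounds: Sequence[float] = UPPER_BOUNDS,
    set_numbers: Sequence[int] = SET_NUMBERS,
) -> int:
    """Return the set number of the interval that the frame loss rate falls into."""
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("frame loss rate must lie in [0, 1]")
    if len(set_numbers) != len(upper_bounds) + 1:
        raise ValueError("one set number is needed per frame loss rate interval")
    # bisect_right counts the boundaries at or below loss_rate, i.e. the interval index.
    return set_numbers[bisect_right(upper_bounds, loss_rate)]


for rate in (0.0, 0.1, 0.5):
    print(rate, set_number_for(rate))
```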

In this embodiment of the present application, the video encoding rule may include a first encoding sub-rule, a second encoding sub-rule, and a third encoding sub-rule. The first encoding sub-rule is also called the unreliable reference rule, the second encoding sub-rule is also called the incompletely reliable reference rule, and the third encoding sub-rule is also called the completely reliable reference rule. For video frames to be encoded at the base layer, the first encoding sub-rule uses the video frame in the reference frame set whose frame number is closest to that of the video frame to be encoded as the target reference frame; the second encoding sub-rule uses the reliable frame as the target reference frame for the video frames to be encoded that are spaced at intervals among all the video frames to be encoded; and the third encoding sub-rule uses the reliable frame as the target reference frame for each video frame to be encoded. Optionally, for other video frames to be encoded (including video frames to be encoded at the enhancement layer), every encoding sub-rule uses the video frame in the reference frame set whose frame number is closest to that of the video frame to be encoded as the target reference frame.

For example, as shown in FIG. 11 to FIG. 13, the video frame whose frame number is 29 is the currently acquired non-first video frame to be encoded. The video to be encoded also includes non-first video frames to be encoded whose frame numbers are 30, 31, 32, 33, and so on. The reference frame set can include 16 video frames, and the frame numbers of the video frames included in the reference frame set are 1, 3...19, 21, 23, 25 and 27. The video frame whose frame number is 21 is taken as the reliable frame for illustration.

Please refer to FIG. 11, which shows a schematic diagram of the principle of the first encoding sub-rule provided by the embodiment of the present application. For a video frame to be encoded at the base layer, the first encoding sub-rule uses the video frame in the reference frame set whose frame number is closest to that of the video frame to be encoded as the target reference frame of the video frame to be encoded. As shown in FIG. 11, the target reference frame corresponding to the video frame to be encoded with frame number 29 is the video frame with frame number 27 found in the reference frame set. The target reference frames corresponding to the video frames to be encoded with frame numbers 30, 31, 32, and 33 are the video frames with frame numbers 29, 29, 31, and 31 in sequence.

Please refer to FIG. 12, which shows a schematic diagram of the principle of the second encoding sub-rule provided by the embodiment of the present application. The second encoding sub-rule uses the reliable frame as the target reference frame for the video frames to be encoded that are spaced at intervals among all the video frames to be encoded; here, the set number of intervals is 1. As shown in FIG. 12, the target reference frames corresponding to the video frames to be encoded with frame numbers 29 and 33 are both the reliable frame. The target reference frames corresponding to the video frames to be encoded with frame numbers 30, 31, and 32 are the video frames with frame numbers 29, 29, and 31 in sequence.

Please refer to FIG. 13, which shows a schematic diagram of the principle of the third encoding sub-rule provided by the embodiment of the present application. The third encoding sub-rule uses the reliable frame as the target reference frame corresponding to each video frame to be encoded at the base layer. The target reference frames corresponding to the video frames to be encoded with frame numbers 30 and 32 are the video frames with frame numbers 29 and 31 in sequence. The arrows in FIG. 11 to FIG. 13 indicate that the video frame pointed to by an arrow is the target reference frame corresponding to the video frame to be encoded at the start of the arrow.
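The three encoding sub-rules can be compared with a small simulation, given here as a sketch under stated assumptions rather than as the encoder of the application. It assumes that odd frame numbers are base-layer frames (as the example frame numbers suggest), that frame 21 stays the reliable frame throughout, that the first base-layer frame in the run is the one that uses the reliable frame, and that the reference frame set never overflows. The function and parameter names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional


def plan_target_references(
    frames: Iterable[int],
    reference_set: Iterable[int],
    reliable_frame: int,
    set_number: Optional[int],
    is_base_layer: Callable[[int], bool] = lambda n: n % 2 == 1,
) -> Dict[int, int]:
    """Map each frame number to be encoded to its target reference frame number.

    set_number is None for the first sub-rule (the reliable frame is never forced),
    1 for the second sub-rule as drawn in FIG. 12, and 0 for the third sub-rule.
    """
    refs = sorted(reference_set)  # frame numbers currently in the reference frame set
    plan: Dict[int, int] = {}
    base_seen = 0                 # base-layer frames handled so far
    for n in frames:
        nearest = max(r for r in refs if r < n)
        if is_base_layer(n):
            forced = set_number is not None and base_seen % (set_number + 1) == 0
            plan[n] = reliable_frame if forced else nearest
            base_seen += 1
            refs.append(n)        # base-layer frames join the reference frame set
        else:
            plan[n] = nearest     # enhancement-layer frames use the nearest frame
    return plan


reference_set = range(1, 29, 2)   # 1, 3, ..., 27
frames = range(29, 34)            # 29, 30, 31, 32, 33
first = plan_target_references(frames, reference_set, 21, None)
second = plan_target_references(frames, reference_set, 21, 1)
third = plan_target_references(frames, reference_set, 21, 0)
print(first)
print(second)
print(third)
```

Under these assumptions the first two plans reproduce the target reference frames listed for FIG. 11 and FIG. 12, and the third plan sends every base-layer frame to frame 21 while frames 30 and 32 still use frames 29 and 31.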

In the video to be encoded, the more video frames to be encoded whose target reference frame is the video frame closest to them in frame number, the higher the quality of the video obtained when the decoding end plays the decoded video frames. Therefore, the quality of the video obtained by using the first encoding sub-rule, the second encoding sub-rule, and the third encoding sub-rule decreases in that order. In addition, the higher the quality of the video, the higher the requirement on network transmission performance between the encoding end and the decoding end. Therefore, when the network status between the encoding end and the decoding end is poor (for example, a weak network), the smoothness of the video in real-time communication increases in the order of the first encoding sub-rule, the second encoding sub-rule, and the third encoding sub-rule.

Step 903 , the encoding end determines the distance between the video frame to be encoded and the target reference frame in display timing as the reference distance of the video frame to be encoded.

Step 904: The encoding end uses the target reference frame to encode the video frame to be encoded to obtain the encoded video frame.

Step 905: When the coded video frame is obtained by coding the video frame at the base layer, the coder adds the video frame corresponding to the coded video frame to the reference frame set to obtain a new reference frame set.

Optionally, when the video to be encoded adopts temporally scalable SVC encoding, the video frames included in the reference frame set are video frames at the base layer. After encoding the video frame to be encoded to obtain the encoded video frame, the encoding end may determine whether the encoded video frame is the encoded video frame corresponding to a video frame to be encoded at the base layer. When it is determined that the encoded video frame is not obtained by encoding a video frame to be encoded at the base layer, the video frame corresponding to the encoded video frame does not need to be added to the reference frame set. When it is determined that the encoded video frame is obtained by encoding a video frame to be encoded at the base layer, the video frame corresponding to the encoded video frame is added to the reference frame set to obtain a new reference frame set. Afterwards, when the encoding end encodes the next non-first video frame to be encoded, it can obtain the new reference frame set and determine the target reference frame corresponding to that video frame from the new reference frame set according to the frame loss rate and the video encoding rule, so that the video frame can then be encoded by using the target reference frame.

Exemplarily, the coded video frame may have a layer identifier. The layer identifier indicates whether the video frame to be encoded corresponding to the coded video frame is at the base layer or at an enhancement layer. After obtaining the coded video frame, the encoding end can add the video frame corresponding to the coded video frame to the reference frame set when it is determined that the layer identifier of the coded video frame indicates that the corresponding video frame to be encoded is at the base layer. When it is determined that the layer identifier of the coded video frame indicates that the corresponding video frame to be encoded is at an enhancement layer, the video frame corresponding to the coded video frame does not need to be added to the reference frame set.

In this embodiment of the present application, the maximum number of video frames included in the reference frame set may be the first target number. In this case, when the coded video frame is obtained by encoding a video frame at the base layer, the encoding end can compare the number of video frames currently included in the reference frame set with the first target number. When the number of video frames currently included in the reference frame set is less than the first target number, the encoding end may directly add the video frame corresponding to the coded video frame to the reference frame set to obtain a new reference frame set. When the number of video frames currently included in the reference frame set is equal to the first target number, the encoding end can delete the video frame with the smallest frame number among all the video frames included in the reference frame set, and add the video frame corresponding to the coded video frame to the reference frame set to obtain a new reference frame set.
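The update of the reference frame set in step 905 might look like the following sketch. It assumes that the set is modelled as a dictionary keyed by frame number with byte strings standing in for reconstructed pictures; `update_reference_set` and its parameters are hypothetical names.

```python
from typing import Dict


def update_reference_set(
    reference_set: Dict[int, bytes],
    frame_number: int,
    frame: bytes,
    is_base_layer: bool,
    first_target_number: int,
) -> Dict[int, bytes]:
    """Return the new reference frame set after one frame has been encoded."""
    if not is_base_layer:
        return reference_set            # enhancement-layer frames are not stored
    updated = dict(reference_set)
    if len(updated) >= first_target_number:
        del updated[min(updated)]       # evict the smallest frame number
    updated[frame_number] = frame
    return updated


current = {1: b"f1", 3: b"f3", 5: b"f5"}
current = update_reference_set(current, 7, b"f7", True, first_target_number=3)
current = update_reference_set(current, 8, b"f8", False, first_target_number=3)
print(sorted(current))
```

The same bounded-update pattern would apply to the decoded frame set at the decoding end, with the second target number as the capacity.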

The reference frame set obtained in step 901 is the new reference frame set that the encoding end obtained through step 905 when it executed the video encoding and decoding method for the previous non-first video frame to be encoded.

Step 906, the encoding end sends the encoded video frame and the reference distance to the decoding end.

Step 907: The decoding end obtains a decoded frame set, which includes at least one decoded video frame, and the number of video frames included in the decoded frame set is greater than or equal to the number of video frames included in the reference frame set of the encoding end.

In the embodiment of the present application, after receiving the encoded video frame and the reference distance sent by the encoding end, the decoding end can obtain the set of decoded frames. The decoded frame set is also referred to as the second DPB, which may include a second target number of decoded video frames, and the value of the second target number may be greater than 1. Exemplarily, the second target number of video frames included in the decoded frame set is the same as the first target number of video frames included in the reference frame set.

The set of decoded frames may include at least one decoded video frame. The decoding end may decode each encoded video frame received from the encoding end, and after obtaining the decoded video frames, store them sequentially in the order of reception to obtain the set of decoded frames.

Step 908, when the decoding end determines that the decoded frame set includes the target reference frame corresponding to the encoded video frame according to the reference distance, obtain the target reference frame corresponding to the encoded video frame.

Optionally, when the multiple video frames to be encoded included in the video to be encoded have frame numbers assigned in order of display timing, the reference distance of the video frame to be encoded may be the difference between the frame number of the video frame to be encoded and the frame number of the target reference frame. According to the reference distance, the decoding end may determine the target frame number, which differs from the frame number of the encoded video frame by that difference. When it is determined that the decoded frame set includes the decoded video frame corresponding to the target frame number, the decoding end obtains that decoded video frame from the decoded frame set and uses it as the target reference frame corresponding to the encoded video frame. When it is determined that the decoded frame set does not include the decoded video frame corresponding to the target frame number, the decoding end cannot obtain the decoded video frame corresponding to the target frame number from the decoded frame set, that is, it cannot obtain the target reference frame corresponding to the encoded video frame. In that case, even if the decoding end successfully receives the encoded video frame, the encoded video frame cannot be decoded correctly because its target reference frame is unavailable, so the decoding end discards the encoded video frame.

For example, as shown in FIG. 14 , it is assumed that the frame sequence number of the currently received coded video frame is 29. The frame numbers of the video frames included in the decoded frame set are 1, 3...19, 23, 25 and 27, and the reference distance is 8. Then the target reference frame corresponding to the coded video frame is the video frame whose frame number is 21 in the decoded frame set. Traverse the frame numbers of the video frames included in the decoded frame set. If it is determined that the video frame with frame number 21 is not included in the set of decoded frames, the received coded video frame with frame number 29 is discarded.

As shown in FIG. 15 , assume that the frame sequence number of the currently received coded video frame is 29. The frame numbers of the video frames included in the decoded frame set are 1, 3...19, 21, 23 and 27, and the reference distance is 8. Then the target reference frame corresponding to the coded video frame is the video frame whose frame number is 21 in the decoded frame set. Traverse the frame numbers of the video frames included in the decoded frame set. If it is determined that the decoded frame set includes the video frame whose frame number is 21, then the video frame whose frame number is 21 in the decoded frame set is used as the target reference frame corresponding to the encoded video frame.

Step 909, the decoding end uses the target reference frame to decode the coded video frame to obtain a decoded video frame.

In step 910, if the decoded video frame is a video frame in the base layer, the decoder adds the decoded video frame to the decoded frame set to obtain a new decoded frame set.

Optionally, when the video to be encoded adopts temporally scalable SVC encoding, corresponding to the reference frame set, the video frames included in the decoded frame set are the decoded video frames corresponding to the video frames to be encoded at the base layer. After decoding the received coded video frame, the decoding end determines whether the coded video frame is a coded video frame corresponding to a video frame to be encoded at the base layer. When it is determined that the coded video frame is not a coded video frame corresponding to a video frame to be encoded at the base layer, the decoded video frame of the coded video frame does not need to be stored in the decoded frame set and can simply be displayed. When it is determined that the coded video frame is a coded video frame corresponding to a video frame to be encoded at the base layer, the decoded video frame of the coded video frame is stored in the decoded frame set to obtain a new decoded frame set, and the decoded video frame is displayed. Afterwards, when the decoding end receives another coded video frame sent by the encoding end, it can acquire the new decoded frame set. According to the reference distance corresponding to the newly received coded video frame, when it is determined that the new decoded frame set includes the target reference frame corresponding to that coded video frame, the decoding end obtains the target reference frame, so that the coded video frame can then be decoded by using the target reference frame.

For example, the coded video frame received by the decoding end may have a layer identifier. The layer identifier indicates whether the video frame to be encoded corresponding to the coded video frame is at the base layer or at an enhancement layer. After the decoding end decodes the received coded video frame, when it is determined that the layer identifier of the coded video frame indicates that the corresponding video frame to be encoded is at the base layer, the decoding end stores the decoded video frame of the coded video frame in the decoded frame set and displays the decoded video frame. When it is determined that the layer identifier of the coded video frame indicates that the corresponding video frame to be encoded is at an enhancement layer, the decoding end displays the decoded video frame of the coded video frame without storing it.

In this embodiment of the present application, the maximum number of video frames that the decoded frame set can include may be a second target number. When the encoded video frame is obtained by encoding a video frame at the base layer, the decoding end may compare the number of video frames currently included in the decoded frame set with the second target number. When the number of video frames currently included in the decoded frame set is less than the second target number, the decoding end may directly add the decoded video frame of the encoded video frame to the decoded frame set to obtain a new decoded frame set. When the number of video frames currently included in the decoded frame set is equal to the second target number, the decoding end may delete the video frame with the smallest frame number among all the video frames included in the decoded frame set, and then add the decoded video frame to the decoded frame set to obtain a new decoded frame set.
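As a non-limiting illustration, the storage rule described above can be sketched in Python. The names `DecodedFrame`, `DecodedFrameSet`, `capacity` and `store` are hypothetical and do not appear in the application. The sketch assumes that the second target number is at least 1 and that the layer identifier has already been parsed into a boolean.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DecodedFrame:
    frame_number: int      # assigned according to display timing
    is_base_layer: bool    # assumed to be derived from the layer identifier
    pixels: bytes = b""    # placeholder for the decoded picture data


@dataclass
class DecodedFrameSet:
    capacity: int  # the "second target number" (assumed >= 1)
    frames: Dict[int, DecodedFrame] = field(default_factory=dict)

    def store(self, frame: DecodedFrame) -> bool:
        """Store a base-layer frame; return True if the frame was stored."""
        if not frame.is_base_layer:
            # Enhancement-layer frames are displayed but never stored.
            return False
        if len(self.frames) >= self.capacity:
            # The set is full: delete the frame with the smallest frame number.
            del self.frames[min(self.frames)]
        self.frames[frame.frame_number] = frame
        return True
```

With a capacity of 2, storing base-layer frames 21, 23 and 25 in turn leaves frames 23 and 25 in the set, while an enhancement-layer frame such as 22 is rejected by `store` and would only be displayed.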

The decoded frame set acquired in step 907 is, in each case, the new decoded frame set obtained through step 910 when the decoding end executed the video encoding and decoding method for the previously received encoded video frame corresponding to a non-first video frame to be encoded.

Step 911, the decoding end sends decoding feedback information to the encoding end, and the decoding feedback information includes: frame number and loss flag.

The loss flag reflects whether the decoding end has successfully received the encoded video frame corresponding to the frame number and successfully decoded it.

Optionally, the loss flag may take a not-lost state or a lost state. The not-lost state indicates that the decoding end has successfully received the encoded video frame and successfully decoded it. That is, a loss flag in the not-lost state reflects that the decoding end has decoded the encoded video frame corresponding to the frame number. The lost state may indicate that the decoding end successfully received the encoded video frame but did not successfully decode it. Alternatively, the lost state may indicate that the decoding end did not successfully receive the encoded video frame.

For example, when the decoding end determines that the decoded frame set includes the target reference frame corresponding to the received encoded video frame, it may determine that the loss flag of the encoded video frame is in the not-lost state. When the decoding end determines that the decoded frame set does not include the target reference frame corresponding to the received encoded video frame, it may determine that the loss flag of the encoded video frame is in the lost state. When the decoding end determines that the encoded video frame corresponding to a frame number has not been received, it may determine that the loss flag of the encoded video frame corresponding to that frame number is in the lost state.
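As a non-limiting illustration, the three cases above can be sketched in Python. The names `LossFlag` and `loss_flag_for` are hypothetical. The sketch additionally assumes that the target reference frame precedes the current frame in display order, so that its frame number equals the current frame number minus the reference distance; this arithmetic is an assumption made for the example and is not stated in this form in the application.

```python
from enum import Enum
from typing import Optional, Set


class LossFlag(Enum):
    NOT_LOST = 0
    LOST = 1


def loss_flag_for(frame_number: int,
                  reference_distance: Optional[int],
                  received: bool,
                  decoded_frame_numbers: Set[int]) -> LossFlag:
    """Decide the loss flag that the decoding end reports for one frame number."""
    if not received:
        # The encoded video frame never arrived at the decoding end.
        return LossFlag.LOST
    # Assumption: the reference frame is earlier in display order.
    target = frame_number - reference_distance
    if target in decoded_frame_numbers:
        # The target reference frame is available, so the frame can be decoded.
        return LossFlag.NOT_LOST
    # Received, but the target reference frame is missing.
    return LossFlag.LOST
```

For instance, with only frame 21 in the decoded frame set, a received frame 23 with reference distance 2 is reported as not lost, whereas a received frame 25 with reference distance 2 is reported as lost because frame 23 is missing.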

Because frame numbers are allocated according to the display timing of the multiple video frames in the video to be encoded, a video frame to be encoded, the encoded video frame obtained by encoding it, and the decoded video frame obtained by decoding that encoded video frame all have the same frame number.

Step 912, the encoder updates the reliable frames in the new reference frame set based on the decoding feedback information.

In this embodiment of the present application, after receiving the decoding feedback information sent by the decoding end, the encoding end may update the reliable frames in the new reference frame set obtained in step 905 based on the decoding feedback information.

Optionally, the process in which the encoding end updates the reliable frames in the new reference frame set based on the decoding feedback information may include: among all the video frames included in the new reference frame set, the encoding end takes as a reliable frame the video frame whose loss flag is in the not-lost state, whose frame number is the maximum, whose corresponding target reference frame is a reliable frame, and which is at the base layer.

For example, the decoding end may send decoding feedback information to the encoding end each time step 909 is completed, that is, each time an encoded video frame is decoded to obtain a decoded video frame. Alternatively, the decoding end may send decoding feedback information to the encoding end once after performing step 909 multiple times, that is, after decoding multiple encoded video frames. Accordingly, the decoding feedback information received by the encoding end each time may include only one frame number and its corresponding loss flag, or may include multiple frame numbers and the loss flags corresponding to them. Alternatively, the decoding end may send decoding feedback information to the encoding end once per set time interval. In that case, the decoding feedback information received by the encoding end may include the frame numbers of the encoded video frames transmitted between the encoding end and the decoding end within the set time interval and the loss flags corresponding to those frame numbers. For example, suppose that within a set time interval the frame numbers of the encoded video frames sent by the encoding end to the decoding end are X1, X2 and X3. Then the decoding feedback information sent by the decoding end to the encoding end after the set time interval includes: frame number X1 and its corresponding loss flag, frame number X2 and its corresponding loss flag, and frame number X3 and its corresponding loss flag.

In the case where the decoding feedback information includes multiple frame numbers and the loss flags corresponding to them, the frame numbers in the decoding feedback information may be arranged in monotonically increasing order according to the display order of the corresponding video frames, that is, in ascending order of frame number.

The encoding end may perform reliable frame judgment processing for each frame number in the decoding feedback information in turn, in ascending order of frame number, until the processing has been completed for all frame numbers in the decoding feedback information. In this way, among all the video frames included in the new reference frame set, the video frame whose loss flag is in the not-lost state, whose frame number is the maximum, whose corresponding target reference frame is a reliable frame, and which is at the base layer is taken as a reliable frame. The reliable frame judgment processing includes: judging whether the loss flag corresponding to the frame number is in the not-lost state, judging whether the frame number is greater than the frame number of the current reliable frame in the new reference frame set, judging whether the video frame corresponding to the frame number is at the base layer, and judging whether the target reference frame corresponding to the frame number is a reliable frame. When the loss flag corresponding to the frame number is in the not-lost state, the frame number is greater than the frame number of the current reliable frame in the new reference frame set, the video frame corresponding to the frame number is at the base layer, and the target reference frame corresponding to the frame number is a reliable frame, the video frame corresponding to the frame number in the new reference frame set is taken as a reliable frame. That is, the video frame corresponding to the frame number in the new reference frame set is updated to be the new reliable frame.

As an example, assume that the video frame with frame number 21 in the current reference frame set is a reliable frame, and that the decoding feedback information sent by the decoding end includes frame numbers 22, 23, 24 and 25. The frame reference relationship among frame numbers 21, 22, 23, 24 and 25 is shown in FIG. 16. As shown in FIG. 16, the video frames with frame numbers 21, 23 and 25 are at the base layer, and the video frames with frame numbers 22 and 24 are at the enhancement layer. The arrows in FIG. 16 mark the reference relationships: for the video frames with frame numbers 22 and 23, the frame number of the corresponding target reference frame is 21; for the video frames with frame numbers 24 and 25, the frame number of the corresponding target reference frame is 23.

In one example, assume that the loss flags corresponding to all the frame numbers included in the decoding feedback information are in the not-lost state. After receiving the decoding feedback information, the encoding end may first perform reliable frame judgment processing for frame number 22. The video frame corresponding to frame number 22 is at the enhancement layer, so the video frame corresponding to frame number 22 cannot be used as a reliable frame. Reliable frame judgment processing is then performed for frame number 23. The loss flag corresponding to frame number 23 is in the not-lost state, frame number 23 is greater than frame number 21 of the current reliable frame in the new reference frame set, the video frame corresponding to frame number 23 is at the base layer, and the target reference frame of frame number 23, namely the frame with frame number 21, is a reliable frame. Therefore, the video frame with frame number 23 in the new reference frame set is updated to be a reliable frame. Next, reliable frame judgment processing is performed for frame number 24. The video frame corresponding to frame number 24 is at the enhancement layer, so it cannot be used as a reliable frame. Finally, reliable frame judgment processing is performed for frame number 25. The loss flag corresponding to frame number 25 is in the not-lost state, frame number 25 is greater than frame number 23 of the current reliable frame in the new reference frame set, the video frame corresponding to frame number 25 is at the base layer, and the target reference frame of frame number 25, namely the frame with frame number 23, is a reliable frame. Therefore, the video frame with frame number 25 in the new reference frame set is updated to be a reliable frame. As a result, based on the decoding feedback information received this time, the encoding end determines that the reliable frame in the new reference frame set is the video frame with frame number 25.

In another example, assume that the loss flags corresponding to frame numbers 22, 24 and 25 included in the decoding feedback information are in the not-lost state, and the loss flag corresponding to frame number 23 is in the lost state. After receiving the decoding feedback information, the encoding end may first perform reliable frame judgment processing for frame number 22. The video frame corresponding to frame number 22 is at the enhancement layer, so it cannot be used as a reliable frame. Reliable frame judgment processing is then performed for frame number 23. Because the loss flag corresponding to frame number 23 is in the lost state, the video frame corresponding to frame number 23 in the new reference frame set cannot be used as a reliable frame. Next, reliable frame judgment processing is performed for frame number 24. The video frame corresponding to frame number 24 is at the enhancement layer, so it cannot be used as a reliable frame. Finally, reliable frame judgment processing is performed for frame number 25. The loss flag corresponding to frame number 25 is in the not-lost state, but its target reference frame, the frame with frame number 23, is in the lost state and is therefore not a reliable frame, so the video frame corresponding to frame number 25 in the new reference frame set cannot be used as a reliable frame. As a result, based on the decoding feedback information received this time, the encoding end determines that the reliable frame in the new reference frame set is still the video frame with frame number 21.
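As a non-limiting illustration, the reliable frame judgment processing and the two examples above can be sketched in Python. The names `Feedback`, `update_reliable_frame`, `reference_set` and `reference_of` are hypothetical. The sketch assumes that the reference frame set holds only base-layer frames, so that membership in `reference_set` also serves as the base-layer check, and that a frame that was reliable before the update remains reliable.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Feedback:
    frame_number: int
    lost: bool  # True for the lost state, False for the not-lost state


def update_reliable_frame(current_reliable: int,
                          feedback: Iterable[Feedback],
                          reference_set: Set[int],
                          reference_of: Dict[int, int]) -> int:
    """Return the frame number of the newest reliable frame after the update."""
    reliable = current_reliable
    known_reliable = {current_reliable}  # assumption: stays reliable
    # Process the frame numbers in ascending order.
    for fb in sorted(feedback, key=lambda f: f.frame_number):
        n = fb.frame_number
        if fb.lost:
            continue  # the loss flag is in the lost state
        if n <= reliable:
            continue  # not newer than the current reliable frame
        if n not in reference_set:
            continue  # enhancement layer, or no longer in the reference set
        if reference_of.get(n) not in known_reliable:
            continue  # the target reference frame is not a reliable frame
        reliable = n
        known_reliable.add(n)
    return reliable
```

With `reference_set = {21, 23, 25}` and `reference_of = {22: 21, 23: 21, 24: 23, 25: 23}`, feedback in which frames 22 to 25 are all not lost yields 25, and feedback in which only frame 23 is lost yields 21, matching the two examples.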

In the embodiment of the present application, when the decoding end sends decoding feedback information, the encoding end may obtain the transmission frame loss rate between the encoding end and the decoding end based on that decoding feedback information. Optionally, in step 901, the process in which the encoding end obtains the transmission frame loss rate between the encoding end and the decoding end includes: the encoding end determines the transmission frame loss rate according to the loss flags received within a set time period.

For example, the encoding end may collect the decoding feedback information received from the decoding end within the set time period t closest to the current moment of the encoding end. It counts the total number N_ack of frame numbers included in that decoding feedback information, that is, the total number of video frames reported by the decoding feedback information, and counts the number N_loss of loss flags in the lost state, that is, the number of video frames marked as lost. The encoding end determines the ratio of N_loss to N_ack as the transmission frame loss rate P_loss between the encoding end and the decoding end, that is, P_loss = N_loss / N_ack.
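As a non-limiting illustration, the calculation of P_loss over the most recent time period t can be sketched in Python. The function name `transmission_frame_loss_rate` and the representation of the feedback as (receive time, lost) pairs are hypothetical. Returning 0.0 when no feedback falls within the window is an assumption of the sketch, since the application does not specify that case.

```python
from typing import Iterable, Tuple


def transmission_frame_loss_rate(feedback: Iterable[Tuple[float, bool]],
                                 now: float,
                                 window: float) -> float:
    """Compute P_loss = N_loss / N_ack over the last `window` seconds.

    Each feedback entry is (receive_time, lost), where lost is True when
    the loss flag is in the lost state.
    """
    recent = [lost for t, lost in feedback if now - window <= t <= now]
    n_ack = len(recent)                            # frames reported in the window
    n_loss = sum(1 for lost in recent if lost)     # frames marked as lost
    if n_ack == 0:
        return 0.0  # assumption: no feedback is treated as no measured loss
    return n_loss / n_ack
```

For example, if four loss flags were received in the last two seconds and one of them is in the lost state, the function returns 0.25.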

In the embodiment of the present application, the encoding end may include a sending buffer, which is used to store the encoded video frames to be sent and their reference distances. In this case, before the encoding end sends the encoded video frame and the reference distance to the decoding end, the video encoding and decoding method further includes: the encoding end writes the encoded video frame into the sending buffer. The process of sending the encoded video frame and the reference distance from the encoding end to the decoding end may then include: when the occupancy of the sending buffer is greater than a data volume threshold and the encoded video frame is a target encoded video frame, sending the encoded video frame to the decoding end, where a target encoded video frame is obtained by encoding a video frame at the base layer. Alternatively, when the occupancy of the sending buffer is less than or equal to the data volume threshold, the encoded video frame is sent to the decoding end.

Optionally, after the encoding end encodes the video frame to be encoded to obtain the encoded video frame, it may first store the encoded video frame in the sending buffer (SendBuffer). The current network transmission capacity between the encoding end and the decoding end is determined by comparing the occupancy of the sending buffer with the data volume threshold. When the occupancy of the sending buffer is greater than the data volume threshold, too many encoded video frames waiting to be sent to the decoding end are stored in the sending buffer, which indicates that the network transmission capacity between the encoding end and the decoding end is insufficient and the encoded video frames stored in the sending buffer cannot be sent out in time. The encoding end may then delete the encoded video frames of relatively low importance stored in the sending buffer, so as to ensure that the encoded video frames of high importance can be sent to the decoding end in time over the limited network transmission capacity. When the occupancy of the sending buffer is less than or equal to the data volume threshold, the number of encoded video frames waiting to be sent is not excessive, which indicates that the network transmission capacity between the encoding end and the decoding end is sufficient. The encoding end may then send all the encoded video frames stored in the sending buffer to the decoding end, so as to ensure the quality of the transmitted video.

For example, when the occupancy of the sending buffer is greater than the data volume threshold, the encoding end may send only the target encoded video frames, obtained by encoding video frames at the base layer, to the decoding end, so that, within the available network transmission capacity, the decoding end still receives encoded video frames that can be decoded continuously. When the occupancy of the sending buffer is less than or equal to the data volume threshold, the encoding end may send the encoded video frames obtained by encoding the video frames at both the base layer and the enhancement layer to the decoding end.

The encoding end can also determine the current network transmission capacity between the encoding end and the decoding end by checking whether the total time span of the encoded video frames accumulated in the sending buffer is greater than a time threshold T_drop. When the total time span of the encoded video frames accumulated in the sending buffer is greater than the time threshold, the current network transmission capacity between the encoding end and the decoding end is insufficient. In this case, the encoding end sends an encoded video frame to the decoding end only when it determines that the encoded video frame is a target encoded video frame, that is, one obtained by encoding a video frame at the base layer. When the total time span of the encoded video frames accumulated in the sending buffer is less than or equal to the time threshold, the current network transmission capacity between the encoding end and the decoding end is sufficient, and the encoding end sends all encoded video frames to the decoding end.
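As a non-limiting illustration, the sending decision based on the data volume threshold can be sketched in Python. The names `EncodedFrame`, `frames_to_send` and `data_volume_threshold` are hypothetical, and measuring the occupancy as the sum of frame sizes in bytes is an assumption of the sketch. The variant based on the time threshold T_drop would follow the same pattern, with the accumulated time span of the buffered frames in place of the byte count.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EncodedFrame:
    frame_number: int
    is_base_layer: bool
    size_bytes: int


def frames_to_send(buffer: List[EncodedFrame],
                   data_volume_threshold: int) -> List[EncodedFrame]:
    """Select which buffered frames the encoding end sends."""
    # Assumption: occupancy is the total size of the buffered frames.
    occupancy = sum(f.size_bytes for f in buffer)
    if occupancy > data_volume_threshold:
        # Network capacity looks insufficient: keep only the target
        # encoded video frames (base layer) and drop the rest.
        return [f for f in buffer if f.is_base_layer]
    # Capacity is sufficient: send base-layer and enhancement-layer frames.
    return list(buffer)
```

With buffered frames 21 (base layer, 1000 bytes), 22 (enhancement layer, 400 bytes) and 23 (base layer, 900 bytes), a threshold of 2000 bytes selects frames 21 and 23 only, while a threshold of 3000 bytes selects all three frames.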

In the embodiment of the present application, the encoding end may determine the number of encoded video frames to be sent according to the current network transmission capacity between the encoding end and the decoding end. Therefore, when the network transmission capacity is poor, a smaller number of encoded video frames can be transmitted between the encoding end and the decoding end while still ensuring that the decoding end can decode continuously. This reduces the probability of frame loss caused by insufficient network transmission capacity, which in turn reduces the probability that some encoded video frames cannot be decoded correctly due to video frame loss, improves the efficiency of correct decoding of video frames, and reduces the probability of problems such as playback stuttering at the decoding end.

The order of the steps of the video encoding and decoding method provided in the embodiment of the present application can be adjusted appropriately, and steps can be added or removed according to the situation. For example, step 905 can be located before any of steps 907 to 912, as long as it is ensured that, after the encoding end obtains the encoded video frame, the encoding end updates the reference frame set based on the video frame corresponding to the encoded video frame, so that the reference frame set used for the next video frame to be encoded is the updated reference frame set. The following are device embodiments of the present application, which can be used to implement the method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.

To sum up, the video encoding and decoding method provided by the embodiment of the present application acquires a non-first video frame to be encoded, the transmission frame loss rate between the encoding end and the decoding end, and a reference frame set. According to the frame loss rate and the video coding rule, the target reference frame corresponding to the video frame to be encoded is determined from the reference frame set. The distance between the video frame to be encoded and the target reference frame in display timing is determined as the reference distance of the video frame to be encoded. The video frame to be encoded is encoded by using the target reference frame to obtain the encoded video frame, and the encoded video frame and the reference distance are sent to the decoding end. The reference frame set includes a video frame corresponding to at least one encoded video frame, and in the reference frame set, a video frame corresponding to an encoded video frame that can be successfully decoded by the decoding end is a reliable frame. The video coding rule includes: the larger the frame loss rate, the greater the number of video frames in the video to be encoded whose corresponding target reference frames are reliable frames. Therefore, when the network status between the video sending end and the video receiving end is poor and video frames are lost during transmission, that is, when the frame loss rate is greater than 0, the larger the frame loss rate, the more video frames in the video to be encoded are encoded with reliable frames. This reduces the probability that some encoded video frames cannot be correctly decoded due to video frame loss, improves the efficiency of correct decoding of video frames, and reduces the probability of problems such as playback freezes at the decoding end.

A video encoding and decoding device provided in the embodiment of the present application can execute the video encoding and decoding method applied to any one of the multiple microservice nodes of the server provided in any embodiment of the present application, and has the functional modules and effects corresponding to executing the video encoding and decoding method applied to the client.

Fig. 17 is a flowchart showing a video codec device according to an exemplary embodiment, and the video codec device is applied to an encoding end. As shown in FIG. 17 , a video codec device 1700 includes: an acquisition module 1701 , a determination module 1702 , an encoding module 1703 and a sending module 1704 .

The acquisition module 1701 is configured to acquire a non-first video frame to be encoded, the transmission frame loss rate between the encoding end and the decoding end, and a reference frame set, where the reference frame set includes a video frame corresponding to at least one encoded video frame, and in the reference frame set, a video frame corresponding to an encoded video frame that can be successfully decoded by the decoding end is a reliable frame. The determination module 1702 is configured to determine, from the reference frame set, the target reference frame corresponding to the video frame to be encoded according to the frame loss rate and the video coding rule, where the video coding rule includes: the larger the frame loss rate, the greater the number of video frames in the video to be encoded whose corresponding target reference frames are reliable frames; the determination module 1702 is further configured to determine the distance between the video frame to be encoded and the target reference frame in display timing as the reference distance of the video frame to be encoded. The encoding module 1703 is configured to encode the video frame to be encoded by using the target reference frame to obtain the encoded video frame. The sending module 1704 is configured to send the encoded video frame and the reference distance to the decoding end.

To sum up, in the video encoding and decoding device provided by the embodiment of the present application, the acquisition module acquires a non-first video frame to be encoded, the transmission frame loss rate between the encoding end and the decoding end, and a reference frame set. The determination module determines the target reference frame corresponding to the video frame to be encoded from the reference frame set according to the frame loss rate and the video coding rule, and determines the distance between the video frame to be encoded and the target reference frame in display timing as the reference distance of the video frame to be encoded. The encoding module encodes the video frame to be encoded by using the target reference frame to obtain the encoded video frame, so that the sending module sends the encoded video frame and the reference distance to the decoding end. The reference frame set includes a video frame corresponding to at least one encoded video frame, and in the reference frame set, a video frame corresponding to an encoded video frame that can be successfully decoded by the decoding end is a reliable frame. The video coding rule includes: the larger the frame loss rate, the greater the number of video frames in the video to be encoded whose corresponding target reference frames are reliable frames. Therefore, when the network status between the video sending end and the video receiving end is poor and video frames are lost during transmission, that is, when the frame loss rate is greater than 0, the larger the frame loss rate, the more video frames in the video to be encoded are encoded with reliable frames. This reduces the probability that some encoded video frames cannot be correctly decoded due to video frame loss, improves the efficiency of correct decoding of video frames, and reduces the probability of problems such as playback freezes at the decoding end.

A video codec device provided in the embodiment of the present application can execute the video codec method applied to the microservice management device provided in any embodiment of the present application, and has the functional modules and effects corresponding to executing that method.

Fig. 18 is a flowchart showing a video codec device according to an exemplary embodiment, and the video codec device is applied to a decoding end. As shown in FIG. 18 , a video codec device 1800 includes: a receiving module 1801 , an acquiring module 1802 and a decoding module 1803 .

The receiving module 1801 is configured to receive the encoded video frame and the reference distance sent by the encoding end according to any video codec device provided by the embodiments of the present application. The acquiring module 1802 is configured to acquire a decoded frame set, where the decoded frame set includes at least one decoded video frame, and the number of video frames included in the decoded frame set is greater than or equal to the number of video frames included in the reference frame set at the decoding end; the acquiring module 1802 is further configured to obtain the target reference frame corresponding to the encoded video frame when it is determined, according to the reference distance, that the decoded frame set includes the target reference frame corresponding to the encoded video frame. The decoding module 1803 is configured to decode the encoded video frame by using the target reference frame to obtain a decoded video frame.

In the video codec device provided in the embodiment of the present application, the receiving module receives the encoded video frame and the reference distance generated by the encoding end according to a video codec method provided in the embodiment of the present application. The acquiring module acquires a decoded frame set that includes at least one decoded video frame, and obtains the target reference frame corresponding to the encoded video frame from the decoded frame set according to the reference distance. The decoding module then decodes the encoded video frame by using the target reference frame to obtain the decoded video frame. Because the encoded video frame is generated by the encoding end according to a video encoding and decoding method provided by the embodiment of the present application, when the network status between the video sending end and the video receiving end is poor and video frames are lost during transmission, the greater the frame loss rate, the more encoded video frames are encoded with reliable frames, that is, with video frames corresponding to encoded video frames that the decoding end can successfully decode. This reduces the probability that some encoded video frames cannot be correctly decoded due to video frame loss, improves the efficiency of correct decoding of video frames, and reduces the probability of problems such as playback freezes at the decoding end.

Fig. 19 is a block diagram of an electronic device provided by an embodiment of the present application. The electronic device provided in this embodiment of the present application includes a processor 1901, a memory 1902, and a computer program stored in the memory 1902 and executable on the processor 1901. When executed by the processor 1901, the computer program implements the video encoding and decoding method described in any one of the above embodiments.

The embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the multiple processes of the above video encoding and decoding method embodiments are implemented and the same technical effects can be achieved; to avoid repetition, they are not described again here. The computer-readable storage medium is, for example, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk. The computer-readable storage medium may be a non-transitory storage medium.

Claims

A video encoding and decoding method applied to an encoding end, comprising:

acquiring a non-first video frame to be encoded, a transmission frame loss rate between the encoding end and a decoding end, and a reference frame set, wherein the reference frame set includes a video frame corresponding to at least one encoded video frame, and in the reference frame set, a video frame corresponding to an encoded video frame that can be successfully decoded by the decoding end is a reliable frame;

determining, from the reference frame set, a target reference frame corresponding to the video frame to be encoded according to the frame loss rate and a video coding rule, wherein the video coding rule includes: the larger the frame loss rate, the greater the number of video frames in the video to be encoded whose corresponding target reference frames are reliable frames;

determining the distance between the video frame to be encoded and the target reference frame in display timing as the reference distance of the video frame to be encoded;

Encoding the video frame to be encoded by using the target reference frame to obtain an encoded video frame;

sending the coded video frame and the reference distance to the decoding end.
The method according to claim 1, wherein the video to be encoded is composed of a plurality of video frames, the plurality of video frames belong to a base layer and at least one enhancement layer, the plurality of video frames have assigned frame numbers, and the video frames included in the reference frame set are video frames at the base layer;

The method also includes:

When the encoded video frame is obtained by encoding a video frame at the base layer, adding a video frame corresponding to the encoded video frame to the reference frame set to obtain a new reference frame set;

receiving decoding feedback information sent by the decoding end, wherein the decoding feedback information includes a frame number and a loss flag, and the loss flag is used to reflect whether the decoding end has successfully received the encoded video frame corresponding to the frame number and decoded the encoded video frame corresponding to the frame number;

Updating the reliable frames in the new reference frame set based on the decoding feedback information.
The method according to claim 2, wherein said updating reliable frames in said new reference frame set based on said decoding feedback information comprises:

Among all the video frames included in the new reference frame set, selecting, as the reliable frame, the video frame that is in the base layer, whose loss flag is in the not-lost state, whose corresponding target reference frame is a reliable frame, and whose frame sequence number is the largest, wherein a loss flag in the not-lost state is used to reflect that the decoding end has decoded the encoded video frame corresponding to the frame sequence number.
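The selection criteria above can be sketched as a filter followed by a maximum. The `RefFrame` record and its field names are hypothetical, and representing "no feedback received yet" as `None` is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RefFrame:
    frame_no: int
    base_layer: bool
    ref_was_reliable: bool       # whether its own target reference was a reliable frame
    lost: Optional[bool] = None  # assumed: None means no feedback has arrived yet


def select_reliable_frame(ref_set: Iterable[RefFrame]) -> Optional[RefFrame]:
    """Return the newest frame meeting all criteria, or None if no frame qualifies."""
    candidates = [
        f for f in ref_set
        if f.base_layer and f.lost is False and f.ref_was_reliable
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f.frame_no)
```

Requiring that the candidate's own reference was reliable keeps a frame from being chosen when it was received but depends on a chain that may have been broken earlier.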
The method according to claim 2, wherein the adding the video frame corresponding to the coded video frame to the reference frame set to obtain a new reference frame set comprises:

In the case that the number of video frames currently included in the reference frame set is less than the first target number, adding the video frame corresponding to the coded video frame to the reference frame set to obtain a new reference frame set, wherein the first target number is the maximum number of video frames that the reference frame set can include;

In the case that the number of video frames currently included in the reference frame set is the first target number, deleting the video frame with the smallest frame sequence number among all video frames included in the reference frame set, and adding the video frame corresponding to the coded video frame to the reference frame set to obtain a new reference frame set.
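The two cases above amount to a bounded set that evicts its oldest entry when full. A minimal sketch follows, assuming the reference frame set is held as a dictionary keyed by frame sequence number; the function name and the `max_frames` parameter (standing in for the first target number) are hypothetical.

```python
def add_to_reference_set(ref_set: dict, frame_no: int, frame: object, max_frames: int) -> dict:
    """Return a new reference frame set containing `frame`, evicting the oldest if full."""
    if max_frames < 1:
        raise ValueError("max_frames must be at least 1")
    new_set = dict(ref_set)  # copy so the caller's set is left unchanged
    if len(new_set) >= max_frames:
        # Set is full: delete the entry with the smallest frame sequence number.
        del new_set[min(new_set)]
    new_set[frame_no] = frame
    return new_set
```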
The method according to claim 1, before the acquisition of the non-first video frame to be encoded, the transmission frame loss rate between the encoding end and the decoding end, and the set of reference frames, further comprising:

Get the first video frame to be encoded;

Using intra-frame predictive coding on the first video frame to be coded to obtain the first coded video frame;

adding the video frame corresponding to the first coded video frame to the reference frame set, and selecting the video frame corresponding to the first coded video frame as a reliable frame;

sending the first coded video frame to the decoding end.
The method according to claim 2, wherein said obtaining the transmission frame loss rate of the encoding end and the decoding end comprises:

The transmission frame loss rate is determined according to the loss flags received within the set time period.
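One way to derive a loss rate from loss flags in a time window is sketched below. The claim does not give a formula, so the ratio of lost flags to all flags received in the window is an assumption, as are the class name, the timestamp units, and the choice to report 0.0 when no flags are in the window.

```python
from collections import deque


class LossRateEstimator:
    """Sliding-window estimate of the transmission frame loss rate (illustrative)."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = deque()  # (timestamp, lost) pairs in arrival order

    def record(self, timestamp: float, lost: bool) -> None:
        self.events.append((timestamp, lost))

    def loss_rate(self, now: float) -> float:
        # Drop flags that fall outside the set time period.
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()
        if not self.events:
            return 0.0  # assumption: no feedback is treated as no loss
        lost_count = sum(1 for _, lost in self.events if lost)
        return lost_count / len(self.events)
```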
The method according to claim 2, wherein there are a plurality of frame loss rate intervals, and the video encoding rules include encoding sub-rules in one-to-one correspondence with the plurality of different frame loss rate intervals, wherein, when the target reference frame of each video frame in the same video to be encoded is determined according to different encoding sub-rules, the number of video frames whose corresponding target reference frame is a reliable frame is different;

The determining the target reference frame corresponding to the video frame to be encoded from the reference frame set according to the frame loss rate and video coding rules includes:

According to the target frame loss rate interval to which the frame loss rate belongs, determine a corresponding target coding sub-rule;

A target reference frame corresponding to the video frame to be encoded is determined according to the target coding sub-rule.
The method according to claim 7, wherein the video coding rules include: a first coding sub-rule, a second coding sub-rule and a third coding sub-rule,

For a video frame to be encoded at the base layer, the first encoding sub-rule is used to take the video frame in the reference frame set whose frame sequence number is closest to that of the video frame to be encoded as the target reference frame corresponding to the video frame to be encoded;

The second encoding sub-rule is used to take the reliable frame as the target reference frame corresponding to the video frames to be encoded that occur at intervals of a set number of frames among all video frames to be encoded;

The third encoding sub-rule is used to use the reliable frame as the target reference frame corresponding to each frame to be encoded.
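The three sub-rules and their mapping to frame loss rate intervals can be sketched as follows for base-layer frames. The interval boundaries (0.05 and 0.2), the `interval` value, and the choice that frames not covered by the second sub-rule fall back to the nearest frame are all assumptions for illustration; the claims do not specify them.

```python
def pick_sub_rule(loss_rate: float, low: float = 0.05, high: float = 0.2) -> int:
    """Map a frame loss rate to sub-rule 1, 2 or 3 (thresholds are hypothetical)."""
    if loss_rate < low:
        return 1
    if loss_rate < high:
        return 2
    return 3


def choose_reference(frame_no: int, ref_frame_nos: list, reliable_no: int,
                     rule: int, interval: int = 4) -> int:
    """Return the frame sequence number of the target reference frame."""
    nearest = min(ref_frame_nos, key=lambda n: abs(frame_no - n))
    if rule == 1:
        return nearest      # closest frame in the reference frame set
    if rule == 2:
        # Assumption: every `interval`-th frame uses the reliable frame,
        # and the remaining frames use the nearest frame.
        return reliable_no if frame_no % interval == 0 else nearest
    return reliable_no      # rule 3: always reference the reliable frame
```

Moving from the first to the third sub-rule raises the share of frames that reference the reliable frame, which matches the stated rule that a greater frame loss rate leads to more frames referencing reliable frames.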
The method according to claim 2, before sending the encoded video frame and the reference distance to the decoding end, further comprising:

Writing the encoded video frame to a sending buffer;

The sending the encoded video frame and the reference distance to the decoding end includes:

When the occupancy of the sending buffer is greater than a data volume threshold and the encoded video frame is a target encoded video frame, sending the encoded video frame to the decoding end, wherein the target encoded video frame is obtained by encoding a video frame at the base layer; or,

When the occupancy of the sending buffer is less than or equal to the data amount threshold, sending the encoded video frame to the decoding end.
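The two sending conditions above reduce to a single predicate. In the sketch below the function and parameter names are hypothetical, and treating a frame that meets neither condition as "not sent" is an assumption, since the claim only states when a frame is sent.

```python
def should_send(buffer_bytes: int, threshold_bytes: int, is_base_layer: bool) -> bool:
    """Decide whether an encoded frame is sent, given the send buffer occupancy."""
    if buffer_bytes <= threshold_bytes:
        return True       # buffer is not congested: send every frame
    return is_base_layer  # congested: send only frames encoded from the base layer
```

Under congestion this keeps base-layer frames flowing, which matters because only base-layer frames are held in the reference frame set.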
A video encoding and decoding method applied to a decoding end, comprising:

Receiving the encoded video frame and the reference distance sent by the encoding end according to any one of the video encoding and decoding methods in claims 1 to 9;

Obtaining a set of decoded frames, wherein the set of decoded frames includes at least one decoded video frame, and the number of video frames included in the set of decoded frames is greater than or equal to the number of video frames included in the set of reference frames at the decoding end;

If it is determined according to the reference distance that the set of decoded frames includes a target reference frame corresponding to the encoded video frame, acquiring the target reference frame corresponding to the encoded video frame;

Decoding the coded video frame by using the target reference frame to obtain a decoded video frame.
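On the decoding side, the reference distance lets the decoder locate the target reference frame without the encoder naming it explicitly. The sketch below assumes that the decoder knows the frame sequence number of the received frame, that the distance is expressed in frame sequence numbers, and that the decoded frame set is a dictionary keyed by frame sequence number; none of these details come from the claims.

```python
from typing import Optional


def find_reference(decoded_set: dict, frame_no: int, ref_distance: int) -> Optional[object]:
    """Return the decoded frame that lies `ref_distance` before `frame_no`, if held."""
    target_no = frame_no - ref_distance
    # None signals that the target reference frame is not in the decoded frame set,
    # for example because it was lost in transmission.
    return decoded_set.get(target_no)
```

A caller would decode the received frame only when a reference is returned; how a missing reference is handled is outside this sketch.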
The method of claim 10, further comprising:

Sending decoding feedback information to the encoding end, wherein the decoding feedback information includes a frame number and a loss flag, and the loss flag is used to reflect whether the decoding end has successfully received and decoded the encoded video frame corresponding to the frame number.
The method according to claim 10, wherein the video to be encoded is composed of a plurality of video frames, the plurality of video frames include a base layer and at least one enhancement layer, the plurality of video frames respectively have frame sequence numbers allocated in time order, and the video frames included in the decoded frame set are video frames in the base layer;

The method also includes:

If the decoded video frame is a video frame in the base layer, adding the decoded video frame to the decoded frame set to obtain a new decoded frame set.
The method according to claim 12, wherein said adding the decoded video frame to the set of decoded frames to obtain a new set of decoded frames comprises:

When the number of video frames currently included in the decoded frame set is less than a second target number, adding the decoded video frame to the decoded frame set to obtain a new decoded frame set, wherein the second target number is the maximum number of video frames that the set of decoded frames can include;

When the number of video frames currently included in the decoded frame set is the second target number, deleting the video frame with the smallest frame sequence number among all the video frames included in the decoded frame set, and adding the decoded video frame to the set of decoded frames, resulting in a new set of decoded frames.
The method of claim 10, further comprising:

Receiving the first coded video frame, wherein the first coded video frame is obtained by applying intra-frame predictive coding to the first video frame to be coded;

Decoding the first encoded video frame by intra-frame prediction to obtain the first decoded video frame;

Adding the first decoded video frame to the set of decoded frames.
A video codec device applied to an encoding end, comprising:

An acquisition module, configured to acquire a non-first video frame to be encoded, a transmission frame loss rate between the encoding end and the decoding end, and a reference frame set, wherein the reference frame set includes a video frame corresponding to at least one encoded video frame, and, in the reference frame set, a video frame corresponding to an encoded video frame that can be successfully decoded by the decoding end is a reliable frame;

The determination module is configured to determine the target reference frame corresponding to the video frame to be encoded from the set of reference frames according to the frame loss rate and video encoding rules, wherein the video encoding rules include: the greater the frame loss rate, the greater the number of video frames in the video to be encoded whose corresponding target reference frame is a reliable frame; and is further configured to determine the distance between the video frame to be encoded and the target reference frame in display timing as the reference distance of the video frame to be encoded;

An encoding module, configured to use the target reference frame to encode the video frame to be encoded to obtain an encoded video frame;

A sending module, configured to send the coded video frame and the reference distance to the decoding end.
A video codec device applied to a decoding end, comprising:

The receiving module is configured to receive the encoded video frame and the reference distance sent by the encoding end according to any one of the video encoding and decoding methods in claims 1 to 9;

An acquisition module, configured to acquire a set of decoded frames, wherein the set of decoded frames includes at least one decoded video frame, and the number of video frames included in the set of decoded frames is greater than or equal to the number of video frames included in the set of reference frames at the decoding end;

The determination module is configured to obtain the target reference frame corresponding to the encoded video frame when it is determined according to the reference distance that the set of decoded frames includes the target reference frame corresponding to the encoded video frame;

The decoding module is configured to use the target reference frame to decode the coded video frame to obtain a decoded video frame.
An electronic device, comprising a processor, a memory, and a computer program stored on the memory and operable on the processor, wherein, when the computer program is executed by the processor, the video encoding and decoding method according to any one of claims 1 to 9 is implemented, or the video encoding and decoding method according to any one of claims 10 to 14 is implemented.
A computer-readable storage medium configured to store a computer program, wherein the computer program, when executed by a processor, implements the video encoding and decoding method according to any one of claims 1 to 9, or implements the video encoding and decoding method according to any one of claims 10 to 14.