CN112351284B - Video encoding method and apparatus, video decoding method and apparatus, electronic device, and storage medium - Google Patents


Info

Publication number: CN112351284B
Application number: CN202011216864.9A
Authority: CN (China)
Prior art keywords: decoded, coded, target, frame, video
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)
Other languages: Chinese (zh)
Other versions: CN112351284A
Inventors: 宋嘉文, 樊鸿飞, 徐琴琴
Current and original assignee: Beijing Kingsoft Cloud Network Technology Co Ltd (the listed assignees may be inaccurate)
Events: application filed by Beijing Kingsoft Cloud Network Technology Co Ltd; priority to CN202011216864.9A; publication of CN112351284A; application granted; publication of CN112351284B; legal status Active; anticipated expiration

Classifications

    • H — Electricity; H04 — Electric communication technique; H04N — Pictorial communication, e.g. television; H04N19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 — using predictive coding
    • H04N19/105 — Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/174 — the coding unit being an image region, the region being a slice, e.g. a line of blocks or a group of blocks
    • H04N19/177 — the coding unit being a group of pictures [GOP]
    • H04N19/436 — using parallelised computational arrangements
    • H04N19/59 — predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H04N19/597 — predictive coding specially adapted for multi-view video sequence encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application provides a video encoding method, a video decoding method, a video encoding apparatus, a video decoding apparatus, an electronic device, and a storage medium. The video encoding method includes the following steps: acquiring a group of pictures to be encoded of a video to be encoded, where the group of pictures includes a key frame to be encoded and a plurality of non-key frames to be encoded; determining a target encoding mode matching the group of pictures, where the target encoding mode indicates that, among the plurality of segments into which the non-key frames are divided, the non-key frames in the first segment reference the key frame, and the non-key frames in each segment other than the first reference the last non-key frame of at least one preceding segment; and encoding the group of pictures according to the target encoding mode. The method and apparatus address the poor data-transmission timeliness caused by excessive encoding and decoding delay in related-art video codecs.

Description

Video encoding method, video decoding method, video encoding apparatus, video decoding apparatus, electronic device, and storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a video encoding method and apparatus, a video decoding method and apparatus, an electronic device, and a storage medium.
Background
At present, for video processing scenarios that demand timely data transmission, a low-delay coding mode can be used to encode the video. For example, in VR encoding, the main viewing angle uses a high-definition stream while the other viewing angles use low-definition streams. When the user turns their head, the streams for the other viewing angles must be switched to the high-definition stream; otherwise the change in picture definition within the viewing angle (switching from a high-definition picture to a low-definition one) can cause dizziness and degrade the user's visual experience. To switch quickly from a low-definition stream to a high-definition stream, VR encoding therefore requires low-latency encoding.
The low-delay coding employed in the related art is generally LDP coding. When encoding each P frame in a GOP, both the I frame and the previous P frame must be referenced, so P frames are encoded and decoded serially. When randomly accessing a P frame other than the first, the preceding frames must be decoded first, so there is a delay of at least 3 frames; this delay is too long to meet the requirement of fast video-stream switching.
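An illustrative sketch (not taken from the patent; the per-frame timings are made-up example values) of why LDP random access is slow: frame n can only be decoded after all of frames 0..n-1, so accessing frame n costs the sum of every earlier decode.

```python
def ldp_random_access_latency(n, i_frame_ms=8.0, p_frame_ms=4.0):
    """Worst-case time to display frame n (index 0 = the I frame) under LDP,
    where every P frame depends on the frame immediately before it."""
    if n == 0:
        return i_frame_ms
    # The I frame plus the n P frames up to and including the target,
    # all decoded strictly in order.
    return i_frame_ms + n * p_frame_ms

# Accessing the 10th frame of a GOP forces 10 sequential decodes.
print(ldp_random_access_latency(9))  # 8 + 9*4 = 44.0 ms
```

The linear growth with n is the serial-dependency problem the segmented mode below is designed to break.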
For example, in VR encoding, decoding the last frame in a GOP requires all preceding frames to be decoded first. When the VR viewing angle switches, an excessive delay greatly degrades the user experience.
Therefore, the video encoding and decoding methods in the related art suffer from poor data-transmission timeliness due to excessive encoding and decoding delay.
Disclosure of Invention
The application provides a video encoding method and apparatus, a video decoding method and apparatus, an electronic device, and a storage medium, aiming at least to solve the problem of poor data-transmission timeliness caused by excessive encoding and decoding delay in related-art video codecs.
According to an aspect of an embodiment of the present application, there is provided a video encoding method, including: acquiring a group of images to be coded of a video to be coded, wherein the group of images to be coded comprises a key frame to be coded and a plurality of non-key frames to be coded; determining a target coding mode matched with the image group to be coded, wherein the target coding mode is used for indicating that in a plurality of sections to be coded into which a plurality of non-key frames to be coded are divided, the non-key frame to be coded in a first section to be coded refers to the key frame to be coded, and the non-key frame to be coded in other sections to be coded except the first section to be coded refers to the last non-key frame of at least one section to be coded before the other sections to be coded; and coding the image group to be coded according to the target coding mode.
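A minimal sketch of the reference structure the target coding mode describes (function and variable names are ours, not the patent's): the non-key frames are split into segments; every frame in the first segment references the key frame, and every frame in a later segment references the last non-key frame of the preceding segment.

```python
def build_reference_map(segments):
    """segments: list of lists of non-key frame indices (the key frame is index 0).
    Returns {frame_index: reference_frame_index}."""
    refs = {}
    for k, seg in enumerate(segments):
        # First segment references the key frame; segment k references the
        # last non-key frame of segment k-1.
        ref = 0 if k == 0 else segments[k - 1][-1]
        for frame in seg:
            refs[frame] = ref
    return refs

# One key frame (0) and nine non-key frames split into three segments.
segments = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(build_reference_map(segments))
# {1: 0, 2: 0, 3: 0, 4: 3, 5: 3, 6: 3, 7: 6, 8: 6, 9: 6}
```

Note that any frame is reachable through a chain of at most one frame per preceding segment, rather than through every preceding frame as in LDP.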
According to another aspect of the embodiments of the present application, there is also provided a video decoding method, including: acquiring an image group to be decoded of a video to be decoded, wherein the image group to be decoded comprises a key frame to be decoded and a plurality of non-key frames to be decoded; determining a target reference relationship corresponding to the group of pictures to be decoded, wherein the target reference relationship is used for indicating that, in a plurality of fragments to be decoded into which a plurality of non-key frames to be decoded are divided, the non-key frame to be decoded in a first fragment to be decoded refers to the key frame to be decoded, and the non-key frame to be decoded in other fragments to be decoded except the first fragment to be decoded refers to the last non-key frame of at least one fragment to be decoded before the other fragments to be decoded; and decoding the image group to be decoded according to the target reference relation.
According to another aspect of embodiments of the present application, there is also provided a video encoding apparatus, including: the device comprises an acquisition unit, a coding unit and a decoding unit, wherein the acquisition unit is used for acquiring a group of images to be coded of a video to be coded, and the group of images to be coded comprises a key frame to be coded and a plurality of non-key frames to be coded; a determining unit, configured to determine a target encoding mode matching the group of images to be encoded, where the target encoding mode is used to indicate that, in a plurality of segments to be encoded into which a plurality of non-key frames to be encoded are divided, the non-key frame to be encoded in a first segment to be encoded refers to the key frame to be encoded, and the non-key frame to be encoded in a segment to be encoded other than the first segment to be encoded refers to a last non-key frame of at least one segment to be encoded before the other segment to be encoded; and the coding unit is used for coding the image group to be coded according to the target coding mode.
Optionally, the encoding unit includes: a first determining module, configured to determine the multiple to-be-encoded slices corresponding to the to-be-encoded image group, where each to-be-encoded slice includes at least one to-be-encoded non-key frame; and the first coding module is used for coding each non-key frame to be coded in each segment to be coded according to the target coding mode.
Optionally, the first determining module includes: a first determining submodule, configured to determine a target stage number according to a target delay time and a target coding and decoding time, where the target delay time is the maximum allowed delay time, the target coding and decoding time is the coding and decoding time of one video frame (comprising both the encoding time and the decoding time), and the target stage number is the number of non-key frames that can be encoded and decoded within the target delay time; and a second determining submodule, configured to determine, according to the target stage number, the plurality of segments to be encoded corresponding to the group of pictures to be encoded, where the number of segments to be encoded is less than or equal to the target stage number.
Optionally, the first determining sub-module includes: a first determining subunit, configured to determine a target time difference between the target delay time and a first coding and decoding time, where the first coding and decoding time is the coding and decoding time of a key frame; a second determining subunit, configured to determine a quotient of the target time difference and a second coding and decoding time as the target stage number, where the second coding and decoding time is the coding and decoding time of a non-key frame, and the target coding and decoding time includes the first coding and decoding time and the second coding and decoding time.
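A hedged numeric sketch of the stage-number rule just stated: subtract the key-frame codec time from the allowed maximum delay, then take the quotient with the per-non-key-frame codec time. All timings are hypothetical example values, not from the patent.

```python
def target_stage_number(max_delay_ms, key_frame_ms, non_key_frame_ms):
    """Number of non-key frames that can be coded/decoded within the delay
    budget left after the key frame (quotient of the target time difference
    and the non-key-frame codec time)."""
    return int((max_delay_ms - key_frame_ms) // non_key_frame_ms)

# With a 30 ms delay budget, an 8 ms key frame and 4 ms non-key frames:
print(target_stage_number(30, 8, 4))  # (30 - 8) // 4 = 5
```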
Optionally, the first determining sub-module includes: a third determining subunit, configured to determine the first segment to be encoded according to the target stage number, where the number of non-key frames to be encoded contained in the first segment is a target number, the target number being greater than or equal to the ceiling of the target video frame number divided by the target stage number, and the target video frame number being the number of non-key frames to be encoded; and a fourth determining subunit, configured to determine the other segments to be encoded according to the target number, where the number of non-key frames to be encoded contained in each of the other segments is less than or equal to the target number.
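A sketch of this segment-sizing rule under our reading of it (helper names are illustrative): the first segment holds the ceiling of the non-key-frame count divided by the stage number, and no later segment holds more than that, so the segment count never exceeds the stage number.

```python
import math

def segment_sizes(non_key_frame_count, stage_number):
    """Split the non-key frames into segments: the first segment gets
    ceil(count / stage_number) frames, later segments at most that many."""
    target = math.ceil(non_key_frame_count / stage_number)
    sizes = []
    remaining = non_key_frame_count
    while remaining > 0:
        size = min(target, remaining)
        sizes.append(size)
        remaining -= size
    return sizes

# 10 non-key frames with a stage number of 4: first segment gets ceil(10/4) = 3.
print(segment_sizes(10, 4))  # [3, 3, 3, 1] -- 4 segments, within the budget
```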
Optionally, the encoding unit includes: a second encoding module, configured to perform intra-frame encoding on the current video frame to be encoded when it is the key frame to be encoded; a third encoding module, configured to, when the current video frame to be encoded belongs to the first segment to be encoded, perform target encoding on it according to the target encoding mode, using the key frame to be encoded as its reference video frame; and a fourth encoding module, configured to, when the current video frame to be encoded belongs to one of the other segments to be encoded, perform target encoding on it according to the target encoding mode, using a target reference video frame as its reference video frame, where the target reference video frame includes the last non-key frame of a segment to be encoded before the current segment; the target encoding is one of: inter-frame encoding, or intra-frame encoding combined with inter-frame encoding.
Optionally, the fourth encoding module comprises: a third determining sub-module, configured to determine, according to the target encoding mode, the key frame to be encoded and a last non-key frame of at least one segment to be encoded that is located before the current segment to be encoded as the target reference video frame corresponding to the current video frame to be encoded, when the number of reference video frames corresponding to the current video frame to be encoded is multiple; and the coding submodule is used for performing target coding on the current video frame to be coded by taking the target reference video frame as a reference video frame of the current video frame to be coded.
Optionally, the encoding unit includes: and the fifth coding module is used for coding all the non-key frames to be coded in the same segment to be coded in the group of pictures to be coded in parallel according to the target coding mode.
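Because every non-key frame in a segment shares the same single reference frame, the frames of one segment have no dependencies on each other and can be encoded in parallel. A toy sketch using a thread pool; `encode_frame` is a stand-in for a real encoder call, not an actual codec API.

```python
from concurrent.futures import ThreadPoolExecutor

def encode_frame(frame, reference):
    # Stand-in for real inter-frame encoding of `frame` against `reference`.
    return f"P{frame}->ref{reference}"

def encode_segment_parallel(frames, reference):
    """Encode all frames of one segment concurrently; pool.map preserves
    the input order in its results."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda f: encode_frame(f, reference), frames))

# Segment [4, 5, 6] all referencing frame 3: the three encodes are independent.
print(encode_segment_parallel([4, 5, 6], 3))  # ['P4->ref3', 'P5->ref3', 'P6->ref3']
```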
Optionally, the obtaining unit includes: the first obtaining module is configured to obtain the to-be-encoded image group of the to-be-encoded video when a main view area of a target object in the panoramic video is switched from a first view area to a second view area, where the to-be-encoded video is a portion of the to-be-encoded panoramic video corresponding to the main view area, the to-be-encoded image group is an image group in which a first video frame after view switching occurs, a definition corresponding to the main view area is a first definition, and definitions corresponding to other areas of the panoramic video except the main view area are second definitions, and the first definition is higher than the second definition.
According to still another aspect of embodiments of the present application, there is also provided a video decoding apparatus including: the device comprises an acquisition unit, a decoding unit and a decoding unit, wherein the acquisition unit is used for acquiring an image group to be decoded of a video to be decoded, and the image group to be decoded comprises a key frame to be decoded and a plurality of non-key frames to be decoded; a determining unit, configured to determine a target reference relationship corresponding to the group of pictures to be decoded, where the target reference relationship is used to indicate that, in a plurality of fragments to be decoded into which the plurality of non-key frames to be decoded are divided, the non-key frame to be decoded in a first fragment to be decoded refers to the key frame to be decoded, and the non-key frame to be decoded in other fragments to be decoded except the first fragment to be decoded refers to a last non-key frame of at least one fragment to be decoded before the other fragments to be decoded; and the decoding unit is used for decoding the image group to be decoded according to the target reference relation.
Optionally, the decoding unit includes: a first decoding module, configured to perform intra-frame decoding on the key frame to be decoded when the current video frame to be decoded is the key frame; a second decoding module, configured to, when the current video frame to be decoded belongs to the first segment to be decoded, perform target decoding on it according to the target reference relationship, using the key frame to be decoded as its reference video frame; and a third decoding module, configured to, when the current video frame to be decoded belongs to one of the other segments to be decoded, perform target decoding on it according to the target reference relationship, using a target reference video frame as its reference video frame, where the target reference video frame includes the last non-key frame of at least one segment to be decoded before the current segment; the target decoding is one of: inter-frame decoding, or intra-frame decoding combined with inter-frame decoding.
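A sketch of this decode-side dispatch (function and variable names are ours): the key frame is intra-decoded with no reference, frames in the first segment reference the key frame, and frames in later segments reference the previous segment's last non-key frame.

```python
def decode_reference(frame, segments):
    """segments: list of lists of non-key frame indices; frame 0 is the key frame.
    Returns None for the key frame (intra-decoded), else the reference index."""
    if frame == 0:
        return None  # key frame: intra-frame decoding, no reference needed
    for k, seg in enumerate(segments):
        if frame in seg:
            return 0 if k == 0 else segments[k - 1][-1]
    raise ValueError(f"frame {frame} not in any segment")

segments = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(decode_reference(2, segments))  # 0 (first segment references the key frame)
print(decode_reference(8, segments))  # 6 (last frame of the previous segment)
```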
Optionally, the third decoding module comprises: a fourth determining submodule, configured to determine, according to the target reference relationship, the key frame to be decoded and a last non-key frame of at least one segment to be decoded before the segment to be decoded as the target reference video frame corresponding to the current video frame to be decoded, when the number of reference video frames corresponding to the current video frame to be decoded is multiple; and the decoding submodule is used for performing target decoding on the current video frame to be decoded by taking the target reference video frame as a reference video frame of the current video frame to be decoded.
Optionally, the decoding unit includes: and the fourth decoding module is used for decoding all the non-key frames to be decoded in the same segment to be decoded in the image group to be decoded in parallel according to the target reference relationship.
Optionally, the obtaining unit includes: the second obtaining module is configured to obtain the to-be-decoded image group of the to-be-decoded video when a main view area of a target object in the panoramic video is switched from a first view area to a second view area, where the to-be-decoded video is a portion of the to-be-decoded panoramic video corresponding to the main view area, the to-be-decoded image group is an image group in which a first video frame after view switching occurs, a definition corresponding to the main view area is a first definition, and a definition corresponding to another area of the panoramic video except the main view area is a second definition, and the first definition is higher than the second definition.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with each other through the communication bus; wherein the memory is used for storing the computer program; a processor for performing the method steps in any of the above embodiments by running the computer program stored on the memory.
According to a further aspect of the embodiments of the present application, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to perform the method steps of any of the above embodiments when the computer program is executed.
In the embodiments of the application, a segmented coding and decoding mode is adopted. A group of pictures to be encoded of a video to be encoded is acquired, the group comprising a key frame to be encoded and a plurality of non-key frames to be encoded; a target encoding mode matching the group of pictures is determined, which indicates that, among the segments into which the non-key frames are divided, the non-key frames in the first segment reference the key frame, and the non-key frames in each segment other than the first reference the last non-key frame of at least one preceding segment; the group of pictures is then encoded according to the target encoding mode. Because the non-key frames are segmented, all non-key frames of the first segment are coded and decoded with reference to the key frame alone, and each later segment references only the last non-key frame of a preceding segment. Decoding any single non-key frame therefore requires waiting only for the key frame plus at most one non-key frame per preceding segment, rather than for every preceding frame, which reduces the coding and decoding delay and improves data-transmission timeliness.
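A worked comparison of the two worst-case random-access latencies (example timings, not from the patent): plain LDP must decode every preceding frame serially, while the segmented structure needs only the key frame, the last frame of each preceding segment, and the target frame itself.

```python
def ldp_latency(frame_count, i_ms=8.0, p_ms=4.0):
    """Worst-case latency to reach the last frame of a GOP under LDP:
    every earlier frame is decoded serially."""
    return i_ms + (frame_count - 1) * p_ms

def segmented_latency(segment_count, i_ms=8.0, p_ms=4.0):
    """Worst case under the segmented mode: the key frame, one 'last frame'
    per preceding segment, plus the target frame = segment_count P decodes."""
    return i_ms + segment_count * p_ms

# A 10-frame GOP (1 key frame + 9 non-key frames) split into 3 segments:
print(ldp_latency(10))       # 8 + 9*4 = 44.0 ms
print(segmented_latency(3))  # 8 + 3*4 = 20.0 ms
```

Under these example numbers the segmented mode cuts the worst-case delay by more than half, and the gap widens as the GOP grows while the segment count is held to the stage-number budget.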
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it will be apparent to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of a hardware environment for an alternative video encoding method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of an alternative video encoding method according to an embodiment of the present application;
FIG. 3 is a diagram of an alternative LDP coding mode;
FIG. 4 is a schematic diagram of an alternative VR perspective in accordance with embodiments of the present application;
FIG. 5 is a schematic diagram of an alternative video encoding method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of another alternative video encoding method according to an embodiment of the present application;
FIG. 7 is a flow chart illustrating an alternative video decoding method according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an alternative video encoding and decoding method according to an embodiment of the present application;
FIG. 9 is a flow chart of an alternative video encoding and decoding method according to an embodiment of the present application;
fig. 10 is a block diagram of an alternative video encoding apparatus according to an embodiment of the present application;
FIG. 11 is a block diagram of an alternative video decoding apparatus according to an embodiment of the present application;
fig. 12 is a block diagram of an alternative electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the embodiments of the present application better understood, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the nouns and terms appearing in the description of the embodiments of the present application are explained below:
1. video coding: the method is a method for converting a file in an original video format into a file in another video format by a compression technology, and common video coding and decoding standards are h.264, h.265, AVS, AV1 and the like.
2. Delay: an important metric in network transmission that measures the time required for data to travel from one endpoint to another, generally expressed in milliseconds or seconds.
3. Coding delay: the delay generated in the encoding process, i.e., the time from when a video frame is input to when the encoded bitstream is produced.
4. Types of encoded frames: encoded frames are generally divided into 3 types. An I frame (intra-coded frame), also called a key frame, serves as a random access point in a video stream; it is coded by intra-frame prediction (intra-frame coding) without referencing other frames, and generally has high coding quality but low compression efficiency. A P frame (predictive-coded frame) is coded with reference to a preceding I frame or other preceding P frames, using inter-frame prediction or a combination of intra-frame and inter-frame prediction, and has high compression efficiency. A B frame (bidirectionally predictive-coded frame) can be predictively coded with reference to both preceding and following frames and has the highest compression efficiency.
5. GOP (Group of Pictures): in video coding, a GOP is a sequence of multiple consecutive encoded frames used to aid random access during decoding; each GOP typically begins with an I frame.
6. POC (Picture Order Count): represents the display order of the source video frames when encoding video.
7. LDP (Low Delay P) coding: the first frame in each GOP is encoded as an I frame, all subsequent frames are encoded as P frames, and each P frame is encoded with reference only to pictures preceding it in play order. By avoiding backward references, the coding and decoding order is kept consistent with the display order, and the coding and decoding delay is reduced. Besides the LDP coding configuration, video coding also includes the All-Intra (all-I-frame) coding configuration and the Random-Access coding configuration.
8. RTC (Real-Time Communications): typical applications include live broadcast, real-time audio and video calls, video conferences, interactive online education, etc.
9. VR (Virtual Reality): a technology that provides an immersive sensation in an interactive three-dimensional environment generated on a computer, by comprehensively using a computer graphics system and various interface devices for display and control.
According to an aspect of an embodiment of the present application, there is provided a video encoding method. Alternatively, in the present embodiment, the video encoding method described above may be applied to a hardware environment formed by an encoding end (encoding device) 102, a decoding end (decoding device) 104, and a playing device 106 as shown in fig. 1. As shown in fig. 1, the encoding end 102 is connected to the decoding end 104 through a network, and a database may be provided on the encoding end 102 (and/or the decoding end 104) or independent of the encoding end 102 (and/or the decoding end 104) for providing a data storage service for the encoding end 102 (and/or the decoding end 104). The decoding end 104 and the playing device 106 may be two devices that are independently arranged, or may be the same device, which is not limited in this embodiment.
As shown in fig. 1, the encoding end 102 may be configured to encode an input video to be transmitted (or a video frame in the video to be transmitted), obtain a corresponding video code stream, and transmit the video code stream to the decoding end 104 through a network; the decoding end 104 may be configured to decode the received video code stream to obtain a corresponding video (or a video frame), and play the obtained video (or the video frame) through the playing device 106.
The network connecting the encoding end 102 and the decoding end 104 may include, but is not limited to, a wired network and a wireless network. The encoding end 102 and the decoding end 104 may be terminal devices or servers, which may be, but are not limited to, at least one of the following: a PC, a cell phone, a tablet, a VR device, etc. The video encoding method of the embodiment of the present application may be executed by the encoding end 102, where the encoding end 102 may be a terminal device or a server. The video encoding method may also be executed by a client installed on the encoding end 102.
Taking the video encoding method in the present embodiment executed by the encoding end 102 as an example, fig. 2 is a schematic flowchart of an alternative video encoding method according to an embodiment of the present application, and as shown in fig. 2, the flowchart of the method may include the following steps:
step S202, acquiring a group of images to be coded of a video to be coded, wherein the group of images to be coded comprises a key frame to be coded and a plurality of non-key frames to be coded.
The video encoding method in this embodiment may be applied to scenes with video transmission requirements, such as live broadcast, RTC, VR, and the like, where the video may be live broadcast video, real-time audio and video, panoramic video, and the like, and this is not limited in this embodiment.
The encoding device may encode a video to be encoded, where the video to be encoded may be a video to be transmitted to the decoding end and played by the playing device. The video to be encoded may comprise a plurality of groups of pictures, each group of pictures may comprise a key frame and a plurality of non-key frames, and the POCs of the video frames within a group of pictures may be numbered consecutively starting from 0. The video frame with POC 0 may be the key frame, and the remaining video frames are non-key frames. The group of images to be encoded is the group of pictures in the video to be encoded that is currently to be encoded, and it may comprise a key frame to be encoded and a plurality of non-key frames to be encoded.
For example, a current group of pictures to be encoded includes 9 video frames, and according to the playing order of the video frames, the POC of the 9 video frames is: 0,1,2,3,4,5,6,7,8.
Step S204, determining a target coding mode matched with the image group to be coded, wherein the target coding mode is used for indicating that in a plurality of sections to be coded divided by a plurality of non-key frames to be coded, the non-key frame to be coded in a first section to be coded refers to the key frame to be coded, and the non-key frames to be coded in other sections to be coded except the first section to be coded refer to the last non-key frame of at least one section to be coded before other sections to be coded.
In the related art, when encoding is performed with LDP or LDB, a given frame of the GOP (other than the first and second frames) refers both to the I frame and to the picture of the previous frame; to decode that frame, the frames it refers to must be decoded first, so there is a delay of at least 3 frames. For the last frame in a GOP, all previous frames must be decoded before that frame can be decoded.
For example, the LDP coding scheme may be as shown in fig. 3. For a GOP, each P frame is encoded while referring to the I frame and its previous P frame, and when any one frame is decoded, it is necessary that the I frame and all its previous P frames have been decoded before decoding the frame, and therefore, it is necessary to wait for the decoding time of the I frame and several P frames.
For scenes with stricter delay requirements, the coding delay of LDP and LDB is too large, which can greatly degrade the user experience. For example, in a VR scene, as shown in fig. 4, the user's main view area is a high-definition stream and the other views are low-definition streams. The main view area switches as the user's view angle changes; for example, when the user's view moves to the left, the main view area moves to the left. Switching the view at an arbitrary moment requires quickly converting the current view to a high-definition video stream. With the LDP coding method, if the moment of random switching happens to fall on a later frame, the wait can be about 1.2 to 1.5 seconds, which exceeds what VR users can tolerate and reduces the user experience.
In this embodiment, for a group of pictures to be encoded, the encoding device may determine a target encoding mode matching the group of pictures to be encoded, or a target encoding mode matching the video to be encoded. The target coding mode is used for indicating that in a plurality of to-be-coded segments into which a plurality of to-be-coded non-key frames are divided, a to-be-coded non-key frame in a first to-be-coded segment refers to a to-be-coded key frame, and a to-be-coded non-key frame in other to-be-coded segments except the first to-be-coded segment refers to a last non-key frame of at least one to-be-coded segment before other to-be-coded segments (the to-be-coded segment).
The number of to-be-encoded non-key frames included in the to-be-encoded segment may be an integer greater than or equal to 1, and the number of to-be-encoded non-key frames included in each to-be-encoded segment may be the same or different, which is not specifically limited in this embodiment.
The target encoding mode may be indicated by configuration information, which may be an encoding rule corresponding to the target encoding mode. The encoding rule may include, but is not limited to: video frame segmentation rules, video frame reference rules, and the video frame encoding scheme (e.g., intra-frame encoding, inter-frame encoding, intra-frame encoding combined with inter-frame encoding), and may also include other encoding rules. The configuration information may also be encoding indication information for indicating the reference relationship between video frames, that is, which video frame(s) each video frame refers to; the encoding indication information may also indicate the encoding manner of each video frame. This is not limited in this embodiment.
And step S206, coding the image group to be coded according to the target coding mode.
After the target coding mode is determined, the coding device may code the image group to be coded according to the target coding mode to obtain a corresponding video code stream. If the target coding mode is indicated by the coding rule, the coding device can determine the reference video frame of each non-key frame to be coded according to the target coding rule. If the target coding mode is indicated by the coding indication information, the coding device can determine the reference video frame of each non-key frame to be coded according to the coding indication information. The encoding device may encode each non-key frame to be encoded according to the reference video frame of each non-key frame to be encoded and according to the encoding mode corresponding to each non-key frame to be encoded.
Each frame in the encoded video bitstream may contain information about which frames (i.e., reference relationships) are referred to when the frame is encoded, that is, indication information indicating reference relationships between video frames. The coding device can transmit the obtained video code stream to the decoding device through a network.
It should be noted that besides video frames, the video to be encoded may also contain other data, such as corresponding audio data, subtitle information, and so on. For other data, the encoding device may perform data compression in a certain data compression manner to obtain a corresponding data code stream, and transmit the data code stream to the decoding end through a network, where the data compression manner and the transmission manner (for example, transmission together with the video code stream, independent transmission, and the like) may be configured as needed, and this is not limited in this embodiment.
In this embodiment, by using a segmented encoding/decoding manner, a non-key frame of a group of pictures is divided into a plurality of segments, all non-key frames in a first segment refer to a key frame, a non-key frame in each subsequent segment refers to a last non-key frame of at least one segment before the non-key frame, and a plurality of video frames of the group of pictures are no longer in a chain reference relationship, so that the encoding/decoding delay can be reduced, and the encoding/decoding speed can be increased.
For example, for a VR scene, if random switching lands on a given frame, the waiting time includes the coding and decoding time of the I frame of the group of pictures where the video stream is located and of the P frames having an association relationship (direct or indirect reference relationship) with that frame. Compared with the coding and decoding time of all video frames located before that frame in the group of pictures, this shortens the waiting time and quickly converts the current view angle into the high-definition video stream, thereby improving the user experience.
Through the steps S202 to S206, a group of images to be encoded of a video to be encoded is obtained, where the group of images to be encoded includes a key frame to be encoded and a plurality of non-key frames to be encoded; determining a target coding mode matched with an image group to be coded, wherein the target coding mode is used for indicating that in a plurality of sections to be coded, which are divided by a plurality of non-key frames to be coded, a non-key frame to be coded in a first section to be coded refers to a key frame to be coded, and non-key frames to be coded in other sections to be coded except the first section to be coded refer to the last non-key frame of at least one section to be coded before the other sections to be coded; the method and the device encode the image group to be encoded according to the target encoding mode, solve the problem of poor data transmission timeliness caused by overlarge encoding and decoding time delay in video encoding and decoding modes in the related technology, reduce the encoding and decoding time delay and improve the encoding and decoding efficiency.
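As an illustrative sketch of the reference structure described above (the function name and data representation are assumptions, not part of the patent), the segmented reference relationship can be expressed as a mapping from each non-key frame's POC to the POC of its reference frame:

```python
def reference_pocs(segment_sizes):
    """For a group of pictures whose non-key frames are split into the given
    segments, return {poc: reference_poc}: frames in the first segment
    reference the key frame (POC 0); frames in each later segment reference
    the last frame of the previous segment."""
    refs = {}
    poc = 1                      # POC 0 is the key frame (intra, no reference)
    ref = 0                      # the first segment references the key frame
    for size in segment_sizes:
        for _ in range(size):
            refs[poc] = ref
            poc += 1
        ref = poc - 1            # last frame of this segment anchors the next
    return refs

# Segments [4, 2, 1] over 7 P frames: POC 1-4 reference POC 0,
# POC 5-6 reference POC 4, and POC 7 references POC 6.
```

With this structure, the decode chain of any frame contains at most one frame per segment plus the I frame, which is what bounds the decoding delay compared with the chain references of LDP.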
As an alternative embodiment, encoding the group of images to be encoded according to the target encoding mode includes:
s11, determining a plurality of to-be-encoded segments corresponding to the to-be-encoded image group, wherein each to-be-encoded segment comprises at least one to-be-encoded non-key frame;
and S12, according to the target coding mode, coding each non-key frame to be coded in each section to be coded.
Based on the target encoding mode, the encoding device may first determine a plurality of to-be-encoded slices corresponding to the group of images to be encoded, and each to-be-encoded slice may include at least one non-key frame to be encoded. The number of the non-key frames to be encoded contained in different segments to be encoded may be the same or different, which is not limited in this embodiment.
According to the reference relationship indicated by the target coding mode, the coding device may first determine a reference video frame of each non-key frame to be coded in each segment to be coded, and then, according to the reference video frame corresponding to each non-key frame to be coded, the coding device may code each non-key frame to be coded. The reference video frames of different video frames to be encoded may be the same or different.
By the embodiment, the accuracy and reliability of video coding can be improved by grouping the non-key frames of the image group and coding each non-key frame according to the grouping.
As an alternative embodiment, determining a plurality of segments to be encoded corresponding to a group of images to be encoded comprises:
s21, determining a target progression according to a target delay time and a target coding and decoding time, wherein the target delay time is an allowable maximum delay time, the target coding and decoding time is a coding and decoding time of a video frame, the coding and decoding time comprises a coding time and a decoding time, and the target progression is the number of non-key frames allowed to be coded and decoded in the target delay time;
and S22, determining a plurality of to-be-encoded segments corresponding to the to-be-encoded image group according to the target progression, wherein the number of the to-be-encoded segments is less than or equal to the target progression.
For the image group to be encoded, the number of the non-key frames to be encoded contained in each segment to be encoded may be pre-configured, and the encoding device may read the segment configuration information corresponding to the video to be encoded, and determine the non-key frames to be encoded contained in each segment to be encoded according to the segment configuration information, thereby determining a plurality of segments to be encoded corresponding to the image group to be encoded.
Optionally, the plurality of segments to be encoded corresponding to the image group to be encoded may also be estimated according to the time delay requirement (target delay time) and the time taken to encode and decode a frame (target encoding and decoding time): when a frame in a GOP is encoded, the encoding device may estimate the maximum number of delay frames that a user can endure according to the delay requirement and the time required to encode and decode a frame, and calculate the maximum progression (target progression, maximum number of segments) according to the maximum number of delay frames, thereby determining a plurality of to-be-encoded segments corresponding to the to-be-encoded image group.
The latency requirement may be a maximum decoding latency (maximum allowed delay time) tolerable by a user, for example, for a VR scene (or similar panoramic video scene), the latency requirement refers to: after switching views, the user can tolerate the time required to switch from the current view to the high definition video stream. The time delay requirement may be manually input (manually input by a user, a preset default value according to an empirical value, etc.), or may be calculated according to object characteristics of a certain object (for example, characteristics for characterizing the object state), or may be calculated according to object characteristics of a plurality of objects, which is not limited in this embodiment.
The encoding and decoding time of a frame is the encoding and decoding time of a video frame, and the encoding and decoding time comprises the encoding time and the decoding time. The time taken to encode and decode a frame is an estimate, which is a statistical estimate of the time taken to encode and decode a video frame (image frame, e.g., I frame, P frame) on the premise of continuously transmitting a video stream. The time for encoding and decoding a frame may be manually input (manually input by a user, preset by a relevant person according to an empirical value, etc.), or may be dynamically adjusted based on a statistical value in the encoding and decoding process, which is not limited in this embodiment.
The encoding apparatus may estimate the maximum number of delay frames that the user can endure according to the delay requirement and the time for encoding and decoding one frame, that is, the number of video frames allowed to be encoded and decoded within the target delay time, which may be the number of all types of video frames (e.g., I-frames and P-frames), or the number of specific types of video frames (e.g., P-frames).
According to the maximum number of delayed frames, the encoding device may determine a target number of levels corresponding to the group of images to be encoded, where the target number of levels is the maximum number of segments allowed while still meeting the requirement on the maximum number of delayed frames. If the maximum number of delayed frames counts both the key frame and the non-key frames, the target number of levels is the difference between the maximum number of delayed frames and the number of key frames; if it counts only the non-key frames, the target number of levels equals the maximum number of delayed frames.
According to the target progression, the encoding device can determine a plurality of to-be-encoded segments corresponding to the to-be-encoded image group. The number of the segments to be encoded is less than or equal to the target progression so as to meet the delay requirement, and the total number of the non-key frames contained in all the segments to be encoded is the same as the total number of the non-key frames contained in the image group to be encoded. For example, the encoding device may sequentially divide the segments to be encoded according to the target progression in the playing order, where the number of non-key frames to be encoded contained in each segment to be encoded is a positive integer greater than or equal to 1, and the number of segments to be encoded does not exceed the target progression.
It should be noted that, since coding and decoding times may fluctuate, in this embodiment the target number of levels is dynamically estimated according to the statistical delay, so as to adjust the segmentation of the non-key frames in the group of pictures.
The segments to be encoded may be represented by level numbers. The level number of the first segment to be encoded may be 0, that of the second segment may be 1, and so on. The pictures of level 0 refer to the I frame, and the P frames of each higher level refer to the last frame of the previous level. Optionally, the levels may also be numbered from 1, or from any other value, which is not limited in this embodiment.
By the embodiment, the target stage number (the maximum delay frame number of the P frame that the user can endure) is estimated according to the time delay requirement and the time for coding and decoding one frame, so that the coding and decoding speed can be improved, the coding and decoding time delay can meet the time delay requirement, and the use experience of the user is improved.
As an alternative embodiment, determining the target stage number according to the target delay time and the target codec time includes:
s31, determining a target time difference value between the target delay time and a first coding and decoding time, wherein the first coding and decoding time is the coding and decoding time of a key frame;
and S32, determining the quotient of the target time difference and second coding and decoding time as a target stage number, wherein the second coding and decoding time is the coding and decoding time of a non-key frame, and the target coding and decoding time comprises the first coding and decoding time and the second coding and decoding time.
Due to different encoding and decoding modes, the encoding and decoding time of the key frame (e.g., I frame) and the encoding and decoding time of the non-key frame (e.g., P frame) of a group of pictures are different, and the target encoding and decoding time may include: a first codec time corresponding to a key frame of a group of pictures and a second codec time corresponding to a non-key frame of a group of pictures. The encoding apparatus may determine a target delay time, a first codec time, and a second codec time, and determine a target stage number according to the first codec time and the second codec time.
Optionally, in this embodiment, the encoding device may calculate a target time difference between the target delay time and the first codec time, where the target time difference is a maximum delay time of the allowed codec non-key frame; and determining the quotient of the target time difference and the second coding and decoding time as a target progression (the maximum delay frame number of the P frames).
For example, when the tolerable delay is t, the time required for encoding and decoding an I frame is t_I, and the time required for encoding and decoding a P frame is t_P, the adaptive number of levels L (the number of adaptive segments, i.e., the target progression) can be calculated as: L = (t − t_I) / t_P.
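A minimal sketch of this calculation (the function name is illustrative; all times are assumed to be in the same unit, e.g. milliseconds):

```python
def adaptive_levels(t, t_i, t_p):
    """Adaptive number of levels L = floor((t - t_i) / t_p), where t is the
    tolerable delay, t_i the I-frame codec time, t_p the P-frame codec time."""
    if t <= t_i:
        raise ValueError("tolerable delay must exceed the I-frame codec time")
    return int((t - t_i) // t_p)
```

For instance, with a tolerable delay of 100 ms, an I-frame codec time of 40 ms, and a P-frame codec time of 15 ms, this yields L = 4.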
According to the embodiment, the maximum delay frame number is determined according to the target delay time, the coding and decoding time of the I frame and the coding and decoding time of the P frame, so that the target stage number can be adaptively adjusted according to the delay and the calculation power (calculation capacity), the application delay requirement can be met, and the use experience of a user is ensured.
As an alternative embodiment, determining a plurality of segments to be encoded corresponding to the image group to be encoded according to the target progression includes:
s41, determining a first segment to be coded according to the target progression, wherein the number of non-key frames to be coded contained in the segment to be coded is the target number, the target number is larger than or equal to the value obtained by dividing the target video frame number by the target progression result and rounding up, and the target video frame number is the number of a plurality of non-key frames to be coded;
and S42, determining other to-be-encoded segments according to the target number, wherein the number of the to-be-encoded non-key frames contained in the other to-be-encoded segments is less than or equal to the target number.
The number of non-key frames to be encoded contained in the image group to be encoded is the target video frame number. After the target progression is determined, the encoding device may determine, according to the target progression and the target video frame number, the number of non-key frames to be encoded included in each segment to be encoded. For example, the non-key frames to be encoded may be distributed as uniformly as possible among the segments: one part of the segments may each contain the same number of non-key frames, and another part likewise, with the two parts containing different numbers. The non-key frames to be encoded can also be distributed into the segments according to other rules.
For example, a group of pictures contains 8P frames, the number of target levels is 3, and the number of P frames contained in each segment is: 3, 3, 2; alternatively, 2, 3, 3; alternatively, 3, 2, 3.
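A sketch of the uniform distribution mentioned above (illustrative only; the patent does not fix which segments receive the extra frames):

```python
def even_segments(n, levels):
    """Distribute n non-key frames across `levels` segments as evenly as
    possible; the first n % levels segments each receive one extra frame."""
    base, extra = divmod(n, levels)
    return [base + 1 if i < extra else base for i in range(levels)]
```

For example, `even_segments(8, 3)` yields [3, 3, 2], one of the distributions listed above.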
Optionally, in this embodiment, the most non-key frames may be allocated to the first segment, and the number of non-key frames included in the following segments may be sequentially reduced, or may be allocated in other manners.
The encoding device may first determine a first segment to be encoded according to the target progression, where the number of non-key frames to be encoded (the target number) contained in the first segment is greater than or equal to the result of dividing the target video frame number by the target progression, rounded up. For example, if a group of pictures contains 8 P frames and the target number of levels is 3, the number of P frames contained in the first segment is greater than or equal to ceil(8/3), i.e., greater than or equal to 3.
According to the target number, the encoding device may determine other to-be-encoded segments, where the number of to-be-encoded non-key frames included in the other to-be-encoded segments is less than or equal to the target number, and the number of to-be-encoded non-key frames included in the other to-be-encoded segments may be the same or different except for the first to-be-encoded segment.
The segmentation mode provided in this embodiment enables fast encoding and decoding of frames during view switching. Because human dynamic vision is insensitive to the details of fast-moving objects, the frames in low-numbered segments can be displayed quickly without the larger detail errors being subjectively noticeable; then, as the parallelism within the GOP decreases, inter-frame errors are further reduced, so that better image quality is obtained by the time the gaze fixates and detail sensitivity returns.
Alternatively, the number of non-key frames included in the first segment may be determined in a plurality of ways, and the number of non-key frames included in each subsequent segment may be determined according to the number of non-key frames included in the first segment.
As an alternative implementation, the number of non-key frames to be encoded contained in the first segment to be encoded can be calculated by formula (1):

N_0 = ceil( n × 2^(L−1) / (2^L − 1) )        (1)

wherein N_0 is the number of non-key frames contained in the level-0 segment (the first segment to be encoded), n is the total number of non-key frames contained in the group of pictures, and L is the target number of levels.
After the number of non-key frames included in the 0 th segment is obtained, the number of non-key frames included in each subsequent segment may be sequentially determined, for example, the number of non-key frames included in the subsequent segment is half of the number of non-key frames included in the previous segment.
For example, n P frames are included in one GOP. After calculating the adaptive number of levels L, the encoding apparatus can calculate the number of P frames contained in the level-0 segment as N_0 according to formula (1); each following level then contains N_i = N_{i−1}/2 frames, for 1 ≤ i ≤ L−1. Referring to fig. 5, with adaptive level number L = 3 and 7 P frames in one GOP, formula (1) gives 4 P frames in the level-0 segment, 2 P frames in the level-1 segment, and 1 P frame in the level-2 segment. The P frames of level 0 are coded and decoded with reference to the I frame, and the P frames in each subsequent segment are coded and decoded with reference to the last P frame of the previous level.
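One possible reading of formula (1), consistent with the fig. 5 example (the exact rounding of intermediate levels and the handling of any remainder are assumptions): level 0 receives N_0 = ceil(n·2^(L−1)/(2^L−1)) frames and each later level half as many.

```python
import math

def halving_segments(n, levels):
    """Level 0 gets N_0 = ceil(n * 2**(levels-1) / (2**levels - 1)) frames;
    each later level gets half of the previous one (integer division, at
    least 1); any rounding remainder is absorbed by the last level."""
    sizes = [math.ceil(n * 2 ** (levels - 1) / (2 ** levels - 1))]
    for _ in range(1, levels):
        sizes.append(max(1, sizes[-1] // 2))
    sizes[-1] += n - sum(sizes)   # absorb rounding remainder (assumption)
    return sizes
```

With n = 7 and L = 3 this reproduces the fig. 5 allocation [4, 2, 1].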
As an alternative implementation, the number of non-key frames to be encoded contained in the first segment to be encoded can be calculated by formula (2):

N_0 = ceil( n/L + (L−1)/2 )        (2)

wherein N_0 is the number of non-key frames contained in the level-0 segment, n is the total number of non-key frames contained in the group of pictures, and L is the target number of levels.
After the number of non-key frames included in the 0 th segment is obtained, the number of non-key frames included in each subsequent segment may be sequentially determined, for example, the number of non-key frames included in the subsequent segment is a difference between the number of non-key frames included in the previous segment and 1.
For example, n P frames are included in a GOP. After calculating the adaptive number of levels L, the encoding apparatus may calculate the number of P frames contained in the level-0 segment as N_0 according to formula (2); each following level then contains N_i = N_{i−1} − 1 frames, for 1 ≤ i ≤ L−1. Referring to fig. 6, with adaptive level number L = 3 and 9 P frames in one GOP, formula (2) gives 4 P frames in the level-0 segment, 3 P frames in the level-1 segment, and 2 P frames in the level-2 segment. The P frames of level 0 are coded and decoded with reference to the I frame, and the P frames in each segment are coded and decoded with reference to the last P frame of the previous level.
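One possible reading of formula (2), consistent with the fig. 6 example (the exact formula and remainder handling are assumptions): level 0 receives N_0 = ceil(n/L + (L−1)/2) frames and each later level one fewer.

```python
import math

def decreasing_segments(n, levels):
    """Level 0 gets N_0 = ceil(n / levels + (levels - 1) / 2) frames; each
    later level gets one frame fewer than the previous one; any rounding
    remainder is absorbed by the last level."""
    sizes = [math.ceil(n / levels + (levels - 1) / 2)]
    for _ in range(1, levels):
        sizes.append(sizes[-1] - 1)
    sizes[-1] += n - sum(sizes)   # absorb rounding remainder (assumption)
    return sizes
```

With n = 9 and L = 3 this reproduces the fig. 6 allocation [4, 3, 2].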
According to the embodiment, the number of the non-key frames contained in each segment is determined according to the target number of stages and the number of the non-key frames contained in the image group, and the number of the non-key frames contained in the first segment is controlled to be the largest, so that the compression efficiency can be improved while the quality of image coding and decoding is ensured.
As an alternative embodiment, encoding the group of images to be encoded according to the target encoding mode includes:
s51, carrying out intra-frame coding on the current video frame to be coded under the condition that the current video frame to be coded is a key frame to be coded;
s52, under the condition that the current video frame to be coded belongs to the first segment to be coded, according to the target coding mode, using the key frame to be coded as the reference video frame of the current video frame to be coded, and carrying out target coding on the current video frame to be coded;
s53, when the current video frame to be coded belongs to other segments to be coded, according to the target coding mode, using the target reference video frame as the reference video frame of the current video frame to be coded, and carrying out target coding on the current video frame to be coded, wherein the target reference video frame comprises the last non-key frame of the segment to be coded before the current segment to be coded;
wherein the target coding is one of: inter-frame coding, or intra-frame coding combined with inter-frame coding.
If the current video frame to be coded is the key frame (the first video frame) of the group of pictures to be coded, it can serve as a random access point in the video stream. The coding device may encode it using intra-frame prediction alone (intra-frame coding), without referring to other frames, to obtain the I frame (intra-coded frame) of the group of pictures to be coded. I frames are typically encoded with higher quality.
If the current video frame to be coded belongs to the first segment to be coded, then according to the target coding mode its reference video frame is the key frame to be coded. The coding device may use the key frame to be coded as the reference video frame and code the current frame by inter-frame prediction (inter-frame coding), or by a combination of intra-frame and inter-frame prediction (intra-frame coding combined with inter-frame coding), thereby improving compression efficiency.
If the current video frame to be coded belongs to other segments to be coded (non-first segments to be coded), the reference video frame of the current video frame to be coded comprises the last non-key frame of the segment to be coded positioned before the current segment to be coded according to the target coding mode. The encoding device may first determine a target reference video frame corresponding to a video frame currently to be encoded. The target reference video frame may include: the last non-key frame of the segment to be encoded, which is located before the current segment to be encoded, may also include: and key frames to be coded.
The encoding device can use the target reference video frame as a reference video frame of the current video frame to be encoded, and perform interframe encoding on the video frame to be encoded, or perform intraframe encoding and interframe encoding on the current video frame to be encoded, so as to improve the compression efficiency.
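The per-segment reference selection described above can be sketched as follows (a minimal illustration, not the patent's implementation; `choose_reference` is a hypothetical name, POC 0 denotes the I frame, and P frames are numbered 1..N within the GOP):

```python
def choose_reference(poc, sizes):
    """Return the POC of the single reference frame for P frame `poc`
    under the target coding mode: level-0 P frames reference the I
    frame (POC 0), and each higher level references the last P frame
    of the previous level. `sizes` lists P-frame counts per level.
    """
    boundary = 0   # POC of the previous level's last frame (I frame at first)
    consumed = 0   # P frames accounted for so far
    for size in sizes:
        if poc <= consumed + size:
            return boundary
        consumed += size
        boundary = consumed  # last P frame of this level
    raise ValueError("poc outside the GOP")

print(choose_reference(6, [4, 3, 2]))  # fig. 6: POC 6 references POC 4
```

For the fig. 6 layout ([4, 3, 2]) this yields reference 0 for POCs 1-4, reference 4 for POCs 5-7, and reference 7 for POCs 8-9.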
In this embodiment, the key frame and the non-key frames in different segments of the group of pictures are coded in different coding modes, so that coding quality and compression efficiency are balanced and resource utilization is improved.
As an alternative embodiment, according to the target encoding mode, the target reference video frame is used as a reference video frame of the current video frame to be encoded, and the target encoding of the current video frame to be encoded includes:
s61, under the condition that the number of reference video frames corresponding to the current video frame to be coded is multiple, according to the target coding mode, determining the key frame to be coded and the last non-key frame of at least one segment to be coded before the current segment to be coded as the target reference video frame corresponding to the current video frame to be coded;
and S62, taking the target reference video frame as the reference video frame of the current video frame to be coded, and carrying out target coding on the current video frame to be coded.
If the current video frame to be encoded allows multiple reference frames, that is, the number of reference video frames corresponding to it is more than one, then according to the target encoding mode the encoding apparatus may determine multiple reference video frames (target reference video frames) for it, which may include the key frame to be encoded and the last non-key frame of at least one segment to be encoded before the current segment, for example the last non-key frame of the segment immediately before the current segment to be encoded.
For example, the key frame to be encoded and the last non-key frame of the previous segment to be encoded of the current segment to be encoded may be determined as the target reference video frame corresponding to the current video frame to be encoded.
After determining the target reference video frame, the encoding device may use the target reference video frame as a reference video frame of the current video frame to be encoded, and perform inter-frame encoding or intra-frame encoding combined with inter-frame encoding on the current video frame to be encoded.
For example, as shown in fig. 5 and fig. 6, if multiple reference frames are allowed, each P frame may reference multiple forward frames according to its current level; for instance, when the number of reference frames is 2, the frame with POC 6 in fig. 6 may reference the frames with POC 0 and POC 4.
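The multi-reference case can be sketched by collecting the I frame plus the last P frames of the preceding levels, nearest first (a hypothetical helper under assumed conventions; the embodiment does not fix the selection order):

```python
def reference_list(poc, sizes, num_refs):
    """POCs referenced by P frame `poc` when up to `num_refs` reference
    frames are allowed: the I frame (POC 0) plus the last P frames of
    the levels that precede `poc`, preferring the nearest ones.
    `sizes` lists the P-frame counts per level.
    """
    boundaries = []  # POC of the last P frame of each level
    consumed = 0
    for size in sizes:
        consumed += size
        boundaries.append(consumed)
    prior = [b for b in boundaries if b < poc]  # levels fully before poc
    refs = [0] + prior[::-1]                    # I frame, then nearest boundaries
    return sorted(refs[:num_refs])

print(reference_list(6, [4, 3, 2], 2))  # fig. 6 example: [0, 4]
```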
By the embodiment, when the reference multi-frame is allowed, a plurality of reference video frames corresponding to the video frame to be coded are determined based on the segments of the image group, and the reasonability of the determination of the reference video frames can be ensured.
As an alternative embodiment, encoding the group of images to be encoded according to the target encoding mode includes:
and S71, according to the target coding mode, coding all the non-key frames to be coded in the same segment to be coded in the group of pictures to be coded in parallel.
In the conventional LDP coding mode, each P frame from the second onward references its preceding P frame (in addition to the I frame), so the P frames form a serial reference chain and coding efficiency is low.
Optionally, in this embodiment, a GOP is divided into segments according to the target number of levels during encoding. Because the P frames are segmented and no longer form a chain reference relationship, same-level P frames in the GOP can be encoded and decoded in parallel, which greatly accelerates the coding and decoding speed.
According to the target coding mode, all the non-key frames to be coded in the same segment to be coded refer to the same frame (for example, the key frame to be coded and the last non-key frame of the previous segment), so that the coding equipment can carry out parallel coding and decoding on all the non-key frames to be coded in the same segment to be coded, and the coding and decoding speed is increased.
For example, as shown in fig. 5, the level-0 segment contains the 1st, 2nd, 3rd and 4th P frames, and these 4 P frames can be coded and decoded in parallel. The level-1 segment contains the 5th and 6th P frames, and these 2 P frames can be coded and decoded in parallel.
For another example, as shown in fig. 6, the level-0 segment contains the 1st, 2nd, 3rd and 4th P frames, which can be coded and decoded in parallel. The level-1 segment contains the 5th, 6th and 7th P frames, which can be coded and decoded in parallel. The level-2 segment contains the 8th and 9th P frames, which can be coded and decoded in parallel.
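A minimal sketch of the per-level parallel scheme, assuming a hypothetical `encode_p(frame, reference_poc)` callable: levels are processed in order, since each level needs the last frame of the previous one, while the frames inside a level share one reference and are submitted concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def encode_gop_parallel(frames, sizes, encode_p):
    """Encode the P frames of one GOP level by level.

    `frames` holds the P frames (index 0 = first P frame, POC 1);
    `sizes` lists the P-frame counts per level; `encode_p` is a
    hypothetical (frame, reference_poc) -> bitstream callable.
    """
    results = [None] * len(frames)
    ref_poc = 0  # level 0 references the I frame (POC 0)
    start = 0
    with ThreadPoolExecutor() as pool:
        for size in sizes:
            # All frames of this level depend only on ref_poc: parallel.
            futures = {i: pool.submit(encode_p, frames[i], ref_poc)
                       for i in range(start, start + size)}
            for i, fut in futures.items():
                results[i] = fut.result()  # finish before the next level
            start += size
            ref_poc = start  # last P frame of this level (POC = start)
    return results
```

With the fig. 6 split [4, 3, 2], frames 1-4 are submitted together against POC 0, then 5-7 against POC 4, then 8-9 against POC 7.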
By the embodiment, the coding and decoding speed can be increased by using a parallel mode for coding and decoding all the P frames in the same segment.
As an alternative embodiment, the obtaining of the group of images to be encoded of the video to be encoded includes:
s81, under the condition that a main view angle area of a target object in the panoramic video is switched from a first view angle area to a second view angle area, acquiring an image group to be coded of a video to be coded, wherein the video to be coded is a part of the panoramic video to be coded, which corresponds to the main view angle area, the image group to be coded is an image group in which a first video frame after view angle switching occurs, the definition corresponding to the main view angle area is a first definition, the definition corresponding to other areas except the main view angle area in the panoramic video is a second definition, and the first definition is higher than the second definition.
The encoding method in this embodiment may be applied to different video transmission scenes, for example, scenes of panoramic video transmission. For a panoramic video (e.g., a panoramic video in a VR scene), the definition corresponding to the main viewing angle area of the user may be configured as a first definition (high definition), and the definitions corresponding to the other areas except the main viewing angle area in the panoramic video may be configured as a second definition (low definition), where the first definition is higher than the second definition.
Before the view switching occurs, the main view area of the target object (user) is the first view area: high-definition video is displayed in the first view area, and low-definition video is displayed in the other areas of the panoramic video. At a certain moment the view of the target object switches, and the main view area changes from the first view area to the second view area; the video code stream corresponding to the second view area then needs to be switched to the high-definition code stream, so that high-definition video is displayed in the second view area.
The video to be encoded may be the portion of the panoramic video to be encoded that lies in the main view area, and the definition of its video code stream is the first definition. For example, when the VR device detects at a certain moment that the user's view has switched, the video code stream of the main view area needs to be quickly switched to the high-definition code stream after the switch. The encoder in the encoding device has a parameter that controls the quality of the video stream, and the definition of the video stream can be controlled through this parameter.
Through the embodiment, the definition corresponding to the video code stream is controlled according to the switching of the user visual angle area, the conversion speed from the low definition stream to the high definition stream in the main visual angle area can be increased, and the use experience of a user is improved.
Optionally, in this embodiment, before acquiring the group of pictures to be encoded of the video to be encoded, the encoding device may receive target view information transmitted by a target device, where the target device is a device for acquiring view information of a target object viewing a panoramic video; according to the target view information, the encoding apparatus may determine that a main view region of a target object is switched from a first view region to a second view region in the panoramic video.
At the decoding end or the playing device end, a target device (e.g., a VR device) may collect the view information of the target object viewing the panoramic video to obtain the target view information, and transmit the target view information to the encoding device through a network. The target device, the encoding device, and the playing device may be the same device or different devices, which is not limited in this embodiment.
The encoding device may receive target view information transmitted by the target device, and determine, according to the target view information, that a main view area of a target object in the panoramic video is switched from a first view area to a second view area. The target view information may be area information of a main view area, or may be a position of a main viewpoint of a target object on a panoramic video (panoramic video frame). The encoding device may directly determine the main view area through the target view information, or may determine the main view area through a position of the main viewpoint on the panoramic video (panoramic video frame) and the range information of the main view area, which is not limited in this embodiment.
For example, VR glasses (or a mobile phone rendering the VR picture) may obtain the view information of the user, determine the position of the main viewpoint in the picture, and thereby determine the main view area of the user.
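As one illustration of deriving the main view area from a reported viewpoint position plus range information (all names and the message format are assumptions, since the embodiment leaves them open):

```python
def main_view_region(viewpoint, region_size, frame_size):
    """Derive the main view rectangle from the viewpoint reported by
    the target device. `viewpoint` = (x, y) centre in pixels,
    `region_size` = (w, h) of the main view area, `frame_size` =
    (W, H) of the panoramic frame; the rectangle is clamped so it
    stays inside the frame. Returns (left, top, w, h).
    """
    (x, y), (w, h), (W, H) = viewpoint, region_size, frame_size
    left = min(max(x - w // 2, 0), W - w)  # centre, then clamp to frame
    top = min(max(y - h // 2, 0), H - h)
    return (left, top, w, h)

print(main_view_region((100, 100), (200, 100), (1000, 500)))
```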
By the embodiment, the main visual angle area of the target object is determined by acquiring the visual angle information of the target object, so that the accuracy of determining the main visual angle area can be improved.
Optionally, in this embodiment, for a video frame to be encoded in a group of pictures to be encoded, the encoding device may encode the video frame to be encoded to obtain a first video code stream, where a definition corresponding to the first video code stream is a first definition and may be a high-definition code stream. In order to ensure that a user can see the video pictures in the switched view angle area when the view angle is switched, the coding device can code the panoramic video frame to be coded of the panoramic video to be coded to obtain a second video code stream, wherein the definition corresponding to the second video code stream is the second definition and is a low-definition code stream. The encoding device can transmit the first video code stream and the second video code stream to the decoding device, so that the decoding device can render video frames (image frames and images) obtained by decoding the first video code stream into a main view angle area of the video frames obtained by decoding the second video code stream, and high-definition video pictures are displayed in the main view angle area.
The encoding process of the panoramic video frame to be encoded and the encoding process of the video frame to be encoded may be executed simultaneously (consuming a certain storage resource), or may be executed sequentially, which is not limited in this embodiment.
For example, even after the video picture in a given view area has been switched to high definition, the low-definition video stream can still be transmitted, since its transmission cost is low; if, on a later view switch, that region is no longer within the main visual range, decoding can switch directly back to the low-definition stream.
In this embodiment, by transmitting the low-definition and high-definition code streams simultaneously, the completeness of the video information displayed to the user during view switching can be guaranteed, improving the user experience.
According to another aspect of the embodiment of the present application, there is also provided a video decoding method. Alternatively, in this embodiment, the video decoding method may be applied to a hardware environment formed by the encoding end 102, the decoding end 104 and the playing device 106 as shown in fig. 1. The description is already given and will not be repeated herein.
The video decoding method of the embodiment of the present application may be executed by the decoding end 104, where the decoding end 104 may be a terminal device or a server; the method may also be executed by a client installed on the terminal device. Taking the decoding end 104 as the executing body, fig. 7 is a schematic flowchart of an alternative video decoding method according to an embodiment of the present application; as shown in fig. 7, the method may include the following steps:
step S702, acquiring a group of pictures to be decoded of a video to be decoded, wherein the group of pictures to be decoded comprises a key frame to be decoded and a plurality of non-key frames to be decoded.
The video decoding method in this embodiment may be used to decode a video code stream obtained by encoding a group of pictures with any of the above video encoding methods. The decoding device can obtain, through a network, the video code stream transmitted by the encoding device, that is, the group of pictures to be decoded of the video to be decoded, which includes a key frame to be decoded and a plurality of non-key frames to be decoded.
Step S704, determining a target reference relationship corresponding to the group of pictures to be decoded, where the target reference relationship is used to indicate that, in a plurality of fragments to be decoded into which a plurality of non-key frames to be decoded are divided, a non-key frame to be decoded in a first fragment to be decoded refers to a key frame to be decoded, and non-key frames to be decoded in other fragments to be decoded except the first fragment to be decoded refer to a last non-key frame of at least one fragment to be decoded before the other fragments to be decoded.
The decoding device may determine a target reference relationship corresponding to the group of pictures to be decoded, the target reference relationship being used to indicate video frames referenced by respective video frames to be decoded in the group of pictures to be decoded. The indication information of the target reference relationship may be carried in a video code stream corresponding to each video frame to be decoded in the group of pictures to be decoded.
The target reference relationship corresponds to a target coding mode adopted by a coding side, and the indicated reference relationship is as follows: in a plurality of fragments to be decoded into which a plurality of non-key frames to be decoded are divided, the non-key frame to be decoded in the first fragment to be decoded refers to the key frame to be decoded, and the non-key frame to be decoded in the other fragments to be decoded except the first fragment to be decoded refers to the last non-key frame of at least one fragment to be decoded before the other fragments to be decoded.
And step S706, decoding the image group to be decoded according to the target reference relation.
According to the target reference relation, the decoding device can decode each video frame to be decoded in the group of pictures to be decoded. For the key frame to be decoded in the image group to be decoded, the decoding device can perform intra-frame decoding on the key frame to be decoded to obtain a corresponding video frame; for the non-key frame to be decoded in the image to be decoded, the decoding device may determine the reference image frame of the non-key frame to be decoded based on the target reference relationship, and perform inter-frame decoding according to the corresponding reference image frame, or obtain the corresponding video frame by combining intra-frame decoding with inter-frame decoding. The related art may be referred to for the decoding process of the video frame to be decoded, which is not limited in this embodiment.
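On the decoder side, the target reference relationship can be pictured as a map from each POC to the POC it references. In practice the relationship would be read from indication information carried in the bitstream; the sketch below merely reconstructs it from the per-level frame counts for illustration:

```python
def build_reference_map(sizes):
    """Map each P frame POC in the GOP to the POC it references, given
    the per-level P-frame counts. POC 0 is the key (I) frame; level 0
    references it, and each higher level references the last P frame
    of the previous level.
    """
    refs = {}
    ref_poc, poc = 0, 0
    for size in sizes:
        for _ in range(size):
            poc += 1
            refs[poc] = ref_poc
        ref_poc = poc  # last P frame of this level
    return refs

print(build_reference_map([4, 3, 2]))
# {1: 0, 2: 0, 3: 0, 4: 0, 5: 4, 6: 4, 7: 4, 8: 7, 9: 7}
```

The map makes the parallelism visible: all POCs sharing one reference value form a segment whose frames can be decoded concurrently.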
Through the steps S702 to S706, a group of pictures to be decoded of a video to be decoded is obtained, where the group of pictures to be decoded includes a key frame to be decoded and a plurality of non-key frames to be decoded; determining a target reference relationship corresponding to a group of pictures to be decoded, wherein the target reference relationship is used for indicating that in a plurality of fragments to be decoded into which a plurality of non-key frames to be decoded are divided, a non-key frame to be decoded in a first fragment to be decoded refers to a key frame to be decoded, and a non-key frame to be decoded in other fragments to be decoded except the first fragment to be decoded refers to the last non-key frame of at least one fragment to be decoded before other fragments to be decoded; the image group to be decoded is decoded according to the target reference relationship, so that the problem of poor data transmission timeliness caused by overlarge encoding and decoding time delay in a video encoding and decoding mode in the related technology is solved, the encoding and decoding time delay is reduced, and the encoding and decoding efficiency is improved.
As an alternative embodiment, decoding the group of pictures to be decoded according to the target reference relationship includes:
s91, under the condition that the current video frame to be decoded is the key frame to be decoded, carrying out intra-frame decoding on the key frame to be decoded;
s92, under the condition that the current video frame to be decoded belongs to the first segment to be decoded, according to the target reference relationship, taking the key frame to be decoded as the reference video frame of the current video frame to be decoded, and carrying out target decoding on the current video frame to be decoded;
s93, when the current video frame to be decoded belongs to other segments to be decoded, according to the target reference relationship, using the target reference video frame as the reference video frame of the current video frame to be decoded, and performing target decoding on the current video frame to be decoded, wherein the target reference video frame comprises the last non-key frame of at least one segment to be decoded before the current segment to be decoded;
wherein the target decoding is one of: inter-frame decoding, or intra-frame decoding combined with inter-frame decoding.
If the current video frame to be decoded is the key frame (the first video frame, the I frame) of the group of pictures to be decoded, it can serve as a random access point in the video stream. The decoding device can perform intra-frame decoding on it directly, without referring to other frames, to obtain the corresponding video frame.
And if the current video frame to be decoded belongs to the first segment to be decoded, the reference video frame of the current video frame to be decoded is the key frame to be decoded according to the target reference relationship. The decoding device may use the key frame to be decoded as a reference video frame of the current video frame to be decoded, and perform inter-frame decoding on the current video frame to be decoded, or perform intra-frame decoding and inter-frame decoding on the current video frame to be decoded.
If the current video frame to be decoded belongs to another segment to be decoded (a non-first segment), then according to the target reference relationship its reference video frames include at least one non-key frame (non-key frame to be decoded) located before it. The decoding device may first determine the target reference video frame corresponding to the current video frame to be decoded. The target reference video frame may include the last non-key frame of at least one segment to be decoded before the current segment, and may also include the key frame to be decoded.
The decoding device may perform inter-frame decoding on the video frame to be decoded by using the target reference video frame as a reference video frame of the video frame to be decoded currently, or perform intra-frame decoding and inter-frame decoding on the video frame to be decoded currently.
In this embodiment, the key frame and the non-key frames in different segments of the group of pictures are decoded in different decoding modes, so that video quality and compression efficiency are balanced and resource utilization is improved.
As an optional embodiment, according to the target reference relationship, the target reference video frame is used as a reference video frame of the current video frame to be decoded, and the target decoding of the current video frame to be decoded includes:
s101, under the condition that the number of reference video frames corresponding to a current video frame to be decoded is multiple, determining a key frame to be decoded and the last non-key frame of at least one segment to be decoded before the segment to be decoded as a target reference video frame corresponding to the current video frame to be decoded according to a target reference relationship;
and S102, taking the target reference video frame as a reference video frame of the current video frame to be decoded, and performing target decoding on the current video frame to be decoded.
If multiple reference frames are allowed, that is, the number of reference video frames corresponding to the current video frame to be decoded can be multiple, according to the target reference relationship, the decoding device can determine the key frame to be decoded of the image group to be decoded and the last non-key frame of at least one fragment to be decoded before the fragment to be decoded as the target reference video frame corresponding to the current video frame to be decoded.
For example, the key frame to be decoded and the last non-key frame of the segment to be decoded immediately before the current segment may be determined as the target reference video frames corresponding to the current video frame to be decoded.
After the target reference video frame is determined, the decoding device may use the target reference video frame as a reference video frame of a current video frame to be decoded, and perform inter-frame decoding or intra-frame decoding combined with inter-frame decoding on the current video frame to be decoded.
By the embodiment, the accuracy of video frame decoding can be improved by decoding the video frame to be decoded by referring to the plurality of video frames.
As an alternative embodiment, decoding the group of pictures to be decoded according to the target reference relationship includes:
and S111, according to the target reference relationship, performing parallel decoding on all non-key frames to be decoded in the same segment to be decoded in the image group to be decoded.
According to the target reference relationship, the decoding device can decode all the non-key frames in the same segment of the group of pictures to be decoded together: it may first decode, in parallel, all the non-key frames of the first segment to be decoded, and then proceed level by level. Because the non-key frames in the same segment are decoded in parallel, the decoding speed is increased.
By the embodiment, the coding and decoding speed can be increased by coding and decoding all the non-key frames in the same segment in parallel.
As an alternative embodiment, the obtaining a group of pictures to be decoded of a video to be decoded includes:
s121, under the condition that a main view angle area of a target object in the panoramic video is switched from a first view angle area to a second view angle area, acquiring an image group to be decoded of a video to be decoded, wherein the video to be decoded is a part of the panoramic video to be decoded, which corresponds to the main view angle area, the image group to be decoded is an image group in which a first video frame after view angle switching is positioned, the definition corresponding to the main view angle area is a first definition, the definition corresponding to other areas except the main view angle area in the panoramic video is a second definition, and the first definition is higher than the second definition.
For the application scene of the panoramic video, a high-definition video can be displayed in a main visual angle area of a user, corresponding to a first definition, a low-definition video can be displayed in other areas except the main visual angle area, corresponding to a second definition, and the first definition is higher than the second definition.
If the main view area of the target object is switched from the first view area to the second view area, the decoding device may acquire the group of pictures containing the first video frame after the view switching, that is, the group of pictures to be decoded. For example, when the VR device detects that the user's view has switched, the decoding device may decode in the above manner after the switch, so that the video code stream of the main view area can be quickly switched to the high-definition code stream.
The decoding device may receive a video code stream transmitted by the encoding device, for example, the first video code stream may be a high-definition code stream corresponding to a video frame to be decoded in the group of pictures to be decoded, and for example, the second video code stream may be a low-definition code stream corresponding to a panoramic video frame to be decoded of the panoramic video to be decoded. The video frame to be decoded and the panoramic video frame to be decoded have a corresponding relationship, and the decoding device can render the video frame (image frame and image) decoded by the first video code stream into the main view angle area of the video frame decoded by the second video code stream, so that a high-definition video picture is displayed in the main view angle area. The sharpness corresponding to the main viewing angle region (second viewing angle region) is a first sharpness, and the sharpness corresponding to the other regions except the second viewing angle region is a second sharpness, and the first sharpness is higher than the second sharpness.
The second view region may be indicated by region indication information in the video bitstream. The area indication information is used to indicate a position of the main view area in the panoramic video, and may be area information of the main view area, or may be other types of information, and information that can indicate an area range of the main view area may be used as the area indication information.
Optionally, the second view angle region may also be determined by target view angle information transmitted by the target device, and by matching the time, the decoding device may determine a corresponding relationship between the target view angle information and the video frame to be decoded, so as to determine the second view angle region for displaying the video frame to be decoded.
In this embodiment, the definition of the video code stream is controlled according to the switching of the user's view area, which can increase the speed of switching from the low-definition stream to the high-definition stream in the user's main view area and improve the user experience.
The following explains the video encoding and decoding methods in the embodiments of the present application with reference to an alternative example. This example provides a progressive, dynamic, multi-level adaptive low-delay coded-frame reference method applied to VR scenes: it estimates the maximum number of delay frames the user can tolerate from the latency requirement and the time taken to encode and decode one frame, calculates the maximum number of segments (levels) from that frame number, and has the P frames of the first (level-0) segment reference the I frame while the P frames of each following (higher-level) segment reference the last frame of the previous segment.
Because each GOP is divided into levels adaptively, the coding delay and the maximum tolerable delay can be calculated from the current computing power and the application's latency requirement, the levels divided accordingly, and same-level P frames encoded in parallel, which accelerates encoding. Likewise, same-level P frames can be decoded in parallel, accelerating decoding. When encoding in this way, when the VR view switches from the main view to the left or right view, the conversion of the new view area from low definition to high definition can be completed quickly.
The hybrid adaptive low-latency encoded frame reference method provided in this example may be applied to a network architecture as shown in fig. 8, in which:
an encoder for acquiring and splicing a panoramic video, the panoramic video being a video played in a VR device; coding video data in a main view angle area of a user in a panoramic video according to a first definition (high definition) to obtain a corresponding high definition stream, coding the panoramic video according to a second definition (low definition) to obtain a corresponding low definition stream, and transmitting the obtained high definition stream and the obtained low definition stream to a decoder through a network; receiving visual angle information of a user transmitted by VR equipment, and determining a main visual angle area of the user according to the visual angle information;
the decoder is used for decoding the high-definition stream and the low-definition stream respectively, rendering a video frame obtained by decoding the high-definition stream to a main visual angle area of a video frame obtained by decoding the corresponding low-definition stream to obtain a decoded video, and playing the decoded video through VR equipment;
and the VR equipment is used for playing the video decoded by the decoder, acquiring the visual angle information of the user, and transmitting the visual angle information of the user to the encoder through a network when the visual angle of the user is switched.
As shown in fig. 9, the flow of the video encoding and decoding method in this example may include the following steps:
Step S902: estimating the maximum delay frame number that a user can tolerate according to the time delay requirement and the time for encoding and decoding one frame, and calculating the target number of stages according to this frame number.
On the encoding side, when a certain frame in a GOP is encoded, the maximum delay frame number that a user can tolerate is estimated according to the time delay requirement and the time for encoding and decoding one frame, and the maximum stage number is calculated from this maximum delay frame number.
For example, suppose the user can tolerate a delay of t, encoding/decoding one I frame requires t_I, encoding/decoding one P frame requires t_P, and the GOP contains n P frames; then the adaptive maximum delay frame number (maximum number of segments, maximum number of stages) can be calculated as: (t - t_I)/t_P.
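As a minimal sketch of the formula above (using hypothetical integer-millisecond timings, since the example states no units):

```python
def max_stage_number(t_ms: int, t_i_ms: int, t_p_ms: int) -> int:
    """Maximum number of segments (stages): floor((t - t_I) / t_P),
    where t is the tolerable delay, t_I the I-frame codec time and
    t_P the P-frame codec time."""
    if t_ms <= t_i_ms or t_p_ms <= 0:
        return 0  # not even one P-frame segment fits in the delay budget
    return (t_ms - t_i_ms) // t_p_ms

# e.g. a 100 ms delay budget, 40 ms per I frame, 10 ms per P frame
print(max_stage_number(100, 40, 10))  # -> 6
```

In practice t_I and t_P would be measured from the running codec rather than fixed constants.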
Step S904: encoding the video frames corresponding to the main view angle region in the panoramic video according to the maximum number of stages to obtain a corresponding video code stream, and transmitting the obtained video code stream to the decoder through the network.
The P frames in a GOP may be divided into a plurality of slices according to the maximum number of stages, where each slice may contain a number of P frames given by formula (1) or formula (2); the first frame of each slice is encoded with reference to the last frame of the previous slice, the frames in the first slice are encoded with reference to the I frame, and the P frames within each slice may be encoded in parallel.
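The slice division and reference assignment just described can be sketched as follows. The even ceil-based split and the frame indices are illustrative assumptions, since the exact per-slice count comes from formula (1) or (2) in the earlier embodiments:

```python
import math

def plan_gop_references(n_p_frames: int, max_stages: int) -> dict:
    """Map each P-frame index (1..n) to the index of its reference
    frame.  Index 0 is the I frame; every P frame in slice k >= 1
    references the last frame of slice k-1, so all frames of one
    slice share a single reference and can be coded in parallel."""
    per_slice = math.ceil(n_p_frames / max_stages)  # P frames per slice
    refs = {}
    for p in range(1, n_p_frames + 1):
        slice_idx = (p - 1) // per_slice
        # slice 0 references the I frame; later slices reference the
        # last P frame of the previous slice
        refs[p] = 0 if slice_idx == 0 else slice_idx * per_slice
    return refs

# 8 P frames split over at most 4 slices (2 P frames per slice)
print(plan_gop_references(8, 4))
```

For 8 P frames and 4 slices this yields frames 1-2 referencing the I frame, frames 3-4 referencing frame 2, frames 5-6 referencing frame 4, and frames 7-8 referencing frame 6.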
The encoder may encode a video frame corresponding to the main view region in the panoramic video according to the encoding mode to obtain a corresponding video code stream, and transmit the obtained video code stream to the decoder through a network. In addition, the encoder may encode the panoramic video frame in the same manner or in a different manner, which is not limited in this example.
When encoding in this way, when VR switches from the main view to either of the left and right views (switching views to the left or right), the transition from high definition to low definition can be done quickly.
Step S906: decoding the received video code stream according to the reference relationship used during encoding to obtain the corresponding video, and playing the decoded video through the VR device.
Each frame in the encoded code stream contains information on which frames were referenced when it was encoded, i.e., the reference relationship. The decoder can decode the received video code stream according to this reference relationship to obtain the corresponding video, and transmit it to the VR device for playing.
In addition, because images of the same stage are encoded with reference to the same frame, images of the same stage can be encoded and decoded in parallel without depending on the preceding P frame, which greatly speeds up encoding and decoding.
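Since all frames of one stage share the same already-decoded reference, a decoder could fan one stage out to a worker pool. This is a hedged sketch with a placeholder decode callback, not a real codec API:

```python
from concurrent.futures import ThreadPoolExecutor

def decode_stage_in_parallel(decode_frame, stage_frames, reference):
    """Decode every P frame of one stage concurrently.  decode_frame
    is a hypothetical callback taking (encoded_frame, reference) and
    returning a decoded frame; all calls share one reference frame, so
    no frame of the stage waits on another frame of the same stage."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(decode_frame, f, reference)
                   for f in stage_frames]
        return [fut.result() for fut in futures]

# toy "decoder" that just adds the reference value to each frame value
print(decode_stage_in_parallel(lambda f, ref: f + ref, [1, 2, 3], 10))
```

A real decoder would hand each worker an encoded P-frame payload and the shared reconstructed reference picture instead of integers.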
In the LDP mode in the related art, when any P frame is decoded, all of its reference frames need to be stored in memory. In this example, decoding any P frame only requires the time of decoding one I frame plus the current slice sequence number (current stage number) multiplied by the P frame decoding time; P frames of the same slice have the same decoding time. That is, the waiting time to decode any one P frame is (current slice sequence number (current stage number) × P frame decoding time) + (I frame decoding time).
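The waiting-time claim above can be written out directly. One interpretation is assumed here: the stage count is taken as 1-based for this formula (the chain covers the last frame of each earlier slice plus the target frame itself), which the example does not state explicitly:

```python
def p_frame_decode_wait_ms(stage: int, t_i_ms: int, t_p_ms: int) -> int:
    """Wait time to randomly decode a P frame in the given stage:
    one I-frame decode plus `stage` chained P-frame decodes (the last
    frame of each earlier stage, then the target frame itself)."""
    return t_i_ms + stage * t_p_ms

# stage 3 with a 40 ms I-frame decode and 10 ms P-frame decodes
print(p_frame_decode_wait_ms(3, 40, 10))  # -> 70
```

By contrast, in plain LDP a P frame at position k would wait roughly t_i_ms + k * t_p_ms with k counting every preceding P frame, which is much larger for late frames in the GOP.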
By this example, the number of stages is adaptively adjusted according to the delay and the computing power, so that the application delay requirement is met while the reference error is reduced; images of the same segment are encoded with reference to the same frame without depending on the preceding P frame and can be encoded and decoded in parallel, which greatly speeds up encoding and decoding; and when any P frame is randomly decoded, only the decoding time of one I frame plus the current stage number of P frames needs to be waited for, which greatly shortens the decoding waiting time.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., a ROM (Read-Only Memory)/RAM (Random Access Memory), a magnetic disk, an optical disk) and includes several instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the methods according to the embodiments of the present application.
According to another aspect of the embodiments of the present application, there is also provided a video encoding apparatus for implementing the above video encoding method. Fig. 10 is a block diagram of an alternative video encoding apparatus according to an embodiment of the present application, and as shown in fig. 10, the apparatus may include:
an obtaining unit 1002, configured to obtain a group of images to be encoded of a video to be encoded, where the group of images to be encoded includes a key frame to be encoded and a plurality of non-key frames to be encoded;
a determining unit 1004, connected to the obtaining unit 1002, configured to determine a target encoding mode matched with the group of images to be encoded, where the target encoding mode is used to indicate that, in a plurality of segments to be encoded into which a plurality of non-key frames to be encoded are divided, a non-key frame to be encoded in a first segment to be encoded refers to a key frame to be encoded, and non-key frames to be encoded in other segments to be encoded except the first segment to be encoded refer to a last non-key frame of at least one segment to be encoded before the other segments to be encoded;
the encoding unit 1006 is connected to the determining unit 1004 and is configured to encode the group of images to be encoded according to the target encoding mode.
It should be noted that the obtaining unit 1002 in this embodiment may be configured to execute the step S202, the determining unit 1004 in this embodiment may be configured to execute the step S204, and the encoding unit 1006 in this embodiment may be configured to execute the step S206.
Through the module, a group of images to be coded of a video to be coded is obtained, wherein the group of images to be coded comprises a key frame to be coded and a plurality of non-key frames to be coded; determining a target coding mode matched with an image group to be coded, wherein the target coding mode is used for indicating that in a plurality of sections to be coded, which are divided by a plurality of non-key frames to be coded, a non-key frame to be coded in a first section to be coded refers to a key frame to be coded, and non-key frames to be coded in other sections to be coded except the first section to be coded refer to the last non-key frame of at least one section to be coded before the other sections to be coded; the method and the device encode the image group to be encoded according to the target encoding mode, solve the problem of poor data transmission timeliness caused by overlarge encoding and decoding time delay in video encoding and decoding modes in the related technology, reduce the encoding and decoding time delay and improve the encoding and decoding efficiency.
As an alternative embodiment, the encoding unit 1006 includes:
the device comprises a first determining module, a second determining module and a third determining module, wherein the first determining module is used for determining a plurality of to-be-encoded segments corresponding to an image group to be encoded, and each to-be-encoded segment comprises at least one to-be-encoded non-key frame;
and the first coding module is used for coding each non-key frame to be coded in each section to be coded according to the target coding mode.
As an alternative embodiment, the first determining module includes:
the first determining submodule is used for determining a target series according to a target delay time and a target coding and decoding time, wherein the target delay time is the allowed maximum delay time, the target coding and decoding time is the coding and decoding time of one video frame, the coding and decoding time comprises the time used for coding and the time used for decoding, and the target series is the number of non-key frames allowed to be coded and decoded in the target delay time;
and the second determining submodule is used for determining a plurality of to-be-encoded segments corresponding to the to-be-encoded image group according to the target progression, wherein the number of the to-be-encoded video segments contained in the plurality of to-be-encoded video segments is less than or equal to the target progression.
As an alternative embodiment, the first determination submodule includes:
the first determining subunit is configured to determine a target time difference between a target delay time and a first encoding and decoding time, where the first encoding and decoding time is an encoding and decoding time of a key frame;
and the second determining subunit is configured to determine a quotient of the target time difference and a second coding and decoding time as a target stage number, where the second coding and decoding time is a coding and decoding time of a non-key frame, and the target coding and decoding time includes the first coding and decoding time and the second coding and decoding time.
As an alternative embodiment, the second determining submodule includes:
the third determining subunit is used for determining the first segment to be encoded according to the target number of stages, wherein the number of non-key frames to be encoded contained in the first segment to be encoded is a target number, the target number is greater than or equal to the value obtained by dividing the target video frame number by the target number of stages and rounding up, and the target video frame number is the number of the plurality of non-key frames to be encoded;
and the fourth determining subunit is used for determining other to-be-encoded segments according to the target number, wherein the number of the to-be-encoded non-key frames contained in the other to-be-encoded segments is less than or equal to the target number.
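A small sketch of the slice sizing described by these subunits, assuming the first segment takes exactly the rounded-up quotient and each later segment takes at most that many frames:

```python
import math

def slice_sizes(n_p_frames: int, target_stages: int) -> list:
    """First slice: ceil(n / stages) P frames; remaining slices hold
    at most that many, so the slice count never exceeds target_stages."""
    first = math.ceil(n_p_frames / target_stages)
    sizes = []
    remaining = n_p_frames
    while remaining > 0:
        size = min(first, remaining)
        sizes.append(size)
        remaining -= size
    return sizes

print(slice_sizes(10, 4))  # -> [3, 3, 3, 1]
```

With 10 P frames and 4 stages the first three slices each hold 3 frames and the last holds the single remaining frame.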
As an alternative embodiment, the encoding unit 1006 includes:
the second coding module is used for carrying out intra-frame coding on the current video frame to be coded under the condition that the current video frame to be coded is the key frame to be coded;
the third coding module is used for performing target coding on the current video frame to be coded by taking the key frame to be coded as a reference video frame of the current video frame to be coded according to a target coding mode under the condition that the current video frame to be coded belongs to the first segment to be coded;
the fourth coding module is used for taking the target reference video frame as the reference video frame of the current video frame to be coded and carrying out target coding on the current video frame to be coded according to the target coding mode under the condition that the current video frame to be coded belongs to other segments to be coded, wherein the target reference video frame comprises the last non-key frame of the segment to be coded before the current segment to be coded;
wherein the target code is one of: and inter-frame coding, wherein the intra-frame coding is combined with the inter-frame coding.
As an alternative embodiment, the fourth encoding module comprises:
the third determining submodule is used for determining the key frame to be coded and the last non-key frame of at least one segment to be coded positioned before the segment to be coded as the target reference video frame corresponding to the current video frame to be coded according to the target coding mode under the condition that the number of the reference video frames corresponding to the current video frame to be coded is multiple;
and the coding sub-module is used for performing target coding on the current video frame to be coded by taking the target reference video frame as a reference video frame of the current video frame to be coded.
As an alternative embodiment, the encoding unit 1006 includes:
and the fifth coding module is used for coding all the non-key frames to be coded in the same segment to be coded in the group of pictures to be coded in parallel according to the target coding mode.
As an alternative embodiment, the obtaining unit 1002 includes:
the first acquisition module is used for acquiring a to-be-encoded image group of a to-be-encoded video under the condition that a main view angle area of a target object in the panoramic video is switched from a first view angle area to a second view angle area, wherein the to-be-encoded video is a part corresponding to the main view angle area in the to-be-encoded panoramic video, the to-be-encoded image group is an image group where a first video frame after view angle switching occurs, definition corresponding to the main view angle area is first definition, definition corresponding to other areas except the main view angle area in the panoramic video is second definition, and the first definition is higher than the second definition.
According to another aspect of the embodiments of the present application, there is also provided a video decoding apparatus for implementing the above video decoding method. Fig. 11 is a block diagram illustrating an alternative video decoding apparatus according to an embodiment of the present application, where as shown in fig. 11, the apparatus may include:
an obtaining unit 1102, configured to obtain a group of pictures to be decoded of a video to be decoded, where the group of pictures to be decoded includes a key frame to be decoded and a plurality of non-key frames to be decoded;
a determining unit 1104, connected to the obtaining unit 1102, configured to determine a target reference relationship corresponding to the image group to be decoded, where the target reference relationship is used to indicate that, in a plurality of fragments to be decoded into which a plurality of non-key frames to be decoded are divided, a non-key frame to be decoded in a first fragment to be decoded refers to a key frame to be decoded, and a non-key frame to be decoded in other fragments to be decoded except the first fragment to be decoded refers to a last non-key frame of at least one fragment to be decoded before the other fragments to be decoded;
a decoding unit 1106 connected to the determining unit 1104 for decoding the group of pictures to be decoded according to the target reference relationship.
It should be noted that the obtaining unit 1102 in this embodiment may be configured to execute the step S602, the determining unit 1104 in this embodiment may be configured to execute the step S604, and the decoding unit 1106 in this embodiment may be configured to execute the step S606.
Through the module, the image group to be decoded of the video to be decoded is obtained, wherein the image group to be decoded comprises a key frame to be decoded and a plurality of non-key frames to be decoded; determining a target reference relationship corresponding to a group of pictures to be decoded, wherein the target reference relationship is used for indicating that in a plurality of fragments to be decoded into which a plurality of non-key frames to be decoded are divided, a non-key frame to be decoded in a first fragment to be decoded refers to a key frame to be decoded, and a non-key frame to be decoded in other fragments to be decoded except the first fragment to be decoded refers to the last non-key frame of at least one fragment to be decoded before other fragments to be decoded; the image group to be decoded is decoded according to the target reference relationship, so that the problem of poor data transmission timeliness caused by overlarge encoding and decoding time delay in a video encoding and decoding mode in the related technology is solved, the encoding and decoding time delay is reduced, and the encoding and decoding efficiency is improved.
As an alternative embodiment, the decoding unit 1106 includes:
the first decoding module is used for carrying out intra-frame decoding on the key frame to be decoded under the condition that the current video frame to be decoded is the key frame to be decoded;
the second decoding module is used for taking the key frame to be decoded as the reference video frame of the current video frame to be decoded according to the target reference relationship under the condition that the current video frame to be decoded belongs to the first segment to be decoded, and carrying out target decoding on the current video frame to be decoded;
the third decoding module is used for taking the target reference video frame as the reference video frame of the current video frame to be decoded and carrying out target decoding on the current video frame to be decoded according to the target reference relation under the condition that the current video frame to be decoded belongs to other fragments to be decoded, wherein the target reference video frame comprises the last non-key frame of at least one fragment to be decoded before the current fragment to be decoded;
wherein the target decoding is one of: and inter-frame decoding, wherein the intra-frame decoding is combined with the inter-frame decoding.
As an alternative embodiment, the third decoding module comprises:
a fourth determining submodule, configured to determine, according to a target reference relationship, a key frame to be decoded and a last non-key frame of at least one segment to be decoded before a segment to be decoded currently as a target reference video frame corresponding to a video frame to be decoded currently, when the number of reference video frames corresponding to the video frame to be decoded currently is multiple;
and the decoding submodule is used for performing target decoding on the current video frame to be decoded by taking the target reference video frame as a reference video frame of the current video frame to be decoded.
As an alternative embodiment, the decoding unit 1106 includes:
and the fourth decoding module is used for decoding all the non-key frames to be decoded in the same segment to be decoded in the image group to be decoded in parallel according to the target reference relationship.
As an alternative embodiment, the obtaining unit 1102 includes:
the second acquisition module is used for acquiring an image group to be decoded of a video to be decoded under the condition that a main view angle area of a target object in the panoramic video is switched from a first view angle area to a second view angle area, wherein the video to be decoded is a part corresponding to the main view angle area in the panoramic video to be decoded, the image group to be decoded is an image group where a first video frame after view angle switching occurs, definition corresponding to the main view angle area is first definition, definition corresponding to other areas except the main view angle area in the panoramic video is second definition, and the first definition is higher than the second definition.
According to still another aspect of embodiments of the present application, there is also provided a video transmission system including: the encoding end may include any one of the video encoding devices provided in this embodiment (or the encoding end is the video encoding device), and the decoding end may include any one of the video decoding devices provided in this embodiment (or the decoding end is the video decoding device).
It should be noted here that the modules described above are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure of the above embodiments. It should be noted that the modules described above as a part of the apparatus may be operated in a hardware environment as shown in fig. 1, and may be implemented by software, or may be implemented by hardware, where the hardware environment includes a network environment.
According to yet another aspect of the embodiments of the present application, there is also provided an electronic device for implementing the above-mentioned video encoding method and/or video decoding method, which may be a server, a terminal, or a combination thereof.
Fig. 12 is a block diagram of an alternative electronic device according to an embodiment of the present application, as shown in fig. 12, including a processor 1202, a communication interface 1204, a memory 1206, and a communication bus 1208, wherein the processor 1202, the communication interface 1204, and the memory 1206 communicate with each other via the communication bus 1208,
a memory 1206 for storing a computer program;
the processor 1202, when executing the computer program stored in the memory 1206, performs the following steps:
s1, acquiring a group of images to be coded of the video to be coded, wherein the group of images to be coded comprises a key frame to be coded and a plurality of non-key frames to be coded;
s2, determining a target coding mode matched with the image group to be coded, wherein the target coding mode is used for indicating that in a plurality of sections to be coded into which a plurality of non-key frames to be coded are divided, the non-key frame to be coded in a first section to be coded refers to a key frame to be coded, and the non-key frames to be coded in other sections to be coded except the first section to be coded refer to the last non-key frame of at least one section to be coded before other sections to be coded;
and S3, coding the image group to be coded according to the target coding mode.
Optionally, the processor 1202, when executing the computer program stored in the memory 1206, implements the following steps:
s1, acquiring a group of pictures to be decoded of the video to be decoded, wherein the group of pictures to be decoded comprises a key frame to be decoded and a plurality of non-key frames to be decoded;
s2, determining a target reference relationship corresponding to the group of pictures to be decoded, wherein the target reference relationship is used for indicating that in a plurality of fragments to be decoded into which a plurality of non-key frames to be decoded are divided, the non-key frame to be decoded in a first fragment to be decoded refers to the key frame to be decoded, and the non-key frame to be decoded in other fragments to be decoded except the first fragment to be decoded refers to the last non-key frame of at least one fragment to be decoded before other fragments to be decoded;
and S3, decoding the group of pictures to be decoded according to the target reference relation.
Alternatively, in the present embodiment, the communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 12, but this is not intended to represent only one bus or type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include RAM, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory. Alternatively, the memory may be at least one memory device located remotely from the aforementioned processor.
As an example, the memory 1206 may include, but is not limited to, the obtaining unit 1002, the determining unit 1004, and the encoding unit 1006 in the video encoding apparatus. In addition, the memory may further include, but is not limited to, other module units in the video encoding apparatus, which are not described again in this example.
As another example, the memory 1206 may include, but is not limited to, the obtaining unit 1102, the determining unit 1104, and the decoding unit 1106 in the video decoding apparatus. In addition, the memory may further include, but is not limited to, other module units in the video decoding apparatus, which are not described again in this example.
The processor may be a general-purpose processor, and may include but is not limited to: a CPU (Central Processing Unit), an NP (Network Processor), and the like; but also a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
It can be understood by those skilled in the art that the structure shown in fig. 12 is only an illustration, and the device implementing the video encoding method and/or the video decoding method may be a terminal device, and the terminal device may be a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, a Mobile Internet Device (MID), a PAD, or the like. Fig. 12 does not limit the structure of the electronic device. For example, the terminal device may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 12, or have a different configuration than shown in fig. 12.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disk, ROM, RAM, magnetic or optical disk, and the like.
According to still another aspect of an embodiment of the present application, there is also provided a storage medium. Alternatively, in the present embodiment, the storage medium may be used for program codes for executing a video encoding method and/or a video decoding method.
Optionally, in this embodiment, the storage medium may be located on at least one of a plurality of network devices in a network shown in the above embodiment.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
s1, acquiring a group of images to be coded of the video to be coded, wherein the group of images to be coded comprises a key frame to be coded and a plurality of non-key frames to be coded;
s2, determining a target coding mode matched with the image group to be coded, wherein the target coding mode is used for indicating that in a plurality of sections to be coded into which a plurality of non-key frames to be coded are divided, the non-key frame to be coded in a first section to be coded refers to the key frame to be coded, and the non-key frames to be coded in the sections to be coded except the first section to be coded refer to the last non-key frame of at least one section to be coded before the other sections to be coded;
and S3, coding the image group to be coded according to the target coding mode.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
s1, acquiring a group of pictures to be decoded of the video to be decoded, wherein the group of pictures to be decoded comprises a key frame to be decoded and a plurality of non-key frames to be decoded;
s2, determining a target reference relationship corresponding to the image group to be decoded, wherein the target reference relationship is used for indicating that in a plurality of fragments to be decoded into which a plurality of non-key frames to be decoded are divided, the non-key frame to be decoded in a first fragment to be decoded refers to the key frame to be decoded, and the non-key frame to be decoded in other fragments to be decoded except the first fragment to be decoded refers to the last non-key frame of at least one fragment to be decoded before other fragments to be decoded;
and S3, decoding the group of pictures to be decoded according to the target reference relation.
Optionally, the specific example in this embodiment may refer to the example described in the above embodiment, which is not described again in this embodiment.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing program codes, such as a U disk, a ROM, a RAM, a removable hard disk, a magnetic disk, or an optical disk.
According to yet another aspect of an embodiment of the present application, there is also provided a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium; the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method steps of any of the embodiments described above.
The serial numbers of the above embodiments of the present application are merely for description and do not indicate the relative merits of the embodiments.
If the integrated unit in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the described division of units is merely a logical functional division, and there may be other divisions in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also fall within the protection scope of the present application.

Claims (18)

1. A video encoding method, comprising:
acquiring a group of images to be coded of a video to be coded, wherein the group of images to be coded comprises a key frame to be coded and a plurality of non-key frames to be coded;
determining a target coding mode matched with the group of images to be coded, wherein the target coding mode is used for indicating that, in a plurality of segments to be coded into which the plurality of non-key frames to be coded are divided, the non-key frames to be coded in a first segment to be coded refer to the key frame to be coded, and the non-key frames to be coded in each segment to be coded other than the first segment to be coded refer to the last non-key frame of at least one segment to be coded preceding it;
and coding the image group to be coded according to the target coding mode.
2. The method according to claim 1, wherein said encoding the group of pictures to be encoded according to the target encoding mode comprises:
determining the plurality of to-be-encoded segments corresponding to the to-be-encoded image group, wherein each to-be-encoded segment comprises at least one to-be-encoded non-key frame;
and coding each non-key frame to be coded in each section to be coded according to the target coding mode.
3. The method according to claim 2, wherein said determining the plurality of segments to be encoded corresponding to the group of images to be encoded comprises:
determining a target stage number according to a target delay time and a target coding and decoding time, wherein the target delay time is a maximum allowed delay time, the target coding and decoding time is the coding and decoding time of one video frame, the coding and decoding time comprises coding time and decoding time, and the target stage number is the number of non-key frames allowed to be coded and decoded within the target delay time;
and determining the plurality of to-be-encoded segments corresponding to the to-be-encoded image group according to the target stage number, wherein the number of to-be-encoded segments contained in the plurality of to-be-encoded segments is less than or equal to the target stage number.
4. The method of claim 3, wherein determining the target stage number according to the target delay time and the target codec time comprises:
determining a target time difference value between the target delay time and a first coding and decoding time, wherein the first coding and decoding time is the coding and decoding time of a key frame;
and determining the quotient of the target time difference and a second coding and decoding time as the target stage number, wherein the second coding and decoding time is the coding and decoding time of a non-key frame, and the target coding and decoding time comprises the first coding and decoding time and the second coding and decoding time.
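The computation in claim 4 amounts to one subtraction and one integer division: the stage number is how many sequential non-key codec steps fit in the delay budget once the key frame has been handled. A minimal illustrative sketch (names are my own, not from the patent):

```python
def target_stage_number(target_delay_ms, key_codec_ms, non_key_codec_ms):
    """Claim 4: quotient of (target delay time - key-frame codec time)
    by the non-key-frame codec time, i.e. how many sequential non-key
    encode/decode steps fit in the remaining delay budget."""
    time_difference = target_delay_ms - key_codec_ms
    return int(time_difference // non_key_codec_ms)
```

For instance, with a 100 ms delay budget, a 40 ms key-frame codec time, and a 15 ms non-key-frame codec time, the target stage number is (100 − 40) // 15 = 4, so at most four segments can be decoded one after another within the budget.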
5. The method according to claim 3, wherein said determining the plurality of segments to be encoded corresponding to the group of images to be encoded according to the target stage number comprises:
determining the first segment to be encoded according to the target stage number, wherein the number of non-key frames to be encoded contained in the first segment to be encoded is a target number, the target number is greater than or equal to the value obtained by dividing the target video frame number by the target stage number and rounding up, and the target video frame number is the number of the plurality of non-key frames to be encoded;
and determining the other to-be-encoded segments according to the target number, wherein the number of the to-be-encoded non-key frames contained in the other to-be-encoded segments is less than or equal to the target number.
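Claim 5 can be illustrated with a short partition sketch: the first segment holds at least ceil(N / stage number) non-key frames, and every later segment holds no more than that, so the total number of segments never exceeds the stage number. This is one simple way to satisfy the constraints, with illustrative names of my own:

```python
import math

def partition_segments(num_non_key_frames, stage_number):
    """One partition satisfying claim 5: each segment holds at most
    ceil(N / stage_number) non-key frames, so the segment count is at
    most stage_number."""
    target_number = math.ceil(num_non_key_frames / stage_number)
    sizes, remaining = [], num_non_key_frames
    while remaining > 0:
        size = min(target_number, remaining)  # later segments may be smaller
        sizes.append(size)
        remaining -= size
    return sizes
```

For example, 7 non-key frames with a target stage number of 3 are split as [3, 3, 1]: three segments, each of at most ceil(7/3) = 3 frames.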
6. The method according to claim 1, wherein said encoding the group of pictures to be encoded according to the target encoding mode comprises:
carrying out intra-frame coding on the current video frame to be coded under the condition that the current video frame to be coded is the key frame to be coded;
under the condition that the current video frame to be coded belongs to the first segment to be coded, according to the target coding mode, taking the key frame to be coded as a reference video frame of the current video frame to be coded, and carrying out target coding on the current video frame to be coded;
under the condition that the current video frame to be coded belongs to other segments to be coded, according to the target coding mode, taking a target reference video frame as a reference video frame of the current video frame to be coded, and carrying out target coding on the current video frame to be coded, wherein the target reference video frame comprises the last non-key frame of the segment to be coded before the current segment to be coded;
wherein the target coding is one of: inter-frame coding, or intra-frame coding combined with inter-frame coding.
7. The method according to claim 6, wherein said target-coding a current video frame to be coded by using a target reference video frame as a reference video frame of the current video frame to be coded according to the target-coding mode comprises:
under the condition that the number of reference video frames corresponding to the current video frame to be coded is multiple, determining the key frame to be coded and the last non-key frame of at least one segment to be coded before the current segment to be coded as the target reference video frame corresponding to the current video frame to be coded according to the target coding mode;
and taking the target reference video frame as a reference video frame of the current video frame to be coded, and carrying out target coding on the current video frame to be coded.
8. The method according to claim 1, wherein said encoding the group of pictures to be encoded according to the target encoding mode comprises:
and according to the target coding mode, carrying out parallel coding on all the non-key frames to be coded in the same section to be coded in the image group to be coded.
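Because every non-key frame in one segment shares the same reference frame, the frames of a segment are mutually independent and can be submitted to workers concurrently, as claim 8 describes. The sketch below shows the idea with a thread pool; `encode_frame` is a hypothetical placeholder for a real encoder call (a production encoder would typically release the GIL or use processes):

```python
from concurrent.futures import ThreadPoolExecutor

def encode_segment_in_parallel(frames, reference_frame, encode_frame):
    """Encode all non-key frames of one segment concurrently.

    All frames in the segment reference the same `reference_frame`,
    so no frame depends on another frame of the same segment and they
    can be dispatched to the pool at once.  `encode_frame(frame, ref)`
    stands in for an actual encoder invocation.
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(encode_frame, f, reference_frame) for f in frames]
        return [fut.result() for fut in futures]  # results in frame order
```

The same structure applies on the decoder side (claim 13): once the last non-key frame of the preceding segment is reconstructed, every frame of the next segment can be decoded in parallel against it.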
9. The method according to any one of claims 1 to 8, wherein the obtaining the group of pictures to be encoded of the video to be encoded comprises:
acquiring the group of images to be coded of the video to be coded in a case that a main view angle area of a target object in a panoramic video to be coded is switched from a first view angle area to a second view angle area, wherein the video to be coded is the part of the panoramic video to be coded corresponding to the main view angle area, the group of images to be coded is the group of images in which the first video frame after the view angle switching is located, the definition corresponding to the main view angle area is a first definition, the definition corresponding to areas of the panoramic video other than the main view angle area is a second definition, and the first definition is higher than the second definition.
10. A video decoding method, comprising:
acquiring an image group to be decoded of a video to be decoded, wherein the image group to be decoded comprises a key frame to be decoded and a plurality of non-key frames to be decoded;
determining a target reference relationship corresponding to the group of pictures to be decoded, wherein the target reference relationship is used for indicating that, in a plurality of segments to be decoded into which the plurality of non-key frames to be decoded are divided, the non-key frames to be decoded in a first segment to be decoded refer to the key frame to be decoded, and the non-key frames to be decoded in each segment to be decoded other than the first segment to be decoded refer to the last non-key frame of at least one segment to be decoded preceding it;
and decoding the image group to be decoded according to the target reference relation.
11. The method according to claim 10, wherein said decoding said group of pictures to be decoded according to said target reference relationship comprises:
carrying out intra-frame decoding on the key frame to be decoded under the condition that the current video frame to be decoded is the key frame to be decoded;
under the condition that the current video frame to be decoded belongs to the first segment to be decoded, according to the target reference relationship, taking the key frame to be decoded as a reference video frame of the current video frame to be decoded, and performing target decoding on the current video frame to be decoded;
under the condition that the current video frame to be decoded belongs to another segment to be decoded, according to the target reference relationship, taking a target reference video frame as a reference video frame of the current video frame to be decoded, and performing target decoding on the current video frame to be decoded, wherein the target reference video frame comprises the last non-key frame of at least one segment to be decoded before the current segment to be decoded;
wherein the target decoding is one of: inter-frame decoding, or intra-frame decoding combined with inter-frame decoding.
12. The method according to claim 11, wherein said taking a target reference video frame as a reference video frame of the video frame to be decoded according to the target reference relationship, and performing target decoding on the video frame to be decoded comprises:
under the condition that the number of the reference video frames corresponding to the current video frame to be decoded is multiple, determining the key frame to be decoded and the last non-key frame of at least one segment to be decoded before the segment to be decoded as the target reference video frame corresponding to the current video frame to be decoded according to the target reference relationship;
and taking the target reference video frame as a reference video frame of the current video frame to be decoded, and performing target decoding on the current video frame to be decoded.
13. The method according to claim 10, wherein said decoding said group of pictures to be decoded according to said target reference relationship comprises:
and according to the target reference relation, performing parallel decoding on all the non-key frames to be decoded in the same segment to be decoded in the image group to be decoded.
14. The method according to any of claims 10 to 13, wherein said obtaining a group of pictures to be decoded of a video to be decoded comprises:
acquiring the group of pictures to be decoded of the video to be decoded in a case that a main view angle area of a target object in a panoramic video to be decoded is switched from a first view angle area to a second view angle area, wherein the video to be decoded is the part of the panoramic video to be decoded corresponding to the main view angle area, the group of pictures to be decoded is the group of pictures in which the first video frame after the view angle switching is located, the definition corresponding to the main view angle area is a first definition, the definition corresponding to areas of the panoramic video other than the main view angle area is a second definition, and the first definition is higher than the second definition.
15. A video encoding apparatus, comprising:
an acquisition unit, configured to acquire a group of images to be coded of a video to be coded, wherein the group of images to be coded comprises a key frame to be coded and a plurality of non-key frames to be coded;
a determining unit, configured to determine a target encoding mode matching the group of images to be encoded, wherein the target encoding mode is used to indicate that, in a plurality of segments to be encoded into which the plurality of non-key frames to be encoded are divided, the non-key frames to be encoded in a first segment to be encoded refer to the key frame to be encoded, and the non-key frames to be encoded in each segment to be encoded other than the first segment to be encoded refer to the last non-key frame of at least one segment to be encoded preceding it;
and the coding unit is used for coding the image group to be coded according to the target coding mode.
16. A video decoding apparatus, comprising:
an acquisition unit, configured to acquire a group of pictures to be decoded of a video to be decoded, wherein the group of pictures to be decoded comprises a key frame to be decoded and a plurality of non-key frames to be decoded;
a determining unit, configured to determine a target reference relationship corresponding to the group of pictures to be decoded, wherein the target reference relationship is used to indicate that, in a plurality of segments to be decoded into which the plurality of non-key frames to be decoded are divided, the non-key frames to be decoded in a first segment to be decoded refer to the key frame to be decoded, and the non-key frames to be decoded in each segment to be decoded other than the first segment to be decoded refer to the last non-key frame of at least one segment to be decoded preceding it;
and the decoding unit is used for decoding the image group to be decoded according to the target reference relationship.
17. An electronic device comprising a processor, a communication interface, a memory and a communication bus, wherein said processor, said communication interface and said memory communicate with each other via said communication bus,
the memory for storing a computer program;
the processor configured to perform the method steps of any one of claims 1 to 9 or to perform the method steps of any one of claims 10 to 14 by running the computer program stored on the memory.
18. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method steps of any of claims 1 to 9 or the method steps of any of claims 10 to 14 when executed.
CN202011216864.9A 2020-11-04 2020-11-04 Video encoding method and apparatus, video decoding method and apparatus, electronic device, and storage medium Active CN112351284B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011216864.9A CN112351284B (en) 2020-11-04 2020-11-04 Video encoding method and apparatus, video decoding method and apparatus, electronic device, and storage medium

Publications (2)

Publication Number Publication Date
CN112351284A CN112351284A (en) 2021-02-09
CN112351284B CN112351284B (en) 2022-08-16

Family

ID=74429032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011216864.9A Active CN112351284B (en) 2020-11-04 2020-11-04 Video encoding method and apparatus, video decoding method and apparatus, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN112351284B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101212685A (en) * 2006-12-28 2008-07-02 三星电子株式会社 Method and apparatus for encoding/decoding an image
CN101668208A (en) * 2009-09-15 2010-03-10 杭州华三通信技术有限公司 Frame coding method and device
CN109275029A (en) * 2018-08-28 2019-01-25 北京达佳互联信息技术有限公司 Video stream processing method and device, mobile terminal and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7330509B2 (en) * 2003-09-12 2008-02-12 International Business Machines Corporation Method for video transcoding with adaptive frame rate control
EP2277314A1 (en) * 2008-05-22 2011-01-26 Telefonaktiebolaget LM Ericsson (publ) Content adaptive video encoder and coding method
US10165310B2 (en) * 2016-06-10 2018-12-25 Affirmed Networks, Inc. Transcoding using time stamps

Also Published As

Publication number Publication date
CN112351284A (en) 2021-02-09

Similar Documents

Publication Publication Date Title
CN112040233B (en) Video encoding method, video decoding method, video encoding device, video decoding device, electronic device, and storage medium
CN112333448B (en) Video encoding method and apparatus, video decoding method and apparatus, electronic device, and storage medium
CN112351285B (en) Video encoding method, video decoding method, video encoding device, video decoding device, electronic equipment and storage medium
KR101644208B1 (en) Video encoding using previously calculated motion information
AU2014275405B2 (en) Tuning video compression for high frame rate and variable frame rate capture
US20090219985A1 (en) Systems and Methods for Processing Multiple Projections of Video Data in a Single Video File
US11277619B2 (en) Rate control for video splicing applications
CN107181744B (en) Video processing and encoding method, processor and encoder
WO2023142716A1 (en) Encoding method and apparatus, real-time communication method and apparatus, device, and storage medium
CN111800653B (en) Video decoding method, system, device and computer readable storage medium
CN112040234B (en) Video encoding method, video decoding method, video encoding device, video decoding device, electronic equipment and storage medium
CN112351278A (en) Video encoding method and device and video decoding method and device
CN112351284B (en) Video encoding method and apparatus, video decoding method and apparatus, electronic device, and storage medium
CN112040232B (en) Real-time communication transmission method and device and real-time communication processing method and device
CN115866297A (en) Video processing method, device, equipment and storage medium
CN111212288B (en) Video data encoding and decoding method and device, computer equipment and storage medium
CN115442617A (en) Video processing method and device based on video coding
CN107302680B (en) video processing method and server for multi-person video call
WO2014065844A1 (en) A cloud-based system for flash content streaming
CN117579843B (en) Video coding processing method and electronic equipment
CN112351277B (en) Video encoding method and device and video decoding method and device
CN114449348A (en) Panoramic video processing method and device
CN114071148A (en) Video coding method, device, equipment and product
CN116708800A (en) Image coding and decoding method, device and system
CN114513658A (en) Video loading method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant