CN112040233A - Video encoding method, video decoding method, video encoding device, video decoding device, electronic device, and storage medium - Google Patents
- Publication number
- CN112040233A (application number CN202011213187.5A)
- Authority
- CN
- China
- Prior art keywords
- decoded
- coded
- frame
- target
- decoding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/177—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/573—Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/58—Motion compensation with long-term prediction, i.e. the reference frame for a current frame not being the temporally closest one
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The application provides a video encoding method, a video decoding method, a video encoding device, a video decoding device, an electronic device, and a storage medium. The video encoding method comprises the following steps: acquiring a group of pictures to be encoded of a video to be encoded, wherein the group of pictures to be encoded comprises a key frame to be encoded and a plurality of non-key frames to be encoded; determining a target coding mode matching the group of pictures to be encoded, wherein the target coding mode indicates that, of the two fragments to be encoded into which the plurality of non-key frames to be encoded are divided, all the non-key frames to be encoded in the first fragment reference the key frame to be encoded, and each non-key frame to be encoded in the second fragment references at least one non-key frame preceding it; and encoding the group of pictures to be encoded according to the target coding mode. The method and device solve the problem in the related art that excessive coding and decoding delay results in poor timeliness of data transmission.
Description
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method and an apparatus for video encoding and decoding, an electronic device, and a storage medium.
Background
At present, for video processing scenarios that require timely data transmission, a low-delay coding mode can be used to encode the video. For example, in VR encoding, the main viewing angle uses a high-definition stream while the other viewing angles use low-definition streams. When the user turns, the streams for the other viewing angles need to be switched to the high-definition stream, so as to avoid the dizziness and similar discomfort, caused by a change of picture definition within the viewing angle (a high-definition picture being replaced by a low-definition one), that would harm the user's visual experience. To switch quickly from a low-definition stream to a high-definition stream, VR encoding requires low-latency encoding.
The low-delay coding used in the related art is generally LDP coding. When each P frame in a GOP is encoded, it must reference both the I frame and the previous P frame, so P frames are encoded and decoded serially. To randomly access any P frame other than the first, the decoder must wait for the previous frame to be decoded, which in turn requires all earlier frames; this introduces a delay of at least 3 frames, which is too long to meet the requirement of fast video stream switching.
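As a rough illustration (not part of the patent text), the serial reference chain of LDP decoding can be sketched in Python; the per-frame decode times used below are hypothetical parameters:

```python
# Hypothetical sketch of LDP random-access decode delay.
# In LDP, each P frame references the I frame and the previous P frame,
# so decoding the frame at POC k requires decoding I, P_1, ..., P_(k-1) first.

def ldp_random_access_delay(poc, t_i, t_p):
    """Delay to randomly access the frame at the given POC in an LDP GOP.

    poc -- picture order count within the GOP (0 is the I frame)
    t_i -- decode time of the I frame
    t_p -- decode time of one P frame
    """
    if poc == 0:
        return t_i
    # Must decode the I frame plus every preceding P frame, then this one.
    return t_i + poc * t_p

# Example: randomly accessing POC 3 (assumed times: I = 4 ms, P = 2 ms)
print(ldp_random_access_delay(3, 4.0, 2.0))  # serial chain: 4 + 3 * 2 = 10 ms
```

The key point of the sketch is that the delay grows linearly with the POC of the frame being accessed, which is exactly the serial bottleneck the patent's segmented scheme targets.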
For example, in VR encoding, decoding the last frame in a GOP requires all preceding frames to be decoded first. When the VR viewing angle switches, this excessive delay greatly degrades the user experience.
Therefore, the video encoding and decoding methods in the related art suffer from poor timeliness of data transmission caused by excessive coding and decoding delay.
Disclosure of Invention
The application provides a video encoding method, a video decoding method, a video encoding device, a video decoding device, an electronic device, and a storage medium, to at least solve the problem in the related art that excessive coding and decoding delay results in poor timeliness of data transmission.
According to an aspect of the embodiments of the present application, there is provided a video encoding method, including: acquiring a group of pictures to be encoded of a video to be encoded, wherein the group of pictures to be encoded comprises a key frame to be encoded and a plurality of non-key frames to be encoded; determining a target coding mode matching the group of pictures to be encoded, wherein the target coding mode indicates that, of the two fragments to be encoded into which the plurality of non-key frames to be encoded are divided, all the non-key frames to be encoded in the first fragment reference the key frame to be encoded, and each non-key frame to be encoded in the second fragment references at least one non-key frame preceding it; and encoding the group of pictures to be encoded according to the target coding mode.
According to another aspect of the embodiments of the present application, there is also provided a video decoding method, including: acquiring a group of pictures to be decoded of a video to be decoded, wherein the group of pictures to be decoded comprises a key frame to be decoded and a plurality of non-key frames to be decoded; determining a target reference relationship corresponding to the group of pictures to be decoded, wherein the target reference relationship indicates that, of the two fragments to be decoded into which the plurality of non-key frames to be decoded are divided, all the non-key frames to be decoded in the first fragment reference the key frame to be decoded, and each non-key frame to be decoded in the second fragment references at least one non-key frame preceding it; and decoding the group of pictures to be decoded according to the target reference relationship.
According to another aspect of the embodiments of the present application, there is also provided a video encoding apparatus, including: an acquisition unit configured to acquire a group of pictures to be encoded of a video to be encoded, wherein the group of pictures to be encoded comprises a key frame to be encoded and a plurality of non-key frames to be encoded; a determining unit configured to determine a target coding mode matching the group of pictures to be encoded, wherein the target coding mode indicates that, of the two fragments to be encoded into which the plurality of non-key frames to be encoded are divided, all the non-key frames to be encoded in the first fragment reference the key frame to be encoded, and each non-key frame to be encoded in the second fragment references at least one non-key frame preceding it; and an encoding unit configured to encode the group of pictures to be encoded according to the target coding mode.
Optionally, the encoding unit includes: a determining module, configured to determine the two segments to be encoded corresponding to the group of pictures to be encoded, where each segment to be encoded includes at least one non-key frame to be encoded; and a first encoding module, configured to encode each non-key frame to be encoded in each segment to be encoded according to the target coding mode.
Optionally, the determining module includes: a first determining submodule, configured to determine a target stage number according to a target delay time and a target codec time, where the target delay time is the maximum allowed delay time, the target codec time is the time to encode and decode one video frame (comprising both the encoding time and the decoding time), and the target stage number is the number of non-key frames that can be encoded and decoded within the target delay time; and a second determining submodule, configured to determine the two segments to be encoded corresponding to the group of pictures to be encoded according to the target stage number, where the number of non-key frames to be encoded contained in the second segment to be encoded is less than or equal to the target stage number minus 1.
Optionally, the first determining submodule includes: a first determining subunit, configured to determine a target time difference between the target delay time and a first codec time, where the first codec time is the codec time of a key frame; and a second determining subunit, configured to determine the quotient of the target time difference and a second codec time as the target stage number, where the second codec time is the codec time of a non-key frame, and the target codec time includes the first codec time and the second codec time.
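The computation just described can be sketched as follows. The function names and the numeric codec times are hypothetical, and integer (floor) division is assumed for "the quotient":

```python
def target_stage_number(max_delay, t_key, t_nonkey):
    # Quotient of (delay budget minus the key-frame codec time)
    # and the non-key-frame codec time.
    return int((max_delay - t_key) // t_nonkey)

def split_segments(num_nonkey, stage):
    # The second segment holds at most (stage - 1) non-key frames;
    # the first segment takes the remainder.
    second = min(max(stage - 1, 0), num_nonkey)
    first = num_nonkey - second
    return first, second

# Hypothetical numbers: 100 ms delay budget, t_key = 20 ms, t_nonkey = 10 ms
stage = target_stage_number(100, 20, 10)   # (100 - 20) // 10 = 8
first, second = split_segments(9, stage)   # 9 non-key frames in the GOP
print(stage, first, second)                # 8 frames fit; segments of 2 and 7
```

Under these assumed numbers, decoding the key frame plus up to 8 non-key frames fits inside the 100 ms budget, so the second segment is capped at 7 frames.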
Optionally, the encoding unit includes: a second encoding module, configured to perform intra-frame encoding on the current video frame to be encoded when it is the key frame to be encoded; a third encoding module, configured to, when the current video frame to be encoded belongs to the first segment to be encoded, perform target encoding on it according to the target coding mode, using the key frame to be encoded as its reference video frame; and a fourth encoding module, configured to, when the current video frame to be encoded belongs to the second segment to be encoded, perform target encoding on it according to the target coding mode, using a target reference video frame as its reference video frame, where the target reference video frame includes at least one non-key frame preceding the current video frame to be encoded. The target encoding is one of: inter-frame encoding, or intra-frame encoding combined with inter-frame encoding.
Optionally, the fourth encoding module comprises: a third determining submodule, configured to determine, according to the target encoding mode, the key frame to be encoded and at least one non-key frame located before the current video frame to be encoded as the target reference video frame corresponding to the current video frame to be encoded, when the number of reference video frames corresponding to the current video frame to be encoded is multiple; and the coding submodule is used for performing target coding on the current video frame to be coded by taking the target reference video frame as a reference video frame of the current video frame to be coded.
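One possible realization of the per-segment reference selection described above, as a sketch. Note the second segment referencing exactly the key frame plus the immediately preceding non-key frame is an assumption for illustration; the scheme only requires "at least one preceding non-key frame":

```python
def reference_pocs(poc, first_segment_len):
    """Return the POCs a frame references under the target coding mode.

    poc               -- picture order count (0 = key frame)
    first_segment_len -- number of non-key frames in the first segment
    """
    if poc == 0:
        return []           # key frame: intra coded, no references
    if poc <= first_segment_len:
        return [0]          # first segment: reference the key frame only
    return [0, poc - 1]     # second segment: key frame + previous frame (assumed)

# GOP with 5 non-key frames, of which the first segment holds 3
for poc in range(6):
    print(poc, reference_pocs(poc, 3))
```

Because every frame in the first segment depends only on POC 0, those frames can be encoded or decoded in parallel once the key frame is available, which is the basis of the delay reduction claimed elsewhere in the application.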
Optionally, the encoding unit includes: and the fifth coding module is used for coding all the non-key frames to be coded in the first section to be coded in the group of pictures to be coded in parallel according to the target coding mode.
Optionally, the obtaining unit includes: the image group to be coded of the video to be coded is obtained under the condition that a main view angle area of a target object in a panoramic video is switched from a first view angle area to a second view angle area, wherein the video to be coded is a part of the panoramic video to be coded, which corresponds to the main view angle area, the image group to be coded is an image group in which a first video frame after view angle switching is positioned, definition corresponding to the main view angle area is first definition, definition corresponding to other areas except the main view angle area in the panoramic video is second definition, and the first definition is higher than the second definition.
According to still another aspect of the embodiments of the present application, there is also provided a video decoding apparatus, including: an acquisition unit configured to acquire a group of pictures to be decoded of a video to be decoded, wherein the group of pictures to be decoded comprises a key frame to be decoded and a plurality of non-key frames to be decoded; a determining unit configured to determine a target reference relationship corresponding to the group of pictures to be decoded, wherein the target reference relationship indicates that, of the two fragments to be decoded into which the plurality of non-key frames to be decoded are divided, all the non-key frames to be decoded in the first fragment reference the key frame to be decoded, and each non-key frame to be decoded in the second fragment references at least one non-key frame preceding it; and a decoding unit configured to decode the group of pictures to be decoded according to the target reference relationship.
Optionally, the decoding unit includes: a first decoding module, configured to perform intra-frame decoding on the key frame to be decoded when the current video frame to be decoded is the key frame to be decoded; a second decoding module, configured to, when the current video frame to be decoded belongs to the first segment to be decoded, perform target decoding on it according to the target reference relationship, using the key frame to be decoded as its reference video frame; and a third decoding module, configured to, when the current video frame to be decoded belongs to the second segment to be decoded, perform target decoding on it according to the target reference relationship, using a target reference video frame as its reference video frame, where the target reference video frame includes at least one non-key frame preceding the current video frame to be decoded. The target decoding is one of: inter-frame decoding, or intra-frame decoding combined with inter-frame decoding.
Optionally, the third decoding module includes: a determining submodule, configured to, when there are multiple reference video frames corresponding to the current video frame to be decoded, determine the key frame to be decoded and at least one non-key frame preceding the current video frame to be decoded as the target reference video frame corresponding to that frame, according to the target reference relationship; and a decoding submodule, configured to perform target decoding on the current video frame to be decoded, using the target reference video frame as its reference video frame.
Optionally, the decoding unit includes: and the fourth decoding module is used for decoding all the non-key frames to be decoded in the first segment to be decoded in the image group to be decoded in parallel according to the target reference relation.
Optionally, the obtaining unit includes: the image group to be decoded of the video to be decoded is acquired under the condition that a main view angle area of a target object in the panoramic video is switched from a first view angle area to a second view angle area, wherein the video to be decoded is a part of the panoramic video to be decoded, which corresponds to the main view angle area, the image group to be decoded is an image group in which a first video frame after view angle switching is positioned, the definition corresponding to the main view angle area is a first definition, the definition corresponding to other areas except the main view angle area in the panoramic video is a second definition, and the first definition is higher than the second definition.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with each other through the communication bus; wherein the memory is used for storing the computer program; a processor for performing the method steps in any of the above embodiments by running the computer program stored on the memory.
According to a further aspect of the embodiments of the present application, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to perform the method steps of any of the above embodiments when the computer program is executed.
In the embodiments of the application, a segmented coding and decoding mode is adopted: a group of pictures to be encoded of a video to be encoded is acquired, the group comprising a key frame to be encoded and a plurality of non-key frames to be encoded; a target coding mode matching the group is determined, indicating that, of the two segments into which the non-key frames are divided, all non-key frames in the first segment reference the key frame, and each non-key frame in the second segment references at least one non-key frame preceding it; and the group is encoded according to the target coding mode. Because every non-key frame in the first segment references only the key frame, those frames can be encoded and decoded in parallel, while a non-key frame in the second segment references at least one earlier non-key frame. To decode any non-key frame, the decoder therefore waits at most for the decoding of the key frame, the decoding of the last non-key frame of the first segment, and the decoding of the non-key frames of the second segment that precede the target frame. This shortens the wait compared with a fully serial reference chain, reducing coding and decoding delay and thereby solving the problem of poor data transmission timeliness caused by excessive delay.
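A back-of-the-envelope comparison of worst-case random-access decode delay under the two schemes. The per-frame decode times are hypothetical, and the segmented figure assumes the whole first segment decodes in parallel once the key frame is available:

```python
def ldp_worst_delay(num_nonkey, t_i, t_p):
    # Fully serial chain: the last P frame waits for the I frame
    # and every earlier P frame.
    return t_i + num_nonkey * t_p

def segmented_worst_delay(second_len, t_i, t_p):
    # Key frame, then the whole first segment in parallel
    # (one non-key decode time), then the second segment serially.
    return t_i + t_p + second_len * t_p

# 9 non-key frames; hypothetical times t_i = 20 ms, t_p = 10 ms.
print(ldp_worst_delay(9, 20, 10))        # 20 + 9 * 10 = 110 ms
# With 6 frames in the first segment and 3 in the second:
print(segmented_worst_delay(3, 20, 10))  # 20 + 10 + 3 * 10 = 60 ms
```

The longer the first (parallel) segment relative to the second (serial) one, the larger the saving over LDP's fully serial chain.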
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 is a schematic diagram of a hardware environment for an alternative video encoding method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of an alternative video encoding method according to an embodiment of the present application;
FIG. 3 is a diagram of an alternative LDP coding mode;
FIG. 4 is a schematic diagram of an alternative VR perspective in accordance with embodiments of the present application;
FIG. 5 is a schematic diagram of an alternative video encoding method according to an embodiment of the present application;
FIG. 6 is a flow chart illustrating an alternative video decoding method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an alternative video encoding and decoding method according to an embodiment of the present application;
FIG. 8 is a flow chart of an alternative video encoding and decoding method according to an embodiment of the present application;
FIG. 9 is a block diagram of an alternative video encoding apparatus according to an embodiment of the present application;
fig. 10 is a block diagram of an alternative video encoding apparatus according to an embodiment of the present application;
fig. 11 is a block diagram of an alternative electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the embodiments of the present application better understood, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, partial nouns or terms appearing in the description of the embodiments of the present application are applicable to the following explanations:
1. video coding: the method is a method for converting a file in an original video format into a file in another video format by a compression technology, and common video coding and decoding standards are h.264, h.265, AVS, AV1 and the like.
2. Delay: an important metric in network transmission that measures the time required for data to travel from one endpoint to another, usually expressed in milliseconds or seconds.
3. Coding delay: the delay produced by the encoding process, i.e., the time from a video frame being input to the bitstream being produced after encoding completes.
4. Coded frame types: coded frames are generally divided into 3 types. An I frame (intra-coded frame), also called a key frame, serves as a random access point in the video stream; it is coded with intra-frame prediction (intra coding) without referencing other frames, and generally has high coding quality but low compression efficiency. A P frame (predictive coded frame) is coded with reference to a preceding I frame or other preceding P frames, using inter-frame prediction or a combination of intra-frame and inter-frame prediction, and has high compression efficiency. A B frame (bidirectional predictive coded frame) can be predictively coded with reference to both preceding and following frames, and has the highest compression efficiency.
5. GOP (Group of Pictures): in video coding, a GOP is a sequence of multiple consecutive coded frames used to aid random access during decoding; typically each GOP begins with an I frame.
6. POC (Picture Order Count): represents the display order of the source video frames when encoding video.
7. LDP (Low Delay P) coding: the first frame in each GOP is encoded as an I frame and all subsequent frames as P frames, and each P frame is encoded with reference only to pictures that precede it in play order. By avoiding backward references, the coding/decoding order matches the display order, which reduces coding and decoding delay. Besides the LDP configuration, video coding also offers All-Intra (all I frame) and Random-Access coding configurations.
8. RTC (Real-Time Communications): typical applications include live streaming, real-time audio and video calls, video conferencing, and interactive online education.
9. VR (Virtual Reality): a technology that combines computer graphics systems with various display and control interface devices to provide an immersive experience in an interactive three-dimensional environment generated on a computer.
According to an aspect of an embodiment of the present application, there is provided a video encoding method. Alternatively, in the present embodiment, the video encoding method described above may be applied to a hardware environment formed by an encoding end (encoding device) 102, a decoding end (decoding device) 104, and a playing device 106 as shown in fig. 1. As shown in fig. 1, the encoding end 102 is connected to the decoding end 104 through a network, and a database may be provided on the encoding end 102 (and/or the decoding end 104) or independent of the encoding end 102 (and/or the decoding end 104) for providing a data storage service for the encoding end 102 (and/or the decoding end 104). The decoding end 104 and the playing device 106 may be two devices that are independently arranged, or may be the same device, which is not limited in this embodiment.
As shown in fig. 1, the encoding end 102 may be configured to encode an input video to be transmitted (or a video frame in the video to be transmitted), obtain a corresponding video code stream, and transmit the video code stream to the decoding end 104 through a network; the decoding end 104 may be configured to decode the received video code stream to obtain a corresponding video (or a video frame), and play the obtained video (or the video frame) through the playing device 106.
The encoding end 102 and the decoding end 104 may be terminal devices or servers, and may be, but are not limited to, at least one of the following: a PC, a cell phone, a tablet, a VR device, etc. The video encoding method of the embodiment of the present application may be executed by the encoding end 102, where the encoding end 102 may be a terminal device or a server; the method may also be executed by a client installed on the terminal device.
Taking the video encoding method in the present embodiment executed by the encoding end 102 as an example, fig. 2 is a schematic flowchart of an alternative video encoding method according to an embodiment of the present application, and as shown in fig. 2, the flowchart of the method may include the following steps:
step S202, acquiring a group of images to be coded of a video to be coded, wherein the group of images to be coded comprises a key frame to be coded and a plurality of non-key frames to be coded.
The video encoding method in this embodiment may be applied to scenes with video transmission requirements, such as live broadcast, RTC, VR, and the like, where the video may be live broadcast video, real-time audio and video, panoramic video, and the like, and this is not limited in this embodiment.
The encoding device may encode a video to be encoded, and the video to be encoded may be a video to be transmitted to the decoding end and played by the playing device. The video to be encoded may comprise a plurality of groups of pictures, each group of pictures may comprise a key frame and a plurality of non-key frames, and the POC of the video frames within a group of pictures may be numbered consecutively starting from 0. The video frame with POC = 0 may be the key frame, and the remaining video frames are non-key frames. The group of images to be encoded is the group of pictures in the video to be encoded that is currently to be encoded, and may comprise a key frame to be encoded and a plurality of non-key frames to be encoded.
For example, a current group of pictures to be encoded includes 9 video frames, and according to the playing order of the video frames, the POC of the 9 video frames is: 0,1,2,3,4,5,6,7,8.
Step S204, determining a target coding mode matched with the image group to be coded, wherein the target coding mode is used for indicating that in two segments to be coded, into which a plurality of non-key frames to be coded are divided, all non-key frames to be coded in a first segment to be coded refer to the key frames to be coded, and each non-key frame to be coded in a second segment to be coded refers to at least one non-key frame positioned before each non-key frame to be coded.
In the related art, when encoding is performed with LDP or LDB (Low Delay B), since a frame of the GOP (other than the first and second frames) references both the I frame and the picture of the previous frame, the frames it references must be decoded before that frame can be decoded, so there is a delay of at least 3 frames. For the last frame in a GOP, all preceding frames in the GOP must be fully decoded before that frame can be decoded.
For example, the LDP coding scheme may be as shown in fig. 3. In a GOP, each P frame is encoded with reference to both the I frame and its previous P frame; to decode any frame, the I frame and all P frames before it must already have been decoded, so it is necessary to wait for the decoding time of the I frame and several P frames.
For scenes with stricter delay requirements, the coding delay of LDP and LDB is too large, which greatly degrades the use experience. For example, for a VR scene, as shown in fig. 4, the main view area of the user is a high-definition stream, and the other views are low-definition streams. The main view area may be switched as the user's view angle moves; for example, when the user's view angle moves to the left, the main view area moves to the left. Switching the view angle at any time requires that the current view be quickly converted to a high-definition video stream. With the LDP coding method, if the random switching happens to occur at a later frame of the GOP, it may be necessary to wait about 1.2 to 1.5 seconds, which exceeds the time that VR users can tolerate and degrades the use experience.
In this embodiment, for a group of pictures to be encoded, the encoding device may determine a target encoding mode matching the group of pictures to be encoded, or a target encoding mode matching the video to be encoded. The target coding mode is used for indicating that in two segments to be coded into which a plurality of non-key frames to be coded are divided, all the non-key frames to be coded in the first segment to be coded refer to the key frames to be coded, and each non-key frame to be coded in the second segment to be coded refers to at least one non-key frame positioned in front of the non-key frame to be coded.
By adopting this segment division mode, frames can be coded and decoded at a higher speed when the view is switched. Because human dynamic vision is insensitive to the details of fast-moving objects, the frames in the parallel segment (the first segment) can be displayed at a higher speed without the detail errors being subjectively perceived; then, as playback switches to the serial frames, the inter-frame error is further reduced, and better image quality is obtained by the time the view has stabilized and the eye is again sensitive to details.
It should be noted that the target coding mode describes a reference relationship between video frames, and may have different expressions, for example, the target coding mode is used to indicate that, in two segments to be coded into which a plurality of non-key frames to be coded are divided, all the non-key frames to be coded in a first segment to be coded refer to key frames to be coded, a first non-key frame to be coded in a second segment to be coded refers to key frames to be coded, and other non-key frames to be coded except for the first non-key frame to be coded refer to at least one non-key frame in the second segment to be coded that is located before the other non-key frames to be coded.
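As a hedged illustration of the reference relationship described above (the function name and the 9-frame GOP with a 4-frame first segment are assumptions for this sketch, not taken from the patent):

```python
def reference_poc(poc, first_segment_len):
    """Return the POC of the single reference frame for a frame in the GOP.

    POC 0 is the key (I) frame; P-frames with POC 1..first_segment_len form
    the first (parallel) segment and all reference the key frame; later
    P-frames form the second (serial) segment, each referencing the frame
    immediately before it.
    """
    if poc == 0:
        return None          # key frame: intra-coded, no reference
    if poc <= first_segment_len:
        return 0             # first segment: reference the key frame
    return poc - 1           # second segment: reference the previous frame

# GOP of 9 frames (POC 0..8), first segment holding 4 P-frames
refs = {poc: reference_poc(poc, 4) for poc in range(9)}
# refs == {0: None, 1: 0, 2: 0, 3: 0, 4: 0, 5: 4, 6: 5, 7: 6, 8: 7}
```

Note that unlike the chain reference of LDP, POC 1 through 4 share one reference and POC 5 onward form a short serial chain.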
The number of non-key frames to be encoded contained in a slice to be encoded may be an integer greater than or equal to 1, and for the first slice to be encoded, the number of non-key frames to be encoded contained therein may be an integer greater than 1. The number of non-key frames to be encoded included in each segment to be encoded may be the same or different, which is not specifically limited in this embodiment.
The target encoding mode may be indicated by configuration information, which may be an encoding rule corresponding to the target encoding mode; the encoding rule may include, but is not limited to: the video frame segmentation rules, the video frame reference rules, and the video frame encoding scheme (e.g., intra-frame encoding, inter-frame encoding, intra-frame encoding combined with inter-frame encoding), and may also include other encoding rules. The configuration information may also be encoding indication information for indicating the reference relationship between video frames, that is, which video frame each video frame refers to; the encoding indication information may also indicate the encoding manner of each video frame. This is not limited in this embodiment.
And step S206, coding the image group to be coded according to the target coding mode.
After the target coding mode is determined, the coding device may code the image group to be coded according to the target coding mode to obtain a corresponding video code stream. If the target coding mode is indicated by the coding rule, the coding device can determine the reference video frame of each non-key frame to be coded according to the target coding rule. If the target coding mode is indicated by the coding indication information, the coding device can determine the reference video frame of each non-key frame to be coded according to the coding indication information. The encoding device may encode each non-key frame to be encoded according to the reference video frame of each non-key frame to be encoded and according to the encoding mode corresponding to each non-key frame to be encoded.
Each frame in the encoded video code stream may contain information of which frames (i.e., reference relationships) are referred to when the frame is encoded, i.e., indication information for indicating reference relationships between video frames. The coding device can transmit the obtained video code stream to the decoding device through the network.
It should be noted that besides video frames, the video to be encoded may also contain other data, such as corresponding audio data, subtitle information, and so on. For other data, the encoding device may perform data compression in a certain data compression manner to obtain a corresponding data code stream, and transmit the data code stream to the decoding end through a network, where the data compression manner and the transmission manner (for example, transmission together with the video code stream, independent transmission, and the like) may be configured as needed, and this is not limited in this embodiment.
In this embodiment, by using a segmented encoding/decoding manner, a non-key frame of a group of pictures is divided into two segments, all non-key frames in a first segment refer to a key frame, each non-key frame in a second segment refers to at least one non-key frame before the non-key frame, and multiple video frames of the group of pictures are no longer in a chain reference relationship, so that encoding/decoding delay can be reduced, and encoding/decoding speed can be increased.
For example, for a VR scene, if the random switching happens to occur at a later frame, the waiting time includes only the coding and decoding time of the I frame of the group of pictures where that frame is located and of the P frames that have a reference relationship (direct or indirect) with that frame. Compared with waiting for the coding and decoding time of all video frames located before that frame in the group of pictures, this shortens the waiting time and quickly converts the current view angle to the high-definition video stream, thereby improving the use experience of the user.
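The reduction in waiting time can be sketched by counting the frames that must already be decoded before a given frame can be decoded, under the same assumed 9-frame GOP with a 4-frame first segment (the function names are illustrative, not from the patent):

```python
def frames_needed_ldp(poc):
    """Chain reference (LDP): the I frame and every P frame before `poc`
    must be decoded first."""
    return poc

def frames_needed_segmented(poc, first_segment_len):
    """Segmented reference: count only frames on the reference chain of `poc`."""
    if poc == 0:
        return 0                                 # the I frame references nothing
    if poc <= first_segment_len:
        return 1                                 # only the I frame
    # I frame, last first-segment frame, plus the serial chain up to poc - 1
    return 2 + (poc - first_segment_len - 1)

# Last frame of a 9-frame GOP (POC 8), first segment of 4 P-frames:
# LDP needs 8 previously decoded frames, the segmented mode only 5.
```

For a frame inside the first segment the gap is larger still: only the I frame is needed, versus the whole preceding chain under LDP.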
Through the steps S202 to S206, a group of images to be encoded of a video to be encoded is obtained, where the group of images to be encoded includes a key frame to be encoded and a plurality of non-key frames to be encoded; a target coding mode matching the image group to be coded is determined, where the target coding mode is used to indicate that, in the two fragments to be coded into which the plurality of non-key frames to be coded are divided, all non-key frames to be coded in the first fragment to be coded refer to the key frame to be coded, and each non-key frame to be coded in the second fragment to be coded refers to at least one non-key frame located before it; and the image group to be encoded is encoded according to the target encoding mode. This solves the problem of poor data transmission timeliness caused by excessive coding and decoding delay in the video coding and decoding modes of the related art, reduces the coding and decoding delay, and improves the coding and decoding efficiency.
As an alternative embodiment, encoding the group of images to be encoded according to the target encoding mode includes:
s11, determining two to-be-encoded segments corresponding to the to-be-encoded image group, wherein each to-be-encoded segment comprises at least one to-be-encoded non-key frame;
and S12, according to the target coding mode, coding each non-key frame to be coded in each section to be coded.
Based on the target encoding mode, the encoding device may first determine two to-be-encoded slices corresponding to the group of images to be encoded, and each to-be-encoded slice may include at least one non-key frame to be encoded. The number of the non-key frames to be encoded contained in different segments to be encoded may be the same or different, which is not limited in this embodiment.
According to the reference relationship indicated by the target coding mode, the coding device may first determine a reference video frame of each non-key frame to be coded in each segment to be coded, and then, according to the reference video frame corresponding to each non-key frame to be coded, the coding device may code each non-key frame to be coded. The reference video frames of different video frames to be encoded may be the same or different.
By the embodiment, the accuracy and reliability of video coding can be improved by grouping the non-key frames of the image group and coding each non-key frame according to the grouping.
As an alternative embodiment, determining two segments to be encoded corresponding to a group of images to be encoded comprises:
s21, determining a target stage number according to the target delay time and the target coding and decoding time, wherein the target delay time is the maximum allowed delay time, the target coding and decoding time is the coding and decoding time of one video frame (comprising both the time for encoding and the time for decoding), and the target stage number is the number of non-key frames allowed to be coded and decoded within the target delay time;
and S22, determining the two to-be-encoded segments corresponding to the to-be-encoded image group according to the target stage number, wherein the number of to-be-encoded non-key frames contained in the second to-be-encoded segment is less than or equal to the target stage number minus 1.
For the image group to be encoded, the number of the non-key frames to be encoded contained in each segment to be encoded may be pre-configured, and the encoding device may read the segment configuration information corresponding to the video to be encoded, and determine the non-key frames to be encoded contained in each segment to be encoded according to the segment configuration information, thereby determining two segments to be encoded corresponding to the image group to be encoded.
Optionally, the two segments to be encoded corresponding to the image group to be encoded may also be estimated according to the delay requirement (target delay time) and the time taken to encode and decode one frame (target coding and decoding time). When encoding a frame in a GOP, the encoding device may estimate the maximum number of delay frames the user can endure according to the delay requirement and the per-frame coding and decoding time, calculate the minimum parallel frame number or the maximum serial frame number (target stage number) from the maximum delay frame number, and then divide the non-key frames of the first segment to be encoded according to the minimum parallel frame number, or divide the non-key frames of the second segment to be encoded according to the maximum serial frame number, thereby determining the two segments to be encoded corresponding to the image group to be encoded.
The latency requirement may be a maximum decoding latency (maximum allowed delay time) tolerable by a user, for example, for a VR scene (or similar panoramic video scene), the latency requirement refers to: after switching views, the user can tolerate the time required to switch from the current view to the high definition video stream. The time delay requirement may be manually input (manually input by a user, a preset default value according to an empirical value, etc.), or may be calculated according to object characteristics of a certain object (for example, characteristics for characterizing the object state), or may be calculated according to object characteristics of a plurality of objects, which is not limited in this embodiment.
The encoding and decoding time of a frame is the encoding and decoding time of a video frame, and the encoding and decoding time comprises the encoding time and the decoding time. The time taken to encode and decode a frame is an estimate, which is a statistical estimate of the time taken to encode and decode a video frame (image frame, e.g., I frame, P frame) on the premise of continuously transmitting a video stream. The time for encoding and decoding a frame may be manually input (manually input by a user, preset by a relevant person according to an empirical value, etc.), or may be dynamically adjusted based on a statistical value in the encoding and decoding process, which is not limited in this embodiment.
The encoding apparatus may estimate the maximum number of delay frames that the user can endure according to the delay requirement and the time for encoding and decoding one frame, that is, the number of video frames allowed to be encoded and decoded within the target delay time, which may be the number of all types of video frames (e.g., I-frames and P-frames), or the number of specific types of video frames (e.g., P-frames).
According to the maximum delay frame number, the encoding device may determine a target stage number corresponding to the image group to be encoded, that is, the number of non-key frames allowed to be coded and decoded within the target delay time. Since the first non-key frame of the second segment to be encoded references the last non-key frame of the first segment to be encoded, the target stage number minus 1 is the maximum number of non-key frames the second segment to be encoded may contain while still satisfying the maximum delay frame number. If the maximum delay frame number counts both key frames and non-key frames, the target stage number is the maximum delay frame number minus the number of key frames; if the maximum delay frame number counts only non-key frames, the target stage number is the maximum delay frame number.
According to the target stage number, the encoding device may determine the two to-be-encoded segments corresponding to the to-be-encoded image group, where the number of to-be-encoded non-key frames contained in the second to-be-encoded segment is at most the target stage number minus 1, and the number of to-be-encoded non-key frames contained in the first to-be-encoded segment is the total number of to-be-encoded non-key frames in the image group minus the number contained in the second segment.
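A minimal sketch of this division, assuming the second segment is simply clamped to the target stage number minus 1 (the clamping behavior and the function name are assumptions, not specified by the patent):

```python
def split_gop(num_p_frames, target_stage_number):
    """Split the P-frames of a GOP into (first, second) segment sizes.

    The second (serial) segment holds at most target_stage_number - 1
    P-frames; the first (parallel) segment holds the remainder.
    """
    second = min(target_stage_number - 1, num_p_frames)
    first = num_p_frames - second
    return first, second

# split_gop(8, 5) -> (4, 4), matching the example of fig. 5
```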
It should be noted that, because coding and decoding times may fluctuate, in this embodiment the target stage number is dynamically estimated according to the statistical delay, so as to adjust the segmentation of the non-key frames in the group of pictures.
For example, as shown in fig. 5, the target stage number is 5; correspondingly, the minimum parallel frame number is 4 and the maximum serial frame number is 5. The group of pictures includes 9 video frames, namely one I frame and 8 P frames. The 8 P frames are divided into 2 segments: the first segment contains 4 P frames, and the second segment contains 4 P frames.
By this embodiment, the target stage number (the maximum number of delayed P frames that the user can endure) is estimated according to the delay requirement and the time for coding and decoding one frame, so the coding and decoding speed can be improved, the coding and decoding delay can meet the delay requirement, and the use experience of the user is improved.
As an alternative embodiment, determining the target stage number according to the target delay time and the target codec time includes:
s31, determining a target time difference value between the target delay time and a first coding and decoding time, wherein the first coding and decoding time is the coding and decoding time of a key frame;
and S32, determining the quotient of the target time difference and the second coding and decoding time as the target stage number, wherein the second coding and decoding time is the coding and decoding time of a non-key frame, and the target coding and decoding time comprises the first coding and decoding time and the second coding and decoding time.
Due to different encoding and decoding modes, the encoding and decoding time of the key frame (e.g., I frame) and the encoding and decoding time of the non-key frame (e.g., P frame) of a group of pictures are different, and the target encoding and decoding time may include: a first codec time corresponding to a key frame of a group of pictures and a second codec time corresponding to a non-key frame of a group of pictures. The encoding apparatus may determine a target delay time, a first codec time, and a second codec time, and determine a target stage number according to the first codec time and the second codec time.
Optionally, in this embodiment, the encoding device may calculate a target time difference between the target delay time and the first coding and decoding time, where the target time difference is the maximum delay time allowed for coding and decoding the non-key frames; and determine the quotient of the target time difference and the second coding and decoding time as the target stage number (the maximum delay frame number of P frames).
For example, when the tolerable delay is t, the time required for encoding and decoding an I frame is t_I, the time required for encoding and decoding a P frame is t_P, and the GOP contains n P frames, the adaptive stage number L (the adaptive maximum serial frame number, which serves as the target stage number) can be calculated as: L = (t - t_I) / t_P, and the adaptive minimum parallel frame number N is: N = n + 1 - (t - t_I) / t_P. The frames within a GOP are divided into two segments according to L: the first segment contains N P frames and the second segment contains (L - 1) P frames.
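A hedged numeric sketch of these formulas; the 120 ms / 20 ms / 20 ms timings are invented for illustration, and flooring the quotient to a whole frame count is an assumption:

```python
import math

def adaptive_stage_number(t, t_i, t_p, n):
    """Compute the adaptive stage number L and minimum parallel frame number N.

    t   : tolerable delay (maximum allowed delay time)
    t_i : time to encode and decode one I frame
    t_p : time to encode and decode one P frame
    n   : number of P frames in the GOP
    """
    L = math.floor((t - t_i) / t_p)   # adaptive maximum serial frame number
    N = n + 1 - L                     # first (parallel) segment size
    return L, N

# With t = 120 ms, t_i = 20 ms, t_p = 20 ms and n = 8 P frames:
# L = 5 and N = 4, reproducing the split of fig. 5
# (the second segment then holds L - 1 = 4 P frames).
```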
Optionally, the P frames of the first segment may refer to the I frame for encoding and decoding, and the P frames of the second segment may refer to the previous frame image for encoding and decoding, which is not described herein again.
According to the embodiment, the maximum delay frame number is determined according to the target delay time, the coding and decoding time of the I frame and the coding and decoding time of the P frame, so that the target stage number can be adaptively adjusted according to the delay and the calculation power (calculation capacity), the application delay requirement can be met, and the use experience of a user is ensured.
As an alternative embodiment, encoding the group of images to be encoded according to the target encoding mode includes:
s41, carrying out intra-frame coding on the current video frame to be coded under the condition that the current video frame to be coded is a key frame to be coded;
s42, under the condition that the current video frame to be coded belongs to the first segment to be coded, according to the target coding mode, using the key frame to be coded as the reference video frame of the current video frame to be coded, and carrying out target coding on the current video frame to be coded;
s43, under the condition that the current video frame to be coded belongs to the second segment to be coded, according to the target coding mode, taking the target reference video frame as the reference video frame of the current video frame to be coded, and carrying out target coding on the current video frame to be coded, wherein the target reference video frame comprises at least one non-key frame positioned in front of the current video frame to be coded;
wherein the target code is one of: and inter-frame coding, wherein the intra-frame coding is combined with the inter-frame coding.
If the current video frame to be coded is the key frame to be coded (the first video frame) in the image group to be coded, it can serve as a random access point in the video stream. The coding device may encode it using intra-frame prediction (intra-frame coding) without referring to other frames, obtaining the I frame (intra-coded frame) corresponding to the image group to be coded. I frames are typically encoded at higher quality.
And if the current video frame to be coded belongs to the first segment to be coded, the reference video frame of the current video frame to be coded is the key frame to be coded according to the target coding mode. The coding device can use the key frame to be coded as a reference video frame of the current video frame to be coded, and code the current video frame to be coded in an inter-frame prediction mode (inter-frame coding), or code the current video frame to be coded in a mode of combining intra-frame prediction and inter-frame prediction (intra-frame coding and inter-frame coding), so that the compression efficiency is improved.
And if the current video frame to be coded belongs to the second segment to be coded, according to the target coding mode, the reference video frame of the current video frame to be coded comprises at least one non-key frame positioned before the current video frame to be coded. The encoding device may first determine a target reference video frame corresponding to a video frame currently to be encoded. The target reference video frame may include: the at least one non-key frame located before the current video frame to be encoded may also include: and key frames to be coded.
The encoding device can use the target reference video frame as a reference video frame of the current video frame to be encoded, and perform interframe encoding on the video frame to be encoded, or perform intraframe encoding and interframe encoding on the current video frame to be encoded, so as to improve the compression efficiency.
According to the embodiment, the key frames of the image group and the non-key frames at different positions of each segment are coded in different coding modes, so that both the coding quality and the compression efficiency can be considered, and the resource utilization rate is improved.
As an alternative embodiment, according to the target encoding mode, the target reference video frame is used as a reference video frame of the current video frame to be encoded, and the target encoding of the current video frame to be encoded includes:
s51, under the condition that the number of reference video frames corresponding to the current video frame to be coded is multiple, determining a key frame to be coded and at least one non-key frame positioned before the current video frame to be coded as a target reference video frame corresponding to the current video frame to be coded according to a target coding mode;
and S52, taking the target reference video frame as the reference video frame of the current video frame to be coded, and carrying out target coding on the current video frame to be coded.
If the current video frame to be encoded allows multiple reference frames, that is, the number of reference video frames corresponding to it is more than one, then according to the target encoding mode the encoding apparatus may determine a plurality of reference video frames (target reference video frames) corresponding to the current video frame to be encoded, which may include: the key frame to be coded, and at least one non-key frame located before the current video frame in the image group to be coded, for example the non-key frame immediately preceding it.
For example, the key frame to be encoded and the previous non-key frame of the current video frame to be encoded may be determined as the target reference video frame corresponding to the current video frame to be encoded.
After determining the target reference video frame, the encoding device may use the target reference video frame as a reference video frame of the current video frame to be encoded, and perform inter-frame encoding or intra-frame encoding combined with inter-frame encoding on the current video frame to be encoded.
For example, as shown in fig. 5, if multiple reference frames are allowed, a frame in the second segment may refer to multiple preceding frames; for example, when the number of reference frames is 2, the frame with POC=6 in fig. 5 may refer to the frames with POC=0 and POC=4.
By this embodiment, when multiple reference frames are allowed, the plurality of reference video frames corresponding to the video frame to be coded are determined based on the segments of the image group, which ensures the reasonableness of the reference video frame determination.
As an alternative embodiment, encoding the group of images to be encoded according to the target encoding mode includes:
s61, according to the target coding mode, coding all non-key frames to be coded in the first segment to be coded in the group of pictures to be coded in parallel.
In the LDP coding mode, starting from the second P frame, each P frame references both the I frame and its preceding P frame, so P frames must be coded serially and coding is inefficient.
Optionally, in this embodiment, during encoding, a GOP is segmented according to a target level, and since a P frame is segmented and is no longer in a chain reference relationship, P frames at the same level in the GOP may be encoded and decoded in parallel, which greatly accelerates the encoding and decoding speed.
According to the target coding mode, all the non-key frames to be coded in the first segment to be coded refer to the key frames to be coded, so that the coding equipment can carry out parallel coding and decoding on all the non-key frames to be coded in the first segment to be coded, and the coding and decoding speed is increased.
For example, as shown in fig. 5, the first segment contains the 1 st P frame, the 2 nd P frame, the 3 rd P frame, and the 4 th P frame, and these 4P frames can be coded and decoded in parallel. The P frame coding and decoding uses a parallel mode, and the coding and decoding speed can be increased.
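Since the first-segment P frames share a single reference (the key frame), they have no mutual dependencies. A minimal sketch using a thread pool (the `encode_frame` call is a stand-in for a real encoder, not an actual codec API):

```python
from concurrent.futures import ThreadPoolExecutor

def encode_frame(poc, ref_poc):
    """Stand-in for a real per-frame encoder call (illustrative only)."""
    return f"P{poc}->ref{ref_poc}"

def encode_first_segment_parallel(first_segment_pocs):
    """Encode all first-segment P frames concurrently; each references
    only the key frame (POC 0), so no ordering constraints exist."""
    with ThreadPoolExecutor() as pool:
        # map preserves the submission order of the results
        return list(pool.map(lambda p: encode_frame(p, 0), first_segment_pocs))

# The 4 first-segment P frames of fig. 5 can be submitted simultaneously:
# encode_first_segment_parallel([1, 2, 3, 4])
```

Second-segment frames, by contrast, form a serial chain and must still be encoded one after another.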
By the embodiment, the coding and decoding speed can be increased by coding and decoding all the non-key frames to be coded in the first segment to be coded in parallel.
As an alternative embodiment, the obtaining of the group of images to be encoded of the video to be encoded includes:
s71, under the condition that a main view angle area of a target object in the panoramic video is switched from a first view angle area to a second view angle area, acquiring an image group to be coded of a video to be coded, wherein the video to be coded is a part of the panoramic video to be coded, which corresponds to the main view angle area, the image group to be coded is an image group in which a first video frame after view angle switching occurs, the definition corresponding to the main view angle area is a first definition, the definition corresponding to other areas except the main view angle area in the panoramic video is a second definition, and the first definition is higher than the second definition.
The encoding method in this embodiment may be applied to different video transmission scenes, for example, scenes of panoramic video transmission. For a panoramic video (e.g., a panoramic video in a VR scene), the definition corresponding to the main viewing angle area of the user may be configured as a first definition (high definition), and the definitions corresponding to the other areas except the main viewing angle area in the panoramic video may be configured as a second definition (low definition), where the first definition is higher than the second definition.
Before the view switch occurs, the main view area of the target object (user) is the first view area: high definition video is displayed in the first view area, and low definition video is displayed in the other areas of the panoramic video. At a certain moment, the view angle of the target object switches, and the main view area of the target object changes from the first view area to the second view area; the video code stream corresponding to the second view area then needs to be switched to the high definition code stream, so that high definition video is displayed in the second view area.
The video to be encoded may be the portion of the panoramic video to be encoded that is located in the main view angle region, and the definition corresponding to its video code stream is the first definition. For example, when the VR device detects that the user's view angle is switched at a certain moment, the video code stream located in the main view area needs to be quickly switched to the high definition code stream after the switch occurs. The encoder in the encoding device has a parameter that controls the quality of the video stream, and the definition of the video stream can be controlled by this parameter.
Through this embodiment, the definition corresponding to the video code stream is controlled according to the switching of the user's view angle area, which can increase the speed of conversion from the low definition stream to the high definition stream in the main view angle area and improve the user experience.
Optionally, in this embodiment, before acquiring the group of pictures to be encoded of the video to be encoded, the encoding device may receive target view information transmitted by a target device, where the target device is a device for acquiring view information of a target object viewing a panoramic video; according to the target view information, the encoding apparatus may determine that a main view region of a target object is switched from a first view region to a second view region in the panoramic video.
At the decoding end or the playing device end, a target device (e.g., a VR device) may acquire the view information of the target object viewing the panoramic video to obtain the target view information, and the target device or the playing device may transmit the target view information to the encoding device through a network. The target device, the encoding device, and the playing device may be the same device or different devices, which is not limited in this embodiment.
The encoding device may receive target view information transmitted by the target device, and determine, according to the target view information, that a main view area of a target object in the panoramic video is switched from a first view area to a second view area. The target view information may be area information of a main view area, or may be a position of a main viewpoint of a target object on a panoramic video (panoramic video frame). The encoding device may directly determine the main view area through the target view information, or may determine the main view area through a position of the main viewpoint on the panoramic video (panoramic video frame) and the range information of the main view area, which is not limited in this embodiment.
For example, VR glasses (or a mobile phone rendering a VR picture) may obtain the view angle information of the user, determine the position of the main viewpoint in the picture, and further determine the main view angle area of the user.
By the embodiment, the main visual angle area of the target object is determined by acquiring the visual angle information of the target object, so that the accuracy of determining the main visual angle area can be improved.
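A minimal sketch of deriving a main view area from a viewpoint position and a given area range (the function name, parameters, and clamping rule are assumptions; a real equirectangular panorama would also need to handle horizontal wrap-around):

```python
def main_view_region(viewpoint, view_width, view_height, pano_width, pano_height):
    # viewpoint: (x, y) position of the main viewpoint on the panoramic frame.
    # Returns (left, top, right, bottom) of the main view area, centered on
    # the viewpoint and clamped so it stays inside the panorama bounds.
    x, y = viewpoint
    left = max(0, min(x - view_width // 2, pano_width - view_width))
    top = max(0, min(y - view_height // 2, pano_height - view_height))
    return (left, top, left + view_width, top + view_height)
```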
Optionally, in this embodiment, for a video frame to be encoded in a group of pictures to be encoded, the encoding device may encode the video frame to be encoded to obtain a first video code stream, where a definition corresponding to the first video code stream is a first definition and may be a high-definition code stream. In order to ensure that a user can see the video pictures in the switched view angle area when the view angle is switched, the coding device can code the panoramic video frame to be coded of the panoramic video to be coded to obtain a second video code stream, wherein the definition corresponding to the second video code stream is the second definition and is a low-definition code stream. The encoding device can transmit the first video code stream and the second video code stream to the decoding device, so that the decoding device can render video frames (image frames and images) obtained by decoding the first video code stream into a main view angle area of the video frames obtained by decoding the second video code stream, and high-definition video pictures are displayed in the main view angle area.
The encoding process of the panoramic video frame to be encoded and the encoding process of the video frame to be encoded may be executed simultaneously (consuming a certain storage resource), or may be executed sequentially, which is not limited in this embodiment.
For example, even after the video picture displayed in a certain view angle region has been converted to high definition video, the low definition video stream can still be transmitted, because its transmission cost is low; if, during a later view angle conversion, the region corresponding to the current view angle is no longer within the main visual range, decoding can directly switch back to the low definition video stream.
Through this embodiment, transmitting the low definition code stream and the high definition code stream simultaneously can guarantee the completeness of the video information displayed to the user when the view angle is switched, improving the user experience.
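To illustrate the rendering step described above, here is a toy sketch that overlays a decoded high definition main-view patch onto the decoded low definition panorama (plain nested lists stand in for decoded images; the names and signature are assumptions):

```python
def render_main_view(low_def_frame, high_def_patch, main_view_offset):
    # low_def_frame: 2-D list of pixels for the full panorama (toy stand-in
    # for a frame decoded from the low definition stream).
    # high_def_patch: decoded main-view pixels from the high definition stream.
    # main_view_offset: (top, left) of the main view area within the panorama.
    top, left = main_view_offset
    for r, row in enumerate(high_def_patch):
        for c, px in enumerate(row):
            low_def_frame[top + r][left + c] = px  # overwrite with high-def pixel
    return low_def_frame
```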
According to another aspect of the embodiment of the present application, there is also provided a video decoding method. Alternatively, in this embodiment, the video decoding method may be applied to a hardware environment formed by the encoding end 102, the decoding end 104 and the playing device 106 as shown in fig. 1. The description is already given and will not be repeated herein.
The video decoding method of the embodiment of the present application may be executed by the decoding end 104, where the decoding end 104 may be a terminal device or a server. The video decoding method according to the embodiment of the present application may also be executed by a client installed on the terminal device. Taking the video decoding method in the present embodiment executed by the decoding end 104 as an example, fig. 6 is a schematic flowchart of an alternative video decoding method according to an embodiment of the present application, and as shown in fig. 6, the flowchart of the method may include the following steps:
step S602, acquiring a group of pictures to be decoded of a video to be decoded, wherein the group of pictures to be decoded includes a key frame to be decoded and a plurality of non-key frames to be decoded.
The video decoding method in this embodiment may be used to decode a video code stream obtained by encoding a group of pictures to be encoded by any one of the above video encoding methods. The decoding device can obtain a video code stream transmitted by the device through a network, namely, a group of pictures to be decoded of a video to be decoded, wherein the group of pictures to be decoded includes a key frame to be decoded and a plurality of non-key frames to be decoded.
Step S604, determining a target reference relationship corresponding to the group of pictures to be decoded, where the target reference relationship is used to indicate that, in two fragments to be decoded into which a plurality of non-key frames to be decoded are divided, all the non-key frames to be decoded in a first fragment to be decoded refer to the key frames to be decoded, and each non-key frame to be decoded in a second fragment to be decoded refers to at least one non-key frame located before each non-key frame to be decoded.
The decoding device may determine a target reference relationship corresponding to the group of pictures to be decoded, the target reference relationship being used to indicate video frames referenced by respective video frames to be decoded in the group of pictures to be decoded. The indication information of the target reference relationship may be carried in a video code stream corresponding to each video frame to be decoded in the group of pictures to be decoded.
The target reference relationship corresponds to a target coding mode adopted by a coding side, and the indicated reference relationship is as follows: in two fragments to be decoded into which a plurality of non-key frames to be decoded are divided, all the non-key frames to be decoded in the first fragment to be decoded refer to the key frames to be decoded, and each non-key frame to be decoded in the second fragment to be decoded refers to at least one non-key frame positioned before each non-key frame to be decoded.
And step S606, decoding the image group to be decoded according to the target reference relationship.
According to the target reference relationship, the decoding device can decode each video frame to be decoded in the group of pictures to be decoded. For the key frame to be decoded in the group of pictures to be decoded, the decoding device can perform intra-frame decoding on the key frame to be decoded to obtain the corresponding video frame; for a non-key frame to be decoded in the group of pictures to be decoded, the decoding device may determine the reference image frame of the non-key frame to be decoded based on the target reference relationship and perform inter-frame decoding according to the corresponding reference image frame, or obtain the corresponding video frame by combining intra-frame decoding with inter-frame decoding. The related art may be referred to for the decoding process of the video frame to be decoded, which is not limited in this embodiment.
Through the steps S602 to S606, a group of pictures to be decoded of a video to be decoded is obtained, where the group of pictures to be decoded includes a key frame to be decoded and a plurality of non-key frames to be decoded; determining a target reference relationship corresponding to a group of pictures to be decoded, wherein the target reference relationship is used for indicating that in two fragments to be decoded into which a plurality of non-key frames to be decoded are divided, all the non-key frames to be decoded in a first fragment to be decoded refer to the key frames to be decoded, and each non-key frame to be decoded in a second fragment to be decoded refers to at least one non-key frame positioned before each non-key frame to be decoded; the image group to be decoded is decoded according to the target reference relationship, so that the problem of poor data transmission timeliness caused by overlarge encoding and decoding time delay in a video encoding and decoding mode in the related technology is solved, the encoding and decoding time delay is reduced, and the encoding and decoding efficiency is improved.
As an alternative embodiment, decoding the group of pictures to be decoded according to the target reference relationship includes:
s81, under the condition that the current video frame to be decoded is the key frame to be decoded, carrying out intra-frame decoding on the key frame to be decoded;
s82, under the condition that the current video frame to be decoded belongs to the first segment to be decoded, according to the target reference relationship, taking the key frame to be decoded as the reference video frame of the current video frame to be decoded, and carrying out target decoding on the current video frame to be decoded;
s83, when the current video frame to be decoded belongs to the second segment to be decoded, according to the target reference relation, the target reference video frame is used as the reference video frame of the current video frame to be decoded, and the current video frame to be decoded is subjected to target decoding, wherein the target reference video frame comprises at least one non-key frame positioned in front of the current video frame to be decoded;
wherein the target decoding is one of: and inter-frame decoding, wherein the intra-frame decoding is combined with the inter-frame decoding.
If the current video frame to be decoded is the key frame (the first video frame, I frame) to be decoded of the group of pictures to be decoded, it can be used as a random access point in the video stream. The decoding device can directly perform intra-frame decoding on it, without referring to any other frame, to obtain the corresponding video frame.
And if the current video frame to be decoded belongs to the first segment to be decoded, the reference video frame of the current video frame to be decoded is the key frame to be decoded according to the target reference relationship. The decoding device may use the key frame to be decoded as a reference video frame of the current video frame to be decoded, and perform inter-frame decoding on the current video frame to be decoded, or perform intra-frame decoding and inter-frame decoding on the current video frame to be decoded.
If the current video frame to be decoded belongs to the second segment to be decoded, then according to the target reference relationship, the reference video frame of the current video frame to be decoded comprises at least one non-key frame (non-key frame to be decoded) located before the current video frame to be decoded. The decoding device may first determine a target reference video frame corresponding to the video frame currently to be decoded. The target reference video frame may include: at least one non-key frame located before the current video frame to be decoded in the current group of pictures to be decoded, and may also include: the key frame to be decoded.
The decoding device may perform inter-frame decoding on the video frame to be decoded by using the target reference video frame as a reference video frame of the video frame to be decoded currently, or perform intra-frame decoding and inter-frame decoding on the video frame to be decoded currently.
By the embodiment, the key frames of the image group and the non-key frames in different segments are decoded in different decoding modes, so that the video quality and the compression efficiency can be considered, and the resource utilization rate is improved.
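The three reference cases above can be sketched as a small selection function (a sketch under the single-reference assumption; the indexing convention is illustrative, with index 0 for the key frame):

```python
def choose_reference(frame_index, first_segment_len):
    # frame_index 0 is the key frame to be decoded (I frame); indices
    # 1..first_segment_len fall in the first segment to be decoded; the
    # remaining indices fall in the second segment to be decoded.
    if frame_index == 0:
        return []                    # key frame: intra-frame decoding, no reference
    if frame_index <= first_segment_len:
        return [0]                   # first segment: reference the key frame
    return [frame_index - 1]         # second segment: reference the preceding frame
```

A decoder loop would call this per frame to pick its reference before inter-frame (or intra- plus inter-frame) decoding.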
As an optional embodiment, according to the target reference relationship, the target reference video frame is used as a reference video frame of the current video frame to be decoded, and the target decoding of the current video frame to be decoded includes:
s91, determining a key frame to be decoded and at least one non-key frame before the video frame to be decoded as a target reference video frame corresponding to the video frame to be decoded according to a target reference relationship under the condition that the number of reference video frames corresponding to the video frame to be decoded is multiple;
and S92, taking the target reference video frame as the reference video frame of the current video frame to be decoded, and performing target decoding on the current video frame to be decoded.
If multiple reference frames are allowed, that is, the number of reference video frames corresponding to the current video frame to be decoded can be multiple, according to the target reference relationship, the decoding device can determine the key frame to be decoded of the group of pictures to be decoded and at least one non-key frame located before the current video frame to be decoded in the group of pictures to be decoded as the target reference video frame corresponding to the current video frame to be decoded.
For example, the key frame to be decoded and the previous non-key frame of the current video frame to be decoded may be determined as the target reference video frame corresponding to the current video frame to be decoded.
After the target reference video frame is determined, the decoding device may use the target reference video frame as a reference video frame of a current video frame to be decoded, and perform inter-frame decoding or intra-frame decoding combined with inter-frame decoding on the current video frame to be decoded.
By the embodiment, the accuracy of video frame decoding can be improved by decoding the video frame to be decoded by referring to the plurality of video frames.
As an alternative embodiment, decoding the group of pictures to be decoded according to the target reference relationship includes:
s101, according to the target reference relation, all non-key frames to be decoded in the first segment to be decoded in the image group to be decoded are decoded in parallel.
According to the target reference relationship, the decoding device can decode all the non-key frames to be decoded in the first segment to be decoded in the image group to be decoded, and the number of all the non-key frames to be decoded in the first segment to be decoded is at least two.
For all the non-key frames to be decoded in the first segment to be decoded, the decoding device can decode the non-key frames in parallel, and the parallel coding and decoding of all the non-key frames in the first segment can accelerate the coding and decoding speed.
By the embodiment, the coding and decoding speed can be increased by coding and decoding all the non-key frames in the first segment in parallel.
As an alternative embodiment, the obtaining a group of pictures to be decoded of a video to be decoded includes:
s111, under the condition that a main view angle area of a target object in the panoramic video is switched from a first view angle area to a second view angle area, acquiring an image group to be decoded of a video to be decoded, wherein the video to be decoded is a part of the panoramic video to be decoded, which corresponds to the main view angle area, the image group to be decoded is an image group in which a first video frame after view angle switching is positioned, the definition corresponding to the main view angle area is a first definition, the definition corresponding to other areas except the main view angle area in the panoramic video is a second definition, and the first definition is higher than the second definition.
For the application scene of the panoramic video, a high-definition video can be displayed in a main visual angle area of a user, corresponding to a first definition, a low-definition video can be displayed in other areas except the main visual angle area, corresponding to a second definition, and the first definition is higher than the second definition.
If the main view area of the target object is switched from the first view area to the second view area, the decoding device may acquire the image group in which the first video frame after the view switch is located, that is, the image group to be decoded. For example, when the VR device detects that the user's view angle is switched at a certain moment, the decoding device may perform video decoding in the above decoding manner after the switch occurs, so that the video code stream located in the main view angle area can be quickly switched to the high definition code stream.
The decoding device may receive a video code stream transmitted by the encoding device, for example, a first video code stream, which corresponds to a video frame to be decoded in the group of pictures to be decoded, and may be a high-definition code stream, and for example, a second video code stream, which corresponds to a panoramic video frame to be decoded of the panoramic video to be decoded, and may be a low-definition code stream. The video frame to be decoded and the panoramic video frame to be decoded have a corresponding relationship, and the decoding device can render the video frame (image frame and image) decoded by the first video code stream into the main view angle area of the video frame decoded by the second video code stream, so that a high-definition video picture is displayed in the main view angle area. The sharpness corresponding to the main viewing angle region (second viewing angle region) is a first sharpness, and the sharpness corresponding to the other regions except the second viewing angle region is a second sharpness, and the first sharpness is higher than the second sharpness.
The second view region may be indicated by region indication information in the video bitstream. The area indication information is used to indicate a position of the main view area in the panoramic video, and may be area information of the main view area, or may be other types of information, and information that can indicate an area range of the main view area may be used as the area indication information.
Optionally, the second view angle region may also be determined by target view angle information transmitted by the target device, and by matching the time, the decoding device may determine a corresponding relationship between the target view angle information and the video frame to be decoded, so as to determine the second view angle region for displaying the video frame to be decoded.
Through this embodiment, the definition corresponding to the video code stream is controlled according to the switching of the user's view angle area, which can increase the speed of switching from the low definition stream to the high definition stream in the user's main view angle area and improve the user experience.
The following explains the video encoding and decoding method in the embodiments of the present application with reference to an alternative example. This example provides a hybrid adaptive low-delay coded-frame reference method applied to VR scenes: during encoding, each GOP of the panoramic video is divided into segments according to the number of levels, and the P frames in the segment that supports parallelism can be encoded in parallel, accelerating encoding. Similarly, the P frames in the segment that supports parallelism can be decoded in parallel during decoding, accelerating decoding. When encoding in this way, when the VR view switches from the main view to the left or right view, the conversion from high definition to low definition can be completed quickly.
The hybrid adaptive low-latency encoded frame reference method provided in this example may be applied to a network architecture as shown in fig. 7, in which:
an encoder for acquiring and splicing a panoramic video, the panoramic video being a video played in a VR device; coding video data in a main view angle area of a user in a panoramic video according to a first definition (high definition) to obtain a corresponding high definition stream, coding the panoramic video according to a second definition (low definition) to obtain a corresponding low definition stream, and transmitting the obtained high definition stream and the obtained low definition stream to a decoder through a network; receiving visual angle information of a user transmitted by VR equipment, and determining a main visual angle area of the user according to the visual angle information;
the decoder is used for decoding the high-definition stream and the low-definition stream respectively, rendering a video frame obtained by decoding the high-definition stream to a main visual angle area of a video frame obtained by decoding the corresponding low-definition stream to obtain a decoded video, and playing the decoded video through VR equipment;
and the VR equipment is used for playing the video decoded by the decoder, acquiring the visual angle information of the user, and transmitting the visual angle information of the user to the encoder through a network when the visual angle of the user is switched.
As shown in fig. 8, the flow of the video encoding and decoding method in this example may include the following steps:
step S802, estimating the maximum delay frame number which can be endured by the user according to the time delay requirement and the time of coding and decoding a frame, and calculating the maximum serial frame number (target series) according to the frame number.
On the encoding side, when a frame in a GOP is encoded, the encoder can estimate the maximum number of delay frames that a user can endure according to the delay requirement and the time for encoding and decoding a frame, and calculate the maximum serial frame number (or the minimum parallel frame number) according to the maximum number of delay frames.
For example, suppose the user can tolerate a delay of t, encoding/decoding an I frame takes tI, encoding/decoding a P frame takes tP, and the GOP contains n P frames. The adaptive maximum serial frame number M can then be calculated as M = (t - tI) / tP, and the minimum number of parallel frames N is N = n + 1 - M.
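The delay formulas above can be sketched in Python (names are illustrative; clamping M to the GOP size is a defensive assumption, not stated in the example):

```python
def serial_parallel_split(t, t_i, t_p, n):
    # t: maximum delay the user can tolerate; t_i / t_p: time to encode or
    # decode one I / P frame; n: number of P frames in the GOP.
    m = int((t - t_i) // t_p)        # adaptive maximum serial frame number M
    m = max(0, min(m, n))            # keep M within the GOP (assumption)
    return m, n + 1 - m              # (M, minimum number of parallel frames N)
```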
Step S804, coding the video frame corresponding to the main visual angle area in the panoramic video according to the maximum serial frame number to obtain a corresponding video code stream, and transmitting the obtained video code stream to a decoder through a network.
The P frames in a GOP may be divided into two segments according to the maximum serial frame number M: the first segment contains at least (n - M) frames, where n is the total number of P frames contained in the GOP, and the second segment contains at most M frames.
The first segment is a parallel segment, all P frames refer to I frames for coding and decoding, and P frames in the second segment refer to the previous frame for coding and decoding. The encoder may encode a video frame corresponding to the main view region in the panoramic video according to the encoding mode to obtain a corresponding video code stream, and transmit the obtained video code stream to the decoder through a network. In addition, the encoder may encode the panoramic video frame in the same manner or in a different manner, which is not limited in this example.
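A minimal sketch of this two-segment split of a GOP's P frames (names are assumptions; the first returned slice is the parallel segment, the second is the serial segment):

```python
def split_gop_p_frames(p_frames, max_serial):
    # First segment (parallel, all frames referencing the I frame) holds at
    # least n - M frames; second segment (chained references) at most M frames.
    n = len(p_frames)
    cut = max(0, n - max_serial)
    return p_frames[:cut], p_frames[cut:]
```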
When encoding in this way, when the VR view switches from the main view to either the left or right view, the transition from high definition to low definition can be completed quickly.
Step S806, decoding the received video code stream according to the reference relationship during encoding to obtain a corresponding video, and playing the decoded video through the VR device.
Each frame in the code stream obtained by coding contains the information of which frames are referred to when the frame is coded, namely the reference relation. The decoder can decode the received video code stream according to the reference relation during encoding to obtain a corresponding video, and transmits the corresponding video to the VR equipment for playing.
In addition, because the images in the first segment are coded by referring to the I frame and do not depend on the previous P frame, the images in the first segment can be coded and decoded in parallel, and the coding and decoding speed is greatly increased.
In the LDP mode in the related art, when any P frame is decoded, all of its reference frames need to be stored in memory. In this example, decoding a P frame in the first slice only requires waiting for the decoding time of one I frame, while decoding any P frame in the second slice requires waiting for the decoding time of one I frame plus the decoding time of (k + 1) P frames, where k is the number of P frames preceding that P frame in the second slice.
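The waiting times just described can be sketched as follows (a sketch; it assumes the whole first slice decodes in parallel within one P-frame decode time, and the names are illustrative):

```python
def decode_wait_time(p_index, first_segment_len, t_i, t_p):
    # p_index: 1-based index of the P frame within the GOP.
    # t_i / t_p: decoding time of one I / P frame.
    if p_index <= first_segment_len:
        return t_i                           # first slice: only the I frame must finish
    k = p_index - first_segment_len - 1      # P frames before it in the second slice
    return t_i + (k + 1) * t_p               # I frame + parallel first slice + k chained frames
```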
Through this example, because each GOP is coded with a two-segment division, the division can be calculated according to the current computing capacity and the application delay requirement, the parallel P frames can be coded in parallel, and the coding speed is increased; similarly, the P frames that support parallelism can be decoded in parallel, and the decoding speed is accelerated.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., a ROM (Read-Only Memory)/RAM (Random Access Memory), a magnetic disk, an optical disk) and includes several instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the methods according to the embodiments of the present application.
According to another aspect of the embodiments of the present application, there is also provided a video encoding apparatus for implementing the above video encoding method. Fig. 9 is a block diagram of an alternative video encoding apparatus according to an embodiment of the present application, and as shown in fig. 9, the apparatus may include:
an obtaining unit 902, configured to obtain a group of images to be encoded of a video to be encoded, where the group of images to be encoded includes a key frame to be encoded and a plurality of non-key frames to be encoded;
a determining unit 904, connected to the obtaining unit 902, configured to determine a target coding mode matched with the to-be-coded image group, where the target coding mode is used to indicate that, in two to-be-coded segments into which a plurality of to-be-coded non-key frames are divided, all to-be-coded non-key frames in a first to-be-coded segment refer to the to-be-coded key frame, and each to-be-coded non-key frame in a second to-be-coded segment refers to at least one non-key frame located before each to-be-coded non-key frame;
and the encoding unit 906 is connected to the determining unit 904, and is configured to encode the image group to be encoded according to the target encoding mode.
It should be noted that the obtaining unit 902 in this embodiment may be configured to execute the step S202, the determining unit 904 in this embodiment may be configured to execute the step S204, and the encoding unit 906 in this embodiment may be configured to execute the step S206.
Through the module, a group of images to be coded of a video to be coded is obtained, wherein the group of images to be coded comprises a key frame to be coded and a plurality of non-key frames to be coded; determining a target coding mode matched with an image group to be coded, wherein the target coding mode is used for indicating that in two fragments to be coded, which are divided by a plurality of non-key frames to be coded, all non-key frames to be coded in a first fragment to be coded refer to key frames to be coded, and each non-key frame to be coded in a second fragment to be coded refers to at least one non-key frame positioned before each non-key frame to be coded; the method and the device encode the image group to be encoded according to the target encoding mode, solve the problem of poor data transmission timeliness caused by overlarge encoding and decoding time delay in video encoding and decoding modes in the related technology, reduce the encoding and decoding time delay and improve the encoding and decoding efficiency.
As an alternative embodiment, the encoding unit 906 includes:
the device comprises a determining module, a judging module and a judging module, wherein the determining module is used for determining two to-be-encoded segments corresponding to an image group to be encoded, and each to-be-encoded segment comprises at least one to-be-encoded non-key frame;
and the first coding module is used for coding each non-key frame to be coded in each section to be coded according to the target coding mode.
As an alternative embodiment, the determining module includes:
the first determining submodule is configured to determine a target stage number according to a target delay time and a target coding and decoding time, where the target delay time is the allowed maximum delay time, the target coding and decoding time is the coding and decoding time of one video frame and includes the time used for encoding and the time used for decoding, and the target stage number is the number of non-key frames allowed to be coded and decoded within the target delay time;
and the second determining submodule is configured to determine the two to-be-encoded segments corresponding to the group of pictures to be encoded according to the target stage number, where the number of to-be-encoded non-key frames contained in the second to-be-encoded segment is less than or equal to the difference between the target stage number and 1.
As an alternative embodiment, the first determination submodule includes:
the first determining subunit is configured to determine a target time difference between a target delay time and a first encoding and decoding time, where the first encoding and decoding time is an encoding and decoding time of a key frame;
and the second determining subunit is configured to determine a quotient of the target time difference and a second coding and decoding time as a target stage number, where the second coding and decoding time is a coding and decoding time of a non-key frame, and the target coding and decoding time includes the first coding and decoding time and the second coding and decoding time.
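Purely as an illustrative sketch (not part of the patent's disclosure), the computation carried out by the first and second determining subunits, together with the resulting segment split, can be expressed as follows; all function and variable names, and the example timings, are hypothetical:

```python
def target_stage_number(target_delay_ms, key_codec_ms, nonkey_codec_ms):
    """Number of non-key frames that can be coded and decoded
    within the allowed maximum delay time."""
    # First determining subunit: time remaining after the key frame is handled.
    target_time_diff = target_delay_ms - key_codec_ms
    # Second determining subunit: quotient of that difference and the
    # per-frame coding and decoding time of a non-key frame.
    return int(target_time_diff // nonkey_codec_ms)

def split_gop(nonkey_frames, stage_number):
    """Split the non-key frames of a GOP into two segments: the second
    segment holds at most (stage_number - 1) frames, the rest go first."""
    second_len = min(len(nonkey_frames), stage_number - 1)
    first = nonkey_frames[:len(nonkey_frames) - second_len]
    second = nonkey_frames[len(nonkey_frames) - second_len:]
    return first, second

# Example: 100 ms delay budget, key frame costs 40 ms, non-key frames 10 ms each.
n = target_stage_number(100, 40, 10)              # (100 - 40) // 10 = 6
first, second = split_gop(list(range(1, 10)), n)  # 9 non-key frames in the GOP
```

With these example timings, the second segment receives at most 5 frames and the remaining 4 fall into the first segment, which is the split the second determining submodule described above would produce.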
As an alternative embodiment, the encoding unit 906 includes:
the second coding module is used for carrying out intra-frame coding on the current video frame to be coded under the condition that the current video frame to be coded is a key frame to be coded;
the third coding module is used for performing target coding on the current video frame to be coded by taking the key frame to be coded as a reference video frame of the current video frame to be coded according to a target coding mode under the condition that the current video frame to be coded belongs to the first segment to be coded;
the fourth coding module is used for taking the target reference video frame as the reference video frame of the current video frame to be coded and carrying out target coding on the current video frame to be coded according to a target coding mode under the condition that the current video frame to be coded belongs to the second segment to be coded, wherein the target reference video frame comprises at least one non-key frame positioned in front of the current video frame to be coded;
wherein the target coding is one of: inter-frame coding, or intra-frame coding combined with inter-frame coding.
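As an illustration of the reference-frame selection performed by the second, third, and fourth coding modules above (a hypothetical sketch — the function name and data layout are not taken from the patent), the per-frame rule can be written as:

```python
def select_references(frame_index, key_index, first_segment, second_segment):
    """Return the reference-frame indices for one frame of a GOP.
    first_segment / second_segment are ordered lists of non-key frame indices."""
    if frame_index == key_index:
        return []                       # key frame: intra-frame coding only
    if frame_index in first_segment:
        return [key_index]              # first segment: reference the key frame
    if frame_index in second_segment:
        # Second segment: at least one earlier non-key frame; the key frame
        # may also be used when multiple references are allowed.
        pos = second_segment.index(frame_index)
        prev = second_segment[pos - 1] if pos > 0 else first_segment[-1]
        return [key_index, prev]
    raise ValueError("frame not in this GOP")
```

For a GOP with key frame 0, first segment [1, 2], and second segment [3, 4], frame 3 references the key frame plus non-key frame 2, and frame 4 references the key frame plus non-key frame 3, matching the reference relationship described above.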
As an alternative embodiment, the fourth encoding module comprises:
the third determining submodule is configured to, when the current video frame to be coded corresponds to multiple reference video frames, determine the key frame to be coded and at least one non-key frame located before the current video frame to be coded as the target reference video frames corresponding to the current video frame to be coded according to the target coding mode;
and the coding submodule is used for performing target coding on the current video frame to be coded by taking the target reference video frame as the reference video frame of the current video frame to be coded.
As an alternative embodiment, the encoding unit 906 includes:
and the fifth coding module is used for coding all the non-key frames to be coded in the first section to be coded in the group of pictures to be coded in parallel according to the target coding mode.
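Because every frame in the first to-be-coded segment depends only on the already-coded key frame, the parallel coding performed by the fifth coding module can be sketched with a thread pool (a hypothetical illustration; `encode_frame` is a stand-in for an actual encoder call, not an API from the patent):

```python
from concurrent.futures import ThreadPoolExecutor

def encode_frame(frame, reference):
    # Placeholder for a real inter-frame encoder invocation.
    return ("coded", frame, reference)

def encode_first_segment(key_frame, first_segment):
    """All first-segment frames reference only the key frame, so they have
    no dependencies on one another and can be coded in parallel."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(encode_frame, f, key_frame) for f in first_segment]
        return [fut.result() for fut in futures]  # results kept in frame order

results = encode_first_segment("I0", ["P1", "P2", "P3"])
```

The second segment, by contrast, forms a dependency chain (each frame references an earlier non-key frame) and therefore must be coded sequentially, which is why only the first segment is coded in parallel.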
As an alternative embodiment, the obtaining unit 902 includes:
the first acquisition module is used for acquiring a to-be-encoded image group of a to-be-encoded video under the condition that a main view angle area of a target object in the panoramic video is switched from a first view angle area to a second view angle area, wherein the to-be-encoded video is a part corresponding to the main view angle area in the to-be-encoded panoramic video, the to-be-encoded image group is an image group where a first video frame after view angle switching occurs, definition corresponding to the main view angle area is first definition, definition corresponding to other areas except the main view angle area in the panoramic video is second definition, and the first definition is higher than the second definition.
According to another aspect of the embodiments of the present application, there is also provided a video decoding apparatus for implementing the above video decoding method. Fig. 10 is a block diagram of an alternative video decoding apparatus according to an embodiment of the present application, and as shown in fig. 10, the apparatus may include:
the acquiring unit 1002 is configured to acquire an image group to be decoded of a video to be decoded, where the image group to be decoded includes a key frame to be decoded and a plurality of non-key frames to be decoded;
a determining unit 1004, connected to the obtaining unit 1002, configured to determine a target reference relationship corresponding to the image group to be decoded, where the target reference relationship is used to indicate that, in two fragments to be decoded into which a plurality of non-key frames to be decoded are divided, all the non-key frames to be decoded in a first fragment to be decoded refer to the key frames to be decoded, and each non-key frame to be decoded in a second fragment to be decoded refers to at least one non-key frame located before each non-key frame to be decoded;
the decoding unit 1006, connected to the determining unit 1004, is configured to decode the group of pictures to be decoded according to the target reference relationship.
It should be noted that the obtaining unit 1002 in this embodiment may be configured to execute the step S602, the determining unit 1004 in this embodiment may be configured to execute the step S604, and the decoding unit 1006 in this embodiment may be configured to execute the step S606.
Through the above modules, a group of pictures to be decoded of a video to be decoded is obtained, where the group of pictures to be decoded includes a key frame to be decoded and a plurality of non-key frames to be decoded; a target reference relationship corresponding to the group of pictures to be decoded is determined, where the target reference relationship indicates that, in the two to-be-decoded segments into which the plurality of to-be-decoded non-key frames are divided, all to-be-decoded non-key frames in the first to-be-decoded segment reference the to-be-decoded key frame, and each to-be-decoded non-key frame in the second to-be-decoded segment references at least one non-key frame located before it; and the group of pictures to be decoded is decoded according to the target reference relationship. This solves the problem in the related art of poor data transmission timeliness caused by excessive encoding and decoding delay, reduces the encoding and decoding delay, and improves the encoding and decoding efficiency.
As an alternative embodiment, the decoding unit 1006 includes:
the first decoding module is used for carrying out intra-frame decoding on the key frame to be decoded under the condition that the current video frame to be decoded is the key frame to be decoded;
the second decoding module is used for taking the key frame to be decoded as the reference video frame of the current video frame to be decoded according to the target reference relation under the condition that the current video frame to be decoded belongs to the first segment to be decoded, and carrying out target decoding on the current video frame to be decoded;
the third decoding module is used for taking the target reference video frame as the reference video frame of the current video frame to be decoded and carrying out target decoding on the current video frame to be decoded according to the target reference relation under the condition that the current video frame to be decoded belongs to the second segment to be decoded, wherein the target reference video frame comprises at least one non-key frame positioned in front of the current video frame to be decoded;
wherein the target decoding is one of: inter-frame decoding, or intra-frame decoding combined with inter-frame decoding.
As an alternative embodiment, the third decoding module comprises:
the determining submodule is configured to, when the current video frame to be decoded corresponds to multiple reference video frames, determine the key frame to be decoded and at least one non-key frame located before the current video frame to be decoded as the target reference video frames corresponding to the current video frame to be decoded according to the target reference relationship;
and the decoding submodule is used for performing target decoding on the current video frame to be decoded by taking the target reference video frame as the reference video frame of the current video frame to be decoded.
As an alternative embodiment, the decoding unit 1006 includes:
and the fourth decoding module is used for decoding all the non-key frames to be decoded in the first segment to be decoded in the image group to be decoded in parallel according to the target reference relationship.
As an alternative embodiment, the obtaining unit 1002 includes:
the third acquisition module is used for acquiring an image group to be decoded of a video to be decoded under the condition that a main view angle area of a target object in the panoramic video is switched from a first view angle area to a second view angle area, wherein the video to be decoded is a part corresponding to the main view angle area in the panoramic video to be decoded, the image group to be decoded is an image group where a first video frame after view angle switching occurs, definition corresponding to the main view angle area is first definition, definition corresponding to other areas except the main view angle area in the panoramic video is second definition, and the first definition is higher than the second definition.
According to still another aspect of embodiments of the present application, there is also provided a video transmission system including: the encoding end may include any one of the video encoding devices provided in this embodiment (or the encoding end is the video encoding device), and the decoding end may include any one of the video decoding devices provided in this embodiment (or the decoding end is the video decoding device).
It should be noted here that the above modules are the same as their corresponding steps in terms of the examples and application scenarios they implement, but are not limited to the disclosure of the above embodiments. It should also be noted that the above modules, as part of the apparatus, may run in a hardware environment as shown in fig. 1, and may be implemented by software or by hardware, where the hardware environment includes a network environment.
According to yet another aspect of the embodiments of the present application, there is also provided an electronic device for implementing the above-mentioned video encoding method and/or video decoding method, which may be a server, a terminal, or a combination thereof.
Fig. 11 is a block diagram of an alternative electronic device according to an embodiment of the present application, as shown in fig. 11, including a processor 1102, a communication interface 1104, a memory 1106, and a communication bus 1108, where the processor 1102, the communication interface 1104, and the memory 1106 communicate with each other via the communication bus 1108, where,
a memory 1106 for storing a computer program;
the processor 1102, when executing the computer program stored in the memory 1106, performs the following steps:
s1, acquiring a group of images to be coded of the video to be coded, wherein the group of images to be coded comprises a key frame to be coded and a plurality of non-key frames to be coded;
s2, determining a target coding mode matched with the image group to be coded, wherein the target coding mode is used for indicating that in two fragments to be coded, which are divided by a plurality of non-key frames to be coded, all non-key frames to be coded in a first fragment to be coded refer to key frames to be coded, and each non-key frame to be coded in a second fragment to be coded refers to at least one non-key frame positioned before each non-key frame to be coded;
and S3, coding the image group to be coded according to the target coding mode.
Optionally, the processor 1102, when executing the computer program stored in the memory 1106, implements the following steps:
s1, acquiring a group of pictures to be decoded of the video to be decoded, wherein the group of pictures to be decoded comprises a key frame to be decoded and a plurality of non-key frames to be decoded;
s2, determining a target reference relationship corresponding to the image group to be decoded, wherein the target reference relationship is used for indicating that in two fragments to be decoded into which a plurality of non-key frames to be decoded are divided, all the non-key frames to be decoded in a first fragment to be decoded refer to the key frames to be decoded, and each non-key frame to be decoded in a second fragment to be decoded refers to at least one non-key frame positioned before each non-key frame to be decoded;
and S3, decoding the group of pictures to be decoded according to the target reference relation.
Alternatively, in this embodiment, the communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 11, but this is not intended to represent only one bus or type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include RAM, and may also include non-volatile memory, such as at least one disk storage device. Alternatively, the memory may be at least one storage device located remotely from the processor.
As an example, the memory 1106 may include, but is not limited to, the obtaining unit 902, the determining unit 904, and the encoding unit 906 of the video encoding apparatus. In addition, the memory may further include, but is not limited to, other module units of the video encoding apparatus, which are not described again in this example.
As another example, the memory 1106 may include, but is not limited to, the obtaining unit 1002, the determining unit 1004, and the decoding unit 1006 of the video decoding apparatus. In addition, the memory may further include, but is not limited to, other module units of the video decoding apparatus, which are not described again in this example.
The processor may be a general-purpose processor, which may include but is not limited to a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
It can be understood by those skilled in the art that the structure shown in fig. 11 is only illustrative. The device implementing the video encoding method and/or the video decoding method may be a terminal device such as a smartphone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 11 does not limit the structure of the electronic device. For example, the terminal device may also include more or fewer components (e.g., a network interface, a display device, etc.) than shown in fig. 11, or have a different configuration from that shown in fig. 11.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disk, ROM, RAM, magnetic or optical disk, and the like.
According to still another aspect of an embodiment of the present application, there is also provided a storage medium. Optionally, in this embodiment, the storage medium may be used to store program code for executing the above video encoding method and/or video decoding method.
Optionally, in this embodiment, the storage medium may be located on at least one of a plurality of network devices in a network shown in the above embodiment.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
s1, acquiring a group of images to be coded of the video to be coded, wherein the group of images to be coded comprises a key frame to be coded and a plurality of non-key frames to be coded;
s2, determining a target coding mode matched with the image group to be coded, wherein the target coding mode is used for indicating that in two fragments to be coded, which are divided by a plurality of non-key frames to be coded, all non-key frames to be coded in a first fragment to be coded refer to key frames to be coded, and each non-key frame to be coded in a second fragment to be coded refers to at least one non-key frame positioned before each non-key frame to be coded;
and S3, coding the image group to be coded according to the target coding mode.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
s1, acquiring a group of pictures to be decoded of the video to be decoded, wherein the group of pictures to be decoded comprises a key frame to be decoded and a plurality of non-key frames to be decoded;
s2, determining a target reference relationship corresponding to the image group to be decoded, wherein the target reference relationship is used for indicating that in two fragments to be decoded into which a plurality of non-key frames to be decoded are divided, all the non-key frames to be decoded in a first fragment to be decoded refer to the key frames to be decoded, and each non-key frame to be decoded in a second fragment to be decoded refers to at least one non-key frame positioned before each non-key frame to be decoded;
and S3, decoding the group of pictures to be decoded according to the target reference relation.
Optionally, the specific example in this embodiment may refer to the example described in the above embodiment, which is not described again in this embodiment.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing program codes, such as a U disk, a ROM, a RAM, a removable hard disk, a magnetic disk, or an optical disk.
According to yet another aspect of an embodiment of the present application, there is also provided a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium; the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method steps of any of the embodiments described above.
The above serial numbers of the embodiments of the present application are merely for description and do not imply any preference among the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing one or more computer devices (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a division of logical functions, and there may be other divisions in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, and may also be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution provided in the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.
Claims (17)
1. A video encoding method, comprising:
acquiring a group of images to be coded of a video to be coded, wherein the group of images to be coded comprises a key frame to be coded and a plurality of non-key frames to be coded;
determining a target coding mode matched with the group of pictures to be coded, wherein the target coding mode is used for indicating that, in two fragments to be coded into which the plurality of non-key frames to be coded are divided, all the non-key frames to be coded in a first fragment to be coded refer to the key frame to be coded, and each non-key frame to be coded in a second fragment to be coded refers to at least one non-key frame before each non-key frame to be coded;
and coding the image group to be coded according to the target coding mode.
2. The method according to claim 1, wherein said encoding the group of pictures to be encoded according to the target encoding mode comprises:
determining two to-be-encoded segments corresponding to the to-be-encoded image group, wherein each to-be-encoded segment comprises at least one to-be-encoded non-key frame;
and coding each non-key frame to be coded in each section to be coded according to the target coding mode.
3. The method according to claim 2, wherein said determining two of said segments to be encoded corresponding to said group of images to be encoded comprises:
determining a target stage number according to a target delay time and a target coding and decoding time, wherein the target delay time is an allowed maximum delay time, the target coding and decoding time is the coding and decoding time of one video frame and comprises the time used for encoding and the time used for decoding, and the target stage number is the number of non-key frames allowed to be coded and decoded within the target delay time;
and determining the two to-be-encoded segments corresponding to the group of pictures to be encoded according to the target stage number, wherein the number of to-be-encoded non-key frames contained in the second to-be-encoded segment is less than or equal to the difference between the target stage number and 1.
4. The method of claim 3, wherein determining the target stage number according to the target delay time and the target codec time comprises:
determining a target time difference value between the target delay time and a first coding and decoding time, wherein the first coding and decoding time is the coding and decoding time of a key frame;
and determining the quotient of the target time difference and a second coding and decoding time as the target stage number, wherein the second coding and decoding time is the coding and decoding time of a non-key frame, and the target coding and decoding time comprises the first coding and decoding time and the second coding and decoding time.
5. The method according to claim 1, wherein said encoding the group of pictures to be encoded according to the target encoding mode comprises:
carrying out intra-frame coding on the current video frame to be coded under the condition that the current video frame to be coded is the key frame to be coded;
under the condition that the current video frame to be coded belongs to the first segment to be coded, according to the target coding mode, taking the key frame to be coded as a reference video frame of the current video frame to be coded, and carrying out target coding on the current video frame to be coded;
under the condition that the current video frame to be coded belongs to the second segment to be coded, according to the target coding mode, taking a target reference video frame as a reference video frame of the current video frame to be coded, and carrying out target coding on the current video frame to be coded, wherein the target reference video frame comprises at least one non-key frame positioned before the current video frame to be coded;
wherein the target coding is one of: inter-frame coding, or intra-frame coding combined with inter-frame coding.
6. The method according to claim 5, wherein said target-coding a current video frame to be coded by using a target reference video frame as a reference video frame of the current video frame to be coded according to the target-coding mode comprises:
under the condition that the number of reference video frames corresponding to the current video frame to be coded is multiple, determining the key frame to be coded and at least one non-key frame positioned before the current video frame to be coded as the target reference video frame corresponding to the current video frame to be coded according to the target coding mode;
and taking the target reference video frame as a reference video frame of the current video frame to be coded, and carrying out target coding on the current video frame to be coded.
7. The method according to claim 1, wherein said encoding the group of pictures to be encoded according to the target encoding mode comprises:
and according to the target coding mode, carrying out parallel coding on all the non-key frames to be coded in the first section to be coded in the group of pictures to be coded.
8. The method according to any one of claims 1 to 7, wherein the obtaining the group of pictures to be encoded of the video to be encoded comprises:
the method comprises the steps of acquiring a group of images to be coded of a to-be-coded video under the condition that a main view angle area of a target object in the panoramic video is switched from a first view angle area to a second view angle area, wherein the to-be-coded video is a part of the to-be-coded panoramic video corresponding to the main view angle area, the group of images to be coded is a group of images where a first video frame after view angle switching occurs, definition corresponding to the main view angle area is first definition, definition corresponding to other areas except the main view angle area in the panoramic video is second definition, and the first definition is higher than the second definition.
9. A video decoding method, comprising:
acquiring an image group to be decoded of a video to be decoded, wherein the image group to be decoded comprises a key frame to be decoded and a plurality of non-key frames to be decoded;
determining a target reference relationship corresponding to the group of pictures to be decoded, wherein the target reference relationship is used for indicating that, in two fragments to be decoded into which a plurality of non-key frames to be decoded are divided, all the non-key frames to be decoded in a first fragment to be decoded refer to the key frames to be decoded, and each non-key frame to be decoded in a second fragment to be decoded refers to at least one non-key frame positioned before each non-key frame to be decoded;
and decoding the image group to be decoded according to the target reference relation.
10. The method according to claim 9, wherein said decoding said group of pictures to be decoded according to said target reference relationship comprises:
carrying out intra-frame decoding on the key frame to be decoded under the condition that the current video frame to be decoded is the key frame to be decoded;
under the condition that the current video frame to be decoded belongs to the first segment to be decoded, according to the target reference relationship, taking the key frame to be decoded as a reference video frame of the current video frame to be decoded, and performing target decoding on the current video frame to be decoded;
under the condition that the current video frame to be decoded belongs to a second segment to be decoded, according to the target reference relation, taking a target reference video frame as a reference video frame of the current video frame to be decoded, and performing target decoding on the current video frame to be decoded, wherein the target reference video frame comprises at least one non-key frame positioned before the current video frame to be decoded;
wherein the target decoding is one of: inter-frame decoding, or intra-frame decoding combined with inter-frame decoding.
11. The method according to claim 10, wherein the target decoding the current video frame to be decoded by using a target reference video frame as a reference video frame of the current video frame to be decoded according to the target reference relationship comprises:
in a case where a plurality of reference video frames correspond to the current video frame to be decoded, determining, according to the target reference relationship, the key frame to be decoded and at least one non-key frame located before the current video frame to be decoded as the target reference video frames corresponding to the current video frame to be decoded;
and taking the target reference video frames as reference video frames of the current video frame to be decoded, and performing target decoding on the current video frame to be decoded.
12. The method according to claim 9, wherein said decoding said group of pictures to be decoded according to said target reference relationship comprises:
and decoding, in parallel according to the target reference relationship, all the non-key frames to be decoded in the first segment to be decoded of the group of pictures to be decoded.
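Claim 12 works because every non-key frame in the first segment references only the key frame: the frames have no dependencies on one another, so once the key frame is decoded they can be processed concurrently. A hedged Python sketch follows; `decode_frame` is a stand-in for a real inter-frame decoder and is not part of the patent.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_frame(frame, key_frame):
    # Placeholder for inter-frame decoding of `frame` against `key_frame`.
    return ("decoded", frame, "ref", key_frame)

def decode_first_segment(key_frame, first_segment):
    # First-segment frames are mutually independent (each references only
    # the key frame), so they can be decoded concurrently.
    # ThreadPoolExecutor.map returns results in input order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda f: decode_frame(f, key_frame),
                             first_segment))
```

Second-segment frames, by contrast, reference preceding non-key frames and therefore cannot be parallelized this way.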
13. The method according to any of claims 9 to 12, wherein said obtaining a group of pictures to be decoded of a video to be decoded comprises:
acquiring the group of pictures to be decoded of the video to be decoded when a main view angle area of a target object in a panoramic video is switched from a first view angle area to a second view angle area, wherein the video to be decoded is the part of the panoramic video to be decoded that corresponds to the main view angle area, the group of pictures to be decoded is the group of pictures containing the first video frame after the view angle switch occurs, a definition corresponding to the main view angle area is a first definition, a definition corresponding to the areas of the panoramic video other than the main view angle area is a second definition, and the first definition is higher than the second definition.
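The two-tier definition assignment in claim 13 can be illustrated with a small Python sketch. The area names and the dictionary-based tiling model are illustrative assumptions, not from the patent; the only claimed behavior modeled here is that the main view angle area receives the higher (first) definition and all other areas the lower (second) definition.

```python
def definitions_after_switch(areas, new_main_area):
    """Assign a definition tier to each area of the panoramic video
    after the main view angle area switches to `new_main_area`."""
    return {
        area: "first definition" if area == new_main_area
        else "second definition"
        for area in areas
    }
```

After a switch, the decoder would then fetch the group of pictures containing the first frame of the newly high-definition main area.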
14. A video encoding apparatus, comprising:
an acquisition unit, configured to acquire a group of pictures to be coded of a video to be coded, wherein the group of pictures to be coded comprises a key frame to be coded and a plurality of non-key frames to be coded;
a determining unit, configured to determine a target coding mode matched with the group of pictures to be coded, wherein the target coding mode is used for indicating that, in two segments to be coded into which the plurality of non-key frames to be coded are divided, all the non-key frames to be coded in a first segment to be coded refer to the key frame to be coded, and each non-key frame to be coded in a second segment to be coded refers to at least one non-key frame located before that non-key frame to be coded;
and a coding unit, configured to code the group of pictures to be coded according to the target coding mode.
15. A video decoding apparatus, comprising:
an acquisition unit, configured to acquire a group of pictures to be decoded of a video to be decoded, wherein the group of pictures to be decoded comprises a key frame to be decoded and a plurality of non-key frames to be decoded;
a determining unit, configured to determine a target reference relationship corresponding to the group of pictures to be decoded, wherein the target reference relationship is used for indicating that, in two segments to be decoded into which the plurality of non-key frames to be decoded are divided, all the non-key frames to be decoded in a first segment to be decoded refer to the key frame to be decoded, and each non-key frame to be decoded in a second segment to be decoded refers to at least one non-key frame located before that non-key frame to be decoded;
and a decoding unit, configured to decode the group of pictures to be decoded according to the target reference relationship.
16. An electronic device comprising a processor, a communication interface, a memory and a communication bus, wherein said processor, said communication interface and said memory communicate with each other via said communication bus,
the memory for storing a computer program;
the processor configured to perform the method steps of any one of claims 1 to 8 or to perform the method steps of any one of claims 9 to 13 by running the computer program stored on the memory.
17. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method steps of any of claims 1 to 8 or the method steps of any of claims 9 to 13 when executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011213187.5A CN112040233B (en) | 2020-11-04 | 2020-11-04 | Video encoding method, video decoding method, video encoding device, video decoding device, electronic device, and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112040233A true CN112040233A (en) | 2020-12-04 |
CN112040233B CN112040233B (en) | 2021-01-29 |
Family
ID=73573170
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011213187.5A Active CN112040233B (en) | 2020-11-04 | 2020-11-04 | Video encoding method, video decoding method, video encoding device, video decoding device, electronic device, and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112040233B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112995532A (en) * | 2021-02-03 | 2021-06-18 | 上海哔哩哔哩科技有限公司 | Video processing method and device |
CN113873271A (en) * | 2021-09-08 | 2021-12-31 | 广州繁星互娱信息科技有限公司 | Video stream playing method and device, storage medium and electronic equipment |
CN113891019A (en) * | 2021-09-24 | 2022-01-04 | 深圳Tcl新技术有限公司 | Video encoding method, video encoding device, shooting equipment and storage medium |
CN114513658A (en) * | 2022-01-04 | 2022-05-17 | 聚好看科技股份有限公司 | Video loading method, device, equipment and medium |
CN115119009A (en) * | 2022-06-29 | 2022-09-27 | 北京奇艺世纪科技有限公司 | Video alignment method, video encoding device and storage medium |
CN115529474A (en) * | 2021-06-24 | 2022-12-27 | 北京金山云网络技术有限公司 | Panoramic video transmission method and device, electronic equipment and storage medium |
WO2023273535A1 (en) * | 2021-06-30 | 2023-01-05 | 华为技术有限公司 | Video bitstream processing method, medium, program product, and electronic device |
WO2023061129A1 (en) * | 2021-10-12 | 2023-04-20 | 百果园技术(新加坡)有限公司 | Video encoding method and apparatus, device, and storage medium |
CN116170609A (en) * | 2023-02-01 | 2023-05-26 | 广州虎牙科技有限公司 | Video transcoding method and device, live broadcast server, terminal equipment and storage medium |
WO2023109325A1 (en) * | 2021-12-15 | 2023-06-22 | 腾讯科技(深圳)有限公司 | Video encoding method and apparatus, electronic device, and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1965587A (en) * | 2004-04-07 | 2007-05-16 | 高通股份有限公司 | Method and apparatus for frame prediction in hybrid video compression to enable temporal scalability |
CN101459840A (en) * | 2007-12-13 | 2009-06-17 | 华为技术有限公司 | Encoding and decoding method, apparatus and system for video image |
CN101668208A (en) * | 2009-09-15 | 2010-03-10 | 杭州华三通信技术有限公司 | Frame coding method and device |
CN103796021A (en) * | 2012-10-29 | 2014-05-14 | 浙江大学 | Video coding and decoding method and device |
US20180184089A1 (en) * | 2016-12-28 | 2018-06-28 | Intel Corporation | Target bit allocation for video coding |
Also Published As
Publication number | Publication date |
---|---|
CN112040233B (en) | 2021-01-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112040233B (en) | Video encoding method, video decoding method, video encoding device, video decoding device, electronic device, and storage medium | |
CN112333448B (en) | Video encoding method and apparatus, video decoding method and apparatus, electronic device, and storage medium | |
CN112351285B (en) | Video encoding method, video decoding method, video encoding device, video decoding device, electronic equipment and storage medium | |
CN111277826B (en) | Video data processing method and device and storage medium | |
US12052427B2 (en) | Video data processing method and apparatus, and storage medium | |
US20090219985A1 (en) | Systems and Methods for Processing Multiple Projections of Video Data in a Single Video File | |
CN113491113B (en) | Video encoding and decoding method and device | |
US20170180746A1 (en) | Video transcoding method and electronic apparatus | |
CN111800653B (en) | Video decoding method, system, device and computer readable storage medium | |
CN107172376B (en) | Video coding method and device based on screen sharing | |
CN107181744B (en) | Video processing and encoding method, processor and encoder | |
WO2023142716A1 (en) | Encoding method and apparatus, real-time communication method and apparatus, device, and storage medium | |
CN113965751A (en) | Screen content coding method, device, equipment and storage medium | |
CN112040234B (en) | Video encoding method, video decoding method, video encoding device, video decoding device, electronic equipment and storage medium | |
CN112040232B (en) | Real-time communication transmission method and device and real-time communication processing method and device | |
CN115134629A (en) | Video transmission method, system, device and storage medium | |
CN112351278B (en) | Video encoding method and device and video decoding method and device | |
CN114125448B (en) | Video coding method, decoding method and related devices | |
WO2024113869A1 (en) | Video coding method and related apparatus | |
CN112351284B (en) | Video encoding method and apparatus, video decoding method and apparatus, electronic device, and storage medium | |
EP4380155A1 (en) | Encoding and decoding method, encoder, decoder, and electronic device | |
CN111212288B (en) | Video data encoding and decoding method and device, computer equipment and storage medium | |
CN114449348A (en) | Panoramic video processing method and device | |
CN115866297A (en) | Video processing method, device, equipment and storage medium | |
CN115442617A (en) | Video processing method and device based on video coding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||