CN113259673B - Scalable video coding method, apparatus, device and storage medium

Scalable video coding method, apparatus, device and storage medium

Info

Publication number
CN113259673B
CN113259673B
Authority
CN
China
Prior art keywords
layer
time domain
code rate
image frame
reference unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110755288.3A
Other languages
Chinese (zh)
Other versions
CN113259673A (en)
Inventor
焦华龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110755288.3A priority Critical patent/CN113259673B/en
Publication of CN113259673A publication Critical patent/CN113259673A/en
Application granted granted Critical
Publication of CN113259673B publication Critical patent/CN113259673B/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/156Availability of hardware or computational resources, e.g. encoding based on power-saving criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability

Abstract

The application provides a scalable video coding method, apparatus, device, and storage medium. The method includes: acquiring a video sequence; acquiring the number of temporal layers N of the video sequence and the code-rate ratios of N-1 of those layers within one temporal reference unit, where N is an integer greater than 1; determining a temporal reference structure of the video sequence according to the number of temporal layers and the code-rate ratios of the N-1 layers in one temporal reference unit; and encoding the video sequence according to the temporal reference structure to obtain an output code stream. Video compression efficiency and video smoothness can thus be balanced.

Description

Scalable video coding method, apparatus, device and storage medium
Technical Field
Embodiments of the present disclosure relate to video processing technologies, and in particular, to a scalable video coding method, apparatus, device, and storage medium.
Background
In a multi-party real-time video communication scene, different networks and different terminal devices coexist, so their requirements on video quality differ. For example, when video is transmitted over a network, bandwidth limits the transmission: when the bandwidth is small, only the basic video signal is transmitted, and whether an enhanced video signal is also transmitted is decided according to the actual network condition, enhancing the video quality when possible. Against this background, scalable video coding is used to generate, with a single encoding pass, video compressed code streams of different frame rates and resolutions; the amount of video information to be transmitted is then selected according to the network bandwidth, the display screen, and the decoding capability of each terminal, so that the video quality adapts accordingly. In other words, it is an encoding technique that allows video data of different frame rates, resolutions, and image qualities to be decoded from a single code stream. Scalable Video Coding (SVC) is built on the H.264 Advanced Video Coding (AVC) standard and H.265 High Efficiency Video Coding (HEVC), and reuses the efficient algorithmic tools of the AVC and HEVC codecs, so that the encoded video is scalable in time, space, and video quality and can yield videos of different frame rates, resolutions, or quality levels.
The scalability of SVC mainly includes temporal scalability, spatial scalability, and quality scalability, among which temporal scalability is crucial. Currently, when a video sequence is temporally layered, the coding device uses a fixed number of temporal layers, and once that number is determined, the temporal reference structure is fixed. The current temporal reference structure mainly trades long reference distances in the time domain for resistance to packet loss. However, when the reference distance is too large, the code rate of an image frame grows, that is, more codewords are consumed, so video compression efficiency is low; when the reference distance is too small, fewer codewords are consumed and compression efficiency is high, but packet loss becomes likely, so video smoothness is poor. How to balance video compression efficiency and video smoothness is therefore the technical problem urgently addressed by the present application.
Disclosure of Invention
The application provides a scalable video coding method, apparatus, device, and storage medium, so that video compression efficiency and video smoothness can be balanced.
In a first aspect, a scalable video coding method is provided, including: acquiring a video sequence; acquiring the number of temporal layers N of the video sequence and the code-rate ratios of N-1 of those layers within one temporal reference unit, where N is an integer greater than 1; determining a temporal reference structure of the video sequence according to the number of temporal layers and the code-rate ratios of the N-1 layers in one temporal reference unit; and encoding the video sequence according to the temporal reference structure to obtain an output code stream.
In a second aspect, a scalable video coding apparatus is provided, including a first acquisition module, a second acquisition module, a determination module, and a coding module. The first acquisition module is configured to acquire a video sequence; the second acquisition module is configured to acquire the number of temporal layers N of the video sequence and the code-rate ratios of N-1 layers within one temporal reference unit; the determination module is configured to determine a temporal reference structure of the video sequence according to the number of temporal layers and the code-rate ratios of the N-1 layers in one temporal reference unit; and the coding module is configured to encode the video sequence according to the temporal reference structure to obtain an output code stream.
In a third aspect, a terminal device is provided, including: a processor and a memory, the memory being configured to store a computer program, the processor being configured to invoke and execute the computer program stored in the memory to perform a method as in the first aspect or its implementations.
In a fourth aspect, there is provided a computer readable storage medium for storing a computer program for causing a computer to perform the method as in the first aspect or its implementations.
In a fifth aspect, there is provided a computer program product comprising computer program instructions to cause a computer to perform the method as in the first aspect or its implementations.
A sixth aspect provides a computer program for causing a computer to perform a method as in the first aspect or implementations thereof.
Through the technical solution of this application, the terminal device can set the number of temporal layers and the code-rate ratio of each layer, that is, it can limit the code rate of each layer. In other words, different numbers of temporal layers and different per-layer code-rate ratios can be set for different terminal devices, or for the different networks those devices use. For example, for a terminal device with better performance, the base-layer code-rate ratio can be set higher than that of a terminal device with poorer performance; since the better-performing device can consume more codewords, its video smoothness can be ensured. For a terminal device with poorer performance, the base-layer code-rate ratio can be set lower than that of the better-performing device; since such a device cannot consume as many codewords, its compression efficiency needs to be kept high. In short, by limiting the code-rate ratio of each layer, video compression efficiency and video smoothness can be balanced.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a multi-party real-time video communication scenario provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of a typical video encoder;
Fig. 3 is a schematic diagram of a 3-layer temporal reference structure;
Fig. 4 is a flowchart of a scalable video coding method according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a temporal reference structure provided in an embodiment of the present application;
Fig. 6 is a schematic diagram of another temporal reference structure provided in an embodiment of the present application;
Fig. 7 is a schematic diagram of yet another temporal reference structure provided in an embodiment of the present application;
Fig. 8 is a schematic diagram of yet another temporal reference structure provided in an embodiment of the present application;
Fig. 9 is a schematic diagram of a scalable video coding apparatus according to an embodiment of the present application;
Fig. 10 is a schematic block diagram of a terminal device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances, such that the embodiments of the invention described herein can operate in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or device.
The application scenario of the technical solution of the present application may be a multi-party real-time video communication scenario, but is not limited thereto. Fig. 1 is a schematic view of a multi-party real-time video communication scenario provided in an embodiment of the present application. As shown in fig. 1, the multi-party real-time video communication system includes various terminal devices 110 and a server 120, where the terminal devices 110 and the server 120 are connected via a network.
It should be understood that when each of the terminal devices 110 performs real-time video communication, any one of the terminal devices 110 performs one-time encoding by using a scalable video encoding technique to generate video compressed code streams with different frame rates and resolutions, and then sends the video compressed code streams to the server 120, and then the server 120 selects the amount of video information to be transmitted according to different network bandwidths, different display screens, and the decoding capabilities of other terminal devices, and sends corresponding video information to the other terminal devices.
It should be noted that the terminal devices 110 of each party may also directly transmit video information, for example: when each terminal device 110 performs real-time video communication, any terminal device 110 uses a scalable video coding technology to realize one-time coding to generate video compression code streams with different frame rates and resolutions, then selects the amount of video information to be transmitted according to different network bandwidths, different display screens and the decoding capabilities of other terminal devices, and sends corresponding video information to other terminal devices.
It should be understood that any of the terminal devices 110 described above may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a wearable device, and the like, and the present application is not limited thereto. The server 120 may be an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, which is not limited in this application.
As described above, in a multi-party real-time video communication scenario, different networks and different terminal devices coexist, so their requirements on video quality differ. For example, when video is transmitted over a network, bandwidth limits the transmission: when the bandwidth is small, only the basic video signal is transmitted, and whether an enhanced video signal is also transmitted is decided according to the actual network condition, enhancing the video quality when possible. Against this background, scalable video coding is used to generate, with a single encoding pass, video compressed code streams of different frame rates and resolutions; the amount of video information to be transmitted is then selected according to the network bandwidth, the display screen, and the decoding capability of each terminal, so that the video quality adapts accordingly. In other words, it is an encoding technique that allows video data of different frame rates, resolutions, and image qualities to be decoded from a single code stream. SVC is built on the H.264 AVC standard and H.265 HEVC, reuses the efficient algorithmic tools of the AVC and HEVC codecs, is scalable in the temporal, spatial, and video-quality dimensions of the encoded video, and can generate videos of different frame rates, resolutions, or quality levels.
The scalability of SVC mainly includes temporal, spatial, and quality scalability. Temporal scalability means that a sub-stream contains the video information at a reduced playback frame rate. Spatial scalability means that a sub-stream contains the video information at a reduced image spatial resolution. Quality scalability means that a sub-stream provides the same spatial resolution as the complete stream, but at a lower quality.
It should be understood that the sub-streams mentioned above are the streams corresponding to the individual temporal, spatial, or quality layers, while the complete code stream is the stream corresponding to the entire video sequence.
Video coding is the process of compressing an original video (i.e., the video sequence) into as few bits as possible while ensuring that the decoded, reconstructed video retains a certain playback quality. Fig. 2 is a schematic diagram of a typical video encoder. As shown in fig. 2, a typical video encoder is mainly divided into the following basic units: a temporal model, a spatial model, and an entropy encoder. The temporal model takes the uncompressed video sequence as input and removes temporal redundancy by exploiting the correlation between adjacent image frames, usually by building a predicted image for the current frame. Its output is the residual image obtained by subtracting the predicted image from the current image, together with model parameters such as motion vectors. The input of the spatial model is the residual image output by the temporal model; its function is to transform and quantize the residual pixel values, exploiting the correlation between adjacent pixels in the residual image to remove spatial redundancy. After the transform coefficients are quantized, only a small number of significant coefficient values are retained. Finally, the output parameters of the temporal and spatial models are compressed by the entropy encoder to remove statistical redundancy; the compressed encoded output mainly contains the encoded motion vectors, residuals, and header information.
The SVC encoder encodes video into a plurality of spatial layers, a plurality of temporal layers, and a plurality of quality layers. The temporal scalability technology can support multiple video playback frame rates through a single code stream, and a video stream supporting temporal scalability can be decomposed into a base layer and one or more enhancement layers in the temporal domain.
The current video coding structures may be: an All Intra (AI) coding structure, a low-delay coding structure, and a Random Access (RA) coding structure.
In the low-delay coding structure, only the first frame is intra-coded, i.e., it forms an Instantaneous Decoding Refresh (IDR) frame, and each subsequent frame is coded as a generalized P and B frame (GPB). The low-delay coding structure is further divided into two structures, LDB and LDP, according to whether the subsequent frames are B frames or P frames. It should be noted that image frames are currently classified into three categories by prediction type: I frames, P frames, and B frames. All coding units in an I frame are coded using only intra prediction. Coding units in a P frame may be coded using intra prediction or uni-directional inter prediction, and coding units in a B frame may be coded using intra prediction or bi-directional inter prediction. That is, a P frame has only one reference list, while a B frame has two.
When the low-delay coding structure is adopted, the playback order and the coding order of the video sequence are the same. Each image frame references only reconstructed frames whose playback order precedes the current frame, so the sequence is encoded and decoded in playback order, and there is no need to wait for frames that are coded later but played earlier. The delay is therefore relatively small, which is where the name comes from; the structure is designed mainly for interactive real-time communication and suits delay-sensitive scenarios such as live streaming and video calls.
As described above, SVC can address the mismatch in video quality caused by different terminal devices and different network conditions, and among its forms, temporal layering is the most widely adopted in terms of implementation complexity, cost-effectiveness, and compatibility. The temporal layering of a video sequence is usually determined by a preset number of temporal layers; the video sequence is then encoded with reference relationships that follow the temporal reference structure, and the encoded stream is transmitted to a server for secondary distribution or to a receiving end for selective decoding and other processing. It should be understood that the temporal reference structure is simply the structure of the video sequence after temporal layering; it is called a reference structure because each image frame may serve as a reference image frame for other image frames. The choice of temporal reference structure strongly affects key indicators such as video clarity and smoothness. Fig. 3 is a schematic diagram of a 3-layer temporal reference structure, where Pn(m) denotes the image frame located at the n-th layer whose video timing is the m-th frame. Except for the 0th frame, which is an IDR frame, all frames are P frames, i.e., they all use uni-directional prediction; an arrow in fig. 3 points to the reference image frame, for example, the reference image frame of P3(1) is IDR(0). The 3-layer temporal reference structure is composed of multiple temporal reference units with identical structure; as shown in fig. 3, IDR(0), P3(1), P2(2), and P3(3) constitute one temporal reference unit, and P1(4), P3(5), P2(6), and P3(7) constitute another, and the two units have the same structure.
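For concreteness, the following Python sketch (not part of the patent text) reproduces the layer assignment and reference relationships of the 3-layer structure in fig. 3, assuming the dyadic frame pattern and the reference rule the figure implies (each frame references the nearest preceding frame whose layer is not higher than its own):

```python
def temporal_layer(m, num_layers=3):
    """Temporal layer (1-based) of frame m in the dyadic 3-layer structure
    of fig. 3: layer 1 every 4th frame, layer 2 on the remaining even
    frames, layer 3 on odd frames."""
    unit = 1 << (num_layers - 1)  # 4 frames per temporal reference unit
    for layer in range(1, num_layers + 1):
        if m % (unit >> (layer - 1)) == 0:
            return layer
    return num_layers

def reference_frame(m, num_layers=3):
    """Nearest preceding frame whose layer is not higher than frame m's."""
    if m == 0:
        return None  # IDR(0) is intra coded and references nothing
    layer = temporal_layer(m, num_layers)
    k = m - 1
    while temporal_layer(k, num_layers) > layer:
        k -= 1
    return k

for m in range(8):
    print(m, temporal_layer(m), reference_frame(m))
# e.g. frame 6 -> layer 2, reference 4, matching P2(6) -> P1(4) in fig. 3
```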
In a low-delay lossy network environment, only the loss of an image frame in the base layer (layer 1 in fig. 3) requires waiting for the next key frame (i.e., I frame) to arrive; when image frames in the enhancement layers (e.g., layer 2 or layer 3) are lost, no retransmission or waiting for the next key frame is needed. This structure is inherently resistant to packet loss. For example, in conventional IPPP coding, the loss of any single frame requires waiting for a key frame, whereas when the base-layer image frames are N frames apart, the probability of losing a base-layer frame is 1/N of that in IPPP coding; that is, the temporal reference structure has a good anti-packet-loss property. When Forward Error Correction (FEC) protection is applied selectively, only the base layer needs it, which greatly improves bandwidth utilization and reduces the delay caused by large amounts of retransmission. However, the current temporal reference structure mainly trades a long temporal reference distance for this anti-packet-loss effect, and when the reference distance is too large, the code rate of an image frame grows, i.e., more codewords are consumed, leading to low video compression efficiency. For example, when the Quantization Parameter (QP) = 30, the relationship between the codewords consumed at different reference distances is shown in Table 1:
TABLE 1

Reference distance     1        2        3        4        6        8
        1              1      1.436    1.719    1.974    2.404    2.704
        2                       1      1.197    1.374    1.652    1.864
        3                               1       1.148    1.547    1.573
        4                                        1       1.218    1.53
Both the row and column headings in the table are reference distances, which together form reference-distance pairs; the number at each pair expresses the relationship between the codewords consumed at the two distances. For example, the number 1.436 in row 2, column 3 of Table 1 indicates that the codewords consumed at a reference distance of 2 are 1.436 times those consumed at a reference distance of 1, and the number 1.719 in row 2, column 4 indicates that the codewords consumed at a reference distance of 3 are 1.719 times those consumed at a reference distance of 1.
Assume that in IPPP mode each P frame consumes 1 codeword, so 4 P frames consume 4 codewords; the reference distance between adjacent P frames in IPPP mode is 1, so in total the 4 P frames consume 4 × 1 = 4 codewords. Now assume the 3-layer reference structure of fig. 3 is used for these 4 P frames, e.g., P1(4), P3(5), P2(6), and P3(7) in fig. 3. The reference image frame of P1(4) is IDR(0), so the reference distance of P1(4) is 4 - 0 = 4; the reference image frame of P3(5) is P1(4), so its reference distance is 5 - 4 = 1; the reference image frame of P2(6) is P1(4), so its reference distance is 6 - 4 = 2; and the reference image frame of P3(7) is P2(6), so its reference distance is 7 - 6 = 1. Referring to Table 1, a reference distance of 4 consumes 1.974 times the codewords of distance 1, and a reference distance of 2 consumes 1.436 times the codewords of distance 1. Therefore, with the 3-layer reference structure of fig. 3, the 4 P frames consume 1.974 + 1.436 + 2 × 1 = 5.41 codewords, which is 5.41/4 ≈ 1.35 times the codewords consumed in IPPP mode; that is, SVC consumes about 35% more codewords than IPPP mode, which causes the problem of low video compression efficiency.
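The arithmetic above can be checked with a few lines of Python; a minimal sketch, where the cost mapping simply copies the measured ratios from row 1 of Table 1:

```python
# Codeword cost relative to reference distance 1 (row 1 of Table 1).
COST = {1: 1.0, 2: 1.436, 3: 1.719, 4: 1.974, 6: 2.404, 8: 2.704}

# Reference distances of the four P frames in the fig. 3 unit:
# P1(4)->IDR(0): 4, P3(5)->P1(4): 1, P2(6)->P1(4): 2, P3(7)->P2(6): 1.
svc_distances = [4, 1, 2, 1]
ippp_distances = [1, 1, 1, 1]  # adjacent references in IPPP mode

svc_words = sum(COST[d] for d in svc_distances)     # 1.974 + 1 + 1.436 + 1 = 5.41
ippp_words = sum(COST[d] for d in ippp_distances)   # 4
print(svc_words, round(svc_words / ippp_words, 2))  # 5.41, 1.35 -> ~35% more
```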
When the reference distance is too small, fewer codewords are consumed and video compression efficiency is higher, but packet loss becomes likely and video smoothness suffers. How to balance video compression efficiency and video smoothness is therefore a technical problem to be urgently solved by the present application.
In order to solve the above technical problem, the present application allows the number of temporal layers and the code-rate ratio of each layer to be set, that is, the code-rate ratio of each layer is limited, so that video compression efficiency and video smoothness can be balanced.
It should be understood that the code rate is the number of data bits transmitted per unit time; it is also called the bit rate and indicates how many bits per second are needed to represent the compression-coded video, i.e., the amount of data per second of displayed images after compression. Consumed codewords can therefore also be viewed from the code-rate perspective: the more codewords consumed, the larger the code rate, and conversely, the fewer codewords consumed, the smaller the code rate.
The scheme of the application will be explained in detail as follows:
Fig. 4 is a flowchart of a scalable video coding method provided by an embodiment of the present application. The method may be performed by a coding device, which may be, but is not limited to, any terminal device shown in fig. 1. As shown in fig. 4, the method includes the following steps:
s410: a video sequence is acquired.
S420: the code rate ratio of the time domain layering layer number N and the N-1 layer of the video sequence in a time domain reference unit is obtained, wherein N is an integer larger than 1.
S430: and determining the time domain reference structure of the video sequence according to the time domain layering layer number and the code rate ratio of the N-1 layers in one time domain reference unit.
S440: and coding the video sequence according to the time domain reference structure to obtain an output code stream.
It should be understood that the video sequence may be captured by a camera of the terminal device, and the video sequence may be referred to as a video or an image sequence, which is not limited in this application.
As described above, in SVC technology, a terminal device may temporally layer a video sequence, where the number of temporal layers may be 2, 3, or more, which is not limited in this application. For example, as shown in fig. 3, the video sequence has 3 temporal layers.
It should be understood that the temporal reference structure is composed of multiple temporal reference units with identical structure. As shown in fig. 3, IDR(0), P3(1), P2(2), and P3(3) constitute one temporal reference unit, and P1(4), P3(5), P2(6), and P3(7) constitute another; the two units have the same structure.
Optionally, when the video sequence is divided into N temporal layers, the terminal device only needs to obtain the code-rate ratios of any N-1 of the N temporal layers in one temporal reference unit, but this is not limiting. For example, assuming the video sequence is divided into N layers, the terminal device may obtain the code-rate ratios of layers 1 through N-1 in one temporal reference unit, or the code-rate ratios of layers 2 through N.
It should be understood that the terminal device may also obtain the code-rate ratios of all N temporal layers in one temporal reference unit. Because the sum of the code-rate ratios of the N layers in one temporal reference unit is 1, and what the terminal device subsequently determines for the temporal reference structure is, for each image frame in layer i, the number of corresponding image frames in layer i+1 — that is, N-1 unknown quantities — only N-1 conditions are needed to determine those N-1 unknowns. Based on this, in practical applications the terminal device only needs the code-rate ratios of any N-1 of the N layers in one temporal reference unit.
It should be understood that, for any one of the above N-1 layers, its code-rate ratio in one temporal reference unit refers to the ratio of the sum of the code rates of all image frames in that layer to the sum of the code rates of all image frames in all layers of that temporal reference unit.
For example, as shown in fig. 3, assume that in IPPP mode the code rate of each P frame is 1, so the sum of the code rates of 4 P frames is 4 × 1 = 4, the reference distance between adjacent P frames in IPPP mode being 1. Now assume the 3-layer reference structure of fig. 3 is used for these 4 P frames, e.g., P1(4), P3(5), P2(6), and P3(7): the reference image frame of P1(4) is IDR(0), with reference distance 4 - 0 = 4; the reference image frame of P3(5) is P1(4), with reference distance 5 - 4 = 1; the reference image frame of P2(6) is P1(4), with reference distance 6 - 4 = 2; and the reference image frame of P3(7) is P2(6), with reference distance 7 - 6 = 1. Referring to Table 1, the code rate at reference distance 4 is 1.974 times that at distance 1, and the code rate at distance 2 is 1.436 times that at distance 1. With the 3-layer reference structure of fig. 3, the sum of the code rates of the 4 P frames is therefore 1.974 + 1.436 + 2 × 1 = 5.41, and in the temporal reference unit formed by P1(4), P3(5), P2(6), and P3(7), the code rate of layer 1 is 1.974; hence the code-rate ratio of layer 1 in this temporal reference unit is 1.974/5.41 ≈ 0.36.
It should be noted that different numbers of temporal layers and different per-layer code-rate ratios may be set for different terminal devices. For example, for a terminal device with better performance, the base-layer code-rate ratio can be set higher than that of a terminal device with poorer performance; since the better-performing device can consume more codewords, its video smoothness can be ensured. For a terminal device with poorer performance, the base-layer code-rate ratio can be set lower than that of the better-performing device; since such a device cannot consume as many codewords, its compression efficiency needs to be kept high.
Similarly, different code-rate ratios may be set according to the network each terminal device uses. For example, if terminal device A uses a wireless fidelity (Wi-Fi) network and terminal device B uses a 5G mobile network, a lower base-layer code-rate ratio may be set for terminal device A and a higher one for terminal device B. It should be understood that the parameters mainly involved in a temporal reference structure are the number of temporal layers and, for any temporal reference unit of the structure, the number of image frames in layer i+1 corresponding to each image frame in layer i, i = 1, 2, …, N-1, where N is the number of layers of the temporal reference structure and layer 1 is the base layer. Since the number of temporal layers is configured, what the terminal device mainly has to determine for the temporal reference structure is the number of image frames in layer i+1 corresponding to each image frame in layer i within one temporal reference unit. It should be understood that, for any image frame in layer i, the number of corresponding image frames in layer i+1 refers to the number of layer-(i+1) image frames that can be inserted between that image frame and the next image frame of layer i. For example, in the temporal reference structure shown in fig. 3, P1(4) is an image frame in layer 1, the next layer-1 image frame is P1(8), and the layer-2 image frame that can be inserted between P1(4) and P1(8) is P2(6); therefore, the number of image frames corresponding to P1(4) in layer 2 is 1.
Optionally, the terminal device may determine the number of image frames in layer i+1 corresponding to each image frame in layer i within one temporal reference unit in either of two realizable manners, but is not limited to these. Implementation one: the terminal device determines that number directly from the number of temporal layers and the code-rate ratios of the N-1 layers in one temporal reference unit. Implementation two: before doing so, the terminal device additionally obtains, for the temporal reference unit, a lower limit on the code rate of each image frame in each of the N-1 layers, where the code rate is that of the image frame relative to an image frame in the highest enhancement layer; the terminal device then determines the number of image frames in layer i+1 corresponding to each image frame in layer i according to these code-rate lower limits, the number of temporal layers, and the code-rate ratios of the N-1 layers in one temporal reference unit.
That is, implementation one does not involve the code-rate lower limits, while implementation two does.
The following is a detailed description of the first implementation:
Assume that the terminal device acquires the number of temporal layers N and the code-rate ratios of layers 1 through N-1 in one temporal reference unit, and assume that the code rate of any image frame in layer N is 1. For i = 1, 2, …, N-1, denote by $a_i$ the code rate of each image frame in layer $i$ relative to an image frame in layer N, by $p_i$ the code-rate ratio of layer $i$ in one temporal reference unit, and by $k_i$ the number of image frames in layer $i+1$ corresponding to each image frame in layer $i$ within that unit. Then $p_1, p_2, \ldots, p_{N-1}$ can be calculated by the following formula:

$$p_i = \frac{n_i a_i}{\sum_{j=1}^{N} n_j a_j}, \qquad i = 1, 2, \ldots, N-1,$$

where $a_N = 1$, and $n_i$ denotes the number of image frames included in layer $i$ of the temporal reference unit, with $n_1 = 1$ and $n_{i+1} = k_i n_i$.

It should be understood that, given $p_1, p_2, \ldots, p_{N-1}$, the above formulas determine $k_1, k_2, \ldots, k_{N-1}$, while $a_1, a_2, \ldots, a_{N-1}$ only need to satisfy being greater than 0.
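As an illustration of implementation one, the following sketch solves these equations numerically; it is a hypothetical helper (the round-to-integer step is an assumption, since each $k_i$ counts frames), using $k_i = p_{i+1} a_i / (p_i a_{i+1})$ with $a_N = 1$ and $p_N = 1 - \sum_{j<N} p_j$, which follows from dividing consecutive ratio equations:

```python
def solve_frame_counts(ratios, frame_rates):
    """Given p_1..p_{N-1} (per-layer code-rate ratios in one temporal
    reference unit) and chosen per-frame rates a_1..a_{N-1} (> 0, relative
    to a layer-N frame), return k_1..k_{N-1}: how many layer-(i+1) frames
    correspond to each layer-i frame."""
    p = list(ratios) + [1.0 - sum(ratios)]  # append p_N
    a = list(frame_rates) + [1.0]           # append a_N = 1
    return [max(1, round(p[i + 1] * a[i] / (p[i] * a[i + 1])))
            for i in range(len(ratios))]

# The fig. 3 unit: p1 ~ 0.36, p2 ~ 0.27, with a1 = 1.974 and a2 = 1.436
# (the Table 1 rates for reference distances 4 and 2), gives k = [1, 2].
print(solve_frame_counts([0.36, 0.27], [1.974, 1.436]))
```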
The following detailed description is directed to implementation two:
it should be understood that the code rate lower limit of an image frame referred to in this application refers to a lower limit of the code rate of the image frame, and the code rate of the image frame is the code rate of the image frame relative to the image frame in the highest enhancement layer. Of course, the code rate may also be a code rate relative to other image frames, for example: the code rate is relative to the image frame in the second last layer, as long as the reference values relative to the code rates of all the image frames are uniform, which is not limited in the present application.
Assume that the terminal device acquires the number of temporal layers N and the code-rate ratios of layers 1 through N-1 in one temporal reference unit, and assume that the code rate of any image frame in layer N is 1. As before, for i = 1, 2, …, N-1, denote by $a_i$ the code rate of each image frame in layer $i$ relative to an image frame in layer N, by $p_i$ the code-rate ratio of layer $i$ in one temporal reference unit, and by $k_i$ the number of image frames in layer $i+1$ corresponding to each image frame in layer $i$ within that unit; in addition, denote by $t_i$ the lower limit of the code rate of each image frame in layer $i$. Then $p_1, p_2, \ldots, p_{N-1}$ can be calculated by the same formula:

$$p_i = \frac{n_i a_i}{\sum_{j=1}^{N} n_j a_j}, \qquad i = 1, 2, \ldots, N-1,$$

and the following conditions need to be satisfied:

$$a_i \ge t_i, \qquad i = 1, 2, \ldots, N-1,$$

where $a_N = 1$, and $n_i$ denotes the number of image frames included in layer $i$ of the temporal reference unit, with $n_1 = 1$ and $n_{i+1} = k_i n_i$.

It should be understood that, given $p_1, p_2, \ldots, p_{N-1}$, the above formulas determine $k_1, k_2, \ldots, k_{N-1}$, while $a_1, a_2, \ldots, a_{N-1}$ may take any values satisfying the above conditions.
The following exemplifies the process of determining $k_1$ for the case where the number of temporal layers is 2, where $k_1$ denotes the number of image frames in layer 2 corresponding to each image frame in layer 1 within the temporal reference unit:

Illustratively, assume that the terminal device has acquired 2 temporal layers, that the code rate of any image frame in layer 2 is 1, that the code rate of each image frame in layer 1 relative to an image frame in layer 2 is $a_1$, that the code-rate ratio of layer 1 in one temporal reference unit is $p_1$, and that the lower limit of the code rate of each image frame in layer 1 is $t_1$. Then, with $n_1 = 1$ and $n_2 = k_1$, $p_1$ can be calculated by the following formula:

$$p_1 = \frac{a_1}{a_1 + k_1},$$

based on which it can be determined that

$$k_1 = \frac{a_1 (1 - p_1)}{p_1},$$

and the following condition needs to be satisfied: $a_1 \ge t_1$, so that

$$k_1 \ge \frac{t_1 (1 - p_1)}{p_1}.$$

Since $k_1$ takes integer values, taking $a_1 = t_1$ gives

$$k_1 = \left\lceil \frac{t_1 (1 - p_1)}{p_1} \right\rceil,$$

where $\lceil \cdot \rceil$ denotes rounding up.
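As a sketch, the two-layer result translates directly into code (the example numbers are assumptions, not values from the patent):

```python
from math import ceil

def enhancement_frames(p1, t1):
    """Two temporal layers: p1 is the base-layer code-rate ratio in one
    temporal reference unit, t1 the lower limit on a base-layer frame's
    code rate relative to an enhancement-layer frame."""
    return ceil(t1 * (1 - p1) / p1)  # k1, taking a1 = t1

# With p1 = 0.5 and t1 = 1.974 this yields k1 = 2, i.e. two enhancement
# frames per base-layer frame, the pattern of fig. 6.
print(enhancement_frames(p1=0.5, t1=1.974))  # -> 2
```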
The following exemplifies the process of determining $k_1$ and $k_2$ for the case where the number of temporal layers is 3, where $k_1$ denotes the number of image frames in layer 2 corresponding to each image frame in layer 1 within the temporal reference unit, and $k_2$ denotes the number of image frames in layer 3 corresponding to each image frame in layer 2:

Illustratively, assume that the terminal device has acquired 3 temporal layers, and that the code rate of any image frame in layer 3 is 1. The code rate of each image frame in layer 1 relative to an image frame in layer 3 is $a_1$, the code-rate ratio of layer 1 in one temporal reference unit is $p_1$, and the lower limit of the code rate of each image frame in layer 1 is $t_1$; the code rate of each image frame in layer 2 relative to an image frame in layer 3 is $a_2$, the code-rate ratio of layer 2 in one temporal reference unit is $p_2$, and the lower limit of the code rate of each image frame in layer 2 is $t_2$. Then, with $n_1 = 1$, $n_2 = k_1$ and $n_3 = k_1 k_2$, $p_1$ and $p_2$ can be calculated by the following formulas:

$$p_1 = \frac{a_1}{a_1 + k_1 a_2 + k_1 k_2}, \qquad p_2 = \frac{k_1 a_2}{a_1 + k_1 a_2 + k_1 k_2},$$

and the following conditions need to be satisfied: $a_1 \ge t_1$ and $a_2 \ge t_2$.

Dividing the second formula by the first gives

$$\frac{p_2}{p_1} = \frac{k_1 a_2}{a_1}, \qquad \text{i.e.,} \qquad k_1 = \frac{p_2 a_1}{p_1 a_2}.$$

Since the code-rate ratio of layer 3 in the temporal reference unit is

$$1 - p_1 - p_2 = \frac{k_1 k_2}{a_1 + k_1 a_2 + k_1 k_2},$$

dividing it by the formula for $p_2$ gives

$$\frac{1 - p_1 - p_2}{p_2} = \frac{k_2}{a_2}, \qquad \text{i.e.,} \qquad k_2 = \frac{a_2 (1 - p_1 - p_2)}{p_2}.$$

Since $k_1$ and $k_2$ take integer values and $a_1 \ge t_1$, $a_2 \ge t_2$, taking $a_1 = t_1$ and $a_2 = t_2$ finally determines

$$k_1 = \left\lceil \frac{p_2 t_1}{p_1 t_2} \right\rceil, \qquad k_2 = \left\lceil \frac{t_2 (1 - p_1 - p_2)}{p_2} \right\rceil.$$
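The three-layer case can be sketched the same way (again, the example values of $p_1$, $p_2$, $t_1$, $t_2$ are assumptions for illustration):

```python
from math import ceil

def frame_counts_3layer(p1, p2, t1, t2):
    """Three temporal layers: taking a1 = t1 and a2 = t2 in the equations
    above gives the integer frame counts k1 and k2."""
    k1 = ceil(p2 * t1 / (p1 * t2))
    k2 = ceil(t2 * (1 - p1 - p2) / p2)
    return k1, k2

# Assumed ratios and floors reproducing the k1 = 2, k2 = 1 layout of fig. 5.
print(frame_counts_3layer(p1=0.4, p2=0.35, t1=2.0, t2=1.2))  # -> (2, 1)
```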
Further, after the terminal device determines the number of image frames in layer i+1 corresponding to each image frame in layer i within one temporal reference unit, it may determine the temporal reference structure.
Exemplarily, fig. 5 is a schematic diagram of a temporal reference structure provided by an embodiment of the present application. As shown in fig. 5, it is a 3-layer temporal reference structure in which the number of layer-2 image frames corresponding to each layer-1 image frame in one temporal reference unit is 2; for example, the layer-2 image frames corresponding to P1(6) of layer 1 are P2(8) and P2(10), where the arrow points to the reference image frame, e.g., the reference image frame of P2(8) is P1(6). The number of layer-3 image frames corresponding to each layer-2 image frame in one temporal reference unit is 1; for example, the layer-3 image frame corresponding to P2(8) of layer 2 is P3(9).
Exemplarily, fig. 6 is a schematic diagram of another temporal reference structure provided by an embodiment of the present application. As shown in fig. 6, it is a 2-layer temporal reference structure in which the number of layer-2 image frames corresponding to each layer-1 image frame in one temporal reference unit is 2; for example, the layer-2 image frames corresponding to P1(3) of layer 1 are P2(4) and P2(5), where the arrow points to the reference image frame, e.g., the reference image frame of P2(4) is P1(3).
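One way to lay out a temporal reference unit from the determined counts is sketched below; the ordering (each layer-2 frame preceded by its layer-3 frames, and so on) is an assumption chosen to reproduce the frame layouts of figs. 5 and 6, not a rule stated by the patent:

```python
def build_unit_pattern(frame_counts):
    """Layer index (1-based) of each frame in one temporal reference unit,
    where frame_counts[i-1] = k_i, the number of layer-(i+1) frames per
    layer-i frame."""
    pattern = [1]
    def emit(layer):
        if layer > len(frame_counts):  # below the deepest layer: nothing
            return
        for _ in range(frame_counts[layer - 1]):
            emit(layer + 1)            # deeper-layer frames come first
            pattern.append(layer + 1)
    emit(1)
    return pattern

print(build_unit_pattern([2]))     # fig. 6: [1, 2, 2]
print(build_unit_pattern([2, 1]))  # fig. 5: [1, 3, 2, 3, 2]
```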
It should be noted that this application does not limit the manner in which the terminal device determines the temporal reference structure from the number of image frames in layer i+1 corresponding to each image frame in layer i within one temporal reference unit.
Further, after determining the temporal reference structure of the video sequence, the terminal device may decide, for each image frame, whether intra prediction or inter prediction is used, so as to obtain the prediction information of each coding unit (i.e., image block) in the frame; the prediction information is then subtracted from the original signal of the coding unit to obtain a residual signal. After prediction, the amplitude of the residual signal is much smaller than that of the original signal, and the residual signal is further transformed and quantized to obtain transform-quantized coefficients. Finally, the quantized coefficients and other indication information are encoded by entropy coding to obtain the code stream.
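A toy numeric illustration of that per-frame pipeline (prediction, residual, transform, quantization), a sketch only: `np.fft.rfft2` stands in for the real block transform, and entropy coding is omitted:

```python
import numpy as np

def encode_frame(frame, reference, qstep):
    """Toy version of the steps above: predict from the reference frame,
    transform and quantize the residual, and return the quantized
    coefficients (a real encoder would then entropy-code them)."""
    prediction = np.zeros_like(frame) if reference is None else reference
    residual = frame - prediction    # much smaller amplitude than the frame
    coeffs = np.fft.rfft2(residual)  # stand-in for the block transform
    return np.round(coeffs / qstep)  # quantization keeps few significant values
```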
To sum up, in the present application the terminal device may set the number of temporal layers and the code-rate ratio of each layer, i.e., limit the per-layer code rates; that is, different numbers of temporal layers and per-layer code-rate ratios may be set for different terminal devices or for the different networks they use. For example, for a terminal device with better performance, the base-layer code-rate ratio can be set higher than that of a terminal device with poorer performance; since the better-performing device can consume more codewords, its video smoothness can be ensured. For a terminal device with poorer performance, the base-layer code-rate ratio can be set lower; since such a device cannot consume as many codewords, its compression efficiency needs to be kept high. In short, by limiting the per-layer code-rate ratios, video compression efficiency and video smoothness can be balanced. Furthermore, the application can also specify a code-rate lower limit for image frames, which further ensures video smoothness.
It should be understood that, in actual image coding, the terminal device may encode each image frame at a certain code rate to avoid the problem of low compression efficiency. The determination of the code rate of each image frame is described below:
alternatively, the terminal device may determine the code rate of each image frame in the N-1 layer according to the time domain reference structure and the ratio of the code rates of the N-1 layer in one time domain reference unit, where the code rate is the code rate of each image frame relative to the image frame in the highest enhancement layer, as described above. Of course, the code rate may also be a code rate relative to other image frames, for example: the code rate is relative to the image frame in the second last layer, as long as the reference values relative to the code rates of all the image frames are uniform, which is not limited in the present application. Further, when the terminal device performs coding of the video sequence, for each image frame in the N-1 layer, the coding rate of the image frame may be used for coding.
Optionally, the terminal device may calculate according to the following formula
Figure 863649DEST_PATH_IMAGE001
Figure 464394DEST_PATH_IMAGE004
……
Figure 458895DEST_PATH_IMAGE007
Figure 68868DEST_PATH_IMAGE059
Wherein the content of the first and second substances,
Figure 832425DEST_PATH_IMAGE012
indicating the number of image frames included in the ith layer in the above time-domain reference unit,
Figure 869651DEST_PATH_IMAGE021
. In the formula
Figure 351448DEST_PATH_IMAGE014
Figure 997DEST_PATH_IMAGE005
……
Figure 884639DEST_PATH_IMAGE008
Is known, and
Figure 92767DEST_PATH_IMAGE003
Figure 61860DEST_PATH_IMAGE006
……
Figure 748056DEST_PATH_IMAGE009
has been determined by the terminal device, and therefore, by substituting these already-determined parameters into the above formula, the result can be obtained
Figure 486205DEST_PATH_IMAGE001
Figure 130813DEST_PATH_IMAGE004
……
Figure 321623DEST_PATH_IMAGE007
For example, if the number of time-domain layers is 2, the code rate of the base layer in one time-domain reference unit is equal to
Figure 811510DEST_PATH_IMAGE014
The number of image frames in the enhancement layer corresponding to the image frames in the base layer in one time domain reference unit is
Figure 404165DEST_PATH_IMAGE022
Code rate of each image frame in the base layer
Figure 954095DEST_PATH_IMAGE060
For another example, if the number of time domain layers is 3, the code rate ratio of the base layer in one time domain reference unit is $p_1$, the code rate ratio of the 2nd layer in one time domain reference unit is $p_2$, the number of image frames in the 2nd layer corresponding to each image frame in the base layer is $n_1$, and the number of image frames in the 3rd layer corresponding to each image frame in the 2nd layer is $n_2$, then the code rate of each image frame in the base layer is $k_1 = \dfrac{n_1 n_2 \cdot p_1}{1 - p_1 - p_2}$ and the code rate of each image frame in the 2nd layer is $k_2 = \dfrac{n_2 \cdot p_2}{1 - p_1 - p_2}$.
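As a worked illustration with assumed numbers (the ratios here are hypothetical, not from the patent): for $N = 3$, $p_1 = 0.4$, $p_2 = 0.3$ and $n_1 = n_2 = 2$, the formulas give $k_1 = \dfrac{2 \cdot 2 \cdot 0.4}{1 - 0.4 - 0.3} \approx 5.33$ and $k_2 = \dfrac{2 \cdot 0.3}{0.3} = 2$; that is, each base-layer frame is allocated roughly 5.33 times, and each 2nd-layer frame twice, the bits of a frame in the highest enhancement layer.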
In summary, in the present application, the terminal device may determine a code rate of each image frame in the time-domain reference structure, so as to perform encoding according to the respective code rates of the image frames.
Optionally, the terminal device may further determine whether there is a packet loss resistance requirement; if there is no such requirement, the time domain reference structure is not adjusted, and if there is, the time domain reference structure is adjusted. The terminal device may adjust the time domain reference structure in the following manner, but is not limited thereto: assuming that the number of image frames in the (i+1)-th layer corresponding to an image frame in the i-th layer in one time domain reference unit is greater than 1, then, for the last image frame among the several (i+1)-th-layer image frames corresponding to that i-th-layer image frame, the reference image frame of that last image frame may be adjusted to be an image frame in the i-th layer. For example: when the time domain is divided into two layers and strong resistance to network packet loss is required, a certain amount of Forward Error Correction (FEC) is added to layer 1 and the time domain reference structure shown in fig. 7 is selected; without a packet loss resistance requirement, the time domain reference structure shown in fig. 6 is adopted to maximize image quality and compression efficiency. Compared with fig. 6, in the time domain reference structure of fig. 7 the reference image frame of the last layer-2 image frame corresponding to each layer-1 image frame is an image frame in layer 1, for example: the reference image frame of P2(2) is P1(0), and the reference image frame of P2(5) is P1(3). For another example: if the time domain is divided into three layers and strong resistance to network packet loss is required, a certain amount of FEC is added to layer 1 and the time domain reference structure shown in fig. 8 is selected; without a packet loss resistance requirement, the time domain reference structure shown in fig. 5 is adopted to maximize image quality and compression efficiency. Compared with fig. 5, in the time domain reference structure of fig. 8 the reference image frame of the last layer-2 image frame corresponding to each layer-1 image frame is an image frame in layer 1, for example: the reference image frame of P2(4) is P1(0), and the reference image frame of P2(10) is P1(6).
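As a rough sketch of this adjustment (the data layout and names here are illustrative assumptions; only the re-pointing rule for the last frame of each group comes from the description above):

def adjust_for_packet_loss(ref, layer_of, anchor_of):
    """ref[f]:       reference frame index currently used by frame f
       layer_of[f]:  temporal layer of frame f (1 = base layer)
       anchor_of[f]: the lower-layer frame whose group frame f belongs to"""
    groups = {}
    for f in sorted(ref):                                  # frames in display order
        groups.setdefault((layer_of[f], anchor_of[f]), []).append(f)
    for (_, anchor), frames in groups.items():
        if len(frames) > 1:                                # only groups with several frames
            ref[frames[-1]] = anchor                       # last frame now references the layer below
    return ref

For the two-layer case this reproduces the fig. 7 behaviour: with ref = {1: 0, 2: 1, 4: 3, 5: 4}, layer_of = {1: 2, 2: 2, 4: 2, 5: 2} and anchor_of = {1: 0, 2: 0, 4: 3, 5: 3}, the call re-points frame 2 to frame 0 and frame 5 to frame 3, matching P2(2)→P1(0) and P2(5)→P1(3).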
To sum up, in the present application, the terminal device may determine whether there is a packet loss resistance requirement, leave the time domain reference structure unadjusted if there is none, and adjust it if there is one. In this way, while video compression efficiency and video smoothness are balanced, packet loss resistance, and hence video smoothness, can be better achieved.
Fig. 9 is a schematic diagram of a scalable video coding apparatus according to an embodiment of the present application. As shown in fig. 9, the scalable video coding apparatus includes:
a first obtaining module 901, configured to obtain a video sequence.
A second obtaining module 902, configured to obtain the number N of time domain layering layers of the video sequence and the code rate ratio of each of the N-1 layers in a time domain reference unit.
A first determining module 903, configured to determine a time domain reference structure of the video sequence according to the number of time domain layering layers and a ratio of code rates of N-1 layers in a time domain reference unit.
And an encoding module 904, configured to encode the video sequence according to the temporal reference structure to obtain an output code stream.
Optionally, the first determining module 903 is specifically configured to: determine, according to the number N of time domain layering layers and the code rate ratio of each of the N-1 layers in one time domain reference unit, the number of image frames in the (i+1)th layer corresponding to each image frame in the ith layer in one time domain reference unit, where i = 1, 2 … … N-1 and the 1st layer is the base layer; and determine the time domain reference structure of the video sequence according to the number of image frames corresponding to each image frame in the ith layer at the (i+1)th layer.
Optionally, the scalable video encoding apparatus further includes: a third obtaining module 904, configured to obtain, for the time domain reference unit, a code rate lower limit value of each image frame in the N-1 layers before the first determining module 903 determines the number of image frames in the (i+1)th layer corresponding to each image frame in the ith layer, where the code rate lower limit value is the lower limit of the code rate of each image frame relative to an image frame in the highest enhancement layer. Correspondingly, the first determining module 903 is specifically configured to: determine the number of image frames in the (i+1)th layer corresponding to each image frame in the ith layer in one time domain reference unit according to the code rate lower limit value of each image frame in the N-1 layers, the number N of time domain layering layers, and the code rate ratio of each of the N-1 layers in one time domain reference unit.
Optionally, the first determining module 903 is specifically configured to: if the number of time domain layering layers is 2, the code rate lower limit value of each image frame in the base layer is $q_1$, and the code rate ratio of the base layer in one time domain reference unit is $p_1$, then the number of image frames in the enhancement layer corresponding to each image frame in the base layer in one time domain reference unit is $n_1 = \left\lceil \dfrac{q_1 (1 - p_1)}{p_1} \right\rceil$, i.e. the smallest integer for which the per-frame code rate $k_1 = \dfrac{n_1 p_1}{1 - p_1}$ is not below $q_1$.
Optionally, the first determining module 903 is specifically configured to: if the number of time domain layering layers is 3, the code rate lower limit value of each image frame in the base layer is $q_1$, the code rate lower limit value of each image frame in the 2nd layer is $q_2$, the code rate ratio of the base layer in one time domain reference unit is $p_1$, and the code rate ratio of the 2nd layer in one time domain reference unit is $p_2$, then the number of image frames in the 2nd layer corresponding to each image frame in the base layer in one time domain reference unit is $n_1 = \left\lceil \dfrac{q_1 (1 - p_1 - p_2)}{p_1 \, n_2} \right\rceil$, and the number of image frames in the 3rd layer corresponding to each image frame in the 2nd layer in one time domain reference unit is $n_2 = \left\lceil \dfrac{q_2 (1 - p_1 - p_2)}{p_2} \right\rceil$; each count is thus the smallest integer for which the resulting per-frame code rates $k_1 = \dfrac{n_1 n_2 p_1}{1 - p_1 - p_2}$ and $k_2 = \dfrac{n_2 p_2}{1 - p_1 - p_2}$ are not below $q_1$ and $q_2$ respectively.
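The following is a sketch under a stated assumption: since the original formula images are not recoverable, each frame count is taken here as the smallest integer that meets its code rate lower limit (the ceilings, and the function names, are our assumptions):

import math

def frames_per_group_two_layers(p1, q1):
    # smallest n1 such that k1 = n1 * p1 / (1 - p1) >= q1
    return math.ceil(q1 * (1 - p1) / p1)

def frames_per_group_three_layers(p1, p2, q1, q2):
    rest = 1 - p1 - p2                        # code rate share of the highest layer
    n2 = math.ceil(q2 * rest / p2)            # smallest n2 with k2 = n2*p2/rest >= q2
    n1 = math.ceil(q1 * rest / (p1 * n2))     # smallest n1 with k1 = n1*n2*p1/rest >= q1
    return n1, n2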
Optionally, the scalable video encoding apparatus further includes: a second determining module 905, configured to, after the first determining module 903 determines the time domain reference structure of the video sequence according to the number of time domain layering layers and the code rate ratio of each of the N-1 layers in one time domain reference unit, determine the code rate of each image frame in the N-1 layers according to the time domain reference structure and the code rate ratio of each of the N-1 layers in one time domain reference unit, where the code rate is the code rate of each image frame relative to an image frame in the highest enhancement layer. Correspondingly, the encoding module 904 is specifically configured to: encode the video sequence according to the time domain reference structure and the code rate of each image frame in the N-1 layers to obtain an output code stream.
Optionally, the second determining module 905 is specifically configured to: if the number of time domain layering layers is 2, the code rate ratio of the base layer in one time domain reference unit is $p_1$, and the number of image frames in the enhancement layer corresponding to each image frame in the base layer in one time domain reference unit is $n_1$, then the code rate of each image frame in the base layer is $k_1 = \dfrac{n_1 p_1}{1 - p_1}$.
Optionally, the second determining module 905 is specifically configured to: if the number of time domain layering layers is 3, the code rate ratio of the base layer in one time domain reference unit is $p_1$, the code rate ratio of the 2nd layer in one time domain reference unit is $p_2$, the number of image frames in the 2nd layer corresponding to each image frame in the base layer in one time domain reference unit is $n_1$, and the number of image frames in the 3rd layer corresponding to each image frame in the 2nd layer in one time domain reference unit is $n_2$, then the code rate of each image frame in the base layer is $k_1 = \dfrac{n_1 n_2 p_1}{1 - p_1 - p_2}$ and the code rate of each image frame in the 2nd layer is $k_2 = \dfrac{n_2 p_2}{1 - p_1 - p_2}$.
Optionally, the scalable video encoding apparatus further includes: a judging module 906 and an adjusting module 907. The judging module 906 is configured to judge whether there is a packet loss resistance requirement after the first determining module 903 determines the time domain reference structure of the video sequence according to the number of time domain layering layers and the code rate ratio of each of the N-1 layers in one time domain reference unit. The adjusting module 907 is configured to adjust the time domain reference structure if there is a packet loss resistance requirement.
It is to be understood that apparatus embodiments and method embodiments may correspond to one another and that similar descriptions may refer to method embodiments. To avoid repetition, further description is omitted here. Specifically, the apparatus shown in fig. 9 may execute the method embodiment corresponding to fig. 4, and the foregoing and other operations and/or functions of each module in the apparatus are respectively for implementing corresponding flows in each method in fig. 4, and are not described herein again for brevity.
The apparatus of the embodiments of the present application is described above in connection with the drawings from the perspective of functional modules. It should be understood that the functional modules may be implemented by hardware, by instructions in software, or by a combination of hardware and software modules. Specifically, the steps of the method embodiments in the present application may be implemented by integrated logic circuits of hardware in a processor and/or instructions in the form of software, and the steps of the method disclosed in conjunction with the embodiments in the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. Alternatively, the software modules may be located in random access memory, flash memory, read only memory, programmable read only memory, electrically erasable programmable memory, registers, and the like, as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps in the above method embodiments in combination with hardware thereof.
Fig. 10 is a schematic block diagram of a terminal device provided in an embodiment of the present application.
As shown in fig. 10, the terminal device may include:
a memory 1010 and a processor 1020, the memory 1010 being adapted to store a computer program and to transfer the program code to the processor 1020. In other words, the processor 1020 can call and run the computer program from the memory 1010 to implement the method in the embodiment of the present application.
For example, the processor 1020 may be configured to perform the above-described method embodiments according to instructions in the computer program.
In some embodiments of the present application, the processor 1020 may include, but is not limited to:
general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like.
In some embodiments of the present application, the memory 1010 includes, but is not limited to:
volatile memory and/or non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
In some embodiments of the present application, the computer program can be partitioned into one or more modules that are stored in the memory 1010 and executed by the processor 1020 to perform the methods provided herein. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program in the terminal device.
As shown in fig. 10, the terminal device may further include:
a transceiver 1030, the transceiver 1030 being connectable to the processor 1020 or the memory 1010.
The processor 1020 may control the transceiver 1030 to communicate with other devices, and specifically, may transmit information or data to the other devices or receive information or data transmitted by the other devices. The transceiver 1030 may include a transmitter and a receiver. The transceiver 1030 may further include an antenna, and the number of antennas may be one or more.
It should be understood that the various components in the terminal device are connected by a bus system, wherein the bus system includes a power bus, a control bus and a status signal bus in addition to a data bus.
The present application also provides a computer storage medium having stored thereon a computer program which, when executed by a computer, enables the computer to perform the method of the above-described method embodiments. In other words, the present application also provides a computer program product containing instructions, which when executed by a computer, cause the computer to execute the method of the above method embodiments.
When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described in accordance with the embodiments of the present application occur, in whole or in part, when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the module is merely a logical division, and other divisions may be realized in practice, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. For example, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and all the changes or substitutions should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for scalable video coding, comprising:
acquiring a video sequence;
acquiring the number N of time domain layering layers of the video sequence and the code rate ratio of each of N-1 layers in a time domain reference unit, wherein N is an integer greater than 1;
for the time domain reference unit, acquiring a code rate lower limit value of each image frame in the N-1 layers, wherein the code rate lower limit value is the code rate lower limit value of each image frame relative to the image frame in the highest enhancement layer;
determining, according to the code rate lower limit value of each image frame in the N-1 layers, the number N of time domain layering layers, and the code rate ratio of each of the N-1 layers in one time domain reference unit, the number of image frames in the (i+1)th layer corresponding to each image frame in the ith layer in the one time domain reference unit, wherein i = 1, 2 … … N-1, and the 1st layer is a base layer;
determining a time domain reference structure of the video sequence according to the number of image frames in the (i+1)th layer corresponding to each image frame in the ith layer in the one time domain reference unit;
and coding the video sequence according to the time domain reference structure to obtain an output code stream.
2. The method according to claim 1, wherein the determining the number of image frames corresponding to each image frame in the i-th layer in one temporal reference unit at the i + 1-th layer according to the lower limit value of the code rate of each image frame in the N-1 layers, the number N of temporal layering layers, and the ratio of the code rates of the N-1 layers in the temporal reference unit respectively comprises:
if the number N of time domain layering layers is 2, the code rate lower limit value of each image frame in the base layer is $q_1$, and the code rate ratio of the base layer in the time domain reference unit is $p_1$, then the number of image frames in the enhancement layer corresponding to each image frame in the base layer in the one time domain reference unit is $n_1 = \left\lceil \dfrac{q_1 (1 - p_1)}{p_1} \right\rceil$.
3. The method according to claim 1, wherein the determining the number of image frames corresponding to each image frame in the i-th layer in one temporal reference unit at the i + 1-th layer according to the lower limit value of the code rate of each image frame in the N-1 layers, the number N of temporal layering layers, and the ratio of the code rates of the N-1 layers in the temporal reference unit respectively comprises:
if the number N of time domain layering layers is 3, the code rate lower limit value of each image frame in the base layer is $q_1$, the code rate lower limit value of each image frame in the 2nd layer is $q_2$, the code rate ratio of the base layer in the time domain reference unit is $p_1$, and the code rate ratio of the 2nd layer in the time domain reference unit is $p_2$, then the number of image frames in the 2nd layer corresponding to each image frame in the base layer in the one time domain reference unit is $n_1 = \left\lceil \dfrac{q_1 (1 - p_1 - p_2)}{p_1 \, n_2} \right\rceil$, and the number of image frames in the 3rd layer corresponding to each image frame in the 2nd layer in the one time domain reference unit is $n_2 = \left\lceil \dfrac{q_2 (1 - p_1 - p_2)}{p_2} \right\rceil$.
4. The method according to any one of claims 1-3, wherein after the determining of the temporal reference structure of the video sequence according to the number of image frames in the (i+1)th layer corresponding to each image frame in the i-th layer in the one temporal reference unit, the method further comprises:
determining a code rate of each image frame in the N-1 layers according to the time domain reference structure and the code rate ratio of each of the N-1 layers in one time domain reference unit, wherein the code rate is the code rate of each image frame relative to the image frame in the highest enhancement layer;
correspondingly, the encoding the video sequence according to the temporal reference structure to obtain an output code stream includes:
and coding the video sequence according to the time domain reference structure and the code rate of each image frame in the N-1 layers to obtain an output code stream.
5. The method of claim 4, wherein the determining the code rate for each image frame in the N-1 layers according to the temporal reference structure and the code rate ratio of each of the N-1 layers in one temporal reference unit comprises:
if the number N of time domain layering layers is 2, the code rate ratio of the base layer in the time domain reference unit is $p_1$, and the number of image frames in the enhancement layer corresponding to each image frame in the base layer in the one time domain reference unit is $n_1$, then the code rate of each image frame in the base layer is $k_1 = \dfrac{n_1 p_1}{1 - p_1}$.
6. The method of claim 4, wherein the determining the code rate for each image frame in the N-1 layers according to the temporal reference structure and the code rate ratio of each of the N-1 layers in one temporal reference unit comprises:
if the number N of time domain layering layers is 3, the code rate ratio of the base layer in the time domain reference unit is $p_1$, the code rate ratio of the 2nd layer in the time domain reference unit is $p_2$, the number of image frames in the 2nd layer corresponding to each image frame in the base layer in the one time domain reference unit is $n_1$, and the number of image frames in the 3rd layer corresponding to each image frame in the 2nd layer in the one time domain reference unit is $n_2$, then the code rate of each image frame in the base layer is $k_1 = \dfrac{n_1 n_2 p_1}{1 - p_1 - p_2}$ and the code rate of each image frame in the 2nd layer is $k_2 = \dfrac{n_2 p_2}{1 - p_1 - p_2}$.
7. The method according to any one of claims 1-3, wherein after the determining of the temporal reference structure of the video sequence according to the number of image frames in the (i+1)th layer corresponding to each image frame in the i-th layer in the one temporal reference unit, the method further comprises:
judging whether a packet loss resistance requirement exists or not;
and if the packet loss resistance requirement exists, adjusting the time domain reference structure.
8. A scalable video encoding apparatus, comprising:
the first acquisition module is used for acquiring a video sequence;
a second obtaining module, configured to obtain the number N of time domain layering layers of the video sequence and the code rate ratio of each of N-1 layers in a time domain reference unit;
a third obtaining module, configured to obtain, for the time-domain reference unit, a code rate lower limit value of each image frame in the N-1 layers, where the code rate lower limit value is a code rate lower limit value of each image frame relative to an image frame in a highest enhancement layer;
a first determining module, configured to determine, according to the code rate lower limit value of each image frame in the N-1 layers, the number N of time domain layering layers, and the code rate ratio of each of the N-1 layers in the time domain reference unit, the number of image frames in the (i+1)th layer corresponding to each image frame in the ith layer in the time domain reference unit, wherein i = 1, 2 … … N-1, and the 1st layer is a base layer; and determine a time domain reference structure of the video sequence according to the number of image frames in the (i+1)th layer corresponding to each image frame in the ith layer in the one time domain reference unit;
and the coding module is used for coding the video sequence according to the time domain reference structure so as to obtain an output code stream.
9. An encoding device, characterized by comprising:
a processor and a memory for storing a computer program, the processor for invoking and executing the computer program stored in the memory to perform the method of any one of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program which causes a computer to perform the method of any one of claims 1 to 7.
CN202110755288.3A 2021-07-05 2021-07-05 Scalable video coding method, apparatus, device and storage medium Active CN113259673B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110755288.3A CN113259673B (en) 2021-07-05 2021-07-05 Scalable video coding method, apparatus, device and storage medium


Publications (2)

Publication Number Publication Date
CN113259673A CN113259673A (en) 2021-08-13
CN113259673B true CN113259673B (en) 2021-10-15

Family

ID=77190603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110755288.3A Active CN113259673B (en) 2021-07-05 2021-07-05 Scalable video coding method, apparatus, device and storage medium

Country Status (1)

Country Link
CN (1) CN113259673B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1625265A (en) * 2003-12-01 2005-06-08 三星电子株式会社 Method and apparatus for scalable video encoding and decoding
EP2813020A1 (en) * 2012-02-11 2014-12-17 VID SCALE, Inc. Method and apparatus for video aware hybrid automatic repeat request
CN106982382A (en) * 2006-10-16 2017-07-25 维德约股份有限公司 For the signaling in gradable video encoding and perform time stage switching system and method
CN107592540A (en) * 2016-07-07 2018-01-16 腾讯科技(深圳)有限公司 A kind of video data handling procedure and device
WO2018145561A1 (en) * 2017-02-07 2018-08-16 腾讯科技(深圳)有限公司 Code rate control method, electronic device, and computer-readable storage medium
CN112468818A (en) * 2021-01-22 2021-03-09 腾讯科技(深圳)有限公司 Video communication realization method and device, medium and electronic equipment
CN112543329A (en) * 2019-09-23 2021-03-23 安讯士有限公司 Video encoding method and method for reducing file size of encoded video

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104125464A (en) * 2005-12-08 2014-10-29 维德约股份有限公司 Systems and methods for error resilience and random access in video communication systems
US9591316B2 (en) * 2014-03-27 2017-03-07 Intel IP Corporation Scalable video encoding rate adaptation based on perceived quality
CN109936744B (en) * 2017-12-19 2020-08-18 腾讯科技(深圳)有限公司 Video coding processing method and device and application with video coding function


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Hans L. Cycon et al., "Optimized Temporal Scalability for H.264 based Codecs and its Applications to Video Conferencing", 2010 IEEE 14th International Symposium on Consumer Electronics, 26 July 2010, pp. 1-5. *

Also Published As

Publication number Publication date
CN113259673A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
US10321138B2 (en) Adaptive video processing of an interactive environment
CN101010964B (en) Method and apparatus for using frame rate up conversion techniques in scalable video coding
US20190020888A1 (en) Compound intra prediction for video coding
US11647223B2 (en) Dynamic motion vector referencing for video coding
US8243117B2 (en) Processing aspects of a video scene
WO2023142716A1 (en) Encoding method and apparatus, real-time communication method and apparatus, device, and storage medium
KR20180069905A (en) Motion vector reference selection via reference frame buffer tracking
TWI689198B (en) Method and apparatus for encoding processing blocks of a frame of a sequence of video frames using skip scheme
CN113132728B (en) Coding method and coder
Wang et al. Bit-rate allocation for broadcasting of scalable video over wireless networks
CN113259673B (en) Scalable video coding method, apparatus, device and storage medium
US10820014B2 (en) Compound motion-compensated prediction
CN112004084B (en) Code rate control optimization method and system by utilizing quantization parameter sequencing
CN110731082B (en) Compression of groups of video frames using reverse ordering
US11218737B2 (en) Asymmetric probability model update and entropy coding precision
WO2023231775A1 (en) Filtering method, filtering model training method and related device
WO2023165487A1 (en) Feature domain optical flow determination method and related device
WO2021057478A1 (en) Video encoding and decoding method and related apparatus
CN109413446B (en) Gain control method in multiple description coding
CN117461314A (en) System and method for combining sub-block motion compensation and overlapped block motion compensation
JP2024513873A (en) Geometric partitioning with switchable interpolation filters
CN113905233A (en) Entropy decoding method based on audio video coding standard, readable medium and electronic device thereof
CN116916032A (en) Video encoding method, video encoding device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40051197

Country of ref document: HK