WO2024077616A1 - Encoding and decoding method, apparatus, device, and storage medium
Encoding and decoding method, apparatus, device, and storage medium
- Publication number: WO2024077616A1
- Application: PCT/CN2022/125490
- Authority: WO (WIPO, PCT)
Classifications
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
Definitions
- the present application relates to the field of image processing technology, and in particular to a coding and decoding method, apparatus, device, and storage medium.
- Multi-viewpoint video can be used as a visual media object.
- Multi-viewpoint video is an immersive media video captured by multiple cameras, containing different perspectives and supporting user interaction.
- a multi-view video coding and decoding technology uses a 2-D hierarchical coding structure to combine multi-view images into a video sequence, and uses existing video compression tools to compress the video sequence to obtain a compressed video stream.
- the video compression efficiency of this method is low.
- the embodiments of the present application provide a coding and decoding method, apparatus, device, and storage medium, which can better utilize the spatial position correlation between two-dimensionally distributed viewpoints, and help improve the video compression efficiency of a multi-viewpoint array.
- in a first aspect, the present application provides an encoding method, comprising: determining a hierarchical prediction structure of a multi-view array, wherein the hierarchical prediction structure includes a first level and a second level, the first level includes at least one viewpoint of the multi-view array, and the second level includes the other viewpoints except the first level;
- performing predictive encoding on the at least one viewpoint of the first level according to the hierarchical prediction structure to obtain a reference frame; and
- performing predictive encoding on the viewpoints of the second level according to the reference frame to obtain a reconstructed image.
- in a second aspect, an embodiment of the present application provides a decoding method, including: determining a hierarchical prediction structure of a multi-view array, wherein the hierarchical prediction structure includes a first level and a second level, the first level includes at least one viewpoint of the multi-view array, and the second level includes the other viewpoints except the first level;
- performing predictive decoding on the at least one viewpoint of the first level according to the hierarchical prediction structure to obtain a reference frame; and
- performing predictive decoding on the viewpoints of the second level according to the reference frame to obtain a reconstructed image.
- the present application provides a coding device for executing the method in the first aspect or its respective implementations.
- the coding device includes a functional unit for executing the method in the first aspect or its respective implementations.
- the present application provides a decoding device for executing the method in the second aspect or its respective implementations.
- the decoding device includes a functional unit for executing the method in the second aspect or its respective implementations.
- an encoder comprising a processor and a memory, wherein the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method in the first aspect or its implementations.
- a decoder comprising a processor and a memory, wherein the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method in the second aspect or its implementations.
- a coding and decoding system including an encoder and a decoder.
- the encoder is used to execute the method in the first aspect or its respective implementations
- the decoder is used to execute the method in the second aspect or its respective implementations.
- a chip for implementing the method in any one of the first to second aspects or their respective implementations.
- the chip includes: a processor for calling and running a computer program from a memory, so that a device equipped with the chip executes the method in any one of the first to second aspects or their respective implementations.
- a computer-readable storage medium for storing a computer program, wherein the computer program enables a computer to execute the method of any one of the first to second aspects or any of their implementations.
- a computer program product comprising computer program instructions, which enable a computer to execute the method in any one of the first to second aspects or their respective implementations.
- a computer program which, when executed on a computer, enables the computer to execute the method in any one of the first to second aspects or in each of their implementations.
- a code stream is provided, which is generated based on the method of the first aspect.
- according to the hierarchical prediction structure of the multi-viewpoint array, the embodiment of the present application can, during the frame encoding process, first predictively encode at least one viewpoint of the first level to obtain a reference frame, and then predictively encode the viewpoints of the second level according to the reference frame to obtain a reconstructed image. In this way, a reference relationship can be established between viewpoints at different positions (such as different rows or columns) of the second level and the first level, and the spatial position correlation between two-dimensionally distributed viewpoints can be better utilized, which helps to improve the video compression efficiency of the multi-viewpoint array.
- the hierarchical prediction structure of the embodiment of the present application can include the reference position of the viewpoint in the multi-view array, without the need for additional means, such as establishing a position lookup table to supplement the reference position of each viewpoint, which helps to further improve the compression efficiency of the multi-view array.
- FIG1 is a schematic block diagram of a video encoding and decoding system according to an embodiment of the present application.
- FIG2A is a schematic block diagram of a video encoder according to an embodiment of the present application.
- FIG2B is a schematic block diagram of a video decoder according to an embodiment of the present application.
- FIG3A is a schematic diagram of a multi-view array.
- FIG3B is a schematic diagram of the encoding order of a multi-view array.
- FIG4 is a schematic diagram of a coding method flow chart provided by an embodiment of the present application.
- FIG5A is a schematic diagram of the display order (DO) of a multi-view array provided in an embodiment of the present application.
- FIG5B is a schematic diagram of a hierarchical prediction structure of a multi-view array provided by an embodiment of the present application.
- FIG6 is a schematic diagram of an encoding process provided by another embodiment of the present application.
- FIG7 is a schematic diagram of the encoding order (EO) of a multi-view array provided in an embodiment of the present application.
- FIG8 is a schematic diagram of the encoding order (EO) of a multi-view array provided by another embodiment of the present application.
- FIG9 is a schematic diagram of a decoding method flow chart provided in an embodiment of the present application.
- FIG10 is a schematic block diagram of an encoding device provided in an embodiment of the present application.
- FIG11 is a schematic block diagram of a decoding device provided in an embodiment of the present application.
- FIG12 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
- the present application can be applied to the field of image coding and decoding, the field of video coding and decoding, the field of hardware video coding and decoding, the field of dedicated circuit video coding and decoding, the field of real-time video coding and decoding, etc.
- the scheme of the present application can be combined with video coding standards, such as the audio video coding standard (AVS), the H.264/advanced video coding (AVC) standard, the H.265/high efficiency video coding (HEVC) standard, and the H.266/versatile video coding (VVC) standard.
- the scheme of the present application can be combined with other proprietary or industry standards for operation, and the standards include ITU-TH.261, ISO/IEC MPEG-1 Visual, ITU-TH.262 or ISO/IEC MPEG-2 Visual, ITU-TH.263, ISO/IEC MPEG-4 Visual, ITU-TH.264 (also known as ISO/IEC MPEG-4 AVC), including scalable video coding (SVC) and multi-view video coding (MVC) extensions.
- the encoding involved in the embodiments of the present application mainly refers to video encoding and decoding.
- the video encoding and decoding system involved in the embodiment of the present application is first introduced in conjunction with Figure 1.
- FIG1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application. It should be noted that FIG1 is only an example, and the video encoding and decoding system of the embodiment of the present application includes but is not limited to that shown in FIG1.
- the video encoding and decoding system 100 includes an encoding device 110 and a decoding device 120.
- the encoding device is used to encode (which can be understood as compression) the video data to generate a code stream, and transmit the code stream to the decoding device.
- the decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
- the encoding device 110 of the embodiment of the present application can be understood as a device with a video encoding function
- the decoding device 120 can be understood as a device with a video decoding function; that is, the encoding device 110 and the decoding device 120 in the embodiments of the present application cover a wide range of devices, such as smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, etc.
- the encoding device 110 may transmit the encoded video data (e.g., a code stream) to the decoding device 120 via the channel 130.
- the channel 130 may include one or more media and/or devices capable of transmitting the encoded video data from the encoding device 110 to the decoding device 120.
- the channel 130 includes one or more communication media that enable the encoding device 110 to transmit the encoded video data directly to the decoding device 120 in real time.
- the encoding device 110 can modulate the encoded video data according to the communication standard and transmit the modulated video data to the decoding device 120.
- the communication medium includes a wireless communication medium, such as a radio frequency spectrum, and optionally, the communication medium may also include a wired communication medium, such as one or more physical transmission lines.
- the channel 130 includes a storage medium that can store the video data encoded by the encoding device 110.
- the storage medium includes a variety of locally accessible data storage media, such as an optical disk, a DVD, a flash memory, etc.
- the decoding device 120 can obtain the encoded video data from the storage medium.
- the channel 130 may include a storage server that can store the video data encoded by the encoding device 110.
- the decoding device 120 can download the stored encoded video data from the storage server.
- the storage server, such as a web server (e.g., for a website) or a file transfer protocol (FTP) server, can store the encoded video data and transmit the encoded video data to the decoding device 120.
- the encoding device 110 includes a video encoder 112 and an output interface 113.
- the output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
- the encoding device 110 may further include a video source 111 in addition to the video encoder 112 and the output interface 113.
- the video source 111 may include at least one of a video acquisition device (eg, a camera), a video archive, a video input interface, and a computer graphics system, wherein the video input interface is used to receive video data from a video content provider, and the computer graphics system is used to generate video data.
- the video encoder 112 encodes the video data from the video source 111 to generate a bitstream.
- the video data may include one or more pictures or a sequence of pictures.
- the bitstream contains the encoding information of the picture or the sequence of pictures in the form of a bitstream.
- the video encoder 112 transmits the encoded video data directly to the decoding device 120 via the output interface 113.
- the encoded video data may also be stored in a storage medium or a storage server for subsequent reading by the decoding device 120.
- the decoding device 120 includes an input interface 121 and a video decoder 122 .
- the decoding device 120 may include a display device 123 in addition to the input interface 121 and the video decoder 122 .
- the input interface 121 includes a receiver and/or a modem.
- the input interface 121 can receive the encoded video data through the channel 130 .
- the video decoder 122 is used to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123 .
- the decoded video data is displayed on the display device 123.
- the display device 123 may be integrated with the decoding device 120 or external to the decoding device 120.
- the display device 123 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
- FIG1 is only an example, and the technical solution of the embodiment of the present application is not limited to FIG1 .
- the technology of the present application can also be applied to one-sided video encoding or one-sided video decoding.
- FIG. 2A is a schematic block diagram of a video encoder according to an embodiment of the present application.
- the video encoder 200 may include a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filter unit 260, a decoded image buffer 270, and an entropy coding unit 280. It should be noted that the video encoder 200 may include more, fewer, or different functional components.
- the prediction unit 210 includes an inter-frame prediction unit 211 and an intra-frame estimation unit 212. Since there is a strong correlation between adjacent pixels in a frame of a video, the intra-frame prediction method is used in the video coding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Since there is a strong similarity between adjacent frames in a video, the inter-frame prediction method is used in the video coding and decoding technology to eliminate the temporal redundancy between adjacent frames, thereby improving the coding efficiency.
- the inter-frame prediction unit 211 can be used for inter-frame prediction, which may include motion estimation and motion compensation. It may refer to image information of different frames. Inter-frame prediction uses motion information to find a reference block from a reference frame, and generates a prediction block based on the reference block to eliminate temporal redundancy.
- the frames used for inter-frame prediction may be P frames and/or B frames, where P frames refer to forward prediction frames and B frames refer to bidirectional prediction frames.
- the intra-frame estimation unit 212 only refers to the information of the same frame image to predict the pixel information in the current image block to eliminate spatial redundancy.
- the frame used for intra-frame prediction may be an I frame.
- FIG. 2B is a schematic block diagram of a video decoder according to an embodiment of the present application.
- the video decoder 300 includes an entropy decoding unit 310, a prediction unit 320, an inverse quantization/transformation unit 330, a reconstruction unit 340, a loop filter unit 350, and a decoded image buffer 360.
- the prediction unit 320 includes an inter-frame prediction unit 321 and an intra-frame estimation unit 322. It should be noted that the video decoder 300 may include more, fewer, or different functional components.
- the video decoder 300 may receive a bitstream.
- the entropy decoding unit 310 may parse the bitstream to extract syntax elements from the bitstream. As part of parsing the bitstream, the entropy decoding unit 310 may parse the syntax elements in the bitstream that have been entropy encoded.
- the prediction unit 320, the inverse quantization/transformation unit 330, the reconstruction unit 340, and the loop filter unit 350 may decode the video data according to the syntax elements extracted from the bitstream, that is, generate decoded video data.
- the basic process of video encoding and decoding is as follows: at the encoding end, a frame of image is divided into blocks, and for the current block, the prediction unit 210 uses intra-frame prediction or inter-frame prediction to generate a prediction block of the current block.
- the residual unit 220 can calculate a residual block based on the prediction block and the original block of the current block, that is, the difference between the prediction block and the original block of the current block; the residual block is also called residual information.
- through the transformation and quantization process of the transformation/quantization unit 230, the residual block can remove information to which the human eye is not sensitive, so as to eliminate visual redundancy.
- the residual block before transformation and quantization by the transformation/quantization unit 230 can be called a time domain residual block, and the time domain residual block after transformation and quantization by the transformation/quantization unit 230 can be called a frequency residual block or a frequency domain residual block.
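To make the residual/transform/quantization pipeline described above concrete, here is a minimal Python sketch of the round trip for a single block. It is a sketch under stated assumptions: a 2-D DCT stands in for the codec's actual transform, and the uniform quantization step and all function names are our own illustration, not the patent's normative design.

```python
import numpy as np
from scipy.fft import dctn, idctn  # 2-D DCT as a stand-in for the codec transform

def encode_block(original, prediction, qstep=16):
    # time-domain residual block: original block minus prediction block
    residual = original.astype(np.int32) - prediction.astype(np.int32)
    # frequency-domain residual block after the transform
    coeffs = dctn(residual, norm="ortho")
    # uniform quantization: the lossy step that discards detail
    # to which the human eye is not sensitive
    return np.round(coeffs / qstep).astype(np.int32)

def decode_block(prediction, quantized, qstep=16):
    coeffs = quantized * qstep              # inverse quantization
    residual = idctn(coeffs, norm="ortho")  # inverse transform
    # reconstructed block = prediction block + residual block
    return np.clip(np.round(prediction + residual), 0, 255).astype(np.uint8)
```

The decode_block path corresponds to what the reconstruction unit performs at both ends, which is why the encoder and decoder can obtain identical reference images.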
- the entropy coding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230, and can entropy encode the quantized transform coefficients and output a bit stream. For example, the entropy coding unit 280 can eliminate character redundancy according to the target context model and the probability information of the binary code stream.
- the entropy decoding unit 310 can parse the code stream to obtain the prediction information, quantization coefficient matrix, etc. of the current block.
- the prediction unit 320 uses intra-frame prediction or inter-frame prediction to generate a prediction block of the current block based on the prediction information.
- the inverse quantization/transformation unit 330 uses the quantization coefficient matrix obtained from the code stream to inverse quantize and inverse transform the quantization coefficient matrix to obtain a residual block.
- the reconstruction unit 340 adds the prediction block and the residual block to obtain a reconstructed block.
- the reconstructed blocks constitute a reconstructed image, and the loop filtering unit 350 performs loop filtering on the reconstructed image based on the image or on the block to obtain a decoded image.
- the encoding end also requires similar operations as the decoding end to obtain a decoded image.
- the decoded image can also be called a reconstructed image, and the reconstructed image can be used as a reference frame for inter-frame prediction of subsequent frames.
- the block division information determined by the encoder as well as the mode information or parameter information such as prediction, transformation, quantization, entropy coding, loop filtering, etc., are carried in the bitstream when necessary.
- the decoder parses the bitstream and determines the same block division information, prediction, transformation, quantization, entropy coding, loop filtering, etc. mode information or parameter information as the encoder by analyzing the existing information, thereby ensuring that the decoded image obtained by the encoder is the same as the decoded image obtained by the decoder.
- the above is the basic process of the video codec under the block-based hybrid coding framework. With the development of technology, some modules or steps of the framework or process may be optimized. The present application is applicable to the basic process of the video codec under the block-based hybrid coding framework, but is not limited to the framework and process.
- multi-viewpoint video can appear in a 3D scene.
- Multi-viewpoint video is an immersive media video captured by multiple cameras, containing different perspectives and supporting user interaction. It is also called multi-view video, free-viewpoint video, etc.
- Multi-view video is usually obtained by shooting the same three-dimensional scene from multiple angles by a camera array.
- the multiple cameras in the camera array are properly positioned during the shooting process so that each camera can capture the scene from a viewpoint.
- the images obtained by multiple cameras are called multi-view images.
- Multi-view images can form a multi-view image array according to the spatial position relationship, which can also be called a multi-view array.
- multiple cameras will capture multiple video sequences corresponding to multiple viewpoints.
- as more cameras are used, the generated multi-viewpoint video contains a larger number of viewpoint-related video sequences.
- the video needs to be compressed and encoded.
- the video compression algorithm can be completed by AVS3 encoding technology, HEVC encoding technology, etc.
- a multi-view video encoding and decoding technology uses a 2-D hierarchical coding structure to compose a multi-view array into a video sequence (or image sequence). Then, the video sequence is compressed using an existing video compression tool to obtain a compressed video code stream.
- the video source 111 can compose a video sequence from a multi-view image array obtained by a video acquisition device, and then input the video sequence as video data into a video encoder 112, which encodes the video sequence to generate a code stream.
- the key to the above scheme is to determine the frame coding order of each multi-view image in the multi-view array, which corresponds to the order of the video sequence. Specifically, during the frame coding process, the scheme extends the 1-D hierarchical coding structure used for ordinary video coding to a two-dimensional case.
- a classic encoding order is 0, 16, 8, 4, 2, 1, 3, 6, 5, 7, 12, 10, 9, 11, 14, 13, 15.
- the 0th frame image can be an I frame, and each coding unit can only use the information of the current frame image for prediction; the 16th frame can be a P frame, which can use the forward inter-frame information for prediction; the remaining 1 to 15 frames can support bidirectional prediction.
- This encoding order can reduce the storage occupation of the reference frame in the buffer.
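The classic order above can be generated by a simple dyadic midpoint recursion, shown in the Python sketch below. The function name is our own; this illustrates the 1-D hierarchical coding structure rather than reproducing code from the patent.

```python
def hierarchical_order(gop_size=16):
    # frame 0 is the I frame and frame gop_size the P frame; every other
    # frame is a B frame coded at the midpoint of an already-coded
    # interval, so both of its references are available when it is coded.
    order = [0, gop_size]

    def visit(lo, hi):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append(mid)
        visit(lo, mid)
        visit(mid, hi)

    visit(0, gop_size)
    return order

assert hierarchical_order(16) == [0, 16, 8, 4, 2, 1, 3, 6, 5, 7,
                                  12, 10, 9, 11, 14, 13, 15]
```

Note that hierarchical_order(8) yields 0, 8, 4, 2, 1, 3, 6, 5, 7, which in 1-based terms is exactly the row order 1, 9, 5, 3, 2, 4, 7, 6, 8 that the row-by-row encoding described later applies.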
- FIG3A shows a specific example of a multi-view array.
- the multi-view array includes 165 viewpoints, and each viewpoint is numbered as shown in FIG3A, wherein the central viewpoint is numbered 0, and the remaining viewpoints are numbered row by row from 1 to 164. These numbers are called viewpoint sequence numbers (picture order count, POC).
- the existing two-dimensional hierarchical coding structure scheme divides all viewpoints (i.e., the multi-view array) into four parts for encoding, as shown by the dotted lines in FIG3A.
- the aforementioned one-dimensional hierarchical coding structure is used for encoding in the horizontal and vertical directions respectively. Specifically, the 0th row is encoded first, followed by the 6th row, then the 3rd row, and so on. For the internal encoding of each row, the 0th column is encoded first, then the 6th column, then the 3rd column, and so on.
- the above scheme extends the one-dimensional hierarchical coding structure used for ordinary video coding to the two-dimensional case during the frame coding process. It is a simple imitation and extension of the one-dimensional hierarchical coding structure, ignoring the spatial position correlation of each viewpoint in the multi-view array on the two-dimensional plane, which will reduce the video compression efficiency of the multi-view array. At the same time, the scheme needs to use additional means, such as establishing a position lookup table to supplement the reference position of each viewpoint.
- the embodiment of the present application determines the hierarchical prediction structure of the multi-viewpoint array, and in the frame encoding process first predicts and encodes at least one viewpoint of the first level to obtain a reference frame, and then predicts and encodes the viewpoint of the second level according to the reference frame to obtain a reconstructed image, so that a reference relationship can be established between viewpoints at different positions of the second level and the first level (such as different rows or columns), and the spatial position correlation between viewpoints distributed in two dimensions can be better utilized, which helps to improve the video compression efficiency of the multi-viewpoint array.
- the video encoding method provided in the embodiment of the present application is introduced by taking the encoding end as an example.
- FIG. 4 is a schematic flow chart of an encoding method 400 provided in an embodiment of the present application. As shown in FIG. 4 , the method 400 in the embodiment of the present application includes:
- S401 determining a hierarchical prediction structure of a multi-view array, the hierarchical prediction structure comprising a first level and a second level, the first level comprising at least one view of the multi-view array, and the second level comprising other viewpoints except the first level.
- multi-viewpoint video can be used as the visual media object in scenes such as virtual reality (VR), augmented reality (AR), and mixed reality (MR).
- Multi-view video is obtained by shooting the same 3D scene from multiple angles using multiple cameras (such as a camera array).
- the images obtained by multiple cameras are called multi-view images.
- Multi-view images can form a multi-view array according to the spatial position relationship.
- Each viewpoint in the multi-view array has horizontal and vertical parallax.
- the multi-view array may be obtained by arranging dense multi-view images according to a spatial position relationship, that is, the multi-view images are densely arranged.
- the multi-view array may include a centrally symmetric multi-view array, such as a square multi-view array.
- the multi-view array may include multi-view images acquired by multiple cameras at the same time or at different times, which is not limited in the present application.
- a hierarchical prediction structure (HPS) of a multi-view array can be determined. Since the multi-view array is formed based on the spatial position relationship of the multi-view images, the spatial position relationship of each level in the hierarchical prediction structure is different. For example, the spatial position of the viewpoint of the first level is different from that of the viewpoint of the second level. In other words, the hierarchical prediction structure can contain spatial position information between multiple views.
- the first-level viewpoints may be basic viewpoints in a multi-viewpoint array, such as relatively important viewpoints, and may provide reference information for the second-level viewpoints as a reference for the second-level viewpoints.
- the shooting angle of the camera corresponding to each image in the multi-view array is different, and each viewpoint position is related to the shooting angle of the corresponding camera, so the hierarchical prediction structure obtained for the multi-view array is related to the shooting angle of the camera.
- the hierarchical prediction structure can also be called an angular hierarchy prediction structure (AHPS), which is not limited in this application.
- the multi-view array may also be referred to as a multi-view image array, a multi-viewpoint array, etc., which is not limited in this application.
- the multi-viewpoint images in the multi-viewpoint array can be rearranged into a video sequence (or image sequence).
- the display order (DO), also known as picture order count (POC), refers to the sequential index of each viewpoint image in the video sequence.
- for example, the central viewpoint image is designated as the first frame in the video sequence and is numbered 0 (i.e., DO#0), and the remaining viewpoint images are assigned DOs in sequence from left to right and from top to bottom.
- FIG. 5A shows a specific example of DO for a 9 ⁇ 9 multi-viewpoint array. Each square in FIG. 5A represents a viewpoint, and the number in the square is the DO of the corresponding viewpoint.
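The DO assignment just described (center first, then row-major numbering) can be written down directly; the following Python sketch, with names of our own choosing, reproduces the numbering of FIG. 5A for the 9×9 case.

```python
import numpy as np

def display_order(n=9):
    # central viewpoint gets DO#0; the remaining viewpoints are numbered
    # 1, 2, ... row by row, from left to right and from top to bottom
    do = np.empty((n, n), dtype=int)
    center = (n // 2, n // 2)
    do[center] = 0
    k = 1
    for r in range(n):
        for c in range(n):
            if (r, c) != center:
                do[r, c] = k
                k += 1
    return do

do = display_order(9)
# spot checks against FIG. 5A: center, top-left corner, left-middle, bottom-right
assert do[4, 4] == 0 and do[0, 0] == 1 and do[4, 0] == 37 and do[8, 8] == 80
```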
- the first level includes a first sub-level.
- the first sub-level includes a central viewpoint. Because the central viewpoint is located at the center of the multi-view array, it has the smallest average parallax with all viewpoints in the multi-view array and can provide more and more accurate reference information, which can be used as a reference for subsequent frames.
- the first level may further include a second sub-level.
- the second sub-level includes at least two viewpoints uniformly distributed in the multi-view array.
- the second sub-level may include at least two viewpoints uniformly distributed in the multi-view array in a sparse manner.
- the first level such as the first sub-level and the second sub-level
- the first level is the most basic constituent element of the hierarchical prediction structure, and can be the basic viewpoint in the multi-viewpoint array, serving as a reference for subsequent frames (such as frames corresponding to the second-level viewpoints), thereby being able to utilize the spatial position correlation between viewpoints distributed in two dimensions during the frame encoding process.
- the second sub-level may include at least one of a plurality of viewpoints evenly distributed on edge lines (i.e., edge positions) of the multi-viewpoint array, and viewpoints located at intermediate positions between the center viewpoint and the vertex viewpoint on the diagonal lines (i.e., diagonal positions) of the multi-viewpoint array. It should be noted that the second sub-level does not include the center viewpoint.
- the edge line of the multi-view array may include at least one of the leftmost column (such as the first column), the rightmost column (the last column), the topmost row (the first row), and the bottommost row (the last row) of the multi-view array.
- the diagonal line of the multi-view array may include at least one of the first connecting line between the upper left vertex viewpoint and the lower right vertex viewpoint of the multi-view array and the second connecting line between the upper right vertex viewpoint and the lower left vertex viewpoint.
- the center viewpoint is located at the intersection of the first connecting line and the second connecting line.
- Fig. 5B shows a specific example of a hierarchical prediction structure of a 9 ⁇ 9 multi-view array.
- the first level includes a first sub-level H0 and a second sub-level H1.
- the first sub-level H0 includes a central viewpoint, DO#0, located at the center of the multi-view array.
- the second sub-level H1 includes 8 viewpoints evenly distributed on the edge lines of the multi-view array, DO#1, DO#5, DO#9, DO#37, DO#44, DO#72, DO#76, DO#80, and 4 viewpoints evenly distributed between the center viewpoint and the 4 vertex viewpoints on the two diagonals of the multi-view array, such as viewpoint DO#21 located in the middle between the center viewpoint DO#0 and the vertex viewpoint DO#1, viewpoint DO#25 located in the middle between the center viewpoint DO#0 and the vertex viewpoint DO#9, viewpoint DO#56 located in the middle between the center viewpoint DO#0 and the vertex viewpoint DO#72, and viewpoint DO#60 located in the middle between the center viewpoint DO#0 and the vertex viewpoint DO#80, for a total of 12 viewpoints.
- the 12 viewpoints in the second sub-level H1 are sparse and evenly distributed in the view.
- the second level includes a third sub-level
- the third sub-level includes at least two viewpoints located between the first sub-level and the second sub-level on the horizontal center axis and the vertical center axis of the multi-viewpoint array.
- the viewpoints of the third sub-level have the function of connecting the first sub-level and the second sub-level.
- the second level includes the third sub-level H2
- the third sub-level H2 includes the viewpoint DO#39 between the viewpoint DO#0 of the first sub-level H0 and the viewpoint DO#37 of the second sub-level H1 on the horizontal center axis of the multi-viewpoint array, the viewpoint DO#42 between the viewpoint DO#0 of the first sub-level H0 and the viewpoint DO#44 of the second sub-level H1, the viewpoint DO#23 between the viewpoint DO#0 of the first sub-level H0 and the viewpoint DO#5 of the second sub-level H1 on the vertical center axis, and the viewpoint DO#58 between the viewpoint DO#0 of the first sub-level H0 and the viewpoint DO#76 of the second sub-level H1, for a total of four viewpoints.
- the second level further includes a fourth sub-level, the fourth sub-level including viewpoints between viewpoints of the first sub-level located on the edge line of the multi-view array, and at least two viewpoints between the third sub-level and the first sub-level and between the third sub-level and the second sub-level on the horizontal center axis and the vertical center axis of the multi-view array.
- the viewpoints of the fourth sub-level have the function of filling the gap between the third sub-level and the second sub-level.
- the second level also includes a fourth sub-level H3.
- the fourth sub-level H3 includes 16 viewpoints, which are:
- 8 viewpoints on the edge lines: viewpoint DO#3 between viewpoints DO#1 and DO#5, viewpoint DO#7 between viewpoints DO#5 and DO#9, viewpoint DO#27 between viewpoints DO#9 and DO#44, viewpoint DO#62 between viewpoints DO#44 and DO#80, viewpoint DO#78 between viewpoints DO#80 and DO#76, viewpoint DO#74 between viewpoints DO#76 and DO#72, viewpoint DO#54 between viewpoints DO#72 and DO#37, and viewpoint DO#19 between viewpoints DO#37 and DO#1;
- 4 viewpoints on the horizontal center axis: viewpoint DO#40 between viewpoint DO#39 of the third sub-level H2 and viewpoint DO#0 of the first sub-level H0, viewpoint DO#41 between viewpoint DO#42 of the third sub-level H2 and viewpoint DO#0 of the first sub-level H0, viewpoint DO#38 between viewpoint DO#39 of the third sub-level H2 and viewpoint DO#37 of the second sub-level H1, and viewpoint DO#43 between viewpoint DO#42 of the third sub-level H2 and viewpoint DO#44 of the second sub-level H1; and
- 4 viewpoints on the vertical center axis: viewpoint DO#32 between viewpoint DO#23 of the third sub-level H2 and viewpoint DO#0 of the first sub-level H0, viewpoint DO#49 between viewpoint DO#58 of the third sub-level H2 and viewpoint DO#0 of the first sub-level H0, viewpoint DO#14 between viewpoint DO#23 of the third sub-level H2 and viewpoint DO#5 of the second sub-level H1, and viewpoint DO#67 between viewpoint DO#58 of the third sub-level H2 and viewpoint DO#76 of the second sub-level H1.
- the second level also includes a fifth sub-level, which includes at least two viewpoints located on an edge line of the multi-view array between the second sub-level and the fourth sub-level, as well as, on the rows of the multi-view array other than the edge lines and the horizontal central axis, viewpoints between the third sub-level and the second sub-level and between the second sub-level and the fourth sub-level.
- the second level also includes a fifth sub-level H4.
- the fifth sub-level H4 includes 24 viewpoints, which are:
- Viewpoints between the viewpoints of the second sub-level H1 and the viewpoints of the fourth sub-level H3 on the edge line of the multi-view array such as DO#2, DO#4, DO#6, DO#8, DO#18, DO#36, DO#53, DO#71, DO#79, DO#77, DO#75, DO#73, DO#63, DO#45, DO#28, DO#10, etc., a total of 16 viewpoints;
- viewpoints on the third row: the viewpoints between the viewpoints of the third sub-level H2 and the viewpoints of the second sub-level H1, such as DO#22 and DO#24, and the viewpoints between the viewpoints of the second sub-level H1 and the viewpoints of the fourth sub-level H3, such as DO#20 and DO#26, for a total of 4 viewpoints; and
- viewpoints on the seventh row: the viewpoints between the viewpoints of the third sub-level H2 and the viewpoints of the second sub-level H1, such as DO#57 and DO#59, and the viewpoints between the viewpoints of the second sub-level H1 and the viewpoints of the fourth sub-level H3, such as DO#55 and DO#61, for a total of 4 viewpoints.
- the viewpoints of the fourth sub-level H3 and the fifth sub-level H4 cover nearly half of the dense multi-view image of the multi-view array. Therefore, when the viewpoints of the fourth sub-level H3 and the fifth sub-level H4 are inter-frame prediction coded with reference to the viewpoints of the first sub-level or the second sub-level, it can help save bit rate. In other words, the fourth sub-level H3 and the fifth sub-level H4 are the main sources of bit rate savings.
- the second level also includes a sixth sub-level, which includes at least two viewpoints between the third sub-level and the second sub-level and between the second sub-level and the fourth sub-level located on columns of the multi-viewpoint array other than edge lines and the vertical central axis.
- the second level also includes a sixth sub-level H5.
- the sixth sub-level H5 includes 8 viewpoints, namely:
- the viewpoints between the viewpoints of the third sub-level H2 and the viewpoints of the second sub-level H1 on the third column such as DO#30 and DO#47, and at least two viewpoints between the viewpoints of the second sub-level H1 and the viewpoints of the fourth sub-level H3, such as DO#12 and DO#65, for a total of 4 viewpoints;
- the viewpoints between the viewpoints of the third sub-level H2 and the viewpoints of the second sub-level H1 on the seventh column such as DO#34 and DO#51, and at least two viewpoints between the viewpoints of the second sub-level H1 and the viewpoints of the fourth sub-level H3, such as DO#16 and DO#69, totaling 4 viewpoints.
- the second level further includes a seventh sub-level, the seventh sub-level including at least two viewpoints located in the same rows as the viewpoints of the sixth sub-level, in the rows of the multi-view array other than the edge lines.
- the second level also includes a seventh sub-level H6, which includes 16 viewpoints, namely: viewpoints DO#11, DO#13, DO#15, DO#17 in the second row, viewpoints DO#29, DO#31, DO#33, DO#35 in the fourth row, viewpoints DO#46, DO#48, DO#50, DO#52 in the sixth row, and viewpoints DO#64, DO#66, DO#68, DO#70 in the eighth row.
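For the 9×9 example, the membership of each sub-level can be summarized by a few positional rules. The Python sketch below reverse-engineers those rules from the DO numbers listed above; it reproduces the stated sub-level sizes (1, 12, 4, 16, 24, 8, and 16 viewpoints), but it is an illustration of this particular layout, not the general definition given in the claims.

```python
from collections import Counter

def sub_level(r, c, n=9):
    # sub-level index: 0 for H0 .. 6 for H6 (9x9 layout of FIG. 5B)
    mid, q = n // 2, n // 4            # 4 and 2 for n = 9
    anchors = (0, mid, n - 1)          # first, middle, and last row/column
    diag = (q, n - 1 - q)              # rows/columns of the diagonal midpoints
    if (r, c) == (mid, mid):
        return 0                       # H0: central viewpoint
    if (r in anchors and c in anchors) or (r in diag and c in diag):
        return 1                       # H1: edge anchors + diagonal midpoints
    if (r == mid and c in diag) or (c == mid and r in diag):
        return 2                       # H2: connectors on the central axes
    if (r % 2 == 0 and c % 2 == 0) or r == mid or c == mid:
        return 3                       # H3: remaining even-even + axis fillers
    if r in (0, n - 1) or c in (0, n - 1) or r in diag:
        return 4                       # H4: edge fillers + rows through the diagonal midpoints
    if c in diag:
        return 5                       # H5: columns through the diagonal midpoints
    return 6                           # H6: everything else (the non-reference viewpoints)

counts = Counter(sub_level(r, c) for r in range(9) for c in range(9))
assert counts == {0: 1, 1: 12, 2: 4, 3: 16, 4: 24, 5: 8, 6: 16}
```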
- the multi-view images in the multi-view array may be rearranged into a video sequence according to the hierarchical prediction structure, in which the images are ordered from low to high levels (sub-levels).
- the video sequence is arranged in an encoder order (EO), which refers to the actual order in which the view images are encoded. Therefore, the multi-view images in the multi-view array may be encoded from low to high levels (sub-levels).
- a precise encoding order may also be involved within a specific level.
- steps S402 and S403 describe the encoding order process according to the hierarchical prediction structure.
- S402 Perform predictive coding on at least one viewpoint of the first level to obtain a reference frame.
- At least one viewpoint of the first level of the hierarchical prediction structure is first compression-encoded.
- step S402 may specifically include S4021:
- the central viewpoint of the first sub-level is designated as the first frame of the coding order EO, numbered 0 (EO#0).
- the coding order of the central viewpoint (DO#0) of the first sub-level H0 is EO#0. Since there is no coded viewpoint image as a reference, the central viewpoint is encoded using the intra-frame prediction mode to obtain the first reference frame.
- the first reference frame can be used as a reference frame for any viewpoint in the subsequent video sequence.
- step S402 may further include S4022:
- S4022 Perform predictive encoding on at least two viewpoints of the second sub-level respectively to obtain a second reference frame.
- the reference frame includes a first reference frame and a second reference frame.
- At least two viewpoints of the second sub-level are respectively predicted and encoded to obtain at least two second reference frames.
- intra-frame prediction coding or inter-frame prediction coding may be performed on at least two viewpoints of the second sub-level to obtain a second reference frame.
- a reference frame may be adaptively selected from adjacent (including the closest or second-adjacent) viewpoints, for example, the first reference frame of the encoded central viewpoint of the first sub-level may be used as a reference frame, or the second reference frame of the encoded viewpoint in the second sub-level may be used as a reference frame, and this application does not limit this.
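As one possible reading of "adaptively selected from adjacent (including the closest or second-adjacent) viewpoints", the Python sketch below simply picks the nearest already-encoded viewpoint under the Chebyshev distance, which treats nearest and second-nearest neighbours uniformly. A real encoder would more likely evaluate several candidates by rate-distortion cost; the function name and the distance metric are our assumptions.

```python
def pick_reference(pos, encoded_positions):
    # choose the already-encoded viewpoint closest to pos;
    # Chebyshev distance counts diagonal neighbours as distance 1
    r, c = pos
    return min(encoded_positions,
               key=lambda p: max(abs(p[0] - r), abs(p[1] - c)))

# e.g. an H3 edge viewpoint such as DO#7 at (0, 6) would reference the
# closer encoded H1 anchor DO#9 at (0, 8) rather than the center (4, 4):
ref = pick_reference((0, 6), {(4, 4), (0, 8)})  # -> (0, 8)
```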
- the viewpoints of the second sub-level can be divided into an upper half and a lower half, and compression encoding is performed on each half independently.
- part of the encoded images of the upper half (such as the encoded images corresponding to the viewpoints other than the viewpoints at the junction of the upper half and the lower half) can be deleted, thereby saving the encoding cache.
- the viewpoint of the second sub-level H1 can be divided into an upper half and a lower half.
- the viewpoints DO#44, DO#9, DO#25, DO#5, DO#21, DO#1, and DO#37 of the upper half of the second sub-level H1 can be compression-encoded in sequence from right to left, in the encoding order from EO#1 to EO#7.
- the lower half of the second sub-level may be compressed and encoded.
- the specific compression order is similar to that of the upper half, and the description of the upper half may be referred to, and will not be repeated here.
- the first-level viewpoint can provide reference information for the second-level viewpoint as a reference for the second-level viewpoint.
- the second-level viewpoint can be inter-frame predicted and encoded based on the reference frame obtained by encoding the first-level viewpoint to obtain a reconstructed image.
- the embodiment of the present application determines the hierarchical prediction structure of the multi-viewpoint array, and in the frame encoding process first predicts and encodes at least one viewpoint of the first level to obtain a reference frame, and then predicts and encodes the viewpoint of the second level according to the reference frame to obtain a reconstructed image, so that a reference relationship can be established between viewpoints at different positions of the second level and the first level (such as different rows or columns), and the spatial position correlation between viewpoints distributed in two dimensions can be better utilized, which helps to improve the video compression efficiency of the multi-viewpoint array.
- the hierarchical prediction structure of the embodiment of the present application can include the reference position of the viewpoint in the multi-view array, without the need for additional means, such as establishing a position lookup table to supplement the reference position of each viewpoint, which helps to further improve the compression efficiency of the multi-view array.
- when the second level further includes multiple sub-levels, step S403 may specifically include the following step S4031:
- S4031: the viewpoints in each sub-level in the second level are encoded in sequence from the lower sub-level to the higher sub-level to obtain a reconstructed image, wherein the viewpoints of each sub-level are predictively encoded with reference to viewpoints of the same sub-level or a lower sub-level.
- the viewpoint of the third sub-level in the second level can use the encoded viewpoint in at least one of the first sub-level, the second sub-level and the third sub-level as a reference frame for inter-frame prediction encoding;
- the viewpoint of the fourth sub-level can use the encoded viewpoint in at least one of the first sub-level to the fourth sub-level as a reference frame for inter-frame prediction encoding, and so on.
- each sub-level may adaptively select a reference frame from adjacent (including the most adjacent or second adjacent) viewpoints during encoding, which is not limited in the present application.
- the viewpoints in a certain sub-level can only be encoded with reference to the viewpoints in the same sub-level or a lower sub-level, so that a reference relationship can be established between viewpoints at different positions of different sub-levels (such as different rows or different columns), and the spatial position correlation between viewpoints distributed in two dimensions can be utilized to a greater extent, which helps to improve the video compression efficiency of multi-viewpoint arrays.
- At least two rows of the multi-view array can be encoded row by row according to the encoding order of the one-dimensional hierarchical coding structure, wherein, for each row of the multi-view array, except for the first sub-level and the second sub-level, the viewpoints other than the first sub-level are encoded sequentially in an order from the lower sub-level to the higher sub-level.
- encoding can be performed row by row in the classic encoding order of 1, 9, 5, 3, 2, 4, 7, 6, 8.
- encoding is performed sequentially in the order from the lower sub-level to the higher sub-level.
- when at least two viewpoints of the same sub-level are included in a row, the at least two viewpoints may be encoded one by one according to the encoding order of the one-dimensional hierarchical coding structure.
- the viewpoints DO#1, DO#5, and DO#9 of the second sub-level H1 have been encoded and the corresponding second reference frame has been obtained.
- the viewpoints DO#3 and DO#7 of the fourth sub-level H3 can be firstly inter-frame predicted and encoded, and then the viewpoints DO#2, DO#4, DO#6, and DO#8 of the fifth sub-level H4 can be inter-frame predicted and encoded.
- DO#3 and DO#7 can be encoded in sequence according to the encoding order of the one-dimensional hierarchical coding structure
- DO#2, DO#4, DO#6, and DO#8 can be encoded in sequence according to the encoding order of the one-dimensional hierarchical coding structure.
- a first partial multi-view array may be determined in the multi-view array, wherein the first partial multi-view array includes the central view. Then, at least two rows of the first partial multi-view array may be encoded row by row according to the encoding order of the one-dimensional hierarchical encoding structure.
- the first part of the multi-view array can be independently encoded, so that after the encoding of the first part of the multi-view array is completed, part of the encoded images of the first part of the multi-view array (such as the encoded images corresponding to the viewpoints other than the viewpoints at the junction of the first part of the multi-view array and the rest of the multi-view array) can be deleted, thereby saving the encoding cache.
- the first part of the multi-view array may include the upper half of the multi-view array, the lower half of the multi-view array, the upper right part of the multi-view array, the upper left part of the multi-view array, the lower left part of the multi-view array, or the lower right part of the multi-view array, which is not limited in the present application.
- the viewpoints of the remaining third sub-level H2 to the seventh sub-level H6 in the upper half are further divided into two parts, which are also compressed independently to save the encoding cache.
- At least two rows of the multi-view array can be encoded row by row according to the encoding order of the one-dimensional hierarchical coding structure, for example, row by row in the order of 5, 1, 3, 4, 2.
- the encoding is performed according to the hierarchical order, and the viewpoints of the lower sub-level are encoded first.
- the viewpoints in the fifth row are encoded first.
- the viewpoint DO#42 of the third sub-level H2 may be encoded first, and its encoding order is EO#8.
- the viewpoints DO#43 and DO#41 of the fourth sub-level H3 are encoded, and their encoding orders are EO#9 and EO#10 respectively.
- the viewpoints of the first row are encoded.
- the viewpoint DO#7 of the fourth sub-level H3 may be encoded first, and its encoding order is EO#11.
- the viewpoints DO#8 and DO#6 of the fifth sub-level H4 are encoded, and their encoding orders are EO#12 and EO#13 respectively.
- the viewpoints of the third row are encoded.
- the viewpoint DO#23 of the third sub-level H2 may be encoded first, and its encoding order is EO#14.
- the viewpoint DO#27 of the fourth sub-level H3 may be encoded, and its encoding order is EO#15.
- the viewpoints DO#26 and DO#24 of the fifth sub-level H4 may be encoded, and their encoding orders are EO#16 and EO#17 respectively.
- the viewpoints of the fourth row are encoded.
- the viewpoint DO#32 of the fourth sub-level H3 may be encoded first, and its encoding order is EO#18.
- the viewpoint DO#36 of the fifth sub-level H4 may be encoded, and its encoding order is EO#19.
- the viewpoint DO#34 of the sixth sub-level H5 may be encoded, and its encoding order is EO#20.
- the viewpoints DO#35 and DO#33 of the seventh sub-level H6 may be encoded, and their encoding orders are EO#21 and EO#22 respectively.
- the viewpoints of the second row are encoded. Specifically, the viewpoint DO#14 of the fourth sub-level H3 is encoded first, and its encoding order is EO#23. Then the viewpoint DO#18 of the fifth sub-level H4 is encoded, and its encoding order is EO#24. Then the viewpoint DO#16 of the sixth sub-level H5 is encoded, and its encoding order is EO#25. Then the viewpoints DO#17 and DO#15 of the seventh sub-level H6 are encoded, and their encoding orders are EO#26 and EO#27 respectively.
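The walkthrough from EO#8 to EO#27 follows a compact rule: visit the rows of the part in the stated order and, inside each row, encode the not-yet-coded viewpoints sorted first by sub-level and then from right to left. The Python sketch below (reusing sub_level from the earlier sketch) reproduces that sequence exactly; note that the right-to-left tie-breaking is inferred from the listed examples rather than stated in the text as a general rule.

```python
def part_encoding_order(rows, cols, already_coded):
    # rows: row indices of this part in their visiting order;
    # inside a row: sort uncoded viewpoints by (sub-level, right-to-left)
    order, coded = [], set(already_coded)
    for r in rows:
        todo = sorted(((r, c) for c in cols if (r, c) not in coded),
                      key=lambda p: (sub_level(*p), -p[1]))
        order.extend(todo)
        coded.update(todo)
    return order

# upper-right part of the 9x9 array: rows 5, 1, 3, 4, 2 (1-based) over
# columns 5..9 (1-based), with all H0/H1 viewpoints already encoded
h01 = {(r, c) for r in range(9) for c in range(9) if sub_level(r, c) <= 1}
eo = part_encoding_order(rows=[4, 0, 2, 3, 1], cols=range(4, 9),
                         already_coded=h01)
# eo starts (4,6)=DO#42, (4,7)=DO#43, (4,5)=DO#41, (0,6)=DO#7, ...
# i.e. EO#8, EO#9, EO#10, EO#11, ... as in the walkthrough above
```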
- when encoding a viewpoint within a row, the reference frame can be adaptively selected from among neighboring (nearest or next-nearest) viewpoints.
- an already-encoded viewpoint in the same row, the same column, a different row, or a different column can be adaptively selected as the reference frame; the present application does not limit this.
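- a minimal sketch of such adaptive reference selection, assuming Euclidean distance between array positions is the selection criterion (the application itself does not fix the criterion), and restricting candidates to the same or a lower sub-level as described below:

```python
import math

def pick_reference(target, coded, sublevel, pos):
    """Return the nearest already-coded viewpoint whose sub-level is not
    higher than the target's; `pos` maps DO -> (row, col) in the array."""
    candidates = [v for v in coded if sublevel[v] <= sublevel[target]]
    return min(candidates, key=lambda v: math.dist(pos[v], pos[target]))

# usage (hypothetical data):
# ref = pick_reference(24, coded={0, 5, 23, 27}, sublevel=H, pos=POS)
```

Because the candidate set spans the whole array, the chosen reference may come from the same row, the same column, or a different row or column, as the text above allows.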
- after the upper-right partial multi-view array is encoded, the upper-left partial multi-view array may be encoded. Its encoding order is similar to that of the upper-right partial array; see the description above.
- once the upper-left part is encoded, the encoding of the upper half of the multi-view array is complete, and encoding can continue with the lower half of the multi-view array.
- the encoding order of the lower half is similar to that of the upper half; see the description above. When the lower half is finished, the encoding of the entire multi-view array is complete.
- FIG8 also shows the encoding order of all viewpoints in the multi-view array.
- different partial multi-view arrays have at least two viewpoints in common at the junction. For example, if the first partial multi-view array is the upper right partial multi-view array and the second partial multi-view array is the upper left partial multi-view array, then the junction of the first partial multi-view array and the second partial multi-view array has viewpoints DO#5, DO#14, DO#23, DO#32, and DO#0 in common.
- when all viewpoints of the first partial multi-view array have been encoded and the viewpoints of the second partial multi-view array have not yet been encoded, the reconstructed images of the viewpoints of the first partial array, other than the at least two viewpoints shared at the junction, can be deleted to save encoding cache and keep the cache area lightweight.
- for example, when all viewpoints of the upper-right partial array have been encoded, the reconstructed images of all of its viewpoints can be deleted, except the viewpoints DO#5, DO#14, DO#23, DO#32, and DO#0 shared at the junction with the upper-left partial array, and the viewpoints DO#41, DO#42, DO#43, and DO#44 shared at the junction with the lower-right partial array.
- because each partial multi-view array is encoded independently, deleting these reconstructed images does not affect the encoding of the other partial arrays, which saves encoding cache and keeps the cache area lightweight.
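- the cache management just described can be sketched as follows; the helper name and the DO sets are illustrative, with the junction DOs taken from the FIG. 8 example above:

```python
def evict_partial_array(recon_cache, part_dos, junction_dos):
    """After a partial multi-view array is fully encoded, drop every
    reconstructed image in it except the junction viewpoints."""
    for do in part_dos:
        if do not in junction_dos:
            recon_cache.pop(do, None)   # keep only junction references

# e.g. after the upper-right part is finished (hypothetical call):
# evict_partial_array(recon_cache, upper_right_dos,
#                     junction_dos={0, 5, 14, 23, 32, 41, 42, 43, 44})
```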
- in each independently encoded region, the coded image of the last-encoded viewpoints is not used as a reference frame for other viewpoints.
- such viewpoints may be called non-reference viewpoints, for example the viewpoints of the seventh sub-level H6 in the embodiments of the present application, or the 2nd, 5th, 7th, and 10th columns of the multi-view array in FIG. 3A.
- compared with the prior art, in which entire columns of viewpoints are non-reference viewpoints, the embodiments of the present application can significantly reduce the number of non-reference viewpoints, thereby further improving video compression efficiency.
- the above describes the encoding method of the present application from the encoding end.
- the video decoding method provided by the embodiments of the present application is described below from the decoding end.
- FIG9 is a schematic flow chart of a decoding method 500 provided in an embodiment of the present application. As shown in FIG9 , the decoding method in the embodiment of the present application includes:
- S501: determining a hierarchical prediction structure of a multi-view array, the hierarchical prediction structure comprising a first level and a second level, the first level comprising at least one viewpoint of the multi-view array, and the second level comprising the viewpoints other than those of the first level.
- S502: performing predictive decoding on at least one viewpoint of the first level to obtain a reference frame.
- S503: performing predictive decoding on the viewpoints of the second level according to the reference frame to obtain reconstructed images.
- after the decoder obtains the bitstream, it can determine the hierarchical prediction structure of the multi-view array from the bitstream. The decoder then decodes the video frames following the coding order of the viewpoint images of the first level and the second level, and the multi-view array can be recovered according to the hierarchical prediction structure.
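- as a sketch, the overall decoder flow of S501 to S503 might look as follows; the hierarchical-prediction-structure object `hps` and the helpers `intra_decode` and `predict_decode` are assumed names for illustration, not APIs defined by this application:

```python
def decode_multiview(bitstream, hps, intra_decode, predict_decode):
    """Sketch of S501-S503: `hps` is the hierarchical prediction structure
    determined from the bitstream (S501); the two callables are assumed."""
    recon = {}
    # S502: first level -- central viewpoint intra-decoded, H1 intra or inter
    recon[hps.center] = intra_decode(bitstream, hps.center)
    for v in hps.viewpoints(sublevel=1):
        recon[v] = predict_decode(bitstream, v, refs=recon)
    # S503: second level, lower sub-levels decoded before higher ones
    for h in range(2, hps.num_sublevels):
        for v in hps.viewpoints(sublevel=h):
            recon[v] = predict_decode(bitstream, v, refs=recon)
    return recon   # reconstructed images, arranged per the structure
```

Here the first level seeds the reference cache, and every later sub-level only ever references already-reconstructed viewpoints of the same or a lower sub-level.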
- the multi-viewpoint array is obtained by photographing the same three-dimensional scene from multiple angles using multiple cameras.
- the multi-view array includes a centrally symmetric multi-view array.
- the first level includes a first sub-level, the first sub-level including a central viewpoint.
- the predictive decoding of at least one viewpoint of the first level to obtain a reference frame includes: performing intra-frame prediction decoding on the central viewpoint of the first sub-level to obtain a first reference frame.
- the first level further comprises a second sub-level comprising at least two views evenly distributed in the multi-view array.
- the second sub-level includes at least one of: a plurality of viewpoints evenly distributed on the edge lines of the multi-view array, and viewpoints located at the midpoints between the central viewpoint and the vertex viewpoints on the diagonals of the multi-view array.
- the predictive decoding of at least one viewpoint of the first level to obtain a reference frame includes:
- Intra-frame prediction decoding or inter-frame prediction decoding is performed on at least two viewpoints of the second sub-level to obtain a second reference frame.
- the second level includes a third sub-level, which includes at least two viewpoints located between the first sub-level and the second sub-level on the horizontal central axis and the vertical central axis of the multi-view array.
- the second level also includes a fourth sub-level, which includes the viewpoints located between the viewpoints of the first sub-level on the edge lines of the multi-view array, and at least two viewpoints located between the third sub-level and the first sub-level and between the third sub-level and the second sub-level on the horizontal central axis and the vertical central axis of the multi-view array.
- the second level also includes a fifth sub-level, which includes at least two viewpoints located between the second sub-level and the fourth sub-level on the edge lines of the multi-view array, and at least two viewpoints located between the third sub-level and the second sub-level and between the second sub-level and the fourth sub-level on the rows of the multi-view array other than the edge lines and the horizontal central axis.
- the second level also includes a sixth sub-level, which includes at least two viewpoints located between the third sub-level and the second sub-level and between the second sub-level and the fourth sub-level on the columns of the multi-view array other than the edge lines and the vertical central axis.
- the second level also includes a seventh sub-level, which includes at least two viewpoints located, in the rows of the multi-view array other than the edge lines, in the same rows as the sixth sub-level.
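- to make the sub-level geometry concrete, the following sketch labels every position of a 9x9 array with its sub-level H0 to H6 according to the rules above. This is a reconstruction for the 9x9 example only (0-indexed coordinates, center at (4, 4)), not a normative definition:

```python
from collections import Counter

def sublevel_9x9(r, c):
    anchor = {0, 4, 8}    # corners, edge midpoints, and the central axes
    half = {2, 6}         # midway between the center and the edges
    odd = {1, 3, 5, 7}
    if (r, c) == (4, 4):
        return 0          # H0: central viewpoint
    if (r in anchor and c in anchor) or (r in half and c in half):
        return 1          # H1: edge-line points and diagonal midpoints
    if (r == 4 and c in half) or (c == 4 and r in half):
        return 2          # H2: on the central axes, between H0 and H1
    if ((r in {0, 8} and c in half) or (c in {0, 8} and r in half)
            or r == 4 or c == 4):
        return 3          # H3: remaining edge and central-axis positions
    if r in anchor | half and c in odd:
        return 4          # H4: odd columns of rows 0, 2, 6, 8
    if c in anchor | half and r in odd:
        return 5 if c in half else 4   # H5 on columns 2, 6; H4 on 0, 8
    return 6              # H6: odd row and odd column

print(Counter(sublevel_9x9(r, c) for r in range(9) for c in range(9)))
# counts: H0=1, H1=12, H2=4, H3=16, H4=24, H5=8, H6=16 (total 81)
```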
- performing predictive decoding on the viewpoints of the second level according to the reference frame to obtain reconstructed images includes:
- decoding the viewpoints of the sub-levels of the second level in order from lower sub-level to higher sub-level according to the reference frame, wherein the viewpoints of each sub-level are predictively decoded with reference to viewpoints of the same or a lower sub-level.
- decoding the viewpoints of the sub-levels of the second level in order from lower sub-level to higher sub-level according to the reference frame comprises:
- At least two rows of the multi-view array are decoded row by row according to the decoding order of a one-dimensional hierarchical decoding structure, wherein, within each row of the multi-view array, the viewpoints other than those of the first sub-level and the second sub-level are decoded in order from lower sub-level to higher sub-level.
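- the one-dimensional hierarchical order over rows can be generated, for example, by taking the interval endpoints first and then recursively visiting midpoints. The breadth-first, near-axis-first tie-break in this sketch is an assumption chosen so that it reproduces the 5, 1, 3, 4, 2 order used for the upper-right part above:

```python
def hierarchical_row_order(lo, hi):
    order = [hi, lo]                 # endpoints first (axis row, then edge)
    intervals = [(lo, hi)]
    while intervals:                 # breadth-first midpoint subdivision
        nxt = []
        for a, b in intervals:
            if b - a >= 2:
                m = (a + b) // 2     # midpoint of the interval
                order.append(m)
                nxt += [(m, b), (a, m)]
        intervals = nxt
    return order

print(hierarchical_row_order(1, 5))  # -> [5, 1, 3, 4, 2]
```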
- in some embodiments, the method further includes: determining a first partial multi-view array in the multi-view array, wherein the first partial multi-view array includes the central viewpoint.
- in this case, decoding at least two rows of the multi-view array row by row according to the decoding order of the one-dimensional hierarchical decoding structure comprises:
- At least two rows of the first partial multi-view array are decoded row by row according to the decoding order of the one-dimensional hierarchical decoding structure.
- the first portion of the multi-view array includes an upper half of the multi-view array, a lower half of the multi-view array, an upper right portion of the multi-view array, an upper left portion of the multi-view array, a lower left portion of the multi-view array, or a lower right portion of the multi-view array.
- the multi-view array further includes a second partial multi-view array, the second partial multi-view array includes the central viewpoint, and the first partial multi-view array and the second partial multi-view array share at least two viewpoints at a junction;
- the method further comprises:
- the reconstructed images corresponding to the viewpoints of the first partial multi-view array, other than the at least two viewpoints shared at the junction, are deleted.
- for the specific process of the decoding method, reference may be made to the process of the encoding method, which is not repeated here.
- the encoding method provided by the embodiments of the present application achieves a better coding result at the encoding end and improves compression efficiency; correspondingly, decoding performance at the decoder is improved as well.
- the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
- the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist: A and/or B can mean that A exists alone, that both A and B exist, or that B exists alone.
- the character "/" in the present application generally indicates that the associated objects before and after it are in an "or" relationship.
- FIG. 10 is a schematic block diagram of an encoding device 10 provided in an embodiment of the present application; the encoding device 10 is applied to the above-mentioned video encoding end.
- the encoding device 10 includes:
- a determining unit 11, configured to determine a hierarchical prediction structure of a multi-view array, wherein the hierarchical prediction structure comprises a first level and a second level, the first level comprises at least one viewpoint of the multi-view array, and the second level comprises the viewpoints other than those of the first level;
- an encoding unit 12, configured to perform predictive encoding on at least one viewpoint of the first level to obtain a reference frame;
- the encoding unit 12 is further configured to perform predictive encoding on the viewpoints of the second level according to the reference frame to obtain reconstructed images.
- the first level includes a first sub-level, the first sub-level including a central viewpoint.
- the encoding unit 12 is specifically used for:
- Intra-frame prediction encoding is performed on the central viewpoint of the first sub-level to obtain a first reference frame.
- the first level further comprises a second sub-level comprising at least two views evenly distributed in the multi-view array.
- the second sub-level includes at least one of: a plurality of viewpoints evenly distributed on the edge lines of the multi-view array, and viewpoints located at the midpoints between the central viewpoint and the vertex viewpoints on the diagonals of the multi-view array.
- the encoding unit 12 is specifically used for:
- Intra-frame prediction coding or inter-frame prediction coding is performed on at least two viewpoints of the second sub-level to obtain a second reference frame.
- the second level includes a third sub-level, which includes at least two viewpoints located between the first sub-level and the second sub-level on the horizontal central axis and the vertical central axis of the multi-view array.
- the second level also includes a fourth sub-level, which includes the viewpoints located between the viewpoints of the first sub-level on the edge lines of the multi-view array, and at least two viewpoints located between the third sub-level and the first sub-level and between the third sub-level and the second sub-level on the horizontal central axis and the vertical central axis of the multi-view array.
- the second level also includes a fifth sub-level, which includes at least two viewpoints located between the second sub-level and the fourth sub-level on the edge lines of the multi-view array, and at least two viewpoints located between the third sub-level and the second sub-level and between the second sub-level and the fourth sub-level on the rows of the multi-view array other than the edge lines and the horizontal central axis.
- the second level also includes a sixth sub-level, which includes at least two viewpoints located between the third sub-level and the second sub-level and between the second sub-level and the fourth sub-level on the columns of the multi-view array other than the edge lines and the vertical central axis.
- the second level also includes a seventh sub-level, which includes at least two viewpoints located, in the rows of the multi-view array other than the edge lines, in the same rows as the sixth sub-level.
- the encoding unit 12 is specifically used for:
- the viewpoints of the sub-levels of the second level are encoded in order from lower sub-level to higher sub-level, wherein the viewpoints of each sub-level are predictively encoded with reference to viewpoints of the same or a lower sub-level.
- the encoding unit 12 is specifically used for:
- At least two rows of the multi-view array are encoded row by row according to the encoding order of the one-dimensional hierarchical coding structure, wherein the viewpoints except the first sub-level and the second sub-level in each row of the multi-view array are encoded sequentially from the lower sub-level to the higher sub-level.
- the encoding unit 12 is specifically used for: determining a first partial multi-view array in the multi-view array, wherein the first partial multi-view array includes the central viewpoint; and
- encoding at least two rows of the first partial multi-view array row by row according to the encoding order of the one-dimensional hierarchical coding structure.
- the first portion of the multi-view array includes an upper half of the multi-view array, a lower half of the multi-view array, an upper right portion of the multi-view array, an upper left portion of the multi-view array, a lower left portion of the multi-view array, or a lower right portion of the multi-view array.
- the multi-view array further includes a second partial multi-view array, the second partial multi-view array includes the central viewpoint, and the first partial multi-view array and the second partial multi-view array share at least two viewpoints at the junction;
- the encoding unit 12 is further configured to delete the reconstructed images corresponding to the viewpoints of the first partial multi-view array other than the at least two viewpoints shared at the junction.
- the multi-viewpoint array is obtained by photographing the same three-dimensional scene from multiple angles using multiple cameras.
- the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, it will not be repeated here.
- the device 10 shown in FIG. 10 can perform the encoding method of the encoding end of the embodiments of the present application; the foregoing and other operations and/or functions of the units of the device 10 implement the corresponding processes of the respective methods, such as the encoding method at the encoding end, and are not repeated here for brevity.
- FIG11 is a schematic block diagram of a decoding device provided in an embodiment of the present application, and the decoding device is applied to the above-mentioned decoding end.
- the decoding device 20 may include:
- a determining unit 21 configured to determine a hierarchical prediction structure of a multi-view array; wherein the hierarchical prediction structure comprises a first level and a second level, the first level comprises at least one view of the multi-view array, and the second level comprises other viewpoints except the first level;
- a decoding unit 22, configured to perform predictive decoding on at least one viewpoint of the first level to obtain a reference frame;
- the decoding unit 22 is further configured to perform predictive decoding on the viewpoint of the second level according to the reference frame to obtain a reconstructed image.
- the first level includes a first sub-level, the first sub-level including a central viewpoint.
- the decoding unit 22 is specifically used for: performing intra-frame prediction decoding on the central viewpoint of the first sub-level to obtain a first reference frame.
- the first level further comprises a second sub-level comprising at least two views evenly distributed in the multi-view array.
- the second sub-level includes at least one of: a plurality of viewpoints evenly distributed on the edge lines of the multi-view array, and viewpoints located at the midpoints between the central viewpoint and the vertex viewpoints on the diagonals of the multi-view array.
- the decoding unit 22 is specifically used for:
- Intra-frame prediction decoding or inter-frame prediction decoding is performed on at least two viewpoints of the second sub-level to obtain a second reference frame.
- the second level includes a third sub-level, which includes at least two viewpoints located between the first sub-level and the second sub-level on the horizontal central axis and the vertical central axis of the multi-view array.
- the second level also includes a fourth sub-level, which includes the viewpoints located between the viewpoints of the first sub-level on the edge lines of the multi-view array, and at least two viewpoints located between the third sub-level and the first sub-level and between the third sub-level and the second sub-level on the horizontal central axis and the vertical central axis of the multi-view array.
- the second level also includes a fifth sub-level, which includes at least two viewpoints located between the second sub-level and the fourth sub-level on the edge lines of the multi-view array, and at least two viewpoints located between the third sub-level and the second sub-level and between the second sub-level and the fourth sub-level on the rows of the multi-view array other than the edge lines and the horizontal central axis.
- the second level also includes a sixth sub-level, which includes at least two viewpoints located between the third sub-level and the second sub-level and between the second sub-level and the fourth sub-level on the columns of the multi-view array other than the edge lines and the vertical central axis.
- the second level also includes a seventh sub-level, which includes at least two viewpoints located, in the rows of the multi-view array other than the edge lines, in the same rows as the sixth sub-level.
- performing inter-frame prediction decoding on the viewpoints of the second level according to the reference frame to obtain reconstructed images includes:
- decoding the viewpoints of the sub-levels of the second level in order from lower sub-level to higher sub-level, wherein the viewpoints of each sub-level are predictively decoded with reference to viewpoints of the same or a lower sub-level.
- the decoding unit 22 is specifically used for:
- At least two rows of the multi-view array are decoded row by row according to the decoding order of the one-dimensional hierarchical decoding structure, wherein, within each row of the multi-view array, the viewpoints other than those of the first sub-level and the second sub-level are decoded in order from lower sub-level to higher sub-level.
- the decoding unit 22 is specifically used for: determining a first partial multi-view array in the multi-view array, wherein the first partial multi-view array includes the central viewpoint; and
- decoding at least two rows of the first partial multi-view array row by row according to the decoding order of the one-dimensional hierarchical decoding structure.
- the first portion of the multi-view array includes an upper half of the multi-view array, a lower half of the multi-view array, an upper right portion of the multi-view array, an upper left portion of the multi-view array, a lower left portion of the multi-view array, or a lower right portion of the multi-view array.
- the multi-view array further includes a second partial multi-view array, the second partial multi-view array includes the central viewpoint, and the first partial multi-view array and the second partial multi-view array share at least two viewpoints at a junction;
- the decoding unit 22 is further configured to delete the reconstructed images corresponding to the viewpoints of the first partial multi-view array other than the at least two viewpoints shared at the junction.
- the multi-viewpoint array is obtained by photographing the same three-dimensional scene from multiple angles using multiple cameras.
- the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, no further description is given here.
- the device 20 shown in FIG. 11 may correspond to the subject performing the prediction method of the decoding end of the embodiments of the present application; the foregoing and other operations and/or functions of the units of the device 20 implement the corresponding processes of the respective methods, such as the decoding method at the decoding end, and are not repeated here for brevity.
- the functional units may be implemented in hardware, by instructions in software, or by a combination of hardware and software units.
- the steps of the method embodiments of the present application may be completed by integrated logic circuits in hardware and/or by instructions in software in a processor; the steps of the methods disclosed in the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software units in a decoding processor.
- the software unit may reside in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
- the storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above method embodiments in combination with its hardware.
- FIG. 12 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
- the electronic device 30 may be the video encoder or the video decoder described in the embodiments of the present application, and the electronic device 30 may include:
- a memory 33 and a processor 32, where the memory 33 is used to store a computer program 34 and to transmit the program code 34 to the processor 32.
- the processor 32 can call and run the computer program 34 from the memory 33 to implement the method in the embodiment of the present application.
- the processor 32 may be configured to execute the steps in the above-mentioned method 400 or 500 according to the instructions in the computer program 34 .
- the processor 32 may include, but is not limited to: a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, and the like.
- the memory 33 includes, but is not limited to, volatile and/or non-volatile memory.
- the non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
- the volatile memory can be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synch-link DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
- the computer program 34 may be divided into one or more units, which are stored in the memory 33 and executed by the processor 32 to complete the method provided by the present application.
- the one or more units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 34 in the electronic device 30.
- the electronic device 30 may further include:
- the transceiver 33 may be connected to the processor 32 or the memory 33 .
- the processor 32 may control the transceiver 33 to communicate with other devices, specifically, to send information or data to other devices, or to receive information or data sent by other devices.
- the transceiver 33 may include a transmitter and a receiver.
- the transceiver 33 may further include an antenna, and the number of antennas may be one or more.
- the components of the electronic device 30 are connected by a bus system, where the bus system includes not only a data bus but also a power bus, a control bus, and a status signal bus.
- the present application also provides a computer storage medium on which a computer program is stored; when the computer program is executed by a computer, the computer can perform the methods of the above method embodiments.
- the embodiments of the present application also provide a computer program product containing instructions; when the instructions are executed by a computer, the computer can perform the methods of the above method embodiments.
- the present application also provides a code stream, which is generated according to the above encoding method; optionally, the code stream includes the above first flag, or includes the first flag and the second flag.
- when implemented in software, the above may be realized in whole or in part in the form of a computer program product, which includes one or more computer instructions; when the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part.
- the computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus.
- the computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, they can be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
- the computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media.
- the available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.
- the disclosed systems, devices, and methods can be implemented in other ways.
- the device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in actual implementation.
- furthermore, the mutual coupling, direct coupling, or communication connections shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connections between devices or units may be electrical, mechanical, or in other forms.
- each functional unit in the embodiments of the present application may be integrated into one processing unit, or the units may exist physically separately, or two or more units may be integrated into one unit.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The present application provides encoding and decoding methods, apparatuses, devices, and storage media. According to a hierarchical prediction structure of a multi-view array, at least one viewpoint of a first level is first predictively encoded during frame encoding to obtain a reference frame, and the viewpoints of a second level are then predictively encoded according to the reference frame to obtain reconstructed images. Reference relationships can thus be established between viewpoints of the second level and viewpoints of the first level at different positions (for example, in different rows or columns), so the spatial correlation between viewpoints distributed in two dimensions is better exploited, which helps improve the video compression efficiency of the multi-view array.
Claims (39)
- 一种编码方法,其特征在于,包括:确定多视点阵列的分层预测结构;其中,所述分层预测结构包括第一层级和第二层级,所述第一层级包括所述多视点阵列的至少一个视点,所述第二层级包括除所述第一层级之外的其他视点;对所述第一层级的至少一个视点进行预测编码,得到参考帧;根据所述参考帧,对所述第二层级的视点进行预测编码,得到重建图像。
- 根据权利要求1所述的方法,其特征在于,所述第一层级包括第一子层级,所述第一子层级包括中心视点。
- 根据权利要求2所述的方法,其特征在于,所述对所述第一层级的至少一个视点进行预测编码,得到参考帧,包括:对所述第一子层级的所述中心视点进行帧内预测编码,得到第一参考帧。
- 根据权利要求2所述的方法,其特征在于,所述第一层级还包括第二子层级,所述第二子层级包括均匀分布在所述多视点阵列中的至少两个视点。
- 根据权利要求4所述的方法,其特征在于,所述第二子层级包括位于所述多视点阵列的边缘线上均匀分布的多个视点,以及位于所述多视点阵列的对角线上的中心视点和顶点视点的中间位置的视点中的至少一种。
- 根据权利要求4所述的方法,其特征在于,所述对所述第一层级的至少一个视点进行预测编码,得到参考帧,包括:对所述第二子层级的至少两个视点进行帧内预测编码或帧间预测编码,得到第二参考帧。
- 根据权利要求4所述的方法,其特征在于,所述第二层级包括第三子层级,所述第三子层级包括位于所述多视点阵列的水平中轴线和竖直中轴线上的所述第一子层级和所述第二子层级之间的至少两个视点。
- 根据权利要求7所述的方法,其特征在于,所述第二层级还包括第四子层级,所述第四子层级包括位于所述多视点阵列的边缘线上的所述第一子层级的视点之间的视点,以及所述多视点阵列的水平中轴线和竖直中轴线上的所述第三子层级和所述第一子层级之间、所述第三子层级和所述第二子层级之间的至少两个视点。
- 根据权利要求8所述的方法,其特征在于,所述第二层级还包括第五子层级,所述第五子层级包括位于所述多视点阵列的边缘线上的所述第二子层级和所述第四子层级之间的至少两个视点、在所述多视点阵列的除边缘线和水平中轴线之外的行上的所述第三子层级和所述第二子层级之间、所述第二子层级和所述第四子层级之间的至少两个视点。
- 根据权利要求9所述的方法,其特征在于,所述第二层级还包括第六子层级,所述第六子层级包括位于所述多视点阵列的除边缘线和竖直中轴线之外的列上的所述第三子层级和所述第二子层级之间、所述第二子层级和所述第四子层级之间的至少两个视点。
- 根据权利要求10所述的方法,其特征在于,所述第二层级还包括第七子层级,所述第七子层级包括位于所述多视点阵列的除边缘线之外的行中的与所述第六子层级相同行的至少两个视点。
- 根据权利要求7-11任一项所述的方法,其特征在于,所述根据所述参考帧,对所述第二层级的视点进行预测编码,得到重建图像,包括:根据所述参考帧,按照从低子层级向高子层级的顺序对所述第二层级中各子层级中的视点依次编码,其中,所述各子层级中每个子层级的视点参考与所述每个子层级相同或更低子层级的视点进行预测编码。
- 根据权利要求12所述的方法,其特征在于,所述根据所述参考帧,按照从低子层级向高子层级的顺序对所述第二层级中各子层级中的视点依次编码,包括:按照一维分层编码结构的编码顺序,对所述多视点阵列的至少两个行逐行编码,其中,对所述多视点阵列的每行内除所述第一子层级和所述第二子层级之外的视点,按照从低子层级向高子层级的顺序依次编码。
- 根据权利要求13所述的方法,其特征在于,还包括:在所述多视点阵列中确定第一部分多视点阵列,其中,所述第一部分多视点阵列包括所述中心视点;其中,所述按照一维分层编码结构的编码顺序,对所述多视点阵列的至少两个行逐行编码,包括:按照一维分层编码结构的编码顺序,对所述第一部分多视点阵列的至少两个行逐行编码。
- 根据权利要求14所述的方法,其特征在于,所述第一部分多视点阵列包括所述多视点阵列的上半部分多视点阵列、下半部分多视点阵列、右上部分多视点阵列、左上部分多视点阵列、左下部 分多视点阵列或右下部分多视点阵列。
- 根据权利要求14或15所述的方法,其特征在于,所述多视点阵列还包括第二部分多视点阵列,所述第二部分多视点阵列包括所述中心视点,所述第一部分多视点阵列与所述第二部分多视点阵列在交界处共有至少两个视点;所述方法还包括:删除所述第一部分多视点阵列中除所述交界处共有的至少两个视点之外的其他视点对应的重建图像。
- 根据权利要求1-16任一项所述的方法,其特征在于,所述多视点阵列通过多台摄像机从多个角度对同一三维场景进行拍摄得到。
- A decoding method, comprising: determining a hierarchical prediction structure of a multi-viewpoint array, wherein the hierarchical prediction structure comprises a first level and a second level, the first level comprises at least one viewpoint of the multi-viewpoint array, and the second level comprises the viewpoints other than those of the first level; performing predictive decoding on the at least one viewpoint of the first level to obtain a reference frame; and performing predictive decoding on the viewpoints of the second level according to the reference frame to obtain a reconstructed image.
- The method according to claim 18, wherein the first level comprises a first sub-level, and the first sub-level comprises a center viewpoint.
- The method according to claim 19, wherein the performing predictive decoding on the at least one viewpoint of the first level to obtain a reference frame comprises: performing intra-frame predictive decoding on the center viewpoint of the first sub-level to obtain a first reference frame.
- The method according to claim 19, wherein the first level further comprises a second sub-level, and the second sub-level comprises at least two viewpoints evenly distributed in the multi-viewpoint array.
- The method according to claim 21, wherein the second sub-level comprises at least one of: a plurality of viewpoints evenly distributed on the edge lines of the multi-viewpoint array, and viewpoints located on the diagonals of the multi-viewpoint array midway between the center viewpoint and the vertex viewpoints.
- The method according to claim 21, wherein the performing predictive decoding on the at least one viewpoint of the first level to obtain a reference frame comprises: performing intra-frame predictive decoding or inter-frame predictive decoding on the at least two viewpoints of the second sub-level to obtain a second reference frame.
- The method according to claim 21, wherein the second level comprises a third sub-level, and the third sub-level comprises at least two viewpoints located on the horizontal central axis and the vertical central axis of the multi-viewpoint array between the first sub-level and the second sub-level.
- The method according to claim 24, wherein the second level further comprises a fourth sub-level, and the fourth sub-level comprises viewpoints located on the edge lines of the multi-viewpoint array between the viewpoints of the first sub-level, and at least two viewpoints located on the horizontal central axis and the vertical central axis of the multi-viewpoint array between the third sub-level and the first sub-level and between the third sub-level and the second sub-level.
- The method according to claim 25, wherein the second level further comprises a fifth sub-level, and the fifth sub-level comprises at least two viewpoints located on the edge lines of the multi-viewpoint array between the second sub-level and the fourth sub-level, and at least two viewpoints located, in the rows of the multi-viewpoint array other than the edge lines and the horizontal central axis, between the third sub-level and the second sub-level and between the second sub-level and the fourth sub-level.
- The method according to claim 26, wherein the second level further comprises a sixth sub-level, and the sixth sub-level comprises at least two viewpoints located, in the columns of the multi-viewpoint array other than the edge lines and the vertical central axis, between the third sub-level and the second sub-level and between the second sub-level and the fourth sub-level.
- The method according to claim 27, wherein the second level further comprises a seventh sub-level, and the seventh sub-level comprises at least two viewpoints located in the rows of the multi-viewpoint array other than the edge lines, in the same rows as the sixth sub-level.
- The method according to any one of claims 24 to 28, wherein the performing predictive decoding on the viewpoints of the second level according to the reference frame to obtain a reconstructed image comprises: decoding, according to the reference frame, the viewpoints in the sub-levels of the second level successively in order from lower sub-levels to higher sub-levels, wherein the viewpoints of each sub-level are predictively decoded with reference to viewpoints of the same or a lower sub-level.
- The method according to claim 29, wherein the decoding, according to the reference frame, the viewpoints in the sub-levels of the second level successively in order from lower sub-levels to higher sub-levels comprises: decoding at least two rows of the multi-viewpoint array row by row in the decoding order of a one-dimensional hierarchical decoding structure, wherein the viewpoints in each row of the multi-viewpoint array other than those of the first sub-level and the second sub-level are decoded successively in order from lower sub-levels to higher sub-levels.
- The method according to claim 30, further comprising: determining a first partial multi-viewpoint array in the multi-viewpoint array, wherein the first partial multi-viewpoint array comprises the center viewpoint; wherein the decoding at least two rows of the multi-viewpoint array row by row in the decoding order of a one-dimensional hierarchical decoding structure comprises: decoding at least two rows of the first partial multi-viewpoint array row by row in the decoding order of the one-dimensional hierarchical decoding structure.
- The method according to claim 31, wherein the first partial multi-viewpoint array comprises the upper half, the lower half, the upper-right part, the upper-left part, the lower-left part, or the lower-right part of the multi-viewpoint array.
- The method according to claim 31 or 32, wherein the multi-viewpoint array further comprises a second partial multi-viewpoint array, the second partial multi-viewpoint array comprises the center viewpoint, and the first partial multi-viewpoint array and the second partial multi-viewpoint array share at least two viewpoints at their boundary; and the method further comprises: deleting the reconstructed images corresponding to the viewpoints of the first partial multi-viewpoint array other than the at least two viewpoints shared at the boundary.
- The method according to any one of claims 18 to 33, wherein the multi-viewpoint array is obtained by photographing the same three-dimensional scene from multiple angles with multiple cameras.
- An encoding apparatus, comprising: a determining unit configured to determine a hierarchical prediction structure of a multi-viewpoint array, wherein the hierarchical prediction structure comprises a first level and a second level, the first level comprises at least one viewpoint of the multi-viewpoint array, and the second level comprises the viewpoints other than those of the first level; and an encoding unit configured to perform predictive encoding on the at least one viewpoint of the first level to obtain a reference frame, the encoding unit being further configured to perform inter-frame predictive encoding on the viewpoints of the second level according to the reference frame to obtain a reconstructed image.
- A decoding apparatus, comprising: a determining unit configured to determine a hierarchical prediction structure of a multi-viewpoint array, wherein the hierarchical prediction structure comprises a first level and a second level, the first level comprises at least one viewpoint of the multi-viewpoint array, and the second level comprises the viewpoints other than those of the first level; and a decoding unit configured to perform predictive decoding on the at least one viewpoint of the first level to obtain a reference frame, the decoding unit being further configured to perform inter-frame predictive decoding on the viewpoints of the second level according to the reference frame to obtain a reconstructed image.
- An electronic device, comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to invoke and run the computer program stored in the memory to implement the method according to any one of claims 1 to 34.
- A computer-readable storage medium configured to store a computer program, wherein the computer program causes a computer to perform the method according to any one of claims 1 to 34.
- A bitstream, wherein the bitstream is generated based on the method according to any one of claims 1 to 16.
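As a rough illustration of the two-level flow of claim 1, the following Python sketch encodes the first level to obtain reference frames and then encodes the second level against them. `intra_encode` and `inter_encode` are hypothetical pass-through stand-ins for a real codec's prediction paths, not interfaces defined by the application.

```python
def intra_encode(picture):
    # A real encoder would run intra prediction, transform, quantization,
    # and reconstruction; the pass-through keeps the sketch runnable.
    return picture

def inter_encode(picture, references):
    # A real encoder would predict `picture` from the reference frames.
    return picture

def encode_multiview_array(views, first_level, second_level):
    """views maps (row, col) -> picture; the level arguments are lists of
    (row, col) positions partitioning the multi-viewpoint array."""
    reconstructed = {}
    # First level: predictive encoding yields the reference frame(s).
    for pos in first_level:
        reconstructed[pos] = intra_encode(views[pos])
    # Second level: the remaining viewpoints are encoded against them.
    for pos in second_level:
        refs = [reconstructed[p] for p in first_level]
        reconstructed[pos] = inter_encode(views[pos], refs)
    return reconstructed
```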
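One possible reading of the sub-level layout of claims 2 to 11 can be drawn on a 5x5 toy array (rows and columns numbered 0 to 4, center viewpoint at (2, 2)). A 5x5 grid only has room for the first four sub-levels; the fifth to seventh sub-levels of the claims would only be populated in larger arrays, and the exact coordinates below are illustrative assumptions rather than positions mandated by the claims.

```python
SUB_LEVELS = {
    1: [(2, 2)],                                 # first sub-level: center viewpoint
    2: [(0, 0), (0, 2), (0, 4), (2, 0), (2, 4),  # evenly spaced on the edge lines
        (4, 0), (4, 2), (4, 4),
        (1, 1), (1, 3), (3, 1), (3, 3)],         # midway between center and vertices
    3: [(1, 2), (2, 1), (2, 3), (3, 2)],         # central axes, between sub-levels 1 and 2
    4: [(0, 1), (0, 3), (1, 0), (1, 4),          # edge viewpoints between sub-level-2 viewpoints
        (3, 0), (3, 4), (4, 1), (4, 3)],
}

# Sanity check: all 25 viewpoints are assigned to exactly one sub-level.
assigned = [pos for views in SUB_LEVELS.values() for pos in views]
assert len(assigned) == len(set(assigned)) == 25
```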
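The one-dimensional hierarchical structure of claims 13 and 30 is not spelled out in the claims themselves; assuming the customary dyadic, midpoint-first form, it can be sketched as a short recursion in which each viewpoint only depends on viewpoints coded before it.

```python
def one_d_hierarchical_order(lo, hi):
    """Coding order for the viewpoints strictly between anchors lo and hi:
    code the midpoint first, then recurse into both halves."""
    mid = (lo + hi) // 2
    if mid in (lo, hi):  # no viewpoint left between the anchors
        return []
    return [mid] + one_d_hierarchical_order(lo, mid) + one_d_hierarchical_order(mid, hi)

# A row of 9 viewpoints whose end columns 0 and 8 are already-coded anchors
# is filled in the order below: column 4 sits in the lowest remaining
# sub-level, columns 2 and 6 in the next one, and so on.
print(one_d_hierarchical_order(0, 8))  # [4, 2, 1, 3, 6, 5, 7]
```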
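The deletion step of claims 16 and 33 reads as a buffer-saving device: once one partial array is finished, only the viewpoints shared at the boundary are still needed as prediction references for the other partial array. A toy sketch follows, assuming an upper/lower split that shares the middle row; the split choice and the `release_non_shared` helper are illustrative assumptions.

```python
def release_non_shared(reconstructed, first_part, shared_boundary):
    """Free the reconstructed images of the first partial array except the
    viewpoints shared at the boundary with the second partial array."""
    for pos in first_part:
        if pos not in shared_boundary:
            del reconstructed[pos]

ROWS, COLS = 5, 5
upper_half = {(r, c) for r in range(3) for c in range(COLS)}  # rows 0-2
shared_row = {(2, c) for c in range(COLS)}                    # middle row stays resident

reconstructed = {pos: object() for pos in upper_half}  # stand-ins for decoded pictures
release_non_shared(reconstructed, upper_half, shared_row)
assert set(reconstructed) == shared_row  # only the shared boundary remains buffered
```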
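On the decoder side, claim 29 constrains each second-level viewpoint to reference only viewpoints of the same or a lower sub-level. A minimal sketch of that ordering and check is below; `ref_positions_for` and `inter_decode` are hypothetical helpers standing in for the bitstream's reference lists and the codec's inter prediction path.

```python
def decode_second_level(sub_levels, decoded, ref_positions_for, inter_decode):
    """sub_levels maps sub-level -> viewpoint positions of the second level;
    `decoded` is pre-filled with the first-level reference frames."""
    level_of = {pos: lvl for lvl, views in sub_levels.items() for pos in views}
    for lvl in sorted(sub_levels):  # low sub-levels first
        for pos in sub_levels[lvl]:
            refs = ref_positions_for(pos)
            # Claim-29 constraint: every reference must already be decoded and
            # belong to the same or a lower sub-level (first-level viewpoints
            # are not in `sub_levels` and default to level 0 here).
            assert all(r in decoded and level_of.get(r, 0) <= lvl for r in refs)
            decoded[pos] = inter_decode(pos, [decoded[r] for r in refs])
    return decoded
```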
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/125490 WO2024077616A1 (zh) | 2022-10-14 | 2022-10-14 | Encoding and decoding method, apparatus, device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024077616A1 (zh) | 2024-04-18 |
Family
ID=90668583
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/125490 WO2024077616A1 (zh) | Encoding and decoding method, apparatus, device, and storage medium | 2022-10-14 | 2022-10-14 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024077616A1 (zh) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010027142A2 (ko) * | 2008-09-05 | 2010-03-11 | SK Telecom Co., Ltd. | System and method for transmitting and receiving multi-view video |
CN101867813A (zh) * | 2010-04-23 | 2010-10-20 | Nanjing University of Posts and Telecommunications | Multi-view video coding method oriented to interactive applications |
CN103636222A (zh) * | 2011-04-19 | 2014-03-12 | Samsung Electronics Co., Ltd. | Method and device for unified scalable video encoding of multi-view video, and method and device for unified scalable video decoding of multi-view video |
CN104396252A (zh) * | 2012-04-25 | 2015-03-04 | Samsung Electronics Co., Ltd. | Multi-view video encoding method and device using a reference picture set for multi-view video prediction, and multi-view video decoding method and device using the same |
CN105472367A (zh) * | 2015-11-23 | 2016-04-06 | Zhejiang University | Adaptive multi-view video coding method supporting spatial-domain random access based on GOP slice division |
CN110392258A (zh) * | 2019-07-09 | 2019-10-29 | Wuhan University | Distributed multi-view video compressed sampling and reconstruction method combining spatio-temporal side information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22961802; Country of ref document: EP; Kind code of ref document: A1 |