WO2024108931A1 - Video encoding and decoding method and apparatus - Google Patents
Video encoding and decoding method and apparatus
- Publication number
- WO2024108931A1 (PCT/CN2023/094873)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- filter
- image
- type
- rendering information
- pixels
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/117—Filters, e.g. for pre-processing or post-processing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/137—Motion inside a coding unit, e.g. average field, frame or block difference
- H04N19/139—Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/172—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/44—Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
Definitions
- the present application relates to the field of computers, and in particular to a video encoding and decoding method and device.
- with the continuous development of video applications, video compression technology is particularly important.
- the video compression system performs spatial (intra-image) prediction and/or temporal (inter-image) prediction to reduce or remove redundant information inherent in the video sequence.
- the encoder selects one or more reference frames from the already-encoded frames of the video, obtains the prediction block corresponding to the current image block from the reference frame, then calculates the residual value between the prediction block and the current image block, and quantizes and encodes the residual value.
- the prediction block is selected from the reference frame by the encoder according to motion compensation and motion estimation.
- the reference frame is an image processed by a loop filter and corresponds to a frame adjacent to the current image block. This results in low similarity between the reference frame and the current image block and a large difference between the aforementioned prediction block and the current image block, so the residual value between the prediction block and the current image block is large, the video code stream obtained by encoding carries a large amount of data, the performance of the video compression process is low, and the efficiency of video encoding suffers. Therefore, how to provide a more effective video coding and decoding method has become an urgent problem to be solved.
- the present application provides a video encoding and decoding method and device, which solves the problem that the bit stream data volume obtained by video compression is large and the video compression rate is low.
- the present application provides a video encoding method, which can be applied to a codec system or to an encoding end that supports the codec system to implement the video encoding method, for example, the encoding end includes a video encoder.
- the encoding end performs the video encoding method provided by this embodiment as an example for explanation. The video encoding method includes: first, the encoding end obtains the source video and the rendering information corresponding to the source video. Second, the encoding end determines a first type of filter from multiple preset types of filters based on the rendering information. Third, the encoding end interpolates the first decoded image based on the first type of filter to obtain a virtual reference frame of the second image. Fourth, the encoding end encodes the second image based on the virtual reference frame of the second image to obtain a code stream corresponding to the second image.
- the source video includes multiple source images.
- the rendering information is used to indicate: processing parameters used by the encoder in the process of generating a bitstream according to the multiple source images, and the processing parameters indicate that the first image and the second image in the multiple source images have at least a partial overlapping area.
- the first decoded image is a reconstructed image obtained by encoding and then decoding the first image among the multiple source images.
- the encoder determines a first type of filter from the preset multiple types of filters, and the first type of filter matches the filter used in the image rendering process, which helps improve the similarity between the second image and the virtual reference frame the encoder determines using the first type of filter. Therefore, when the encoder encodes the second image based on the virtual reference frame, the residual value corresponding to the second image is reduced, the amount of data corresponding to the second image in the code stream is reduced, and both the compression performance and the efficiency of video encoding improve.
- the encoder determines the first type of filter according to the rendering information as follows: the encoder queries a preset mapping table according to the rendering information to obtain the filter corresponding to each processing parameter in the rendering information, and the encoder determines the first type of filter from all filters corresponding to the rendering information.
- the mapping table is used to indicate at least one type of filter among the multiple types of filters corresponding to each processing parameter.
- the mapping table indicates the filter corresponding to each processing parameter in the rendering information, and the correspondence between processing parameters and filters is determined based on the filters used in each processing stage of the image rendering engine.
- the encoder queries the aforementioned mapping table according to the acquired processing parameters to determine the first type of filter, and the first type of filter matches the filter used by the image rendering engine; the virtual reference frame the encoder obtains using the first type of filter therefore has improved spatial and temporal correlation with the second image, and the residual value of the second image determined by the encoder according to the virtual reference frame is reduced, improving the encoding effect.
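As an illustration of the mapping-table query described above, the following C++ sketch collects candidate filter types for the processing parameters present in the rendering information. The table contents, parameter names, and filter set are assumptions for illustration; the patent does not publish the actual table.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

enum class FilterType { Bilinear, CatmullRom, BSpline, Lanczos };

// Mapping table: each rendering-information processing parameter maps to
// one or more candidate filter types (entries are illustrative only).
const std::map<std::string, std::vector<FilterType>> kMappingTable = {
    {"depth_map",     {FilterType::CatmullRom, FilterType::Bilinear}},
    {"albedo_map",    {FilterType::CatmullRom, FilterType::BSpline}},
    {"anti_aliasing", {FilterType::Bilinear, FilterType::CatmullRom}},
};

// Collect all candidate filters for the parameters present in the rendering
// information; the first type of filter is then chosen from this candidate
// set, e.g. by priority or by pre-encoding.
std::set<FilterType> CandidateFilters(const std::vector<std::string>& params) {
    std::set<FilterType> candidates;
    for (const auto& p : params) {
        auto it = kMappingTable.find(p);
        if (it != kMappingTable.end())
            candidates.insert(it->second.begin(), it->second.end());
    }
    return candidates;
}
```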
- the mapping table is also used to indicate the priority of each type of filter among the multiple types of filters. The encoding end determines the first type of filter from the filters corresponding to the rendering information as follows: the encoding end obtains the priorities of all filters corresponding to the rendering information, and uses the filter with the highest priority among all of them as the first type of filter.
- the encoder selects the filter corresponding to the processing parameter with the highest priority from the mapping table as the first type of filter according to the processing parameters included in the rendering information.
- the filter with the highest priority also has the highest matching degree with the filter used by the image rendering engine.
- the encoder interpolates the reference frame using the filter with the highest matching degree to obtain a virtual reference frame.
- the temporal and spatial correlation between the virtual reference frame and the second image is improved, and then the residual value corresponding to the second image obtained by the encoder according to the virtual reference frame is reduced, thereby improving the encoding effect, reducing the amount of data obtained by encoding, and improving the compression performance.
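A minimal sketch of the priority rule described above, assuming smaller rank values mean higher priority; the priority assignments themselves would come from the mapping table and are not specified here.

```cpp
#include <limits>
#include <map>
#include <set>

enum class FilterType { Bilinear, CatmullRom, BSpline };

// Pick the candidate with the highest priority (smallest rank value).
// Assumes `candidates` is non-empty and every candidate has a rank.
FilterType SelectByPriority(const std::set<FilterType>& candidates,
                            const std::map<FilterType, int>& priority) {
    FilterType best = *candidates.begin();
    int bestRank = std::numeric_limits<int>::max();
    for (FilterType f : candidates) {
        int rank = priority.at(f);  // smaller value = higher priority
        if (rank < bestRank) { bestRank = rank; best = f; }
    }
    return best;
}
```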
- the encoder determines the first type of filter from all filters corresponding to the rendering information, including: the encoder predicts all filters corresponding to the rendering information to obtain at least one prediction result. And, the encoder selects a target prediction result whose encoding information meets a set condition from the at least one prediction result, and uses the filter corresponding to the target prediction result as the first type of filter.
- each prediction result corresponds to one of the filters and indicates the encoding information obtained by the encoding end by pre-encoding the second image with that filter; the encoding information includes at least one of the predicted bit rate and distortion rate of the second image.
- the above pre-encoding is a process of performing inter-frame prediction on the second image to obtain a prediction block of the second image, obtaining the residual value of the corresponding image block in the second image according to the prediction block, and then performing a transformation process.
- the encoder performs pre-encoding by using all filters corresponding to the processing parameters in the rendering information.
- the encoder screens the prediction results obtained by pre-encoding according to the set condition and uses the filter corresponding to the target prediction result that meets the set condition as the first type of filter, thereby determining which of the candidate filters actually meets the preset condition; the virtual reference frame the encoder then obtains by interpolation with the first type of filter has the greatest similarity with the second image.
- because the similarity between the virtual reference frame and the second image is greatest, the residual value corresponding to the second image, determined when the encoder uses the virtual reference frame to encode the second image, is smallest, improving the encoding compression effect.
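The pre-encoding selection can be read as a rate-distortion search over the candidate filters. The sketch below assumes a Lagrangian cost `distortion + lambda * bitrate` as the "set condition"; `interpolate` and `preEncode` stand in for the encoder's internal routines and are hypothetical.

```cpp
#include <functional>
#include <vector>

struct Frame {};
struct PreEncodeResult { double bitrate; double distortion; };

// filterCandidates: filter type ids from the mapping-table lookup.
// interpolate / preEncode: passed in as callbacks so the sketch stays
// self-contained; in the encoder they build the virtual reference frame
// and run inter prediction + transform without entropy coding.
int SelectByPreEncoding(
    const std::vector<int>& filterCandidates, double lambda,
    const std::function<Frame(int)>& interpolate,
    const std::function<PreEncodeResult(const Frame&)>& preEncode) {
    int best = filterCandidates.front();
    double bestCost = 1e300;
    for (int f : filterCandidates) {
        Frame vref = interpolate(f);          // virtual reference frame
        PreEncodeResult r = preEncode(vref);  // predicted rate & distortion
        // Set condition: minimize the rate-distortion cost.
        double cost = r.distortion + lambda * r.bitrate;
        if (cost < bestCost) { bestCost = cost; best = f; }
    }
    return best;
}
```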
- the rendering information includes: a combination of one or more of a depth map, an albedo map, and a post-processing parameter.
- for each processing parameter, the present application provides a corresponding first type of filter or a step for determining it.
- if the rendering information includes a depth map, the first type of filter is determined by the relationship between the pixels in the depth map and the edge information of the object in the depth map.
- for example, the depth map may correspond to a Catmull-Rom filter or a bilinear filter; or,
- if the rendering information includes an albedo map, the first type of filter is determined by the relationship between the pixels in the albedo map and the edge information of the object in the albedo map.
- for example, the albedo map may correspond to a Catmull-Rom filter or a B-spline filter; or,
- if the rendering information includes anti-aliasing parameters, the first type of filter is the filter indicated by the anti-aliasing parameters.
- for example, if the anti-aliasing parameter indicates bilinear filtering, the first type of filter is a bilinear filter;
- if the anti-aliasing parameter indicates Catmull-Rom filtering, the first type of filter is a Catmull-Rom filter.
- the edge information of the object in the depth map can be obtained as follows: the encoder applies Sobel filtering to the depth map to obtain a filtering result, and then binarizes the filtering result to obtain the edge information of the object in the depth map.
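A sketch of this edge-extraction step: Sobel gradients over the depth map, binarized by a magnitude threshold. The float depth layout and the threshold value are assumptions for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Sobel filtering + binarization of a row-major depth map of size w x h.
// Returns a 0/1 edge mask for the object edges in the depth map.
std::vector<uint8_t> DepthEdges(const std::vector<float>& depth,
                                int w, int h, float threshold) {
    std::vector<uint8_t> edges(w * h, 0);
    auto at = [&](int x, int y) { return depth[y * w + x]; };
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            // Sobel gradients in x and y.
            float gx = -at(x-1,y-1) - 2*at(x-1,y) - at(x-1,y+1)
                       +at(x+1,y-1) + 2*at(x+1,y) + at(x+1,y+1);
            float gy = -at(x-1,y-1) - 2*at(x,y-1) - at(x+1,y-1)
                       +at(x-1,y+1) + 2*at(x,y+1) + at(x+1,y+1);
            // Binarize the gradient magnitude to get the edge information.
            edges[y * w + x] = std::hypot(gx, gy) > threshold ? 1 : 0;
        }
    }
    return edges;
}
```

The resulting edge mask then drives the depth-map filter choice described above (Catmull-Rom or bilinear, depending on the relationship between the pixels and the edges).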
- the encoding end determines the first type of filter corresponding to the rendering information according to the correspondence between the above-mentioned processing parameters and the filters.
- the first type of filter has a certain degree of matching with the filter used by the image rendering engine.
- the encoding end uses the first type of filter to interpolate the reference frame to obtain a virtual reference frame.
- the matching degree between the virtual reference frame and the second image is improved.
- the encoding end encodes the second image according to the virtual reference frame, the residual value corresponding to the obtained second image is reduced, thereby improving the encoding effect.
- the encoder interpolates the first decoded image according to the first type of filter determined by the rendering information to obtain a virtual reference frame of the second image, including: first, the encoder generates a blank image with the same size as the reference block. Secondly, the encoder interpolates the reference block using the split graphic motion vector map according to the first corresponding relationship to determine the pixel values of all pixels in the blank image. Finally, the encoder obtains the virtual reference frame according to the pixel values of all pixels in the blank image.
- the first decoded image includes one or more reference blocks corresponding to the first image.
- the first corresponding relationship is used to indicate: in the overlapping area of the first image and the second image, the position mapping relationship between the pixels in the first image and the pixels in the second image.
- the encoding end interpolates the reference block in the first decoded image according to the first corresponding relationship to obtain a virtual reference frame of the second image.
- the similarity between the virtual reference frame and the second image is higher than the similarity between the first decoded image and the second image, so the residual value between the virtual reference frame and the second image is reduced; while image quality remains consistent, the amount of image data in the code stream is therefore reduced, improving the encoding effect.
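The construction of the virtual reference frame can be sketched as follows: a blank image the size of the reference block is filled by sampling the reference block at the positions given by the per-pixel graphic motion vectors (the first correspondence). Bilinear sampling is used here only for brevity; in the method above, the chosen first type of filter performs the interpolation.

```cpp
#include <cmath>
#include <vector>

struct MV { float dx, dy; };  // per-pixel graphic motion vector

std::vector<float> BuildVirtualBlock(const std::vector<float>& refBlock,
                                     const std::vector<MV>& mvMap,
                                     int w, int h) {
    std::vector<float> out(w * h, 0.f);  // blank image, same size as block
    auto clampi = [](int v, int lo, int hi) {
        return v < lo ? lo : (v > hi ? hi : v);
    };
    // Bilinear sample of the reference block at a fractional position.
    auto sample = [&](float x, float y) {
        int x0 = clampi(int(std::floor(x)), 0, w - 1);
        int y0 = clampi(int(std::floor(y)), 0, h - 1);
        int x1 = clampi(x0 + 1, 0, w - 1), y1 = clampi(y0 + 1, 0, h - 1);
        float fx = x - std::floor(x), fy = y - std::floor(y);
        float top = refBlock[y0*w + x0]*(1 - fx) + refBlock[y0*w + x1]*fx;
        float bot = refBlock[y1*w + x0]*(1 - fx) + refBlock[y1*w + x1]*fx;
        return top * (1 - fy) + bot * fy;
    };
    // First correspondence: each output pixel maps to a position in the
    // reference block given by its graphic motion vector.
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            const MV& mv = mvMap[y * w + x];
            out[y * w + x] = sample(x + mv.dx, y + mv.dy);
        }
    return out;
}
```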
- the first correspondence is obtained as follows: first, the encoder generates a blank image with the same size as the source image. Second, the encoder determines a second correspondence between pixels in the first graphic motion vector map and pixel positions in the blank image based on the size ratio between the first graphic motion vector map and the blank image. Finally, the encoder interpolates the graphic motion vector map using an interpolation filter to determine the pixel value and offset of each pixel in the blank image.
- the encoder interpolates the graphic motion vector map to obtain an interpolated graphic motion vector map.
- the size of the interpolated graphic motion vector map is consistent with the size of the first decoded image, and the interpolated graphic motion vector map aligns with the pixels of the first decoded image.
- the encoder interpolates the first decoded image based on the interpolated graphic motion vector map to obtain a virtual reference frame, and the matching degree between the virtual reference frame and the second image is improved.
- the encoder encodes the second image according to the virtual reference frame, the residual value corresponding to the image is reduced, thereby improving the compression performance.
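A sketch of deriving the second correspondence: the graphic motion vector map is resampled to the decoded image's resolution using the size ratio between the two, so each decoded-image pixel gets a motion vector. Nearest-neighbour resampling is an assumption for brevity; the text above uses an interpolation filter.

```cpp
#include <algorithm>
#include <vector>

struct MV { float dx, dy; };

// Resample an mw x mh graphic motion vector map to the dw x dh size of the
// first decoded image, using the size ratio as the pixel-position mapping.
std::vector<MV> ResampleMvMap(const std::vector<MV>& mv, int mw, int mh,
                              int dw, int dh) {
    std::vector<MV> out(dw * dh);
    const float sx = float(mw) / dw, sy = float(mh) / dh;  // size ratio
    for (int y = 0; y < dh; ++y)
        for (int x = 0; x < dw; ++x) {
            int mx = std::min(int(x * sx), mw - 1);
            int my = std::min(int(y * sy), mh - 1);
            out[y * dw + x] = mv[my * mw + mx];  // second correspondence
        }
    return out;
}
```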
- the video encoding method includes: first, the encoding end determines the target discrete parameters of pixels in the first decoded image based on the graphic motion vector map indicated in the rendering information. Second, the encoding end uses the first type of filter with a first filter kernel to interpolate the reference block, and calculates the first discrete parameters of the pixels in each channel of the interpolated virtual reference block. Third, the encoding end determines, from the first discrete parameters corresponding to all filter kernels, the one with the smallest difference from the target discrete parameter, and uses the filter kernel corresponding to that first discrete parameter as the parameter of the first type of filter.
- the target discrete parameter includes: a target variance or a target covariance.
- the first discrete parameter includes a first variance or a first covariance.
- the first filter kernel is one of multiple preset filter kernels.
- the encoder determines the first discrete parameter with the smallest difference from the target discrete parameter, and uses the first filter kernel corresponding to the first discrete parameter as the parameter of the first type of filter.
- the encoder interpolates the first decoded image using the aforementioned first type of filter and the parameters of the first type of filter to obtain a virtual reference frame.
- the virtual reference frame has an improved match with the second image.
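The kernel search can be sketched as a variance-matching loop: each preset kernel interpolates the reference block, and the kernel whose pixel variance is closest to the target variance wins. The choice of variance (rather than covariance) and the callback-based structure are assumptions for illustration.

```cpp
#include <cmath>
#include <vector>

// Variance of a channel's pixel values (a "discrete parameter").
double Variance(const std::vector<double>& px) {
    double mean = 0, var = 0;
    for (double v : px) mean += v;
    mean /= px.size();
    for (double v : px) var += (v - mean) * (v - mean);
    return var / px.size();
}

// `interpolate(kernel)` stands in for interpolating the reference block
// with one of the preset filter kernels and returning the result's pixels.
template <typename InterpFn>
int SelectKernel(const std::vector<std::vector<double>>& kernels,
                 double targetVariance, InterpFn interpolate) {
    int best = 0;
    double bestDiff = 1e300;
    for (int k = 0; k < (int)kernels.size(); ++k) {
        double v = Variance(interpolate(kernels[k]));  // first discrete param
        double diff = std::fabs(v - targetVariance);
        if (diff < bestDiff) { bestDiff = diff; best = k; }
    }
    return best;  // kernel used as the first-type filter's parameter
}
```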
- the video encoding method further includes: the encoding end writes type information of the first type of filter into a bit stream.
- the encoder sends the type information of the first type of filter to the decoder, which spares the decoder the process of determining the first type of filter from the rendering information and thus improves video decoding efficiency.
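A sketch of this signaling, assuming an illustrative two-byte syntax (the patent does not specify the bitstream layout): the encoder writes the filter type and its parameter, and the decoder reads them back instead of re-deriving the filter from rendering information.

```cpp
#include <cstdint>
#include <vector>

enum : uint8_t { FILTER_BILINEAR = 0, FILTER_CATMULL_ROM = 1, FILTER_BSPLINE = 2 };

// Encoder side: append filter type info to the bitstream.
void WriteFilterInfo(std::vector<uint8_t>& bitstream, uint8_t filterType,
                     uint8_t kernelIndex) {
    bitstream.push_back(filterType);   // type of the first type of filter
    bitstream.push_back(kernelIndex);  // its filtering parameter (kernel)
}

// Decoder side: read the same two syntax elements back.
void ReadFilterInfo(const std::vector<uint8_t>& bitstream, size_t pos,
                    uint8_t& filterType, uint8_t& kernelIndex) {
    filterType = bitstream[pos];
    kernelIndex = bitstream[pos + 1];
}
```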
- the present application provides a video decoding method, which can be applied to a codec system or to a decoding end that supports the codec system to implement the video decoding method, for example, the decoding end includes a video decoder.
- the decoding end performs the video decoding method provided by the present embodiment as an example for explanation.
- the video decoding method includes: first, the decoding end obtains a code stream and rendering information corresponding to the code stream, where the code stream includes multiple image frames. Second, the decoding end determines a first type of filter from multiple preset types of filters based on the rendering information.
- the decoding end interpolates the first decoded image corresponding to the first image frame according to the first type of filter to obtain a virtual reference frame of the second image frame.
- the decoding end decodes the second image frame based on the virtual reference frame to obtain a second decoded image of the second image frame.
- the rendering information is used to indicate: processing parameters used in the process of generating a code stream according to multiple source images, and the processing parameters indicate that a source image corresponding to a first image frame and a source image corresponding to a second image frame in multiple image frames have at least a partially overlapping area.
- the code stream may be sent from the encoding end to the decoding end.
- the decoding end selects a first type of filter from the preset multiple types of filters according to the processing parameters indicated by the rendering information and uses the first type of filter to obtain a virtual reference frame of the second image frame, thereby avoiding the poor decoding effect that results from using a fixed filter to process the first decoded image to obtain a reference frame.
- the matching degree between the virtual reference frame and the source image corresponding to the second image frame is improved.
- the decoding end determines the first type of filter according to the rendering information as follows: the decoding end queries a preset mapping table according to the rendering information to obtain the filter corresponding to each processing parameter in the rendering information, and determines the first type of filter from all filters corresponding to the rendering information.
- the mapping table is used to indicate at least one type of filter among the multiple types of filters corresponding to each processing parameter.
- the decoding end queries the preset mapping table according to the rendering information to determine the first type of filter, and the first type of filter matches, to a certain degree, the filter used by the image rendering engine.
- the decoding end uses the first type of filter to perform interpolation to obtain a virtual reference frame; the temporal and spatial correlation between the virtual reference frame and the second image frame is improved, that is, the matching degree between the virtual reference frame and the source image corresponding to the second image frame is improved.
- the decoding end decodes the second image frame according to the virtual reference frame, the amount of data processed is reduced, thereby improving the decoding efficiency.
- the mapping table is also used to indicate the priority of each type of filter among the multiple types of filters. The decoding end determines the first type of filter from the filters corresponding to the rendering information as follows: the decoding end obtains the priorities of all filters corresponding to the rendering information and uses the filter with the highest priority among all of them as the first type of filter.
- the decoding end selects the filter corresponding to the processing parameter with the highest priority from the mapping table as the first type of filter according to the processing parameters included in the rendering information.
- the filter with the highest priority also has the highest matching degree with the filter used by the image rendering engine.
- the decoding end interpolates the reference frame using the filter with the highest matching degree to obtain a virtual reference frame.
- the virtual reference frame has a higher matching degree with the source image corresponding to the second image frame.
- the rendering information includes: a combination of one or more of a depth map, an albedo map, and a post-processing parameter.
- for each processing parameter, the present application provides a corresponding first type of filter or a step for determining it. If the rendering information includes a depth map, the first type of filter is determined by the relationship between the pixels in the depth map and the edge information of the object in the depth map; or,
- if the rendering information includes an albedo map, the first type of filter is determined by the relationship between the pixels in the albedo map and the edge information of the object in the albedo map; or,
- if the rendering information includes anti-aliasing parameters, the first type of filter is the filter indicated by the anti-aliasing parameters.
- for example, if the anti-aliasing parameter indicates bilinear filtering, the first type of filter is a bilinear filter;
- if the anti-aliasing parameter indicates Catmull-Rom filtering, the first type of filter is a Catmull-Rom filter.
- the edge information of the object in the above depth map can be obtained as follows: the decoding end applies Sobel filtering to the depth map to obtain a filtering result, and then binarizes the filtering result to obtain the edge information of the object in the depth map.
- the decoding end determines the first type of filter corresponding to the rendering information according to the correspondence between the above-mentioned processing parameters and the filters.
- the first type of filter has a certain degree of matching with the filter used by the image rendering engine.
- the decoding end uses the first type of filter to interpolate the reference frame to obtain a virtual reference frame.
- the degree of matching between the virtual reference frame and the source image corresponding to the second image frame is improved.
- before the decoding end interpolates the first decoded image corresponding to the first image frame according to the first type of filter determined by the rendering information, the video decoding method further includes: the decoding end obtains filter information; and the decoding end determines the first type of filter and the filtering parameters corresponding to the first type of filter according to the filter type and parameters indicated in the filter information.
- the filtering parameter is used to indicate: a processing parameter used by the decoding end in a process of obtaining a virtual reference frame based on the first type of filter.
- the encoder determines the first type of filter and the filtering parameters corresponding to the first type of filter, writes them into the bitstream, and sends the bitstream to the decoding end.
- the decoding end thus directly obtains the first type of filter and its corresponding filtering parameters, which spares the decoding end the process of determining the first type of filter from the rendering information and improves video decoding efficiency.
- the decoding end interpolates the first decoded image corresponding to the first image frame according to the first type of filter determined by the rendering information, and obtains the virtual reference frame of the second image frame, including: first, the decoding end generates a blank image with the same size as the reference block. Secondly, the decoding end interpolates the reference block according to the first corresponding relationship using the split graphic motion vector map to determine the pixel values of all pixels in the blank image. Finally, the decoding end obtains the virtual reference frame according to the pixel values of all pixels in the blank image.
- the first decoded image includes one or more reference blocks corresponding to the first image.
- the first correspondence relationship is used to indicate: in an overlapping area of a source image corresponding to the first image frame and a source image corresponding to the second image frame, a pixel position mapping relationship between a source image corresponding to the first image frame and a source image corresponding to the second image frame.
- the decoding end interpolates the reference block in the first decoded image according to the first corresponding relationship to obtain a virtual reference frame of the second image.
- the similarity between the virtual reference frame and the second image frame is higher than the similarity between the first decoded image and the second image frame.
- the first correspondence is obtained by the following method: first, the decoder generates a blank image with the same size as the first decoded image. Secondly, the decoder determines a second correspondence between a pixel in the first graphic motion vector map and a pixel position in the blank image based on the size ratio between the first graphic motion vector map and the blank image. Finally, the decoder interpolates the graphic motion vector map using an interpolation filter to determine the pixel value and offset of each pixel in the blank image.
- the first decoded image has the same size as the source image.
- the decoding end interpolates the graphic motion vector map to obtain an interpolated graphic motion vector map.
- the size of the interpolated graphic motion vector map is consistent with the size of the first decoded image, and the interpolated graphic motion vector map matches the first decoded image.
- the decoding end interpolates the first decoded image based on the interpolated graphic motion vector to obtain a virtual reference frame, and the matching degree of the virtual reference frame with the second image frame is improved.
- the decoding end decodes the second image frame according to the virtual reference frame, the amount of data processed is reduced, thereby improving the decoding efficiency.
- the video decoding method further includes: first, the decoding end determines the target discrete parameters of the pixels in the first decoded image based on the graphic motion vector map indicated in the rendering information. Second, the decoding end uses the first type of filter with a first filter kernel to interpolate the reference block, and calculates the first discrete parameters of the pixels in each channel of the interpolated virtual reference block. Third, the decoding end determines, from the first discrete parameters corresponding to all filter kernels, the one with the smallest difference from the target discrete parameter, and uses the filter kernel corresponding to that first discrete parameter as the parameter of the first type of filter.
- the target discrete parameter includes: a target variance or a target covariance.
- the first discrete parameter includes a first variance or a first covariance.
- the first filter kernel is one of multiple preset filter kernels.
- the decoding end determines the first discrete parameter with the smallest difference from the target discrete parameter, and the decoding end uses the first filter kernel corresponding to the first discrete parameter as the parameter of the first type of filter.
- the decoding end interpolates the first decoded image using the first type of filter and the parameters of the first type of filter to obtain a virtual reference frame.
- the virtual reference frame has a higher matching degree with the second image frame, and when the decoding end decodes the second image frame according to the virtual reference frame, the amount of data processed is reduced, thereby improving the decoding efficiency.
- the present application provides a video encoding device, which is applied to an encoding end and is applicable to a video encoding and decoding system including the encoding end, and the video encoding device includes various modules for executing the video encoding method in the first aspect or any optional implementation of the first aspect.
- the video encoding device includes: a first acquisition module, a first interpolation module and an encoding module.
- the first acquisition module is used to obtain the source video and the rendering information corresponding to the source video.
- the first interpolation module is used to determine a first type of filter based on the rendering information,
- and to interpolate the first decoded image according to the first type of filter to obtain a virtual reference frame of the second image.
- the encoding module is used to encode the second image based on the virtual reference frame of the second image.
- the source video includes multiple source images, and the rendering information is used to indicate: processing parameters used in the process of generating a code stream according to the multiple source images, and the processing parameters indicate that a first image and a second image in the multiple source images have at least a partially overlapping area.
- the first type of filter is one of multiple preset types of filters, the first decoded image is a reconstructed image obtained by encoding and then decoding the first image among the multiple source images, and the first image is encoded before the second image.
- the present application provides a video decoding device, which is applied to a decoding end and is applicable to a video encoding and decoding system including a decoding end, and the video decoding device includes various modules for executing the video decoding method in the second aspect or any optional implementation of the second aspect.
- the video decoding device includes: a second acquisition module, a second interpolation module and a decoding module. Among them, the second acquisition module is used to obtain the bitstream and the rendering information corresponding to the bitstream.
- the second interpolation module is used to determine a first type of filter based on the rendering information, and interpolate the first decoded image corresponding to the first image frame according to the first type of filter to obtain a virtual reference frame of the second image frame.
- the decoding module is used to decode the second image frame according to the virtual reference frame to obtain the second decoded image corresponding to the second image frame.
- the code stream includes multiple image frames, and the rendering information is used to indicate: processing parameters used in the process of generating the code stream according to the multiple source images, and the processing parameters indicate that the source image corresponding to the first image frame and the source image corresponding to the second image frame in the multiple image frames have at least a partial overlapping area.
- the first type of filter is one of multiple preset types of filters.
- the present application provides a chip comprising: a processor and a power supply circuit; the power supply circuit is used to power the processor, and the processor is used to execute the method in the above-mentioned first aspect and any possible implementation of the first aspect; and/or, the processor is used to execute the method in the above-mentioned second aspect and any possible implementation of the second aspect.
- the present application provides a codec comprising a memory and a processor, wherein the memory is used to store computer instructions; when the processor executes the computer instructions, it implements the method in the above-mentioned first aspect and any possible implementation manner of the first aspect; and/or, when the processor executes the computer instructions, it implements the method in the above-mentioned second aspect and any possible implementation manner of the second aspect.
- the present application provides a coding and decoding system, including a coding end and a decoding end; the coding end is used to encode multiple images according to rendering information corresponding to multiple source images, obtain code streams corresponding to the multiple images, and implement the method in the above-mentioned first aspect and any possible implementation manner of the first aspect;
- the decoding end is used to decode the code stream according to rendering information corresponding to multiple source images to obtain multiple decoded images, thereby implementing the method in the above-mentioned second aspect and any possible implementation method of the second aspect.
- the present application provides a computer-readable storage medium, in which a computer program or instructions are stored; when executed by a processing device, the computer program or instructions implement the method in the above-mentioned first aspect and any possible implementation of the first aspect, and/or the method in the above-mentioned second aspect and any possible implementation of the second aspect.
- the present application provides a computer program product, which includes a computer program or instructions, which, when executed by a processing device, implement the method in the above-mentioned first aspect and any one of the optional implementations of the first aspect; and/or, when executed by a processing device, the computer program or instructions implement the method in the above-mentioned second aspect and any one of the optional implementations of the second aspect.
- FIG1 is an exemplary block diagram of a video encoding and decoding system provided by the present application.
- FIG2 is a schematic diagram of the structure of a video encoder provided by the present application.
- FIG3 is a schematic diagram of the structure of a video decoder provided by the present application.
- FIG4 is a schematic diagram of a flow chart of a temporal anti-aliasing method provided by the present application.
- FIG5 is a schematic diagram of a flow chart of a video encoding method provided by the present application.
- FIG6 is a schematic diagram of a reference block interpolation method provided by the present application.
- FIG7 is a schematic diagram of a graphics motion vector interpolation method provided by the present application.
- FIG8 is a schematic diagram of a video decoding process provided by the present application.
- FIG9 is a schematic diagram of the structure of a video encoding device provided by the present application.
- FIG10 is a schematic diagram of the structure of a video decoding device provided by the present application.
- FIG11 is a schematic diagram of the structure of a computer device provided in the present application.
- the encoder determines a first type of filter from the preset multiple types of filters, and the first type of filter matches the filter used in the image rendering process, which helps improve the similarity between the second image and the virtual reference frame the encoder determines using the first type of filter. Therefore, when the encoder encodes the second image based on the virtual reference frame, the residual value corresponding to the second image is reduced, the amount of data corresponding to the second image in the code stream is reduced, and both the compression performance and the efficiency of video encoding improve.
- the decoding end selects a first type of filter from the preset multiple types of filters according to the processing parameters indicated by the rendering information, and uses the first type of filter to obtain a virtual reference frame of the second image frame, thereby avoiding the poor decoding efficiency caused by using a fixed filter to process the decoded image to obtain a reference frame.
- the matching degree between the virtual reference frame and the source image corresponding to the image to be decoded is improved.
- Video encoding: the process of compressing the multiple frames of images included in a video into a code stream.
- Video decoding: the process of restoring the code stream into multiple frames of reconstructed images according to specific syntax rules and processing methods.
- FIG1 is an exemplary block diagram of the video encoding and decoding system provided by the present application.
- the video encoding and decoding system includes an encoding end 100 and a decoding end 200.
- the encoding end 100 generates encoded video data (or referred to as a code stream). Therefore, the encoding end 100 can be referred to as a video encoding device.
- the decoding end 200 can decode the code stream (such as a video including one or more image frames) generated by the encoding end 100. Therefore, the decoding end 200 can be referred to as a video decoding device.
- various implementations of the encoding end 100, the decoding end 200, or both may include one or more processors and a memory coupled to the one or more processors.
- the memory may include, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer.
- the encoding end 100 includes a source video 110, a video encoder 120, and an output interface 130.
- the output interface 130 may include a modulator/demodulator (modem) and/or a transmitter.
- the source video 110 may include a video capture device (e.g., a camera), a video archive containing previously captured video data, a video feed interface for receiving video data from a video content provider, and/or a computer graphics system for generating video data, such as an image rendering engine, or a combination of the above video sources.
- the video encoder 120 may encode the video data from the source video 110.
- the encoding end 100 transmits the code stream directly to the decoding end 200 via the link 300 through the output interface 130.
- the code stream may also be stored in the storage device 400 for later access by the decoding end 200 for decoding and/or playback.
- the decoding end 200 includes an input interface 230, a video decoder 220, and a display device 210.
- the input interface 230 includes a receiver and/or a modem.
- the input interface 230 may receive encoded video data via a link 300 and/or from a storage device 400.
- the display device 210 may be integrated with the decoding end 200 or may be external to the decoding end 200. In general, the display device 210 displays the decoded video data.
- the display device 210 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or other types of display devices.
- Figure 2 is a schematic diagram of the structure of a video encoder provided by the present application.
- the video encoder 120 includes an inter-frame predictor 121, an intra-frame predictor 122, a transformer 123, a quantizer 124, an entropy encoder 125, an inverse quantizer 126, an inverse transformer 127, a filter unit 128, and a memory 129.
- the inverse quantizer 126 and the inverse transformer 127 are used for image block reconstruction.
- the filter unit 128 represents one or more loop filters, such as a deblocking filter, an adaptive loop filter, and a sample adaptive offset filter.
- the memory 129 may store video data encoded by the components of the video encoder 120.
- the video data stored in the memory 129 may be obtained from the source video 110.
- the memory 129 may be a reference image memory that stores reference video data used by the video encoder 120 to encode the video data in intra-frame and inter-frame decoding modes.
- the memory 129 may be a dynamic random access memory (DRAM), a magnetic RAM (MRAM), a resistive RAM (RRAM), or other types of memory devices.
- the workflow of the video encoder is described below in conjunction with the content of FIG. 2 .
- the video encoder 120 subtracts the prediction block from the current image block to be encoded to form a residual image block.
- the residual video data in the residual block may be included in one or more transform units (TUs) and applied to the transformer 123.
- the transformer 123 transforms the residual video data using a transform such as a discrete cosine transform or a conceptually similar transform.
- the video data is transformed into residual transform coefficients.
- the transformer 123 may convert the residual video data from the pixel value domain to a transform domain, such as the frequency domain.
- Transformer 123 may send the resulting transform coefficients to quantizer 124.
- Quantizer 124 quantizes the transform coefficients to further reduce the bit rate.
- quantizer 124 may then scan the matrix containing the quantized residual transform coefficients; alternatively, entropy encoder 125 may perform the scan.
- after quantization, the entropy encoder 125 performs entropy encoding on the quantized transform coefficients. For example, the entropy encoder 125 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique.
- the encoded bitstream may be transmitted to a video decoder, or archived for later transmission or retrieval by a video decoder.
- the entropy encoder 125 may also perform entropy encoding on syntax elements of the current image block to be encoded.
- the inverse quantizer 126 and the inverse transformer 127 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain, for example for later use as a reference block of a reference image.
- the video encoder 120 adds the reconstructed residual block to the prediction block generated by the inter-frame predictor 121 or the intra-frame predictor 122 to produce a reconstructed image or reconstructed image block.
- the filter unit 128 can be applied to the reconstructed image block to reduce distortion, such as block artifacts.
- the reconstructed image or reconstructed image block is then stored in the memory 129 as a reference block (or referred to as a first decoded image), which can be used by the inter-frame predictor 121 as a reference block for inter-frame prediction of blocks in subsequent video frames or images.
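The reconstruction loop described above can be reduced to the following sketch, with the transform and quantization collapsed into a uniform scalar quantizer for brevity; the point is that the encoder reconstructs each block the same way the decoder will, so both sides reference identical pixels.

```cpp
#include <vector>

// Encode one block against its prediction and produce the reconstruction
// that will serve as reference data for later frames.
std::vector<int> EncodeBlock(const std::vector<int>& cur,
                             const std::vector<int>& pred,
                             int qstep, std::vector<int>& recon) {
    std::vector<int> levels(cur.size());
    recon.resize(cur.size());
    for (size_t i = 0; i < cur.size(); ++i) {
        int residual = cur[i] - pred[i];         // residual image block
        levels[i] = residual / qstep;            // "transform + quantize"
        recon[i] = pred[i] + levels[i] * qstep;  // inverse path: reference
    }
    return levels;  // passed on to the entropy encoder
}
```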
- the video encoder 120 is also externally connected to an interpolation filter unit 140, which is used to interpolate the first decoded image to obtain a virtual reference frame and store the virtual reference frame in the memory 129.
- the video encoder 120 uses the virtual reference frame to assist in encoding the second image.
- the filter in the interpolation filter unit is determined by the encoding end 100 based on the acquired prior information, and the prior information can be the processing parameters used in the process of rendering the source video by computer image rendering technology.
- the similarity between the virtual reference frame and the second image is higher than the similarity between the first decoded image and the second image.
- the interpolation filter unit 140 may be disposed inside the video encoder 120 .
- the video encoder 120 can directly quantize the residual signal without being processed by the transformer 123, and accordingly, it does not need to be processed by the inverse transformer 127; or, for some image blocks or image frames, the video encoder 120 does not generate residual data, and accordingly, it does not need to be processed by the transformer 123, the quantizer 124, the inverse quantizer 126 and the inverse transformer 127; or, the video encoder 120 can directly store the reconstructed image block as a reference block without being processed by the filter unit 128; or, the quantizer 124 and the inverse quantizer 126 in the video encoder 120 can be combined together.
- FIG3 is a schematic diagram of the structure of a video decoder according to an embodiment of the present application.
- the video decoder 220 includes an entropy decoder 221, an inverse quantizer 222, an inverse transformer 223, a filter unit 224, a memory 225, an inter-frame predictor 226, and an intra-frame predictor 227.
- the video decoder 220 can perform a decoding process that is substantially the inverse of the encoding process described with respect to the video encoder 120 from FIG2 .
- the entropy decoder 221, the inverse quantizer 222, and the inverse transformer 223 are used to obtain a residual block or a residual value, and the decoded code stream is used to determine whether the current image block uses intra-frame prediction or inter-frame prediction. If it is intra-frame prediction, the intra-frame predictor 227 uses the pixel values of the pixels in the surrounding reconstructed area to construct prediction information according to the intra-frame prediction method used. If it is inter-frame prediction, the inter-frame predictor 226 needs to parse out motion information, and use the parsed motion information to determine a reference block in the reconstructed image, and use the pixel values of the pixels in the block as prediction information. The prediction information plus the residual information can be filtered to obtain the reconstructed information.
- the video decoder 220 is also externally connected to an interpolation filter unit 240 .
- the function of the interpolation filter unit 240 may refer to the description of the interpolation filter unit 140 externally connected to the video encoder 120, which will not be repeated here.
- the interpolation filter unit 240 may be disposed inside the video decoder 220.
- the above source video is obtained after the rendering end performs rendering processing on the input data.
- the input data here refers to the data obtained by the display device communicating with the rendering end, or the input data refers to the data received by the rendering end.
- the rendering end includes an image rendering engine (such as V-Ray, Unreal, Unity).
- the process map data includes: the rendered image generated by the rendering end processing the input data, and the intermediate map between the input data and the rendered image, etc.
- the intermediate map may include but is not limited to: a motion vector map, a low-quality rendering result, a depth map, a position map, a normal map, an albedo map, a specular intensity map, etc.
- the parameter information in the above rendering process refers to the post-processing parameters used by the rendering end to post-process the rendered image, such as Mesh ID, Material ID, motion blur parameters, and anti-aliasing parameters.
- the process map data and parameter information can also be collectively referred to as intermediate data corresponding to the source video, or as the rendering information used by the encoder when encoding the source video.
- the image rendering engine on the decoding end 200 can perform image rendering according to the input data of the display end to obtain one or more of the process diagram data and parameter information. Since the decoding end 200 obtains all or part of the intermediate data, it avoids the encoding end 100 from sending all the intermediate data to the decoding end 200, thereby reducing the amount of data transmitted from the encoding end 100 to the decoding end 200, saving bandwidth.
- the input data of the display end may be one or more of the camera (viewing angle) position, the position/intensity/color of the light source, etc.
- the input data of the rendering end may be one or more of the object geometry, the initial position of the object, the light source information, etc.
- the graphic motion vector map is an image indicating the position correspondence relationship between pixels in two frames of the source video.
- the motion blur parameter is an instruction for the image rendering engine to perform motion blur.
- the anti-aliasing parameter indicates the filtering method used by the image rendering engine when performing temporal anti-aliasing (TAA).
- the image rendering engine may also perform post-processing on the rendered image, where post-processing refers to anti-aliasing processing.
- the rendering end performs temporal anti-aliasing (TAA) processing on the rendered image to obtain multiple images included in the source video to solve the aliasing problem of the rendered image.
- Figure 4 is a flow chart of the temporal anti-aliasing method provided by the present application.
- the difference between successive frames of rendered images is small; for example, only a few pixels (one or two, say) shift between the previous frame and the current frame. Taking the first pixel in the current frame as an example, the rendering end determines, based on the graphic motion vector map, the second pixel in the previous frame that corresponds to the first pixel.
- the rendering end performs weighted average of the pixel value of the first pixel and the pixel value of the second pixel to obtain the final pixel value.
- the rendering end uses the final pixel value to fill the first pixel in the current frame, thereby determining the pixel value of each pixel in the first image and obtaining the first image.
- the first pixel is any one of all the pixels in the current frame.
- the rendering end mentioned above may be an encoding end 100 , which may be on a cloud server, and the display end mentioned above may be a decoding end 200 , which may be on a client.
- the rendering end determining the pixel value of the second pixel may include: the encoding end 100 uses a filter to filter the second pixel or pixels around the second pixel (such as a 3x3 pixel range) to obtain the pixel value of the second pixel.
- the filtering method of the filter can be one of mean filtering, median filtering and Gaussian filtering.
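- to make the TAA steps above concrete, the following is a minimal Python sketch (an illustration only, not part of the original disclosure). It assumes grayscale frames stored as float arrays, a mean filter over a k x k window around the second pixel, and a hypothetical blend weight `alpha`; the motion vector map is assumed to store per-pixel (dy, dx) offsets into the previous frame:

```python
import numpy as np

def taa_blend(prev_frame, curr_frame, motion_vectors, alpha=0.1, k=3):
    """Sketch of TAA: locate each pixel's counterpart in the previous frame
    via the motion vector map, mean-filter a k x k window around it, and
    take a weighted average with the current pixel value."""
    h, w = curr_frame.shape
    pad = k // 2
    padded = np.pad(prev_frame, pad, mode="edge")
    out = np.empty_like(curr_frame)
    for y in range(h):
        for x in range(w):
            dy, dx = motion_vectors[y, x]              # offset to the second pixel
            py = int(np.clip(y + dy, 0, h - 1))
            px = int(np.clip(x + dx, 0, w - 1))
            window = padded[py:py + k, px:px + k]      # pixels around the second pixel
            history = window.mean()                    # mean filtering (one option)
            out[y, x] = alpha * curr_frame[y, x] + (1 - alpha) * history
    return out
```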
- the image rendering engine can be deployed on the cloud side (such as the cloud side server or the cloud side encoding end), and the encoding end 100 obtains the source video after rendering multiple images, and encodes the source video to obtain the bit stream.
- the encoding end 100 transmits the bit stream to the decoding end 200, and the decoding end 200 decodes the bit stream and plays it.
- the encoding end 100 processes multiple images through an image rendering engine to obtain a game screen, and encodes the game screen to obtain a bit stream to reduce the amount of data for transmitting the game screen.
- the decoding end 200 decodes the bit stream and plays it, and the user operates the game on the decoding end 200 (such as a mobile device).
- when the encoding end 100 and the decoding end 200 are located in the cloud server and the client, respectively, the encoding end 100 compresses the source video to obtain a code stream and then sends the code stream to the decoding end 200, in order to improve data transmission efficiency.
- FIG5 is a flow chart of the video encoding method provided by the present application
- the video encoding method can be applied to the video encoding and decoding system shown in FIG1 or the video encoder shown in FIG2. Exemplarily, the video encoding method can be executed by the encoding end 100 or the video encoder 120; the encoding end 100 is taken as an example to illustrate the video encoding method provided by this embodiment.
- the video encoding method provided by this embodiment includes the following steps S510 to S540.
- S510 The encoding end 100 obtains a source video and rendering information corresponding to the source video.
- the source video includes a plurality of source images.
- the rendering information is used to indicate: the processing parameters used by the encoder 100 in the process of generating a bitstream according to multiple source images in the source video.
- the processing parameters may include at least one of the aforementioned process diagram data and parameter information.
- the graphic motion vector diagram in the processing parameters indicates that the first image and the second image in the multiple source images have at least a partially overlapping area, and the pixels in the overlapping area have a positional correspondence.
- S520 The encoding end 100 determines a first type of filter according to the rendering information.
- the first type of filter is one of the multiple types of filters that are set.
- the encoding end 100 may determine the first type of filter from the set filters according to the correspondence between the processing parameters in the rendering information and the set filters.
- the rendering information acquired by the encoding end 100 may include one or more combinations of the processing parameters shown in Table 1 below, where Table 1 shows at least one type of filter among multiple types of filters corresponding to each processing parameter.
- the filter corresponding to priority 5 is the alternative (fallback) filter; for example, when no other processing parameter applies, the Catmull-Rom filter corresponding to priority 5 is directly used as the first type of filter.
- the higher the priority ranking of a filter, the higher the match between that filter and the filter used by the image rendering engine.
- the encoder 100 uses a filter with a high matching degree to interpolate the reference frame to obtain a virtual reference frame, and the temporal and spatial correlation between the virtual reference frame and the second image is improved.
- the encoder 100 determines the prediction block based on the virtual reference frame, and the similarity between the prediction block and the image block corresponding to the second image is improved, thereby improving the encoding effect, reducing the amount of data obtained by encoding, and improving the compression performance.
- Table 1 above is only an example provided by the present application and should not be construed as limiting the present application.
- Table 1 may also include more processing parameters; each processing parameter may correspond to one or more filters, and the priorities of those filters may rank before or after the priorities of the filters shown in Table 1 above.
- for example, the processing parameters in Table 1 above may also include a normal map, the filter corresponding to the normal map is a Mitchell-Netravali filter, and the priority of the Mitchell-Netravali filter ranks before the priority of the filter adopted by TAA.
- the encoder 100 searches a mapping table such as Table 1 according to the rendering information to obtain a filter corresponding to each processing parameter in the rendering information.
- the encoder 100 determines a first type of filter from all filters corresponding to the rendering information.
- the mapping table indicates the filter corresponding to each processing parameter in the rendering information, and the corresponding relationship between the processing parameter and the filter is determined based on the filter used in each processing process of the image rendering engine.
- the encoding end 100 queries the aforementioned mapping table according to the acquired processing parameters to determine the first type of filter, and the first type of filter matches the filter used by the image rendering engine, and then the virtual reference frame obtained by the encoding end 100 using the first type of filter and the temporal and spatial correlation of the second image are improved, and the residual value of the second image determined by the encoding end 100 according to the virtual reference frame is reduced, thereby improving the encoding effect.
- the rendering information acquired by the encoding end 100 only includes anti-aliasing parameters.
- the encoder 100 searches Table 1 according to the anti-aliasing parameter to obtain the filter used by the TAA corresponding to the anti-aliasing parameter. Since the rendering information only includes the anti-aliasing parameter, the filter used by the TAA is directly used as the first type of filter.
- the rendering information acquired by the encoding end 100 includes a depth map and anti-aliasing parameters.
- the encoding end 100 queries the above Table 1 according to the depth map and anti-aliasing parameters, and according to the priority of each type of filter in the multiple types of filters indicated in Table 1, the filter priorities corresponding to the depth map and anti-aliasing parameters are 1 and 3 respectively.
- the encoding end 100 selects the filter with the highest priority, that is, the "Catmull-Rom filter/bilinear filter" with the aforementioned priority 1.
- the encoding end 100 determines one from the "Catmull-Rom filter/bilinear filter" as the first type of filter.
- the encoder 100 selects the filter corresponding to the processing parameter with the highest priority from the mapping table as the first type of filter according to the processing parameters included in the rendering information.
- the filter with the highest priority also has the highest matching degree with the filter used by the image rendering engine.
- the encoder 100 interpolates the reference frame using the filter with the highest matching degree to obtain a virtual reference frame.
- the temporal and spatial correlation between the virtual reference frame and the second image is improved, and then the residual value corresponding to the second image obtained by the encoder 100 according to the virtual reference frame is reduced, thereby improving the encoding effect, reducing the amount of data obtained by encoding, and improving the compression performance.
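- the priority-based lookup can be sketched as follows in Python (an illustration only; the entries and priority numbers below are assumptions reconstructed from the surrounding text, since the full contents of Table 1 are not reproduced here):

```python
# Hypothetical mapping table: processing parameter -> (priority, filter);
# a smaller priority number means a higher priority.
FILTER_TABLE = {
    "depth_map":            (1, "Catmull-Rom filter/bilinear filter"),
    "albedo_map":           (2, "Catmull-Rom filter/B-spline filter"),
    "anti_aliasing_params": (3, "filter used by TAA"),
    "motion_blur_params":   (4, "bilinear filter"),
}
FALLBACK = (5, "Catmull-Rom filter")  # alternative filter (priority 5)

def select_first_type_filter(rendering_info):
    """Return the filter whose processing parameter has the highest priority."""
    candidates = [FILTER_TABLE[p] for p in rendering_info if p in FILTER_TABLE]
    if not candidates:
        return FALLBACK[1]
    return min(candidates)[1]  # smallest priority number wins

# e.g. depth map (priority 1) beats anti-aliasing parameters (priority 3)
print(select_first_type_filter(["depth_map", "anti_aliasing_params"]))
```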
- the encoding end 100 obtains the depth map and anti-aliasing parameters included in the rendering information.
- the encoder 100 obtains all filters corresponding to the depth map and the anti-aliasing parameters, and uses all the filters to pre-encode the second image respectively, and each of all the filters obtains a corresponding prediction result.
- the encoder 100 selects a target prediction result that meets the set conditions from all the prediction results, and uses the filter corresponding to the target prediction result as the first type of filter.
- the prediction result is used to indicate: coding information obtained by pre-coding the second image based on a filter, and the coding information includes: at least one of a predicted code rate and a distortion rate of the second image.
- the above-mentioned pre-encoding is the process of performing inter-frame prediction on the second image to obtain a prediction block of the second image, obtaining a residual value or residual block of the second image based on the prediction block, and then performing transform processing.
- the encoding end 100 obtains the corresponding prediction code rate and distortion rate after pre-coding using all the above-mentioned filters.
- the above-mentioned set condition may be the prediction result with the minimum distortion rate among those whose predicted code rate meets a set value, or the prediction result with the minimum predicted code rate among those whose distortion rate is less than a set value.
- the encoding end performs pre-encoding by using all filters corresponding to the processing parameters in the rendering information.
- the encoding end screens the prediction results obtained by pre-encoding according to the set conditions, and uses the filters corresponding to the target prediction results that meet the set conditions as the first type of filters, thereby determining the filters that actually meet the preset conditions among all filters as the first type of filters, and then the virtual reference frame obtained by the encoding end 100 through interpolation according to the first type of filters has the greatest similarity with the second image.
- the bit rate and distortion rate determined by the encoding end 100 using the virtual reference frame to encode the second image achieve the best balance, thereby improving the encoding compression effect.
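- the pre-encoding selection described above can be sketched as follows (an illustration only; `precode` is a hypothetical callable standing in for the encoder's inter-frame prediction, residual computation, and transform, returning a (bitrate, distortion) pair for one candidate filter):

```python
def choose_filter_by_precoding(candidate_filters, precode, max_bitrate):
    """Pre-encode the second image with every candidate filter and keep the
    one meeting the set condition: here, the lowest distortion among results
    whose predicted code rate stays within a bitrate cap."""
    best_filter, best_distortion = None, float("inf")
    for f in candidate_filters:
        bitrate, distortion = precode(f)          # one prediction result per filter
        if bitrate <= max_bitrate and distortion < best_distortion:
            best_filter, best_distortion = f, distortion
    return best_filter
```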
- the first type of filter is preset on the encoding end 100 .
- the video encoding method provided in this embodiment further includes step S530 .
- S530 The encoding end 100 interpolates the first decoded image according to the first type of filter to obtain a virtual reference frame of the second image.
- the first decoded image is a reconstructed image after encoding and then decoding the first image among the multiple images, and the reconstructed image is an image processed by the filter unit 128 in Figure 2.
- the first image is encoded before the second image.
- the first decoded image includes one or more reference blocks corresponding to the first image, such as the first decoded image includes M x N reference blocks, and M and N are both integers greater than or equal to 1.
- the content of the reconstructed image can be obtained by referring to the description shown in Figure 2 above, which is not repeated here.
- the obtained virtual reference frame can be stored in the memory 129 as shown in Figure 2.
- FIG6 is a schematic diagram of the reference block interpolation method provided in the present application, and the method includes the following first to third steps.
- in the first step, the encoding end 100 generates a blank image with the same size as the reference block.
- when the size of the graphic motion vector map is consistent with that of the first decoded image, the encoder 100 directly generates a blank image consistent with the size of the reference block, and splits the graphic motion vector map into M x N sub-maps so that each split sub-map is consistent with the size of the reference block. When M and N are both 1, the encoder 100 does not split the graphic motion vector map.
- the sizes of the first decoded image, the first image, and the second image are the same.
- when the sizes are inconsistent, the encoder 100 interpolates the graphic motion vector map so that the size of the interpolated graphic motion vector map is consistent with that of the first decoded image.
- FIG. 7 below provides a possible implementation method for the encoder 100 to interpolate the motion vector graph. The implementation method will not be described in detail here.
- in the second step, the encoder 100 interpolates the reference block using the split graphic motion vector map according to the first corresponding relationship to determine the pixel values of all pixels in the blank image.
- the first corresponding relationship is used to indicate that the first image and the second image indicated by the graphic motion vector map in the rendering information have at least a partially overlapping area, and in the partially overlapping area, the position mapping relationship between the pixels of the first image and the pixels of the second image.
- since the first decoded image is obtained by encoding and then decoding the first image, the pixels in the first decoded image have a positional correspondence with the pixels in the first image.
- the blank image obtained in the first step has the same size as the reference block, so it can be understood that the pixels in the blank image correspond to the pixel positions of the corresponding image block in the second image. Based on the positional mapping relationship between the pixels of the first image and the pixels of the second image, the pixels on the reference block have a positional correspondence with the pixels on the blank image.
- the encoding end 100 uses the first type of filter to perform interpolation filtering on each pixel on the reference block, determines the pixel value of each pixel position on the corresponding blank image, and then obtains the pixel values of all pixels in the virtual reference block.
- the position mapping relationship between pixel A in the reference block and pixel A’ in the blank image in the first correspondence is used as an example for explanation.
- the encoding end 100 obtains the position mapping relationship between the pixels of the reference block and the pixels on the blank image based on the position mapping relationship between the pixels of the first image and the pixels of the second image.
- pixel A on the reference block and pixel A’ on the blank image have a position mapping relationship.
- the encoding end 100 uses the first type of filter to perform interpolation filtering on pixel A or pixel A and the pixels around pixel A to obtain the corresponding pixel value.
- the encoding end 100 then fills the pixel value into pixel A’ on the blank image to obtain the color at pixel A’.
- the encoding end 100 calculates the pixel value of each pixel on the blank image according to the above-mentioned processing steps of pixel A', thereby obtaining the pixel values of all pixels on the blank image.
- in the third step, the encoding end 100 obtains a virtual reference frame according to the pixel values of all pixels in the blank image.
- the encoding end 100 fills the pixel values of all pixels on the blank image into the corresponding pixels of the blank image to obtain a virtual reference block.
- the encoding end 100 obtains the pixel values of all pixels on the blank image corresponding to each reference block by processing each reference block included in the first decoded image through the first and second steps, thereby obtaining the corresponding virtual reference block.
- the encoding end 100 obtains a virtual reference frame by integrating each virtual reference block.
- the first type of filters used may be different. For example, different reference blocks use different first type filters.
- the encoding end interpolates the reference block in the first decoded image according to the first corresponding relationship to obtain a virtual reference frame of the second image.
- the similarity between the virtual reference frame and the second image is higher than the similarity between the first decoded image and the second image, so that the residual value between the virtual reference frame and the second image is reduced, and then while ensuring the consistency of image quality, the amount of data of the image in the code stream is reduced, thereby improving the encoding effect.
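- the three steps of FIG. 6 can be sketched as follows (an illustration only; `interp` is a hypothetical callable standing in for the first type of filter, returning a filtered value at a possibly fractional position of the reference block):

```python
import numpy as np

def build_virtual_reference_block(ref_block, mv_block, interp):
    """Step 1: create a blank image the size of the reference block.
    Step 2: for each target pixel A', use the motion vector map to find the
    corresponding source position A and fill in its interpolated value.
    Step 3 (done by the caller): assemble the virtual blocks into a frame."""
    h, w = ref_block.shape
    virtual = np.zeros_like(ref_block)            # step 1: blank image
    for y in range(h):
        for x in range(w):
            dy, dx = mv_block[y, x]               # A -> A' position mapping
            virtual[y, x] = interp(ref_block, y + dy, x + dx)  # step 2
    return virtual
```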
- the video encoding method provided in this embodiment further includes step S540 .
- S540 The encoding end 100 encodes the second image based on the virtual reference frame of the second image to obtain a code stream corresponding to the second image.
- the encoder 100 uses the virtual reference frame of the second image to perform inter-frame prediction and determines a prediction frame corresponding to the second image, where the prediction frame includes a plurality of prediction blocks.
- the encoder 100 obtains a corresponding residual value (or residual block) based on the image block in the second image and the corresponding prediction block, and determines a motion vector (Motion Vector, MV).
- the encoding end 100 encodes the residual value and the motion vector to encode the second image and obtain a corresponding code stream.
- the present application provides the following possible examples.
- the encoding end 100 determines, according to the inter-frame predictor 121 in the video encoder 120 shown in FIG. 2 , a reference block having the highest matching degree with the image block of the second image from the memory 129 as the prediction block of the second image.
- the reference block with the highest matching degree includes a virtual reference block.
- the encoding end 100 may use the full search method (Full Search, FS) or the fast search method to search for image blocks, and the encoding end 100 may then use methods such as mean-square error (MSE) and mean absolute deviation (MAD) to calculate the matching degree between the searched reference block and the image block of the second image, and thus obtain the reference block with the highest matching degree.
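- a minimal full-search sketch with MAD as the matching criterion might look as follows (an illustration only; the block size, search range, and array layout are assumptions):

```python
import numpy as np

def full_search(image_block, ref_frame, by, bx, search_range=8):
    """Slide the block over a window of the (virtual) reference frame and
    keep the candidate position with the lowest mean absolute deviation."""
    n = image_block.shape[0]
    h, w = ref_frame.shape
    best_cost, best_mv = float("inf"), (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = by + dy, bx + dx
            if 0 <= y <= h - n and 0 <= x <= w - n:
                cand = ref_frame[y:y + n, x:x + n]
                cost = np.abs(cand.astype(int) - image_block.astype(int)).mean()
                if cost < best_cost:
                    best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost
```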
- the present application provides the following possible examples.
- the encoder 100 determines the relative displacement between the image block and the prediction block according to the position of the image block in the second image and the position of the prediction block in the corresponding reference frame, and the relative displacement is the motion vector of the image block.
- the motion vector is (1, 1), indicating that the position of the image block in the second image is moved by (1, 1) compared to the position of the prediction block in the corresponding reference frame.
- the encoding end 100 may send type information of the first type of filter to the decoding end 200.
- the type information may indicate, for example, a Catmull-Rom filter, a bilinear filter, or a B-spline filter.
- Example 1 The encoder 100 may write the type information of the first type of filter into a bitstream and send the bitstream to the decoder 200.
- Example 2 The encoder 100 may send the type information of the first type of filter as rendering information to the decoder 200. For example, the encoder 100 compresses the rendering information including the type information of the first type of filter and sends it to the decoder 200.
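- as a toy illustration of Example 1 (not the actual bitstream syntax, which this description does not specify), the filter type could be signaled with a small integer code, using one byte per code for simplicity:

```python
# Hypothetical codes for the filter types named above.
FILTER_IDS = {"Catmull-Rom": 0, "bilinear": 1, "B-spline": 2}
FILTER_NAMES = {v: k for k, v in FILTER_IDS.items()}

def write_filter_type(bitstream: bytearray, filter_name: str) -> None:
    bitstream.append(FILTER_IDS[filter_name])   # encoder writes the type info

def read_filter_type(bitstream: bytes, pos: int) -> str:
    return FILTER_NAMES[bitstream[pos]]         # decoder recovers the type info
```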
- the encoder determines a first type of filter from the set multiple types of filters, and the first type of filter matches the filter used in the image rendering process, which is conducive to improving the similarity between the virtual reference frame determined by the encoder using the first type of filter and the second image. Therefore, when the encoder encodes the second image based on the virtual reference frame, the residual value corresponding to the second image is reduced, and the amount of data corresponding to the second image in the code stream is reduced, thereby improving the compression performance and the efficiency of video encoding.
- the encoder 100 determines a first type of filter from "Catmull-Rom filter/bilinear filter" according to the correspondence between object edge information in the depth map and a reference block in the first decoded image corresponding to the depth map.
- first, the encoder 100 applies a Sobel filter to the depth map to obtain a filtering result. Secondly, the encoder 100 performs binarization (e.g., Otsu thresholding) on the filtering result to obtain the edge information of the object in the depth map. Finally, the encoder 100 determines the first type of filter corresponding to each reference block from the "Catmull-Rom filter/bilinear filter" based on the correspondence between each reference block in the first decoded image (or the image block in the depth map corresponding to each reference block) and the edge information of the object.
- for example, when the position of a pixel in a reference block in the first decoded image is at the position indicated by the edge information, the Catmull-Rom filter is used as the first type of filter for the reference block; otherwise, the bilinear filter is used as the first type of filter for the reference block.
- when the rendering information includes an albedo map, a filter is determined from the "Catmull-Rom filter/B-spline filter" as the first type of filter.
- first, the encoding end 100 applies a Sobel filter to the albedo map to obtain a filtering result. Secondly, the encoding end 100 performs binarization (e.g., Otsu thresholding) on the filtering result to obtain the edge information of the object in the albedo map. Finally, the encoding end 100 determines the first type of filter corresponding to each reference block from the "Catmull-Rom filter/B-spline filter" based on the correspondence between each reference block in the first decoded image (or the image block in the albedo map corresponding to each reference block) and the edge information of the object.
- when the position of a pixel in a reference block in the first decoded image is at the position indicated by the edge information, the Catmull-Rom filter is used as the first type of filter for the reference block; when the position of a pixel in a reference block in the first decoded image is not at the position indicated by the edge information, the B-spline filter is used as the first type of filter for the reference block.
- the encoding end uses different first-class filters to interpolate different reference blocks according to the different positional relationships between the reference blocks and the edge information of the object, so that the similarity between the virtual reference blocks interpolated by the encoding end and the corresponding image blocks in the second image is improved.
- the encoding end uses the virtual reference blocks to encode the above image blocks, the residual value corresponding to the image blocks is reduced, thereby improving the compression performance and the efficiency of video encoding.
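- the edge-driven per-block choice can be sketched as follows (an illustration only; it uses SciPy's Sobel operator, a simple mean threshold as a stand-in for the binarization step, and an assumed block size):

```python
import numpy as np
from scipy import ndimage

def pick_block_filters(depth_map, block_size=16):
    """Sobel-filter the depth map, binarize the gradient magnitude, and
    assign the Catmull-Rom filter to blocks containing edge pixels and the
    bilinear filter to the remaining blocks."""
    gy = ndimage.sobel(depth_map, axis=0)
    gx = ndimage.sobel(depth_map, axis=1)
    edges = np.hypot(gy, gx)
    binary = edges > edges.mean()       # stand-in for Otsu-style binarization
    h, w = binary.shape
    choice = {}
    for by in range(0, h, block_size):
        for bx in range(0, w, block_size):
            has_edge = binary[by:by + block_size, bx:bx + block_size].any()
            choice[(by, bx)] = "Catmull-Rom" if has_edge else "bilinear"
    return choice
```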
- the encoder 100 interpolates the graphics motion vector map.
- a possible implementation manner is given below.
- first, the encoding end 100 generates a blank image of the same size as the source image (the source image is consistent in size with the first decoded image). Secondly, based on the size ratio between the first graphic motion vector map and the blank image, the encoding end 100 determines a second correspondence between a pixel in the first graphic motion vector map and pixel positions in the blank image. Finally, the encoding end 100 interpolates the graphic motion vector map using an interpolation filter to determine the pixel value and offset of each pixel in the blank image. When the size of the interpolated graphic motion vector map is consistent with that of the first decoded image, the first correspondence indicated by the interpolated graphic motion vector map is obtained. For the description of the first correspondence, refer to the second step shown in FIG. 6 above.
- Figure 7 is a schematic diagram of the graphic motion vector map interpolation method provided by the present application; it takes as an example the case where the first graphic motion vector map is the original graphic motion vector map output by the image rendering engine, and the size of the first graphic motion vector map is smaller than the size of the first decoded image.
- for example, the size of the first graphic motion vector map is 4x4, while the size of the first decoded image is 8x8.
- the encoder 100 generates a blank image of the same size as the first decoded image, such as an 8x8 blank image.
- the encoder 100 determines that the size ratio of the first graphic motion vector map to the blank image is 1:4, that is, one pixel in the first graphic motion vector map corresponds to four pixels in the blank image.
- for example, pixel C(1, 1) in the first graphic motion vector map corresponds to the four pixels C(1, 1), C1(1, 2), C2(2, 1), and C3(2, 2) in the blank image.
- based on the second correspondence between the position of pixel C in the first graphic motion vector map and pixels C, C1, C2, and C3 in the blank image, the encoder 100 interpolates the first graphic motion vector map using an interpolation filter to obtain the pixel values and offsets of pixels C, C1, C2, and C3 in the second graphic motion vector map. As mentioned above, the offsets of the pixels in the blank image are all (2, 2). The encoding end 100 can obtain the first corresponding relationship according to the offsets.
- for the interpolation filter, this application provides several possible examples, such as bilinear filtering, nearest-neighbor filtering, and bicubic filtering.
- the encoder interpolates the graphic motion vector to obtain an interpolated graphic motion vector.
- the size of the interpolated graphic motion vector is consistent with the size of the first decoded image, and the interpolated graphic motion vector matches the pixels of the first decoded image.
- the encoder interpolates the first decoded image based on the interpolated graphic motion vector to obtain a virtual reference frame, and the matching degree between the virtual reference frame and the second image is improved.
- the encoder encodes the second image according to the virtual reference frame, the residual value corresponding to the image is reduced, thereby improving the compression performance.
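- the motion-vector-map upsampling of FIG. 7 can be sketched with bilinear weights as follows (an illustration only; storing the per-pixel vectors as (dy, dx) floats and rescaling them to the target grid are both assumptions):

```python
import numpy as np

def upsample_mv_map(mv_map, target_h, target_w):
    """Interpolate a low-resolution graphic motion vector map (e.g. 4x4) up
    to the decoded-image size (e.g. 8x8) using bilinear weights."""
    h, w = mv_map.shape[:2]
    sy, sx = target_h / h, target_w / w
    out = np.zeros((target_h, target_w, 2), dtype=np.float32)
    for y in range(target_h):
        for x in range(target_w):
            fy, fx = min(y / sy, h - 1), min(x / sx, w - 1)
            y0, x0 = int(fy), int(fx)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            wy, wx = fy - y0, fx - x0
            v = (mv_map[y0, x0] * (1 - wy) * (1 - wx) +
                 mv_map[y0, x1] * (1 - wy) * wx +
                 mv_map[y1, x0] * wy * (1 - wx) +
                 mv_map[y1, x1] * wy * wx)
            out[y, x] = v * np.array([sy, sx])   # rescale offsets to the new grid
    return out
```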
- the encoding end 100 will also determine the filter kernel size used by the filter.
- a possible implementation method is given below.
- the encoding end 100 determines target discrete parameters of pixels in the first decoded image based on the graphic motion vector map indicated in the rendering information.
- the encoding end 100 takes out the image block in the overlapping area of the first decoded image based on the fact that the first image and the second image indicated by the interpolated graphics motion vector diagram have at least a partial overlapping area, and calculates the discrete parameters of the pixels in the image block in each channel as the target discrete parameters.
- the encoder 100 calculates the discrete parameters of the pixels in each channel in the current reference block in the first decoded image as the target discrete parameters.
- the current reference block may indicate a reference block for which interpolation is being performed.
- the pixels in each channel refer to the three color channels of luma (Y), blue-difference chroma (Cb), and red-difference chroma (Cr); the target discrete parameters include: target variance and target covariance.
- the first type of filter in the encoding end 100 interpolates the reference block using the first filter kernel, and calculates the first discrete parameter of the pixels in each channel of the interpolated virtual reference block.
- the first filter kernel is one of a plurality of set filter kernels, and the plurality of filter kernels have different sizes.
- for example, the first type of filter uses a 3x3 filter kernel to perform interpolation filtering on the reference block to obtain the pixel value of each pixel on the virtual reference block.
- the encoding end 100 calculates the first variance or first covariance of the pixels in each channel of the virtual reference block according to the pixel values of each pixel on the virtual reference block, thereby obtaining the first discrete parameter of the virtual reference block.
- the weights in the above-mentioned filter kernel can be obtained based on the graphic motion vector map.
- the graphic motion vector map indicates: the position mapping relationship between the pixels in the first image and the pixels in the second image, and this mapping relationship is not necessarily a mapping relationship of integer pixels.
- for example, when the second image is rendered, the pixel at a first position in the first image is offset to a second position in the second image.
- the second position may be at an integer pixel position or between the positions of multiple pixels in the second image; if the fractional distance corresponding to the second position is 0.5, the second position is located midway between the positions of two pixels in the second image.
- the encoding end 100 determines the weights in the filter kernel based on the above-mentioned pixel position mapping relationship.
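- for instance, deriving 2x2 bilinear weights from the fractional part of the mapped position might look as follows (an illustration only; a fractional offset of 0.5 splits the weight evenly between neighbouring pixels, matching the half-pixel case above):

```python
def bilinear_kernel_weights(frac_y, frac_x):
    """Weights for the four nearest pixels given a fractional offset."""
    return [
        [(1 - frac_y) * (1 - frac_x), (1 - frac_y) * frac_x],
        [frac_y * (1 - frac_x),       frac_y * frac_x],
    ]

print(bilinear_kernel_weights(0.5, 0.5))  # [[0.25, 0.25], [0.25, 0.25]]
```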
- the encoding end 100 determines a first discrete parameter having the smallest difference with the target discrete parameter from at least one first discrete parameter corresponding to all filter kernels, and uses the filter kernel corresponding to the first discrete parameter as a parameter of the first type of filter.
- the encoder 100 determines a first discrete parameter with the smallest difference from the target discrete parameter, and uses a first filter kernel corresponding to the first discrete parameter as a parameter of the first type of filter.
- the encoding end 100 interpolates the first decoded image using the first type of filter and its parameter to obtain a virtual reference frame.
- the virtual reference frame has a higher matching degree with the second image.
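- the kernel-size search can be sketched as follows (an illustration only; `interpolate` is a hypothetical callable applying the first type of filter with a k x k kernel, and variance stands in for the first discrete parameter):

```python
import numpy as np

def pick_kernel_size(target_variance, ref_block, interpolate, sizes=(3, 5, 7)):
    """Interpolate the reference block with each candidate kernel and keep
    the kernel whose output variance differs least from the target."""
    best_k, best_diff = None, float("inf")
    for k in sizes:
        virtual = interpolate(ref_block, k)
        diff = abs(np.var(virtual) - target_variance)  # first discrete parameter
        if diff < best_diff:
            best_k, best_diff = k, diff
    return best_k
```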
- FIG 8 is a flow chart of a video decoding method provided by the present application.
- the video decoding method can be applied to the video encoding and decoding system shown in Figure 1 or the video decoder shown in Figure 3.
- the video decoding method can be executed by the decoding end 200 or the video decoder 220.
- the decoding end 200 executing the video decoding method provided in this embodiment is taken as an example for explanation.
- the video decoding method provided in this embodiment includes the following steps S810 to S840.
- S810 The decoding end 200 obtains a bitstream and rendering information corresponding to the bitstream.
- the rendering information is used to indicate the processing parameters used in the process of generating the bitstream from multiple source images.
- the graphic motion vector map in the processing parameters indicates that the source image corresponding to the first image frame and the source image corresponding to the second image frame among the multiple image frames have at least a partially overlapping area.
- the rendering information may be sent from the encoding end 100 to the decoding end 200 .
- the rendering information acquired by the decoding end 200 includes: one or more combinations of processing parameters in the mapping table shown in Table 1 above.
- S820 The decoding end 200 determines a first type of filter based on the rendering information.
- the first type of filter is one of the multiple types of filters that are set.
- the decoding end 200 queries Table 1 above to obtain the first type of filter corresponding to the processing parameter according to the processing parameter indicated by the rendering information.
- Table 1 shows at least one type of filter among the multiple types of filters corresponding to the processing parameter, and each type of filter has a corresponding priority.
- the following describes how the decoding end 200 determines the first type of filter according to the rendering information.
- the decoding end 200 may query a preset mapping table according to the rendering information to obtain a filter corresponding to each processing parameter in the rendering information.
- the decoding end 200 determines a first type of filter from all filters corresponding to the rendering information.
- for the relevant contents of the mapping table, reference may be made to S520 in FIG. 5 and the description of determining the first type of filter shown in Table 1, which will not be repeated here.
- the decoding end 200 obtains filter information and determines the first type of filter and the filter parameters corresponding to the first type of filter according to the filter type and parameters indicated in the filter information.
- the filtering parameters are used to indicate processing parameters used by the decoding end 200 in the process of obtaining a virtual reference frame based on the first type of filter, such as the size of the filter kernel and the corresponding weight of the filter.
- the above rendering information or code stream may include filtering information.
- the first type of filter is preset on the decoding end 200 .
- S830 The decoding end 200 interpolates the first decoded image corresponding to the first image frame according to the first type of filter to obtain a virtual reference frame of the second image frame.
- for example, first, the decoding end 200 generates a blank image with the same size as the reference block; secondly, the decoding end 200 interpolates the reference block using the split graphic motion vector map according to the first corresponding relationship to determine the pixel values of all pixels in the blank image; finally, the decoding end 200 obtains a virtual reference frame according to the pixel values of all pixels in the blank image.
- the first corresponding relationship is determined based on the graphic motion vector map in the rendering information.
- the reference block belongs to the first decoded image, and the reference block is any one of the multiple reference blocks of the first decoded image corresponding to the first image frame.
- the decoding end interpolates the reference block in the first decoded image according to the first corresponding relationship to obtain a virtual reference frame of the second image frame.
- the similarity between the virtual reference frame and the second image frame is higher than the similarity between the first decoded image and the second image frame.
- S840 The decoding end 200 decodes the second image frame based on the virtual reference frame to obtain a second decoded image corresponding to the second image frame.
- the decoding end 200 performs inverse quantization, inverse transformation, etc. on the second image frame to obtain a residual block and motion vector corresponding to the second image frame.
- the decoding end 200 uses the residual block, motion vector, etc. data and the virtual reference frame determined in S830 to perform image reconstruction to obtain a second decoded image corresponding to the second image frame.
- the decoding end 200 uses the video decoder 220 as shown in FIG3 to obtain a residual block between the second image frame and the corresponding virtual reference frame after the second image frame in the code stream is processed by the entropy decoder 221, the inverse quantizer 222 and the inverse transformer 223.
- the video decoder 220 uses the residual block, the motion vector and the virtual reference frame to reconstruct the image block.
- the video decoder 220 processes the reconstructed multiple image blocks through the filter unit 224 to obtain a second decoded image corresponding to the second image frame.
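- the reconstruction in S840 can be sketched as follows (an illustration only; the block position, square blocks, and float arrays are assumptions):

```python
def reconstruct_block(virtual_ref, residual_block, mv, by, bx):
    """Fetch the prediction block from the virtual reference frame at the
    position indicated by the motion vector and add the decoded residual."""
    n = residual_block.shape[0]
    dy, dx = mv
    prediction = virtual_ref[by + dy:by + dy + n, bx + dx:bx + dx + n]
    return prediction + residual_block
```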
- the decoding end 200 selects a first type of filter from the set multiple types of filters according to the processing parameters indicated by the rendering information, and uses the first type of filter to obtain a virtual reference frame of the second image frame, thereby avoiding the problem of poor decoding effect caused by using a fixed filter to process the first decoded image to obtain a reference frame.
- the matching degree between the virtual reference frame and the source image corresponding to the second image frame is improved.
- when the size of the graphic motion vector map is inconsistent with that of the first decoded image, the decoding end 200 interpolates the graphic motion vector map. For example: first, the decoding end 200 generates a blank image with the same size as the first decoded image. Secondly, based on the size ratio between the first graphic motion vector map and the blank image, the decoding end 200 determines a second correspondence between a pixel in the first graphic motion vector map and pixel positions in the blank image. Finally, the decoding end 200 interpolates the graphic motion vector map using an interpolation filter to determine the pixel value and offset of each pixel in the blank image.
- the first decoded image has the same size as the source image.
- the decoding end interpolates the graphic motion vector to obtain an interpolated graphic motion vector.
- the size of the interpolated graphic motion vector is consistent with the size of the first decoded image, and the interpolated graphic motion vector matches the first decoded image.
- the decoding end interpolates the first decoded image based on the interpolated graphic motion vector to obtain a virtual reference frame, and the matching degree of the virtual reference frame with the second image frame is improved.
- the decoding end 200 decodes the second image frame according to the virtual reference frame, the amount of data processed is reduced, thereby improving the decoding efficiency.
- the decoding end 200 will also determine the filter kernel size used by the filter.
- a possible implementation method is given below.
- the decoding end 200 determines target discrete parameters of pixels in the first decoded image based on the graphic motion vector map indicated in the rendering information.
- the first type of filter in the decoding end 200 interpolates the reference block using the first filter kernel, and calculates the first discrete parameter of the pixels in each channel of the interpolated virtual reference block.
- the decoding end 200 determines a first discrete parameter having the smallest difference with the target discrete parameter from at least one first discrete parameter corresponding to all filter kernels, and the decoding end 200 uses the filter kernel corresponding to the first discrete parameter as a parameter of the first type of filter.
- the decoding end 200 determines the first discrete parameter with the smallest difference from the target discrete parameter, and the decoding end 200 uses the first filter kernel corresponding to the first discrete parameter as the parameter of the first type of filter.
- the decoding end 200 interpolates the first decoded image using the first type of filter and the parameters of the first type of filter to obtain a virtual reference frame.
- the matching degree between the virtual reference frame and the second image frame is improved, and when the decoding end 200 decodes the second image frame according to the virtual reference frame, the amount of data processed is reduced, thereby improving the decoding efficiency.
- FIG. 9 is a structural schematic diagram of a video encoding device provided by the present application.
- the video encoding device 900 can be used to implement the functions of the encoding end in the above method embodiment, and thus can also achieve the beneficial effects of the above method embodiment.
- a video encoding device 900 includes a first acquisition module 910, a first interpolation module 920, and an encoding module 930; the video encoding device 900 is used to implement the functions of the encoding end in the method embodiments corresponding to the above-mentioned FIG4 to FIG7 .
- the specific process of the video encoding device 900 for implementing the above-mentioned video encoding method includes the following process:
- the first acquisition module 910 is used to acquire a source video and rendering information corresponding to the source video, wherein the source video includes multiple source images, and the rendering information is used to indicate: a processing parameter used in a process of generating a bitstream according to the multiple source images, and the processing parameter indicates that a first image and a second image in the multiple source images have at least a partially overlapping area.
- the first interpolation module 920 is used to determine a first type of filter according to the rendering information. And interpolate the first decoded image according to the first type of filter to obtain a virtual reference frame of the second image.
- the first type of filter is one of the multiple types of filters set, and the first decoded image is a reconstructed image obtained by encoding and then decoding the first image among the multiple images, and the first image is encoded before the second image.
- the encoding module 930 is configured to encode the second image based on the virtual reference frame of the second image.
- the present application further provides a video encoding device, and the video encoding device 900 further includes a first filter kernel determination module 940 .
- the first filter kernel determination module 940 is used to determine the target discrete parameters of pixels in the first decoded image based on rendering information; the target discrete parameters include: target variance or target covariance; and the first filter kernel corresponding to the first discrete parameter with the smallest difference with the target discrete parameter is used as the parameter of the first type of filter, and the first discrete parameter is used to indicate: the first type of filter interpolates the pixels based on the first filter kernel, and calculates the discrete parameters based on the interpolated pixels, the first filter kernel is one of the set multiple filter kernels, and the first discrete parameters include the first variance or the first covariance.
- the encoding end of the aforementioned embodiments may correspond to the video encoding device 900 and to the corresponding execution bodies of the methods in FIG. 4 to FIG. 7 of the embodiments of the present application; the operations and/or functions of each module in the video encoding device 900 are respectively intended to implement the corresponding processes of the methods in the corresponding embodiments of FIG. 4 to FIG. 7. For the sake of brevity, they will not be repeated here.
- FIG. 10 is a schematic diagram of the structure of a video decoding device provided by the present application.
- the video decoding device 1000 can be used to implement the functions of the decoding end in the above method embodiment, and thus can also achieve the beneficial effects of the above method embodiment.
- the video decoding device 1000 includes a second acquisition module 1010, a second interpolation module 1020, and a decoding module 1030; the video decoding device 1000 is used to implement the functions of the decoding end in the method embodiments corresponding to the above-mentioned FIG3 and FIG8 .
- the specific process of the video decoding device 1000 for implementing the above-mentioned video decoding method includes the following process:
- the second acquisition module 1010 is used to acquire a code stream and rendering information corresponding to the code stream.
- the code stream includes multiple image frames
- the rendering information is used to indicate: processing parameters used in the process of generating the code stream according to multiple source images, and the processing parameters indicate that the source image corresponding to the first image frame and the source image corresponding to the second image frame in the multiple image frames have at least a partial overlapping area.
- the second interpolation module 1020 is used to determine a first type of filter based on the rendering information, and interpolate the first decoded image corresponding to the first image frame according to the first type of filter to obtain a virtual reference frame of the second image frame.
- the first type of filter is one of the set multiple types of filters.
- the decoding module 1030 is configured to decode the second image frame according to the virtual reference frame to obtain a second decoded image corresponding to the second image frame.
- the present application further provides a video decoding device, the video decoding device 1000 further comprising a filter information acquisition module 1040 and a second filter kernel determination module 1050 .
- the filter information acquisition module 1040 is used to obtain filter information, and determine the first type of filter and the filtering parameters corresponding to the first type of filter according to the filter type and parameters indicated in the filter information.
- the filtering parameters are used to indicate: the processing parameters used in the process of obtaining a virtual reference frame based on the first type of filter.
- the second filter kernel determination module 1050 is used to determine the target discrete parameters of the pixels in the first decoded image based on the rendering information; the target discrete parameters include: the target variance or the target covariance; and the first filter kernel corresponding to the first discrete parameter with the smallest difference with the target discrete parameter is used as the parameter of the first type of filter, the first discrete parameter is used to indicate: the first type of filter interpolates the pixels based on the first filter kernel, and calculates the discrete parameters based on the interpolated pixels, the first filter kernel is one of the set multiple filter kernels, and the first discrete parameters include the first variance or the first covariance.
- the decoding end according to the embodiments of the present application may correspond to the video decoding device 1000 and to the corresponding execution bodies of the methods in FIG. 3 and FIG. 8 of the embodiments of the present application; the operations and/or functions of each module in the video decoding device 1000 are respectively intended to implement the corresponding processes of the methods in the corresponding embodiments of FIG. 3 and FIG. 8. For the sake of brevity, they will not be repeated here.
- FIG 11 is a schematic diagram of the structure of a computer device provided in the present application.
- the computer device can be applied to the encoding and decoding system shown in Figure 1, and the computer device can be any one of the encoding end 100 or the decoding end 200.
- the computer device 1100 may be a mobile phone, a tablet computer, a television (also known as a smart TV, a smart screen, or a large screen), a laptop, an ultra-mobile personal computer (UMPC), a handheld computer, a netbook, a personal digital assistant (PDA), a wearable electronic device (e.g., a smart watch, a smart bracelet, or smart glasses), an in-vehicle device, a virtual reality device, a server, or another computing device with computing capabilities.
- the processing device 1100 may include a processor 1110 , a memory 1120 , a communication interface 1130 , a bus 1140 , etc.
- the processor 1110 , the memory 1120 , and the communication interface 1130 are connected via the bus 1140 .
- the structure illustrated in this embodiment of the present application does not constitute a specific limitation on the processing device.
- the processing device may include more or fewer components than shown in the figure, or combine certain components, or split certain components, or arrange the components differently.
- the components shown in the figure may be implemented in hardware, software, or a combination of software and hardware.
- the processor 1110 may include one or more processing units, for example, the processor 1110 may include an application processor (application processor, AP), a modem processor, a central processing unit (Central Processing Unit, CPU), a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural-network processing unit (neural-network processing unit, NPU), etc.
- different processing units may be independent devices or integrated in one or more processors.
- the processor 1110 may also be provided with an internal memory for storing instructions and data.
- the internal memory in the processor 1110 is a cache memory.
- the internal memory may store instructions or data that the processor 1110 has just used or cyclically used. If the processor 1110 needs to use the instruction or data again, it may be directly called from the internal memory. This avoids repeated access, reduces the waiting time of the processor 1110, and thus improves the efficiency of the system.
- the memory 1120 can be used to store computer executable program codes, which include instructions.
- the processor 1110 executes various functional applications and data processing of the processing device 1100 by running the instructions stored in the internal memory 1120.
- the internal memory 1120 may include a program storage area and a data storage area.
- the program storage area may store an operating system, an application required for at least one function (such as an encoding function, a sending function, etc.), etc.
- the data storage area may store data created during the use of the processing device 1100 (such as a bit stream, a reference frame, etc.), etc.
- the internal memory 1120 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one disk storage device, a flash memory device, a universal flash storage (UFS), etc.
- the communication interface 1130 is used to implement communication between the processing device 1100 and an external device or component. In this embodiment, the communication interface 1130 is used to perform data exchange with other processing devices.
- the bus 1140 may include a path for transmitting information between the above components (such as the processor 1110, the memory 1120, and the communication interface 1130).
- the bus 1140 may also include a power bus, a control bus, and a status signal bus.
- various buses are labeled as bus 1140 in the figure.
- the bus 1140 may be a peripheral component interconnect express (PCIe) high-speed bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX), etc.
- FIG11 only takes the processing device 1100 including one processor 1110 and one memory 1120 as an example.
- the processor 1110 and the memory 1120 are respectively used to indicate a type of device or equipment.
- the number of each type of device or equipment can be determined according to business requirements.
- the hardware can be implemented by a processor or a chip.
- the chip includes a processor for implementing the functions of the encoding end and/or the decoding end in the above method.
- the chip also includes a power supply circuit for powering the processor.
- the above chip may be constituted by a chip alone, or may include a chip and other discrete devices.
- the computer program product includes one or more computer programs or instructions.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, a user device or other programmable device.
- the computer program or instruction may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer program or instruction may be transmitted from one website site, computer, server or data center to another website site, computer, server or data center by wired or wireless means.
- the computer-readable storage medium may be any available medium that a computer can access or a data storage device such as a server, data center, etc. that integrates one or more available media.
- the available medium may be a magnetic medium, for example, a floppy disk, a hard disk, a tape; it may also be an optical medium, for example, a digital video disc (DVD); it may also be a semiconductor medium, for example, a solid state drive (SSD).
Abstract
Disclosed are a video encoding and decoding method and apparatus, relating to the field of computers. Based on the processing parameters indicated by rendering information, the encoding end (or decoding end) selects a first type of filter from multiple set types of filters and uses the first type of filter to obtain a virtual reference frame for the data to be processed. The encoding end thus avoids the problem of low compression performance caused by processing a reconstructed image with a fixed filter to obtain a reference frame: when the encoding end encodes the image to be encoded according to the virtual reference frame, the residual value corresponding to the image is reduced, improving compression performance and video encoding efficiency. The decoding end likewise avoids the problem of poor decoding efficiency caused by processing a decoded image with a fixed filter to obtain a reference frame: the matching degree between the virtual reference frame and the source image corresponding to the image to be decoded is improved, and when the decoding end decodes the second image frame according to the virtual reference frame, the amount of data processed is reduced, improving video decoding efficiency.
Description
This application claims priority to Chinese patent application No. 202211478458.9, filed with the China National Intellectual Property Administration on November 23, 2022 and entitled "Video encoding and decoding method and apparatus", which is incorporated herein by reference in its entirety.

This application relates to the field of computers, and in particular to video encoding and decoding methods and apparatuses.

Among video encoding and decoding technologies, video compression technology is particularly important. A video compression system performs spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove the redundant information inherent in a video sequence. In the video encoding process for video compression, the encoding end arbitrarily selects one or more reference frames from the already-encoded frames of the video, obtains the prediction block corresponding to the current image block from a reference frame, then computes the residual value between the prediction block and the current image block, and quantizes and encodes that residual value. However, the prediction block is selected by the encoding end from the reference frame according to motion compensation and motion estimation, the reference frame is an image processed by an in-loop filter, and the reference frame is the image corresponding to a neighboring frame of the current image block; as a result, the similarity between the reference frame and the current image block is low and the difference between the aforementioned prediction block and the current image block is large, so the residual value between the prediction block and the current image block is large, the amount of data in the encoded video bitstream is large, the performance of the video compression process is low, and the efficiency of video encoding is affected. Therefore, how to provide a more effective video encoding and decoding method has become a problem that urgently needs to be solved.
Summary of the Invention

This application provides video encoding and decoding methods and apparatuses, which solve the problems that the code stream obtained by compressing a video has a large amount of data and the video compression rate is low.
In a first aspect, this application provides a video encoding method. The video encoding method may be applied to an encoding and decoding system, or to an encoding end that supports the encoding and decoding system in implementing the video encoding method; for example, the encoding end includes a video encoder. Taking the encoding end executing the video encoding method provided in this embodiment as an example, the video encoding method includes: first, the encoding end obtains a source video and rendering information corresponding to the source video; second, the encoding end determines a first type of filter from multiple set types of filters according to the rendering information; third, the encoding end interpolates a first decoded image according to the first type of filter to obtain a virtual reference frame of a second image; fourth, the encoding end encodes the second image based on the virtual reference frame of the second image to obtain a code stream corresponding to the second image.

The source video includes multiple source images. The rendering information is used to indicate the processing parameters used by the encoding end in the process of generating the code stream from the multiple source images, and the processing parameters indicate that a first image and the second image among the multiple source images have at least a partially overlapping area. The first decoded image is a reconstructed image obtained by encoding and then decoding the first image among the multiple images.

Compared with processing a reconstructed image with a fixed filter to obtain a reference frame, where the low similarity between that reference frame and the second image leads to low compression performance, in this embodiment the encoding end determines a first type of filter from the multiple set types of filters, and the first type of filter matches the filter used in the image rendering process, which helps improve the similarity between the virtual reference frame determined by the encoding end using the first type of filter and the second image. Thus, when the encoding end encodes the second image based on the virtual reference frame, the residual value corresponding to the second image is reduced and the amount of data corresponding to the second image in the code stream is reduced, improving the compression performance and the efficiency of video encoding.
In a possible implementation, the encoding end determining the first type of filter from the rendering information includes: the encoding end queries a set mapping table according to the rendering information to obtain the filter corresponding to each processing parameter in the rendering information; and the encoding end determines the first type of filter from all the filters corresponding to the rendering information.

The mapping table is used to indicate at least one type of filter among the multiple types of filters corresponding to each processing parameter.

The mapping table indicates the filter corresponding to each processing parameter in the rendering information, and the correspondence between processing parameters and filters is determined based on the filters used in the individual processing stages of the image rendering engine. The encoding end queries the aforementioned mapping table according to the obtained processing parameters to determine the first type of filter; the first type of filter matches the filter used by the image rendering engine, the temporal and spatial correlation between the virtual reference frame obtained by the encoding end using the first type of filter and the second image is improved, the residual value of the second image determined by the encoding end according to the virtual reference frame is reduced, and the encoding effect is thereby improved.

In a possible implementation, the mapping table is further used to indicate the priority of each of the multiple types of filters; the encoding end determining the first type of filter from the filters corresponding to the rendering information includes: the encoding end obtains the priorities of all the filters corresponding to the rendering information; and the encoding end takes the filter with the highest priority among all the filters as the first type of filter.

According to the processing parameters included in the rendering information, the encoding end selects from the mapping table the filter corresponding to the processing parameter with the highest priority as the first type of filter; the filter with the highest priority also has the highest matching degree with the filter used by the image rendering engine. The encoding end interpolates the reference frame using the filter with the highest matching degree to obtain a virtual reference frame; the temporal and spatial correlation between the virtual reference frame and the second image is improved, the residual value corresponding to the second image obtained by the encoding end according to the virtual reference frame is reduced, the encoding effect is improved, the amount of encoded data is reduced, and the compression performance is improved.
In a possible implementation, the encoding end determining the first type of filter from all the filters corresponding to the rendering information includes: the encoding end performs prediction with all the filters corresponding to the rendering information to obtain at least one prediction result; and the encoding end selects, from the at least one prediction result, a target prediction result whose coding information meets a set condition, and takes the filter corresponding to the target prediction result as the first type of filter.

One prediction result corresponds to one of all the filters, and one prediction result is used to indicate the coding information obtained by the encoding end pre-encoding the second image based on that filter; the coding information includes at least one of a predicted code rate and a distortion rate of the second image.

For example, the aforementioned pre-encoding is the process of performing inter-frame prediction on the second image to obtain a prediction block of the second image, obtaining the residual value of the corresponding image block in the second image according to the prediction block, and then performing transform processing.

The encoding end performs pre-encoding using all the filters corresponding to the processing parameters in the rendering information, screens the prediction results obtained by pre-encoding according to the set condition, and takes the filter corresponding to the target prediction result meeting the set condition as the first type of filter, thereby determining, among all the filters, the filter that actually meets the preset condition as the first type of filter; the similarity between the virtual reference frame obtained by the encoding end through interpolation according to the first type of filter and the second image is then the greatest. In the process of the encoding end encoding the second image, since the similarity between the virtual reference frame and the second image is the greatest, the residual value of the second image determined by the encoding end encoding the second image using the virtual reference frame is the smallest, which improves the encoding compression effect.
In a possible example, the rendering information includes one or a combination of more of a depth map, an albedo map, and post-processing parameters.

For the above examples in which the rendering information includes processing parameters, this application gives the first type of filter corresponding to each processing parameter, or the steps for determining it. If the rendering information includes a depth map, the first type of filter is determined by the relationship between pixels in the depth map and the edge information of objects in the depth map; for example, the depth map may correspond to a Catmull-Rom filter or a bilinear filter; or,

if the rendering information includes an albedo map, the first type of filter is determined by the relationship between pixels in the albedo map and the edge information of objects in the albedo map; for example, the albedo map may correspond to a Catmull-Rom filter or a B-spline filter; or,

if the post-processing parameters included in the rendering information contain anti-aliasing parameters, the first type of filter is the filter indicated by the anti-aliasing parameters; or,

if the post-processing parameters included in the rendering information contain an instruction to execute the motion blur module, the first type of filter is a bilinear filter;

or, the first type of filter is a Catmull-Rom filter.

For example, the edge information of objects in the above depth map can be obtained as follows: the encoding end applies Sobel filtering to the depth map to obtain a filtering result, and the encoding end then performs Otsu binarization on the filtering result to obtain the edge information of the objects in the depth map.

The encoding end determines the first type of filter corresponding to the rendering information according to the above correspondences between the processing parameters and the filters; the first type of filter has a certain matching degree with the filter used by the image rendering engine. The encoding end interpolates the reference frame using the first type of filter to obtain a virtual reference frame; the matching degree between the virtual reference frame and the second image is improved, and when the encoding end encodes the second image according to the virtual reference frame, the resulting residual value corresponding to the second image is reduced, improving the encoding effect.
在一种可能的实现方式中,编码端根据由渲染信息确定的第一类滤波器,对第一解码图像进行插值,获取第二图像的虚拟参考帧包括:首先,编码端生成与参考块尺寸一致的空白图像。其次,编码端根据第一对应关系,利用拆分后的图形运动矢量图对上述参考块进行插值,确定空白图像中所有像素的像素值。最后,编码端根据空白图像中所有像素的像素值,得到虚拟参考帧。
其中,第一解码图像包括第一图像对应的一个或多个参考块。第一对应关系用于指示:在第一图像和第二图像的重叠区域中,第一图像中的像素与第二图像的像素的位置映射关系。
编码端根据第一对应关系,对第一解码图像中的参考块进行插值,得到第二图像的虚拟参考帧,该虚拟参考帧与第二图像的相似度高于第一解码图像与第二图像的相似度,从而该虚拟参考帧与第二图像间的残差值降低,进而在确保图像质量一致的情况,码流中该图像的数据量减少,提高了编码效果。
在一种可能的实现方式中,第一对应关系通过以下方法得到:首先,编码端生成与源图像尺寸一致的空白图像。其次,编码端基于第一图形运动矢量图与空白图像间的尺寸比例,确定第一图形运动矢量图中一个像素与空白图像中像素位置的第二对应关系。最后,编码端利用插值滤波器对图形运动矢量图进行插值,确定空白图像中每个像素的像素值和偏移。
在图形运动矢量图与源图像尺寸不一致的情况下,编码端对图形运动矢量图进行插值,得到插值后的图形运动矢量图。该插值后的图形运动矢量图的尺寸与第一解码图像的尺寸一致,且插值后的图形运动矢量图的像素与第一解码图像的像素匹配,编码端基于该插值后的图形运动矢量图对第一解码图像进行插值,得到虚拟参考帧,该虚拟参考帧与第二图像的匹配度提高。编码端根据该虚拟参考帧对第二图像进行编码时,该图像对应的残差值降低,提升了压缩性能。
在一种可能的实现方式中,该视频编码方法包括:第一,编码端基于渲染信息中指示的图形运动矢量图,确定第一解码图像中像素的目标离散参数。第二,编码端中第一类滤波器利用第一滤波核对参考块进行插值,并计算插值得到的虚拟参考块中像素在每个通道的第一离散参数。第三,编码端从所有滤波核对应的至少一个第一离散参数中,确定与目标离散参数差值最小的第一离散参数,将该第一离散参数对应的滤波核作为第一类滤波器的参数。
其中,目标离散参数包括:目标方差或目标协方差。第一离散参数包括第一方差或第一协方差。第一滤波核为设定的多个滤波核中的一个。
编码端确定与目标离散参数差值最小的第一离散参数,编码端将该第一离散参数对应的第一滤波核,作为第一类滤波器的参数。编码端利用前述的第一类滤波器和第一类滤波器的参数,对第一解码图像进行插值,得到虚拟参考帧。该虚拟参考帧与第二图像的匹配度提高,在编码端根据虚拟参考帧来对第二图像进行编码时,该第二图像对应的残差值降低,使得码流中该图像的数据量减少,提升了压缩性能。
在一种可能的实现方式中,该视频编码方法还包括:编码端将第一类滤波器的类型信息写入码流。
编码端通过将第一类滤波器的类型信息发送至解码端,避免了解码端执行根据渲染信息确定第一类滤波器的处理过程,从而加快了视频解码效率。
第二方面,本申请提供了一种视频解码方法,该视频解码方法可应用于编解码系统或应用于支持该编解码系统实现该视频解码方法的解码端,例如该解码端包括视频解码器。这里以解码端执行本实施例提供的视频解码方法为例进行说明,该视频解码方法包括:第一,解码端获取码流和码流对应的渲染信息,该码流包括多个图像帧。第二,解码端基于渲染信息,从设定的多类滤波器中确定第一类滤波器。第三,解码端根据第一类滤波器,对第一图像帧对应的第一解码图像进行插值,得到第二图像帧的虚拟参考帧。第四,解码端基于虚拟参考帧对第二图像帧进行解码,获取第二图像帧的第二解码图像。
其中,该渲染信息用于指示:根据多个源图像生成码流的过程中所使用的处理参数,该处理参数指示了多个图像帧中第一图像帧对应的源图像和第二图像帧对应的源图像具有至少部分重叠区域。
示例的,该码流可由编码端发送至解码端。
解码端根据渲染信息指示的处理参数,从设定的多类滤波器中选择第一类滤波器,并利用该第一类滤波器来获取第二图像帧的虚拟参考帧,避免采用固定的滤波器对第一解码图像进行处理得到参考帧导致的解码效果较差的问题,该虚拟参考帧与第二图像帧对应的源图像的匹配度提高,解码端根据该虚拟参考帧对第二图像帧进行解码时,在解码端的处理能力一致时,解码端处理的数据量减少,提高了解码效率。
在一种可能的实现方式中,解码端根据渲染信息确定第一类滤波器包括:解码端根据渲染信息查询设定的映射表,获取渲染信息中每个处理参数对应的滤波器。以及,解码端从渲染信息对应的所有滤波器中,确定第一类滤波器。
其中,该映射表用于指示:每个处理参数对应的多类滤波器中至少一类滤波器。
解码端根据渲染信息查询设定的映射表,确定第一类滤波器,该第一类滤波器与图像渲染引擎所采用的滤波器具有一定的匹配度。解码端利用该第一类滤波器对参考帧进行插值,得到虚拟参考帧,该虚拟参考帧与第二图像帧的时空域相关性提高,即该虚拟参考帧与第二图像帧对应的源图像的匹配度提高,解码端根据该虚拟参考帧对第二图像帧进行解码时,处理的数据量减少,提高了解码效率。
在一种可能的实现方式中,该映射表还用于指示:多类滤波器中每类滤波器的优先级;解码端从渲染信息对应的滤波器中,确定第一类滤波器包括:解码端获取渲染信息对应的所有滤波器的优先级;以及,解码端将所有滤波器中滤波器优先级最高的滤波器作为第一类滤波器。
解码端根据渲染信息包括的处理参数,从映射表中选取优先级最高的处理参数对应的滤波器作为第一类滤波器,该优先级最高的滤波器与图像渲染引擎采用的滤波器的匹配度也最高。解码端利用该匹配度最高的滤波器对参考帧进行插值,得到虚拟参考帧,该虚拟参考帧与第二图像帧对应的源图像的匹配度更高,进而解码端根据该虚拟参考帧进行解码时,处理的数据量减少,提高了解码效率。
在一种可能的示例中,渲染信息包括:深度图、反照率图和后处理参数中的一种或多种的组合。
针对于上述渲染信息包括处理参数的示例,本申请给出了各个处理参数对应的第一类滤波器或第一类滤波器的确定步骤,若渲染信息包括深度图时,则第一类滤波器通过深度图中的像素与所述深度图中物体的边缘信息的关系来确定;或者,
若渲染信息包括反照率图时,则第一类滤波器通过反照率图中的像素与反照率图中物体的边缘信息的关系来确定;或者,
若渲染信息包括的后处理参数包含抗锯齿参数时,则第一类滤波器为抗锯齿参数所指示的滤波器;或者,
若渲染信息包括的后处理参数包含执行运动模糊模块的指令时,则第一类滤波器为bilinear滤波器;
或者,第一类滤波器为Catmull-Rom滤波器。
示例的,上述深度图中物体的边缘信息可通过以下方式得到:解码端对深度图进行Sobel滤波,得到滤波结果。以及,解码端对该滤波结果进行大津法二值化处理,得到深度图中物体的边缘信息。
解码端根据上述各处理参数与滤波器的对应关系,确定渲染信息对应的第一类滤波器,该第一类滤波器与图像渲染引擎所采用的滤波器具有一定的匹配度,解码端利用该第一类滤波器对参考帧进行插值,得到虚拟参考帧,该虚拟参考帧与第二图像帧对应源图像的匹配度提高,解码端根据该虚拟参考帧对第二图像帧进行解码时,处理的数据量减少,提高了解码效率。
在一种可能的实现方式中,在解码端根据由渲染信息确定的第一类滤波器,对第一图像帧对应的第一解码图像进行插值之前,该视频解码方法还包括:解码端获取滤波器信息;以及,解码端根据滤波器信息中指示的滤波器种类及参数,确定第一类滤波器及第一类滤波器对应的滤波参数。
其中,该滤波参数用于指示:解码端基于第一类滤波器得到虚拟参考帧的过程中使用的处理参数。
示例的,编码端确定第一类滤波器和第一类滤波器对应的滤波参数后,编码端将第一类滤波器和第一类滤波器对应的滤波参数写入码流,并将该码流发送至解码端。
解码端通过直接获取第一类滤波器及第一类滤波器对应的滤波参数,避免了解码端执行根据渲染信息确定第一类滤波器的处理过程,从而加快了视频解码效率。
在一种可能的实现方式中,解码端根据渲染信息确定的第一类滤波器,对第一图像帧对应的第一解码图像进行插值,获取第二图像帧的虚拟参考帧包括:首先,解码端生成与参考块尺寸一致的空白图像。其次,解码端根据第一对应关系,利用拆分后的图形运动矢量图对上述参考块进行插值,确定空白图像中所有像素的像素值。最后,解码端根据空白图像中所有像素的像素值,得到虚拟参考帧。
其中,第一解码图像包括第一图像帧对应的一个或多个参考块。第一对应关系用于指示:在第一图像帧对应的源图像和第二图像帧对应的源图像的重叠区域中,第一图像帧对应的源图像中的像素和第二图像帧对应的源图像中的像素的位置映射关系。
解码端根据第一对应关系,对第一解码图像中的参考块进行插值,得到第二图像帧的虚拟参考帧,该虚拟参考帧与第二图像帧的相似度高于第一解码图像与第二图像帧的相似度,解码端根据该虚拟参考帧对第二图像帧进行解码时,处理的数据量减少,提高了解码效率。
在一种可能的实现方式中,上述第一对应关系通过以下方法得到:首先,解码端生成与第一解码图像尺寸一致的空白图像。其次,解码端基于第一图形运动矢量图与空白图像间的尺寸比例,确定第一图形运动矢量图中一个像素与空白图像中像素位置的第二对应关系。最后,解码端利用插值滤波器对图形运动矢量图进行插值,确定空白图像中每个像素的像素值和偏移。
其中,该第一解码图像与源图像的尺寸一致。
在图形运动矢量图与第一解码图像尺寸不一致的情况下,解码端对图形运动矢量图进行插值,得到插值后的图形运动矢量图。该插值后的图形运动矢量图的尺寸与第一解码图像的尺寸一致,且插值后的图形运动矢量图的像素与第一解码图像的像素匹配,解码端基于该插值后的图形运动矢量图对第一解码图像进行插值,得到虚拟参考帧,该虚拟参考帧与第二图像帧的匹配度提高。解码端根据该虚拟参考帧对第二图像帧进行解码时,处理的数据量减少,提高了解码效率。
在一种可能的实现方式中,该视频解码方法还包括:第一,解码端基于渲染信息中指示的图形运动矢量图,确定第一解码图像中像素的目标离散参数。第二,解码端中第一类滤波器利用第一滤波核对参考块进行插值,并计算插值得到的虚拟参考块中像素在每个通道的第一离散参数。第三,解码端从所有滤波核对应的至少一个第一离散参数中,确定与目标离散参数差值最小的第一离散参数,解码端将该第一离散参数对应的滤波核作为第一类滤波器的参数。
其中,目标离散参数包括:目标方差或目标协方差。第一离散参数包括第一方差或第一协方差。第一滤波核为设定的多个滤波核中的一个。
解码端确定与目标离散参数差值最小的第一离散参数,解码端将该第一离散参数对应的第一滤波核,作为第一类滤波器的参数。解码端利用前述的第一类滤波器和第一类滤波器的参数,对第一解码图像进行插值,得到虚拟参考帧。该虚拟参考帧与第二图像帧的匹配度提高,解码端根据该虚拟参考帧对第二图像帧进行解码时,处理的数据量减少,提高了解码效率。
第三方面,本申请提供了一种视频编码装置,该视频编码装置应用于编码端,并适用于包括编码端的视频编解码系统,该视频编码装置包括用于执行第一方面或第一方面任一种可选实现方式中的视频编码方法的各个模块。示例的,该视频编码装置包括:第一获取模块、第一插值模块和编码模块。其中,第一获取模块,用于获取源视频和源视频对应的渲染信息。第一插值模块,用于根据渲染信息,确定第一类滤波器,以及根据该第一类滤波器,对第一解码图像进行插值,得到第二图像的虚拟参考帧。编码模块,用于基于第二图像的虚拟参考帧对第二图像进行编码。
其中,该源视频包括多个源图像,该渲染信息用于指示:根据多个源图像生成码流的过程中所使用的处理参数,处理参数指示了多个源图像中的第一图像和第二图像具有至少部分重叠区域。第一类滤波器为设定的多类滤波器中的一类,第一解码图像为多个源图像中的第一图像进行编码后再解码获取的重建图像,该第一图像在第二图像之前进行编码。
关于视频编码装置更多详细的实现内容可参照以上第一方面中任一实现方式的描述,以及下述具体实施方式的内容,在此不予赘述。
第四方面,本申请提供了一种视频解码装置,该视频解码装置应用于解码端,并适用于包括解码端的视频编解码系统,该视频解码装置包括用于执行第二方面或第二方面任一种可选实现方式中的视频解码方法的各个模块。示例的,该视频解码装置包括:第二获取模块、第二插值模块和解码模块。其中,第二获取模块,用于获取码流和码流对应的渲染信息。其中,第二插值模块,用于基于渲染信息,确定第一类滤波器,以及根据第一类滤波器,对第一图像帧对应的第一解码图像进行插值,得到第二图像帧的虚拟参考帧。解码模块,用于根据虚拟参考帧对第二图像帧进行解码,获取第二图像帧对应的第二解码图像。
其中,该码流包括多个图像帧,渲染信息用于指示:根据多个源图像生成码流的过程中所使用的处理参数,处理参数指示了多个图像帧中第一图像帧对应的源图像和第二图像帧对应的源图像具有至少部分重叠区域。第一类滤波器为设定的多类滤波器中的一类。
关于视频解码装置更多详细的实现内容可参照以上第二方面中任一实现方式的描述,以及下述具体实施方式的内容,在此不予赘述。
第五方面,本申请提供一种芯片,包括:处理器和供电电路;该供电电路用于为处理器供电,该处理器用于执行上述第一方面和第一方面中任一种可能实现方式中的方法;和/或,该处理器用于执行上述第二方面和第二方面中任一种可能实现方式中的方法。
第六方面,本申请提供一种编解码器,包括存储器和处理器,该存储器用于存储计算机指令;该处理器执行计算机指令时,实现上述第一方面和第一方面中任一种可能实现方式中的方法;和/或,该处理器执行计算机指令时,实现上述第二方面和第二方面中任一种可能实现方式中的方法。
第七方面,本申请提供一种编解码系统,包括编码端和解码端;该编码端用于根据多个源图像对应的渲染信息,对多个图像进行编码,得到多个图像对应的码流,实现上述第一方面和第一方面中任一种可能实现方式中的方法;
该解码端用于根据多个源图像对应的渲染信息,对所述码流进行解码,得到多个解码图像,实现上述第二方面和第二方面中任一种可能实现方式中的方法。
第八方面,本申请提供一种计算机可读存储介质,该存储介质中存储有计算机程序或指令,当计算机程序或指令被处理设备执行时,实现上述第一方面和第一方面中任一种可选实现方式中的方法;和/或,当计算机程序或指令被处理设备执行时,实现上述第二方面和第二方面中任一种可选实现方式中的方法。
第九方面,本申请提供一种计算机程序产品,该计算机程序产品包括计算机程序或指令,当该计算机程序或指令被处理设备执行时,实现上述第一方面和第一方面中任一种可选实现方式中的方法;和/或,当该计算机程序或指令被处理设备执行时,实现上述第二方面和第二方面中任一种可选实现方式中的方法。
以上第三方面至第九方面的有益效果可参照第一方面或第一方面中任一种实现方式,或者,第二方面或第二方面中任一种实现方式的描述,在此不予赘述。本申请在上述各方面提供的实现方式的基础上,还可以进行进一步组合以提供更多实现方式。
图1为本申请提供的视频编解码系统的示例性框图;
图2为本申请提供的视频编码器的结构示意图;
图3为本申请提供的视频解码器的结构示意图;
图4为本申请提供的时域抗锯齿方法的流程示意图;
图5为本申请提供的视频编码方法的流程示意图;
图6为本申请提供的参考块插值方法的示意图;
图7为本申请提供的图形运动矢量插值方法的示意图;
图8为本申请提供的视频解码的流程示意图;
图9为本申请提供的视频编码装置的结构示意图;
图10为本申请提供的视频解码装置的结构示意图;
图11为本申请提供的一种计算机设备的结构示意图。
本申请中,编码端从设定的多类滤波器中确定第一类滤波器,该第一类滤波器与图像渲染过程中所使用的滤波器匹配,有利于提升编码端利用该第一类滤波器确定的虚拟参考帧与第二图像之间的相似度。从而,在编码端基于该虚拟参考帧对第二图像编码时,该第二图像对应的残差值降低,码流中该第二图像对应的数据量减少,提升了压缩性能以及视频编码的效率。
解码端根据渲染信息指示的处理参数,从设定的多类滤波器中选择第一类滤波器,并利用该第一类滤波器来获取第二图像帧的虚拟参考帧,避免采用固定的滤波器对解码图像进行处理得到参考帧导致的解码效率较差的问题,该虚拟参考帧与待解码图像对应的源图像的匹配度提高,解码端根据该虚拟参考帧对第二图像帧进行解码时,在解码端的处理能力一致时,解码端处理的数据量减少,提高了视频解码效率。
下面结合实施例对本申请提供的方案进行说明,为了下述各实施例的描述清楚简洁,首先给出相关技术的简要介绍。
视频编码(video encoding):将视频包括的多帧图像压缩成码流的处理过程。
视频解码(video decoding):将码流按照特定的语法规则和处理方法恢复成多帧重建图像的处理过程。
为了实现视频的编码与解码,本申请提供了一种视频编解码系统。如图1所示,图1为本申请提供的视频编解码系统的示例性框图,视频编解码系统包含编码端100和解码端200。编码端100产生经编码视频数据(或称为码流)。因此,编码端100可被称为视频编码装置。解码端200可对由编码端100所产生的码流(如包含一个或多个图像帧的视频)进行解码。因此,解码端200可被称为视频解码装置。编码端100、解码端200或两者的各种实施方案可包含一个或多个处理器以及耦合到所述一个或多个处理器的存储器。所述存储器可包含但不限于随机存取存储器(random access memory,RAM)、只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)、快闪存储器或可用于以可由计算机存取的指令或数据结构的形式存储所要的程序代码的任何其它媒体。
在图1的示例中,编码端100包含源视频110、视频编码器120和输出接口130。在一些实例中,输出接口130可包含调节器/解调器(调制解调器)和/或发射器。源视频110可包括视频捕获装置(例如,摄像机)、含有先前捕获的视频数据的视频存档、用以从视频内容提供者接收视频数据的视频馈入接口,和/或用于产生视频数据的计算机图形系统,如图像渲染引擎,或上述视频来源的组合。
视频编码器120可对来自源视频110的视频数据进行编码。在一些示例中,编码端100由输出接口130经链路300将码流直接发射到解码端200。在其它示例中,码流还可存储到存储装置400上,供解码端200以后存取来用于解码和/或播放。
在图1的示例中,解码端200包含输入接口230、视频解码器220和显示装置210。在一些示例中,输入接口230包含接收器和/或调制解调器。输入接口230可经由链路300和/或从存储装置400接收经编码视频数据。显示装置210可与解码端200集成或可在解码端200外部。一般来说,显示装置210显示经解码视频数据。显示装置210可包括多种显示装置,例如,液晶显示器(liquid crystal display,LCD)、等离子显示器、有机发光二极管(organic light-emitting diode,OLED)显示器或其它类型的显示装置。
在图1所示的视频编解码系统的基础上,本申请提供了一种可能的视频编码器。如图2所示,图2为本申请提供的一种视频编码器的结构示意图。
视频编码器120包括帧间预测器121、帧内预测器122、变换器123、量化器124、熵编码器125、反量化器126、反变换器127、滤波器单元128和存储器129。反量化器126、反变换器127用于图像块重构。滤波器单元128用于表示一个或多个环路滤波器,例如去块滤波器、自适应环路滤波器和样本自适应偏移滤波器。
存储器129可存储由视频编码器120的组件编码的视频数据。可从源视频110获得存储在存储器129中的视频数据。存储器129可为参考图像存储器,其存储用于由视频编码器120在帧内、帧间译码模式中对视频数据进行编码的参考视频数据。存储器129可以为动态随机存取存储器(dynamic RAM,DRAM)、磁阻式RAM(magnetic RAM,MRAM)、电阻式RAM(resistive RAM,RRAM),或其它类型的存储器装置。
针对于视频编码流程,以下结合图2的内容,对视频编码器的工作流程进行说明。
在源数据(源视频)经帧间预测器121和帧内预测器122产生当前图像块(或称为第二图像)的预测块之后,视频编码器120从待编码的当前图像块减去预测块来形成残差图像块。残差块中的残差视频数据可包含在一或多个变换单元(transform unit,TU)中,并应用于变换器123。变换器123使用例如离散余弦变换或概念上类似的变换等变换将残差视频数据变换成残差变换系数。变换器123可将残差视频数据从像素值域转换到变换域,例如频域。
变换器123可将所得变换系数发送到量化器124。量化器124量化变换系数以进一步减小位速率。在一些示例中,量化器124可接着执行对包含经量化的残差变换系数的矩阵的扫描。或者,熵编码器125可执行扫描。
在量化之后,熵编码器125对经量化变换系数进行熵编码。举例来说,熵编码器125可执行上下文自适应可变长度编码(CAVLC)、上下文自适应二进制算术编码(CABAC)、基于语法的上下文自适应二进制算术编码(SBAC)、概率区间分割熵(PIPE)编码或另一熵编码方法或技术。在由熵编码器125熵编码之后,可将经编码码流发射到视频解码器,或经存档以供稍后发射或由视频解码器检索。熵编码器125还可对待编码的当前图像块的语法元素进行熵编码。
反量化器126和反变换器127分别应用逆量化和逆变换以在像素域中重构残差块,例如以供稍后用作参考图像的参考块。视频编码器120将经重构的残差块添加到由帧间预测器121或帧内预测器122产生的预测块,以产生经重建图像或重建图像块。滤波器单元128可应用于经重建图像块以减小失真,诸如方块效应(block artifacts)。然后,该经重建图像或重建图像块作为参考块(或称为第一解码图像)存储在存储器129中,可由帧间预测器121用作参考块以对后续视频帧或图像中的块进行帧间预测。
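上述“残差、变换、量化与重建”的闭环可用如下Python草图示意(以8x8块的DCT与固定步长量化为例,qstep取值与函数划分均为说明所设的假设,并非视频编码器120的限定实现):

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(b):
    return dct(dct(b, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(b):
    return idct(idct(b, axis=0, norm="ortho"), axis=1, norm="ortho")

def encode_and_reconstruct(block, pred, qstep=8.0):
    """残差 -> 变换 -> 量化(作为熵编码的输入),再经反量化、反变换
    并加回预测块,得到与解码端一致的重建块,供后续帧间预测参考。"""
    residual = block.astype(np.float64) - pred     # 当前图像块减预测块得残差块
    coeff = dct2(residual)                         # 变换到频域(示例用DCT)
    q_coeff = np.round(coeff / qstep)              # 量化,量化系数送入熵编码
    recon_res = idct2(q_coeff * qstep)             # 反量化+反变换,重构残差
    recon = np.clip(pred + recon_res, 0, 255)      # 重建图像块(可再经环路滤波)
    return q_coeff, recon.astype(np.uint8)
```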
如图2所示,该视频编码器120还外接了插值滤波单元140,该插值滤波单元140用于对第一解码图像进行插值处理,得到虚拟参考帧,并将该虚拟参考帧存入存储器129,视频编码器120利用该虚拟参考帧辅助第二图像进行编码。其中,插值滤波单元中的滤波器为编码端100根据获取到的先验信息确定的,该先验信息可为计算机图像渲染技术进行渲染处理得到源视频的过程中,所使用的处理参数。该虚拟参考帧与第二图像的相似度高于上述第一解码图像与第二图像的相似度。
在一种可能的情形中,该插值滤波单元140可设置于视频编码器120内部。
应当理解的是,视频编码器120的其它的结构变化可用于编码视频流。例如,对于某些图像块或者图像帧,视频编码器120可以直接地量化残差信号而不需要经变换器123处理,相应地也不需要经反变换器127处理;或者,对于某些图像块或者图像帧,视频编码器120没有产生残差数据,相应地不需要经变换器123、量化器124、反量化器126和反变换器127处理;或者,视频编码器120可以将经重构图像块作为参考块直接地进行存储而不需要经滤波器单元128处理;或者,视频编码器120中量化器124和反量化器126可以合并在一起。
如图3所示,图3为本申请实施例的视频解码器的结构示意图。视频解码器220包括熵解码器221、反量化器222、反变换器223、滤波器单元224、存储器225、帧间预测器226和帧内预测器227。视频解码器220可执行大体上与相对于来自图2的视频编码器120描述的编码过程互逆的解码过程。首先,利用熵解码器221、反量化器222和反变换器223得到残差块或残差值,解码码流确定当前图像块使用的是帧内预测还是帧间预测。如果是帧内预测,则帧内预测器227利用周围已重建区域内像素点的像素值按照所使用的帧内预测方法构建预测信息。如果是帧间预测,则帧间预测器226需要解析出运动信息,并使用所解析出的运动信息在已重建的图像中确定参考块,并将块内像素点的像素值作为预测信息,使用预测信息加上残差信息经过滤波操作便可以得到重建信息。
以及视频解码器220还外接了插值滤波单元240,插值滤波单元240的作用,可参考上述与视频编码器120外接的插值滤波单元140的内容,在此不予赘述。
在一种可能的情形中,该插值滤波单元240可设置于视频解码器220内部。
可选的,以上的源视频是渲染端对输入数据进行渲染处理后得到的。这里的输入数据是指与渲染端通信的显示设备获取的数据,或者,该输入数据是指渲染端接收到的数据。
示例性的,该渲染端包括图像渲染引擎(如V-Ray、Unreal、Unity)。在图像渲染引擎对输入数据进行渲染处理的过程中,会生成过程图数据和参数信息,
其中,过程图数据包括:渲染端对输入数据进行处理所产生的渲染图像,以及,输入数据到渲染图像之间的中间图等。例如,该中间图可以包括但不限于:图形运动矢量图、低质量渲染结果、深度图、位置图(Position map)、法线图(Normal map)、反照率图(albedo map)、镜面强度图(specular intensity map)等。
上述渲染处理过程中的参数信息是指渲染端对渲染图像进行后处理所采用的后处理参数,例如,Mesh ID、Material ID、运动模糊参数和抗锯齿参数等。
值得注意的是,前述的过程图数据和参数信息也可统称为源视频对应的中间数据,或者是编码端对该源视频进行编码时所利用的渲染信息等。
在一种可能的示例中,解码端200上的图像渲染引擎可根据上述显示端的输入数据进行图像渲染,以得到上述过程图数据和参数信息中的一种或多种。由于解码端200获取到了全部或部分的中间数据,避免了编码端100将上述中间数据全部发送至解码端200,进而减小了编码端100传输上述中间数据至解码端200的数据量,节约了带宽。
示例的,上述显示端的输入数据可为相机(视角)位置、光源的位置/强度/色彩等中的一种或多种。上述渲染端的输入数据可为物体几何、物体初始位置、光源信息等中的一种或多种。该图形运动矢量图为指示源视频中两帧图像中的像素的位置对应关系的图像。该运动模糊参数为图像渲染引擎执行运动模糊的指令。该抗锯齿参数为图像渲染引擎进行TAA时所采用的滤波方式。
可选的,图像渲染引擎还可对渲染图像进行后处理,下面以后处理是指抗锯齿处理为例进行说明。
渲染端对渲染图像进行时域抗锯齿(Temporal Anti-Aliasing,TAA)处理,得到源视频包括的多个图像,以解决渲染图像的锯齿问题。如图4所示,图4为本申请提供的时域抗锯齿方法的流程示意图。多帧渲染图像之间的差异较小,如上一帧和当前帧之间仅有少许像素(如一个或两个)的位置发生了偏移。以该当前帧中的第一像素为例,渲染端根据图形运动矢量图,确定该第一像素在当前帧的上一帧中与第一像素对应的第二像素,渲染端对第一像素的像素值与第二像素的像素值进行加权平均,得到最终像素值。渲染端利用该最终像素值填充当前帧中的第一像素,从而确定第一图像中各个像素的像素值,得到第一图像。该第一像素为当前帧中所有像素的任一个。
其中,上述的渲染端可为编码端100,该编码端100可在云服务器上,上述的显示端可为解码端200,该解码端200可在客户端上。
上述渲染端确定该第二像素的像素值可包括:编码端100利用滤波器对该第二像素或第二像素周围的像素(如3x3的像素范围)进行滤波,得到该第二像素的像素值。其中,滤波器的滤波方式可为均值滤波、中值滤波以及高斯滤波等中的一种。
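为便于理解上述TAA的加权混合过程,下面给出一段示意性的Python代码草图(仅为依据上文描述构造的示例,其中混合权重alpha、3x3均值滤波窗口以及各函数接口均为说明所设的假设,并非本申请的限定实现):

```python
import numpy as np

def taa_blend(curr, prev, mv_map, alpha=0.9, win=1):
    """时域抗锯齿:按图形运动矢量图在上一帧中找到第二像素,
    对其邻域(如3x3)做均值滤波后,与当前帧第一像素加权平均。"""
    h, w = curr.shape[:2]
    out = np.empty(curr.shape, dtype=np.float32)
    for y in range(h):
        for x in range(w):
            dy, dx = mv_map[y, x]                        # 第一像素到第二像素的偏移
            py = int(np.clip(y + dy, 0, h - 1))
            px = int(np.clip(x + dx, 0, w - 1))
            y0, y1 = max(py - win, 0), min(py + win + 1, h)
            x0, x1 = max(px - win, 0), min(px + win + 1, w)
            second = prev[y0:y1, x0:x1].mean(axis=(0, 1))    # 第二像素的像素值
            out[y, x] = alpha * curr[y, x] + (1 - alpha) * second  # 加权平均
    return out
```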
为降低端侧(如显示端或解码端)的负载,图像渲染引擎可部署在云侧(如云侧服务器或云侧的编码端),编码端100将多个图像进行图像渲染后获得源视频,并对该源视频进行编码获取码流。以及,编码端100将该码流传输至解码端200,解码端200对该码流进行解码后播放。
在一种可能的场景中,在云游戏场景下,编码端100将多个图像经图像渲染引擎处理后得到游戏画面,并对该游戏画面进行编码得到码流,以降低传输游戏画面的数据量,解码端200对该码流进行解码后播放,用户在该解码端200(如移动设备)上操作游戏。
下面将结合附图对本实施例提供的图像编码方法的具体实现方式进行详细描述。
由于编码端100和解码端200分别处于云服务器和客户端,因此为提高数据的传输效率,编码端100对源视频进行压缩得到码流后,编码端100将该码流发送至解码端200。为此,本申请提供了一种视频编码方法,如图5所示,图5为本申请提供的视频编码方法的流程示意图,该视频编码方法可应用于图1所示出的视频编解码系统或者图2所示出的视频编码器中,示例性的,该视频编码方法可由编码端100或者视频编码器120执行,这里以编码端100执行本实施例提供的图像编码方法为例进行说明。如图5所示,本实施例提供的图像编码方法包括以下步骤S510至S540。
S510、编码端100获取源视频和源视频对应的渲染信息。
其中,该源视频包括多个源图像,关于源视频的更多内容可参考前述实施例的描述,在此不予赘述。
该渲染信息用于指示:编码端100根据源视频中多个源图像生成码流的过程中所使用的处理参数。处理参数可以包括前述的过程图数据和参数信息中至少一种。并且该处理参数中的图形运动矢量图指示了多个源图像中的第一图像和第二图像具有至少部分重叠区域,且该重叠区域中的像素具有位置对应关系。
S520、编码端100根据渲染信息,确定第一类滤波器。
其中,该第一类滤波器为设定的多类滤波器中的一类。
编码端100可根据渲染信息中的处理参数与设定的滤波器间的对应关系,从设定的滤波器中确定第一类滤波器。
示例的,编码端100获取到的渲染信息可包括下表1中示出的处理参数中的一种或多种组合,该表1示出了每个处理参数对应的多类滤波器中至少一类滤波器。
表1

| 处理参数 | 对应的滤波器 | 优先级 |
| --- | --- | --- |
| 深度图 | Catmull-Rom滤波器/bilinear滤波器 | 1 |
| 反照率图 | Catmull-Rom滤波器/B-spline滤波器 | 2 |
| 抗锯齿参数 | TAA采用的滤波器 | 3 |
| 运动模糊参数 | bilinear滤波器 | 4 |
| (备选) | Catmull-Rom滤波器 | 5 |
其中,抗锯齿参数与运动模糊参数为后处理参数包括的内容。优先级为5的滤波器为备选滤波器,当渲染信息中不包括深度图、反照率图、抗锯齿参数和运动模糊参数时,则直接将优先级5对应的Catmull-Rom滤波器作为第一类滤波器。示例性的,滤波器对应的优先级排序越靠前,表示该滤波器与图像渲染引擎所采用的滤波器的匹配度越高。编码端100利用该匹配度高的滤波器对参考帧进行插值,得到虚拟参考帧,该虚拟参考帧与第二图像的时空域相关性提高,进而编码端100根据该虚拟参考帧确定预测块,该预测块与第二图像对应图像块的相似度提高,从而提升了编码效果,降低了编码得到的数据量,提高了压缩性能。
上述表1仅为本申请提供的示例,不应理解为对本申请的限定,在一些情况中,表1中还包括更多的处理参数,并且该处理参数可对应一个或多个滤波器,该滤波器对应的优先级可在上述表1所示的滤波器对应的优先级之前或之后。示例的,上述表1中处理参数还可包括法线图,该法线图对应的滤波器为Mitchell-Netravali滤波器,且该Mitchell-Netravali滤波器的优先级在TAA采用的滤波器对应的优先级之前。
在一种可能的情形中,编码端100根据渲染信息查询如上述表1的映射表,获取渲染信息中每个处理参数对应的滤波器。编码端100从渲染信息对应的所有滤波器中,确定第一类滤波器。
该映射表指示了渲染信息中的各个处理参数对应的滤波器,且该处理参数与滤波器的对应关系基于图像渲染引擎的各处理过程采用的滤波器确定。编码端100根据获取到的处理参数查询前述的映射表,确定第一类滤波器,该第一类滤波器与图像渲染引擎采用的滤波器匹配,进而编码端100利用该第一类滤波器得到的虚拟参考帧与第二图像时空域相关性提高,编码端100根据虚拟参考帧确定的第二图像的残差值降低,从而提升了编码效果。
针对于编码端100根据渲染信息包括的处理参数确定第一类滤波器,以下提供了三种可能的示例。
在第一种可能的示例中,编码端100获取到的渲染信息仅包括抗锯齿参数。
编码端100根据该抗锯齿参数查询上述表1,得到抗锯齿参数对应的TAA采用的滤波器。由于渲染信息中仅包含抗锯齿参数,因此直接将该TAA采用的滤波器作为第一类滤波器。
在第二种可能的示例中,编码端100获取到的渲染信息包括深度图和抗锯齿参数。
编码端100根据该深度图和抗锯齿参数查询上述表1,并根据表1中指示的多类滤波器中每类滤波器的优先级,得到深度图和抗锯齿参数对应的滤波器优先级分别为1和3,编码端100选取优先级最高的滤波器,即前述优先级为1的滤波器“Catmull-Rom滤波器/bilinear滤波器”,编码端100从“Catmull-Rom滤波器/bilinear滤波器”中确定一个作为第一类滤波器。
对于编码端100从“Catmull-Rom滤波器/bilinear滤波器”确定一个滤波器作为第一类滤波器的过程,以下提供了可能的实现方式,在此不予赘述。
编码端100根据渲染信息包括的处理参数,从映射表中选取优先级最高的处理参数对应的滤波器作为第一类滤波器,该优先级最高的滤波器与图像渲染引擎采用的滤波器的匹配度也最高。编码端100利用该匹配度最高的滤波器对参考帧进行插值,得到虚拟参考帧,该虚拟参考帧与第二图像的时空域相关性提高,进而编码端100根据该虚拟参考帧,得到的第二图像对应的残差值降低,从而提升了编码效果,降低了编码得到的数据量,提高了压缩性能。
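上述“查询映射表并按优先级选取”的过程可用如下Python草图示意(表中的对应关系与优先级取自上文表1,字典结构、键名与函数名均为说明所设的假设,并非本申请的限定实现):

```python
# 处理参数 -> (候选滤波器, 优先级),数值越小优先级越高(对应上文表1)
FILTER_MAP = {
    "depth_map":   (["Catmull-Rom", "bilinear"], 1),
    "albedo_map":  (["Catmull-Rom", "B-spline"], 2),
    "anti_alias":  (["TAA所指示的滤波器"],        3),
    "motion_blur": (["bilinear"],                 4),
}
FALLBACK = ["Catmull-Rom"]  # 优先级5的备选滤波器

def select_filter_class(render_info):
    """render_info为渲染信息中出现的处理参数名集合,
    返回其中优先级最高(数值最小)的处理参数对应的候选滤波器。"""
    hits = [FILTER_MAP[p] for p in render_info if p in FILTER_MAP]
    if not hits:
        return FALLBACK
    candidates, _ = min(hits, key=lambda t: t[1])
    return candidates

# 例:渲染信息同时包含深度图与抗锯齿参数时,命中优先级1的滤波器
print(select_filter_class({"depth_map", "anti_alias"}))
# ['Catmull-Rom', 'bilinear']
```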
在第三种可能的示例中,编码端100获取到渲染信息包括的深度图和抗锯齿参数。
编码端100获取深度图和抗锯齿参数对应的所有滤波器,编码端100利用该所有滤波器分别对第二图像进行预编码,所有滤波器中的每一个滤波器将得到对应的预测结果。编码端100选择所有预测结果中符合设定条件的目标预测结果,并将目标预测结果对应的滤波器作为第一类滤波器。
其中,预测结果用于指示:基于一个滤波器对第二图像进行预编码获取的编码信息,该编码信息包括:第二图像的预测码率和失真率中至少一种。
例如,上述的预编码为:对第二图像进行帧间预测得到第二图像的预测块,并根据预测块,得到第二图像的残差值或残差块,再进行变换处理的过程。编码端100利用上述所有滤波器进行预编码后得到对应的预测码率和失真率。上述设定的条件可为在预测码率达到设定值时,失真率最小的预测结果,或者在失真率小于设定值时,预测码率最小的预测结果。
编码端通过利用渲染信息中处理参数对应的所有滤波器,进行预编码,编码端根据设定的条件对预编码得到的预测结果进行筛选,将满足设定的条件的目标预测结果对应的滤波器,作为第一类滤波器,从而确定了所有滤波器中实际符合预设条件的滤波器,作为第一类滤波器,进而编码端100根据该第一类滤波器插值得到的虚拟参考帧与第二图像的相似度最大。在编码端100对第二图像进行编码的过程中,由于虚拟参考帧与第二图像之间的相似度最大,因此,编码端100利用虚拟参考帧对该第二图像进行编码确定的码率和失真率达到最佳平衡,提高了编码压缩效果。
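上述“对所有候选滤波器预编码并按设定条件筛选”的过程可用如下Python草图示意(其中precode为假设的预编码接口,LAMBDA为示例性的率失真权衡系数,均非本申请的限定实现):

```python
LAMBDA = 0.1  # 率失真权衡系数,示例取值

def choose_filter_by_precoding(candidates, frame, ref, precode):
    """对每个候选滤波器做一次预编码,按设定条件选出目标预测结果。
    precode(frame, ref, f) 需返回 (预测码率, 失真率),为假设的接口。"""
    results = [(f, *precode(frame, ref, f)) for f in candidates]
    # 设定条件的一种示例:率失真联合代价最小者为目标预测结果
    target = min(results, key=lambda r: r[1] + LAMBDA * r[2])
    return target[0]  # 目标预测结果对应的滤波器即第一类滤波器
```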
在一种可能的情形中,在编码端100中图像渲染引擎的TAA算法确定的情况下,在编码端100上预设第一类滤波器。
请继续参见图5,本实施例提供的视频编码方法还包括步骤S530。
S530、编码端100根据第一类滤波器,对第一解码图像进行插值,得到第二图像的虚拟参考帧。
其中,该第一解码图像为多个图像中第一图像进行编码再解码后的重建图像,该重建图像为经图2中滤波器单元128处理后的图像。该第一图像在第二图像之前进行编码。该第一解码图像包括第一图像对应的一个或多个参考块,如第一解码图像包括M x N个参考块,M和N都为大于等于1的整数。得到重建图像的内容,可参照上述图2所示出的表述,在此不予赘述。得到的虚拟参考帧可存入如图2中的存储器129。
下面给出一种可能的具体示例,以编码端是对第一解码图像包括的任一个参考块进行插值为例进行说明,如图6所示,图6为本申请提供的参考块插值方法的示意图,该方法包括以下第一步至第三步。
第一步,编码端100生成与参考块尺寸一致的空白图像。
其中,当图形运动矢量图与第一解码图像的尺寸一致时,编码端100直接生成与参考块尺寸一致的空白图像,并将图形运动矢量图拆分成M x N个图像,使拆分得到的图像与参考块尺寸一致,当M和N都为1时,编码端100将不对图形运动矢量图进行拆分。该第一解码图像、第一图像和第二图像的尺寸都相同。
当该图形运动矢量图与第一解码图像的尺寸不一致时,编码端100对图形运动矢量图进行插值,使插值后的图形运动矢量图与第一解码图像的尺寸一致。
对于编码端100对图形运动矢量图进行插值的实现方式,以下图7提供了一种可能的实现方式,在此不予赘述。
第二步,编码端100根据第一对应关系,利用拆分后的图形运动矢量图对上述参考块进行插值,确定空白图像中所有像素的像素值。
其中,该第一对应关系用于指示:渲染信息中图形运动矢量图指示的第一图像和第二图像具有至少部分重叠区域,且该部分重叠区域中,第一图像的像素与第二图像像素的位置映射关系。
由于第一解码图像基于第一图像进行编码再解码得到,因此第一解码图像中的像素与第一图像中的像素具有位置对应关系。在第一步中得到的空白图像与参考块的尺寸相同,因此可以理解为,空白图像中的像素与第二图像中对应图像块的像素位置对应。基于第一图像的像素与第二图像像素的位置映射关系,参考块上的像素与空白图像上的像素具有位置对应关系。
编码端100基于参考块上的像素位置与空白图像上的像素位置间的对应关系,利用第一类滤波器对参考块上的每个像素进行插值滤波,确定对应空白图像上每个像素位置的像素值,进而得到虚拟参考块中所有像素的像素值。
示例的,以第一对应关系中的参考块中像素A与空白图像中像素A’的位置映射关系为例进行说明。如图6所示,编码端100根据第一图像的像素与第二图像像素的位置映射关系,得到了参考块的像素与空白图像上像素的位置对应关系,如参考块上的像素A与空白图像上的像素A’具有位置对应关系。编码端100利用第一类滤波器对像素A或像素A及像素A周围的像素进行插值滤波,得到对应的像素值。编码端100再将该像素值填入空白图像上像素A’处,从而得到像素A’处的色彩。
编码端100按照上述像素A’的处理步骤,计算空白图像上每个像素的像素值,从而得到空白图像上所有像素的像素值。
第三步,编码端100根据空白图像中所有像素的像素值,得到虚拟参考帧。
编码端100将空白图像上所有像素的像素值填入空白图像的对应像素,得到虚拟参考块。编码端100将第一解码图像包括的每个参考块都经前述第一步和第二步的处理,得到各参考块对应空白图像上的所有像素的像素值,从而得到对应的虚拟参考块。编码端100通过将各虚拟参考块进行整合,得到虚拟参考帧。其中,编码端对第一解码图像包括的每个参考块进行前述第一步和第二步的处理时,采用的第一类滤波器可能不同,如不同的参考块采用不同的第一类滤波器。
编码端根据第一对应关系,对第一解码图像中的参考块进行插值,得到第二图像的虚拟参考帧,该虚拟参考帧与第二图像的相似度高于第一解码图像与第二图像的相似度,从而该虚拟参考帧与第二图像间的残差值降低,进而在确保图像质量一致的情况下,码流中该图像的数据量减少,提高了编码效果。
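上述第一步至第三步的逐像素插值过程可用如下Python草图示意(其中corr表示由第一对应关系给出的位置映射,sample为第一类滤波器的插值采样接口,bilinear_sample仅为其中一种可传入的示例实现,均为说明所设的假设):

```python
import numpy as np

def bilinear_sample(img, fy, fx):
    """可作为sample传入的bilinear插值示例(假设坐标已落在图像范围内)。"""
    h, w = img.shape[:2]
    y0 = int(np.clip(np.floor(fy), 0, h - 1)); y1 = min(y0 + 1, h - 1)
    x0 = int(np.clip(np.floor(fx), 0, w - 1)); x1 = min(x0 + 1, w - 1)
    wy, wx = fy - y0, fx - x0
    return ((1 - wy) * (1 - wx) * img[y0, x0] + (1 - wy) * wx * img[y0, x1]
            + wy * (1 - wx) * img[y1, x0] + wy * wx * img[y1, x1])

def make_virtual_block(ref_block, corr, sample=bilinear_sample):
    """第一步:生成与参考块尺寸一致的空白图像;
    第二步:按第一对应关系corr[y, x]取参考块中的对应位置(可为小数),
    用第一类滤波器的采样函数sample插值得到每个像素的像素值;
    第三步:所有像素值填入空白图像,即得虚拟参考块。"""
    h, w = ref_block.shape[:2]
    blank = np.zeros(ref_block.shape, dtype=np.float32)
    for y in range(h):
        for x in range(w):
            fy, fx = corr[y, x]
            blank[y, x] = sample(ref_block, fy, fx)
    return blank
```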
请继续参见图5,本实施例提供的视频编码方法还包括步骤S540。
S540、编码端100基于第二图像的虚拟参考帧,对第二图像进行编码,得到第二图像对应的码流。
编码端100利用第二图像的虚拟参考帧进行帧间预测,确定第二图像对应的预测帧,该预测帧包括多个预测块。编码端100基于第二图像中图像块与对应预测块,得到对应的残差值(或称为残差块),以及确定第二图像中图像块与对应预测块的运动矢量(Motion Vector,MV)。编码端100对上述残差值和运动矢量进行编码,实现对第二图像编码,得到对应码流。
针对于编码端100确定第二图像对应的多个预测块的过程,本申请给出了以下可能的示例。
在一种可能的示例中,编码端100根据如图2所示视频编码器120中的帧间预测器121,从存储器129中确定与第二图像的图像块匹配度最高的参考块,作为第二图像的预测块。
其中,该匹配度最高的参考块包括虚拟参考块。
例如,编码端100可利用全搜索法(Full Search,FS)或快速搜索法进行图像块搜索,编码端100再利用均方误差(mean-square error,MSE)和平均绝对误差(Mean Absolute Deviation,MAD)等方法计算搜索到的参考块与第二图像的图像块的匹配度,进而得到上述匹配度最高的参考块。
针对于编码端100确定第二图像中图像块与对应预测块的运动矢量,本申请给出了以下可能的示例。
在一种可能的示例中,编码端100根据图像块在第二图像中的位置与预测块在对应参考帧中的位置,确定图像块与预测块的相对位移,该相对位移即为图像块的运动矢量。例如该运动矢量为(1,1),表示图像块在第二图像中的位置相较于预测块在对应参考帧中的位置移动了(1,1)。
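上述块搜索与运动矢量的确定可用如下Python草图示意(此处以小范围全搜索配合MSE为例,窗口半径radius为示例取值,并假设窗口内至少存在一个合法候选位置,并非本申请的限定实现):

```python
import numpy as np

def motion_search(block, vref, top, left, radius=4):
    """在虚拟参考帧vref中以(top, left)为中心的窗口内做全搜索,
    以MSE最小者为预测块,返回运动矢量、预测块与残差块。"""
    bh, bw = block.shape[:2]
    best_mv, best_mse, best_pred = (0, 0), float("inf"), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bh > vref.shape[0] or x + bw > vref.shape[1]:
                continue  # 越界的候选位置跳过
            cand = vref[y:y + bh, x:x + bw]
            mse = np.mean((block.astype(np.float64) - cand) ** 2)
            if mse < best_mse:
                best_mv, best_mse, best_pred = (dy, dx), mse, cand
    residual = block.astype(np.float64) - best_pred  # 残差块,供后续变换量化
    return best_mv, best_pred, residual
```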
在一种可能的情形中,编码端100可将上述第一类滤波器的类型信息发送至解码端200。该类型信息如Catmull-Rom滤波器、bilinear滤波器或B-spline滤波器等。
示例1,编码端100可将上述第一类滤波器的类型信息写入码流,将该码流发送至解码端200。
示例2,编码端100可将上述第一类滤波器的类型信息作为渲染信息发送至解码端200。例如,编码端100将包括第一类滤波器的类型信息的渲染信息进行压缩后发送至解码端200。
在本实施例中,相较于采用固定的滤波器对重建图像进行处理得到参考帧,该参考帧与第二图像的相似度较低导致压缩性能较低的问题,本实施例中编码端从设定的多类滤波器中确定第一类滤波器,该第一类滤波器与图像渲染过程中所使用的滤波器匹配,有利于提升编码端利用该第一类滤波器确定的虚拟参考帧与第二图像之间的相似度。从而,在编码端基于该虚拟参考帧对第二图像编码时,该第二图像对应的残差值降低,码流中该第二图像对应的数据量减少,提升了压缩性能以及视频编码的效率。
可选的,针对于以上编码端100从“Catmull-Rom滤波器/bilinear滤波器”确定一个滤波器作为第一类滤波器,下面给出了一种可能的实现方式。
对于渲染信息包括深度图而言,编码端100将根据深度图中物体边缘信息与深度图对应第一解码图像中参考块的对应关系,从“Catmull-Rom滤波器/bilinear滤波器”中确定第一类滤波器。
示例的,首先,编码端100对深度图进行Sobel滤波,得到滤波结果。其次,编码端100对该滤波结果进行大津法二值化处理,得到深度图中物体的边缘信息。最后,编码端100根据第一解码图像中各个参考块或该各个参考块对应的深度图中图像块,与该物体的边缘信息的对应关系,从“Catmull-Rom滤波器/bilinear滤波器”中确定各个参考块对应的第一类滤波器。例如,当第一解码图像中的一个参考块中像素的位置处于该边缘信息指示的位置上时,将Catmull-Rom滤波器作为该参考块的第一类滤波器;当第一解码图像中的一个参考块中像素的位置不处于该边缘信息指示的位置上时,将bilinear滤波器作为该参考块的第一类滤波器。
可选的,针对于上述确定的渲染信息中优先级最高的处理参数为反照率图,以及滤波器为“Catmull-Rom滤波器/B-spline滤波器”时,从“Catmull-Rom滤波器/B-spline滤波器”中确定一个滤波器作为第一类滤波器,下面给出了一种可能的示例。
首先,编码端100对反照率图进行Sobel滤波,得到滤波结果。其次,编码端100对该滤波结果进行大津法二值化处理,得到反照率图中物体的边缘信息。最后,编码端100根据第一解码图像中各个参考块或该各个参考块对应的反照率图中图像块,与该物体的边缘信息的对应关系,从“Catmull-Rom滤波器/B-spline滤波器”中确定各个参考块对应的第一类滤波器。例如,当第一解码图像中的一个参考块中像素的位置处于该边缘信息指示的位置上时,将Catmull-Rom滤波器作为该参考块的第一类滤波器;当第一解码图像中的一个参考块中像素的位置不处于该边缘信息指示的位置上时,将B-spline滤波器作为该参考块的第一类滤波器。
编码端根据参考块与物体的边缘信息间的位置关系的不同,对不同的参考块采用不同的第一类滤波器进行插值,使编码端插值得到的虚拟参考块与第二图像中对应图像块的相似度提高,编码端利用该虚拟参考块对上述图像块进行编码时,该图像块对应的残差值降低,提升了压缩性能以及视频编码的效率。
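上述“Sobel滤波、大津法二值化、再按参考块位置选滤波器”的步骤可用如下Python草图示意(基于OpenCV的常用接口;阈值方式与函数划分为说明所设的假设,对反照率图可将off_edge换为"B-spline"):

```python
import cv2
import numpy as np

def edge_mask(gray_u8):
    """对深度图(或反照率图)做Sobel滤波,再用大津法二值化得边缘信息。"""
    gx = cv2.Sobel(gray_u8, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray_u8, cv2.CV_32F, 0, 1)
    mag = cv2.convertScaleAbs(cv2.magnitude(gx, gy))   # 梯度幅值转为uint8
    _, edges = cv2.threshold(mag, 0, 255,
                             cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return edges

def filter_for_block(edges, top, left, bh, bw,
                     on_edge="Catmull-Rom", off_edge="bilinear"):
    """参考块覆盖区域含边缘像素时选on_edge,否则选off_edge。"""
    blk = edges[top:top + bh, left:left + bw]
    return on_edge if np.any(blk > 0) else off_edge
```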
针对以上在图形运动矢量图与第一解码图像的尺寸不一致时,编码端100对图形运动矢量图进行插值,下面给出了一种可能的实现方式。
首先,编码端100生成与源图像尺寸一致的空白图像。其次,编码端100基于第一图形运动矢量图与空白图像间的尺寸比例,确定第一图形运动矢量图中一个像素与空白图像中像素位置的第二对应关系。最后,编码端100利用插值滤波器对图形运动矢量图进行插值,确定空白图像中每个像素的像素值和偏移。在插值后的图形运动矢量图与第一解码图像的尺寸一致时,得到该插值后的图形运动矢量图指示的第一对应关系。对于第一对应关系的表述,可参照上述图6所示的第二步内容。该源图像与第一解码图像的尺寸一致。
示例的,如图7所示,图7为本申请提供的图形运动矢量插值方法的示意图,以第一图形运动矢量图为图像渲染引擎输出的原始的图形运动矢量图,且该第一图形运动矢量图的尺寸小于第一解码图像的尺寸为例进行说明。
该第一图形运动矢量图的尺寸为4x4,而第一解码图像的尺寸为8x8。编码端100生成与第一解码图像尺寸一致的空白图像,如8x8的空白图像。编码端100确定第一图形运动矢量图与空白图像的尺寸比例为1:4,即第一图形运动矢量图中的一个像素对应空白图像中的4个像素。如图7所示,当第一图形运动矢量图中的像素C的坐标为(1,1),且其带有偏移(1,1)时,对应到空白图像中的4个像素为C(1,1),C1(1,2),C2(2,1),C3(2,2)。编码端100基于第一图形运动矢量图中像素C与空白图像中像素C,C1,C2,C3的位置的第二对应关系,利用插值滤波器对第一图形运动矢量图进行插值,得到第二图形运动矢量图中像素C,C1,C2,C3的像素值与偏移,如前述空白图像中像素的偏移都为(2,2)。编码端100根据该偏移,即可得到第一对应关系。
对于上述插值滤波器,本申请给出了几种可能的示例,如bilinear滤波、nearest滤波、bicubic滤波等中的一种。
在图形运动矢量图与源图像尺寸不一致的情况下,编码端对图形运动矢量图进行插值,得到插值后的图形运动矢量图。该插值后的图形运动矢量图的尺寸与第一解码图像的尺寸一致,且插值后的图形运动矢量图的像素与第一解码图像的像素匹配,编码端基于该插值后的图形运动矢量图对第一解码图像进行插值,得到虚拟参考帧,该虚拟参考帧与第二图像的匹配度提高。编码端根据该虚拟参考帧对第二图像进行编码时,该图像对应的残差值降低,提升了压缩性能。
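上述图7的图形运动矢量图插值可用如下Python草图示意(此处以nearest方式为例,偏移按尺寸比例放大;函数名与数据布局均为说明所设的假设,实际也可换用上文列举的bilinear、bicubic等插值滤波器):

```python
import numpy as np

def upscale_mv_map(mv, out_h, out_w):
    """将第一图形运动矢量图插值到与第一解码图像一致的尺寸。
    每个输出像素按第二对应关系取矢量图中的对应像素,
    偏移按尺寸比例放大,例如偏移(1,1)在放大2倍后变为(2,2)。"""
    in_h, in_w = mv.shape[:2]
    sy, sx = out_h / in_h, out_w / in_w
    out = np.zeros((out_h, out_w, 2), dtype=np.float32)
    for y in range(out_h):
        for x in range(out_w):
            src = mv[int(y // sy), int(x // sx)]     # 第二对应关系
            out[y, x] = (src[0] * sy, src[1] * sx)   # 放大后的偏移
    return out
```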
可选的,针对于在上述步骤S520中确定出的第一类滤波器为可设置滤波核的滤波器时,编码端100还将确定该滤波器所使用的滤波核大小,下面给出了一种可能的实现方式。
第一,编码端100基于渲染信息中指示的图形运动矢量图,确定第一解码图像中像素的目标离散参数。
在一种可能的示例中,当渲染信息中图形运动矢量图与第一解码图像的尺寸不一致时,编码端100根据插值后的图形运动矢量图指示的第一图像和第二图像具有至少部分重叠区域,取出处于第一解码图像中重叠区域的图像块,计算该图像块中像素在每个通道的离散参数,作为目标离散参数。
在另一种可能的示例中,当渲染信息中图形运动矢量图与第一解码图像的尺寸一致时,编码端100计算第一解码图像中当前参考块中像素在每个通道的离散参数,作为目标离散参数。当前参考块可指示正在执行插值的参考块。
其中,该像素的每个通道是指亮度(Y)、蓝色色度分量(Cb)和红色色度分量(Cr)三个色彩通道,该目标离散参数包括:目标方差或目标协方差。
第二,编码端100中第一类滤波器利用第一滤波核对参考块进行插值,并计算插值得到的虚拟参考块中像素在每个通道的第一离散参数。
其中,第一滤波核为设定的多个滤波核中的一个,该多个滤波核具有不同的大小。
例如,第一滤波核为3x3的滤波核,且该3x3的滤波核设有权重。第一类滤波器利用该3x3的滤波核对参考块进行插值滤波,得到虚拟参考块上各个像素的像素值。编码端100根据虚拟参考块上各个像素的像素值,计算该虚拟参考块中像素在每个通道的第一方差或第一协方差,从而得到该虚拟参考块的第一离散参数。
上述滤波核中的权重,可基于图形运动矢量图得到。例如图形运动矢量图指示:第一图像中的像素与第二图像中的像素的位置映射关系,该映射关系并非一定是整像素的映射关系。在第一图像中的物体运动后,得到第二图像,该第一图像中位于第一位置的像素偏移到了第二图像中的第二位置,该第二位置可能在整像素位置,也可能在第二图像中的多个像素所处的位置之间。如该第二位置对应的距离值为0.5,表明该第二位置位于第二图像中的两个像素所处位置的中间。编码端100根据上述像素的位置映射关系,确定滤波核中的权重。
第三,编码端100从所有滤波核对应的至少一个第一离散参数中,确定与目标离散参数差值最小的第一离散参数,将该第一离散参数对应的滤波核作为第一类滤波器的参数。
编码端100确定与目标离散参数差值最小的第一离散参数,编码端100将该第一离散参数对应的第一滤波核,作为第一类滤波器的参数。编码端利用前述的第一类滤波器和第一类滤波器的参数,对第一解码图像进行插值,得到虚拟参考帧。该虚拟参考帧与第二图像的匹配度提高,在编码端根据虚拟参考帧来对第二图像进行编码时,该第二图像对应的残差值降低,使得码流中该图像的数据量减少,提升了压缩性能。
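上述按离散参数选取滤波核的过程可用如下Python草图示意(其中interpolate为假设的插值接口,虚拟参考块按(高, 宽, 通道)布局,方差按每个通道计算后求和作比较,仅为一种示例性的实现方式):

```python
import numpy as np

def pick_kernel(ref_block, target_var, kernels, interpolate):
    """从设定的多个滤波核中选出一个:用每个滤波核插值得到虚拟参考块,
    计算其像素在每个通道的第一方差,取与目标方差差值最小者。"""
    best_k, best_diff = None, float("inf")
    for k in kernels:
        vblock = interpolate(ref_block, k)    # 第一类滤波器用第一滤波核插值
        var = vblock.reshape(-1, vblock.shape[-1]).var(axis=0)  # 每通道第一方差
        diff = float(np.abs(var - target_var).sum())  # 与目标离散参数的差值
        if diff < best_diff:
            best_k, best_diff = k, diff
    return best_k  # 该滤波核作为第一类滤波器的参数
```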
在编码端100根据上述视频编码方法对源视频编码完毕,或对源视频中的一个或多个源图像编码完毕后,编码端100得到源视频或源图像对应的码流。编码端100将该码流发送至解码端200,解码端200对码流进行解码并播放。如图8所示,图8为本申请提供的一种视频解码方法的流程示意图。该视频解码方法可应用于图1所示出的视频编解码系统或者图3所示出的视频解码器中,示例性的,该视频解码方法可由解码端200或者视频解码器220执行,这里以解码端200执行本实施例提供的视频解码方法为例进行说明。如图8所示,本实施例提供的视频解码方法包括以下步骤S810至S840。
S810、解码端200获取码流和码流对应的渲染信息。
其中,渲染信息用于指示:解码端200根据多个源图像生成码流的过程中所使用的处理参数。对于处理参数的内容,可参照上述图5所示的S510中对处理参数的表述,在此不予赘述。该处理参数中的图形运动矢量图指示了多个图像帧中第一图像帧对应的源图像和第二图像帧对应的源图像具有至少部分重叠区域。
例如,该渲染信息可由编码端100发送至解码端200。
在一种可能的示例中,解码端200获取到的渲染信息包括:如上表1所示映射表中的处理参数中的一种或多种组合。
S820、解码端200基于渲染信息,确定第一类滤波器。
其中,该第一类滤波器为设定的多类滤波器中的一类。
示例性的,解码端200根据渲染信息指示的处理参数,查询如上表1得到处理参数对应的第一类滤波器。该表1示出了处理参数对应的多类滤波器中至少一类滤波器,且每类滤波器具有对应的优先级。
针对于解码端200根据渲染信息确定第一类滤波器的内容,可参照编码端100根据渲染信息包括的处理参数确定第一类滤波器的示例,在此不予赘述。
在一种可能的情形中,解码端200可根据渲染信息,查询预设的映射表,获取渲染信息中每个处理参数对应的滤波器。解码端200从渲染信息对应的所有滤波器中,确定第一类滤波器。
对于映射表的有关内容,可参照上述图5中的S520以及表1所示的确定第一类滤波器的表述,在此不予赘述。
在一种可能的情形中,该解码端200获取滤波器信息。解码端200根据该滤波器信息中指示的滤波器种类及参数,确定第一类滤波器及第一类滤波器对应的滤波参数。
其中,该滤波参数用于指示:解码端200基于第一类滤波器得到虚拟参考帧的过程中使用的处理参数,如滤波器的滤波核的大小以及对应权重等。
示例性的,上述渲染信息或码流中可包括滤波器信息。
在一种可能的示例中,在编码端100中图像渲染引擎的TAA算法确定的情况下,在解码端200上预设第一类滤波器。
S830、解码端200根据第一类滤波器,对第一图像帧对应的第一解码图像进行插值,得到第二图像帧的虚拟参考帧。
示例的,首先,解码端200生成与参考块尺寸一致的空白图像;其次,解码端200根据第一对应关系,利用拆分后的图形运动矢量图对参考块进行插值,确定空白图像中所有像素的像素值;最后,解码端200根据空白图像中所有像素的像素值,得到虚拟参考帧。其中,第一对应关系基于渲染信息中的图形运动矢量图确定。该参考块为第一解码图像包括的一个或多个参考块中的任一个,该第一解码图像为第一图像帧对应的解码图像。
对于上述示例的详细说明,可参考上述图6所示出的内容,在此不予赘述。
解码端200根据第一对应关系,对第一解码图像中的参考块进行插值,得到第二图像帧的虚拟参考帧,该虚拟参考帧与第二图像帧的相似度高于第一解码图像与第二图像帧的相似度,解码端200根据该虚拟参考帧对第二图像帧进行解码时,处理的数据量减少,提高了解码效率。
S840、解码端200基于虚拟参考帧对第二图像帧进行解码,获取第二图像帧对应的第二解码图像。
解码端200将第二图像帧进行反量化、反变换等处理,得到第二图像帧对应的残差块、及运动矢量等。解码端200利用该残差块、运动矢量等数据以及S830中确定的虚拟参考帧进行图像重建,得到第二图像帧对应的第二解码图像。
示例的,解码端200利用如图3中的视频解码器220,将码流中的第二图像帧经熵解码器221、反量化器222和反变换器223处理后,得到第二图像帧与对应虚拟参考帧的残差块,视频解码器220利用该残差块、运动矢量以及虚拟参考帧进行图像块重建。视频解码器220将重建得到的多个图像块经滤波器单元224处理,得到第二图像帧对应的第二解码图像。
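解码端利用虚拟参考帧重建图像块的过程可用如下Python草图示意(环路滤波等步骤从略,函数接口为说明所设的假设,并假设运动矢量指向虚拟参考帧内的合法位置):

```python
import numpy as np

def decode_block(residual, mv, vref, top, left):
    """按解析出的运动矢量在虚拟参考帧中取预测块,
    预测块加残差块即得重建图像块(环路滤波从略)。"""
    bh, bw = residual.shape[:2]
    y, x = top + mv[0], left + mv[1]
    pred = vref[y:y + bh, x:x + bw].astype(np.float64)
    recon = np.clip(pred + residual, 0, 255)
    return recon.astype(np.uint8)
```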
解码端200根据渲染信息指示的处理参数,从设定的多类滤波器中选择第一类滤波器,并利用该第一类滤波器来获取第二图像帧的虚拟参考帧,避免采用固定的滤波器对第一解码图像进行处理得到参考帧导致的解码效果较差的问题,该虚拟参考帧与第二图像帧对应的源图像的匹配度提高,解码端200根据该虚拟参考帧对第二图像帧进行解码时,在解码端的处理能力一致时,解码端处理的数据量减少,提高了解码效率。
当解码端200得到的图形运动矢量图与第一解码图像尺寸不一致时,解码端200将对图形运动矢量图进行插值,示例的:首先,解码端200生成与第一解码图像尺寸一致的空白图像。其次,解码端200基于第一图形运动矢量图与空白图像间尺寸比例,确定第一图形运动矢量图中一个像素与空白图像中像素位置的第二对应关系。最后,解码端200利用插值滤波器对图形运动矢量图进行插值,确定空白图像中每个像素的像素值和偏移。
其中,该第一解码图像与源图像的尺寸一致。
对于上述示例的详细描述,可参照上述图7所示出的内容,在此不予赘述。
在图形运动矢量图与第一解码图像尺寸不一致的情况下,解码端对图形运动矢量图进行插值,得到插值后的图形运动矢量图。该插值后的图形运动矢量图的尺寸与第一解码图像的尺寸一致,且插值后的图形运动矢量图的像素与第一解码图像的像素匹配,解码端基于该插值后的图形运动矢量图对第一解码图像进行插值,得到虚拟参考帧,该虚拟参考帧与第二图像帧的匹配度提高。解码端200根据该虚拟参考帧对第二图像帧进行解码时,处理的数据量减少,提高了解码效率。
可选的,针对于在上述步骤S820中确定出的第一类滤波器为可设置滤波核的滤波器时,解码端200还将确定该滤波器所使用的滤波核大小,下面给出了一种可能的实现方式。
第一,解码端200基于渲染信息中指示的图形运动矢量图,确定第一解码图像中像素的目标离散参数。
第二,解码端200中第一类滤波器利用第一滤波核对参考块进行插值,并计算插值得到的虚拟参考块中像素在每个通道的第一离散参数。
第三,解码端200从所有滤波核对应的至少一个第一离散参数中,确定与目标离散参数差值最小的第一离散参数,解码端200将该第一离散参数对应的滤波核作为第一类滤波器的参数。
对于上述解码端200确定滤波核的过程,可参考上述编码端100确定滤波核大小的内容,在此不予赘述。
解码端200确定与目标离散参数差值最小的第一离散参数,解码端200将该第一离散参数对应的第一滤波核,作为第一类滤波器的参数。解码端200利用前述的第一类滤波器和第一类滤波器的参数,对第一解码图像进行插值,得到虚拟参考帧。该虚拟参考帧与第二图像帧的匹配度提高,解码端200根据该虚拟参考帧对第二图像帧进行解码时,处理的数据量减少,提高了解码效率。
上文中结合图4至图7,详细描述了根据本申请所提供的视频编码的方法,下面将结合图9,图9为本申请提供的一种视频编码装置的结构示意图,描述根据本申请所提供的视频编码装置。视频编码装置900可以用于实现上述方法实施例中编码端的功能,因此也能实现上述方法实施例所具备的有益效果。
如图9所示,视频编码装置900包括第一获取模块910、第一插值模块920和编码模块930;该视频编码装置900用于实现上述图4至图7中所对应的方法实施例中编码端的功能。在一种可能的示例中,该视频编码装置900用于实现上述视频编码方法的具体过程包括以下过程:
第一获取模块910,用于获取源视频和源视频对应的渲染信息。其中,该源视频包括多个源图像,该渲染信息用于指示:根据多个源图像生成码流的过程中所使用的处理参数,处理参数指示了多个源图像中的第一图像和第二图像具有至少部分重叠区域。
第一插值模块920,用于根据渲染信息,确定第一类滤波器。以及根据该第一类滤波器,对第一解码图像进行插值,得到第二图像的虚拟参考帧。其中,第一类滤波器为设定的多类滤波器中的一类,第一解码图像为多个图像中第一图像进行编码后再解码获取的重建图像,该第一图像在第二图像之前进行编码。
编码模块930,用于基于第二图像的虚拟参考帧对第二图像进行编码。
为进一步实现上述图4至图7中所示的方法实施例中的功能。本申请还提供了一种视频编码装置,该视频编码装置900还包括第一滤波核确定模块940。
其中,该第一滤波核确定模块940,用于基于渲染信息,确定第一解码图像中像素的目标离散参数;该目标离散参数包括:目标方差或目标协方差;以及将与目标离散参数差值最小的第一离散参数对应的第一滤波核,作为第一类滤波器的参数,第一离散参数用于指示:第一类滤波器基于第一滤波核对像素进行插值,并基于插值后的像素计算得到的离散参数,第一滤波核为设定的多个滤波核中的一个,第一离散参数包括第一方差或第一协方差。
应理解,前述实施例的编码端可对应于该视频编码装置900,并可以对应于执行本申请实施例图4至图7对应方法的相应主体,并且视频编码装置900中的各个模块的操作和/或功能分别为了实现图4至图7中对应实施例的各个方法的相应流程,为了简洁,在此不再赘述。
上文中结合图3和图8,详细描述了根据本申请所提供的视频解码的方法,下面将结合图10,图10为本申请提供的一种视频解码装置的结构示意图,描述根据本申请所提供的视频解码装置。视频解码装置1000可以用于实现上述方法实施例中解码端的功能,因此也能实现上述方法实施例所具备的有益效果。
如图10所示,视频解码装置1000包括第二获取模块1010、第二插值模块1020和解码模块1030;该视频解码装置1000用于实现上述图3和图8中所对应的方法实施例中解码端的功能。在一种可能的示例中,该视频解码装置1000用于实现上述视频解码方法的具体过程包括以下过程:
第二获取模块1010,用于获取码流和码流对应的渲染信息。该码流包括多个图像帧,渲染信息用于指示:根据多个源图像生成码流的过程中所使用的处理参数,处理参数指示了多个图像帧中第一图像帧对应的源图像和第二图像帧对应的源图像具有至少部分重叠区域。
第二插值模块1020,用于基于渲染信息,确定第一类滤波器,以及根据第一类滤波器,对第一图像帧对应的第一解码图像进行插值,得到第二图像帧的虚拟参考帧。其中,第一类滤波器为设定的多类滤波器中的一类。
解码模块1030,用于根据虚拟参考帧对第二图像帧进行解码,获取第二图像帧对应的第二解码图像。
为进一步实现上述图3和图8中所示的方法实施例中的功能。本申请还提供了一种视频解码装置,该视频解码装置1000还包括滤波器信息获取模块1040和第二滤波核确定模块1050。
其中,滤波器信息获取模块1040,用于获取滤波器信息,以及根据滤波器信息中指示的滤波器种类及参数,确定第一类滤波器及第一类滤波器对应的滤波参数,滤波参数用于指示:基于第一类滤波器得到虚拟参考帧的过程中使用的处理参数。
第二滤波核确定模块1050,用于基于渲染信息,确定第一解码图像中像素的目标离散参数;该目标离散参数包括:目标方差或目标协方差;以及将与目标离散参数差值最小的第一离散参数对应的第一滤波核,作为第一类滤波器的参数,第一离散参数用于指示:第一类滤波器基于第一滤波核对像素进行插值,并基于插值后的像素计算得到的离散参数,第一滤波核为设定的多个滤波核中的一个,第一离散参数包括第一方差或第一协方差。
应理解,根据本申请实施例的解码端可对应于本申请实施例中的视频解码装置1000,并可以对应于执行本申请实施例图3和图8对应方法的相应主体,并且视频解码装置1000中的各个模块的操作和/或功能分别为了实现图3和图8中对应实施例的各个方法的相应流程,为了简洁,在此不再赘述。
本申请实施例提供的一种计算机设备,如图11所示,图11为本申请提供的一种计算机设备的结构示意图。该计算机设备可应用于图1所示的编解码系统中,该计算机设备可以为编码端100或解码端200中任一个。
计算机设备1100具体可以为手机、平板电脑、电视(也可称为智能电视、智慧屏或大屏设备)、笔记本电脑、超级移动个人计算机(Ultra-mobile Personal Computer,UMPC)、手持计算机、上网本、个人数字助理(Personal Digital Assistant,PDA)、可穿戴电子设备(例如:智能手表,智能手环,智能眼镜)、车载设备、虚拟现实设备、服务器等具有计算能力的计算设备。
如图11所示,计算机设备1100(下文也称处理设备1100)可以包括处理器1110、存储器1120、通信接口1130和总线1140等,处理器1110、存储器1120、通信接口1130通过总线1140连接。
可以理解的是,本发明实施例示意的结构并不构成对处理设备的具体限定。在本申请另一些实施例中,处理设备可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器1110可以包括一个或多个处理单元,例如:处理器1110可以包括应用处理器(application processor,AP),调制解调处理器,中央处理器(Central Processing Unit,CPU),图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
处理器1110中还可以设置内部存储器,用于存储指令和数据。在一些实施例中,处理器1110中的内部存储器为高速缓冲存储器。该内部存储器可以保存处理器1110刚用过或循环使用的指令或数据。如果处理器1110需要再次使用该指令或数据,可从所述内部存储器中直接调用。避免了重复存取,减少了处理器1110的等待时间,因而提高了系统的效率。
存储器1120可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器1110通过运行存储在存储器1120的指令,从而执行处理设备1100的各种功能应用以及数据处理。存储器1120可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如编码功能,发送功能等)等。存储数据区可存储处理设备1100使用过程中所创建的数据(比如码流,参考帧等)等。此外,存储器1120可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。
通信接口1130用于实现处理设备1100与外部设备或器件的通信。在本实施例中,通信接口1130用于与其他处理设备进行数据交互。
总线1140可以包括一通路,用于在上述组件(如处理器1110、存储器1120、通信接口1130)之间传送信息。总线1140除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都标为总线1140。总线1140可以是快捷外围组件互连(peripheral component interconnect express,PCIe)高速总线,扩展工业标准结构(extended industry standard architecture,EISA)总线、统一总线(unified bus,Ubus或UB)、计算机快速链接(compute express link,CXL)、缓存一致互联协议(cache coherent interconnect for accelerators,CCIX)等。
值得说明的是,图11中仅以处理设备1100包括1个处理器1110和1个存储器1120为例,此处,处理器1110和存储器1120分别用于指示一类器件或设备,具体实施例中,可以根据业务需求确定每种类型的器件或设备的数量。
当视频编码(或视频解码)装置通过硬件实现时,该硬件可以通过处理器或芯片实现。下面以该硬件为芯片为例进行说明,该芯片包括处理器,用于实现上述方法中编码端和/或解码端的功能。在一种可能的设计中,所述芯片还包括供电电路,用于为所述处理器供电。该芯片,可以直接由芯片构成,也可以包括芯片和其他分立器件。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机程序或指令。在计算机上加载和执行所述计算机程序或指令时,全部或部分地执行本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、网络设备、用户设备或者其它可编程装置。所述计算机程序或指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机程序或指令可以从一个网站站点、计算机、服务器或数据中心通过有线或无线方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是集成一个或多个可用介质的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,例如,软盘、硬盘、磁带;也可以是光介质,例如,数字视频光盘(digital video disc,DVD);还可以是半导体介质,例如,固态硬盘(solid state drive,SSD)。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到各种等效的修改或替换,这些修改或替换都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以权利要求的保护范围为准。
Claims (43)
- 一种视频解码方法,其特征在于,该方法由解码端执行,所述方法包括:获取码流和所述码流对应的渲染信息;所述码流包括多个图像帧,所述渲染信息用于指示:根据多个源图像生成所述码流的过程中所使用的处理参数,所述处理参数指示了所述多个图像帧中第一图像帧对应的源图像和第二图像帧对应的源图像具有至少部分重叠区域;根据由所述渲染信息确定的第一类滤波器,对所述第一图像帧对应的第一解码图像进行插值,获取所述第二图像帧的虚拟参考帧;其中,所述第一类滤波器为设定的多类滤波器中的一类;根据所述虚拟参考帧对所述第二图像帧进行解码,获取第二图像帧对应的第二解码图像。
- 根据权利要求1所述的方法,其特征在于,所述根据由所述渲染信息确定的第一类滤波器包括:根据所述渲染信息查询设定的映射表,获取所述渲染信息中每个处理参数对应的滤波器;所述映射表用于指示:所述每个处理参数对应的所述多类滤波器中至少一类滤波器;从所述渲染信息对应的所有滤波器中,确定第一类滤波器。
- 根据权利要求2所述的方法,其特征在于,所述映射表还用于指示:所述多类滤波器中每类滤波器的优先级;所述从所述渲染信息对应的所有滤波器中,确定第一类滤波器包括:获取所述渲染信息对应的所有滤波器的优先级;将所述所有滤波器中滤波器优先级最高的滤波器作为所述第一类滤波器。
- 根据权利要求1至3中任一项所述的方法,其特征在于,所述渲染信息包括:深度图、反照率图和后处理参数中的一种或多种的组合。
- 根据权利要求4所述的方法,其特征在于,若所述渲染信息包括所述深度图时,则所述第一类滤波器通过所述深度图中的像素与所述深度图中物体的边缘信息的关系来确定;或者,若所述渲染信息包括所述反照率图时,则所述第一类滤波器通过所述反照率图中的像素与所述反照率图中物体的边缘信息的关系来确定;或者,若所述渲染信息包括的所述后处理参数包含抗锯齿参数时,则所述第一类滤波器为所述抗锯齿参数所指示的滤波器;或者,若所述渲染信息包括的所述后处理参数包含执行运动模糊模块的指令时,则所述第一类滤波器为bilinear滤波器;或者,所述第一类滤波器为Catmull-Rom滤波器。
- 根据权利要求1至5中任一项所述的方法,其特征在于,在根据由所述渲染信息确定的第一类滤波器,对第一图像帧对应的第一解码图像进行插值之前,所述方法还包括:获取滤波器信息;根据所述滤波器信息中指示的滤波器种类及参数,确定所述第一类滤波器及第一类滤波器对应的滤波参数,所述滤波参数用于指示:基于所述第一类滤波器得到所述虚拟参考帧的过程中使用的处理参数。
- 根据权利要求1至6中任一项所述的方法,其特征在于,所述根据由所述渲染信息确定的第一类滤波器,对第一图像帧对应的第一解码图像进行插值,获取第二图像帧的虚拟参考帧包括:根据第一对应关系,在所述第一解码图像上进行插值,获取所述虚拟参考帧中所有像素的像素值;所述第一对应关系用于指示:在所述第一图像帧对应的源图像和第二图像帧对应的源图像的重叠区域中,第一图像帧对应的源图像中的像素和第二图像帧对应的源图像中的像素的位置映射关系;根据所述所有像素的像素值,得到所述虚拟参考帧。
- 根据权利要求7所述的方法,其特征在于,所述第一对应关系通过以下方法得到:当所述渲染信息中的图形运动矢量图与所述源图像的尺寸不匹配时,创建尺寸与所述源图像匹配的空图像;确定所述图形运动矢量图上的像素坐标与所述空图像上的像素坐标的第二对应关系;基于所述第二对应关系,利用设定的滤波器对所述图形运动矢量图进行插值,获取所述第一对应关系。
- 根据权利要求7或8所述的方法,其特征在于,所述方法还包括:基于所述渲染信息,确定所述第一解码图像中像素的目标离散参数;所述目标离散参数包括:目标方差或目标协方差;将与所述目标离散参数差值最小的第一离散参数对应的第一滤波核,作为所述第一类滤波器的参数,所述第一离散参数用于指示:第一类滤波器基于所述第一滤波核对所述像素进行插值,并基于插值后的像素计算得到的离散参数,所述第一滤波核为设定的多个滤波核中的一个,所述第一离散参数包括第一方差或第一协方差。
- 一种视频编码方法,其特征在于,该方法由编码端执行,所述方法包括:获取多个源图像和所述多个源图像对应的渲染信息;所述渲染信息用于指示:根据多个源图像生成码流的过程中所使用的处理参数,所述处理参数指示了所述多个源图像中的第一图像和第二图像具有至少部分重叠区域;根据由所述渲染信息确定的第一类滤波器,对第一解码图像进行插值,获取第二图像的虚拟参考帧;其中,第一类滤波器为设定的多类滤波器中的一类,所述第一解码图像为所述多个图像中第一图像进行编码后再解码获取的重建图像;基于所述虚拟参考帧对所述第二图像进行编码,得到所述第二图像对应的码流。
- 根据权利要求10所述的方法,其特征在于,所述根据由所述渲染信息确定的第一类滤波器包括:根据所述渲染信息查询设定的映射表,获取所述渲染信息中每个处理参数对应的滤波器;所述映射表用于指示:所述每个处理参数对应的所述多类滤波器中至少一类滤波器;从所述渲染信息对应的所有滤波器中,确定第一类滤波器。
- 根据权利要求11所述的方法,其特征在于,所述映射表还用于指示:所述多类滤波器中每类滤波器的优先级;所述从所述渲染信息对应的所有滤波器中,确定第一类滤波器包括:获取所述渲染信息对应的所有滤波器的优先级;将所述所有滤波器中滤波器优先级最高的滤波器作为所述第一类滤波器。
- 根据权利要求11所述的方法,其特征在于,所述从所述渲染信息对应的所有滤波器中,确定第一类滤波器包括:对所述渲染信息对应的所有滤波器进行预测,获取至少一个预测结果;一个预测结果对应所述所有滤波器中的一个滤波器,所述一个预测结果用于指示:基于所述一个滤波器对第二图像进行预编码获取的编码信息,所述编码信息包括:所述第二图像的预测码率和失真率中至少一种;选择所述至少一个预测结果中编码信息符合设定的条件的目标预测结果,并将所述目标预测结果对应的滤波器作为所述第一类滤波器。
- 根据权利要求10至13中任一项所述的方法,其特征在于,所述渲染信息包括:深度图、反照率图和后处理参数中的一种或多种的组合。
- 根据权利要求14所述的方法,其特征在于,若所述渲染信息包括所述深度图时,则所述第一类滤波器通过所述深度图中的像素与所述深度图中物体的边缘信息的关系来确定;或者,若所述渲染信息包括所述反照率图时,则所述第一类滤波器通过所述反照率图中的像素与所述反照率图中物体的边缘信息的关系来确定;或者,若所述渲染信息包括的所述后处理参数包含抗锯齿参数时,则所述第一类滤波器为所述抗锯齿参数所指示的滤波器;或者,若所述渲染信息包括的所述后处理参数包含执行运动模糊模块的指令时,则所述第一类滤波器为bilinear滤波器;或者,所述第一类滤波器为Catmull-Rom滤波器。
- 根据权利要求10至15中任一项所述的方法,其特征在于,所述根据由所述渲染信息确定的第一类滤波器,对第一解码图像进行插值,获取第二图像的虚拟参考帧包括:根据第一对应关系,在所述第一解码图像上进行插值,获取所述虚拟参考帧中所有像素的像素值;所述第一对应关系用于指示:在所述第一图像和所述第二图像的重叠区域中,第一图像中的像素与第二图像的像素的位置映射关系;根据所述所有像素的像素值,得到所述虚拟参考帧。
- 根据权利要求16所述的方法,其特征在于,所述第一对应关系通过以下方法得到:当所述渲染信息中的图形运动矢量图与所述源图像的尺寸不匹配时,创建尺寸与所述源图像匹配的空图像;确定所述图形运动矢量图上的像素坐标与所述空图像上的像素坐标的第二对应关系;基于所述第二对应关系,利用设定的滤波器对所述图形运动矢量图进行插值,获取第一对应关系。
- 根据权利要求16或17所述的方法,其特征在于,所述方法还包括:基于所述渲染信息,确定所述第一解码图像中像素的目标离散参数;所述目标离散参数包括:目标方差或目标协方差;将与所述目标离散参数差值最小的第一离散参数对应的第一滤波核,作为所述第一类滤波器的参数,所述第一离散参数用于指示:第一类滤波器基于所述第一滤波核对所述像素进行插值,并基于插值后的像素计算得到的离散参数,所述第一滤波核为设定的多个滤波核中的一个,所述第一离散参数包括第一方差或第一协方差。
- 根据权利要求10至18中任一项所述的方法,其特征在于,所述方法还包括:将所述第一类滤波器的类型信息写入所述码流。
- 一种视频解码装置,其特征在于,所述装置包括:第二获取模块,用于获取码流和所述码流对应的渲染信息;所述码流包括多个图像帧,所述渲染信息用于指示:根据多个源图像生成所述码流的过程中所使用的处理参数,所述处理参数指示了所述多个图像帧中第一图像帧对应的源图像和第二图像帧对应的源图像具有至少部分重叠区域;第二插值模块,用于根据由所述渲染信息确定的第一类滤波器,对所述第一图像帧对应的第一解码图像进行插值,获取所述第二图像帧的虚拟参考帧;其中,所述第一类滤波器为设定的多类滤波器中的一类;解码模块,用于根据所述虚拟参考帧对所述第二图像帧进行解码,获取第二图像帧对应的第二解码图像。
- 根据权利要求20所述的装置,其特征在于,所述第二插值模块,还用于根据所述渲染信息查询设定的映射表,获取所述渲染信息中每个处理参数对应的滤波器;所述映射表用于指示:所述每个处理参数对应的所述多类滤波器中至少一类滤波器;以及,从所述渲染信息对应的所有滤波器中,确定第一类滤波器。
- 根据权利要求21所述的装置,其特征在于,所述映射表还用于指示:所述多类滤波器中每类滤波器的优先级;所述第二插值模块,还用于获取所述渲染信息对应的所有滤波器的优先级;以及,将所述所有滤波器中滤波器优先级最高的滤波器作为所述第一类滤波器。
- 根据权利要求20至22中任一项所述的装置,其特征在于,所述渲染信息包括:深度图、反照率图和后处理参数中的一种或多种的组合。
- 根据权利要求23所述的装置,其特征在于,若所述渲染信息包括所述深度图时,则所述第一类滤波器通过所述深度图中的像素与所述深度图中物体的边缘信息的关系来确定;或者,若所述渲染信息包括所述反照率图时,则所述第一类滤波器通过所述反照率图中的像素与所述反照率图中物体的边缘信息的关系来确定;或者,若所述渲染信息包括的所述后处理参数包含抗锯齿参数时,则所述第一类滤波器为所述抗锯齿参数所指示的滤波器;或者,若所述渲染信息包括的所述后处理参数包含执行运动模糊模块的指令时,则所述第一类滤波器为bilinear滤波器;或者,所述第一类滤波器为Catmull-Rom滤波器。
- 根据权利要求20至24中任一项所述的装置,其特征在于,所述装置还包括:滤波器信息获取模块,用于获取滤波器信息;以及,根据所述滤波器信息中指示的滤波器种类及参数,确定所述第一类滤波器及第一类滤波器对应的滤波参数,所述滤波参数用于指示:基于所述第一类滤波器得到所述虚拟参考帧的过程中使用的处理参数。
- 根据权利要求20至25中任一项所述的装置,其特征在于,所述第二插值模块,还用于根据第一对应关系,在所述第一解码图像上进行插值,获取所述虚拟参考帧中所有像素的像素值;所述第一对应关系用于指示:在所述第一图像帧对应的源图像和第二图像帧对应的源图像的重叠区域中,第一图像帧对应的源图像中的像素和第二图像帧对应的源图像中的像素的位置映射关系;以及,根据所述所有像素的像素值,得到所述虚拟参考帧。
- 根据权利要求26所述的装置,其特征在于,所述第一对应关系通过以下方法得到:当所述渲染信息中的图形运动矢量图与所述源图像的尺寸不匹配时,创建尺寸与所述源图像匹配的空图像;确定所述图形运动矢量图上的像素坐标与所述空图像上的像素坐标的第二对应关系;基于所述第二对应关系,利用设定的滤波器对所述图形运动矢量图进行插值,获取所述第一对应关系。
- 根据权利要求26或27所述的装置,其特征在于,所述装置还包括:第二滤波核确定模块,用于基于所述渲染信息,确定所述第一解码图像中像素的目标离散参数;所述目标离散参数包括:目标方差或目标协方差;以及,将与所述目标离散参数差值最小的第一离散参数对应的第一滤波核,作为所述第一类滤波器的参数,所述第一离散参数用于指示:第一类滤波器基于所述第一滤波核对所述像素进行插值,并基于插值后的像素计算得到的离散参数,所述第一滤波核为设定的多个滤波核中的一个,所述第一离散参数包括第一方差或第一协方差。
- 一种视频编码装置,其特征在于,所述装置包括:第一获取模块,用于获取多个源图像和所述多个源图像对应的渲染信息;所述渲染信息用于指示:根据多个源图像生成码流的过程中所使用的处理参数,所述处理参数指示了所述多个源图像中的第一图像和第二图像具有至少部分重叠区域;第一插值模块,用于根据由所述渲染信息确定的第一类滤波器,对第一解码图像进行插值,获取第二图像的虚拟参考帧;其中,第一类滤波器为设定的多类滤波器中的一类,所述第一解码图像为所述多个图像中第一图像进行编码后再解码获取的重建图像;编码模块,用于基于所述虚拟参考帧对所述第二图像进行编码。
- 根据权利要求29所述的装置,其特征在于,所述第一插值模块,还用于根据所述渲染信息查询设定的映射表,获取所述渲染信息中每个处理参数对应的滤波器;所述映射表用于指示:所述每个处理参数对应的所述多类滤波器中至少一类滤波器;以及,从所述渲染信息对应的所有滤波器中,确定第一类滤波器。
- 根据权利要求30所述的装置,其特征在于,所述映射表还用于指示:所述多类滤波器中每类滤波器的优先级;所述第一插值模块,还用于获取所述渲染信息对应的所有滤波器的优先级;以及,将所述所有滤波器中滤波器优先级最高的滤波器作为所述第一类滤波器。
- 根据权利要求30所述的装置,其特征在于,所述第一插值模块,还用于对所述渲染信息对应的所有滤波器进行预测,获取至少一个预测结果;一个预测结果对应所述所有滤波器中的一个滤波器,所述一个预测结果用于指示:基于所述一个滤波器对第二图像进行预编码获取的编码信息,所述编码信息包括:所述第二图像的预测码率和失真率中至少一种;以及,选择所述至少一个预测结果中编码信息符合设定的条件的目标预测结果,并将所述目标预测结果对应的滤波器作为所述第一类滤波器。
- 根据权利要求29至32中任一项所述的装置,其特征在于,所述渲染信息包括:深度图、反照率图和后处理参数中的一种或多种的组合。
- 根据权利要求33所述的装置,其特征在于,若所述渲染信息包括所述深度图时,则所述第一类滤波器通过所述深度图中的像素与所述深度图中物体的边缘信息的关系来确定;或者,若所述渲染信息包括所述反照率图时,则所述第一类滤波器通过所述反照率图中的像素与所述反照率图中物体的边缘信息的关系来确定;或者,若所述渲染信息包括的所述后处理参数包含抗锯齿参数时,则所述第一类滤波器为所述抗锯齿参数所指示的滤波器;或者,若所述渲染信息包括的所述后处理参数包含执行运动模糊模块的指令时,则所述第一类滤波器为bilinear滤波器;或者,所述第一类滤波器为Catmull-Rom滤波器。
- 根据权利要求29至34中任一项所述的装置,其特征在于,所述第一插值模块,还用于根据第一对应关系,在所述第一解码图像上进行插值,获取所述虚拟参考帧中所有像素的像素值;所述第一对应关系用于指示:在所述第一图像和所述第二图像的重叠区域中,第一图像中的像素与第二图像的像素的位置映射关系;以及,根据所述所有像素的像素值,得到所述虚拟参考帧。
- 根据权利要求35所述的装置,其特征在于,所述第一对应关系通过以下方法得到:当所述渲染信息中的图形运动矢量图与所述源图像的尺寸不匹配时,创建尺寸与所述源图像匹配的空图像;确定所述图形运动矢量图上的像素坐标与所述空图像上的像素坐标的第二对应关系;基于所述第二对应关系,利用设定的滤波器对所述图形运动矢量图进行插值,获取第一对应关系。
- 根据权利要求35或36所述的装置,其特征在于,所述装置还包括:第一滤波核确定模块,用于基于所述渲染信息,确定所述第一解码图像中像素的目标离散参数;所述目标离散参数包括:目标方差或目标协方差;以及,将与所述目标离散参数差值最小的第一离散参数对应的第一滤波核,作为所述第一类滤波器的参数,所述第一离散参数用于指示:第一类滤波器基于所述第一滤波核对所述像素进行插值,并基于插值后的像素计算得到的离散参数,所述第一滤波核为设定的多个滤波核中的一个,所述第一离散参数包括第一方差或第一协方差。
- 根据权利要求29至37中任一项所述的装置,其特征在于,所述编码模块还用于:将所述第一类滤波器的类型信息写入所述码流。
- 一种芯片,其特征在于,包括:处理器和供电电路;所述供电电路用于为所述处理器供电;所述处理器用于执行权利要求1至9中任一所述的方法;和/或,所述处理器用于执行权利要求10至19中任一项所述的方法。
- 一种编解码器,其特征在于,包括存储器和处理器,所述存储器用于存储计算机指令;所述处理器执行所述计算机指令时,实现权利要求1至9中任一项所述的方法;和/或,所述处理器执行所述计算机指令时,实现权利要求10至19中任一项所述的方法。
- 一种编解码系统,其特征在于,包括编码端和解码端;所述编码端用于根据多个源图像对应的渲染信息,对多个图像进行编码,得到多个图像对应的码流,实现权利要求10至19中任一项所述的方法;所述解码端用于根据多个源图像对应的渲染信息,对所述码流进行解码,得到多个解码图像,实现权利要求1至9中任一项所述的方法。
- 一种计算机可读存储介质,其特征在于,所述存储介质中存储有计算机程序或指令,当所述计算机程序或指令被处理设备执行时,实现权利要求1至9中任一项所述的方法;和/或,当所述计算机程序或指令被处理设备执行时,实现权利要求10至19中任一项所述的方法。
- 一种计算机程序产品,包括计算机程序或指令,其特征在于,当所述计算机程序或指令在被处理设备执行时,实现权利要求1至9中任一项所述的方法;和/或,当所述计算机程序或指令在被处理设备执行时,实现权利要求10至19中任一项所述的方法。
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211478458.9 | 2022-11-23 | ||
CN202211478458.9A CN118075458A (zh) | 2022-11-23 | 2022-11-23 | 视频编解码方法及装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024108931A1 true WO2024108931A1 (zh) | 2024-05-30 |
Family
ID=91102795
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/094873 WO2024108931A1 (zh) | 2022-11-23 | 2023-05-17 | 视频编解码方法及装置 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN118075458A (zh) |
WO (1) | WO2024108931A1 (zh) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060062298A1 (en) * | 2004-09-23 | 2006-03-23 | Park Seung W | Method for encoding and decoding video signals |
CN104641641A (zh) * | 2012-09-28 | 2015-05-20 | 索尼公司 | 编码装置、编码方法、解码装置和解码方法 |
CN105659601A (zh) * | 2013-10-11 | 2016-06-08 | 索尼公司 | 图像处理装置和图像处理方法 |
CN106658019A (zh) * | 2015-10-31 | 2017-05-10 | 华为技术有限公司 | 参考帧编解码的方法与装置 |
CN108632609A (zh) * | 2016-03-17 | 2018-10-09 | 联发科技股份有限公司 | 视频编解码装置及方法 |
CN114071161A (zh) * | 2020-07-29 | 2022-02-18 | Oppo广东移动通信有限公司 | 图像编码方法、图像解码方法及相关装置 |
Also Published As
Publication number | Publication date |
---|---|
CN118075458A (zh) | 2024-05-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101927283B1 (ko) | 비디오 코딩을 위한 저 복잡성 인트라 예측 | |
CN111614956B (zh) | Dc系数符号代码化方案 | |
WO2022068682A1 (zh) | 图像处理方法及装置 | |
JP6761033B2 (ja) | 前フレーム残差を用いた動きベクトル予測 | |
WO2021238540A1 (zh) | 图像编码方法、图像解码方法及相关装置 | |
WO2021185257A1 (zh) | 图像编码方法、图像解码方法及相关装置 | |
JP6285025B2 (ja) | 水平および垂直変換の並行処理 | |
CN113766249A (zh) | 视频编解码中的环路滤波方法、装置、设备及存储介质 | |
WO2021244197A1 (zh) | 图像编码方法、图像解码方法及相关装置 | |
Žádník et al. | Image and video coding techniques for ultra-low latency | |
CN114071161B (zh) | 图像编码方法、图像解码方法及相关装置 | |
CN115769573A (zh) | 编码方法、解码方法及相关装置 | |
AU2023202986A1 (en) | Method and apparatus for intra prediction | |
WO2022266955A1 (zh) | 图像解码及处理方法、装置及设备 | |
WO2024108931A1 (zh) | 视频编解码方法及装置 | |
CN114071162A (zh) | 图像编码方法、图像解码方法及相关装置 | |
CN115866297A (zh) | 视频处理方法、装置、设备及存储介质 | |
CN111953972A (zh) | IBC模式下的Hash表构建方法、装置、设备 | |
EP4226325A1 (en) | A method and apparatus for encoding or decoding a picture using a neural network | |
JP2018525901A (ja) | ディスプレイストリーム圧縮における変換モード用ブロックサイズの変更 | |
US10034007B2 (en) | Non-subsampled encoding techniques | |
CN118075459A (zh) | 一种视频编解码方法及装置 | |
WO2023185806A9 (zh) | 一种图像编解码方法、装置、电子设备及存储介质 | |
WO2024078403A1 (zh) | 图像处理方法、装置及设备 | |
WO2023197717A1 (zh) | 一种图像解码方法、编码方法及装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23893032; Country of ref document: EP; Kind code of ref document: A1 |