WO2023115489A1 - Encoding/decoding method, code stream, apparatus, device, and readable storage medium

Encoding/decoding method, code stream, apparatus, device, and readable storage medium

Info

Publication number
WO2023115489A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
image
mosaic
format
metadata
Prior art date
Application number
PCT/CN2021/140985
Other languages
English (en)
French (fr)
Inventor
虞露
王楚楚
李思成
白雨箫
戴震宇
Original Assignee
Zhejiang University
Guangdong OPPO Mobile Telecommunications Corp., Ltd.
Priority date
Filing date
Publication date
Application filed by Zhejiang University and Guangdong OPPO Mobile Telecommunications Corp., Ltd.
Priority to PCT/CN2021/140985
Publication of WO2023115489A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46: Embedding additional information in the video signal during the compression process
    • H04N19/70: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • The embodiments of the present application relate to the field of virtual-real hybrid technology, and in particular, to an encoding/decoding method, a code stream, an apparatus, a device, and a readable storage medium.
  • Point cloud data, as an important and popular representation of 3D objects, is widely used in many fields such as virtual and mixed reality, autonomous driving, and 3D printing. Compared with traditional two-dimensional image data, point cloud data contains more vivid detail information, which makes the amount of point cloud data very large.
  • Embodiments of the present application provide an encoding/decoding method, a code stream, an apparatus, a device, and a readable storage medium, which can not only reduce the number of video decoders required and make full use of the pixel processing rate of video decoders, but also improve the synthesis quality of video images.
  • the embodiment of the present application provides a decoding method, which includes:
  • according to the code stream, obtaining mosaic atlas information and video data to be decoded;
  • performing metadata decoding on the mosaic atlas information to obtain auxiliary information in at least two heterogeneous formats;
  • performing video decoding on the video data to be decoded to obtain a spliced image; wherein the spliced image is composed of image sub-blocks corresponding to at least two heterogeneous formats.
  • the embodiment of the present application provides an encoding method, which includes:
  • acquiring image sub-blocks corresponding to visual data in at least two heterogeneous formats;
  • splicing the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats to obtain mosaic atlas information and a mosaic image;
  • encoding the mosaic atlas information and the mosaic image, and writing the obtained coded bits into a code stream.
  • the embodiment of the present application provides a code stream, which is generated by bit coding according to the information to be encoded; wherein the information to be encoded includes at least one of the following: mosaic atlas information, a mosaic image, and the value of syntax element identification information.
  • an embodiment of the present application provides an encoding device, which includes a first acquisition unit, a splicing unit, and an encoding unit; wherein,
  • the first acquisition unit is configured to acquire image sub-blocks corresponding to visual data in at least two heterogeneous formats;
  • the splicing unit is configured to splice the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats to obtain mosaic atlas information and a mosaic image;
  • the encoding unit is configured to encode the mosaic atlas information and the mosaic image, and write the obtained coded bits into a code stream.
  • an embodiment of the present application provides an encoding device, where the encoding device includes a first memory and a first processor; wherein,
  • a first memory for storing a computer program capable of running on the first processor
  • the first processor is configured to execute the method as described in the second aspect when running the computer program.
  • an embodiment of the present application provides a decoding device, which includes a second acquisition unit, a metadata decoding unit, and a video decoding unit; wherein,
  • the second obtaining unit is configured to obtain mosaic atlas information and video data to be decoded according to the code stream;
  • the metadata decoding unit is configured to decode the metadata of the mosaic atlas information to obtain auxiliary information in at least two heterogeneous formats
  • the video decoding unit is configured to perform video decoding on the video data to be decoded to obtain a spliced image; wherein, the spliced image is composed of image sub-blocks corresponding to at least two heterogeneous formats.
  • the embodiment of the present application provides a decoding device, where the decoding device includes a second memory and a second processor; wherein,
  • a second memory for storing a computer program capable of running on the second processor
  • the second processor is configured to execute the method as described in the first aspect when running the computer program.
  • the embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed, the method described in the first aspect or the method described in the second aspect is implemented.
  • Embodiments of the present application provide an encoding/decoding method, a code stream, an apparatus, a device, and a readable storage medium.
  • On the encoding side, image sub-blocks corresponding to visual data in at least two heterogeneous formats are acquired; the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats are spliced to obtain mosaic atlas information and a mosaic image; the mosaic atlas information and the mosaic image are encoded, and the obtained coded bits are written into the code stream.
  • On the decoding side, the mosaic atlas information and the video data to be decoded are obtained according to the code stream; metadata decoding is performed on the mosaic atlas information to obtain auxiliary information in at least two heterogeneous formats; video decoding is performed on the video data to be decoded to obtain a spliced image; wherein the spliced image is composed of image sub-blocks corresponding to at least two heterogeneous formats.
  • FIG. 1A is a schematic diagram of a synthesis framework based on a data format;
  • FIG. 1B is a schematic diagram of another synthesis framework based on a data format;
  • FIG. 2 is a schematic diagram of an encoding method and a decoding method based on a data format;
  • FIG. 3A is a detailed schematic diagram of a video encoder provided by an embodiment of the present application;
  • FIG. 3B is a detailed schematic diagram of a video decoder provided by an embodiment of the present application;
  • FIG. 4 is a schematic flowchart of a decoding method provided by an embodiment of the present application;
  • FIG. 5 is a schematic flowchart of another decoding method provided by an embodiment of the present application;
  • FIG. 6 is a schematic flowchart of another decoding method provided by an embodiment of the present application;
  • FIG. 7 is a schematic flowchart of an encoding method provided by an embodiment of the present application;
  • FIG. 8 is a schematic flowchart of another encoding method provided by an embodiment of the present application;
  • FIG. 9 is a schematic diagram of the composition and structure of an encoding device provided by an embodiment of the present application;
  • FIG. 10 is a schematic diagram of a specific hardware structure of an encoding device provided by an embodiment of the present application;
  • FIG. 11 is a schematic diagram of the composition and structure of a decoding device provided by an embodiment of the present application;
  • FIG. 12 is a schematic diagram of a specific hardware structure of a decoding device provided by an embodiment of the present application;
  • FIG. 13 is a schematic diagram of the composition and structure of an encoding and decoding system provided by an embodiment of the present application.
  • references to "some embodiments" describe a subset of all possible embodiments; it should be understood that "some embodiments" may be the same subset or a different subset of all possible embodiments, and may be combined with each other when there is no conflict.
  • "first", "second", and "third" in the embodiments of the present application are only used to distinguish similar objects and do not represent a specific ordering of objects. Understandably, the specific order or sequence of "first", "second", and "third" may be interchanged where permitted, so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein.
  • MPEG: Moving Picture Experts Group
  • V3C: Visual Volumetric Video-based Coding
  • MIV: MPEG Immersive Video
  • V-PCC: Video-based Point Cloud Compression
  • homogeneous data formats are defined as data formats with the same source expression;
  • heterogeneous data formats are defined as data formats with different source expressions.
  • a source with a homogeneous data format may be referred to as a homogeneous source for short
  • a source with a heterogeneous data format may be referred to as a heterogeneous source for short.
  • FIG. 1A shows a schematic diagram of a synthesis framework based on a data format.
  • both format 0 and format 1 are image formats, that is, format 0 and format 1 are homogeneous data formats; format 2 is a point cloud format and format 3 is a mesh format, that is, format 2 and format 3 are heterogeneous data formats. That is to say, in FIG. 1A, two heterogeneous data formats (i.e., format 2 and format 3) are combined with the homogeneous data formats (i.e., format 0 and format 1) in the scene. In this way, real-time immersive video interaction services can be provided for multiple data formats (e.g., meshes, point clouds, images, etc.) from different sources.
  • FIG. 1B shows a schematic diagram of another synthesis framework based on data format.
  • point clouds and images are heterogeneous data formats, which can be combined here, and then independently encoded and decoded based on the data format method.
  • the point cloud format is processed by non-uniform sampling, while the image format is processed by uniform sampling.
  • the method based on the data format may allow independent processing at the bit stream level of the data format. That is, like tiles or slices in video coding, different data formats in this scene can be encoded in an independent manner, so that independent encoding and decoding can be performed based on the data format.
  • FIG. 2 shows a schematic diagram of a data-format-based encoding method and decoding method. As shown in FIG. 2, (a) shows the flow of the encoding method, and (b) shows the flow of the decoding method.
  • each of format 0 to format 3 can be encoded separately. Assuming that these formats share a common 3D scene, data formats from different sources (for example, format 2 and format 3) must first be converted to an image format before encoding; specifically, the mesh format needs to be converted into an image format, and the point cloud format also needs to be converted into an image format. Each is then encoded by a data-format-based metadata encoder to generate a bitstream (also called a "code stream").
  • the metadata decoder based on the data format decodes the received bitstream.
  • the bitstream separately encoded based on the data format needs to be synthesized into the scene during the content synthesis process.
  • certain data formats can be filtered from rendering.
  • a foreign data format (or bitstream) can be added to the compositing process if the foreign data format can share the same scene. Assuming that these data formats share a common 3D scene, data formats from different sources (e.g., format 2 and format 3) must also be converted to a data format with the same source expression before encoding and subsequent processing.
  • each data format can be independently described in the content description by enabling independent encoding/decoding based on the data format. Therefore, related technologies have proposed that heterogeneous data formats (e.g., mesh, point cloud, etc.) can be converted into image formats (also called "multi-viewpoint planar image formats" or "image plane formats"), which can then be rendered as a new data format with metadata encoding and decoding methods; it has even been proposed to support virtual-real mixing at the system layer, such as multiplexing the code stream of the point cloud format and the code stream of the image format at the system layer (Multiplex).
  • an atlas contains both image patches and point cloud patches. If the point cloud is projected into an image, encoded and decoded, and the viewpoint image to be viewed is rendered based on the reconstructed image after decoding, the point cloud actually contains sufficient information for continuous multi-viewpoint viewing, but the projection before encoding limits this.
  • each data format forms an independent code stream; the multiple code streams of different data formats are multiplexed by the system layer into a composite system-layer code stream, and the independent code stream corresponding to each data format calls at least one video codec. This leads to an increase in the number of video codecs, thereby increasing the implementation cost.
  • To this end, the embodiment of the present application provides a decoding method: obtaining the mosaic atlas information and the video data to be decoded according to the code stream; performing metadata decoding on the mosaic atlas information to obtain auxiliary information in at least two heterogeneous formats; and performing video decoding on the video data to be decoded to obtain a spliced image, where the spliced image is composed of image sub-blocks corresponding to at least two heterogeneous formats.
  • the embodiment of the present application also provides an encoding method: obtaining image sub-blocks corresponding to visual data in at least two heterogeneous formats; splicing the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats to obtain mosaic atlas information and a mosaic image; encoding the mosaic atlas information and the mosaic image, and writing the obtained coded bits into a code stream.
  • the video encoder 10 includes a transform and quantization unit 101, an intra-frame estimation unit 102, an intra-frame prediction unit 103, a motion compensation unit 104, a motion estimation unit 105, an inverse transform and inverse quantization unit 106, a filter control analysis unit 107, a filter unit 108, an encoding unit 109, a decoded image buffer unit 110, and the like; the filter unit 108 can implement deblocking filtering and sample adaptive offset (SAO) filtering, and the encoding unit 109 can implement header information coding and context-based adaptive binary arithmetic coding (CABAC).
  • a video coding block can be obtained by dividing a coding tree unit (CTU); the residual pixel information obtained after intra-frame or inter-frame prediction is then processed by the transform and quantization unit 101, which transforms the residual information from the pixel domain to the transform domain and quantizes the resulting transform coefficients to further reduce the bit rate;
  • the intra-frame estimation unit 102 and the intra-frame prediction unit 103 are used to perform intra-frame prediction on the video coding block; specifically, the intra-frame estimation unit 102 and the intra-frame prediction unit 103 are used to determine the intra-frame prediction mode to be used to encode the video coding block;
  • the motion compensation unit 104 and the motion estimation unit 105 are used to perform inter-frame predictive encoding of the received video coding block relative to one or more blocks in one or more reference frames, so as to provide temporal prediction information;
  • the motion estimation performed by the motion estimation unit 105 is the process of generating motion vectors, which can estimate the motion of the video coding block;
  • the context content can be based on adjacent coding blocks and can be used to encode information indicating the determined intra-frame prediction mode, so as to output the code stream of the video signal; the decoded image buffer unit 110 is used to store reconstructed video coding blocks for prediction reference. As video image encoding progresses, new reconstructed video coding blocks are continuously generated, and these reconstructed video coding blocks are stored in the decoded image buffer unit 110.
  • the video decoder 20 includes a decoding unit 201, an inverse transform and inverse quantization unit 202, an intra prediction unit 203, a motion compensation unit 204, a filtering unit 205, and a decoded image buffer unit 206, etc., wherein the decoding unit 201 can implement header information decoding and CABAC decoding, and filtering unit 205 can implement deblocking filtering and SAO filtering.
  • the code stream of the video signal is input into the video decoder 20 and first passes through the decoding unit 201 to obtain decoded transform coefficients;
  • the transform coefficients are processed by the inverse transform and inverse quantization unit 202 to generate a residual block in the pixel domain; the intra prediction unit 203 can generate prediction data for the current video decoding block based on the determined intra prediction mode and data from previously decoded blocks of the current frame or picture;
  • the motion compensation unit 204 determines the prediction information for the video decoding block by parsing motion vectors and other associated syntax elements, and uses the prediction information to generate the predictive block of the video decoding block being decoded; a decoded video block is formed by summing the residual block from the inverse transform and inverse quantization unit 202 with the corresponding predictive block produced by the intra prediction unit 203 or the motion compensation unit 204;
  • the decoded video signal passes through the filtering unit 205 to remove blocking artifacts and improve video quality; the decoded video blocks are then stored in the decoded image buffer unit 206, which stores reference images for subsequent intra prediction or motion compensation and is also used for output of the video signal, that is, the restored original video signal is obtained.
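  • To make this reconstruction loop concrete, the following Python-style sketch shows the per-block data flow described above; every unit is injected as a callable, and all names are illustrative placeholders rather than the API of any real decoder.

```python
# Minimal sketch of the block-level reconstruction loop described above.
# The codec units are passed in as callables; nothing here is a real API.
def decode_block(bits, entropy_decode, inverse_quant_transform,
                 predict, loop_filter, decoded_picture_buffer):
    coeffs, mode_info = entropy_decode(bits)                  # decoding unit 201
    residual = inverse_quant_transform(coeffs)                # unit 202
    prediction = predict(mode_info, decoded_picture_buffer)   # unit 203 or 204
    block = loop_filter(residual + prediction)                # unit 205: deblock + SAO
    decoded_picture_buffer.append(block)                      # buffer unit 206
    return block
```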
  • FIG. 4 shows a schematic flowchart of a decoding method provided in an embodiment of the present application.
  • the method may include:
  • S401 According to the code stream, obtain mosaic atlas information and video data to be decoded.
  • S402 Perform metadata decoding on the mosaic atlas information to obtain auxiliary information in at least two heterogeneous formats.
  • S403 Perform video decoding on the video data to be decoded to obtain a spliced image; wherein, the spliced image is composed of image sub-blocks corresponding to at least two heterogeneous formats.
  • image sub-blocks corresponding to different heterogeneous formats such as point cloud and image can coexist in one spliced image.
  • only one video decoder is needed to decode the image sub-blocks corresponding to the at least two heterogeneous formats, thereby reducing the demand for video decoders.
  • auxiliary information for different heterogeneous formats such as point clouds and images can coexist on the same atlas; in the mosaic atlas information, the auxiliary information for each heterogeneous format can be decoded by calling the corresponding metadata decoder, so that the rendering characteristics of the different heterogeneous formats can be preserved.
  • one video decoder is used for sequences belonging to the same mosaic image, while different mosaic images at the same moment belong to different sequences.
  • the heterogeneous formats described in this embodiment of the present application may refer to different sources of data, or may refer to processing the same source into different data formats, which is not limited here.
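  • The overall flow of S401 to S403 can be sketched as follows; the demultiplexer, per-format metadata decoders, and the single video decoder are injected as callables, since the text above does not fix any concrete API.

```python
# Sketch of the decoding flow S401-S403; all dependencies are injected.
def decode(code_stream, demux, metadata_decoders, video_decoder):
    # S401: obtain mosaic atlas information and video data to be decoded.
    atlas_info, video_data = demux(code_stream)
    # S402: one metadata decoder per heterogeneous format,
    # e.g. "image" and "point_cloud".
    auxiliary = {fmt: metadata_decoders[fmt](meta)
                 for fmt, meta in atlas_info.items()}
    # S403: a single video decoder reconstructs the spliced image that
    # carries the image sub-blocks of all heterogeneous formats.
    spliced_image = video_decoder(video_data)
    return auxiliary, spliced_image
```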
  • the mosaic atlas information may be formed by splicing the auxiliary information of visual data in at least two heterogeneous formats. Therefore, in some embodiments, for S402, performing metadata decoding on the mosaic atlas information to obtain auxiliary information in at least two heterogeneous formats may include:
  • the mosaic atlas information may include auxiliary information in at least two heterogeneous formats, and the auxiliary information for each heterogeneous format may be decoded using the corresponding metadata decoder.
  • the at least two heterogeneous formats may include a first data format and a second data format.
  • the metadata decoding of the mosaic atlas information to obtain auxiliary information in at least two heterogeneous formats may include:
  • if the information to be decoded is the information corresponding to the first data format in the mosaic atlas information, calling the metadata decoder corresponding to the first data format to decode it, obtaining the auxiliary information corresponding to the first data format;
  • if the information to be decoded is the information corresponding to the second data format in the mosaic atlas information, calling the metadata decoder corresponding to the second data format to decode it, obtaining the auxiliary information corresponding to the second data format.
  • the image sub-blocks corresponding to the first data format and the second data format coexisting in a spliced image may be decoded by a video decoder.
  • when decoding the information corresponding to different data formats in the mosaic atlas information, if the information currently to be decoded corresponds to the first data format, the metadata decoder corresponding to the first data format needs to be called for decoding to obtain the auxiliary information corresponding to the first data format; if the information currently to be decoded corresponds to the second data format, the metadata decoder corresponding to the second data format needs to be called for decoding to obtain the auxiliary information corresponding to the second data format.
  • the at least two heterogeneous formats may further include a third data format.
  • the decoding of the metadata of the mosaic atlas information to obtain the auxiliary information of at least two heterogeneous formats may also include:
  • if the information to be decoded is the information corresponding to the third data format in the mosaic atlas information, calling the metadata decoder corresponding to the third data format for decoding to obtain the auxiliary information corresponding to the third data format;
  • the at least two heterogeneous formats are not limited to the first data format and the second data format, and may even include a third data format, a fourth data format, etc.; when decoding the auxiliary information of a certain data format, it is only necessary to call the corresponding metadata decoder for decoding.
  • the following only uses the first data format and the second data format as examples for illustration.
  • the first data format is an image format
  • the second data format is a point cloud format.
  • the following steps may be included:
  • if the information to be decoded is the information corresponding to the image format in the mosaic atlas information, calling a multi-view decoder for decoding to obtain the auxiliary information corresponding to the image format; if the information to be decoded is the information corresponding to the point cloud format in the mosaic atlas information, calling a point cloud decoder for decoding to obtain the auxiliary information corresponding to the point cloud format.
  • the first data format and the second data format are different.
  • the first data format can be an image format
  • the second data format can be a point cloud format
  • if the projection formats of the first data format and the second data format are different, the first data format may be a perspective projection format and the second data format may be an orthographic projection format; alternatively, the first data format may also be a mesh format, a point cloud format, etc., and the second data format may also be a mesh format, an image format, etc., which is not limited here.
  • the point cloud format is processed by non-uniform sampling
  • the image format is processed by uniform sampling. Therefore, the point cloud format and the image format can be regarded as two heterogeneous formats.
  • for the image format, the multi-view decoder can be called for decoding; for the point cloud format, the point cloud decoder can be called for decoding.
  • performing video decoding on the video data to be decoded to obtain a spliced image may include:
  • calling the video decoder to perform video decoding on the video data to be decoded to obtain a spliced image; wherein the number of video decoders is one.
  • image sub-blocks corresponding to at least two heterogeneous formats coexisting in a spliced image may be obtained by decoding with one video decoder.
  • the number of video decoders that need to be called in the embodiment of the present application is small, and the pixel processing rate of the video decoders can be fully utilized, so that the hardware requirements are reduced.
  • the image sub-blocks corresponding to multiple heterogeneous formats in the mosaic image can be decoded by one video decoder; for the auxiliary information of these heterogeneous formats in the mosaic atlas information, the respective metadata decoders can be called for decoding to obtain the auxiliary information corresponding to the different heterogeneous formats.
  • if the information corresponding to the point cloud format in the mosaic atlas information needs to be decoded, the point cloud decoder can be called to obtain the auxiliary information corresponding to the point cloud format; if the information corresponding to the image format in the mosaic atlas information needs to be decoded, the multi-view decoder can be called to obtain the auxiliary information corresponding to the image format, and so on, which is not limited in this embodiment of the present application.
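  • Because the single video decoder outputs one spliced image, the per-format sub-blocks are recovered afterwards using the decoded atlas information. A minimal sketch, assuming each patch is a dict with placement fields and an owning format (the field names are assumptions):

```python
# Sketch: split the decoded spliced image back into per-format sub-blocks
# using the decoded mosaic atlas information. Patch fields are assumed.
def extract_sub_blocks(spliced_image, patches):
    out = {}
    for p in patches:  # e.g. {"format": "point_cloud", "x": 0, "y": 0, "w": 64, "h": 64}
        block = spliced_image[p["y"]:p["y"] + p["h"],
                              p["x"]:p["x"] + p["w"]]
        out.setdefault(p["format"], []).append(block)
    return out
```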
  • the method may further include:
  • S601 Perform rendering processing on the spliced image by using auxiliary information in at least two heterogeneous formats to obtain a target three-dimensional image.
  • image sub-blocks corresponding to at least two heterogeneous formats can coexist in one spliced image, and the spliced image is decoded using one video decoder, thereby reducing the number of video decoders;
  • for the auxiliary information of the at least two heterogeneous formats, the corresponding metadata decoders can be called for decoding, so that the rendering advantages from different data formats (such as image formats and point cloud formats) can be preserved, and the image synthesis quality can also be improved.
  • the point cloud decoding standard shown in Table 2 stipulates that when the flag bit of the syntax element asps_vpcc_extension_present_flag is true (or takes the value 1), the flag bits of the relevant syntax elements involved in the extension of the image decoding standard (the syntax elements highlighted in gray) are all false (or take the value 0).
  • Therefore, neither the point cloud decoding standard (such as the V-PCC standard) nor the image decoding standard (such as the MIV standard) supports both of them being true at the same time.
  • the embodiment of the present application provides a decoding method that allows image sub-blocks in different data formats such as point clouds and images to coexist in one spliced image, so as to realize the aforementioned saving in the number of video decoders, and can also preserve the rendering characteristics from different data formats such as image formats and point cloud formats, improving the quality of image synthesis.
  • the embodiment of the present application provides a target syntax element overview table (Profile), which is used to indicate that image sub-blocks corresponding to at least two heterogeneous formats can coexist in one spliced image.
  • the embodiment of the present application can realize decoding processing by one video decoder.
  • the target syntax element overview table may be obtained by extending the initial syntax element overview table. That is to say, the target syntax element overview table may be composed of an initial overview part and a mixed overview part.
  • the initial overview part is used to indicate that the image sub-block corresponding to the image format and the image sub-block corresponding to the point cloud format do not support coexistence in one spliced image;
  • the mixed overview part is used to indicate that image sub-blocks corresponding to the image format and image sub-blocks corresponding to the point cloud format are supported to coexist in one spliced image.
  • the initial syntax element overview table (or the initial overview part) only supports the image sub-blocks corresponding to the image format, and clearly indicates that image sub-blocks corresponding to the image format and image sub-blocks corresponding to the point cloud format cannot coexist in one spliced image;
  • the target syntax element overview table can support image sub-blocks corresponding to the image format and image sub-blocks corresponding to the point cloud format coexisting in one spliced image, as shown in Table 3.
  • Table 3 is obtained on the basis of the existing overview of MIV syntax elements in the standard, and the part in gray is the content of the mixed overview part newly added in the embodiment of the present application.
  • Table 3 provides an example of an overview table of target syntax elements.
  • the target syntax element overview table is just a specific example; apart from the flag bit of the syntax element vps_occupancy_video_present_flag[atlasID] being fixed to 1 (because of the point cloud projection method, there must be occupancy information), the restrictions on the flag bits of some other syntax elements can be relaxed; for example, the syntax element ai_attribute_count[atlasID] can be left unconstrained (in addition to texture and transparency, point clouds also support attributes such as reflectivity and material).
  • Table 3 is just an example, which is not specifically limited in this embodiment of the present application.
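  • As a hedged illustration of how such profile constraints might be checked, consider the sketch below; the checker itself is an assumption made for exposition, with only the two constraints named above taken from the text.

```python
# Hypothetical conformance check for the mixed profile described above.
def check_mixed_profile(flags, attribute_count):
    # vps_occupancy_video_present_flag[atlasID] is fixed to 1: point cloud
    # projection always produces occupancy information.
    if flags.get("vps_occupancy_video_present_flag") != 1:
        raise ValueError("mixed profile requires occupancy video")
    # ai_attribute_count[atlasID] is deliberately unconstrained: point clouds
    # may carry reflectance, material, etc. beyond texture and transparency,
    # so no bound is enforced on attribute_count here.
    return True
```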
  • the method may also include:
  • obtaining the value of the syntax element identification information according to the code stream; if the syntax element identification information indicates that image sub-blocks corresponding to at least two heterogeneous formats do not support coexistence in the spliced image in the initial overview part, and that image sub-blocks corresponding to at least two heterogeneous formats are supported to coexist in the spliced image in the mixed overview part, then executing the step of obtaining mosaic atlas information and video data to be decoded according to the code stream.
  • obtaining the value of the syntax element identification information according to the code stream may include:
  • if the value of the syntax element identification information is the first value in the initial overview part, determining that the syntax element identification information indicates that image sub-blocks corresponding to at least two heterogeneous formats do not support coexistence in the spliced image in the initial overview part;
  • if the value of the syntax element identification information is the second value in the mixed overview part, determining that the syntax element identification information indicates that image sub-blocks corresponding to at least two heterogeneous formats are supported to coexist in the spliced image in the mixed overview part.
  • the method may further include: if the value of the syntax element identification information is the second value in the initial overview part, determining that the syntax element identification information indicates that image sub-blocks corresponding to at least two heterogeneous formats are supported to coexist in the spliced image in the initial overview part; or, if the value of the syntax element identification information is the first value in the mixed overview part, determining that the syntax element identification information indicates that image sub-blocks corresponding to at least two heterogeneous formats do not support coexistence in the spliced image in the mixed overview part.
  • the first value and the second value are different.
  • the first value is equal to 0, and the second value is equal to 1; or, the first value is equal to 1, and the second value is equal to 0; or, the first value is false (false), the second value is true (true), and so on.
  • the first value is equal to 0, and the second value is equal to 1, but there is no limitation here.
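  • A small sketch of how a decoder might gate on these values, using the first value = 0 / second value = 1 convention above (the dict-based representation is an assumption):

```python
FIRST_VALUE, SECOND_VALUE = 0, 1   # convention used in this embodiment

def should_decode_mixed(syntax_ids):
    # Initial overview part: coexistence not supported (first value).
    # Mixed overview part: coexistence supported (second value).
    return (syntax_ids["initial_part"] == FIRST_VALUE
            and syntax_ids["mixed_part"] == SECOND_VALUE)

# Only when this holds does the decoder proceed to obtain the mosaic atlas
# information and the video data to be decoded from the code stream.
```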
  • the flag bit restriction related to the V-PCC extension is added here through two syntax elements, asps_vpcc_extension_present_flag and aaps_vpcc_extension_present_flag, and the value of the syntax element identification information in the initial overview part is clearly 0, that is, it is clear that the image format and the point cloud format cannot coexist there. Therefore, the new overview table defined here (i.e., the target syntax element overview table shown in Table 3) is needed to support this situation.
  • when decoding auxiliary information, encountering an image format triggers a call to the corresponding image decoding standard (i.e., the image decoder), and encountering a point cloud format triggers a call to the point cloud decoding standard (i.e., the point cloud decoder); all the pixels are then restored in three-dimensional space and projected to the target viewpoint.
  • the parsing of syntax elements and the decoding process of the point cloud format and the decoding process of the image format recorded in the relevant standards are introduced into the new overview table (that is, the target syntax element overview table described in the embodiment of this application).
  • Exemplarily, the decoding process of the MIV Main Mixed V-PCC Profile comes from the related decoding processes of MIV Main and V-PCC, and so on.
  • V-PCC Profile has the following four types, as shown in Table 4.
  • rendering processing is required, and the process may include the following steps: scaling geometry (Scale geometry), applying patch attribute offsets (Apply patch attribute offset process), filtering inpaint patches (Filter inpaint patches), reconstructing pruned views (Reconstruct pruned views), determining view blending weights based on a viewport pose (Determine view blending weights based on a viewport pose), recovering sample weights (Recover sample weights), reconstructing 3D points (Reconstruct 3D points), reconstructing the 3D point cloud specified in the standard (Reconstruct 3D point cloud), projecting to a viewport (Project to a viewport), fetching texture from multiple views (Fetch texture from multiple views), blending texture contributions (Blend texture contributions), etc.
  • "reconstructing the 3D point cloud specified in the standard” is a newly added step in the embodiment of the present application, so as to realize the mixture of virtual and real.
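  • As a hedged illustration only, the steps above could be chained as the following pipeline; each name is a placeholder mirroring a step, and the step implementations are injected rather than taken from any reference renderer.

```python
# Ordered step names mirroring the rendering process described above.
RENDER_STEPS = [
    "scale_geometry", "apply_patch_attribute_offset", "filter_inpaint_patches",
    "reconstruct_pruned_views", "determine_view_blending_weights",
    "recover_sample_weights", "reconstruct_3d_points",
    "reconstruct_3d_point_cloud",   # step newly added for virtual-real mixing
    "project_to_viewport", "fetch_texture_from_multiple_views",
    "blend_texture_contributions",
]

def render(state, steps):
    # 'steps' maps each name above to an implementation; running them in
    # order turns decoded sub-blocks plus auxiliary info into the target view.
    for name in RENDER_STEPS:
        state = steps[name](state)
    return state
```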
  • the image format and the point cloud format are mixed and encoded. Compared with encoding them separately and calling their own decoders to demultiplex the signals independently, the number of video decoders that need to be called here is small, the pixel processing rate of the video decoders is fully utilized, and the hardware requirements are reduced. In addition, the embodiment of the present application retains the rendering advantages of data formats (mesh, point cloud, etc.) from different sources and can also improve the quality of image synthesis.
  • This embodiment provides a decoding method: obtaining the mosaic atlas information and the video data to be decoded according to the code stream; performing metadata decoding on the mosaic atlas information to obtain auxiliary information in at least two heterogeneous formats; and performing video decoding on the video data to be decoded to obtain a spliced image; wherein the spliced image is composed of image sub-blocks corresponding to at least two heterogeneous formats.
  • FIG. 7 shows a schematic flowchart of an encoding method provided in an embodiment of the present application. As shown in Figure 7, the method may include:
  • S701 Acquire image sub-blocks corresponding to visual data in at least two heterogeneous formats.
  • S702 Splice the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats to obtain mosaic atlas information and a mosaic image.
  • S703 Encode the mosaic atlas information and the mosaic image, and write the obtained coded bits into a code stream.
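  • The flow of S701 to S703 can be sketched as follows; the packer, per-format metadata encoders, and the single video encoder are injected as callables and are assumptions, not a fixed API.

```python
# Sketch of the encoding flow S701-S703; all dependencies are injected.
def encode(visual_data, pack_sub_blocks, metadata_encoders, video_encoder):
    # S701/S702: pack the per-format image sub-blocks into one mosaic image
    # and produce the mosaic atlas information, grouped here by data format.
    mosaic_image, atlas_info = pack_sub_blocks(visual_data)
    # Metadata encoding: one metadata encoder per heterogeneous format.
    coded_meta = {fmt: metadata_encoders[fmt](info)
                  for fmt, info in atlas_info.items()}
    # S703: a single video encoder codes the whole mosaic image.
    coded_video = video_encoder(mosaic_image)
    return coded_meta, coded_video
```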
  • the encoding method described in the embodiment of the present application may specifically refer to an encoding method of 3D heterogeneous visual data.
  • image sub-blocks corresponding to different heterogeneous formats such as point cloud and image can coexist in one spliced image.
  • since the spliced image composed of the image sub-blocks corresponding to the visual data in at least two heterogeneous formats is encoded as a whole, it can subsequently be decoded by only one video decoder, thereby reducing the demand for video decoders.
  • one video decoder is used for sequences belonging to the same mosaic image, while different mosaic images at the same moment belong to different sequences.
  • the heterogeneous formats described in this embodiment of the present application may refer to different sources of data, or may refer to processing the same source into different data formats, which is not limited here.
  • the mosaic atlas information may be formed by splicing the auxiliary information of the visual data in the at least two heterogeneous formats; the mosaic image may be formed by splicing the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats.
  • S801 Invoke a metadata encoder to perform metadata encoding on the mosaic atlas information.
  • S802 Call a video encoder to perform video encoding on the spliced images.
  • the auxiliary information for different data formats such as point cloud and image can coexist on the same atlas; in the mosaic atlas information, the corresponding metadata encoder can be called for the auxiliary information of each heterogeneous format to perform encoding processing.
  • image sub-blocks corresponding to visual data in different data formats such as point cloud and image may be rearranged on the same spliced image, and then a video encoder may be called for encoding processing on the spliced image.
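  • A toy illustration of this rearrangement is given below as a naive left-to-right shelf packer; real patch packing in V3C-style codecs is considerably more sophisticated, so treat this purely as a sketch of the idea.

```python
import numpy as np

# Toy shelf packing of per-format sub-blocks into one spliced image.
# Blocks are 2D uint8 arrays; each must be narrower than the canvas.
def pack_sub_blocks(blocks, canvas_w=1024):
    placements, x, y, shelf_h = [], 0, 0, 0
    for fmt, block in blocks:               # e.g. ("point_cloud", array)
        h, w = block.shape
        if x + w > canvas_w:                # current shelf full: open a new one
            x, y, shelf_h = 0, y + shelf_h, 0
        placements.append({"format": fmt, "x": x, "y": y, "w": w, "h": h})
        x, shelf_h = x + w, max(shelf_h, h)
    canvas = np.zeros((y + shelf_h, canvas_w), dtype=np.uint8)
    for p, (_, block) in zip(placements, blocks):
        canvas[p["y"]:p["y"] + p["h"], p["x"]:p["x"] + p["w"]] = block
    return canvas, placements
```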
  • the number of video encoders is one; and the number of metadata encoders is at least two, and the number of metadata encoders has a corresponding relationship with the number of heterogeneous formats. That is to say, the auxiliary information for each heterogeneous format can be encoded using a corresponding metadata encoder. In other words, in the embodiment of the present application, how many kinds of auxiliary information in heterogeneous formats are included in the mosaic atlas information, and how many kinds of metadata encoders are needed.
  • the at least two heterogeneous formats may include a first data format and a second data format.
  • said invoking a metadata encoder to perform metadata encoding on said mosaic atlas information may include:
  • if the auxiliary information to be encoded is the information corresponding to the first data format in the mosaic atlas information, calling the metadata encoder corresponding to the first data format to encode it;
  • if the auxiliary information to be encoded is the information corresponding to the second data format in the mosaic atlas information, calling the metadata encoder corresponding to the second data format to encode it.
  • the image sub-blocks corresponding to the first data format and the second data format coexisting in a spliced image may be encoded by a video encoder.
  • if the auxiliary information currently to be encoded is the information corresponding to the first data format, it is necessary to call the metadata encoder corresponding to the first data format for encoding; if the auxiliary information currently to be encoded is the information corresponding to the second data format, it is necessary to call the metadata encoder corresponding to the second data format for encoding.
  • the at least two heterogeneous formats may further include a third data format.
  • the calling metadata encoder to perform metadata encoding on mosaic atlas information may also include:
  • if the auxiliary information to be encoded is the information corresponding to the third data format in the mosaic atlas information, the metadata encoder corresponding to the third data format is called to encode it.
  • the at least two heterogeneous formats are not limited to the first data format and the second data format, and may even include a third data format, a fourth data format, etc.; when the auxiliary information of a certain data format needs to be encoded, it is only necessary to call the corresponding metadata encoder for encoding.
  • the following only uses the first data format and the second data format as examples for illustration.
  • the first data format is an image format
  • the second data format is a point cloud format
  • the calling the metadata encoder to perform metadata encoding on the mosaic atlas information may include:
  • if the auxiliary information to be encoded is the information corresponding to the image format in the mosaic atlas information, calling a multi-view encoder for encoding;
  • if the auxiliary information to be encoded is the information corresponding to the point cloud format in the mosaic atlas information, calling a point cloud encoder for encoding.
  • the first data format and the second data format are different.
  • the first data format can be an image format
  • the second data format can be a point cloud format
  • if the projection formats of the first data format and the second data format are different, the first data format may be a perspective projection format and the second data format may be an orthographic projection format; alternatively, the first data format may also be a mesh format, a point cloud format, etc., and the second data format may also be a mesh format, an image format, etc., which is not limited here.
  • the point cloud format is processed by non-uniform sampling
  • the image format is processed by uniform sampling. Therefore, the point cloud format and the image format can be regarded as two heterogeneous formats.
  • a multi-view encoder can be called for encoding; for auxiliary information in point cloud format, a point cloud encoder can be called for encoding.
  • if the auxiliary information that currently needs to be encoded is the information corresponding to the image format, it is necessary to call the multi-view encoder for encoding; if the auxiliary information that currently needs to be encoded is the information corresponding to the point cloud format, it is necessary to call the point cloud encoder for encoding, so that the rendering characteristics from the image format and from the point cloud format can be preserved during the subsequent decoding process on the decoding side.
  • the image sub-blocks corresponding to the visual data of at least two heterogeneous formats can coexist in a spliced image, and the spliced image can be encoded using one video encoder, which reduces the number of video encoders; since one video decoder is used for subsequent decoding, the number of video decoders is also reduced; meanwhile, for the auxiliary information of the at least two heterogeneous formats, the corresponding metadata encoders can be called for encoding, and the corresponding metadata decoders are then called for decoding, so that the rendering advantages from different data formats (such as image formats and point cloud formats) can be preserved to improve the quality of image synthesis.
  • the target syntax element overview table may be obtained by extending the initial syntax element overview table existing in the standard. That is to say, the target syntax element overview table may be composed of an initial overview part and a mixed overview part.
  • the initial overview part is used to indicate that the image sub-block corresponding to the image format and the image sub-block corresponding to the point cloud format do not support coexistence in one spliced image;
  • the mixed overview part is used to indicate that image sub-blocks corresponding to the image format and image sub-blocks corresponding to the point cloud format are supported to coexist in one spliced image.
  • the initial syntax element overview table (or the initial overview part) only supports the image sub-blocks corresponding to the image format, and clearly indicates that image sub-blocks corresponding to the image format and image sub-blocks corresponding to the point cloud format cannot coexist in one spliced image;
  • thanks to the addition of the mixed overview part, the target syntax element overview table can support the coexistence of image sub-blocks corresponding to the image format and image sub-blocks corresponding to the point cloud format in one spliced image; see Table 3 above for details.
  • Table 3 provides an example of an overview table of target syntax elements.
  • the target syntax element overview table is just a specific example; apart from the flag bit of the syntax element vps_occupancy_video_present_flag[atlasID] being fixed to 1 (because of the point cloud projection method, there must be occupancy information), the restrictions on the flag bits of some other syntax elements can be relaxed; for example, the syntax element ai_attribute_count[atlasID] can be left unconstrained (in addition to texture and transparency, point clouds also support attributes such as reflectivity and material).
  • Table 3 is just an example, which is not specifically limited in this embodiment of the present application.
  • the method may also include: determining the value of the syntax element identification information; encoding the value of the syntax element identification information, and writing the obtained encoded bits into the code stream.
  • the determining the value of the syntax element identification information may include:
  • if the syntax element identification information indicates that image sub-blocks corresponding to at least two heterogeneous formats do not support coexistence in the spliced image in the initial overview part, determining that the value of the syntax element identification information is the first value in the initial overview part;
  • if the syntax element identification information indicates that image sub-blocks corresponding to at least two heterogeneous formats are supported to coexist in the spliced image in the mixed overview part, determining that the value of the syntax element identification information is the second value in the mixed overview part.
  • the method may further include: if the syntax element identification information indicates that image sub-blocks corresponding to at least two heterogeneous formats are supported to coexist in the spliced image in the initial overview part, determining that the value of the syntax element identification information in the initial overview part is the second value; or, if the syntax element identification information indicates that image sub-blocks corresponding to at least two heterogeneous formats do not support coexistence in the spliced image in the mixed overview part, determining that the value of the syntax element identification information in the mixed overview part is the first value.
  • the first value and the second value are different.
  • the first value is equal to 0, and the second value is equal to 1; or, the first value is equal to 1, and the second value is equal to 0; or, the first value is false (false), the second value is true (true), and so on.
  • the first value is equal to 0, and the second value is equal to 1, but there is no limitation here.
  • the flag bit restriction related to the V-PCC extension is added here through two syntax elements, asps_vpcc_extension_present_flag and aaps_vpcc_extension_present_flag, and the value of the syntax element identification information in the initial overview part is clearly 0, that is, it is clear that the image format and the point cloud format cannot coexist there. Therefore, the new overview table defined here (i.e., the target syntax element overview table shown in Table 3) is needed to support this situation.
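  • On the encoder side, the mirror image of the earlier decoder-side check simply fixes the agreed values; a minimal sketch, again assuming the 0/1 convention above:

```python
FIRST_VALUE, SECOND_VALUE = 0, 1   # same convention as on the decoder side

def profile_flag_values():
    # Initial overview part: coexistence not supported -> first value (0),
    # consistent with keeping the V-PCC-extension present flags 0 there.
    # Mixed overview part: coexistence supported -> second value (1).
    return {"initial_part": FIRST_VALUE, "mixed_part": SECOND_VALUE}
```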
  • the encoding method for mixing virtual and real may specifically refer to the encoding method of 3D heterogeneous visual data.
  • the metadata encoder needs to distinguish whether it is performing the metadata encoding of the image part or the metadata encoding of the point cloud part, but only one video encoder is needed for the spliced image; that is, the required number of video encoders is small.
  • the image format and the point cloud format are mixed and encoded. Compared with encoding them separately and then calling their own decoders to demultiplex the signals independently, the number of video codecs that need to be called here is small, and the hardware requirements are reduced. In addition, the embodiment of the present application retains the rendering advantages of data formats (mesh, point cloud, etc.) from different sources and can also improve the quality of image synthesis.
  • This embodiment provides an encoding method: obtaining image sub-blocks corresponding to visual data in at least two heterogeneous formats; splicing the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats to obtain mosaic atlas information and a mosaic image; and encoding the mosaic atlas information and the mosaic image, and writing the obtained coded bits into a code stream.
  • the embodiment of the present application provides a code stream, where the code stream is generated by performing bit coding according to the information to be coded.
  • the information to be encoded may include at least one of the following: mosaic atlas information, mosaic images, and values of syntax element identification information.
  • the value of the syntax element identification information is used to indicate that, whereas in related technologies different formats such as image and point cloud cannot coexist on the same spliced image, the embodiment of this application can support the coexistence of different formats such as image and point cloud in the same spliced image.
  • visual data corresponding to at least two heterogeneous formats are supported in the same atlas; different metadata decoders can then be used to decode the respective auxiliary information of the at least two heterogeneous formats, and one video decoder can be used to decode the mosaic image composed of the at least two heterogeneous formats. This not only realizes the extension of the codec standard, but also reduces the demand for video decoders, makes full use of the pixel processing rate of video decoders, and reduces hardware requirements; in addition, the quality of image synthesis is improved because the rendering characteristics of the different formats are preserved.
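  • Purely as an illustrative framing (not the V3C sample-stream format), the three kinds of information to be encoded could be laid out in one code stream as follows:

```python
import struct

# Illustrative container: [flag values][metadata units][video unit].
def mux_bitstream(flag_values, coded_metadata, coded_video):
    out = bytearray()
    for v in flag_values:                        # syntax element id values
        out += struct.pack("B", v)
    for payload in coded_metadata.values():      # one unit per format
        out += struct.pack(">I", len(payload)) + payload
    out += struct.pack(">I", len(coded_video)) + coded_video
    return bytes(out)

# Example: mux_bitstream([0, 1], {"image": b"...", "point_cloud": b"..."}, b"...")
```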
  • FIG. 9 shows a schematic structural diagram of an encoding device 90 provided in the embodiment of the present application.
  • the encoding device 90 may include: a first acquiring unit 901, a splicing unit 902, and an encoding unit 903; wherein,
  • the first acquiring unit 901 is configured to acquire image sub-blocks corresponding to visual data in at least two heterogeneous formats
  • the splicing unit 902 is configured to splice the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats to obtain mosaic atlas information and a mosaic image;
  • the encoding unit 903 is configured to encode the mosaic atlas information and the mosaic image, and write the obtained coded bits into a code stream.
  • the mosaic atlas information is formed by splicing the auxiliary information of the visual data in at least two heterogeneous formats; the mosaic image is formed by splicing the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats.
  • the encoding unit 903 is specifically configured to call a metadata encoder to perform metadata encoding on the mosaic atlas information; and call a video encoder to perform video encoding on the mosaic image.
  • the number of video encoders is one; the number of metadata encoders is at least two, and the number of metadata encoders corresponds to the number of heterogeneous formats.
  • the at least two heterogeneous formats include the first data format and the second data format; correspondingly, the encoding unit 903 is further configured to: if the auxiliary information currently to be encoded is the information corresponding to the first data format in the mosaic atlas information, call the metadata encoder corresponding to the first data format to encode it; and if the auxiliary information currently to be encoded is the information corresponding to the second data format in the mosaic atlas information, call the metadata encoder corresponding to the second data format to encode it.
  • the first data format is an image format
  • the second data format is a point cloud format
  • correspondingly, the encoding unit 903 is further configured to: if the auxiliary information currently to be encoded is the information corresponding to the image format in the mosaic atlas information, call the multi-view encoder for encoding; and if the auxiliary information currently to be encoded is the information corresponding to the point cloud format in the mosaic atlas information, call the point cloud encoder for encoding.
  • the at least two heterogeneous formats further include a third data format; correspondingly, the encoding unit 903 is further configured to: if the auxiliary information currently to be encoded is the information corresponding to the third data format in the mosaic atlas information, call the metadata encoder corresponding to the third data format to encode it.
  • the encoding device 90 may further include a first determining unit 904 configured to determine the value of the syntax element identification information;
  • the encoding unit 903 is further configured to encode the value of the syntax element identification information, and write the obtained encoded bits into the code stream.
  • the first determining unit 904 is specifically configured to: if the syntax element identification information indicates that image sub-blocks corresponding to at least two heterogeneous formats do not support coexistence in the spliced image in the initial overview part, determine that the value of the syntax element identification information is the first value in the initial overview part; and if the syntax element identification information indicates that image sub-blocks corresponding to at least two heterogeneous formats are supported to coexist in the spliced image in the mixed overview part, determine that the value of the syntax element identification information is the second value in the mixed overview part.
  • the first value is equal to zero and the second value is equal to one.
  • a "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course it may also be a module, or it may be non-modular.
  • each component in this embodiment may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software function modules.
  • the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, a network device, etc.) or a processor execute all or part of the steps of the method described in this embodiment.
  • the aforementioned storage medium includes: U disk, mobile hard disk, read only memory (Read Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other various media that can store program codes.
  • an embodiment of the present application provides a computer storage medium, where the computer storage medium stores a computer program, and when the computer program is executed by a first processor, the method described in any one of the foregoing embodiments is implemented.
  • As shown in FIG. 10, the encoding device 100 may include a first communication interface 1001, a first memory 1002, and a first processor 1003, with the components coupled together through a first bus system 1004.
  • The first bus system 1004 includes not only a data bus but also a power bus, a control bus, and a status signal bus; for clarity of illustration, however, the various buses are all labeled as the first bus system 1004 in FIG. 10. Specifically:
  • the first communication interface 1001 is used for receiving and sending signals during the process of sending and receiving information with other external network elements;
  • the first memory 1002 is used to store computer programs that can run on the first processor 1003;
  • the first processor 1003 is configured to, when running the computer program, execute: acquiring image sub-blocks corresponding to visual data in at least two heterogeneous formats; splicing the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats to obtain mosaic atlas information and a spliced image; and encoding the mosaic atlas information and the spliced image, and writing the resulting encoded bits into a code stream. A sketch of these three steps follows.
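As a rough sketch of the three steps executed by the first processor 1003, under the assumption of caller-supplied helpers (pack_atlas, metadata_encoders, and video_encoder are placeholder names rather than real interfaces):

```python
def encode(sources, pack_atlas, metadata_encoders, video_encoder):
    """sources: iterable of (fmt, patches); every name here is illustrative."""
    # Step 1: acquire image sub-blocks for visual data in >= 2 heterogeneous formats.
    patches = [(fmt, patch) for fmt, fmt_patches in sources for patch in fmt_patches]
    # Step 2: splice the patches into one spliced image plus mosaic atlas information.
    atlas_info, spliced_image = pack_atlas(patches)
    # Step 3: encode the atlas information with per-format metadata encoders and
    # the spliced image with a single video encoder, then concatenate the bits.
    meta_bits = b"".join(metadata_encoders[fmt].encode(info)
                         for fmt, info in atlas_info)
    return meta_bits + video_encoder(spliced_image)
```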
  • the first memory 1002 in the embodiment of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories.
  • the non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or flash memory.
  • the volatile memory can be Random Access Memory (RAM), which acts as external cache memory.
  • By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM).
  • the first memory 1002 of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
  • The first processor 1003 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the first processor 1003 or by instructions in the form of software.
  • The above-mentioned first processor 1003 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • The various methods, steps, and logic block diagrams disclosed in the embodiments of the present application may be implemented or executed by such a processor.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the method disclosed in the embodiments of the present application can be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a storage medium mature in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
  • the storage medium is located in the first memory 1002, and the first processor 1003 reads the information in the first memory 1002, and completes the steps of the above method in combination with its hardware.
  • the embodiments described in this application may be implemented by hardware, software, firmware, middleware, microcode or a combination thereof.
  • For hardware implementation, the processing unit may be implemented in one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this application, or combinations thereof.
  • the techniques described herein can be implemented through modules (eg, procedures, functions, and so on) that perform the functions described herein.
  • Software codes can be stored in memory and executed by a processor. Memory can be implemented within the processor or external to the processor.
  • the first processor 1003 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.
  • This embodiment provides an encoding device, and the encoding device may include the encoding apparatus 90 described in the foregoing embodiments.
  • In this way, visual data corresponding to at least two heterogeneous formats is supported in the same atlas; different metadata decoders can then be used to decode the respective auxiliary information of the at least two heterogeneous formats, and one video decoder can be used to decode the spliced image composed of the at least two heterogeneous formats. This not only extends the codec standard but also reduces the number of video decoders required, makes full use of the processing pixel rate of the video decoder, and lowers hardware requirements; in addition, since rendering characteristics from the different heterogeneous formats can be preserved, image synthesis quality is improved.
  • FIG. 11 shows a schematic diagram of the composition of a decoding apparatus 110 provided by an embodiment of the present application.
  • As shown in FIG. 11, the decoding apparatus 110 may include a second acquisition unit 1101, a metadata decoding unit 1102, and a video decoding unit 1103, where:
  • the second acquisition unit 1101 is configured to obtain mosaic atlas information and video data to be decoded according to the code stream;
  • the metadata decoding unit 1102 is configured to decode the metadata of the mosaic atlas information to obtain auxiliary information in at least two heterogeneous formats;
  • the video decoding unit 1103 is configured to perform video decoding on the video data to be decoded to obtain a spliced image; wherein the spliced image is composed of image sub-blocks corresponding to at least two heterogeneous formats.
  • In some embodiments, the metadata decoding unit 1102 is specifically configured to invoke at least two kinds of metadata decoders to perform metadata decoding on the mosaic atlas information to obtain the respective auxiliary information of the at least two heterogeneous formats, as sketched below.
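A minimal sketch of this decoder-side dispatch, with stub decode_miv_aux and decode_vpcc_aux functions standing in for the multi-view and point cloud metadata decoders (all names here are assumptions for illustration):

```python
def decode_miv_aux(raw: bytes) -> dict:     # stub multi-view metadata decoder
    return {"kind": "image", "raw": raw}

def decode_vpcc_aux(raw: bytes) -> dict:    # stub point cloud metadata decoder
    return {"kind": "point_cloud", "raw": raw}

METADATA_DECODERS = {
    "image": decode_miv_aux,
    "point_cloud": decode_vpcc_aux,
}

def decode_atlas_metadata(entries):
    """entries: (fmt, raw_bits) pairs taken from the mosaic atlas information."""
    aux = {}
    for fmt, raw in entries:
        aux.setdefault(fmt, []).append(METADATA_DECODERS[fmt](raw))
    return aux  # respective auxiliary information, keyed by heterogeneous format
```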
  • In some embodiments, the at least two heterogeneous formats include a first data format and a second data format; correspondingly, the metadata decoding unit 1102 is further configured to: if the currently decoded auxiliary information is the information corresponding to the first data format in the mosaic atlas information, invoke the metadata decoder corresponding to the first data format for decoding to obtain the auxiliary information corresponding to the first data format; and if the currently decoded auxiliary information is the information corresponding to the second data format in the mosaic atlas information, invoke the metadata decoder corresponding to the second data format for decoding to obtain the auxiliary information corresponding to the second data format.
  • In some embodiments, the first data format is an image format and the second data format is a point cloud format; correspondingly, the metadata decoding unit 1102 is further configured to: if the currently decoded auxiliary information is the information corresponding to the image format in the mosaic atlas information, invoke the multi-view decoder for decoding to obtain the auxiliary information corresponding to the image format; and if the currently decoded auxiliary information is the information corresponding to the point cloud format in the mosaic atlas information, invoke the point cloud decoder for decoding to obtain the auxiliary information corresponding to the point cloud format.
  • In some embodiments, the at least two heterogeneous formats further include a third data format; correspondingly, the metadata decoding unit 1102 is further configured to: if the currently decoded auxiliary information is the information corresponding to the third data format in the mosaic atlas information, invoke the metadata decoder corresponding to the third data format for decoding to obtain the auxiliary information corresponding to the third data format.
  • In some embodiments, the video decoding unit 1103 is specifically configured to invoke a video decoder to perform video decoding on the video data to be decoded to obtain the spliced image, where the number of video decoders is one.
  • In some embodiments, the decoding apparatus 110 may further include a rendering unit 1104 configured to render the spliced image using the respective auxiliary information of the at least two heterogeneous formats to obtain a target three-dimensional image; a rendering sketch follows.
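A hedged sketch of how the rendering unit might combine the two kinds of auxiliary information; reconstruct_3d_points, unproject_view_pixels, and project_to_viewport are stubs named for illustration only, with the real reconstruction rules coming from the respective standards:

```python
# Stubs so the sketch runs; real reconstruction follows the relevant standards.
def reconstruct_3d_points(img, patch): return []
def unproject_view_pixels(img, patch): return []
def project_to_viewport(points, viewport):
    return {"viewport": viewport, "points": points}

def render_target_view(spliced_image, aux, viewport):
    """Combine per-format auxiliary information into one 3D rendering."""
    points = []
    for patch in aux.get("point_cloud", []):
        # point cloud patches: recover 3D points directly (V-PCC-style rules)
        points.extend(reconstruct_3d_points(spliced_image, patch))
    for patch in aux.get("image", []):
        # image patches: un-project view pixels using camera parameters (MIV-style)
        points.extend(unproject_view_pixels(spliced_image, patch))
    return project_to_viewport(points, viewport)
```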
  • In some embodiments, the second acquisition unit 1101 is further configured to obtain the value of the syntax element identification information according to the code stream; and if the syntax element identification information indicates that, in the initial overview part, image sub-blocks corresponding to at least two heterogeneous formats are not supported to coexist in the spliced image, and that, in the mixed overview part, image sub-blocks corresponding to at least two heterogeneous formats are supported to coexist in the spliced image, perform the step of obtaining the mosaic atlas information and the data to be decoded according to the code stream.
  • In some embodiments, the decoding apparatus 110 may further include a second determining unit 1105 configured to: if the value of the syntax element identification information is the first value in the initial overview part, determine that the syntax element identification information indicates that coexistence of image sub-blocks corresponding to at least two heterogeneous formats in the spliced image is not supported in the initial overview part; and if the value is the second value in the mixed overview part, determine that the syntax element identification information indicates that such coexistence is supported in the mixed overview part. A flag-checking sketch follows.
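A small sketch of the flag check, assuming the first value is 0 and the second value is 1 as in the embodiment above; select_decoding_path and the constant names are illustrative:

```python
FIRST_VALUE, SECOND_VALUE = 0, 1   # as assumed in this embodiment

def select_decoding_path(initial_part_value, mixed_part_value):
    """Gate the mixed decoding path on the syntax element identification info."""
    if initial_part_value == FIRST_VALUE and mixed_part_value == SECOND_VALUE:
        # coexistence signalled: go on to parse mosaic atlas info + video data
        return "mixed_heterogeneous"
    return "single_format"

assert select_decoding_path(0, 1) == "mixed_heterogeneous"
```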
  • the first value is equal to zero and the second value is equal to one.
  • a "unit” may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course it may also be a module, or it may be non-modular.
  • each component in this embodiment may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software function modules.
  • If the integrated units are implemented in the form of software function modules and are not sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, this embodiment provides a computer storage medium storing a computer program which, when executed by a second processor, implements the method described in any one of the foregoing embodiments.
  • FIG. 12 shows a schematic diagram of a specific hardware structure of a decoding device 120 provided by an embodiment of the present application.
  • As shown in FIG. 12, the decoding device 120 may include a second communication interface 1201, a second memory 1202, and a second processor 1203, with the components coupled together through a second bus system 1204.
  • The second bus system 1204 includes not only a data bus but also a power bus, a control bus, and a status signal bus; for clarity of illustration, however, the various buses are all labeled as the second bus system 1204 in FIG. 12. Specifically:
  • the second communication interface 1201 is used for receiving and sending signals during the process of sending and receiving information with other external network elements;
  • the second memory 1202 is used to store computer programs that can run on the second processor 1203;
  • the second processor 1203 is configured to, when running the computer program, execute: obtaining mosaic atlas information and video data to be decoded according to the code stream; performing metadata decoding on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats; and performing video decoding on the video data to be decoded to obtain a spliced image, where the spliced image is composed of image sub-blocks corresponding to the at least two heterogeneous formats.
  • the second processor 1203 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.
  • It can be understood that the hardware functions of the second memory 1202 are similar to those of the first memory 1002, and the hardware functions of the second processor 1203 are similar to those of the first processor 1003; details are not repeated here.
  • This embodiment provides a decoding device, and the decoding device may include the decoding apparatus 110 described in any one of the foregoing embodiments.
  • In this way, visual data corresponding to at least two heterogeneous formats is supported in the same atlas; different metadata decoders can then be used to decode the respective auxiliary information of the at least two heterogeneous formats, and one video decoder can be used to decode the spliced image composed of the at least two heterogeneous formats. This not only extends the codec standard but also reduces the number of video decoders required, makes full use of the processing pixel rate of the video decoder, and lowers hardware requirements; in addition, since rendering characteristics from the different heterogeneous formats can be preserved, image synthesis quality is improved.
  • FIG. 13 shows a schematic diagram of the composition and structure of a codec system provided by the embodiment of the present application.
  • the codec system 130 may include an encoding device 1301 and a decoding device 1302 .
  • The encoding device 1301 may be the encoding device described in any one of the foregoing embodiments, and the decoding device 1302 may be the decoding device described in any one of the foregoing embodiments.
  • In the embodiments of the present application, the codec system 130 can support visual data corresponding to at least two heterogeneous formats in the same atlas, which not only extends the codec standards but also reduces the number of video decoders required, lowering the hardware requirements; in addition, since rendering characteristics from different heterogeneous formats can be preserved, image synthesis quality is improved.
  • In summary, on the encoding side, image sub-blocks corresponding to visual data in at least two heterogeneous formats are acquired; the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats are spliced to obtain mosaic atlas information and a spliced image; and the mosaic atlas information and the spliced image are encoded, with the resulting encoded bits written into the code stream.
  • On the decoding side, the mosaic atlas information and the video data to be decoded are obtained according to the code stream; metadata decoding is performed on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats; and video decoding is performed on the video data to be decoded to obtain a spliced image, where the spliced image is composed of image sub-blocks corresponding to the at least two heterogeneous formats.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present application disclose an encoding/decoding method, a code stream, an apparatus, a device, and a readable storage medium. The method includes: obtaining mosaic atlas information and video data to be decoded according to a code stream; performing metadata decoding on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats; and performing video decoding on the video data to be decoded to obtain a spliced image, where the spliced image is composed of image sub-blocks corresponding to the at least two heterogeneous formats. In this way, visual data corresponding to at least two heterogeneous formats is supported in the same atlas, which not only extends the codec standard but also reduces the number of video decoders required while improving image synthesis quality.

Description

Encoding/Decoding Method, Code Stream, Apparatus, Device, and Readable Storage Medium

Technical Field

The embodiments of the present application relate to the field of virtual-real hybrid technology, and in particular to an encoding/decoding method, a code stream, an apparatus, a device, and a readable storage medium.

Background

With the continuous development of video coding technology, point cloud data, as an important and popular representation of three-dimensional objects, is widely used in many fields such as virtual and mixed reality, autonomous driving, and 3D printing. Compared with traditional two-dimensional image data, point cloud data contains more vivid detail information, which makes the amount of point cloud data very large.

In the related art, existing video codec standards do not support encoding point cloud data and two-dimensional image data into the same atlas. When an atlas contains both two-dimensional image data and point cloud data, the point cloud data is usually projected into image data before encoding and decoding, so the detail information of the point cloud cannot be preserved and the quality of the viewed viewpoint image is degraded; if such mixing is instead supported at the system layer, the number of required video decoders increases, raising implementation cost.

Summary

Embodiments of the present application provide an encoding/decoding method, a code stream, an apparatus, a device, and a readable storage medium, which can reduce the number of video decoders required, make full use of the processing pixel rate of the video decoder, and improve the synthesis quality of video images.

The technical solutions of the embodiments of the present application may be implemented as follows:

In a first aspect, an embodiment of the present application provides a decoding method, including: obtaining mosaic atlas information and video data to be decoded according to a code stream; performing metadata decoding on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats; and performing video decoding on the video data to be decoded to obtain a spliced image, where the spliced image is composed of image sub-blocks corresponding to the at least two heterogeneous formats.

In a second aspect, an embodiment of the present application provides an encoding method, including: acquiring image sub-blocks corresponding to visual data in at least two heterogeneous formats; splicing the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats to obtain mosaic atlas information and a spliced image; and encoding the mosaic atlas information and the spliced image, and writing the resulting encoded bits into a code stream.

In a third aspect, an embodiment of the present application provides a code stream generated by bit-encoding information to be encoded, where the information to be encoded includes at least one of the following: mosaic atlas information, a spliced image, and the value of syntax element identification information.

In a fourth aspect, an embodiment of the present application provides an encoding apparatus including a first acquisition unit, a splicing unit, and an encoding unit, where the first acquisition unit is configured to acquire image sub-blocks corresponding to visual data in at least two heterogeneous formats; the splicing unit is configured to splice the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats to obtain mosaic atlas information and a spliced image; and the encoding unit is configured to encode the mosaic atlas information and the spliced image and write the resulting encoded bits into a code stream.

In a fifth aspect, an embodiment of the present application provides an encoding device including a first memory and a first processor, where the first memory is configured to store a computer program executable on the first processor, and the first processor is configured to perform the method of the second aspect when running the computer program.

In a sixth aspect, an embodiment of the present application provides a decoding apparatus including a second acquisition unit, a metadata decoding unit, and a video decoding unit, where the second acquisition unit is configured to obtain mosaic atlas information and video data to be decoded according to a code stream; the metadata decoding unit is configured to perform metadata decoding on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats; and the video decoding unit is configured to perform video decoding on the video data to be decoded to obtain a spliced image, where the spliced image is composed of image sub-blocks corresponding to the at least two heterogeneous formats.

In a seventh aspect, an embodiment of the present application provides a decoding device including a second memory and a second processor, where the second memory is configured to store a computer program executable on the second processor, and the second processor is configured to perform the method of the first aspect when running the computer program.

In an eighth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed, implements the method of the first aspect or the method of the second aspect.

Embodiments of the present application provide an encoding/decoding method, a code stream, an apparatus, a device, and a readable storage medium. On the encoding side, image sub-blocks corresponding to visual data in at least two heterogeneous formats are acquired; the image sub-blocks are spliced to obtain mosaic atlas information and a spliced image; and the mosaic atlas information and the spliced image are encoded, with the resulting encoded bits written into a code stream. On the decoding side, the mosaic atlas information and the video data to be decoded are obtained according to the code stream; metadata decoding is performed on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats; and video decoding is performed on the video data to be decoded to obtain a spliced image composed of image sub-blocks corresponding to the at least two heterogeneous formats. In this way, visual data corresponding to at least two heterogeneous formats is supported in the same atlas; different metadata decoders can then decode the respective auxiliary information of the at least two heterogeneous formats, and one video decoder can decode the spliced image composed of the at least two heterogeneous formats. This not only extends the codec standard but also reduces the number of video decoders required, makes full use of the processing pixel rate of the video decoder, and lowers hardware requirements; in addition, since rendering characteristics from the different heterogeneous formats can be preserved, image synthesis quality is improved.
Brief Description of the Drawings

FIG. 1A is a schematic diagram of a data-format-based composition framework;

FIG. 1B is a schematic diagram of another data-format-based composition framework;

FIG. 2 is a schematic diagram of a data-format-based encoding method and decoding method;

FIG. 3A is a detailed framework diagram of a video encoder provided by an embodiment of the present application;

FIG. 3B is a detailed framework diagram of a video decoder provided by an embodiment of the present application;

FIG. 4 is a flowchart of a decoding method provided by an embodiment of the present application;

FIG. 5 is a flowchart of another decoding method provided by an embodiment of the present application;

FIG. 6 is a flowchart of yet another decoding method provided by an embodiment of the present application;

FIG. 7 is a flowchart of an encoding method provided by an embodiment of the present application;

FIG. 8 is a flowchart of another encoding method provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of the composition of an encoding apparatus provided by an embodiment of the present application;

FIG. 10 is a schematic diagram of a specific hardware structure of an encoding device provided by an embodiment of the present application;

FIG. 11 is a schematic diagram of the composition of a decoding apparatus provided by an embodiment of the present application;

FIG. 12 is a schematic diagram of a specific hardware structure of a decoding device provided by an embodiment of the present application;

FIG. 13 is a schematic diagram of the composition of a codec system provided by an embodiment of the present application.
Detailed Description

To understand the features and technical content of the embodiments of the present application in more detail, the implementation of the embodiments is described below with reference to the accompanying drawings, which are provided for reference and illustration only and are not intended to limit the embodiments of the present application.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the application.

In the following description, reference to "some embodiments" describes a subset of all possible embodiments; "some embodiments" may be the same subset or different subsets of all possible embodiments, and they may be combined with one another without conflict. It should also be pointed out that the terms "first/second/third" in the embodiments of the present application are used only to distinguish similar objects and do not imply a particular ordering of the objects; where permitted, their specific order or sequence may be interchanged so that the embodiments described herein can be implemented in orders other than those illustrated or described.

Before further detailing the embodiments of the present application, the names and terms involved are explained as follows:

Moving Picture Experts Group (MPEG)
Visual Volumetric Video-based Coding (V3C)
MPEG Immersive Video (MIV)
Point Cloud Compression (PCC)
Video-based Point Cloud Compression (V-PCC)
Three Dimensions (3D)
Virtual Reality (VR)
Augmented Reality (AR)
Mixed Reality (MR)
Atlas
Patch (image sub-block)

It can be understood that, in general, homogeneous data formats are defined as data formats with the same source representation, and heterogeneous data formats are defined as data formats with different origins. In the embodiments of the present application, the source of a homogeneous data format may simply be called a homogeneous source, and the source of a heterogeneous data format a heterogeneous source.
Referring to FIG. 1A, which shows a schematic diagram of a data-format-based composition framework. As shown in FIG. 1A, bitstreams of different data formats may be decoded and composed in the same video scene. Format 0 and Format 1 are both image formats, i.e., homogeneous data formats; Format 2 is a point cloud format and Format 3 is a mesh format, i.e., heterogeneous data formats. That is, in FIG. 1A, two heterogeneous data formats (Format 2 and Format 3) are combined with the homogeneous data formats in the scene (Format 0 and Format 1). In this way, real-time immersive video interaction services can be provided for multiple data formats from different sources (e.g., mesh, point cloud, image).

In a specific example, for the two data formats of point cloud and image, FIG. 1B shows another schematic diagram of a data-format-based composition framework. As shown in FIG. 1B, the point cloud and the image, as heterogeneous data formats, can be combined together and then independently encoded and decoded with data-format-based methods. Note also that the point cloud format is non-uniformly sampled, while the image format is uniformly sampled.

In the embodiments of the present application, the data-format-based approach allows independent processing at the bitstream level per data format. That is, like tiles or slices in video coding, the different data formats in the scene can be encoded in an independent manner, enabling independent encoding and decoding based on data format.

Referring to FIG. 2, which shows a schematic diagram of a data-format-based encoding method and decoding method. In FIG. 2, (a) shows the flow of an encoding method and (b) shows the flow of a decoding method.

In (a), during content pre-processing, each of Format 0 to Format 3 is encoded separately. Assuming these formats share a common 3D scene, data formats from different sources (e.g., Format 2 and Format 3) must first be converted to the image format before encoding; specifically, the mesh format needs to be converted to the image format, and the point cloud format also needs to be converted to the image format. Encoding is then performed by a data-format-based metadata encoder to generate a bitstream (also called a "code stream").

In (b), a data-format-based metadata decoder decodes the received bitstreams, and the bitstreams that were encoded separately per data format are composed together into the scene during content composition. To improve rendering efficiency, certain data formats may be filtered out of the rendering. If external data formats can share the same scene, those external data formats (or bitstreams) can be added to the composition process. Assuming these data formats share a common 3D scene, some data formats from different sources (e.g., Format 2 and Format 3) must also be converted to a data format of the same source before encoding and subsequent processing.

Thus, by enabling data-format-based independent encoding/decoding, each data format can be described independently in the content description. The related art has therefore proposed converting heterogeneous data formats (e.g., mesh, point cloud) into an image format (also called a "multi-view planar image format" or "image plane format") as a new data format to be rendered with metadata codec methods; it has even been proposed to support virtual-real mixing at the system layer, e.g., multiplexing a point-cloud-format code stream with an image-format code stream at the system layer.

However, the related art currently does not support encoding heterogeneous data formats into the same atlas, i.e., an atlas containing both image patches and point cloud patches. If the point cloud is projected into images and then encoded and decoded, with the desired viewpoint image rendered from the reconstructed images after decoding, quality suffers: the point cloud actually contains sufficient information for continuous multi-viewpoint viewing, but the pre-encoding projection covers only a finite number of viewpoint images, and part of the point cloud's occlusion information at those viewpoints is lost during the projection, degrading the quality of the viewed viewpoint image. If virtual-real mixing is supported at the system layer, each data format forms an independent code stream and the multiple code streams of the different data formats are multiplexed by the system layer into a composite system-layer code stream, with each independent code stream invoking at least one video codec; this increases the number of video decoders required and raises implementation cost.
An embodiment of the present application provides a decoding method: obtaining mosaic atlas information and video data to be decoded according to a code stream; performing metadata decoding on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats; and performing video decoding on the video data to be decoded to obtain a spliced image, where the spliced image is composed of image sub-blocks corresponding to the at least two heterogeneous formats.

An embodiment of the present application further provides an encoding method: acquiring image sub-blocks corresponding to visual data in at least two heterogeneous formats; splicing those image sub-blocks to obtain mosaic atlas information and a spliced image; and encoding the mosaic atlas information and the spliced image, and writing the resulting encoded bits into a code stream.

In this way, visual data corresponding to at least two heterogeneous formats is supported in the same atlas; different metadata decoders can then decode the respective auxiliary information of the at least two heterogeneous formats, and one video decoder can decode the spliced image composed of the at least two heterogeneous formats. This not only extends the codec standard but also reduces the number of video decoders required, makes full use of the processing pixel rate of the video decoder, and lowers hardware requirements; in addition, since rendering characteristics from the different heterogeneous formats can be preserved, image synthesis quality is improved.

The embodiments of the present application are described in detail below with reference to the accompanying drawings.

Referring to FIG. 3A, which shows a detailed framework of a video encoder provided by an embodiment of the present application. As shown in FIG. 3A, the video encoder 10 includes a transform and quantization unit 101, an intra estimation unit 102, an intra prediction unit 103, a motion compensation unit 104, a motion estimation unit 105, an inverse transform and inverse quantization unit 106, a filter control analysis unit 107, a filtering unit 108, a coding unit 109, a decoded picture buffer unit 110, and so on. The filtering unit 108 can implement deblocking filtering and sample adaptive offset (SAO) filtering, and the coding unit 109 can implement header information coding and context-based adaptive binary arithmetic coding (CABAC). For an input original video signal, a video coding block can be obtained by partitioning into coding tree units (CTUs); the residual pixel information obtained after intra or inter prediction is then processed by the transform and quantization unit 101, which transforms the residual information from the pixel domain to the transform domain and quantizes the resulting transform coefficients to further reduce the bit rate. The intra estimation unit 102 and the intra prediction unit 103 perform intra prediction on the video coding block; specifically, they determine the intra prediction mode to be used to encode the block. The motion compensation unit 104 and the motion estimation unit 105 perform inter-prediction coding of the received video coding block relative to one or more blocks in one or more reference frames to provide temporal prediction information; the motion estimation performed by the motion estimation unit 105 is the process of generating a motion vector that estimates the motion of the block, after which the motion compensation unit 104 performs motion compensation based on the motion vector determined by the motion estimation unit 105. After determining the intra prediction mode, the intra prediction unit 103 also provides the selected intra prediction data to the coding unit 109, and the motion estimation unit 105 likewise sends the computed motion vector data to the coding unit 109. The inverse transform and inverse quantization unit 106 is used for reconstruction of the video coding block, reconstructing a residual block in the pixel domain; blocking artifacts of the reconstructed residual block are removed through the filter control analysis unit 107 and the filtering unit 108, and the reconstructed residual block is then added to a predictive block in a frame of the decoded picture buffer unit 110 to generate a reconstructed video coding block. The coding unit 109 encodes the various coding parameters and the quantized transform coefficients; in a CABAC-based coding algorithm, the context may be based on neighboring coding blocks and may be used to encode information indicating the determined intra prediction mode, outputting the code stream of the video signal. The decoded picture buffer unit 110 stores reconstructed video coding blocks for prediction reference; as video image coding proceeds, new reconstructed video coding blocks are continually generated and stored in the decoded picture buffer unit 110.

Referring to FIG. 3B, which shows a detailed framework of a video decoder provided by an embodiment of the present application. As shown in FIG. 3B, the video decoder 20 includes a decoding unit 201, an inverse transform and inverse quantization unit 202, an intra prediction unit 203, a motion compensation unit 204, a filtering unit 205, a decoded picture buffer unit 206, and so on. The decoding unit 201 can implement header information decoding and CABAC decoding, and the filtering unit 205 can implement deblocking filtering and SAO filtering. After the input video signal undergoes the encoding of FIG. 3A, the code stream of the video signal is output; the code stream is input into the video decoder 20 and first passes through the decoding unit 201 to obtain the decoded transform coefficients. The transform coefficients are processed by the inverse transform and inverse quantization unit 202 to produce a residual block in the pixel domain. The intra prediction unit 203 may generate prediction data of the current video decoding block based on the determined intra prediction mode and data from previously decoded blocks of the current frame or picture. The motion compensation unit 204 determines prediction information for the video decoding block by parsing motion vectors and other associated syntax elements, and uses this prediction information to produce the predictive block of the block being decoded. A decoded video block is formed by summing the residual block from the inverse transform and inverse quantization unit 202 with the corresponding predictive block produced by the intra prediction unit 203 or the motion compensation unit 204. The decoded video signal passes through the filtering unit 205 to remove blocking artifacts and improve video quality; the decoded video blocks are then stored in the decoded picture buffer unit 206, which stores reference pictures for subsequent intra prediction or motion compensation and is also used for outputting the video signal, yielding the restored original video signal.

In an embodiment of the present application, referring to FIG. 4, which shows a flowchart of a decoding method provided by an embodiment of the present application. As shown in FIG. 4, the method may include:

S401: obtaining mosaic atlas information and video data to be decoded according to a code stream.

S402: performing metadata decoding on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats.

S403: performing video decoding on the video data to be decoded to obtain a spliced image, where the spliced image is composed of image sub-blocks corresponding to the at least two heterogeneous formats.

It should be noted that, in the embodiments of the present application, image sub-blocks corresponding to different heterogeneous formats such as point cloud and image can coexist in one spliced image. Thus only one video decoder is needed to decode the image sub-blocks corresponding to the at least two heterogeneous formats, which reduces the number of video decoders required; a patch-extraction sketch follows.
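A minimal sketch of the single-video-decoder idea: one decoder call recovers the whole spliced image, and patch bounding boxes from the mosaic atlas information then slice out sub-blocks tagged with their format. All names here (the patch_table keys and the demo lambda "decoder") are assumptions for illustration:

```python
def decode_spliced_image(video_bits, patch_table, video_decoder):
    """One video decoder for the whole spliced image, whatever formats it mixes."""
    frame = video_decoder(video_bits)        # exactly one video decoder call
    patches = []
    for e in patch_table:                    # entries from the mosaic atlas info
        x, y, w, h = e["x"], e["y"], e["w"], e["h"]
        block = [row[x:x + w] for row in frame[y:y + h]]
        patches.append((e["fmt"], block))    # the format tag decides later handling
    return patches

# Tiny demo with a 4x4 "frame" of luma samples and two patches.
demo = decode_spliced_image(
    b"",
    [{"fmt": "image", "x": 0, "y": 0, "w": 2, "h": 2},
     {"fmt": "point_cloud", "x": 2, "y": 2, "w": 2, "h": 2}],
    lambda bits: [[i * 4 + j for j in range(4)] for i in range(4)],
)
```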
It should also be noted that, in the embodiments of the present application, the respective auxiliary information of different heterogeneous formats such as point cloud and image can coexist in the same atlas; within the mosaic atlas information, however, the auxiliary information of each heterogeneous format can be decoded by invoking the corresponding metadata decoder, so rendering characteristics from the different heterogeneous formats can be preserved.

It should further be noted that sequences belonging to the same spliced image use one video decoder, while different spliced images at the same moment belong to different sequences. In addition, the heterogeneous formats described in the embodiments may refer to data from different sources, or to the same source processed into different data formats; no limitation is imposed here.

Here, the mosaic atlas information may be formed by splicing the respective auxiliary information of the visual data in at least two heterogeneous formats. Therefore, in some embodiments, for S402, performing metadata decoding on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats may include:

invoking at least two kinds of metadata decoders to perform metadata decoding on the mosaic atlas information to obtain the respective auxiliary information of the at least two heterogeneous formats.

That is, the mosaic atlas information may include the respective auxiliary information of at least two heterogeneous formats, and the auxiliary information of each heterogeneous format can be decoded with the corresponding metadata decoder. In other words, as many kinds of heterogeneous-format auxiliary information as the mosaic atlas information includes, that many kinds of metadata decoders are needed; the number of metadata decoders corresponds to the number of heterogeneous formats.

Further, in some embodiments, the at least two heterogeneous formats may include a first data format and a second data format. Correspondingly, for S402, performing metadata decoding on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats may include:

if the currently decoded auxiliary information is the information corresponding to the first data format in the mosaic atlas information, invoking the metadata decoder corresponding to the first data format for decoding to obtain the auxiliary information corresponding to the first data format;

if the currently decoded auxiliary information is the information corresponding to the second data format in the mosaic atlas information, invoking the metadata decoder corresponding to the second data format for decoding to obtain the auxiliary information corresponding to the second data format.

It should be noted that the image sub-blocks corresponding to the first and second data formats coexisting in one spliced image can be decoded by one video decoder. For this virtual-real mixed use case of two data formats, however, when decoding the information corresponding to the different data formats in the mosaic atlas information, if the information currently to be decoded corresponds to the first data format, the metadata decoder corresponding to the first data format must be invoked to obtain the auxiliary information of the first data format; if it corresponds to the second data format, the metadata decoder corresponding to the second data format must be invoked to obtain the auxiliary information of the second data format.

Further, in some embodiments, the at least two heterogeneous formats may also include a third data format. Correspondingly, performing metadata decoding on the mosaic atlas information may further include: if the currently decoded auxiliary information is the information corresponding to the third data format in the mosaic atlas information, invoking the metadata decoder corresponding to the third data format for decoding to obtain the auxiliary information corresponding to the third data format.

That is, in the embodiments of the present application, the at least two heterogeneous formats are not limited to the first and second data formats and may even include a third data format, a fourth data format, and so on; when the auxiliary information of a certain data format needs to be decoded, only the corresponding metadata decoder needs to be invoked. The first and second data formats are taken as examples below.

In a specific embodiment, the first data format is an image format and the second data format is a point cloud format. Correspondingly, in some embodiments, as shown in FIG. 5, S402 may include the following steps:

S501: if the currently decoded auxiliary information is the information corresponding to the image format in the mosaic atlas information, invoking the multi-view decoder for decoding to obtain the auxiliary information corresponding to the image format.

S502: if the currently decoded auxiliary information is the information corresponding to the point cloud format in the mosaic atlas information, invoking the point cloud decoder for decoding to obtain the auxiliary information corresponding to the point cloud format.

It should be noted that, in the embodiments, the first data format and the second data format differ. The first data format may be an image format and the second a point cloud format; or the two may differ in projection format, the first being a perspective projection format and the second an orthographic projection format; or the first may also be a mesh format, a point cloud format, etc., and the second a mesh format, an image format, etc.; no limitation is imposed here.

It should also be noted that the point cloud format is non-uniformly sampled while the image format is uniformly sampled, so the point cloud format and the image format can serve as two heterogeneous formats. In this case, the multi-view decoder can be invoked for the image format and the point cloud decoder for the point cloud format. Thus, if the information currently to be decoded corresponds to the image format, the multi-view decoder is invoked to obtain the auxiliary information corresponding to the image format; if it corresponds to the point cloud format, the point cloud decoder is invoked to obtain the auxiliary information corresponding to the point cloud format, so that rendering characteristics from both the image format and the point cloud format can be preserved.

Further, in some embodiments, for S403, performing video decoding on the video data to be decoded to obtain a spliced image may include:

invoking a video decoder to perform video decoding on the video data to be decoded to obtain the spliced image, where the number of video decoders is one.

That is, the image sub-blocks corresponding to at least two heterogeneous formats coexisting in one spliced image can be decoded by one video decoder. Compared with the related art, where formats are encoded separately and each invokes its own decoder to independently decode multiple streams, the embodiments need to invoke fewer video decoders, can make full use of the processing pixel rate of the video decoder, and lower the hardware requirements.

Specifically, the image sub-blocks corresponding to the multiple heterogeneous formats in the spliced image can be decoded by one video decoder; the respective auxiliary information of these heterogeneous formats in the mosaic atlas information, however, can be decoded by invoking the respective metadata decoders to obtain the auxiliary information corresponding to the different heterogeneous formats. Exemplarily, to decode the information corresponding to the point cloud format in the mosaic atlas information, the point cloud decoder may be invoked to obtain the auxiliary information of the point cloud format; to decode the information corresponding to the image format, the multi-view decoder may be invoked to obtain the auxiliary information of the image format, and so on; the embodiments impose no limitation.

Further, after obtaining the respective auxiliary information of the at least two heterogeneous formats and the spliced image, in some embodiments, as shown in FIG. 6, the method may further include:

S601: rendering the spliced image using the respective auxiliary information of the at least two heterogeneous formats to obtain a target three-dimensional image.

In this way, in the embodiments of the present application, the image sub-blocks corresponding to at least two heterogeneous formats can coexist in one spliced image decoded with one video decoder, reducing the number of video decoders; for the respective auxiliary information of the at least two heterogeneous formats, the corresponding metadata decoders can be invoked separately, preserving the rendering advantages of the different data formats (e.g., image format, point cloud format) and improving image synthesis quality.

It can be understood that, in the related art, different data formats such as point cloud and image are not supported to coexist on one spliced image. The MPEG standards currently define common high-level syntax information for the image format and the point cloud format, which must be paired with either the image format or the point cloud format to be usable; the standard therefore defines the flag of the syntax element asps_extension_present_flag to indicate enabling of the extension functionality. If the flag of the syntax element asps_vpcc_extension_present_flag is true (or takes the value 1), the specific decoding process in the point cloud decoding standard may be followed; if the flag of the syntax element asps_miv_extension_present_flag is true (or takes the value 1), the specific decoding process in the image decoding standard may be followed, as shown in Table 1.

Table 1 (syntax table reproduced as an image in the original publication)

Here, the point cloud decoding standard shown in Table 2 specifies that when the flag of asps_vpcc_extension_present_flag is true (or takes the value 1), the flags of the related syntax elements involved in the image decoding standard extension (the gray-shaded syntax elements) are all false (or take the value 0). Therefore neither the point cloud decoding standard (e.g., the V-PCC standard) nor the image decoding standard (e.g., the MIV standard) supports both being true at the same time.

Table 2 (syntax table reproduced as an image in the original publication)

That is, when using the V-PCC standard or the MIV standard, only one of the two can actually be true, and the case where both are true cannot be handled. On this basis, an embodiment of the present application provides a decoding method that allows image sub-blocks of different data formats such as point cloud and image to coexist in one spliced image, achieving the aforementioned saving in the number of video decoders while preserving rendering characteristics from different data formats such as the image format and the point cloud format, improving image synthesis quality.

That is, an embodiment of the present application provides a target syntax element profile table (Profile), and this target profile table indicates that image sub-blocks corresponding to at least two heterogeneous formats may coexist in one spliced image. Thus, when image sub-blocks corresponding to different data formats such as point cloud and image coexist in one spliced image, the embodiments can perform decoding with one video decoder.

Here, the target syntax element profile table may be obtained by extending an initial syntax element profile table; that is, it may consist of an initial overview part and a mixed overview part. In a specific embodiment, the initial overview part indicates that image sub-blocks corresponding to the image format and image sub-blocks corresponding to the point cloud format are not supported to coexist in one spliced image; the mixed overview part indicates that such coexistence can be supported.

Exemplarily, taking the MIV decoding standard and the V-PCC decoding standard as examples: the initial syntax element profile table, i.e., the initial overview part, supports only image sub-blocks corresponding to the image format and explicitly states that image-format patches and point-cloud-format patches cannot coexist in one spliced image; by adding the mixed overview part, the target syntax element profile table can support the coexistence of image-format patches and point-cloud-format patches in one spliced image, as detailed in Table 3. Table 3 is obtained by extending the MIV syntax element profile existing in the standard; the gray-shaded portion is the mixed-overview content newly added by this embodiment.

Table 3 (profile table reproduced as an image in the original publication)

It should be noted that Table 3 provides an example of the target syntax element profile table and is only a specific example: apart from the flag of the syntax element vps_occupancy_video_present_flag[atlasID] being fixed to 1 (occupancy information is required because of the point cloud projection), the flags of some other syntax elements may be left unrestricted; for example, ai_attribute_count[atlasID] may be unconstrained (besides texture and transparency, point clouds also support attributes such as reflectance and material). In short, Table 3 is only an example and the embodiments impose no specific limitation; a constraint-checking sketch follows.
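A small validation sketch of the one hard constraint the text above attributes to Table 3. The dictionary layout of vps is an assumption for illustration, while the syntax element names come from the text itself:

```python
def validate_mixed_profile(vps: dict, atlas_id: int) -> None:
    """Check the only hard constraint assumed from Table 3 in this text."""
    # Point-cloud-projected patches need occupancy information, so the flag
    # vps_occupancy_video_present_flag[atlasID] must be 1 under this profile.
    if vps["vps_occupancy_video_present_flag"][atlas_id] != 1:
        raise ValueError("mixed profile requires occupancy video")
    # ai_attribute_count[atlasID] is deliberately left unconstrained: point
    # clouds may carry reflectance, material, etc., beyond texture/transparency.

validate_mixed_profile({"vps_occupancy_video_present_flag": {0: 1}}, 0)
```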
It should also be noted that Table 3 newly adds some syntax elements related to mixing the image format and the point cloud format; that is, the target syntax element profile table may consist of an initial overview part and a mixed overview part. Therefore, in some embodiments, the method may further include:

obtaining the value of syntax element identification information according to the code stream;

if the syntax element identification information indicates that, in the initial overview part, image sub-blocks corresponding to at least two heterogeneous formats are not supported to coexist in the spliced image, and that, in the mixed overview part, image sub-blocks corresponding to at least two heterogeneous formats are supported to coexist in the spliced image, performing the step of obtaining the mosaic atlas information and the data to be decoded according to the code stream.

In a specific embodiment, obtaining the value of the syntax element identification information according to the code stream may include:

if the value of the syntax element identification information is a first value in the initial overview part, determining that the syntax element identification information indicates that, in the initial overview part, image sub-blocks corresponding to at least two heterogeneous formats are not supported to coexist in the spliced image;

if the value of the syntax element identification information is a second value in the mixed overview part, determining that the syntax element identification information indicates that, in the mixed overview part, image sub-blocks corresponding to at least two heterogeneous formats are supported to coexist in the spliced image.

It should be noted that the method may further include: if the value of the syntax element identification information is the second value in the initial overview part, determining that it indicates that such coexistence is supported in the initial overview part; or, if the value is the first value in the mixed overview part, determining that it indicates that such coexistence is not supported in the mixed overview part.

In the embodiments of the present application, the first value and the second value differ: the first value may equal 0 and the second 1; or the first may equal 1 and the second 0; or the first may be false and the second true, and so on. In a specific embodiment, the first value equals 0 and the second value equals 1, but no limitation is imposed here.

That is, flag restrictions related to the V-PCC extension are added to the initial syntax element profile table of the standard: two syntax elements, asps_vpcc_extension_present_flag and aaps_vpcc_extension_present_flag, are added, and in the initial overview part the value of the syntax element identification information is explicitly 0, i.e., it is made explicit that the image format and the point cloud format cannot coexist. A new profile table is therefore defined here (the target syntax element profile table shown in Table 3) that can support this case: in such a virtual-real mixed application scenario, when decoding the auxiliary information, the corresponding image decoding standard (i.e., the image decoder) is invoked upon encountering the image format, and the point cloud decoding standard (i.e., the point cloud decoder) is invoked upon encountering the point cloud format; all pixels are then restored into three-dimensional space and projected to the target viewpoint.

It should also be noted that the parsing of the syntax elements, and the decoding processes of the point cloud format and of the image format specified in the related standards, are introduced into the decoding process of the new profile table (i.e., the target syntax element profile table of the embodiments). Exemplarily, the decoding process of the MIV Main Mixed V-PCC Profile comes from the related decoding processes of MIV Main and V-PCC, and so on. In addition, in the standard there are the following four V-PCC Profiles, as shown in Table 4.

Table 4 (V-PCC profile table reproduced as an image in the original publication)

Therefore, since there are four MIV Profiles and four V-PCC Profiles, the virtual-real mix (MIV Mixed V-PCC) has 16 combinations in total, as shown below.

Table 5 (combination table reproduced as an image in the original publication)

Further, in some embodiments, after decoding a bitstream conforming to the mixed V-PCC Profile, rendering is also required; this process may include the following steps: scale geometry; apply patch attribute offset process; filter inpaint patches; reconstruct pruned views; determine view blending weights based on a viewport pose; recover sample weights; reconstruct 3D points; reconstruct the 3D point cloud specified in the standard; project to a viewport; fetch texture from multiple views; blend texture contributions; and so on. Among these, "reconstruct the 3D point cloud specified in the standard" is a step newly added by this embodiment to realize the virtual-real mix; a pipeline sketch follows.
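A pipeline sketch that simply fixes the step order listed above, with "reconstruct_3d_point_cloud" inserted where this embodiment adds it; the handler-registry mechanism is an illustrative assumption, not part of the standard:

```python
RENDER_STEPS = [
    "scale_geometry",
    "apply_patch_attribute_offset",
    "filter_inpaint_patches",
    "reconstruct_pruned_views",
    "determine_view_blending_weights",   # based on a viewport pose
    "recover_sample_weights",
    "reconstruct_3d_points",
    "reconstruct_3d_point_cloud",        # step added for the virtual-real mix
    "project_to_viewport",
    "fetch_texture_from_multiple_views",
    "blend_texture_contributions",
]

def run_render_pipeline(state, handlers=None):
    """Run the steps in order; unregistered steps default to the identity."""
    handlers = handlers or {}
    for name in RENDER_STEPS:
        state = handlers.get(name, lambda s: s)(state)
    return state

final_state = run_render_pipeline({"points": []})
```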
In short, with the decoding method for virtual-real mixing provided by the embodiments of the present application, if patches of the image format and of the point cloud format, or patches of different projection formats, coexist in one spliced image, then for decoding the auxiliary information the metadata decoder needs to distinguish between metadata decoding of the image part and metadata decoding of the point cloud part, whereas the spliced image requires only one video decoder, i.e., few video decoders are needed. Specifically, this not only extends the standard; for application scenarios composed of different (or heterogeneous) data formats together with homogeneous data formats in the scene, real-time immersive video interaction services can be provided in this way for multiple data formats from different sources (e.g., image, point cloud, mesh), promoting the development of the VR/AR/MR industry.

In addition, in the embodiments of the present application, mixing the image format and the point cloud format for coding requires fewer video decoders than encoding them separately and invoking each format's own decoder to independently decode multiple streams; the processing pixel rate of the video decoder is fully utilized and the hardware requirements are lowered. Moreover, the embodiments preserve the rendering advantages of data formats from different sources (mesh, point cloud, etc.) and can improve image synthesis quality.

This embodiment provides a decoding method: obtaining mosaic atlas information and video data to be decoded according to a code stream; performing metadata decoding on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats; and performing video decoding on the video data to be decoded to obtain a spliced image composed of image sub-blocks corresponding to the at least two heterogeneous formats. In this way, visual data corresponding to at least two heterogeneous formats is supported in the same atlas; different metadata decoders can decode the respective auxiliary information of the at least two heterogeneous formats, and one video decoder can decode the spliced image composed of the at least two heterogeneous formats, which not only extends the codec standard but also reduces the number of video decoders required, makes full use of the processing pixel rate of the video decoder, and lowers hardware requirements; since rendering characteristics from the different heterogeneous formats can be preserved, image synthesis quality is also improved.

In another embodiment of the present application, referring to FIG. 7, which shows a flowchart of an encoding method provided by an embodiment of the present application. As shown in FIG. 7, the method may include:

S701: acquiring image sub-blocks corresponding to visual data in at least two heterogeneous formats.

S702: splicing the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats to obtain mosaic atlas information and a spliced image.

S703: encoding the mosaic atlas information and the spliced image, and writing the resulting encoded bits into a code stream.

It should be noted that the encoding method described in the embodiments may specifically be an encoding method for 3D heterogeneous visual data. In the embodiments of the present application, image sub-blocks corresponding to different heterogeneous formats such as point cloud and image can coexist in one spliced image. Thus, after the spliced image composed of the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats is encoded, subsequent decoding can be performed by only one video decoder, reducing the number of video decoders required.

It should also be noted that sequences belonging to the same spliced image use one video decoder, while different spliced images at the same moment belong to different sequences. In addition, the heterogeneous formats described in the embodiments may refer to data from different sources, or to the same source processed into different data formats; no limitation is imposed here.

It should further be noted that, in the embodiments of the present application, the mosaic atlas information may be formed by splicing the respective auxiliary information of the visual data in the at least two heterogeneous formats, and the spliced image may be formed by splicing the image sub-blocks corresponding to that visual data.

Further, in some embodiments, as shown in FIG. 8, S703 may include the following steps:

S801: invoking the metadata encoder to perform metadata encoding on the mosaic atlas information.

S802: invoking the video encoder to perform video encoding on the spliced image.

That is, the respective auxiliary information of different data formats such as point cloud and image can coexist in the same atlas, but within the mosaic atlas information the auxiliary information of each heterogeneous format can be encoded by invoking the corresponding metadata encoder.

For the spliced image, the image sub-blocks corresponding to visual data in different data formats such as point cloud and image can be rearranged on the same spliced image, and the video encoder is then invoked to encode that spliced image.

In the embodiments of the present application, the number of video encoders is one, while there are at least two kinds of metadata encoders, and the number of metadata encoders corresponds to the number of heterogeneous formats. That is, the auxiliary information of each heterogeneous format can be encoded with the corresponding metadata encoder; in other words, as many kinds of heterogeneous-format auxiliary information as the mosaic atlas information includes, that many kinds of metadata encoders are needed; see the sketch below.
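A sketch of that resource accounting, with make_metadata_encoder as an illustrative factory stub: one video encoder regardless of how many formats share the spliced image, and one metadata encoder per distinct heterogeneous format:

```python
def make_metadata_encoder(fmt):                   # illustrative factory stub
    return {"image": "multi_view_encoder",
            "point_cloud": "point_cloud_encoder"}.get(
        fmt, f"metadata_encoder_for_{fmt}")

def build_encoders(formats):
    """One video encoder total; one metadata encoder per distinct format."""
    metadata_encoders = {fmt: make_metadata_encoder(fmt) for fmt in set(formats)}
    video_encoders = ["video_encoder_0"]          # always exactly one
    return metadata_encoders, video_encoders

meta, video = build_encoders(["image", "point_cloud", "point_cloud"])
assert len(video) == 1 and len(meta) == 2
```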
Further, in some embodiments, the at least two heterogeneous formats may include a first data format and a second data format. Correspondingly, invoking the metadata encoder to perform metadata encoding on the mosaic atlas information may include:

if the currently encoded auxiliary information is the information corresponding to the first data format in the mosaic atlas information, invoking the metadata encoder corresponding to the first data format for encoding;

if the currently encoded auxiliary information is the information corresponding to the second data format in the mosaic atlas information, invoking the metadata encoder corresponding to the second data format for encoding.

It should be noted that the image sub-blocks corresponding to the first and second data formats coexisting in one spliced image can be encoded by one video encoder. For this virtual-real mixed use case of two data formats, however, when encoding the auxiliary information of the different data formats in the mosaic atlas information, if the auxiliary information currently to be encoded corresponds to the first data format, the metadata encoder corresponding to the first data format must be invoked for encoding; if it corresponds to the second data format, the metadata encoder corresponding to the second data format must be invoked for encoding.

Further, in some embodiments, the at least two heterogeneous formats may also include a third data format. Correspondingly, invoking the metadata encoder to perform metadata encoding on the mosaic atlas information may further include: if the currently encoded auxiliary information is the information corresponding to the third data format in the mosaic atlas information, invoking the metadata encoder corresponding to the third data format for encoding.

That is, in the embodiments of the present application, the at least two heterogeneous formats are not limited to the first and second data formats and may even include a third data format, a fourth data format, and so on; when the auxiliary information of a certain data format needs to be encoded, only the corresponding metadata encoder needs to be invoked. The first and second data formats are taken as examples below.

In a specific embodiment, the first data format is an image format and the second data format is a point cloud format. Correspondingly, invoking the metadata encoder to perform metadata encoding on the mosaic atlas information may include:

if the currently encoded auxiliary information is the information corresponding to the image format in the mosaic atlas information, invoking the multi-view encoder for encoding;

if the currently encoded auxiliary information is the information corresponding to the point cloud format in the mosaic atlas information, invoking the point cloud encoder for encoding.

It should be noted that, in the embodiments, the first data format and the second data format differ. The first data format may be an image format and the second a point cloud format; or the two may differ in projection format, the first being a perspective projection format and the second an orthographic projection format; or the first may also be a mesh format, a point cloud format, etc., and the second a mesh format, an image format, etc.; no limitation is imposed here.

It should also be noted that the point cloud format is non-uniformly sampled while the image format is uniformly sampled, so the point cloud format and the image format can serve as two heterogeneous formats. In this case, the multi-view encoder can be invoked for the image-format auxiliary information and the point cloud encoder for the point-cloud-format auxiliary information. Thus, if the auxiliary information currently to be encoded corresponds to the image format, the multi-view encoder is invoked; if it corresponds to the point cloud format, the point cloud encoder is invoked, so that during subsequent decoding on the decoding side, rendering characteristics from both the image format and the point cloud format can be preserved.

In this way, in the embodiments of the present application, the image sub-blocks corresponding to visual data in at least two heterogeneous formats can coexist in one spliced image, and that spliced image can be encoded with one video encoder, reducing the number of video encoders; since one video decoder is later used for decoding, the number of video decoders is also reduced. For the respective auxiliary information of the at least two heterogeneous formats, the corresponding metadata encoders can be invoked separately for encoding, with the corresponding metadata decoders invoked later for decoding, so the rendering advantages of the different data formats (e.g., image format, point cloud format) can be preserved and the image synthesis quality improved.

It can be understood that, in the embodiments of the present application, the target syntax element profile table may be obtained by extending the initial syntax element profile table existing in the standard; that is, it may consist of an initial overview part and a mixed overview part. In a specific embodiment, the initial overview part indicates that image sub-blocks corresponding to the image format and image sub-blocks corresponding to the point cloud format are not supported to coexist in one spliced image; the mixed overview part indicates that such coexistence can be supported.

Exemplarily, the initial syntax element profile table, i.e., the initial overview part, supports only image sub-blocks corresponding to the image format and explicitly states that image-format patches and point-cloud-format patches cannot coexist in one spliced image; by adding the mixed overview part, the target syntax element profile table can support the coexistence of image-format patches and point-cloud-format patches in one spliced image, as detailed in the aforementioned Table 3.

It should also be noted that Table 3 provides an example of the target syntax element profile table and is only a specific example: apart from the flag of the syntax element vps_occupancy_video_present_flag[atlasID] being fixed to 1 (occupancy information is required because of the point cloud projection), the flags of some other syntax elements may be left unrestricted; for example, ai_attribute_count[atlasID] may be unconstrained (besides texture and transparency, point clouds also support attributes such as reflectance and material). In short, Table 3 is only an example and the embodiments impose no specific limitation.

It should further be noted that the aforementioned Table 3 newly adds some syntax elements related to mixing the image format and the point cloud format; that is, the target syntax element profile table may consist of an initial overview part and a mixed overview part. Therefore, in some embodiments, the method may further include:

determining the value of syntax element identification information;

encoding the value of the syntax element identification information, and writing the resulting encoded bits into the code stream.

In a specific embodiment, determining the value of the syntax element identification information may include:

if the syntax element identification information indicates that, in the initial overview part, image sub-blocks corresponding to at least two heterogeneous formats are not supported to coexist in the spliced image, determining that the value of the syntax element identification information is a first value in the initial overview part;

if the syntax element identification information indicates that, in the mixed overview part, image sub-blocks corresponding to at least two heterogeneous formats are supported to coexist in the spliced image, determining that the value of the syntax element identification information is a second value in the mixed overview part.

It should be noted that the method may further include: if the syntax element identification information indicates that such coexistence is supported in the initial overview part, determining that its value is the second value in the initial overview part; or, if it indicates that such coexistence is not supported in the mixed overview part, determining that its value is the first value in the mixed overview part.

In the embodiments of the present application, the first value and the second value differ: the first value may equal 0 and the second 1; or the first may equal 1 and the second 0; or the first may be false and the second true, and so on. In a specific embodiment, the first value equals 0 and the second value equals 1, but no limitation is imposed here.

That is, flag restrictions related to the V-PCC extension are added to the initial syntax element profile table of the standard: two syntax elements, asps_vpcc_extension_present_flag and aaps_vpcc_extension_present_flag, are added, and in the initial overview part the value of the syntax element identification information is explicitly 0, i.e., it is made explicit that the image format and the point cloud format cannot coexist. The new profile table defined here (the target syntax element profile table shown in Table 3) can support this case: in such a virtual-real mixed application scenario, when encoding the auxiliary information, the corresponding image encoding standard (i.e., the image encoder) is invoked upon encountering the image format and the point cloud encoding standard (i.e., the point cloud encoder) upon encountering the point cloud format, so that the corresponding metadata decoders can be invoked during subsequent decoding; when all pixels are restored into three-dimensional space and projected to the target viewpoint, the rendering advantages of the different data formats (e.g., image format, point cloud format) can be preserved and the image synthesis quality improved.

In short, the encoding method for virtual-real mixing provided by the embodiments of the present application may specifically be an encoding method for 3D heterogeneous visual data. If patches of the image format and of the point cloud format, or patches of different projection formats, coexist in one spliced image, then for encoding the auxiliary information the metadata encoder needs to distinguish between metadata encoding of the image part and metadata encoding of the point cloud part, whereas the spliced image requires only one video encoder, i.e., few video encoders are needed. Specifically, this not only extends the standard; for application scenarios composed of different (or heterogeneous) data formats together with homogeneous data formats in the scene, real-time immersive video interaction services can be provided in this way for multiple data formats from different sources (e.g., image, point cloud, mesh), promoting the development of the VR/AR/MR industry.

In addition, in the embodiments of the present application, mixing the image format and the point cloud format for coding requires fewer video codecs than encoding them separately and invoking each format's own decoder to independently decode multiple streams, lowering the hardware requirements. The embodiments also preserve the rendering advantages of data formats from different sources (mesh, point cloud, etc.) and can improve image synthesis quality.

This embodiment provides an encoding method: acquiring image sub-blocks corresponding to visual data in at least two heterogeneous formats; splicing the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats to obtain mosaic atlas information and a spliced image; and encoding the mosaic atlas information and the spliced image, and writing the resulting encoded bits into a code stream. In this way, visual data corresponding to at least two heterogeneous formats is supported in the same atlas, which not only extends the codec standard but also reduces the number of video decoders required, makes full use of the processing pixel rate of the video decoder, and lowers hardware requirements; since rendering characteristics from the different heterogeneous formats can be preserved, image synthesis quality is also improved.

In yet another embodiment of the present application, a code stream is provided, the code stream being generated by bit-encoding information to be encoded.

In the embodiments of the present application, the information to be encoded may include at least one of the following: mosaic atlas information, a spliced image, and the value of syntax element identification information. The value of the syntax element identification information makes explicit that in the related art different formats such as image and point cloud cannot coexist on the same spliced image, whereas the embodiments of the present application can support different formats such as image and point cloud coexisting on the same spliced image. In this way, visual data corresponding to at least two heterogeneous formats is supported in the same atlas; different metadata decoders can then decode the respective auxiliary information of the at least two heterogeneous formats, and one video decoder can decode the spliced image composed of the at least two heterogeneous formats, which not only extends the codec standard but also reduces the number of video decoders required, makes full use of the processing pixel rate of the video decoder, and lowers hardware requirements; since rendering characteristics from the different heterogeneous formats can be preserved, image synthesis quality is also improved.
In still another embodiment of the present application, based on the same inventive concept as the foregoing embodiments, referring to FIG. 9, which shows a schematic diagram of the composition of an encoding apparatus 90 provided by an embodiment of the present application. As shown in FIG. 9, the encoding apparatus 90 may include a first acquisition unit 901, a splicing unit 902, and an encoding unit 903, where:

the first acquisition unit 901 is configured to acquire image sub-blocks corresponding to visual data in at least two heterogeneous formats;

the splicing unit 902 is configured to splice the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats to obtain mosaic atlas information and a spliced image;

the encoding unit 903 is configured to encode the mosaic atlas information and the spliced image, and write the resulting encoded bits into a code stream.

In some embodiments, the mosaic atlas information is formed by splicing the respective auxiliary information of the visual data in the at least two heterogeneous formats, and the spliced image is formed by splicing the image sub-blocks corresponding to that visual data.

In some embodiments, the encoding unit 903 is specifically configured to invoke the metadata encoder to perform metadata encoding on the mosaic atlas information, and to invoke the video encoder to perform video encoding on the spliced image.

In some embodiments, the number of video encoders is one; there are at least two kinds of metadata encoders, and the number of metadata encoders corresponds to the number of heterogeneous formats.

In some embodiments, the at least two heterogeneous formats include a first data format and a second data format; correspondingly, the encoding unit 903 is further configured to: if the currently encoded auxiliary information is the information corresponding to the first data format in the mosaic atlas information, invoke the metadata encoder corresponding to the first data format for encoding; and if it is the information corresponding to the second data format, invoke the metadata encoder corresponding to the second data format for encoding.

In some embodiments, the first data format is an image format and the second data format is a point cloud format; correspondingly, the encoding unit 903 is further configured to: if the currently encoded auxiliary information is the information corresponding to the image format in the mosaic atlas information, invoke the multi-view encoder for encoding; and if it is the information corresponding to the point cloud format, invoke the point cloud encoder for encoding.

In some embodiments, the at least two heterogeneous formats further include a third data format; correspondingly, the encoding unit 903 is further configured to: if the currently encoded auxiliary information is the information corresponding to the third data format in the mosaic atlas information, invoke the metadata encoder corresponding to the third data format for encoding.

In some embodiments, referring to FIG. 9, the encoding apparatus 90 may further include a first determining unit 904 configured to determine the value of syntax element identification information; the encoding unit 903 is further configured to encode the value of the syntax element identification information and write the resulting encoded bits into the code stream.

In some embodiments, the first determining unit 904 is specifically configured to: if the syntax element identification information indicates that, in the initial overview part, image sub-blocks corresponding to at least two heterogeneous formats are not supported to coexist in the spliced image, determine that the value of the syntax element identification information is a first value in the initial overview part; and if it indicates that, in the mixed overview part, such coexistence is supported, determine that the value is a second value in the mixed overview part.

In some embodiments, the first value equals 0 and the second value equals 1.

It can be understood that, in the embodiments of the present application, a "unit" may be part of a circuit, part of a processor, part of a program or software, and so on; it may of course also be a module, or it may be non-modular. The components in this embodiment may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software function module.

If the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of this embodiment, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the method described in this embodiment. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Accordingly, an embodiment of the present application provides a computer storage medium storing a computer program which, when executed by a first processor, implements the method described in any one of the foregoing embodiments.

Based on the composition of the above encoding apparatus 90 and the computer storage medium, referring to FIG. 10, which shows a schematic diagram of a specific hardware structure of an encoding device 100 provided by an embodiment of the present application. As shown in FIG. 10, the encoding device 100 may include a first communication interface 1001, a first memory 1002, and a first processor 1003, with the components coupled together through a first bus system 1004. It can be understood that the first bus system 1004 is used to realize connection and communication between these components; besides a data bus, it also includes a power bus, a control bus, and a status signal bus, but for clarity of illustration the various buses are all labeled as the first bus system 1004 in FIG. 10. Specifically:

the first communication interface 1001 is used for receiving and sending signals in the process of transmitting and receiving information to and from other external network elements;

the first memory 1002 is used to store a computer program executable on the first processor 1003;

the first processor 1003 is configured to, when running the computer program, execute: acquiring image sub-blocks corresponding to visual data in at least two heterogeneous formats; splicing the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats to obtain mosaic atlas information and a spliced image; and encoding the mosaic atlas information and the spliced image, and writing the resulting encoded bits into a code stream.

It can be understood that the first memory 1002 in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The first memory 1002 of the systems and methods described in this application is intended to include, without being limited to, these and any other suitable types of memory.

The first processor 1003 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the first processor 1003 or by instructions in the form of software. The first processor 1003 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and so on. The steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, or registers. The storage medium is located in the first memory 1002, and the first processor 1003 reads the information in the first memory 1002 and completes the steps of the above method in combination with its hardware.

It can be understood that the embodiments described in this application may be implemented by hardware, software, firmware, middleware, microcode, or a combination thereof. For hardware implementation, the processing unit may be implemented in one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this application, or combinations thereof. For software implementation, the techniques described in this application may be implemented through modules (e.g., procedures, functions, and so on) that perform the functions described herein. Software code may be stored in a memory and executed by a processor; the memory may be implemented within the processor or external to the processor.

Optionally, as another embodiment, the first processor 1003 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.

This embodiment provides an encoding device, which may include the encoding apparatus 90 described in the foregoing embodiments. In this way, visual data corresponding to at least two heterogeneous formats is supported in the same atlas; different metadata decoders can then decode the respective auxiliary information of the at least two heterogeneous formats, and one video decoder can decode the spliced image composed of the at least two heterogeneous formats, which not only extends the codec standard but also reduces the number of video decoders required, makes full use of the processing pixel rate of the video decoder, and lowers hardware requirements; since rendering characteristics from the different heterogeneous formats can be preserved, image synthesis quality is also improved.
In still another embodiment of the present application, based on the same inventive concept as the foregoing embodiments, referring to FIG. 11, which shows a schematic diagram of the composition of a decoding apparatus 110 provided by an embodiment of the present application. As shown in FIG. 11, the decoding apparatus 110 may include a second acquisition unit 1101, a metadata decoding unit 1102, and a video decoding unit 1103, where:

the second acquisition unit 1101 is configured to obtain mosaic atlas information and video data to be decoded according to a code stream;

the metadata decoding unit 1102 is configured to perform metadata decoding on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats;

the video decoding unit 1103 is configured to perform video decoding on the video data to be decoded to obtain a spliced image, where the spliced image is composed of image sub-blocks corresponding to the at least two heterogeneous formats.

In some embodiments, the metadata decoding unit 1102 is specifically configured to invoke at least two kinds of metadata decoders to perform metadata decoding on the mosaic atlas information to obtain the respective auxiliary information of the at least two heterogeneous formats.

In some embodiments, the at least two heterogeneous formats include a first data format and a second data format; correspondingly, the metadata decoding unit 1102 is further configured to: if the currently decoded auxiliary information is the information corresponding to the first data format in the mosaic atlas information, invoke the metadata decoder corresponding to the first data format for decoding to obtain the auxiliary information corresponding to the first data format; and if it is the information corresponding to the second data format, invoke the metadata decoder corresponding to the second data format for decoding to obtain the auxiliary information corresponding to the second data format.

In some embodiments, the first data format is an image format and the second data format is a point cloud format; correspondingly, the metadata decoding unit 1102 is further configured to: if the currently decoded auxiliary information is the information corresponding to the image format in the mosaic atlas information, invoke the multi-view decoder for decoding to obtain the auxiliary information corresponding to the image format; and if it is the information corresponding to the point cloud format, invoke the point cloud decoder for decoding to obtain the auxiliary information corresponding to the point cloud format.

In some embodiments, the at least two heterogeneous formats further include a third data format; correspondingly, the metadata decoding unit 1102 is further configured to: if the currently decoded auxiliary information is the information corresponding to the third data format in the mosaic atlas information, invoke the metadata decoder corresponding to the third data format for decoding to obtain the auxiliary information corresponding to the third data format.

In some embodiments, the video decoding unit 1103 is specifically configured to invoke a video decoder to perform video decoding on the video data to be decoded to obtain the spliced image, where the number of video decoders is one.

In some embodiments, referring to FIG. 11, the decoding apparatus 110 may further include a rendering unit 1104 configured to render the spliced image using the respective auxiliary information of the at least two heterogeneous formats to obtain a target three-dimensional image.

In some embodiments, the second acquisition unit 1101 is further configured to obtain the value of syntax element identification information according to the code stream; and if the syntax element identification information indicates that, in the initial overview part, image sub-blocks corresponding to at least two heterogeneous formats are not supported to coexist in the spliced image, and that, in the mixed overview part, such coexistence is supported, perform the step of obtaining the mosaic atlas information and the data to be decoded according to the code stream.

In some embodiments, referring to FIG. 11, the decoding apparatus 110 may further include a second determining unit 1105 configured to: if the value of the syntax element identification information is the first value in the initial overview part, determine that the syntax element identification information indicates that coexistence of image sub-blocks corresponding to at least two heterogeneous formats in the spliced image is not supported in the initial overview part; and if the value is the second value in the mixed overview part, determine that such coexistence is supported in the mixed overview part.

In some embodiments, the first value equals 0 and the second value equals 1.

It can be understood that, in this embodiment, a "unit" may be part of a circuit, part of a processor, part of a program or software, and so on; it may of course also be a module, or it may be non-modular. The components in this embodiment may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software function module.

If the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, this embodiment provides a computer storage medium storing a computer program which, when executed by a second processor, implements the method described in any one of the foregoing embodiments.

Based on the composition of the above decoding apparatus 110 and the computer storage medium, referring to FIG. 12, which shows a schematic diagram of a specific hardware structure of a decoding device 120 provided by an embodiment of the present application. As shown in FIG. 12, the decoding device 120 may include a second communication interface 1201, a second memory 1202, and a second processor 1203, with the components coupled together through a second bus system 1204. It can be understood that the second bus system 1204 is used to realize connection and communication between these components; besides a data bus, it also includes a power bus, a control bus, and a status signal bus, but for clarity of illustration the various buses are all labeled as the second bus system 1204 in FIG. 12. Specifically:

the second communication interface 1201 is used for receiving and sending signals in the process of transmitting and receiving information to and from other external network elements;

the second memory 1202 is used to store a computer program executable on the second processor 1203;

the second processor 1203 is configured to, when running the computer program, execute: obtaining mosaic atlas information and video data to be decoded according to the code stream; performing metadata decoding on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats; and performing video decoding on the video data to be decoded to obtain a spliced image composed of image sub-blocks corresponding to the at least two heterogeneous formats.

Optionally, as another embodiment, the second processor 1203 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.

It can be understood that the hardware functions of the second memory 1202 are similar to those of the first memory 1002, and the hardware functions of the second processor 1203 are similar to those of the first processor 1003; details are not repeated here.

This embodiment provides a decoding device, which may include the decoding apparatus 110 described in any one of the foregoing embodiments. In this way, visual data corresponding to at least two heterogeneous formats is supported in the same atlas; different metadata decoders can then decode the respective auxiliary information of the at least two heterogeneous formats, and one video decoder can decode the spliced image composed of the at least two heterogeneous formats, which not only extends the codec standard but also reduces the number of video decoders required, makes full use of the processing pixel rate of the video decoder, and lowers hardware requirements; since rendering characteristics from the different heterogeneous formats can be preserved, image synthesis quality is also improved.

In still another embodiment of the present application, referring to FIG. 13, which shows a schematic diagram of the composition of a codec system provided by an embodiment of the present application. As shown in FIG. 13, the codec system 130 may include an encoding device 1301 and a decoding device 1302, where the encoding device 1301 may be the encoding device described in any one of the foregoing embodiments and the decoding device 1302 may be the decoding device described in any one of the foregoing embodiments.

In the embodiments of the present application, the codec system 130 can support visual data corresponding to at least two heterogeneous formats in the same atlas, which not only extends the codec standards but also reduces the number of video decoders required, lowering the hardware requirements; in addition, since rendering characteristics from different heterogeneous formats can be preserved, image synthesis quality is improved.
It should be noted that, in this application, the terms "comprise", "include", or any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus including a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes that element.

The serial numbers of the above embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.

The methods disclosed in the several method embodiments provided in this application can be combined arbitrarily without conflict to obtain new method embodiments.

The features disclosed in the several product embodiments provided in this application can be combined arbitrarily without conflict to obtain new product embodiments.

The features disclosed in the several method or device embodiments provided in this application can be combined arbitrarily without conflict to obtain new method embodiments or device embodiments.

The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto; any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed in this application, which shall all be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Industrial Applicability

In the embodiments of the present application, on the encoding side, image sub-blocks corresponding to visual data in at least two heterogeneous formats are acquired; the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats are spliced to obtain mosaic atlas information and a spliced image; and the mosaic atlas information and the spliced image are encoded, with the resulting encoded bits written into a code stream. On the decoding side, the mosaic atlas information and the video data to be decoded are obtained according to the code stream; metadata decoding is performed on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats; and video decoding is performed on the video data to be decoded to obtain a spliced image composed of image sub-blocks corresponding to the at least two heterogeneous formats. In this way, visual data corresponding to at least two heterogeneous formats is supported in the same atlas; different metadata decoders can then decode the respective auxiliary information of the at least two heterogeneous formats, and one video decoder can decode the spliced image composed of the at least two heterogeneous formats, which not only extends the codec standard but also reduces the number of video decoders required, makes full use of the processing pixel rate of the video decoder, and lowers hardware requirements; since rendering characteristics from the different heterogeneous formats can be preserved, image synthesis quality is also improved.

Claims (26)

  1. A decoding method, the method comprising:
    obtaining mosaic atlas information and video data to be decoded according to a code stream;
    performing metadata decoding on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats;
    performing video decoding on the video data to be decoded to obtain a spliced image, wherein the spliced image is composed of image sub-blocks corresponding to the at least two heterogeneous formats.
  2. The method according to claim 1, wherein performing metadata decoding on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats comprises:
    invoking at least two kinds of metadata decoders to perform metadata decoding on the mosaic atlas information to obtain the respective auxiliary information of the at least two heterogeneous formats.
  3. The method according to claim 1, wherein the at least two heterogeneous formats comprise a first data format and a second data format;
    performing metadata decoding on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats comprises:
    if the currently decoded auxiliary information is the information corresponding to the first data format in the mosaic atlas information, invoking the metadata decoder corresponding to the first data format for decoding to obtain the auxiliary information corresponding to the first data format;
    if the currently decoded auxiliary information is the information corresponding to the second data format in the mosaic atlas information, invoking the metadata decoder corresponding to the second data format for decoding to obtain the auxiliary information corresponding to the second data format.
  4. The method according to claim 3, wherein the first data format is an image format and the second data format is a point cloud format;
    performing metadata decoding on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats comprises:
    if the currently decoded auxiliary information is the information corresponding to the image format in the mosaic atlas information, invoking the multi-view decoder for decoding to obtain the auxiliary information corresponding to the image format;
    if the currently decoded auxiliary information is the information corresponding to the point cloud format in the mosaic atlas information, invoking the point cloud decoder for decoding to obtain the auxiliary information corresponding to the point cloud format.
  5. The method according to claim 3, wherein the at least two heterogeneous formats further comprise a third data format;
    performing metadata decoding on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats further comprises:
    if the currently decoded auxiliary information is the information corresponding to the third data format in the mosaic atlas information, invoking the metadata decoder corresponding to the third data format for decoding to obtain the auxiliary information corresponding to the third data format.
  6. The method according to claim 1, wherein performing video decoding on the video data to be decoded to obtain a spliced image comprises:
    invoking a video decoder to perform video decoding on the video data to be decoded to obtain the spliced image, wherein the number of video decoders is one.
  7. The method according to claim 1, wherein the method further comprises:
    rendering the spliced image using the respective auxiliary information of the at least two heterogeneous formats to obtain a target three-dimensional image.
  8. The method according to any one of claims 1 to 7, wherein the method further comprises:
    obtaining a value of syntax element identification information according to the code stream;
    if the syntax element identification information indicates that, in an initial overview part, the image sub-blocks corresponding to the at least two heterogeneous formats are not supported to coexist in the spliced image, and that, in a mixed overview part, the image sub-blocks corresponding to the at least two heterogeneous formats are supported to coexist in the spliced image, performing the step of obtaining the mosaic atlas information and the data to be decoded according to the code stream.
  9. The method according to claim 8, wherein obtaining the value of the syntax element identification information according to the code stream comprises:
    if the value of the syntax element identification information is a first value in the initial overview part, determining that the syntax element identification information indicates that, in the initial overview part, the image sub-blocks corresponding to the at least two heterogeneous formats are not supported to coexist in the spliced image;
    if the value of the syntax element identification information is a second value in the mixed overview part, determining that the syntax element identification information indicates that, in the mixed overview part, the image sub-blocks corresponding to the at least two heterogeneous formats are supported to coexist in the spliced image.
  10. The method according to claim 9, wherein the first value is equal to 0 and the second value is equal to 1.
  11. An encoding method, the method comprising:
    acquiring image sub-blocks corresponding to visual data in at least two heterogeneous formats;
    splicing the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats to obtain mosaic atlas information and a spliced image;
    encoding the mosaic atlas information and the spliced image, and writing the resulting encoded bits into a code stream.
  12. The method according to claim 11, wherein the mosaic atlas information is formed by splicing respective auxiliary information of the visual data in the at least two heterogeneous formats;
    the spliced image is formed by splicing the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats.
  13. The method according to claim 12, wherein encoding the mosaic atlas information and the spliced image comprises:
    invoking a metadata encoder to perform metadata encoding on the mosaic atlas information; and
    invoking a video encoder to perform video encoding on the spliced image.
  14. The method according to claim 13, wherein:
    the number of video encoders is one;
    there are at least two kinds of metadata encoders, and the number of metadata encoders corresponds to the number of heterogeneous formats.
  15. The method according to claim 13, wherein the at least two heterogeneous formats comprise a first data format and a second data format;
    invoking a metadata encoder to perform metadata encoding on the mosaic atlas information comprises:
    if the currently encoded auxiliary information is the information corresponding to the first data format in the mosaic atlas information, invoking the metadata encoder corresponding to the first data format for encoding;
    if the currently encoded auxiliary information is the information corresponding to the second data format in the mosaic atlas information, invoking the metadata encoder corresponding to the second data format for encoding.
  16. The method according to claim 15, wherein the first data format is an image format and the second data format is a point cloud format;
    invoking a metadata encoder to perform metadata encoding on the mosaic atlas information comprises:
    if the currently encoded auxiliary information is the information corresponding to the image format in the mosaic atlas information, invoking the multi-view encoder for encoding;
    if the currently encoded auxiliary information is the information corresponding to the point cloud format in the mosaic atlas information, invoking the point cloud encoder for encoding.
  17. The method according to claim 15, wherein the at least two heterogeneous formats further comprise a third data format;
    invoking a metadata encoder to perform metadata encoding on the mosaic atlas information further comprises:
    if the currently encoded auxiliary information is the information corresponding to the third data format in the mosaic atlas information, invoking the metadata encoder corresponding to the third data format for encoding.
  18. The method according to any one of claims 11 to 17, wherein the method further comprises:
    determining a value of syntax element identification information;
    encoding the value of the syntax element identification information, and writing the resulting encoded bits into the code stream.
  19. The method according to claim 18, wherein determining the value of the syntax element identification information comprises:
    if the syntax element identification information indicates that, in an initial overview part, the image sub-blocks corresponding to the at least two heterogeneous formats are not supported to coexist in the spliced image, determining that the value of the syntax element identification information is a first value in the initial overview part;
    if the syntax element identification information indicates that, in a mixed overview part, the image sub-blocks corresponding to the at least two heterogeneous formats are supported to coexist in the spliced image, determining that the value of the syntax element identification information is a second value in the mixed overview part.
  20. The method according to claim 19, wherein the first value is equal to 0 and the second value is equal to 1.
  21. A code stream, the code stream being generated by bit-encoding information to be encoded, wherein the information to be encoded comprises at least one of the following: mosaic atlas information, a spliced image, and a value of syntax element identification information.
  22. An encoding apparatus, comprising a first acquisition unit, a splicing unit, and an encoding unit, wherein:
    the first acquisition unit is configured to acquire image sub-blocks corresponding to visual data in at least two heterogeneous formats;
    the splicing unit is configured to splice the image sub-blocks corresponding to the visual data in the at least two heterogeneous formats to obtain mosaic atlas information and a spliced image;
    the encoding unit is configured to encode the mosaic atlas information and the spliced image, and write the resulting encoded bits into a code stream.
  23. An encoding device, comprising a first memory and a first processor, wherein:
    the first memory is configured to store a computer program executable on the first processor;
    the first processor is configured to perform the method according to any one of claims 11 to 20 when running the computer program.
  24. A decoding apparatus, comprising a second acquisition unit, a metadata decoding unit, and a video decoding unit, wherein:
    the second acquisition unit is configured to obtain mosaic atlas information and video data to be decoded according to a code stream;
    the metadata decoding unit is configured to perform metadata decoding on the mosaic atlas information to obtain respective auxiliary information of at least two heterogeneous formats;
    the video decoding unit is configured to perform video decoding on the video data to be decoded to obtain a spliced image, wherein the spliced image is composed of image sub-blocks corresponding to the at least two heterogeneous formats.
  25. A decoding device, comprising a second memory and a second processor, wherein:
    the second memory is configured to store a computer program executable on the second processor;
    the second processor is configured to perform the method according to any one of claims 1 to 10 when running the computer program.
  26. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed, implements the method according to any one of claims 1 to 10 or the method according to any one of claims 11 to 20.
PCT/CN2021/140985 2021-12-23 2021-12-23 Encoding and decoding method, code stream, apparatus, device, and readable storage medium WO2023115489A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/140985 WO2023115489A1 (zh) 2021-12-23 2021-12-23 Encoding and decoding method, code stream, apparatus, device, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/140985 WO2023115489A1 (zh) 2021-12-23 2021-12-23 Encoding and decoding method, code stream, apparatus, device, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2023115489A1 true WO2023115489A1 (zh) 2023-06-29

Family

ID=86901014

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/140985 WO2023115489A1 (zh) Encoding and decoding method, code stream, apparatus, device, and readable storage medium 2021-12-23 2021-12-23

Country Status (1)

Country Link
WO (1) WO2023115489A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200314449A1 (en) * 2017-07-13 2020-10-01 Interdigital Ce Patent Holdings Methods, devices and stream for encoding and decoding volumetric video
US20210099687A1 (en) * 2019-09-26 2021-04-01 Electronics And Telecommunications Research Institute Method for processing immersive video and method for producing immersive video
WO2021117859A1 (ja) * 2019-12-13 2021-06-17 Sony Group Corporation Image processing apparatus and method



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21968629

Country of ref document: EP

Kind code of ref document: A1