WO2025033270A1 - Decoding device, encoding device, decoding method, and encoding method - Google Patents

Info

Publication number
WO2025033270A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
parameter
image layer
layer
task
Legal status
Pending
Application number
PCT/JP2024/027258
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
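Jingying Gao (ジンイン ガオ)
Han Boon Teo (ハン ブン テオ)
Chong Soon Lim (チョン スン リム)
Praveen Kumar Yadav (プラビーン クマール ヤーダブ)
Kiyofumi Abe (清史 安倍)
Takahiro Nishi (孝啓 西)
Toshiyasu Sugio (敏康 杉尾)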
Current Assignee
Panasonic Intellectual Property Corp of America
Original Assignee
Panasonic Intellectual Property Corp of America
Application filed by Panasonic Intellectual Property Corp of America
Priority to CN202480051900.0A (CN121666757A)
Priority to EP24851695.7A (EP4738832A1)
Priority to JP2025539327A (JPWO2025033270A1)
Publication of WO2025033270A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/70: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/85: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression

Definitions

  • This disclosure relates to a decoding device, an encoding device, a decoding method, and an encoding method.
  • Patent document 1 discloses a video encoding and decoding method using an adaptive combined pre-filter and post-filter.
  • Patent document 2 discloses a method for encoding image data for loading into an artificial intelligence (AI) integrated circuit.
  • however, Patent Documents 1 and 2 give insufficient consideration to reducing the processing load on the decoding device in an image processing system that transmits a multi-layered bitstream from an encoding device to a decoding device.
  • the present disclosure aims to reduce the processing load on a decoding device in an image processing system that transmits a multi-layered bitstream from an encoding device to a decoding device.
  • a decoding device includes a circuit and a memory connected to the circuit, the circuit decodes at least one parameter associated with an image layer from a bitstream having a multi-layer structure including at least one image layer, the parameter indicating whether an image decoded from the image layer associated with the parameter is suitable for a predetermined task processing.
  • FIG. 1 is a diagram showing a simplified configuration of an image processing system according to an embodiment of the present disclosure.
  • FIG. 2 is a simplified diagram showing the configuration of a circuit included in the encoding device.
  • FIG. 3 is a flowchart showing a process executed by the circuit included in the encoding device.
  • FIG. 4 is a simplified diagram showing a portion of a bitstream with a multi-layer structure.
  • FIG. 5 is a diagram illustrating an example of parameter setting by a setting unit.
  • FIG. 6A is a diagram showing a first example of syntax for setting parameters.
  • FIG. 6B is a diagram showing a second example of syntax for setting parameters.
  • FIG. 7 is a simplified diagram showing the configuration of a circuit included in the decoding device.
  • FIG. 8 is a flowchart showing a process executed by the circuit included in the decoding device.
  • FIG. 9 is a simplified diagram showing a portion of a bitstream with a multi-layer structure.
  • FIG. 10 is a diagram illustrating an example of parameter setting by a setting unit.
  • FIG. 11 is a diagram illustrating an example of syntax regarding parameter settings.
  • further figures illustrate a first example and a second example of parameter setting by the setting unit.
  • six further figures each illustrate an example of processing in the decoding device.
  • a further figure is a simplified diagram showing a portion of a bitstream with a multi-layer structure, and another illustrates an example of syntax regarding parameter settings.
  • FIG. 17 is a block diagram showing an example of a functional configuration of an encoding unit.
  • FIG. 18 is a block diagram showing an example of a functional configuration of a decoding unit.
  • FIG. 19 is a diagram showing an example of a hierarchical structure of data in a stream.
  • FIG. 20 is a diagram illustrating an example of the configuration of a bit stream.
  • the image processing system includes an encoding device and a decoding device.
  • the encoding device encodes an image into a bitstream and transmits the bitstream containing the encoded image to the decoding device.
  • the decoding device decodes the image from the received bitstream and executes a task process using the decoded image.
  • Task processing includes machine vision and human vision.
  • Machine vision includes object detection, object tracking, object segmentation, action recognition, or pose estimation using machine-learned estimation models.
  • Human vision includes viewing or watching video images by a human, such as an operator or user.
  • a bitstream has a multi-layer structure including multiple image layers, and different images are stored in the different image layers.
  • the preferred image layer that contains the image to be used in the task processing differs depending on the content of the task processing.
  • if the decoding device decodes all image layers, including those that are not preferred for the task processing, a large processing load is imposed on the decoding device.
  • the inventors discovered that by including information indicating whether an image decoded from an image layer is suitable for task processing in a bitstream and transmitting the information from the encoding device to the decoding device, it is possible to avoid unnecessary decoding in the decoding device, thereby solving the above problem, and thus came up with the present disclosure.
  • the decoding device includes a circuit and a memory connected to the circuit, and the circuit decodes at least one parameter associated with an image layer from a bitstream having a multi-layer structure including at least one image layer, and the parameter indicates whether an image decoded from the image layer associated with the parameter is suitable for a predetermined task processing.
  • the decoding device can avoid unnecessary decoding based on the parameters, thereby reducing the processing load on the decoding device and improving processing efficiency.
  • the circuit may further decode an image from an image layer selected from the at least one image layer based on the parameters, and execute the task processing using the image decoded from the image layer.
  • the decoding device can appropriately execute task processing using an image suitable for the task processing.
  • the task processing may include machine vision.
  • the decoding device can appropriately perform machine vision using images suitable for machine vision.
  • the task processing may include human vision.
  • the decoding device can appropriately execute human vision using images suitable for human vision.
  • the task processing may include both machine vision and human vision. In that case, the at least one parameter may include a first parameter indicating whether the image decoded from the image layer is suitable for machine vision, and a second parameter indicating whether the image decoded from the image layer is suitable for human vision.
  • the decoding device can appropriately perform machine vision using an image suitable for machine vision, and can appropriately perform human vision using an image suitable for human vision.
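The two-parameter variant can be pictured as one boolean per task stored with each layer. The following is a minimal Python sketch under that assumption. The class, field and function names are hypothetical and are not taken from the disclosure, and the example values are made up.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    MACHINE_VISION = "machine vision"
    HUMAN_VISION = "human vision"


@dataclass(frozen=True)
class LayerParams:
    """Hypothetical container for the two parameters associated with one image layer."""
    layer_id: int
    mv_suitable: bool  # first parameter: image suitable for machine vision?
    hv_suitable: bool  # second parameter: image suitable for human vision?


def layers_for_task(params: list[LayerParams], task: Task) -> list[int]:
    """Return the IDs of the layers whose parameter marks them as suitable for `task`."""
    if task is Task.MACHINE_VISION:
        return [p.layer_id for p in params if p.mv_suitable]
    return [p.layer_id for p in params if p.hv_suitable]


# Example values only: layer 1 serves both tasks, layers 2 and 3 only human vision.
params = [
    LayerParams(1, mv_suitable=True, hv_suitable=True),
    LayerParams(2, mv_suitable=False, hv_suitable=True),
    LayerParams(3, mv_suitable=False, hv_suitable=True),
]
print(layers_for_task(params, Task.MACHINE_VISION))  # [1]
print(layers_for_task(params, Task.HUMAN_VISION))    # [1, 2, 3]
```

The sketch treats the parameters as a plain filter. It ignores inter-layer prediction dependencies, which a real decoder would still have to respect when it decodes an enhancement layer.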
  • the parameters may include a first value and a second value, the first value indicating that the image decoded from the image layer is suitable for the task processing, and the second value indicating that the image decoded from the image layer is not suitable for the task processing.
  • this prevents the decoding device from decoding images that are not suitable for the task processing.
  • the parameters may include a first value and a second value, the first value indicating that the image decoded from the image layer is suitable for the task processing, and the second value indicating that it is not specified whether the image decoded from the image layer is suitable for the task processing.
  • the decoding device can arbitrarily determine whether or not to decode an image from an image layer associated with a parameter indicating a second value, depending on the processing load status, etc.
  • the circuit may further decode an image from only an image layer associated with the parameter indicating the first value among the at least one image layer, and execute the task processing using the image decoded from the image layer.
  • the decoding device decodes the image only from the image layer associated with the parameter indicating the first value, which makes it possible to further reduce the processing load of the decoding device.
  • the at least one image layer may include an image layer to which the parameter is not associated, and the fact that the parameter is not associated with the image layer may indicate that it has not been determined whether the image decoded from the image layer is suitable for the task processing.
  • the decoding device can arbitrarily determine whether or not to decode an image from an image layer to which no parameters are associated, depending on the processing load status, etc.
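The decoder-side choices described above can be sketched as a single selection function. In the strict policy, only layers flagged with the first value are decoded. Otherwise, layers whose suitability is unspecified or undetermined may also be decoded when load permits. This is a Python illustration under several assumptions: the values 1 and 0, the use of None for a layer with no associated parameter, and the boolean "spare capacity" load signal are all hypothetical.

```python
from __future__ import annotations

from typing import Optional

FIRST_VALUE = 1   # image is suitable for the task processing
SECOND_VALUE = 0  # meaning depends on the variant, see `second_value_means_unsuitable`


def select_layers(
    params: dict[int, Optional[int]],
    spare_capacity: bool,
    second_value_means_unsuitable: bool = False,
) -> list[int]:
    """Decide which image layers to decode.

    `params` maps a layer ID to its parameter value, or to None when no
    parameter is associated with that layer. `spare_capacity` stands in for
    the decoder's view of its own processing load.
    """
    selected = []
    for layer_id in sorted(params):
        value = params[layer_id]
        if value == FIRST_VALUE:
            selected.append(layer_id)  # suitable: always decode
        elif value == SECOND_VALUE and second_value_means_unsuitable:
            continue  # explicitly unsuitable: never decode
        elif spare_capacity:
            selected.append(layer_id)  # unspecified or no parameter: decoder's choice
    return selected


layer_params = {1: FIRST_VALUE, 2: SECOND_VALUE, 3: None}
print(select_layers(layer_params, spare_capacity=False))  # [1]
print(select_layers(layer_params, spare_capacity=True))   # [1, 2, 3]
print(select_layers(layer_params, spare_capacity=True,
                    second_value_means_unsuitable=True))  # [1, 3]
```

Setting `spare_capacity=False` gives the "decode only first-value layers" behavior. The extra keyword switches between the two meanings of the second value described above.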
  • the circuit decodes the at least one parameter from a predetermined header region of the bitstream, and the predetermined header region may include an SEI.
  • the decoding device can easily decode parameters from a predetermined header area of the bitstream.
  • the at least one image layer may include a base layer that is the lowest layer of the multi-layer structure, and the at least one parameter associated with the at least one image layer may be stored in the header area of the base layer.
  • the decoding device can obtain all parameters associated with all image layers together from the header area of the base layer.
  • the at least one parameter associated with the at least one image layer may be stored in the header area of each of the at least one image layer.
  • the decoding device can obtain each parameter associated with each image layer individually from the header area of each image layer.
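The two storage options above differ only in where the decoder looks for the parameters. In this Python sketch, header areas are modeled as dictionaries. The keys `task_params` and `task_param` are hypothetical placeholders, not syntax elements from the disclosure, and the layer with the lowest ID is assumed to be the base layer.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ImageLayer:
    layer_id: int
    header: dict = field(default_factory=dict)  # stands in for the layer's header area
    payload: bytes = b""                        # stands in for the layer's payload area


def params_from_base_layer(layers: list[ImageLayer]) -> dict[int, int]:
    """Variant 1: the base layer header carries one entry per image layer."""
    base = min(layers, key=lambda layer: layer.layer_id)  # assumes lowest ID = base layer
    return dict(base.header.get("task_params", {}))


def params_per_layer(layers: list[ImageLayer]) -> dict[int, Optional[int]]:
    """Variant 2: each layer header carries only its own parameter (None if absent)."""
    return {layer.layer_id: layer.header.get("task_param") for layer in layers}


aggregated = [ImageLayer(1, {"task_params": {1: 1, 2: 0, 3: 0}}), ImageLayer(2), ImageLayer(3)]
distributed = [ImageLayer(1, {"task_param": 1}), ImageLayer(2, {"task_param": 0}), ImageLayer(3)]
print(params_from_base_layer(aggregated))  # {1: 1, 2: 0, 3: 0}
print(params_per_layer(distributed))       # {1: 1, 2: 0, 3: None}
```

With the first option, a decoder can make its whole selection after parsing one header. The second option requires reading each layer's header, but still none of the payloads.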
  • the encoding device includes a circuit and a memory connected to the circuit, and the circuit encodes at least one parameter associated with at least one image layer into a multi-layered bitstream including the image layer, the parameter indicating whether an image in the image layer associated with the parameter is suitable for a predetermined task processing.
  • a decoding device that receives a bitstream can avoid unnecessary decoding based on parameters, thereby reducing the processing load on the decoding device and improving processing efficiency.
  • the task processing may include machine vision.
  • a decoding device that receives a bitstream can appropriately perform machine vision using images suitable for machine vision.
  • the task processing may include human vision.
  • a decoding device that receives a bitstream can appropriately execute human vision using images suitable for human vision.
  • the task processing may include both machine vision and human vision. In that case, the at least one parameter may include a first parameter indicating whether an image in the image layer is suitable for machine vision, and a second parameter indicating whether an image in the image layer is suitable for human vision.
  • a decoding device that receives a bitstream can appropriately perform machine vision using images suitable for machine vision, and can appropriately perform human vision using images suitable for human vision.
  • the parameters may include a first value and a second value, the first value indicating that the image in the image layer is suitable for the task processing, and the second value indicating that the image in the image layer is not suitable for the task processing.
  • this makes it possible to prevent a decoding device that receives the bitstream from decoding an image that is not suitable for the task processing.
  • the parameters may include a first value and a second value, the first value indicating that the image in the image layer is suitable for the task processing, and the second value indicating that it is not determined whether the image in the image layer is suitable for the task processing.
  • the decoding device that receives the bitstream can arbitrarily determine whether or not to decode an image from an image layer associated with a parameter indicating a second value, depending on the processing load status, etc.
  • the at least one image layer may include an image layer to which the parameter is not associated, and not associating the parameter with the image layer may indicate that it is not determined whether or not an image in the image layer is suitable for the task processing.
  • the decoding device that receives the bitstream can arbitrarily determine whether or not to decode an image from an image layer that is not associated with a parameter, depending on the processing load status, etc.
  • the circuit may encode the at least one parameter in a predetermined header area of the bitstream, and the predetermined header area may include an SEI.
  • a decoding device that receives a bitstream can easily decode parameters from a specified header area of the bitstream.
  • the at least one image layer includes a base layer that is the lowest layer of the multi-layer structure, and the at least one parameter associated with the at least one image layer is stored in the header area of the base layer.
  • a decoding device that receives a bitstream can obtain all parameters associated with all image layers together from the header area of the base layer.
  • the at least one parameter associated with the at least one image layer may be stored in the header area of each of the at least one image layer.
  • a decoding device that receives a bitstream can individually obtain each parameter associated with each image layer from the header area of each image layer.
  • a decoding device decodes at least one parameter associated with at least one image layer from a bitstream having a multi-layer structure including the image layer, and the parameter indicates whether or not an image decoded from the image layer associated with the parameter is suitable for a predetermined task processing.
  • the decoding device can avoid unnecessary decoding based on the parameters, thereby reducing the processing load on the decoding device and improving processing efficiency.
  • an encoding device encodes at least one parameter associated with at least one image layer into a multi-layered bitstream including the image layer, the parameter indicating whether an image in the image layer associated with the parameter is suitable for a given task processing.
  • a decoding device that receives a bitstream can avoid unnecessary decoding based on parameters, thereby reducing the processing load on the decoding device and improving processing efficiency.
  • FIG. 1 is a simplified diagram showing the configuration of an image processing system according to an embodiment of the present disclosure.
  • the image processing system includes an encoding device 1, a decoding device 2, and a transmission line NW.
  • Image data D1 is input to the encoding device 1 from an external device.
  • the external device includes a camera that captures moving images.
  • the external device inputs image data D1 of the captured moving images to the encoding device 1.
  • the encoding device 1 generates a bit stream BS based on image data D1.
  • FIG. 20 is a diagram showing an example of the configuration of the bit stream BS.
  • the bit stream BS has a multi-layer configuration including at least one image layer L.
  • in this example, the bit stream BS has a three-layer configuration.
  • the three layers are a first image layer L1, which is the lowest (base) layer, and a second image layer L2 and a third image layer L3, which are higher enhancement layers.
  • the first image layer L1 includes image data of the I frames of a region of interest (ROI).
  • the ROI region corresponds to an object or the like included in an image.
  • the second image layer L2 includes image data of the P frames and B frames of the ROI region.
  • the third image layer L3 includes image data of the background region excluding the ROI region.
  • a low-quality image of the ROI region is obtained from the first image layer L1.
  • a high-quality image of the ROI region is obtained from the first image layer L1 and the second image layer L2.
  • a complete image including the ROI region and the background region is obtained from the first image layer L1, the second image layer L2, and the third image layer L3.
  • examples of multi-layer configurations are not limited to the above examples.
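The three-layer example above can be read as a small lookup table from outputs to the set of layers each one needs. This Python sketch encodes that table and picks the richest output a given set of decoded layers supports. The table contents follow the example in the text. The output names and the helper function are illustrative only.

```python
from __future__ import annotations

from typing import Optional

# Layers needed for each output in the three-layer example, poorest to richest.
OUTPUTS = {
    "low-quality ROI": {1},       # L1: I frames of the ROI
    "high-quality ROI": {1, 2},   # + L2: P and B frames of the ROI
    "complete image": {1, 2, 3},  # + L3: background outside the ROI
}


def best_output(decoded_layers: set[int]) -> Optional[str]:
    """Return the richest output that can be built from the decoded layers."""
    best = None
    for name, needed in OUTPUTS.items():  # dicts keep insertion order
        if needed <= decoded_layers:
            best = name
    return best


print(best_output({1}))        # low-quality ROI
print(best_output({1, 2}))     # high-quality ROI
print(best_output({1, 2, 3}))  # complete image
print(best_output({2, 3}))     # None
```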
  • the encoding device 1 transmits the generated bit stream BS to the decoding device 2 via the transmission path NW.
  • the decoding device 2 receives the bit stream BS.
  • the decoding device 2 obtains an image from the bit stream BS by decoding it, and executes task processing based on the obtained image.
  • the process of obtaining an image from the bit stream may be rephrased as extraction or decoding.
  • Task processing includes machine vision and human vision.
  • Machine vision includes object detection, object tracking, object segmentation, action recognition, pose estimation, etc., using an artificial intelligence (AI) model, which is a machine-learned estimation model.
  • for machine vision, the task processing unit includes an inference device using AI.
  • Human vision includes viewing or watching of a moving image by a human being, such as an operator or a user.
  • for human vision, the task processing unit includes a display device, such as a liquid crystal display or an organic EL display.
  • the transmission line NW may be the Internet, a wide area network (WAN), a local area network (LAN), or any combination of these. It is preferable that the transmission line NW is a private network in which secure communication is ensured by access restrictions.
  • the encoding device 1 includes a circuit 11 and a memory 12 connected to the circuit 11.
  • the circuit 11 includes a processor such as a CPU.
  • the memory 12 includes any recording medium such as a ROM, RAM, HDD, SSD, or semiconductor memory.
  • the memory 12 stores data to be processed by the circuit 11 or data in the middle of processing.
  • the decoding device 2 includes a circuit 21 and a memory 22 connected to the circuit 21.
  • the circuit 21 includes a processor such as a CPU.
  • the memory 22 includes any recording medium such as a ROM, RAM, HDD, SSD, or semiconductor memory.
  • the memory 22 stores data to be processed by the circuit 21 or data in the middle of processing.
  • FIG. 2 is a simplified diagram showing the configuration of the circuit 11 included in the encoding device 1.
  • the circuit 11 has an acquisition unit 31, a setting unit 32, an encoding unit 33, and a transmission unit 34.
  • FIG. 17 is a block diagram showing an example of the functional configuration of the encoding unit 33 according to this embodiment.
  • the encoding unit 33 encodes an image in units of blocks.
  • the encoding unit 33 includes a division unit 102, a subtraction unit 104, a transformation unit 106, a quantization unit 108, an entropy encoding unit 110, an inverse quantization unit 112, an inverse transformation unit 114, an addition unit 116, a block memory 118, a loop filter 120, a frame memory 122, an intra prediction unit 124, an inter prediction unit 126, a prediction control unit 128, and a prediction parameter generation unit 130.
  • the intra prediction unit 124 and the inter prediction unit 126 are configured as part of the prediction processing unit 125.
  • the multiple components of the encoding unit 33 shown in FIG. 17 are implemented by the circuit 11 and memory 12 shown in FIG. 1.
  • the circuit 11 is configured to include a processor such as a CPU.
  • the circuit 11 may be a dedicated or general-purpose electronic circuit for encoding images, or may be a collection of multiple electronic circuits.
  • the circuit 11 may fulfill the role of multiple components of the encoding unit 33 shown in FIG. 17, excluding the components for storing information.
  • Memory 12 may be a dedicated or general-purpose electronic circuit for storing information, or may be a collection of multiple electronic circuits. Memory 12 may be externally connected to circuit 11, or may be built into circuit 11. Memory 12 may be a magnetic disk or optical disk, etc., and may be expressed as storage or recording medium, etc. Memory 12 may be a non-volatile memory, or may be a volatile memory.
  • the memory 12 may store an image to be encoded, or a stream corresponding to the encoded image.
  • the memory 12 may also store a program for the processor to execute the image encoding process.
  • the memory 12 may also function as one of the components for storing information among the multiple components included in the encoding unit 33 shown in FIG. 17. Specifically, the memory 12 may function as the block memory 118 and the frame memory 122 shown in FIG. 17. More specifically, the memory 12 may store a reconstructed image (specifically, a reconstructed block or a reconstructed picture, etc.).
  • the implementation of some of the multiple components shown in FIG. 17 may be omitted, and the execution of some of the multiple processes executed by the multiple components may be omitted.
  • some of the multiple components shown in FIG. 17 may be implemented in another device, and some of the multiple processes executed by the multiple components may be executed by another device.
  • FIG. 3 is a flowchart showing the processing executed by the circuit 11 of the encoding device 1.
  • in step SP11, the acquisition unit 31 acquires image data D1, input from an external device, that indicates the image X to be processed.
  • the acquisition unit 31 inputs the image data D1 to the encoding unit 33.
  • the encoding unit 33 encodes the image X into a bit stream BS.
  • the image X includes an image X1 corresponding to the first image layer L1, an image X2 corresponding to the second image layer L2, and an image X3 corresponding to the third image layer L3.
  • Figure 4 is a simplified diagram showing a portion of a bitstream BS with a multi-layer structure. Only one access unit is shown in Figure 4. An access unit is the smallest processing unit with time attributes, and corresponds to, for example, one frame of a video image.
  • the bitstream BS is composed of multiple access units that are consecutive in time.
  • Each image layer L has a header area 41 and a payload area 42.
  • the encoding unit 33 stores an encoded image obtained by encoding the image X1 in the payload area 42 of the first image layer L1.
  • the encoding unit 33 also stores an encoded image obtained by encoding the image X2 in the payload area 42 of the second image layer L2.
  • the encoding unit 33 also stores an encoded image obtained by encoding the image X3 in the payload area 42 of the third image layer L3.
  • the setting unit 32 sets a parameter P in association with the image layer L.
  • the parameter P indicates whether the image X encoded in the image layer L associated with the parameter P is suitable for a predetermined task processing.
  • the predetermined task processing is arbitrarily set by the setting unit 32 from among machine vision and human vision.
  • in this example, the predetermined task processing is object tracking, which is a type of machine vision.
  • the setting information for the predetermined task processing may be shared in advance by the encoding device 1 and the decoding device 2, or may be included in a bit stream BS and transmitted from the encoding device 1 to the decoding device 2.
  • the parameters P include parameters P1 to P3.
  • the setting unit 32 sets the parameter P1 in association with the first image layer L1, the parameter P2 in association with the second image layer L2, and the parameter P3 in association with the third image layer L3.
  • the setting unit 32 sets the value of each of the parameters P1 to P3 to "1" (first value) or "0" (second value).
  • the value "1" of the parameter P indicates that the image X encoded in the image layer L is suitable for the above-mentioned predetermined task processing (object tracking).
  • the suitability of an image for task processing means that it has been encoded as an image suitable for task processing.
  • the value "0" of the parameter P indicates that the image X encoded in the image layer L is not suitable for the above-mentioned predetermined task processing (object tracking).
  • the unsuitability of an image for task processing means that it has not been encoded as an image suitable for task processing.
  • in this example, the setting unit 32 sets the values of parameters P1, P2, and P3 to "1", "0", and "0", respectively.
  • the setting unit 32 inputs data D2 indicating the setting value of the parameter P to the encoding unit 33.
  • in step SP14, the encoding unit 33 encodes the parameters P into the bit stream BS.
  • the process of encoding the parameters into a bit stream may be rephrased as saving or storing.
  • the encoding unit 33 encodes the parameter P1 in the header area 41 of the first image layer L1.
  • the encoding unit 33 also encodes the parameter P2 in the header area 41 of the second image layer L2.
  • the encoding unit 33 also encodes the parameter P3 in the header area 41 of the third image layer L3.
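The encoder steps above, followed by the decoder's header-only check, can be traced end to end in a short Python sketch. Each image layer is modeled as a header dictionary plus a payload. The `"P"` key stands in for the SEI field, the payloads are placeholder bytes rather than real encoded images, and all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Layer:
    header: dict    # header area 41; the "P" key stands in for the SEI field
    payload: bytes  # payload area 42: an encoded image such as X1, X2 or X3


def build_access_unit(encoded_images: list[bytes], params: list[int]) -> list[Layer]:
    """Encoder side: store each encoded image and its parameter P in one image layer."""
    if len(encoded_images) != len(params):
        raise ValueError("expected exactly one parameter per image layer")
    return [Layer(header={"P": p}, payload=x) for x, p in zip(encoded_images, params)]


def payloads_to_decode(access_unit: list[Layer]) -> list[bytes]:
    """Decoder side: inspect only the headers and keep payloads whose P is 1."""
    return [layer.payload for layer in access_unit if layer.header.get("P") == 1]


# P1 = 1, P2 = 0, P3 = 0 as in the object-tracking example; payloads are placeholders.
au = build_access_unit([b"X1", b"X2", b"X3"], [1, 0, 0])
print(payloads_to_decode(au))  # [b'X1']
```

In this example, only the base-layer payload reaches the decoder core, which is the processing-load saving the disclosure aims for.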
  • the parameter P is stored in a predetermined area of the header area 41.
  • in this example, the predetermined area is the SEI (Supplemental Enhancement Information). However, the predetermined area may instead be the VUI (Video Usability Information), VPS, SPS, PPS, PH (picture header), SH (slice header), APS (adaptation parameter set), a tile header, a system layer header, or the like.
  • Figure 19 is a diagram showing an example of a hierarchical structure of data in a stream.
  • the stream includes, for example, a video sequence.
  • the video sequence includes, for example, a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a Supplemental Enhancement Information (SEI), and multiple pictures.
  • the VPS includes coding parameters common to multiple layers, and coding parameters related to multiple layers or individual layers included in the video image.
  • the SPS includes parameters used for the sequence, i.e., encoding parameters that the decoding device 2 references to decode the sequence.
  • the encoding parameters may indicate, for example, the width or height of a picture. Note that there may be multiple SPSs.
  • the PPS includes parameters used for a picture, i.e., encoding parameters referenced by the decoding device 2 to decode each picture in a sequence.
  • the encoding parameters may include, for example, a reference value of the quantization width used in decoding the picture and a flag indicating the application of weighted prediction. Note that there may be multiple PPSs.
  • the SPS and PPS may simply be referred to as parameter sets.
  • a picture includes a picture header and one or more slices.
  • the picture header includes coding parameters that are referenced by the decoding device 2 to decode the one or more slices.
  • a slice includes a slice header and one or more bricks.
  • the slice header includes coding parameters that are referenced by the decoding device 2 to decode the one or more bricks.
  • a brick contains one or more Coding Tree Units (CTUs), as shown in Figure 19 (D).
  • a picture may not contain slices, but may contain tile groups instead.
  • a tile group contains one or more tiles.
  • a brick may contain slices.
  • a CTU is also called a superblock or a basic division unit.
  • a CTU includes a CTU header and one or more CUs (Coding Units).
  • the CTU header includes coding parameters that are referenced by the decoding device 2 to decode the one or more CUs.
  • a CU may be divided into multiple smaller CUs.
  • a CU includes a CU header, prediction information, and residual coefficient information.
  • the prediction information is information for predicting a CU.
  • the residual coefficient information is information indicating a prediction residual.
  • a CU is basically the same as a PU (Prediction Unit) or a TU (Transform Unit), but may include multiple TUs smaller than a CU.
  • a CU may be processed for each VPDU (Virtual Pipeline Decoding Unit) that constitutes the CU.
  • a VPDU is a fixed unit that can be processed in one stage, for example, when performing pipeline processing in hardware.
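Processing a large CU in VPDU-sized pieces amounts to tiling it with a fixed-size grid. The Python sketch below shows only that tiling. The 64-sample VPDU size is an assumption (it is the luma VPDU size used in VVC), the row-major order is a simplification, and the function name is hypothetical. The sketch does not model any codec's partitioning constraints.

```python
from __future__ import annotations


def vpdu_tiles(x: int, y: int, width: int, height: int,
               vpdu_size: int = 64) -> list[tuple[int, int, int, int]]:
    """Split a CU at (x, y) of size width x height into VPDU-sized pieces.

    Each piece is (x, y, w, h), listed row by row. Pieces at the right or
    bottom edge are smaller when the CU size is not a multiple of vpdu_size.
    """
    tiles = []
    for ty in range(y, y + height, vpdu_size):
        for tx in range(x, x + width, vpdu_size):
            w = min(vpdu_size, x + width - tx)
            h = min(vpdu_size, y + height - ty)
            tiles.append((tx, ty, w, h))
    return tiles


print(len(vpdu_tiles(0, 0, 128, 128)))  # 4
print(vpdu_tiles(0, 0, 128, 32))        # [(0, 0, 64, 32), (64, 0, 64, 32)]
```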
  • a stream may not have some of the multiple hierarchies shown in FIG. 19.
  • the order of these hierarchies may also be changed, and any of the hierarchies may be replaced with other hierarchies.
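The picture-level part of the hierarchy in FIG. 19 is a nesting of containers: picture, slice, brick, CTU, CU. Each level except the brick carries its own header. The Python sketch below models that containment with dataclasses and walks every CU of a picture in order. It is a simplification: the headers are plain dictionaries, recursive CU splitting and the tile-group alternative are omitted, and all names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class CU:
    header: dict = field(default_factory=dict)  # prediction and residual info omitted


@dataclass
class CTU:
    header: dict = field(default_factory=dict)
    cus: list[CU] = field(default_factory=list)


@dataclass
class Brick:
    ctus: list[CTU] = field(default_factory=list)


@dataclass
class Slice:
    header: dict = field(default_factory=dict)
    bricks: list[Brick] = field(default_factory=list)


@dataclass
class Picture:
    header: dict = field(default_factory=dict)
    slices: list[Slice] = field(default_factory=list)


def iter_cus(picture: Picture) -> Iterator[CU]:
    """Visit every CU of a picture in containment order: slice, brick, CTU, CU."""
    for slc in picture.slices:
        for brick in slc.bricks:
            for ctu in brick.ctus:
                yield from ctu.cus


pic = Picture(slices=[Slice(bricks=[Brick(ctus=[CTU(cus=[CU(), CU()]), CTU(cus=[CU()])])])])
print(sum(1 for _ in iter_cus(pic)))  # 3
```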
  • the picture that is the subject of processing currently being performed by a device such as the encoding device 1 or the decoding device 2 is called the current picture. If the processing is encoding, the current picture is synonymous with the picture to be encoded, and if the processing is decoding, the current picture is synonymous with the picture to be decoded.
  • the block (CU or a block of a CU) that is the subject of processing currently being performed by a device such as the encoding device 1 or the decoding device 2 is called the current block. If the processing is encoding, the current block is synonymous with the block to be encoded, and if the processing is decoding, the current block is synonymous with the block to be decoded.
  • FIG. 6A is a diagram showing a first example of syntax for setting parameter P.
  • parameter P is set as the value of mvi_optimized_for_first_vision_task_flag included in an SEI message such as machine_vision_indication. A flag value of "1" indicates that image X encoded in the image layer L associated with parameter P is suitable for task processing, and a flag value of "0" indicates that it is not suitable for task processing.
  • FIG. 6B is a diagram showing a second example of syntax for setting parameter P.
  • parameter P is set as the value of mvi_not_optimized_for_first_vision_task_flag included in an SEI message such as machine_vision_indication. A flag value of "1" indicates that image X encoded in the image layer L associated with parameter P is not suitable for task processing, and a flag value of "0" indicates that it is suitable for task processing.
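The two syntax variants carry the same information with opposite polarity. The sketch below normalizes either form to a single "suitable for task processing" answer. It assumes the flag name and its decoded value are already available, so no SEI parsing is shown. Only the two flag names come from the text; the function name is hypothetical.

```python
OPTIMIZED_FLAG = "mvi_optimized_for_first_vision_task_flag"          # FIG. 6A
NOT_OPTIMIZED_FLAG = "mvi_not_optimized_for_first_vision_task_flag"  # FIG. 6B


def layer_suitable_for_task(flag_name: str, flag_value: int) -> bool:
    """Return True if the image in the associated layer is suitable for the task."""
    if flag_value not in (0, 1):
        raise ValueError("flag value must be 0 or 1")
    if flag_name == OPTIMIZED_FLAG:
        return flag_value == 1   # positive polarity: 1 means suitable
    if flag_name == NOT_OPTIMIZED_FLAG:
        return flag_value == 0   # negative polarity: 0 means suitable
    raise ValueError(f"unknown flag: {flag_name}")


# Both encodings below express "suitable for task processing".
print(layer_suitable_for_task(OPTIMIZED_FLAG, 1), layer_suitable_for_task(NOT_OPTIMIZED_FLAG, 0))
```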
  • the transmission unit 34 transmits the bit stream BS input from the encoding unit 33 to the decoding device 2 via the transmission path NW.
  • FIG. 7 is a simplified diagram showing the configuration of the circuit 21 provided in the decoding device 2.
  • the circuit 21 has a receiving unit 51, a decoding unit 52, and a task processing unit 53.
  • FIG. 18 is a block diagram showing an example of the functional configuration of the decoding unit 52 according to this embodiment.
  • the decoding unit 52 decodes the stream, which is an encoded image, in units of blocks.
  • the decoding unit 52 includes an entropy decoding unit 202, an inverse quantization unit 204, an inverse transform unit 206, an addition unit 208, a block memory 210, a loop filter 212, a frame memory 214, an intra prediction unit 216, an inter prediction unit 218, a prediction control unit 220, a prediction parameter generation unit 222, and a partition determination unit 224.
  • the intra prediction unit 216 and the inter prediction unit 218 are configured as part of the prediction processing unit 215.
  • the multiple components of the decoding unit 52 shown in FIG. 18 are implemented by the circuit 21 and memory 22 shown in FIG. 1.
  • the circuit 21 is configured with a processor such as a CPU.
  • the circuit 21 may be a dedicated or general-purpose electronic circuit for decoding a stream, or may be a collection of multiple electronic circuits.
  • the circuit 21 may fulfill the role of multiple components of the decoding unit 52 shown in FIG. 18, excluding the components for storing information.
  • Memory 22 may be a dedicated or general-purpose electronic circuit that stores information, or may be a collection of multiple electronic circuits. Memory 22 may be externally connected to circuit 21 or built into circuit 21. Memory 22 may be a magnetic disk, an optical disk, or the like, and may be referred to as storage, a recording medium, or the like. Memory 22 may be a non-volatile memory or a volatile memory.
  • the memory 22 may store the stream to be decoded, or may store the decoded image.
  • the memory 22 may also store a program for the processor to execute the process of decoding the stream.
  • the memory 22 may also function as one of the components for storing information included in the decoding unit 52 shown in FIG. 18. Specifically, the memory 22 may function as the block memory 210 and the frame memory 214 shown in FIG. 18. More specifically, the memory 22 may store a reconstructed image (specifically, a reconstructed block or a reconstructed picture, etc.).
  • some of the components shown in FIG. 18 may be omitted from implementation, and some of the processes executed by the components may be omitted from execution.
  • some of the components shown in FIG. 18 may be implemented in another device, and some of the processes executed by the components may be executed by another device.
  • the inverse quantization unit 204, inverse transform unit 206, addition unit 208, block memory 210, frame memory 214, intra prediction unit 216, inter prediction unit 218, prediction control unit 220, and loop filter 212 included in the decoding unit 52 shown in FIG. 18 perform the same processing as the inverse quantization unit 112, inverse transform unit 114, adder unit 116, block memory 118, frame memory 122, intra prediction unit 124, inter prediction unit 126, prediction control unit 128, and loop filter 120 included in the encoding unit 33 shown in FIG. 17.
  • FIG. 8 is a flowchart showing the processing executed by the circuit 21 of the decoding device 2.
  • the receiving unit 51 receives the bit stream BS from the encoding device 1 via the transmission path NW.
  • the receiving unit 51 inputs the received bit stream BS to the decoding unit 52.
  • the decoding unit 52 decodes the parameter P from the bit stream BS.
  • the decoding unit 52 decodes the parameter P1 from the header region 41 of the first image layer L1.
  • the decoding unit 52 also decodes the parameter P2 from the header region 41 of the second image layer L2.
  • the decoding unit 52 also decodes the parameter P3 from the header region 41 of the third image layer L3.
  • the values of the parameters P1, P2, and P3 are set to "1", "0", and "0", respectively.
  • In step SP23, the decoding unit 52 selects an image layer L based on the parameter P decoded in step SP22, and decodes an image X from the selected image layer L.
  • the decoding unit 52 selects the image layer L1, in which the value of the parameter P1 is set to "1", and does not select the image layers L2 and L3, in which the values of the parameters P2 and P3 are set to "0". Therefore, the decoding unit 52 decodes the image X1 from the selected image layer L1 and does not decode the images X2 and X3 from the unselected image layers L2 and L3.
  • the decoding unit 52 inputs the image data D3 of the decoded image X1 to the task processing unit 53.
  • when decoding an image of an upper image layer, the decoding unit 52 refers to the image of the lower image layer. Therefore, when the decoding unit 52 selects the second image layer L2, it decodes the images X1 and X2 but not the image X3, and when it selects the third image layer L3, it decodes the images X1 to X3. However, when the correlation between the images of different image layers is low, the decoding unit 52 does not need to refer to the image of the lower image layer when decoding an image of an upper image layer.
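The selection rule of step SP23 can be sketched together with the inter-layer reference behavior described above. This is an illustrative sketch, not the decoder's actual logic. It assumes that layers are numbered 1 to N upward from the base layer and that each upper layer may reference every lower layer. It also assumes the decoded P values arrive as a dictionary. The function name and the `inter_layer_ref` switch are hypothetical.

```python
from typing import Dict, List


def layers_to_decode(p_values: Dict[int, int], inter_layer_ref: bool = True) -> List[int]:
    """Return the image layers to decode, given parameter P per layer (1 = base layer L1).

    A layer is selected when its P value is 1 (suitable for the task).
    With inter-layer reference, every layer below the highest selected layer
    must also be decoded; without it, only the selected layers are decoded.
    """
    selected = sorted(layer for layer, p in p_values.items() if p == 1)
    if not selected:
        return []
    if inter_layer_ref:
        return list(range(1, selected[-1] + 1))
    return selected


# Example from the text: P1 = 1, P2 = 0, P3 = 0, so only L1 is decoded.
print(layers_to_decode({1: 1, 2: 0, 3: 0}))  # [1]
```

Selecting L2 instead yields `[1, 2]`, which matches the statement that X1 and X2 are decoded but X3 is not.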
  • In step SP24, the task processing unit 53 executes task processing using the image X1 indicated by the image data D3.
  • the task processing unit 53 executes object tracking using the image X1.
  • the parameter P indicates whether the image X decoded from the image layer L associated with the parameter P is suitable for a specific task processing. Therefore, the decoding device 2 can avoid unnecessary decoding based on the parameter P, which makes it possible to reduce the processing load of the decoding device 2.
  • the decoding device 2 also decodes an image X from the image layer L selected based on the parameter P, and executes the task processing using the image X decoded from the image layer L. Therefore, the task processing can be appropriately executed using an image X suitable for the task processing.
  • the task processing includes machine vision. Therefore, the decoding device 2 can appropriately execute the machine vision using the image X that is suitable for the machine vision.
  • the task processing includes human vision. Therefore, the decoding device 2 can appropriately execute human vision using image X that is suitable for human vision.
  • the parameter P takes either a first value or a second value: the first value indicates that the image X decoded from the image layer L is suitable for task processing, and the second value indicates that it is not suitable for task processing. Therefore, the decoding device 2 can avoid decoding an image X that is not suitable for task processing.
  • the decoding device 2 decodes the image X only from the image layer L associated with the parameter P indicating the first value. This makes it possible to further reduce the processing load on the decoding device 2.
  • the decoding device 2 decodes the parameter P from the SEI in the header region 41 of the bitstream BS. Therefore, the decoding device 2 can easily decode the parameter P.
  • the parameters P1 to P3 associated with the image layers L1 to L3 are stored in the header areas 41 of the image layers L1 to L3, respectively. Therefore, the decoding device 2 can individually obtain the parameters P1 to P3 from the header areas 41 of the image layers L1 to L3.
  • Multi-layer encoding is a method of encoding a video stream by classifying it into multiple layers, each of which has different video data. Each layer is associated with a different vision task, and these layers work collectively to compress the image.
  • the decoding device 2 has the flexibility to use these vision-task indications and to perform each vision task according to the decoded layers. Vision tasks such as object detection or object tracking rely on accurate and relevant visual information to make accurate predictions. Therefore, for such vision tasks and for human vision, it may be effective to use high-quality images by including an enhancement layer.
  • a base layer that only includes an ROI may be optimized for machine vision and not suitable for human vision.
  • an enhancement layer that includes residual (non-ROI) or temporal upsampling data is suitable for human vision. Therefore, the method of the present disclosure improves the accuracy of the machine vision model and saves the computational power of the decoding device 2.
  • the predetermined task processing may include machine vision and human vision, and the at least one parameter may include a parameter P (first parameter) indicating whether the image X decoded from the image layer L is suitable for machine vision, and a parameter Q (second parameter) indicating whether the image X decoded from the image layer L is suitable for human vision.
  • Fig. 9 is a simplified diagram showing a part of a bit stream BS having a multi-layer structure.
  • Fig. 9 shows only one access unit.
  • parameter P includes parameters P1 to P3.
  • parameter Q includes parameters Q1 to Q3.
  • the setting unit 32 sets parameters P1 and Q1 in association with the first image layer L1, parameters P2 and Q2 in association with the second image layer L2, and parameters P3 and Q3 in association with the third image layer L3.
  • the setting unit 32 sets the values of parameters P1, P2, and P3 to "1", "0", and "0", respectively, and sets the values of parameters Q1, Q2, and Q3 to "0", "0", and "1", respectively.
  • image X1 is suitable for object tracking
  • images X2 and X3 are not suitable for object tracking.
  • image X3 (and images X1 and X2 in the lower layer) are suitable for human vision.
  • FIG. 11 is a diagram showing an example of syntax for setting parameters P and Q.
  • parameter P is set as the value of mvi_optimized_for_first_vision_task_flag. A flag value of "1" indicates that image X encoded in the image layer L associated with parameter P is suitable for machine vision, and a flag value of "0" indicates that it is not suitable for machine vision.
  • parameter Q is set as the value of mvi_optimized_for_second_vision_task_flag. A flag value of "1" indicates that image X encoded in the image layer L associated with parameter Q is suitable for human vision, and a flag value of "0" indicates that it is not suitable for human vision.
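With two flags per layer, the decoder can choose a different set of layers for each task. The sketch below uses the (P, Q) values from the example above. It assumes that each upper layer references all lower layers, so the decoder decodes from the base layer up to the highest flagged layer. The task names "machine" and "human" and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

# (P, Q) per image layer from the example above: P = machine vision, Q = human vision.
EXAMPLE_PARAMS: Dict[int, Tuple[int, int]] = {1: (1, 0), 2: (0, 0), 3: (0, 1)}


def layers_for_task(params: Dict[int, Tuple[int, int]], task: str) -> List[int]:
    """Layers to decode for 'machine' (uses P) or 'human' (uses Q) vision.

    Assumes each upper layer references all lower layers, so the base layer
    up to the highest flagged layer is decoded.
    """
    index = {"machine": 0, "human": 1}[task]
    flagged = [layer for layer, pq in params.items() if pq[index] == 1]
    if not flagged:
        return []
    return list(range(1, max(flagged) + 1))


print(layers_for_task(EXAMPLE_PARAMS, "machine"))  # [1]
print(layers_for_task(EXAMPLE_PARAMS, "human"))    # [1, 2, 3]
```

The human-vision result reflects the statement that image X3, together with the lower-layer images X1 and X2, is suitable for human vision.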
  • when parameter P or parameter Q indicates whether an image is suitable for human vision, a value indicating suitability for human vision may be set for all image layers in the multi-layer configuration.
  • when parameter P or parameter Q indicates whether an image is suitable for machine vision, the set values may be constrained so that, for at least one of the image layers in the multi-layer configuration, one parameter is set to a value indicating that the image is not suitable for the corresponding machine vision and the other parameter is set to a value indicating that it is suitable for the corresponding machine vision.
  • the at least one parameter includes a parameter P indicating whether or not the image X is suitable for machine vision, and a parameter Q indicating whether or not the image X is suitable for human vision. Therefore, based on the parameters P and Q, the decoding device 2 can appropriately execute machine vision using an image X suitable for machine vision, and can appropriately execute human vision using an image X suitable for human vision.
  • (Second Modification) FIG. 12 is a diagram showing a first setting example of the parameter P by the setting unit 32.
  • the setting unit 32 sets the value of each of the parameters P1 to P3 to "1" (first value) or "0" (second value).
  • the value "1" of the parameter P indicates that the image X encoded in the image layer L is suitable for task processing.
  • the value "0" of the parameter P indicates that it is not specified whether the image X encoded in the image layer L is suitable for task processing.
  • the decoding device 2 that has received the bit stream BS can arbitrarily determine whether or not to decode the image X from the image layer L associated with the parameter P indicating the second value, depending on the processing load situation, etc.
  • the setting unit 32 sets the values of the parameters P1 and P3 to "1" or "0". Furthermore, the setting unit 32 does not associate the parameter P2 with the image layer L2 .
  • the value "1" of the parameter P indicates that the image X encoded in the image layer L is suitable for task processing.
  • the value "0" of the parameter P indicates that the image X encoded in the image layer L is not suitable for task processing. Not associating the parameter P with the image layer L indicates that it is not specified whether the image X encoded in the image layer L is suitable for task processing.
  • the decoding device 2 that has received the bit stream BS can arbitrarily decide whether or not to decode the image X from the image layer L that is not associated with the parameter P, depending on the processing load status, etc.
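The two setting examples give a layer one of three meanings: suitable, not suitable, or unspecified. The sketch below maps a decoded P value, or its absence, to one of these meanings and then applies a decode decision. It treats the decoder's discretion over unspecified layers as a single hypothetical `has_spare_capacity` input. The enum and function names are also hypothetical. Dependencies on lower layers are not modeled here.

```python
from enum import Enum
from typing import Optional


class Suitability(Enum):
    SUITABLE = 1
    NOT_SUITABLE = 2
    UNSPECIFIED = 3


def interpret_p(p: Optional[int], zero_means_unspecified: bool) -> Suitability:
    """Map a decoded parameter P (None if absent for the layer) to its meaning.

    zero_means_unspecified=True follows the first setting example (FIG. 12);
    False follows the second example, where "0" means not suitable and an
    absent parameter means unspecified.
    """
    if p is None:
        return Suitability.UNSPECIFIED
    if p == 1:
        return Suitability.SUITABLE
    if p == 0:
        return Suitability.UNSPECIFIED if zero_means_unspecified else Suitability.NOT_SUITABLE
    raise ValueError("parameter P must be 0 or 1")


def should_decode(s: Suitability, has_spare_capacity: bool) -> bool:
    """Unspecified layers are decoded at the decoder's discretion (here: spare capacity)."""
    if s is Suitability.SUITABLE:
        return True
    if s is Suitability.NOT_SUITABLE:
        return False
    return has_spare_capacity


# Second setting example: P1 = 1, no P2, P3 = 0, under a heavy processing load.
for p in (1, None, 0):
    print(should_decode(interpret_p(p, zero_means_unspecified=False), has_spare_capacity=False))
```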
  • FIGS. 14A to 14E are diagrams for explaining an example of a processing method by which the decoding device 2, having received the bit stream BS, controls the image layers to be decoded according to the processing load status, etc.
  • the decoding device 2 counts the number of image layers to be decoded upward from the first image layer L1, which is the lowest layer, and identifies the image layers to be decoded according to the resulting count value C.
  • the number of image layers that need to be decoded in the decoding device 2 can be easily expressed, thereby simplifying the control for determining the image layer to be decoded depending on the processing load status, etc.
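One way to picture the count-based control is that a count value C selects the layers L1 through LC. In the sketch below, C is derived from a per-layer decoding cost and a budget that stand in for the processing load status. The text does not specify how C is chosen, so the cost model, the budget, and both function names are hypothetical.

```python
from typing import List


def count_within_budget(layer_costs: List[float], budget: float) -> int:
    """Largest C such that decoding layers L1..LC (bottom-up) stays within the budget.

    layer_costs[k] is the hypothetical cost of decoding layer L(k+1).
    """
    total = 0.0
    count = 0
    for cost in layer_costs:
        if total + cost > budget:
            break
        total += cost
        count += 1
    return count


def layers_from_count(count: int, num_layers: int) -> List[int]:
    """Layers identified by count value C, counted upward from the base layer L1."""
    if not 0 <= count <= num_layers:
        raise ValueError("count must be between 0 and the number of layers")
    return list(range(1, count + 1))


c = count_within_budget([1.0, 2.0, 4.0], budget=3.5)
print(c, layers_from_count(c, 3))  # 2 [1, 2]
```

A single integer is enough to describe the decoding range here, because the layers are always counted from the base layer upward.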
  • FIG. 15 is a simplified diagram showing a part of a bitstream BS having a multi-layer structure, in which only one access unit is shown.
  • the encoding unit 33 collectively stores all of the multiple parameters P1 to P3 associated with the multiple image layers L1 to L3 in the header area 41 of the first image layer L1, which is the base layer.
  • FIG. 16 is a diagram showing an example of syntax for setting parameter P.
  • parameter P is set as the value of mvi_optimized_for_first_vision_task_flag[i] in the SEI message using parameter i indicating the layer number of image layer L.
  • the decoding device 2 that receives the bit stream BS can collectively acquire all the parameters P1 to P3 associated with all the image layers L1 to L3 from the header area 41 of the base layer.
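With the indexed syntax of FIG. 16, a decoder can read every layer's flag from one payload in the base layer. The sketch makes several hypothetical assumptions. The payload holds one bit per layer, most significant bit first. The number of layers is already known to the decoder. Index i = 0 corresponds to the base layer L1. The actual SEI syntax may contain further fields, and the function name is hypothetical.

```python
from typing import List


def parse_layer_flags(payload: bytes, num_layers: int) -> List[int]:
    """Read mvi_optimized_for_first_vision_task_flag[i] for i = 0..num_layers-1.

    Assumes a hypothetical layout of one bit per layer, MSB first, with no
    other fields in the payload.
    """
    if num_layers > len(payload) * 8:
        raise ValueError("payload too short for the requested number of layers")
    flags = []
    for i in range(num_layers):
        byte = payload[i // 8]
        flags.append((byte >> (7 - i % 8)) & 1)
    return flags


# Bits 1, 0, 0 -> P1 = 1, P2 = 0, P3 = 0 (assuming i = 0 is the base layer L1).
print(parse_layer_flags(bytes([0b1000_0000]), 3))  # [1, 0, 0]
```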
  • the encoding device 1 stores the parameters P1 to P3 in the base layer of every access unit constituting the bit stream BS. However, the encoding device 1 may store the parameters P1 to P3 only in the base layer of the first access unit of the bit stream BS, in which case the settings of the parameters P1 to P3 in the first access unit are inherited by the second and subsequent access units. The encoding device 1 may also store the parameters P1 to P3 only in the base layer of an intermediate access unit of the bit stream BS.
  • in that case, for the access units preceding the intermediate access unit, it is not specified whether the encoded images are suitable for task processing, and the settings of the parameters P1 to P3 in the intermediate access unit are inherited by the access units that follow it.
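The inheritance behavior can be sketched as a single pass over the access units in decoding order. The sketch assumes that an access unit carrying its own parameters replaces the inherited settings, and that access units before the first carried set have no specified suitability, represented as `None`. The type alias and function name are hypothetical.

```python
from typing import Dict, List, Optional

Params = Dict[int, int]  # layer number -> value of parameter P


def resolve_per_access_unit(stored: List[Optional[Params]]) -> List[Optional[Params]]:
    """Effective P1..P3 for each access unit, in decoding order.

    stored[k] holds the parameters carried in the base layer of access unit k,
    or None if that access unit carries none. A carried set stays in effect for
    later access units until replaced; before the first carried set, suitability
    is unspecified (None).
    """
    effective: List[Optional[Params]] = []
    current: Optional[Params] = None
    for params in stored:
        if params is not None:
            current = dict(params)
        effective.append(None if current is None else dict(current))
    return effective


# Parameters carried only in an intermediate (second) access unit.
p = {1: 1, 2: 0, 3: 0}
print(resolve_per_access_unit([None, p, None]))
```

Copying the dictionaries keeps each access unit's effective settings independent, so a later change to one entry cannot silently alter another.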
  • the encoding unit 33 may collectively store some of the multiple parameters P1 to P3 associated with the multiple image layers L1 to L3 in the header area 41 of the first image layer L1.
  • the encoding unit 33 stores the parameters P1 and P2 in the header area 41 of the first image layer L1, and stores the parameter P3 in the header area 41 of the third image layer L3.
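When the parameters are spread over several header areas, the decoder has to merge them into one view per layer. The sketch below does this for the split just described, with P1 and P2 in the header area of L1 and P3 in that of L3. The P values are borrowed from the earlier example. Treating conflicting values as an error is an assumption, and the function name is hypothetical.

```python
from typing import Dict


def gather_parameters(header_areas: Dict[int, Dict[int, int]]) -> Dict[int, int]:
    """Merge parameter P values found in the header areas of several layers.

    header_areas maps the layer whose header area is read -> {target layer: P value}.
    Conflicting values for the same target layer are treated as an error (an assumption).
    """
    merged: Dict[int, int] = {}
    for carrier in sorted(header_areas):
        for target, value in header_areas[carrier].items():
            if target in merged and merged[target] != value:
                raise ValueError(f"conflicting P values for layer {target}")
            merged[target] = value
    return merged


# P1 and P2 in the header area of L1, P3 in the header area of L3.
print(gather_parameters({1: {1: 1, 2: 0}, 3: {3: 0}}))  # {1: 1, 2: 0, 3: 0}
```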
  • the encoding unit 33 may collectively store a plurality of parameters P1 to P3 associated with a plurality of image layers L1 to L3 in the header area 41 of the first image layer L1 as independent SEI messages, each having the syntax configuration described in FIG. 6 or FIG. 11.
  • a scalable nesting (SN) SEI message may be used to collectively store the SEI messages of the plurality of image layers L1 to L3 in one header area.
  • the present disclosure is particularly useful when applied to an image processing system that includes an encoding device that encodes an image into a bitstream and transmits the bitstream, and a decoding device that decodes an image from the received bitstream.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
PCT/JP2024/027258 2023-08-09 2024-07-31 Decoding device, encoding device, decoding method, and encoding method Pending WO2025033270A1 (ja)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202480051900.0A CN121666757A (zh) 2023-08-09 2024-07-31 Decoding device, encoding device, decoding method, and encoding method
EP24851695.7A EP4738832A1 (en) 2023-08-09 2024-07-31 Decoding device, encoding device, decoding method, and encoding method
JP2025539327A JPWO2025033270A1 (ja) 2023-08-09 2024-07-31

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363531635P 2023-08-09 2023-08-09
US63/531,635 2023-08-09

Publications (1)

Publication Number Publication Date
WO2025033270A1 true WO2025033270A1 (ja) 2025-02-13

Family

ID=94534751

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2024/027258 Pending WO2025033270A1 (ja) 2023-08-09 2024-07-31 Decoding device, encoding device, decoding method, and encoding method

Country Status (4)

Country Link
EP (1) EP4738832A1 (en)
JP (1) JPWO2025033270A1 (ja)
CN (1) CN121666757A (zh)
WO (1) WO2025033270A1 (ja)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9883207B2 (en) 2009-12-31 2018-01-30 Thomson Licensing Dtv Methods and apparatus for adaptive coupled pre-processing and post-processing filters for video encoding and decoding
US10452955B2 (en) 2018-01-15 2019-10-22 Gyrfalcon Technology Inc. System and method for encoding data in an image/video recognition integrated circuit solution
WO2023111384A1 (en) * 2021-12-13 2023-06-22 Nokia Technologies Oy A method, an apparatus and a computer program product for video encoding and video decoding

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DUAN LINGYU; LIU JIAYING; YANG WENHAN; HUANG TIEJUN; GAO WEN: "Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics", IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE, USA, vol. 29, 28 August 2020 (2020-08-28), USA, pages 8680 - 8695, XP011807613, ISSN: 1057-7149, DOI: 10.1109/TIP.2020.3016485 *
HUANG HAOFENG; YANG WENHAN; XIANG WEI; LIU JIAYING; DUAN LING-YU: "Collaborative Scalable Visual Compression for Human-Centered Videos", 2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), IEEE, 27 May 2022 (2022-05-27), pages 2988 - 2992, XP034224231, DOI: 10.1109/ISCAS48785.2022.9937882 *
J. GAO (PANASONIC), H.-B. TEO, C.-S. LIM, K. ABE (PANASONIC): "AHG8/AHG9: On machine vision indication", 31. JVET MEETING; 20230711 - 20230719; GENEVA; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), no. JVET-AE0090, 4 July 2023 (2023-07-04), XP030311290 *

Also Published As

Publication number Publication date
CN121666757A (zh) 2026-03-13
EP4738832A1 (en) 2026-05-06
JPWO2025033270A1 (ja) 2025-02-13

Legal Events

Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 24851695; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2025539327; Country of ref document: JP; Kind code of ref document: A
WWE Wipo information: entry into national phase. Ref document number: 2025539327; Country of ref document: JP
WWE Wipo information: entry into national phase. Ref document number: 2024851695; Country of ref document: EP
WWE Wipo information: entry into national phase. Ref document number: 202647012982; Country of ref document: IN
ENP Entry into the national phase. Ref document number: 2024851695; Country of ref document: EP; Effective date: 20260202
WWP Wipo information: published in national office. Ref document number: 202647012982; Country of ref document: IN
NENP Non-entry into the national phase. Ref country code: DE