WO2019093234A1 - Encoding device, decoding device, encoding method and decoding method - Google Patents

Encoding device, decoding device, encoding method and decoding method

Info

Publication number
WO2019093234A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
processing
convolutional
neural network
convolutional neural
Prior art date
Application number
PCT/JP2018/040801
Other languages
English (en)
Japanese (ja)
Inventor
アレック ホジキンソン
ルカ リザジオ
遠間 正真
西 孝啓
安倍 清史
龍一 加納
Original Assignee
Panasonic Intellectual Property Corporation of America
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panasonic Intellectual Property Corporation of America
Publication of WO2019093234A1 publication Critical patent/WO2019093234A1/fr

Links

Images

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals

Definitions

  • the present disclosure relates to an encoding device, a decoding device, an encoding method, and a decoding method.
  • H.265, also called High Efficiency Video Coding (HEVC)
  • the image space domain is transformed into a coding space domain using a Fourier transform such as discrete cosine transform.
  • the encoding space region transformed by Fourier transform may not be the optimal encoding space region for performing compression on the input image.
  • the present disclosure provides an encoding device and the like that can perform compression of an image in which deterioration of image quality is further suppressed.
  • An encoding apparatus according to an aspect of the present disclosure includes a memory and a circuit accessible to the memory. The circuit accessible to the memory performs compression processing on an input image by converting the input image from an image space region to an encoding space region using a first convolutional neural network model, and performs, using a second convolutional neural network model, a process of extracting a feature amount used in post-processing, which is a process of bringing a decompressed image, which is the result of compression and decompression on the input image, closer to the input image.
  • the encoding device and the like in one aspect of the present disclosure can perform compression of an image with further suppressed deterioration in image quality.
  • FIG. 1 is a diagram showing an MS-SSIM curve of the codec architecture in the comparative example.
  • FIG. 2 is a block diagram showing the configuration of the image processing apparatus according to the first embodiment.
  • FIG. 3 is a block diagram showing an example of a configuration of a coding apparatus according to Embodiment 1.
  • FIG. 4 is a block diagram showing an example of the configuration of the decoding apparatus in the first embodiment.
  • FIG. 5 is a block diagram showing a connection configuration of the convolutional neural network in the first embodiment.
  • FIG. 6 is a block diagram showing an example of a specific connection configuration of the convolutional neural network according to the first embodiment.
  • FIG. 7 is a block diagram showing the configuration of a convolution block in the first embodiment.
  • FIG. 8 is a block diagram showing a configuration of a residual block in the first embodiment.
  • FIG. 9 is a diagram showing an experimental result of verifying the effectiveness of the image processing apparatus according to the first embodiment.
  • FIG. 10 is a block diagram showing an implementation example of the coding apparatus according to Embodiment 1.
  • FIG. 11 is a flowchart of an exemplary operation of the coding apparatus according to Embodiment 1.
  • FIG. 12 is a block diagram showing an implementation example of the decoding apparatus according to the first embodiment.
  • FIG. 13 is a flowchart showing an operation example of the decoding apparatus according to the first embodiment.
  • FIG. 14 is an overall configuration diagram of a content supply system for realizing content distribution service.
  • FIG. 15 is a diagram illustrating an example of a coding structure at the time of scalable coding.
  • FIG. 16 is a diagram illustrating an example of a coding structure at the time of scalable coding.
  • FIG. 17 is a diagram showing an example of a display screen of a web page.
  • FIG. 18 is a diagram showing an example of a display screen of a web page.
  • FIG. 19 is a diagram illustrating an example of a smartphone.
  • FIG. 20 is a block diagram showing a configuration example of a smartphone.
  • An encoding apparatus according to an aspect of the present disclosure includes a memory and a circuit accessible to the memory. The circuit accessible to the memory performs compression processing on an input image by converting the input image from an image space region to an encoding space region using a first convolutional neural network model, and performs, using a second convolutional neural network model, a process of extracting a feature amount used in post-processing, which is a process of bringing a decompressed image, which is the result of compression and decompression on the input image, closer to the input image.
  • In this way, the encoding apparatus, by using the first convolutional neural network model for conversion to the encoding space and the second convolutional neural network model for extracting the feature amount used in post-processing, can perform compression of an image in which deterioration of image quality is further suppressed.
  • the feature amount is high frequency information included in the input image.
  • Thereby, the encoding apparatus extracts, as a feature amount for bringing the decompressed image closer to the input image, high-frequency information included in the input image, which predominantly contains the information lost by the quantization process. Since post-processing can then bring the decompressed image closer to the input image, compression of the image in which deterioration of image quality is further suppressed can be performed.
  • For example, the first convolutional neural network model and the second convolutional neural network model each include two or more convolution blocks and one or more residual blocks. Each of the two or more convolution blocks is a processing block including one or more convolution layers. Each of the one or more residual blocks is a processing block in which the data input to the residual block is input to a convolution group including at least one convolution layer of the two or more convolution blocks, and the data input to the residual block is added to the data output from the convolution group.
  • the encoding apparatus can perform compression of an image in which the deterioration of image quality is further suppressed by using a convolutional neural network model capable of learning and inference with higher accuracy.
  • the one or more residual blocks are two or more residual blocks.
  • the encoding apparatus can perform compression of an image in which the deterioration of image quality is further suppressed by using a convolutional neural network model capable of learning and inference with higher accuracy.
  • For example, the two or more convolution blocks are four or more convolution blocks, and the one or more residual blocks constitute a residual group including at least two of the four or more convolution blocks. At least one convolution block not included in the residual group among the four or more convolution blocks constitutes a first convolution group, and at least one remaining convolution block included in neither the residual group nor the first convolution group constitutes a second convolution group. Data output from the first convolution group is input to the residual group, and data output from the residual group is input to the second convolution group.
  • the encoding apparatus can apply more sophisticated operations to the abstracted feature of the image. Therefore, efficient processing is possible.
  • A decoding apparatus according to an aspect of the present disclosure includes a memory and a circuit accessible to the memory. The circuit accessible to the memory performs decompression processing on an input image by converting the input image from an encoding space region to an image space region using a first convolutional neural network model, and performs, using a second convolutional neural network model, a process of acquiring a feature amount used in post-processing, which is a process of bringing a decompressed image, which is the result of decompression on the input image, closer to the original image of the input image.
  • In this way, the decoding apparatus, by using the first convolutional neural network model for conversion to the image space and the second convolutional neural network model for acquiring the feature amount used in post-processing, can obtain a decompressed image in which deterioration of image quality is further suppressed.
  • For example, the circuit accessible to the memory may further use a third convolutional neural network model to perform, as the post-processing, processing that brings the decompressed image obtained using the first convolutional neural network model closer to the original image, using the feature amount acquired using the second convolutional neural network model. Thereby, the decompressed image can be brought closer to the original image in post-processing, so a decompressed image in which deterioration of image quality is further suppressed can be obtained.
  • An encoding method according to an aspect of the present disclosure performs compression processing on an input image by converting the input image from an image space region to an encoding space region using a first convolutional neural network model, and uses a second convolutional neural network model to extract a feature amount used in post-processing, which is processing for bringing a decompressed image, which is the result of compression and decompression on the input image, closer to the input image. With this encoding method, by using the first convolutional neural network model for conversion to the encoding space and the second convolutional neural network model for extracting the feature amount used in post-processing, compression of an image in which deterioration of image quality is further suppressed can be performed.
  • A decoding method according to an aspect of the present disclosure performs decompression processing on an input image by converting the input image from an encoding space region to an image space region using a first convolutional neural network model, and uses a second convolutional neural network model to acquire a feature amount used in post-processing, which is processing for bringing a decompressed image, which is the result of decompression on the input image, closer to the original image of the input image. With this decoding method, by using the first convolutional neural network model for conversion to the image space and the second convolutional neural network model for acquiring the feature amount used in post-processing, a decompressed image in which deterioration of image quality is further suppressed can be obtained.
  • these general or specific aspects may be realized by a system, an apparatus, a method, an integrated circuit, a computer program, or a non-transitory recording medium such as a computer readable CD-ROM.
  • the present invention may be realized as any combination of a system, an apparatus, a method, an integrated circuit, a computer program, and a recording medium.
  • Embodiment 1 First, an outline of the first embodiment will be described as an example of an image processing apparatus to which the processing and / or configuration described in each aspect of the present disclosure described later can be applied.
  • Note that Embodiment 1 is merely an example of an image processing apparatus, encoding apparatus, or decoding apparatus to which the processing and/or configuration described in each aspect of the present disclosure can be applied, and the processing and/or configuration described in each aspect of the present disclosure can also be implemented in an image processing apparatus, encoding apparatus, or decoding apparatus different from Embodiment 1.
  • For example, among the plurality of components constituting the image processing apparatus, encoding apparatus, or decoding apparatus of Embodiment 1, a component corresponding to a component described in each aspect of the present disclosure may be replaced with the component described in that aspect. In addition, arbitrary changes such as addition, replacement, or deletion of functions or processes may be applied to some of the plurality of components constituting the image processing apparatus, encoding apparatus, or decoding apparatus of Embodiment 1.
  • the manner of implementation of the processing and / or configuration described in each aspect of the present disclosure is not limited to the above example.
  • For example, the processing and/or configuration described in each aspect may be implemented in an apparatus used for a purpose different from the moving picture/image encoding apparatus or the moving picture/image decoding apparatus disclosed in Embodiment 1, or may be implemented alone.
  • the processes and / or configurations described in the different embodiments may be implemented in combination.
  • FIG. 1 is a diagram showing an MS-SSIM curve of the codec architecture in the comparative example.
  • In FIG. 1, the vertical axis indicates MS-SSIM (multi-scale structural similarity) with respect to RGB, and the horizontal axis indicates the compression ratio (bits per pixel).
  • In FIG. 1, H.265 corresponds to a conventional codec, and WaveOne denotes an architecture using a convolutional neural network (CNN).
  • Conventional codecs transform an input image from an image space domain to a coding space domain using a Fourier transform such as the discrete cosine transform. Although the Fourier transform provides many good properties for a codec, the encoding space region obtained by the Fourier transform may not be the optimal encoding space region for performing compression on the input image.
  • In contrast, the image processing apparatus of the present embodiment, by using convolutional neural networks, can perform compression of an image in which deterioration of image quality is further suppressed, and can obtain a decompressed image in which deterioration of quality is further suppressed.
  • the image processing apparatus performs compression processing or decompression processing of an image using two convolutional neural networks. More specifically, the image processing apparatus uses a convolutional neural network model for performing a compression process and a convolutional neural network model for performing a process of extracting feature quantities used in post-processing. In addition, the image processing apparatus uses a convolutional neural network model for performing decompression processing and a convolutional neural network model for performing processing for acquiring feature amounts used in post-processing.
  • the image processing apparatus may include an encoding apparatus and a decoding apparatus.
  • the encoding device encodes an image. That is, the encoding apparatus compresses the original image (input image) to output a compressed image which is a result of the compression on the original image.
  • the decoding device decodes the encoded image. That is, the decoding apparatus performs decompression on the compressed image that is the result of compression on the original image, thereby outputting a decompressed image that is the result of decompression on the compressed image.
  • FIG. 2 is a block diagram showing an example of the configuration of the image processing apparatus 10 according to the present embodiment.
  • FIG. 3 is a block diagram showing an example of a configuration of coding apparatus 100 in the first embodiment.
  • FIG. 4 is a block diagram showing an example of a configuration of decoding apparatus 200 in the first embodiment.
  • In FIG. 3 and FIG. 4, the same elements as in FIG. 2 are denoted by the same reference numerals.
  • The image processing apparatus 10 illustrated in FIG. 2 includes an image encoding unit 101, a post-processing feature extraction unit 102, a quantization unit 103, an entropy coding unit 104, a storage unit 105, an image decoding unit 106, a post-processing feature acquisition unit 107, and a post-processing unit 108.
  • the image processing apparatus 10 may include the encoding apparatus 100 shown in FIG. 3 and the decoding apparatus 200 shown in FIG. 4.
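  • For orientation, the data flow of FIG. 2 can be sketched in Python as follows. Every name below is a hypothetical placeholder for the correspondingly numbered unit (each described in detail later in this section), not an actual API; the identity lambdas merely mark where each unit acts.

      # Hypothetical sketch of the FIG. 2 data flow; identity placeholders only.
      image_encoding_101 = lambda img: img        # 1st CNN: image space -> encoding space
      feature_extraction_102 = lambda img: img    # 2nd CNN: post-processing feature amounts
      quantize_103A = lambda code: code           # quantization
      entropy_encode_104A = lambda code, feats: (code, feats)
      entropy_decode_104B = lambda bits: bits
      dequantize_103B = lambda code: code         # inverse quantization
      image_decoding_106 = lambda code: code      # 1st CNN: encoding space -> image space
      post_process_108 = lambda img, feats: img   # 3rd CNN: bring image close to the input

      def encode(input_image):
          code = quantize_103A(image_encoding_101(input_image))
          return entropy_encode_104A(code, feature_extraction_102(input_image))

      def decode(bits):
          code, feats = entropy_decode_104B(bits)
          return post_process_108(image_decoding_106(dequantize_103B(code)), feats)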
  • the image encoding unit 101 transforms an input image from an image space region to an encoding space region using a first convolutional neural network model.
  • the first convolutional neural network model is subjected to learning for conversion into a coding space region optimal for image compression.
  • the first convolutional neural network model includes two or more convolutional blocks. Also, the first convolutional neural network model includes one or more residual blocks.
  • FIG. 5 is a block diagram showing a connection configuration of convolutional neural network 300 in the first embodiment.
  • FIG. 6 is a block diagram showing an example of a specific connection configuration of convolutional neural network 300 in the first embodiment.
  • FIG. 7 is a block diagram showing a configuration of convolution block 310 in the first embodiment.
  • FIG. 8 is a block diagram showing a configuration of residual block 320 in the first embodiment.
  • As shown in FIG. 5, the first convolutional neural network model includes, for example, one or more convolution blocks 310, one or more residual blocks 320 following the convolution blocks 310, and one or more convolution blocks 330 following the residual blocks 320.
  • the configuration of the first convolutional neural network model is not limited to the configuration of the convolutional neural network 300 shown in FIG.
  • One or more convolutional blocks and one or more residual blocks may be configured in any way.
  • For example, the two or more convolution blocks may be four or more convolution blocks, and the one or more residual blocks may constitute a residual group including at least two of the four or more convolution blocks. At least one convolution block not included in the residual group among the four or more convolution blocks constitutes a first convolution group, and at least one convolution block included in neither the residual group nor the first convolution group constitutes a second convolution group. Data output from the first convolution group is input to the residual group, and data output from the residual group is input to the second convolution group.
  • For example, the first convolutional neural network model may be the convolutional neural network 300 shown in FIG. 6. That is, the first convolutional neural network model includes, for example, two convolution blocks 310 constituting the first convolution group, two residual blocks 320 constituting the residual group, and two convolution blocks 330 constituting the second convolution group.
  • two convolution blocks 310 constituting a first convolution group are connected in series
  • two convolution blocks 330 constituting a second convolution group are connected in series.
  • the two residual blocks 320 that make up the residual group are also arranged in series.
  • Each of the convolution blocks 310 and 330 downsamples the input data by a factor of two. The residual block 320 adds more capacity to the convolutional neural network model while keeping the receptive field the same size as the convolution blocks 310 and 330.
  • the first convolutional neural network model can provide 16 times downsampling on the input image.
  • To achieve this, the first convolutional neural network model needs to have a latent space with high information density while learning strong representations of the input image as the resolution is reduced to 1/16. Having a latent space with high information density eliminates the need to worry about collisions anywhere in the latent space. In addition, a convolutional neural network model with a high-information-density latent space can reconstruct arbitrary images from the latent space. This means that even with quantization, graceful degradation can be maintained while suppressing the introduction of distortion.
  • The convolution block 310 is a processing block including one or more convolution layers, and includes, as shown in FIG. 7, for example, a convolution layer 311, a non-linear activation function 312, and a normalization layer 313. Since the convolution blocks 330 and 322 have the same configuration, the convolution block 310 is described here as an example. In the example illustrated in FIG. 7, data input to the convolution block 310 passes through the convolution layer 311, the non-linear activation function 312, and the normalization layer 313, and is output from the convolution block 310.
  • the convolution layer 311 is a processing layer that performs a convolution operation on the data input to the convolution block 310 and outputs the result of the convolution operation.
  • the convolution layer 311 is configured by, for example, 32 filters with a kernel size of 3 and stride 2.
  • the nonlinear activation function 312 is a function that outputs an operation result using data output from the convolution layer 311 as an argument.
  • The non-linear activation function 312 controls its output according to a bias.
  • the normalization layer 313 normalizes the data output from the non-linear activation function 312 and outputs normalized data in order to suppress data bias. In the present embodiment, the normalization layer 313 normalizes data output from the nonlinear activation function 312 using Batch Normalization that smoothes data values.
  • The residual block 320 is a processing block including a convolution group that includes at least one convolution layer 311 of the convolution blocks described above. The residual block 320 inputs the data input to it into the convolution group, and adds that input data to the data output from the convolution group.
  • residual block 320 includes two convolutional blocks 322 connected in series, as shown for example in FIG. For example, data input to residual block 320 is input to one convolutional block 322 (ie, left convolutional block 322 in FIG. 8). Then, data output from one convolution block 322 is input to the other convolution block 322 (that is, the right convolution block 322 in FIG. 8).
  • the convolution block 322 has the same configuration as the convolution block 310 or 330.
  • data input to the residual block 320 is added to data output from the right convolution block 322 and output from the residual block 320. That is, the data input to the residual block 320 and the data output from the right convolution block 322 are summed and output from the residual block 320.
  • In the example of FIG. 8, two convolution blocks 322 are connected in series as the convolution group 321, but three or more convolution blocks 322 may be connected in series.
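  • As an illustration of FIGS. 5 to 8, the connection configuration of the convolutional neural network 300 can be sketched in PyTorch as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: ReLU is assumed for the non-linear activation function 312, stride 1 is assumed inside the residual blocks so that the skip addition is shape-compatible, and the channel counts are illustrative.

      import torch
      import torch.nn as nn

      class ConvBlock(nn.Module):
          # Convolution layer 311 -> non-linear activation 312 -> normalization 313 (FIG. 7).
          def __init__(self, in_ch, out_ch, stride):
              super().__init__()
              self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1)
              self.act = nn.ReLU()                # assumed form of activation function 312
              self.norm = nn.BatchNorm2d(out_ch)  # Batch Normalization (layer 313)

          def forward(self, x):
              return self.norm(self.act(self.conv(x)))

      class ResidualBlock(nn.Module):
          # Two serially connected convolution blocks 322 plus a skip connection (FIG. 8).
          def __init__(self, ch):
              super().__init__()
              self.group = nn.Sequential(ConvBlock(ch, ch, 1), ConvBlock(ch, ch, 1))

          def forward(self, x):
              return x + self.group(x)  # input data is added to the convolution-group output

      class Network300(nn.Module):
          # FIG. 5/6: two convolution blocks 310, two residual blocks 320, two blocks 330.
          # The four stride-2 convolutions give 2^4 = 16x downsampling of the input image.
          def __init__(self, in_ch=3, ch=32):
              super().__init__()
              self.net = nn.Sequential(
                  ConvBlock(in_ch, ch, 2), ConvBlock(ch, ch, 2),  # first convolution group
                  ResidualBlock(ch), ResidualBlock(ch),           # residual group
                  ConvBlock(ch, ch, 2), ConvBlock(ch, ch, 2))     # second convolution group

          def forward(self, x):
              return self.net(x)

      # e.g. a 128 x 128 training patch maps to an 8 x 8 latent representation (128 / 16 = 8)
      z = Network300()(torch.randn(1, 3, 128, 128))
      assert z.shape[-2:] == (8, 8)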
  • Note that the image encoding unit 101 may perform the conversion of the input image from the image space region to the encoding space region using a conventional method, that is, a Fourier transform such as the discrete cosine transform, instead of the first convolutional neural network model.
  • the post-processing feature extraction unit 102 performs processing for extracting feature quantities used in post-processing using the second convolutional neural network model.
  • the feature amount is high frequency information included in the input image.
  • Post-processing is processing for bringing a decompressed image, which is the result of compression and decompression on an input image, closer to the input image.
  • the second convolutional neural network model is subjected to learning for performing processing for extracting feature quantities used in post-processing.
  • the second convolutional neural network model includes two or more convolutional blocks.
  • The second convolutional neural network model also includes one or more residual blocks. That is, the configuration of the second convolutional neural network model is, for example, the same as that of the first convolutional neural network model, but may be different.
  • the configuration of the first convolutional neural network model is as described above, and thus the description thereof is omitted.
  • The quantization unit 103 quantizes the data output from the image encoding unit 101, and inversely quantizes the quantized data. The quantization unit 103 includes, for example, the quantization unit 103A illustrated in FIG. 3 or the inverse quantization unit 103B illustrated in FIG. 4.
  • the quantization unit 103A quantizes the data output from the image coding unit 101.
  • The quantization unit 103A of the present embodiment is configured as, for example, a quantizer that controls the granularity according to (Equation 1). As a result, not only can smooth quantization be performed, but errors in reconstruction (inverse quantization) can also be suppressed.
  • In the quantization unit 103A of the present embodiment, the quantizer that controls the granularity according to (Equation 1) uses rounding instead of rounding up. As a result, the quantized representation is not full bits but loses half of the bits.
  • Although the quantization unit 103A of the present embodiment can perform quantization smoothly, it does not use vector quantization or perceptual metrics. When perceptual metrics are not used, large distortion can be introduced at the time of reconstruction (inverse quantization). That is, the quantization unit 103A of the present embodiment has room for further improvement because it lacks vector quantization and perceptual metric functions.
  • The inverse quantization unit 103B performs inverse quantization on the compressed image (input image) decoded by the entropy decoding unit 104B. Specifically, the inverse quantization unit 103B inversely quantizes the data quantized by the quantization unit 103A, that is, the compressed image (input image) decoded by the entropy decoding unit 104B. Similar to the quantization unit 103A, the inverse quantization unit 103B may be a dequantizer that uses rounding instead of rounding up and controls the granularity according to (Equation 1).
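  • Since (Equation 1) itself is not reproduced in this excerpt, the rounding behavior described above can only be illustrated with a generic uniform quantizer; the granularity parameter `delta` below is a hypothetical stand-in for the control of (Equation 1).

      import numpy as np

      def quantize(x, delta):
          # Round to nearest (not up), as described for the quantization unit 103A;
          # `delta` is a hypothetical granularity parameter standing in for (Equation 1).
          return np.round(x / delta).astype(np.int32)

      def dequantize(q, delta):
          # Inverse quantization (cf. inverse quantization unit 103B).
          return q.astype(np.float32) * delta

      x = np.array([0.12, -0.47, 0.90], dtype=np.float32)
      x_hat = dequantize(quantize(x, delta=0.25), delta=0.25)
      assert np.all(np.abs(x - x_hat) <= 0.25 / 2 + 1e-6)  # error bounded by delta / 2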
  • the entropy coding unit 104 performs compression and decompression processing on the input image.
  • The entropy coding unit 104 includes, for example, the entropy coding unit 104A shown in FIG. 3 or the entropy decoding unit 104B shown in FIG. 4.
  • the entropy coding unit 104A entropy codes the data output from the quantization unit 103A.
  • In the present embodiment, the entropy coding unit 104A performs adaptive binary arithmetic coding suited to the learned representation in order to remove all redundancy from the quantized representation. Specifically, the entropy coding unit 104A acquires, as the context of the pixel to be encoded, all pixels of the quantized representation that precede the pixel to be encoded.
  • the entropy coding unit 104A creates a histogram for all the previous pixels acquired as contexts. This histogram is used as a probability table by the entropy coding unit 104A.
  • Although coding by this method is simple, it performs the same function as classical arithmetic coding or an entropy coder using deep learning. That is, coding by this method is simpler than the CABAC used in H.264/H.265, yet the results that can be obtained are fully adequate.
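  • The context model described above (a histogram over all previously coded pixels used as the probability table) can be sketched as follows. The +1 (Laplace) smoothing that keeps every symbol codable is an assumption not stated in the source, and the arithmetic coder itself is omitted.

      import numpy as np

      def probability_table(previous_pixels, num_symbols):
          # Histogram over all already-coded quantized symbols (the context),
          # normalized into the probability table that drives the arithmetic coder.
          counts = np.bincount(previous_pixels, minlength=num_symbols) + 1  # assumed smoothing
          return counts / counts.sum()

      # While coding symbol i, the model is built from every symbol before it:
      stream = np.array([3, 3, 1, 0, 3, 2])
      for i in range(1, len(stream)):
          p = probability_table(stream[:i], num_symbols=4)
          bits = -np.log2(p[stream[i]])  # ideal arithmetic-coding cost of stream[i]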
  • the entropy coding unit 104A may entropy code the data output from the quantization unit 103A and the feature quantity extracted by the post-processing feature extraction unit 102.
  • the entropy decoding unit 104B entropy decodes the compressed image (input image). Specifically, the entropy decoding unit 104B entropy decodes the compressed image (input image) using adaptive binary arithmetic coding. The detailed method is the same as the entropy coding, so the description is omitted.
  • the storage unit 105 stores the data entropy-coded by the entropy coding unit 104.
  • the storage unit 105 also outputs the stored entropy-coded data to the entropy decoding unit 104B.
  • the image decoding unit 106 transforms the input image from the encoding space region to the image space region using the first convolutional neural network model.
  • the first convolutional neural network model here is subjected to learning for conversion into an image space area optimal for image decompression.
  • the first convolutional neural network model here also includes two or more convolutional blocks.
  • The first convolutional neural network model here also includes one or more residual blocks. That is, the configuration of the first convolutional neural network model used by the image decoding unit 106 is the same as the configuration of the first convolutional neural network model used by the image encoding unit 101, which is as described above, so the description is omitted.
  • Note that the image decoding unit 106 may perform the conversion (inverse transform) of the input image from the encoding space region to the image space region using a conventional method, that is, an inverse Fourier transform such as the inverse discrete cosine transform, instead of the first convolutional neural network model.
  • the post-processing feature acquisition unit 107 uses the second convolutional neural network model to perform processing for acquiring feature quantities used in post-processing.
  • the feature amount is high frequency information included in the input image.
  • the post-processing is processing for bringing a decompressed image, which is a result of decompression on an input image, closer to an original image of the input image.
  • the configuration of the second convolutional neural network model is the same as the configuration of the second convolutional neural network model used by the post-processing feature extraction unit 102. Therefore, the configuration of the second convolutional neural network model is as described above, and thus the description thereof is omitted.
  • For example, when the compressed image and the feature amount are input to the entropy decoding unit 104B as entropy-coded data, the post-processing feature acquisition unit 107 acquires the feature amount by extracting it from the expanded (decompressed) entropy-coded data.
  • the post-processing feature acquisition unit 107 may acquire the feature quantity extracted by the post-processing feature extraction unit 102.
  • the post-processing unit 108 performs a process for bringing the decompressed image closer to the input image, and outputs the decompressed image subjected to the process as an output image.
  • the post-processing unit 108 performs post-processing using the third convolutional neural network model.
  • post-processing is processing for bringing a decompressed image obtained using the first convolutional neural network model closer to the original image, using feature amounts acquired using the second convolutional neural network model.
  • That is, using the third convolutional neural network model, the post-processing unit 108 performs processing to bring the decompressed image obtained by the image decoding unit 106 closer to the original image, using the feature amount acquired by the post-processing feature acquisition unit 107.
  • The third convolutional neural network model consists of a series of convolution blocks that maintain a constant receptive field. More specifically, the third convolutional neural network model includes two or more convolution blocks, and also includes one or more residual blocks. That is, the configuration of the third convolutional neural network model may be the same as the configurations of the first and second convolutional neural network models, which are as described above, so the description is omitted.
  • The post-processing unit 108 can improve the quality of the image using the third convolutional neural network model. As a result, the image encoding unit 101 and the like can perform aggressive image compression while maintaining the value of MS-SSIM.
  • The post-processing unit 108 causes the post-processing feature acquisition unit 107 to acquire high-frequency information extracted from the original image in order to improve image quality.
  • the high frequency information extracted from the original image is entropy coded together with the quantized image, and is entropy decoded in the entropy decoding unit 104B.
  • the entropy decoded high frequency information may be further decoded into the image space by the post-processing feature acquisition unit 107 or the entropy decoding unit 104B.
  • The post-processing unit 108 adds the high-frequency information decoded into the image space, acquired by the post-processing feature acquisition unit 107, to the decompressed image converted from the encoding space region to the image space region by the image decoding unit 106. This makes it possible to reintroduce, into the decompressed image, the details lost due to quantization, so a higher MS-SSIM can be obtained.
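  • As a rough illustration of this reintroduction step: in the sketch below the high-frequency information is obtained with a classical Gaussian high-pass split rather than with the second convolutional neural network model used in the embodiment, and the simple addition stands in for the post-processing unit 108 (which in the embodiment applies a third convolutional neural network model).

      import numpy as np
      from scipy.ndimage import gaussian_filter

      def extract_high_frequency(original):
          # Classical stand-in for the 2nd CNN: original minus a low-pass (blurred) copy.
          return original - gaussian_filter(original, sigma=2.0)

      def reintroduce_detail(decompressed, high_freq):
          # Add back the detail lost by quantization (cf. post-processing unit 108),
          # assuming images are float arrays with values in [0, 1].
          return np.clip(decompressed + high_freq, 0.0, 1.0)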
  • the first to third convolutional neural networks are described as having residual blocks connected in a residual manner, but the present invention is not limited to this. Other architectures may be applied.
  • a feedback structure may be applied, such as a Recurrent Neural Network or a Recursive Neural Network.
  • the output of one or more convolutional blocks may be used as the input of the one or more convolutional blocks.
  • the residual connection may then be used in the reverse direction.
  • As described above, according to the image processing apparatus 10 of the present embodiment, it is possible to perform compression of an image in which deterioration of image quality is further suppressed using convolutional neural networks, and to obtain a decompressed image in which deterioration of quality is further suppressed.
  • That is, by using convolutional neural networks, it is possible to realize the image processing apparatus 10 that can compress an image while further suppressing deterioration of image quality and can obtain a decompressed image in which deterioration of quality is further suppressed.
  • In the present embodiment, image compression is performed using a convolutional neural network that does not employ a GAN (Generative Adversarial Network). This is because the present embodiment focuses on the basic network architecture needed for good image modeling, rather than on the very difficult hyperparameter search needed for a GAN to achieve good convergence.
  • GAN may be employed to obtain better results.
  • FIG. 9 is a diagram showing an experimental result on effectiveness verification of the image processing apparatus 10 in the first embodiment.
  • FIG. 9 shows experimental results when learned with the RAISE 6K data set and verified with the KODAK test data set.
  • Encoder corresponds to the image encoding unit 101
  • PostProcessor corresponds to the post-processing unit 108.
  • All Modules corresponds to the image processing apparatus 10.
  • The RAISE-6K data set is a data set consisting of raw natural images, comprising 6,000 4K photographs evenly divided into seven categories, including indoor, outdoor, nature, people, objects, and buildings.
  • From each image, ten patches of 128 × 128 pixels in size are randomly extracted to create a data set of training images.
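  • A minimal sketch of this patch extraction, assuming the image is a NumPy array; the function name and seeding are illustrative:

      import numpy as np

      def random_patches(image, num=10, size=128, seed=0):
          # Randomly crop `num` patches of size x size pixels from one photograph,
          # as done to build the training set from the RAISE-6K images.
          rng = np.random.default_rng(seed)
          h, w = image.shape[:2]
          ys = rng.integers(0, h - size + 1, num)
          xs = rng.integers(0, w - size + 1, num)
          return [image[y:y + size, x:x + size] for y, x in zip(ys, xs)]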
  • The KODAK data set is a test data set composed of natural images and consists of 24 images of 768 × 512 pixels. Note that the natural images constituting the KODAK data set include various colors and textures, making it a difficult data set for image compression.
  • As shown in FIG. 9, it was found that the image processing apparatus 10 can perform compression of an image in which deterioration of image quality is further suppressed, and can obtain a decompressed image in which deterioration of quality is further suppressed.
  • FIG. 10 is a block diagram showing an implementation example of the coding apparatus 100 according to the first embodiment.
  • the encoding device 100 includes a circuit 160 and a memory 162.
  • a part of the image processing apparatus 10 shown in FIG. 2 and a plurality of components of the encoding apparatus 100 shown in FIG. 3 are implemented by the circuit 160 and the memory 162 shown in FIG.
  • the circuit 160 is a circuit that performs information processing and can access the memory 162.
  • the circuit 160 is a dedicated or general-purpose electronic circuit that encodes an image.
  • the circuit 160 may be a processor such as a CPU.
  • the circuit 160 may also be an assembly of a plurality of electronic circuits.
  • the circuit 160 may play a role of a plurality of components excluding the component for storing information among the plurality of components of the encoding device 100 illustrated in FIG. 3 and the like.
  • the memory 162 is a dedicated or general-purpose memory in which information for the circuit 160 to encode an image is stored.
  • the memory 162 may be an electronic circuit or may be connected to the circuit 160.
  • the memory 162 may also be included in the circuit 160.
  • the memory 162 may be a collection of a plurality of electronic circuits.
  • the memory 162 may be a magnetic disk or an optical disk, or may be expressed as a storage or a recording medium.
  • the memory 162 may be a non-volatile memory or a volatile memory.
  • the memory 162 may store a moving image composed of a plurality of images to be encoded, or may store a bit string corresponding to the encoded image.
  • the memory 162 may also store a program for the circuit 160 to encode a moving image.
  • Also, the memory 162 may store a plurality of convolutional neural network models.
  • the memory 162 may store a plurality of parameters of a plurality of convolutional neural network models.
  • Note that, in the encoding apparatus 100, not all of the plurality of components shown in FIG. 3 and the like need be implemented, and not all of the plurality of processes described above need be performed. Some of the plurality of components shown in FIG. 3 and the like may be included in another apparatus, and some of the plurality of processes described above may be executed by another apparatus.
  • FIG. 11 is a flowchart showing an operation example of the coding apparatus 100 shown in FIG.
  • the coding apparatus 100 shown in FIG. 10 performs the operation shown in FIG.
  • First, the circuit 160 of the encoding apparatus 100, using the memory 162, performs compression processing on the input image by converting the input image from the image space region to the encoding space region using the first convolutional neural network model (S101). Next, the circuit 160 of the encoding apparatus 100, using the memory 162, performs a process of extracting, using the second convolutional neural network model, a feature amount used in post-processing, which is a process of bringing a decompressed image, which is the result of compression and decompression on the input image, closer to the input image (S102).
  • Thereby, the encoding apparatus 100 can perform compression of an image in which deterioration of image quality is further suppressed, by using the first convolutional neural network model for conversion to the encoding space and the second convolutional neural network model for extracting the feature amount used in post-processing.
  • FIG. 12 is a block diagram showing an implementation example of the decoding apparatus 200 according to the first embodiment.
  • the decoding device 200 includes a circuit 260 and a memory 262.
  • a part of the image processing apparatus 10 shown in FIG. 2 and a plurality of components of the decoding apparatus 200 shown in FIG. 4 are implemented by the circuit 260 and the memory 262 shown in FIG.
  • the circuit 260 is a circuit that performs information processing and can access the memory 262.
  • circuit 260 is a dedicated or general purpose electronic circuit that uses memory 262 to decode the compressed image.
  • the circuit 260 may be a processor such as a CPU.
  • the circuit 260 may be a collection of a plurality of electronic circuits.
  • the circuit 260 may play a role of a plurality of components excluding the component for storing information among the plurality of components of the decoding apparatus 200 illustrated in FIG. 4 and the like.
  • the memory 262 is a dedicated or general-purpose memory in which information for the circuit 260 to decode a compressed image or a decompressed image after decoding is stored.
  • the memory 262 may be an electronic circuit or may be connected to the circuit 260. Also, the memory 262 may be included in the circuit 260. Further, the memory 262 may be a collection of a plurality of electronic circuits. Also, the memory 262 may be a magnetic disk or an optical disk, or may be expressed as a storage or a recording medium.
  • the memory 262 may be either a non-volatile memory or a volatile memory.
  • For example, the memory 262 may store a bit string corresponding to an encoded image, or may store a decompressed image corresponding to a decoded bit string.
  • the memory 262 may also store a program for the circuit 260 to decode an image.
  • the memory 262 may store a plurality of convolutional neural network models.
  • the memory 262 may store a plurality of parameters of a plurality of convolutional neural network models.
  • Note that, in the decoding apparatus 200, not all of the plurality of components shown in FIG. 4 and the like need be implemented, and not all of the plurality of processes described above need be performed. Some of the plurality of components shown in FIG. 4 and the like may be included in another apparatus, and some of the plurality of processes described above may be executed by another apparatus.
  • FIG. 13 is a flow chart showing an operation example of the decoding apparatus 200 shown in FIG.
  • the decoding apparatus 200 shown in FIG. 12 performs the operation shown in FIG.
  • First, the circuit 260 of the decoding apparatus 200, using the memory 262, performs decompression processing on the input image by converting the input image from the encoding space region to the image space region using the first convolutional neural network model (S411).
  • Next, the circuit 260 of the decoding apparatus 200, using the memory 262, performs a process of acquiring, using the second convolutional neural network model, a feature amount used in post-processing, which is a process of bringing the decompressed image, which is the result of decompression on the input image, closer to the original image of the input image (S412).
  • Thereby, the decoding apparatus 200 can obtain a decompressed image in which deterioration of image quality is further suppressed, by using the first convolutional neural network model for conversion to the image space and the second convolutional neural network model for acquiring the feature amount used in post-processing.
  • Note that the encoding apparatus 100 and the decoding apparatus 200 in the present embodiment may be used as an image encoding apparatus that encodes an image such as an intra picture and an image decoding apparatus that decodes a compressed image, respectively. Furthermore, the encoding apparatus 100 and the decoding apparatus 200 in the present embodiment may each be used as a moving image encoding apparatus that encodes each of a plurality of images and a moving image decoding apparatus that decodes each of a plurality of compressed images.
  • At least a part of the present embodiment may be used as a coding method, may be used as a decoding method, or may be used as another method.
  • each component may be configured by dedicated hardware or implemented by executing a software program suitable for each component.
  • Each component may be realized by a program execution unit such as a CPU or a processor reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.
  • the image processing apparatus 10 may include a processing circuit (Processing Circuitry) and a storage device (Storage) electrically connected to the processing circuit and accessible to the processing circuit.
  • For example, the processing circuit corresponds to the circuit 160 or 260, and the storage device corresponds to the memory 162 or 262.
  • the processing circuit includes at least one of dedicated hardware and a program execution unit, and executes processing using a storage device.
  • the storage device stores a software program executed by the program execution unit.
  • the software for realizing the image processing apparatus 10 and the like of the present embodiment is a program as follows.
  • That is, this program causes a computer to execute an encoding method that performs compression processing on an input image by converting the input image from the image space region to the encoding space region using the first convolutional neural network model, and that uses a second convolutional neural network model to extract a feature amount used in post-processing, which is processing for bringing a decompressed image, which is the result of compression and decompression on the input image, closer to the input image.
  • Also, this program may cause the computer to execute a decoding method that performs decompression processing on an input image by converting the input image from the encoding space region to the image space region using the first convolutional neural network model, and that uses the second convolutional neural network model to acquire a feature amount used in post-processing, which is processing for bringing a decompressed image, which is the result of decompression on the input image, closer to the original image of the input image.
  • each component may be a circuit as described above. These circuits may constitute one circuit as a whole or may be separate circuits. Each component may be realized by a general purpose processor or a dedicated processor.
  • Also, ordinal numbers such as "first" and "second" may be attached to components and the like as appropriate.
  • Although aspects of the image processing apparatus 10 have been described above based on the embodiment, aspects of the image processing apparatus 10 are not limited to this embodiment. Various modifications that those skilled in the art may conceive of may be applied to the present embodiment without departing from the spirit of the present disclosure, and forms configured by combining components in different embodiments may also be included within the scope of aspects of the image processing apparatus 10.
  • This aspect may be practiced in combination with at least some of the other aspects in this disclosure. Also, part of the processing or part of the configuration of this aspect may be implemented in combination with other aspects.
  • In addition, part of the processing described in the flowchart of this aspect, part of the configuration of the apparatus, part of the syntax, and the like may be implemented in combination with other aspects.
  • each of the functional blocks can usually be realized by an MPU, a memory, and the like. Further, the processing by each of the functional blocks is usually realized by a program execution unit such as a processor reading and executing software (program) recorded in a recording medium such as a ROM.
  • the software may be distributed by downloading or the like, or may be distributed by being recorded in a recording medium such as a semiconductor memory.
  • Further, each embodiment may be realized by centralized processing using a single apparatus (system), or may be realized by distributed processing using a plurality of apparatuses.
  • the processor that executes the program may be singular or plural. That is, centralized processing may be performed, or distributed processing may be performed.
  • the system is characterized by having an image coding apparatus using an image coding method, an image decoding apparatus using an image decoding method, and an image coding / decoding apparatus provided with both.
  • Other configurations in the system can be suitably modified as the case may be.
  • FIG. 14 is a diagram showing an overall configuration of a content supply system ex100 for realizing content distribution service.
  • the area for providing communication service is divided into desired sizes, and base stations ex106, ex107, ex108, ex109 and ex110, which are fixed wireless stations, are installed in each cell.
  • In this content supply system ex100, each device such as a computer ex111, a game machine ex112, a camera ex113, a home appliance ex114, and a smartphone ex115 is connected to the Internet ex101 via the Internet service provider ex102 or the communication network ex104 and the base stations ex106 to ex110.
  • the content supply system ex100 may connect any of the above-described elements in combination.
  • the respective devices may be connected to each other directly or indirectly via a telephone network, near-field radio, etc., not via the base stations ex106 to ex110 which are fixed wireless stations.
  • the streaming server ex103 is connected to each device such as the computer ex111, the game machine ex112, the camera ex113, the home appliance ex114, and the smartphone ex115 via the Internet ex101 or the like.
  • the streaming server ex103 is connected to a terminal or the like in a hotspot in the aircraft ex117 via the satellite ex116.
  • a radio access point or a hotspot may be used instead of base stations ex106 to ex110.
  • the streaming server ex103 may be directly connected to the communication network ex104 without the internet ex101 or the internet service provider ex102, or may be directly connected with the airplane ex117 without the satellite ex116.
  • The camera ex113 is a device such as a digital camera that can shoot still images and moving images.
  • The smartphone ex115 is a smartphone, a mobile phone, a PHS (Personal Handyphone System), or the like that supports the mobile communication systems generally called 2G, 3G, 3.9G, or 4G, or, in the future, 5G.
  • the home appliance ex118 is a refrigerator or a device included in a home fuel cell cogeneration system.
  • In the content supply system ex100, when a terminal having a photographing function is connected to the streaming server ex103 through the base station ex106 or the like, live distribution and the like become possible.
  • In live distribution, a terminal (a computer ex111, a game machine ex112, a camera ex113, a home appliance ex114, a smartphone ex115, a terminal in an airplane ex117, etc.) performs the encoding process described in each embodiment on still image or moving image content captured by the user using the terminal, multiplexes video data obtained by the encoding with sound data obtained by encoding the sound corresponding to the video, and transmits the obtained data to the streaming server ex103. That is, each terminal functions as an image encoding apparatus according to an aspect of the present disclosure.
  • Meanwhile, the streaming server ex103 streams the transmitted content data to a client that has made a request.
  • the client is a computer ex111, a game machine ex112, a camera ex113, a home appliance ex114, a smartphone ex115, a terminal in the airplane ex117, or the like capable of decoding the above-described encoded data.
  • Each device that has received the distributed data decodes and reproduces the received data. That is, each device functions as an image decoding apparatus according to an aspect of the present disclosure.
  • the streaming server ex103 may be a plurality of servers or a plurality of computers, and may process, record, or distribute data in a distributed manner.
  • the streaming server ex103 may be realized by a CDN (Contents Delivery Network), and content delivery may be realized by a network connecting a large number of edge servers distributed around the world and the edge servers.
  • In a CDN, physically close edge servers are dynamically assigned according to the client, and delay can be reduced by caching and delivering the content to those edge servers. In addition, processing can be distributed among multiple edge servers, the distribution entity can be switched to another edge server, and delivery can be continued while bypassing a portion of the network where a failure has occurred, so high-speed and stable delivery can be realized.
  • each terminal may perform encoding processing of captured data, or may perform processing on the server side, or may share processing with each other.
  • For example, in general, an encoding processing loop is performed twice. In the first loop, the complexity or code amount of the image is detected in units of frames or scenes. In the second loop, processing is performed to maintain image quality and improve coding efficiency. For example, by having the terminal perform a first encoding process and having the server that received the content perform a second encoding process, the quality and efficiency of the content can be improved while reducing the processing load on each terminal. In this case, the first encoded data produced by the terminal can also be received and reproduced by another terminal, enabling more flexible real-time delivery.
  • the camera ex113 or the like extracts a feature amount from an image, compresses data relating to the feature amount as metadata, and transmits it to the server.
  • the server performs compression according to the meaning of the image, for example, determining the importance of an object from the feature amount and switching the quantization accuracy (a sketch of this follows this group of items).
  • feature amount data is particularly effective in improving the accuracy and efficiency of motion vector prediction during the second compression at the server.
  • alternatively, the terminal may perform simple coding such as VLC (variable-length coding) and the server may perform coding with a large processing load such as CABAC (context-adaptive binary arithmetic coding).
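A sketch of switching quantization accuracy according to object importance follows; the importance map, the QP range, and the linear mapping are illustrative assumptions.

```python
import numpy as np

def qp_map_from_importance(importance: np.ndarray,
                           qp_min: int = 22, qp_max: int = 42) -> np.ndarray:
    # Normalize importance to [0, 1]; important regions get finer quantization.
    norm = (importance - importance.min()) / (np.ptp(importance) + 1e-9)
    return np.round(qp_max - norm * (qp_max - qp_min)).astype(int)

# Usage: a 2x4 block importance map, e.g. derived from detected objects.
importance = np.array([[0.1, 0.2, 0.9, 0.8],
                       [0.0, 0.1, 0.4, 0.3]])
print(qp_map_from_importance(importance))  # low QP where importance is high
```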
  • when there are a plurality of video data in which substantially the same scene is shot by a plurality of terminals, distributed processing is performed by allocating the encoding process in units of, for example, GOPs (Group of Pictures), pictures, or tiles into which a picture is divided, using the plurality of terminals that performed the shooting and, as necessary, other terminals and servers that are not shooting. This reduces delay and realizes better real-time performance (a GOP-assignment sketch follows).
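The allocation in GOP or tile units can be pictured with the following sketch; the round-robin policy and worker names are assumptions for illustration only.

```python
from typing import Dict, List

def assign_gops(num_frames: int, gop_size: int,
                workers: List[str]) -> Dict[str, List[range]]:
    # Split the sequence into GOPs and hand them out round-robin.
    assignments: Dict[str, List[range]] = {w: [] for w in workers}
    for gop_index, start in enumerate(range(0, num_frames, gop_size)):
        worker = workers[gop_index % len(workers)]
        assignments[worker].append(range(start, min(start + gop_size, num_frames)))
    return assignments

print(assign_gops(100, 16, ["terminal_a", "terminal_b", "server"]))
```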
  • the server may manage and/or give instructions so that the video data captured by each terminal can be mutually referenced.
  • the server may receive the encoded data from each terminal and change the reference relationships among the plurality of data, or may correct or replace a picture itself and re-encode it. This makes it possible to generate a stream with enhanced quality and efficiency for each piece of data.
  • the server may deliver the video data after performing transcoding for changing the coding method of the video data.
  • for example, the server may convert an MPEG-based encoding system into a VP-based system, or convert H.264 into H.265.
  • in this way, the encoding process can be performed by a terminal or by one or more servers. Therefore, although expressions such as "server" or "terminal" are used below as the subject performing the processing, part or all of the processing performed by the server may be performed by the terminal, and part or all of the processing performed by the terminal may be performed by the server. The same applies to the decoding process.
  • the server may not only encode a two-dimensional moving image, but may also encode a still image automatically based on scene analysis of the moving image or at a time designated by the user, and transmit it to the receiving terminal. Furthermore, if the server can acquire the relative positional relationship between the imaging terminals, it can generate the three-dimensional shape of the scene based not only on the two-dimensional moving image but also on videos of the same scene captured from different angles. Note that the server may separately encode three-dimensional data generated by a point cloud or the like, or may generate an image to be transmitted to the receiving terminal by selecting or reconstructing it from videos taken by a plurality of terminals, based on the result of recognizing or tracking a person or an object using the three-dimensional data.
  • the user can enjoy a scene by arbitrarily selecting each video corresponding to each photographing terminal, and can also enjoy content in which video of an arbitrary viewpoint is cut out from three-dimensional data reconstructed using a plurality of images or videos.
  • the sound may also be picked up from a plurality of different angles, and the server may multiplex the audio from a specific angle or space with the corresponding video and transmit it.
  • the server may create viewpoint videos for the right eye and the left eye, respectively, and either perform coding that allows reference between the viewpoint videos using Multi-View Coding (MVC) or the like, or encode them as separate streams without mutual reference. When decoding the separate streams, they may be reproduced in synchronization with each other so that a virtual three-dimensional space is reproduced according to the user's viewpoint.
  • the server superimposes virtual object information in the virtual space on camera information in the real space based on the three-dimensional position or the movement of the user's viewpoint.
  • the decoding apparatus may acquire or hold virtual object information and three-dimensional data, generate a two-dimensional video according to the movement of the user's viewpoint, and create the superimposed data by smoothly connecting them. Alternatively, the decoding apparatus may transmit the movement of the user's viewpoint to the server in addition to the request for virtual object information, and the server may create superimposed data matching the received movement of the viewpoint from the three-dimensional data it holds, encode the superimposed data, and distribute it to the decoding apparatus.
  • the superimposed data has an α value indicating transparency in addition to RGB, and the server sets the α value of portions other than the object created from the three-dimensional data to 0 or the like, and may encode the data with those portions made transparent.
  • alternatively, the server may set an RGB value of a predetermined value as the background, as in chroma keying, and generate data in which portions other than the object are set to the background color (an alpha-compositing sketch follows).
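The α-value handling above amounts to ordinary alpha compositing; the following is a minimal sketch under that assumption, with the image sizes and pixel values invented for the example.

```python
import numpy as np

def composite(camera_rgb: np.ndarray, object_rgba: np.ndarray) -> np.ndarray:
    # alpha = 0 outside the object leaves the camera image untouched there.
    alpha = object_rgba[..., 3:4].astype(np.float32) / 255.0
    fg = object_rgba[..., :3].astype(np.float32)
    bg = camera_rgb.astype(np.float32)
    return (alpha * fg + (1.0 - alpha) * bg).astype(np.uint8)

# Usage: a 2x2 camera image; only the top-left pixel belongs to the object.
cam = np.full((2, 2, 3), 200, dtype=np.uint8)
obj = np.zeros((2, 2, 4), dtype=np.uint8)
obj[0, 0] = [255, 0, 0, 255]  # opaque red object pixel
print(composite(cam, obj))
```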
  • the decoding processing of the distributed data may be performed by each terminal as a client, may be performed on the server side, or may be shared between them.
  • for example, one terminal may first send a reception request to the server, another terminal may receive the content corresponding to that request and perform the decoding, and the decoded signal may be transmitted to a device having a display. Data of high image quality can be reproduced by distributing the processing and selecting appropriate content regardless of the performance of the communicable terminal itself.
  • a viewer's personal terminal may decode and display a partial area, such as a tile into which a picture is divided. Thereby, while sharing the whole image, each viewer can confirm at hand the area for which the viewer is responsible or an area to be checked in more detail.
  • it is also possible to switch the bit rate of the received data based on the accessibility of the encoded data over the network, such as when the encoded data is cached on a server that can be accessed from the receiving terminal in a short time, or copied to an edge server in a content delivery service.
  • the switching of content will be described using a scalable stream that is compression-coded by applying the moving picture coding method described in each of the above embodiments, as shown in the figure.
  • the server may hold a plurality of streams of the same content but of different qualities as individual streams, but may instead be configured to switch content by exploiting a temporally/spatially scalable stream realized by coding in layers, as shown in the figure. That is, the decoding side can freely switch between low-resolution content and high-resolution content and decode it, by deciding which layer to decode based on internal factors such as its own performance and external factors such as the state of the communication band. For example, when a user wants to continue watching, on a device such as an Internet TV after returning home, the video that was being watched on the smartphone ex115 while on the move, the device only needs to decode the same stream up to a different layer, so the burden on the server side can be reduced (a layer-selection sketch follows).
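A minimal sketch of that decoding-side choice follows; the bandwidth threshold and capability flag are illustrative assumptions, not values specified by the disclosure.

```python
def choose_layers(bandwidth_kbps: float, can_decode_enhancement: bool) -> list:
    # Internal factor: decoder performance; external factor: communication band.
    layers = ["base"]
    if bandwidth_kbps > 5000 and can_decode_enhancement:
        layers.append("enhancement")
    return layers

print(choose_layers(8000, True))  # ['base', 'enhancement'] -> high resolution
print(choose_layers(1200, True))  # ['base'] -> falls back to low resolution
```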
  • furthermore, in addition to the configuration that realizes scalability by encoding pictures for each layer, with an enhancement layer existing above the base layer, the enhancement layer may include meta information based on statistical information of the image or the like, and the decoding side may generate high-image-quality content by super-resolving a picture of the base layer based on the meta information.
  • the super resolution may be either an improvement in the SN ratio at the same resolution or an expansion of the resolution.
  • the meta information includes information for identifying linear or nonlinear filter coefficients used in the super-resolution processing, or information for identifying parameter values in filter processing, machine learning, or least-squares operations used in the super-resolution processing (a sketch follows).
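As a hedged sketch of super-resolving a base-layer picture with filter coefficients carried as meta information: the nearest-neighbour upsampling and the 3x3 sharpening kernel below are assumptions for illustration, not coefficients defined by this disclosure.

```python
import numpy as np

def upsample_nearest(img: np.ndarray, factor: int = 2) -> np.ndarray:
    # Enlarge the base-layer picture before applying the signalled filter.
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def apply_meta_filter(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    # Plain 2D convolution with linear filter coefficients from meta information.
    pad = kernel.shape[0] // 2
    padded = np.pad(img.astype(np.float32), pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float32)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.sum(
                padded[y:y + kernel.shape[0], x:x + kernel.shape[1]] * kernel)
    return np.clip(out, 0, 255).astype(np.uint8)

base = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
meta_kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], np.float32)
sr_picture = apply_meta_filter(upsample_nearest(base), meta_kernel)
```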
  • the picture may be divided into tiles or the like according to the meaning of objects or the like in the image, and the decoding side may decode only a partial area by selecting the tile to decode.
  • the decoding side can identify the position of a desired object based on the meta information and determine the tile containing the object. For example, as shown in FIG. 16, the meta information is stored using a data storage structure different from the pixel data, such as an SEI message in HEVC. This meta information indicates, for example, the position, size, or color of the main object.
  • meta information may be stored in units of a plurality of pictures, such as streams, sequences, or random access units.
  • the decoding side can acquire the time at which a specific person appears in the video, and by combining this with the picture-unit information, can identify the picture in which the object exists and the position of the object within the picture (a tile-selection sketch follows).
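A sketch of mapping an object position from meta information to the tiles to decode follows; the tile grid and the (x, y, width, height) box format are assumptions for the example.

```python
from typing import List, Tuple

def tiles_for_object(box: Tuple[int, int, int, int],
                     tile_w: int, tile_h: int, cols: int) -> List[int]:
    # box = (x, y, width, height) of the object signalled in meta information.
    x, y, w, h = box
    tiles = set()
    for ty in range(y // tile_h, (y + h - 1) // tile_h + 1):
        for tx in range(x // tile_w, (x + w - 1) // tile_w + 1):
            tiles.add(ty * cols + tx)  # raster-scan tile index
    return sorted(tiles)

# A 1920x1080 picture split into 640x360 tiles (3 columns, 3 rows):
print(tiles_for_object((600, 100, 200, 200), 640, 360, cols=3))  # [0, 1]
```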
  • FIG. 17 is a diagram showing an example of a display screen of a web page in the computer ex111 and the like.
  • FIG. 18 is a diagram showing an example of a display screen of a web page in the smartphone ex115 and the like.
  • the web page may include a plurality of link images that are links to image content, and their appearance differs depending on the browsing device.
  • when multiple link images are visible on the screen, until the user explicitly selects a link image, until a link image approaches the center of the screen, or until the entire link image falls within the screen, the display device (decoding device) displays a still image or I picture of each content as the link image, displays video such as a gif animation using a plurality of still images or I pictures, or receives only the base layer and decodes and displays the video.
  • when a link image is selected by the user, the display device decodes the base layer with the highest priority.
  • if the HTML constituting the web page contains information indicating that the content is scalable, the display device may decode up to the enhancement layer.
  • before the selection is made or when the communication band is very tight, the display device decodes and displays only forward-referenced pictures (I pictures, P pictures, and forward-referenced-only B pictures), thereby reducing the delay between the decoding time of the leading picture and the display time (the delay from the start of decoding of the content to the start of display). The display device may also deliberately ignore the reference relationships of pictures, coarsely decode all B pictures and P pictures with forward reference, and then perform normal decoding as time passes and the number of received pictures increases (a picture-filtering sketch follows).
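The forward-reference-only behaviour can be pictured as a filter over the incoming pictures; this is a sketch, with the picture-type tuples being an assumed representation.

```python
def pictures_to_decode(pictures, low_delay: bool):
    # Each picture is (type, uses_backward_reference).
    if not low_delay:
        return list(pictures)  # normal decoding once enough pictures arrive
    return [p for p in pictures
            if p[0] in ("I", "P") or (p[0] == "B" and not p[1])]

gop = [("I", False), ("B", True), ("B", False), ("P", False)]
print(pictures_to_decode(gop, low_delay=True))  # backward-referenced B skipped
```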
  • when transmitting or receiving still image or video data such as two-dimensional or three-dimensional map information for automated driving or driving assistance of a car, the receiving terminal may receive, as meta information, information on weather or construction in addition to image data belonging to one or more layers, and decode these in association with each other. The meta information may belong to a layer or may simply be multiplexed with the image data.
  • since a car, drone, airplane, or the like including the receiving terminal moves, the receiving terminal can realize seamless reception and decoding while switching among the base stations ex106 to ex110 by transmitting its position information at the time of a reception request. In addition, the receiving terminal can dynamically switch how much of the meta information it receives or how much of the map information it updates, according to the user's selection, the user's situation, or the state of the communication band.
  • the client can receive, decode, and reproduce the encoded information transmitted by the user in real time.
  • the server may perform the encoding process after performing the editing process. This can be realized, for example, with the following configuration.
  • the server performs recognition processing such as shooting-error detection, scene search, semantic analysis, and object detection on the original image or the encoded data, in real time at the time of shooting or after accumulation. Then, based on the recognition result, the server makes edits manually or automatically, such as correcting out-of-focus shots or camera shake, deleting scenes of low importance such as scenes that are darker than other pictures or out of focus, emphasizing the edges of objects, or changing color tones. The server encodes the edited data based on the editing result. It is also known that the audience rating drops if the shooting time is too long, so the server may automatically clip, based on the image-processing result, not only the scenes of low importance described above but also scenes with little motion, so that the content fits within a specific time range according to the shooting time. Alternatively, the server may generate and encode a digest based on the result of semantic analysis of the scene.
  • the server may intentionally change the face of a person at the periphery of the screen, the inside of a house, or the like into an out-of-focus image and encode it.
  • the server may recognize whether the face of a person different from a person registered in advance appears in the image to be encoded, and if so, perform processing such as applying a mosaic to the face portion. Alternatively, as preprocessing or post-processing of the encoding, the user may designate, from the viewpoint of copyright or the like, a person or a background area to be processed in the image, and the server may replace the designated area with another video or blur the focus. In the case of a person, the image of the face portion can be replaced while tracking the person in the moving image.
  • the decoding apparatus first receives the base layer with the highest priority and performs decoding and reproduction, although this depends on the bandwidth. The decoding apparatus may receive the enhancement layer during this period, and when the video is played back two or more times, such as when playback is looped, may play back high-quality video including the enhancement layer.
  • if the stream is scalable-coded in this way, it is possible to provide an experience in which the moving image is rough when unselected or at the start of viewing, but the stream gradually becomes smarter and the image improves. The same experience can be provided even when the coarse stream reproduced first and a second stream encoded with reference to the first moving image are configured as one stream.
  • these encoding and decoding processes are generally performed in the LSI ex500 included in each terminal. The LSI ex500 may consist of a single chip or a plurality of chips.
  • software for moving image encoding or decoding may be incorporated into some kind of recording medium (such as a CD-ROM, flexible disk, or hard disk) readable by the computer ex111 or the like, and the encoding or decoding may be performed using that software.
  • moving image data acquired by the camera may be transmitted; the moving image data at this time is data encoded by the LSI ex500 included in the smartphone ex115. The LSI ex500 may be configured to download and activate application software.
  • the terminal first determines whether it supports the content's coding scheme or has the capability to execute a specific service. If the terminal does not support the content's coding scheme or does not have the capability to execute the specific service, it downloads the codec or application software and then acquires and reproduces the content (a capability-check sketch follows).
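A minimal sketch of that capability check follows; the codec names and the download step are placeholders for whatever mechanism the terminal actually uses.

```python
SUPPORTED = {"hevc", "avc"}

def prepare_playback(content_codec: str) -> str:
    if content_codec in SUPPORTED:
        return f"play with built-in {content_codec} decoder"
    # Otherwise download the codec or application software first.
    SUPPORTED.add(content_codec)  # stands in for installing the downloaded codec
    return f"downloaded {content_codec} support, then play"

print(prepare_playback("hevc"))  # supported out of the box
print(prepare_playback("vp9"))   # triggers the download path
```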
  • the present disclosure is not limited to the content supply system ex100 via the Internet ex101; at least the moving picture coding apparatus (image coding apparatus) or the moving picture decoding apparatus (image decoding apparatus) of each of the above embodiments can also be incorporated into a digital broadcasting system. There is a difference in that the broadcasting system is multicast-oriented, since multiplexed data in which video and sound are multiplexed is carried on broadcast radio waves using satellites or the like, whereas the content supply system ex100 is easily configured for unicast; however, similar applications are possible for the encoding and decoding processes.
  • FIG. 19 is a diagram showing the smartphone ex115.
  • FIG. 20 is a diagram showing a configuration example of the smartphone ex115.
  • the smartphone ex115 includes an antenna ex450 for transmitting and receiving radio waves to and from the base station ex110, a camera unit ex465 capable of capturing video and still images, and a display unit ex458 for displaying video captured by the camera unit ex465 and decoded data such as video received via the antenna ex450.
  • the smartphone ex115 further includes an operation unit ex466 such as a touch panel, an audio output unit ex457 such as a speaker for outputting voice or sound, an audio input unit ex456 such as a microphone for inputting voice, a memory unit ex467 capable of storing captured video or still images, recorded audio, received video or still images, and encoded or decoded data such as mail, and a slot unit ex464 that is an interface unit with a SIM ex468 for identifying the user and authenticating access to the network and various data. Note that an external memory may be used instead of the memory unit ex467.
  • a main control unit ex460 that comprehensively controls the display unit ex458, the operation unit ex466, and the like is connected to a power supply circuit unit ex461, an operation input control unit ex462, a video signal processing unit ex455, a camera interface unit ex463, a display control unit ex459, a modulation/demodulation unit ex452, a multiplexing/demultiplexing unit ex453, an audio signal processing unit ex454, the slot unit ex464, and the memory unit ex467 via a bus ex470.
  • the power supply circuit unit ex461 supplies power from the battery pack to each unit, thereby activating the smartphone ex115 into an operable state.
  • the smartphone ex115 performs processing such as call and data communication based on control of the main control unit ex460 having a CPU, a ROM, a RAM, and the like.
  • in the voice call mode, the audio signal collected by the audio input unit ex456 is converted into a digital audio signal by the audio signal processing unit ex454, subjected to spread-spectrum processing by the modulation/demodulation unit ex452, subjected to digital-to-analog conversion and frequency conversion by the transmission/reception unit ex451, and transmitted via the antenna ex450.
  • the received data is amplified, subjected to frequency conversion and analog-to-digital conversion, subjected to spectrum despreading by the modulation/demodulation unit ex452, converted into an analog audio signal by the audio signal processing unit ex454, and then output from the audio output unit ex457.
  • in the data communication mode, text, still image, or video data is sent to the main control unit ex460 via the operation input control unit ex462 by operation of the operation unit ex466 or the like of the main body, and transmission and reception processing is performed similarly.
  • the video signal processing unit ex455 compresses and encodes the video signal stored in the memory unit ex467 or the video signal input from the camera unit ex465 by the moving picture encoding method described in each of the above embodiments, and sends the encoded video data to the multiplexing/demultiplexing unit ex453.
  • the audio signal processing unit ex454 encodes an audio signal collected by the audio input unit ex456 while capturing a video or a still image with the camera unit ex465, and sends the encoded audio data to the multiplexing / demultiplexing unit ex453.
  • the multiplexing/demultiplexing unit ex453 multiplexes the encoded video data and the encoded audio data by a predetermined method; the multiplexed data is subjected to modulation processing by the modulation/demodulation unit (modulation/demodulation circuit unit) ex452 and to conversion processing by the transmission/reception unit ex451, and is then transmitted via the antenna ex450.
  • in order to decode multiplexed data received via the antenna ex450, the multiplexing/demultiplexing unit ex453 demultiplexes the multiplexed data into a bit stream of video data and a bit stream of audio data, supplies the encoded video data to the video signal processing unit ex455 via the synchronization bus ex470, and supplies the encoded audio data to the audio signal processing unit ex454.
  • the video signal processing unit ex455 decodes the video signal by the moving picture decoding method corresponding to the moving picture encoding method described in each of the above embodiments, and video or still images included in the linked moving image file are displayed on the display unit ex458 via the display control unit ex459.
  • the audio signal processing unit ex454 decodes the audio signal, and the audio output unit ex457 outputs the audio. Since real-time streaming has become widespread, there may be situations, depending on the user's circumstances, where audio reproduction is not socially appropriate. Therefore, as an initial value, a configuration that reproduces only the video data without reproducing the audio signal is preferable; the audio may be reproduced in synchronization only when the user performs an operation such as clicking the video data.
  • although the smartphone ex115 has been described as an example, three implementation forms are conceivable as terminals: a transmitting/receiving terminal having both an encoder and a decoder, a transmitting terminal having only an encoder, and a receiving terminal having only a decoder. Furthermore, although the digital broadcasting system has been described as receiving or transmitting multiplexed data in which audio data is multiplexed with video data, character data related to the video may also be multiplexed into the multiplexed data, or the video data itself may be received or transmitted instead of the multiplexed data.
  • the terminal often includes a GPU. Therefore, a configuration may be adopted in which a large area is processed collectively by exploiting the performance of the GPU, using a memory shared by the CPU and the GPU or a memory whose addresses are managed so that they can be used in common. This shortens the encoding time, secures real-time performance, and realizes low delay. In particular, it is efficient to perform motion search, deblocking filter, sample adaptive offset (SAO), and transform/quantization processing collectively in units of pictures or the like on the GPU instead of the CPU.
  • the present disclosure is applicable to, for example, a television receiver, a digital video recorder, a car navigation system, a mobile phone, a digital camera, a digital video camera, a video conference system, an electronic mirror, and the like.
  • Reference signs list: image processing apparatus; 100 encoding apparatus; 101 image coding unit; 102 feature extraction unit for post-processing; 103 quantization unit; 103A quantization unit; 103B inverse quantization unit; 104 entropy coding unit; 104A entropy coding unit; 104B entropy decoding unit; 105 storage unit; 106 image decoding unit; 107 feature acquisition unit for post-processing; 108 post-processing unit; 160, 260 circuit; 162, 262 memory; 200 decoding apparatus; 300 convolutional neural network; 310, 322, 330 convolution block; 311 convolution layer; 312 nonlinear activation function; 313 normalization layer; 320 residual block; 321 residual group

Abstract

An encoding device (100) includes a memory (162) and a circuit (160). The circuit (160) uses a first convolutional neural network model to perform, on an input image, a conversion from an image space region to an encoding space region so as to perform compression processing on the input image, and uses a second convolutional neural network model to perform processing for extracting feature amounts used in post-processing that brings a decompressed image, obtained as a result of compression and decompression of the input image, closer to the input image.
PCT/JP2018/040801 2017-11-08 2018-11-02 Dispositif de codage, dispositif de décodage, procédé de codage et procédé de décodage WO2019093234A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762583156P 2017-11-08 2017-11-08
US62/583,156 2017-11-08

Publications (1)

Publication Number Publication Date
WO2019093234A1 true WO2019093234A1 (fr) 2019-05-16

Family

ID=66438862

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/040801 WO2019093234A1 (fr) 2017-11-08 2018-11-02 Dispositif de codage, dispositif de décodage, procédé de codage et procédé de décodage

Country Status (1)

Country Link
WO (1) WO2019093234A1 (fr)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016156864A1 (fr) * 2015-03-31 2016-10-06 Magic Pony Technology Limited Apprentissage de processus vidéo bout-à-bout
WO2017178827A1 (fr) * 2016-04-15 2017-10-19 Magic Pony Technology Limited Post-filtrage en boucle destiné au codage et décodage vidéo

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JOHNSTON, NICK ET AL.: "Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks", 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 18 June 2018 (2018-06-18), XP033476412 *
TAI, YING ET AL.: "Image Super-Resolution via Deep Recursive Residual Network", CVPR 2017, 26 July 2017 (2017-07-26), XP033249624 *
THEIS, LUCAS ET AL.: "LOSSY IMAGE COMPRESSION WITH COMPRESSIVE AUTOENCODERS", 1 March 2017 (2017-03-01), XP080753545 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11675943B2 (en) 2017-01-04 2023-06-13 Stmicroelectronics S.R.L. Tool to create a reconfigurable interconnect framework
US10872186B2 (en) 2017-01-04 2020-12-22 Stmicroelectronics S.R.L. Tool to create a reconfigurable interconnect framework
US11227086B2 (en) 2017-01-04 2022-01-18 Stmicroelectronics S.R.L. Reconfigurable interconnect
US11562115B2 (en) 2017-01-04 2023-01-24 Stmicroelectronics S.R.L. Configurable accelerator framework including a stream switch having a plurality of unidirectional stream links
WO2019208677A1 (fr) * 2018-04-27 2019-10-31 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Dispositif de codage, dispositif de décodage, procédé de codage, et procédé de décodage
US20220224907A1 (en) * 2019-05-10 2022-07-14 Nippon Telegraph And Telephone Corporation Encoding apparatus, encoding method, and program
US11593609B2 (en) 2020-02-18 2023-02-28 Stmicroelectronics S.R.L. Vector quantization decoding hardware unit for real-time dynamic decompression for parameters of neural networks
US11880759B2 (en) 2020-02-18 2024-01-23 Stmicroelectronics S.R.L. Vector quantization decoding hardware unit for real-time dynamic decompression for parameters of neural networks
US11531873B2 (en) 2020-06-23 2022-12-20 Stmicroelectronics S.R.L. Convolution acceleration with embedded vector decompression
US11836608B2 (en) 2020-06-23 2023-12-05 Stmicroelectronics S.R.L. Convolution acceleration with embedded vector decompression
JP2023516305A (ja) * 2020-12-16 2023-04-19 テンセント・アメリカ・エルエルシー 異種クライアントエンドポイントへのストリーミングのための2dビデオの適応のためのニューラルネットワークモデルの参照
JP7447293B2 (ja) 2020-12-16 2024-03-11 テンセント・アメリカ・エルエルシー 異種クライアントエンドポイントへのストリーミングのための2dビデオの適応のためのニューラルネットワークモデルの参照
WO2022210661A1 (fr) * 2021-03-30 2022-10-06 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Procédé de codage d'images, procédé de décodage d'images, procédé de traitement d'images, dispositif de codage d'images, et dispositif de décodage d'images

Similar Documents

Publication Publication Date Title
CN111295884B (zh) 图像处理装置及图像处理方法
WO2019093234A1 (fr) Dispositif de codage, dispositif de décodage, procédé de codage et procédé de décodage
WO2019208677A1 (fr) Dispositif de codage, dispositif de décodage, procédé de codage, et procédé de décodage
JP7104085B2 (ja) 復号方法及び符号化方法
JP7014881B2 (ja) 符号化装置及び符号化方法
WO2018021374A1 (fr) Dispositif de codage, dispositif de décodage, procédé de codage, et procédé de décodage
WO2019022099A1 (fr) Dispositif de codage, dispositif de décodage, procédé de codage et procédé de décodage
WO2018101288A1 (fr) Dispositif de codage, procédé de codage, dispositif de codage et procédé de décodage
JP7422811B2 (ja) 非一時的記憶媒体
JP7001822B2 (ja) 復号装置及び復号方法
WO2018097115A1 (fr) Dispositif de codage, dispositif de décodage, procédé de codage et procédé de décodage
WO2019013235A1 (fr) Dispositif de codage, procédé de codage, dispositif de décodage et procédé de décodage
JP2017103744A (ja) 画像復号方法、画像符号化方法、画像復号装置、画像符号化装置、及び画像符号化復号装置
WO2019013236A1 (fr) Dispositif de codage, procédé de codage, dispositif de décodage et procédé de décodage
WO2019069782A1 (fr) Dispositif de codage, dispositif de décodage, procédé de codage et procédé de décodage
US11356663B2 (en) Encoder, decoder, encoding method, and decoding method
WO2019009314A1 (fr) Dispositif de codage, dispositif de décodage, procédé de codage, et procédé de décodage
WO2018021373A1 (fr) Dispositif de codage, dispositif de décodage, procédé de codage, et procédé de décodage
JPWO2018074291A1 (ja) 画像符号化方法、伝送方法および画像符号化装置
WO2022050166A1 (fr) Dispositif de reproduction, dispositif de transmission, procédé de reproduction et procédé de transmission
WO2019069902A1 (fr) Dispositif de codage, dispositif de décodage, procédé de codage et procédé de décodage
WO2018097117A1 (fr) Dispositif de codage, dispositif de décodage, procédé de codage et procédé de décodage
WO2019059107A1 (fr) Dispositif d'encodage, dispositif de décodage, procédé d'encodage et procédé de décodage
WO2019065329A1 (fr) Dispositif et procédé de codage, et dispositif et procédé de décodage
WO2019021803A1 (fr) Dispositif de codage, dispositif de décodage, procédé de codage, et procédé de décodage

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18875770; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

NENP Non-entry into the national phase (Ref country code: JP)

122 Ep: pct application non-entry in european phase (Ref document number: 18875770; Country of ref document: EP; Kind code of ref document: A1)