WO2019225337A1 - Encoding device, decoding device, encoding method, decoding method, encoding program and decoding program

Encoding device, decoding device, encoding method, decoding method, encoding program and decoding program

Info

Publication number
WO2019225337A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
encoding
decoding
auxiliary information
unit
Prior art date
Application number
PCT/JP2019/018568
Other languages
French (fr)
Japanese (ja)
Inventor
翔太 折橋
忍 工藤
正樹 北原
清水 淳
Original Assignee
日本電信電話株式会社
Priority date
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corporation (日本電信電話株式会社)
Publication of WO2019225337A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46 Embedding additional information in the video signal during the compression process
    • H04N19/463 Embedding additional information in the video signal during the compression process by compressing encoding parameters before transmission
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression

Definitions

  • The present invention relates to an encoding device, a decoding device, an encoding method, a decoding method, an encoding program, and a decoding program.
  • MPEG-4, H.264/AVC, and H.265/HEVC (hereinafter referred to as “HEVC”) are known as standards for compressing and encoding video data. Standardization of a new standard following HEVC is also being studied.
  • In these video compression coding standards, an image is divided into rectangular blocks and processed block by block: the pixel values of a prediction target block are predicted with reference to adjacent, already-coded blocks, and only the prediction residual signal is transmitted. Taking HEVC as an example, the intra-prediction coding method, which predicts pixel signals closed within a frame, is described below.
  • In HEVC, the entire screen is divided into blocks of 64 pixels × 64 pixels (hereinafter referred to as “64×64”), and each such unit is defined as a CTU (Coding Tree Unit). A CTU can be divided into four square CUs (Coding Units), and by applying this division recursively, the CTU is partitioned into finer blocks. Four CU sizes can be used: 64×64, 32×32, 16×16, and 8×8. Prediction processing is performed in units called PUs (Prediction Units), obtained by further dividing a CU.
  • Each PU can selectively apply one of 35 prediction parameters: for example, the encoding side selects the prediction parameter that minimizes the prediction residual signal with respect to the original image, and transmits the prediction parameter and the prediction residual signal to the decoding side.
  • The prediction method can be selected from three types: Planar prediction, DC (Direct Current) prediction, and directional prediction. Since 33 prediction parameters are assigned to directional prediction, the total number of prediction parameters is 35.
  • Each prediction method performs prediction using the pixel values of reference pixels located to the left of and above the prediction target block. In directional prediction, one of the 33 defined directions is selected as the reference direction, and the predicted pixels of the prediction target block are generated by assigning the reference pixel values along that direction.
  • In Planar prediction, four pixels are referred to: the pixel at the lower left of the prediction target block, the pixel at its upper right, and the pixels to the left of and above the prediction target pixel; each pixel in the prediction target block is predicted as their weighted average.
  • In DC prediction, a single prediction value is generated for the entire prediction target block as the average of the reference pixels located to the left of and above the block.
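As an illustration of the two non-directional modes just described, the following is a minimal NumPy sketch (not the HEVC reference implementation; integer rounding and boundary handling are simplified) of DC and Planar prediction for an N×N block:

```python
import numpy as np

def dc_prediction(top, left):
    """DC prediction: one value, the mean of the left and top reference pixels."""
    n = len(top)
    dc = (top.sum() + left.sum()) / (2 * n)
    return np.full((n, n), dc)

def planar_prediction(top, left, top_right, bottom_left):
    """Planar prediction: each pixel is a weighted average of four references."""
    n = len(top)
    pred = np.empty((n, n))
    for y in range(n):
        for x in range(n):
            horiz = (n - 1 - x) * left[y] + (x + 1) * top_right
            vert = (n - 1 - y) * top[x] + (y + 1) * bottom_left
            pred[y, x] = (horiz + vert) / (2 * n)
    return pred
```

Directional prediction would, analogously, copy reference samples along one of the 33 defined directions.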
  • In the technique described in Patent Document 1, an input image is decomposed by Cartoon-Texture signal decomposition, and a non-synthesis component image, represented by the sum of the Cartoon component and the non-synthesis Texture component, is transmitted together with a representative texture of the synthesis Texture component and the region information corresponding to the synthesized texture.
  • The region information is expressed as an image and includes the synthesis region and the synthesis method corresponding to that region. On the decoding side, the decoded image is obtained by adding, to the non-synthesis component image, the image reconstructed by texture synthesis using the representative texture of the synthesized texture and the region information. An existing coding standard is used to encode and decode the non-synthesis component image and the region information corresponding to the synthesized texture. Thus, the technique described in Patent Document 1 can encode an image containing many texture components with a smaller code amount.
  • An object of the present invention is to provide a technique capable of transmitting and receiving a region to be reconstructed with a smaller code amount.
  • One aspect of the present invention is an encoding device that encodes an image, including: a determination unit that determines whether or not an input image is a reconstruction target; an auxiliary information extraction unit that extracts, from an image determined to be a reconstruction target, auxiliary information that is information used for reconstruction; a conversion unit that converts the image determined to be a reconstruction target to obtain a converted image; and an encoding unit that encodes the converted image to obtain encoded data, wherein the conversion unit converts the input image so that, when the encoding unit performs the encoding, a smaller code amount is obtained than when the input image itself is encoded.
  • One aspect of the present invention is the above encoding device, in which the determination unit acquires an estimated generated code amount and an estimated distortion amount and determines whether or not the input image is to be a reconstruction target by performing rate-distortion optimization.
  • One aspect of the present invention is the above encoding device, in which the auxiliary information is information for inversely converting an image having a smaller code amount than the image determined to be a reconstruction target, while maintaining the characteristics of the image determined to be a reconstruction target.
  • Another aspect of the present invention is a decoding device that decodes encoded data obtained by encoding an image, including: a decoding unit that decodes input encoded data to obtain a decoded image; a determination unit that determines whether or not the decoded image is an image to be reconstructed; and a reconstruction unit that acquires auxiliary information, which is information used for reconstruction, and reconstructs the decoded image determined to be an image to be reconstructed using the auxiliary information.
  • One aspect of the present invention is an encoding method performed by an encoding device that encodes an image, including: a determination step of determining whether or not an input image is a reconstruction target; an auxiliary information extraction step of extracting, from an image determined to be a reconstruction target, auxiliary information that is information used for reconstruction; a conversion step of converting the image determined to be a reconstruction target so as to obtain a converted image having a smaller code amount than when the input image is encoded; and an encoding step of encoding the converted image to obtain encoded data.
  • Another aspect of the present invention is a decoding method performed by a decoding device that decodes encoded data in which an image is encoded, including: a decoding step of decoding input encoded data to obtain a decoded image; a determination step of determining whether or not the decoded image is an image to be reconstructed; and a reconstruction step of acquiring auxiliary information, which is information used for reconstruction, and reconstructing the decoded image determined to be an image to be reconstructed using the auxiliary information.
  • One aspect of the present invention is an encoding program for causing a computer to function as the above encoding device.
  • One aspect of the present invention is a decoding program for causing a computer to function as the above decoding device.
  • According to the present invention, a region to be reconstructed can be transmitted and received with a smaller code amount.
  • The present invention is not limited to HEVC or to intra prediction; that is, the present invention can also be applied to image coding methods other than HEVC and to inter prediction.
  • Here, reconstruction refers to a process of generating, by texture synthesis, image interpolation processing, or the like, a pseudo image that fits the target region of an image. The pseudo image referred to here is, for example, an image whose difference from the input image is difficult to perceive from a subjective viewpoint.
  • In the first embodiment, uniform image processing is applied to the entire reconstruction target block so as to reduce the information content of the prediction residual in HEVC intra prediction, and the processed block is input to the HEVC encoder. A block with low prediction accuracy in HEVC, or a block belonging to a subject for which the pixels of the pre-encoding image need not be reproduced exactly as long as a certain subjective image quality is ensured, is set as a reconstruction target block. On the decoding side, a reconstruction target block is identified by determining whether or not uniform image processing has been applied to the entire block.
  • FIG. 1 shows a processing flow of the encoding apparatus according to the first embodiment of the present invention.
  • First, in the block division process, the shape of the encoding processing blocks is determined from the input picture (step S101). The output block division shape follows the CTU, CU, and PU structure shown in FIG. 17, and these blocks are used as the unit of reconstruction processing on the decoding side and as the unit of HEVC encoding processing.
  • As the method for determining the division shape, in addition to a method that uses uniform rectangles such as CTUs, it is possible to use, for example, a method that determines the CU division shape by rate-distortion optimization as implemented in the HEVC test model (HM), or a method that approximates, in block units, the result of per-object region division as used in image recognition.
  • In the encoding method determination process, it is determined, for each block produced by the block division process, whether the block is to be a reconstruction target block or a non-reconstruction target block (step S102).
  • As the determination method, for example, the estimated generated code amount and the estimated distortion amount are derived both for the case where the block is a reconstruction target and for the case where it is not, and the determination is made by applying rate-distortion optimization.
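A hedged sketch of such a rate-distortion decision follows; the Lagrangian cost model J = D + λR is standard, but the default λ value here is an illustrative assumption, not a value taken from the HM:

```python
def choose_reconstruction(bits_recon, dist_recon, bits_normal, dist_normal,
                          lam=10.0):
    """Return True if the block should be made a reconstruction target.

    Compares the Lagrangian rate-distortion cost J = D + lambda * R of
    encoding the block as a reconstruction target versus encoding it normally.
    """
    j_recon = dist_recon + lam * bits_recon
    j_normal = dist_normal + lam * bits_normal
    return j_recon <= j_normal
```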
  • Next, auxiliary information to be transmitted to the decoding device to assist the reconstruction process is extracted from the reconstruction target block by the auxiliary information extraction process (step S104). Here, the reconstruction process is a process of inversely transforming, on the decoding side, a block obtained by applying some transformation (described later) to the reconstruction target block. In the auxiliary information extraction process, for example, when the reconstruction target block is to be reconstructed by image synthesis, a representative texture used at the time of synthesis or a label identifying an object is extracted as auxiliary information.
  • The extracted auxiliary information is entropy-encoded by the auxiliary information entropy encoding process to produce the encoded data of the auxiliary information. For this entropy encoding, any encoding method, such as Huffman encoding or run-length encoding, can be used (step S105).
  • After the auxiliary information has been extracted, the reconstruction target block is converted by the image conversion process into an image that can be transmitted by HEVC with a smaller code amount (step S106). For example, the reconstruction target block may be replaced with the average value of the block, or a conversion may be applied such that the prediction residual when predicting with an arbitrary or specific mode number of HEVC intra directional prediction approaches zero (a sketch of the mean-replacement conversion is given after this list of options).
  • The mode number of HEVC intra prediction used for the conversion may be transmitted to the decoding side as part of the auxiliary information; image conversion may be performed such that a specific mode number of HEVC intra prediction corresponds to a reconstruction processing method on the decoding side, with the correspondence transmitted to the decoding side as part of the auxiliary information; or an intra prediction mode number and a representative texture may be associated with each other, with the correspondence transmitted to the decoding side as auxiliary information.
  • The image conversion method may also be a method other than conversion based on HEVC intra prediction: an arbitrary conversion method capable of producing an output that does not exist in the input picture may be selected from methods defined in advance or defined in the course of the image conversion process, and the selected conversion method may be transmitted to the decoding side as auxiliary information.
  • The converted image is then encoded by the converted-image intra encoding process to obtain the encoded data of the converted image (step S107).
  • The above processing is applied to all blocks in processing order (steps S108 and S109), and the encoded data of the auxiliary information and the encoded data of the converted image are obtained as the transmission information.
  • FIG. 2 shows a configuration example of the encoding device 10 in the first embodiment.
  • The encoding device 10 includes a block division unit 101, an encoding method determination unit 102, an auxiliary information extraction unit 103, an auxiliary information entropy encoding unit 104, an image conversion unit 105, an intra prediction unit 107, a transform/quantization unit 108, an entropy coding unit 109, an inverse quantization/inverse transform unit 110, and a prediction memory 111.
  • The block division unit 101 performs the block division process, taking the input picture as input, and outputs the input picture divided into blocks.
  • The encoding method determination unit 102 performs the encoding method determination process, taking the block-divided input picture as input, and outputs the determination result of each block's encoding method.
  • The auxiliary information extraction unit 103 performs the auxiliary information extraction process, taking the reconstruction target block and a reference block as inputs, and outputs the auxiliary information. Here, a reference block is a block containing pixels that are referred to in the reconstruction process, for example in the interpolation process.
  • The auxiliary information entropy encoding unit 104 entropy-encodes the input auxiliary information and outputs the encoded data of the auxiliary information.
  • The image conversion unit 105 performs the image conversion process, taking the reconstruction target block as input, and outputs the converted block.
  • The converted blocks and the non-reconstruction-target blocks are encoded by intra encoding: the prediction residual with respect to the predicted image output from the intra prediction unit 107 is orthogonally transformed and quantized by the transform/quantization unit 108 and encoded by the entropy coding unit 109, whereby the encoded data of the image is obtained.
  • In FIG. 2, the entropy coding unit 109 that encodes the prediction residual and the auxiliary information entropy encoding unit 104 that encodes the auxiliary information are shown as separate functional blocks, but they may be configured as one functional block; that is, the encoding of the prediction residual and the encoding of the auxiliary information may be performed by a single encoding unit, for example using a common entropy encoding scheme.
  • The prediction residual quantized by the transform/quantization unit 108 is inversely quantized and inversely transformed by the inverse quantization/inverse transform unit 110 and stored in the prediction memory 111. The data stored in the prediction memory 111 is used for the intra prediction process by the intra prediction unit 107 and for the auxiliary information extraction process by the auxiliary information extraction unit 103.
  • FIG. 3 shows a processing flow of the decoding device according to the first embodiment.
  • In the converted-image decoding process, the encoded data of the converted image is decoded to obtain a block of the decoded converted image (step S201). The decoded image may be an image in units corresponding to the input image, or in units corresponding to the blocks into which the input image was divided; in the following, the description continues on the assumption that the decoded image is in units corresponding to blocks.
  • Next, in the encoding method determination process, a block converted by the image conversion method used by the image conversion unit 105 of the encoding device 10 is determined to be a reconstruction target block (step S202). For example, when the image conversion unit 105 of the encoding device 10 uniformly replaces a reconstruction target block with its average value, a block of the decoded converted image on which that processing is detected is determined to be a reconstruction target block.
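A hedged sketch of this decoder-side determination under the mean-replacement rule follows; the tolerance, which absorbs quantization noise, is an assumption and would have to be agreed between encoder and decoder:

```python
import numpy as np

def is_reconstruction_target(decoded_block, tol=2.0):
    """Judge a decoded block to be a reconstruction target if its samples
    are (almost) constant, i.e. consistent with mean replacement."""
    spread = float(decoded_block.max()) - float(decoded_block.min())
    return spread <= tol
```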
  • Next, in the auxiliary information entropy decoding process, the encoded data of the auxiliary information corresponding to the reconstruction target block is decoded, based on the encoding method applied by the auxiliary information entropy encoding unit 104 of the encoding device 10 (step S204).
  • Next, the auxiliary information and the reference blocks that can be referred to from the reconstruction target block are taken as inputs, and the reconstruction process is performed (step S205). The above processing is applied to all blocks in processing order (steps S206 and S207), and the final decoded image is obtained.
  • FIG. 4 shows a configuration example of the decoding device 20 in the first embodiment.
  • The decoding device 20 includes an entropy decoding unit 201, an inverse transform/inverse quantization unit 202, an intra prediction unit 203, a prediction memory 204, a reconstruction unit 205, an encoding method determination unit 206, and an auxiliary information entropy decoding unit 207.
  • The encoded data of the converted image is decoded by HEVC: the encoded data is first entropy-decoded by the entropy decoding unit 201 and subjected to inverse transform/inverse quantization processing by the inverse transform/inverse quantization unit 202, whereby the prediction residual image is decoded; the prediction result of the intra prediction unit 203 is then added, and a block of the decoded converted image is obtained.
  • The decoded converted image is accumulated in the prediction memory 204 and used as input to the intra prediction unit 203 and the reconstruction unit 205.
  • The encoding method determination unit 206 receives a block of the decoded converted image, performs the encoding method determination process, and outputs the determination result.
  • The auxiliary information entropy decoding unit 207 entropy-decodes the encoded data of the input auxiliary information to obtain the auxiliary information, and outputs the auxiliary information to the reconstruction unit 205.
  • The reconstruction unit 205 performs the reconstruction process, taking as inputs the auxiliary information, the reference pixels that can be referred to from the reconstruction target block, and the reconstruction target block, and outputs the final output picture.
  • Unlike the conventional techniques, the encoding method and the decoding method according to the above embodiment classify the input image, in units of processing blocks, into reconstruction targets and non-reconstruction targets, and apply the reconstruction process accordingly. By restricting the processing to block units, the encoding method and the decoding method according to the above embodiment can reduce the code amount required to transmit boundary information.
  • In the above embodiment, the boundary information is conveyed by sharing, between the encoding device 10 and the decoding device 20, the rule of replacing the inside of a reconstruction target block with its average value; the position of a reconstruction target block can therefore be identified on the decoding side.
  • That is, a reconstruction target block is specified for each block; on the encoding side, the specified reconstruction target block is subjected to processing that allows it to be encoded by HEVC with a smaller code amount (for example, processing that replaces the entire block with its average value), and on the decoding side, the presence or absence of that processing is determined. In this way, the reconstruction method can be conveyed to the decoding side at the same time.
  • FIG. 5 shows the configuration of the encoding device 30 according to the second embodiment.
  • The encoding device 30 includes a preprocessing device 31 and a conventional encoding device 32.
  • The preprocessing device 31 includes a block division unit 301, an encoding method determination unit 302, an auxiliary information extraction unit 303, an auxiliary information entropy encoding unit 304, an image conversion unit 305, and a post-conversion image memory 306.
  • The conventional encoding device 32 includes an intra prediction unit 307, a transform/quantization unit 308, an entropy coding unit 309, an inverse quantization/inverse transform unit 310, and a prediction memory 311.
  • The difference between the encoding device 30 in the second embodiment and the encoding device 10 in the first embodiment is that the device including the block division unit, the encoding method determination unit, the image conversion unit, the auxiliary information extraction unit, and the entropy encoding unit is provided as the preprocessing device 31, independently of the other components (that is, the components of a conventional encoding device).
  • The converted image is stored in the post-conversion image memory 306, and the auxiliary information extraction unit 303 may refer to the converted image stored in the post-conversion image memory 306.
  • The components other than those included in the preprocessing device 31 are configured independently as the conventional encoding device 32. As the conventional encoding device 32, for example, an HEVC intra encoding device, or an encoding device conforming to an image encoding standard such as JPEG (Joint Photographic Experts Group), can be used. The processing flow of the encoding device 30 is the same as the processing flow shown in FIG. 1.
  • The decoding device 40 includes a conventional decoding device 41 and a post-processing device 42.
  • The conventional decoding device 41 includes an entropy decoding unit 401, an inverse transform/inverse quantization unit 402, an intra prediction unit 403, and a prediction memory 404.
  • The post-processing device 42 includes a reconstruction unit 405, an encoding method determination unit 406, and an auxiliary information entropy decoding unit 407.
  • The difference between the decoding device 40 in the second embodiment and the decoding device 20 in the first embodiment is that the device including the encoding method determination unit, the auxiliary information entropy decoding unit, and the reconstruction unit is provided as the post-processing device 42, independently of the other components (that is, the components of a conventional decoding device).
  • An output picture memory 408 may store the output picture, and the reconstruction unit 405 may refer to the output picture stored in the output picture memory 408.
  • The components other than those included in the post-processing device 42 are configured independently as the conventional decoding device 41. The processing flow of the decoding device 40 is the same as the processing flow shown in FIG. 3.
  • According to the second embodiment, a preprocessing device 31 and a post-processing device 42 that can be used in combination with a conventional encoding device and decoding device can be realized. Since the improvement in coding efficiency provided by the preprocessing device 31 and the post-processing device 42 is additive to that of the standard, with the encoding method and the decoding method according to the second embodiment, when the efficiency of the standard-based encoding device improves, the coding efficiency of the encoding device 30 as a whole improves accordingly.
  • Each prediction method selectable in HEVC (Planar prediction, DC prediction, and directional prediction) refers to referenceable pixels and performs prediction according to a simple prediction rule. Consequently, prediction efficiency drops for images in which components are randomly distributed; since the information content of the prediction residual signal is large in such images, encoding with a constant quantization width for the prediction residual signal generates an excessive code amount.
  • In the conventional technique 1, an interpolation network constituted by a convolutional neural network is used, and this interpolation network can reconstruct a missing region of an image in a pseudo manner.
  • In the image loss processing, a region to be reconstructed by image interpolation on the decoding side is selected from the input image, a missing image is generated by deleting that region, and the missing image is output together with the missing area information indicating the missing area (step S301). The missing area information is, for example, a binary image showing the missing area.
  • In the missing area information encoding process, since the missing area information is transmitted to the decoding side, it is encoded using a conventional image encoding method such as JPEG (Joint Photographic Experts Group) or HEVC, or an entropy encoding method such as run-length encoding. The missing area information encoding process thereby yields the encoded data of the missing area information (step S302).
  • In the image encoding process, the missing image is encoded using a conventional image encoding method such as JPEG or HEVC. The image encoding process thereby yields the encoded data of the missing image (step S303).
  • In the image decoding process, a decoded missing image is obtained from the encoded data of the missing image (step S304). In the missing area information decoding process, the missing area information is obtained from the encoded data of the missing area information (step S305). In the image interpolation process, the decoded missing image and the missing area information are input to the interpolation network of the conventional technique 1, and the final output image is obtained.
  • The processing unit of the encoding process and the decoding process may be the entire screen, or may be a block unit obtained by dividing the screen using a structure such as the HEVC CTU (step S306).
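As an illustration of how the interpolation step above could compose its output, the following is a hedged sketch in which `interpolate` stands in for the interpolation network; the convention that the mask is 1 inside the missing area is an assumption:

```python
import numpy as np

def compose_output(decoded_missing, mask, interpolate):
    """Fill the missing area with the network prediction, keep the rest.

    decoded_missing: decoded missing image
    mask:            missing-area information, 1 inside the missing area
    interpolate:     callable wrapping the interpolation network
    """
    filled = interpolate(decoded_missing, mask)          # network inference
    return np.where(mask.astype(bool), filled, decoded_missing)
```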
  • FIG. 8 shows a configuration example of the encoding device 50 and the decoding device 60 that realize the above encoding processing and decoding processing.
  • The encoding device 50 includes an image loss processing unit 501, an image encoding unit 502, and a missing area information encoding unit 503.
  • The image loss processing unit 501 receives the input image, performs the image loss processing, and outputs the missing image and the missing area information.
  • The image encoding unit 502 receives the missing image, performs the image encoding process, and outputs the encoded data of the missing image.
  • The missing area information encoding unit 503 receives the missing area information, performs the missing area information encoding process, and outputs the encoded data of the missing area information.
  • The encoded data of the missing image and the encoded data of the missing area information are transmitted to the decoding device 60.
  • The decoding device 60 includes an image decoding unit 601, a missing area information decoding unit 602, and an image interpolation unit 603.
  • The image decoding unit 601 receives the encoded data of the missing image, performs the image decoding process, and obtains the decoded missing image.
  • The missing area information decoding unit 602 receives the encoded data of the missing area information, performs the missing area information decoding process, and obtains the missing area information.
  • The image interpolation unit 603, which includes an image interpolation network 604, receives the decoded missing image and the missing area information, performs the image interpolation process, and obtains the final output image.
  • In the above configuration, the subjective image quality of the output image depends greatly on the area of the missing region in the image interpolation process. Specifically, the larger the missing area to be interpolated, the smaller the amount of information input to the interpolation network, which makes the missing area harder to estimate in the image interpolation process and degrades the subjective image quality of the output image. Furthermore, if the missing region to be interpolated contains complex elements that cannot be inferred from the referenceable region, it is either not reconstructed on the decoding side or the subjective image quality of the output deteriorates.
  • The third embodiment of the present invention will be described using, as an example, learning in the framework of a generative adversarial network composed of a convolutional neural network and an identification (discriminator) network. However, the present invention is not limited to image interpolation by a convolutional neural network or to learning in the framework of a generative adversarial network: any learning model in which the image interpolation method is acquired by learning can be applied to the image interpolation, and a learning method using an arbitrary error function can be applied to the learning.
  • In the third embodiment, the encoding device performs feature extraction with reference to the original image and transmits image interpolation auxiliary information, which assists the image interpolation, to the decoding device. The decoding device performs the image interpolation using the image interpolation auxiliary information. The networks used for the extraction of the image interpolation auxiliary information and for the image interpolation are first optimized individually, and the combined networks are then optimized as a whole.
  • FIG. 9 shows the flow of encoding processing and decoding processing according to the third embodiment.
  • In the image loss processing, a region to be reconstructed by image interpolation on the decoding side is selected from the input image, and a missing image is generated by deleting that region through processing such as replacing it with the average value. The generated missing image is output together with the missing area information, which indicates the position of the missing area, i.e., the set of pixels belonging to the missing area. As the missing area information, for example, a binary mask image indicating the missing area (hereinafter, the missing area mask image) can be used. As the region selection method in the image loss processing, it is possible to use, for example, a method that selects regions with a large generated code amount when a fixed quantization width is used in HEVC intra coding, or a method that selects interpolatable regions using per-object region division as used in image recognition (step S401).
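A hedged sketch of this image loss processing follows; region selection itself is left abstract, and erasing the region with the image mean is just one of the admissible deletion processes named above:

```python
import numpy as np

def image_loss_processing(image, region_mask):
    """Produce the missing image and the missing-area mask image.

    image:       input image (2-D array)
    region_mask: boolean array, True for pixels selected for reconstruction
    """
    missing = image.copy()
    missing[region_mask] = np.rint(image.mean())   # erase by mean replacement
    mask_image = region_mask.astype(np.uint8)      # binary missing-area mask
    return missing, mask_image
```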
  • In the auxiliary information extraction process, image interpolation auxiliary information is extracted, using the network for extracting image interpolation auxiliary information, from the region of the input image corresponding to the missing area derived from the missing area information, or from the input image itself (step S402). Details of the network for extracting image interpolation auxiliary information will be described later.
  • The auxiliary information encoding process encodes the image interpolation auxiliary information extracted by the auxiliary information extraction process using a conventional entropy encoding method such as Huffman encoding. The auxiliary information encoding process thereby yields the encoded data of the image interpolation auxiliary information (step S403).
  • In the missing area information encoding process, the missing area information indicating the reconstruction target area is encoded using a conventional image encoding method such as JPEG or HEVC, or an entropy encoding method such as run-length encoding. The missing area information encoding process thereby yields the encoded data of the missing area information (step S404).
  • In the image encoding process, the missing image is encoded using a conventional image encoding method such as JPEG or HEVC. The image encoding process thereby yields the encoded data of the missing image (step S405).
  • In the image decoding process, a decoded missing image is obtained from the encoded data of the missing image (step S406). In the missing area information decoding process, the missing area information is obtained from the encoded data of the missing area information (step S407). In the auxiliary information decoding process, the image interpolation auxiliary information is obtained from the encoded data of the image interpolation auxiliary information (step S407).
  • In the image interpolation process, the decoded missing image, the missing area information, and the image interpolation auxiliary information are input to the network for image interpolation, and the final output image is obtained; details of the network for image interpolation will be described later (step S408). The processing unit of the encoding process and the decoding process may be the entire screen, or may be a block unit obtained by dividing the screen using a structure such as the HEVC CTU.
  • FIG. 10 shows a configuration example of an encoding device and a decoding device that realize the above encoding processing and decoding processing.
  • The encoding device 70 includes an image loss processing unit 701, an image encoding unit 702, a missing area information encoding unit 703, an auxiliary information extraction unit 704, and an auxiliary information encoding unit 705.
  • The image loss processing unit 701 receives the input image, performs the image loss processing, and outputs the missing image and the missing area information.
  • The image encoding unit 702 receives the missing image, performs the image encoding process, and outputs the encoded data of the missing image.
  • The missing area information encoding unit 703 receives the missing area information, performs the missing area information encoding process, and outputs the encoded data of the missing area information.
  • The auxiliary information extraction unit 704 performs the auxiliary information extraction process, taking as input the region of the input image corresponding to the missing area derived from the missing area information, or the entire image including the non-missing regions, and extracts the image interpolation auxiliary information.
  • The auxiliary information encoding unit 705 receives the image interpolation auxiliary information, performs the auxiliary information encoding process, and outputs the encoded data of the image interpolation auxiliary information.
  • The encoded data of the missing image, the encoded data of the missing area information, and the encoded data of the image interpolation auxiliary information are transmitted to the decoding device 80.
  • The decoding device 80 includes an image decoding unit 801, a missing area information decoding unit 802, an image interpolation unit 803, and an auxiliary information decoding unit 805.
  • The image decoding unit 801 receives the encoded data of the missing image, performs the image decoding process, and obtains the decoded missing image.
  • The missing area information decoding unit 802 receives the encoded data of the missing area information, performs the missing area information decoding process, and obtains the missing area information.
  • The auxiliary information decoding unit 805 receives the encoded data of the image interpolation auxiliary information, performs the auxiliary information decoding process, and obtains the image interpolation auxiliary information.
  • The image interpolation unit 803 receives the decoded missing image, the missing area information, and the image interpolation auxiliary information, performs the image interpolation process with reference to the image interpolation auxiliary information, and obtains the final output image.
  • FIG. 11 shows a network configuration of the auxiliary information extraction unit 704 and the image interpolation unit 803.
  • The auxiliary information extraction unit 704 includes an auxiliary information extraction network 7041 for extracting the image interpolation auxiliary information to be transmitted to the decoding side. The auxiliary information extraction network 7041 is a network that receives the input image and the missing area information and outputs the image interpolation auxiliary information: for example, it takes two images, the input image and the missing area mask image, as input, produces an arbitrary number of units as output, and its intermediate layers are composed of convolutional layers, fully connected layers, and the like.
  • The image interpolation unit 803 includes an auxiliary information reference network 8031 for predicting the missing area with reference to the image interpolation auxiliary information, a missing image reference network 8032 for predicting the missing area with reference to the missing image, and a reconstruction network 8033 that integrates their outputs.
  • The auxiliary information reference network 8031 is a network that receives the image interpolation auxiliary information and outputs an intermediate image based on the auxiliary information reference: for example, it takes as input the same number of units as the image interpolation auxiliary information, outputs one intermediate image based on the auxiliary information reference, and its intermediate layers are composed of fully connected layers, deconvolution layers, convolution layers, and the like.
  • The missing image reference network 8032 is a network that takes the missing image of the input image and the missing area mask image as inputs and outputs an intermediate image based on the missing image reference: for example, it takes these two images as input, outputs one intermediate image based on the missing image reference, and its intermediate layers are composed of convolution layers, fully connected layers, deconvolution layers, and the like.
  • The reconstruction network 8033 is a network that receives the intermediate image based on the auxiliary information reference and the intermediate image based on the missing image reference, and outputs the final output image in which the missing area is interpolated: for example, it takes the two intermediate images as input, outputs one output image, and its intermediate layers are composed of convolution layers, fully connected layers, deconvolution layers, and the like.
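Since the description fixes only the kinds of layers, not their sizes, the following PyTorch sketch of the four networks is hedged: the 64×64 patch size, the 128-unit auxiliary vector, and all channel widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

K = 128      # assumed size of the image interpolation auxiliary information
PATCH = 64   # assumed processing unit, e.g. one HEVC CTU

class AuxInfoExtractNet(nn.Module):          # 7041, encoder side
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2, 32, 4, stride=2, padding=1), nn.ReLU(),  # image+mask
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * (PATCH // 4) ** 2, K),
        )
    def forward(self, image, mask):
        return self.body(torch.cat([image, mask], dim=1))

class AuxInfoReferenceNet(nn.Module):        # 8031, decoder side
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(K, 64 * (PATCH // 4) ** 2)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )
    def forward(self, aux):
        x = self.fc(aux).view(-1, 64, PATCH // 4, PATCH // 4)
        return self.deconv(x)                # intermediate image (aux ref.)

class MissingImageReferenceNet(nn.Module):   # 8032, decoder side
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )
    def forward(self, missing, mask):
        return self.body(torch.cat([missing, mask], dim=1))

class ReconstructionNet(nn.Module):          # 8033, decoder side
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )
    def forward(self, inter_aux, inter_missing):
        return self.body(torch.cat([inter_aux, inter_missing], dim=1))
```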
  • Next, the learning of the auxiliary information extraction unit 704 and the image interpolation unit 803 is described. For the learning, the framework of the generative adversarial network can be used, as in the conventional technique 1.
  • The identification network 9000, which evaluates the naturalness of the interpolated region, receives the output image of the image interpolation unit 803 as input and outputs the probability that the output image is a true image that has not been interpolated.
  • As the error function, the mean square error between the pixels of the original image and the output image of the network (hereinafter, the mean square error), the error obtained by applying the framework of the generative adversarial network and evaluating the output image of the network with the identification network (hereinafter, the identification network error), or an error given by a weighted sum of the mean square error and the identification network error (hereinafter, the weighted error) can be used.
  • In the missing image reference network learning process, the missing image reference network 8032 and the identification network 9000 shown in FIG. 11 are cut out and combined as shown in FIG. 13, the output of the missing image reference network 8032 is regarded as the input to the identification network 9000, and the missing image reference network 8032 is trained (step S501).
  • Specifically, the missing image of the original image and the missing area information are input to the missing image reference network 8032, and the network parameters are updated by the error back-propagation method so that the output image approaches the original image. The learning first applies the mean square error as the error function and then applies the weighted error; in the subsequent learning processes of each network, learning is likewise performed first with the mean square error and then with the weighted error.
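A hedged sketch of this staged learning follows, reusing the module sketches above; the weighting factor `alpha`, the optimizer, and the learning rate are assumptions, and the identification network is assumed to end in a sigmoid so that it outputs a probability.

```python
import torch
import torch.nn.functional as F

def weighted_error(output, original, identification_net, alpha=0.01):
    """Weighted sum of the mean square error and the identification
    network error (the trained network is pushed toward outputs that the
    identification network classifies as true, non-interpolated images)."""
    mse = F.mse_loss(output, original)
    p_true = identification_net(output)          # probability of "true image"
    adv = F.binary_cross_entropy(p_true, torch.ones_like(p_true))
    return mse + alpha * adv

def train_stage(network, identification_net, loader, use_weighted,
                steps, lr=1e-4):
    """One learning stage: mean-square-error stage if use_weighted is False,
    weighted-error stage if True. loader yields (original, missing, mask)."""
    opt = torch.optim.Adam(network.parameters(), lr=lr)
    for _, (original, missing, mask) in zip(range(steps), loader):
        output = network(missing, mask)
        if use_weighted:
            loss = weighted_error(output, original, identification_net)
        else:
            loss = F.mse_loss(output, original)
        opt.zero_grad()
        loss.backward()                          # error back-propagation
        opt.step()
```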
  • In the auxiliary information extraction/reference network learning process, the auxiliary information extraction network 7041, the auxiliary information reference network 8031, and the identification network 9000 shown in FIG. 11 are cut out and combined as shown in FIG. 14, the output of the auxiliary information reference network 8031 is regarded as the input to the identification network 9000, and the auxiliary information extraction network 7041 and the auxiliary information reference network 8031 are trained (step S502).
  • Specifically, the original image and the missing area information are input to the network in which the auxiliary information extraction network 7041 and the auxiliary information reference network 8031 are combined, the mean square error and the weighted error are applied in order so that the output image approaches the original image, and the network parameters are updated by the error back-propagation method.
  • In the reconstruction network learning process, the missing image reference network 8032 trained in the missing image reference network learning process, the auxiliary information extraction network 7041 and the auxiliary information reference network 8031 trained in the auxiliary information extraction/reference network learning process, the reconstruction network 8033, and the identification network 9000 are combined as shown in FIG. 11, and only the reconstruction network 8033 is trained (step S503).
  • Specifically, in the reconstruction network learning process, the original image, the missing image of the original image, and the missing area information are input to the combined network, the mean square error and the weighted error are applied in order so that the output image approaches the original image, and only the parameters of the reconstruction network are updated by the error back-propagation method.
  • In the whole-network learning process, the missing image reference network 8032, the auxiliary information extraction network 7041, the auxiliary information reference network 8031, and the reconstruction network 8033, combined as shown in FIG. 11 as in the reconstruction network learning process, are trained simultaneously (step S504). Specifically, the original image, the missing image of the original image, and the missing area information are input to the combined network, the mean square error and the weighted error are applied in order so that the output image approaches the original image, and the parameters of all the networks are updated by the error back-propagation method.
  • Alternatively, the configuration may be such that only the auxiliary information extraction network is trained while the parameters of the other networks are held fixed.
  • The order of application of the above error functions is an example: learning may be performed without using the framework of the generative adversarial network including the identification network 9000, and the identification network error, the mean square error, or the weighted error may be applied while being switched as needed according to the number of iterations.
  • The identification network 9000 may itself be trained, independently of the learning processes of the networks in FIG. 12, according to the number of iterations and the accuracy rate of the identification network 9000. Specifically, the output images of the networks used in the learning processes of FIG. 12 and the original images are alternately input to the identification network 9000, the probability that the input is an original image is output, the error from the correct value of 1 is evaluated by an error function such as the mutual information amount, and the parameters are updated by the error back-propagation method.
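A hedged sketch of one such identification network update follows; binary cross entropy is used here in place of the mutual-information-style criterion named above, and the alternating real/interpolated presentation is implemented as one batch of each.

```python
import torch
import torch.nn.functional as F

def train_identification_step(identification_net, opt, original, generated):
    """One update of the identification network 9000: original images should
    be classified as true (1), interpolated outputs as not true (0)."""
    p_real = identification_net(original)
    p_fake = identification_net(generated.detach())  # freeze the generator
    loss = (F.binary_cross_entropy(p_real, torch.ones_like(p_real)) +
            F.binary_cross_entropy(p_fake, torch.zeros_like(p_fake)))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```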
  • The end of each learning process may be determined by threshold processing on the number of iterations or on the decrease of the error. As in the other embodiments, the unit of processing may be the entire screen, or may be a block unit obtained by dividing the screen using a structure such as the HEVC CTU.
  • Unlike the conventional method, in which the decoding side obtains the output image by generating the image with the interpolation network alone, the encoding method and the decoding method in the third embodiment generate the image using the image interpolation auxiliary information. The encoding method and the decoding method in the third embodiment can therefore improve the prediction accuracy over the method using the conventional technique, and can realize generation that makes use of features of the original picture.
  • Since the encoding method and the decoding method in the third embodiment determine the image interpolation auxiliary information to be transmitted by learning, they can extract image interpolation auxiliary information that yields more accurate reconstruction results than extraction of auxiliary information determined by manual trial and error, as in conventional HEVC. Furthermore, by controlling the order in which the networks are trained and the error functions to be applied, the encoding method and the decoding method according to the third embodiment enable each network of the complex configuration to be learned to acquire its intended behavior.
  • The encoding method and the decoding method in the third embodiment solve this problem by providing the auxiliary information extraction unit 704 on the encoding side and supplying image interpolation auxiliary information to the interpolation network. Since the auxiliary information extraction network 7041 that defines the image interpolation auxiliary information is also acquired by learning, the encoding method and the decoding method in the third embodiment can extract image interpolation auxiliary information with higher image generation accuracy than information designed manually, as in image encoding such as HEVC.
  • Since the configuration of the encoding method and the decoding method in the third embodiment includes the auxiliary information extraction unit 704, which generates the image interpolation auxiliary information and acquires its network parameters by learning, it is difficult for each network to acquire the intended behavior if the auxiliary information extraction unit 704 and the image interpolation unit 803 are trained simultaneously; this tendency is particularly pronounced when the framework of the generative adversarial network is used, because its learning is difficult to tune. Therefore, the auxiliary information extraction unit 704 and the image interpolation unit 803 are divided into networks by role, and by controlling which network is trained and which error function is applied according to the number of learning iterations, each network can acquire its intended behavior.
  • The fourth embodiment differs from the third embodiment in the network configuration of the auxiliary information extraction unit and the image interpolation unit: the image interpolation auxiliary information is generated from the difference between the output of the missing image reference network and the input image.
  • FIG. 15 shows a network configuration in the fourth embodiment.
  • In the fourth embodiment, the auxiliary information extraction unit 704 includes the auxiliary information extraction network 7041 and a missing image reference network 8032 that uses network parameters shared with the image interpolation unit 803.
  • The auxiliary information extraction network 7041 is a network that outputs the image interpolation auxiliary information, taking as inputs the difference between the input image and the intermediate image based on the missing image reference, and the missing area information: for example, it takes two images, the difference image between the input image and the intermediate image based on the missing image reference, and the missing area mask image, as input, produces an arbitrary number of units as output, and its intermediate layers are composed of convolution layers, fully connected layers, and the like.
  • The image interpolation unit 803 includes an auxiliary information reference network 8031, a missing image reference network 8032, and a reconstruction network 8033. The input and output of each network are the same as in the third embodiment, except for the missing image reference network 8032.
  • The auxiliary information reference network 8031 is a network that receives the image interpolation auxiliary information and outputs an intermediate image based on the auxiliary information reference. The missing image reference network 8032 is a network that takes the missing image of the input image and the missing area mask image as inputs and outputs the intermediate image based on the missing image reference. The intermediate image based on the missing image reference is input to the reconstruction network 8033 as a component of the image interpolation unit 803; in addition, the difference between this intermediate image and the input image is input to the auxiliary information extraction network 7041 as a component of the auxiliary information extraction unit 704. The reconstruction network 8033 is a network that receives the intermediate image based on the auxiliary information reference and the intermediate image based on the missing image reference, and outputs the final output image in which the missing area is interpolated.
  • Next, the learning of the auxiliary information extraction unit 704 and the image interpolation unit 803 is performed. The learning process is the same as in the third embodiment, but the network configuration in the auxiliary information extraction/reference network learning process differs as shown in the corresponding figure; in that configuration, only the auxiliary information extraction network 7041 and the auxiliary information reference network 8031 are trained.
  • As in the third embodiment, the original image could be input directly to the auxiliary information extraction unit 704. However, as described above, the difference image between the original image and the predicted image from the peripheral blocks, that is, the intermediate image based on the missing image reference, which can be obtained on both the encoding side and the decoding side, can be input instead.
  • A part or all of the encoding device and the decoding device in the above-described embodiments may be realized by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into a computer system and executed.
  • Here, the “computer system” includes an OS and hardware such as peripheral devices. The “computer-readable recording medium” refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into the computer system. Furthermore, the “computer-readable recording medium” may include a medium that dynamically holds the program for a short time, such as a communication line when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as a volatile memory inside a computer system serving as a server or a client in that case.
  • The program may be a program for realizing a part of the above-described functions, or a program that can realize the above-described functions in combination with a program already recorded in the computer system. The program may also be realized using a programmable logic device such as an FPGA (Field Programmable Gate Array).
  • DESCRIPTION OF REFERENCE SIGNS: 10, 30 ... encoding device; 101, 301 ... block division unit; 102, 302 ... encoding method determination unit; 103, 303 ... auxiliary information extraction unit; 104, 304 ... auxiliary information entropy encoding unit; 105, 305 ... image conversion unit; 306 ... post-conversion image memory; 107, 307 ... intra prediction unit; 108, 308 ... transform/quantization unit; 109, 309 ... entropy coding unit; 110, 310 ... inverse quantization/inverse transform unit; 111, 311 ... prediction memory; 20, 40 ... decoding device; 201, 401 ... entropy decoding unit; 202, 402 ... inverse transform/inverse quantization unit; 203, 403 ... intra prediction unit; 204, 404 ... prediction memory; 205, 405 ... reconstruction unit; 206, 406 ... encoding method determination unit; 207, 407 ... auxiliary information entropy decoding unit; 31 ... preprocessing device; 32 ... conventional encoding device; 41 ... conventional decoding device; 42 ... post-processing device; 408 ... output picture memory; 50, 70 ... encoding device; 60, 80 ... decoding device; 501, 701 ... image loss processing unit; 502, 702 ... image encoding unit; 503, 703 ... missing area information encoding unit; 601, 801 ... image decoding unit; 602, 802 ... missing area information decoding unit; 603, 803 ... image interpolation unit; 604 ... image interpolation network; 704 ... auxiliary information extraction unit; 705 ... auxiliary information encoding unit; 805 ... auxiliary information decoding unit; 8031 ... auxiliary information reference network; 8032 ... missing image reference network; 8033 ... reconstruction network; 9000 ... identification network

Abstract

This encoding device for encoding images is provided with: a determination unit which determines whether or not an input image is to be reconstructed; an auxiliary information extraction unit which extracts, from an image determined to be reconstructed, auxiliary information to be used in the reconstruction; a conversion unit which converts the image determined to be reconstructed to obtain a converted image; and an encoding unit which encodes the converted image to obtain encoded data, wherein the conversion unit performs the conversion so that, when the encoding unit performs the encoding, a smaller code amount is obtained than when the input image is encoded. According to the present invention, transmission and reception of a region to be reconstructed can be performed with a smaller code amount.

Description

Encoding device, decoding device, encoding method, decoding method, encoding program, and decoding program
The present invention relates to an encoding device, a decoding device, an encoding method, a decoding method, an encoding program, and a decoding program.
MPEG-4, H.264/AVC, and H.265/HEVC (hereinafter referred to as "HEVC") are known standards for compressing and encoding video data. Standardization of a new standard to succeed HEVC is also under study. In these video compression coding standards, processing is performed in units obtained by dividing an image into rectangular blocks, and a predictive coding scheme is adopted in which blocks adjacent to the prediction target block are referenced to predict the pixel values of the prediction target block and only the prediction residual signal is transmitted. Hereinafter, taking HEVC as an example, an intra prediction coding method that predicts pixel signals closed within a frame will be described.
In HEVC, as shown in FIG. 17, the entire screen is divided into blocks of 64 pixels × 64 pixels (hereinafter referred to as "64×64"), and each unit is defined as a CTU (Coding Tree Unit). A CTU can be divided into four squares called CUs (Coding Units), and by processing this division recursively, the CTU is divided into fine blocks. In HEVC, four CU sizes can be used, namely 64×64, 32×32, 16×16, and 8×8, and prediction processing is performed in units called PUs (Prediction Units) obtained by further dividing a CU.
In the case of intra prediction, two kinds of PU partitioning can be used, according to whether or not the CU is divided into four squares. Each PU can selectively apply 35 kinds of prediction parameters; for example, the prediction parameter that minimizes the prediction residual signal with respect to the original image is selected on the encoding side, and the prediction parameter and the prediction residual signal are transmitted to the decoding side.
In HEVC, the prediction method can be selected from three types: Planar prediction, DC (Direct Current) prediction, and directional prediction. Since 33 prediction parameters are assigned to directional prediction, the total number of prediction parameters is 35. As shown in FIG. 18, each prediction method performs prediction using the pixel values of reference pixels located to the left of and above the prediction target block. In directional prediction, one of the 33 defined directions is selected as the reference direction, and the pixel values along the reference direction are assigned to the prediction target block to generate its predicted pixels. In Planar prediction, four pixels are referenced (the lower-left and upper-right of the prediction target block, and the pixels to the left of and above the prediction target pixel), and each pixel in the prediction target block is predicted as their weighted average. In DC prediction, a single prediction value is generated for the whole prediction target block as the average of the reference pixels located to its left and above it.
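As a minimal illustration of DC prediction as described above (not part of the patent text; the array names and block size are hypothetical), the following NumPy sketch fills an N×N block with the mean of its top and left reference pixels:

```python
import numpy as np

def dc_predict(top_refs: np.ndarray, left_refs: np.ndarray, n: int) -> np.ndarray:
    """DC prediction: fill an n x n block with the mean of the
    reference pixels located above and to the left of the block."""
    dc = np.concatenate([top_refs[:n], left_refs[:n]]).mean()
    return np.full((n, n), dc)

# Usage: predict an 8x8 block from its 8 top and 8 left neighbours.
top = np.array([120, 121, 122, 122, 123, 124, 124, 125], dtype=float)
left = np.array([119, 120, 120, 121, 121, 122, 123, 123], dtype=float)
pred = dc_predict(top, left, 8)
```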
As a method of reducing the code amount while maintaining image quality, there is a method of reducing the information amount of the prediction residual by increasing the accuracy of the prediction scheme of the predictive coding described above, aiming at exact pixel reproduction.
As another method, a processing scheme that reconstructs a pseudo image on the decoding side can be introduced and used together with a conventional coding scheme. For images that cannot be encoded efficiently by the above prediction scheme or its refinements, a method has been proposed that reduces the code amount while maintaining the subjective quality of the decoded image, without aiming at exact pixel reproduction (see Patent Document 1). According to the technique described in Patent Document 1, on the encoding side, an input image is decomposed by Cartoon-Texture signal decomposition, and a non-synthesized component image represented by the sum of the Cartoon component and the non-synthesized Texture component, representative Textures of the synthesized Texture, and region information corresponding to the synthesized Texture are transmitted. The region information is expressed as an image, and includes the synthesis regions and the synthesis method corresponding to each region. On the decoding side, after the non-synthesized component image is decoded, the decoded image is obtained by adding to it the image reconstructed by Texture synthesis using the representative Textures of the synthesized Texture and the region information. Here, existing coding standards are used to encode and decode the non-synthesized component image and the region information corresponding to the synthesized Texture. The technique described in Patent Document 1 can encode an image with many Texture components with a smaller code amount.
Patent Document 1: Japanese Patent Laid-Open No. 2017-092801
With the technique described in Patent Document 1, the region information corresponding to the synthesized Texture must be transmitted, and this region information is sent to the decoding side as an image. Therefore, two frames, the non-synthesized component image and the region information image, must be transmitted in order to transmit one frame, which raises the problem that the code amount of the auxiliary information becomes large. Thus, in coding schemes that involve reconstruction such as Texture synthesis on the decoding side, it is desirable to realize reconstruction processing that allows the region to be reconstructed to be transmitted and received with a smaller code amount.
In view of the above circumstances, an object of the present invention is to provide a technique capable of transmitting and receiving a region to be reconstructed with a smaller code amount.
One aspect of the present invention is an encoding device for encoding an image, including: a determination unit that determines whether or not an input image is to be a reconstruction target; an auxiliary information extraction unit that extracts, from an image determined to be a reconstruction target, auxiliary information, which is information used for reconstruction; a conversion unit that converts an image determined to be a reconstruction target to obtain a converted image; and an encoding unit that encodes the converted image to obtain encoded data, wherein the conversion unit performs the conversion so that, when the encoding unit performs the encoding, the code amount is smaller than when the input image is encoded.
One aspect of the present invention is the above encoding device, wherein the determination unit determines whether or not the input image is to be a reconstruction target by acquiring an estimated generated code amount and an estimated distortion amount and performing rate-distortion optimization.
One aspect of the present invention is the above encoding device, wherein the auxiliary information is information for inversely converting the converted image into an image that preserves the features of the image determined to be a reconstruction target while having a smaller code amount than that image.
One aspect of the present invention is a decoding device for decoding encoded data in which an image has been encoded, including: a decoding unit that decodes input encoded data to obtain a decoded image; a determination unit that determines whether or not the decoded image is an image to be reconstructed; and a reconstruction unit that acquires auxiliary information, which is information used for reconstruction, and reconstructs, using the auxiliary information, a decoded image determined to be an image to be reconstructed.
One aspect of the present invention is an encoding method performed by an encoding device that encodes an image, including: a determination step of determining whether or not an input image is to be a reconstruction target; an auxiliary information extraction step of extracting, from an image determined to be a reconstruction target, auxiliary information, which is information used for reconstruction; a conversion step of converting an image determined to be a reconstruction target so that the code amount is smaller than when the input image is encoded, to obtain a converted image; and an encoding step of encoding the converted image to obtain encoded data.
One aspect of the present invention is a decoding method performed by a decoding device that decodes encoded data in which an image has been encoded, including: a decoding step of decoding input encoded data to obtain a decoded image; a determination step of determining whether or not the decoded image is an image to be reconstructed; and a reconstruction step of acquiring auxiliary information, which is information used for reconstruction, and reconstructing, using the auxiliary information, a decoded image determined to be an image to be reconstructed.
One aspect of the present invention is an encoding program for causing a computer to function as the above encoding device.
One aspect of the present invention is a decoding program for causing a computer to function as the above decoding device.
According to the present invention, a region to be reconstructed can be transmitted and received with a smaller code amount.
FIG. 1 is a flowchart showing the flow of processing by the encoding device 10 according to the first embodiment.
FIG. 2 is a block diagram showing the functional configuration of the encoding device 10 according to the first embodiment.
FIG. 3 is a flowchart showing the flow of processing by the decoding device 20 according to the first embodiment.
FIG. 4 is a block diagram showing the functional configuration of the decoding device 20 according to the first embodiment.
FIG. 5 is a block diagram showing the functional configuration of the encoding device 30 according to the second embodiment.
FIG. 6 is a block diagram showing the functional configuration of the decoding device 40 according to the second embodiment.
FIG. 7 is a flowchart showing the flow of processing by the encoding device 50 and the decoding device 60 according to the conventional technique.
FIG. 8 is a block diagram showing the functional configurations of the encoding device 50 and the decoding device 60 according to the conventional technique.
FIG. 9 is a flowchart showing the flow of processing by the encoding device 70 and the decoding device 80 according to the third embodiment.
FIG. 10 is a block diagram showing the functional configurations of the encoding device 70 and the decoding device 80 according to the third embodiment.
FIG. 11 is a block diagram showing the configuration of the networks used by the encoding device 70 and the decoding device 80 according to the third embodiment.
FIG. 12 is a flowchart showing the flow of learning processing by the encoding device 70 and the decoding device 80 according to the third embodiment.
FIG. 13 is a block diagram showing the network configuration in the missing-image reference network learning processing according to the third embodiment.
FIG. 14 is a block diagram showing the configuration of the networks used by the encoding device 70 and the decoding device 80 according to the fourth embodiment.
FIG. 15 is a block diagram showing the configuration of the networks used by the encoding device 70 and the decoding device 80 according to the fourth embodiment.
FIG. 16 is a block diagram showing the network configuration in the auxiliary information extraction/reference network learning processing according to the fourth embodiment.
FIG. 17 is a schematic diagram for explaining the block division structure of HEVC intra-frame prediction.
FIG. 18 is a schematic diagram for explaining the structure of intra prediction in HEVC.
Hereinafter, embodiments of the present invention will be described using combination with HEVC intra prediction coding as an example, but the present invention is not limited to HEVC or to intra prediction. That is, the present invention is also applicable to image coding schemes other than HEVC and to inter prediction.
In the present invention, for each block such as an HEVC CTU or CU, the encoding side determines whether the block is to be a reconstruction target or a non-reconstruction target, and auxiliary information is extracted from each block determined to be a reconstruction target (hereinafter, "reconstruction target block") and transmitted. Here, reconstruction refers to processing that generates a pseudo image fitting the region of interest of an image by Texture synthesis, image interpolation/synthesis processing, or the like. A pseudo image here means, for example, an image whose difference from the input image is hard to perceive from a subjective viewpoint.
In addition, uniform image processing is applied to the entire reconstruction target block so that the information amount of the prediction residual in HEVC intra prediction becomes small, and the block is then input to the HEVC encoder. In other words, blocks for which HEVC prediction accuracy is low, and blocks of subjects for which the pixels of the pre-encoding image need not be reproduced exactly as long as a certain level of subjective image quality is ensured, are taken as reconstruction target blocks and are made to consist of pixels that HEVC can predict easily, thereby reducing the code amount required for encoding while maintaining the desired image quality. On the decoding side, reconstruction target blocks are identified by determining whether or not uniform image processing has been applied to the entire block.
<First Embodiment>
The first embodiment will be described below with reference to the drawings.
[Processing of the encoding device]
First, the processing of the encoding device according to the present invention will be described.
FIG. 1 shows the processing flow of the encoding device in the first embodiment of the present invention.
The block division processing determines the shape of the encoding processing blocks from the input picture (step S101). The output block division shape follows the CTU, CU, and PU shown in FIG. 17, and each such block is used as the unit of reconstruction processing on the decoding side and as the unit of HEVC encoding processing.
As the method of determining the division shape, in addition to a method of using uniform rectangles such as CTUs, it is possible to use a method of determining the CU division shape by rate-distortion optimization as implemented in the HEVC Test Model (HM), a method of approximating, in block units, the result of performing region segmentation for each object as used in image recognition, or the like.
The encoding method determination processing determines, for each block produced by the block division processing, whether the block is to be a reconstruction target block or a non-reconstruction target block (step S102). As the determination method, for example, the estimated generated code amount and the estimated distortion amount can be derived for the case where the block is a reconstruction target and for the case where it is not, and the determination can be made by applying rate-distortion optimization.
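As a rough sketch of such a rate-distortion decision (the cost form J = D + λR is standard; the crude rate and distortion estimates below are deliberate toy stand-ins, not the estimators of this document, which would be measured by actual trial encodes):

```python
import numpy as np

def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits

def choose_coding_mode(block: np.ndarray, lam: float = 10.0) -> str:
    """Toy decision between 'reconstruction' and 'normal' coding.

    Assumed stand-ins: normal coding has near-zero distortion and a rate
    roughly proportional to block variance; reconstruction sends only a
    small, constant amount of auxiliary information but its distortion is
    the SSE against the block mean (the converted, flattened block).
    """
    mean = block.mean()
    d_normal, r_normal = 0.0, 8.0 * block.var() + 64.0
    d_recon, r_recon = float(((block - mean) ** 2).sum()), 32.0
    j_normal = rd_cost(d_normal, r_normal, lam)
    j_recon = rd_cost(d_recon, r_recon, lam)
    return "reconstruction" if j_recon < j_normal else "normal"

block = np.random.default_rng(0).integers(0, 256, size=(8, 8)).astype(float)
print(choose_coding_mode(block))
```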
When a block is determined to be a reconstruction target block (step S103: Yes), the auxiliary information extraction processing extracts, from the reconstruction target block, auxiliary information to be transmitted to the decoding device in order to assist the reconstruction processing (step S104). The reconstruction processing here is processing in which a block that has undergone some conversion, described later, is inversely converted on the decoding side. For example, when reconstruction is performed by synthesizing the reconstruction target block by image synthesis, the auxiliary information extraction processing extracts, as auxiliary information, a representative Texture to be used at the time of synthesis, a label identifying the object, or the like.
The extracted auxiliary information is entropy-encoded by the auxiliary information entropy encoding processing to become the encoded data of the auxiliary information. Any encoding method, such as Huffman coding or run-length coding, can be used for the auxiliary information entropy encoding processing (step S105).
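For illustration only, a minimal run-length coder of the kind that could serve here (the (count, value) byte layout is an assumption, not specified by this document):

```python
def rle_encode(data: bytes) -> bytes:
    """Encode as (count, value) pairs; counts are capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode."""
    out = bytearray()
    for i in range(0, len(encoded), 2):
        out += bytes([encoded[i + 1]]) * encoded[i]
    return bytes(out)

assert rle_decode(rle_encode(b"aaaabbbcc")) == b"aaaabbbcc"
```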
After the auxiliary information has been extracted, the reconstruction target block is converted by the image conversion processing into an image that HEVC can transmit with a smaller code amount (step S106). In the image conversion processing, for example, the reconstruction target block may be replaced with the average value of the block, or a conversion may be applied such that the prediction residual when predicting with an arbitrary or specific mode number in HEVC intra directional prediction approaches zero.
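A minimal sketch of the mean-replacement conversion mentioned above (NumPy; the aligned block position and rounding are simplifying assumptions):

```python
import numpy as np

def convert_block_to_mean(image: np.ndarray, y: int, x: int, n: int) -> np.ndarray:
    """Replace the n x n block at (y, x) with its own average value.

    The resulting flat block is cheap for HEVC intra prediction to code,
    since the prediction residual is close to zero.
    """
    out = image.copy()
    block = out[y:y + n, x:x + n]
    out[y:y + n, x:x + n] = np.round(block.mean())
    return out
```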
Further, the HEVC intra prediction mode number used for the conversion may be transmitted to the decoding side as a part of the auxiliary information; alternatively, the image conversion may be performed by associating a specific HEVC intra prediction mode number with a reconstruction processing method on the decoding side, and the correspondence may be transmitted to the decoding side as a part of the auxiliary information.
For example, when Texture synthesis is used as the reconstruction processing, an intra prediction mode number and a representative Texture may be associated with each other, and the correspondence may be transmitted to the decoding side as auxiliary information. The image conversion method may also be a method other than conversion based on HEVC intra prediction. An arbitrary conversion method that can produce an output that does not occur in the input picture may be selected, either defined in the course of the image conversion processing or chosen from predefined methods, and the conversion method may be transmitted to the decoding side as auxiliary information.
The image after conversion (hereinafter referred to as the "converted image") is encoded by the converted-image intra encoding processing to obtain the encoded data of the converted image (step S107).
The above processing is applied to all blocks in processing order (step S108 and step S109), and the encoded data of the auxiliary information and the encoded data of the converted image are obtained as transmission information.
[Configuration example of the encoding device]
Next, a configuration example of an encoding device for realizing the above processing will be described.
FIG. 2 shows a configuration example of the encoding device 10 in the first embodiment. As illustrated, the encoding device 10 includes a block division unit 101, an encoding method determination unit 102, an auxiliary information extraction unit 103, an auxiliary information entropy encoding unit 104, an image conversion unit 105, an intra prediction unit 107, a transform/quantization unit 108, an entropy encoding unit 109, an inverse quantization/inverse transform unit 110, and a prediction memory 111.
The block division unit 101 receives the input picture and performs the block division processing. The block division unit 101 outputs the input picture divided into blocks.
The encoding method determination unit 102 receives the block-divided input picture and performs the encoding method determination processing. The encoding method determination unit 102 outputs the determination result of the encoding method of each block.
The auxiliary information extraction unit 103 receives a reconstruction target block and reference blocks and performs the auxiliary information extraction processing. A reference block is a block containing pixels to be referenced in the reconstruction processing; for example, when interpolation/synthesis of an image is used as the reconstruction processing, it is a block containing the pixels referenced in the interpolation processing. The auxiliary information extraction unit 103 outputs the auxiliary information.
The auxiliary information entropy encoding unit 104 performs entropy encoding on the input auxiliary information to obtain the encoded data of the auxiliary information. The auxiliary information entropy encoding unit 104 outputs the encoded data of the auxiliary information.
The image conversion unit 105 receives a reconstruction target block and performs the image conversion processing. The image conversion unit 105 outputs the converted block.
The converted blocks and the non-reconstruction target blocks are encoded by intra coding. In intra coding, the prediction residual with respect to the predicted image output from the intra prediction unit 107 is orthogonally transformed and quantized by the transform/quantization unit 108 and encoded by the entropy encoding unit 109. The encoded data of the image is thereby obtained.
In the present embodiment, the entropy encoding unit 109 that encodes the prediction residual and the auxiliary information entropy encoding unit 104 that encodes the auxiliary information are separate functional blocks, but they may be configured as a single functional block. That is, the prediction residual and the auxiliary information may be encoded by one encoding unit, for example with a common entropy coding scheme.
The prediction residual quantized by the transform/quantization unit 108 is subjected to inverse quantization and inverse transform by the inverse quantization/inverse transform unit 110 and is stored in the prediction memory 111. The data stored in the prediction memory 111 is used for the intra prediction processing by the intra prediction unit 107 and the auxiliary information extraction processing by the auxiliary information extraction unit 103.
[Processing of the decoding device]
Next, the processing of the decoding device that decodes an image from the encoded data generated by the above processing method and functional configuration will be described.
FIG. 3 shows the processing flow of the decoding device in the first embodiment.
The converted-image decoding processing decodes the encoded data of the converted image to obtain a block of the decoded converted image (step S201). The decoded image may be an image in units corresponding to the input image, or an image in units corresponding to the blocks into which the input image was divided. In the following, the description continues on the assumption that the decoded image is an image in units corresponding to a block.
The encoding method determination processing determines a block that has been converted by the image conversion method used by the image conversion unit 105 of the encoding device 10 to be a reconstruction target block (step S202). For example, when the image conversion unit 105 of the encoding device 10 performs the processing of uniformly replacing a reconstruction target block with its average value, the encoding method determination processing determines, among the blocks obtained from the decoded converted image, the blocks that have undergone that processing to be reconstruction target blocks.
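A minimal sketch of such a decoder-side check under the average-value rule (the tolerance is an assumption: an exact-equality test would only hold if the flat block survives coding losslessly):

```python
import numpy as np

def is_reconstruction_target(block: np.ndarray, tol: float = 1.0) -> bool:
    """Judge a decoded block as a reconstruction target if it is
    (near-)uniform, i.e. every pixel is within `tol` of the block mean."""
    return bool(np.abs(block - block.mean()).max() <= tol)
```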
For a reconstruction target block (step S203: Yes), the encoded data of the auxiliary information corresponding to that reconstruction target block is decoded based on the encoding method applied by the auxiliary information entropy encoding unit 104 of the encoding device 10 (step S204).
The reconstruction processing is then performed with the auxiliary information and the reference blocks that can be referenced by the reconstruction target block as inputs (step S205).
The above processing is applied to all blocks in processing order (step S206 and step S207) to obtain the final decoded image.
[Configuration example of the decoding device]
Next, a configuration example of a decoding device for realizing the above processing will be described.
FIG. 4 shows a configuration example of the decoding device 20 in the first embodiment. As illustrated, the decoding device 20 includes an entropy decoding unit 201, an inverse transform/inverse quantization unit 202, an intra prediction unit 203, a prediction memory 204, a reconstruction unit 205, an encoding method determination unit 206, and an auxiliary information entropy decoding unit 207.
The encoded data of the converted image is decoded by HEVC. In decoding by HEVC, the encoded data of the converted image is first entropy-decoded by the entropy decoding unit 201, and inverse transform and inverse quantization are applied by the inverse transform/inverse quantization unit 202. The prediction residual image is thereby decoded, and the prediction result from the intra prediction unit 203 is added to it, yielding a block of the decoded converted image.
The decoded converted image is stored in the prediction memory 204 and is used as input to the intra prediction unit 203 and the reconstruction unit 205.
The encoding method determination unit 206 receives a block of the decoded converted image, performs the encoding method determination processing, and outputs the determination result.
The auxiliary information entropy decoding unit 207 performs entropy decoding on the input encoded data of the auxiliary information to obtain the auxiliary information. The auxiliary information entropy decoding unit 207 outputs the auxiliary information to the reconstruction unit 205.
The reconstruction unit 205 performs the reconstruction processing with the auxiliary information, the reference pixels that can be referenced by the reconstruction target block, and the reconstruction target block as inputs, and outputs the final output picture.
As described above, in the encoding method and decoding method according to the above embodiment, unlike the conventional technique, the input image is classified, in units of processing blocks, into reconstruction targets and non-reconstruction targets, and the reconstruction processing is applied. By constraining the processing to be performed in block units, the encoding method and decoding method according to the above embodiment can reduce the code amount required to transmit boundary information. For example, by sharing between the encoding device 10 and the decoding device 20 the rule that the inside of a reconstruction target block is replaced with its average value, the positions of the reconstruction target blocks can be identified without transmitting boundary information.
Conventionally, while a reconstruction target could be specified in an arbitrary shape, it was necessary to transmit to the decoding side, as auxiliary information for each region, information on whether or not the region is a reconstruction target and on the reconstruction method. This has conventionally caused the problem that the code amount of the auxiliary information becomes large. In contrast, in the encoding method and decoding method according to the above embodiment, a reconstruction target block is designated for each block; processing that allows the designated reconstruction target block to be encoded by HEVC with a smaller code amount (for example, replacing the entire block with its average value) is applied on the encoding side, and processing that determines the presence or absence of that processing is applied on the decoding side. Thus, in the encoding method and decoding method according to the above embodiment, the decoding side can identify the reconstruction blocks without boundary information being transmitted to it as auxiliary information. Furthermore, by associating HEVC mode numbers with reconstruction methods, the reconstruction method can be conveyed to the decoding side at the same time.
<Second Embodiment>
The second embodiment will be described below with reference to the drawings. The second embodiment described below differs from the first embodiment described above in the configurations of the encoding device and the decoding device.
[Configuration example of the encoding device]
FIG. 5 shows the configuration of the encoding device 30 in the second embodiment. As illustrated, the encoding device 30 is composed of a preprocessing device 31 and a conventional encoding device 32. The preprocessing device 31 includes a block division unit 301, an encoding method determination unit 302, an auxiliary information extraction unit 303, an auxiliary information entropy encoding unit 304, an image conversion unit 305, and a converted-image memory 306. The conventional encoding device 32 includes an intra prediction unit 307, a transform/quantization unit 308, an entropy encoding unit 309, an inverse quantization/inverse transform unit 310, and a prediction memory 311.
As shown in FIG. 5, the difference between the encoding device 30 in the second embodiment and the encoding device 10 in the first embodiment is that the device comprising the block division unit, the encoding method determination unit, the image conversion unit, the auxiliary information extraction unit, and the entropy encoding unit is provided as the preprocessing device 31, independently of the other components (that is, the components of a conventional encoding device).
In this case, as in the configuration illustrated in FIG. 5, the converted image may be stored in the converted-image memory 306, and the auxiliary information extraction unit 303 may refer to the converted image stored in the converted-image memory 306. The components other than those included in the preprocessing device 31 are configured independently as the conventional encoding device 32. As the conventional encoding device 32, for example, an HEVC intra encoding device, or an encoding device conforming to an image coding standard such as JPEG (Joint Photographic Experts Group), can be used.
Since the processing flow of the encoding device 30 is the same as the processing flow shown in FIG. 1, its description is omitted.
[Configuration example of the decoding device]
Next, FIG. 6 shows the configuration of the decoding device 40 in the second embodiment. As illustrated, the decoding device 40 is composed of a conventional decoding device 41 and a post-processing device 42. The conventional decoding device 41 includes an entropy decoding unit 401, an inverse transform/inverse quantization unit 402, an intra prediction unit 403, and a prediction memory 404. The post-processing device 42 includes a reconstruction unit 405, an encoding method determination unit 406, and an auxiliary information entropy decoding unit 407.
As shown in FIG. 6, the difference between the decoding device 40 in the second embodiment and the decoding device 20 in the first embodiment is that the device comprising the encoding method determination unit, the auxiliary information entropy decoding unit, and the reconstruction unit is provided as the post-processing device 42, independently of the other components (that is, the components of a conventional decoding device).
In this case, as in the configuration illustrated in FIG. 6, output pictures may be stored in the output picture memory 408, and the reconstruction unit 405 may refer to the output pictures stored in the output picture memory 408. The components other than those included in the post-processing device 42 are configured independently as the conventional decoding device 41.
Since the processing flow of the decoding device 40 is the same as the processing flow shown in FIG. 3, its description is omitted.
According to the encoding method and decoding method according to the second embodiment described above, the preprocessing device 31 and the post-processing device 42, which can be used together with conventional encoding and decoding devices, can be realized. Since the coding efficiency improvements of the standard and of the preprocessing device 31 and post-processing device 42 are then additive, according to the encoding method and decoding method according to the second embodiment, when an encoding device based on the standard becomes more efficient, the coding efficiency of the encoding device 30 as a whole improves as well.
Hereinafter, means for reconstructing a reconstruction target block on the decoding side by image interpolation/synthesis processing using machine learning will be described. Naturally, this means can be used in the first and second embodiments described above.
<Third Embodiment>
The third embodiment will be described below with reference to the drawings.
As described above, each prediction method selectable in HEVC (Planar prediction, DC prediction, and directional prediction) refers to referenceable pixels and performs prediction based on simple prediction rules; however, there is a problem that prediction efficiency decreases for, e.g., images in which high-frequency components are randomly distributed within the screen. In such images, the information amount of the prediction residual signal is large, so that when encoding is performed with a fixed quantization width for the prediction residual signal, an excessive code amount is generated.
As a way to realize compression coding that reduces the code amount while maintaining subjective quality even for such images, besides increasing the accuracy of the prediction scheme, it is conceivable to introduce a processing scheme that, unlike the prediction above, reconstructs the image in a pseudo manner.
According to the technique described in Non-Patent Document 1 (hereinafter referred to as "Conventional Technique 1"), two networks are trained alternately following the framework of generative adversarial networks: an interpolation network constituted by a convolutional neural network, and a discrimination network, also constituted by a convolutional neural network, that distinguishes interpolated images produced by the interpolation network from true, non-interpolated images. The interpolation network thereby becomes able to reconstruct missing regions of an image in a pseudo manner.
By applying the interpolation network of Conventional Technique 1 to the decoding side, the image can be reconstructed on the decoding side for the image regions where the prediction efficiency described above decreases, and since transmission of the reconstructed regions becomes unnecessary, the code amount can be reduced.
[Example of image encoding and decoding processing using an interpolation network]
FIG. 7 shows an example of image encoding and decoding processing using an interpolation network.
The image deletion processing selects, from the input image, a region to be reconstructed on the decoding side by image interpolation, deletes it to generate a missing image, and outputs the missing image together with missing region information indicating the missing region (step S301). Here, the missing region information is, for example, a binary image indicating the missing region.
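By way of illustration, a NumPy sketch of producing a missing image and a binary mask of this kind (filling the deleted region with the image mean is one possible choice, in line with the mean-replacement conversion described earlier, not a requirement of this document):

```python
import numpy as np

def delete_region(image: np.ndarray, y: int, x: int, h: int, w: int):
    """Blank out a rectangular region and return (missing_image, mask).

    The mask is a binary image: 1 inside the missing region, 0 elsewhere.
    Here the region is filled with the image mean as a neutral value.
    """
    missing = image.astype(float)
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    mask[y:y + h, x:x + w] = 1
    missing[y:y + h, x:x + w] = image.mean()
    return missing, mask
```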
In the missing region information encoding processing, in order to transmit the missing region information to the decoding side, the missing region information is encoded using a conventional image coding scheme such as JPEG (Joint Photographic Experts Group) or HEVC, or an entropy coding scheme such as run-length coding. The missing region information encoding processing thereby obtains the encoded data of the missing region information (step S302).
The image encoding processing encodes the missing image using a conventional image coding scheme such as JPEG or HEVC. The image encoding processing thereby obtains the encoded data of the missing image (step S303).
The image decoding processing obtains a decoded missing image from the encoded data of the missing image (step S304).
The missing region information decoding processing obtains the missing region information from the encoded data of the missing region information (step S305).
The image interpolation processing inputs the decoded missing image and the missing region information to the interpolation network of Conventional Technique 1 to obtain the final output image. The processing unit of the encoding processing and the decoding processing may be the entire screen, or may be block units obtained by dividing the screen using a structure such as the HEVC CTU (step S306).
[Configuration examples of the encoding device and the decoding device]
FIG. 8 shows a configuration example of the encoding device 50 and the decoding device 60 that realize the above encoding and decoding processing. As illustrated, the encoding device 50 is composed of an image deletion processing unit 501, an image encoding unit 502, and a missing region information encoding unit 503.
The image deletion processing unit 501 receives the input image and performs the image deletion processing. The image deletion processing unit 501 thereby outputs the missing image and the missing region information.
The image encoding unit 502 receives the missing image and performs the image encoding processing. The image encoding unit 502 thereby outputs the encoded data of the missing image.
The missing region information encoding unit 503 receives the missing region information and performs the missing region information encoding processing. The missing region information encoding unit 503 thereby outputs the encoded data of the missing region information.
The encoded data of the missing image and the encoded data of the missing region information are transmitted to the decoding device 60.
As shown in FIG. 8, the decoding device 60 is composed of an image decoding unit 601, a missing region information decoding unit 602, and an image interpolation unit 603.
The image decoding unit 601 receives the encoded data of the missing image and performs the image decoding processing. The image decoding unit 601 thereby obtains the decoded missing image.
The missing region information decoding unit 602 receives the encoded data of the missing region information and performs the missing region information decoding processing. The missing region information is thereby obtained.
The image interpolation unit 603 includes the image interpolation network 604, receives the decoded missing image and the missing region information, and performs the image interpolation processing. The image interpolation unit 603 thereby obtains the final output image.
In the above configuration, the subjective image quality of the output image depends heavily on the area of the missing region of the missing image in the image interpolation processing. Specifically, the larger the area of the missing region to be interpolated, the smaller the amount of information input to the interpolation network, so estimation of the missing region in the image interpolation processing becomes difficult and the subjective image quality of the output image deteriorates. Furthermore, in the above configuration, if the missing region to be interpolated contains complex elements that cannot be inferred from the referenceable region, the region is not reconstructed on the decoding side, or the subjective image quality of the output deteriorates.
Therefore, what is desired is an encoding scheme and a decoding scheme including image interpolation processing that can be executed while suppressing deterioration of subjective image quality even when the area of the missing region is large or the missing region is complex, as well as an efficient learning method for the networks that are their components.
Hereinafter, the third embodiment of the present invention will be described taking as an example learning with a generative adversarial network that uses convolutional neural networks and a discrimination network, but the present invention is not limited to image interpolation by convolutional neural networks or to learning in the generative adversarial network framework. That is, any learning model whose image interpolation method is acquired by learning can be applied to the image interpolation, and a learning method using any error function can be applied to its learning.
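As a hedged, minimal sketch of the adversarial alternation referred to here (PyTorch; the tiny convolutional generator and discriminator and the mean-free masking scheme are illustrative assumptions, not the networks of this document):

```python
import torch
import torch.nn as nn

# Illustrative stand-ins: a tiny inpainting generator and a patch discriminator.
G = nn.Sequential(nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 3, 3, padding=1))
D = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 1, 3, stride=2, padding=1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real: torch.Tensor, mask: torch.Tensor) -> None:
    """One alternating GAN step: update D on real vs. inpainted images,
    then update G to fool D. `real` is (B,3,H,W); `mask` is (B,1,H,W),
    1 inside the missing region."""
    missing = real * (1 - mask)                  # blank out the region
    fake = G(torch.cat([missing, mask], dim=1))  # inpaint from image + mask
    # --- discriminator update ---
    opt_d.zero_grad()
    d_real, d_fake = D(real), D(fake.detach())
    loss_d = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    loss_d.backward()
    opt_d.step()
    # --- generator update ---
    opt_g.zero_grad()
    d_fake_g = D(fake)
    loss_g = bce(d_fake_g, torch.ones_like(d_fake_g))
    loss_g.backward()
    opt_g.step()

mask = torch.zeros(2, 1, 32, 32)
mask[:, :, 8:24, 8:24] = 1.0
train_step(torch.rand(2, 3, 32, 32), mask)
```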
In the third embodiment, the encoding device performs feature extraction with reference to the original image and transmits, to the decoding device, image interpolation auxiliary information for assisting the image interpolation. The decoding device performs the image interpolation using the image interpolation auxiliary information. The networks used for extracting the image interpolation auxiliary information and for the image interpolation are first optimized individually per network, after which the networks are connected and optimized as a whole.
[Flow of the encoding processing and decoding processing]
First, an overview of the encoding processing and decoding processing using the interpolation network and the auxiliary information extraction network according to the present invention will be given.
FIG. 9 shows the flow of the encoding processing and decoding processing according to the third embodiment.
The image deletion processing selects, from the input image, a region to be reconstructed on the decoding side by image interpolation. The image deletion processing deletes that region, for example by replacing it with its average value, to generate the missing image. The image deletion processing outputs the generated missing image together with missing region information indicating the position of the missing region, i.e., the set of pixels that were deleted.
Here, as the missing region information, for example, a binary mask image indicating the missing region (hereinafter, the missing region mask image) can be used. As the region selection method in the image deletion processing, it is possible to use a method of selecting regions in which the generated code amount is large when a fixed quantization width is used in HEVC intra coding, a method of performing region segmentation for each object as used in image recognition and selecting regions that can be interpolated, or the like (step S401).
The auxiliary information extraction processing extracts image interpolation auxiliary information from the region of the input image corresponding to the missing region derived from the missing region information, or from the input image itself, using a network for image interpolation auxiliary information extraction (step S402). The details of the network for image interpolation auxiliary information extraction will be described later.
 The auxiliary information encoding processing encodes the image interpolation auxiliary information extracted by the auxiliary information extraction processing with a conventional entropy coding scheme such as Huffman coding, thereby obtaining encoded data of the image interpolation auxiliary information (step S403).
 The defect region information encoding processing encodes the reconstruction target region so that the defect region information can be transmitted to the decoding side, using a conventional image coding scheme such as JPEG or HEVC, or an entropy coding scheme such as run-length coding, thereby obtaining encoded data of the defect region information (step S404).
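 Since the text allows any lossless scheme for the defect region information, the following sketch shows run-length coding of the binary mask as one of the named options; the function names are illustrative, not from the patent.

```python
# A sketch of run-length coding for the binary defect region mask (step S404).
import numpy as np

def rle_encode(mask: np.ndarray):
    """Encode a binary mask as (first_value, run_lengths) in raster order."""
    flat = mask.flatten()
    boundaries = np.flatnonzero(np.diff(flat)) + 1     # where the value flips
    bounds = np.concatenate(([0], boundaries, [flat.size]))
    return int(flat[0]), np.diff(bounds).tolist()

def rle_decode(first_value: int, runs, shape):
    out, value = [], first_value
    for run in runs:
        out.extend([value] * run)
        value ^= 1                                     # mask values alternate 0/1
    return np.array(out, dtype=np.uint8).reshape(shape)
```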
 The image encoding processing encodes the defect image using a conventional image coding scheme such as JPEG or HEVC, thereby obtaining encoded data of the defect image (step S405).
 The image decoding processing obtains a decoded defect image from the encoded data of the defect image (step S406).
 The defect region information decoding processing obtains the defect region information from the encoded data of the defect region information (step S407).
 The auxiliary information decoding processing obtains the image interpolation auxiliary information from the encoded data of the image interpolation auxiliary information (step S407).
 The image interpolation processing inputs the decoded defect image, the defect region information, and the image interpolation auxiliary information into a network for image interpolation and obtains the final output image. The details of the network for image interpolation are described later (step S408).
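 Putting the decoder-side steps together, a sketch of step S408 might look as follows, assuming the three interpolation networks sketched later in this section are available as trained torch.nn.Module instances; the tensor layout and every name here are assumptions.

```python
# A sketch of the decoder-side image interpolation (step S408).
import torch

@torch.no_grad()
def interpolate(decoded_defect, mask, aux_info,
                aux_ref_net, defect_ref_net, recon_net):
    """decoded_defect: (1, C, H, W); mask: (1, 1, H, W); aux_info: (1, U)."""
    aux_mid = aux_ref_net(aux_info)                  # intermediate via aux info
    img_mid = defect_ref_net(decoded_defect, mask)   # intermediate via defect image
    return recon_net(aux_mid, img_mid)               # defect region interpolated
```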
 The processing unit of the encoding and decoding processing may be the whole picture, or blocks into which the picture is divided using a structure such as the HEVC CTU.
[Configuration examples of the encoding device and the decoding device]
 Next, FIG. 10 shows a configuration example of an encoding device and a decoding device that realize the above encoding and decoding processing. As illustrated, the encoding device 70 comprises an image defect processing unit 701, an image encoding unit 702, a defect region information encoding unit 703, an auxiliary information extraction unit 704, and an auxiliary information encoding unit 705.
 The image defect processing unit 701 receives the input image and performs the image defect processing, thereby outputting the defect image and the defect region information.
 The image encoding unit 702 receives the defect image and performs the image encoding processing, thereby outputting the encoded data of the defect image.
 The defect region information encoding unit 703 receives the defect region information and performs the defect region information encoding processing, thereby outputting the encoded data of the defect region information.
 The auxiliary information extraction unit 704 receives, as input, either the region of the input image corresponding to the defect region derived from the defect region information or the whole image including the non-defect regions, and performs the auxiliary information extraction processing, thereby extracting the image interpolation auxiliary information.
 The auxiliary information encoding unit 705 receives the image interpolation auxiliary information and performs the auxiliary information encoding processing, thereby outputting the encoded data of the image interpolation auxiliary information.
 The encoded data of the defect image, the encoded data of the defect region information, and the encoded data of the image interpolation auxiliary information are transmitted to the decoding device 80.
 As shown in FIG. 10, the decoding device 80 comprises an image decoding unit 801, a defect region information decoding unit 802, an image interpolation unit 803, and an auxiliary information decoding unit 805.
 The image decoding unit 801 receives the encoded data of the defect image and performs the image decoding processing, thereby obtaining the decoded defect image.
 The defect region information decoding unit 802 receives the encoded data of the defect region information and performs the defect region information decoding processing, thereby obtaining the defect region information.
 The auxiliary information decoding unit 805 receives the encoded data of the image interpolation auxiliary information and performs the auxiliary information decoding processing, thereby obtaining the image interpolation auxiliary information.
 The image interpolation unit 803 receives the decoded defect image, the defect region information, and the image interpolation auxiliary information, and performs the image interpolation processing with reference to the image interpolation auxiliary information, thereby obtaining the final output image.
[Configuration and training of the auxiliary information extraction unit and the image interpolation unit]
 Next, the configuration of the auxiliary information extraction unit 704 and the image interpolation unit 803, as well as their training method, are described.
 FIG. 11 shows the network configuration of the auxiliary information extraction unit 704 and the image interpolation unit 803. As illustrated, the auxiliary information extraction unit 704 consists of an auxiliary information extraction network 7041 for extracting the image interpolation auxiliary information to be transmitted to the decoding side.
 The auxiliary information extraction network 7041 receives the input image and the defect region information and outputs the image interpolation auxiliary information. For example, its input is two images, the input image and the defect region mask image, its output is an arbitrary number of units, and its intermediate layers are composed of convolutional layers, fully connected layers, and the like.
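 As a concrete reading of this description, a PyTorch sketch of network 7041 might be as follows; the layer counts, widths, and the 64×64 patch size are assumptions, since the text fixes only the inputs (two image planes) and the output (an arbitrary number of units).

```python
# A sketch of the auxiliary information extraction network 7041.
import torch
import torch.nn as nn

class AuxInfoExtractor(nn.Module):
    def __init__(self, channels: int = 3, n_units: int = 64, size: int = 64):
        super().__init__()
        self.features = nn.Sequential(                 # convolutional middle layers
            nn.Conv2d(channels + 1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(64 * (size // 4) ** 2, n_units)   # fully connected

    def forward(self, image, mask):                    # mask: (N, 1, H, W)
        h = self.features(torch.cat([image, mask], dim=1))
        return self.head(h.flatten(1))                 # (N, n_units) aux info
```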
 As illustrated in FIG. 11, the image interpolation unit 803 consists of an auxiliary information reference network 8031 for predicting the defect region with reference to the image interpolation auxiliary information, a defect image reference network 8032 for predicting the defect region with reference to the defect image, and a reconstruction network 8033 for generating the final interpolated image from the outputs of these two networks.
 The auxiliary information reference network 8031 receives the image interpolation auxiliary information and outputs an intermediate image by auxiliary information reference. For example, its input has the same number of units as the image interpolation auxiliary information, its output is one intermediate image by auxiliary information reference, and its intermediate layers are composed of fully connected layers, deconvolution layers, convolutional layers, and the like.
 The defect image reference network 8032 receives the defect image of the input image and the defect region mask image and outputs an intermediate image by defect image reference. For example, its input is the two images (the defect image of the input image and the defect region mask image), its output is one intermediate image by defect image reference, and its intermediate layers are composed of convolutional layers, fully connected layers, deconvolution layers, and the like.
 The reconstruction network 8033 receives the intermediate image by auxiliary information reference and the intermediate image by defect image reference, and outputs the final output image in which the defect region has been interpolated. For example, its input is the two intermediate images, its output is one output image, and its intermediate layers are composed of convolutional layers, fully connected layers, deconvolution layers, and the like.
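 Continuing the sketch above with the same conventions, the three networks 8031-8033 might be written as follows; only the input/output roles follow the text, while every depth and width is an assumption.

```python
# Sketches of the auxiliary information reference network (8031), the defect
# image reference network (8032), and the reconstruction network (8033).
import torch
import torch.nn as nn

class AuxRefNet(nn.Module):                        # 8031: aux units -> image
    def __init__(self, n_units: int = 64, channels: int = 3, size: int = 64):
        super().__init__()
        self.size = size
        self.fc = nn.Linear(n_units, 64 * (size // 4) ** 2)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, channels, 4, stride=2, padding=1),
        )

    def forward(self, aux):
        h = self.fc(aux).view(-1, 64, self.size // 4, self.size // 4)
        return self.up(h)

class DefectRefNet(nn.Module):                     # 8032: defect image -> image
    def __init__(self, channels: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels + 1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, defect, mask):
        return self.body(torch.cat([defect, mask], dim=1))

class ReconNet(nn.Module):                         # 8033: two intermediates -> image
    def __init__(self, channels: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2 * channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, aux_mid, img_mid):
        return self.body(torch.cat([aux_mid, img_mid], dim=1))
```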
 With the above configuration, the auxiliary information extraction unit 704 and the image interpolation unit 803 are trained. For training, the framework of the generative adversarial network can be used, as in prior art 1. In this case, as in prior art 1, the discriminator network 9000 for evaluating the naturalness of the interpolated region receives the output image of the image interpolation unit 803 and outputs the probability that the output image is a true, uninterpolated image.
 Next, the training method for the networks in the configuration of FIG. 11 is described. The training process prepares, as teacher data, a large number of triplets of an original image, a defect image generated by randomly giving a defect region to the original image, and the corresponding defect region information. As the error function used for training, one can use, for example, the pixel-wise mean squared error between the original image and the network output image (hereinafter, mean squared error); the error obtained by applying the generative adversarial framework and having the discriminator network judge the network output image (hereinafter, "discriminator network error"); or an error given by a weighted sum of the mean squared error and the discriminator network error (hereinafter, weighted error).
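 The weighted error named here can be sketched directly; the weight lam is an assumption, since the text does not fix the mixing ratio, and the discriminator is assumed to output a probability in [0, 1].

```python
# A sketch of the weighted error: mean squared error plus a weighted
# discriminator-network (adversarial) term.
import torch
import torch.nn.functional as F

def weighted_error(output, original, discriminator, lam: float = 0.01):
    mse = F.mse_loss(output, original)                 # pixel-wise MSE term
    p_real = discriminator(output)                     # prob. of "true image"
    adv = F.binary_cross_entropy(p_real, torch.ones_like(p_real))
    return mse + lam * adv                             # weighted sum
```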
[Network training method]
 The flow of the training process is shown in FIG. 12.
 The defect image reference network training process cuts out the defect image reference network 8032 and the discriminator network 9000 of FIG. 11, connects them as shown in FIG. 13 so that the output of the defect image reference network 8032 is treated as the input of the discriminator network 9000, and trains the defect image reference network 8032 (step S501).
 Specifically, the defect image reference network training process inputs the defect image of the original image and the defect region information into the defect image reference network 8032 and updates the network parameters by error backpropagation so that the output image approaches the original image. As the error function, training is first performed with the mean squared error and then with the weighted error. The subsequent training processes for the other networks likewise train first with the mean squared error and then with the weighted error.
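 A sketch of this two-phase schedule for step S501, reusing weighted_error from the sketch above; the optimizer choice, learning rate, and epoch counts are placeholders.

```python
# Step S501: train only the defect image reference network, first with the
# mean squared error, then with the weighted error.
import torch
import torch.nn.functional as F

def train_defect_ref(defect_ref_net, discriminator, data, epochs=(10, 10)):
    opt = torch.optim.Adam(defect_ref_net.parameters(), lr=1e-4)
    for phase, n_epochs in enumerate(epochs):      # phase 0: MSE, 1: weighted
        for _ in range(n_epochs):
            for original, defect, mask in data:    # teacher-data triplets
                out = defect_ref_net(defect, mask)
                loss = (F.mse_loss(out, original) if phase == 0
                        else weighted_error(out, original, discriminator))
                opt.zero_grad()
                loss.backward()
                opt.step()
```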
 The auxiliary information extraction/reference network training process cuts out the auxiliary information extraction network 7041, the auxiliary information reference network 8031, and the discriminator network 9000 of FIG. 11, connects them as shown in FIG. 14 so that the output of the auxiliary information reference network 8031 is treated as the input of the discriminator network 9000, and trains the auxiliary information extraction network 7041 and the auxiliary information reference network 8031 (step S502).
 Specifically, the auxiliary information extraction/reference network training process inputs the original image and the defect region information into the network in which the auxiliary information extraction network 7041 and the auxiliary information reference network 8031 are connected, applies the mean squared error and then the weighted error so that the output image approaches the original image, and updates the network parameters by error backpropagation.
 The reconstruction network training process connects, as shown in FIG. 11, the defect image reference network 8032, the auxiliary information extraction network 7041, and the auxiliary information reference network 8031 built by the defect image reference network training process and the auxiliary information extraction/reference network training process, together with the reconstruction network 8033 and the discriminator network 9000, and trains only the reconstruction network 8033 (step S503).
 Specifically, the reconstruction network training process inputs the original image, the defect image of the original image, and the defect region information into the connected network, applies the mean squared error and then the weighted error so that the output image approaches the original image, and updates only the parameters of the reconstruction network by error backpropagation.
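 One way to realize "update only the reconstruction network" is to freeze every other network's parameters before backpropagation; the module names below refer to the earlier sketches and are assumptions.

```python
# Step S503 setup: keep only the reconstruction network trainable.
import torch

def freeze(*nets):
    """Exclude the given networks' parameters from gradient updates."""
    for net in nets:
        for p in net.parameters():
            p.requires_grad_(False)

def make_recon_only_optimizer(aux_extractor, aux_ref_net,
                              defect_ref_net, recon_net, lr: float = 1e-4):
    """Freeze 7041, 8031, and 8032; optimize 8033 alone."""
    freeze(aux_extractor, aux_ref_net, defect_ref_net)
    return torch.optim.Adam(recon_net.parameters(), lr=lr)
```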
 The whole-network training process simultaneously trains the defect image reference network 8032, the auxiliary information extraction network 7041, the auxiliary information reference network 8031, and the reconstruction network 8033, connected as in FIG. 11 by the reconstruction network training process (step S504).
 Specifically, the whole-network training process inputs the original image, the defect image of the original image, and the defect region information into the connected network, applies the mean squared error and then the weighted error so that the output image approaches the original image, and updates the parameters of all networks by error backpropagation.
 Alternatively, the parameters of the auxiliary information extraction network alone may be kept fixed during this training.
 Note that the above order of applying the error functions is only an example: training may be performed without the generative adversarial framework including the discriminator network 9000, and the discriminator network error, the mean squared error, or the weighted error may be switched in at any time, for example according to the number of training iterations.
 When training within the generative adversarial framework, the discriminator network 9000 may be trained independently of the training processes of FIG. 12, according to the iteration count or the accuracy rate of the discriminator network 9000.
 In training the discriminator network 9000, for example, the output images of the networks used in the training processes of FIG. 12 and the original images are fed into the discriminator network 9000 in alternation; the network outputs the probability that its input is an original image, the error between this output and the correct value of 0 or 1 is evaluated with an error function such as mutual information, and the parameters are updated by error backpropagation.
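 A sketch of one such discriminator update follows; a binary cross-entropy criterion is used here as a common stand-in for the error function (the text names mutual information as one example), and all names are illustrative.

```python
# One discriminator update: score an original image as 1 and a generated
# image as 0, then backpropagate into the discriminator only.
import torch
import torch.nn as nn

def train_discriminator_step(discriminator, opt_d, original, generated):
    bce = nn.BCELoss()
    p_real = discriminator(original)               # should approach 1
    p_fake = discriminator(generated.detach())     # generator not updated
    loss = (bce(p_real, torch.ones_like(p_real)) +
            bce(p_fake, torch.zeros_like(p_fake)))
    opt_d.zero_grad()
    loss.backward()
    opt_d.step()
    return loss.item()
```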
 The end of each training process may be determined by the iteration count or by thresholding the decrease in the error. The unit of processing may be the whole picture, or blocks into which the picture is divided using a structure such as the HEVC CTU.
 As described above, the encoding and decoding methods of the third embodiment differ from the conventional approach, in which an interpolation network is applied on the decoding side and the output image is obtained by image generation alone, in that they generate the image using image interpolation auxiliary information. The encoding and decoding methods of the third embodiment can therefore improve prediction accuracy over methods based on the conventional technique, and can realize generation that exploits the features of the original picture.
 Furthermore, since the image interpolation auxiliary information to be transmitted can be determined by training, the encoding and decoding methods of the third embodiment can extract auxiliary information that yields more accurate reconstruction results than auxiliary information whose extraction is designed by manual trial and error, as in conventional HEVC. In addition, by controlling the training order of the networks and the error functions applied, each network in the complex configuration to be trained can be made to acquire its intended behavior.
 Prior art 1 described above proposes a method of acquiring an image interpolation network by training. However, when that interpolation network is applied to the decoding side in an image coding framework, the generation accuracy drops, particularly when a wide area is interpolated or when the region to be interpolated is too complex to be inferred from its surroundings.
 The encoding and decoding methods of the third embodiment solve this by providing the auxiliary information extraction unit 704 on the encoding side and feeding image interpolation auxiliary information to the interpolation network. Moreover, because the auxiliary information extraction network 7041 that defines the image interpolation auxiliary information is itself acquired by training, the third embodiment can extract auxiliary information that yields higher image generation accuracy than manually designed auxiliary information of the kind used in image coding schemes such as HEVC.
 Since the configuration of the third embodiment acquires the network parameters by training, including those of the auxiliary information extraction unit 704 that generates the image interpolation auxiliary information, it is difficult to make each network learn its intended behavior when the auxiliary information extraction unit 704 and the image interpolation unit 803 are trained simultaneously. This tendency is especially pronounced in the generative adversarial framework, where training is hard to tune.
 However, in the encoding and decoding methods of the third embodiment, the auxiliary information extraction unit 704 and the image interpolation unit 803 are divided into networks by role, and the networks to be trained and the error functions to be applied are controlled according to the number of training iterations, so each network can be made to acquire its intended behavior.
<Fourth Embodiment>
 Hereinafter, a fourth embodiment will be described with reference to the drawings.
 The fourth embodiment differs from the third embodiment in the network configuration of the auxiliary information extraction unit and the image interpolation unit: the image interpolation auxiliary information is generated from the difference between the output of the defect image reference network and the input image.
 FIG. 15 shows the network configuration of the fourth embodiment. As illustrated, the auxiliary information extraction unit 704 consists of the auxiliary information extraction network 7041 and a defect image reference network 8032 whose network parameters are shared with the image interpolation unit 803.
 The auxiliary information extraction network 7041 receives the difference between the input image and the intermediate image by defect image reference, together with the defect region information, and outputs the image interpolation auxiliary information. For example, its input is two images, the difference image between the input image and the intermediate image by defect image reference, and the defect region mask image; its output is an arbitrary number of units; and its intermediate layers are composed of convolutional layers, fully connected layers, and the like.
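 A sketch of this residual-based extraction, assuming the module sketches from the third embodiment; detaching the shared 8032 output matches the training configuration of FIG. 16, in which only 7041 and 8031 are updated, but is otherwise an assumption.

```python
# Fourth embodiment: extract auxiliary information from the difference
# between the input image and the defect-image-referenced intermediate image.
import torch

def extract_aux_info_residual(original, defect, mask,
                              defect_ref_net, aux_extractor):
    with torch.no_grad():                          # shared 8032 parameters
        img_mid = defect_ref_net(defect, mask)
    residual = original - img_mid                  # difference image
    return aux_extractor(residual, mask)           # aux info units
```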
 As illustrated in FIG. 15, the image interpolation unit 803 comprises the auxiliary information reference network 8031, the defect image reference network 8032, and the reconstruction network 8033.
 The inputs and outputs of these networks are the same as in the third embodiment, except for the defect image reference network 8032.
 The auxiliary information reference network 8031 receives the image interpolation auxiliary information and outputs an intermediate image by auxiliary information reference.
 The defect image reference network 8032 receives the defect image of the input image and the defect region mask image and outputs an intermediate image by defect image reference.
 The intermediate image by defect image reference is input into the reconstruction network 8033 as a component of the image interpolation unit 803; in addition, the difference between this intermediate image and the input image is input into the auxiliary information extraction network 7041 as a component of the auxiliary information extraction unit 704.
 The reconstruction network 8033 receives the intermediate image by auxiliary information reference and the intermediate image by defect image reference, and outputs the final output image in which the defect region has been interpolated.
 With the above configuration, the auxiliary information extraction unit 704 and the image interpolation unit 803 are trained.
 The training process is the same as in the third embodiment, except that the network configuration in the auxiliary information extraction/reference network training process becomes that of FIG. 16. In that process, only the auxiliary information extraction network 7041 and the auxiliary information reference network 8031 are trained, in the configuration of FIG. 16.
 As described above, the auxiliary information extraction unit 704 of the fourth embodiment could also take the original image directly as input, as in the third embodiment. However, on the premise described above that the prediction result from the surrounding blocks (the intermediate image by defect image reference) is shared between the encoding side and the decoding side, it can instead take as input the difference image between the original image and the prediction from the surrounding blocks. This explicitly introduces a constraint that keeps the output image of the image interpolation unit 803 from deviating too far from the original image, which improves the subjective quality of the interpolation result.
 Part or all of the encoding device and the decoding device in the above-described embodiments may be realized by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into a computer system and executed. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into the computer system. Furthermore, the "computer-readable recording medium" may include media that hold the program dynamically for a short time, such as the communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and media that hold the program for a fixed time, such as the volatile memory inside the computer system serving as the server or client in that case. The program may realize only part of the functions described above, may realize the functions described above in combination with a program already recorded in the computer system, or may be realized using a programmable logic device such as an FPGA (Field Programmable Gate Array).
 Although embodiments of the present invention have been described in detail with reference to the drawings, the specific configuration is not limited to these embodiments and includes designs and the like within a scope not departing from the gist of the present invention.
DESCRIPTION OF SYMBOLS: 10, 30: encoding device; 101, 301: block division unit; 102, 302: coding scheme determination unit; 103, 303: auxiliary information extraction unit; 104, 304: auxiliary information entropy encoding unit; 105, 305: image conversion unit; 306: converted image memory; 107, 307: intra prediction unit; 108, 308: transform/quantization unit; 109, 309: entropy encoding unit; 110, 310: inverse quantization/inverse transform unit; 111, 311: prediction memory; 20: decoding device; 201, 401: entropy decoding unit; 202, 402: inverse transform/inverse quantization unit; 203, 403: intra prediction unit; 204, 404: prediction memory; 205, 405: reconstruction unit; 206, 406: coding scheme determination unit; 207, 407: auxiliary information entropy decoding unit; 408: output picture memory; 50, 70: encoding device; 501, 701: image defect processing unit; 502, 702: image encoding unit; 503, 703: defect region information encoding unit; 704: auxiliary information extraction unit; 7041: auxiliary information extraction network; 705: auxiliary information encoding unit; 60, 80: decoding device; 601, 801: image decoding unit; 602, 802: defect region information decoding unit; 603, 803: image interpolation unit; 8031: auxiliary information reference network; 8032: defect image reference network; 8033: reconstruction network; 604: image interpolation network; 805: auxiliary information decoding unit; 9000: discriminator network

Claims (8)

  1.  An encoding device for encoding an image, comprising:
     a determination unit that determines whether an input image is to be a reconstruction target;
     an auxiliary information extraction unit that extracts, from the image determined to be the reconstruction target, auxiliary information to be used for reconstruction;
     a conversion unit that converts the image determined to be the reconstruction target to obtain a converted image; and
     an encoding unit that encodes the converted image to obtain encoded data,
     wherein the conversion unit performs the conversion such that, when the encoding unit performs the encoding, the amount of code is smaller than when the input image is encoded.
  2.  The encoding device according to claim 1, wherein the determination unit determines whether the input image is to be the reconstruction target by acquiring an estimated generated code amount and an estimated distortion amount and performing rate-distortion optimization.
  3.  The encoding device according to claim 1 or claim 2, wherein the auxiliary information is information for inversely converting the converted image into an image that preserves the features of the image determined to be the reconstruction target while requiring a smaller amount of code than that image.
  4.  A decoding device for decoding encoded data in which an image has been encoded, comprising:
     a decoding unit that decodes input encoded data to obtain a decoded image;
     a determination unit that determines whether the decoded image is an image to be reconstructed; and
     a reconstruction unit that acquires auxiliary information to be used for reconstruction and reconstructs, using the auxiliary information, the decoded image determined to be the image to be reconstructed.
  5.  An encoding method performed by an encoding device for encoding an image, comprising:
     a determination step of determining whether an input image is to be a reconstruction target;
     an auxiliary information extraction step of extracting, from the image determined to be the reconstruction target, auxiliary information to be used for reconstruction;
     a conversion step of converting the image determined to be the reconstruction target into a converted image requiring a smaller amount of code than when the input image is encoded; and
     an encoding step of encoding the converted image to obtain encoded data.
  6.  A decoding method performed by a decoding device for decoding encoded data in which an image has been encoded, comprising:
     a decoding step of decoding input encoded data to obtain a decoded image;
     a determination step of determining whether the decoded image is an image to be reconstructed; and
     a reconstruction step of acquiring auxiliary information to be used for reconstruction and reconstructing, using the auxiliary information, the decoded image determined to be the image to be reconstructed.
  7.  An encoding program for causing a computer to function as the encoding device according to any one of claims 1 to 3.
  8.  A decoding program for causing a computer to function as the decoding device according to claim 4.