WO2012060179A1

WO2012060179A1 - Encoder apparatus, decoder apparatus, encoding method, decoding method, program, recording medium, and data structure of encoded data

Info

Publication number: WO2012060179A1
Application number: PCT/JP2011/073134
Authority: WO
Inventors: 純生佐藤
Original assignee: シャープ株式会社
Priority date: 2010-11-04
Filing date: 2011-10-06
Publication date: 2012-05-10

Abstract

A moving image encoder apparatus (1) comprises: a range image dividing unit (22) that divides a range image into segments; a range value correcting unit (23) that determines a representative value of each of the segments; a number adding unit (24) that adds numbers to the respective segments in raster scan order; and a predicting/encoding unit (25) that calculates predicted values of the respective segments in numerical order, calculates differential values by subtracting the respective predicted values from the respective representative values of the segments, and that arranges, in numerical order, and encodes the calculated differential values to generate encoded data of the range image.

Description

Encoding device, decoding device, encoding method, decoding method, program, recording medium, and data structure of encoded data

The present invention mainly relates to an encoding device that encodes a distance image (Depth Image) and a decoding device that decodes the distance image encoded by such an encoding device.

It is an important theme to record the three-dimensional shape of a subject accurately and efficiently as data, and various methods have been proposed.

As one of those methods, a texture image, which is a general two-dimensional image that represents the subject space with the color of each subject and background, and the subject space is represented by the distance from the viewpoint to each subject and background. There is a method of recording two types of image data associated with an image (hereinafter referred to as “distance image”) in association with each other. More specifically, the distance image is an image expressing a distance value (depth value) from the viewpoint to a corresponding point in the object space for each pixel.

This distance image can be acquired by a distance measuring device such as a depth camera installed in the vicinity of the camera that records the texture image. Alternatively, a distance image can be acquired by analyzing a plurality of texture images obtained by photographing with a multi-viewpoint camera, and many analysis methods have been proposed.

In addition, distance values are expressed in 256 levels (ie, 8-bit luminance values) in the Moving Picture Experts Group (MPEG), which is a working group of the International Organization for Standardization / ISO / IEC, as a standard for distance images. MPEG-C part3, which is a standard to be established. That is, the standard distance image is an 8-bit grayscale image. In addition, since it is defined that a higher luminance value is assigned as the distance from the viewpoint is shorter, in a standard distance image, a subject located in front is expressed as white and a subject located in the back is expressed in black.

If a texture image and a distance image representing the same subject space are obtained, the distance from the viewpoint of each pixel constituting the subject image drawn in the texture image is known from the distance image, so that the subject has the maximum depth. It can be restored as a three-dimensional shape expressed in 256 stages. Furthermore, by projecting the 3D shape onto the 2D plane geometrically, the original texture image is converted into a texture image in the subject space when the subject is photographed from another angle within a certain range from the original angle. It is possible to convert. In other words, since a three-dimensional shape can be restored when viewed from an arbitrary angle within a certain range by a set of texture images and distance images, a free viewpoint of a three-dimensional shape can be obtained by using multiple sets of texture images and distance images. It is possible to represent an image with a small amount of data.

By the way, Non-Patent Document 1 discloses a technique capable of compressing and encoding video by efficiently eliminating temporal or spatial redundancy in the video. When each video of a texture video (video having a texture image as each frame) and a distance video (video having a distance image as each frame) is encoded by an encoding device using this technology, the redundancy that each video has Can be eliminated, and the data amount of each video transmitted to the decoding device can be further reduced.

Here, the present inventor has found that there are the following two characteristics between the texture image and the distance image. (1) The subject and background edge portions in the distance image and the subject and background edge portions in the texture image are common. (2) In the distance image, the distance depth value is relatively flat inside (in the region surrounded by the edge) from the edge of the subject and the background.

The characteristic (1) will be described. As long as the texture image includes information that allows the subject to be distinguished from the background as an image, the boundary (edge) between the subject and the background is common to the texture video and the distance video. . That is, the edge information indicating the edge portion of the subject is one large element indicating the correlation between the texture image and the distance image. Further, the characteristic (2) will be described. Generally, the distance image tends to be an image having a lower spatial frequency component than the texture image. For example, even if a person wearing a fancy pattern of clothes is drawn on the texture image, the distance depth value of the clothes portion tends to be constant in the distance image. In other words, in the distance image, it can be said that there is a strong tendency for a single distance depth value to appear in a wider area than in the texture image.

That is, in the texture image and the distance image, when a certain area in the texture image is composed of pixel groups composed of pixels of similar colors, all or almost all the pixel groups included in the corresponding area in the distance image There is a correlation that these pixels tend to take the same distance depth value.

Considering the correlation between the two images, if the distance image can be segmented for each range where the distance depth value is constant in the distance image, the distance depth value is substantially constant within that range (within the segmented pixel group). . For example, the entire region of the texture image is divided into a plurality of regions so that the difference between the maximum pixel value and the minimum pixel value of the pixel group included in the region is equal to or less than a predetermined threshold, and the same pattern as the texture image division pattern By dividing the distance image into a plurality of regions by the division pattern, the distance depth value becomes substantially constant in each region in the distance image. A pixel group (each region formed by dividing the entire region of the texture image and the distance image) divided so that the distance depth value becomes substantially constant is referred to as a segment.

By dividing the distance image into segment units in this way, when encoding the distance image, there is no need to perform orthogonal transformation on the pixels included in the segment, and very efficient encoding is performed. Can do. Furthermore, by using a predetermined image segmentation algorithm that uniquely identifies a segment of a texture image when segmenting a distance image into segments, when transmitting an image, there is no need to transmit information about the arrangement and shape of the segment, Furthermore, the encoding efficiency can be improved.

Thus, by dividing the distance image into each segment so that the distance depth value becomes substantially constant, the amount of information in the segment can be compressed. Therefore, the distance image can be handled in units of segments, not in units of pixels. Further, since the distance image is divided based on the corresponding texture image (texture image at the same time as the distance image), the distance image values in the adjacent segments in the distance image are the same or close values. It becomes more and more. Therefore, further information compression is possible by using the characteristic and eliminating the spatial redundancy between segments in the distance image.

Here, in Non-Patent Document 1, a texture image is made into a block, and spatial redundancy between blocks is eliminated by intra prediction encoding or intra prediction encoding. Specifically, first, pixels included in the texture image are blocked in units of 4 × 4 pixels, 8 × 8 pixels, 16 × 16 pixels, or the like. Next, the blocks are encoded in the order from the upper left block to the lower right block of the image. In the encoding of each block, the pixel or pixel adjacent to the encoding target block that is encoded prior to the encoding target block and is inside the block adjacent to the left, top, and upper right of the encoding target block. With reference to the column, the value of the pixel included in the encoding target block is predicted. Then, the difference obtained by subtracting the predicted value from the actual value of each pixel included in the encoding target block is orthogonally transformed and encoded. If the prediction accuracy is good, it can be expected that the value will be smaller than when the actual value itself is encoded, and as a result, the number of bits required for encoding can be reduced.

However, the technique described in Non-Patent Document 1 is a technique optimized for a texture image, and there is a problem that it cannot be applied as it is to a distance image divided into the above-described segment units. .

Specifically, in Non-Patent Document 1, a texture image is divided into blocks each having a square shape. On the other hand, in the range image dividing method proposed by the present inventors, the range image is divided into segments of arbitrary shapes. This is because with this division method, the smaller the number of segments to be divided, the better the coding efficiency. Therefore, each segment can have a flexible shape without any restriction on the shape of each segment. desirable.

If the unit for dividing the image is a square, the blocks adjacent to the left, top, and top right of the encoding target block can be uniquely determined. Furthermore, since it is guaranteed that a block including pixels that the encoding target block refers to for prediction is encoded prior to the encoding target block, the decoding side may reproduce the predicted value. it can. On the other hand, when an image is divided into arbitrary shapes, the adjacent segment of the encoding target segment cannot be uniquely determined. Further, it cannot be determined which segment adjacent to the encoding target segment is encoded in advance. Therefore, even if the technique described in Non-Patent Document 1 is applied as it is, the spatial redundancy of the distance image divided into segment units cannot be removed.

In addition, since the method of dividing the above-described distance image into segment units has been newly invented by the present inventor, a method for eliminating spatial redundancy between segments has not been considered.

The present invention has been made in view of the above problems, and a main object thereof is an encoding apparatus that performs encoding by eliminating spatial redundancy between segments of an image divided in segment units of an arbitrary shape. Another object of the present invention is to realize a decoding device that decodes a distance image supplied from such an encoding device.

In order to solve the above-described problem, an encoding apparatus according to the present invention is an encoding apparatus that encodes an image, and is divided by a dividing unit that divides the entire area of the image into a plurality of regions, and the dividing unit. For each of the plurality of regions, representative value determining means for determining a representative value from the pixel value of each pixel included in the region, number giving means for assigning a number to the plurality of regions in raster scan order, The above-mentioned area is set as the encoding target area in the order of the numbers given by the above-mentioned number assigning means, and among the pixels included in the encoding target area, the first pixel in the raster scan order is set as the representative pixel, and is included in the encoding target area. A pixel that is adjacent to a pixel on the same scan line as the representative pixel and that has a raster scan order before the representative pixel is a predicted reference pixel, and a representative of the region having the predicted reference pixel Based on at least one of the above, the predicted value calculating means for calculating the predicted value of the encoding target area, and the predicted value calculating means for each encoding target area from the representative value determined by the representative value determining means. The difference value calculation means for subtracting the calculated predicted value to calculate the difference value, and the difference values calculated by the difference value calculation means are arranged and encoded in the order given by the number assignment means, and the encoded data of the image And encoding means for generating.

According to the above configuration, the number assigning unit assigns numbers in the raster scan order to the plurality of regions into which the dividing unit has divided the image. Next, the prediction value calculation means sets the area as the encoding target area in the order of the numbers given by the number assignment means, and sets the first pixel in the raster scan order as the representative pixel among the pixels included in the encoding target area. A pixel that is included in the encoding target region and is close to a pixel on the same scan line as the representative pixel and whose raster scan order is earlier than the representative pixel is a predicted reference pixel. Then, the predicted value calculation means calculates the predicted value of the encoding target region based on at least one of the representative values of the region having the predicted reference pixel. Next, the difference value calculation means subtracts the prediction value calculated by the prediction value calculation means from the representative value determined by the representative value determination means for each encoding target region to calculate a difference value. Then, the encoding unit arranges and encodes the difference values calculated by the difference value calculation unit in the order given by the number assigning unit, and generates encoded data of the image.

Therefore, even if the plurality of areas divided by the dividing means have an arbitrary shape, the order of the areas can be uniquely specified. Moreover, the representative pixel used when calculating the predicted value of the representative value of each region and the prediction target pixel based on the representative pixel can be uniquely specified. Therefore, the predicted value of the encoding target area determined from the representative value of the area adjacent to the encoding target area can be uniquely calculated.

Therefore, even if the plurality of regions divided by the dividing unit have an arbitrary shape, it is possible to eliminate spatial redundancy between the regions and generate encoded data that can be uniquely decoded. There is an effect.

In addition, when decoding the encoded data encoded using the calculated predicted value, it is necessary to calculate the predicted value in the same manner as when encoding. That is, the prediction reference pixel for a certain area needs to be the same at the time of encoding and at the time of decoding. Therefore, a prediction reference pixel for a certain area needs to be decoded before the certain area, that is, needs to be encoded first.

Therefore, as described above, by setting the pixel whose raster scan order is earlier than the pixel included in the encoding target area as the prediction reference pixel, it is possible to normally decode the encoded data in order from the beginning without omission at the time of decoding. Is guaranteed. Therefore, there is an additional effect that efficient encoding processing can be performed and the amount of memory used during the encoding processing can be reduced.

In order to solve the above-described problem, an encoding method according to the present invention is an encoding method of an encoding device that encodes an image, and the encoding device divides the entire region of the image into a plurality of regions. A division step; a representative value determination step for determining a representative value from a pixel value of each pixel included in the plurality of regions divided in the division step; and a raster scan for the plurality of regions. A numbering step for assigning numbers in order, and the region as an encoding target region in the order of the numbers given in the numbering step, and among the pixels included in the encoding target region, the first pixel in the raster scan order A pixel that is included in the encoding target area and is close to a pixel on the same scan line as the representative pixel, and the raster scan order is higher than that of the representative pixel. For each of the encoding target areas, a prediction value calculating step for calculating a prediction value of the encoding target area based on at least one representative value of the area having the prediction reference pixel, A difference value calculation step for calculating a difference value by subtracting the prediction value calculated in the prediction value calculation step from the representative value determined in the representative value determination step; and a difference value calculated in the difference value calculation step Are encoded in the order given in the number assigning step, and an encoded step of generating encoded data of the image is included.

According to the above configuration, the encoding method according to the present invention has the same effects as the encoding apparatus according to the present invention.

In order to solve the above problem, the decoding device according to the present invention, for each of a plurality of areas obtained by dividing the entire area of the image with a predetermined division pattern, a representative value of the pixel value of each pixel included in the area, The encoded data of the image including a difference value that is a difference from the predicted value of the representative value of the region, and the difference value is arranged in the order of numbers assigned to the plurality of regions in the raster scan order. The predicted values are included in the encoding target area, with the area as the encoding target area in the order of the numbers, with the first pixel in the raster scan order as the representative pixel among the pixels included in the encoding target area. A pixel that is adjacent to a pixel on the same scan line as the representative pixel and has a raster scan order before the representative pixel is a predicted reference pixel, and is at least one representative value of a region having the predicted reference pixel A decoding device that decodes the encoded data calculated based on: a dividing unit that divides the entire area of the image into a plurality of areas based on area information defining the plurality of areas; and the encoding Decoding means for decoding data and generating differential values arranged in order; Number assigning means for assigning numbers to the plurality of regions divided by the dividing means in raster scan order; and the number assigning means Are assigned to the plurality of areas in order of numbers assigned by each of the plurality of areas, and the areas are set as decoding target areas in order of the numbers assigned by the number assigning means, and are included in the decoding target areas. Among the pixels, the first pixel in the raster scan order is a representative pixel, and is included in the decoding target area and is adjacent to a pixel on the same scan line as the representative pixel. A prediction value calculation unit that calculates a prediction value of a decoding target region based on a pixel value of at least one pixel of the prediction reference pixels, the pixel having a raster scan order before the representative pixel as a prediction reference pixel; For each decoding target area, the pixel value of the decoding target area is calculated by adding the difference value assigned by the allocating means to the prediction value calculated by the prediction value calculating means, and all the pixels included in the decoding target area are calculated. A pixel value setting unit that sets the calculated pixel value to the calculated pixel value, and the predicted value calculation unit and the pixel value setting unit repeatedly execute the process for each decoding target region in the order of the numbers. It is characterized by restoring pixel values of an image.

According to the above configuration, the decoding unit decodes the encoded data and generates difference values arranged in order. The allocating unit calculates a difference for each of the plurality of regions obtained by dividing the image by the dividing unit based on region information defining the plurality of regions in the order of numbers assigned by the number assigning unit in raster scan order. Assign values in order from the beginning. Next, the prediction value calculation means decodes the area as the decoding target area in the order of the numbers given by the number assigning means, and uses the first pixel in the raster scan order as the representative pixel among the pixels included in the decoding target area. A pixel that is included in the target region and is adjacent to a pixel on the same scan line as the representative pixel and whose raster scan order is earlier than the representative pixel is set as a predicted reference pixel. Then, the predicted value calculation means calculates a predicted value of the decoding target region based on the pixel value of at least one pixel among the predicted reference pixels. The pixel value setting means calculates the pixel value of the decoding target area by adding the difference value assigned by the assigning means to the prediction value calculated by the prediction value calculating means for each decoding target area, The pixel values of all the pixels included in the decoding target area are set to the calculated pixel values. Then, the prediction value calculation means and the pixel value setting means repeatedly execute the above processing for each decoding target area in the order of the numbers given by the number assignment means, and restore the pixel values of the image.

Therefore, the decoding target area is the same as the plurality of areas into which the image indicated by the encoded data is divided. In addition, the representative pixel used when calculating the predicted value of the representative value of each decoding target area and the prediction target pixel based thereon can be uniquely specified, and the representative pixel of the decoding target area and the prediction target based thereon The pixel, the representative pixel of the encoding target region corresponding to the decoding target region, and the prediction target pixel based thereon can be the same pixel. Therefore, there is an effect that the image indicated by the encoded data can be accurately restored.

In order to solve the above problem, the decoding method according to the present invention provides, for each of a plurality of areas obtained by dividing the entire area of an image with a predetermined division pattern, a representative value of pixel values of each pixel included in the area, The encoded data of the image including a difference value that is a difference from the predicted value of the representative value of the region, and the difference value is arranged in the order of numbers assigned to the plurality of regions in the raster scan order. The predicted values are included in the encoding target area, with the area as the encoding target area in the order of the numbers, with the first pixel in the raster scan order as the representative pixel among the pixels included in the encoding target area. A pixel that is adjacent to a pixel on the same scan line as the representative pixel and has a raster scan order before the representative pixel is a predicted reference pixel, and is at least one representative value of a region having the predicted reference pixel A decoding method of a decoding device that decodes encoded data calculated based on the method, wherein the decoding device converts all regions of the image into a plurality of regions based on region information that defines the plurality of regions. A division step for dividing, a decoding step for decoding the encoded data and generating difference values arranged in order, and assigning numbers to the plurality of regions divided in the division step in a raster scan order A number assigning step, an assigning step in which the difference values are assigned in order from the top to the plurality of regions in the order of numbers assigned in the number assigning step, and the regions in the order of the numbers given in the number assigning step. Among the pixels included in the decoding target area, the first pixel in the raster scan order is set as the representative pixel and included in the decoding target area. A pixel that is adjacent to a pixel on the same scan line as the representative pixel and has a raster scan order before the representative pixel is a predicted reference pixel, and a pixel of at least one of the predicted reference pixels A prediction value calculation step of calculating a prediction value of the decoding target region based on the value, and for each decoding target region, the difference value assigned in the assignment step is added to the prediction value calculated in the prediction value calculation step. A pixel value setting step of adding and calculating the pixel value of the decoding target area and setting the pixel values of all the pixels included in the decoding target area to the calculated pixel value, and in order of the numbers The prediction value calculation step and the pixel value setting step are repeatedly executed for each target region to restore the pixel value of the image.

According to the above configuration, the decoding method according to the present invention has the same operational effects as the decoding device according to the present invention.

As described above, the encoding apparatus according to the present invention can uniquely decode even if a plurality of regions into which an image is divided has an arbitrary shape, eliminating spatial redundancy between the regions. There is an effect that encoded data can be generated.

In addition, the decoding device according to the present invention has an effect that the image indicated by the encoded data can be accurately restored.

Further objects, features, and superior points of the present invention will be fully understood from the following description. The benefits of the present invention will become apparent from the following description with reference to the accompanying drawings.

It is a block diagram which shows the structure of the moving image encoder which concerns on one Embodiment of this invention. It is a flowchart figure which shows operation | movement of the moving image encoder of FIG. It is the figure which showed one specific example of the color texture image input into the moving image encoder of FIG. It is a figure which shows a specific example of the distance image input into the moving image encoder of FIG. 1, and has shown the distance image input by a pair with the texture image of FIG. FIG. 4 is a diagram showing the distribution of each segment defined by the moving image encoding apparatus of FIG. 1 from the texture image of FIG. 3. FIG. 6 is a diagram illustrating a segment boundary portion in which an image division processing unit of the moving image encoding device in FIG. 1 outputs a coordinate value as position information to the subsequent stage for each segment in FIG. 5. FIG. 7 shows 12 pixels of 3 × 4 in the vertical direction that constitute a partial area of the texture image. FIGS. 7A and 7B show a case where two pixels are adjacent vertically and horizontally. FIG. 7C shows a case where two pixels are in contact at only one point. It is a figure which shows the order which scans a texture image in order to determine the value of the segment number which the moving image encoder of FIG. 1 assign | provides to each segment. It is a figure which shows typically the segment number provided to each segment prescribed | regulated from the texture image of FIG. For each segment (region) defined by dividing the entire region of the texture image, the moving image encoding device of FIG. 1 is given the representative value of the distance value in the corresponding segment in the distance image and the raster scan order. It is a figure which shows typically the data with which the segment number was matched. It is a flowchart which shows an example of the prediction encoding process which the prediction encoding part of the moving image encoder of FIG. 1 performs. (A) to (e) of FIG. 12 show 12 pixels of length 3 and width 4 constituting a partial region of the distance image, and the prediction encoding unit predicts the representative value of the segment. It is a figure which shows the specific example of the representative pixel to be used, and the prediction reference pixel based on this representative pixel. It is a figure which shows the correspondence of the value and codeword which are the object of an encoding in the exponent Golomb encoding method which a prediction encoding part uses. It is an example of the encoded data which a prediction encoding part produces | generates. It is the figure which showed the data structure of the NAL unit typically. It is a block diagram which shows the structure of the moving image decoding apparatus which concerns on one Embodiment of this invention. It is a flowchart figure which shows operation | movement of the moving image decoding apparatus of FIG. It is a flowchart which shows an example of the prediction decoding process which the prediction decoding part of the moving image decoding apparatus of FIG. 16 performs. It is a block diagram which shows the structure of the moving image encoder which concerns on another embodiment of this invention. It is a block diagram which shows the structure of the moving image decoding apparatus which concerns on another embodiment of this invention. It is a flowchart figure which shows an example of the operation | movement which prescribes | regulates several segments with the moving image encoder of FIG. It is a flowchart figure which shows the subroutine of the segment coupling | bonding process in the flowchart of FIG.

<Embodiment 1>
A video encoding device and video decoding device according to an embodiment of the present invention will be described below with reference to FIGS.

First, the moving picture encoding apparatus according to the present embodiment will be described. The moving picture coding apparatus according to the present embodiment generally generates coded data for each frame constituting a three-dimensional moving picture by coding a texture image and a distance image constituting the frame. It is a device to do.

The moving picture encoding apparatus according to the present embodiment uses H.264 for encoding texture images. On the other hand, the encoding technique employed in the H.264 / MPEG-4 AVC standard is used, while the encoding of the distance image is a moving picture encoding apparatus using the encoding technique peculiar to the present invention.

The above encoding technique unique to the present invention is an encoding technique developed by paying attention to the fact that there is a correlation between a texture image and a distance image. In two images, when a certain area in the texture image is composed of pixel groups composed of pixels of similar colors, all or almost all of the pixels included in the corresponding area in the distance image are the same. There is a correlation that the tendency to take a distance value is strong.

In the present invention, the values of the pixels constituting the texture image and the distance image are referred to as pixel values. The pixel value in the texture image indicates information regarding the luminance and color of each pixel. In addition, the pixel value in the distance image indicates information related to the distance depth that each pixel has. When distinguishing the pixel values of the texture image and the distance image, the pixel value of the texture image is referred to as a color value, and the pixel value of the distance image is referred to as a distance value.

First, the configuration of the video encoding apparatus according to the present embodiment will be described with reference to FIG. FIG. 1 is a block diagram illustrating a configuration of a main part of a video encoding device.

(Configuration of moving picture encoding apparatus 1)
As shown in FIG. 1, the moving image encoding apparatus 1 includes an image encoding unit 11, an image decoding unit (decoding unit) 12, a distance image encoding unit 20, and a packaging unit (transmission unit) 28. Yes. The distance image encoding unit 20 includes an image division processing unit 21, a distance image division processing unit (dividing unit) 22, a distance value correcting unit (representative value determining unit) 23, a number assigning unit (number assigning unit) 24, and A prediction encoding unit (prediction value calculation means, difference value calculation means, encoding means) 25 is provided.

The image encoding unit 11 The texture image # 1 is encoded by AVC (Advanced Video Coding) coding defined in the H.264 / MPEG-4 AVC standard.

The image decoding unit 12 decodes the texture image # 1 'from the encoded data # 11 of the texture image # 1.

The image division processing unit 21 divides the entire area of the texture image # 1 into a plurality of segments (areas). Then, the image division processing unit 21 outputs segment information # 21 including position information of each segment. The segment position information is information indicating the position of the segment in the texture image # 1.

When the distance image # 2 and the segment information # 21 are input, the distance image division processing unit 22 includes each segment included in the corresponding segment (region) in the distance image # 2 for each segment in the texture image # 1 ′. A distance value set consisting of pixel distance values is extracted. Then, the distance image division processing unit 22 generates segment information # 22 in which the distance value set and the position information are associated with each segment from the segment information # 21.

The distance value correction unit 23 calculates the mode value as the representative value # 23a from the distance value set of the segment included in the segment information # 22 for each segment of the distance image # 2. That is, when the segment i in the distance image # 2 includes N pixels, the distance value correcting unit 23 calculates the mode value from the N distance values. The distance value correcting unit 23 may calculate an average of N distance values as an average value, or a median value of N distance values or the like as a representative value # 23a instead of the mode value. The distance value correcting unit 23 may further round the decimal value to an integer value by rounding down, rounding up, or rounding off when the average value, median value, or the like becomes a decimal value as a result of the calculation. .

Then, the distance value correcting unit 23 replaces the distance value set of each segment included in the segment information # 22 with the representative value # 23a of the corresponding segment, and outputs it to the number assigning unit 24 as the segment information # 23.

The number assigning unit 24 scans the pixels included in the distance image in the raster scan order, and assigns the segment number # 24 in the scanned order to each segment that is an area divided by the segment information # 23. , It is associated with each representative value # 23a included in the segment information # 23.

The predictive encoding unit 25 performs predictive encoding processing based on the input M sets of representative values # 23a and segment numbers # 24, and outputs the obtained encoded data # 25 to the packaging unit 28. Specifically, the predictive encoding unit 25 calculates the segment predicted value for each segment in the order of segment number # 24, subtracts the predicted value from the representative value, calculates the difference value, and calculates the difference value. Encode. Then, the predictive encoding unit 25 arranges the encoded difference values in the order of the segment number # 24 to obtain encoded data # 25, and outputs the encoded data # 25 to the packaging unit 28.

The packaging unit 28 associates the encoded data # 11 of the texture image # 1 and the encoded data # 25 of the distance image # 2 and outputs them as encoded data # 28 to the outside.

(Operation of the video encoding device 1)
Next, the operation of the moving picture encoding apparatus 1 will be described below with reference to FIG. FIG. 2 is a flowchart showing the operation of the moving image encoding apparatus 1. Note that the operation of the moving image encoding apparatus 1 described here is an operation of encoding a texture image and a distance image of the t frame from the head in a moving image including a large number of frames. That is, the moving image encoding apparatus 1 repeats the operation described below as many times as the number of frames of the moving image in order to encode the entire moving image. In the following description of the operation, unless otherwise specified, each data # 1 to # 28 is interpreted as data of the t-th frame.

First, the image encoding unit 11 and the distance image division processing unit 22 respectively receive the texture image # 1 and the distance image # 2 from the outside of the moving image encoding device 1 (step S1). As described above, the pair of the texture image # 1 and the distance image # 2 received from the outside is correlated with the contents of the image, as can be seen, for example, by comparing the texture image of FIG. 3 and the distance image of FIG. is there.

Next, the image encoding unit 11 The texture image # 1 is encoded by the AVC encoding method stipulated in the H.264 / MPEG-4 AVC standard, and the obtained texture image encoded data # 11 is transmitted to the packaging unit 28 and the image decoding unit 12. Output (step S2). When the texture image # 1 is a B picture or a P picture in step S2, the image encoding unit 11 encodes the prediction residual between the texture image # 1 and the predicted image, and the encoded prediction residual Is output as encoded data # 11.

Then, the image decoding unit 12 decodes the texture image # 1 'from the encoded data # 11 and outputs it to the image division processing unit 21 (step S3). Here, the texture image # 1 'to be decoded is not completely the same as the texture image # 1 encoded by the image encoding unit 11. This is because the image encoding unit 11 performs the DCT conversion process and the quantization process during the encoding process, but a quantization error occurs when the DCT coefficient obtained by the DCT conversion is quantized.

Incidentally, the timing at which the image decoding unit 12 decodes the texture image differs depending on whether or not the texture image # 1 is a B picture. This will be described in detail.

That is, when the texture image # 1 is an I picture, the image decoding unit 12 decodes the texture image # 1 ′ without performing inter prediction (inter-screen prediction).

If the texture image # 1 is a P picture, the image decoding unit 12 decodes the prediction residual from the encoded data # 11. Then, the image decoding unit 12 decodes the texture image # 1 ′ by adding a prediction residual to the predicted image generated using the encoded data # 11 of one or more frames before the t-th frame as a reference picture.

Furthermore, when the texture image # 1 is a B picture, the image decoding unit 12 decodes the prediction residual from the encoded data # 11. Then, the image decoding unit 12 generates, as reference pictures, encoded data # 11 of one or more frames before the t-th frame and encoded data # 11 of one or more frames after the t-th frame. Texture image # 1 ′ is decoded by adding the prediction residual to the prediction image.

As can be seen from the above description, when the texture image # 1 in the t frame is an I picture or a P picture, the timing at which the image decoding unit 12 decodes the texture image # 1 ′ in the t frame is the t frame. Immediately after the encoded data # 11 is generated. On the other hand, when the texture image # 1 of the t frame is a B picture, the timing at which the image decoding unit 12 decodes the texture image # 1 ′ is the T (> t) frame (the last frame in the reference picture). ) After the time when the encoding process for texture image # 1 is completed.

After the process of step S3, the image division processing unit 21 defines a plurality of segments from the input texture image # 1 '(step S4). Each segment defined by the image division processing unit 21 has a similar color pixel (that is, the difference between the maximum pixel value and the minimum pixel value (difference between the maximum color value and the minimum color value) is equal to or less than a predetermined threshold value. (Closed pixel group).

The process of step S4 will be described with a specific example. FIG. 5 is a diagram showing the distribution of each segment defined by the image division processing unit 21 from the texture image # 1 ′ of FIG. In FIG. 5, the closed region drawn by the same pattern indicates one segment.

In the texture image # 1 in FIG. 3, the left and right hairs of the girl's head division are drawn in two colors, brown and light brown. As can be seen from FIG. 5, the image division processing unit 21 defines a closed region made up of pixels of similar colors such as brown and light brown as one segment.

On the other hand, the skin portion of the girl's face is also drawn in two colors, the skin color and the pink color of the cheek portion. As can be seen from FIG. Each pink area is defined as a separate segment. This is because the skin color and the pink color are not similar (that is, the difference between the skin color value and the pink color value exceeds a predetermined threshold value).

After the process of step S4, the image division processing unit 21 generates segment information # 21 including the position information of each segment and outputs it to the distance image division processing unit 22 (step S5). As the position information of the segment, for example, the coordinate values of all the pixels included in the segment can be cited. That is, when defining each segment from the texture image # 1 ′ in FIG. 3, each closed region in FIG. 6 is defined as one segment, but the position information of the segment constitutes a closed region corresponding to the segment. Coordinate values for all pixels.

Here, supplementary matters regarding the above-mentioned segment will be described with reference to FIG. (A) to (c) of FIG. 7 show 12 pixels of 3 × 4 in the vertical direction that constitute a partial region of the texture image. In addition, in FIGS. 7A to 7C, the color of the pixel labeled “A” and the color of the pixel labeled “B” are the same or similar. In addition, the colors of the pixels in the other ten partial regions are completely different from the colors of the pixel A and the pixel B.

As described above, each segment is a closed region (a group of connected pixels) made up of pixels of similar colors. The definition of the closed region will be described with reference to FIG.

In the present invention, it is assumed that the pixel A and the pixel B are connected when the positional relationship between the two pixels is (a) and (b) in FIG. That is, it is considered that the pixel A and the pixel B are connected when they are in contact with each other in the vertical direction or the horizontal direction. In other words, when the pixel A and the pixel B are in contact with each other on one side, it is considered that they are connected. That is, in this case, the pixel A and the pixel B form the same segment.

On the other hand, when the positional relationship between the two pixels is (c) in FIG. 7, it is assumed that the pixel A and the pixel B are not connected. That is, when the pixel A and the pixel B are in contact with each other in an oblique direction, it is considered that they are not connected. In other words, when the pixel A and the pixel B are in contact with each other only at a certain point, it is considered that they are not connected. That is, in this case, the pixel A and the pixel B are the same color or similar colors, but are different segments. Needless to say, when the pixel A and the pixel B are not in contact with each other, the pixel A and the pixel B are separate segments.

In summary, “pixels are adjacent to each other” is strictly synonymous with the Manhattan distance between the coordinates of the two pixels being “1”, and two pixels are not adjacent to each other. This is synonymous with the fact that the Manhattan distance between the coordinates of two pixels is “2 or more”.

Based on the above, in the present invention, a pixel group (each region formed by dividing the entire region of the texture image and the distance image) divided so that the distance depth value is substantially constant, A pixel group having a connection relationship is referred to as a segment.

Furthermore, when the pixel A and the pixel B are in the positional relationship shown in FIGS. 7A and 7B, the pixel A is also referred to as being adjacent to the pixel B. In addition, when the pixel A and the pixel B are in any of the positional relationships shown in FIGS. 7A to 7C, the pixel A is also referred to as being close to the pixel B. Further, when a pixel constituting a segment is adjacent to a pixel constituting another segment, the segment is referred to as being adjacent to another segment. Further, when a pixel constituting a segment is close to a pixel constituting another segment, the segment is referred to as being close to another segment.

After step S5, the distance image division processing unit 22 divides the input distance image # 2 into a plurality of segments. Specifically, the distance image division processing unit 22 refers to the input segment information # 21, specifies the position of each segment in the texture image # 1 ′, and is the same as the segment division pattern in the texture image # 1 ′. In this division pattern, the distance image # 2 is divided into a plurality of segments (in the following description, it is assumed that the number of segments is M).

Then, the distance image division processing unit 22 extracts the distance value of each pixel included in the segment as a distance value set for each segment of the distance image # 2. Furthermore, the distance image division processing unit 22 associates the distance value set extracted from the corresponding segment with the position information of each segment included in the segment information # 21. And the distance image division | segmentation process part 22 outputs segment information # 22 obtained by this to the distance value correction part 23 (step S6).

The distance value correction unit 23 calculates the mode value as the representative value # 23a from the distance value set of the segment included in the segment information # 22 for each segment of the distance image # 2. Then, the distance value correcting unit 23 replaces each of the M distance value sets included in the segment information # 22 with the representative value # 23a of the corresponding segment, and outputs it as the segment information # 23 to the number assigning unit 24 ( Step S7).

The number assigning unit 24 associates the representative value # 23a with the segment number # 24 corresponding to the position information for each of the M sets of position information and representative value # 23a included in the segment information # 23, and sets M sets The representative value # 23a and the segment number # 24 are output to the predictive coding unit 25 (step S8). Specifically, the number assigning unit 24 sets the representative value # 23a of the i-th segment in the raster scan order for each segment i from 1 to M (M: the number of segments) based on the segment information # 23. The segment number “i−1” is associated. Here, the “i-th segment in the raster scan order” is a segment in which the i-th pixel is scanned when the distance image or the texture image is scanned in the raster scan order as shown in FIG.

A specific example will be described below with reference to FIG.

FIG. 9 is a diagram schematically showing the position of each segment of the distance image input to the moving image encoding device 1 together with the texture image as shown in FIG. In FIG. 9, one closed region indicates one segment.

In the distance image of FIG. 9, the segment number “0” is assigned to the segment R0 located at the head in the raster scan order. Further, the segment number “1” is assigned to the segment R1 that is positioned second in the raster scan order. Similarly, segment numbers “2” and “3” are respectively assigned to the third and fourth segments R2 and R3 in the raster scan order.

Then, the number assigning unit 24 outputs the M sets of representative values # 23a and the segment number # 24 whose specific examples are shown in FIG. 10 to the predictive encoding unit 25.

After step S8, the predictive encoding unit 25 performs predictive encoding processing based on the input M sets of representative values # 23a and segment numbers # 24, and the obtained encoded data # 25 is packaged by the packaging unit 28. (Step S9). Specifically, the predictive encoding unit 25 calculates the segment predicted value for each segment in the order of segment number # 24, subtracts the predicted value from the representative value, calculates the difference value, and calculates the difference value. Encode. Then, the predictive encoding unit 25 arranges the encoded difference values in the order of the segment number # 24 to obtain encoded data # 25.

In the present invention, it is assumed that adjacent segments have the same or close distance values. Based on this assumption, the predicted value of the representative value of the segment is predicted from the representative value of the segment adjacent to the segment. Usually, each region inside the same subject has a slightly different distance value, so this assumption is valid.

Details of the predictive encoding process executed by the predictive encoding unit 25 in step S9 will be described with reference to FIG. FIG. 11 is a flowchart illustrating an example of the prediction encoding process performed by the prediction encoding unit 25.

First, “i” that is the segment number # 24 is set to “0” (step S101). Then, the segment whose segment number # 24 is “i” is set as an encoding target segment (encoding target region) (step S102). That is, the segment with the first segment number “0” is set as the encoding target segment.

Next, the representative pixel of the encoding target segment used for calculating the predicted value is specified from the pixels included in the encoding target segment (step S103). Specifically, the pixel included in the encoding target segment and scanned first in the raster scan order in step S8 (the first pixel in the raster scan order) is set as the representative pixel. Although the shape of the segment in the present invention is various as described above, the pixel to be scanned first in the raster scan order is uniquely determined regardless of the shape of the segment.

After specifying the representative pixel, the prediction reference pixel is specified based on the representative pixel (step S104). Specifically, a pixel that is included in the encoding target segment and is close to a pixel on the same scan line as the representative pixel and whose raster scan order is earlier than the representative pixel is a predicted reference pixel. For example, the prediction reference pixel is a pixel adjacent to the pixel on the same scan line as the representative pixel that is included in the encoding target segment and the pixel immediately before the raster scan order of the representative pixel. The pixel of the previous scan line in the raster scan order of the pixel and the next pixel in the raster scan order of the last pixel included in the encoding target segment and the same scan line as the representative pixel are adjacent. The pixel group may include a pixel and a pixel on a scan line that is one previous in the raster scan order of the representative pixel. Further, for example, the prediction reference pixel is a pixel adjacent to the pixel in the encoding target segment and the pixel on the same scan line as that of the representative pixel, and the pixel immediately before the raster scan order of the representative pixel, Any one of the pixels of the previous scan line in the raster scan order of the representative pixel and the last pixel in the same scan line as the representative pixel included in the segment to be encoded Three pixels including a pixel adjacent to the next pixel and a pixel on the previous scan line in the raster scan order of the representative pixel may be used.

Here, a specific example of specifying the prediction reference pixel based on the representative pixel will be described based on FIG. (A) to (e) of FIG. 12 show 12 pixels of 3 × 4 in the vertical direction that constitute a partial region of the distance image. In addition, pixels (referred to as pixels A) denoted by “A” in FIGS. 12A to 12E are pixels that constitute the same segment. Also, the pixels labeled “B”, “C”, or “D” (referred to as pixels B, C, and D, respectively) and “blank” pixels are different from the segment RA having the pixel A. It constitutes a segment. Here, the segments to which the pixels other than the pixel A belong may all be the same segment, or may all be different segments. 12A to 12E, the representative pixels in each case and the pixels on the same scanning line as the representative pixels (scan lines: pixel rows in the present embodiment) are hatched. .

In this embodiment, as shown in FIG. 8, the pixels are scanned in the horizontal direction from the upper left to the lower right of the image. Therefore, here, a pixel on the same scanning line as a certain pixel means a pixel in the same row as the certain pixel. Further, a pixel immediately before a certain pixel in the raster scan order means a pixel one pixel to the left of the certain pixel. Further, a pixel whose raster scan order is one after a certain pixel means a pixel one right of a certain pixel. A pixel that is adjacent to a certain pixel and is one pixel before the scanning line in the raster scan order of the certain pixel means a pixel that is one pixel above the certain pixel.

Hereinafter, the encoding target segment is assumed to be the segment RA having the pixel A, and the identification of the representative pixel and the predicted reference pixel of the segment RA will be described in the example shown in FIGS.

First, in the case of FIG. 12A, the representative pixel is the pixel A in the shaded line located at the top of the pixel A. In this case, the pixel B that is one pixel to the left of the representative pixel, the pixel C that is one pixel above the representative pixel, and the pixel D that is diagonally to the right of the representative pixel are used as predicted reference pixels.

In the case of FIG. 12B, the representative pixel is the pixel A on the left side of the pixel A among the pixels A that is shaded. Here, a pixel B that is one pixel to the left of the representative pixel, a pixel C that is one pixel above the representative pixel, and a pixel D that is diagonally right above the pixel A that is one pixel to the right of the representative pixel are used as predicted reference pixels.

In the case of FIG. 12C, the representative pixel is the leftmost pixel A of the pixel A among the pixels A that is shaded. And here, prediction reference is made to a pixel B one pixel to the left of the representative pixel, a pixel C one pixel above the representative pixel, and a pixel D diagonally right above the pixel A located at the rightmost position in the same row as the representative pixel. Let it be a pixel.

In the case of FIG. 12D, the representative pixel is the pixel A on the left side of the pixel A among the pixels A that is shaded. And here, pixel B one pixel to the left of the representative pixel, pixel C one pixel above the representative pixel, pixel C one pixel right above the representative pixel, and one pixel right above the representative pixel A pixel D on the upper right side of the pixel is set as a predicted reference pixel.

In the case of (e) in FIG. 12, the representative pixel is the leftmost pixel A of the pixel A among the pixels A that is shaded. In this example, the pixel B one pixel to the left of the representative pixel, the three pixels C respectively positioned on one pixel A on the same row as the representative pixel, and the rightmost pixel on the same row as the representative pixel. A pixel D on the upper right side of the pixel A is set as a predicted reference pixel.

In this embodiment, the pixel is scanned in the order shown in FIG. 8, and segment number # 24 is assigned. Therefore, in order to encode each segment in the order indicated by the segment number # 24, the pixel B, the pixel C, and the pixel D illustrated in FIG. 12 are encoded in advance of the encoding target segment (the pixel A included in the encoding target segment). It is guaranteed that

After specifying the predicted reference pixel in step S104, the predicted value of the representative value of the encoding target segment is calculated based on the representative value of the segment having the predicted reference pixel (step S105). For example, when the prediction reference pixel is pixel B, pixel C, and pixel D as in the example illustrated in FIG. 12A, the representative value Z_B of the segment RB having the pixel B and the representative of the segment RC having the pixel C Based on the value Z_C and the representative value Z_D of the segment RD having the pixel D, the predicted value Z′_A of the representative value Z_A of the segment RA is calculated. Here, Z′_A may be a median value of Z_B, Z_C, and Z_D. Z′_A may be an average value of Z_B, Z_C, and Z_D. Z′_A may be any value of Z_B, Z_C, and Z_D.

After calculating the prediction value Z′_A of the encoding target segment, the difference value ΔZ_A is calculated by subtracting the prediction value Z′_A from the representative value Z_A of the encoding target segment (step S106). The calculated difference value ΔZ_A is a value indicating the distance value of the pixels included in the encoding target segment. As described above, the distance value has 256 steps and takes a value from 0 to 255. Therefore, ΔZ_A can take a value from −255 to +255.

Next, the calculated difference value is encoded by a variable-length encoding method in which the code word is shorter as the value is closer to 0 (step S107). In the present embodiment, it is assumed that the difference value is encoded using an exponential Golomb encoding method which is one of variable length encoding methods. FIG. 13 shows the correspondence between the difference value and the code word in the exponential Golomb encoding method. In FIG. 13, the difference value is shown in the right column, and the code word when exponent Golomb coding is performed on the difference value is shown in the left column. As shown in FIG. 13, in the exponential Golomb encoding method, the codeword to be assigned becomes shorter as the difference value is closer to 0, that is, as the predicted value is closer to the representative value approximating the actual distance value. Therefore, it is possible to transmit the distance image while reducing the amount of information to be transmitted.

After step S107, it is confirmed whether or not the segment number # 24 “i” is “M−1” (step S108). If it is not “i = M−1” (NO in step S108), “i” that is the segment number # 24 is set to “i + 1” (step S109), and the processes of steps S102 to S107 are executed. On the other hand, if “i = M−1” (YES in step S108), the process proceeds to step S110.

That is, after step S107, it is confirmed whether or not all (M) segments have been encoded. If all segments have not been encoded, the processes of steps S102 to S107 are executed in order of segment number # 24. If difference values are calculated and encoded for all segments, the process proceeds to step S110.

In step S110, the encoded difference values are arranged in the order of segment number # 24 to generate encoded data # 25. A specific example of the encoded data # 25 is shown in FIG. FIG. 14 shows an example of encoded data # 25 in which difference values “3”, “−4”, “−1”, and “0” are encoded in order. Thus, the predictive encoding unit 25 compresses the input data to generate encoded data # 25, and outputs the generated encoded data # 25 to the packaging unit 28 (step S9).

After step S9, the packaging unit 28 integrates the encoded data # 11 output from the image encoding unit 11 in step S2 and the encoded data # 25 output from the predictive encoding unit 25 in step S9. Then, the obtained encoded data # 28 is transmitted to a moving picture decoding apparatus to be described later (step S10).

Specifically, the packaging unit 28 is H.264. In accordance with the format of the NAL unit defined in the H.264 / MPEG-4 AVC standard, the texture image encoded data # 11 and the distance image encoded data # 25 are integrated. More specifically, the integration of the encoded data # 11 and the encoded data # 25 is performed as follows.

FIG. 15 is a diagram schematically showing the configuration of the NAL unit. As shown in FIG. 11, the NAL unit is composed of three parts: a NAL header part, an RBSP part, and an RBSP trailing bit part. .

The packaging unit 28 stores a specified numerical value I in the nal_unit_type (identifier indicating the type of NAL unit) field of the NAL header portion of the NAL unit corresponding to each slice (main slice) of the main picture. The prescribed numerical value I is generated in accordance with the encoding method according to the present embodiment (that is, the encoding method for encoding the distance image # 2 after calculating the difference value for each segment) according to the present embodiment. This is a value indicating encoded data. The numerical value I is, for example, H.264. Values defined as “undefined” or “for future expansion” in the H.264 / MPEG-4 AVC standard can be used.

The packaging unit 28 stores the encoded data # 11 and the encoded data # 25 in the RBSP unit of the NAL unit corresponding to the main slice. Further, the packaging unit 28 stores the RBSP trailing bit in the RBSP trailing bit unit.

The packaging unit 28 transmits the NAL unit thus obtained to the video decoding device as encoded data # 28.

(Appendix 1)
In the above-described embodiment, the image division processing unit 21 is configured from a group of pixels whose difference between the maximum pixel value and the minimum pixel value is equal to or less than a predetermined threshold value from the input texture image # 1 ′. Although a plurality of segments are defined, the method of defining the segments is not limited to this configuration. For example, for each segment, the image division processing unit 21 calculates the average value calculated from the pixel values of the pixel group included in the segment and the pixels included in the segment adjacent to the segment from the input texture image # 1 ′. A plurality of segments whose difference from the average value calculated from the pixel values of the group is equal to or greater than a predetermined threshold value may be defined.

A specific algorithm for defining a plurality of segments in which the difference between the average values is equal to or greater than a predetermined threshold will be described below with reference to FIGS.

FIG. 21 is a flowchart showing an operation in which the video encoding device 1 defines a plurality of segments based on the above algorithm. FIG. 22 is a flowchart showing a subroutine of segment combination processing in the flowchart of FIG.

The image division processing unit 21 performs, for each of all the pixels included in the texture image, in the initialization step in the figure for the texture image subjected to the smoothing process as shown in (Appendix 2). One independent segment (provisional segment) is defined, and the pixel value itself of the corresponding pixel is set as the average value (average color) of all pixel values in each provisional segment (step S41).

Next, the process proceeds to a segment combination processing step (step S42) to combine provisional segments having similar colors. This segment combining process will be described in detail below with reference to FIG. 22, but this combining process is repeated until the combination is not performed.

The image division processing unit 21 performs the following processing (steps S51 to S55) for all provisional segments.

First, the image division processing unit 21 determines whether or not the height and width of the temporary segment of interest are both equal to or less than a threshold value (step S51). If it is determined that both are equal to or less than the threshold value (YES in S51), the process proceeds to step S52. On the other hand, when it is determined that any one is larger than the threshold value (NO in S51), the process of step S51 is performed for the temporary segment to be focused next. Note that the temporary segment to be noted next may be, for example, a temporary segment positioned next to the temporary segment of interest in the raster scan order.

The image division processing unit 21 selects a temporary segment having an average color closest to the average color in the temporary segment of interest among the temporary segments adjacent to the temporary segment of interest (step S52). As an index for judging the closeness of colors, for example, the Euclidean distance between vectors when the three RGB values of pixel values are regarded as a three-dimensional vector can be used. As a pixel value of each segment, an average value of all pixel values included in each segment is used.

After the process of step S52, the image division processing unit 21 determines whether or not the proximity of the temporary segment of interest and the temporary segment that is determined to have the closest color is equal to or less than a certain threshold value. (Step S53). If it is determined that the value is larger than the threshold value (NO in step S53), the process of step S51 is performed for the temporary segment that should be noticed next. On the other hand, when it is determined that the value is equal to or less than the threshold value (NO in step S53), the process proceeds to step S54.

After the process of step S53, the image division processing unit 21 converts two provisional segments (provisional segments determined to be closest in color to the provisional segment of interest) into one provisional segment. (Step S54). The number of provisional segments is reduced by 1 by the process of step S54.

After the process of step S54, the average value of the pixel values of all the pixels included in the converted target segment is calculated (step S55). If there is a segment that has not yet been subjected to the processing of steps S51 to S55, the processing of step S51 is performed for the temporary segment to be noticed next.

After completing the processes of steps S51 to S55 for all the provisional segments, the process proceeds to the process of step S43.

The image division processing unit 21 compares the number of provisional segments before the process of step S42 and the number of provisional segments after the process of step S42 (step S43).

If the number of provisional segments has decreased (YES in step S43), the process returns to step S42. On the other hand, when the number of temporary segments does not change (NO in step S43), the image division processing unit 21 defines each current temporary segment as one segment.

By the above algorithm, for example, when the input texture image is an image of 1024 × 768 dots, it can be divided into several thousand segments (for example, 3000 to 5000 segments).

Note that, as described above, the segment is used to divide the distance image. Therefore, if the size of the segment becomes too large, various distance values are included in one segment, resulting in pixels having a large error from the representative value. As a result, the encoding accuracy of the distance image is increased. descend. Therefore, in the present invention, the process of step S51 is not essential, but it is desirable to prevent the segment size from becoming too large by limiting the segment size as in step S51.

In the above-described embodiment, the image division processing unit 21 is configured from a group of pixels whose difference between the maximum pixel value and the minimum pixel value is equal to or less than a predetermined threshold value from the input texture image # 1 ′. Although a plurality of segments are defined, an upper limit may be set for the number of pixels included in each segment. In addition, an upper limit may be provided for the width or height of the segment together with the upper limit of the number of pixels or instead of the upper limit of the number of pixels.

When the upper limit is provided, the number of segments defined by the image division processing unit 21 is larger than when no upper limit is provided. That is, as the number of segments increases, the size of the segments becomes relatively small. Therefore, by providing an upper limit, the moving image decoding apparatus 2 can decode a distance image that more faithfully reproduces the original distance image # 2.

(Appendix 2)
The image division processing unit 21 may perform a smoothing process on the input texture image # 1 ′. For example, the image division processing unit 21 is a non-patent document “C. Lawrence Zinick, Sing Bing Kang, Mattew Uyttendaele, Simon Winder and Richard Szeliski,“ High-quality video view interpolation using a layered representation, ”ACM Trans. On Graphics, 23 (3), 600-608, (2004) ”, the texture image # 1 ′ may be repeatedly smoothed to such an extent that the edge information is not lost.

Then, the image division processing unit 21 converts the texture image after the smoothing process into a plurality of segments each composed of a pixel group in which the difference between the maximum pixel value and the minimum pixel value is equal to or less than a predetermined threshold value. It may be divided.

When the above smoothing process is not performed, if the texture image # 1 ′ contains a lot of noise, the size of the segment is reduced. However, the smoothing process reduces the size of the segment. Can be suppressed. That is, by performing the smoothing process, the code amount of the encoded data # 25 can be reduced as compared with the case where the smoothing process is not performed.

In addition, the image division processing unit 21 may be arranged before the image encoding unit 11 instead of being arranged between the image decoding unit 12 and the distance image division processing unit 22. . That is, the image division processing unit 21 outputs the input texture image # 1 as it is to the subsequent image encoding unit 11, and each segment of the texture image # 1 has a predetermined difference between the maximum pixel value and the minimum pixel value. May be divided into a plurality of segments composed of pixel groups that are equal to or smaller than the threshold value, and segment information # 21 may be output to the distance image division processing unit 22 in the subsequent stage.

(Appendix 3)
Further, the positions of the distance value correcting unit 23 and the number assigning unit 24 may be interchanged. That is, the processing order of steps S7 and S8 shown in FIG.

In this case, the number assigning unit 24 receives the segment information # 22 in which the distance value set and the position information are associated with each segment from the distance image division processing unit 22. Then, the number assigning unit 24 scans the pixels included in the distance image in the raster scan order, and performs segment numbers in the scanned order for each segment that is an area divided by the position information of the segment information # 22. # 24 is assigned and associated with the distance value set of each segment included in the segment information # 22.

The distance value correcting unit 23 receives information in which the segment number # 24 and the distance value set are associated with each other from the number assigning unit 24. Then, the distance value correcting unit 23 calculates the mode value as the representative value # 23a from the distance value set of each segment. Then, the distance value correction unit 23 associates the segment number # 24 with the segment representative value # 23a and outputs the segment number # 24 to the prediction encoding unit 25.

(Appendix 4)
The number assigning unit 24 receives the segment information # 23 including the segment position information and the representative value # 23a of each segment, and the predictive encoding unit 25 converts the representative value # 23a and the segment number # 24 of each segment. However, it is not limited to this. For example, the number assigning unit 24 may output the segment position information to the predictive encoding unit 25 in addition to the representative value # 23a and the segment number # 24 of each segment. In this case, the predictive encoding unit 25 adds segment position information to encoded data # 25 obtained by encoding the difference value, and outputs the result to the packaging unit 28. The packaging unit 28 may add the position information of the segment to the encoded data # 28 instead of the encoded image # 11 of the texture image output from the image encoding unit 11. That is, in this case, the packaging unit 28 transmits the encoded data # 28 including the encoded data # 25 obtained by encoding the difference value and the segment position information to the video decoding device.

In this case, although details will be described later, when the encoded data # 25 is decoded in the moving picture decoding apparatus, the moving picture decoding apparatus decodes the encoded data # 25 based on the position information of the segment. The moving image decoding apparatus only needs to be able to divide a segment with the same division pattern as that of the moving image encoding apparatus 1, and thus can restore a distance image based on segment position information indicating the position of the segment. That is, even when there is no encoded image # 11 of the texture image, the distance image divided into segments based on the texture image can be restored. Therefore, it is sufficient for the packaging unit 28 to transmit the segment defining information (region information) defining the segment and the encoded data # 25 to the video decoding device. Here, the segment defining information is the texture image encoded data # 11 or the segment position information.

(Appendix 5)
When the prediction encoding unit 25 specifies a prediction reference pixel based on the representative pixel, in the example illustrated in FIG. 12C, in addition to the pixel B and the pixel D, the pixel C that is one above the representative pixel Although the prediction reference pixel is used, the present invention is not limited to this. At least one of the representative pixel and one pixel above the pixel A on the same scanning line as the representative pixel may be a predicted reference pixel. For example, in the example shown in (c) of FIG. 12, a pixel that is one pixel above the center pixel (one pixel to the right of the representative pixel) of the pixel A that is shaded may be used as the predicted reference pixel.

Further, the predictive encoding unit 25 calculates the predictive value of the encoding target segment based on the representative value of the segment having the predictive reference pixel, but is not limited thereto. For example, when the pixel values of the pixels included in each segment are the same value in the same segment (when the values can be regarded as constant), the pixel value of the predicted reference pixel is used instead of the representative value of the segment having the predicted reference pixel Based on the above, the predicted value of the encoding target segment may be calculated.

Further, the prediction encoding unit 25 may encode information indicating a calculation method of the prediction value and add it to the encoded data # 25. In this case, the packaging unit 28 transmits encoded data # 28 including information indicating the calculation method of the predicted value to the video decoding device. For example, the predictive coding unit 25 (1) “predicted value Z′_A is Z_B.” (2) “predicted value Z′_A is Z_C.” (3) “predicted value Z′_A is Z_D. (4) When calculating predicted values by selecting from the four predicted value calculation methods of “predicted value Z′_A is an average value of Z_B, Z_C, and Z_D”, those four calculation methods are used. May be represented by 2-bit information, and for each encoding target segment, the selected calculation method may be associated with the difference value of the encoding target segment to generate encoded data # 25. Further, for example, the predictive encoding unit 25 adds (5) “predicted value Z′_A as the median value of Z_B, Z_C, and Z_D” to the above four calculation methods, and adds these five calculation methods. Information to represent may be used.

In addition, for example, when the representative pixel is the upper leftmost pixel of the distance image, the predictive encoding unit 25 does not include a segment having a predicted reference pixel and a predicted reference pixel. The pixel value of the representative value and the prediction reference pixel is set to 0. That is, when specifying the prediction reference pixel based on the representative pixel, the prediction encoding unit 25 sets the representative value of the segment having the prediction reference pixel and the pixel value of the prediction reference pixel to 0 when there is no prediction reference pixel. To do.

FIG. 12 shows the case where the number of pixels on the same scanning line as the representative pixel and the representative pixel is 1 to 3, but the number of pixels is naturally not limited to this, and the number is 4 or more. Is also present. In those cases, processing can be performed in the same manner as described in the three examples.

Further, although the predictive encoding unit 25 encodes the difference value by the exponent Golomb encoding method, the encoding method is not limited to this. The exponent Golomb coding method makes the codeword very long for values far from 0, instead of making the codeword for values near 0 very short. For this reason, when the accuracy of prediction is not so good, it is better to use general Golomb coding instead of the exponential Golomb coding method, and the amount of information can be relatively compressed. That is, it is desirable to select an encoding method based on prediction accuracy (a distribution of differences between representative values and predicted values).

Also, there is no prediction reference pixel for the first segment with segment number # 24, and the prediction value is a value obtained by multiplying the representative value of that segment by −1, so that the value is far from zero. Therefore, for the first segment, a code word obtained by encoding the representative value of the segment itself with a fixed-length encoding method (for example, 8 bits) instead of the difference from the predicted value may be used. In this case, the amount of information can be further compressed.

(Appendix 6)
In the above embodiment, the moving image encoding apparatus 1 is the H.264 standard. The texture image # 1 is encoded using AVC encoding defined in the H.264 / MPEG-4 AVC standard, but the present invention is not limited to this. That is, the image encoding unit 11 of the moving image encoding apparatus 1 may encode the texture image # 1 using another encoding method such as MPEG-2 or MPEG-4. The texture image # 1 may be encoded using an encoding method established as the H.265 / HVC standard.

(Advantages of the video encoding device 1)
As described above, in the moving image encoding device 1, the image division processing unit 21 is a plurality of segments obtained by dividing the entire region of the texture image # 2, and the maximum pixel value and the minimum pixel group included in each region A plurality of segments are defined such that the difference from the pixel value is equal to or less than a predetermined threshold value. Further, the distance image division processing unit 22 defines a plurality of segments obtained by dividing the entire area of the distance image # 2 with the same division pattern as the plurality of segment division patterns defined by the image division processing unit 21. Further, for each segment defined by the distance image division processing unit 22, the distance value correction unit 23 calculates a representative value # 23a from the distance value of each pixel included in the segment.

The distance image encoding unit 20 generates encoded data # 25 including a plurality of representative values # 23a calculated by the distance value correcting unit 23.

With the above configuration, the moving image encoding apparatus 1 transmits the representative values # 23a corresponding to the number of segments at most as the encoded data # 25 of the distance image # 2 transmitted to the moving image decoding apparatus.

On the other hand, when the distance image is encoded using AVC encoding, the code amount of the encoded data of the distance image is clearly larger than the code amount of the encoded data # 25.

For example, if the image segmentation processing unit 21 (distance image segmentation processing unit 22) defines a plurality of segments by the method described in the above (Appendix 1), each segment is determined when the texture image is an image of 1024 × 768 dots. The number of pixels included in is about 3000 to 5000. On the other hand, when a distance image is encoded using AVC encoding, DCT transformation and quantization processing are performed for each block (4 × 4 = 16 pixels), and the total number of blocks is 49152. In addition, since the pixel values of all the pixels included in the block are encoded by AVC encoding, the encoding method of this embodiment is also used for the code amount per block of the distance image when AVC encoding is used. In this case, the code amount per segment of the distance image becomes larger.

Therefore, the moving image encoding apparatus 1 reduces the code amount of the encoded data of the distance image # 2 compared to the conventional moving image encoding apparatus that AVC encodes the distance image # 2 and transmits the encoded image to the moving image decoding apparatus. can do.

In the moving image encoding device 1, the distance image division processing unit 22 divides the distance image # 2 into segments, and the distance value correction unit 23 approximates the distance values of the pixels included in the segments to determine representative values. The number assigning unit 24 assigns numbers to the segments in the raster scan order. Then, for each segment, the predictive encoding unit 25 calculates a predicted value of the representative value of the segment based on the pixels that are close to the segment and whose raster scan order is earlier than the pixels included in the segment. A prediction value is subtracted from the value to calculate a difference value, and the difference values are arranged in numerical order and encoded to generate encoded data # 25.

The moving image encoding apparatus 1 compresses the spatial redundancy between segments even when the distance image # 2 transmitted to the moving image decoding apparatus is divided into segments of an arbitrary shape by the above configuration. Can be generated. Therefore, the moving image encoding device 1 has an effect that the code amount of the encoded data of the distance image # 2 transmitted to the moving image decoding device can be further reduced.

(Moving picture decoding apparatus 2)
Next, a moving picture decoding apparatus according to an embodiment of the present invention will be described below with reference to FIGS. The moving picture decoding apparatus according to the present embodiment uses the texture image # 1 ′ and the distance picture # from the encoded data # 28 transmitted from the moving picture encoding apparatus 1 described above for each frame constituting the moving picture to be decoded. This is a moving picture decoding apparatus for decoding 2 ′.

First, the configuration of the video decoding device according to the present embodiment will be described with reference to FIG. FIG. 16 is a block diagram illustrating a main configuration of the video decoding device.

As shown in FIG. 16, the moving image decoding apparatus 2 includes an image decoding unit 12, an image division processing unit (dividing unit) 21 ', a numbering unit (numbering unit, assigning unit) 24' an unpackaging unit (receiving unit). ) 31 and a predictive decoding unit (predicted value calculating means, pixel value setting means) 32.

The unpackaging unit 31 extracts the encoded data # 11 of the texture image # 1 and the encoded data # 25 of the distance image # 2 from the received encoded data # 28.

The image decoding unit 12 decodes the texture image # 1 'from the encoded data # 11. The image decoding unit 12 is the same as the image decoding unit 12 included in the moving image encoding device 1. That is, the image decoding unit 12 is configured to transmit the encoded data # 28 from the moving image encoding apparatus 1 to the moving image decoding apparatus 2 as long as no noise is mixed in the encoded data # 28. The texture image # 1 ′ having the same content as the texture image decoded by the image decoding unit 12 is decoded.

The image division processing unit 21 ′ divides the entire area of the texture image # 1 ′ into a plurality of segments (areas) using the same algorithm as the image division processing unit 21 of the video encoding device 1. Then, the image division processing unit 21 ′ generates segment information # 21 ′ including the position information of each segment, and outputs it to the number assigning unit 24 ′.

The number assigning unit 24 'assigns a number to each segment divided based on the segment information # 21' in the raster scan order by the same algorithm as the number assigning unit 24 of the video encoding device 1. The number assigning unit 24 ′ generates a segment identification image # 24 ′ in which the number assigned to the segment position information is associated, and outputs the generated image to the predictive decoding unit 32.

Here, the segment identification image # 24 'is information in which a number is associated with segment position information indicating the position of each segment. As will be described later, the predictive decoding unit 32 can specify the arrangement of each segment in the entire image, the number of pixels included in each segment based on the segment position information, and can also specify the number of pixels in the entire image. . Therefore, the predictive decoding unit 32 can restore an image that is divided into segments and does not have information indicating the pixel values of the pixels that form the image, based on the segment position information.

The segment identification image # 24 ′ divides the texture image # 1 ′ into segments, and assigns the segment number “i-1” to the i-th segment in the raster scan order, so that the texture image # 1 ′ The pixel value of each pixel included in the i th segment may be replaced with “i−1”. In this case, the image division processing unit 21 ′ divides the texture image # 1 ′ into segments, and the number assigning unit 24 ′ assigns the segment number “i-1” to the i-th segment in the raster scan order, and the above i The pixel value of each pixel included in the th segment may be replaced with “i−1”.

The predictive decoding unit 32 performs a predictive decoding process based on the input encoded data # 25 and the segment identification image # 24 'to restore the distance image # 2'. Specifically, the predictive decoding unit 32 decodes the encoded data # 25, generates difference values arranged in order, and identifies the generated difference values in the order given by the number assigning unit 24 ′. It assigns to each segment prescribed | regulated by segment information # 21 'of image # 24'. Next, the predictive decoding unit 32 calculates the predicted value of the segment for each segment in the order given by the number assigning unit 24 ′, and adds the assigned difference value to the calculated predicted value. Set the value as the distance value for each segment. Then, the predictive decoding unit 32 sets the distance value of the set segment as the pixel value (distance value) of all the pixels included in the segment, and restores the distance image # 2 '. The prediction decoding unit 32 outputs the restored distance image # 2 ′ to a stereoscopic video display device (not shown) outside the moving image decoding device 2.

(Operation of the video decoding device 2)
Next, the operation of the video decoding device 2 will be described below with reference to FIG. FIG. 17 is a flowchart showing the operation of the video decoding device 2. The operation of the moving image decoding apparatus 2 described here is an operation of decoding a texture image and a distance image of the t-th frame from the top in a three-dimensional moving image including a large number of frames. That is, the moving image decoding apparatus 2 repeats the operation described below as many times as the number of frames of the moving image in order to decode the entire moving image. Further, in the following description, unless otherwise specified, each data # 1 to # 28 is interpreted as data of the t-th frame.

First, the unpackaging unit 31 extracts the encoded data # 11 of the texture image and the encoded data # 25 of the distance image from the encoded data # 28 received from the moving image encoding device 1. Then, the unpackaging unit 31 outputs the encoded data # 11 to the image decoding unit 12, and outputs the encoded data # 25 to the predictive decoding unit 32 (Step S21).

The image decoding unit 12 decodes the texture image # 1 ′ from the input encoded data # 11, and sends it to the image division processing unit 21 ′ and a stereoscopic video display device (not shown) outside the moving image decoding device 2. Output (step S22).

The image division processing unit 21 ′ defines a plurality of segments with the same algorithm as the image division processing unit 21 of the moving image encoding device 1. Then, the image division processing unit 21 'generates segment information # 21' composed of the position information of each segment, and outputs it to the number assigning unit 24 '(step S23).

The number assigning unit 24 'assigns a number to each segment divided based on the segment information # 21' in the raster scan order by the same algorithm as the number assigning unit 24 of the video encoding device 1. The number assigning unit 24 'generates a segment identification image # 24' in which the number assigned to the segment position information is associated, and outputs the segment identifying image # 24 'to the predictive decoding unit 32 (step S24).

After step S24, the predictive decoding unit 32 performs a predictive decoding process based on the input encoded data # 25 and the segment identification image # 24 'to restore the distance image # 2' (step S25). Specifically, the predictive decoding unit 32 decodes the encoded data # 25, generates difference values arranged in order, and identifies the generated difference values in the order given by the number assigning unit 24 ′. It assigns to each segment prescribed | regulated by segment information # 21 'of image # 24'. Next, the predictive decoding unit 32 calculates the predicted value of the segment for each segment in the order given by the number assigning unit 24 ′, and adds the assigned difference value to the calculated predicted value. Set the value as the distance value for each segment. Then, the predictive decoding unit 32 sets the distance value of the set segment as the pixel value (distance value) of all the pixels included in the segment, and restores the distance image # 2 '.

The predictive decoding unit 32 outputs the restored distance image # 2 'to a stereoscopic video display device (not shown) outside the video decoding device 2. As described above, the texture image # 1 'and the distance image # 2' can be restored.

(Predictive decoding process)
Details of the predictive decoding process executed by the predictive decoding unit 32 in step S25 will be described with reference to FIG. FIG. 18 is a flowchart illustrating an example of the predictive decoding process executed by the predictive decoding unit 32.

First, the predictive decoding unit 32 uses the encoded data # 25 input from the unpackaging unit 31 as a code used when the predictive encoding unit 25 of the video encoding device 1 generates the encoded data # 25. The difference values arranged in order are generated by decoding using the conversion method (step S201). That is, in this embodiment, the predictive decoding unit 32 decodes the encoded data # 25 illustrated in FIG. 14 using the exponential Golomb encoding method illustrated in FIG.

The predictive decoding unit 32 sets each segment defined by the segment information # 21 ′ of the segment identification image # 24 ′ according to the order in which the number assigning unit 24 ′ assigns the difference values arranged in order from the top. (Step S202).

Next, “i”, which is the number assigned by the number assigning unit 24 ′, is set to “0” (step S 203). Then, the segment with the number “i” assigned by the number assigning unit 24 ′ is set as a decoding target segment (decoding target region) (step S 204). That is, the segment with the head number assigned by the number assigning unit 24 'is set as a decoding target segment.

Next, the representative pixel of the decoding target segment to be used for calculating the prediction value is specified from the pixels included in the decoding target segment (step S205). Specifically, the pixels included in the decoding target segment and first scanned in the raster scan order in step S24 are set as representative pixels.

After identifying the representative pixel, the predictive decoding unit 32 performs prediction based on the identified representative pixel using the same method as the method of identifying the prediction reference pixel used by the predictive coding unit 25 of the video encoding device 1. A reference pixel is specified (step S206). Specifically, a pixel that is included in the decoding target segment and is adjacent to a pixel on the same scan line as the representative pixel and whose raster scan order is earlier than the representative pixel is set as a predicted reference pixel. For example, the prediction reference pixel is a pixel adjacent to a pixel in the decoding target segment that is one pixel before the representative pixel in the raster scan order and adjacent to a pixel on the same scan line as the representative pixel, and the representative pixel A pixel in the previous scan line in the raster scan order and a pixel adjacent to the next pixel in the raster scan order of the last pixel of the same scan line as the representative pixel included in the decoding target segment In this case, the pixel group may include a pixel on the previous scan line in the raster scan order of the representative pixel. In addition, for example, the prediction reference pixel is a pixel adjacent to the pixel in the decoding target segment and the pixel on the same scan line as the representative pixel, and the pixel immediately before the raster scan order of the representative pixel, One of the pixels of the previous scan line in the raster scan order of the representative pixel and one in the raster scan order of the last pixel included in the decoding target segment and the same scan line as the representative pixel Three pixels including a pixel adjacent to a subsequent pixel and a pixel on the previous scan line in the raster scan order of the representative pixel may be used.

After specifying the prediction reference pixel in step S206, the prediction decoding unit 32 uses the same method as the method for calculating the prediction value used by the prediction encoding unit 25 of the video encoding device 1 to specify the specified prediction reference pixel. Based on the pixel value, a predicted value of the representative value of the decoding target segment is calculated (step S207). For example, the predicted value may be the median value of the pixel values of the prediction target pixels. The predicted value may be an average value of the pixel values of the prediction target pixels. Further, the predicted value may be any one of the pixel values of the prediction target pixel.

The prediction decoding unit 32 adds the difference value assigned to the decoding target segment to the calculated prediction value, and sets the value as a representative value of the decoding target segment (step S208). Then, the predictive decoding unit 32 sets the pixel values of all the pixels included in the decoding target segment to the representative values of the set decoding target segment (step S209).

After step S209, it is confirmed whether or not “i”, which is the number assigned by the number assigning unit 24 ', is “M-1” (step S210). If it is not “i = M−1” (NO in step S210), “i”, which is the number assigned by the number assigning unit 24 ′, is set to “i + 1” (step S211), and the processes of steps S204 to S209 are executed. . On the other hand, if “i = M−1” (YES in step S210), the process proceeds to step S212.

That is, after step S209, it is confirmed whether or not the pixel values of the pixels included in the segments are set for all (M) segments. If the pixel values are not set for all the segments, the number assigning unit The processes of steps S204 to S209 are executed in the order of numbers assigned by 24 '. If pixel values are set for all segments, the process proceeds to step S212.

In step S212, all the segments in which the pixel values of the belonging pixels are set are combined to restore the distance image # 2 '(step S212).

The operation of the video decoding device 2 has been described above. The distance image # 2 ′ decoded by the prediction decoding unit 32 in step S25 is generally the distance image # 2 input to the video encoding device 1. The distance image approximates to.

As described above, this is because, from the correlation between the texture image # 1 and the distance image # 2, “when the texture image # 1 ′ is divided into a plurality of segments each composed of a group of pixels of similar colors” This is because it can be said that all or almost all pixels included in a single segment in the distance image # 2 have the same distance value. That is, the distance image # 2 ′ is the same as the image obtained by changing the distance value of a very small part included in the segment in the distance image # 2 to the representative value in the segment. It can be said that the distance image # 2 is approximate.

(Advantages of the video decoding device 2)
As described above, in the video decoding device 2, the image division processing unit 21 ′ defines a plurality of segments obtained by dividing the entire area of the texture image # 1 ′. Specifically, the image division processing unit 21 ′ defines a plurality of segments each including a group of pixels each having a similar color.

Also, the predictive decoding unit 32 reads the encoded data # 25. The encoded data # 25 is data including at most one representative value # 23a as a distance value for each of a plurality of segments constituting the distance image # 2 'to be decoded. Note that the division pattern of the plurality of segments constituting the distance image # 2 'to be decoded is the same as the division pattern of the plurality of segments defined by the image division processing unit 21'.

In addition, the moving image decoding apparatus 2 uses a decoding method corresponding to an encoding method in which the representative value of the segment obtained by dividing the distance image # 2 by the moving image encoding apparatus 1 is encoded into encoded data # 25. Since the encoded data # 25 is decoded, the representative value of each segment generated by the video encoding device 1 can be accurately restored. Therefore, the moving image decoding apparatus 2 can accurately restore the distance image # 2 'that approximates the distance image # 2.

The distance image # 2 ′ restored from the encoded data # 25 by the moving image decoding apparatus 2 is similar to the distance image # 2 encoded by the moving image encoding apparatus 1 as described above. The device 2 can decode an appropriate distance image.

In addition to the above, it will be shown below that the distance image # 2 'decoded by the video decoding device 2 has further advantages.

That is, when a three-dimensional image is generated from the texture image # 1 ′ on which the subject and the background are drawn and the distance image # 2, the contour of the subject in the generated three-dimensional image is the subject and background in the distance image # 2. It depends on the shape of the boundary.

Generally, although the texture image # 1 'and the distance image # 2 match the position of the boundary between the subject and the background, the position of the boundary between the subject and the background may not match. In this case, in the texture image # 1 generated by camera photographing and the distance image # 2 generated by the distance measuring device, the texture image reproduces the shape of the edge portion between the subject and the background more faithfully.

The position of the boundary between the subject and the background in the distance image # 2 ′ decoded by the moving image decoding apparatus 2 often coincides with the position of the boundary between the subject and the background in the texture image # 1. This is because, in general, the subject color and the background color are significantly different in the texture image # 1, and the boundary between the subject and the background becomes the segment boundary in the texture image # 1.

Therefore, the three-dimensional image reproduced by the stereoscopic image display device from the texture image # 1 ′ and the distance image # 2 ′ output from the moving image decoding apparatus 2 according to the present embodiment is the texture image # 1 ′ and the distance image # 2. In addition to being substantially faithful to the three-dimensional image reproduced from the above, in some cases, it becomes a three-dimensional image reproducing the real subject more faithfully.

(Appendix 7)
As described above, the video decoding device 2 restores the distance image # 2 ′ from the encoded data # 28 using a decoding method corresponding to the encoding method used by the video encoding device 1. For this reason, the moving image encoding device 1 and the moving image decoding device 2 may determine the encoding method and the decoding method in advance before performing the encoding and decoding processes, respectively.

In addition, the video decoding device 2 receives the information indicating the encoding method together with the encoded data # 28 (encoded data # 25) from the video encoding device 1, and corresponds to the encoding method indicated by the received information. The decoding method to be performed may be specified, and the distance image # 2 ′ may be restored based on the specified decoding method. At this time, information indicating the encoding method may be associated with each segment included in the encoded data # 25. By doing in this way, the moving picture coding apparatus 1 can use the optimum coding method for each segment, and the moving picture decoding apparatus 2 is coded by a coding method that is different for each segment. Even in the case of data, the data can be accurately decoded.

For example, the information indicating the encoding method includes a variable length encoding method for converting a difference value into a code word, information indicating a fixed length encoding method, and a prediction reference pixel specification that specifies a prediction reference pixel based on a representative pixel. Prediction reference pixel specifying method information indicating a method, prediction value calculation method information indicating a prediction value calculation method for calculating a prediction value based on a representative value of a segment having a prediction reference pixel, and the like. In addition to the information indicating the encoding method, the division method information indicating the segment division method in which the image division processing unit 21 divides the segment, and the numbering rule information indicating the order (rule) in which the number assigning unit 24 assigns the numbers. Also, representative pixel specifying method information indicating a representative pixel specifying method for specifying a representative pixel may be included.

Specifically, when the moving image encoding apparatus 1 encodes the segment representative value only for the first segment with the segment number # 24 by the fixed-length encoding method, the moving image decoding apparatus 2 displays information indicating that fact. Is received together with the encoded data # 28, only the first code word of the encoded data # 25 is decoded by the fixed-length encoding method, and the representative value of the first segment is set to the decoded value. The pixel values of all the pixels included in the segment are set to decoded values.

(Appendix 8)
In the present embodiment, the moving image decoding apparatus 2 receives the encoded data # 28 including the encoded data # 11 of the texture image and the encoded data # 25 of the distance image. However, the present invention is not limited to this. Absent. For example, the moving image decoding apparatus 2 may receive the encoded data # 25 of the distance image and the position information of the segment. In this case, the number assigning unit 24 ′ assigns a number to each segment divided based on the segment position information in the raster scan order. The number assigning unit 24 ′ generates a segment identification image # 24 ′ in which the number assigned to the segment position information is associated, and outputs the segment identifying image # 24 ′ to the predictive decoding unit 32.

For example, even when the moving image encoding device 1 transmits the encoded image data # 11 of the texture image and the encoded data # 25 of the distance image to different decoding devices, the encoded data # 25 of the distance image Can receive the segment position information together with the encoded data # 25 of the distance image, thereby restoring the distance image.

(Appendix 9)
In the above embodiment, the moving image encoding apparatus 1 transmits the encoded data # 25 to the moving image decoding apparatus 2. However, the moving image encoding apparatus 1 transmits the encoded data # 25 to the moving image decoding apparatus 2 as follows. Then, the encoded data # 25 may be supplied.

That is, the moving image encoding apparatus 1 and the moving image decoding apparatus 2 are provided with access means that can access a removable recording medium such as an optical disk drive, and the moving image encoding apparatus 1 and the moving image decoding apparatus 2 are connected via the recording medium. Alternatively, the encoded data # 25 may be supplied. In other words, the encoding apparatus of the present invention does not necessarily include a means for transmitting data, and the decoding apparatus of the present invention does not necessarily include a receiving means for receiving data.

<Embodiment 2>
Next, a moving picture coding apparatus and a moving picture decoding apparatus according to another embodiment of the present invention will be described below with reference to FIGS. 19 and 20. First, the moving picture coding apparatus according to the present embodiment will be described.

The moving image encoding apparatus according to this embodiment is H.264 for encoding texture images. On the other hand, the MVC coding adopted as the MVC standard in H.264 / AVC is used, while the distance picture is coded by a moving picture coding apparatus using a coding technique peculiar to the present invention. The moving image encoding apparatus according to the present embodiment is different from the moving image encoding apparatus 1 in that a plurality of sets (N sets) of texture images and distance images are encoded per frame. Here, the N sets of texture images and distance images are images of subjects simultaneously captured by cameras and ranging devices installed at N locations so as to surround the subject. That is, the N sets of texture images and distance images are images for generating a free viewpoint image. In addition, each set of texture image and distance image includes actual data of the texture image and distance image of the set, and a camera parameter indicating which azimuth angle is an image generated by a camera and a distance measuring device. Is included as metadata.

Hereinafter, the configuration of the moving picture encoding apparatus of the present embodiment will be described with reference to FIG.

(Moving picture encoding device)
FIG. 19 is a block diagram showing a main configuration of the moving picture encoding apparatus according to the present embodiment. As shown in FIG. 19, the moving image encoding apparatus 1A includes an image encoding unit 11A, an image decoding unit 12A, a distance image encoding unit 20A, and a packaging unit (transmission means) 28A. The distance image encoding unit 20A includes an image division processing unit 21A, a distance image division processing unit (dividing unit) 22A, a distance value correcting unit (representative value determining unit) 23A, a number assigning unit (number assigning unit) 24A, and A predictive encoding unit (predicted value calculating means, difference value calculating means, encoding means) 25A is provided.

The image encoding unit 11A N view components (that is, texture images # 1-1 to # 1-N) are encoded by MVC encoding (multi-view video encoding) defined in the MVC standard in H.264 / AVC, and each view component is Coded data # 11-1 to # 11-N are generated. Further, the image encoding unit 11A converts the encoded data # 11-1 to # 11-N into the image decoding unit 12A and the packaging unit 28 together with view IDs “1” to “N” that are parameters by NAL header extension. Output to '.

The image decoding unit 12A decodes the texture images # 1′-1 to 1′-N from the encoded data # 11-1 to # 11-N of the texture image # 1 by the decoding method stipulated in the MVC standard. To do.

The image division processing unit 21 divides the entire area of the texture image # 1'-j into a plurality of segments (areas). Then, the image division processing unit 21 outputs segment information # 21-j including the position information of each segment.

When the distance image # 2-j and the segment information # 21-j are input, the distance image division processing unit 22A corresponds to each segment in the texture image # 1′-j in the distance image # 2-j. A distance value set including the distance values of each pixel included in the segment (region) is extracted. Then, the distance image division processing unit 22A generates segment information # 22-j in which the distance value set and the position information are associated with each segment from the segment information # 21-j.

Further, the distance image division processing unit 22A generates a view ID “j” of the distance image # 2-j, and generates segment information # 22A-j in which the view ID “j” is associated with the segment information # 22-j. To do.

The distance value correcting unit 23A calculates the mode value as the representative value # 23a from the distance value set of the segment included in the segment information # 22A-j for each segment of the distance image # 2-j. Then, the distance value correcting unit 23 replaces the distance value set of each segment included in the segment information # 22A-j with the representative value # 23a-j of the corresponding segment, and the number assigning unit 24A as the segment information # 23A-j Output to.

Number giving unit 24A, when the segment information # 23A-j is input, for each set of M _j sets of position information and the representative value # 23a-j contained in the segment information # 23A-j, representative value # 23a-j is associated with segment number # 24-j corresponding to the position information. The number assigning unit 24A then associates M _j sets of segment numbers # 24-j and representative values # 23a-j with the view ID “j” included in the segment information # 23A-j. 24A-j is output to the predictive coding unit 25A.

The predictive encoding unit 25A performs predictive encoding processing for each viewpoint based on the M _j sets of representative values # 23a-j and segment numbers # 24-j included in the input data # 24A-j, The encoded data # 25-j is output to the packaging unit 28. Specifically, the predictive coding unit 25A calculates the predicted value of the segment for each segment in the order of the segment number # 24-j, and subtracts the predicted value from the representative value # 23a-j to obtain the difference value. Calculate and encode the difference value. Then, the predictive encoding unit 25 arranges the encoded difference values in the order of the segment numbers # 24-j to generate encoded data # 25-j.

The prediction encoding unit 25A uses the encoded data of the distance image # 2-j for each j from 1 to N obtained in this way as the VCL / NAL unit and the view ID “j” as the non-VCL / NAL unit. Is transmitted to the packaging unit 28A.

The packaging unit 28A integrates the encoded data # 11-1 to # 11-N of the texture images # 1-1 to # 1-N and the encoded data # 25A to thereby convert the encoded data # 28A. Generate. Then, the packaging unit 28A transmits the encoded data # 28A to the video decoding device.

(Video decoding device)
Next, the configuration of the video decoding device according to the present embodiment will be described with reference to FIG. FIG. 20 is a block diagram showing a main configuration of the moving picture decoding apparatus according to the present embodiment. As shown in FIG. 20, the moving picture decoding apparatus 2A includes an image decoding unit 12A, an image division processing unit (dividing unit) 21A ′, a numbering unit (numbering unit, assigning unit) 24A ′, an unpackaging unit (reception). Means) 31A and a predictive decoding unit (predicted value calculating means, pixel value setting means) 32A.

The image decoding unit 12A decodes the texture images # 1′-1 to 1′-N from the encoded data # 11-1 to # 11-N of the texture image # 1 by a decoding method defined in the MVC standard. .

The unpackaging unit 31A extracts the encoded data # 11-j of the texture image # 1 and the encoded data # 25A of the distance image # 2 from the received encoded data # 28A.

The image division processing unit 21A 'divides the entire region of the texture image # 1'-j into a plurality of segments (regions) by the same algorithm as the image division processing unit 21A of the moving image encoding device 1A. Then, the image division processing unit 21A ′ generates segment information # 21′-j including the position information of each segment, and outputs it to the number assigning unit 24A ′.

The number assigning unit 24A 'assigns a number to each segment divided based on the segment information # 21'-j in the raster scan order by the same algorithm as the number assigning unit 24A of the moving image encoding device 1A. The number assigning unit 24A 'generates a segment identification image # 24'-j in which the number assigned to the segment position information is associated, and outputs it to the predictive decoding unit 32A.

The predictive decoding unit 32A extracts the encoded data # 25-j and the view ID “j” from the input encoded data # 25A. Next, predictive decoding processing is performed based on the encoded data # 25-j and the segment identification image # 24'-j to restore the distance images # 2'-1 to # 2'-N. Specifically, the prediction decoding unit 32A decodes the distance image # 2'-j as follows.

The predictive decoding unit 32A decodes the encoded data # 25-j, generates differential values arranged in order, and generates the generated differential values in the order given by the number assigning unit 24A ′. This is assigned to each segment defined by the segment information # 21'-j of 24'-j. Next, the predictive decoding unit 32A calculates the predicted value of the segment for each segment in the order given by the number assigning unit 24A ′, and adds the assigned difference value to the calculated predicted value. Set the value as the distance value for each segment. Then, the predictive decoding unit 32A sets the distance value of the set segment as the pixel value (distance value) of all the pixels included in the segment, and restores the distance image # 2'-j. The predictive decoding unit 32 associates the restored distance image # 2′-j with the view ID “j” included in the encoded data # 25A to provide a stereoscopic video display device (not shown) outside the video decoding device 2A. ).

Note that the image decoding unit 12 is the same as the image decoding unit 12 of the video decoding device 2 of the first embodiment, and a description thereof will be omitted.

(Appendix 10)
In the above-described embodiment, the moving image encoding device 1A and the moving image decoding device 2A have N sets of texture images and distance images of a subject captured simultaneously by cameras and ranging devices installed at N locations so as to surround the subject. Then, an encoding process and a decoding process were performed.

Needless to say, the moving image encoding device 1A and the moving image decoding device 2A can perform encoding processing and decoding processing on N sets of texture images and distance images generated as follows. .

That is, the moving image encoding device 1A and the moving image decoding device 2A are generated by N sets of cameras and ranging devices installed in one place so that each set of cameras and ranging devices faces different directions. Also, encoding processing and decoding processing can be performed on the N sets of texture images and distance images. That is, the moving image encoding device 1A and the moving image decoding device 2A perform the encoding process and the decoding process on N sets of texture images and distance images for generating omnidirectional images, panoramic images, and the like. Can do.

In this case, the texture image and the distance image of each set indicate the direction of the image generated by the camera and the distance measuring device in which direction it is directed together with the actual data of the texture image and the distance image of the set. Camera parameters are included as metadata.

(Appendix 11)
In the second embodiment, the image encoding unit 11A of the moving image encoding apparatus 1A is configured as H.264. Although texture images # 1-1 to 1-N are encoded using MVC encoding defined in the MVC standard in H.264 / AVC, the present invention is not limited to this.

That is, the image encoding unit 11A of the moving image encoding device 1A uses other encoding methods such as a VSP (View Synthesis Prediction) encoding method, an MVD encoding method, and an LVD (Layered Video Depth) encoding method. Texture images # 1-1 to 1-N may be encoded. In this case, if the image decoding unit 12A of the video decoding device 2A is configured to decode the texture images # 1 ′ to 1′-N by a decoding method corresponding to the encoding method employed by the image encoding unit 11A. Good.

(Means for solving problems)
In order to solve the above-described problem, an encoding apparatus according to the present invention is an encoding apparatus that encodes an image, and is divided by a dividing unit that divides the entire area of the image into a plurality of regions, and the dividing unit. For each of the plurality of regions, representative value determining means for determining a representative value from the pixel value of each pixel included in the region, number giving means for assigning a number to the plurality of regions in raster scan order, The above-mentioned area is set as the encoding target area in the order of the numbers given by the above-mentioned number assigning means, and among the pixels included in the encoding target area, the first pixel in the raster scan order is set as the representative pixel, and is included in the encoding target area. A pixel that is adjacent to a pixel on the same scan line as the representative pixel and that has a raster scan order before the representative pixel is a predicted reference pixel, and a representative value of an area having the predicted reference pixel Based on at least one, the predicted value calculating means for calculating the predicted value of the encoding target area, and the predicted value calculating means calculates the representative value determined by the representative value determining means for each of the encoding target areas. The difference value calculation means for calculating the difference value by subtracting the predicted value and the difference value calculated by the difference value calculation means are arranged and encoded in the order given by the number assignment means, and the encoded data of the image is obtained. And encoding means for generating.

The encoding apparatus according to the present invention further includes transmission means for associating the encoded data of the image generated by the encoding means with the area information defining the plurality of areas, and transmitting the associated information to the outside. It is desirable to have it.

According to the above configuration, the transmission unit transmits the encoded data of the image generated by the encoding unit and the region information defining the plurality of regions in association with each other. Therefore, the device that has received the encoded data and the region information can further accurately decode the received encoded data by dividing the image into the plurality of regions based on the region information. Play.

In the encoding apparatus according to the present invention, it is preferable that the encoding means encodes the difference value by a variable length encoding method in which a code word is shorter as a value to be encoded is closer to 0.

According to the above configuration, the encoding means encodes the difference value by a variable length encoding method in which the code word is shorter as the value to be encoded is closer to 0. Here, when the prediction value of the encoding target region calculated by the prediction value calculating unit approximates the representative value of the encoding target region (when the prediction accuracy of the prediction value calculating unit is high), the difference value Is a very small value. Therefore, when the prediction accuracy of the prediction value calculation unit is high, the encoding unit encodes the difference value using a variable length encoding method, thereby further reducing the amount of encoded data. Play.

In the encoding device according to the present invention, the prediction value calculation means includes a prediction reference pixel that is included in the encoding target area and the pixel immediately before the representative pixel in the raster scan order, and is the same as the representative pixel. A pixel adjacent to the scan line pixel, the pixel of the scan line immediately preceding in the raster scan order of the representative pixel, and the last pixel of the same scan line included in the encoding target area and the representative pixel. It is desirable that the pixel group includes a pixel adjacent to the next pixel in the raster scan order of the pixels and a pixel on the previous scan line in the raster scan order of the representative pixel.

According to the above configuration, the predicted value calculation means includes the predicted reference pixel, the pixel immediately before the representative pixel in the raster scan order, and the same scan line as the representative pixel included in the encoding target region. Of the scan line that is adjacent to the first pixel in the raster scan order of the representative pixel and the last pixel of the same scan line that is included in the encoding target area and that is the same as the representative pixel. The pixel group includes a pixel adjacent to the next pixel in the raster scan order, and a pixel on the previous scan line in the raster scan order of the representative pixel.

Here, in order to improve the accuracy of prediction of the prediction value of the encoding target region, it is desirable to refer to pixels in as many regions as possible adjacent to or close to the encoding target region. However, as described above, in order to efficiently decode the encoded data, it is necessary to refer to pixels that are encoded prior to the encoding target region.

Therefore, assuming that the raster scan order is from the upper left to the lower right of the image, the pixel immediately before the representative pixel in the raster scan order is a pixel adjacent in the left direction of the representative pixel. Further, a pixel that is included in the encoding target area and is adjacent to a pixel on the same scan line as the representative pixel and that is on the previous scan line in the raster scan order of the representative pixel is the encoding target. The pixel is included in the region and is adjacent in the upward direction to the pixel of the same scan line as the representative pixel. Further, a pixel that is included in the encoding target area and is adjacent to the next pixel in the raster scan order of the last pixel of the same scan line as the representative pixel, and 1 in the raster scan order of the representative pixel. The pixel on the previous scan line is a pixel that is included in the encoding target region and is adjacent to the uppermost pixel of the last scan line on the same scan line as the representative pixel in the diagonally upper right direction.

That is, a pixel in three directions of a pixel adjacent in the left direction of the encoding target region, a pixel adjacent in the upward direction, and a pixel adjacent to the upper right (a pixel in the right direction of the encoding target region) is used as a prediction reference pixel. Yes. Therefore, since the prediction value is calculated with reference to the pixels in the raster scan order before the encoding target area and the pixels in multiple directions, the prediction value can be predicted with high accuracy.

In the encoding device according to the present invention, the prediction value calculation means includes a prediction reference pixel that is included in the encoding target area and the pixel immediately before the representative pixel in the raster scan order, and is the same as the representative pixel. A pixel adjacent to a pixel on the scan line, and one of the pixels on the previous scan line in the raster scan order of the representative pixel and included in the encoding target area and the same as the representative pixel Three pixels including a pixel adjacent to the next pixel in the raster scan order of the last pixel of the scan line and including a pixel in the previous scan line in the raster scan order of the representative pixel. It is desirable.

According to the above configuration, the predicted value calculation means includes the predicted reference pixel, the pixel immediately before the representative pixel in the raster scan order, and the same scan line as the representative pixel included in the encoding target region. Any one of the pixels on the previous scan line in the raster scan order of the representative pixel and the same scan line as the representative pixel included in the encoding target region Are the pixels adjacent to the next pixel in the raster scan order of the last pixel, and the pixels on the previous scan line in the raster scan order of the representative pixel.

That is, predictive reference is made to three pixels in three directions: a pixel adjacent in the left direction of the encoding target region, a pixel adjacent in the upward direction, and a pixel adjacent to the upper right (a pixel in the right direction of the encoding target region). It is a pixel. Therefore, since the prediction reference pixel is a pixel located in multiple directions with respect to the encoding target region and is a pixel group (three pixels) as few as possible, the processing load for calculating the prediction value is reduced, There is a further effect that the predicted value can be predicted with high accuracy.

In the encoding device according to the present invention, in the encoding device according to the present invention, the prediction value calculation means sets the median of the representative values of the region having the prediction reference pixel as the prediction value of the encoding target region. Is desirable.

According to the above configuration, the predicted value calculation means sets the median of the representative values of the region having the predicted reference pixel as the predicted value of the encoding target region.

Basically, the representative value of the encoding target region and the representative value of the region having the prediction reference pixel are approximate, but the representative value of the region having a certain prediction reference pixel is the representative of the encoding target region. It is also possible that the value is very different. At this time, when the representative value of the region having any prediction reference pixel is used as the prediction value of the encoding target region as it is, when the certain prediction reference pixel is selected,
The prediction value of the encoding target region is greatly different from the representative value of the encoding target region, and the accuracy of the prediction value is reduced.

Therefore, even if the representative value of the region having a certain prediction reference pixel is significantly different from the representative value of the region to be encoded, the median of the representative value of the region having the prediction reference pixel is encoded. By using the predicted value of the target area, the predicted value can be predicted with a stable accuracy.

In the encoding apparatus according to the present invention, it is desirable that the predicted value calculation means uses the average value of the representative values of the region having the predicted reference pixel as the predicted value of the encoding target region.

According to the above configuration, the predicted value calculation means sets the average value of the representative values of the region having the predicted reference pixel as the predicted value of the encoding target region. Therefore, even when the representative value of an area having a certain prediction reference pixel is significantly different from the representative value of the encoding target area, it is possible to predict the predicted value with stable accuracy. Play.

In the encoding apparatus according to the present invention, it is desirable that the predicted value calculation means use any one of the representative values of the region having the prediction reference pixel as the predicted value of the encoding target region.

According to the above configuration, the predicted value calculation means sets one of the representative values of the region having the predicted reference pixel as the predicted value of the encoding target region.

Here, when the accuracy of the predicted value does not decrease even if the representative value of the region having a certain prediction reference pixel that is significantly different from the representative value of the encoding target region is used as the predicted value of the encoding target region, the above configuration is used. By doing so, there is an additional effect that it is possible to reduce the processing load of the prediction value calculation while maintaining the accuracy of the prediction value.

In other words, any one of the representative values of the area having the prediction reference pixel is set as the prediction value of the encoding target area, and the median value or the average value of the representative values of the area having the prediction reference pixel is encoded. When there is no difference between the predicted value of the target area and the accuracy of the predicted value, the above configuration can reduce the processing load for calculating the predicted value while maintaining the accuracy of the predicted value. There is a further effect of being able to.

In the encoding apparatus according to the present invention, the transmission means further includes a prediction value calculation method indicating a prediction value calculation method executed by the prediction value calculation means in addition to the encoded data and the region information of the image. It is desirable to associate information and transmit it to the outside.

According to the above configuration, in addition to the encoded data and the region information of the image, the transmission unit further includes prediction value calculation method information indicating a prediction value calculation method executed by the prediction value calculation unit. Relate to the outside and transmit. Therefore, even if the device that has received the encoded data, the region information, and the prediction value calculation method information does not know the prediction value calculation method executed by the prediction value calculation unit, the prediction value calculation method By calculating the predicted value based on the information, there is an additional effect that the received encoded data can be accurately decoded.

In the encoding apparatus according to the present invention, the encoding unit is configured to encode the difference value of the encoding target area whose number assigned by the number assigning unit is the earliest instead of the variable length encoding method. It is desirable to encode the representative value of the encoding target area by a fixed-length encoding method.

According to the above configuration, the encoding unit is configured to replace the difference value of the first encoding target area with the number assigned by the number assigning unit using the variable length encoding method instead of encoding the first code. The representative value of the conversion target area is encoded by a fixed-length encoding method. Here, when the representative pixel of the first encoding target area with the number assigned by the number assigning means is located at the end of the image, there is no pixel in the raster scan order from the representative pixel. In this case, since the predicted value of the earliest encoding target area cannot be predicted with high accuracy, the difference value of the earliest encoding target area becomes a very large value. When a large difference value is encoded by the variable length encoding method, the amount of code becomes very large.

Therefore, for the earliest encoding target area where the predicted value cannot be predicted accurately, instead of calculating the difference value and encoding using the variable length encoding method, the representative value of the earliest encoding target area Are encoded by a fixed-length encoding method. Thereby, there is an additional effect that the code amount of the encoded data can be further reduced.

In the encoding device according to the present invention, when the image is a distance image that forms a pair with a texture image, the dividing unit converts the entire area of the texture image into a pixel group included in the area for each area. A division pattern that divides a plurality of regions so that a difference between an average value calculated from pixel values and an average value calculated from pixel values of a pixel group included in a region adjacent to the region is equal to or less than a predetermined threshold value; It is desirable to divide the entire area of the distance image into a plurality of areas with the same division pattern.

Here, when the texture image and the distance image are configured by a pixel group including pixels of similar colors in a certain area in the texture image, the pixel group included in the corresponding area in the distance image is all or substantially omitted. There is a correlation that all pixels tend to take the same distance depth value. Therefore, by dividing the pixels for each range in which the pixel value is constant in the texture image and dividing the distance image in the same section, the distance value becomes substantially constant in each region in the distance image.

According to the above configuration, when the image is a distance image that forms a pair with the texture image, the dividing unit calculates the entire area of the texture image from the pixel values of the pixel group included in the area for each area. The same division pattern as the division pattern that is divided into a plurality of areas in which the difference between the calculated average value and the average value calculated from the pixel values of the pixel group included in the area adjacent to the area is equal to or less than a predetermined threshold value Thus, the entire area of the distance image is divided into a plurality of areas. Therefore, the representative value determining means determines the representative value from the pixel value of each pixel included in each region, thereby reducing the information amount of the distance image and generating data that can restore the distance image with high accuracy. There is a further effect that it is possible.

The encoding device according to the present invention relates to the transmission in which the encoded data of the image generated by the encoding means and the encoded data of the texture image obtained by encoding the texture image are associated and transmitted to the outside. Preferably further means are provided.

According to the above configuration, the transmission unit associates the encoded data of the image generated by the encoding unit with the encoded data of the texture image obtained by encoding the texture image, and externally. To transmit. Therefore, the device that has received the coded data and the coded data of the texture image divides the coded data of the texture image into the plurality of regions by the division pattern, thereby dividing the distance image into the plurality of regions. Can be divided. Therefore, there is an additional effect that the encoded data of the received image can be accurately decoded based on the encoded data of the texture image.

The decoding device according to the present invention preferably further includes receiving means for receiving the encoded data and the region information from the outside.

According to the above configuration, the receiving means receives the encoded data and the region information from the outside. Therefore, even when the decoding device does not hold the region information, an image can be divided into the plurality of regions based on the region information by acquiring the region information from the outside. Therefore, even if the decoding apparatus does not hold the area information, the received encoded data can be accurately decoded.

In the decoding apparatus according to the present invention, it is desirable that the receiving means receives the encoded data encoded by the variable length encoding method in which the code word is shorter as the value to be encoded is closer to 0.

According to the above configuration, the receiving means receives the encoded data encoded by the variable length encoding method in which the code word is shorter as the value to be encoded is closer to 0. As described above, when the encoded data is encoded with high accuracy, the code amount of the encoded data is small. Therefore, the decoding apparatus has an additional effect that the processing load for decoding the encoded data can be reduced.

In the decoding device according to the present invention, the prediction value calculation means includes a prediction reference pixel, a pixel preceding the raster scan order of the representative pixel, and a scan line that is included in the decoding target area and is the same as the representative pixel. And a raster of the last pixel of the scan line that is included in the decoding target area and is the same as the representative pixel, which is adjacent to the pixel of the first pixel in the raster scan order of the representative pixel. It is desirable that the pixel group includes a pixel adjacent to the next pixel in the scan order and a pixel on the scan line one previous in the raster scan order of the representative pixel.

According to the above configuration, the predicted value calculation means includes the prediction reference pixel, the pixel immediately preceding the raster scan order of the representative pixel, and the same scan line as the representative pixel included in the decoding target area. A raster scan of a pixel adjacent to the pixel, the pixel on the previous scan line in the raster scan order of the representative pixel, and the last pixel included in the decoding target area and on the same scan line as the representative pixel A pixel group that includes a pixel adjacent to the next pixel in the order and a pixel on the previous scan line in the raster scan order of the representative pixel.

That is, the pixels in the three directions of the pixel adjacent in the left direction of the decoding target region, the pixel adjacent in the upward direction, and the pixel close to the upper right (the pixel in the right direction of the decoding target region) are used as the prediction reference pixels. Therefore, since the prediction value is calculated with reference to the pixels in the raster scan order before the decoding target region and the pixels in multiple directions, the prediction value can be predicted with high accuracy.

In the decoding device according to the present invention, the prediction value calculation means includes a prediction reference pixel, a pixel preceding the raster scan order of the representative pixel, and a scan line that is included in the decoding target area and is the same as the representative pixel. One of the pixels of the scan line immediately preceding in the raster scan order of the representative pixel and the same scan line as the representative pixel included in the decoding target region. It is desirable that the pixel is adjacent to the next pixel in the raster scan order of the last pixel, and includes the pixel of the previous scan line in the raster scan order of the representative pixel. .

According to the above configuration, the predicted value calculation means includes the prediction reference pixel, the pixel immediately preceding the raster scan order of the representative pixel, and the same scan line as the representative pixel included in the decoding target area. A pixel adjacent to the pixel and one of the pixels of the scan line immediately preceding in the raster scan order of the representative pixel, and the last of the same scan line as the representative pixel included in the decoding target region It is assumed that the pixel is adjacent to the next pixel in the raster scan order of the tail pixel, and includes the pixel of the previous scan line in the raster scan order of the representative pixel.

That is, three pixels in three directions, that is, a pixel adjacent in the left direction of the decoding target area, a pixel adjacent in the upward direction, and a pixel close to the upper right (a pixel in the right direction of the decoding target area) are used as predicted reference pixels. Yes. Therefore, since the prediction reference pixel is a pixel located in multiple directions with respect to the decoding target region and is a pixel group (three pixels) as small as possible, the processing load of the prediction value calculation is reduced and the prediction is performed. There is an additional effect that the value can be predicted with high accuracy.

In the decoding apparatus according to the present invention, it is desirable that the predicted value calculation means uses the median value of the predicted reference pixels as the predicted value of the decoding target area.

According to the above configuration, the predicted value calculation means sets the median value of the predicted reference pixels as the predicted value of the decoding target area.

As described above, even if the representative value of the region having a certain prediction reference pixel is significantly different from the representative value of the decoding target region, the median of the representative value of the region having the prediction reference pixel is By using the predicted value of the decoding target area, there is an additional effect that the predicted value can be predicted with stable accuracy.

In the decoding apparatus according to the present invention, it is desirable that the predicted value calculation means use an average value of the pixel values of the predicted reference pixels as a predicted value of the decoding target region.

According to the above configuration, the predicted value calculation means sets the average value of the pixel values of the predicted reference pixels as the predicted value of the decoding target region.

As described above, even if the representative value of the region having a certain prediction reference pixel is significantly different from the representative value of the region to be decoded, the average value of the representative values of the region having the prediction reference pixel is By using the predicted value of the decoding target area, there is an additional effect that the predicted value can be predicted with stable accuracy.

In the decoding apparatus according to the present invention, it is desirable that the predicted value calculation means uses a pixel value of any pixel included in the predicted reference pixel as a predicted value of the decoding target region.

According to the above configuration, the predicted value calculation means sets the pixel value of any pixel included in the predicted reference pixel as the predicted value of the decoding target region.

Here, even if the representative value of a region having a certain prediction reference pixel that is significantly different from the representative value of the decoding target region is used as the prediction value of the decoding target region, the above configuration is used when the accuracy of the prediction value does not decrease. Thus, there is an additional effect that it is possible to reduce the processing load for calculating the predicted value while maintaining the accuracy of the predicted value.

In the decoding apparatus according to the present invention, in addition to the encoded data and the region information of the image, the reception unit further includes prediction value calculation method information indicating a prediction value calculation method executed by the prediction value calculation unit. Preferably, the prediction value calculation means calculates the prediction value based on the calculation method indicated by the prediction value calculation method information received by the reception means.

According to the above configuration, in addition to the encoded data and the region information of the image, the reception unit further includes prediction value calculation method information indicating a prediction value calculation method executed by the prediction value calculation unit. The prediction value calculation means receives the prediction value based on the calculation method indicated by the prediction value calculation method information received by the reception means. Therefore, even when the decoding device does not know the calculation method of the prediction value executed by the prediction value calculation unit, the decoding device calculates the prediction value based on the prediction value calculation method information, thereby receiving the encoded data. There is an additional effect that can be accurately decoded.

In the decoding apparatus according to the present invention, when the first code word in the encoded data is obtained by encoding the representative value of the earliest encoding target area by the fixed-length encoding method, the decoding means includes The first code word of the encoded data is decoded by a fixed-length encoding method, and the pixel value setting means calculates the pixel values of all the pixels included in the first area in the number order assigned by the number assigning means. It is desirable that the decoding means sets the representative value obtained by decoding the head codeword.

According to the above configuration, when the first code word in the encoded data is obtained by encoding the representative value of the earliest encoding target area by the fixed-length encoding method, the decoding means includes the code The first code word of the coded data is decoded by a fixed-length encoding method, and the pixel value setting means decodes the pixel values of all pixels included in the first area in the number order assigned by the number assigning means. The converting means sets the head codeword to the decoded representative value.

As described above, when the representative value of the earliest encoding target area is encoded by the fixed-length encoding method, the code amount of the encoded data is small. Therefore, the decoding apparatus has an additional effect that the processing load for decoding the encoded data can be reduced.

In the decoding device according to the present invention, when the image is a distance image that forms a pair with a texture image, the receiving means uses, as the region information, encoded data of the texture image obtained by encoding the texture image. The dividing means receives the entire area of the texture image decoded from the encoded data of the texture image, and calculates the average value calculated from the pixel values of the pixel group included in the area for each area and the area. A division pattern that divides a plurality of regions so that a difference from an average value calculated from pixel values of pixel groups included in adjacent regions is a predetermined threshold value or less. It is desirable to divide.

According to said structure, when the said image is a distance image which makes a pair with a texture image, the said receiving means receives the coding data of the said texture image which coded the said texture image as said area | region information. Then, the dividing unit is configured to calculate the entire area of the texture image decoded from the encoded data of the texture image, adjacent to the average value calculated from the pixel value of the pixel group included in the area for each area and the area. This is a division pattern that divides the distance image into a plurality of regions so that the difference from the average value calculated from the pixel values of the pixel group included in the region to be equal to or less than a predetermined threshold value. To do.

As described above, the distance value is substantially constant in each area of the distance image divided by the dividing means. Therefore, by using the representative value of each region, the encoded data can be made into data that has a small code amount and can restore the distance image with high accuracy. Therefore, the decoding device can reconstruct the distance image from the encoded data with high accuracy, and can further reduce the processing load for decoding the encoded data.

Also, an encoding program that causes a computer to function as each unit of the encoding device according to the present invention, a decoding program that causes the computer to function as each unit of the decoding device according to the present invention, and a computer-readable recording of the encoding program A recording medium and a computer-readable recording medium on which a decoding program is recorded are also included in the scope of the present invention.

Furthermore, the data structure of the encoded data of the image, for each of a plurality of regions obtained by dividing the entire region of the image by a predetermined division pattern, a representative value of the pixel value of each pixel included in the region, A difference value that is a difference from the predicted value of the representative value of the region is included, and the difference value is arranged in the order of the numbers given in the raster scan order to the plurality of regions, and the predicted value is The above areas are set as encoding target areas in numerical order, and the first pixel in the raster scan order among the pixels included in the encoding target area is set as the representative pixel, and the same scan as the above representative pixel is included in the encoding target area. Calculated based on at least one representative value of the region having the predicted reference pixel, with the pixel adjacent to the pixel of the line and having the raster scan order before the representative pixel as the predicted reference pixel Data structure of the coded data, wherein the at is also included in the scope of the present invention.

(Program etc.)
Finally, each block included in the moving

image encoding device

1, 1A and the moving

image decoding device

2, 2A may be configured by hardware logic. In addition, each control of the moving

image encoding apparatuses

1 and 1A and the moving

image decoding apparatuses

2 and 2A may be realized by software using a CPU (Central Processing Unit) as follows.

That is, the program code (execution format program, intermediate code program, source program) of the control program that realizes the control of each of the moving

image encoding apparatuses

1 and 1A and the moving

image decoding apparatuses

2 and 2A is recorded so as to be readable by a computer. Just do it. The moving

image encoding device

1, 1A and the moving

image decoding device

2, 2A (or CPU or MPU) may read and execute the program code recorded on the supplied recording medium.

The recording medium for supplying the program code to the moving

image encoding apparatus

1, 1A and the moving

image decoding apparatus

2, 2A is, for example, a tape system such as a magnetic tape or a cassette tape, or a magnetic disk such as a floppy (registered trademark) disk / hard disk. And disk systems including optical disks such as CD-ROM / MO / MD / DVD / CD-R, card systems such as IC cards (including memory cards) / optical cards, mask ROM / EPROM / EEPROM / flash ROM, etc. A semiconductor memory system can be used.

Further, even if the moving

image encoding devices

1 and 1A and the moving

image decoding devices

2 and 2A are configured to be connectable to a communication network, the object of the present invention can be achieved. In this case, the program code is supplied to the moving

image encoding apparatuses

1 and 1A and the moving

image decoding apparatuses

2 and 2A via a communication network. This communication network is not limited to a specific type or form as long as it can supply a program code to the moving

image encoding apparatuses

1 and 1A and the moving

image decoding apparatuses

2 and 2A. For example, the Internet, intranet, extranet, LAN, ISDN, VAN, CATV communication network, mobile communication network, satellite communication network, etc. may be used.

The transmission medium constituting the communication network may be any medium that can transmit the program code, and is not limited to a specific configuration or type. For example, wired communication such as IEEE 1394, USB (Universal Serial Bus), power line carrier, cable TV line, telephone line, ADSL (Asymmetric Digital Subscriber Line) line, infrared such as IrDA or remote control, Bluetooth (registered trademark), 802. 11 wireless, HDR, mobile phone network, satellite line, terrestrial digital network, etc. can also be used.

The present invention is not limited to the above-described embodiments, and various modifications are possible within the scope shown in the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments. Is also included in the technical scope of the present invention.

The present invention can be suitably applied to a content generation device that generates 3D-compatible content, a content playback device that plays back 3D-compatible content, and the like.

DESCRIPTION OF

SYMBOLS

1, 1A moving

image encoding apparatus

2, 2A moving image decoding apparatus 11 Image encoding part 12 Image decoding part (decoding means)
20, 20A Distance

image encoding unit

21, 21A Image division processing unit 21 ′, 21A ′ Image division processing unit (division means)
22, 22A Distance image division processing unit (division means)
23, 23A Distance value correction unit (representative value determining means)
24, 24A Numbering unit (numbering means)
24 ', 24A' number assigning unit (number assigning means, assigning means)
25, 25A Predictive encoding unit (predicted value calculating means, difference value calculating means, encoding means)
28, 28A Packaging part (transmission means)
31, 31A Unpackaging unit (receiving means)
32, 32A Predictive decoding unit (predicted value calculating means, pixel value setting means)

Claims

In an encoding device for encoding an image,
Dividing means for dividing the entire region of the image into a plurality of regions;
For each of the plurality of areas divided by the dividing means, representative value determining means for determining a representative value from the pixel value of each pixel included in the area;
Numbering means for assigning numbers to the plurality of areas in raster scan order;
The above-mentioned area is set as the encoding target area in the order of the numbers given by the above-mentioned number assigning means, and among the pixels included in the encoding target area, the first pixel in the raster scan order is set as the representative pixel, and is included in the encoding target area. A pixel that is adjacent to a pixel on the same scan line as the representative pixel and has a raster scan order before the representative pixel is a predicted reference pixel, and is at least one of the representative values of the region having the predicted reference pixel. Based on a predicted value calculation means for calculating a predicted value of the encoding target region,
Difference value calculating means for subtracting the predicted value calculated by the predicted value calculating means from the representative value determined by the representative value determining means for each encoding target area;
An encoding apparatus comprising: encoding means for encoding the difference values calculated by the difference value calculating means in the order given by the number assigning means and generating encoded data of the image.
And further comprising: transmission means for associating the encoded data of the image generated by the encoding means with the area information defining the plurality of areas, and transmitting the associated information to the outside. Item 4. The encoding device according to Item 1.
3. The encoding according to claim 1, wherein the encoding unit encodes the difference value by a variable-length encoding method in which a code word is shorter as a value to be encoded is closer to 0. 4. apparatus.
The predicted value calculation means includes a prediction reference pixel that includes a pixel immediately preceding the representative pixel in the raster scan order, and a pixel adjacent to a pixel on the same scan line as the representative pixel that is included in the encoding target region. Then, the pixel of the previous scan line in the raster scan order of the representative pixel and the next pixel in the raster scan order of the last pixel included in the encoding target area and the same scan line as the representative pixel. 4. The pixel group according to claim 1, wherein the pixel group includes a pixel adjacent to the first pixel and a pixel on a scan line immediately preceding the representative pixel in the raster scan order. The encoding device described.
The predicted value calculation means includes a prediction reference pixel that includes a pixel immediately preceding the representative pixel in the raster scan order, and a pixel adjacent to a pixel on the same scan line as the representative pixel that is included in the encoding target region. Any one of the pixels on the previous scan line in the raster scan order of the representative pixel, and the last pixel on the same scan line as the representative pixel included in the encoding target area. 2. The pixel adjacent to the next pixel in the scan order and including the pixel on the scan line one previous in the raster scan order of the representative pixel, 4. The encoding device according to any one of 3.
The code according to any one of claims 1 to 5, wherein the predicted value calculating means uses a median of representative values of the area having the predicted reference pixel as a predicted value of the encoding target area. Device.
The code according to any one of claims 1 to 5, wherein the prediction value calculation means uses an average value of the representative values of the area having the prediction reference pixel as a prediction value of the encoding target area. Device.
The code according to any one of claims 1 to 5, wherein the prediction value calculation means uses any one of the representative values of the area having the prediction reference pixel as a prediction value of the encoding target area. Device.
In addition to the encoded data and the region information of the image, the transmission unit further transmits prediction value calculation method information indicating a prediction value calculation method executed by the prediction value calculation unit in association with the outside. The encoding device according to claim 2.
The encoding means, instead of encoding the difference value of the earliest encoding target area with the number given by the number assigning means by the variable length encoding method, represents the representative value of the earliest encoding target area. The encoding apparatus according to claim 3, wherein encoding is performed by a fixed-length encoding method.
If the image is a distance image that is paired with a texture image,
The dividing means calculates the entire area of the texture image from the average value calculated from the pixel value of the pixel group included in the area and the pixel value of the pixel group included in the area adjacent to the area for each area. The entire area of the distance image is divided into a plurality of areas with the same division pattern as the division pattern divided into a plurality of areas such that the difference from the average value is equal to or less than a predetermined threshold. The encoding device according to 1.
The image processing apparatus further comprises transmission means for associating the encoded data of the image generated by the encoding means with the encoded data of the texture image obtained by encoding the texture image and transmitting them to the outside. The encoding device according to claim 11.
For each of a plurality of areas obtained by dividing the entire area of the image with a predetermined division pattern, a difference value that is a difference between the representative value of the pixel value of each pixel included in the area and the predicted value of the representative value of the area is calculated. Encoded data of the image including the difference values are arranged in the order of numbers assigned to the plurality of areas in a raster scan order, and the predicted values are encoded in the order of the numbers. Among the pixels included in the encoding target region as the target region, the first pixel in the raster scan order is the representative pixel, and the pixel that is included in the encoding target region and is close to the pixel on the same scan line as the representative pixel A decoding device that decodes encoded data calculated based on at least one representative value of a region having the predicted reference pixel, with a pixel having a raster scan order before the representative pixel as a predicted reference pixel. There,
A dividing means for dividing the entire area of the image into a plurality of areas based on area information defining the plurality of areas;
Decoding means for decoding the encoded data and generating difference values arranged in order;
Numbering means for assigning numbers in the raster scan order to the plurality of regions divided by the dividing means;
Assigning means for sequentially assigning difference values from the top to the plurality of areas in the order of numbers assigned by the number assigning means;
The region is set as a decoding target region in the order of numbers given by the number assigning means, and among the pixels included in the decoding target region, the first pixel in the raster scan order is set as a representative pixel and is included in the decoding target region. A pixel adjacent to a pixel on the same scan line as the pixel and having a raster scan order before the representative pixel is a predicted reference pixel, and based on a pixel value of at least one of the predicted reference pixels, Predicted value calculation means for calculating a predicted value of the decoding target area;
For each decoding target region, the difference value assigned by the assigning unit is added to the prediction value calculated by the prediction value calculating unit to calculate the pixel value of the decoding target region, and all the decoding target regions are included in the decoding target region. Pixel value setting means for setting the pixel value of the pixel to the calculated pixel value,
The decoding apparatus according to claim 1, wherein the prediction value calculation unit and the pixel value setting unit repeatedly execute the processing for each decoding target area in the order of numbers to restore the pixel values of the image.
14. The decoding apparatus according to claim 13, further comprising receiving means for receiving the encoded data and the region information from the outside.
15. The decoding apparatus according to claim 14, wherein the receiving unit receives the encoded data encoded by a variable length encoding method in which a codeword is shorter as a value to be encoded is closer to 0. .
The predicted value calculation means includes a predicted reference pixel that is adjacent to a pixel in the decoding target area and a pixel on the same scan line as the representative pixel, and a pixel immediately preceding the representative pixel in the raster scan order. In addition, the pixel of the previous scan line in the raster scan order of the representative pixel, and the pixel immediately after in the raster scan order of the last pixel included in the decoding target area and the same scan line as the representative pixel 16. The pixel group according to claim 13, wherein the pixel group includes: a pixel adjacent to the first scan line in a raster scan order of the representative pixel. Decoding device.
The predicted value calculation means includes a predicted reference pixel that is adjacent to a pixel in the decoding target area and a pixel on the same scan line as the representative pixel, and a pixel immediately preceding the representative pixel in the raster scan order. The raster scan order of any one pixel of the previous scan line pixel in the raster scan order of the representative pixel and the last pixel of the same scan line included in the decoding target area as the representative pixel 16. The pixel adjacent to the next pixel in step (1) and including the pixel on the previous scan line in the raster scan order of the representative pixel. The decoding device according to any one of the preceding claims.
The decoding apparatus according to any one of claims 13 to 17, wherein the predicted value calculation means uses a median value of the predicted reference pixels as a predicted value of a decoding target region.
The decoding apparatus according to any one of claims 13 to 17, wherein the predicted value calculation means uses an average value of pixel values of the prediction reference pixels as a predicted value of a decoding target region.
18. The prediction value calculating unit according to claim 13, wherein the pixel value of any pixel included in the prediction reference pixel is set as a prediction value of a decoding target region. Decoding device.
In addition to the encoded data and the region information of the image, the reception unit further receives prediction value calculation method information indicating a calculation method of a prediction value executed by the prediction value calculation unit,
15. The decoding apparatus according to claim 14, wherein the prediction value calculation unit calculates a prediction value based on a calculation method indicated by the prediction value calculation method information received by the reception unit.
When the first code word in the encoded data is a representative value of the first encoding target area encoded by a fixed-length encoding method,
The decoding means decodes the head codeword of the encoded data by a fixed length encoding method,
The pixel value setting means sets the pixel values of all the pixels included in the first area in the number order assigned by the number assigning means to a representative value obtained by decoding the first code word by the decoding means. The decoding device according to claim 15.
If the image is a distance image that is paired with a texture image,
The reception means receives encoded data of the texture image obtained by encoding the texture image as the region information,
The dividing means calculates the total area of the texture image decoded from the encoded image of the texture image, the average value calculated from the pixel values of the pixel group included in the area for each area, and the area adjacent to the area Dividing the entire area of the distance image into a plurality of areas with a division pattern in which the difference from the average value calculated from the pixel values of the pixel group included in the pixel group is a predetermined threshold value or less. The decoding device according to claim 14.
In an encoding method of an encoding device for encoding an image,
In the above encoding device,
A division step of dividing the entire area of the image into a plurality of areas;
For each of the plurality of regions divided in the dividing step, a representative value determining step for determining a representative value from the pixel value of each pixel included in the region;
A numbering step of assigning numbers to the plurality of regions in raster scan order;
The region is an encoding target region in the order of the numbers given in the numbering step, and among the pixels included in the encoding target region, the earliest pixel in the raster scan order is the representative pixel and is included in the encoding target region. A pixel that is adjacent to a pixel on the same scan line as the representative pixel and has a raster scan order before the representative pixel is a predicted reference pixel, and is at least one representative value of a region having the predicted reference pixel Based on the prediction value calculation step of calculating the prediction value of the encoding target region,
A difference value calculating step for calculating a difference value by subtracting the predicted value calculated in the predicted value calculating step from the representative value determined in the representative value determining step for each encoding target region;
A coding step, wherein the difference values calculated in the difference value calculating step are encoded in the order given in the number assigning step and encoded data of the image is generated. Method.
For each of a plurality of areas obtained by dividing the entire area of the image with a predetermined division pattern, a difference value that is a difference between the representative value of the pixel value of each pixel included in the area and the predicted value of the representative value of the area is calculated. Encoded data of the image including the difference values are arranged in the order of numbers assigned to the plurality of areas in a raster scan order, and the predicted values are encoded in the order of the numbers. Among the pixels included in the encoding target region as the target region, the first pixel in the raster scan order is the representative pixel, and the pixel that is included in the encoding target region and is close to the pixel on the same scan line as the representative pixel A decoding device that decodes encoded data calculated based on at least one representative value of a region having the predicted reference pixel, with a pixel having a raster scan order before the representative pixel as a predicted reference pixel A decoding method,
In the decoding device,
A division step of dividing the entire area of the image into a plurality of areas based on area information defining the plurality of areas;
A decoding step of decoding the encoded data and generating difference values arranged in order;
A numbering step for assigning numbers in the raster scan order to the plurality of regions divided in the dividing step;
An assigning step for assigning difference values in order from the top to the plurality of regions in the order of the numbers assigned in the numbering step;
The region is a decoding target region in the order of the numbers given in the numbering step, and among the pixels included in the decoding target region, the first pixel in the raster scan order is the representative pixel, and is included in the decoding target region. A pixel that is adjacent to a pixel on the same scan line as the representative pixel and has a raster scan order before the representative pixel is a predicted reference pixel, and is based on the pixel value of at least one of the predicted reference pixels. A predicted value calculation step of calculating a predicted value of the decoding target area;
For each decoding target area, the pixel value of the decoding target area is calculated by adding the difference value allocated in the allocation step to the prediction value calculated in the prediction value calculating step, and is included in the decoding target area A pixel value setting step for setting the pixel values of all the pixels to be calculated pixel values,
A decoding method characterized by repetitively executing the prediction value calculating step and the pixel value setting step for each decoding target region in the order of numbers to restore the pixel values of the image.
A program that causes a computer to operate as the encoding device according to any one of claims 1 to 12, and that causes the computer to function as each of the above-described means.
A program for causing a computer to operate as the decoding device according to any one of claims 13 to 23, wherein the program causes the computer to function as each of the means.
A computer-readable recording medium on which at least one of the program according to claim 26 and the program according to claim 27 is recorded.
A data structure of encoded data of an image,
For each of a plurality of areas obtained by dividing the entire area of the image with a predetermined division pattern, a difference value that is a difference between the representative value of the pixel value of each pixel included in the area and the predicted value of the representative value of the area Contains
The difference values are arranged in the order of the numbers assigned in the raster scan order to the plurality of areas,
The predicted value is included in the encoding target area, with the area as the encoding target area in the order of the numbers, with the first pixel in the raster scan order as the representative pixel among the pixels included in the encoding target area. Based on at least one of the representative values of the region having the predicted reference pixel, the pixel adjacent to the pixel on the same scan line as the representative pixel and having a raster scan order before the representative pixel is a predicted reference pixel. A data structure of encoded data, which is calculated by