WO2007026440A1 - Image information compression method, image information compression device, and free viewpoint television system - Google Patents

Image information compression method, image information compression device, and free viewpoint television system

Info

Publication number
WO2007026440A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
camera
images
encoding
image information
Prior art date
Application number
PCT/JP2006/304590
Other languages
French (fr)
Japanese (ja)
Inventor
Masayuki Tanimoto
Toshiaki Fujii
Kenji Yamamoto
Original Assignee
National University Corporation Nagoya University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University Corporation Nagoya University filed Critical National University Corporation Nagoya University
Priority to JP2007533118A priority Critical patent/JP4825984B2/en
Publication of WO2007026440A1 publication Critical patent/WO2007026440A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/32 Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N1/32101 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N1/32128 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title attached to the image data, e.g. file header, transmitted message header, information on the same page or in the same computer file as the image
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/90 Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2625 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects, for obtaining an image which is composed of images from a temporal image sequence, e.g. for a stroboscopic effect
    • H04N5/2627 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects, for providing spin image effect, 3D stop motion effect or temporal freeze effect
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N2201/00 Indexing scheme relating to scanning, transmission or reproduction of documents or the like, and to details thereof
    • H04N2201/32 Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N2201/3201 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N2201/3225 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title of data relating to an image, a page or a document
    • H04N2201/3247 Data linking a set of images to one another, e.g. sequence, burst or continuous capture mode

Definitions

  • Image information compression method, image information compression apparatus, and free-viewpoint television system
  • The present invention improves encoding compression efficiency when encoding a plurality of still images acquired by cameras from a plurality of positions on a circumference centered on a subject or from a plurality of positions on a straight line. More specifically, the present invention relates to an image information compression method, an image information compression device, and a free-viewpoint television system using the image information compression device.
  • The inventor of the present application has proposed free viewpoint television (FTV), which allows a viewer to change the viewpoint freely and view a 3D scene as if the viewer were actually there (see, for example, Non-Patent Documents 1 to 4), and has completed FTV experimental equipment in which the viewpoint can be moved freely within the horizontal plane on the basis of real images captured by 15 cameras (see, for example, Non-Patent Document 1).
  • Non-Patent Document 1: Masayuki Tanimoto, "Free Viewpoint Television", Nihon Kogyo Publishing, Image Lab, February 2005, pp. 23-28
  • Non-Patent Document 2: Shinya Oka, Nonon Champurim, Toshiaki Fujii, Masayuki Tanimoto, "Light-Space Information Compression for Free Viewpoint Television", IEICE Technical Report, CS2003-141, pp. 7-12, December 2003
  • Non-Patent Document 3: Masayuki Tanimoto, "5. Free Viewpoint Television FTV: Using Multi-View Image Processing", Journal of the Institute of Image Information and Television Engineers, Vol. 58, No. 7, pp. 898-901, 2004
  • Non-Patent Document 4: Shinya Oka, Nonon Champurim, Toshiaki Fujii, Masayuki Tanimoto, "Compression of the Dynamic Ray Space for Free Viewpoint Television", 3D Image Conference 2004, pp. 139-142, 2004
  • Non-Patent Document 2 (page 9, left column) states: "Because images in the ray space are very similar to each other along both the time axis and the space axis, it is thought that a high compression ratio can be obtained by applying motion (parallax) prediction along both axes."
  • Non-Patent Document 3 mentions "interpolating the ray space" (page 899, left column) and states that "interpolation need only be performed on the necessary part, not on the entire ray space" (page 900, left column).
  • Non-Patent Document 4 (page 140, left column) states that "the dynamic ray space can be expected to have large correlation in the time and space domains", and examples of reference images are shown from the right column of page 140 to the left column of page 141.
  • FIG. 1 is a diagram conceptually showing the basic configuration of an FTV system. The FTV system shown in Fig. 1 performs capture with cameras (step ST1), image interpolation (step ST2 or ST2a), image information compression (step ST3), and display of the image seen from the input viewpoint (steps ST4 and ST5). In the FTV system, image information of a subject 101 existing in three-dimensional real space is acquired by a plurality of cameras (Fig. 1 shows five cameras 102; in practice more cameras are used) (step ST1), and the images acquired by the plurality of cameras (Fig. 1 shows five images 103; in practice more images are used) are arranged together in a ray space 103 to form the FTV signal.
  • In Fig. 1, x denotes the horizontal viewing direction, y the vertical viewing direction, and u (= tan θ) the viewing-zone direction.
  • Possible arrangements of the plurality of cameras 102 include: a linear arrangement in which the cameras are lined up on a straight line facing parallel directions, as shown in Fig. 2(a); a circumferential arrangement (or arc arrangement) in which the cameras are lined up on a circumference facing its inside, as shown in Fig. 2(b); a planar arrangement in which the cameras are lined up on a plane facing parallel directions, as shown in Fig. 2(c); a spherical arrangement (or hemispherical arrangement) in which the cameras are lined up on a sphere facing its inside, as shown in Fig. 2(d); and a cylindrical arrangement in which the cameras are lined up on a cylinder facing its inside, as shown in Fig. 2(e).
  • When only a horizontal free viewpoint is to be realized, the plurality of cameras 102 are placed in the linear arrangement of Fig. 2(a) or the circumferential arrangement of Fig. 2(b); when free viewpoints in both the horizontal and vertical directions are to be realized, they are placed in the planar arrangement of Fig. 2(c), the cylindrical arrangement of Fig. 2(d), or the spherical arrangement of Fig. 2(e).
  • In the ray-space method, one ray in three-dimensional real space is represented by one point in a multidimensional space whose coordinates are the parameters describing that ray. This virtual multidimensional space is called the ray space. The ray space as a whole represents all rays in the 3D space without excess or deficiency. The ray space is created by collecting images captured from many viewpoints; since the value of a point in the ray space equals the pixel value of an image, converting an image into the ray space is a simple coordinate transformation.
  • As shown in Fig. 3(a), a ray 107 passing through a reference plane 106 in real space can be uniquely represented by four parameters: its passing position (x, y) and its passing direction (θ, φ). In Fig. 3(a), X is the horizontal coordinate axis of three-dimensional real space, Y the vertical coordinate axis, and Z the depth coordinate axis. θ is the horizontal angle with respect to the normal of the reference plane 106, that is, the horizontal emission angle relative to the reference plane 106, and φ is the vertical angle with respect to the normal, that is, the vertical emission angle relative to the reference plane 106. Ray information in this three-dimensional real space can therefore be expressed as a luminance f(x, y, θ, φ). Here, to keep the explanation simple, the vertical parallax (angle φ) is ignored. As shown in Fig. 3(a), images captured by many cameras placed horizontally and facing the reference plane 106 then lie, as shown in Fig. 3(b), on cross sections within the three-dimensional space having the axes x, y and u (= tan θ).
  • Since there is no data between the images arranged in the ray space 103, the missing data is created by interpolation (step ST2 or ST2a in Fig. 1). The interpolation need only be performed on the necessary part of the ray space, not on the whole of it. The interpolation is performed on the transmitting side of the image information (step ST2) for applications such as VOD (Video On Demand), and on the receiving side (step ST2a) for applications such as broadcasting.
  • Compression of the image information (step ST3 in Fig. 1) is not indispensable when all components of the FTV system are at the same location, but it becomes indispensable when the cameras and the user are at different locations and the image information is distributed over the Internet or the like.
  • As a conventional image information compression method, there is, for example, a method compliant with the H.264/AVC standard (see, for example, Patent Document 1).
  • Patent Document 1: Japanese Patent Laid-Open No. 2003-348595 (Figs. 1 and 2)
  • Accordingly, an object of the present invention is to provide an image information compression method and an image information compression device capable of improving encoding compression efficiency when encoding a plurality of still images acquired by cameras from a plurality of positions on a circumference centered on a subject or from a plurality of positions on a straight line, and a free viewpoint television system using the image information compression device.
  • The image information compression method of the present invention comprises: a step of acquiring a plurality of still images with cameras from a plurality of positions on a circumference centered on a subject or from a plurality of positions on a straight line; a step of generating a multi-camera still image by arranging the plurality of still images, in the order of the plurality of positions, along the z-axis direction of an orthogonal coordinate system consisting of an x-axis, a y-axis and a z-axis so that adjacent still images face each other; a step of generating a plurality of vertical cross-sectional images by cutting the multi-camera still image with planes perpendicular to the xz plane (containing the x-axis and z-axis) and perpendicular to the xy plane (containing the x-axis and y-axis); and a step of treating each of the plurality of vertical cross-sectional images as one of a plurality of frames arranged in the time-axis direction of a moving picture and encoding the plurality of vertical cross-sectional images using intra-frame coding and inter-frame predictive coding that exploits the correlation between frames.
  • In the image information compression method, the encoding of the plurality of vertical cross-sectional images may include processing compliant with the H.264/AVC standard or processing compliant with the MPEG2 standard.
  • In the image information compression method, when the resolution of the still images acquired by the cameras is higher than a predetermined reference resolution and the interval between the plurality of positions at which the still images are acquired is sparser than a predetermined reference interval, the method may include, instead of the step of generating the vertical cross-sectional images and the step of encoding the plurality of vertical cross-sectional images, a step of treating each of the plurality of camera images acquired by the cameras as one of a plurality of frames arranged in the time-axis direction of a moving picture and encoding the plurality of camera images using intra-frame coding and inter-frame predictive coding that exploits the correlation between frames.
  • The image information compression device of the present invention comprises: multi-camera still image generation means for generating a multi-camera still image by arranging a plurality of still images, acquired by cameras from a plurality of positions on a circumference centered on a subject or from a plurality of positions on a straight line, in the order of the plurality of positions along the z-axis direction of an orthogonal coordinate system consisting of an x-axis, a y-axis and a z-axis so that adjacent still images face each other; vertical cross-sectional image generation means for generating a plurality of vertical cross-sectional images by cutting the multi-camera still image with planes perpendicular to the xz plane and perpendicular to the xy plane; and encoding means for treating each of the plurality of vertical cross-sectional images as one of a plurality of frames arranged in the time-axis direction of a moving picture and encoding the plurality of vertical cross-sectional images using intra-frame coding and inter-frame predictive coding that exploits the correlation between frames.
  • In the image information compression device, the encoding of the plurality of vertical cross-sectional images may include processing compliant with the H.264/AVC standard or processing compliant with the MPEG2 standard.
  • In the image information compression device, when the resolution of the still images acquired by the cameras is higher than a predetermined reference resolution and the interval between the plurality of positions at which the still images are acquired is sparser than a predetermined reference interval, the encoding means may, instead of generating the vertical cross-sectional images and encoding the plurality of vertical cross-sectional images, treat each of the plurality of camera images acquired by the cameras as one of a plurality of frames arranged in the time-axis direction of a moving picture and encode the plurality of camera images using intra-frame coding and inter-frame predictive coding that exploits the correlation between frames.
  • The free viewpoint television system of the present invention comprises: still image acquisition means for acquiring a plurality of still images from a plurality of positions on a circumference centered on a subject or from a plurality of positions on a straight line; the above image information compression device, which encodes the plurality of still images; an image information decoding device that decodes the encoded information output from the image information compression device; a user interface for inputting the viewpoint position of a viewer; and an image information extraction unit for extracting, from the plurality of still images, the image seen from the viewpoint input through the user interface.
  • FIG. 1 is a diagram conceptually showing a basic configuration of an FTV system.
  • FIGS. 2(a) to 2(e) are diagrams showing examples of arrangements of multiple cameras: (a) a linear arrangement, (b) a circumferential arrangement, (c) a planar arrangement, (d) a cylindrical arrangement, and (e) a spherical arrangement.
  • FIG. 3(a) is a diagram showing an object in real space, linearly arranged cameras, a reference plane and rays, and FIG. 3(b) is a diagram showing the ray space.
  • FIG. 4(a) is a diagram showing a ray space, (b) is a diagram showing an image cut out of the ray space, and (c) is a diagram showing another image cut out of the ray space.
  • FIG. 5 is an explanatory diagram conceptually showing the processing up to the generation of a multi-camera still image in the image information compression method of the present invention.
  • FIG. 6 is an explanatory diagram showing the process of cutting a vertical cross-sectional image out of a multi-camera still image.
  • FIG. 7 is an explanatory diagram conceptually showing the encoding of a vertical cross-sectional image.
  • FIGS. 8(a) to 8(c) are explanatory diagrams showing the processing for cutting cross-sectional images out of a multi-camera still image.
  • FIGS. 9(a) to 9(c) are diagrams showing examples of the cross-sectional images of Figs. 8(a) to 8(c).
  • FIGS. 10(a) to 10(c) are explanatory diagrams conceptually showing the encoding of the cross-sectional images of Figs. 8(a) to 8(c).
  • FIGS. 11(a) to 11(c) are graphs showing the results of compression encoding of "flower" for the camera image sequence, the horizontal cross-sectional image sequence and the vertical cross-sectional image sequence, respectively.
  • FIGS. 12(a) and 12(b) are graphs showing the results of an experiment in which multi-camera still images were generated from a plurality of still images acquired by cameras from a plurality of positions on a straight line facing the subject, and compression encoding was applied to the horizontal cross-sectional image sequence and the vertical cross-sectional image sequence.
  • FIG. 13 is a block diagram schematically showing the configuration of an image information encoding device capable of executing the image information compression method of the present invention.
  • FIG. 14 is a flowchart showing the operation of the image information encoding device shown in FIG. 13.
  • FIG. 15 is a block diagram schematically showing the configuration of an image information decoding device capable of decoding image information encoded by the image information compression method of the present invention.
  • FIG. 16 is a flowchart showing the operation of the image information decoding device shown in FIG. 15.
  • FIG. 17 is a diagram conceptually showing the basic configuration of the FTV system of the present invention.
  • FIG. 5 is an explanatory diagram conceptually showing the processing up to the generation of a multi-camera still image in the image information compression method of the present invention; FIG. 6 is an explanatory diagram showing the process of cutting vertical cross-sectional images out of the multi-camera still image; and FIG. 7 is an explanatory diagram conceptually showing the encoding of the vertical cross-sectional images.
  • In the image information compression method of the present invention, first, a plurality of still images 203 are captured by cameras 202 from a plurality of positions on a circumference centered on a subject 201. FIG. 5 illustrates the case where the subject is photographed from a plurality of positions on a circumference centered on the subject 201, but the present invention can also be applied when the still images 203 are acquired by cameras from a plurality of positions arranged on a straight line and facing the subject in the same direction (for example, with the cameras placed as shown in Fig. 2(a) or Fig. 3(a)). The interval between the shooting positions of the cameras is, for example, 1 mm, 10 mm or 100 mm; the camera interval can be chosen freely according to various conditions such as the size of the subject and the distance from the subject to the cameras.
  • Next, a multi-camera still image 204 is generated by arranging the plurality of still images 203, in the order of the shooting positions, along the z-axis direction of an orthogonal coordinate system consisting of an x-axis, a y-axis and a z-axis, so that adjacent still images face each other. This processing is performed, for example, by the pixel rearrangement buffer 303 in FIG. 13. Then, a plurality of vertical cross-sectional images 205 are generated by cutting the multi-camera still image 204 with planes that are perpendicular to the xz plane (containing the x-axis and z-axis) and perpendicular to the xy plane (containing the x-axis and y-axis), that is, planes perpendicular to the x-axis (parallel to the yz plane).
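A minimal sketch of these two steps, assuming the N still images are equally sized arrays; the helper names are illustrative only and do not appear in the patent:

```python
import numpy as np

def make_multicamera_still(still_images):
    """Arrange the N still images 203 (each H x W x 3) along the z axis (Fig. 5)."""
    return np.stack(still_images, axis=0)          # volume[z, y, x, c]

def vertical_cross_sections(volume):
    """Cut the volume with planes perpendicular to the x axis (parallel to yz, Fig. 6).

    Yields W images of shape (H, N, 3); each one is a vertical cross-sectional
    image 205 and is later handed to the encoder as one frame.
    """
    n, h, w, c = volume.shape
    for x in range(w):
        yield volume[:, :, x, :].transpose(1, 0, 2)
```

The W cross-sections produced this way correspond to the images 205 of Figs. 6 and 8(c).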
  • Each of the plurality of vertical cross-sectional images 205 is then treated as one of a plurality of frames arranged in the time-axis direction of a moving picture, and the plurality of vertical cross-sectional images are encoded using intra-frame coding and inter-frame predictive coding that exploits the correlation between frames. For this encoding, for example, processing based on the H.264/AVC standard can be used. H.264/AVC is a type of hybrid coding: it reduces inter-frame redundancy by motion-compensated inter-frame prediction and reduces intra-picture redundancy by a DCT. It can therefore be expected to compress more effectively those moving pictures whose redundancy is easily reduced in this way.
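H.264/AVC itself is far more elaborate, but the mechanism the method relies on, predicting one frame from its neighbour with motion (here, disparity) compensation, can be pictured with a toy full-search block matcher. This sketch is purely illustrative and is not the standard's algorithm:

```python
import numpy as np

def best_match(block, ref, y0, x0, search=8):
    """Toy full-search block matching: find the offset in `ref` that best predicts `block`."""
    h, w = block.shape
    block = block.astype(np.int32)
    best_vec, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue                         # candidate block falls outside the reference
            sad = int(np.abs(block - ref[y:y + h, x:x + w].astype(np.int32)).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_vec = sad, (dy, dx)
    return best_vec, best_sad
```

Because adjacent vertical cross-sectional images differ only by small, smooth displacements, such predictions leave small residuals, which is why treating the cross-section sequence as a moving picture compresses well.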
  • A multi-camera still image has characteristics not found in an ordinary moving picture composed of frames arranged along the time axis, and a high compression rate can be obtained by exploiting those characteristics. The encoding method applicable to the present invention is not limited to H.264/AVC; other encoding methods, such as one compliant with the MPEG2 standard, may be adopted. Experimental results of the compression encoding method using multi-camera still images are described below.
  • FIGS. 8(a) to 8(c) are explanatory diagrams showing how cross-sectional images are cut out of the multi-camera still image 204. There are three ways of cutting a cross section out of the multi-camera still image 204. The first, shown in Fig. 8(a), cuts a cross-sectional image with a plane orthogonal to the z-axis (that is, a plane parallel to the xy plane); the cross-sectional image 203 shown in Fig. 8(a) is referred to as a "camera image". The second, shown in Fig. 8(b), cuts a cross-sectional image with a plane orthogonal to the y-axis (that is, a plane parallel to the xz plane); the cross-sectional image 206 shown in Fig. 8(b) is referred to as a "horizontal cross-sectional image" or "Epipolar Plane Image (EPI)". The third, shown in Fig. 8(c), cuts a cross-sectional image with a plane orthogonal to the x-axis (that is, a plane parallel to the yz plane); the cross-sectional image 205 shown in Fig. 8(c) is referred to as a "vertical cross-sectional image".
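In terms of the stacked volume from the earlier sketch, the three cuts are simply three slicing directions; the following one-liners are illustrative and not code from the patent:

```python
# V is the volume V[z, y, x, c] built by make_multicamera_still(...) above.
def camera_image(V, z):              # Fig. 8(a): plane orthogonal to the z axis
    return V[z]

def horizontal_cross_section(V, y):  # Fig. 8(b): plane orthogonal to the y axis (EPI)
    return V[:, y]

def vertical_cross_section(V, x):    # Fig. 8(c): plane orthogonal to the x axis
    return V[:, :, x].transpose(1, 0, 2)
```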
  • FIGS. 9(a) to 9(c) are diagrams showing examples of the cross-sectional images of Figs. 8(a) to 8(c), taken of a potted flower (hereinafter referred to as "flower"): the horizontal cross-sectional image of Fig. 8(b) is cut out, for example, as shown in Fig. 9(b), and the vertical cross-sectional image of Fig. 8(c) is cut out, for example, as shown in Fig. 9(c). FIGS. 10(a) to 10(c) are explanatory diagrams conceptually showing the encoding of the cross-sectional images of Figs. 8(a) to 8(c). The usual way of compressing multi-camera still images is to treat the camera image sequence, in which the camera images are arranged in order, as a moving picture and to apply H.264/AVC to it.
  • FIGS. 11(a) to 11(c) are graphs showing the results of compression encoding of "flower" for the camera image sequence, the horizontal cross-sectional image sequence and the vertical cross-sectional image sequence, respectively. In Figs. 11(a) to 11(c), the horizontal axis represents the bit rate in bpp (bits per pixel) and the vertical axis represents the PSNR (peak signal-to-noise ratio) in dB.
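For reference, the PSNR plotted on the vertical axis is the usual peak signal-to-noise ratio for 8-bit images; this definition is standard and is not spelled out in the patent:

```python
import numpy as np

def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two 8-bit images."""
    mse = np.mean((original.astype(np.float64) - decoded.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak * peak / mse)
```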
  • The "flower" of Fig. 11(a) was shot at 0.25° camera intervals, the "flower" of Fig. 11(b) at 1° intervals, and the "flower" of Fig. 11(c) at 3° intervals. JM7.3, H.264/AVC encoding software, was used for the compression encoding. As shown in Figs. 11(a) and 11(b), in the practical range of a PSNR of 30 to 40 dB, compression encoding of the vertical cross-sectional image sequence (plotted with black triangles) is effective when the camera interval is about 1° or less. As shown in Fig. 11(c), when the camera interval is about 3° or more, compression encoding of the camera image sequence is effective. These are the results of experiments in which the resolution of the camera images was 400 pixels wide by 288 pixels high. When the resolution is lower (for example, 200 pixels wide by 144 pixels high), compression encoding of the vertical cross-sectional image sequence is effective even when the camera interval is wide.
  • FIGS. 12(a) and 12(b) are graphs showing the results of an experiment in which multi-camera still images were generated from a plurality of still images acquired by cameras from a plurality of positions on a straight line facing the subject, and compression encoding was applied to the horizontal cross-sectional image sequence and the vertical cross-sectional image sequence. Fig. 12(a) shows the experimental results for images acquired with a camera-image resolution of 320 pixels wide by 96 pixels high, and Fig. 12(b) shows the experimental results for images acquired with a camera-image resolution of 128 pixels wide by 96 pixels high and a camera spacing of 4 mm. In Figs. 12(a) and 12(b), the horizontal axis represents the bit rate (bpp) and the vertical axis represents PSNR-Y (dB). As shown in Fig. 12(a), when the resolution is high, compressing the camera image sequence is effective; as shown in Fig. 12(b), when the resolution is low, both compressing the camera image sequence and compressing the vertical cross-sectional image sequence are effective.
  • The compression encoding method of the present invention applies compression encoding to the vertical cross-sectional image sequence, but it may also be configured to compare the results of compression encoding the camera image sequence, the horizontal cross-sectional image sequence and the vertical cross-sectional image sequence, and to execute whichever compression encoding method gives the highest efficiency. Accordingly, taking the efficiency of information compression of the multi-camera images into account, a process may be performed that selects the compression encoding method using the camera image sequence (see Fig. 12(a)) depending on the resolution of the multi-camera images and the density of the camera intervals (that is, on the result of comparison with a predetermined reference resolution and the result of comparison with a predetermined reference interval).
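One possible way to phrase that selection rule as code; the threshold values below are assumptions chosen only to echo the experimental conditions of Fig. 11, not values prescribed by the patent:

```python
def choose_sequence_to_encode(width, height, camera_interval_deg,
                              ref_pixels=400 * 288, ref_interval_deg=3.0):
    """Pick which image sequence to feed to the video encoder.

    ref_pixels and ref_interval_deg stand in for the patent's 'predetermined
    reference resolution' and 'predetermined reference interval'.
    """
    high_resolution = width * height >= ref_pixels
    sparse_cameras = camera_interval_deg >= ref_interval_deg
    if high_resolution and sparse_cameras:
        return "camera image sequence"                 # cf. Fig. 11(c) and Fig. 12(a)
    return "vertical cross-sectional image sequence"   # cf. Figs. 11(a), 11(b) and 12(b)
```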
  • FIG. 13 is a block diagram schematically showing the configuration of an image information encoding device 300 that can execute the image information compression method of the present invention. The image information encoding device 300 comprises N input terminals 301 (N being an integer of 2 or more), N A/D conversion units 302, a pixel rearrangement buffer 303, an adder 304, an orthogonal transform unit 305, a quantization unit 306, a variable encoding unit 307, a storage buffer 308, an output terminal 309 and a rate control unit 310. The image information encoding device 300 further comprises an inverse quantization unit 311, an inverse orthogonal transform unit 312, a frame memory 313 and a motion prediction/compensation unit 315.
  • The image information encoding device 300 shown in FIG. 13 is provided with a plurality of input terminals 301 and A/D conversion units 302 so that it can receive image information from a plurality of cameras. An analog image signal acquired by a camera at each of the plurality of shooting positions is input to the corresponding input terminal 301 of the image information encoding device 300. The cameras are arranged, for example, as shown in Figs. 2(b), (d) and (e). Each of the N analog image signals is converted into a digital image signal by the A/D conversion units 302. The pixel rearrangement buffer 303 of the image information encoding device 300 generates a multi-camera still image from the image information supplied from the A/D conversion units 302, and extracts the vertical cross-sectional images from the multi-camera still image.
  • For an image to be intra-frame coded (intra coding), the pixel rearrangement buffer 303 supplies the image information of the entire frame to the orthogonal transform unit 305. The orthogonal transform unit 305 applies an orthogonal transform such as the discrete cosine transform to the image information and supplies the transform coefficients to the quantization unit 306. The quantization unit 306 quantizes the transform coefficients supplied from the orthogonal transform unit 305. The variable encoding unit 307 determines the encoding mode from the quantized transform coefficients, the quantization scale and the like supplied from the quantization unit 306, applies variable encoding such as variable-length coding or arithmetic coding to this encoding mode, and forms the information to be inserted into the header part of each encoded image. The variable encoding unit 307 then supplies the encoded encoding mode to the storage buffer 308 for storage; the encoded encoding mode is output from the output terminal 309 as compressed image information. The variable encoding unit 307 also applies variable encoding such as variable-length coding or arithmetic coding to the quantized transform coefficients and supplies the encoded transform coefficients to the storage buffer 308 for storage; the encoded transform coefficients are output from the output terminal 309 as compressed image information.
  • The behaviour of the quantization unit 306 is controlled by the rate control unit 310 on the basis of the amount of transform-coefficient data stored in the storage buffer 308. The quantization unit 306 also supplies the quantized transform coefficients to the inverse quantization unit 311, which dequantizes them; the inverse orthogonal transform unit 312 applies an inverse orthogonal transform to the dequantized transform coefficients to generate decoded image information, and supplies that information to the frame memory 313 for storage.
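The intra path just described (orthogonal transform, quantization, and the local decoding loop) can be pictured with a toy round trip on one block. This is a sketch using SciPy's separable DCT, not the integer transform actually specified by H.264/AVC:

```python
import numpy as np
from scipy.fft import dctn, idctn   # assumes SciPy is available

def intra_block_roundtrip(block, qstep=16.0):
    """Toy version of the 305/306 -> 311/312 loop for one image block."""
    coeffs = dctn(block.astype(np.float64), norm="ortho")  # orthogonal transform (unit 305)
    q = np.round(coeffs / qstep)                           # quantization (unit 306)
    coeffs_hat = q * qstep                                 # inverse quantization (unit 311)
    recon = idctn(coeffs_hat, norm="ortho")                # inverse orthogonal transform (unit 312)
    return q, recon   # q goes on to the variable encoding unit 307; recon to the frame memory 313
```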
  • For an image to be inter-frame predictively coded (inter coding), the pixel rearrangement buffer 303 supplies the image information to the motion prediction/compensation unit 315. The motion prediction/compensation unit 315 generates reference image information for the image to be encoded and supplies it to the adder 304, and the adder 304 converts the reference image information into a difference signal with respect to the corresponding image information. At the same time, the motion prediction/compensation unit 315 supplies motion vector information to the variable encoding unit 307. The variable encoding unit 307 determines the encoding mode on the basis of the quantized transform coefficients and quantization scale from the quantization unit 306, the motion vector information supplied from the motion prediction/compensation unit 315, and the like, applies variable encoding such as variable-length coding or arithmetic coding to the determined encoding mode, and generates the information to be inserted into the header part of each encoded image. The variable encoding unit 307 then supplies the encoded encoding mode to the storage buffer 308 for storage; the encoded encoding mode is output as compressed image information. The variable encoding unit 307 also applies variable encoding such as variable-length coding or arithmetic coding to the motion vector information and generates information to be inserted into the header part of each encoded image. In inter coding, the image information input to the orthogonal transform unit 305 is the difference signal obtained from the adder 304; the remaining processing is the same as in image compression by intra coding.
  • FIG. 14 is a flowchart showing the encoding process of the image information encoding device 300 shown in FIG. 13. In the image information encoding device 300, the A/D conversion units 302 first A/D-convert the input analog image signals for all frames, the pixel rearrangement buffer 303 performs the pixel rearrangement (step ST12), and the motion prediction/compensation unit 315 then performs motion prediction/compensation (step ST13). Next, the orthogonal transform unit 305 orthogonally transforms the generated image information (step ST14), the quantization unit 306 and the rate control unit 310 perform quantization and quantization-rate control (steps ST15 and ST16), the variable encoding unit 307 performs variable encoding (step ST17), the inverse quantization unit 311 performs inverse quantization (step ST18), and the inverse orthogonal transform unit 312 performs an inverse orthogonal transform (step ST19). The processing of steps ST13 to ST19 is performed for every block of a predetermined number of pixels in the frame.
  • FIG. 15 is a block diagram schematically showing the configuration of an image information decoding device 400 corresponding to the image information encoding device 300. The image information decoding device 400 comprises an input terminal 401, a storage buffer 402, a variable decoding unit 403, an inverse quantization unit 404, an inverse orthogonal transform unit 405, an adder 406, a pixel rearrangement buffer 407, N D/A conversion units 408 and N output terminals 409, and further comprises a frame memory 410. The image information decoding device 400 shown in FIG. 15 is provided with a plurality of output terminals 409 and a plurality of D/A conversion units 408; the N D/A conversion units 408 are not indispensable.
  • The compressed image information input from the input terminal 401 is temporarily stored in the storage buffer 402 and then transferred to the variable decoding unit 403. The variable decoding unit 403 applies processing such as variable-length decoding or arithmetic decoding to the compressed image information according to the determined format of that information, acquires the encoding-mode information stored in the header part, and supplies it to the inverse quantization unit 404 and the like; it likewise acquires the quantized transform coefficients and supplies them to the inverse quantization unit 404. Furthermore, if the frame to be decoded has been inter-frame coded, the variable decoding unit 403 also decodes the motion vector information stored in the header part of the compressed image information and supplies that information to the motion prediction/compensation unit 412.
  • The inverse quantization unit 404 dequantizes the quantized transform coefficients supplied from the variable decoding unit 403 and supplies the transform coefficients to the inverse orthogonal transform unit 405. The inverse orthogonal transform unit 405 applies an inverse orthogonal transform, such as the inverse discrete cosine transform, to the transform coefficients according to the determined format of the compressed image information. The image information subjected to the inverse orthogonal transform is stored in the pixel rearrangement buffer 407, D/A-converted by the D/A conversion units 408, and output from the output terminals 409. For inter-coded frames, the motion prediction/compensation unit 412 generates a reference image on the basis of the variable-decoded motion vector information and the image information stored in the frame memory 410 and supplies it to the adder 406; the adder 406 combines this reference image with the output of the inverse orthogonal transform unit 405. The remaining processing is the same as for intra-coded frames.
  • FIG. 16 is a flowchart showing the decoding process of the image information decoding device 400 shown in FIG. 15. The image information decoding device 400 performs variable decoding of the input signal (step ST21), inverse quantization (step ST22) and an inverse orthogonal transform (step ST23); if the image information has been encoded with motion prediction/compensation, it decodes it using motion prediction/compensation (step ST24). This processing is performed for every block. Thereafter, pixel rearrangement (step ST25) and D/A conversion (step ST26) are performed.
  • Although the image information encoding device 300 that can execute the image information compression method of the present invention and the image information decoding device 400 that can decode image information encoded by the image information compression method of the present invention have been described above as examples, the devices capable of implementing the image information compression method of the present invention are not limited to those having the above configurations; the image information compression method of the present invention can also be applied to devices having other configurations. Next, an embodiment of an FTV system to which the image information compression method of the present invention is applied will be described.
  • FIG. 17 is a diagram conceptually showing the basic configuration of the FTV system of the present invention. In Fig. 17, components that are the same as or correspond to those shown in Fig. 1 are given the same reference numerals. In the FTV system of Fig. 17, the transmission-side device 350 and the reception-side device 450 are located apart from each other, and signals are transmitted from the transmission-side device 350 to the reception-side device 450 using, for example, the Internet. The transmission-side device 350 comprises a plurality of cameras (shown in FIG. 17) and the image information encoding device 300, having the configuration and functions described in the above embodiment, which compresses and encodes the image information acquired by the cameras. The image information compressed and encoded by the image information encoding device 300 is sent to the reception-side device 450 by a communication device (not shown).
  • The reception-side device 450 comprises a receiving device (not shown), the image information decoding device 400 described in the above embodiment, and means for forming a ray space 103 on the basis of the output of the image information decoding device 400, cutting a cross section out of the ray space 103 according to the viewpoint position input from the user interface 104, and displaying it. As shown in Figs. 3(a) and 3(b) and Figs. 4(a) to 4(c), by using the ray-space method, an image seen from an arbitrary horizontal viewpoint in real space can be generated by cutting an arbitrary surface out of the ray space 103. For example, when the cross section 103a is cut out of the ray space 103 shown in Fig. 4(a), an image such as that of Fig. 4(b) is generated, and when the cross section 103b is cut out of the ray space 103 shown in Fig. 4(a), an image such as that of Fig. 4(c) is generated. Because this FTV system uses the image information compression method described in the above embodiment, the coding efficiency of the FTV signal in the FTV system can be improved.

Abstract

There is provided an image information compression method capable of improving encoding compression efficiency when encoding a plurality of still images captured by cameras (202₁, ...) arranged at a plurality of positions on a circumference around an object (201) or on a straight line. The method includes: a step of capturing still images (203₁, ...) from a plurality of positions on a circumference around the object (201) or on a straight line; a step of arranging the plurality of still images (203₁, ...) in the order of the aforementioned plurality of positions, such that adjacent still images oppose one another, in the z-axis direction of an orthogonal coordinate system, so as to generate a multi-camera still image (204); a step of cutting the multi-camera still image (204) with planes perpendicular to the xz plane and to the xy plane so as to generate a plurality of vertical cross-sectional images (205); and a step of handling each of the vertical cross-sectional images (205) as a frame arranged in the time-axis direction of a moving picture and encoding the vertical cross-sectional images using intra-frame encoding and inter-frame predictive encoding.

Description

Specification

Image information compression method, image information compression apparatus, and free-viewpoint television system

Technical Field
[0001] The present invention relates to an image information compression method and an image information compression device capable of improving encoding compression efficiency when encoding a plurality of still images acquired by cameras from a plurality of positions on a circumference centered on a subject or from a plurality of positions on a straight line, and to a free-viewpoint television system using the image information compression device.
Background Art
[0002] The inventor of the present application has proposed free viewpoint television (FTV), which allows a viewer to change the viewpoint freely and view a 3D scene as if the viewer were actually there (see, for example, Non-Patent Documents 1 to 4), and has completed FTV experimental equipment in which the viewpoint can be moved freely within the horizontal plane on the basis of real images captured by 15 cameras (see, for example, Non-Patent Document 1).
[0003] Non-Patent Document 1: Masayuki Tanimoto, "Free Viewpoint Television", Nihon Kogyo Publishing, Image Lab, February 2005, pp. 23-28
Non-Patent Document 2: Shinya Oka, Nonon Champurim, Toshiaki Fujii, Masayuki Tanimoto, "Light-Space Information Compression for Free Viewpoint Television", IEICE Technical Report, CS2003-141, pp. 7-12, December 2003
Non-Patent Document 3: Masayuki Tanimoto, "5. Free Viewpoint Television FTV: Using Multi-View Image Processing", Journal of the Institute of Image Information and Television Engineers, Vol. 58, No. 7, pp. 898-901, 2004
Non-Patent Document 4: Shinya Oka, Nonon Champurim, Toshiaki Fujii, Masayuki Tanimoto, "Compression of the Dynamic Ray Space for Free Viewpoint Television", 3D Image Conference 2004, pp. 139-142, 2004
[0004] The left column of page 9 of Non-Patent Document 2 states: "Because images in the ray space are very similar to each other along both the time axis and the space axis, it is thought that a high compression ratio can be obtained by applying motion (parallax) prediction along both axes." The left column of page 899 of Non-Patent Document 3 mentions "interpolating the ray space", and the left column of page 900 states that "interpolation need only be performed on the necessary part, not on the entire ray space". The left column of page 140 of Non-Patent Document 4 states that "the dynamic ray space can be expected to have large correlation in the time and space domains", and examples of reference images are shown from the right column of page 140 to the left column of page 141.
[0005] FIG. 1 is a diagram conceptually showing the basic configuration of an FTV system. The FTV system shown in Fig. 1 performs capture with cameras (step ST1), image interpolation (step ST2 or ST2a), image information compression (step ST3), and display of the image seen from the input viewpoint (steps ST4 and ST5). In the FTV system, image information of a subject 101 existing in three-dimensional real space is acquired by a plurality of cameras (Fig. 1 shows five cameras 102; in practice more cameras are used) (step ST1), and the images acquired by the plurality of cameras (Fig. 1 shows five images 103; in practice more images are used) are arranged together in a ray space 103 to form the FTV signal. In Fig. 1, x denotes the horizontal viewing direction, y the vertical viewing direction, and u (= tan θ) the viewing-zone direction. Possible arrangements of the plurality of cameras 102 include: a linear arrangement in which the cameras are lined up on a straight line facing parallel directions, as shown in Fig. 2(a); a circumferential arrangement (or arc arrangement) in which the cameras are lined up on a circumference facing its inside, as shown in Fig. 2(b); a planar arrangement in which the cameras are lined up on a plane facing parallel directions, as shown in Fig. 2(c); a spherical arrangement (or hemispherical arrangement) in which the cameras are lined up on a sphere facing its inside, as shown in Fig. 2(d); and a cylindrical arrangement in which the cameras are lined up on a cylinder facing its inside, as shown in Fig. 2(e). When only a horizontal free viewpoint is to be realized, the plurality of cameras 102 are placed in the linear arrangement of Fig. 2(a) or the circumferential arrangement of Fig. 2(b); when free viewpoints in both the horizontal and vertical directions are to be realized, they are placed in the planar arrangement of Fig. 2(c), the cylindrical arrangement of Fig. 2(d), or the spherical arrangement of Fig. 2(e).
[0006] In the ray-space method, one ray in three-dimensional real space is represented by one point in a multidimensional space whose coordinates are the parameters describing that ray. This virtual multidimensional space is called the ray space. The ray space as a whole represents all rays in the 3D space without excess or deficiency. The ray space is created by collecting images captured from many viewpoints; since the value of a point in the ray space equals the pixel value of an image, converting an image into the ray space is a simple coordinate transformation. As shown in Fig. 3(a), a ray 107 passing through a reference plane 106 in real space can be uniquely represented by four parameters: its passing position (x, y) and its passing direction (θ, φ). In Fig. 3(a), X is the horizontal coordinate axis of three-dimensional real space, Y the vertical coordinate axis and Z the depth coordinate axis; θ is the horizontal angle with respect to the normal of the reference plane 106, that is, the horizontal emission angle relative to the reference plane 106, and φ is the vertical angle with respect to the normal, that is, the vertical emission angle relative to the reference plane 106. Ray information in this three-dimensional real space can thus be expressed as a luminance f(x, y, θ, φ). Here, to keep the explanation simple, the vertical parallax (angle φ) is ignored. As shown in Fig. 3(a), images captured by many cameras placed horizontally and facing the reference plane 106 lie, as shown in Fig. 3(b), on the cross sections drawn with dotted lines in the three-dimensional space having the axes x, y and u (= tan θ). By cutting an arbitrary surface out of the ray space 103 shown in Fig. 3(b), an image seen from an arbitrary horizontal viewpoint in real space can be generated. For example, when the cross section 103a is cut out of the ray space 103 shown in Fig. 4(a), an image such as that of Fig. 4(b) is displayed on the display 105, and when the cross section 103b is cut out of the ray space 103 shown in Fig. 4(a), an image such as that of Fig. 4(c) is displayed on the display 105.
[0007] Since there is no data between the images (cross sections 103) arranged in the ray space 103, the missing data is created by interpolation (step ST2 or ST2a in Fig. 1). The interpolation need only be performed on the necessary part of the ray space, not on the whole of it. The interpolation is performed on the transmitting side of the image information (step ST2) for applications such as VOD (Video On Demand), and on the receiving side (step ST2a) for applications such as broadcasting.
[0008] Compression of the image information (step ST3 in Fig. 1) is not indispensable when all components of the FTV system are at the same location, but it becomes indispensable when the cameras and the user are at different locations and the image information is distributed over the Internet or the like. As a conventional image information compression method, there is, for example, one compliant with the H.264/AVC standard (see, for example, Patent Document 1).

Patent Document 1: Japanese Patent Laid-Open No. 2003-348595 (Figs. 1 and 2)
Disclosure of the Invention

Problems to be Solved by the Invention
[0009] In recent years, for example for the archival preservation of historical or artistic cultural heritage, attention has been drawn to techniques in which a subject is placed at the center, the subject is photographed from a plurality of positions on a circumference centered on it or from a plurality of positions on a straight line, and free-viewpoint images are created from the plurality of still images obtained. However, while the above documents disclose methods that treat the images of a plurality of cameras lined up on the same horizontal line or the same vertical line as frames, they do not disclose an efficient method of compressing the image information when encoding a plurality of still images acquired by cameras from a plurality of positions on a circumference centered on a subject. The development of an efficient method of compressing the image information when encoding a plurality of still images acquired by cameras from a plurality of positions on a straight line is also desired.
[0010] Accordingly, an object of the present invention is to provide an image information compression method and an image information compression device capable of improving encoding compression efficiency when encoding a plurality of still images acquired by cameras from a plurality of positions on a circumference centered on a subject or from a plurality of positions on a straight line, and a free viewpoint television system using the image information compression device.
Means for Solving the Problems
[0011] The image information compression method of the present invention comprises:
a step of acquiring a plurality of still images with cameras from a plurality of positions on a circumference centered on a subject or from a plurality of positions on a straight line;
a step of generating a multi-camera still image by arranging the plurality of still images, in the order of the plurality of positions, along the z-axis direction of an orthogonal coordinate system consisting of an x-axis, a y-axis and a z-axis so that adjacent still images face each other;
a step of generating a plurality of vertical cross-sectional images by cutting the multi-camera still image with planes perpendicular to the xz plane containing the x-axis and z-axis and perpendicular to the xy plane containing the x-axis and y-axis; and
a step of treating each of the plurality of vertical cross-sectional images as one of a plurality of frames arranged in the time-axis direction of a moving picture and encoding the plurality of vertical cross-sectional images using intra-frame coding and inter-frame predictive coding that exploits the correlation between frames.
[0012] In the image information compression method, the encoding of the plurality of vertical cross-sectional images may include processing compliant with the H.264/AVC standard or processing compliant with the MPEG2 standard.
[0013] In the image information compression method, when the resolution of the still images acquired by the cameras is higher than a predetermined reference resolution and the interval between the plurality of positions at which the still images are acquired is sparser than a predetermined reference interval, the method may include, instead of the step of generating the vertical cross-sectional images and the step of encoding the plurality of vertical cross-sectional images, a step of treating each of the plurality of camera images acquired by the cameras as one of a plurality of frames arranged in the time-axis direction of a moving picture and encoding the plurality of camera images using intra-frame coding and inter-frame predictive coding that exploits the correlation between frames.
[0014] An image information compression device according to the present invention comprises:
multi-camera still image generation means for generating a multi-camera still image by arranging a plurality of still images, acquired by a camera from a plurality of positions on a circumference centered on a subject or from a plurality of positions on a straight line, in the order of the plurality of positions along the z-axis direction of an orthogonal coordinate system consisting of an x-axis, a y-axis, and a z-axis so that adjacent still images face each other;
vertical cross-sectional image generation means for generating a plurality of vertical cross-sectional images by cutting the multi-camera still image along planes that are perpendicular to the xz plane containing the x-axis and the z-axis and perpendicular to the xy plane containing the x-axis and the y-axis; and
encoding means for treating each of the plurality of vertical cross-sectional images as one of a plurality of frames arranged in the time-axis direction of a moving image and encoding the plurality of vertical cross-sectional images using intra-frame coding and inter-frame predictive coding that exploits the correlation between frames.
[0015] In the image information compression device, the encoding of the plurality of vertical cross-sectional images includes processing conforming to the H.264/AVC standard or processing conforming to the MPEG2 standard.
[0016] In the image information compression device, when the resolution of the still images acquired by the camera is higher than a predetermined reference resolution and the interval between the plurality of positions at which the still images are acquired is sparser than a predetermined reference interval, the encoding means may, in place of generating the vertical cross-sectional images and encoding the plurality of vertical cross-sectional images, treat each of the plurality of camera images acquired by the camera as one of a plurality of frames arranged in the time-axis direction of a moving image and encode the plurality of camera images using intra-frame coding and inter-frame predictive coding that exploits the correlation between frames.
[0017] A free-viewpoint television system according to the present invention comprises:
still image acquisition means for acquiring a plurality of still images from a plurality of positions on a circumference centered on a subject or from a plurality of positions on a straight line;
the above image information compression device, which encodes the plurality of still images;
an image information decoding device that decodes the encoded information output from the image information compression device;
a user interface for inputting the viewpoint position of a viewer; and
an image information extraction unit that extracts, from the plurality of still images, an image seen from the viewpoint input through the user interface.
Effects of the Invention
[0018] According to the image information compression method, the image information compression device, and the FTV system of the present invention, encoding compression efficiency can be improved by applying to the plurality of vertical cross-sectional images of a multi-camera still image an encoding process similar to that applied to a moving image.
Brief Description of the Drawings
[0019] [Fig. 1] A diagram conceptually showing the basic configuration of an FTV system.
[Fig. 2] (a) to (e) are diagrams showing examples of the arrangement of a plurality of cameras: (a) a linear arrangement, (b) a circumferential arrangement, (c) a planar arrangement, (d) a cylindrical arrangement, and (e) a spherical arrangement.
[Fig. 3] (a) is a diagram showing an object in real space, linearly arranged cameras, a reference plane, and light rays; (b) is a diagram showing the ray space.
[Fig. 4] (a) is a diagram showing the ray space, (b) is a diagram showing an image cut out from the ray space, and (c) is a diagram showing another image cut out from the ray space.
[Fig. 5] An explanatory diagram conceptually showing the processing of the image information compression method of the present invention up to the generation of a multi-camera still image.
[Fig. 6] An explanatory diagram showing the process of cutting out vertical cross-sectional images from a multi-camera still image.
[Fig. 7] An explanatory diagram conceptually showing the encoding of the vertical cross-sectional images.
[Fig. 8] (a) to (c) are explanatory diagrams showing processes for cutting out cross-sectional images from a multi-camera still image.
[Fig. 9] (a) to (c) are diagrams showing examples of the cross-sectional images of Fig. 8 (a) to (c).
[Fig. 10] (a) to (c) are explanatory diagrams conceptually showing the encoding of the cross-sectional images of Fig. 8 (a) to (c).
[Fig. 11] (a) to (c) are graphs showing the results of compression-encoding "flower" as a camera image sequence, a horizontal cross-sectional image sequence, and a vertical cross-sectional image sequence, respectively.
[Fig. 12] (a) and (b) are graphs showing the results of experiments in which a multi-camera still image generated from a plurality of still images, acquired by a camera from a plurality of positions on a straight line facing the subject, was compression-encoded as a camera image sequence, a horizontal cross-sectional image sequence, and a vertical cross-sectional image sequence.
[Fig. 13] A block diagram schematically showing the configuration of an image information encoding device capable of carrying out the image information compression method of the present invention.
[Fig. 14] A flowchart showing the operation of the image information encoding device shown in Fig. 13.
[Fig. 15] A block diagram schematically showing the configuration of an image information decoding device capable of decoding image information encoded by the image information compression method of the present invention.
[Fig. 16] A flowchart showing the operation of the image information decoding device shown in Fig. 15.
[Fig. 17] A diagram conceptually showing the basic configuration of the FTV system of the present invention.
Explanation of Reference Numerals
201  Subject
202_1, 202_2, 202_3, 202_4, ...  Cameras
203  Camera image
203_1, 203_2, 203_3, 203_4, ...  Camera image sequence
204  Multi-camera still image
205  Vertical cross-sectional image
205_1, 205_2, 205_3, ...  Vertical cross-sectional image sequence
206  Horizontal cross-sectional image (EPI)
300  Image information encoding device
301_1 to 301_N  Input terminals
302_1 to 302_N  A/D conversion units
303  Pixel rearrangement buffer
304  Adder
305  Orthogonal transform unit
306  Quantization unit
307  Variable encoding unit
308  Accumulation buffer
309  Output terminal
310  Rate control unit
311  Inverse quantization unit
312  Inverse orthogonal transform unit
313  Frame memory
315  Motion prediction/compensation unit
350  Transmission-side device of the FTV system
400  Image information decoding device
401  Input terminal
402  Accumulation buffer
403  Variable decoding unit
404  Inverse quantization unit
405  Inverse orthogonal transform unit
406  Adder
407  Pixel rearrangement buffer
408_1 to 408_N  D/A conversion units
409_1 to 409_N  Output terminals
410  Frame memory
412  Motion prediction/compensation unit
450  Receiving-side device of the FTV system
451  Image information extraction unit
Best Mode for Carrying Out the Invention
[0021] <Description of the Image Information Compression Method of the Present Invention>
Fig. 5 is an explanatory diagram conceptually showing the processing of the image information compression method of the present invention up to the generation of a multi-camera still image. Fig. 6 is an explanatory diagram showing the process of cutting out vertical cross-sectional images from the multi-camera still image, and Fig. 7 is an explanatory diagram conceptually showing the encoding of the vertical cross-sectional images.
[0022] In the image information compression method of the present invention, as shown in Fig. 5, a plurality of still images 203_1, 203_2, 203_3, 203_4, ... are acquired by a plurality of cameras from a plurality of positions on a circumference centered on the subject 201. For the acquisition of the plurality of still images, a plurality of cameras (#1, #2, #3, #4, ...) 202_1, 202_2, 202_3, 202_4, ... arranged on the circumference centered on the subject 201 and facing inward (that is, toward the subject 201) are used. The cameras 202_1, 202_2, 202_3, 202_4, ... are placed at a predetermined angular interval (for example, 0.25°, 1°, or 3°) on the circumference centered on the subject 201. When the subject 201 is stationary, however, the plurality of still images may instead be acquired by moving a single camera facing the subject 201, using a moving mechanism that shifts it along the circumference centered on the subject 201 by a predetermined angle at a time (for example, 0.25°, 1°, or 3° at a time), and photographing at each position. Although Fig. 5 illustrates the case where the subject is photographed from a plurality of positions on a circumference centered on the subject 201, the present invention is also applicable to the case where the plurality of still images 203_1, 203_2, 203_3, 203_4, ... are acquired by camera photographing from a plurality of positions arranged on a straight line facing the subject and oriented in the same direction (for example, with the cameras arranged as in Fig. 2 (a) or Fig. 3 (a)). In this case the interval between the photographing positions is, for example, 1 mm, 10 mm, or 100 mm, but the camera interval may be chosen freely on the basis of conditions such as the size of the subject and the distance from the subject to the cameras.
[0023] Next, as shown in Fig. 5, a multi-camera still image 204 is generated by arranging the plurality of still images 203_1, 203_2, 203_3, 203_4, ..., in the order of the photographing positions, along the z-axis direction of an orthogonal coordinate system consisting of an x-axis, a y-axis, and a z-axis so that adjacent still images face each other. This processing is performed by the pixel rearrangement buffer 303 of Fig. 13, described later.
[0024] Next, as shown in Fig. 5 and Fig. 6, a plurality of vertical cross-sectional images 205 are generated by cutting the multi-camera still image 204 along planes that are perpendicular to the xz plane containing the x-axis and the z-axis and perpendicular to the xy plane containing the x-axis and the y-axis (that is, planes orthogonal to the x-axis, in other words parallel to the yz plane).
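To make the stacking and slicing of paragraphs [0023] and [0024] concrete, the following sketch (not taken from the patent; the NumPy-based layout and the helper names are assumptions introduced here) stacks N camera images into a (z, y, x) volume and extracts the vertical cross-sectional images by fixing each x column:

    import numpy as np

    def build_multicamera_volume(camera_images):
        # camera_images: list of N still images of identical size (H, W),
        # given in the order of the photographing positions.
        # Stacking along a new leading axis yields a volume indexed as (z, y, x).
        return np.stack(camera_images, axis=0)

    def vertical_cross_sections(volume):
        # A vertical cross-sectional image fixes one x column and gathers it
        # across all cameras (z) and all rows (y): W images of size (H, N).
        n_cams, height, width = volume.shape
        return [volume[:, :, i].swapaxes(0, 1) for i in range(width)]

    # Example with synthetic data: 100 camera positions, 288 x 400 frames.
    cams = [np.random.randint(0, 256, (288, 400), dtype=np.uint8) for _ in range(100)]
    volume = build_multicamera_volume(cams)   # shape (100, 288, 400)
    slices = vertical_cross_sections(volume)  # 400 slices of shape (288, 100)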
[0025] Next, as shown in Fig. 7, each of the plurality of vertical cross-sectional images 205 (denoted 205_1, 205_2, 205_3, ... in Fig. 7) is treated as one of a plurality of frames arranged in the time-axis direction of a moving image, and the plurality of vertical cross-sectional images 205 are encoded using intra-frame coding and inter-frame predictive coding that exploits the correlation between frames. As this encoding, for example, processing conforming to the H.264/AVC standard can be used.
[0026] H.264/AVC is a kind of hybrid coding, a compression scheme that reduces inter-frame redundancy by motion-compensated inter-frame prediction and reduces intra-picture redundancy by a DCT-based transform. The more readily a moving image's redundancy is reduced by these tools, the more effective the compression that can be expected. A multi-camera still image has characteristics not found in an ordinary moving image consisting of frames arranged along the time axis, and by exploiting these characteristics a high compression ratio can be obtained. The coding scheme applicable to the present invention is not limited to H.264/AVC; other hybrid coding schemes, such as one conforming to the MPEG2 standard, may also be adopted. Experimental results for the compression method using multi-camera still images are described below.
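One way to realize "treat each vertical cross-sectional image as a frame of a moving image" in practice, shown here only as an illustration and not as the encoder of Fig. 13, is to pipe the slice sequence into an off-the-shelf H.264 encoder as raw video. The sketch below assumes an ffmpeg binary with libx264 is available; the frame rate, file name, and grayscale pixel format are placeholder choices.

    import subprocess
    import numpy as np

    def encode_slices_h264(slices, width, height, out_path="slices.mp4"):
        # slices: iterable of (height, width) uint8 grayscale images, ordered
        # along the x-axis and fed to the encoder as consecutive video frames.
        cmd = ["ffmpeg", "-y",
               "-f", "rawvideo", "-pix_fmt", "gray",
               "-s", f"{width}x{height}", "-r", "30", "-i", "-",
               "-c:v", "libx264", out_path]
        proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
        for s in slices:
            proc.stdin.write(np.ascontiguousarray(s, dtype=np.uint8).tobytes())
        proc.stdin.close()
        proc.wait()

    # With the slices from the previous sketch (each of shape (288, 100)):
    # encode_slices_h264(slices, width=100, height=288)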
[0027] <Comparison of Encoding and Compression Methods for Multi-Camera Still Images>
Fig. 8 (a) to (c) are explanatory diagrams showing processes for cutting out cross-sectional images from the multi-camera still image 204. As shown in Fig. 8 (a) to (c), there are three typical ways to cut cross-sectional images out of the multi-camera still image 204. The first, shown in Fig. 8 (a), cuts cross-sectional images along planes orthogonal to the z-axis (that is, parallel to the xy plane); the cross-sectional image 203 of Fig. 8 (a) is called a "camera image". The second, shown in Fig. 8 (b), cuts cross-sectional images along planes orthogonal to the y-axis (that is, parallel to the xz plane); the cross-sectional image 206 of Fig. 8 (b) is called a "horizontal cross-sectional image" or "Epipolar Plane Image (EPI)". The third, shown in Fig. 8 (c), cuts cross-sectional images along planes orthogonal to the x-axis (that is, parallel to the yz plane); the cross-sectional image 205 of Fig. 8 (c) is called a "vertical cross-sectional image".
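In terms of the (z, y, x) volume used in the earlier sketch, the three families of cross sections correspond simply to the three slicing axes; vertical_cross_sections was given above, and the other two can be sketched as follows (again an illustration, not the patent's implementation):

    def camera_images(volume):
        # Planes orthogonal to the z-axis: one (H, W) image per camera position.
        return [volume[k, :, :] for k in range(volume.shape[0])]

    def horizontal_cross_sections(volume):
        # Planes orthogonal to the y-axis (epipolar plane images, EPIs):
        # one (N, W) image per image row.
        return [volume[:, j, :] for j in range(volume.shape[1])]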
[0028] Fig. 9 (a) to (c) show examples of the cross-sectional images of Fig. 8 (a) to (c). When a potted flower (hereinafter "flower") is photographed and a multi-camera still image is generated, cutting out the camera image of Fig. 8 (a) gives, for example, the image of Fig. 9 (a); cutting out the horizontal cross-sectional image of Fig. 8 (b) gives, for example, the image of Fig. 9 (b); and cutting out the vertical cross-sectional image of Fig. 8 (c) gives, for example, the image of Fig. 9 (c).
[0029] Fig. 10 (a) to (c) conceptually show the encoding of the cross-sectional images of Fig. 8 (a) to (c). The compression method that comes to mind first for a multi-camera still image is, as shown in Fig. 10 (a), to treat the sequence of camera images arranged in order as a moving image and apply H.264/AVC. The results of comparing this method with the method of Fig. 10 (b), in which H.264/AVC is applied to the sequence of horizontal cross-sectional images arranged in order, and the method of Fig. 10 (c), in which H.264/AVC is applied to the sequence of vertical cross-sectional images arranged in order, are given below. The coding methods of Fig. 10 (b) and (c) merely change the cutting plane and rearrange the data without resampling, so applying them does not in itself degrade image quality.
[0030] Fig. 11 (a) to (c) are graphs showing the results of compression-encoding "flower" as a camera image sequence, a horizontal cross-sectional image sequence, and a vertical cross-sectional image sequence, respectively. In Fig. 11 (a) to (c), the horizontal axis shows the bit rate (bpp, bits per pixel) and the vertical axis shows the PSNR (peak signal-to-noise ratio) in dB. To measure the influence of the camera interval as well, experiments were carried out with "flower" photographed at 0.25° intervals (Fig. 11 (a)), at 1° intervals (Fig. 11 (b)), and at 3° intervals (Fig. 11 (c)). The software used for the compression encoding was JM7.3, H.264/AVC encoding software.
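For reference, the PSNR plotted on these axes is the usual definition based on the mean squared error between the original and the reconstructed image; the snippet below is a standard formulation for 8-bit data, not code taken from the experiments.

    import numpy as np

    def psnr(original, reconstructed, peak=255.0):
        # PSNR = 10 * log10(peak^2 / MSE), expressed in dB.
        err = original.astype(np.float64) - reconstructed.astype(np.float64)
        mse = np.mean(err ** 2)
        if mse == 0:
            return float("inf")
        return 10.0 * np.log10(peak * peak / mse)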
[0031] As shown in Fig. 11 (a) and (b), in the practical range where the PSNR is 30 to 40 dB, compression encoding of the vertical cross-sectional image sequence (plotted with black triangles) is effective when the camera interval is about 1° or less, and, as shown in Fig. 11 (c), compression encoding of the camera image sequence becomes effective when the camera interval is about 3° or more. These results were obtained with a camera image resolution of 400 pixels horizontally by 288 pixels vertically. At lower resolutions (for example, 200 pixels horizontally by 144 pixels vertically), compression encoding of the vertical cross-sectional image sequence is effective even when the camera interval is wider.
[0032] Fig. 12 (a) and (b) are graphs showing the results of experiments in which a multi-camera still image generated from a plurality of still images, acquired by a camera from a plurality of positions on a straight line facing the subject, was compression-encoded as a camera image sequence, a horizontal cross-sectional image sequence, and a vertical cross-sectional image sequence. Fig. 12 (a) shows the results for images acquired with a camera image resolution of 320 pixels horizontally by 96 pixels vertically and a camera interval of ... mm. Fig. 12 (b) shows the results for images acquired with a camera image resolution of 128 pixels horizontally by 96 pixels vertically and a camera interval of 4 mm. In Fig. 12 (a) and (b), the horizontal axis shows the bit rate (bpp) and the vertical axis shows PSNRy (dB). As shown in Fig. 12 (a), when the resolution is high the scheme that compression-encodes the camera image sequence is effective, whereas, as shown in Fig. 12 (b), when the resolution is low both the scheme that compression-encodes the camera image sequence and the scheme that compression-encodes the vertical cross-sectional image sequence become effective.
[0033] Thus, when the camera images have high resolution and the camera interval is sparse, the camera image sequence is effective, but as the camera images become lower in resolution and the camera interval becomes denser, the vertical cross-sectional image sequence becomes effective. Furthermore, when an experiment was carried out at a resolution reduced below that of Fig. 12 (b), results (not shown as a graph) were obtained indicating that the scheme that compression-encodes the vertical cross-sectional image sequence is more effective than the scheme that compression-encodes the camera image sequence. The graphs of Fig. 12 (a) and (b) show the results of reducing the horizontal (x-axis direction) resolution (from 320 pixels to 128 pixels) with the vertical (y-axis direction) pixel count fixed at 96 pixels; similar results were found when the vertical (y-axis direction) resolution was reduced with the horizontal (x-axis direction) pixel count fixed.
[0034] As shown in Fig. 5 to Fig. 7, the compression encoding method of the present invention performs compression encoding on the vertical cross-sectional image sequence, but it may also be configured to compare the results of compression-encoding the camera image sequence, the horizontal cross-sectional image sequence, and the vertical cross-sectional image sequence and to execute the compression encoding method with the highest compression efficiency. Accordingly, taking the efficiency of information compression of the multi-camera images into account, processing may be performed that selects the scheme that compression-encodes the camera image sequence (see Fig. 12 (a)) according to whether the resolution of the multi-camera images is high or low and whether the camera interval is sparse or dense (that is, based on a comparison with a predetermined reference resolution and a comparison with a predetermined reference interval). Similarly, processing may be performed that selects, according to the resolution of the multi-camera images and the density of the camera interval (that is, based on the comparison with the predetermined reference resolution and the comparison with the predetermined reference interval), either the scheme that compression-encodes the camera image sequence (see Fig. 12 (a)) or the scheme that compression-encodes the vertical cross-sectional image sequence (see Fig. 12 (b)). Further, taking the efficiency of information compression of the multi-camera images into account, processing may be performed that selects the scheme that compression-encodes the vertical cross-sectional image sequence according to the resolution of the multi-camera images and the density of the camera interval. Here, the "predetermined reference resolution" and the "predetermined reference interval" are a resolution and an interval determined according to the still images; in the case of "flower", 400 pixels horizontally by 288 pixels vertically can be given as an example of the resolution and 1 degree as an example of the interval.
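A minimal sketch of the selection logic described in this paragraph is given below; the threshold values are the "flower" examples quoted above, and the function name and return labels are assumptions introduced here rather than part of the patent.

    def choose_encoding_sequence(width, height, camera_interval_deg,
                                 ref_width=400, ref_height=288,
                                 ref_interval_deg=1.0):
        # Camera image sequence when the images are higher resolution than the
        # reference and the camera interval is sparser than the reference;
        # otherwise encode the vertical cross-sectional image sequence.
        high_resolution = width > ref_width and height > ref_height
        sparse_interval = camera_interval_deg > ref_interval_deg
        if high_resolution and sparse_interval:
            return "camera_image_sequence"
        return "vertical_cross_section_sequence"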
[0035] <Description of an Image Information Compression Device That Implements the Image Information Compression Method of the Present Invention>
Fig. 13 is a block diagram schematically showing the configuration of an image information encoding device 300 capable of carrying out the image information compression method of the present invention.
[0036] As shown in Fig. 13, the image information encoding device 300 comprises N input terminals 301_1 to 301_N (N is an integer of 2 or more), N A/D conversion units 302_1 to 302_N, a pixel rearrangement buffer 303, an adder 304, an orthogonal transform unit 305, a quantization unit 306, a variable encoding unit 307, an accumulation buffer 308, an output terminal 309, and a rate control unit 310. The image information encoding device 300 further comprises an inverse quantization unit 311, an inverse orthogonal transform unit 312, a frame memory 313, and a motion prediction/compensation unit 315. The image information encoding device 300 shown in Fig. 13 is provided with a plurality of input terminals 301_1 to 301_N and A/D conversion units 302_1 to 302_N so that it can receive image information from a plurality of cameras; when the subject is photographed by moving a single camera along a circumference centered on the subject or along a straight line facing the subject, one input terminal and one A/D conversion unit suffice.
[0037] Analog video signals acquired by cameras at a plurality of photographing positions are input to the respective input terminals 301_1 to 301_N of the image information encoding device 300. The camera arrangement is, for example, one of those shown in Fig. 2 (b), (d), and (e). The analog video signals input to the input terminals 301_1 to 301_N are converted into digital video signals by the A/D conversion units 302_1 to 302_N and held in the pixel rearrangement buffer 303. When digital video signals are input to the input terminals 301_1 to 301_N, the A/D conversion units 302_1 to 302_N are unnecessary.
[0038] The pixel rearrangement buffer 303 of the image information encoding device 300 generates a multi-camera still image from the image information supplied from the A/D conversion units 302_1 to 302_N and extracts the vertical cross-sectional images from the multi-camera still image. For an image to be intra-frame coded (intra coding), the pixel rearrangement buffer 303 supplies the image information of the entire frame to the orthogonal transform unit 305. The orthogonal transform unit 305 applies an orthogonal transform such as a discrete cosine transform to the image information and supplies the transform coefficients to the quantization unit 306. The quantization unit 306 quantizes the transform coefficients supplied from the orthogonal transform unit 305.
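As a generic illustration of the orthogonal transform and quantization stages on a single 8 x 8 block (a sketch of hybrid intra coding in general, not the internal design of units 305 and 306; SciPy's DCT and a single uniform quantization step are assumptions made here):

    import numpy as np
    from scipy.fftpack import dct, idct

    def transform_and_quantize(block, qstep=16):
        # Forward 2-D DCT of an 8 x 8 pixel block followed by uniform quantization.
        coeffs = dct(dct(block.astype(np.float64), axis=0, norm="ortho"),
                     axis=1, norm="ortho")
        return np.round(coeffs / qstep).astype(np.int32)

    def dequantize_and_inverse(qcoeffs, qstep=16):
        # Inverse quantization followed by the inverse 2-D DCT.
        coeffs = qcoeffs.astype(np.float64) * qstep
        return idct(idct(coeffs, axis=1, norm="ortho"), axis=0, norm="ortho")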
[0039] The variable encoding unit 307 determines a coding mode from the quantized transform coefficients, the quantization scale, and the like supplied from the quantization unit 306, applies variable coding such as variable-length coding or arithmetic coding to this coding mode, and forms the information to be inserted into the header portion of each image coding unit. The variable encoding unit 307 then supplies the encoded coding mode to the accumulation buffer 308 for storage. The encoded coding mode is output from the output terminal 309 as compressed image information. The variable encoding unit 307 also applies variable coding such as variable-length coding or arithmetic coding to the quantized transform coefficients and supplies the encoded transform coefficients to the accumulation buffer 308 for storage. The encoded transform coefficients are output from the output terminal 309 as compressed image information.
[0040] The behavior of the quantization unit 306 is controlled by the rate control unit 310 based on the amount of transform coefficient data accumulated in the accumulation buffer 308. The quantization unit 306 also supplies the quantized transform coefficients to the inverse quantization unit 311, which dequantizes them. The inverse orthogonal transform unit 312 applies an inverse orthogonal transform to the dequantized transform coefficients to generate decoded image information and supplies that information to the frame memory 313 for storage.
[0041] For an image to be inter-frame predictive coded (inter coding), the pixel rearrangement buffer 303 supplies the image information to the motion prediction/compensation unit 315. The motion prediction/compensation unit 315 processes the image information and supplies the generated reference image information to the adder 304, which converts the reference image information into a difference signal with respect to the corresponding image information. At the same time, the motion prediction/compensation unit 315 supplies motion vector information to the variable encoding unit 307.
[0042] The variable encoding unit 307 determines a coding mode based on the quantized transform coefficients and quantization scale from the quantization unit 306, the motion vector information supplied from the motion prediction/compensation unit 315, and so on, applies variable coding such as variable-length coding or arithmetic coding to the determined coding mode, and generates the information to be inserted into the header portion of each image coding unit. The variable encoding unit 307 then supplies the encoded coding mode to the accumulation buffer 308 for storage. The encoded coding mode is output as compressed image information.
[0043] The variable encoding unit 307 also applies variable coding such as variable-length coding or arithmetic coding to the motion vector information and generates the information to be inserted into the header portion of each image coding unit. Unlike intra coding, in the case of inter coding the image information input to the orthogonal transform unit 305 is the difference signal obtained from the adder 304. The remaining processing is the same as in image compression by intra coding.
[0044] Fig. 14 is a flowchart showing the encoding process of the image information encoding device 300 shown in Fig. 13. As shown in Fig. 14, the image information encoding device 300 performs A/D conversion of the input analog video signals for all frames with the A/D conversion units 302_1 to 302_N (step ST11), performs pixel rearrangement with the pixel rearrangement buffer 303 (step ST12), and then performs motion prediction/compensation with the motion prediction/compensation unit 315 (step ST13). Thereafter, the generated image information is orthogonally transformed by the orthogonal transform unit 305 (step ST14), quantization and quantization rate control are performed by the quantization unit 306 and the rate control unit 310 (steps ST15, ST16), variable coding is performed by the variable encoding unit 307 (step ST17), inverse quantization is performed by the inverse quantization unit 311 (step ST18), and an inverse orthogonal transform is performed by the inverse orthogonal transform unit 312 (step ST19). The processing of steps ST13 to ST19 is carried out for every block of a predetermined number of pixels in the frame.
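Putting the earlier sketches together, the data flow of Fig. 14 for a purely intra-coded slice sequence might look like the outline below; it reuses build_multicamera_volume, vertical_cross_sections, and transform_and_quantize from the sketches above, and it omits motion prediction/compensation, rate control, and entropy coding (steps ST13, ST16, ST17), so it is an outline of the flow rather than a working encoder.

    def encode_all_slices_intra(camera_images, block=8, qstep=16):
        volume = build_multicamera_volume(camera_images)       # ST12
        coded = []
        for frame in vertical_cross_sections(volume):          # each slice is one frame
            frame_blocks = []
            h, w = frame.shape
            for y in range(0, h - h % block, block):
                for x in range(0, w - w % block, block):
                    b = frame[y:y + block, x:x + block]
                    frame_blocks.append(transform_and_quantize(b, qstep))  # ST14, ST15
            coded.append(frame_blocks)
        return coded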
[0045] Fig. 15 is a block diagram schematically showing the configuration of an image information decoding device 400 corresponding to the image information encoding device 300.
[0046] As shown in Fig. 15, the image information decoding device 400 comprises an input terminal 401, an accumulation buffer 402, a variable decoding unit 403, an inverse quantization unit 404, an inverse orthogonal transform unit 405, an adder 406, a pixel rearrangement buffer 407, N D/A conversion units 408_1 to 408_N, and N output terminals 409_1 to 409_N. The image information decoding device 400 further comprises a frame memory 410 and a motion prediction/compensation unit 412. Although the image information decoding device 400 shown in Fig. 15 is provided with a plurality of output terminals and D/A conversion units, a single output terminal and a single D/A conversion unit may be used. When digital video signals are output from the N output terminals 409_1 to 409_N, the N D/A conversion units 408_1 to 408_N are unnecessary.
[0047] In the image information decoding device 400 shown in Fig. 15, the compressed image information input from the input terminal 401 is temporarily stored in the accumulation buffer 402 and then transferred to the variable decoding unit 403. Based on the defined format of the compressed image information, the variable decoding unit 403 applies processing such as variable-length decoding or arithmetic decoding to the compressed image information, obtains the coding mode information stored in the header portion, and supplies it to the inverse quantization unit 404 and the like. Similarly, the variable decoding unit 403 obtains the quantized transform coefficients and supplies them to the inverse quantization unit 404. Further, when the frame to be decoded is inter-coded, the variable decoding unit 403 also decodes the motion vector information stored in the header portion of the compressed image information and supplies that information to the motion prediction/compensation unit 412.
[0048] The inverse quantization unit 404 dequantizes the quantized transform coefficients supplied from the variable decoding unit 403 and supplies the transform coefficients to the inverse orthogonal transform unit 405. Based on the defined format of the compressed image information, the inverse orthogonal transform unit 405 applies an inverse orthogonal transform such as an inverse discrete cosine transform to the transform coefficients. Here, when the target frame is intra-coded, the image information that has undergone the inverse orthogonal transform is stored in the pixel rearrangement buffer 407 and, after D/A conversion in the D/A conversion units 408_1 to 408_N, is output from the output terminals 409_1 to 409_N.
[0049] When the target frame is inter-coded, the motion prediction/compensation unit 412 generates a reference image based on the motion vector information that has undergone variable decoding and the image information stored in the frame memory 410, and supplies it to the adder 406. The adder 406 combines this reference image with the output of the inverse orthogonal transform unit 405. The remaining processing is the same as for an intra-coded frame.
[0050] Fig. 16 is a flowchart showing the decoding process of the image information decoding device 400 shown in Fig. 15. As shown in Fig. 16, the image information decoding device 400 performs variable decoding of the input signal (step ST21), inverse quantization (step ST22), and an inverse orthogonal transform (step ST23), and then, if the image information was coded with motion prediction compensation, decodes it using motion prediction compensation (step ST24); this processing is carried out for all blocks. Thereafter, pixel rearrangement (step ST25) and D/A conversion (step ST26) are performed.
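A matching outline of the intra path of Fig. 16, reusing dequantize_and_inverse from the sketch above and omitting variable decoding, motion compensation, and D/A conversion (steps ST21, ST24, ST26), could look like this:

    import numpy as np

    def decode_all_slices_intra(coded, slice_shape, block=8, qstep=16):
        h, w = slice_shape
        frames = []
        for frame_blocks in coded:
            frame = np.zeros((h, w), dtype=np.float64)
            it = iter(frame_blocks)
            for y in range(0, h - h % block, block):           # ST22, ST23 per block
                for x in range(0, w - w % block, block):
                    frame[y:y + block, x:x + block] = dequantize_and_inverse(next(it), qstep)
            frames.append(np.clip(np.round(frame), 0, 255).astype(np.uint8))
        return frames                                           # ST25 would rearrange back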
[0051] The image information encoding device 300 capable of carrying out the image information compression method of the present invention and the image information decoding device 400 capable of decoding image information encoded by the image information compression method of the present invention have been described above as examples, but the devices that can carry out the image information compression method of the present invention are not limited to the above configurations, and the image information compression method of the present invention can also be applied to devices of other configurations. Next, an embodiment of the image information compression method of the present invention and an FTV system to which the image information compression method of the present invention is applied will be described.
[0052] <Description of the FTV System>
Fig. 17 is a diagram conceptually showing the basic configuration of the FTV system of the present invention. In Fig. 17, components that are the same as or correspond to those shown in Fig. 1 are given the same reference numerals.
[0053] In this FTV system, the transmission-side device 350 and the reception-side device 450 are located at separate sites, and the FTV signal is transmitted from the transmission-side device 350 to the reception-side device 450 using, for example, the Internet.
[0054] As shown in Fig. 17, the transmission-side device 350 comprises a plurality of cameras (Fig. 17 shows four cameras 102_1 to 102_4, but in practice more cameras are used) and the image information encoding device 300, having the configuration and functions described in the above embodiment, which compression-encodes the video information acquired by the plurality of cameras. The image information compression-encoded by the image information encoding device 300 is sent to the reception-side device 450 by a communication device (not shown).
[0055] The reception-side device 450 comprises a receiving device (not shown) and the image information decoding device 400 described in the above embodiment; it forms the ray space 103 based on the output signal of the image information decoding device 400, and extracts and displays a cross section from the ray space 103 according to the viewpoint position input from the user interface 104.
[0056] As shown in Fig. 3 (a), (b) and Fig. 4 (a) to (c), by using the ray-space method, for example, an image seen from an arbitrary horizontal viewpoint in real space can be generated by cutting an arbitrary plane out of the ray space 103. For example, cutting the cross section 103a out of the ray space 103 shown in Fig. 4 (a) generates an image like that shown in Fig. 4 (b), and cutting the cross section 103b out of the ray space 103 shown in Fig. 4 (a) generates an image like that shown in Fig. 4 (c).
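As a simplified illustration of this cross-section extraction, the sketch below assumes the ray space is stored as the (z, y, x) camera volume used earlier and approximates a virtual viewpoint by taking, for each output column, that column from one chosen camera; the interpolation an actual FTV system performs is more elaborate.

    import numpy as np

    def view_from_ray_space(volume, camera_index_per_column):
        # volume: (N_cameras, H, W) ray-space data.
        # camera_index_per_column: for each output column x, the camera whose
        # column is taken; a slanted cut through the ray space gives a new view.
        n, h, w = volume.shape
        view = np.empty((h, w), dtype=volume.dtype)
        for x in range(w):
            k = int(np.clip(camera_index_per_column[x], 0, n - 1))
            view[:, x] = volume[k, :, x]
        return view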
[0057] As described above, because this FTV system uses the image information compression method described in the above embodiment, the encoding compression efficiency for the FTV signal in the FTV system can be improved.

Claims
[1] An image information compression method comprising:
a step of acquiring a plurality of still images with a camera from a plurality of positions on a circumference centered on a subject or from a plurality of positions on a straight line;
a step of generating a multi-camera still image by arranging the plurality of still images, in the order of the plurality of positions, along the z-axis direction of an orthogonal coordinate system consisting of an x-axis, a y-axis, and a z-axis so that adjacent still images face each other;
a step of generating a plurality of vertical cross-sectional images by cutting the multi-camera still image along planes that are perpendicular to the xz plane containing the x-axis and the z-axis and perpendicular to the xy plane containing the x-axis and the y-axis; and
a step of treating each of the plurality of vertical cross-sectional images as one of a plurality of frames arranged in the time-axis direction of a moving image and encoding the plurality of vertical cross-sectional images using intra-frame coding and inter-frame predictive coding that exploits the correlation between frames.
[2] The image information compression method according to claim 1, wherein the encoding of the plurality of vertical cross-sectional images includes processing conforming to the H.264/AVC standard or processing conforming to the MPEG2 standard.
[3] The image information compression method according to claim 1, comprising, when the resolution of the still images acquired by the camera is higher than a predetermined reference resolution and the interval between the plurality of positions at which the still images are acquired is sparser than a predetermined reference interval, in place of the step of generating the vertical cross-sectional images and the step of encoding the plurality of vertical cross-sectional images, a step of treating each of the plurality of camera images acquired by the camera as one of a plurality of frames arranged in the time-axis direction of a moving image and encoding the plurality of camera images using intra-frame coding and inter-frame predictive coding that exploits the correlation between frames.
[4] An image information compression device comprising:
multi-camera still image generation means for generating a multi-camera still image by arranging a plurality of still images, acquired by a camera from a plurality of positions on a circumference centered on a subject or from a plurality of positions on a straight line, in the order of the plurality of positions along the z-axis direction of an orthogonal coordinate system consisting of an x-axis, a y-axis, and a z-axis so that adjacent still images face each other;
vertical cross-sectional image generation means for generating a plurality of vertical cross-sectional images by cutting the multi-camera still image along planes that are perpendicular to the xz plane containing the x-axis and the z-axis and perpendicular to the xy plane containing the x-axis and the y-axis; and
encoding means for treating each of the plurality of vertical cross-sectional images as one of a plurality of frames arranged in the time-axis direction of a moving image and encoding the plurality of vertical cross-sectional images using intra-frame coding and inter-frame predictive coding that exploits the correlation between frames.
[5] The image information compression device according to claim 4, wherein the encoding of the plurality of vertical cross-sectional images includes processing conforming to the H.264/AVC standard or processing conforming to the MPEG2 standard.
[6] The image information compression device according to claim 4, wherein, when the resolution of the still images acquired by the camera is higher than a predetermined reference resolution and the interval between the plurality of positions at which the still images are acquired is sparser than a predetermined reference interval, the encoding means, in place of generating the vertical cross-sectional images and encoding the plurality of vertical cross-sectional images, treats each of the plurality of camera images acquired by the camera as one of a plurality of frames arranged in the time-axis direction of a moving image and encodes the plurality of camera images using intra-frame coding and inter-frame predictive coding that exploits the correlation between frames.
[7] A free-viewpoint television system comprising:
still image acquisition means for acquiring a plurality of still images from a plurality of positions on a circumference centered on a subject or from a plurality of positions on a straight line;
the image information compression device according to claim 4, which encodes the plurality of still images;
an image information decoding device that decodes the encoded information output from the image information compression device;
a user interface for inputting the viewpoint position of a viewer; and
an image information extraction unit that extracts, from the plurality of still images, an image seen from the viewpoint input through the user interface.
PCT/JP2006/304590 2005-08-29 2006-03-09 Image information compression method, image information compression device, and free viewpoint television system WO2007026440A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2007533118A JP4825984B2 (en) 2005-08-29 2006-03-09 Image information compression method, image information compression apparatus, and free-viewpoint television system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2005246922 2005-08-29
JP2005-246922 2005-08-29

Publications (1)

Publication Number Publication Date
WO2007026440A1 true WO2007026440A1 (en) 2007-03-08

Family

ID=37808544

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2006/304590 WO2007026440A1 (en) 2005-08-29 2006-03-09 Image information compression method, image information compression device, and free viewpoint television system

Country Status (2)

Country Link
JP (1) JP4825984B2 (en)
WO (1) WO2007026440A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06112838A (en) * 1992-09-30 1994-04-22 Fujitsu Ltd Orthogonal transformation coding system for three-dimension volume data
JPH11146396A (en) * 1997-11-13 1999-05-28 Nippon Telegr & Teleph Corp <Ntt> Moving image compression coding/decoding method, moving image compression coder/decoder, moving image coding transmission method/system, and recording medium for moving image compression coding/decoding program
JP2000278715A (en) * 1999-03-24 2000-10-06 Minolta Co Ltd Method and device for generating stereoscopic picture display data and computer-readable recording medium storing stereoscopic picture display data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HIDEAKI KIMATA, ET AL.: "System Design of Free Viewpoint Video Communication", THE FOURTH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY 2004 (CIT'04), 14 September 2004 (2004-09-14), pages 52 - 59, XP003012676 *
OKA S. ET AL.: "Jiyu Shiten TV no tame no Kosen Kukan Joho Asshuku" [Ray Space Information Compression for Free Viewpoint TV], INFORMATION PROCESSING SOCIETY OF JAPAN KENKYU HOKOKU (IPSJ SIG Technical Report), vol. 2003, no. 125, 19 December 2003 (2003-12-19), pages 97 - 102, 2003-AVM-43, XP003012677 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008259171A (en) * 2007-04-04 2008-10-23 Mitsubishi Electric Research Laboratories Inc Method and system for acquiring, encoding, decoding and displaying three dimensional light field
JP2008263528A (en) * 2007-04-13 2008-10-30 Univ Nagoya Image information processing method and image information processing system
JP2011120283A (en) * 2011-02-23 2011-06-16 Nagoya Univ Image information processing method and image information receiving system
WO2016158402A1 (en) * 2015-03-30 2016-10-06 Sony Corporation Image processing device and method
WO2016158403A1 (en) * 2015-03-30 2016-10-06 Sony Corporation Image processing device and method
CN108293133A (en) * 2015-09-14 2018-07-17 Thomson Licensing Method and apparatus for encoding and decoding light field based images and corresponding computer program product
CN108353189A (en) * 2015-09-14 2018-07-31 Thomson Licensing Method and apparatus for encoding and decoding light field based images and corresponding computer program product
JP2018530963A (en) * 2015-09-14 2018-10-18 Thomson Licensing Method and apparatus for encoding and decoding light field based images and corresponding computer program product
WO2018123611A1 (en) * 2016-12-28 2018-07-05 Sony Corporation Information processing device and method
JPWO2018123611A1 (en) * 2016-12-28 2019-10-31 Sony Corporation Information processing apparatus and method
US11019362B2 (en) 2016-12-28 2021-05-25 Sony Corporation Information processing device and method

Also Published As

Publication number Publication date
JP4825984B2 (en) 2011-11-30
JPWO2007026440A1 (en) 2009-03-05

Similar Documents

Publication Publication Date Title
KR101466849B1 (en) Apparatus and method for encoding depth image
US9088802B2 (en) Video encoding method and apparatus, video decoding method and apparatus, programs therefor, and storage media for storing the programs
Ho et al. Overview of multi-view video coding
JP4611386B2 (en) Multi-view video scalable encoding and decoding method and apparatus
JP4562774B2 (en) Method and apparatus for encoding and decoding multi-view video based on video composition
KR100667830B1 (en) Method and apparatus for encoding multiview video
EP2538675A1 (en) Apparatus for universal coding for multi-view video
US20060133493A1 (en) Method and apparatus for encoding and decoding stereoscopic video
JP4825983B2 (en) Image information compression method and free viewpoint television system
WO2007026440A1 (en) Image information compression method, image information compression device, and free viewpoint television system
JP2007180981A (en) Device, method, and program for encoding image
US20110268193A1 (en) Encoding and decoding method for single-view video or multi-view video and apparatus thereof
WO2019069602A1 (en) Video coding device, video decoding device, video coding method, video decoding method, program and video system
Fecker et al. H.264/AVC-compatible coding of dynamic light fields using transposed picture ordering
JP7457124B2 (en) Video/video coding method and device based on prediction weight table
Ng et al. Object-based coding for plenoptic videos
Monteiro et al. Optimized reference picture selection for light field image coding
Tao et al. Joint texture and depth map video coding based on the scalable extension of H.264/AVC
JPWO2010055675A1 (en) Video encoding apparatus and video decoding apparatus
Yea et al. View synthesis prediction for rate-overhead reduction in ftv
KR101407719B1 (en) Multi-view image coding method and apparatus using variable GOP prediction structure, multi-view image decoding apparatus and recording medium storing program for performing the method thereof
CN113545060A (en) Empty tile coding in video coding
Shah et al. Evaluating multi-view plus depth coding solutions for 3D video scenarios
Bai et al. An efficient multiview video compression scheme
Lai et al. Quantized transform-domain motion estimation for SP-frame coding in viewpoint switching of multiview video

Legal Events

Date Code Title Description
DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase
Ref document number: 2007533118
Country of ref document: JP
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 06715461
Country of ref document: EP
Kind code of ref document: A1