WO2007081117A1

WO2007081117A1 - Method and apparatus for inter-viewing reference in multi-viewpoint video coding

Info

Publication number: WO2007081117A1
Application number: PCT/KR2007/000083
Authority: WO
Inventors: Suk-Hee Cho; Namho Hur; Soo-In Lee; Yung-Lyul Lee; Jae-Ho Hur; Dae-Yeon Kim; Yung-Ki Lee
Original assignee: Electronics And Telecommunications Research Institute
Priority date: 2006-01-07
Filing date: 2007-01-05
Publication date: 2007-07-19
Also published as: EP1972141A1; KR20070074495A; EP1972141A4

Abstract

Provided is an inter-view frame reference method and apparatus used in multi-view video encoding. In the present invention, pictures at the sequence of spatially closer viewpoints are preferentially referred to from among pictures at previously encoded viewpoints in each Group Of Pictures (GOP), except for a basic GOP that makes only a temporal frame reference at its viewpoint, an Instantaneous Decoder Refresh (IDR) picture, and pictures at the last point of time of encoding.

Description

METHOD AND APPARATUS FOR INTER-VIEWING REFERENCE IN MULTI-VIEWPOINT VIDEO CODING

TECHNICAL FIELD

The present invention relates to an inter-view frame reference method and apparatus for coding multi-view video data captured by a plurality of cameras having different viewpoints.

BACKGROUND ART

At present, the H.264 standard for video encoding refers to various frames that temporally precede or follow a current frame at the same viewpoint as the viewpoint of the current frame.

DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an inter-view frame reference apparatus for encoding multi-view video data according to an exemplary embodiment of the present invention.

FIG. 2 illustrates a coding structure of a multi-view Group Of Pictures (GOP) suggested for inter-view referencing of multi-view video data according to an exemplary embodiment of the present invention.

FIG. 3 illustrates an inter-view frame reference scheme for encoding a Predictive (P)-picture in a camera type having a primary parallel and arch structure.

FIG. 4 illustrates an inter-view frame reference scheme for encoding a Bidirectional (B)-picture in a camera type having a primary parallel and arch structure.

FIG. 5 illustrates an inter-view frame reference scheme for encoding a P picture and a B picture in a camera type having a secondary cross structure.

FIG. 6 illustrates another inter-view frame reference scheme for encoding a P picture and a B picture in a camera type having a secondary cross structure.

FIG. 7 illustrates an inter-view frame reference scheme for encoding a P picture and a B picture in a camera type having a secondary parallel structure (3x5).

FIG. 8 is a flowchart of an inter-view frame reference method for encoding multi-view video data according to an exemplary embodiment of the present invention.

FIGS. 9A and 9B illustrate test environments for testing the effect of an inter-view frame reference method according to an exemplary embodiment of the present invention.

FIGS. 10A through 10G illustrate rate-distortion (RD) curves based on the test environments illustrated in FIGS. 9A and 9B.

DETAILED DESCRIPTION OF THE INVENTION

TECHNICAL PROBLEM

So far, reference has been made to frames at the same viewpoint.

TECHNICAL SOLUTION

The present invention suggests a technique for referring to frames at different viewpoints from the viewpoint of a current frame to be encoded and at the same point of time as the point of time of the current frame or frames at different viewpoints and at different points of time from the viewpoint and the point of time of the current frame in multi-view video coding, thereby reducing a residual signal and thus improving a compression rate when compared to a technique for referring to a temporally past or future frame at the same viewpoint as the viewpoint of the current frame.

ADVANTAGEOUS EFFECTS

By referring to frames at spatially and temporally adjacent viewpoints in multi-view video data, a compression rate can be improved. Therefore, frames at different viewpoints from the viewpoint of a current frame to be encoded and at the same point of time as the point of time of the current frame or frames of different viewpoints and at different points of time from the viewpoint and the point of time of the current frame are referred to, reducing a residual signal and thus improving a compression rate when compared to a technique for referring to a temporally past or future frame at the same viewpoint as the viewpoint of the current frame.

BEST MODE

According to an aspect of the present invention, there is provided an inter-view frame reference apparatus for encoding multi-view video data, the inter-view frame reference apparatus comprising: a multi-view Group Of Pictures (GOP) arrangement unit arranging a multi-view GOP so that the multi-view GOP includes GOPs, wherein each of GOPs corresponds to each of at least one viewpoint and coding structures of each GOP are the same as one another; and a cross-view reference unit referring to pictures in the sequence of spatially closer viewpoints among previously encoded pictures of different viewpoints, in each of the GOPs.

The inter-view frame reference apparatus may further include a same-view reference unit referring to previously encoded pictures of same viewpoint.

A basic GOP may be an H.264/AVC GOP.

Encoding may be performed in the order of an Intra-predictive (I) picture, a Predictive (P) picture, a reference Bidirectional (rB) picture, and then a Bidirectional (B) picture in the GOP of each viewpoint.

When a P picture to be encoded at an i^th viewpoint and at a point of time t is b(i, t), the cross-view reference unit may refer to at least one of b(i-1, t), b(i-1, t-n), and b(i+1, t-n) so as to encode the current P picture b(i, t), where (t-n) indicates the closest point of time at which an I picture or a P picture is encoded prior to the point of time t.

When an rB picture to be encoded at the i^th viewpoint and at the point of time t is b(i, t), the cross-view reference unit may refer to three out of b(i+1, t+n), b(i-1, t+n), b(i-1, t), b(i-1, t-n), and b(i+1, t-n) so as to encode the current rB picture, where (t-n) indicates the closest point of time at which an I picture or a P picture is encoded prior to the point of time t and (t+n) indicates the closest point of time at which an I picture or a P picture is encoded after the point of time t.

When a B picture to be encoded at the i^th viewpoint and at the point of time t is b(i, t), the cross-view reference unit may refer to three out of b(i+1, t+n), b(i-1, t+n), b(i-1, t-n), and b(i+1, t-n) so as to encode the current B picture, where (t-n) indicates the closest point of time at which an I picture, a P picture, or an rB picture is encoded prior to the point of time t and (t+n) indicates the closest point of time at which an I picture, a P picture, or an rB picture is encoded after the point of time t.

According to another aspect of the present invention, there is provided an inter-view frame reference method for encoding multi-view video data, the inter-view frame reference method comprising a multi-view Group Of Pictures (GOP) arrangement operation of arranging a multi-view GOP so that the multi-view GOP includes GOPs, wherein each of GOPs corresponds to each of at least one viewpoint and coding structures of each GOP are the same as one another; and a cross-view reference operation of referring to pictures in the sequence of spatially closer viewpoints among previously encoded pictures of different viewpoints, in each of the GOPs.

Hereinafter, an exemplary embodiment of the present invention will be described in detail with reference to the annexed drawings. It should be noted that like reference numerals refer to like elements throughout the specification. In the following description, a detailed description of known functions and configurations incorporated herein has been omitted for conciseness.

MODE OF THE INVENTION

FIG. 1 is a block diagram of an inter-view frame reference apparatus 100 for encoding multi-view video data according to an exemplary embodiment of the present invention.

The inter-view frame reference apparatus 100 includes a multi-view Group Of Pictures (GOP) arrangement unit 110, a cross-view reference unit 120, and a same-view reference unit 130.

The multi-view GOP arrangement unit 110 arranges a multi-view GOP that includes GOPs, wherein each of the GOPs corresponds to each of at least one viewpoint. Each of the GOPs includes an Instantaneous Decoder Refresh (IDR) picture at the initial point of time of encoding and each of the GOPs in the multi-view GOP has the same coding structure (see FIG. 2).

A multi-view GOP arranged by the multi-view GOP arrangement unit 110 includes not only a H.264/AVC GOP but also other viewpoints along a viewpoint axis and each of the GOPs includes an IDR picture at the initial point of time of encoding along a time axis. In a GOP corresponding to each viewpoint, encoding is performed in the order of an Intra-predictive (I) picture, a Predictive (P) picture, a reference Bidirectional (rB) picture, and then a Bidirectional (B) picture.

The coding structure of a multi-view GOP suggested by an embodiment of the present invention for inter-view frame reference of multi-view video data will be described in more detail with reference to FIG. 2.

The cross-view reference unit 120 improves a compression rate by referring to frames at the most spatially and temporally adjacent viewpoints. More specifically, the cross-view reference unit 120 refers to frames (or pictures) of different viewpoints at the same point of time and frames (or pictures) of different viewpoints at different points of time in a multi-view GOP, except for a basic GOP that only makes a temporal frame reference at its viewpoint, an IDR picture, and pictures at the last viewpoint.

More specifically, pictures referred to by the cross-view reference unit 120 in order to encode a P picture, a reference B (rB) picture, and a B picture are as follows.

When a P picture to be encoded at an i^th viewpoint and at a point of time t is b(i, t), the cross-view reference unit 120 refers to at least one of b(i-1, t), b(i-1, t-n), and b(i+1, t-n) in order to encode the current P picture b(i, t). In this case, (t-n) indicates the closest point of time at which an I picture or a P picture is encoded prior to the point of time t, which will be described with reference to FIG. 3.

When an rB picture that is to be encoded at the i^th viewpoint and at the point of time t is b(i, t), the cross-view reference unit 120 refers to three out of b(i+1, t+n), b(i-1, t+n), b(i-1, t), b(i-1, t-n), and b(i+1, t-n) to encode the current rB picture. In this case, (t-n) indicates the closest point of time at which an I picture or a P picture is encoded prior to the point of time t and (t+n) indicates the closest point of time at which an I picture or a P picture is encoded after the point of time t, which will be described later with reference to FIG. 4.

When a B picture to be encoded at the i^th viewpoint and at the point of time t is b(i, t), the cross-view reference unit 120 refers to three out of b(i+1, t+n), b(i-1, t+n), b(i-1, t-n), and b(i+1, t-n) in order to encode the current B picture. In this case, (t-n) indicates the closest point of time at which an I picture, a P picture, or an rB picture is encoded prior to the point of time t and (t+n) indicates the closest point of time at which an I picture, a P picture, or an rB picture is encoded after the point of time t, which will be described later with reference to FIG. 4.
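The three candidate sets above can be written down compactly. The following is a minimal sketch, not the apparatus itself: the function name `cross_view_candidates` and the separate parameters `n_prev` and `n_next` are assumptions introduced for illustration (the text uses a single symbol n for both the backward offset and the forward offset), and no handling of the basic GOP, IDR pictures, or the first and last viewpoints is attempted.

```python
from typing import List, Tuple

# A picture b(i, t) is represented as (viewpoint index, point of time).
Picture = Tuple[int, int]


def cross_view_candidates(kind: str, i: int, t: int,
                          n_prev: int, n_next: int = 0) -> List[Picture]:
    """Return the candidate cross-view references for picture b(i, t).

    n_prev: distance back to the closest earlier anchor picture (t - n).
    n_next: distance forward to the closest later anchor picture (t + n).
    Both parameter names are hypothetical; the text calls each of them n.
    """
    left, right = i - 1, i + 1
    if kind == "P":
        # b(i-1, t), b(i-1, t-n), b(i+1, t-n): at least one is referred to.
        return [(left, t), (left, t - n_prev), (right, t - n_prev)]
    if kind == "rB":
        # Five candidates, of which three are referred to.
        return [(right, t + n_next), (left, t + n_next), (left, t),
                (left, t - n_prev), (right, t - n_prev)]
    if kind == "B":
        # Four candidates, of which three are referred to.
        return [(right, t + n_next), (left, t + n_next),
                (left, t - n_prev), (right, t - n_prev)]
    raise ValueError(f"unsupported picture type: {kind}")


# Example: a P picture at viewpoint 2, time 8, previous anchor 4 steps earlier.
print(cross_view_candidates("P", 2, 8, n_prev=4))
```

The function only enumerates candidates; which of them are actually used is left to the cross-view reference unit.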

The cross-view reference unit 120 applies inter-view frame reference schemes differently according to camera types. FIGS. 3 and 4 illustrate an inter-view frame reference scheme in a camera type having a primary parallel and arch structure, FIGS. 5 and 6 illustrate an inter-view frame reference scheme in a camera type having a secondary cross structure, and FIG. 7 illustrates an inter-view frame reference scheme in a camera type having a secondary parallel structure (3x5).

The same-view reference unit 130 refers to previously encoded pictures at the same viewpoint in order to encode the current picture, i.e., temporally preceding and following pictures at the same viewpoint.

FIG. 2 illustrates the coding structure of a multi-view GOP suggested for inter-view frame referencing of multi-view video data according to an exemplary embodiment of the present invention.

In FIG. 2, an x axis indicates a viewpoint and a y axis indicates a point of time. A multi-view GOP 200 includes at least one H.264/AVC GOP, and each of the GOPs corresponds to each of the viewpoints. The coding structures of the GOPs are the same as one another.

The H.264/AVC GOP makes only a temporal frame reference at its viewpoint and does not make an inter-view frame reference. Any viewpoint in multi-view video data can be selected as the H.264/AVC GOP.

The coding structure of a GOP at each viewpoint may be I B rB B P B rB B P. Alternatively, the coding structure of a GOP may also be I rB rB rB P rB rB rB P. In this case, rB is a B picture that can be used as a reference picture.
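As an illustration of how a coding structure relates to the stated encoding order (I picture, then P picture, then rB picture, then B picture), the sketch below sorts the pictures of one GOP by picture type. It is a literal reading of that ordering rule under stated assumptions: the names `PRIORITY` and `encoding_order` are hypothetical, and a practical encoder may interleave picture types rather than finish all pictures of one type first.

```python
from typing import List, Tuple

# Assumed priority derived from the stated order: I, then P, then rB, then B.
PRIORITY = {"I": 0, "P": 1, "rB": 2, "B": 3}


def encoding_order(structure: str) -> List[Tuple[int, str]]:
    """Return (display position, picture type) pairs in encoding order.

    Pictures of the same type keep their display order.
    """
    pictures = list(enumerate(structure.split()))
    return sorted(pictures, key=lambda p: (PRIORITY[p[1]], p[0]))


order = encoding_order("I B rB B P B rB B P")
print([position for position, _ in order])
```

For the structure I B rB B P B rB B P, this places the I picture first, the two P pictures next, the two rB pictures after that, and the four B pictures last.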

FIG. 3 illustrates an inter-view frame reference scheme for encoding a P picture in a camera type having a primary parallel and arch structure.

In this case, a plurality of cameras, while arranged in the primary parallel and arch structure, capture video data. Indices 0 through 9 indicate the encoding orders of P pictures and arrows indicate inter-view frames that are to be referred to.

For example, for inter-view frame referencing of P pictures, P pictures of a multi-view GOP, except for an H.264/AVC GOP and pictures at the last viewpoint, refer to a P picture at its left viewpoint and at the same point of time, an I picture or a P picture at its left viewpoint, which has been encoded immediately prior to the point of time t, and an I picture or a P picture at its right viewpoint, which has been encoded immediately prior to the point of time t. The last camera refers to a P picture from the left camera and at the same point of time and an I picture or a P picture from the left camera, which has been encoded immediately prior to the point of time t.

FIG. 4 illustrates an inter-view frame reference scheme for encoding a B picture in a camera type having a primary parallel and arch structure.

For example, for inter-view frame referencing of rB pictures that are located temporally between I pictures and P pictures, multi-view GOPs, except for an H.264/AVC GOP and pictures at the last viewpoint, refer to an rB picture from the left camera, which has been encoded at the same point of time, a P picture from the left camera, which is to be encoded immediately after the point of time t, and a P picture from the right camera, which is to be encoded immediately after the point of time t. The last camera refers to a B picture from the left camera, which has been encoded at the same point of time, and a P picture from the left camera, which is to be encoded immediately after the point of time t.

More specifically, for example, an rB picture indicated by an index 1 can refer to three pictures such as (0, 20, 22), (0, 15, 17), or the like out of pictures (0, 20, 22, 15, 17).

For inter-view frame referencing of B pictures that are located temporally between rB pictures and P or I pictures, multi-view GOPs, except for an H.264/AVC GOP and pictures at the last viewpoint, refer to an rB picture from the left camera, which has been encoded immediately prior to the point of time t, a P picture or an rB picture from the left camera, which is to be encoded immediately after the point of time t, and a P picture or an rB picture from the right camera, which is to be encoded immediately after the point of time t. The last camera refers to an rB picture from the left camera, which has been encoded immediately prior to the point of time t, and a P picture or an rB picture from the left camera, which is to be encoded immediately after the point of time t.

More specifically, for example, a B picture indicated by an index 6 can refer to three pictures such as (0, 2, 15), (0, 15, 17), or the like out of pictures (0, 2, 15, 17).
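The "three out of" selections in the two examples above can be enumerated directly. The sketch below assumes that any three-element subset of the candidate indices is permissible, which is one reading of "or the like"; the helper name `reference_triples` is hypothetical, and the sketch says nothing about how an encoder would choose among the subsets.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def reference_triples(candidates: Sequence[int], k: int = 3) -> List[Tuple[int, ...]]:
    """List every way of picking k reference pictures from the candidates."""
    return list(combinations(candidates, k))


# Candidate picture indices quoted for the rB picture with index 1.
rb_triples = reference_triples((0, 20, 22, 15, 17))
# Candidate picture indices quoted for the B picture with index 6.
b_triples = reference_triples((0, 2, 15, 17))

print(len(rb_triples), len(b_triples))
```

Both triples named for the rB picture, (0, 20, 22) and (0, 15, 17), and both triples named for the B picture, (0, 2, 15) and (0, 15, 17), appear among the enumerated subsets.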

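FIG. 5 illustrates an inter-view frame reference scheme for encoding a P picture and a B picture in a camera type having a secondary cross structure.

In this case, a plurality of cameras, while arranged in the secondary cross structure, capture video data. Indices 0 through 4 indicate the encoding orders of pictures and arrows indicate inter-view frames to be referred to.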

Multi-view GOPs, except for an H.264/AVC GOP, refer to inter-view frames as follows. The central viewpoint has no previously encoded viewpoint at the same point of time and thus does not make an inter-view frame reference. The left viewpoint refers to a picture at the central viewpoint, which has been encoded at the same point of time. The top viewpoint refers to the picture at the central viewpoint encoded at the same point of time and a picture at the left viewpoint. The right viewpoint refers to the picture at the central viewpoint encoded at the same point of time and the picture at the top viewpoint. The bottom viewpoint refers to the picture at the central viewpoint encoded at the same point of time and the pictures at the left and right viewpoints.
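The cross-structure references just described form a small dependency table, and a short check can confirm that every referenced viewpoint is encoded before the viewpoint that uses it. This is a sketch under stated assumptions: the viewpoint labels, the names `FIG5_REFERENCES`, `ASSUMED_ORDER`, and `references_are_available`, and the encoding order itself (taken here to follow the order in which the viewpoints are described) are all illustrative and not taken from the figure.

```python
from typing import Dict, List

# References at the same point of time, as described for the cross structure.
FIG5_REFERENCES: Dict[str, List[str]] = {
    "center": [],
    "left": ["center"],
    "top": ["center", "left"],
    "right": ["center", "top"],
    "bottom": ["center", "left", "right"],
}

# Assumed encoding order (indices 0 through 4), following the description order.
ASSUMED_ORDER = ["center", "left", "top", "right", "bottom"]


def references_are_available(order: List[str],
                             refs: Dict[str, List[str]]) -> bool:
    """Check that each viewpoint only refers to viewpoints encoded before it."""
    encoded = set()
    for view in order:
        if any(ref not in encoded for ref in refs[view]):
            return False
        encoded.add(view)
    return True


print(references_are_available(ASSUMED_ORDER, FIG5_REFERENCES))
```

No viewpoint in the table refers to more than three pictures, and swapping the first two entries of the assumed order makes the check fail because the left viewpoint would then need a central picture that has not yet been encoded.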

FIG. 6 illustrates another inter-view frame reference scheme for encoding a P picture and a B picture in a camera type having a secondary cross structure. In this case, inter-view frame references are made as shown in FIG. 6 rather than as in FIG. 5.

In other words, multi-view GOPs, except for an H.264/AVC GOP, refer to inter-view frames as follows. The central viewpoint has no previously encoded viewpoint at the same point of time and thus does not make an inter-view frame reference. The left viewpoint refers to a picture at the central viewpoint, which has been encoded at the same point of time. The right viewpoint refers to the picture at the central viewpoint encoded at the same point of time and a picture at the left viewpoint. The top viewpoint refers to the picture at the central viewpoint encoded at the same point of time and the pictures at the left and right viewpoints. The bottom viewpoint refers to the picture at the central viewpoint encoded at the same point of time and the pictures at the left and right viewpoints.

FIG. 7 illustrates an inter-view frame reference scheme for encoding a P picture and a B picture in a camera type having a secondary parallel structure (3x5).

In this case, a plurality of cameras, while arranged in the secondary parallel structure, capture video data. Indices 0 through 14 indicate the encoding orders of pictures and arrows indicate inter-view frames to be referred to.

Multi-view GOPs, except for an H.264/AVC GOP, refer to inter-view frames as follows. The central viewpoint indicated by an index 0 has no previously encoded viewpoint at the same point of time and thus does not make an inter-view frame reference.

A viewpoint indicated by the index 1 refers to a picture (0) at the central viewpoint encoded at the same point of time. A viewpoint indicated by the index 2 refers to the picture at the central viewpoint (0) and a picture at its left viewpoint (1). A viewpoint indicated by the index 3 refers to the picture at the central viewpoint (0) and pictures at its top (2) and left (1) viewpoints. A viewpoint indicated by the index 4 refers to the picture at the central viewpoint (0) and pictures at its upper left (1) and right (3) viewpoints. A left viewpoint indicated by an index 9 among viewpoints in the middle row, except for the central viewpoint 0 and its neighboring viewpoints 1 and 3, refers to pictures at its upper right viewpoint (5), its right viewpoint (1), and its lower right viewpoint (7). A right viewpoint indicated by 10 refers to pictures at its upper left viewpoint (6), its left viewpoint (3), and its lower left viewpoint (8).

Each of upper left viewpoints 5 and 11 with respect to the central viewpoint (0) refers to pictures at its right viewpoint (2 or 5), its lower right viewpoint (0 or 1), and its lower viewpoint (1 or 9). Each of upper right viewpoints 6 and 12 with respect to the central viewpoint 0 refers to pictures at its left viewpoint, its lower left viewpoint, and its lower viewpoint. Each of lower left viewpoints 7 and 13 with respect to the central viewpoint 0 refers to pictures at its right viewpoint, its upper right viewpoint, and its upper viewpoint. Each of lower right viewpoints 8 and 14 with respect to the central viewpoint 0 refers to pictures at its left viewpoint, its lower left viewpoint, and its lower viewpoint.

FIG. 8 is a flowchart of an inter-view frame reference method for encoding multi-view video data according to an exemplary embodiment of the present invention. The inter-view frame reference method arranges a multi-view GOP for same-view referencing and cross-view referencing and then refers to a picture at a spatially and temporally adjacent viewpoint among previously encoded pictures at viewpoints.

A multi-view GOP used for cross-view reference is arranged in operation S810 so that the multi-view GOP includes GOPs, wherein each of the GOPs corresponds to each of at least one viewpoint and each of the GOPs includes an IDR picture at the initial point of time of encoding and coding structures of each GOP are the same as one another in the multi-view GOP. In operation S820, in each of the GOPs, pictures are referred to in the sequence of spatially closer viewpoints among previously encoded pictures of different viewpoints, except for a basic GOP that makes only a temporal frame reference at its viewpoint, the IDR picture, and pictures at the last point of time.

In operation S830, a picture at the same viewpoint is referred to for multi-view video encoding. In other words, a temporally past or future frame at the same viewpoint is referred to. Same-view referencing and cross-view referencing may be performed simultaneously or in a different order.

For a test data set of FIG. 9A, spatial resolutions, temporal resolutions, camera arrangement, and bit-rates for multi-view sequences are given as shown in FIGS. 9A and 9B.

FIGS. 10A through 10G illustrate rate-distortion (RD) curves based on the test environments illustrated in FIGS. 9A and 9B.

In FIGS. 10A through 10G, RD curves with respect to multi-view test sequences set in the environments of FIGS. 9A and 9B are shown. It can be seen from FIGS. 10A through 10G that a Peak Signal to Noise Ratio (PSNR) gain of 1.2 to 2 dB can be obtained for all bitrates of all the multi-view test sequences.
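For readers unfamiliar with the quality axis of an RD curve, the sketch below computes PSNR using its standard definition, 10·log10(MAX² / MSE). The text does not state how PSNR was measured in these tests, so the function name `psnr`, the flat-list sample representation, and the 8-bit peak value of 255 are assumptions made for illustration only.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], decoded: Sequence[int],
         max_value: int = 255) -> float:
    """Peak Signal to Noise Ratio in dB between two equally sized sample lists."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between original and decoded samples.
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every sample off by one gives an MSE of 1.
print(round(psnr([10, 20, 30], [11, 21, 31]), 2))
```

On an RD curve, a gain of 1.2 to 2 dB means that, at a given bitrate, the curve of one scheme lies that far above the curve of the other.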

The present invention can also be embodied as a computer-readable code on a computer-readable recording medium. The computer-readable recording medium is any data storage device that can store data which can be thereafter read by a computer system.

Examples of the computer-readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (transmission over the Internet). The computer-readable recording medium can also be distributed over network coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion. Also, function programs, codes, and code segments for implementing the present invention can be easily construed by those skilled in the art.

The present invention has been particularly shown and described with reference to an exemplary embodiment thereof. Terms used herein are only intended to describe the present invention and are not intended to limit any meaning or the scope of the present invention claimed in the claims.

Therefore, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims. Accordingly, the disclosed embodiments should be considered in a descriptive sense and not in a restrictive sense. The scope of the present invention will be defined by the appended claims, and differences within the scope should be construed to be included in the present invention.

Claims

1. An inter-view frame reference apparatus for encoding multi-view video data, the inter-view frame reference apparatus comprising: a multi-view Group Of Pictures (GOP) arrangement unit arranging a multi-view GOP so that the multi-view GOP includes GOPs, wherein each of GOPs corresponds to each of at least one viewpoint and coding structures of each GOP are the same as one another; and a cross-view reference unit referring to pictures in the sequence of spatially closer viewpoints among previously encoded pictures of different viewpoints, in each of the GOPs.

2. The inter-view frame reference apparatus of claim 1, further comprising: a same-view reference unit referring to previously encoded pictures of same viewpoint.

3. The inter-view frame reference apparatus of claim 1, wherein the multi-view GOP includes a basic GOP and the basic GOP is an H.264/AVC GOP.

4. The inter-view frame reference apparatus of claim 1, wherein encoding is performed in the order of an Intra-predictive (I) picture, a Predictive (P) picture, a reference Bidirectional (rB) picture, and then a Bidirectional (B) picture in the GOP of each viewpoint.

5. The inter-view frame reference apparatus of claim 1, wherein when a P picture to be encoded at an i^th viewpoint and at a point of time t is b(i, t), the cross-view reference unit refers to at least one of b(i-1, t), b(i-1, t-n), and b(i+1, t-n) so as to encode the current P picture b(i, t) and (t-n) indicates the closest point of time at which an I picture or a P picture is encoded prior to the point of time t.

6. The inter-view frame reference apparatus of claim 1, wherein when an rB picture to be encoded at the i^th viewpoint and at the point of time t is b(i, t), the cross-view reference unit refers to three out of b(i+1, t+n), b(i-1, t+n), b(i-1, t), b(i-1, t-n), and b(i+1, t-n) so as to encode the current rB picture and (t-n) indicates the closest point of time at which an I picture or a P picture is encoded prior to the point of time t and (t+n) indicates the closest point of time at which an I picture or a P picture is encoded after the point of time t.

7. The inter-view frame reference apparatus of claim 1, wherein when a B picture to be encoded at the i^th viewpoint and the point of time t is b(i, t), the cross-view reference unit refers to three out of b(i+1, t+n), b(i-1, t+n), b(i-1, t-n), and b(i+1, t-n) so as to encode the current B picture and (t-n) indicates the closest point of time at which an I picture, a P picture, or an rB picture is encoded prior to the point of time t and (t+n) indicates the closest point of time at which an I picture, a P picture, or an rB picture is encoded after the point of time t.

8. The inter-view frame reference apparatus of claim 1, wherein when viewpoints and points of time form coordinate axes in a camera type having a secondary cross structure and a camera type having a secondary parallel structure, the cross-view reference unit preferentially encodes a viewpoint having the smallest radius from the closest point to the central point of the points of time and the viewpoints of the multi-view GOP.

9. The inter-view frame reference apparatus of claim 1, wherein the cross-view reference unit refers to a maximum of three pictures.

10. The inter-view frame reference apparatus of claim 1, wherein the coding structure of each of the GOPs at each of the viewpoints is I B rB B P B rB B P and rB is a B picture that can be used as a reference picture.

11. The inter-view frame reference apparatus of claim 1, wherein the coding structure of each of the GOPs at each of the viewpoints is I rB rB rB P B rB rB P and rB is a B picture that can be used as a reference picture.

12. The inter-view frame reference apparatus of claim 1, wherein each of the GOPs includes an Instantaneous Decoder Refresh (IDR) picture at the initial point of time of coding.

13. The inter-view frame reference apparatus of claim 1, wherein the cross-view reference unit refers to pictures in the sequence of spatially closer viewpoints among previously encoded pictures of different viewpoints, in each of the GOPs, except for a basic GOP that makes only a temporal frame reference at its viewpoint, an Instantaneous Decoder Refresh (IDR) picture, and pictures at the last point of time of coding.

14. An inter-view frame reference method for encoding multi-view video data, the inter-view frame reference method comprising: a multi-view Group Of Pictures (GOP) arrangement operation of arranging a multi-view GOP so that the multi-view GOP includes GOPs, wherein each of GOPs corresponds to each of at least one viewpoint and coding structures of each GOP are the same as one another; and a cross-view reference operation of referring to pictures in the sequence of spatially closer viewpoints among previously encoded pictures of different viewpoints, in each of the GOPs.

15. The inter-view frame reference method of claim 14, further comprising a same-view reference operation of referring to previously encoded pictures of same viewpoint.

16. The inter-view frame reference method of claim 14, wherein the multi-view GOP includes a basic GOP and the basic GOP is an H.264/AVC GOP.

17. The inter-view frame reference method of claim 14, wherein encoding is performed in the order of an Intra-predictive (I) picture, a Predictive (P) picture, a reference Bidirectional (rB) picture, and then a Bidirectional (B) picture in the GOP of each viewpoint.

18. The inter-view frame reference method of claim 14, wherein when a P picture to be encoded at an i^th viewpoint and at a point of time t is b(i, t), the cross-view reference operation comprises referring to at least one of b(i-1, t), b(i-1, t-n), and b(i+1, t-n) so as to encode the current P picture b(i, t) and (t-n) indicates the closest point of time at which an I picture or a P picture is encoded prior to the point of time t.

19. The inter-view frame reference method of claim 14, wherein when an rB picture to be encoded at the i^th viewpoint and at the point of time t is b(i, t), the cross-view reference operation comprises referring to three out of b(i+1, t+n), b(i-1, t+n), b(i-1, t), b(i-1, t-n), and b(i+1, t-n) so as to encode the current rB picture and (t-n) indicates the closest point of time at which an I picture or a P picture is encoded prior to the point of time t and (t+n) indicates the closest point of time at which an I picture or a P picture is encoded after the point of time t.

20. The inter-view frame reference method of claim 14, wherein when a B picture to be encoded at the i^th viewpoint and at the point of time t is b(i, t), the cross-view reference operation comprises referring to three out of b(i+1, t+n), b(i-1, t+n), b(i-1, t-n), and b(i+1, t-n) so as to encode the current B picture and (t-n) indicates the closest point of time at which an I picture, a P picture, or an rB picture is encoded prior to the point of time t and (t+n) indicates the closest point of time at which an I picture, a P picture, or an rB picture is encoded after the point of time t.

21. The inter-view frame reference method of claim 14, wherein when viewpoints and points of time form coordinate axes in a camera type having a secondary cross structure and a camera type having a secondary parallel structure, the cross-view reference operation comprises preferentially encoding a viewpoint having the smallest radius from the closest point to the central point of the points of time and the viewpoints of the multi-view GOP.

22. The inter-view frame reference method of claim 14, wherein the cross-view reference operation comprises referring to a maximum of three pictures.

23. The inter-view frame reference method of claim 14, wherein the coding structure of each of the GOPs at each of the viewpoints is I B rB B P B rB B P and rB is a B picture that can be used as a reference picture.

24. The inter-view frame reference method of claim 14, wherein the coding structure of each of the GOPs at each of the viewpoints is I rB rB rB P B rB rB P and rB is a B picture that can be used as a reference picture.

25. The inter-view frame reference method of claim 14, wherein each of the GOPs includes an Instantaneous Decoder Refresh (IDR) picture at the initial point of time of coding.

26. The inter-view frame reference method of claim 14, wherein the cross-view reference operation comprises referring to pictures in the sequence of spatially closer viewpoints among previously encoded pictures of different viewpoints, in each of the GOPs, except for a basic GOP that makes only a temporal frame reference at its viewpoint, an Instantaneous Decoder Refresh (IDR) picture, and pictures at the last point of time of coding.

27. A computer-readable medium having recorded thereon a program for implementing the inter-view frame reference method in any one of claims 14 through 26.