MXPA97004998A - Rate control for stereoscopic digital video encoding

Info

Publication number
MXPA97004998A
Authority
MX
Mexico
Prior art keywords
image
images
activity
level
bits
Application number
MXPA/A/1997/004998A
Other languages
Spanish (es)
Other versions
MX9704998A (en
Inventor
Chen Xuemin
Original Assignee
General Instrument Corporation

Abstract

Rate control in a stereoscopic digital video communication system is carried out by modifying the quantization level of P- or B-frame data in the enhancement layer depending on whether the frame is temporally predicted (from the same layer) or disparity predicted (from the opposite layer). The invention can maintain a consistent image quality by providing additional quantization bits for disparity-predicted P images, for example, where a P frame can be encoded from a B frame in the enhancement layer. The selected quantization level corresponds to an overall bit rate requirement of the enhancement layer. For disparity-predicted P frames, the quantization step size is modified according to the activity level of the frame being encoded in the enhancement layer, or of the reference frame, whichever is greater. Also, image quality is improved and frame freeze-up is avoided during editing modes such as fast forward and fast rewind, which require random access to the image data. When the reference frame in the base layer is the first frame of a group of pictures (GOP), the corresponding enhancement layer frame will be encoded as an I or P frame instead of as a B frame to improve image quality and eliminate or reduce error propagation during random access.

Description

RATE CONTROL FOR STEREOSCOPIC DIGITAL VIDEO ENCODING

BACKGROUND OF THE INVENTION

The present invention relates to the encoding of digital video signals. In particular, a method and apparatus are presented for encoding stereoscopic digital video signals so as to optimize image quality while staying within bandwidth limitations. A method and apparatus for improving image quality when editing features such as fast forward or rewind are invoked is also presented.

Digital technology has revolutionized the delivery of audio and video services to consumers, since it can deliver signals of much higher quality than analog techniques and provides additional features that were previously unavailable. Digital systems are particularly advantageous for signals that are broadcast via a cable television network or by satellite to cable television affiliates and/or directly to home satellite television receivers. In such systems, a subscriber receives the digital data stream through a receiver/descrambler that decompresses and decodes the data in order to reconstruct the original audio and video signals. The digital receiver includes a microcomputer and memory storage elements for use in this process.

However, the need to provide low-cost receivers while still providing high-quality audio and video requires that the amount of data processed be limited. In addition, the bandwidth available for digital signal transmission can also be limited by physical constraints, existing communication protocols and government regulations. Accordingly, various intra-frame data compression schemes have been developed which take advantage of the spatial correlation among adjacent pixels in a particular video image (e.g., frame).
In addition, inter-frame compression schemes take advantage of temporal correlations between corresponding regions of successive frames by using motion compensation data and block-matching motion estimation algorithms. In this case, a motion vector is determined for each block in a current picture of an image by identifying the block in a previous picture that most closely resembles the current block. The entire current picture can then be reconstructed at a decoder by sending data that represents the difference between the corresponding block pairs, together with the motion vectors required to identify the corresponding pairs. Block-matching motion estimation algorithms are particularly effective when combined with block-based spatial compression techniques such as the discrete cosine transform (DCT).

However, an even greater challenge is now posed by proposed stereoscopic transmission formats, such as the Moving Picture Experts Group (MPEG) MPEG-2 Multi-view Profile (MVP) system, described in document ISO/IEC JTC1/SC29/WG11 N1088, entitled "Proposed Draft Amendment No. 3 to 13818-2 (Multi-view Profile)", November 1995, incorporated herein by reference. Stereoscopic video provides slightly offset views of the same image to produce a combined image with greater depth of field, thereby creating a three-dimensional (3-D) effect. In such a system, dual cameras can be placed about two inches apart to record an event on two separate video signals. The spacing of the cameras approximates the distance between the left and right human eyes. Moreover, with some stereoscopic video camcorders, the two lenses are built into one camcorder head and therefore move in synchronism, for example, when panning across an image. The two video signals can be transmitted and recombined at a receiver to produce an image with a depth of field that corresponds to normal human vision. Other special effects can also be provided.

The MPEG MVP system includes two video layers that are transmitted in a multiplexed signal. First, a base layer represents a left view of a three-dimensional object. Second, an enhancement (e.g., auxiliary) layer represents a right view of the object. Since the right and left views are of the same object and are offset only slightly relative to each other, there will usually be a high degree of correlation between the video images of the base and enhancement layers. This correlation can be used to compress the enhancement layer data relative to the base layer, thereby reducing the amount of data that needs to be transmitted in the enhancement layer to maintain a given image quality. The image quality generally corresponds to the quantization level of the video data.

The MPEG MVP system includes three types of images: specifically, the intra-coded image (I image), the predictive-coded image (P image), and the bi-directionally predictive-coded image (B image).
Furthermore, while the base layer accommodates either frame-structured or field-structured video sequences, the enhancement layer accommodates only the frame structure. An I image completely describes a single video image without reference to any other image. For improved error concealment, motion vectors can be included with an I image. An error in an I image has the potential for a greater impact on the displayed video, since both P images and B images in the base layer are predicted from I images. In addition, images in the enhancement layer can be predicted from images in the base layer in a cross-layer prediction process known as disparity prediction. Prediction from one image to another within a layer is known as temporal prediction.

In the base layer, P images are predicted from previous I or P images. The reference runs from an earlier I or P image to a future P image and is known as forward prediction. B images are predicted from the closest earlier I or P image and the closest later I or P image.

In the enhancement layer, a P image can be predicted from the most recently decoded image in the enhancement layer, regardless of image type, or from the most recent base layer image, regardless of type, in display order. In addition, for a B image in the enhancement layer, the forward reference image is the most recently decoded image in the enhancement layer, and the backward reference image is the most recent image in the base layer, in display order. Since B images in the enhancement layer can be reference images for other images in the enhancement layer, the bit allocation for the P and B images in the enhancement layer must be adjusted based on the complexity (e.g., activity) of the images. In an optional configuration, the enhancement layer has only P and B images, but no I images.
The reference to a future image (i.e., one that has not yet been displayed) is called backward prediction. There are situations where backward prediction is very useful in increasing the compression ratio. For example, in a scene in which a door opens, the current image can predict what is behind the door based on a future image in which the door is already open.

B images yield the most compression but also incorporate the most error. To eliminate error propagation, B images may never be predicted from other B images in the base layer. P images yield less error and less compression. I images yield the least compression, but are able to provide random access.

Thus, in the base layer, to decode P images, the previous I image or P image must be available. Similarly, to decode B images, the previous P or I image and the future P or I image must be available. Consequently, the video images are encoded and transmitted in dependency order, such that all images used for prediction are coded before the images predicted from them. When the encoded signal is received at a decoder, the video images are decoded and reordered for display. Accordingly, temporary storage elements are required to buffer the data before display.

The MPEG-2 standard for non-stereoscopic video signals does not specify any particular distribution that I images, P images and B images must take within a sequence in a layer, but allows different distributions to provide different degrees of compression and random accessibility.
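The dependency ordering described above can be illustrated with a small sketch. It assumes the base layer pattern in which each B image depends on the nearest earlier and later I or P image; the function name `coding_order` and the picture labels are hypothetical and are not taken from the MPEG-2 standard.

```python
def coding_order(display_order):
    """Reorder display-order picture labels so that every B image
    follows both of the anchor (I or P) images it is predicted from."""
    out, pending_b = [], []
    for pic in display_order:
        if pic.startswith("B"):
            pending_b.append(pic)   # hold the B image until its future anchor is sent
        else:
            out.append(pic)         # the anchor image is coded first
            out.extend(pending_b)   # then the B images that precede it in display order
            pending_b.clear()
    out.extend(pending_b)           # trailing B images with no later anchor in this list
    return out


print(coding_order(["I1", "B1", "B2", "P1", "B3", "B4", "I2"]))
```

A decoder performs the inverse step, holding decoded anchor images in temporary storage until the B images that precede them in display order have been shown.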
A common distribution in the base layer is to have two B images between successive I or P images. The sequence of images can be, for example, I1, B1, B2, P1, B3, B4, I2, B5, B6, P2, B7, B8, I3, and so on. In the enhancement layer, a P image can be followed by three B images, with an I image being provided every twelve P and B images, for example, in the sequence I1, B1, B2, P1, B3, B4, P2, B5, B6, P3, B7, B8, I2. Further details of the MPEG-2 standard can be found in document ISO/IEC JTC1/SC29/WG11 N0702, entitled "Information Technology - Generic Coding of Moving Pictures and Associated Audio, Recommendation H.262", March 25, 1994, incorporated herein by reference.

Figure 1 shows a conventional temporal and disparity video image prediction scheme of the MPEG MVP system. The arrow heads indicate the direction of prediction, such that the image pointed to by the arrow head is predicted from the image connected to the tail of the arrow. With a base layer (left view) sequence 150 of Ib 155, Bb1 160, Bb2 165, Pb 170, where the subscript "b" denotes the base layer, temporal prediction occurs as shown. Specifically, Bb1 160 is predicted from Ib 155 and Pb 170, Bb2 165 is predicted from Ib 155 and Pb 170, and Pb 170 is predicted from Ib 155. With an enhancement layer (right view) sequence 100 of Pe 105, Be1 110, Be2 115 and Be3 120, where the subscript "e" denotes the enhancement layer, temporal and/or disparity prediction occurs. Specifically, Pe 105 is disparity predicted from Ib 155. Be1 110 is both temporally predicted from Pe 105 and disparity predicted from Bb1 160. Be2 115 is temporally predicted from Be1 110 and disparity predicted from Bb2 165. Be3 120 is temporally predicted from Be2 115 and disparity predicted from Pb 170.
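The prediction relationships of Figure 1 can be written down as a small dependency table, which makes it easy to see which images must be available before a given image can be decoded. This is only an illustrative sketch: the table `FIG1_REFS` and the helper `pictures_needed` are hypothetical names, and the keys simply reuse the reference numerals from the text.

```python
# Each image maps to the (reference image, prediction kind) pairs given for Figure 1.
FIG1_REFS = {
    "Ib 155": [],
    "Pb 170": [("Ib 155", "temporal")],
    "Bb1 160": [("Ib 155", "temporal"), ("Pb 170", "temporal")],
    "Bb2 165": [("Ib 155", "temporal"), ("Pb 170", "temporal")],
    "Pe 105": [("Ib 155", "disparity")],
    "Be1 110": [("Pe 105", "temporal"), ("Bb1 160", "disparity")],
    "Be2 115": [("Be1 110", "temporal"), ("Bb2 165", "disparity")],
    "Be3 120": [("Be2 115", "temporal"), ("Pb 170", "disparity")],
}


def pictures_needed(pic, refs=FIG1_REFS):
    """Return every image that must be decoded before `pic` (transitive closure)."""
    needed, stack = set(), [pic]
    while stack:
        for ref, _kind in refs[stack.pop()]:
            if ref not in needed:
                needed.add(ref)
                stack.append(ref)
    return needed


print(sorted(pictures_needed("Be1 110")))
```

Under this table, every enhancement layer image ultimately depends on Ib 155, which is why an error in a base layer I image can affect both views.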
Generally, the base layer in the MPEG MVP system is coded according to the Main Profile (MP) protocol, while the enhancement layer is coded according to the MPEG-2 Temporal Scalability tools. For fixed-bandwidth stereoscopic video services, the output bit stream comprising the multiplex of the base and enhancement layers must not exceed a given bit rate or corresponding bandwidth. This result can be achieved with separate rate control schemes in the base and enhancement layers, such that the bit rate for each layer does not exceed a given threshold and the sum of the two bit rates satisfies the overall bandwidth requirement. Alternatively, the bit rate in each layer can be allowed to vary as long as the combined bit rate meets the overall bandwidth requirement.

In addition, the rate control scheme must also provide a relatively constant video signal quality over all image types (e.g., I, P and B images) in the enhancement layer and conform to the Video Buffering Verifier (VBV) model in the MPEG MVP system. The VBV is a hypothetical decoder that is conceptually connected to the output of an encoder. Coded data is placed in the buffer at the constant bit rate that is being used, and is removed according to which data has been in the buffer for the longest period of time. The bit stream produced by an encoder or editor is required not to cause the VBV to either overflow or underflow.

With conventional systems, the quality of a P image in the enhancement layer may vary depending on whether it is temporally predicted or disparity predicted. For example, for a scene with the cameras panning to the right, at a constant quantization level, a P image temporally predicted from a B image in the enhancement layer may have a lower quality than if it were disparity predicted from an I image in the base layer.
This is because, as mentioned, B images yield the most compression but also incorporate the most error. In contrast, the quality of a base layer P image is maintained, since a B image may not be used as a reference image in the base layer. The image quality of the P image corresponds to the average quantization step size of the P-image data.

In addition, editing operations such as fast forward and rewind can be carried out at a decoder terminal in response to commands provided by a consumer. Such editing operations may result in an encoding error, since the group of pictures (GOP) or refresh period frames may be different in the base and enhancement layers, and their respective starting points may be temporally offset. The GOP consists of one or more consecutive images. The order in which the images are displayed usually differs from the order in which the coded versions appear in the bit stream. In the bit stream, the first image in a GOP is always an I image. However, in display order, the first image in a GOP is either an I image or the first B image of the consecutive series of B images which immediately precedes the first I image. Also, in display order, the last image in a GOP is always an I or P image.

In addition, a GOP header is used immediately before a coded I frame in the bit stream to indicate to the decoder whether the first consecutive B images immediately following the coded I frame in the bit stream can be properly reconstructed in the case of a random access, where the I frame is not available for use as a reference frame. Even when the I frame is not available, the B images can possibly be reconstructed using only backward prediction from a subsequent I or P frame.
When it is required to display a frame that does not immediately follow the GOP header, such as during editing operations, the synchronization between the base and enhancement layer frames can be destroyed. This can result in a discontinuity that leads to a frame freeze-up or other impairment in the resulting video image.

Accordingly, it would be advantageous to provide a rate control scheme for a stereoscopic video system such as the MPEG MVP system that adjusts the quantization level of P images in the enhancement layer depending on whether the image is being temporally or disparity predicted. The scheme should also account for the complexity level of the coded image and the reference frame. The scheme should also account for data rate requirements during potential editing operations while providing a uniform image quality and avoiding frame freeze-up. The present invention provides the above and other advantages.

SUMMARY OF THE INVENTION

In accordance with the present invention, a rate control method and apparatus are presented for use at an encoder on the transmitter side of a stereoscopic digital video communication system for modifying the quantization level of P- or B-frame data in the enhancement layer depending on whether the frame is temporally predicted (from the same layer) or disparity predicted (from the opposite layer). The invention can maintain a consistent image quality by providing additional quantization bits for disparity-predicted P images, for example, where a P frame can be encoded from a B frame in the base layer. The selected quantization level corresponds to an overall bit rate requirement of the enhancement layer, right_bit_rate, and a virtual buffer fullness parameter, Vr.

In addition, in many applications, it is necessary to re-encode decoded data for editing modes such as fast forward and fast rewind.
According to the present invention, when the reference frame in the base layer is the first frame of a GOP, the corresponding enhancement layer frame will be encoded as an I or P frame to improve image quality and eliminate error propagation during such potential editing modes. For example, if the enhancement layer frame in question were to be coded as a B frame using a conventional image distribution scheme, the image type would be switched instead to a P or I image. In addition, the rate control calculations at the transmitter will account for this possibility by reducing the bits allocated to the current image in the enhancement layer to avoid a possible overflow of the virtual buffer at the encoder.

Furthermore, for disparity-predicted P frames, the quantization step size is modified according to the activity level of the frame being encoded in the enhancement layer, or of the reference frame in the base layer, whichever is greater.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows the conventional temporal and disparity prediction scheme of the MPEG MVP system.
Figure 2 shows the initial GOP or refresh period subroutine according to the present invention.
Figure 3 shows the image layer timing for the enhancement layer sequence according to the present invention.
Figure 4 shows the subroutine for pre-processing of the current image according to the present invention.
Figure 5 shows the subroutine for post-processing of the previous image according to the present invention.
Figure 6 shows an image distribution configuration according to the present invention.
DETAILED DESCRIPTION OF THE INVENTION

The rate control method of the present invention includes seven procedures: parameter initialization, initialization for the enhancement layer, initialization and update of the refresh period or group of pictures (GOP), pre-processing of the current image, post-processing of the previous image, macro-block task and slice task rate control processing, and adaptive quantization processing.

The parameters that are initialized for later use include the minimum allocated number of bits, Trmin, for the frames corresponding to a GOP or a refresh period of the enhancement layer. Trmin is determined from

Trmin = right_bit_rate / (8 * picture_rate)

where right_bit_rate is the maximum allocated bit rate for the enhancement layer, and picture_rate is the picture rate of the stereoscopic signal, for example, 30 pictures/second for NTSC video and 25 pictures/second for PAL video.

In addition, an initial complexity value Kx is assigned to the current image in the enhancement layer. The quantization level selected for the current image corresponds to the complexity level, such that a smaller quantization step size is used with a more complex image, thereby yielding more coded data bits. The initial complexity level is assigned depending on the type of image. An I image is used as a reference image for random access and therefore must be quantized in relatively small steps. Thus, an I image has a relatively higher complexity level. P and B images are assigned a lower initial complexity value, and are therefore quantized more coarsely. In addition, the complexity of a given image can be determined either in the spatial domain or in the transform domain. Representative values are KxI = 1.39, KxDP = 0.52, KxTP = 0.37 and KxB = 0.37, where the subscript "I" denotes an I image, "DP" denotes a disparity-predicted P image, "TP" denotes a temporally predicted P image, and "B" denotes a B image. Additionally, the terms PD and PT will be used herein to indicate, respectively, a disparity-predicted P image and a temporally predicted P image. The complexity parameters should satisfy the relationship KxI > KxDP > KxTP >= KxB.

For a given type of image, the complexity value Kx is adjustable. A highly complex image will have larger variations in pixel luminance or chrominance values, for example. In order to maintain a given image quality (e.g., resolution), the highly complex image must be encoded using additional bits compared with a less complex image. Accordingly, the complexity value of a given image can be increased or decreased, respectively, if the image is more or less complex than other images of the same type.

KrDP, KrTP and KrB are initial virtual buffer fullness parameters for the predictive-coded frames (e.g., PD, PT and B frames) in the enhancement layer. For example, KrTP = KrB = 1.4 is suitable. These parameters are adjustable and should satisfy KrDP < KrTP <= KrB. XrDP, XrTP and XrB are the complexities for PD, PT and B images, respectively, and are initially determined from the complexity parameters KxDP, KxTP and KxB, respectively. Specifically, using the bit rate allocated to the enhancement layer, right_bit_rate, the desired bit rate for I images is XrI = KxI * right_bit_rate. For PD images, XrDP = KxDP * right_bit_rate. For PT images, XrTP = KxTP * right_bit_rate. For B images, XrB = KxB * right_bit_rate.

In addition, in the case that there are no I frames in the enhancement layer, the bit rate allocated to the disparity-predicted P images can be increased by the term XrI / NI, where NI = max{Nr / GOP_length_of_left_view, 1}, Nr is the refresh period of the enhancement layer, as mentioned, and GOP_length_of_left_view is the number of frames in a group of pictures in the base layer. In this case, XrDP = KxDP * right_bit_rate + XrI / NI. Regarding GOP_length_of_left_view, consider a conventional image distribution scheme in the base layer of I1, B1, B2, P1, B3, B4, I2, B5, B6, P2, B7, B8. In this case, GOP_length_of_left_view = 12.

Next, the current image type in the enhancement layer is determined. If the current image is an I image, the virtual buffer fullness level is VrI = 10 * RPr / 31. If the current image is a disparity-predicted P image, the virtual buffer fullness level is VrDP = 10 * RPr * KrDP / 31. For a temporally predicted P image, the virtual buffer fullness level is VrTP = 10 * RPr * KrTP / 31. For a B image, the virtual buffer fullness level is VrB = VrTP = 10 * RPr * KrB / 31, since KrTP = KrB. The reaction parameter RPr is defined as RPr = 2 * right_bit_rate / picture_rate.

The initialization for the enhancement layer will now be described. Figure 2 shows the initial GOP or refresh period subroutine according to the present invention. The routine starts at block 200.
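Before turning to the blocks of the Figure 2 subroutine, the parameter initialization described above can be summarized in a short sketch. Several points are assumptions rather than statements of the patent: the divisor 8 in Trmin and the I-image fullness 10 * RPr / 31 follow the convention of the MPEG-2 Test Model 5 rate control, the function and dictionary names are hypothetical, and the value KrDP = 1.0 used in the example call is purely illustrative.

```python
# Representative initial complexity weights quoted in the text.
KX = {"I": 1.39, "DP": 0.52, "TP": 0.37, "B": 0.37}


def init_rate_params(right_bit_rate, picture_rate, kr, n_r, gop_length_left,
                     has_i_frames=True, t_min_divisor=8.0):
    """Compute Trmin, the complexities Xr, RPr and the buffer fullness levels Vr.

    kr maps "DP", "TP" and "B" to the fullness parameters KrDP, KrTP and KrB.
    t_min_divisor is an assumption (Test Model 5 style), exposed as a parameter.
    """
    t_rmin = right_bit_rate / (t_min_divisor * picture_rate)

    # Xr = Kx * right_bit_rate for each image type.
    x = {t: k * right_bit_rate for t, k in KX.items()}
    if not has_i_frames:
        # With no I frames, the PD allocation is increased by XrI / NI.
        n_i = max(n_r / gop_length_left, 1)
        x["DP"] += x["I"] / n_i

    rp_r = 2 * right_bit_rate / picture_rate      # reaction parameter RPr
    v = {"I": 10 * rp_r / 31}                     # assumed form for I images
    for t in ("DP", "TP", "B"):
        v[t] = 10 * rp_r * kr[t] / 31
    return {"T_rmin": t_rmin, "X": x, "RP_r": rp_r, "V": v}


params = init_rate_params(3_000_000, 30, {"DP": 1.0, "TP": 1.4, "B": 1.4},
                          n_r=12, gop_length_left=12, has_i_frames=False)
print(params["T_rmin"], params["RP_r"], params["V"])
```

Keeping the divisor and the Kr values as arguments reflects the text's remark that these parameters are adjustable.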
At block 210, the values Nr, Nl and Ml are retrieved, where the subscript "l" denotes the left view. Nr is the number of images (e.g., length) in the refresh period or GOP of the enhancement layer (right view). Nl is the GOP length of the base layer (left view), and Ml denotes the configuration of image types in the base layer. Specifically, for Ml = 1, the base layer has only I and P images. For Ml = 2, the base layer has I, P and B images, with one B image between I or P images. For Ml = 3, the base layer has I, P and B images, with two consecutive B images between I or P images.

At block 220, the initial value of the number of I, P and B images in the refresh period or GOP of the enhancement layer is determined. NrI is the number of I frames, NrDP is the number of disparity-predicted P frames, NrTP is the number of temporally predicted P frames, and NrB is the number of B frames.

At block 230, the initial value of the remaining number of bits, Gr, in the refresh period or GOP of the enhancement layer is determined from Gr = (right_bit_rate / picture_rate) * Nr.

At block 240, Rr, the remaining number of bits available for coding the remaining images in the refresh period or GOP, is retrieved. Rr is a running balance that is updated after each image is encoded in the enhancement layer. The initial value of the remaining number of bits is Rr = 0. At block 250, Rr is updated as Rr = Rr + Gr. At block 260, other parameters are initialized as previously discussed, including Trmin, Kx, Kr, Xr and Vr. At block 270 the routine ends.

The initialization and update of the refresh period or group of pictures (GOP) will now be described. In the base layer, the refresh period is the interval between successive I images in the sequence of encoded video frames, and defines the same images as the GOP. In the enhancement layer, the refresh period is the interval between successive I images, if present, or between two pre-assigned disparity-predicted P images (e.g., PD images).
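Blocks 220 to 250 of the Figure 2 subroutine can be sketched as follows. The function names and the string tags "I", "DP", "TP" and "B" are hypothetical, and the way the image types are listed for counting is an assumption, since the text does not say how the counts of block 220 are obtained.

```python
from collections import Counter


def count_image_types(types):
    """Block 220: count the I, PD, PT and B images in one refresh period or GOP."""
    counts = Counter(types)
    return {t: counts.get(t, 0) for t in ("I", "DP", "TP", "B")}


def start_refresh_period(r_r, right_bit_rate, picture_rate, n_r):
    """Blocks 230 to 250: compute Gr and add it to the running balance Rr."""
    g_r = right_bit_rate / picture_rate * n_r   # Gr = (right_bit_rate / picture_rate) * Nr
    return r_r + g_r                            # Rr = Rr + Gr


period = ["DP", "B", "DP", "TP", "B", "B", "TP", "B", "DP", "TP", "B", "B"]
print(count_image_types(period))
print(start_refresh_period(0, 3_000_000, 30, len(period)))
```

Because Rr is a running balance, any bits left over (or overspent) in one refresh period carry into the next call rather than being reset to zero.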
Here, "pre-assigned" means that the image type is set in the enhancement layer before the base layer configuration is examined. According to the present invention, a pre-assigned image type can be switched to another image type before encoding. In the base layer, and in the enhancement layer when I frames are used, the GOP header immediately precedes a coded I frame in the packetized video bit stream to indicate whether the first consecutive B images immediately following the coded I frame can be properly reconstructed in the case of a random access. This situation may arise, for example, during editing of a sequence of video frames at a decoder. When I frames are not used in the enhancement layer, there is accordingly no GOP. In addition, the GOP or refresh period used in the base and enhancement layers will typically have temporally offset starting and ending points. That is, the first frame of a GOP in the base layer will not necessarily coincide with the first frame of the refresh period of the enhancement layer. Similarly, the length of the GOP or refresh period (e.g., number of frames) will also typically vary between the base and enhancement layers.

The fact that the base and enhancement layers can be offset and have different lengths can cause problems during editing modes such as fast forward or fast rewind. In fact, editing operations can result in loss of the enhancement layer or other visual impairment. Protocols such as MPEG-2 provide a syntactical hierarchy in the coded bit stream that allows such editing functions. For example, the bit stream can be coded with various access points that allow processing and editing of corresponding portions of the base layer without the need to decode the entire video. However, such access points in the base layer do not necessarily correspond to acceptable access points in the enhancement layer.
For example, an access point is usually provided in the base layer where an I image is located. Since an I frame provides a self-contained video frame image, subsequent frames in the base layer can be predicted using the I image. However, the I frame in the base layer may coincide with a B frame in the enhancement layer. In this case, subsequent images cannot be accurately predicted from the B frame in the enhancement layer, since a B frame does not contain the data of a full video frame.

According to the present invention, an enhancement layer image that is pre-assigned as a B image is encoded instead as a PD image when it is determined that the image coincides with the first I image of a base layer GOP. That is, the image type is switched. Thus, in the event that random access is required in the base layer, the corresponding P image in the enhancement layer can be disparity predicted using the I frame in the base layer in order to provide the information required to reconstruct the enhancement layer image. Alternatively, the enhancement layer image can be encoded as an I image if sufficient bits are available, thereby providing synchronized random access for both the base and enhancement layers.

Also, at the decoder, errors can propagate in frames that are predicted from other frames, due to quantization and other errors. Thus, it is necessary to periodically provide a new frame that is self-contained and does not depend on any other frame (such as an I frame in the base layer), or that is directly predicted from an I frame (such as a disparity-predicted P frame in the enhancement layer). When such a frame is provided, the data stream is said to be refreshed, since propagated errors are eliminated or reduced and a new baseline is established.
For example, with a frame rate of 30 frames/second and with every eighth image in the base layer being an I image, the refresh period is 8/30 second. The frames relating to a GOP header are said to span the refresh period.

Figure 3 shows the image layer timing for the enhancement layer sequence according to the present invention. The enhancement layer includes the sequence of frames I1, B1, PD1, PT1, B3, B4, PT2, B5, PD2, PT3, B7, B8 in the full GOP shown. The images PD1 and PD2 have replaced B2 and B6 (not shown), respectively. A reset signal 310 denotes the start of the coding sequence with a pulse 315. A pulse sync signal (PSYNC) 320 provides a train of pulses. Pulse 325 denotes the last frame (shown as a B image) in the previous refresh period or GOP. Pulse 330 denotes the first frame in the next refresh period or GOP. As mentioned, the GOP is defined when I images are used in the enhancement layer. Otherwise, the refresh period defines the set of images that are to be coded with the allocated number of bits. The images of the refresh period are thus still grouped even when there is no GOP. Pulses 330 to 390 correspond to the images I1, B1, PD1, PT1, B3, B4, PT2, B5, B6, PT3, B7, B8, I2, respectively. Pulse 390 indicates the start of another group of pictures or refresh period in the enhancement layer.

In the example shown, the first frame to be encoded in the GOP or refresh period is either an I image or a PD image, respectively. The next frame, denoted by pulse 335, is a B image. However, according to the present invention, the next frame, denoted by pulse 340, has been switched from a B image (e.g., B2) to a PD image. Similarly, PD2, denoted by pulse 370, has replaced another B image, B6 (not shown). The last two frames of the GOP or refresh period are B7 and B8, as indicated by pulses 380 and 385, respectively.
Following the reset signal pulse 317, another refresh period or GOP begins, as indicated by pulse 390, with another I picture or PD picture. The pulse 395 denotes the first B picture of this GOP, and so on.
Additionally, each of the pulses 330-385 indicates the start of post-processing of the previous picture and of pre-processing of the current picture. For example, suppose there are no I pictures in the enhancement layer. Then the pulse 330 indicates that pre-processing of the current frame, which is to be encoded as a PD picture, begins. At the same time, post-processing of the B picture indicated by pulse 325 begins. Similarly, the pulse 335 indicates that pre-processing of the current frame, B1, begins and that post-processing of the PD picture indicated by pulse 330 begins. The pre-processing and post-processing steps will now be described.
Figure 4 shows the subroutine for pre-processing the current picture according to the present invention. The routine starts at block 400. In block 405, the parameters Rr, Trmin, NrI, NrDP, NrTP, NrB, KrDP, KrTP, KrB, XrDP, XrB and XrTP are retrieved. Rr is the remaining number of bits that can be allocated to the frames of a GOP or refresh period of the enhancement layer. Trmin is the minimum number of bits allocated to each frame. NrI, NrDP, NrTP and NrB are the numbers of I, PD, PT and B pictures, respectively, provided in a GOP or refresh period of the enhancement layer according to the present invention. In a stereoscopic video signal, if the coded frame of the base layer is the first frame of a GOP, the corresponding frame in the enhancement layer must be encoded as either an I frame or a P frame, with the base-layer frame as the reference frame. This must also be taken into account in the rate control calculations to ensure that the refresh period is configured correctly.
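As a small illustration of how the counts NrI, NrDP, NrTP and NrB relate to a picture sequence, the sketch below tallies them for the Figure 3 enhancement-layer sequence. The label format ("PD1", "B3", ...) and the function name are assumptions for illustration only.

```python
import re
from collections import Counter


def count_picture_types(sequence):
    """Tally I, PD, PT and B pictures in a list of labels such as "PD1"."""
    # The leading letters of each label give the picture type.
    counts = Counter(re.match(r"[A-Z]+", label).group(0) for label in sequence)
    return {
        "NrI": counts["I"],
        "NrDP": counts["PD"],
        "NrTP": counts["PT"],
        "NrB": counts["B"],
    }


figure3_gop = ["I1", "B1", "PD1", "PT1", "B3", "B4",
               "PT2", "B5", "PD2", "PT3", "B7", "B8"]
print(count_picture_types(figure3_gop))
```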
For example, Figure 6 shows a picture distribution configuration according to the present invention. It should be understood that the example shown is only one of many possible picture distribution configurations. The pictures 602 to 626 are in the enhancement layer 600, and the pictures 652 to 676 are in the base layer. The picture type is indicated in each picture. Where used, the subscript "e" denotes the enhancement layer, the subscript "b" denotes the base layer, and the numerical subscript is a sequential indicator. For example, the picture Be4 616 is the fourth B picture among the enhancement-layer pictures shown. PD and PT indicate, respectively, a disparity-predicted P picture and a temporally predicted P picture. Note that the pictures are shown in the order in which they are transmitted in the bitstream, which usually differs from the display order.
The arrows that point to pictures in the enhancement layer indicate the type of coding used for each picture. A solid arrow indicates that the picture pointed to is encoded using the picture at the tail of the arrow as a reference picture. For example, Be1 604 is encoded using both Ie1 602 in the enhancement layer and Bb2 654 in the base layer. A dotted arrow indicates an optional coding alternative. For example, the picture 608 may be encoded using the picture PD1 606 in the enhancement layer, in which case the picture is PT1, or using the picture Bb3 658 in the base layer, in which case the picture is a PD picture. In accordance with the present invention, the option that meets a particular criterion can be selected. This criterion may reflect a reduced prediction error, or a desired bit allocation or picture quality, for example. In either case, the rate control scheme of the present invention responds to the selected picture type.
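One way to picture the optional coding alternative is a selection based on prediction error. The sketch below is only an assumption-laden illustration: the text names "a reduced prediction error" as one possible criterion but does not say how the error is measured, so the sum of absolute differences (SAD) over flat sample lists is a stand-in chosen here.

```python
def sad(current, reference):
    """Sum of absolute differences between two equal-length sample lists."""
    return sum(abs(c - r) for c, r in zip(current, reference))


def choose_p_type(current, temporal_ref, disparity_ref):
    """Pick "PT" or "PD" for a P picture, whichever reference predicts it better.

    temporal_ref  -- samples of the reference in the enhancement layer
    disparity_ref -- samples of the reference in the base layer
    Ties go to temporal prediction (an arbitrary choice for this sketch).
    """
    temporal_error = sad(current, temporal_ref)
    disparity_error = sad(current, disparity_ref)
    if temporal_error <= disparity_error:
        return "PT", temporal_error
    return "PD", disparity_error


print(choose_p_type([10, 20, 30], [10, 22, 30], [0, 0, 0]))
```

Whatever criterion is used, the rate control only needs the resulting picture type, since the bit allocation and activity handling differ between PT and PD pictures.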
Note also that although a P picture has only one reference frame, a B picture will typically have macroblocks predicted from frames in both layers in an averaging process. For example, Be3 612 is predicted from both Be2 610 and Pb1 662. Prediction modes in the base layer are not shown, as they are conventional.
In the picture distribution and prediction mode configuration shown in Figure 6, the enhancement layer includes the I pictures Ie1 602 and Ie2 626. Thus, a GOP of the enhancement layer includes the twelve pictures 602 to 624. Another GOP of the enhancement layer starts at Ie2 626, but is not shown in full. A GOP of the base layer includes the pictures Ib1 656 to Bb6 666. Another GOP of the base layer starts at the picture Ib2 668, but is not shown in full. Note that the twelve pictures Bb1 652 to Pb2 674 in the base layer correspond to the GOP of the enhancement layer. In this base-layer sequence of twelve pictures, there are two I pictures, each at the start of a base-layer GOP.
According to the present invention, the enhancement-layer pictures that correspond to these base-layer I pictures are switched to a different picture type for coding. Specifically, the picture PD1 606, which corresponds to the picture Ib1 656 in the base layer, has been switched. Similarly, the picture PD2 618, which corresponds to the picture Ib2 668 in the base layer, has been switched. Namely, in a conventional picture distribution scheme, PD1 606 and PD2 618 would be B pictures. In an alternative embodiment, the enhancement-layer pictures that correspond to the GOP-start pictures in the base layer can be switched to I pictures. Switching a B picture to either a P picture or an I picture in the enhancement layer in this manner provides advantages during editing modes, when random access to the base and enhancement layers may be required.
Thus, in the example of Figure 6, NrI = 1, NrDP = 2, NrTP = 3 and NrB = 6 in the GOP of the enhancement layer that spans the pictures Ie1 602 to Be6 624. Additionally, Nr = 12, since there are twelve pictures in the GOP of the enhancement (right) layer; Nl = 6, since there are six pictures in the GOP of the base (left) layer; and Ml = 3, since there are two consecutive B pictures between I or P pictures in the base layer.
Referring now to Figure 4, in block 410 the current picture type in the enhancement layer is determined. Depending on the picture type, one of four different branches in Figure 4 is followed. If the current picture is an I picture, the virtual buffer fullness level VrI is determined in block 415. If the current picture is a P picture, the type of P picture is determined in block 412. For a disparity-predicted P picture, the virtual buffer fullness level VrDP is determined in block 435. For a temporally predicted P picture, the virtual buffer fullness level VrTP is determined in block 455. For a B picture, the virtual buffer fullness level VrB is determined in block 475.
Next, the current picture being encoded in the enhancement layer is pre-processed to determine a "target" bit allocation, Tr, which is the estimated number of bits available to encode the next picture. Bit allocation is performed over the number of frames defined by the GOP or refresh period. Accordingly, it is also necessary to know how many frames, and of which types, make up the GOP or refresh period. In particular, when the frame being coded is an I picture, in block 420,
$$Tr_{I} = \max\left\{ \frac{Rr}{Nr_{I} + \dfrac{Nr_{DP}\,Xr_{DP}}{Kr_{DP}\,Xr_{I}} + \dfrac{Nr_{TP}\,Xr_{TP}}{Kr_{TP}\,Xr_{I}} + \dfrac{Nr_{B}\,Xr_{B}}{Kr_{B}\,Xr_{I}}},\ Tr_{min} \right\}.$$
For a disparity-predicted P picture, in block 440,
$$Tr_{DP} = \max\left\{ \frac{Rr}{Nr_{DP} + \dfrac{Kr_{DP}\,Nr_{I}\,Xr_{I}}{Xr_{DP}} + \dfrac{Kr_{DP}\,Nr_{TP}\,Xr_{TP}}{Kr_{TP}\,Xr_{DP}} + \dfrac{Kr_{DP}\,Nr_{B}\,Xr_{B}}{Kr_{B}\,Xr_{DP}}},\ Tr_{min} \right\}.$$
For a temporally predicted P picture, in block 460,
$$Tr_{TP} = \max\left\{ \frac{Rr}{Nr_{TP} + \dfrac{Kr_{TP}\,Nr_{I}\,Xr_{I}}{Xr_{TP}} + \dfrac{Kr_{TP}\,Nr_{DP}\,Xr_{DP}}{Kr_{DP}\,Xr_{TP}} + \dfrac{Kr_{TP}\,Nr_{B}\,Xr_{B}}{Kr_{B}\,Xr_{TP}}},\ Tr_{min} \right\}.$$
For a B picture, in block 480,
$$Tr_{B} = \max\left\{ \frac{Rr}{Nr_{B} + \dfrac{Kr_{B}\,Nr_{I}\,Xr_{I}}{Xr_{B}} + \dfrac{Kr_{B}\,Nr_{DP}\,Xr_{DP}}{Kr_{DP}\,Xr_{B}} + \dfrac{Kr_{B}\,Nr_{TP}\,Xr_{TP}}{Kr_{TP}\,Xr_{B}}},\ Tr_{min} \right\}.$$
When the current frame is encoded as a certain picture type, the number of remaining pictures of that type needed in the enhancement layer can be reduced by one. Thus, for I pictures, in block 425, NrI is decremented by one and stored. Corresponding actions occur for NrDP, NrTP and NrB in blocks 445, 465 and 485, respectively.
If the current picture is a PD picture, then, according to the present invention, a new average activity level avg_act" is defined in block 450. avg_act denotes the average activity of the previous frame in the enhancement layer, and can be determined either in the spatial domain, as in the MPEG Test Model 5 system, or in the transform domain, as in some MPEG-2 systems. Additional details of Test Model 5 can be found in document ISO/IEC JTC1/SC29/WG11, AVC-491, Version 1, entitled "Test Model 5", April 1993, incorporated herein by reference.
Conventionally, the quantization level of a frame being encoded is determined on the basis of the activity level of the reference frame only. However, this can result in reduced picture quality if the current frame has a higher activity level than the reference frame. For a PD picture, the reference frame is found in the base (left) layer, with an average activity of avg_act_l. According to the present invention, for PD pictures, the maximum of the average activity levels of the previous frame and the reference frame is used. Thus, the new average activity level is avg_act" = max{avg_act, avg_act_l}.
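The four target bit allocations for blocks 420, 440, 460 and 480 share one pattern: the remaining bits Rr are divided by the number of remaining pictures of the current type plus the complexity-weighted counts of the other types. The sketch below expresses that pattern as a single function. The dictionary-based interface, the function name, and the convention that the weighting constant for I pictures equals 1 are assumptions of this sketch; the numeric values in the usage lines are made up for illustration.

```python
PICTURE_TYPES = ("I", "DP", "TP", "B")


def target_bits(current, Rr, N, X, K, Tr_min):
    """Target bit allocation Tr for a picture of type `current`.

    N[t] -- pictures of type t still to be coded in the GOP or refresh
            period (the current picture included)
    X[t] -- global complexity of type t
    K[t] -- weighting constant of type t (K["I"] = 1 is assumed here)
    """
    denominator = float(N[current])
    for other in PICTURE_TYPES:
        if other != current:
            # Each other type contributes its count, scaled by relative
            # complexity and by the ratio of the weighting constants.
            denominator += K[current] * N[other] * X[other] / (K[other] * X[current])
    return max(Rr / denominator, Tr_min)


# Illustrative values only: Figure 6 counts, equal complexities, unit weights.
N = {"I": 1, "DP": 2, "TP": 3, "B": 6}
X = {t: 100.0 for t in PICTURE_TYPES}
K = {t: 1.0 for t in PICTURE_TYPES}
print(target_bits("DP", 1_200_000, N, X, K, Tr_min=1_000))
```

With equal complexities and unit weights, every type receives Rr divided by the total number of remaining pictures, which is a convenient sanity check on the formula.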
Alternatively, when the system has frame buffers, the avg_act of a frame to be encoded in the enhancement layer can be pre-calculated and stored. That is, for the current frame being encoded, the average activity can be calculated from the current picture itself. For I, PT and B pictures, in blocks 430, 470 and 487, respectively, the average activity level is avg_act" = avg_act.
In block 489, the current picture type is stored for later retrieval in the post-processing of the previous picture. In block 490, a determination is made as to whether linear or non-linear quantization is used. For linear quantization, the initial quantization step size used to quantize the scaled DC and AC coefficients in the current frame is derived from the macroblock quantization parameter, MQUANT, which is determined in block 492 as
$$MQUANT = \max\left\{2,\ \min\left\{\frac{Vr \cdot 62}{RPr},\ 62\right\}\right\}.$$
As discussed, Vr is the virtual buffer fullness level, and RPr is the reaction parameter. For a non-linear quantization scale, in block 494,
$$MQUANT = \max\left\{1,\ \min\left\{\mathit{non\_linear\_mquant\_table}\left[\frac{Vr \cdot 31}{RPr}\right],\ 112\right\}\right\}$$
where non_linear_mquant_table is the output of a lookup table with an input of Vr * 31 / RPr. The routine ends in block 496.
Figure 5 shows the subroutine for post-processing the previous picture according to the present invention. In block 505, the parameters Rr, MBr, Sr, Tr, TQr and Vr are retrieved. Rr is the remaining number of bits that can be allocated to the frames of a GOP or refresh period of the enhancement layer after the current frame has been coded. MBr is the number of macroblocks in a frame. Sr is the number of bits in the previous picture, and does not include stuffing bits, which are dummy bits inserted before a start code in the data stream. Tr is the number of bits allocated for coding the current picture. TQr is the accumulation of MQUANT for the previous picture. Vr is the virtual buffer fullness level.
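The initial MQUANT computation of blocks 492 and 494 amounts to scaling the buffer fullness by the reaction parameter and clamping the result. The following sketch assumes floating-point arithmetic without rounding (the text does not specify rounding) and takes the non-linear lookup as a caller-supplied function, because the contents of non_linear_mquant_table are not given here; the identity lookup in the usage line is a stand-in, not the real table.

```python
def initial_mquant_linear(Vr, RPr):
    """Linear scale: clamp Vr * 62 / RPr to the range [2, 62]."""
    return max(2, min(Vr * 62 / RPr, 62))


def initial_mquant_nonlinear(Vr, RPr, lookup):
    """Non-linear scale: clamp lookup(Vr * 31 / RPr) to the range [1, 112].

    `lookup` stands in for non_linear_mquant_table (hypothetical interface).
    """
    return max(1, min(lookup(Vr * 31 / RPr), 112))


print(initial_mquant_linear(100, 620))
print(initial_mquant_nonlinear(200, 620, lookup=lambda x: x))  # stand-in table
```

A nearly empty virtual buffer therefore yields the finest permitted quantizer, and a very full one saturates at the coarsest.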
In block 510, the average quantization parameter, Qr, is calculated. If the next picture is a PD picture, QrDP = TQrDP / MBr. Otherwise, Qr = TQr / MBr.
In block 515, a global complexity level, Xr, is determined. If the next picture is a PD picture, the global complexity is Xr = SrDP * QrDP. Otherwise, Xr = Sr * Qr. In block 520, the virtual buffer fullness level, Vr, is updated by taking the previous buffer fullness level, adding the number of bits in the previous picture, Sr, and subtracting the number of bits allocated to the current picture, Tr.
In block 525, the previous picture type is retrieved. If the previous picture in the enhancement layer is an I picture, XrI and VrI are set and stored in block 535. If the previous picture is a PD picture, as determined in block 555, XrDP and VrDP are set and stored in block 545. If the previous picture is a PT picture, as determined in block 555, XrTP and VrTP are set and stored in block 560. If the previous picture is a B picture, XrB and VrB are set and stored in block 570. Next, the average activity is calculated and stored for the I, PD, PT and B pictures in blocks 540, 550, 565 and 575, respectively, as discussed in connection with blocks 430, 450, 470 and 487 of Figure 4. Next, in block 580, the remaining number of bits that can be allocated to the frames of the GOP or refresh period of the enhancement layer is updated by subtracting the number of bits in the previous picture in the enhancement layer, Sr. The routine ends in block 585.
Rate control processing at the macroblock level and at the slice level will now be described. In the MPEG-2 system, rate control is based in part on the macroblock level and the slice level of a video frame. For example, with the NTSC format, a video frame can be divided into 30 slices, each of which has forty-four macroblocks. Thus, an entire NTSC frame comprises 1,320 macroblocks. With the PAL format, there are 1,584 macroblocks.
For macroblock-based rate control, let Bm(j) represent the number of bits in the j-th macroblock in the current picture, for j = 1 to 1,320.
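The post-processing updates of blocks 510 to 580 can be collected into one short sketch. The function name and the returned dictionary are illustrative assumptions, and the numeric values in the usage lines are made up; only the four update rules come from the description.

```python
def postprocess_previous_picture(Rr, Vr, Sr, Tr, TQr, MBr):
    """Apply the post-processing updates for one coded picture.

    Rr  -- bits remaining for the GOP or refresh period
    Vr  -- virtual buffer fullness before the update
    Sr  -- bits produced by the previous picture (no stuffing bits)
    Tr  -- bits that had been allocated
    TQr -- accumulated MQUANT over the picture
    MBr -- macroblocks per frame
    """
    Qr = TQr / MBr        # block 510: average quantization parameter
    Xr = Sr * Qr          # block 515: global complexity
    Vr = Vr + Sr - Tr     # block 520: virtual buffer fullness
    Rr = Rr - Sr          # block 580: remaining bits
    return {"Qr": Qr, "Xr": Xr, "Vr": Vr, "Rr": Rr}


MB_PER_NTSC_FRAME = 30 * 44  # 30 slices of forty-four macroblocks each
print(postprocess_previous_picture(
    Rr=1_000_000, Vr=1_000, Sr=50_000, Tr=40_000,
    TQr=13_200, MBr=MB_PER_NTSC_FRAME))
```

In the scheme described, Xr and Vr would then be stored under the type of the picture just coded (I, PD, PT or B), so that each type keeps its own complexity and buffer state.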
ABm(j) is the cumulative number of bits up to the j-th macroblock in the current picture. MBr is the number of macroblocks in the picture. A macroblock virtual buffer discrepancy, d(j), is determined from
$$d(j) = Vr + ABm(j) - \frac{j \cdot Tr}{MBr}.$$
The reference quantization parameter for the j-th macroblock is
$$Q(j) = \frac{d(j) \cdot 31}{RPr}.$$
For slice-level rate control, Bs(j) is the number of bits in the j-th slice in the current picture, for j = 1 to 30. ABs(j) is the cumulative number of bits up to the j-th slice in the current picture. No_slice is the number of slices in the picture. The slice virtual buffer discrepancy is ds(j), where
$$ds(j) = Vr + ABs(j) - \frac{j \cdot Tr}{\mathit{No\_slice}}.$$
The reference quantization parameter for the j-th slice is
$$Qs(j) = \frac{ds(j) \cdot 31}{RPr}.$$
Adaptive quantization processing will now be discussed. First, the activity of the j-th macroblock, act(j), is calculated. If the current picture is a P picture in disparity prediction mode, the normalized activity of the j-th macroblock, N_act(j), is calculated as
$$\mathit{N\_act}(j) = \frac{2 \cdot act(j) + \mathit{avg\_act}''}{act(j) + 2 \cdot \mathit{avg\_act}''}.$$
For other types of pictures,
$$\mathit{N\_act}(j) = \frac{2 \cdot act(j) + \mathit{avg\_act}}{act(j) + 2 \cdot \mathit{avg\_act}}.$$
With macroblock-level rate control, the quantization step size for the j-th macroblock is calculated as follows. For a linear Q scale,
$$MQUANT(j) = \max\{2,\ \min\{Q(j) \cdot \mathit{N\_act}(j),\ 62\}\}.$$
For a non-linear quantization scale,
$$MQUANT(j) = \max\{1,\ \min\{\mathit{non\_linear\_mquant\_table}[Q(j) \cdot \mathit{N\_act}(j)],\ 112\}\}$$
where non_linear_mquant_table is the output of a lookup table with an input of Q(j) * N_act(j). With slice-level rate control, Q(j) is replaced by Qs(j), so that for a linear Q scale,
$$MQUANT(j) = \max\{2,\ \min\{Qs(j) \cdot \mathit{N\_act}(j),\ 62\}\}$$
and for a non-linear quantization scale,
$$MQUANT(j) = \max\{1,\ \min\{\mathit{non\_linear\_mquant\_table}[Qs(j) \cdot \mathit{N\_act}(j)],\ 112\}\}$$
where non_linear_mquant_table is the output of a lookup table with an input of Qs(j) * N_act(j).
Accordingly, it can be seen that the present invention provides a rate control scheme for a stereoscopic digital video communication system that modifies the quantization level of P or B frame data in the enhancement layer depending on whether the frame is temporally predicted (from the same layer) or disparity-predicted (from the opposite layer). In addition, the quantization step size is modified according to the activity level of the frame being encoded in the enhancement layer, or of the reference frame in the base layer, whichever is greater. Furthermore, picture quality is improved and frame freeze-up is avoided during editing modes by encoding the enhancement-layer frame as an I or P frame when the reference frame in the base layer is the first frame of a group of pictures (GOP).
Although the invention has been described in connection with various specific embodiments, those skilled in the art will appreciate that numerous adaptations and modifications may be made without departing from the spirit and scope of the invention as set forth in the claims.
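The macroblock-level and slice-level rate control and the adaptive quantization set out in the description can be sketched as follows for the linear Q scale. The function names are illustrative, floating-point arithmetic without rounding is an assumption, the subtraction of the pro-rated target j * Tr / MBr is taken from the layout of the buffer discrepancy expression, and the numbers in the usage lines are made up.

```python
def buffer_discrepancy(Vr, cumulative_bits, j, Tr, units):
    """d(j) or ds(j): buffer fullness plus bits spent so far, minus the
    share of the target Tr pro-rated over j of `units` macroblocks/slices."""
    return Vr + cumulative_bits - j * Tr / units


def reference_q(discrepancy, RPr):
    """Q(j) or Qs(j): discrepancy scaled by 31 over the reaction parameter."""
    return discrepancy * 31 / RPr


def pd_average_activity(avg_act, avg_act_l):
    """For PD pictures: the larger of the previous enhancement-layer frame's
    activity and the base-layer reference frame's activity."""
    return max(avg_act, avg_act_l)


def normalized_activity(act, avg):
    """N_act(j); equals 1 when the macroblock activity matches the average."""
    return (2 * act + avg) / (act + 2 * avg)


def mquant_linear(q_ref, n_act):
    """Linear Q scale: clamp Q(j) * N_act(j) to the range [2, 62]."""
    return max(2, min(q_ref * n_act, 62))


d = buffer_discrepancy(Vr=1_000, cumulative_bits=500, j=10, Tr=132_000, units=1_320)
q = reference_q(d, RPr=1_550)
avg = pd_average_activity(avg_act=2.0, avg_act_l=8.0)  # PD picture
print(mquant_linear(q, normalized_activity(8.0, avg)))
```

The same helpers serve slice-level control by passing the cumulative slice bits and the number of slices as `units`.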

Claims (22)

NOVELTY OF THE INVENTION
Having described the present invention, it is considered a novelty, and therefore what is described in the following claims is claimed as property.
1. A method for encoding successive pictures of video data in an enhancement layer of a stereoscopic digital data signal, said method comprising the steps of: providing a reference picture for use in encoding a current picture of a grouping of said pictures; wherein: if said reference picture is provided in said enhancement layer, at least a portion of said current picture is encoded using a first number of bits; and if said reference picture is provided in a base layer of said stereoscopic signal, said portion of said current picture is encoded using a second number of bits that is different from said first number of bits.
2. A method according to claim 1, characterized in that it further comprises the steps of: allocating an initial number of bits for encoding said grouping of said pictures; and maintaining a running total of the remaining bits available as each of said pictures in said grouping is encoded, wherein at least one of said first and second numbers of bits is modified according to said running total.
3. A method according to claim 1 or 2, characterized in that said first and second numbers of bits are determinative of respective first and second quantization step sizes for encoding said portion of said current picture.
4. A method according to one of the preceding claims, characterized in that said first and second numbers of bits are associated with a desired data rate of said stereoscopic signal.
5. A method according to one of the preceding claims, characterized in that said current picture is a disparity-predicted picture (PD picture), comprising the additional step of: increasing one of said first and second numbers of bits when there are no intra-coded pictures (I pictures) in said enhancement layer, relative to a case where there are intra-coded pictures (I pictures) in said enhancement layer.
6. A method according to one of claims 1 to 5, characterized in that said reference picture is provided in said base layer, comprising the additional steps of: determining a first activity level of at least a portion of a picture that precedes said current picture in said enhancement layer; determining a second activity level of at least a portion of said reference picture; and deriving one of said first and second numbers of bits from the higher of said first and second activity levels.
7. A method according to one of claims 1 to 5, characterized in that said reference picture is provided in said base layer, comprising the additional steps of: determining a first activity level of at least a portion of a picture that precedes said current picture in said enhancement layer; determining a second activity level of at least a portion of said reference picture; and deriving one of said first and second numbers of bits from an average of said first and second activity levels.
8. A method according to one of claims 1 to 5, characterized in that it comprises the additional steps of: pre-calculating and storing a value indicating a first activity level, which is the activity level of at least a portion of said current picture, before a time at which said current picture is encoded; retrieving said value for use in encoding said current picture; determining a second activity level, which is an activity level of at least a portion of said reference picture; and deriving the number of bits used to encode said current picture using the higher of said first and second activity levels.
9. A method according to one of claims 1 to 5, characterized in that it comprises the additional steps of: pre-calculating and storing a value indicating a first activity level, which is the activity level of at least a portion of said current picture, before a time at which said current picture is encoded; retrieving said value for use in encoding said current picture; determining a second activity level, which is an activity level of at least a portion of said reference picture; and deriving the number of bits used to encode said current picture using an average of said first and second activity levels.
10. A method for encoding successive pictures of video data in an enhancement layer of a stereoscopic digital data signal, said method comprising the step of: determining a grouping of said pictures of said enhancement layer; wherein: when a current picture of said grouping corresponds to a picture in a base layer of said stereoscopic signal which is the first picture of a group of pictures (GOP) of said base layer, said current picture is encoded as one of an intra-coded picture (I picture) and a predictive-coded picture (P picture).
11. A method according to claim 10, characterized in that: a first picture of said grouping of said pictures in said enhancement layer is temporally offset from said reference picture of said base layer.
12. An apparatus for encoding successive pictures of video data in an enhancement layer of a stereoscopic digital data signal, comprising: means for providing a reference picture for use in encoding a current picture of a grouping of said pictures; wherein: if said reference picture is provided in said enhancement layer, at least a portion of said current picture is encoded using a first number of bits; and if said reference picture is provided in a base layer of said stereoscopic signal, said portion of said current picture is encoded using a second number of bits that is different from said first number of bits.
13. An apparatus according to claim 12, characterized in that it comprises: means for allocating an initial number of bits for encoding said grouping of said pictures; and means for maintaining a running total of the remaining bits available as each of said pictures in said grouping is encoded, wherein at least one of said first and second numbers of bits is modified according to said running total.
14. An apparatus according to one of claims 12 or 13, characterized in that it further comprises: means for determining respective first and second quantization step sizes according to said first and second numbers of bits for encoding said portion of said current picture.
15. An apparatus according to one of claims 12 to 14, characterized in that said first and second numbers of bits are associated with a desired data rate of said stereoscopic signal.
16. An apparatus according to one of claims 12 to 15, characterized in that said current picture is a disparity-predicted picture (PD picture), comprising: means for increasing one of said first and second numbers of bits when there are no intra-coded pictures (I pictures) in said enhancement layer, relative to a case where there are intra-coded pictures (I pictures) in said enhancement layer.
17. An apparatus according to one of claims 12 to 16, characterized in that said reference picture is provided in said base layer, comprising: means for determining a first activity level of at least a portion of a picture preceding said current picture in said enhancement layer; means for determining a second activity level of at least a portion of said reference picture; and means for deriving one of said first and second numbers of bits from the higher of said first and second activity levels.
18. An apparatus according to one of claims 12 to 16, characterized in that said reference picture is provided in said base layer, comprising: means for determining a first activity level of at least a portion of a picture preceding said current picture in said enhancement layer; means for determining a second activity level of at least a portion of said reference picture; and means for deriving one of said first and second numbers of bits from an average of said first and second activity levels.
19. An apparatus according to one of claims 12 to 16, characterized in that it comprises: means for pre-calculating and storing a value indicating a first activity level, which is the activity level of at least a portion of said current picture, before a time at which said current picture is encoded; means for retrieving said value for use in encoding said current picture; means for determining a second activity level, which is an activity level of at least a portion of said reference picture; and means for deriving the number of bits used to encode said current picture using the higher of said first and second activity levels.
20. An apparatus according to one of claims 12 to 16, characterized in that it comprises: means for pre-calculating and storing a value indicating a first activity level, which is the activity level of at least a portion of said current picture, before a time at which said current picture is encoded; means for retrieving said value for use in encoding said current picture; means for determining a second activity level, which is an activity level of at least a portion of said reference picture; and means for deriving the number of bits used to encode said current picture using an average of said first and second activity levels.
21. An apparatus for encoding successive pictures of video data in an enhancement layer of a stereoscopic digital data signal, comprising: means for determining a grouping of said pictures of said enhancement layer; wherein: when a current picture of said grouping corresponds to a picture in a base layer of said stereoscopic signal which is the first picture of a group of pictures (GOP) of said base layer, said current picture is encoded as one of an intra-coded picture (I picture) and a predictive-coded picture (P picture).
22. An apparatus according to claim 21, characterized in that: a first picture of said grouping of said pictures in said enhancement layer is temporally offset from said reference picture of said base layer.
SUMMARY
Rate control in a stereoscopic digital video communication system is carried out by modifying the quantization level of P or B frame data in the enhancement layer depending on whether the frame is temporally predicted (from the same layer) or disparity-predicted (from the opposite layer). The invention can maintain a consistent picture quality by providing additional quantization bits for disparity-predicted P pictures, for example, where a P frame may be encoded from a B frame in the enhancement layer. The selected quantization level corresponds to an overall bit rate requirement of the enhancement layer. For disparity-predicted P frames, the quantization step size is modified according to the activity level of the frame being encoded in the enhancement layer, or of the reference frame, whichever is greater. Also, picture quality is improved and frame freeze-up is avoided during editing modes such as fast forward and fast rewind, which require random access to the picture data. When the reference frame in the base layer is the first frame of a group of pictures (GOP), the corresponding enhancement-layer frame is encoded as an I or P frame instead of as a B frame, to improve picture quality and to eliminate or reduce error propagation during random access.
MX9704998A 1997-07-02 1997-07-02 Rate control for stereoscopic digital video encoding. MX9704998A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
MX9704998A MX9704998A (en) 1997-07-02 1997-07-02 Rate control for stereoscopic digital video encoding.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US08/674,859 1996-07-03
MX9704998A MX9704998A (en) 1997-07-02 1997-07-02 Rate control for stereoscopic digital video encoding.

Publications (2)

Publication Number Publication Date
MXPA97004998A true MXPA97004998A (en) 1998-01-01
MX9704998A MX9704998A (en) 1998-01-31

Family

ID=39165584

Family Applications (1)

Application Number Title Priority Date Filing Date
MX9704998A MX9704998A (en) 1997-07-02 1997-07-02 Rate control for stereoscopic digital video encoding.

Country Status (1)

Country Link
MX (1) MX9704998A (en)
