MXPA06009733A - Method and system for digital coding 3d stereoscopic video images. - Google Patents

Method and system for digital coding 3d stereoscopic video images.

Info

Publication number
MXPA06009733A
MXPA06009733A
Authority
MX
Mexico
Prior art keywords
video
extension
sequence
picture
image
Prior art date
Application number
MXPA06009733A
Other languages
Spanish (es)
Inventor
Manuel Rafael Gutierrez Novelo
Original Assignee
Td Vision Corp S A De C V
Priority date
Filing date
Publication date
Priority claimed from PCT/MX2004/000011 external-priority patent/WO2005083636A1/en
Application filed by Td Vision Corp S A De C V filed Critical Td Vision Corp S A De C V
Priority to MXPA06009733A priority Critical patent/MXPA06009733A/en
Publication of MXPA06009733A publication Critical patent/MXPA06009733A/en

Abstract

Changes have been made to current MPEG2 coders, in both software and hardware, at different stages of the coding process, with the aim of obtaining three-dimensional images in a digital video stream. The video_sequence structures of the video data stream are modified to include the flags needed to identify, at the bit level, the type of TDVision® technology image. Software modifications affect the video sequence header, the identification flags, the data fields, and the image fields. In hardware, an electronic comparison is carried out between left-channel and right-channel images; the difference is processed as a B-type image, that is, an error correction is carried out, and the result is stored in the buffer memory with the TDVision® technology identifier. During coding, a DSP performs the prediction, comparison, and quantization processes and applies the DCT function to form the compacted MPEG2-compatible video stream.

Description

METHOD AND SYSTEM OF DIGITAL CODING OF 3D STEREOSCOPIC VIDEO IMAGES

FIELD OF THE INVENTION

The present invention relates to the display of stereoscopic video images on the 3DVisor® device, and more particularly to a method for encoding video images by means of a digital data compression system that allows three-dimensional information to be stored using standardized compression techniques.
BACKGROUND OF THE INVENTION

Data compression techniques are currently used in order to reduce the number of bits needed to represent an image or a series of images. The standardization work was carried out by groups of experts of the International Organization for Standardization (ISO); the resulting methods are commonly known as JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group). A common characteristic of these techniques is that blocks of the image are processed by applying a suitable transform to each block, usually the Discrete Cosine Transform (DCT). The resulting blocks are subjected to a quantization process and then encoded with a variable-length code.
Variable-length coding is a reversible process, which allows exact reconstruction of what was encoded. Digital video displays present a number of image frames (30 to 96 fps) successively, at refresh frequencies between 30 and 75 Hz. Each picture frame is a still image formed by an array of pixels according to the display resolution of a particular system. For example, the VHS system has a display resolution of 320 columns by 480 lines, the NTSC system has a display resolution of 720 columns by 486 lines, and the high-definition television system (HDTV) has a display resolution of 1360 columns by 1020 lines. In digitized form, even the low-resolution VHS format of 320 columns by 480 lines would require on the order of 100 gigabytes for a two-hour movie. By way of comparison, a conventional compact optical disc has a capacity of approximately 0.6 gigabytes, a magnetic hard disk has a capacity of 1-2 gigabytes, and current compact optical discs have a capacity of more than 8 gigabytes. In response to the limitations of storing and transmitting such massive amounts of information, several standard video compression processes have been established. These video compression techniques use similarities between successive image frames, referred to as temporal or spatial correlation, to provide frame-to-frame compression based on the representation of pixels from frame to frame.
All the images that we watch, in the cinema as on a television screen, are based on the principle of presenting complete still images (like photographs) at high speed; when they are presented rapidly and sequentially, at 30 frames per second, the sensation of a moving image is perceived, due to the persistence of vision of the human eye.
In order to encode the images that will be presented sequentially to form video signals, each image must be divided into lines, and each line into image elements or pixels. Each pixel has two associated values: luma and chroma. Luma represents the light intensity of each point; chroma represents the color as a function of a defined color space (e.g., RGB), which can be represented by three bytes. The images are displayed on the screen in a vertical-horizontal scan, from top to bottom and from left to right, cyclically. The number of lines and the display frequency can change depending on the format, which can be NTSC, PAL or SECAM. In theory it would be possible to assign each pixel its own values of luma, chroma U and chroma V; but this represents four bytes per pixel (one for luma and three for color), so in NTSC format, with 480 lines by 720 columns at approximately 30 frames per second, 4 x 480 x 720 x 30 gives approximately 40 megabytes of memory per second, which is difficult to store and transmit with the available bandwidths. At present it has been possible to reduce the chroma data in a ratio of 1:4 pixels; that is, one color sample is taken for every four pixels and the same information is replicated for the three missing ones, without the human eye perceiving the difference. The formats are, namely:
4:4:4 (four samples of luma and four of chroma in a group of 4 x 4 = 16 pixels);
4:2:2 (four samples of luma, two samples of chroma in a group of 4 x 2 = 8 pixels);
4:1:1 (four samples of luma, one sample of chroma in a group of 4 x 1 = 4 pixels);
4:2:0 (eight samples of luma, with two samples of chroma between horizontally adjacent pixels in a group of 4 x 2 = 8 pixels) in MPEG1;
4:2:0 (eight samples of luma, with two samples of chroma between vertically adjacent pixels in a group of 4 x 2 = 8 pixels) in MPEG2.
Even reducing the information in this way, the amount of digital information needed to store one second of video in NTSC format at 4:2:0 quality requires about 15 megabytes, or 108 gigabytes for a two-hour file. There are methods for reconstructing three-dimensional scenes from a two-dimensional video sequence. In light of recent advances in technology, the MPEG4 standard aims to provide means to encode graphics that have space-time relationships, which will be an important tool for stereoscopic images and for design and manufacturing in engineering applications. A virtual space is created in which a geometric model of the scene is reconstructed. For example, USP No. 6,661,914, granted to Cecile Dufour on December 9, 2003, describes a new method of reconstructing three-dimensional scenes, in which the succession of scenes has been taken with a single camera, and the contours of the image and the depth of the parts hidden in each view are subsequently projected and subjected to a refinement process.
Many have made valuable contributions to image processing; for example, USP No. 6,636,644, granted to Itokawa on October 21, 2003, refers to an image process using MPEG4, in which the chroma values that extend beyond the border of the image are extracted; with this, greater coding efficiency is achieved and natural color can be reproduced at the contours of the image. There are several methods and arrangements for encoding video signals, such as that of USP No. 6,633,676, granted to Kleihorst et al. on October 14, 2003; the method is applied to the encoder-detector in a one-camera system, the video signals are encoded with motion compensation (IBP), and a higher-resolution image is generated by interpolation of the previous ones; in summary, the regions of greatest interest of the video signal are determined, which together occupy less memory. Image compression coding is essential for efficiently storing or transmitting digital images; a common method of encoding compressed digital images uses the DCT, the dominant technology in standards such as JPEG and MPEG. USP No. 6,345,123, granted to Boon on February 5, 2002, describes a method to encode digital images by transforming the coefficients with the usual DCT method, applying a quantization process to those coefficients with a prescribed quantization scale, and finally applying a variable-length coding process to the quantized and transformed coefficients by comparing them with a variable-length code table. The image is divided into a plurality of small adjacent regions in order to be coded; a sample of one region is taken and the neighborhood of the next image region is predicted. This predictive method of coding is used in USP No.
6,148,109, issued to Boon et al. on November 14, 2000, wherein the image data generated from the difference of the small regions is coded and extracted. USP No. 6,097,759, issued to Murakami et al. on August 1, 2000, describes a block coding system for a field-coded image; the block patterns include individual field blocks and non-interlaced blocks, and in addition the coding system tracks the motion of the even and odd fields to produce a motion-compensated prediction signal, thus providing high-efficiency coding. USP No. 5,579,413, issued to Gisle on November 26, 1996, describes a method of converting a block data signal from a quantized and transformed image into a variable-length-encoded data signal, where each event is represented as a three-dimensional quantity. The need thus arises for a data compression system that allows the same content to be stored in less space. A group of experts focused on creating a way to compact information for the coding, storage, decoding and display of images, but without specifying implementation details, so that all software and hardware developers could create new ways to perform the process, as long as they remained MPEG-compatible. MPEG2 is currently a standard widely used worldwide by television companies and those working with video and audio.
Audio and video are packaged in packetized elementary streams (PES); said audio and video packets are interleaved together to create an MPEG2 data stream. Each packet has a time stamp to synchronize audio and video at playback time; for example, every three video frames may be associated with one audio frame. MPEG has two different methods of interleaving audio and video into system streams. The transport stream is used in systems with a high possibility of error, as is the case of satellite systems, which are susceptible to interference. Each packet has a length of 188 bytes, starting with an identification header, which makes it possible to recognize missing packets and repair errors. Several audio and video programs can be transmitted simultaneously in a single transport stream; thanks to the header, they can be decoded independently and individually and integrated into many programs. The program stream is used in systems with a lower possibility of error, as in the playback of a DVD. In this case the packets are of variable length and substantially larger than the packets used in the transport stream. As a main feature, the program stream allows only one program content. The video system under the MPEG2 standard allows both interlaced and progressive video images to be encoded. Namely, progressive video is saved as a complete frame (frame picture), while interlaced video can be saved in two forms: as full-frame images (frame picture) or as field images (field picture).
In the MPEG2 compression scheme there are three types of images: Intra-coded (I), whose information is encoded using only internal data of the image; Predictive-coded (P), whose information depends on data located at another point in time; and Bi-directionally predictive-coded (B), whose information depends on data located both in the past and in the future. In turn, there are compression types that apply to the above, namely compression by temporal prediction and spatial compression. Compression by temporal prediction references two different pictures in time that carry associated motion, taking advantage of the fact that the images change very little from picture to picture. Spatial compression compacts the information located within a single frame (intra-coded). For example, an image of 100 x 100 pixels, with 3 bytes of color and 1 of luma, requires 40 kilobytes per frame; but if this picture is completely white it could be represented as color: 255R, 255G, 255B, Xstart = 0, Ystart = 0, Xend = 99, Yend = 99, indicating that this whole area is white, so that instead of using 40 kilobytes, only 7 or 8 bytes are used. This is how MPEG2 compression is achieved, whose process steps are complicated and outside the scope of the present invention. Type (I) images are self-contained: they do not refer to any previous or subsequent image, so they use no temporal prediction compression and are compressed only spatially.
Type (P) images are encoded based on a reference image, so they use temporal prediction compression and also spatial compression. These images can refer to a type (I) image or to another type (P) image, but use only one reference source. Type (B) images require two references, one before and one after, in order to be reconstructed; this type of image has the best compression ratio. The references used to obtain a type (B) image can only be of type (P) or type (I), never of type (B). The coding and decoding sequences are therefore different from the display order. In order to reduce the volume of information, the complete image of a full frame is divided into units called macroblocks; each macroblock is a division of 16 x 16 pixels, ordered and named from top to bottom and from left to right, making a matrix arrangement of macroblocks on the screen. The macroblocks are sent into the information flow sequentially ordered, i.e., 0, 1, 2, 3, ..., n. Macroblocks of a type (I) image use only spatial compression; type (P) images can contain type (P) macroblocks that refer to previous images, as well as intra-coded macroblocks, without any limit. Type (B) images can also contain intra-coded macroblocks, along with macroblocks that refer to a previous image, a subsequent image, or both.
The macroblock is divided into blocks; a block is an 8 x 8 matrix of data samples. Because of the way the chroma formats are classified, the 4:4:4 format requires, per sample of luma Y, a sample of chroma Cr and a sample of chroma Cb, so a macroblock in 4:4:4 format requires 12 blocks; in the 4:2:2 format, 8 blocks are needed per macroblock, and in the 4:2:0 format, 6 blocks are required per macroblock. A set of consecutive macroblocks forms a slice; there can be any number of macroblocks in a slice, but they must belong to the same row. Like macroblocks, slices are numbered from left to right and from top to bottom. The slices do not have to cover the entire image, since a coded image does not need samples for every pixel; however, some MPEG profiles require a rigid slice structure in which the image must be completely covered. The right combination of hardware and software algorithms allows the compaction of images in MPEG to be carried out. The encoded data are bytes with specific information on blocks, macroblocks, fields, frames, images and video in MPEG2 format; the information must be grouped by blocks, and what is obtained by encoding it, for example with VLC, is a linear flow of bits and bytes. VLC (Variable Length Coding) is a compression algorithm in which the patterns that occur most often are replaced by shorter codes and those that occur less frequently by longer codes. The compressed version of this information occupies less space and can be transmitted more quickly through networks; however, it is not a format that is easy to edit, and it requires decompression using a lookup table. Inverse scan: the information must be grouped by blocks, and what is obtained when the information is encoded by the VLC is a linear flow.
The blocks are 8 x 8 arrays of data, so it is necessary to convert the linear information into a square 8 x 8 matrix. This is done in a descending zigzag manner, from top to bottom and from left to right, with two types of sequence depending on whether the image is progressive or interlaced. Inverse quantization is simply the multiplication of each data value by a factor. When coding, most block data are quantized to remove information that the human eye is not able to perceive; quantization allows greater compression of the MPEG2 stream to be obtained, and the inverse process (inverse quantization) is therefore required within the decoding process.
Structure of the video sequence in MPEG: this is the top-level structure used in the MPEG2 format and has the following form:
Video Sequence (Video_Sequence)
  Sequence Header (Sequence_Header)
  Sequence Extension (Sequence_Extension)
  User Data and Extension (0) (Extension_and_User_Data(0))
  Group of Pictures Header (Group_of_Pictures_Header)
  User Data and Extension (1) (Extension_and_User_Data(1))
  Picture Header (Picture_Header)
  Picture Coding Extension (Picture_Coding_Extension)
  User Data and Extensions (2) (Extension_and_User_Data(2))
  Picture Data (Picture_Data)
    Slice
      Macroblock (Macroblock)
        Motion Vectors (Motion_Vectors)
        Coded Block Pattern (Coded_Block_Pattern)
        Block (Block)
  Sequence End Code (Sequence_End_Code)
The video sequence is composed of these structures; a video sequence applies to both the MPEG1 format and the MPEG2 format. To distinguish the version, one must check that the sequence extension is present immediately after the sequence header; if the sequence extension does not follow the header, then it is a video stream in MPEG1 format.
BRIEF DESCRIPTION OF THE INVENTION

It is an object of the present invention to provide a method and system for digital coding of stereoscopic 3D images that provides coded data for transmission, reception and display on 3DVisors®. It is another object of the present invention to provide an encoding scheme in which the video_sequence structures of the video data stream are modified and bit-level identification flags are included. It is still another object of the present invention to provide a software process for the digital coding of 3D images, in which the header of the video_sequence, the identification flags, the data fields and the image fields are modified. It is still another object of the present invention to provide a hardware process for the digital coding of 3D images, in which an electronic comparison is made between the left and right channels, error correction is applied in processing the difference between images, and the processed image is stored in the video_sequence with the TDVision® technology identifier. It is still another object of the present invention to provide a hardware process for the digital encoding of 3D images in which the memory of the input_buffer of the DSP is doubled, simultaneous input of two independent video signals is supported, and the DSP is enabled to make comparisons of the input buffers of both video signals.
BRIEF DESCRIPTION OF THE FIGURES

Figure 1 represents the changes in hardware and software for the coding of stereoscopic 3D video images. Figure 2 represents the compilation process for stereoscopic 3D video images compatible with MPEG2-4. Figure 3 represents the software format for the compilation of stereoscopic 3D video images compatible with MPEG2-4. Figure 4 represents the hardware format for the compilation of stereoscopic 3D images compatible with MPEG2-4. Figure 5 represents the map of the branch of technology to which the coder object of the present invention belongs, namely the processing of stereoscopic 3D images: their coding, decoding, transmission by cable, satellite and DVD, and display on HDTV and 3DVisors®.
DETAILED DESCRIPTION OF THE INVENTION

In order to obtain three-dimensional images from a digital video stream, modifications to current MPEG2 encoders have been made through changes in the hardware and software of different parts of the coding process, as shown in Figure 1: the TDVision® encoder (1), compatible with MPEG2-4, has its own coding process (2) that is achieved through changes in software (3) and hardware (4). Figure 2 represents the compilation process of the encoder object of the present invention: the image (10) is taken and subjected to a process of motion compensation and error detection (11); the discrete cosine transform function is applied to change to frequency parameters (12); the quantization matrix (13) is then applied to carry out a normalization process; the matrix-to-row conversion process is applied (14); the variable-length coding may then be carried out (15); and finally the video sequence with encoded data is obtained (16). To carry out this compilation process, a format (30, Fig. 3) or compilation method for 3D images compatible with MPEG2 must be followed; as shown in Figure 3, the video_sequence (31) must be modified in the structures sequence_header (32), user_data (33), sequence_scalable_extension (34), picture_header (35), picture_coding_extension (36) and picture_temporal_scalable_extension (37). In this way an appropriate compilation format is obtained for stereoscopic 3D digital images taken with a TDVision® stereoscopic camera.
Namely, the video_sequence structures of the video data stream must be modified to include the necessary flags that identify, at the bit level, the type of image encoded with TDVision® technology. Modifications are made in the following coding stages: when encoding the dual image in MPEG2 (software), and when encoding the image in hardware.
Software: The headers of the video_sequence are modified. Identification flags are modified. The data fields are modified. The image fields are modified.
Hardware: An electronic comparison is made between the left channel and the right channel. The difference is processed as a type B image (error correction). It is stored with the TDVision® identifier. The change is applied to the complementary buffer, and it is saved and stored in the secondary buffer. In effect, the input memory of the DSP buffer is doubled; the simultaneous entry of two independent video signals, corresponding to the left-right stereoscopic signal of a TDVision® stereoscopic camera, is allowed; and the DSP is enabled to perform comparisons of the input buffers of both video signals. The hardware coding procedure is carried out when coding a frame in normal MPEG2 form from a single video input channel: both signals (left and right) are taken and compared electronically; the difference between the left signal and the right signal is obtained and stored in a temporary buffer; the error correction is calculated in LUMA and CHROMA with respect to the left signal; the DCT (Discrete Cosine Transform) function is applied; and the information is stored in a block of type B: a) inside the identification structure USER_DATA() (SW), or b) inside the structure PICTURE_DATA3D(). The process then continues with the following picture. The hardware is represented in the block diagram of Figure 4: the left signal (41) and the right signal (42) are taken, both signals are stored in the temporary buffer (43), the differential of the left and right signals is obtained (44), the error differential is calculated and the information is stored (45), the correct image is coded (46), the coding is carried out as an "I", "B" or "P" type image (47), and finally it is stored in the video_sequence (48). It is essential that the memory handled by the DSP be doubled, with the possibility of having up to 8 output buffers that allow the prior and simultaneous representation of a stereoscopic image in a device such as the 3DVisor® of TDVision®.
In effect, two channels must be initialized by calling the programming APIs of the Texas Instruments TMS320C62X DSP.
MPEG2VDEC_create (const IMPEG2VDEC_fxns *fxns, const MPEG2VDEC_Params *params), where IMPEG2VDEC_fxns and MPEG2VDEC_Params are pointer structures that define the operation parameters of each video channel, for example:
3DLhandle = MPEG2VDEC_create(fxns3DLEFT, Params3DLEFT);
3DRhandle = MPEG2VDEC_create(fxns3DRIGHT, Params3DRIGHT);
enabling in this way two video channels to be decoded and obtaining two video handles, one for each left-right stereoscopic channel. There must be a double video presentation output buffer, and software defines to which of the two buffers the output should be presented by calling the API function:
MPEG2VDEC_APPLY(3DRhandle, inputR1, inputR2, inputR3, 3doutright_pb, 3doutright_fb);
MPEG2VDEC_APPLY(3DLhandle, inputL1, inputL2, inputL3, 3doutleft_pb, 3doutleft_fb);
where 3DLhandle is the pointer to the handle returned by the DSP's create function, the input1 parameter is the FUNC_DECODE_FRAME or FUNC_START_PARA address, input2 is the pointer to the address of the external input buffer, and input3 is the size of the external input buffer. 3doutleft_pb is the address of the parameter buffer and 3doutleft_fb is the start of the output buffer where the decoded image will be stored. The timecode and the timestamp are used so that the output of the final device is sequentially synchronized.
The integration of the software and hardware processes is carried out by devices called DSPs, which perform most of the hardware process. These DSPs are programmed in a hybrid of C language and assembler provided by the manufacturer. Each DSP has its own API, which consists of a list of functions or procedure calls residing within the DSP that are invoked by software. With this reference information, the present application is prepared for the coding of 3D images compatible with the MPEG2 format. In effect, at the beginning of a video sequence, the sequence header and the sequence extension always appear. The repetitions of the sequence extension must be identical to the first; in contrast, the repetitions of the sequence header may vary slightly with respect to the first occurrence, changing only the portion that defines the quantization matrices. Repeating the sequence header allows random access to the video stream: if the decoder wants to start playback halfway through the video stream, it can do so by looking for the previous sequence header and sequence extension in order to decode the following images in the stream. The same applies to video streams that may not be received from the beginning, such as a satellite decoder that is turned on halfway through the program.
Namely, the sequence header provides a higher level of information about the video stream. For clarity, the number of bits corresponding to each field is also indicated; the most significant bits are within the Sequence_Extension structure. It consists of the following fields:

Sequence_Header
Sequence_Header_Code / 32 bits / Start of the Sequence_Header, 0x000001B3.
Horizontal_Size_Value / 12 bits / The 12 least significant bits of the width.
Vertical_Size_Value / 12 bits / The 12 least significant bits of the height.
Aspect_Ratio_Information / 4 bits / Image aspect:
  0000 forbidden
  0001 n/a TDVision®
  0010 4:3 TDVision®
  0011 16:9 TDVision®
  0100 2.21:1 TDVision®
  0111 a logical "and" will be applied for backward compatibility with 2D systems
  0101 ... 1111 reserved
Frame_Rate_Code / 4 bits / Frame rate:
  0000 forbidden
  0001 24,000/1001 (23.976) in TDVision® format
  0010 24 in TDVision® format
  0011 25
  0100 30,000/1001 (29.97)
  0101 30
  0110 50
  0111 60,000/1001 (59.94) (a logical "and" will be made to obtain backward compatibility with 2D systems)
  1000 60
  1111 reserved
Bit_Rate_Value / 18 bits / The least significant bits of the bit rate of the video stream (bit_rate = 400 x bit_rate_value + bit_rate_extension << 18); the most significant bits are within the sequence_extension structure.
Marker_Bit / 1 bit / Always 1 (prevents start_code emulation).
Vbv_Buffer_Size_Value / 10 bits / The 10 least significant bits of vbv_buffer_size, which determines the size of the video buffering verifier (VBV), a structure used to ensure that a data stream can be decoded with a buffer of limited size without underflowing or overflowing the buffer.
Constrained_Parameters_Flag / 1 bit / Always 0; not used in MPEG2.
Load_Intra_Quantizer_Matrix / 1 bit / Indicates whether an intra-coded quantization matrix is present.
if (load_intra_quantizer_matrix)
Intra_Quantizer_Matrix(64) / 8x64 / If a quantization matrix is indicated, it must be specified here as 64 8-bit values.
Load_Non_Intra_Quantizer_Matrix / 1 bit / If there is a non-intra quantization matrix, this flag is set.
if (load_non_intra_quantizer_matrix)
Non_Intra_Quantizer_Matrix(64) / 8x64 / If the previous flag is set, the 64 8-bit values that make up the quantization matrix are placed here.
Sequence_Extension
Extension_Start_Code / 32 bits / Start of the sequence_extension; always 0x000001B5.
Extension_Start_Code_Identifier / 4 bits / Extension type identifier, 0x1.
Profile_and_Level_Indication / 8 bits / Defines the profile and level of the video stream.
Progressive_Sequence / 1 bit / 1 = frames only; 0 = frames and fields.
Chroma_Format / 2 bits / 00 reserved, 01 4:2:0, 10 4:2:2, 11 4:4:4.
Horizontal_Size_Extension / 2 bits / Extension of the sequence_header.
Vertical_Size_Extension / 2 bits / Extension of the sequence_header.
Bit_Rate_Extension / 12 bits / Extension of the sequence_header.
Marker_Bit / 1 bit / Always 1.
Vbv_Buffer_Size_Extension / 8 bits / Extension of the sequence_header.
Low_Delay / 1 bit / 1 = does not contain type B images, and may cause underutilization of the VBV buffer during normal playback (so-called BIG images); 0 = may contain type B images, but cannot have BIG images and cannot cause under-use of the VBV buffer.
Frame_Rate_Extension_n / 2 bits
Frame_Rate_Extension_d / 5 bits
Next_Start_Code()

Extension_and_User_Data(i): a container for storing other structures; it has no data of its own. Basically it is a series of extension_data(i) and user_data() structures; in some cases the structure may be completely empty.

Extension_data(i): this structure contains a single extension structure. The type of extension structure it contains depends on the value of i, which can take the values 0, 1 or 2. If i = 0, the extension_data follows a sequence_extension, and the extension_data(i) can contain either a sequence_display_extension or a sequence_scalable_extension. If i = 2, the structure follows a picture_coding_extension and may contain a quant_matrix_extension(), copyright_extension(), picture_display_extension(), picture_spatial_scalable_extension() or picture_temporal_scalable_extension(). This structure always starts with 0x000001B5.
User_data
The user_data structure allows application-specific data to be stored within the video sequence (video_sequence). The MPEG2 specification defines neither the format nor the meaning of this user data. The structure starts with user_data_start_code = 0x000001B2 and contains an arbitrary number of data bytes (user_data) that continue until the next start code in the stream. The only condition is that the data must not contain more than 23 consecutive zero bits, since such a run could be misinterpreted as a start code.
Sequence_display_extension()
This structure provides information that is not used in the decoding process itself; it describes the encoded content in a way that is useful for displaying the decoded video correctly.
Sequence_display_extension()
Field (# of bits): Description
Extension_start_code_identifier (4): Must be 2; identifies this extension.
Video_format (3): 000 component, 001 PAL, 010 NTSC, 011 SECAM, 100 MAC, 101 not specified, 110 reserved (TDVision®), 111 reserved (TDVision®).
Colour_description (1): 0 = the following three colour parameters are not specified; 1 = they are present.
Colour_primaries (8): 0 forbidden, 1 Recommendation ITU-R BT.709, 2 video not specified, 3 reserved, 4 Recommendation ITU-R BT.470-2 System M, 5 Recommendation ITU-R BT.470-2 System B, G, 6 SMPTE 170M, 7 SMPTE 240M, 8-255 reserved.
Transfer_characteristics (8): 0 forbidden, 1 Recommendation ITU-R BT.709, 2 video not specified, 3 reserved, 4 Recommendation ITU-R BT.470-2 System M, 5 Recommendation ITU-R BT.470-2 System B, G, 6 SMPTE 170M, 7 SMPTE 240M, 8 linear transfer characteristics, 9-255 reserved.
Matrix_coefficients (8): 0 forbidden, 1 Recommendation ITU-R BT.709, 2 video not specified, 3 reserved, 4 FCC, 5 Recommendation ITU-R BT.470-2 System B, G, 6 SMPTE 170M, 7 SMPTE 240M, 8-255 reserved.
Display_horizontal_size (14): Not specified in MPEG2.
Marker_bit (1): Always 1.
Display_vertical_size (14): Not specified in MPEG2.
Next_start_code()

Sequence_scalable_extension
This structure must be present in every scalable video stream, that is, one that contains a base layer plus one or more enhancement layers. There are different types of scalability in MPEG2; one example for the main layer is that it carries a standard-definition version of the video content, while the enhancement layer carries additional data that increases the definition.
Sequence_scalable_extension
Field (# of bits): Description
Extension_start_code_identifier (4): Always 5.
Scalable_mode (2): 00 data partitioning, 01 spatial scalability, 10 SNR scalability, 11 temporal scalability.
Layer_id (4): Layer number (0 = base layer).
Lower_layer_prediction_horizontal_size (14)
Marker_bit (1)
Lower_layer_prediction_vertical_size (14)
Horizontal_subsampling_factor_m (5)
Vertical_subsampling_factor_m (5)
Horizontal_subsampling_factor_n (5)
Vertical_subsampling_factor_n (5)
Picture_mux_enable (1)
Mux_to_progressive_sequence (1)
Picture_mux_order (3)
Picture_mux_factor (3)

Group_of_pictures_header()
This structure marks the beginning of a group of pictures.
Field (# of bits): Description
Group_start_code (32): 0x000001B8.
Time_code (25): Time stamp for the first picture that follows the group_of_pictures_header, made up of: Drop_frame_flag (1), Time_code_hours (5), Time_code_minutes (6), Marker_bit (1), Time_code_seconds (6), Time_code_pictures (6).
Closed_gop (1): 1 = the B-type pictures in the group do not reference pictures outside it.
Broken_link (1): 1 = indicates that a (type I) frame needed for prediction no longer exists; 0 = the link has not been broken.
Next_start_code()

Picture_header
Field (# of bits): Description
Picture_start_code (32): 0x00000100.
Temporal_reference (10): Display order of the pictures.
Picture_coding_type (3): 000 forbidden, 001 intra-coded (I), 010 predictive-coded (P), 011 bidirectionally predictive-coded (B), 100 reserved for MPEG1, 101-111 reserved.
Vbv_delay (16): Video buffering verifier mechanism (temporary memory).
Full_pel_forward_vector (1): Used in MPEG1; in MPEG2 = 0.
Forward_f_code (3): Used in MPEG1; in MPEG2 = 111.
Full_pel_backward_vector (1): Used in MPEG1; in MPEG2 = 0.
Backward_f_code (3): Used in MPEG1; in MPEG2 = 111.
Extra_bit_picture (1)
Extra_information_picture (8): May be ignored.
Extra_bit_picture (1): May be ignored.
Next_start_code()

Picture_coding_extension
Field (# of bits): Description
Extension_start_code (32): Always 0x000001B5.
Extension_start_code_identifier (4): Always 1000.
F_code(0)(0) (4): Used to decode forward motion vectors; for a type I picture this field is filled with 1111.
F_code(0)(1) (4): Used to decode forward motion vectors.
F_code(1)(0) (4): Used to decode backward motion vector information (B pictures); for a type P picture it must be set to 1111, since P pictures have no backward motion.
F_code(1)(1) (4): Used to decode backward motion vector information; for a type P picture it must be set to 1111, since P pictures have no backward motion.
Intra_dc_precision (2): Precision used in the inverse quantization of the DC coefficients of the discrete cosine transform.
00 = 8-bit precision, 01 = 9-bit precision, 10 = 10-bit precision, 11 = 11-bit precision.
Picture_structure (2): Specifies whether the picture is divided into fields or is a full frame: 00 reserved (picture in TDVision® format), 01 top field, 10 bottom field, 11 frame picture.
Top_field_first (1): 0 = decode the bottom field first; 1 = decode the top field first.
Frame_pred_frame_dct (1)
Concealment_motion_vectors (1)
Q_scale_type (1)
Intra_vlc_format (1)
Alternate_scan (1)
Repeat_first_field (1): 0 = display one progressive frame; 1 = display two identical progressive frames.
Chroma_420_type (1): If the chroma format is 4:2:0, this must equal progressive_frame; otherwise it must be zero.
Progressive_frame (1): 0 = interlaced, 1 = progressive.
Composite_display_flag (1): Signals that the following fields describe the originally encoded composite signal: V_axis (1), Field_sequence (3), Sub_carrier (1), Burst_amplitude (7), Sub_carrier_phase (8).
Next_start_code()

Picture_temporal_scalable_extension()
When temporal scalability is used, there are two streams of identical spatial resolution: the lower layer provides a lower frame-rate version of the video, while the upper layer can be used to derive a higher frame-rate version of the same video. Temporal scalability allows low-quality, low-cost or free decoders to use the lower frame rate, while the higher frame rate can be reserved for a paid service.
Picture_temporal_scalable_extension()
Field (# of bits): Description
Extension_start_code_identifier (4): Always 1010.
Reference_select_code (2): Indicates which reference picture is used for prediction. For P-type pictures: 00 most recent pictures in the enhancement layer; 01 most recent lower-layer frame in display order; 10 next lower-layer frame in display order; 11 forbidden. For B-type pictures: 00 forbidden; 01 most recently decoded pictures in the enhancement layer; 10 most recently decoded pictures in the enhancement layer; 11 most recent lower-layer picture in display order.
Forward_temporal_reference (10): Temporal reference.
Marker_bit (1)
Backward_temporal_reference (10): Temporal reference.
Next_start_code()

Picture_spatial_scalable_extension()
With spatial scalability, the enhancement layer contains data that allow a higher-resolution version of the base layer to be reconstructed. When an enhancement layer uses the base layer as a reference for motion compensation, the lower layer must be scaled and offset to match the higher resolution of the enhancement layer.
Picture_spatial_scalable_extension()
Field (# of bits): Description
Extension_start_code_identifier (4): Always 1001.
Lower_layer_temporal_reference (10): Temporal reference of the lower-layer picture.
Marker_bit (1): 1.
Lower_layer_horizontal_offset (15): Horizontal offset.
Marker_bit (1): 1.
Lower_layer_vertical_offset (15): Vertical offset.
Spatial_temporal_weight_code_table_index (2): Prediction details.
Lower_layer_progressive_frame (1): 1 = progressive, 0 = interlaced.
Lower_layer_deinterlaced_field_select (1): 0 = the top field is used; 1 = the bottom field is used.
Next_start_code()

Copyright_extension()
Extension_start_code_identifier (4): Always 0100.
Copyright_flag (1): 1 = copyright information follows; 0 = no additional copyright information is required.
Copyright_identifier (8): Identifies the copyright registration authority.
Original_or_copy (1): 1 = original, 0 = copy.
Reserved (7)
Marker_bit (1)
Copyright_number_1 (20): Number granted by the copyright authority.
Marker_bit (1)
Copyright_number_2 (22): Number granted by the copyright authority.
Marker_bit (1)
Copyright_number_3 (22): Number granted by the copyright authority.
Next_start_code()

Picture_data()
This is a simple structure; it has no fields of its own.

Slice()
Contains information for one or more macroblocks in the same vertical position.
Slice_start_code (32)
Slice_vertical_position_extension (3)
Priority_breakpoint (7)
Quantizer_scale_code (5)
Intra_slice_flag (1)
Intra_slice (1)
Reserved_bits (7)
Extra_bit_slice (1)
Extra_information_slice (8)
Extra_bit_slice (1)

Macroblock()
Macroblock_modes()
Motion_vectors()
Motion_vector()
Coded_block_pattern()
Block()
Extension_and_user_data(2)

This MPEG2-compatible encoding process is currently used to encode 3D digital images taken with the stereoscopic camera (52) of Fig.
5; they pass to the compiler (51), and from there the signal can be obtained for display on a PC (50) or a DVD (53). When the signal is encoded in the encoder (54), it can be sent to a decoder (55) so that the image is displayed via cable (56), satellite (57), high-definition television (HDTV) (59), or on a 3DVisor® device (59), among others. In this way the image can be displayed on: DVD (Digital Versatile Disc), DTV (Digital Television), HDTV (High Definition Television), CABLE (DVB, Digital Video Broadcast) and SATELLITE (DSS, Digital Satellite Systems); and the invention is the integration of software and hardware processes. On the hardware side, most of the process is carried out by devices called DSPs (Digital Signal Processors), namely a Motorola model and a Texas Instruments model (TMS320C62X). These DSPs are programmed in a hybrid of C language and assembler provided by the manufacturer in question. Each DSP has its own API, consisting of a list of functions or procedure calls located within the DSP that can be invoked from software. From this reference information, 3D images are encoded so as to be compatible with the MPEG2 format as well as with the invention's own coding algorithm. When encoding the information, the DSP is responsible for the processes of prediction, comparison and quantization, and for applying the DCT function to form the compacted MPEG2 video stream. While particular embodiments of the present invention have been illustrated and described, it will be obvious to those skilled in the art that various modifications or changes may be made without departing from the scope of the present invention. The appended claims are intended to cover all such changes and modifications so that they fall within the scope of the present invention. Having described the invention above, the content of the following claims is claimed as property:

Claims (12)

CLAIMS
1. A method and system for the digital coding of stereoscopic 3D video images, consisting of an MPEG-type coding process that includes a software algorithm and changes in the associated hardware, characterized in that the software is modified in the coding process, namely: modification of the video structures; modification of the video_sequence header of the video data stream; modification of the identification flags at the bit level; modification of the data fields; and modification of the image fields.
2. The method and digital coding system of stereoscopic 3D video images according to claim 1, further characterized in that the video_sequence of the video data stream consists of the following structures: sequence_header; sequence_extension; extension_and_user_data(0); group_of_pictures_header; extension_and_user_data(1); picture_header; picture_coding_extension; extension_and_user_data(2); picture_data; slice; macroblock; motion_vectors; coded_block_pattern; block; sequence_end_code; and applies to the MPEG1 and MPEG2 formats.
3. The method and digital coding system of stereoscopic 3D video images according to claim 1, further characterized in that the sequence_header structure provides a higher level of information about the video stream in the aspect_ratio_information field, on which a logical AND with 0111 is performed to obtain backward compatibility with 2D systems, and in the frame_rate_code field, on which a logical AND with 0111 is performed to obtain backward compatibility with 2D systems.
4. The method and digital coding system of stereoscopic 3D video images according to claim 1, further characterized in that the extension_and_user_data(i) structure is a container for storing other structures and in some cases may be completely empty; the value of i can be 0 or 2; if i = 0, the extension_data follows a sequence_extension and the extension_data(i) contains a sequence_display_extension or a sequence_scalable_extension; when i = 2, the structure that follows is a picture_coding_extension containing a quant_matrix_extension(), copyright_extension(), picture_spatial_scalable_extension() or a picture_temporal_scalable_extension.
5. The method and digital coding system of stereoscopic 3D video images according to claim 1, further characterized in that the sequence_display_extension() structure provides information about the encoded content that is useful for displaying the video correctly; its video_format field identifies it with 111.
6. The method and digital coding system of stereoscopic 3D video images according to claim 1, further characterized in that the sequence_scalable_extension structure carries additional data that increase the definition, since the stream contains a base layer and enhancement layers, with spatial scalability mode 01 and temporal scalability mode 11; layer_id; lower_layer_prediction_horizontal_size; marker_bit; lower_layer_prediction_vertical_size; horizontal_subsampling_factor_m; horizontal_subsampling_factor_n; vertical_subsampling_factor_m; vertical_subsampling_factor_n; picture_mux_enable; mux_to_progressive_sequence; picture_mux_order; picture_mux_factor.
7. The method and digital coding system of stereoscopic 3D video images according to claim 1, further characterized in that the picture_header structure is defined with a picture coding type field (picture_coding_type), 010 for predictive coding (P) and 011 for bidirectionally predictive coding (B), and a verification mechanism for the video buffer (temporary memory).
8. The method and digital coding system of stereoscopic 3D video images according to claim 1, further characterized in that the picture_structure field specifies whether the picture is divided into fields or is a complete frame: 00 reserved for a picture in TDVision format, 01 top field, 10 bottom field, 11 frame picture; it also defines the following fields: composite_display_flag; V_axis; field_sequence; sub_carrier; burst_amplitude; sub_carrier_phase.
9. A method and system for the digital coding of stereoscopic 3D video images, consisting of an MPEG-type coding process that includes a software algorithm and changes in the associated hardware, characterized in that the hardware is modified in the coding process, namely: two independent video input channels are enabled; on making the electronic comparison between the left and right channels, the comparison difference between them is processed; the memory is doubled to allow the prior and simultaneous presentation of a stereoscopic image; and the DSP is enabled to perform simultaneous comparisons of the input buffers for both the left and right video signals.
10. The method and digital coding system of stereoscopic 3D video images according to claim 9, further characterized in that enabling two independent video channels allows the simultaneous input of two independent video signals, corresponding to the left-right stereoscopic signal coming from a TDVision® camera.
11. The method and digital coding system of stereoscopic 3D video images according to claim 9, further characterized in that the memory of the DSP input buffer is doubled.
12. The method and digital coding system of stereoscopic 3D video images according to claim 9, further characterized in that, to make the modifications in hardware, the procedure has the following steps: encode the frame in normal (MPEG2) form from a single video input channel; take both signals; compare the left and right signals electronically; obtain the error differential between the right and left signals; store the differential in a temporary buffer; calculate the error correction in luma and chroma with respect to the left signal; apply the DCT; and store the information in a type-B block, inside the picture_data3D() structure.
MXPA06009733A 2004-02-27 2006-08-25 Method and system for digital coding 3d stereoscopic video images. MXPA06009733A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
MXPA06009733A MXPA06009733A (en) 2004-02-27 2006-08-25 Method and system for digital coding 3d stereoscopic video images.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
PCT/MX2004/000011 WO2005083636A1 (en) 2004-02-27 2004-02-27 Method and system for digital coding 3d stereoscopic video images
MXPA06009733A MXPA06009733A (en) 2004-02-27 2006-08-25 Method and system for digital coding 3d stereoscopic video images.

Publications (1)

Publication Number Publication Date
MXPA06009733A true MXPA06009733A (en) 2007-03-15

Family

ID=40259070

Family Applications (1)

Application Number Title Priority Date Filing Date
MXPA06009733A MXPA06009733A (en) 2004-02-27 2006-08-25 Method and system for digital coding 3d stereoscopic video images.

Country Status (1)

Country Link
MX (1) MXPA06009733A (en)

Legal Events

Date Code Title Description
FG Grant or registration