WO2004114666A1 - Constant stream compression processing method - Google Patents

Constant stream compression processing method

Info

Publication number
WO2004114666A1
WO2004114666A1 (PCT/CN2003/000486)
Authority
WO
WIPO (PCT)
Prior art keywords
video
stream
audio
sub
gop
Prior art date
Application number
PCT/CN2003/000486
Other languages
French (fr)
Chinese (zh)
Inventor
Sannan Yuan
Mei Xue
Qin Wang
Original Assignee
Shanghai Dracom Communication Technology Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Dracom Communication Technology Ltd. filed Critical Shanghai Dracom Communication Technology Ltd.
Priority to PCT/CN2003/000486 priority Critical patent/WO2004114666A1/en
Priority to AU2003248219A priority patent/AU2003248219A1/en
Priority to CN200410049156.5A priority patent/CN1638480A/en
Publication of WO2004114666A1 publication Critical patent/WO2004114666A1/en

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/40 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Definitions

  • The invention relates to a data processing method for the MPEG-2 compression standard, in particular one that processes compressed video and audio data to a constant flow rate, making it suitable for real-time transmission and playback on a streaming-media network and allowing random access to the code stream.
  • MPEG: Motion Pictures Experts Group
  • ISO: International Organization for Standardization
  • DCT: Discrete Cosine Transform
  • Motion compensation reduces the temporal redundancy of the image.
  • Huffman coding reduces the image's redundancy in information (entropy).
  • Entropy: the degree of the image's informational redundancy
  • The MPEG-2 standard is similar to MPEG-1 but more adaptable, applying to all processes and links of broadcast television.
  • MPEG-1 is in fact a subset of MPEG-2, as can be seen in the MPEG-2 profile and level classification table.
  • The MPEG-2 standard is divided into four documents:
  • The systems layer (Systems, ISO/IEC 13818-1) describes the multiplexing of video and audio data and the method of video/audio synchronization.
  • The video compression layer (Video, ISO/IEC 13818-2) describes the digital video encoding method and decoding process.
  • The audio compression layer (Audio, ISO/IEC 13818-3) describes the digital audio encoding method and decoding process.
  • Conformance (ISO/IEC 13818-4) explains the process of testing a coded stream to verify compliance with the first three documents.
  • The MPEG-2 compression algorithm was designed as a universal video and audio compression standard, required to accommodate different application requirements and to allow control of the compressed output bit rate and image quality. To this end, it is divided into profiles and levels. Profiles define the chrominance spatial resolution and output bitstream control, while levels define the image resolution, the luminance sampling frequency, the number of video/audio layers a scalable profile can support, and the maximum bit rate of each profile at that level.
  • MPEG uses its syntax to define a hierarchical structure of six layers, from top to bottom: video sequence, group of pictures, picture, slice, macroblock, and block.
  • A video sequence is composed of groups of pictures, with a sequence header marking its beginning and a sequence end code marking its end. It is a random-access segment.
  • GOP: group of pictures, with two parameters, the length (N) and the reference-frame repetition frequency (M)
  • An image is an independent display unit and a basic coding unit.
  • images can be progressive or interlaced. This is different from MPEG1, which is always progressive.
  • a macroblock strip contains several consecutive macroblocks and is a unit of resynchronization.
  • the purpose of setting the macroblock strip is to prevent the spread of error codes. When an error occurs in a macroblock strip, it does not affect the subsequent decoding of the macroblock strip.
  • The picture's luminance array is divided into 16×16 macroblocks.
  • Macroblocks are the basic unit for motion compensation.
  • A macroblock contains four 8×8 luminance blocks.
  • Depending on the profile, a macroblock also contains two 8×8 chrominance blocks (one each for R-Y and B-Y, with 4:2:0 sampling) or four 8×8 chrominance blocks (two each for R-Y and B-Y, with 4:2:2 sampling).
  • A block is the unit on which the DCT is performed and contains only luminance or only chrominance.
  • MPEG is based on DCT, motion compensation, and Huffman coding, and accordingly uses both intra-frame and inter-frame compression. To achieve the maximum compression ratio in encoding, MPEG uses three types of pictures: I frames, P frames, and B frames.
  • I-frame: Intra-Frame
  • I-frames use intra-frame compression without motion compensation and provide a medium compression ratio. Because I-frames do not depend on other frames, they are the entry points for random access, and they are also the reference frames in decoding.
  • The P-frame (Predicted-Frame) is predicted from the preceding I-frame or P-frame and compressed with motion compensation, so its compression ratio is higher than an I-frame's and its data volume averages about 1/3 of an I-frame.
  • the P frame is a reference frame for decoding the preceding and succeeding B frames and the succeeding P frames. The P frame itself has errors. If the previous reference frame of the P frame is also a P frame, it will cause error propagation.
  • The B frame (Bidirectional-Frame) is reconstructed by interpolation, based on the two surrounding I/P or P/P frames. It uses bidirectional prediction, and its data volume averages about 1/9 of an I frame. A B frame itself is never used as a reference, so it provides a higher compression ratio without propagating errors.
  • a GOP consists of a series of I, B, and P frames, starting with an I frame.
  • The number of frames in a GOP is variable.
  • a large number of frames can provide a high compression ratio, but it will cause random access delay (must wait until the next I frame) and accumulation of errors (error propagation of P frames).
  • The structure of the GOP is not specified in MPEG-2; the frame repetition pattern can be IP, IB, IBP, IBBP, or even all I frames.
  • the repetition frequency of the reference frame is represented by M. Different frame repetition frequencies provide different output bit rates and affect the access delay.
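The bit-rate trade-off described above can be illustrated with the average frame-size ratios given in the text (a P frame about 1/3 of an I frame, a B frame about 1/9). The sketch below is illustrative only; the weights are rough averages, not values from the patent.

```python
# Rough relative frame sizes, in I-frame units, per the averages in the text.
FRAME_WEIGHT = {"I": 1.0, "P": 1.0 / 3.0, "B": 1.0 / 9.0}

def relative_gop_size(pattern: str) -> float:
    """Approximate size of a GOP, in I-frame units, for a frame pattern."""
    return sum(FRAME_WEIGHT[f] for f in pattern)

# A 15-frame GOP with IBBP repetition (M=3): 1 I, 4 P, and 10 B frames.
gop = "IBBPBBPBBPBBPBB"
print(round(relative_gop_size(gop), 2))   # 3.44
# An all-I GOP of the same length would be 15.0 units, roughly 4.4x more data.
```

This also shows why a larger M (more B frames between references) lowers the output bit rate while increasing access delay.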
  • M-JPEG and DV can provide frame-accurate random access.
  • If the compressed data stream of MPEG-2 is based on I/P or I/P/B frames, however, frame-accurate random access is not possible. This limitation comes from the motion-compensation algorithm, and here the pros and cons of the newer technology show themselves.
  • Within a GOP, decoding P frames and B frames depends on the I frame, so a video stream must be entered at an I frame. The consequences of this differ greatly across applications.
  • When a viewer switches channels, the delay while the digital video decoder box waits for the new channel's I frame is not a problem: with at least two I frames per second, the viewer does not notice the small delay.
  • For television-station operations, however, it is difficult to control the start point and length of a commercial insertion, and material search during non-linear editing is slow. Moreover, the bit rate of existing MPEG-2 streams such as DVD varies with picture content, which is ill-suited to real-time playback over a network: the decoder's VBV buffer can overflow or underflow, the picture shows mosaic, blocking, and stutter, and the decoder may even stop working.
  • the invention provides a method for processing video and audio data compressed by the MPEG-2 compression standard, which overcomes the above problems.
  • The technique of the invention makes the MPEG-2 video stream carry a constant number of picture frames per GOP and an absolutely constant code-stream length per GOP, guaranteeing a constant video bit rate.
  • the video stream and the audio stream are multiplexed to obtain a system stream.
  • The audio stream is already constant-rate, and the traffic of sub-pictures and the like is very small compared with the video stream, so adding only a small amount of redundancy to each GOP yields a constant-rate MPEG-2 program stream.
  • The invention provides a method for processing the video/audio data of an MPEG-2 program stream, comprising the following steps: (1) analyze the video object files to determine the parameters used in subsequent steps; (2) preprocess the relevant video object files; (3) re-encode the demultiplexed video stream into constant-rate video data; (4) multiplex the constant-rate video stream with the extracted audio/sub-picture packets, and apply the constant-rate process again to obtain the final data stream.
  • The invention achieves a fixed length between adjacent I frames and a fixed number of frames between adjacent I frames. This is equivalent to a fixed GOP code-stream length and a fixed number of frames in the GOP, which makes searching, positioning, editing, and other operations very easy. A further benefit is suitability for network-based real-time playback: the decoder's VBV buffer neither overflows nor underflows, and the picture shows no mosaic, blocking, or stutter.
  • FIG. 1 is a flowchart of an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a GOP frame arrangement in the pulldown situation of the present invention.
  • FIG. 3 is a schematic diagram of a GOP frame arrangement in the present invention without pulldown.
  • The three main parameters SCR, PTS, and DTS that appear below are time stamps placed at specific positions in the system stream; each is a small piece of data inserted by the encoder into the data stream.
  • SCR is the system clock reference, inserted at least every 0.7 seconds.
  • the decoder extracts the SCR from the data stream and sends the SCR to the image decoder and audio decoder to synchronize the internal clock with the system clock.
  • An image can be divided into many "display units", and the display unit of an image is a frame.
  • PTS indicates the display time of the display unit.
  • the decoder checks the PTS and compares it with the SCR, and displays the image accordingly to synchronize it with the system time.
  • Step 1 in Figure 1: first analyze the video target files to determine the parameters used in subsequent steps. These include: 1. whether the decoded output repeats one picture every two frames, that is, whether 3:2 pulldown applies; 3:2 pulldown, the repeated display of a picture every two frames after decoding, is an adjustment performed in system conversion because the frame rates of film (24 fps) and NTSC (30 fps) differ; 2. how many packets to cut (video target 1, video target 2, ..., keeping the last video target program; whether to make pure pulldown; cutting off black screens); 3. the video format, that is, PAL or NTSC; 4. the stream IDs of the required audio and subtitle streams; 5. the audio bit rate (kbps); 6. the frame rate (frames/second).
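For illustration, the parameters gathered in this analysis step could be held in a single structure. The field names and example values below are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class AnalysisParams:
    pulldown: bool           # 3:2 pulldown needed (film 24 fps vs. NTSC 30 fps)?
    video_format: str        # "PAL" or "NTSC"
    audio_stream_id: int     # stream ID of the required audio stream
    subtitle_stream_id: int  # stream ID of the required subtitle stream
    audio_kbps: int          # audio bit rate in kbps
    frame_rate: float        # frames per second

# Example values for an NTSC source (illustrative only):
params = AnalysisParams(pulldown=True, video_format="NTSC",
                        audio_stream_id=0xBD, subtitle_stream_id=0x20,
                        audio_kbps=448, frame_rate=29.97)
print(params.video_format)  # NTSC
```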
  • Step 2 in Figure 1 to preprocess the relevant video object files, including the following:
  • The invention repeats the I-frame data in the first GOP twice to cover the following two B-frame data, so that no mosaic occurs.
  • the PTS of an audio packet refers to the display time of the audio frame header that first appears in the packet.
  • The original PTS values of the first audio display unit and the first video display unit are compared to obtain their difference; the PTS of the first video display unit is about 0.28 seconds. This difference is used to correct the PTS of the first audio display unit.
  • Since the display time of every audio frame is fixed (for example, the time interval of a Dolby AC-3 audio frame is 32 milliseconds), the PTS of a packet can easily be calculated from the number of the first audio frame header in the packet.
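Because every audio frame has a fixed duration (32 ms for Dolby AC-3, as stated), the PTS of an audio packet follows directly from the index of the first audio frame header it contains. A sketch, assuming PTS is expressed in the standard 90 kHz units of MPEG-2 system streams:

```python
AC3_FRAME_MS = 32        # fixed AC-3 audio frame duration, per the text
PTS_UNITS_HZ = 90_000    # MPEG-2 PTS/SCR resolution (90 kHz)

def audio_packet_pts(first_frame_index: int, base_pts: int = 0) -> int:
    """PTS (in 90 kHz units) of a packet whose first audio frame header is
    audio frame number `first_frame_index` (0-based) of the stream."""
    return base_pts + first_frame_index * AC3_FRAME_MS * PTS_UNITS_HZ // 1000

# A packet starting at audio frame 100: 100 * 32 ms = 3.2 s after the base PTS.
print(audio_packet_pts(100))  # 288000
```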
  • In the original code stream, the PTS of the video packets is not calculated strictly according to the video frame rate (such as 29.97 frames/second), while the re-encoded and multiplexed code stream of the invention calculates SCR and PTS strictly by the frame rate and multiplexes the audio and video streams accordingly. This mismatch would cause audio and video to fall out of sync, which is why the PTS and SCR of the audio packets and sub-picture packets must be corrected here.
  • The solution: multiply the PTS and SCR of the audio packets by a scaling factor.
  • The scaling factor is the ratio of the theoretical display time of the video stream to that of the audio stream in the original video object file, each calculated from its number of display units.
  • Audio is scaled because constant-bit-rate transmission of the code stream over the network is based on transmitting a fixed number of video frames in a fixed time.
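The correction amounts to multiplying each audio PTS/SCR by the ratio of the two streams' theoretical display times, each computed from its display-unit count. The numbers below are illustrative, not from the patent:

```python
def audio_scale_factor(video_frames: int, video_fps: float,
                       audio_frames: int, audio_frame_ms: float = 32.0) -> float:
    """Ratio of theoretical video display time to theoretical audio display
    time, both computed from display-unit counts as described in the text."""
    video_seconds = video_frames / video_fps
    audio_seconds = audio_frames * audio_frame_ms / 1000.0
    return video_seconds / audio_seconds

# One hour of NTSC video (29.97 fps) against a slightly long AC-3 track:
scale = audio_scale_factor(107892, 29.97, 112530)  # 3600 s vs. 3600.96 s
corrected_pts = int(1_000_000 * scale)             # apply to an audio PTS value
print(round(scale, 6))  # 0.999733
```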
  • Sub-image packs appear much less frequently than video and audio, and they do not have the characteristics of constant flow of audio and video streams.
  • the SCR and PTS correction schemes of the present invention for sub-picture packs are also different from audio packs.
  • The invention corrects the SCR and PTS of a sub-picture packet according to the difference between the SCR of the navigation pack of the GOP containing that packet in the original file and the theoretical SCR.
  • Step 3 in Figure 1: re-encode the video data stream to obtain constant-rate video data.
  • The definition of constant-rate video: one GOP is fixed at 12/15 frames, which fixes the playback time of a GOP; the code-stream length of a GOP is fixed (in bytes); and each GOP begins with the video sequence header start code 0x000001B3.
  • The constant-rate process: when the encoded length of a GOP exceeds the specified value, re-encode it (if re-encoding is impossible, truncate to the specified length, which causes mosaic and should be avoided where possible); when the encoded length of a GOP is below the specified value, fill with zero bytes until the length equals the specified value.
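A minimal sketch of this padding/truncation step (the byte-level handling is simplified; a real implementation would re-encode rather than truncate):

```python
SEQ_HEADER = b"\x00\x00\x01\xb3"  # video sequence header start code 0x000001B3

def constant_stream_gop(gop: bytes, target_len: int) -> bytes:
    """Force one GOP's code stream to exactly target_len bytes: pad with
    zero bytes if short, truncate if long (truncation causes mosaic and
    should be avoided; the patent prefers re-encoding)."""
    assert gop.startswith(SEQ_HEADER), "each GOP must start with 0x000001B3"
    if len(gop) >= target_len:
        return gop[:target_len]   # last resort
    return gop + b"\x00" * (target_len - len(gop))

gop = SEQ_HEADER + b"\x55" * 100
fixed = constant_stream_gop(gop, 512)
print(len(fixed))  # 512
```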
  • FIG. 2 shows the frame arrangement of a 12-frame GOP as defined by the invention in the pulldown case.
  • FIG. 3 shows a 15-frame GOP as defined by the invention without pulldown. GOPs of other lengths and frame structures can also be defined to suit different needs; obviously, this too falls within the protection scope of the invention.
  • Step 4 in Figure 1: multiplex the constant-rate video stream with the extracted audio/sub-picture packets, then apply the constant-rate process again to obtain the final data stream. It includes the following:
  • Determine which GOP each audio or sub-picture packet belongs to and its position within the GOP, and insert it at that position. 3. Make the number of packets in each GOP constant (constant rate), counting video packets, audio packets, and sub-picture packets: if the count is below the specified value, fill with all-zero packs (2048 bytes); if it exceeds the specified value, move the extra audio and sub-picture packets to the beginning of the next GOP.
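This packet-count rule can be sketched as follows, with the 2048-byte pack size stated above. Carrying every excess pack to the next GOP is a simplification; the patent moves only the extra audio and sub-picture packs.

```python
PACK_SIZE = 2048
ZERO_PACK = bytes(PACK_SIZE)   # all-zero filler pack

def fix_gop_pack_count(packs: list[bytes],
                       target: int) -> tuple[list[bytes], list[bytes]]:
    """Return (packs kept in this GOP, packs carried to the next GOP)
    so that the GOP holds exactly `target` packs."""
    if len(packs) < target:
        return packs + [ZERO_PACK] * (target - len(packs)), []
    return packs[:target], packs[target:]   # extras move to the next GOP

this_gop, carry = fix_gop_pack_count([b"\x01" * PACK_SIZE] * 3, 5)
print(len(this_gop), len(carry))  # 5 0
```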
  • These rules rest on the fact that packets are sent over the network at a fixed time interval; furthermore, the playback time of each GOP of the invention is fixed, and the number of packets per GOP is fixed.
  • The SCR in a packet gives the time at which the packet's first byte is expected to reach the decoder; the SCRs of successive packets should therefore increase in equal steps.
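With a fixed GOP playback time and a fixed pack count per GOP, evenly increasing SCR values follow directly. A sketch in 90 kHz units; the GOP duration used below (15 frames at 25 fps) and the pack count are examples, not values fixed by the patent:

```python
SCR_UNITS_HZ = 90_000   # MPEG-2 SCR resolution (90 kHz)

def pack_scrs(gop_index: int, packs_per_gop: int,
              gop_duration_s: float) -> list[int]:
    """Evenly spaced SCR values (90 kHz units) for the packs of one GOP."""
    gop_units = round(gop_duration_s * SCR_UNITS_HZ)
    step = gop_units // packs_per_gop   # constant per-pack increment
    base = gop_index * gop_units
    return [base + i * step for i in range(packs_per_gop)]

# A 15-frame GOP at 25 fps = 0.6 s = 54000 units, spread over 30 packs:
scrs = pack_scrs(0, 30, 0.6)
print(scrs[1] - scrs[0])  # 1800
```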
  • The method of the invention achieves a fixed length between adjacent I frames and a fixed number of frames between them, equivalent to a fixed GOP code-stream length and a fixed frame count per GOP, so that search, positioning, editing, and similar operations become very easy.
  • The method of the invention also suits network-based real-time playback: it causes no VBV buffer overflow or underflow at the decoder, and the picture shows no mosaic, blocking, or stutter.

Abstract

A method of processing video/audio data compressed according to the MPEG-2 standard, including the steps of: (1) analyzing the video objects and determining the parameters used in the following steps; (2) preprocessing the associated video objects; (3) re-encoding the demultiplexed video data stream and outputting a constant-rate video data stream; (4) multiplexing the constant-rate video data stream with the sampled audio/sub-picture packets and outputting a constant-rate data stream. Compressing video/audio data to a constant stream enables real-time transmission and playback of compressed video/audio data over streaming networks and allows random access to the bit stream.

Description

Constant stream compression processing method

Technical Field

The invention relates to a data processing method for the MPEG-2 compression standard, in particular one that processes compressed video and audio data to a constant flow rate, making it suitable for real-time transmission and playback on a streaming-media network and allowing random access to the code stream.

Background Art
MPEG (Motion Pictures Experts Group) is an expert group convened by the International Organization for Standardization (ISO) to develop compression standards for digital video and audio. The group first formulated the MPEG-1 standard in 1992, applied to program distribution on laser discs. The broadcasting and television industry saw from the application of MPEG-1 the significance of MPEG technology for television, so in 1994 the group launched the MPEG-2 compression standard, establishing the possibility of worldwide interoperability among video and audio services and applications.

Three key compression techniques are used by the MPEG standards: the discrete cosine transform (DCT), motion compensation, and Huffman coding. DCT reduces the spatial redundancy of the image, motion compensation reduces its temporal redundancy, and Huffman coding reduces its informational (entropy) redundancy. The combined application of these techniques gives MPEG its high compression ratio.

The MPEG-2 standard is similar to MPEG-1 but more adaptable, applying to all processes and links of broadcast television. By definition, MPEG-1 is in fact a subset of MPEG-2, as can be seen in the MPEG-2 profile and level classification table.
The MPEG-2 standard is divided into four documents: the systems layer (Systems, ISO/IEC 13818-1), which describes the multiplexing of video and audio data and the method of video/audio synchronization.

The video compression layer (Video, ISO/IEC 13818-2) describes the digital video encoding method and decoding process. The audio compression layer (Audio, ISO/IEC 13818-3) describes the digital audio encoding method and decoding process. Conformance (ISO/IEC 13818-4) explains the process of testing a coded stream to verify compliance with the first three documents.

The MPEG-2 compression algorithm was designed as a universal video and audio compression standard, required to accommodate different application requirements and to allow control of the compressed output bit rate and image quality. To this end, it is divided into profiles and levels. Profiles define the chrominance spatial resolution and output bitstream control, while levels define the image resolution, the luminance sampling frequency, the number of video/audio layers a scalable profile can support, and the maximum bit rate of each profile at that level.
To better represent the coded data, MPEG uses its syntax to define a hierarchical structure of six layers, from top to bottom:

Video sequence

Group of pictures (GOP)

Picture

Slice

Macroblock

Block
A video sequence is composed of groups of pictures, with a sequence header marking its beginning and a sequence end code marking its end. It is a random-access segment.

The group of pictures (GOP) exists to facilitate random access; its structure and length are variable, and MPEG-2 imposes no hard rules on them. A GOP has two parameters, the length (N) and the reference-frame repetition frequency (M), explained below. The GOP is the video unit of random access.

A picture is an independent display unit and the basic coding unit. In MPEG-2, pictures can be progressive or interlaced; this differs from MPEG-1, which is always progressive.

A slice contains several consecutive macroblocks and is the unit of resynchronization. Slices exist to limit the spread of bit errors: when an error occurs in one slice, decoding of subsequent slices is unaffected.

The picture's luminance array is divided into 16×16 macroblocks; the macroblock is the basic unit of motion compensation. A macroblock contains four 8×8 luminance blocks and, depending on the profile, either two 8×8 chrominance blocks (one each for R-Y and B-Y, with 4:2:0 sampling) or four 8×8 chrominance blocks (two each for R-Y and B-Y, with 4:2:2 sampling). The block is the unit on which the DCT is performed and contains only luminance or only chrominance.
As mentioned above, MPEG is based on DCT, motion compensation, and Huffman coding, and accordingly uses both intra-frame and inter-frame compression. To achieve the maximum compression ratio in encoding, MPEG uses three types of pictures: I frames, P frames, and B frames.

The I frame (Intra-Frame) uses intra-frame compression without motion compensation and provides a medium compression ratio. Because I frames do not depend on other frames, they are the entry points for random access and serve as reference frames during decoding.

The P frame (Predicted-Frame) is predicted from the preceding I or P frame and compressed with motion compensation, so its compression ratio is higher than an I frame's and its data volume averages about 1/3 of an I frame. P frames are reference frames for decoding the surrounding B frames and subsequent P frames. A P frame itself carries error; if its preceding reference frame is also a P frame, the error propagates.

The B frame (Bidirectional-Frame) is reconstructed by interpolation from the two surrounding I/P or P/P frames using bidirectional prediction; its data volume averages about 1/9 of an I frame. A B frame itself is never used as a reference, so it provides a higher compression ratio without propagating errors.

It should be pointed out that although the term "frame" is used here, MPEG-2 itself does not require frames as the unit of digital image compression; for interlaced video images, fields can be used as the unit.
A GOP consists of a series of I, B, and P frames, starting with an I frame. The number of frames in a GOP is variable: more frames give a higher compression ratio but cause random-access delay (one must wait for the next I frame) and error accumulation (P-frame error propagation). Typically there are two I frames per second, serving as random-access entry points.

MPEG-2 likewise does not specify the GOP structure; the frame repetition pattern can be IP, IB, IBP, IBBP, or even all I frames. The repetition frequency of the reference frames is denoted M; different repetition frequencies yield different output bit rates and affect access delay.

When the three compression methods M-JPEG, DV, and MPEG-2 are compared, a thorny issue arises: M-JPEG and DV both provide frame-accurate random access, but an MPEG-2 stream based on I/P or I/P/B frames cannot. This limitation comes from the motion-compensation algorithm, and here the pros and cons of the newer technology show themselves. Within a GOP, decoding P and B frames depends on the I frame, so a video stream must be entered at an I frame. The consequences of this differ greatly across applications. When a television viewer switches channels, the delay while the digital video decoder box waits for the new channel's I frame is no problem: with at least two I frames per second, viewers do not mind the small delay. For television-station operations, however, the problems are serious: for example, it is hard to control the start point and length of a commercial insertion, and material search during non-linear editing is slow. Thus the bit rate of existing MPEG-2 streams such as DVD varies with picture content, which is ill-suited to real-time playback over a network: the decoder's VBV buffer can overflow or underflow, the picture shows mosaic, blocking, and stutter, and the decoder may even stop working.
发明内容  Summary of the Invention
针对现有的 MPEG-2码流不利于搜索定位和编辑,以及因为码流流量的 不均匀, 各个 GOP(Group of Pictures)内帧数及帧的结构不固定, 流量的不 确定性而导致不可能做到随机定位作为随机存取入口的 I帧, 及其对于快进 播放、 快退播放、 剪辑、 定位等造成的无法逾越的困难。 本发明提供一种 克服以上问题的针对 MPEG-2压缩标准压縮的视音频数据的处理方法。  In view of the existing MPEG-2 code stream, it is not conducive to searching, positioning, and editing, and because the stream traffic is not uniform, the number of frames and the frame structure in each GOP (Group of Pictures) are not fixed, and the traffic uncertainty causes It is possible to achieve random positioning of I-frames as random access entries and their insurmountable difficulties caused by fast-forward playback, fast-rewind playback, editing, positioning, and the like. The invention provides a method for processing video and audio data compressed by the MPEG-2 compression standard, which overcomes the above problems.
本发明的技术就是使得 MPEG-2的视频流以 GOP为单位图像帧数恒 定, 而且以 GOP为单位码流长度绝对恒定, 这样保证了视频码流速率的恒 定。 将视频流和音频流等进行复用得到系统流, 音频流原本就是恒流的, 而子图像等的流量和视频流相比较来讲非常小,所以只要给每个 GOP很少 量的冗余, 就可以得到恒流的 MPEG-2节目流。 本发明提供的一种针对 MPEG-2节目流的视音频数据的处理方法, 它包括以下步骤: (1 )对视频 目标文件进行分析, 确定随后步骤中将使用到的参数; (2)将相关视频目 标文件进行预处理; (3) 将解复用得到视频数据流重新编码得到恒流的视 频数据; (4)将恒流后的视频流和抽取的音频 /子图像包进行复用, 并再次 恒流得到最终的数据流。  The technology of the present invention is to make the MPEG-2 video stream have a constant number of image frames in GOP units, and the code stream length in GOP units is absolutely constant, which ensures a constant video stream rate. The video stream and the audio stream are multiplexed to obtain a system stream. The audio stream is originally a constant stream, and the traffic of the sub-image and the like is very small compared to the video stream, so only a small amount of redundancy is required for each GOP , You can get a constant current MPEG-2 program stream. The invention provides a method for processing video and audio data for an MPEG-2 program stream, which includes the following steps: (1) analyzing a video object file to determine parameters to be used in subsequent steps; (2) correlating The video object file is pre-processed; (3) the demultiplexed video stream is re-encoded to obtain constant stream video data; (4) the constant stream video stream and the extracted audio / sub-image packet are multiplexed, and Constant current again to get the final data flow.
The present invention achieves a fixed length between adjacent I-frames and a fixed number of frames between adjacent I-frames, which is equivalent to a fixed GOP stream length and a fixed number of frames per GOP. This makes searching, positioning, editing, and similar operations very easy. A further benefit is suitability for network-based real-time playback: the stream does not cause overflow or underflow of the VBV buffer on the decoder side, so the picture does not exhibit mosaic artifacts, blocking, or stuttering.
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a flowchart of an embodiment of the present invention.

Figure 2 is a schematic diagram of the frame arrangement of one GOP according to the present invention in the pulldown case.

Figure 3 is a schematic diagram of the frame arrangement of one GOP according to the present invention without pulldown.
BEST EMBODIMENT

The detailed implementation of the various aspects of the present invention will become clearer in the following description of the preferred embodiments with reference to the accompanying drawings.
The three main parameters that appear below, SCR, PTS, and DTS, are time stamps carried at specific positions in the system stream; each is a small piece of data inserted into the stream by the encoder. Specifically:

System Clock Reference (SCR)

The SCR is the system clock reference, inserted at least every 0.7 seconds. The decoder extracts the SCR from the data stream and passes it to the video decoder and the audio decoder, synchronizing their internal clocks with the system clock.

Presentation Time Stamp (PTS)

A picture sequence can be divided into many "presentation units"; for video, the presentation unit is the frame. The PTS indicates the presentation time of a presentation unit; the decoder checks the PTS, compares it with the SCR, and presents the picture accordingly, keeping it synchronized with the system time.

Decoding Time Stamp (DTS)
The DTS indicates the time at which the access unit is expected to be decoded in the system target decoder. In hierarchical coding, the relevant DTS must remain consistent with the corresponding access unit across all hierarchy levels.

Referring to step 1 in Figure 1: first analyze the video object file to determine the parameters to be used in subsequent steps. These parameters include: 1. whether, after decoding, one field is repeated every two frames, i.e. whether 3:2 pulldown is in effect (3:2 pulldown means that after decoding, an extra field is displayed every two frames; it is the adjustment required for standards conversion because film at 24 frames/second and NTSC at 30 frames/second have different frame rates); 2. how many packs to cut away (video object 1, video object 2, ..., keeping the last video object program, so that the stream is purely pulldown or purely not, and cutting off any black screen); 3. the video standard, i.e. PAL or NTSC; 4. the stream IDs of the required audio stream and subtitle stream; 5. the audio bit rate (kbps); 6. the frame rate (frames/second).
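As one way to visualize the outcome of this analysis step, the parameter set can be sketched as a simple record. This is a hypothetical illustration in Python, not a data layout taken from the patent; all field names and example values are assumptions.

```python
from dataclasses import dataclass

@dataclass
class AnalysisParams:
    """Parameters determined by analysing the video object file (step 1)."""
    is_pulldown: bool        # is 3:2 pulldown in effect after decoding?
    packs_to_cut: int        # number of leading packs to cut away
    video_standard: str      # "PAL" or "NTSC"
    audio_stream_id: int     # stream id of the required audio stream
    subtitle_stream_id: int  # stream id of the required subtitle stream
    audio_bitrate_kbps: int  # audio bit rate
    frame_rate: float        # frames per second

# Example: a hypothetical NTSC film title with pulldown and AC-3 audio.
params = AnalysisParams(
    is_pulldown=True, packs_to_cut=2, video_standard="NTSC",
    audio_stream_id=0xBD, subtitle_stream_id=0x20,
    audio_bitrate_kbps=192, frame_rate=29.97,
)
```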
Of course, in other embodiments different parameters may be selected according to different needs; such simple variations in the choice of parameters do not depart from the scope of the present invention.

Referring to step 2 in Figure 1: preprocess the relevant video object file, which includes the following:

(a) extract the required audio and sub-picture packs from the intercepted video object file;

(b) demultiplex the intercepted video object file to obtain the video data stream;
Because the first intercepted GOP is not necessarily self-contained, the two B-frames following the I-frame cannot be decoded correctly. The present invention repeats the I-frame data of the first GOP twice to overwrite the data of the two B-frames that follow it, so that no mosaic artifacts appear.
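The open-GOP repair just described can be sketched as follows. The frame-list layout (a type tag plus a payload) is a simplified assumption rather than real MPEG-2 bitstream parsing.

```python
def patch_open_gop(frames):
    """Overwrite the payloads of the two B-frames that follow the leading
    I-frame with copies of the I-frame data, so the truncated first GOP
    decodes without mosaic artifacts."""
    patched = list(frames)
    frame_type, i_data = patched[0]
    assert frame_type == "I", "first GOP is expected to start with an I-frame"
    for k in (1, 2):  # the two B-frames immediately after the I-frame
        if patched[k][0] == "B":
            patched[k] = ("B", i_data)  # repeat the I-frame data
    return patched

gop = [("I", b"iframe"), ("B", b"junk1"), ("B", b"junk2"), ("P", b"pframe")]
fixed = patch_open_gop(gop)
# fixed[1] and fixed[2] now carry the I-frame bytes instead of broken B data
```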
(c) correct the SCR and PTS of the extracted audio and sub-picture packs.
Audio packs: the PTS of an audio pack is the presentation time of the first audio frame header appearing in that pack. Compare the original PTS of the first intercepted audio presentation unit with that of the first video presentation unit to obtain their difference; after interception, the PTS of the first video presentation unit is about 0.28 seconds, and the PTS of the first audio presentation unit is corrected according to this difference. Since the presentation time of each audio frame is fixed (for example, one Dolby AC-3 audio frame spans 32 milliseconds), the PTS of a pack can easily be computed once it is known which audio frame the pack's first audio frame header belongs to. The SCR of an audio pack, however, is the expected arrival time at the decoder of the pack's first byte, not of the first audio frame header, so the SCR correction scheme of the present invention is: SCR (seconds) = PTS (seconds) - position of the pack's first audio frame header (bytes) / frame size (bytes) × frame time (seconds) - a fixed empirical value (seconds).
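The SCR correction formula for audio packs translates directly into code. The 32 ms AC-3 frame time comes from the text above, while the byte positions and the fixed empirical offset below are placeholder values chosen purely for illustration.

```python
def corrected_audio_scr(pts_s, header_pos_bytes, frame_size_bytes,
                        frame_time_s, empirical_offset_s):
    """SCR(s) = PTS(s) - header_position/frame_size * frame_time - fixed offset."""
    return (pts_s
            - header_pos_bytes / frame_size_bytes * frame_time_s
            - empirical_offset_s)

# Dolby AC-3: one audio frame spans 32 ms.  Suppose (hypothetically) the
# first frame header sits 384 bytes into a 768-byte frame, and the fixed
# empirical offset is 0.05 s.
scr = corrected_audio_scr(pts_s=1.000, header_pos_bytes=384,
                          frame_size_bytes=768, frame_time_s=0.032,
                          empirical_offset_s=0.05)
# scr = 1.000 - 0.5 * 0.032 - 0.05 = 0.934 seconds
```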
In some DVD video object files, the PTS of the video packs is not computed strictly according to the video frame rate of the stream (e.g. 29.97 frames/second), whereas the re-encoded and re-multiplexed stream of the present invention computes SCR and PTS strictly according to that frame rate and multiplexes the audio and video streams accordingly; this would cause audio and video to fall out of sync, which is why the PTS and SCR of the audio packs and sub-picture packs must be corrected here. The solution is to multiply the PTS and SCR of the audio packs by a scaling factor. The scaling factor is the ratio of the theoretical presentation time of the video stream to that of the audio stream in the original video object file (each computed from the number of presentation units). The reason for scaling the audio is that constant-bit-rate transmission over the network is based on transmitting a fixed number of video frames in a fixed time.
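The scaling-factor correction can be sketched as follows; the presentation-unit counts and durations are illustrative numbers, not values taken from the patent.

```python
def scale_factor(video_units, video_frame_time_s, audio_units, audio_frame_time_s):
    """Ratio of the theoretical video presentation time to the theoretical
    audio presentation time, each computed from the number of units."""
    return ((video_units * video_frame_time_s)
            / (audio_units * audio_frame_time_s))

def rescale_pack(pts_s, scr_s, factor):
    """Multiply a pack's PTS and SCR by the scaling factor."""
    return pts_s * factor, scr_s * factor

# Hypothetical 100-second title: 2997 video frames at 29.97 fps versus
# 3125 AC-3 frames of 32 ms each, giving a scale factor of exactly 1.0.
f = scale_factor(2997, 1 / 29.97, 3125, 0.032)
pts, scr = rescale_pack(10.0, 9.8, f)
```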
Sub-picture packs: sub-picture packs appear far less frequently than video and audio packs, and they lack the constant-rate property of the audio and video streams, so the SCR/PTS correction scheme of the present invention for sub-picture packs also differs from that for audio packs. The present invention corrects the SCR and PTS of a sub-picture pack according to the difference between the SCR of the navigation pack of the GOP containing that sub-picture pack in the original file and the theoretical SCR.
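This sub-picture correction amounts to shifting each pack's time stamps by the navigation pack's SCR error. A minimal sketch, assuming all time stamps are already converted to seconds:

```python
def correct_subpicture(pack_scr_s, pack_pts_s,
                       nav_scr_original_s, nav_scr_theoretical_s):
    """Shift a sub-picture pack's SCR and PTS by the difference between the
    original navigation-pack SCR of its GOP and the theoretical SCR."""
    delta = nav_scr_original_s - nav_scr_theoretical_s
    return pack_scr_s - delta, pack_pts_s - delta

# Hypothetical values: the GOP's navigation pack was 0.30 s late, so both
# stamps of the sub-picture pack move back by 0.30 s.
scr, pts = correct_subpicture(12.40, 12.55,
                              nav_scr_original_s=12.30,
                              nav_scr_theoretical_s=12.00)
```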
Implementing these preferred steps helps improve audio/video synchronization; of course, the present invention can equally be realized using other known preprocessing techniques to obtain the required audio and sub-picture packs and video data stream.
Referring to step 3 in Figure 1: re-encode the demultiplexed video data stream to obtain a constant-rate video stream. Here, a constant-rate video stream is defined as follows: each GOP is fixed at 12/15 frames, which is equivalent to fixing the playback time of a GOP; the stream length of each GOP is fixed (in bytes); and each GOP begins with the video sequence header code word 0x000001B3. During rate regulation, when the encoded length of a GOP exceeds the prescribed value, it is re-encoded (if re-encoding is impossible, the stream is truncated to the prescribed length, which causes mosaic artifacts and should be avoided whenever possible); when the encoded length of a GOP is below the prescribed value, zero bytes are appended until its length equals the prescribed value.
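The constant-length rule for a GOP (zero-pad when short, truncate only as a last resort when long) can be sketched as a byte-level helper; the target size here is an arbitrary illustrative value.

```python
SEQUENCE_HEADER = bytes.fromhex("000001b3")  # each GOP begins with this code word

def make_constant_length(gop_bytes, target_len):
    """Force a GOP's encoded length to exactly target_len bytes: pad with
    zero bytes when short; truncate when long (a last resort, since
    truncation causes mosaic artifacts and re-encoding is preferred)."""
    assert gop_bytes.startswith(SEQUENCE_HEADER)
    if len(gop_bytes) < target_len:
        return gop_bytes + b"\x00" * (target_len - len(gop_bytes))
    return gop_bytes[:target_len]

# Hypothetical GOP of 104 bytes forced to a 256-byte constant length.
gop = SEQUENCE_HEADER + b"\x42" * 100
out = make_constant_length(gop, target_len=256)
```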
Figure 2 shows the frame arrangement of a 12-frame GOP as defined by the present invention in the pulldown case.

Figure 3 shows the frame arrangement of a 15-frame GOP as defined by the present invention without pulldown. GOPs of other lengths and frame structures may likewise be defined according to different needs; obviously, this also falls within the protection scope of the present invention.

Referring to step 4 in Figure 1: multiplex the constant-rate video stream with the extracted audio/sub-picture packs, and make the result constant-rate again to obtain the final data stream. This includes the following:
1. Pack and multiplex the constant-rate video stream into a system stream (the key is determining the SCR, PTS, and DTS of the I-frames). First read one video access unit into a temporary file, then packetize the data in this temporary file. Note that a new pack is started at the beginning of each GOP, to allow random access to the GOP, and another new pack is started with the frame of data immediately following the I-frame, to allow access to the I-frame data during fast-forward or fast-rewind.
2. Insert the extracted audio and sub-picture packs into the packetized video packs.

The rule is: from the SCR, compute which GOP the audio or sub-picture pack belongs in and its position within that GOP, and insert it at that position. 3. Make the number of packs in each GOP constant (constant rate), counting video packs, audio packs, and sub-picture packs. If the count is below the prescribed value, pad with all-zero packs (2048 bytes each); if the count exceeds the prescribed value, move the surplus audio and sub-picture packs to the beginning of the next GOP.
4. Correct the SCR of each pack. The rule is: because the stream is transmitted over the network, packs are sent at fixed time intervals; moreover, in the present invention the playback time of each GOP is fixed and the number of packs per GOP is fixed. The SCR carried in a pack is the time at which the pack's first byte is expected to reach the decoder, so the SCRs of successive packs should increase uniformly.
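Items 3 and 4 above, the constant pack count per GOP and the uniformly increasing SCRs, can be sketched together. The pack size follows the 2048-byte figure in the text; the GOP timing values are illustrative assumptions, and the surplus-pack handling is simplified to a plain slice.

```python
PACK_SIZE = 2048  # bytes; padding packs are all zeros

def normalize_gop(packs, packs_per_gop):
    """Force a GOP to a fixed pack count: pad with all-zero packs when
    short; when long, return the surplus packs so the caller can move
    them to the beginning of the next GOP (simplified to a plain slice
    here; the patent moves only surplus audio/sub-picture packs)."""
    if len(packs) < packs_per_gop:
        padding = [b"\x00" * PACK_SIZE] * (packs_per_gop - len(packs))
        return packs + padding, []
    return packs[:packs_per_gop], packs[packs_per_gop:]

def uniform_scrs(gop_index, packs_per_gop, gop_time_s):
    """Each GOP plays for a fixed time and holds a fixed number of packs,
    so pack SCRs increase uniformly at a fixed interval."""
    interval = gop_time_s / packs_per_gop
    base = gop_index * gop_time_s
    return [base + k * interval for k in range(packs_per_gop)]

# Hypothetical GOP of 5 packs padded up to 8; a 12-frame PAL GOP plays
# for 0.48 s, so packs in GOP #2 get SCRs starting at 0.96 s.
packs, spill = normalize_gop([b"\x01" * PACK_SIZE] * 5, packs_per_gop=8)
scrs = uniform_scrs(gop_index=2, packs_per_gop=8, gop_time_s=0.48)
```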
It is worth noting that there are many known methods for multiplexing video streams with audio/sub-picture packs; what is described here is only a preferred implementation, and one of ordinary skill in the art can use other multiplexing methods to carry out this step without departing from the disclosure of the present invention.
Industrial Applicability

The method of the present invention achieves a fixed length between adjacent I-frames and a fixed number of frames between adjacent I-frames, which is equivalent to a fixed GOP stream length and a fixed number of frames per GOP, making searching, positioning, editing, and similar operations very easy. The method of the present invention is also suitable for network-based real-time playback: it does not cause overflow or underflow of the VBV buffer on the decoder side, so the picture does not exhibit mosaic artifacts, blocking, or stuttering.

Claims

1. A method for processing video and audio data under the MPEG-2 compression standard, characterized in that it comprises the following steps:

(1) analyzing the video object file to determine the parameters to be used in subsequent steps;

(2) preprocessing the relevant video object file;

(3) re-encoding the demultiplexed video data stream to obtain a constant-rate video stream;

(4) multiplexing the constant-rate video stream with the extracted audio/sub-picture packs, and making the result constant-rate again to obtain the final data stream.
2. The method according to claim 1, characterized in that the parameters in step 1 include but are not limited to: whether the decoded output is 3:2 pulldown, the number of packs to cut, the video standard, the stream IDs of the required audio stream and subtitle stream, the audio bit rate, and the frame rate.
3. The method according to claim 1, characterized in that step 2 comprises the following steps:

(a) extracting the required audio and sub-picture packs from the intercepted video object file;

(b) demultiplexing the intercepted video object file to obtain the video data stream;

(c) correcting the SCR and PTS of the extracted audio and sub-picture packs.
4. The method according to claim 3, characterized in that the audio pack PTS correction scheme in step (c) is: multiplying the PTS of the audio pack by a scaling factor, the scaling factor being the ratio of the theoretical presentation time of the video stream to that of the audio stream in the original video object file (each computed from the number of presentation units).

5. The method according to claim 3, characterized in that the audio pack SCR correction scheme in step (c) is: SCR (seconds) = PTS (seconds) - position of the pack's first audio frame header (bytes) / frame size (bytes) × frame time (seconds) - fixed empirical value (seconds).

6. The method according to claim 3, characterized in that the SCR/PTS correction scheme for sub-picture packs in step (c) is: correcting the SCR and PTS of a sub-picture pack according to the difference between the SCR of the navigation pack of the GOP containing that sub-picture pack in the original file and the theoretical SCR.

7. The method according to any one of claims 1 to 6, characterized in that the constant-rate video stream in step (3) is defined as: each GOP is fixed at 12 or 15 frames; the stream length of each GOP is fixed; and each GOP begins with the code word 0x000001B3.
8. The method according to any one of claims 1 to 6, characterized in that step (4) comprises the following steps:

(1) packetizing and multiplexing the constant-rate video stream into a system stream;

(2) inserting the extracted audio and sub-picture packs into the packetized video stream;

(3) making the number of packs in each GOP constant: if the count is below the prescribed value, padding with all-zero packs; if the count exceeds the prescribed value, moving the surplus audio and sub-picture packs to the beginning of the next GOP;

(4) correcting the SCR of each pack.
PCT/CN2003/000486 2003-06-23 2003-06-23 Constant stream compression processing method WO2004114666A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/CN2003/000486 WO2004114666A1 (en) 2003-06-23 2003-06-23 Constant stream compression processing method
AU2003248219A AU2003248219A1 (en) 2003-06-23 2003-06-23 Constant stream compression processing method
CN200410049156.5A CN1638480A (en) 2003-06-23 2004-06-22 Video frequency compressing method for motion compensation technology


Publications (1)

Publication Number Publication Date
WO2004114666A1 (en) 2004-12-29

Family

ID=33520368


Country Status (2)

Country Link
AU (1) AU2003248219A1 (en)
WO (1) WO2004114666A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103733633A (en) * 2011-05-12 2014-04-16 索林科集团 Video analytics system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5929916A (en) * 1995-12-26 1999-07-27 Legall; Didier J. Variable bit rate encoding
EP1045589A2 (en) * 1999-04-16 2000-10-18 Sony United Kingdom Limited Apparatus and method for splicing of encoded video bitstreams
US6215820B1 (en) * 1998-10-12 2001-04-10 Stmicroelectronics S.R.L. Constant bit-rate control in a video coder by way of pre-analysis of a slice of the pictures
US20020094031A1 (en) * 1998-05-29 2002-07-18 International Business Machines Corporation Distributed control strategy for dynamically encoding multiple streams of video data in parallel for multiplexing onto a constant bit rate channel



Also Published As

Publication number Publication date
AU2003248219A1 (en) 2005-01-04


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP