CN101068366A

CN101068366A - Method and multiplexer for H.264-based multi-channel video transcoding and multiplexing

Info

Publication number: CN101068366A
Application number: CN200710023476.7A
Authority: CN
Inventors: 方怀东; 柳翀; 鹿宝生; 严肃; 陈启美
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2007-06-05
Filing date: 2007-06-05
Publication date: 2007-11-07
Anticipated expiration: 2027-06-05
Also published as: CN100496129C

Abstract

A method for H.264-based multi-channel video transcoding and multiplexing applies a fast MPEG-2-to-H.264 conversion that exploits the correlation between H.264 macroblock mode selection and the MPEG-2 motion-compensation residual: the motion-compensation residual and MB mode obtained by MPEG-2 decoding are mapped to an H.264 macroblock mode. The transcoded programs are combined into a TS stream. Multiple MPEG-2 program streams are input to the transcoding and multiplexing server through ASI interfaces over the PCI bus, and the single transcoded and multiplexed H.264 video stream is output over the PCI bus through an ASI interface.

Description

Method and multiplexer for H.264-based multi-channel video transcoding and multiplexing

Technical field

The invention belongs to the field of video compression coding and multiplexing in digital television. In particular, it relates to a method and a multiplexer for H.264-based multi-channel video transcoding and multiplexing.

Background

In recent years, mobile digital TV has developed rapidly in China, but limited bandwidth restricts the expansion of digital video services. To balance transmission efficiency and picture quality, such systems typically transmit at 6 to 10 Mbps. However, most recorded digital TV programs use the MPEG-2 video compression standard, and the pictures are large: a standard-definition MPEG-2 stream runs at about 4 Mbps and a high-definition one at about 10 Mbps. The bandwidth available to mobile digital TV users generally cannot carry several high-bit-rate video streams in real time. To let users watch more mobile TV programs smoothly at lower bandwidth, the bit rate of the video streams must be reduced. Limited storage capacity and the emergence of many different digital TV terminals make the demand for efficient video coding increasingly urgent.

Until digital TV sources adopt a low-bit-rate, high-definition compression standard, there are two ways to address these problems:

1) Recompress high-bit-rate digital video such as MPEG-2 into low-bit-rate MPEG-2 video.
2) Transcode high-bit-rate digital video such as MPEG-2 into H.264 video.

The first approach substantially degrades picture quality and is clearly undesirable. The second achieves higher compression efficiency and a lower transmission bit rate with almost no loss of picture quality.

Compared with MPEG-2, H.264 can improve compression efficiency by more than 4 times at the same picture quality, so the second approach is preferable. However, H.264 is purely a video compression standard and does not cover audio-video synthesis or multiplexed transmission. There is also currently no dedicated equipment for MPEG-2-to-H.264 video transcoding and multiplexing. TV stations own a large amount of very expensive MPEG-2 front-end equipment, including digital video cameras and non-linear program editors, so abandoning it is unrealistic. Building dedicated transcoding and multiplexing equipment that preserves picture quality while greatly reducing bandwidth has therefore become a priority.

The prior art does not address a method or multiplexer for H.264-based multi-channel video transcoding and multiplexing. For example, CN1745573 (image pickup device and moving-picture shooting method) describes an image pickup device operating in moving-picture mode:

- Before shooting starts, as indicated by the shutter button on the key input section (12), the clock frequency of the control section (10) is set to a normal frequency. This reduces power consumption in the monitoring state and prolongs battery life.
- When the start of shooting is instructed, the clock switching control section (101) greatly increases the clock frequency.
- As a result, while the moving-picture data is processed, the MPEG converter (7) can access the SDRAM (8) at high speed and compress the moving picture in real time. The SDRAM stores YUV data such as reference and search data.

CN1567271 (MPEG stream conversion and acquisition method and device with a high-speed network interface) performs data filtering, PID modification, service-information insertion and bit-rate conversion of the transport stream inside the device. The device has a high-speed Ethernet interface for sending the converted target transport stream to a computer. It can capture the stream directly and can also process it.

CN1633180 (multiple-description video coding method based on transforms and data fusion) proceeds as follows:

1) Apply transforms 1 to n to the signal to be coded.
2) Quantize and entropy-code each transformed signal.
3) Decode signals 1 to n along their respective paths 1 to n.
4) Inverse-transform each decoded signal to obtain side descriptions 1 to n.
5) Fuse the n inverse-transformed results into a central description.

The method combines transform- and fusion-based multiple-description coding with video coding. For a video sequence, it produces several MPEG streams. Any one stream can restore a video sequence with relatively large distortion, while receiving several streams restores a sequence with less distortion.

Summary of the invention

The invention adds a dedicated H.264 video transcoding and multiplexing server to an existing MPEG-2 mobile digital TV system. It uses a transform-domain video transcoding algorithm to reduce transcoding complexity. The following functions are all implemented in software:

- transcoding of multiple MPEG-2 streams to H.264;
- multiplexing and demultiplexing of H.264 video with audio;
- multiplexing and demultiplexing of multiple H.264 programs.

The technical solution of the transcoding multiplexer is as follows.

- Inputs and outputs: the input is multiple MPEG-2 single-program streams and the output is one H.264 multi-program stream. The device performs MPEG-2-to-H.264 video transcoding, demultiplexing and multiplexing of audio and video, and multiplexing of multiple H.264 programs.
- Video transcoding: this covers bit-rate, resolution and format conversion. The MPEG-2-to-H.264 transcoding algorithm is based on machine learning, supports adjustable bit rate and resolution, and uses different algorithms for intra and inter frames. The fast conversion method is described below.
- TS synthesis: when the TS stream is synthesized, PID values are rewritten according to a fixed rule so that PID conflicts do not prevent correct decoding. The stream type field of the PMT is modified accordingly.
- Interfaces: multiple MPEG-2 program streams enter the transcoding and multiplexing server through ASI interfaces over the PCI bus. The single transcoded and multiplexed H.264 stream leaves over the PCI bus through an ASI interface.
- FIFO handling: the FIFO half-full signal triggers FIFO reads and writes, so that the CPU does not access the PCI interface too frequently.
- Single server: MPEG-2-to-H.264 transcoding and the multiplexing of the transcoded H.264 video streams with the audio streams are performed in the same server.

A TS stream is built by packetizing the encoded elementary streams (ES) in a defined format into PES packets and adding system information. At the sending end, the audio or video encoder performs PES packetization of each elementary stream. The multiplexer receives the audio, video and auxiliary data streams from the encoders and interleaves them into a single TS stream according to a multiplexing scheme. To keep audio and video synchronized, timestamps and system control information must also be inserted into the stream. The receiving end reverses this process.

Video transcoding from MPEG-2 to H.264: there are currently two main architectures, cascaded pixel-domain transcoding (CPDT) and DCT-domain transcoding (DDT).

- CPDT fully decodes the stream, processes it in the pixel domain and then re-encodes it. Because the decoder and encoder are structurally independent, this is very flexible. However, the motion vectors and coding modes of every macroblock are recomputed, so efficiency is low, and a pure software implementation struggles to run in real time.
- DDT re-estimates DCT coefficients, motion vectors and so on directly in the DCT domain. Its computational complexity is low, but its flexibility is limited, and it is hard to use when the motion vectors, bit rate or resolution must change.

The fast MPEG-2-to-H.264 conversion method of the invention exploits the correlation between H.264 macroblock mode selection and the MPEG-2 motion-compensation (MC) residual. It turns H.264 macroblock mode selection into a data classification problem: the MC residual, MB mode and coded block pattern (CBPC) obtained by MPEG-2 decoding are mapped directly to an H.264 macroblock mode.

Training proceeds as follows:

1) While decoding MPEG-2, save the relevant MB information: the MB coding mode, the coded block pattern (CBPC), and the mean and variance of the MB residual. These statistics are computed for each 4×4 sub-MB, giving 16 means and 16 variances.
2) Encode the decoded YUV pictures with a standard H.264 encoder and save the H.264 MB coding modes it chooses.
3) Use a machine learning algorithm to derive a decision tree that classifies H.264 coding modes.

During transcoding:

1) While decoding the MPEG-2 stream, extract the MPEG-2 MC residual, macroblock mode and coded block pattern (CBPC), and compute the mean and variance of the MC residual of each 4×4 sub-block.
2) Obtain the H.264 macroblock coding mode from the decision tree.
3) When encoding in H.264, assign this coding mode directly to the MB.

The H.264 encoder's inputs are the decoded MPEG-2 YUV data and the MB coding mode. MPEG-2 motion vectors are not reused; motion estimation uses the MB coding mode given by the decision tree. The block diagram of the transcoding algorithm is shown in Figure 1.
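The per-4×4 residual statistics that feed the decision trees can be sketched as follows. This is a minimal illustration, not the patent's code. It rests on these assumptions:

- The residual is a 16×16 luma block given as a list of rows.
- Variance means population variance.
- The function name `mb_residual_features` is hypothetical.

```python
from statistics import fmean, pvariance


def mb_residual_features(residual):
    """Mean and variance of each 4x4 sub-block of a 16x16 MC residual.

    `residual` is 16 rows of 16 numbers (assumption: luma only).
    Returns (means, variances), 16 values each, sub-blocks in raster order.
    """
    if len(residual) != 16 or any(len(row) != 16 for row in residual):
        raise ValueError("expected a 16x16 macroblock residual")
    means, variances = [], []
    for by in range(4):          # sub-block row
        for bx in range(4):      # sub-block column
            samples = [residual[4 * by + y][4 * bx + x]
                       for y in range(4) for x in range(4)]
            means.append(fmean(samples))
            variances.append(pvariance(samples))  # population variance (assumption)
    return means, variances
```

The 16 means and 16 variances then become 32 numeric attributes of one training sample.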

The decision tree is obtained as follows. Its classification follows these principles:

1) a classifier that divides the input into Intra, Skip, Inter16×16 and Inter8×8;

2) a classifier that divides Inter16×16 into 16×16, 16×8 and 8×16;

3) a classifier that divides Inter8×8 into 8×8, 8×4, 4×8 and 4×4.

Decision-tree generation follows these rules:

1) If the MPEG-2 MB has no coded motion compensation (no nonzero MV, and no coded coefficients in any of its four 8×8 blocks), the H.264 MB is coded as 16×16. The best mode is then selected by the second-level decision of the tree.

2) If the MPEG-2 MB is in intra mode, the MB is coded in H.264 as either intra or inter8×8. If it is coded as intra, the algorithm terminates. If it is inter8×8, the best mode is selected by the second-level decision.

3) If the MPEG-2 MB is in skip mode, the H.264 MB is also in skip mode.

4) The decision tree is generated with the WEKA data mining tool, whose input file format is ARFF (Attribute-Relation File Format). An ARFF file is an ASCII text file that describes a set of instances sharing a set of attributes. It generally has two sections: (a) a header, containing the relation name and the attributes with their types; (b) the data.

5) The training set consists of high-bit-rate MPEG-2 sequences without B frames. The target decisions are obtained by decoding the MPEG-2 stream and re-encoding it in H.264 with a quantization parameter of 25. Rate-distortion (RD) optimization chooses the macroblock coding modes.

The transcoding decision tree has three levels and uses three different WEKA trees, as shown in Figure 2:

The first WEKA decision tree uses a training data set with these attributes:

- the means and variances of the residuals of the 16 4×4 sub-blocks in an MPEG-2 macroblock;
- the macroblock mode: skip, intra and three non-intra modes, labeled 0, 1, 2, 4 and 8 respectively;
- the coded block pattern (CBPC);
- the H.264 MB coding mode.

Each instance row in the ARFF data section is one macroblock sample used to train the decision tree model.

The second decision tree uses a training set with these attributes:

- the means and variances of the 16 4×4 sub-block residuals in an MPEG-2 macroblock;
- the macroblock mode (the three non-intra modes);
- the coded block pattern (CBPC);
- the H.264 MB 16×16 sub-mode: 16×16, 16×8 or 8×16.

This tree determines the final coding mode of inter16×16 MBs.

The third decision tree uses a training set with these attributes:

- the means and variances of the four 4×4 sub-block residuals within one 8×8 block of an MPEG-2 macroblock;
- the macroblock mode (the three non-intra modes);
- the coded block pattern (CBPC);
- the H.264 MB 8×8 sub-mode: 8×8, 8×4, 4×8 or 4×4.
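The patent does not reproduce its ARFF files. Below is a minimal sketch of how first-level training rows might be written. It makes these assumptions:

- The attribute names (`mean0` to `mean15`, `var0` to `var15`, `mb_mode`, `cbp`, `h264_mode`) are hypothetical.
- CBPC is treated as a numeric attribute.

```python
# Attribute names are hypothetical; the patent does not publish its ARFF schema.
MPEG2_MB_MODES = (0, 1, 2, 4, 8)          # skip, intra, three non-intra modes
H264_CLASSES = ("skip", "intra", "inter16x16", "inter8x8")


def first_tree_arff(samples, relation="mpeg2_to_h264_level1"):
    """Render level-1 training samples as ARFF text.

    Each sample is (means, variances, mpeg2_mb_mode, cbp, h264_class), where
    means and variances hold the 16 per-4x4 residual statistics.
    """
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE mean{i} NUMERIC" for i in range(16)]
    lines += [f"@ATTRIBUTE var{i} NUMERIC" for i in range(16)]
    lines.append("@ATTRIBUTE mb_mode {" + ",".join(map(str, MPEG2_MB_MODES)) + "}")
    lines.append("@ATTRIBUTE cbp NUMERIC")
    lines.append("@ATTRIBUTE h264_mode {" + ",".join(H264_CLASSES) + "}")
    lines += ["", "@DATA"]
    for means, variances, mb_mode, cbp, label in samples:
        if len(means) != 16 or len(variances) != 16:
            raise ValueError("need 16 means and 16 variances per macroblock")
        if mb_mode not in MPEG2_MB_MODES or label not in H264_CLASSES:
            raise ValueError("unknown MB mode or class label")
        row = [repr(float(x)) for x in (*means, *variances)]  # one MB per row
        row += [str(mb_mode), str(cbp), label]
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"
```

Files for the second and third trees would follow the same pattern, with their own attribute subsets and class labels.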

From these training files, the decision trees are generated with WEKA's J48 algorithm. J48 is WEKA's implementation of the C4.5 decision-tree algorithm proposed by Ross Quinlan, which is widely used in data mining.
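J48 itself is part of WEKA and is not reproduced here. The sketch below illustrates the kind of split that C4.5-style trees make on a numeric attribute, such as a sub-block residual variance. It picks the binary threshold with the highest information gain and reports that split's gain ratio.

This is a simplified illustration, not WEKA's implementation. It omits pruning, missing-value handling and attribute selection across features. The function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Best binary split `value <= t` on one numeric attribute.

    Picks the threshold with the highest information gain and also reports
    the gain ratio of that split. Returns (threshold, gain, gain_ratio),
    or None if all values are equal.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    base = entropy([lab for _, lab in pairs])
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue                      # can only cut between distinct values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (i * entropy(left) + (n - i) * entropy(right)) / n
        if best is None or gain > best[1]:
            split_info = entropy([0] * i + [1] * (n - i))  # > 0: both sides non-empty
            best = (lo, gain, gain / split_info)
    return best
```

In the transcoder, the resulting thresholds on means and variances are the "decision levels" that WEKA computes for each tree node.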

TS stream multiplexing

The transcoded H.264 video of each program and the original audio of that program are multiplexed and synchronized according to the MPEG-2 systems layer. The programs are then combined into a single transport stream (TS) for transmission.

A TS stream is built by packetizing the encoded elementary streams (ES) in a defined format into PES packets and adding system information. At the sending end, the audio or video encoder performs PES packetization of each elementary stream. The multiplexer receives the audio, video and auxiliary data streams from the encoders and interleaves them into a single TS stream according to a multiplexing scheme. To keep audio and video synchronized, timestamps and system control information must also be inserted into the stream. The receiving end reverses this process.

A transport stream can contain several programs, and each program can consist of several streams, including video, audio and program-specific information (PSI). There are four types of PSI table:

- the Program Association Table (PAT);
- the Program Map Table (PMT);
- the Network Information Table (NIT);
- the Conditional Access Table (CAT).

The multiplexer packs the transcoded H.264 video and the original audio in transport-stream format. Each TS packet is 188 bytes long and consists of a header and a payload:

- The 4-byte header contains the sync byte 0x47 and the packet identifier (PID). The PID indicates whether the payload carries video, audio, PSI or other data.
- The payload is the actual content of the packet. Depending on the stream, it carries PES packet data or PSI sections.
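The 4-byte header layout described above can be decoded as follows. The bit positions follow the transport packet header of ISO/IEC 13818-1. The function name and the dictionary keys are illustrative.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
WELL_KNOWN_PIDS = {0x0000: "PAT", 0x0001: "CAT", 0x1FFF: "null"}


def parse_ts_header(packet: bytes) -> dict:
    """Decode the 4-byte header of one 188-byte transport stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("TS packets are 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("packet does not start with the 0x47 sync byte")
    pid = ((packet[1] & 0x1F) << 8) | packet[2]          # 13-bit PID
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        "transport_priority": bool(packet[1] & 0x20),
        "pid": pid,
        # Other PIDs (PMT, video, audio) are only known after reading PAT/PMT.
        "kind": WELL_KNOWN_PIDS.get(pid, "other"),
        "scrambling_control": (packet[3] >> 6) & 0x03,
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }
```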

PSI describes the structure of the transport stream and plays a critical role in the system. For multiplexing, the PAT and PMT are the most important:

- The PAT lists how many programs a TS stream contains and the PID of each program's PMT.
- The PMT gives the composition of one program and the PIDs of its video, audio and other streams.

In the transcoding multiplexer, multiple single-program MPEG-2 transport streams (SPTS) are transcoded in software and multiplexed into one multi-program H.264 transport stream (MPTS). The system block diagram is shown in Figure 3.

The single-program MPEG-2 TS streams enter through ASI interfaces, and the program data is passed to the transcoding and multiplexing server over the PCI bus. The server's main functions are:

- Receive four MPEG-2 single-program transport streams and transcode their video to H.264.
- Multiplex the programs into one multi-program transport stream, removing null packets and rewriting the PID values and stream type fields.
- Extract and process any received PSI and service information (SI), and merge it with locally generated data of the same kind.
- Restamp the program clock reference (PCR) using the system time clock (STC).

To perform these functions with the highest possible throughput, the implementation takes the following points into account:

1) To keep the host CPU from accessing the PCI interface too often, FIFO reads and writes are driven by the FIFO's half-full signal. For the input FIFO, an interrupt is raised when it becomes half full, and the CPU responds by reading the FIFO contents into a memory buffer in one pass. The output FIFO works similarly: the CPU fills it to the half-full level in one pass.

2) Program sync-word detection. To obtain a program's data, the sync byte of the TS packets must first be found. The sync byte is not guaranteed to be unique, because the payload may contain the same value, so it must be located by searching and verification.

3) PID conflict resolution. The PID uniquely identifies the payload type within a TS stream, but different input MPEG-2 streams may use the same PID values. Left unchanged, this often prevents correct decoding. The solution is to rewrite PID values according to a fixed rule when synthesizing the TS stream. For example, if the PID of program 1 is 100, each subsequently detected program gets a new PID one greater than the previous one.

4) Stream type modification. The input TS streams carry MPEG-2 video, while the recombined TS stream carries H.264 video. The stream type field in the PMT must therefore be changed from 0x02 (MPEG-2 video) to 0x1B (H.264 video).
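Points 2 and 3 can be sketched in a few lines:

- `find_sync_offset` accepts an offset only if 0x47 recurs at 188-byte intervals over several packets. This handles the fact that the sync byte value can also appear in the payload.
- `PidRemapper` hands out new PIDs starting at 100, in order of first appearance.

Several choices here are assumptions of this sketch, not the patent's rule:

- Both names are hypothetical, and the three-packet check is arbitrary.
- Reserved PIDs (0x0000 to 0x000F and the null PID 0x1FFF) pass through unchanged.
- The increment is applied per distinct input PID rather than per program, which is one possible reading of the rule.

A complete multiplexer would also have to update the PIDs referenced inside the PAT and PMT.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def find_sync_offset(data: bytes, packets: int = 3):
    """First offset where 0x47 recurs every 188 bytes for `packets` packets.

    A single 0x47 does not prove a packet boundary, because payload bytes can
    take the same value; requiring several aligned sync bytes filters those out.
    """
    for k in range(min(TS_PACKET_SIZE, len(data))):
        if k + (packets - 1) * TS_PACKET_SIZE >= len(data):
            break                      # not enough data left to confirm
        if all(data[k + i * TS_PACKET_SIZE] == SYNC_BYTE for i in range(packets)):
            return k
    return None


class PidRemapper:
    """Assigns output PIDs 100, 101, ... in order of first appearance."""

    # Assumption: PSI/reserved PIDs and the null PID keep their values.
    RESERVED = set(range(0x0000, 0x0010)) | {0x1FFF}

    def __init__(self, first_pid: int = 100):
        self.next_pid = first_pid
        self.table = {}                # (input index, old PID) -> new PID

    def new_pid(self, stream_index: int, old_pid: int) -> int:
        if old_pid in self.RESERVED:
            return old_pid
        key = (stream_index, old_pid)
        if key not in self.table:
            self.table[key] = self.next_pid
            self.next_pid += 1
        return self.table[key]

    def rewrite(self, stream_index: int, packet: bytes) -> bytes:
        """Return a copy of `packet` with its 13-bit PID replaced."""
        old = ((packet[1] & 0x1F) << 8) | packet[2]
        new = self.new_pid(stream_index, old)
        out = bytearray(packet)
        out[1] = (out[1] & 0xE0) | (new >> 8)   # keep error/start/priority bits
        out[2] = new & 0xFF
        return bytes(out)
```

Keying the table on the input index means two inputs that both use, say, PID 0x011 end up with distinct output PIDs. This is exactly the conflict that point 3 describes.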

TS stream demultiplexing

TS demultiplexing is the reverse of multiplexing; its flow is shown in Figure 4:

1) The receiver builds the PAT by detecting packets with PID 0.
2) From the PAT, it obtains the PMT PID of each program in the TS stream and builds each PMT.
3) The PMTs give the PIDs of each program's audio and video packets.
4) The receiver uses these PIDs to place the corresponding audio and video data into buffers for the audio and video decoders.
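Below is a sketch of the first demultiplexing step, which reads program_number-to-PMT-PID pairs from a PAT section. It makes these assumptions:

- The PAT has a single section, carried in a PID-0 packet with payload_unit_start set and no adaptation field.
- CRC_32 is not verified.
- The function names are illustrative.

```python
def parse_pat_section(section: bytes) -> dict:
    """Map program_number -> PMT PID from one PAT section starting at table_id."""
    if section[0] != 0x00:
        raise ValueError("not a PAT section (table_id must be 0x00)")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    end = 3 + section_length - 4            # program loop stops before CRC_32
    programs = {}
    for i in range(8, end, 4):              # loop starts after the 8-byte header
        program_number = (section[i] << 8) | section[i + 1]
        pid = ((section[i + 2] & 0x1F) << 8) | section[i + 3]
        if program_number != 0:             # program 0 points to the NIT instead
            programs[program_number] = pid
    return programs


def pat_from_packet(packet: bytes) -> dict:
    """Assumes a PID-0 packet with payload_unit_start set and no adaptation field."""
    pointer_field = packet[4]               # offset to the start of the section
    return parse_pat_section(packet[5 + pointer_field:])
```

The PMT PIDs returned here tell the receiver which packets to collect next in order to build each PMT.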

Description of drawings

Figure 1 is a block diagram of the MPEG-2-to-H.264 video transcoding algorithm.

Figure 2 is a block diagram of the decision tree of the MPEG-2-to-H.264 video transcoder.

Figure 3 is a block diagram of the transcoding and multiplexing of multiple single-program transport streams.

Figure 4 is a flowchart of TS stream demultiplexing.

Figure 5 is a block diagram of the application of video transcoding in mobile digital TV.

Figure 6 shows the relationships between the PIDs of the tables in a TS stream.

Detailed description of embodiments

In an MPEG-2-based mobile digital TV system, video content comes mainly from MPEG-2 program libraries, satellite TV and live video. A multiplexer combines multiple MPEG-2 program streams, which are then channel-coded, modulated and broadcast wirelessly.

After the H.264-based video transcoding multiplexer is introduced, the system architecture is as shown in Figure 5. The MPEG-2 program library and MPEG-2 program streams are effectively moved one stage upstream, and the transcoder works in two modes:

- Static (offline) transcoding builds an H.264 program library from which the playout system can select.
- Dynamic real-time transcoding is applied to MPEG-2 program streams from satellite TV and live video. It reduces their bit rate and changes their spatial resolution and frame rate to suit downstream transmission.

After transcoding, software multiplexing combines multiple H.264 programs into one TS stream for transmission.

The single-program MPEG-2 TS streams enter the video transcoding multiplexer through ASI interfaces, and the program data is passed to the transcoding and multiplexing server over the PCI bus. The server receives the MPEG-2 single-program transport streams and transcodes their video to H.264. It then multiplexes them into one multi-program transport stream and outputs it through an ASI interface.

When the TS stream is synthesized, the first program detected among the input MPEG-2 single-program streams is assigned PID 100. Each subsequently detected program is assigned a PID one greater. The input TS streams carry MPEG-2 video while the recombined TS stream carries H.264 video, so the stream type field of the PMT is changed from 0x02 to 0x1B.
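Changing stream_type invalidates the section's CRC_32, so the PMT checksum has to be recomputed. The sketch below does three things:

- walks the PMT elementary-stream loop;
- replaces 0x02 with 0x1B;
- rewrites the CRC using the CRC-32/MPEG-2 parameters (polynomial 0x04C11DB7, initial value 0xFFFFFFFF, no reflection, no final XOR).

It assumes a complete single PMT section starting at table_id. Updating the elementary PIDs to their remapped values is left out, and the function names are hypothetical.

```python
def crc32_mpeg2(data: bytes) -> int:
    """CRC-32/MPEG-2: poly 0x04C11DB7, init 0xFFFFFFFF, MSB first, no final XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def retag_pmt_video(section: bytes, old_type: int = 0x02, new_type: int = 0x1B) -> bytes:
    """Rewrite stream_type old_type -> new_type in a PMT section and refresh CRC_32.

    Elementary PIDs are left untouched (PID remapping is not handled here).
    """
    if section[0] != 0x02:
        raise ValueError("not a PMT section (table_id must be 0x02)")
    out = bytearray(section)
    section_length = ((out[1] & 0x0F) << 8) | out[2]
    end = 3 + section_length - 4                       # offset of CRC_32
    program_info_length = ((out[10] & 0x0F) << 8) | out[11]
    i = 12 + program_info_length                       # first ES loop entry
    while i + 5 <= end:
        if out[i] == old_type:
            out[i] = new_type
        es_info_length = ((out[i + 3] & 0x0F) << 8) | out[i + 4]
        i += 5 + es_info_length                        # skip entry + descriptors
    out[end:end + 4] = crc32_mpeg2(bytes(out[:end])).to_bytes(4, "big")
    return bytes(out)
```

A quick way to check a rewritten section: running `crc32_mpeg2` over the whole section, CRC included, yields 0 for a valid section.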

Decision-tree-based classification used in the fast MPEG-2-to-H.264 conversion method:

The open-source data mining tool WEKA analyzes the mean and variance of the MPEG-2 macroblock residuals, the coding mode and the coded block pattern (CBPC) to obtain the H.264 macroblock coding mode. The transcoder's decision tree consists of three WEKA decision trees, shown in gray in Figure 2:

- The first WEKA tree distinguishes skip, Intra, 8×8 and 16×16 modes.
- For the 16×16 mode, the second tree determines the MB's final mode.
- For the 8×8 mode, the third tree determines the MB's final mode.

The WEKA tool computes the decision thresholds on the means and variances in the tree. The decision tree works as follows:

Node 1: the input to this node is an MPEG-2-coded MB. Based on the size of the MPEG-2 MB residual, the MB is classified into one of four coding types: skip, Intra, 8×8 or 16×16. The Intra decision process is not discussed in this patent; the other cases undergo a second classification based on the first. The following rules are used when building the tree:

1) If the MPEG-2 MB has no coded motion compensation (no nonzero MV, and no coded coefficients in any of its four 8×8 blocks), the H.264 MB is coded as 16×16. The best mode is then selected by the second-level decision of the tree.

2) If the MPEG-2 MB is in intra mode, the MB is coded in H.264 as intra or inter8×8. If it is coded as intra, the algorithm terminates. If it is inter8×8, the best mode is selected by a second-level decision.

Node 2: the input to this node is a 16×16 MB from node 1. This node uses the second WEKA decision tree to classify the H.264 MB mode as 16×16, 16×8 or 8×16, checking whether the 16×8 or 8×16 partitions give a better prediction. If the result is 16×8 or 8×16, that is the final coding mode. Otherwise, node 4 makes the decision.

Node 3: the input to this node is an 8×8 MB from node 1. This node uses the third WEKA decision tree to choose the best mode for each H.264 8×8 sub-macroblock: 8×8, 8×4, 4×8 or 4×4. The tree is run four times, once for each of the four 8×8 sub-blocks of the macroblock. Each run uses only the means and variances of the four 4×4 blocks inside that 8×8 sub-block.

Node 4: The input to this node is either a skip-mode block passed on by Node 1 or a 16×16-mode block passed on by Node 2. This node evaluates the H.264 16×16 mode (excluding the 16×8 and 8×16 modes) and selects either skip or inter 16×16 as the optimal mode.
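The four-node cascade described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's trained classifier: the three WEKA trees are replaced by hypothetical threshold tests (`T_MEAN`, `T_VAR` and every function name are illustrative), the input is a 16×16 MPEG-2 MC residual given as nested lists, and the Intra branch simply stops, since the patent does not discuss the Intra decision.

```python
from statistics import mean, pvariance
from typing import List, Tuple

Block = List[List[float]]  # square block of MC residual samples

# Hypothetical thresholds standing in for the trained WEKA trees (QP = 25).
T_MEAN = 2.0   # max spread of means for regions to count as "similar"
T_VAR = 20.0   # variance at or above which a block counts as "busy"


def split(block: Block, size: int) -> List[Block]:
    """Split a square block into size x size sub-blocks in raster order."""
    n = len(block)
    return [[row[c:c + size] for row in block[r:r + size]]
            for r in range(0, n, size) for c in range(0, n, size)]


def stats(block: Block) -> Tuple[float, float]:
    """Mean and population variance of all samples in a block."""
    samples = [v for row in block for v in row]
    return mean(samples), pvariance(samples)


def uniform(blocks: List[Block]) -> bool:
    """Hypothetical similarity test: close means and no busy block."""
    s = [stats(b) for b in blocks]
    means = [m for m, _ in s]
    return max(means) - min(means) < T_MEAN and all(v < T_VAR for _, v in s)


def node3(sub8: Block) -> str:
    """Node 3 stand-in: 8x8 / 8x4 / 4x8 / 4x4 from the four 4x4 blocks."""
    b = split(sub8, 4)                      # b[0] b[1] / b[2] b[3]
    if uniform(b):
        return "8x8"
    if uniform([b[0], b[1]]) and uniform([b[2], b[3]]):
        return "8x4"                        # top and bottom halves
    if uniform([b[0], b[2]]) and uniform([b[1], b[3]]):
        return "4x8"                        # left and right halves
    return "4x4"


def node4(mb: Block) -> str:
    """Node 4 stand-in: skip if every 4x4 residual block is quiet."""
    quiet = all(abs(m) < T_MEAN and v < T_VAR for m, v in map(stats, split(mb, 4)))
    return "skip" if quiet else "16x16"


def node2(mb: Block) -> str:
    """Node 2 stand-in: test whether a 16x8 or 8x16 split predicts better."""
    q = split(mb, 8)                        # q[0] q[1] / q[2] q[3]
    if not uniform(q):
        if uniform([q[0], q[1]]) and uniform([q[2], q[3]]):
            return "16x8"
        if uniform([q[0], q[2]]) and uniform([q[1], q[3]]):
            return "8x16"
    return node4(mb)                        # otherwise Node 4 decides


def node1(mb: Block, mpeg2_intra: bool, coded: bool) -> str:
    """Node 1 stand-in: route the MB using the MPEG-2 mode and residual size."""
    if mpeg2_intra:
        return "intra"                      # intra vs inter 8x8 is not modelled
    if not coded:                           # rule 1: no non-zero MV, no coefficients
        return node2(mb)                    # coded as 16x16, refined further
    if any(stats(b)[1] >= T_VAR for b in split(mb, 4)):
        return "P8x8(" + ",".join(node3(s) for s in split(mb, 8)) + ")"
    return node2(mb)
```

For example, a residual whose top half is uniformly 5 and bottom half uniformly -5 passes through Node 1 to Node 2 and is classified as 16×8, while an all-zero, uncoded residual ends at Node 4 as skip.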

The MB mode decision and the choice of thresholds depend on the H.264 quantization parameter (QP): different QPs call for different mean and variance thresholds. There are two ways to handle this: 1) generate one decision tree per QP and, during H.264 encoding, select the tree matching the QP in use; 2) generate a single decision tree and adjust the mean and variance thresholds according to the QP. The first method would require 52 different decision trees in one transcoder, each consisting of 3 WEKA decision trees, i.e., 156 WEKA decision trees in total. In H.264, QP is tied to the quantization step size: every increase of 6 in QP doubles the step size, so the mean and variance thresholds can be adjusted through this relationship. This transcoder uses the second method: a decision tree is generated for QP = 25, and other QP values are handled by adjusting the threshold levels. When QP increases by 6, the thresholds are raised by 2.5%; when QP decreases by 6, they are lowered by 2.5%.
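The threshold adaptation chosen for this transcoder can be written down directly. The sketch below assumes the 2.5%-per-6-QP rule is applied linearly in the QP offset from 25 (the text only states the ±6 case), and includes the standard H.264 quantizer step table to show the doubling-per-6 relationship the rule relies on; the function names are illustrative.

```python
BASE_QP = 25            # QP at which the decision tree was trained
STEP_PER_6_QP = 0.025   # thresholds move by 2.5% per 6 QP

# H.264 quantizer step sizes for QP 0..5; every +6 in QP doubles the step.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def check_qp(qp: int) -> None:
    if not 0 <= qp <= 51:
        raise ValueError("H.264 QP must be in 0..51")


def qstep(qp: int) -> float:
    """H.264 quantization step size for a given QP."""
    check_qp(qp)
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def scale_threshold(threshold_at_base: float, qp: int) -> float:
    """Scale a mean/variance threshold learned at QP 25 to another QP.

    Assumption: the 2.5%-per-6-QP rule is applied linearly, including
    for QP offsets that are not multiples of 6.
    """
    check_qp(qp)
    return threshold_at_base * (1.0 + STEP_PER_6_QP * (qp - BASE_QP) / 6.0)
```

With this rule a variance threshold of 100 at QP 25 becomes 102.5 at QP 31 and 97.5 at QP 19, while the quantizer step itself doubles from 11 to 22 between QP 25 and QP 31.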

When the TS stream is demultiplexed at the receiving end, the PAT is built by detecting packets with PID 0. The PAT gives the PID of the PMT of each program contained in the TS stream, from which the PMTs are built. Finally, the PMTs give the PIDs of the audio and video packets of each program, as shown in FIG. 6. Using these PIDs, the receiving end places the corresponding audio and video data into buffers, from which they are decoded by the audio and video decoders.
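A minimal Python sketch of this PAT-to-PMT walk follows. It assumes each PSI section starts and ends within a single 188-byte packet and does not verify CRC_32; the function names are illustrative, and the field offsets follow the MPEG-2 Systems (ISO/IEC 13818-1) section syntax.

```python
from typing import Dict, List, Tuple

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_header(pkt: bytes) -> Tuple[int, bool, int]:
    """Return (PID, payload_unit_start_indicator, payload offset)."""
    if len(pkt) != TS_PACKET_SIZE or pkt[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte TS packet")
    pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
    pusi = bool(pkt[1] & 0x40)
    afc = (pkt[3] >> 4) & 0x03                   # adaptation_field_control
    offset = 4 + (1 + pkt[4] if afc & 0x02 else 0)
    return pid, pusi, offset


def section_from_packet(pkt: bytes) -> bytes:
    """Extract the PSI section that starts in this packet."""
    _, pusi, off = parse_header(pkt)
    if not pusi:
        raise ValueError("no section starts in this packet")
    off += 1 + pkt[off]                          # skip pointer_field
    section_length = ((pkt[off + 1] & 0x0F) << 8) | pkt[off + 2]
    return pkt[off:off + 3 + section_length]


def parse_pat(section: bytes) -> Dict[int, int]:
    """program_number -> PMT PID (program 0, the NIT pointer, is skipped)."""
    if section[0] != 0x00:
        raise ValueError("not a PAT section")
    programs = {}
    for i in range(8, len(section) - 4, 4):     # loop stops before CRC_32
        prog = (section[i] << 8) | section[i + 1]
        pid = ((section[i + 2] & 0x1F) << 8) | section[i + 3]
        if prog != 0:
            programs[prog] = pid
    return programs


def parse_pmt(section: bytes) -> Dict[int, int]:
    """Elementary PID -> stream_type for one program."""
    if section[0] != 0x02:
        raise ValueError("not a PMT section")
    i = 12 + (((section[10] & 0x0F) << 8) | section[11])  # skip program_info
    streams = {}
    while i < len(section) - 4:
        pid = ((section[i + 1] & 0x1F) << 8) | section[i + 2]
        streams[pid] = section[i]
        i += 5 + (((section[i + 3] & 0x0F) << 8) | section[i + 4])
    return streams


def build_program_map(packets: List[bytes]) -> Dict[int, Dict[int, int]]:
    """PAT (PID 0) -> PMT PIDs -> {program: {elementary PID: stream_type}}."""
    first_start = {}
    for pkt in packets:
        pid, pusi, _ = parse_header(pkt)
        if pusi:
            first_start.setdefault(pid, pkt)
    pat = parse_pat(section_from_packet(first_start[0]))
    return {prog: parse_pmt(section_from_packet(first_start[pmt_pid]))
            for prog, pmt_pid in pat.items() if pmt_pid in first_start}
```

The resulting map is what the receiver needs to route each elementary PID to the right audio or video buffer.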

When the TS stream is synthesized, the PID values are rewritten according to a fixed rule: for example, if the PID of program 1 is 100, the new PID is incremented by 1 for each further program detected, and so on. The stream type field of the PMT is also modified accordingly when the TS stream is synthesized: it is 0x02 (MPEG-2 video) before modification and 0x1B (H.264 video) after modification.
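The renumbering rule can be sketched as two small helpers: one assigns PIDs 100, 101, ... to programs in the order they are detected, the other writes a new value into the 13-bit PID field of a TS packet. The text does not say which of a program's PIDs (PMT, video, audio) receive the new numbers, so the one-PID-per-program mapping and both function names here are hypothetical.

```python
from typing import Dict, Iterable


def assign_pids(programs_in_order: Iterable[int], first_pid: int = 100) -> Dict[int, int]:
    """Map each program, in detection order, to first_pid, first_pid + 1, ..."""
    return {prog: first_pid + k for k, prog in enumerate(programs_in_order)}


def rewrite_pid(pkt: bytes, new_pid: int) -> bytes:
    """Return a copy of a 188-byte TS packet with its PID replaced."""
    if len(pkt) != 188 or pkt[0] != 0x47:
        raise ValueError("not a TS packet")
    if not 0 <= new_pid <= 0x1FFE:              # 0x1FFF is reserved for null packets
        raise ValueError("PID out of range")
    out = bytearray(pkt)
    out[1] = (out[1] & 0xE0) | (new_pid >> 8)   # keep TEI, PUSI and priority bits
    out[2] = new_pid & 0xFF
    return bytes(out)
```

Keeping the top three bits of byte 1 preserves the packet's payload-unit-start flag, so section and PES boundaries survive the renumbering.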

The TS stream is formed by packetizing the elementary stream (ES) into PES packets according to a defined format and then adding system information, such as service information (SI) and system clock information.
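As an illustration of the ES to PES to TS layering, the sketch below wraps one coded picture in a PES packet that carries only a PTS and splits it into 188-byte TS packets, padding the last packet with an adaptation field. It is a minimal sketch, not the patent's multiplexer: SI insertion and PCR handling are omitted, stream_id 0xE0 (the first video stream) is an assumption, and the function names are illustrative.

```python
from typing import List


def encode_pts(pts: int) -> bytes:
    """5-byte PTS field with the '0010' prefix used when only a PTS is sent."""
    pts &= (1 << 33) - 1                        # PTS is a 33-bit, 90 kHz count
    return bytes([
        0x20 | ((pts >> 29) & 0x0E) | 0x01,     # '0010' + PTS[32..30] + marker
        (pts >> 22) & 0xFF,                     # PTS[29..22]
        ((pts >> 14) & 0xFE) | 0x01,            # PTS[21..15] + marker
        (pts >> 7) & 0xFF,                      # PTS[14..7]
        ((pts << 1) & 0xFE) | 0x01,             # PTS[6..0] + marker
    ])


def build_pes(payload: bytes, pts: int, stream_id: int = 0xE0) -> bytes:
    """Wrap one access unit in a PES packet carrying a PTS (no DTS)."""
    header_data = encode_pts(pts)
    length = 3 + len(header_data) + len(payload)  # bytes after PES_packet_length
    if length > 0xFFFF:
        length = 0                              # "unbounded", allowed for video in a TS
    return (b"\x00\x00\x01" + bytes([stream_id]) + length.to_bytes(2, "big")
            + bytes([0x80,                      # '10' marker bits, no other flags
                     0x80,                      # PTS_DTS_flags = '10' (PTS only)
                     len(header_data)])
            + header_data + payload)


def packetize(pes: bytes, pid: int, cc: int = 0) -> List[bytes]:
    """Split a PES packet into 188-byte TS packets on one PID."""
    packets = []
    for i in range(0, len(pes), 184):
        chunk = pes[i:i + 184]
        pusi = 0x40 if i == 0 else 0x00         # PES starts in the first packet
        header = bytes([0x47, pusi | (pid >> 8), pid & 0xFF])
        if len(chunk) == 184:
            pkt = header + bytes([0x10 | cc]) + chunk      # payload only
        else:
            afl = 183 - len(chunk)              # adaptation field fills the gap
            af = bytes([afl]) + (bytes([0x00]) + b"\xff" * (afl - 1) if afl else b"")
            pkt = header + bytes([0x30 | cc]) + af + chunk
        packets.append(pkt)
        cc = (cc + 1) & 0x0F                    # 4-bit continuity counter
    return packets
```

Stuffing through the adaptation field, rather than padding the payload, keeps the PES bytes contiguous so the receiver can reassemble them by simply concatenating payloads.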

PSI describes the structure of the transport stream. In multiplexing, the PAT indicates how many programs a TS stream contains and maps each program to the PID of its PMT; the PMT gives the composition of a program and maps it to the PIDs of its video, audio and other components. The stream type is also modified: since the input MPEG-2 TS streams carry MPEG-2 video while the re-synthesized TS stream carries H.264 video, the stream type field of the PMT is changed accordingly, from 0x02 for MPEG-2 before modification to 0x1B after modification.
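Changing the stream type in a PMT also changes the section's CRC_32, which receivers check, so the CRC has to be recomputed after the edit; the text does not spell this out, but MPEG-2 Systems requires a CRC_32 on PMT sections. A minimal sketch, assuming the PMT section has already been extracted as bytes; the function names are illustrative.

```python
def crc32_mpeg2(data: bytes) -> int:
    """CRC-32/MPEG-2: poly 0x04C11DB7, init 0xFFFFFFFF, no reflection, no final XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            crc = ((crc << 1) ^ 0x04C11DB7) if crc & 0x80000000 else (crc << 1)
            crc &= 0xFFFFFFFF
    return crc


def patch_stream_types(pmt_section: bytes, old: int = 0x02, new: int = 0x1B) -> bytes:
    """Change every stream_type `old` to `new` in a PMT section and refresh CRC_32."""
    sec = bytearray(pmt_section)
    if sec[0] != 0x02:
        raise ValueError("not a PMT section")
    i = 12 + (((sec[10] & 0x0F) << 8) | sec[11])   # skip program_info descriptors
    end = len(sec) - 4                             # last 4 bytes are CRC_32
    while i < end:
        if sec[i] == old:                          # MPEG-2 video -> H.264 video
            sec[i] = new
        i += 5 + (((sec[i + 3] & 0x0F) << 8) | sec[i + 4])
    sec[end:] = crc32_mpeg2(bytes(sec[:end])).to_bytes(4, "big")
    return bytes(sec)
```

Because this CRC uses no final XOR, running it over a whole section including its CRC_32 yields 0, which is a convenient way for the multiplexer to self-check the rewritten table.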

The multiple single-program MPEG-2 TS streams enter through the ASI interface, and the program data are passed to the transcoding and multiplexing server over the PCI bus. The server receives four single-program MPEG-2 transport streams, transcodes their video to H.264, and multiplexes them into one multi-program transport stream, removing null packets and rewriting the PID values and stream type fields. It extracts and processes any received PSI and service information (SI) and integrates them with locally generated data of the same kind. In addition, the system time clock (STC) is used to restamp the program clock reference (PCR). When the TS stream is demultiplexed, the receiving end builds the PAT by detecting packets with PID 0, obtains from the PAT the PMT PID of each program contained in the TS stream and builds the PMTs, and finally obtains from the PMTs the PIDs of the audio and video packets of each program. Using these PIDs, the receiving end places the corresponding audio and video data into buffers, from which they are decoded by the audio and video decoders.
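Two of the server steps named above, removing null packets and restamping the PCR from the system clock, can be sketched at the packet level. The sketch assumes the STC is available as a 27 MHz tick count (`stc_27mhz`, a hypothetical input) and simply writes it into packets that already carry a PCR; a real multiplexer would also account for when each packet actually leaves the output. Function names are illustrative.

```python
from typing import Iterable, Iterator, Optional

NULL_PID = 0x1FFF  # PID reserved for null (stuffing) packets


def pid_of(pkt: bytes) -> int:
    return ((pkt[1] & 0x1F) << 8) | pkt[2]


def drop_null_packets(packets: Iterable[bytes]) -> Iterator[bytes]:
    """Remove null packets before remultiplexing."""
    return (p for p in packets if pid_of(p) != NULL_PID)


def read_pcr(pkt: bytes) -> Optional[int]:
    """PCR in 27 MHz ticks, or None if the packet carries no PCR."""
    if not (pkt[3] & 0x20) or pkt[4] == 0 or not (pkt[5] & 0x10):  # PCR_flag
        return None
    b = pkt[6:12]
    base = (b[0] << 25) | (b[1] << 17) | (b[2] << 9) | (b[3] << 1) | (b[4] >> 7)
    ext = ((b[4] & 0x01) << 8) | b[5]
    return base * 300 + ext


def restamp_pcr(pkt: bytes, stc_27mhz: int) -> bytes:
    """Overwrite an existing PCR with the current STC value."""
    if read_pcr(pkt) is None:
        return pkt
    base = (stc_27mhz // 300) % (1 << 33)   # 33-bit, 90 kHz part
    ext = stc_27mhz % 300                    # 9-bit, 27 MHz part
    out = bytearray(pkt)
    out[6:12] = bytes([
        (base >> 25) & 0xFF, (base >> 17) & 0xFF, (base >> 9) & 0xFF, (base >> 1) & 0xFF,
        ((base & 1) << 7) | 0x7E | (ext >> 8),  # 6 reserved bits set to 1
        ext & 0xFF,
    ])
    return bytes(out)
```

Restamping is needed because, once four inputs are transcoded and interleaved, the original PCR values no longer match the timing of the output stream.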

Claims

1. A method for H.264-based multi-channel video transcoding and multiplexing, characterized in that the input is multiple MPEG-2 single-program streams and the output is one H.264 multi-program stream, the method implementing video transcoding from MPEG-2 to H.264, demultiplexing and multiplexing of audio and video, and multiplexing of multiple H.264 programs. The video transcoding includes bit-rate, resolution and format conversion. The MPEG-2 to H.264 video transcoding algorithm adopts a fast MPEG-2 to H.264 conversion method that exploits the correlation between H.264 macroblock mode selection and the MPEG-2 motion compensation residual, turning H.264 macroblock mode selection into a data classification problem: the motion compensation residual, MB mode and coded block pattern (CBPC) obtained by MPEG-2 decoding are mapped directly to an H.264 macroblock mode. While the MPEG-2 stream is decoded, the relevant MB information is saved, including the MB coding mode, the coded block pattern (CBPC), and the mean and variance of the MB residual; after decoding, a standard H.264 encoder encodes the YUV images and the H.264 MB coding modes are saved, and a machine learning algorithm is used to obtain a decision tree for classifying H.264 coding modes. When the MPEG-2 stream is decoded for transcoding, the MPEG-2 MC residual, macroblock mode and coded block pattern (CBPC) are obtained, and the mean and variance of the MC residual of each 4×4 sub-block are calculated; the H.264 macroblock coding mode is obtained through the decision tree, and the MB coding mode is assigned directly during H.264 encoding. The input of the H.264 encoder is the MPEG-2-decoded YUV data and the MB coding mode, and motion estimation uses the MB coding mode obtained from the decision tree. The bit rate and resolution are adjustable, and different algorithms are used for intra and inter frames. TS streams are synthesized: the multiple MPEG-2 program streams are input to the transcoding and multiplexing server through the ASI interface and the PCI bus, and the single H.264 video stream obtained after transcoding and multiplexing is output through the PCI bus and the ASI interface.

2. The method for H.264-based multi-channel video transcoding and multiplexing according to claim 1, characterized in that the CPU reads or writes the FIFO in response to the half-full signal provided by the FIFO: for the input FIFO, a half-full interrupt is generated, and the CPU responds to the interrupt by reading the data in the FIFO into a memory buffer in one operation; for the output FIFO, the CPU fills the FIFO to half full in one write.

3. The method for H.264-based multi-channel video transcoding and multiplexing according to claim 1, characterized in that the PID values are rewritten when the TS stream is synthesized.

4. The method for H.264-based multi-channel video transcoding and multiplexing according to claim 1, characterized in that the TS stream is formed by packetizing the encoded elementary stream (ES) into PES packets according to a defined format and then adding system information. At the sending end, PES packetization of the elementary streams is performed by the audio/video encoders, and the multiplexer receives the audio, video and auxiliary data streams from the encoding side and interleaves them into a single TS stream according to a given multiplexing scheme, inserting various time stamps and system control information into the stream; the receiving end performs the reverse process.

5. The method for H.264-based multi-channel video transcoding and multiplexing according to claim 1, characterized in that a transport stream may consist of multiple programs and each program may consist of multiple streams, including video streams, audio streams and program specific information (PSI) streams; PSI comprises four tables: the Program Association Table (PAT), the Program Map Table (PMT), the Network Information Table (NIT) and the Conditional Access Table (CAT); the multiplexer packages the transcoded H.264 video and the original audio in transport stream format.

6. The method for H.264-based multi-channel video transcoding and multiplexing according to claim 5, characterized in that a TS packet is 188 bytes long and consists of a header and a payload; the header is a 4-byte link header containing the sync byte 0x47 and the packet identifier (PID), and the PID indicates the type of data carried after it: video stream, audio stream, PSI or other data; the payload carries the actual content of the packet, namely a PES packet or a PSI section.

7. The method for H.264-based multi-channel video transcoding and multiplexing according to claim 1, characterized in that PSI is used to describe the structure of the transport stream: in multiplexing, the PAT indicates how many programs a TS stream contains and maps each program to the PID of its PMT; the PMT gives the composition of a program and maps it to the PIDs of its video, audio and other components; and the stream type is modified: since the video format of the input MPEG-2 TS streams is MPEG-2 and that of the re-synthesized TS stream is H.264, the stream type field of the PMT is modified accordingly, from 0x02 for MPEG-2 before modification to 0x1B after modification.

8. The method for H.264-based multi-channel video transcoding and multiplexing according to claim 1, characterized in that the multiple single-program MPEG-2 TS streams enter through the ASI interface and the program data are passed to the transcoding and multiplexing server over the PCI bus; the server receives four single-program MPEG-2 transport streams, transcodes their video to H.264, multiplexes them into one multi-program transport stream, removes null packets, and rewrites the PID values and stream type fields; it extracts and processes any received PSI and service information (SI) and integrates them with locally generated data of the same kind.

9. The method for H.264-based multi-channel video transcoding and multiplexing according to claim 8, characterized in that the system time clock (STC) is also used to restamp the program clock reference (PCR).

10. The method for H.264-based multi-channel video transcoding and multiplexing according to claim 1, characterized in that, when the TS stream is demultiplexed, the receiving end builds the PAT by detecting packets with PID 0, obtains from the PAT the PMT PID of each program contained in the TS stream and builds the PMTs, and finally obtains from the PMTs the PIDs of the audio and video packets of each program; using these PIDs, the receiving end places the corresponding audio and video data into buffers, from which they are decoded by the audio and video decoders.
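The half-full FIFO handshake of claim 2 can be illustrated with a small software model. Everything in it is hypothetical: the FIFO depth and all class and function names are assumptions, and the hardware half-full interrupt is modelled as a direct function call.

```python
from collections import deque

FIFO_DEPTH = 4096          # hypothetical FIFO size in bytes
HALF = FIFO_DEPTH // 2     # half-full threshold


class Fifo:
    """Software stand-in for a hardware FIFO with a half-full flag."""

    def __init__(self) -> None:
        self.data: deque = deque()

    def half_full(self) -> bool:
        return len(self.data) >= HALF


def on_input_half_full(fifo: Fifo, memory: bytearray) -> None:
    """Input-FIFO interrupt handler: move half a FIFO into memory in one burst."""
    for _ in range(HALF):
        memory.append(fifo.data.popleft())


def receive(stream: bytes, fifo: Fifo, memory: bytearray) -> int:
    """Push incoming bytes into the input FIFO; return the interrupt count."""
    interrupts = 0
    for byte in stream:
        fifo.data.append(byte)
        if fifo.half_full():               # half-full flag raises the interrupt
            on_input_half_full(fifo, memory)
            interrupts += 1
    return interrupts


def service_output(fifo: Fifo, pending: bytearray) -> int:
    """Top the output FIFO up to half full in one write; return bytes written."""
    n = min(max(HALF - len(fifo.data), 0), len(pending))
    fifo.data.extend(pending[:n])
    del pending[:n]
    return n
```

Moving data in half-FIFO bursts means the CPU is interrupted once per `HALF` bytes instead of once per byte, while the other half of the FIFO absorbs incoming data during the transfer.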