WO2009045178A1 - Method of transcoding a data stream and data transcoder - Google Patents

Method of transcoding a data stream and data transcoder

Info

Publication number
WO2009045178A1
Authority
WO
WIPO (PCT)
Prior art keywords
mode
threshold
inter
motion vector
intra
Prior art date
Application number
PCT/SG2008/000385
Other languages
English (en)
Inventor
Kwong Huang Goh
Dajun Wu
Tuan Kiang Chiew
Jo Yew Tham
Original Assignee
Agency For Science, Technology And Research
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agency For Science, Technology And Research
Publication of WO2009045178A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/40 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/107 Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/162 User input
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/517 Processing of motion vectors by encoding
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Definitions

  • the present invention relates broadly to a method of transcoding a data stream in a first format to another format, to a data transcoder and to a computer readable data storage medium having stored thereon computer code means for instructing a computer processor to execute a method of transcoding a data stream in a first format to another format.
  • Video transcoding is directed to transforming compressed video bit-streams from one format to another format.
  • Video transcoding can comprise transforming video bit-streams into different bit-rates (also known as transrating) and also into different picture sizes (also known as transcaling).
  • Video transcoding typically enables video content providers to transform video sources with different formats to a desired transmission format and to provide the contents to heterogeneous devices with different hardware capabilities, such as Personal Digital Assistants (PDAs), pocket PCs and mobile phones.
  • Digital video content for Digital Video Discs (DVDs) is typically encoded using the MPEG-2 format.
  • the H.264 format is increasingly attractive to broadcasters due to its superior coding efficiency.
  • transcoding typically uses format transcoding as well as bit-rate and picture size transcoding.
  • a typical use-case for MPEG2 to H.264/AVC (Advanced Video Coding) transcoding is a residential gateway used in households to transcode a DVD movie which is in MPEG2 format into a Common Intermediate Format (CIF) H.264/AVC format and streamed via a wireless network to a portable device such as a mobile phone or a PDA for viewing "on the move".
  • Such usage, where high-quality content is streamed onto devices with lower capabilities, typically involves transcaling from the so-called D1 resolution to the CIF or even quarter-CIF (QCIF) resolution, and also involves a drop in frame rates.
  • A straightforward approach to transcoding is to decode a bit-stream to obtain reconstructed images and to perform a full re-encoding on these images.
  • This approach is not ideal as the bulk of the re-encoding step is being repeated when it has already been accomplished in the original encoding of the bit-stream and information on various decisions made during the original encoding can be found in the decoded bit-stream. The usage of this information typically determines the efficiency of current transcoding schemes.
  • the MB can be broken into smaller sub-blocks, depending on the activity of the various block sizes.
  • This type of approach can be applied to both inter or intra coded MBs.
  • the analysis of AC coefficients is typically computationally intensive.
  • While analysis of the 8x8 DCT coefficients is appreciated to be an accurate approach to check for MB smoothness, it is typically too computationally intensive to use for fast transcoding.
  • yet another current method comprises down-sampling of four 16x16-MBs into one 16x16-MB block.
  • This method decides a motion vector (MV) of the resultant sub-sampled MB by using either the mean or the median of the four MVs of the four MBs.
  • Using the mean typically gives rise to inaccuracies when one or more MVs point in directions different from the other MVs.
  • Although using the median can eliminate an MV pointing away from the other MVs, inaccuracies still arise because part of the resultant sub-sampled MB is not moving in the same direction.
  • this method cannot solve the problem of all the MVs pointing in different directions. Also, this method cannot provide a way to deal with different inter-mode block-sizes.
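  • The weakness of the mean and median described above can be seen in a small numeric sketch. The function names and the component-wise median are assumptions made for illustration; the prior-art method may combine the four MVs differently.

```python
from statistics import median

def mean_mv(mvs):
    """Component-wise mean of a list of (x, y) motion vectors."""
    n = len(mvs)
    return (sum(x for x, _ in mvs) / n, sum(y for _, y in mvs) / n)

def median_mv(mvs):
    """Component-wise median of a list of (x, y) motion vectors (assumed variant)."""
    return (median(x for x, _ in mvs), median(y for _, y in mvs))

# Three MBs move right by 4 units, one MB moves left by 12 units.
four_mvs = [(4, 0), (4, 0), (4, 0), (-12, 0)]

mean_result = mean_mv(four_mvs)      # the outlier cancels the real motion
median_result = median_mv(four_mvs)  # the outlier is ignored entirely
```

  • In this example the mean reports no motion at all, while the median follows the three agreeing MBs but misrepresents the quarter of the down-sampled MB that moves the other way.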
  • a method of transcoding a data stream in a first format to another format comprising, decoding the data stream in the first format to obtain one or more reconstructed data frames and a meta data set of the reconstructed data frames; applying a transcoding process on the reconstructed data frames based on comparing one or more parameters from the meta data set against at least one threshold.
  • the method may further comprise transcoding the reconstructed data frames to said another format.
  • the method may further comprise predicting a motion vector for the transcoding process based on an availability of a forward motion vector or a backward motion vector from the meta data set.
  • the at least one threshold may comprise a reference picture distance and the forward motion vector may be compared against the reference picture distance.
  • the at least one threshold may further comprise a sum of absolute difference (SAD) threshold
  • the method may further comprise deriving a SAD of the predicted motion vector and comparing the SAD of the predicted motion vector against the SAD threshold.
  • the method may further comprise, prior to the step of applying a transcoding process on the reconstructed data frames, comparing a criterion associated with the decoded data stream against an adaptive threshold, and only proceeding with the step of applying a transcoding process on the reconstructed data frames based on comparing one or more parameters from the meta data set against at least one threshold if the criterion is smaller than the adaptive threshold.
  • the criterion for comparing against the adaptive threshold may comprise a macroblock bit count.
  • the criterion for comparing against the adaptive threshold may comprise a SAD of a macroblock.
  • the at least one threshold may comprise a quantization step size for determining an intra-mode block size for an intra-mode coding in the transcoding process.
  • the one or more parameters for comparing to the quantization step size may comprise a macroblock bit count.
  • a 4x4 DC mode may be used for the intra-mode coding and if the macroblock bit count is smaller than the quantization step size, a 16x16 DC mode may be used for the intra-mode coding.
  • the one or more parameters may comprise an inter-coded macroblock bit count for comparing to a function of the quantization step size.
  • an inter-mode coding may be switched to the intra-mode coding.
  • the at least one threshold may comprise a distance threshold for determining an inter-mode coding block size.
  • the one or more parameters for comparing to the distance threshold may comprise a distance between motion vectors of the reconstructed data frames.
  • an 8x8 mode may be used for the inter-mode coding and if the distance between the motion vectors is smaller than the distance threshold, a 16x16 mode may be used for the inter-mode coding.
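  • As a rough illustration of the distance test for the inter-mode block size, the sketch below assumes that "distance between motion vectors" means the largest pairwise Euclidean distance among the MVs being merged. That definition, the function names and the threshold value are hypothetical and are not taken from the specification.

```python
from itertools import combinations
from math import hypot

def max_mv_distance(mvs):
    """Largest pairwise Euclidean distance among (x, y) motion vectors (assumed metric)."""
    return max(hypot(ax - bx, ay - by)
               for (ax, ay), (bx, by) in combinations(mvs, 2))

def inter_block_size(mvs, distance_threshold):
    """Pick 16x16 when the MVs agree closely, otherwise fall back to 8x8."""
    if max_mv_distance(mvs) < distance_threshold:
        return "16x16"
    return "8x8"

similar = [(4, 0), (4, 1), (5, 0), (4, 0)]     # MVs nearly identical
divergent = [(4, 0), (4, 0), (4, 0), (-12, 0)]  # one MV points the other way
```

  • With a hypothetical threshold of 2.0, the first set would be coded as one 16x16 block and the second as 8x8 sub-blocks.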
  • a data transcoder comprising, a decoder module for decoding a data stream in a first format to obtain one or more reconstructed data frames and a meta data set of the reconstructed data frames; an encoding module for applying a transcoding process on the reconstructed data frames based on comparing one or more parameters from the meta data set against at least one threshold.
  • the encoding module may be capable of transcoding the reconstructed data frames to another format.
  • the transcoder may further comprise a motion vector derivation module for predicting a motion vector for the transcoding process based on an availability of a forward motion vector or a backward motion vector from the meta data set.
  • the at least one threshold may comprise a reference picture distance and the forward motion vector may be compared against the reference picture distance.
  • the at least one threshold may further comprise a sum of absolute difference (SAD) threshold, and the motion vector derivation module may derive a SAD of the predicted motion vector and compare the SAD of the predicted motion vector against the SAD threshold. If the SAD of the predicted motion vector is larger than the SAD threshold, a re-estimation process of the predicted motion vector may be carried out by the motion vector derivation module.
  • the encoder may compare a criterion associated with the decoded data stream against an adaptive threshold, and the encoder only proceeds with applying a transcoding process on the reconstructed data frames based on comparing one or more parameters from the meta data set against at least one threshold if the criterion is smaller than the adaptive threshold.
  • the criterion for comparing against the adaptive threshold may comprise a macroblock bit count.
  • the criterion for comparing against the adaptive threshold may comprise a SAD of a macroblock.
  • the transcoder may further comprise an inter/intra mode and block size decision module and wherein the at least one threshold may comprise a quantization step size for determining an intra-mode block size for an intra-mode coding in the transcoding process.
  • the one or more parameters for comparing to the quantization step size may comprise a macroblock bit count.
  • the inter/intra mode and block size decision module may use a 4x4 DC mode for the intra-mode coding and if the macroblock bit count is smaller than the quantization step size, the inter/intra mode and block size decision module may use a 16x16 DC mode for the intra-mode coding.
  • the one or more parameters may comprise an inter-coded macroblock bit count for comparing to a function of the quantization step size.
  • the inter/intra mode and block size decision module may switch an inter-mode coding to the intra-mode coding.
  • the transcoder may further comprise a down-scaling coding mode decision module, wherein for an inter-mode transcaling process, the at least one threshold may comprise a distance threshold for determining an inter-mode coding block size.
  • the one or more parameters for comparing to the distance threshold may comprise a distance between motion vectors of the reconstructed data frames.
  • the down-scaling coding mode decision module may use an 8x8 mode for the inter-mode coding and if the distance between the motion vectors is smaller than the distance threshold, the down-scaling coding mode decision module may use a 16x16 mode for the inter-mode coding.
  • a computer readable data storage medium having stored thereon computer code means for instructing a computer processor to execute a method of transcoding a data stream in a first format to another format, the method comprising, decoding the data stream in the first format to obtain one or more reconstructed data frames and a meta data set of the reconstructed data frames; applying a transcoding process on the reconstructed data frames based on comparing one or more parameters from the meta data set against at least one threshold.
  • Figure 1 is a schematic block diagram illustrating a MPEG-2/H.264 transcoder in an example embodiment.
  • Figure 2 is a schematic diagram illustrating a system architecture of the example embodiment.
  • Figure 3 is a schematic flowchart illustrating prediction of motion vectors and re-estimation decision making in the example embodiment.
  • Figure 4 is a schematic diagram for illustrating how a motion vector (MV) is derived during a transcoding process using a predicted motion vector (PMV) in the example embodiment.
  • Figure 5 is a schematic diagram illustrating determination of a H.264 mode in the example embodiment.
  • Figure 6 is a schematic diagram illustrating obtaining a motion vector for a Common Intermediate Format (CIF) picture in the H.264 format in the example embodiment.
  • Figure 7 is a schematic diagram illustrating obtaining a motion vector for another
  • Figure 8 is a schematic flowchart 800 for illustrating a method of transcoding a data stream in a first format to another format in an example embodiment.
  • Figure 9 is a schematic illustration of a computer system for implementing a method and system of an example embodiment.
  • the example embodiments described herein can provide a method and a system for transcoding one video format to another video format.
  • a method of transcoding a video format that uses B-pictures, such as MPEG-2 video data, to a video format that does not use B-pictures, such as H.264/AVC video data can be provided.
  • the example embodiments can provide one or more of the following: fast derivation of motion vectors (MVs) for transcoding a higher MPEG-2 profile to an H.264 baseline profile; fast inter/intra coding decisions; and fast coding decisions for transcaling from a higher resolution to a lower resolution.
  • the present specification also discloses apparatus for performing the operations of the methods.
  • Such apparatus may be specially constructed for the required purposes, or may comprise a general purpose computer or other device selectively activated or reconfigured by a computer program stored in the computer.
  • the algorithms and displays presented herein are not inherently related to any particular computer or other apparatus.
  • Various general purpose machines may be used with programs in accordance with the teachings herein.
  • the construction of more specialized apparatus to perform the required method steps may be appropriate.
  • the structure of a conventional general purpose computer will appear from the description below.
  • the present specification also implicitly discloses a computer program, in that it would be apparent to the person skilled in the art that the individual steps of the method described herein may be put into effect by computer code.
  • the computer program is not intended to be limited to any particular programming language and implementation thereof. It will be appreciated that a variety of programming languages and coding thereof may be used to implement the teachings of the disclosure contained herein.
  • the computer program is not intended to be limited to any particular control flow. There are many other variants of the computer program, which can use different control flows without departing from the spirit or scope of the invention.
  • Such a computer program may be stored on any computer readable medium.
  • the computer readable medium may include storage devices such as magnetic or optical disks, memory chips, or other storage devices suitable for interfacing with a general purpose computer.
  • the computer readable medium may also include a hard-wired medium such as exemplified in the Internet system, or wireless medium such as exemplified in the GSM mobile telephone system.
  • the computer program when loaded and executed on such a general-purpose computer effectively results in an apparatus that implements the steps of the preferred method.
  • a module is a functional hardware unit designed for use with other components or modules.
  • a module may be implemented using discrete electronic components, or it can form a portion of an entire electronic circuit such as an Application Specific Integrated Circuit (ASIC). Numerous other possibilities exist.
  • a MB has a size of 16x16. Blocks with smaller sizes such as 8x8 are referred to as sub-blocks.
  • FIG. 1 is a schematic block diagram illustrating a MPEG-2/H.264 transcoder in an example embodiment.
  • the MPEG-2/H.264 transcoder 102 is a pixel-domain transcoder.
  • the transcoder 102 comprises an optimized MPEG-2 decoder 104 that receives a MPEG-2 bitstream at 106 and outputs reconstructed frames 108 and a metadata set 110 pertaining to the decoder stream.
  • the transcoder 102 further comprises a re-encoder 112 that makes use of the meta-data set 110 to perform fast MV determination and mode selection for encoding of the reconstructed frames 108.
  • the meta-data set 110 comprises a MB mode decision and MB MVs.
  • the meta-data set comprises sequence-level information such as picture horizontal size, picture vertical size, Group of Pictures (GOP) size, picture frame rate and sequence encoded bit-rate.
  • the meta-data set comprises picture level information such as picture coding type (e.g. I/P/B type), interlaced/progressive picture, number of bits used for picture, Q_scale_type (e.g. linear/non-linear) and average picture quantizer.
  • the meta-data set further comprises MB level information such as MB type, MB quantizer scale, MB bitcount (e.g. bits used for the MB) and MVs.
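  • One possible in-memory layout for the three levels of meta-data listed above is sketched below. The class and field names are illustrative assumptions; the specification names the information carried but not any particular data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class SequenceInfo:
    """Sequence-level information."""
    horizontal_size: int
    vertical_size: int
    gop_size: int
    frame_rate: float
    bit_rate: int

@dataclass
class MacroblockInfo:
    """MB-level information extracted while decoding."""
    mb_type: str                 # e.g. "intra", "inter", "skip"
    quantizer_scale: int
    bit_count: int               # bits used for this MB
    forward_mv: Optional[Tuple[int, int]] = None
    backward_mv: Optional[Tuple[int, int]] = None

@dataclass
class PictureInfo:
    """Picture-level information plus the MBs belonging to the picture."""
    coding_type: str             # "I", "P" or "B"
    progressive: bool
    bits_used: int
    q_scale_type: str            # "linear" or "non-linear"
    average_quantizer: float
    macroblocks: List[MacroblockInfo] = field(default_factory=list)

@dataclass
class MetaDataSet:
    """Everything the decoder hands to the re-encoder besides the pixels."""
    sequence: SequenceInfo
    pictures: List[PictureInfo] = field(default_factory=list)

example_mb = MacroblockInfo("inter", quantizer_scale=8, bit_count=120, forward_mv=(4, -2))
example_picture = PictureInfo("B", True, 24000, "linear", 8.0, [example_mb])
example_meta = MetaDataSet(SequenceInfo(720, 576, 12, 25.0, 6_000_000), [example_picture])
```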
  • FIG. 2 is a schematic diagram illustrating the system architecture of the example embodiment.
  • a decoder and information extraction module 202 is provided to decode the MPEG-2 bitstream 106 (Figure 1) and to extract information such as the meta-data set 110.
  • a MV derivation module 204 is provided to derive MVs for the H.264 format based on MVs in the MPEG-2 format.
  • An inter/intra mode and block size decision module 206 is provided for deciding whether a MB is smooth or detailed and for deciding the block size for H.264 encoding.
  • a down-scaling coding mode decision module 208 is provided for deciding an inter/intra mode for a transcaled MB, for deciding an inter-block size and for determining MVs for the transcaled MB.
  • a rate control module 209 is provided for varying bit-rates during transcoding.
  • an H.264 encoder module 210 is provided for performing re-encoding into H.264 format of the decoded MPEG-2 bitstream 106 (Figure 1).
  • a baseline profile H.264 is used.
  • Since the baseline profile uses only the so-called P-picture, pictures coded as a bi-directional type (i.e. so-called B-pictures) are encoded as P-pictures in H.264.
  • the MVs used in the MPEG-2 encoding are used to predict the MVs for use in the H.264 encoding.
  • the inventors have also recognised that the predicted motion vectors (PMVs) may not be accurate for some cases. Therefore, further re-estimation for such cases is performed by using the relevant PMV as the centre point for a finer search.
  • FIG. 3 is a schematic flowchart illustrating prediction of MVs and re-estimation decision making for each MB in the example embodiment.
  • At step 302, for each MPEG-2 inter-coded MB, it is determined whether there is a forward MV available. If a forward MV is available, the available forward MV is used to derive the PMV for the H.264 P-picture encoding.
  • At step 304, it is determined whether the MPEG-2 forward reference picture distance is more than one frame. If the MPEG-2 forward reference picture distance is more than one frame, at step 306, the forward MV is scaled to a one-frame reference picture distance, as the encoding in H.264 comprises each P-picture referencing a previous picture. Further, a motion re-estimation flag is set at step 306. On the other hand, if the MPEG-2 forward reference picture distance is not more than one frame, at step 308, the PMV is set to be equal to the forward MV.
  • At step 310, it is determined whether there is a backward MV available. If there is a backward MV available, at step 312, the backward MV is still used to predict the forward MV by "negating" the backward MV and scaling to a one-frame distance. Further, the motion re-estimation flag is set at step 312.
  • FIG. 4 is a schematic diagram for illustrating how a MV is derived during the transcoding process using a PMV in the example embodiment.
  • MPEG-2 is scaled by half (i.e. to a one-frame reference picture distance) and used as the PMV 404 for P2 in H.264.
  • the motion re-estimation flag is set when the MPEG-2 forward reference picture distance is greater than one frame (see step 306) or when there is only a backward MV available (see step 312), i.e. the B-picture in MPEG-2 is backward predicted, which can mean that the forward prediction for use in the H.264 format may not have a good match.
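  • Steps 302 to 312 can be sketched as a single function. This is a simplified reading of the flowchart: the function and parameter names are assumptions, MVs are plain (x, y) tuples with no half-pel or quarter-pel unit conversion, and the case where neither MV exists is not covered by the text, so the sketch simply rejects it.

```python
def predict_mv(forward_mv, backward_mv, forward_distance=1, backward_distance=1):
    """Return (pmv, re_estimation_flag) for one MPEG-2 inter-coded MB.

    forward_mv / backward_mv: (x, y) tuples or None when unavailable.
    *_distance: reference picture distance in frames.
    """
    if forward_mv is not None:                        # step 302
        if forward_distance > 1:                      # step 304
            # Step 306: scale to a one-frame distance and flag for re-estimation.
            scaled = (forward_mv[0] / forward_distance,
                      forward_mv[1] / forward_distance)
            return scaled, True
        return forward_mv, False                      # step 308
    if backward_mv is not None:                       # step 310
        # Step 312: negate the backward MV, scale to one frame, flag it.
        negated = (-backward_mv[0] / backward_distance,
                   -backward_mv[1] / backward_distance)
        return negated, True
    raise ValueError("no MV available; this case is not described in the text")
```

  • For a forward MV of (8, -4) with a two-frame reference distance, the sketch yields a PMV of (4.0, -2.0) with the flag set, which corresponds to the "scaled by half" case of Figure 4.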
  • a sum of absolute difference (SAD) between the current MB and a motion compensated MB using the PMVs is first checked.
  • the SAD is the sum of absolute pixel to pixel difference between two blocks of pixels.
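  • The SAD definition above translates directly into code. The sketch assumes blocks are given as equally sized lists of rows of integer pixel values; the function name is illustrative.

```python
def sad(block_a, block_b):
    """Sum of absolute pixel-to-pixel differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

current = [[10, 20], [30, 40]]
predicted = [[12, 18], [30, 45]]
difference = sad(current, predicted)  # |10-12| + |20-18| + |30-30| + |40-45|
```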
  • the SAD of the PMVs is compared against a threshold.
  • this threshold is chosen based on experimental simulations. For example, for pixel values ranging from 0-255 (HUE), the threshold value can range from about 256 to about 768. For the example embodiment, the value used for the threshold is about 512. If the SAD is less than the threshold, at step 316, re-estimation is not carried out and the H.264 MVs are set to be equal to the
  • the motion re-estimation is only performed using the PMV as the search centre.
  • the motion re-estimation can be carried out using a variety of motion estimation methods, keeping the PMV as the search centre.
  • step 318 the process proceeds to step 316 to set the H.264 MVs to be equal to the derived PMVs.
  • At step 320, the prediction of MVs for H.264 is ended.
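  • The re-estimation decision can be summarised in a short sketch. It assumes that the SAD check is only consulted when the motion re-estimation flag is set, and it takes the refinement search as a caller-supplied function, since the text allows a variety of motion estimation methods. The names and the callback signature are hypothetical; the default of 512 is the threshold value quoted for the example embodiment.

```python
SAD_THRESHOLD = 512  # value quoted for the example embodiment

def choose_h264_mv(pmv, re_estimation_flag, sad_of_pmv, refine,
                   sad_threshold=SAD_THRESHOLD):
    """Return the MV to use for H.264 encoding of one MB.

    refine: callable taking the PMV as search centre and returning a refined MV.
    """
    if not re_estimation_flag:
        return pmv                      # PMV taken directly from MPEG-2
    if sad_of_pmv < sad_threshold:
        return pmv                      # step 316: prediction is good enough
    return refine(pmv)                  # finer search centred on the PMV

def toy_refine(pmv):
    """Stand-in for a real search; shifts the PMV by one unit horizontally."""
    return (pmv[0] + 1, pmv[1])
```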
  • conditional motion re-estimation is introduced by using an adaptive threshold adjustable according to a user selected percentage, to decide an amount of re-estimation to perform.
  • This additional control can be added prior to step 302 to decide whether to proceed with the transcoding process/algorithm described with reference to Figure 3.
  • a user-defined percentile threshold can be used to decide the re-estimation to be performed, up to a full re-estimation.
  • Two example adaptive threshold-comparison methods are provided.
  • a criterion associated with the decoded data stream, such as the MB bit-count, is used as a cost measurement to decide whether motion estimation is to be performed, e.g. for some badly coded MB.
  • the algorithm in this example is:
  • the initial value of adaptive_threshold1 is about 200.
  • the MPEG2_MB_bit_count can range from about 20 to more than 1000.
  • a criterion associated with the decoded data stream, such as the SAD of the MB, is used as a cost measurement. That is, the algorithm in this example is:
  • For example, for pixel values ranging from 0-255 (HUE), the initial value of adaptive_threshold2 is about 512.
  • the thresholds mentioned above are made adaptive according to a user-selected optimization percentage or value. That is, the user can choose to optimize for speed or quality. The user can select from 0% speed (i.e. 100% for quality) to 100% speed (i.e. 0% quality). For example, 20% for speed means 80% for quality.
  • the transcode process uses fully the MPEG2 MVs as the MVs for H.264.
  • the transcode process comprises performing full re-estimation of MVs for H.264.
  • the threshold value e.g. adaptive_threshold1 or adaptive_threshold2 depending on algorithm is used
  • the threshold value is then adjusted up or down so that the percentage number of MBs performing re-estimation moves towards the user-selected optimization level.
  • This adaptive threshold adjustment process is useful as a fast trade-off parameter for the user to decide the amount of re-estimation to be performed, so as to obtain a balance between speed (i.e. frame-rate) and coding efficiency (i.e. bit-rate) during the transcoding process.
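  • The adaptive threshold adjustment can be sketched as a small feedback loop. The exact update rule is not given in the text, so this sketch makes several assumptions: an MB is re-estimated when its cost (MB bit-count or SAD) is not smaller than the threshold, the target re-estimation ratio is the complement of the user-selected speed percentage, and the threshold is nudged multiplicatively by a hypothetical step of 5% after every MB.

```python
class AdaptiveThreshold:
    """Steers the share of re-estimated MBs toward a user-selected level."""

    def __init__(self, speed_percent, initial=200.0, step=0.05):
        # 100% speed -> no re-estimation; 0% speed -> full re-estimation.
        self.target_ratio = 1.0 - speed_percent / 100.0
        self.value = initial      # about 200 for bit-count, about 512 for SAD
        self.step = step          # hypothetical adjustment factor
        self.seen = 0
        self.re_estimated = 0

    def needs_re_estimation(self, cost):
        """Decide for one MB, then move the threshold toward the target ratio."""
        decision = cost >= self.value
        self.seen += 1
        self.re_estimated += int(decision)
        ratio = self.re_estimated / self.seen
        if ratio > self.target_ratio:
            self.value *= 1.0 + self.step   # too many re-estimations: raise threshold
        elif ratio < self.target_ratio:
            self.value *= 1.0 - self.step   # too few: lower threshold
        return decision
```

  • A per-MB update reacts quickly but can oscillate; updating once per picture would be an equally plausible design under the same assumptions.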
  • the MV derivation module 204 (Figure 2) facilitates applying a transcoding process on the reconstructed data frames, based on comparing one or more parameters from the meta data set against at least one threshold.
  • the transcoding process of the example embodiment re-uses the intra/inter decision during the H.264 encoding. It is noted that the MPEG-2 'skip' mode is not used directly as the skip mode in H.264 because the skip mode definition in each standard is different. Thus, in the example embodiment, when the MPEG-2 skip mode is encountered, the transcoding process uses an inter mode with MVs of a previous MB instead.
  • the inventors have recognised that checking of the directional prediction mode for all 16x16 and 4x4 modes is too computationally intensive for fast transcoding. Therefore, the intra-coding modes are limited to a 16x16 DC-mode or a 4x4 DC-mode in the example embodiment.
  • the inventors have recognised that since MPEG-2 intra-coding is based on 8x8 DCT (i.e. 4 blocks for each MB), if each MB is smooth, coding in H.264 in the 16x16 DC mode is beneficial in terms of coding efficiency. However, if the MB has high spatial activity (i.e. not smooth), coding in the 16x16 DC mode may not be efficient.
  • a fast transcoding method is used whereby the bits-used information from the MPEG-2 bit-stream is used to decide whether the H.264 intra-mode coding uses a 16x16 or a 4x4 DC-mode. It has been recognised that, with the same quantization value, a smoother MB uses fewer bits than a more detailed one.
  • the number of bits used to encode each MB in the MPEG-2 format and the quantization-step value for the MBs are used to check whether each MB is smooth or detailed. Hence, it can be decided whether the MB is to be intra-coded in the H.264 format in a 16x16 or a 4x4 DC-mode.
  • the Activity_threshold function returns from a look-up table a corresponding value of about 250.
  • the Bit_count_for_MB can range from about 100 to more than 1000.
  • the Bit_count_for_MB is the number of bits used to encode the MB in the
  • the Activity_threshold [.] is a look-up table function to determine the quantization-step-size used to encode the MPEG-2 intra-coded MB.
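The intra block-size decision described above can be sketched as follows. The algorithm listing itself is missing from this text, so the comparison direction is an assumption inferred from the statement that a smoother MB uses fewer bits and suits the 16x16 DC mode. The function name `choose_intra_mode`, the mode labels, and the constant look-up (returning the "about 250" value mentioned above for every quantization step) are hypothetical placeholders.

```python
def choose_intra_mode(bit_count_for_mb, quant_step, activity_threshold=lambda q: 250):
    """Pick the H.264 intra DC mode from MPEG-2 bit usage.

    `activity_threshold` stands in for the Activity_threshold[.] look-up table;
    the default returns a constant 250 bits, which is only a placeholder.
    """
    threshold = activity_threshold(quant_step)
    # Assumption: few bits at a given quantization step indicates a smooth MB.
    if bit_count_for_mb < threshold:
        return "intra_16x16_dc"
    return "intra_4x4_dc"
```

For example, `choose_intra_mode(100, 8)` selects the 16x16 DC mode, while `choose_intra_mode(1000, 8)` selects the 4x4 DC mode. A real table would presumably vary with the quantization step.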
  • the inter/intra mode and block size decision module 206 ( Figure 2) is also used to carry out conditional inter to intra mode switching during the transcoding process.
  • the bit-count is also used to check whether the MPEG2 inter/intra decision is good/appropriate, e.g. an inter-coded MB using a significant number of bits may be better coded in the intra mode instead.
  • a threshold is used in the example embodiment to decide the inter-to-intra mode switching. The algorithm is as follows:
  • the inter_threshold function returns from a look-up table a corresponding value of about 300.
  • the MPEG2_inter_MB_bit_count can range from about 20 to more than 1000.
  • the inter_threshold can be a function of the quantization-step-size used, such that a higher threshold is used for a lower quantization-step-size.
  • This conditional mode switching method is advantageously useful to deal with bad/inappropriate inter-mode decisions in the MPEG-2 encoding.
  • an inefficient software MPEG-2 encoder typically chooses only the inter-coding mode for all P and B pictures without further checking, for simplicity reasons. Therefore, for such bitstreams that contain bad/inappropriate inter-mode decisions (i.e.
  • this fast conditional inter to intra mode-switching of the example embodiment can be useful in that any MB that would have been better coded in the intra mode in the MPEG-2 format can be detected and coded in the intra mode in the H.264 format to achieve better coding efficiency.
  • the inter/intra mode and block size decision module 206 ( Figure 2) facilitates applying of a transcoding process on the reconstructed data frames, based on comparing one or more parameters from the meta data set against at least one threshold.
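The conditional inter-to-intra switching described above can be sketched as a threshold test on the MPEG-2 bit-count. The algorithm listing is missing from this text, so this is an assumed reading: an inter-coded MB is switched to intra when its bit-count exceeds `inter_threshold`. The table values below are hypothetical; they only respect the stated properties that the threshold is about 300 and is higher for a lower quantization-step-size.

```python
# Hypothetical look-up table: (maximum quantization step, threshold in bits).
INTER_THRESHOLD_TABLE = [(4, 400), (8, 300), (16, 200)]
DEFAULT_INTER_THRESHOLD = 150  # hypothetical value for coarser quantization


def inter_threshold(quant_step):
    """Return a bit-count threshold that decreases as the quantization step grows."""
    for max_step, threshold in INTER_THRESHOLD_TABLE:
        if quant_step <= max_step:
            return threshold
    return DEFAULT_INTER_THRESHOLD


def switch_inter_to_intra(mpeg2_inter_mb_bit_count, quant_step):
    """True when an MPEG-2 inter-coded MB should be re-coded as intra in H.264."""
    # Assumption: a large bit-count suggests the inter prediction was poor.
    return mpeg2_inter_mb_bit_count > inter_threshold(quant_step)
```

With these placeholder values, an inter MB that used 500 bits at quantization step 8 would be switched to intra, while one that used 20 bits would stay inter-coded.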
  • the description below describes an operation of the down-scaling coding mode decision module 208 ( Figure 2) in the example embodiment.
  • the MPEG-2/H.264 transcoder 102 ( Figure 1) can carry out transcaling or decimated coding.
  • the meta-data set (compare 110 of Figure 1) comprises the MB inter/intra mode decision and the MB MVs.
  • FIG. 5 is a schematic diagram illustrating determination of a H.264 mode in the example embodiment.
  • Four MBs 502, 504, 506, 508 each having an intra-coded/inter-coded/skip mode are downsized to a single MB 510 in the H.264 format with a H.264 mode decided during the transcaling process.
  • the inter/intra mode decision making is based on the encoding mode of the corresponding original MPEG-2 MBs 502, 504, 506, 508.
  • the algorithm for the decision-making in transcaling is as follows:
  • the MV for the down-sized picture i.e. a single decimated MB derived from four MBs
  • Figure 6 is a schematic diagram illustrating obtaining a MV for a CIF picture in the H.264 format in the example embodiment.
  • MV 610 for the resultant MB can be determined by the mean of the four respective MVs 602, 604, 606, 608 and the resultant MB is coded in an inter-16x16 mode.
  • the MV 610 is determined as (mean of the four MVs)/2.
  • Figure 7 is a schematic diagram illustrating obtaining a MV for another CIF picture in the H.264 format in the example embodiment.
  • the resultant MB is then coded in a four inter-8x8 mode where each of the resultant 8x8 MVs 710, 712, 714, 716 is a scaled-down version of the original four MVs 702, 704, 706, 708 respectively.
  • the threshold can have a value of about 3 to 6 "pixels apart".
  • the function d(MV1, MV2) to compute the distance between 2 MVs can be presented as two examples below.
  • $d(\mathrm{mv1}[x,y], \mathrm{mv2}[x,y]) = (\mathrm{mv1}[x] - \mathrm{mv2}[x])^2 + (\mathrm{mv1}[y] - \mathrm{mv2}[y])^2$ (eq. 1)
  • the MVs e.g. 602, 604, 606, 608 of Figure 6 are determined to be close to each other and the inter-16x16 mode is chosen for the decimated MB using the mean of the four MVs scaled by half as its MV e.g. 610 (as shown in Figure 6).
  • the down-scaling coding mode decision module 208 ( Figure 2) facilitates applying of a transcoding process on the reconstructed data frames, based on comparing one or more parameters from the meta data set against at least one threshold.
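The MV derivation for a decimated MB can be sketched as follows, using the squared distance of eq. 1. Several details are assumptions, because the algorithm listing is missing from this text: the four MVs are treated as "close" when every pairwise squared distance is at most the squared threshold, the threshold and the MVs are taken to be in the same units, and "scaled-down" is read as halving for 2:1 decimation. The function names and mode labels are hypothetical.

```python
from itertools import combinations


def squared_distance(mv1, mv2):
    """Squared distance between two motion vectors (x, y), as in eq. 1."""
    return (mv1[0] - mv2[0]) ** 2 + (mv1[1] - mv2[1]) ** 2


def decimated_mb_motion(mvs, threshold=4.0):
    """Choose the inter block size and MVs for one MB decimated from four MBs.

    `mvs` holds the four MPEG-2 MVs as (x, y) pairs. Assumption: `threshold`
    is in the same units as the MV components and is compared after squaring.
    """
    close = all(
        squared_distance(a, b) <= threshold ** 2 for a, b in combinations(mvs, 2)
    )
    if close:
        # One inter-16x16 MV: the mean of the four MVs, scaled by half.
        mean_x = sum(mv[0] for mv in mvs) / len(mvs)
        mean_y = sum(mv[1] for mv in mvs) / len(mvs)
        return "inter_16x16", [(mean_x / 2, mean_y / 2)]
    # Four inter-8x8 MVs: each original MV scaled by half (assumed factor).
    return "inter_8x8", [(x / 2, y / 2) for x, y in mvs]
```

For instance, the MVs `(4, 2), (4, 2), (6, 2), (2, 2)` lie within 4 units of each other, so the result is a single inter-16x16 MV of `(2.0, 1.0)`. If one MV were `(20, 0)` and the rest `(0, 0)`, four halved inter-8x8 MVs would be returned instead.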
  • Table 1 below shows simulation results for transcoding six MPEG-2 test streams.
  • the six test streams are transcoded using a typical full re-encode scheme without using MVs metadata, to obtain a control set of data.
  • the six test streams are then transcoded using the MVs metadata and a MV prediction algorithm of the example embodiment (compare Figure 3), to obtain the results.
  • the transcoding is carried out using C-code running on a 1.6 GHz notebook PC and the MPEG-2 bit-streams used are in CIF format encoded at 1 and 2 Mbps (see "1M" and "2M") respectively.
  • fps is the frames per second
  • Δfps is the change in frames per second
  • PSNR is the peak signal to noise ratio
  • kbit/frame is the kilo-bits per frame
  • avg Δkbps is the average change in kilo-bits per second.
  • the transcoding speed improvement is about 14% to 20% by using the MV meta-data of the example embodiment.
  • bit-rate (see avg Δkbps)
  • the difference in bit-rate can be compensated by either transrating or transcaling to achieve a desired bit-rate.
  • the frame-rate improvement can be useful for real-time transcoding and streaming for e.g. home entertainment applications, and for 4-CIF resolution transcoding.
  • Table 2 below shows additional simulation results obtained by incorporating conditional re-estimation provided by the MV derivation module 204 (Figure 2). See column "Adaptive usage of MPEG-2 MV".
  • the conditional re-estimation uses an adaptive threshold adjustable according to a user selected percentage, to decide an amount of re-estimation to perform.
  • the column "Adaptive usage of MPEG2 MV" relates to a first decision step on whether to carry out a full encoding in H.264 format without using any metadata based on a cost measurement (compare the examples using adaptive_threshold1 and adaptive_threshold2).
  • the above first decision making step is taken prior to step 302 of Figure 3, i.e. prior to the flowchart illustrating the fast MV algorithm using MV metadata of the example embodiment.
  • conditional re-estimation can still provide transcoding speed improvement (see Δfps) of about 5% to about 16%.
  • the increase in bit-rate is reduced compared with the case of using only MVs metadata and the MV prediction algorithm of the example embodiment.
  • Table 3 tabulates simulation results for comparing a control set of data derived from coding using only an intra 16x16 mode against a set of data derived from coding using an intra-mode adaptive block-size decision algorithm of the example embodiment.
  • Table 4 below shows the simulation results of using a conditional inter to intra switching algorithm of the example embodiment.
  • the results show that the adaptive intra mode decision algorithm can improve coding efficiency, reducing the bit-rate by up to about 4.7%. Further, the adaptive inter to intra switching also helps to reduce the bit-rate in some data streams when the motion search for inter-mode coding may not be performing well.
  • Table 5 shows the simulation results for a control set of data derived from a typical full-decode-encode scheme without using metadata and a set of data derived from a transcoding process using MPEG-2 inter/intra decision metadata from the example embodiment.
  • Table 5 shows a speed improvement of between 1% and 5% in frame-rate (see
  • Table 6 shows simulation results of a control set of data derived from performing typical full re-estimation for comparison to a set of data derived from using an algorithm of the example embodiment to decide the intra/inter mode and adaptively choosing an inter-16x16 or inter-8x8 mode.
  • Table 7 shows simulation results of a control set of data derived from performing typical coding using a non-adaptive inter-block size for comparison to a set of data derived from using an adaptive algorithm for choosing an inter 16x16 or inter 8x8 mode, for decimating a CIF resolution picture to a QCIF resolution picture.
  • the inventors carried out a comparison for the subjective quality of a MPEG-2 encoded picture, a H.264 picture transcoded using a typical full-decode-encode method and a H.264 picture transcoded using a fast transcoding method of the example embodiment.
  • the bit-rates for encoding in the MPEG2 format and for transcoding in the H.264 format were maintained at a rate of 2 Mbps.
  • both the typical full-decode-encode method and the fast transcoding method of the example embodiment do not introduce any significant degradation to the picture quality.
  • a quantizer of the example embodiment can be selected to maintain the quality by varying the bit-rate.
  • Figure 8 is a schematic flowchart 800 for illustrating a method of transcoding a data stream in a first format to another format in an example embodiment. At step 800
  • the data stream in the first format is decoded to obtain one or more reconstructed data frames and a meta data set of the reconstructed data frames.
  • a transcoding process is applied on the reconstructed data frames based on comparing one or more parameters from the meta data set against at least one threshold.
  • the inventors have recognized that the MPEG2 basic encoder structure is similar to the H.264/AVC encoding structure.
  • the inventors have recognized that basic MB encoding information such as the MVs and intra/inter decisions can be re-used in the process of transcoding to the H.264/AVC format.
  • the inventors have also recognized that MB information such as the MB bit-count and the quantization step-size can also be used to provide useful information on the MB content. This information can be extracted or computed during the MPEG-2 decoding process and provided to the H.264/AVC encoding process.
  • decoding of MPEG-2 bit-streams uses far less processing resources (e.g. CPU cycles in software and gates in a field-programmable gate array or FPGA) than the H.264 re-encoding process.
  • pixel-domain transcoding methods are provided in which MPEG-2 sequences are decoded to obtain reconstructed frames and by using a meta-data set to provide a H.264 encoder with additional information, fast MVs derivations and fast coding decisions can be carried out and hence, encoding complexity can be reduced.
  • the inventors have recognised that typically, transcoding does not change the picture encoding type, i.e., the originally encoded I, B and P pictures are transcoded similarly to I, B and P pictures respectively.
  • the inventors have found no prior art for changing the picture type during a transcoding process. Further, it is appreciated that encoder coding with B-pictures is generally more complex and gives rise to bigger memory requirements, memory access and computation.
  • transcoding to a H.264 baseline profile which has no B-picture encoding is carried out for fast transcoding applications (e.g. for real-time D1 resolutions transcoding while maintaining the MPEG-2 picture quality). Further, the above described example embodiment proposes a fast method using the MPEG-2 higher profile MVs to determine MVs for the H.264 baseline profile encoding.
  • a fast method is provided to decide if the MB is smooth or of high activity. Further, fast inter and intra-coding decisions can be made.
  • information from the MPEG2 encoding such as the number of bits used to encode the MPEG2 MB (also known as the bit-count) can be used to determine the MB activity and assist in making fast coding decisions.
  • transcaling from e.g. a D1 or 4CIF resolution to a CIF resolution can be provided which converts a MPEG-2 bit-stream to a down-sized H.264 bitstream in real time (or for real-time streaming applications).
  • a fast inter/intra mode selection scheme and fast inter-coding block-size decisions can be carried out.
  • a fast and adaptive method can be provided to determine correct coding modes and provide MVs prediction for transcoding for a down-sized picture.
  • a MPEG-2 to H.264 transcoder for real-time software transcoding applications can be provided. Simulation results show that the algorithms of the above described example embodiment perform relatively well in terms of speed improvement, with some bit-rate increment, as compared to a full re-encoding process. With further code optimization, experimental results show that the transcoder of the above described example embodiment is able to perform 'live' full DVD resolution transcoding of a DVD movie at a full frame-rate on a notebook PC while simultaneously streaming the H.264 video stream to another PC client decoder.
  • the method and system of the example embodiment can be implemented on a computer system 900, schematically shown in Figure 9. It may be implemented as software, such as a computer program being executed within the computer system
  • the computer system 900 comprises a computer module 902, input modules such as a keyboard 904 and mouse 906 and a plurality of output devices such as a display 908, and printer 910.
  • the computer module 902 is connected to a computer network 912 via a suitable transceiver device 914, to enable access to e.g. the Internet or other network systems such as Local Area Network (LAN) or Wide Area Network (WAN).
  • the computer module 902 in the example includes a processor 918, a
  • the computer module 902 also includes a number of Input/Output (I/O) interfaces, for example I/O interface 924 to the display 908, and I/O interface 926 to the keyboard 904.
  • the components of the computer module 902 typically communicate via an interconnected bus 928 and in a manner known to the person skilled in the relevant art.
  • the application program is typically supplied to the user of the computer system 900 encoded on a data storage medium such as a CD-ROM or memory stick and read utilising a corresponding data storage medium drive of a data storage device 930.
  • the application program is read and controlled in its execution by the processor 918.
  • Intermediate storage of program data may be accomplished using RAM 920.
  • the present invention can be applied to transcoding from a format that uses B-pictures to another format that does not use B-pictures other than the transcoding from higher profile MPEG-2 to H.264 described in the example embodiment.
  • the transcoding can be from a MPEG-4 format that does use B-pictures to H.264 or to a MPEG-2 profile that does not use B-pictures, or from a higher profile MPEG-2 format that does use B-pictures to a MPEG-2 format that does not use B-pictures.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention relates to a method of transcoding a data stream in a first format to another format, and to a data transcoder and a computer readable data storage medium having stored thereon computer code means for instructing a computer processor to execute a method of transcoding a data stream in a first format to another format. The method comprises decoding the data stream in the first format to obtain one or more reconstructed data frames and a meta data set of the reconstructed data frames; and applying a transcoding process on the reconstructed data frames based on comparing one or more parameters derived from the meta data set against at least one threshold.
PCT/SG2008/000385 2007-10-05 2008-10-06 Method of transcoding a data stream and a data transcoder WO2009045178A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US99790707P 2007-10-05 2007-10-05
US60/997,907 2007-10-05

Publications (1)

Publication Number Publication Date
WO2009045178A1 true WO2009045178A1 (fr) 2009-04-09

Family

ID=40526470

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2008/000385 WO2009045178A1 (fr) 2007-10-05 2008-10-06 Method of transcoding a data stream and a data transcoder

Country Status (1)

Country Link
WO (1) WO2009045178A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105474310A (zh) * 2013-07-22 2016-04-06 弗朗霍夫应用科学研究促进协会 用于低延迟对象元数据编码的装置及方法
US20230171418A1 (en) * 2021-11-30 2023-06-01 Comcast Cable Communications, Llc Method and apparatus for content-driven transcoder coordination

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001069936A2 (fr) * 2000-03-13 2001-09-20 Sony Corporation Methode et dispositif permettant de generer des metadonnees compactes sur des indices de transcodage
US20040247030A1 (en) * 2003-06-09 2004-12-09 Andre Wiethoff Method for transcoding an MPEG-2 video stream to a new bitrate
US20050111555A1 (en) * 2003-11-24 2005-05-26 Lg Electronics Inc. System and method for estimating motion vector for transcoding digital video
US20060039473A1 (en) * 2004-08-18 2006-02-23 Stmicroelectronics S.R.L. Method for transcoding compressed video signals, related apparatus and computer program product therefor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ALMAOUI M.: "METADATA DRIVEN MULTIMEDIA TRANSCODING", MASTER DEGREE THESIS, UNIVERSITY OF TORONTO, 2005, Retrieved from the Internet <URL:http://www.dsp.toronto.edu/nectar/projects/uma/AlmaouiMASC05.pdf> *
BO SHEN, ET AL.: "IMAGE/VIDEO TRANSCODING WITH HPL TECHNOLOGY", HEWLETT PACKARD DEVELOPMENT COMPANY, 27 August 2007 (2007-08-27), Retrieved from the Internet <URL:http://www.hpl.hp.com/techreports/2007/HPL-2007-145.pdf> *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08836283

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08836283

Country of ref document: EP

Kind code of ref document: A1