US20110182365A1 - Method and System for Parallel Processing Video Data - Google Patents
Classifications
- H04N19/177 — Adaptive coding of digital video signals, characterised by the coding unit being a group of pictures [GOP]
- H04N19/196 — Adaptive coding specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
- H04N19/436 — Implementation details or hardware specially adapted for video compression or decompression, using parallelised computational arrangements
- H04N19/61 — Transform coding in combination with predictive coding
Definitions
- An H.264 encoder can generate three types of coded pictures: Intra-coded (I), Predictive (P), and Bidirectional (B) pictures. I pictures are referenced during the encoding of other picture types and are coded with the least amount of compression. Each macroblock in a P picture includes motion compensation with respect to another picture, and each macroblock in a B picture is interpolated and uses two reference pictures. Picture type I exploits spatial redundancies, while types P and B exploit both spatial and temporal redundancies. Typically, I pictures require more bits than P pictures, and P pictures require more bits than B pictures.
- The video encoder 400 comprises a rate controller 401, a motion estimator 403, a motion compensator 405, a spatial predictor 407, a mode decision engine 409, a transformer/quantizer 411, an entropy encoder 413, an inverse transformer/quantizer 415, and a deblocking filter 417.
- The spatial predictor 407 uses the contents of the current picture for prediction. Spatially predicted partitions are intra-coded. Luma macroblocks can be divided into 4×4 or 16×16 partitions, and chroma macroblocks can be divided into 8×8 partitions. The 16×16 and 8×8 partitions each have 4 possible prediction modes, and 4×4 partitions have 9 possible prediction modes.
- The motion estimator 403 generates motion vectors that predict the partitions in the current picture from reference partitions out of the deblocking filter 417. A temporally encoded macroblock can be divided into 16×8, 8×16, 8×8, 4×8, 8×4, or 4×4 partitions. Each partition of a 16×16 macroblock is compared to one or more prediction blocks in a previously encoded picture that may be temporally located before or after the current picture.
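As a rough sketch of what the motion estimator's block matching amounts to, the following Python fragment performs a brute-force full search over a small window using a sum of absolute differences (SAD) cost. This is an illustrative assumption, not the patent's search strategy; real H.264 estimators use far more elaborate searches, and all names here are hypothetical.

```python
# Illustrative sketch only: exhaustive block matching with a SAD cost.
# Frames are lists of rows of integer samples.

def sad(cur, ref, cx, cy, rx, ry, bw, bh):
    """SAD between a bw x bh block at (cx, cy) in cur and (rx, ry) in ref."""
    return sum(
        abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
        for y in range(bh) for x in range(bw)
    )

def full_search(cur, ref, cx, cy, bw, bh, radius=2):
    """Return (mvx, mvy, cost) of the best match within +/- radius pixels."""
    h, w = len(ref), len(ref[0])
    best = (0, 0, float("inf"))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - bw and 0 <= ry <= h - bh:
                cost = sad(cur, ref, cx, cy, rx, ry, bw, bh)
                if cost < best[2]:
                    best = (dx, dy, cost)
    return best
```

For a current frame that is a one-pixel horizontal shift of the reference, the search recovers the motion vector (1, 0) with zero cost.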
- The motion compensator 405 receives the motion vectors from the motion estimator 403 and generates a temporal prediction. Motion compensation runs along with the main encoding loop to allow intra-prediction macroblock pipelining.
- The mode decision engine 409 receives the spatial prediction and the temporal prediction and selects the prediction mode according to a sum of absolute transformed differences (SATD) cost that optimizes rate and distortion. A selected prediction is output, and the corresponding prediction error is the difference 419 between the current picture and the selected prediction.
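A minimal sketch of an SATD cost for a 4×4 block is shown below, using an unnormalized 4×4 Hadamard transform. The final scaling convention (dividing by 2 here) is an assumption; it varies between implementations and is not specified by the text above.

```python
# Illustrative sketch only: SATD of a 4x4 residual via a Hadamard
# transform. H4 is symmetric, so H4 * D * H4 equals H4 * D * H4^T.

H4 = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def satd4x4(cur, pred):
    """SATD between two 4x4 blocks given as lists of rows."""
    diff = [[cur[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    t = matmul(matmul(H4, diff), H4)  # transform the difference block
    return sum(abs(v) for row in t for v in row) // 2
```

An identical block and prediction give a cost of 0; a uniform difference of 1 concentrates all energy in the DC coefficient.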
- The transformer/quantizer 411 transforms the prediction error and produces quantized transform coefficients. Transformation in H.264 utilizes Adaptive Block-size Transforms (ABT): the block size used for transform coding of the prediction error corresponds to the block size used for prediction. The prediction error can also be transformed independently of the block mode by means of a low-complexity 4×4 matrix that, together with an appropriate scaling in the quantization stage, approximates the 4×4 Discrete Cosine Transform (DCT). The transform is applied in both horizontal and vertical directions. Quantization in H.264 utilizes 52 quantization parameters (QP) that specify 52 different quantization step sizes. A lower QP corresponds to a smaller step size and finer resolution.
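The relationship between QP and step size can be sketched as follows. It relies on the commonly cited approximation that the H.264 step size roughly doubles for every increase of 6 in QP, with a base value of about 0.625 at QP=0; real encoders use exact table-driven values, so treat this as an assumption for illustration.

```python
# Illustrative sketch only: approximate H.264 quantization step size.
# Step size doubles every 6 QP; 0.625 at QP=0 is an approximation.

def qstep(qp: int) -> float:
    if not 0 <= qp <= 51:
        raise ValueError("H.264 QP must be in [0, 51]")
    return 0.625 * 2 ** (qp / 6)
```

Under this model qstep(12) is exactly four times qstep(0), and each of the 52 QP values maps to a distinct step size.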
- The rate controller 401 adjusts a nominal QP level to maintain a specified bit rate profile. The rate controller 401 can generate compression parameters such as picture bits (Ta) and picture quantization scale (Qa). The compression parameters generated in one encoder device can be used by another encoder device as estimates for the encoding of a future picture. For example, a first encoding device can be encoding a first GOP that begins with an I picture. The rate controller 401 of the first encoding device can generate compression parameters based on the I picture and pass those compression parameters to the rate controller in a second encoding device. The second encoding device can then encode a second GOP that begins with an I picture. In this way, the parallel encoder devices gain the predictive ability of a serial encoding device. After a delay equal to the time it takes to encode one picture, the two parallel encoder devices can process a video stream twice as fast as a single encoding device without loss of quality.
- H.264 specifies two types of entropy coding: Context-based Adaptive Binary Arithmetic Coding (CABAC) and Context-based Adaptive Variable-Length Coding (CAVLC).
- The quantized transform coefficients are also fed into an inverse transformer/quantizer 415 to produce a regenerated error. The selected prediction and the regenerated error are summed 421 to regenerate a reference picture that is passed through the deblocking filter 417 and used for motion estimation.
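The local reconstruction loop can be sketched as below. A uniform scalar quantizer stands in for the real transform/quantization stages (an assumption for illustration); the point is that the encoder rebuilds its reference from the quantized data, exactly as a decoder would, so quantization error is accounted for.

```python
# Illustrative sketch only: quantize the residual, inverse-quantize it
# (introducing quantization error), and sum the regenerated error with
# the prediction to rebuild the reference samples.

def quantize(residual, step):
    return [round(r / step) for r in residual]

def dequantize(levels, step):
    return [lvl * step for lvl in levels]

def reconstruct(prediction, residual, step):
    """Regenerate reference samples the way a decoder would."""
    levels = quantize(residual, step)
    regen_error = dequantize(levels, step)
    return [p + e for p, e in zip(prediction, regen_error)]
```

With a step size of 4, residuals (7, -3) come back as (8, -4), so the reconstructed samples differ slightly from prediction-plus-residual; both encoder and decoder see the same drifted values.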
- FIG. 5 is a flow diagram 500 of another exemplary method for parallel processing video data in accordance with an embodiment of the present invention. A first set of compression parameters for a first group of pictures is generated, and a second set of compression parameters for a second group of pictures is generated; the two generations are simultaneous. A third group of pictures is encoded based on the first set of compression parameters, which are passed forward; the first set comprises a number of picture bits in an I picture that appears in display order prior to the third group of pictures. The third group of pictures can also be encoded based on the second set of compression parameters, which are passed backward; the second set comprises a number of picture bits in an I picture that appears in display order after the third group of pictures. The third group of pictures may contain more than one I picture.
- The inventions described herein may be implemented as a board-level product, as a single-chip application-specific integrated circuit (ASIC), or with individual encoder devices integrated with other portions of the system as separate components. An integrated circuit may store encoded and unencoded video data in memory and use arithmetic logic to encode, detect, and format the video output. The degree of integration and the number of encoder devices in the parallel encoder circuit will primarily be determined by size, speed, and cost considerations. Because of the sophisticated nature of modern processors, it is possible to utilize a commercially available processor, which may be implemented external to an ASIC implementation. If the processor is available as an ASIC core or logic block, the commercially available processor can be implemented as part of an ASIC device wherein certain functions are implemented in firmware as instructions stored in a memory. Alternatively, the functions can be implemented as hardware accelerator units controlled by the processor.
Abstract
Description
- Video encoders can be very computationally intense devices. They may be responsible for determining a large set of spatial and temporal descriptors from which the best candidates are selected for final encoding. The best candidate selection is based on the premise that the source video signal must be transformed to fit within a communication channel or storage media while maintaining a certain level of quality. Sometimes multiple candidates are chosen during sub-level encoding. For example, macroblock encoding may be carried out multiple times to achieve best output quality. The complete encoding cycle may be repeated, and filtering may be added to optimize video quality.
- Hardware and/or software scaling can be performed to reduce the number of compute cycles. For applications where scaling creates an unacceptable quality degradation, parallel processing can be adopted. The degree to which this parallelism is carried out can depend on specific application requirements.
- Although parallel processing speeds up the encoding task, certain strategies may be required to improve quality.
- Limitations and disadvantages of conventional and traditional approaches will become apparent to one of ordinary skill in the art through comparison of such systems with the present invention as set forth in the remainder of the present application with reference to the drawings.
- Described herein are system(s) and method(s) for parallel processing video data, substantially as shown in and/or described in connection with at least one of the figures, as set forth more completely in the claims.
- These and other advantages and novel features of the present invention will be more fully understood from the following description.
- FIG. 1 is a block diagram of a first exemplary system for parallel processing video data in accordance with an embodiment of the present invention;
- FIG. 2 is a block diagram of a second exemplary system for parallel processing video data in accordance with an embodiment of the present invention;
- FIG. 3 is a flow diagram of an exemplary method for parallel processing video data in accordance with an embodiment of the present invention;
- FIG. 4 is a block diagram of an exemplary video encoding system that comprises compression parameter generation in accordance with an embodiment of the present invention; and
- FIG. 5 is a flow diagram of another exemplary method for parallel processing video data in accordance with an embodiment of the present invention.
- According to certain aspects of the present invention, a system and method are presented for parallel processing. Pictures in a video sequence can be partitioned into a group of pictures (GOP). A GOP is typically one second or less in duration. In order to maintain a bit-rate during GOP encoding, compression parameters such as a number of picture bits (Ta) and picture quantization scale (Qa) can be produced by an encoder device and used as estimates for the encoding of a future picture. Compression parameters of different picture types may be stored separately and used for the encoding of a future picture of the same type.
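The per-picture-type storage of compression parameters described above can be sketched as a small container, one (Ta, Qa) pair per picture type. The class and method names are hypothetical stand-ins, not anything specified by the patent.

```python
# Illustrative sketch only: compression parameters (picture bits Ta,
# quantization scale Qa) kept separately per picture type so a future
# picture of the same type can reuse them as estimates.

class CompressionParamStore:
    def __init__(self):
        self._params = {}  # picture type ('I', 'P', 'B') -> (Ta, Qa)

    def update(self, picture_type: str, ta: int, qa: float) -> None:
        """Record the bits and quantization scale of an encoded picture."""
        self._params[picture_type] = (ta, qa)

    def estimate(self, picture_type: str):
        """Return the stored (Ta, Qa) estimate, or None if open-loop."""
        return self._params.get(picture_type)
```

A missing entry (estimate returns None) corresponds to the open-loop case discussed below, where no previous parameters are available.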
- The amount of encoding time it takes to compress a video sequence of M GOPs can be defined as ET. Ideally if N encoder devices are used and started simultaneously, the amount of time it takes to encode M GOPs can be (ET/N). However, the output created by a system with N encoder devices may not have the same quality as a system that uses a single encoder device. The quality difference can be due to artificial seams that are created where the N encoder devices begin processing. When compression parameters from previous pictures are not available, an open-loop seam may occur that impacts quality. By transferring compression parameters such as a number of picture bits or a picture quantizer scale, a parameter estimation loop can be closed.
- To allow compression parameters to be transferred in the system with N encoder devices, a processing delay is added. The encoding time for a video sequence of M GOPs using N encoder devices would be (ET/N+D), where D is a delay associated with the time it takes to encode one or more pictures. The parameter D is inversely related to the number of artificial seams that may occur. When delay D is zero, all N encoder devices start at the same time and there may be (N−1) instances where compression parameters are not transferred for the encoding of future pictures. It should be noted that a greater architectural advantage is achieved when the number of GOPs (M) in the video sequence is larger than the number of encoder devices (N).
- There is a tradeoff between system delay D and quality degradation due to open-loop seams. A bounded quality degradation can be used to compute a fixed allowable delay D. For the case of zero open-loop seams per N devices, the delay D is (N−1), and the encoding time is ET/N+(N−1).
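The tradeoff above can be sketched numerically: total time is ET/N plus the startup delay D, and the zero-seam case uses D = N − 1. (The time units are whatever ET is measured in; this is a direct transcription of the text, with hypothetical function names.)

```python
# Illustrative sketch only: encoding time for M GOPs on N parallel
# encoder devices with startup delay D, per the text's ET/N + D model.

def encoding_time(et: float, n: int, d: float) -> float:
    """ET/N plus the startup delay D (same time units as ET)."""
    return et / n + d

def zero_seam_delay(n: int) -> int:
    """Delay giving zero open-loop seams per N devices: D = N - 1."""
    return n - 1
```

For example, a sequence that takes ET = 80 units on one device takes 80/8 + 7 = 17 units on eight devices with zero open-loop seams.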
- FIG. 1 depicts a first exemplary system 100 comprising 8 encoder devices in accordance with an embodiment of the present invention. For the case of one open-loop seam per N devices, the delay D can be derived from the following formula:

D = max[N − α, Ceiling((α − 2)/2)]

α = Floor((2 × (N + 1))/3)

The ceiling function Ceiling(y) outputs the smallest integer not less than y, and the floor function Floor(z) outputs the largest integer not greater than z.
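The formula above translates directly into Python (the function name is a hypothetical label):

```python
# Illustrative sketch only: delay D for one open-loop seam per N
# devices, per D = max[N - a, Ceiling((a - 2)/2)], a = Floor(2(N+1)/3).

import math

def one_seam_delay(n: int) -> int:
    a = math.floor(2 * (n + 1) / 3)
    return max(n - a, math.ceil((a - 2) / 2))
```

For N = 8 this gives α = 6 and D = max(2, 1) = 2, matching the eight-device configuration described next.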
- For a compression system comprised of eight encoder devices E1-E8, one open-loop seam at 106 is created during the encoding of GOPs 101-108, and one open-loop seam at 114 is created during the encoding of GOPs 109-116. This configuration can be accomplished by a delay D=2.
- A commutator 120 can reorder the outputs of the encoder devices E1-E8 for transmission or storage in the system 100.
- FIG. 2 depicts a second exemplary system 200 comprising 8 encoder devices in accordance with an embodiment of the present invention. For a delay D=1, the number of open-loop seams (OL) per N GOPs is derived from the following set of equations:

OL = S_0 + S_1 + R_1

With,

S_0 = Floor((N − 1)/3)
R_0 = (N − 1) − 3 × S_0
S_1 = Floor((R_0 − 1)/2)
R_1 = (R_0 − 1) − 2 × S_1

- A minimum delay of D=1 can be achieved in a system with 8 encoder devices E1-E8 and 2 open-loop seams per 8 GOPs. The delay D=1 is the time it takes for compression parameters to be generated in one encoding device and passed to another encoding device. The delay D may be associated with the time it takes to encode one or more pictures, and the compression parameters may be associated with one or more pictures. In the system 200, GOPs 101-108 are run on encoder devices E1-E8 respectively. A phase 1 begins with the simultaneous encoding of GOPs 101, 104, and 107 by encoder devices E1, E4, and E7 respectively. After the delay, in a phase 2, encoder devices E2, E3, E5, E6, and E8 encode GOPs 102, 103, 105, 106, and 108 respectively. The delay ensures that the first pictures of GOPs 101, 104, and 107 are ready in compressed form prior to phase 2, and their statistical parameters can be used to estimate new compression parameters for the first pictures of the phase 2 GOPs 102, 103, 105, 106, and 108. The first picture of GOP 102 can be estimated from the first picture of GOP 101 in forward mode, the first picture of GOP 103 from the first picture of GOP 104 in backward mode, the first picture of GOP 105 from the first picture of GOP 104 in forward mode, the first picture of GOP 106 from the first picture of GOP 107 in backward mode, and the first picture of GOP 108 from the first picture of GOP 107 in forward mode. In this case the first pictures of GOPs 104 and 107 are processed in open-loop fashion. The first picture of any video sequence (e.g. the first picture of GOP 101) is also started in open-loop mode.
- A commutator 120 can reorder the outputs of the encoder devices E1-E8 for transmission or storage in the system 200.
- The multi-encoder system 200 is presented where the input sequence is partitioned into multiple GOP phases. Each GOP phase is associated with an encoding delay such that the first GOP phase has zero delay, followed by a second GOP phase started after a delay. The delay can be equal to the time it takes to encode one or more pictures in the first phase. Additional GOP phases may also be included after another delay. Each GOP phase is comprised of non-contiguous GOPs that may be generated by sub-sampling the input sequence at GOP resolution in a non-uniform fashion. The choice of the number of non-contiguous GOPs in the first phase and the nominal value of the delay determine the number of open-loop seams in the output stream.
- In order to minimize delay and the number of open-loop seams, it is important to sub-sample the pre-defined N GOPs such that the first phase contains GOPs that can be used in both forward and backward estimation modes for future phases. This concept can be extended if there are multiple phases in the sequence. For example in FIG. 2 with N=8 encoder devices, phase one is comprised of GOPs 101, 104, and 107 and phase two is comprised of GOPs 102, 103, 105, 106, and 108. Selecting GOPs 101, 102, and 107 for phase one instead would create additional open-loop seams, since GOP 102 cannot be utilized in bi-directional estimation mode in the way GOP 104 can be used.
- For a compression system based on N encoder devices, and a desired delay D (other than zero), the number of open-loop seams OL (and therefore, a measure of quality) can be approximated from the following formula:
OL = (S_0 + S_1 + … + S_D) + R_D

With, for i = 1, …, D:

S_0 = Floor((N − 1)/(2 + D))
R_0 = (N − 1) − (2 + D) × S_0
S_i = Floor((R_(i−1) − 1)/(2 + D − i))
R_i = (R_(i−1) − 1) − (2 + D − i) × S_i

- For N=16 the following OL and D combinations are possible:
  OL   D
   5   1
   4   2
   3   3

- By selecting several values of delay D, a range of open-loop seams OL can be pre-computed, and consequently for a target value of OL, the corresponding D can be known a priori. Therefore, a measure of quality of service and/or system delay is readily available.
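The recursion above can be transcribed directly into Python, which also reproduces the N=16 table (the function name is a hypothetical label):

```python
# Illustrative sketch only: open-loop seam count OL for N encoder
# devices and delay D, per the recursion in the text:
#   S_0 = Floor((N-1)/(2+D)),          R_0 = (N-1) - (2+D)*S_0
#   S_i = Floor((R_(i-1)-1)/(2+D-i)),  R_i = (R_(i-1)-1) - (2+D-i)*S_i
#   OL  = (S_0 + ... + S_D) + R_D

import math

def open_loop_seams(n: int, d: int) -> int:
    s = math.floor((n - 1) / (2 + d))
    r = (n - 1) - (2 + d) * s
    total = s
    for i in range(1, d + 1):
        s = math.floor((r - 1) / (2 + d - i))
        r = (r - 1) - (2 + d - i) * s
        total += s
    return total + r
```

This yields 2 seams for N=8, D=1 (matching FIG. 2) and the (OL, D) pairs (5, 1), (4, 2), (3, 3) for N=16.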
- A multi-encoding system based on parallelizing encoder devices can balance quality of service (associated with number of open-loop seams) and delay. This balance can be determined according to the equations described above.
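The two-phase estimation schedule of FIG. 2 can be sketched as follows: each phase-two GOP takes its estimate from an adjacent phase-one GOP in forward or backward mode, and a GOP with no encoded neighbor starts open-loop. The function and dictionary layout are hypothetical illustrations, not the patent's implementation.

```python
# Illustrative sketch only: derive forward/backward estimation sources
# for phase-two GOPs from the set of phase-one GOPs.

def phase_two_sources(phase_one, all_gops):
    """Map each phase-two GOP to (source GOP, mode) or None (open loop)."""
    schedule = {}
    for gop in all_gops:
        if gop in phase_one:
            continue
        if gop - 1 in phase_one:      # previous GOP ready: forward mode
            schedule[gop] = (gop - 1, "forward")
        elif gop + 1 in phase_one:    # next GOP ready: backward mode
            schedule[gop] = (gop + 1, "backward")
        else:                         # no neighbor ready: open loop
            schedule[gop] = None
    return schedule
```

With phase one = {101, 104, 107} and GOPs 101-108, this reproduces the assignments given for FIG. 2: GOP 102 forward from 101, GOP 103 backward from 104, and so on.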
- FIG. 3 is a flow diagram 300 of an exemplary method for parallel processing video data in accordance with an embodiment of the present invention. Pictures in a video sequence can be partitioned into a group of pictures (GOP). A GOP is typically one second or less in duration. In order to maintain a bit-rate during GOP encoding, compression parameters such as a number of picture bits (Ta) and picture quantization scale (Qa) can be produced by an encoder device and used as estimates for the encoding of a future picture. Compression parameters of different picture types may be stored separately and used for the encoding of a future picture of the same type.
- The amount of encoding time it takes to compress a video sequence can be decreased if more than one encoder device is used. The output created by a system with parallel encoder devices may not have the same quality as a system that uses a single encoder device, as a result of artificial seams that are created where the encoder devices begin processing. When compression parameters from previous pictures are not available, an open-loop seam may occur that impacts quality.
- An example parallel system may have at least three encoder devices that can process different groups of pictures. At 301, a first encoder device is utilized to generate a first set of compression parameters for a first group of pictures, and at 303, a second encoder device is utilized to generate a second set of compression parameters for a second group of pictures. The first encoder device and the second encoder device can be run in parallel and started simultaneously.
- Following the simultaneous parameter generation at 301 and 303, a third encoder device is utilized at 305 to encode a third group of pictures based on at least one of the first set of compression parameters and the second set of compression parameters. By transferring compression parameters, such as a number of picture bits or a picture quantizer scale, from the first and/or second group of pictures to the third encoder device, a parameter estimation loop can be closed to improve quality.
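A minimal sketch of steps 301-305, using toy stand-ins for the encoder devices (encode_gop, the Ta/Qa placeholders, and the seeding heuristic are illustrative assumptions, not the patent's implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def encode_gop(gop, seed_params=None):
    """Hypothetical stand-in for one encoder device: "encodes" a GOP and
    reports the compression parameters it measured (picture bits Ta,
    quantizer scale Qa), optionally seeding its rate control."""
    ta = sum(len(pic) for pic in gop)               # placeholder bit count
    qa = 26 if seed_params is None else seed_params["qa"]
    return {"ta": ta, "qa": qa}

def parallel_encode(gop1, gop2, gop3):
    # Steps 301/303: the first and second devices run simultaneously.
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(encode_gop, gop1)
        f2 = pool.submit(encode_gop, gop2)
        p1, p2 = f1.result(), f2.result()
    # Step 305: the third device encodes using parameters from a
    # neighbor, closing the estimation loop a cold start would leave open.
    return encode_gop(gop3, seed_params=p1 or p2)
```

The point of the sketch is the data flow: the third device never starts from an open-loop seam, because it inherits estimates from whichever neighboring device finished first.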
- This invention can be applied to video data encoded with a wide variety of standards, one of which is H.264. An overview of H.264 will now be given. A description of an exemplary system for scene change detection in H.264 will also be given.
- The ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) drafted a video coding standard titled ITU-T Recommendation H.264 and ISO/IEC MPEG-4 Advanced Video Coding, which is incorporated herein by reference for all purposes. In the H.264 standard, video is encoded on a macroblock-by-macroblock basis.
- By using the H.264 compression standard, video can be compressed while preserving image quality through a combination of spatial, temporal, and spectral compression techniques. To achieve a given Quality of Service (QoS) within a small data bandwidth, video compression systems exploit the redundancies in video sources to de-correlate spatial, temporal, and spectral sample dependencies. Statistical redundancies that remain embedded in the video stream are distinguished through higher order correlations via entropy coders. Advanced entropy coders can take advantage of context modeling to adapt to changes in the source and achieve better compaction.
- An H.264 encoder can generate three types of coded pictures: Intra-coded (I), Predictive (P), and Bidirectional (B) pictures. Each macroblock in an I picture is encoded independently of other pictures based on transformation, quantization, and entropy coding. I pictures are referenced during the encoding of other picture types and are coded with the least amount of compression. Each macroblock in a P picture includes motion compensation with respect to another picture. Each macroblock in a B picture is interpolated and uses two reference pictures. I pictures exploit only spatial redundancies, while P and B pictures exploit both spatial and temporal redundancies. Typically, I pictures require more bits than P pictures, and P pictures require more bits than B pictures.
- Referring now to
FIG. 4, there is illustrated a block diagram of an exemplary video encoder 400. The video encoder 400 comprises a rate controller 401, a motion estimator 403, a motion compensator 405, a spatial predictor 407, a mode decision engine 409, a transformer/quantizer 411, an entropy encoder 413, an inverse transformer/quantizer 415, and a deblocking filter 417. - The
spatial predictor 407 uses the contents of a current picture for prediction. Spatially predicted partitions are intra-coded. Luma macroblocks can be divided into 4×4 or 16×16 partitions and chroma macroblocks can be divided into 8×8 partitions. 16×16 and 8×8 partitions each have 4 possible prediction modes, and 4×4 partitions have 9 possible prediction modes. - The
motion estimator 403 generates motion vectors that predict the partitions in the current picture from reference partitions out of the deblocking filter 417. A temporally encoded macroblock can be divided into 16×8, 8×16, 8×8, 4×8, 8×4, or 4×4 partitions. Each partition of a 16×16 macroblock is compared to one or more prediction blocks in a previously encoded picture that may be temporally located before or after the current picture. - The
motion compensator 405 receives the motion vectors from the motion estimator 403 and generates a temporal prediction. Motion compensation runs along with the main encoding loop to allow intra-prediction macroblock pipelining. - The
mode decision engine 409 will receive the spatial prediction and temporal prediction and select the prediction mode according to a sum of absolute transformed difference (SATD) cost that optimizes rate and distortion. A selected prediction is output. - Once the mode is selected, a corresponding prediction error is the
difference 419 between the current picture and the selected prediction. The transformer/quantizer 411 transforms the prediction error and produces quantized transform coefficients. - Transformation in H.264 utilizes Adaptive Block-size Transforms (ABT). The block size used for transform coding of the prediction error corresponds to the block size used for prediction. The prediction error is transformed independently of the block mode by means of a low-
complexity 4×4 matrix that, together with an appropriate scaling in the quantization stage, approximates the 4×4 Discrete Cosine Transform (DCT). The transform is applied in both horizontal and vertical directions. When a macroblock is encoded as intra 16×16, the DC coefficients of all 16 4×4 blocks are further transformed with a 4×4 Hadamard Transform. - Quantization in H.264 utilizes 52 quantization parameters (QP) that specify 52 different quantization step sizes. A lower QP corresponds to a smaller step size and finer resolution. During the encoding process, the
rate controller 401 will adjust a nominal QP level to maintain a specified bit rate profile. - While maintaining the bit rate profile, the
rate controller 401 can generate compression parameters such as picture bits (Ta) and picture quantization scale (Qa). The compression parameters generated in one encoder device can be used by another encoder device as estimates for the encoding of a future picture. For example, a first encoding device can be encoding a first GOP that begins with an I picture. The rate controller 401 of the first encoding device can generate compression parameters based on the I picture and pass said compression parameters to a rate controller in a second device. The second encoding device can then encode a second GOP that begins with an I picture. In this example the parallel encoder devices gain the predictive ability of a serial encoding device. After a delay equal to the time it takes to encode one picture, the two parallel encoder devices can process a video stream twice as fast as a single encoding device without loss of quality. - H.264 specifies two types of entropy coding:
Context-based Adaptive Binary Arithmetic Coding (CABAC) and Context-based Adaptive Variable-Length Coding (CAVLC). The entropy encoder 413 receives the quantized transform coefficients and produces a video output. In the case of temporal prediction, a set of picture reference indices may be entropy encoded as well. - The quantized transform coefficients are also fed into an inverse transformer/
quantizer 415 to produce a regenerated error. The selected prediction and the regenerated error are summed 421 to regenerate a reference picture that is passed through the deblocking filter 417 and used for motion estimation. -
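The quantization schedule described above (52 QPs whose step sizes roughly double with every increase of 6 in QP) can be illustrated with a small helper; the base step values are the standard H.264 ones, but the function itself is an illustrative sketch, not part of the patent:

```python
# Standard H.264 step sizes for QP 0..5; each +6 in QP doubles the step.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    """Quantization step size for an H.264 QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("H.264 QP must be in 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))

# A lower QP gives a smaller step and finer resolution:
# qstep(4) == 1.0, while qstep(51) == 224.0.
```

This doubling-every-6 structure is what lets the rate controller make coarse bit-rate moves with small nominal QP adjustments.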
FIG. 5 is a flow diagram 500 of another exemplary method for parallel processing video data in accordance with an embodiment of the present invention. In 501, a first set of compression parameters for a first group of pictures is generated. In 503, a second set of compression parameters for a second group of pictures is generated. The generation of the first set of compression parameters and the generation of the second set of compression parameters are simultaneous. In 505, a third group of pictures is encoded based on the first set of compression parameters, which are passed forward. The first set of compression parameters comprises a number of picture bits in an I picture that appears in display order prior to the third group of pictures. In 507, the third group of pictures is encoded based on the second set of compression parameters, which are passed backward. The second set of compression parameters comprises a number of picture bits in an I picture that appears in display order after the third group of pictures. The third group of pictures may contain more than one I picture. - The embodiments described herein may be implemented as a board-level product, as a single chip, as an application-specific integrated circuit (ASIC), or with individual encoder devices integrated with other portions of the system as separate components. An integrated circuit may store encoded and unencoded video data in memory and use arithmetic logic to encode, detect, and format the video output.
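A minimal sketch of how the forward (505) and backward (507) estimates might seed the third group's I pictures; the linear-interpolation heuristic and function name are illustrative assumptions, not the patent's method:

```python
def seed_i_picture_bits(prev_i_bits, next_i_bits, num_i_pictures):
    """Estimate bit budgets for the I pictures inside the third GOP from
    the I picture before it (bits passed forward) and the I picture
    after it (bits passed backward).  Since the third group may contain
    more than one I picture, interpolate between the two measurements."""
    if num_i_pictures == 1:
        return [(prev_i_bits + next_i_bits) // 2]
    step = (next_i_bits - prev_i_bits) / (num_i_pictures + 1)
    return [round(prev_i_bits + step * (k + 1)) for k in range(num_i_pictures)]
```

Seeding from both temporal directions is what distinguishes this variant from the purely forward handoff of FIG. 3.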
- The degree of integration and the number of encoder devices in the parallel encoder circuit will primarily be determined by size, speed, and cost considerations. Because of the sophisticated nature of modern processors, it is possible to utilize a commercially available processor, which may be implemented external to an ASIC implementation.
- If the processor is available as an ASIC core or logic block, then the commercially available processor can be implemented as part of an ASIC device wherein certain functions can be implemented in firmware as instructions stored in a memory. Alternatively, the functions can be implemented as hardware accelerator units controlled by the processor.
- While the present invention has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the present invention.
- Additionally, many modifications may be made to adapt a particular situation or material to the teachings of the present invention without departing from its scope. For example, although the invention has been described with a particular emphasis on one encoding standard, the invention can be applied to a wide variety of standards.
- Therefore, it is intended that the present invention not be limited to the particular embodiment disclosed, but that the present invention will include all embodiments falling within the scope of the appended claims.
Claims (17)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/079,923 US9271004B2 (en) | 2005-04-22 | 2011-04-05 | Method and system for parallel processing video data |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/112,628 US7920633B2 (en) | 2005-04-22 | 2005-04-22 | Method and system for parallel processing video data |
US13/079,923 US9271004B2 (en) | 2005-04-22 | 2011-04-05 | Method and system for parallel processing video data |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/112,628 Continuation US7920633B2 (en) | 2005-04-22 | 2005-04-22 | Method and system for parallel processing video data |
Publications (2)
Publication Number | Publication Date |
---|---|
US20110182365A1 true US20110182365A1 (en) | 2011-07-28 |
US9271004B2 US9271004B2 (en) | 2016-02-23 |
Family
ID=37186846
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/112,628 Active 2030-02-03 US7920633B2 (en) | 2005-04-22 | 2005-04-22 | Method and system for parallel processing video data |
US13/079,923 Active 2027-10-18 US9271004B2 (en) | 2005-04-22 | 2011-04-05 | Method and system for parallel processing video data |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/112,628 Active 2030-02-03 US7920633B2 (en) | 2005-04-22 | 2005-04-22 | Method and system for parallel processing video data |
Country Status (1)
Country | Link |
---|---|
US (2) | US7920633B2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10856023B2 (en) | 2016-04-12 | 2020-12-01 | Sony Corporation | Transmission apparatus, transmission method, reception apparatus, and reception method |
Families Citing this family (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080152014A1 (en) * | 2006-12-21 | 2008-06-26 | On Demand Microelectronics | Method and apparatus for encoding and decoding of video streams |
US8411734B2 (en) * | 2007-02-06 | 2013-04-02 | Microsoft Corporation | Scalable multi-thread video decoding |
CA2689441C (en) * | 2007-06-14 | 2015-11-24 | Thomson Licensing | A system and method for time optimized encoding |
US9648325B2 (en) | 2007-06-30 | 2017-05-09 | Microsoft Technology Licensing, Llc | Video decoding implementations for a graphics processing unit |
US8265144B2 (en) | 2007-06-30 | 2012-09-11 | Microsoft Corporation | Innovations in video decoder implementations |
US8649426B2 (en) * | 2008-09-18 | 2014-02-11 | Magor Communications Corporation | Low latency high resolution video encoding |
US8311115B2 (en) | 2009-01-29 | 2012-11-13 | Microsoft Corporation | Video encoding using previously calculated motion information |
US8396114B2 (en) | 2009-01-29 | 2013-03-12 | Microsoft Corporation | Multiple bit rate video encoding using variable bit rate and dynamic resolution for adaptive video streaming |
US8270473B2 (en) | 2009-06-12 | 2012-09-18 | Microsoft Corporation | Motion based dynamic resolution multiple bit rate video encoding |
US8705616B2 (en) | 2010-06-11 | 2014-04-22 | Microsoft Corporation | Parallel multiple bitrate video encoding to reduce latency and dependences between groups of pictures |
US8885729B2 (en) | 2010-12-13 | 2014-11-11 | Microsoft Corporation | Low-latency video decoding |
US9706214B2 (en) | 2010-12-24 | 2017-07-11 | Microsoft Technology Licensing, Llc | Image and video decoding implementations |
US9049459B2 (en) | 2011-10-17 | 2015-06-02 | Exaimage Corporation | Video multi-codec encoders |
US8750383B2 (en) | 2011-01-17 | 2014-06-10 | Exaimage Corporation | Systems and methods for wavelet and channel-based high definition video encoding |
MY189650A (en) | 2011-06-30 | 2022-02-23 | Microsoft Technology Licensing Llc | Reducing latency in video encoding and decoding |
US8731067B2 (en) | 2011-08-31 | 2014-05-20 | Microsoft Corporation | Memory management for video decoding |
US9591318B2 (en) * | 2011-09-16 | 2017-03-07 | Microsoft Technology Licensing, Llc | Multi-layer encoding and decoding |
US9819949B2 (en) | 2011-12-16 | 2017-11-14 | Microsoft Technology Licensing, Llc | Hardware-accelerated decoding of scalable video bitstreams |
US11089343B2 (en) | 2012-01-11 | 2021-08-10 | Microsoft Technology Licensing, Llc | Capability advertisement, configuration and control for video coding and decoding |
US10349069B2 (en) * | 2012-12-11 | 2019-07-09 | Sony Interactive Entertainment Inc. | Software hardware hybrid video encoder |
US20140169481A1 (en) * | 2012-12-19 | 2014-06-19 | Ati Technologies Ulc | Scalable high throughput video encoder |
US9924165B1 (en) * | 2013-07-03 | 2018-03-20 | Ambarella, Inc. | Interleaved video coding pipeline |
US9596470B1 (en) | 2013-09-27 | 2017-03-14 | Ambarella, Inc. | Tree-coded video compression with coupled pipelines |
US11936864B2 (en) | 2019-11-07 | 2024-03-19 | Bitmovin, Inc. | Fast multi-rate encoding for adaptive streaming using machine learning |
US11546401B2 (en) | 2019-11-07 | 2023-01-03 | Bitmovin, Inc. | Fast multi-rate encoding for adaptive HTTP streaming |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6434196B1 (en) * | 1998-04-03 | 2002-08-13 | Sarnoff Corporation | Method and apparatus for encoding video information |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5986712A (en) * | 1998-01-08 | 1999-11-16 | Thomson Consumer Electronics, Inc. | Hybrid global/local bit rate control |
-
2005
- 2005-04-22 US US11/112,628 patent/US7920633B2/en active Active
-
2011
- 2011-04-05 US US13/079,923 patent/US9271004B2/en active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6434196B1 (en) * | 1998-04-03 | 2002-08-13 | Sarnoff Corporation | Method and apparatus for encoding video information |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10856023B2 (en) | 2016-04-12 | 2020-12-01 | Sony Corporation | Transmission apparatus, transmission method, reception apparatus, and reception method |
Also Published As
Publication number | Publication date |
---|---|
US7920633B2 (en) | 2011-04-05 |
US9271004B2 (en) | 2016-02-23 |
US20060239343A1 (en) | 2006-10-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7920633B2 (en) | Method and system for parallel processing video data | |
US7822116B2 (en) | Method and system for rate estimation in a video encoder | |
US8908765B2 (en) | Method and apparatus for performing motion estimation | |
US20060198439A1 (en) | Method and system for mode decision in a video encoder | |
CA2703775C (en) | Method and apparatus for selecting a coding mode | |
US9258567B2 (en) | Method and system for using motion prediction to equalize video quality across intra-coded frames | |
US20060176953A1 (en) | Method and system for video encoding with rate control | |
US9667999B2 (en) | Method and system for encoding video data | |
US20070098067A1 (en) | Method and apparatus for video encoding/decoding | |
US9066097B2 (en) | Method to optimize the transforms and/or predictions in a video codec | |
US20060239347A1 (en) | Method and system for scene change detection in a video encoder | |
US20060222075A1 (en) | Method and system for motion estimation in a video encoder | |
US7864839B2 (en) | Method and system for rate control in a video encoder | |
EP1703735A2 (en) | Method and system for distributing video encoder processing | |
WO2009157581A1 (en) | Image processing device and image processing method | |
JPH09154143A (en) | Video data compression method | |
US11115683B2 (en) | High definition VP8 decoder | |
US20060222251A1 (en) | Method and system for frame/field coding | |
US20060227863A1 (en) | Method and system for spatial prediction in a video encoder | |
KR20230117428A (en) | Adaptive Resolution for Motion Vector Differences | |
US20100118948A1 (en) | Method and apparatus for video processing using macroblock mode refinement | |
US20060171455A1 (en) | Method and system for encoding video data | |
US20130077674A1 (en) | Method and apparatus for encoding moving picture | |
JP2006517369A (en) | Apparatus for encoding a video data stream | |
US20060209951A1 (en) | Method and system for quantization in a video encoder |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: BROADCOM CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BROADCOM ADVANCED COMPRESSION GROUP, LLC;REEL/FRAME:036560/0910 Effective date: 20090212 Owner name: BROADCOM ADVANCED COMPRESSION GROUP, LLC, MASSACHU Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MOHSENIAN, NADER;REEL/FRAME:036560/0904 Effective date: 20050421 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: BANK OF AMERICA, N.A., AS COLLATERAL AGENT, NORTH CAROLINA Free format text: PATENT SECURITY AGREEMENT;ASSIGNOR:BROADCOM CORPORATION;REEL/FRAME:037806/0001 Effective date: 20160201 Owner name: BANK OF AMERICA, N.A., AS COLLATERAL AGENT, NORTH Free format text: PATENT SECURITY AGREEMENT;ASSIGNOR:BROADCOM CORPORATION;REEL/FRAME:037806/0001 Effective date: 20160201 |
|
AS | Assignment |
Owner name: AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD., SINGAPORE Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BROADCOM CORPORATION;REEL/FRAME:041706/0001 Effective date: 20170120 Owner name: AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BROADCOM CORPORATION;REEL/FRAME:041706/0001 Effective date: 20170120 |
|
AS | Assignment |
Owner name: BROADCOM CORPORATION, CALIFORNIA Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:041712/0001 Effective date: 20170119 |
|
AS | Assignment |
Owner name: AVAGO TECHNOLOGIES INTERNATIONAL SALES PTE. LIMITE Free format text: MERGER;ASSIGNOR:AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD.;REEL/FRAME:047229/0408 Effective date: 20180509 |
|
AS | Assignment |
Owner name: AVAGO TECHNOLOGIES INTERNATIONAL SALES PTE. LIMITE Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE EFFECTIVE DATE PREVIOUSLY RECORDED ON REEL 047229 FRAME 0408. ASSIGNOR(S) HEREBY CONFIRMS THE THE EFFECTIVE DATE IS 09/05/2018;ASSIGNOR:AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD.;REEL/FRAME:047349/0001 Effective date: 20180905 |
|
AS | Assignment |
Owner name: AVAGO TECHNOLOGIES INTERNATIONAL SALES PTE. LIMITE Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE PATENT NUMBER 9,385,856 TO 9,385,756 PREVIOUSLY RECORDED AT REEL: 47349 FRAME: 001. ASSIGNOR(S) HEREBY CONFIRMS THE MERGER;ASSIGNOR:AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD.;REEL/FRAME:051144/0648 Effective date: 20180905 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |