GB2353652A

GB2353652A - Video coding employing a predetermined percentage of stuffing bits

Info

Publication number: GB2353652A
Application number: GB9920273A
Authority: GB
Inventors: Nicholas Ian Saunders; Robert Mark Stefan Porter; Timothy Stuart Roberts
Original assignee: Sony United Kingdom Ltd
Current assignee: Sony Europe Ltd
Priority date: 1999-08-26
Filing date: 1999-08-26
Publication date: 2001-02-28
Anticipated expiration: 2019-08-26
Also published as: GB2353652B; GB9920273D0

Abstract

A decoder (1) decodes a first generation MPEG2 bitstream A0 having a long GOP ...IBBPBBPBBPBB... to baseband. A recoder (8) recodes the GOP to a second generation bitstream A1 having only intra frames, which are stored in an intra frame store (12). An encoder (4) re-encodes the intra frames of generation 2 as a third generation long GOP. Even if the transcoding parameters of the first generation are reused at the third generation, it is found that the VBV occupancy values for the first and third generations tend to differ. This may cause the buffer of the downstream decoder (2) to underflow or overflow. Thus, at the first generation, the target number of bits for the GOPs includes a preset percentage of stuffing bits. If in subsequent generations the number of data bits grows, then correspondingly fewer stuffing bits are used in those generations to maintain a constant GOP size.

Description

SIGNAL PROCESSOR

The present invention relates to a signal processor. Embodiments of the invention described herein are concerned with digital video bitstreams which are compressed according to the MPEG-2 standard.

The invention and its background will be discussed by way of example with reference to MPEG-2 video bitstreams. However the invention is not limited to MPEG-2.

MPEG-2 is well known from, for example, ISO/IEC 13818-2, and will not be described in detail herein. Splicing of video is well known. It is used in editing video.

Splicing analogue signals is relatively straightforward and can be done at the boundary between adjacent frames, because each analogue frame contains the whole of the video information of that frame independently of other frames. (Splicing can be done similarly in the digital domain for both compressed and uncompressed video data if all frames contain the whole video information of the frame.)

MPEG-2 compressed video comprises groups of intra and inter frames known as GOPs (Groups of Pictures). Intra (I), P and B frames are well known. An I or intra-encoded frame contains all the information of the frame independently of any other frame. A P frame in a GOP ultimately depends on an intra frame and may depend on other P frames. A B frame of a GOP ultimately depends on an intra frame and may depend on P frames in the GOP. A B frame must not depend on any other B frame.

A GOP typically comprises 12 or 15 frames comprising at least one intra frame and several P and B frames. To correctly decode a GOP requires all the frames of the GOP, because a large part of the video information required to decode a B frame in the GOP is in a preceding and/or succeeding frame of the GOP. Likewise a large part of the video information required to decode a P frame is in a preceding frame of the GOP.

It has been proposed to encode, decode and recode compressed bit streams. That may be done in various circumstances. In one circumstance, a bit stream is encoded, having a GOP of 12 or 15 frames for attaining a high degree of compression, but then is decoded and recoded as intra frames only for storage and convenience in processing in a studio. In the studio, the intra frames may be processed, e.g. spliced, and then recoded to a GOP of 12 or 15 frames. Alternatively, it has been proposed to splice GOPs of 12 or 15 frames, a process which involves at least partially recoding the spliced GOPs in a transition region including the splice point: see for example the paper "Flexible Switching and editing of MPEG-2 Video Bitstreams" by P.J. Brightwell, S.J. Dancer and M.J. Knee, published in "Atlantic Technical Papers 1996/1997", the preface to which is dated September 1997.

In one case a frame may be encoded at generation 1, decoded at generation 2 to digital baseband and recoded at generation 3 as the same type, i.e. intra (I) or inter (P or B), as in generation 1. In another case, at generation 2 the frame may be decoded and recoded as an intra frame before being re-encoded at generation 3 as the same type as in generation 1. In either case, even if the generation 1 transcoding parameters are reused at generation 3, the number of bits in the frames changes. Difficulties occur if the numbers of bits in the frames change: a buffer in an encoder or a decoder may underflow or overflow.

It is desired to reuse the transcoding parameters as much as possible to maintain picture quality through several generations of decoding and recoding. However changes in the number of bits in the frames from generation to generation occur.

According to one aspect of the present invention, there is provided a signal processor for encoding a video signal to produce a compressed bitstream comprising a Group of Pictures (GOP), the processor being arranged to produce a GOP having a number of bits (Remain_bit_GOP) of which a predetermined percentage (P Remain_bit_GOP) are stuffing bits.

By adding a proportion of stuffing bits to a frame at generation 1, if the number of data bits in the frame increases during subsequent decoding and recoding operations, the number of stuffing bits may be correspondingly reduced, maintaining the original size of the frame.
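The stuffing idea described above can be illustrated with a minimal sketch. The function names, the integer rounding and the example figures are assumptions made purely for illustration; the specification states only that a predetermined percentage of the GOP target consists of stuffing bits and that the stuffing shrinks as the data bits grow.

```python
def first_generation_budget(remain_bit_gop, stuffing_percent):
    """Split a GOP bit target into (data_bits, stuffing_bits).

    stuffing_percent is the predetermined percentage of the target
    that is reserved for stuffing at generation 1 (hypothetical name).
    """
    if not 0 <= stuffing_percent < 100:
        raise ValueError("stuffing_percent must be in [0, 100)")
    stuffing_bits = int(remain_bit_gop * stuffing_percent / 100)
    return remain_bit_gop - stuffing_bits, stuffing_bits


def later_generation_stuffing(gop_size, data_bits):
    """Stuffing needed in a later generation to keep the GOP size constant.

    If the data bits have grown beyond the original GOP size, no stuffing
    is left to give back and zero is returned.
    """
    return max(gop_size - data_bits, 0)


# Generation 1: 6,000,000-bit GOP target with 5% stuffing.
data, stuffing = first_generation_budget(6_000_000, 5.0)
# Generation 3: the data bits have grown by 150,000 bits, so less stuffing is used.
stuffing_gen3 = later_generation_stuffing(6_000_000, data + 150_000)
```

In this example the GOP size stays at 6,000,000 bits across generations because the growth in data bits is absorbed by the reserve of stuffing bits.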

According to another aspect of the present invention there is provided a signal processing system comprising a first signal processor according to any preceding claim for producing a first generation GOP, a first intra-frame encoder for recoding all the frames of the first generation GOP as intra frames of a second generation, and a further signal processor arranged to recode the second generation intra frames as a recoded third generation GOP comprising intra and inter frames, and to add stuffing bits to the frames of the third generation GOP according to a predetermined algorithm so that the third generation frames at least tend towards having the same number of bits as the corresponding first generation frames.

It will be appreciated that the first generation GOP may be re-encoded from a previous generation of frames. Furthermore, further generations may be re-encoded from the third generation.

In a preferred embodiment, the transcoding parameters of at least the first generation intra frames are reused in recoding those frames in the second and third generations. Furthermore, preferably the transcoding parameters of the first generation inter frames are retained in association with the corresponding second generation frames and reused when recoding those frames in the third generation.

With reuse of transcoding parameters, where the problem of changes in the number of bits is most likely to occur, the present invention allows the total number of bits (including stuffing bits) of the third generation to substantially equal, or at least track changes in, the numbers of bits (including stuffing bits) in the first generation.

An embodiment of the further processor comprises means for stuffing the third generation frames with bits according to the predetermined algorithm so that the number of bits thereof (including stuffing bits) at least tends towards the number of bits (including stuffing bits) of the corresponding first generation frames.

The algorithm may be such that the number of bits stuffed in the third generation frames is dependent on the cumulative sum of the differences in bits (including stuffing bits) between the third generation frames and the corresponding first generation frames, or on any other suitable parameter which indicates the amount of divergence in the number of bits between the frames of the first and third generations. In an embodiment, the number of stuffing bits added to a third generation frame is smaller if the accumulated difference is small than if the accumulated difference is large. Thus 'spare bits' can be accumulated where the differences between third and first generation frames are small, to be used where those differences are large.
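One possible rule of this kind is sketched below. It is not taken from the specification: the per-frame cap `max_stuff_per_frame`, the function name and the sign convention (first generation size minus third generation data size) are assumptions chosen only to show stuffing that grows with the accumulated difference.

```python
def stuff_third_generation(gen1_bits, gen3_data_bits, max_stuff_per_frame):
    """Return the stuffing bits to add to each third generation frame.

    gen1_bits:       total bits (including stuffing) of each generation 1 frame
    gen3_data_bits:  data bits of the corresponding generation 3 frame
    max_stuff_per_frame: hypothetical cap on stuffing per frame
    """
    accumulated = 0  # running sum of (generation 1 - generation 3) differences
    stuffing = []
    for g1, g3 in zip(gen1_bits, gen3_data_bits):
        accumulated += g1 - g3
        # Small accumulated difference gives little stuffing, large gives more.
        # A negative value means generation 3 is ahead, so nothing is stuffed.
        s = min(max(accumulated, 0), max_stuff_per_frame)
        stuffing.append(s)
        accumulated -= s  # stuffed bits close part of the gap
    return stuffing


example = stuff_third_generation([1000, 1000, 1000], [900, 1050, 700], 200)
```

With these inputs the second frame overshoots its first generation size, so no stuffing is added there and the deficit is carried forward in the accumulator.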

For a better understanding of the present invention, reference will now be made by way of example to the accompanying drawings, in which:

Figure 1 is a time chart illustrating the splicing of a bitstream B0 to a bitstream A0 in accordance with a first example of the present invention;

Figure 2 illustrates a portion R of the chart of Figure 1 in more detail;

Figure 3 is a schematic block diagram of an illustrative signal processor according to the present invention and operating as illustrated in Figures 1 and 2;

Figure 4 is a time chart illustrating the splicing of a bitstream B0 to a bitstream A0 in accordance with a second example of the present invention;

Figure 5 illustrates a portion R of Figure 4 in more detail;

Figure 6A is a schematic block diagram of an illustrative signal processor according to the invention and operating as illustrated in Figures 4 and 5;

Figure 6B shows a modification of Figure 6A;

Figure 7 shows illustrative GOPs in display order and the application of illustrative picture type decisions;

Figure 8 shows the GOPs of Figure 4 in processing order;

Figure 9 shows an example of variation of VBV occupancy during a splicing operation;

Figure 10 illustrates drift in VBV occupancy; and

Figure 11 illustrates VBV occupancy for an illustrative embodiment of the present invention.

Transcoding Parameters

I frames have the following transcoding parameters, which are well known in MPEG: DCT_Type, Q and Q_Matrix. These parameters are reused in embodiments of the present invention described hereinbelow when recoding I frames with reuse of parameters.

P and B frames have the parameters DCT_Type, Q, Q_Matrix, Pred_Type, MB_Mode and Motion Vectors. These parameters are reused in the recoding of P and B frames with reuse of parameters. These parameters are recalculated when fully recoding all frames.

First Example

Referring to Figures 1, 3 and 7, two bitstreams A0 and B0 are MPEG-2 encoded with GOPs comprising 12 frames. In this example the bitstreams A0 and B0 have the same GOP structure, IBBPBBPBBPBB, as shown in Figure 7. However the bitstreams may have any other GOP structure allowed by MPEG-2. The two bitstreams A0 and B0 may have different GOP structures. For ease of explanation it is assumed the bitstreams A0 and B0 have the same GOP structure as shown in Figure 7.

It is desired to replace bitstream A0 by bitstream B0. As shown in Figure 1, initially A0 is provided to the processor. It is routed in the processor P of Figure 3 from input A0 via a delay DA to contact A0 of switch S1, where it is fed unchanged to, for example, a downstream decoder 2. In decoder 2 it is decoded for display.

Downstream decoder 2 may be in, for example, a domestic television receiver.

Processor P may be in a studio.

When an operator decides to splice bitstreams B0 and A0, the operator operates the switch S1 and a switch S2 so that A0 is routed through decoder A, encoder 4 and via contact C of switch S1 to the downstream decoder 2.

The bitstream A0 is decoded in decoder A and re-encoded in encoder 4. The MPEG-2 parameters are derived from the decoder A by a control processor 6 and reused in the encoder 4, so that the decoding and re-encoding is as lossless as possible.

(There may be some loss because the DCT rounding process can cause the DCT process not to be transparent.)

Before the splice point, bitstream B0 is also decoded in decoder B. The processor P has sufficient storage (not shown) associated with decoders A and B to store, for example, 30 compressed frames.

A splice point SPLICE is chosen. At the splice point switch S2 selects the decoder B and the decoded bitstream B0 is fed via the encoder 4 to the contact C of switch S1. In the example of Figures 1 and 7, full recoding, that is without re-use of the MPEG parameters, begins on the bitstream A0 5 frames before the splice point SPLICE. The reason for this will be explained below. After the splice, bitstream A0 is irrelevant except that some frames of A0 after the splice may be needed to decode frames of A0 occurring before the splice.

After the splice point SPLICE, the bitstream B0 is fully recoded for a transition period during which VBV-lock is achieved, as will be explained below. VBV-lock is the occurrence of the VBV value of the recoded bitstream in the transition region equalling the VBV value of the stream B0. Once VBV-lock is achieved, recoding of bitstream B0 continues but with re-use of the MPEG parameters derived from the original bitstream B0.

After a short interval of recoding of B0 with re-use of the MPEG parameters, switch S1 selects contact B0 and thus the original bitstream B0, bypassing the decoder B and encoder 4 via a delay DB. The delays DA and DB are provided to compensate for the signal processing delays in the decoders 1 and 3 and the encoder 4.

Second Example

Referring to Figures 4, 6A and 7, two first generation bitstreams A0 and B0 are MPEG-2 encoded with GOPs comprising 12 frames. In this example the bitstreams A0 and B0 have the same GOP structure, IBBPBBPBBPBB, as shown in Figure 7. However the bitstreams may have any other GOP structure allowed by MPEG-2. The two bitstreams A0 and B0 may have different GOP structures. For ease of explanation it is assumed the bitstreams A0 and B0 have the same GOP structure as shown in Figure 7.

It is desired to replace bitstream A0 by bitstream B0. As shown in Figures 4 and 6A, initially A0 and B0 are decoded in decoders 1 and 3 and recoded as second generation intra frames in recoders 8 and 10. When decoding and recoding A0 and B0 as A1 and B1, the MPEG parameters of all frames of the original bitstreams A0 and B0 are retained in association with the recoded corresponding frames of A1 and B1. Intra frames of A0 and B0 are recoded as intra frames of the bitstreams A1 and B1 using the same parameters they had in A0 and B0. P and B frames of A0 and B0 are recoded as intra frames in A1 and B1 but their original MPEG parameters are retained. The MPEG parameters are retained in the recoded bitstreams A1, B1 as, for example, user data. In addition, the VBV-delay values of the corresponding first generation frames are retained in association with the second generation frames as user data.

When an operator decides to splice the recoded bitstreams B1 and A1, the operator operates the switch S2 so that A1 is routed to a store 12 up to the splice point and B1 is routed to the store 12 after the splice point, so that store 12 stores the spliced bitstream A1/B1 with a splice point between a frame of A1 and a frame of B1. The spliced bitstream A1/B1 is decoded to baseband and re-encoded in an encoder 4 as a third generation GOP C of the form shown in Figure 4.

In a preferred embodiment shown in Figure 6B the bitstreams are stored in respective stores 14 and 16 upstream of the switch S2 before they are spliced. The spliced bitstream A1/B1 is stored in another store 12. The stores 14, 16 and 12 may be intra frame servers, digital Video Tape Recorders and/or disc recorders, for example.

Referring to Figure 4 the splice point SPLICE is indicated. At the splice point switch S2 switches from, for example, bitstream A1 to bitstream B1. When the spliced bitstreams are to be re-encoded they are fed to the encoder 4. In the example of Figures 6A or 6B, full recoding, that is without re-use of the MPEG parameters, takes place in a transition region beginning on the bitstream A1 5 frames before the splice point SPLICE. The reason for this will be explained below.

Before the beginning of the transition region (i.e. more than 5 frames before SPLICE) the bitstream A1 is recoded reusing the MPEG parameters derived from the original bitstream A0. After the splice point SPLICE, the bitstream B1 is fully recoded for the remainder of the transition period during which VBV-lock is achieved, as will be explained below. Once VBV-lock is achieved, recoding of bitstream B1 continues but with re-use of the MPEG parameters derived from the original bitstream B0.

The spliced and recoded bitstream C produced by processor P and encoder 4 is fed to a downstream decoder 2 where the bitstream C is decoded for display for example. Downstream decoder 2 may be in, for example, a domestic television receiver. Processor P may be in a studio.

In Figures 6A and 6B the spliced bitstream A1/B1 is stored in the intra frame store 12 before being recoded. A marker marking the splice point is recorded in the bitstream, for example in the user bits, to allow the recoder 4 to fully recode in the transition region around the splice point. The recoder may include a delay for determining the position of the marker before recoding.

As shown in Figures 1 and 4, for both the first and second examples there is a transitional region which includes the splice point SPLICE, during which the occupancy for the bitstream C is controlled to prevent underflow and overflow of the buffer of the downstream decoder 2.

The methods of control discussed in the following discussion apply equally to both examples. However, it will be appreciated that:

a) in the first example it is the original bitstreams A0 and B0 which are being re-encoded by the encoder 4; whereas

b) in the second example it is the intra frame bitstreams A1 and B1 which are being re-encoded by the encoder 4.

However, the re-encoding of the frames of the bitstreams A1 and B1 is dependent on corresponding frames of the bitstreams A0 and B0 from which they are derived. Thus, in the following, reference is made only to the frames of the original bitstreams A0 and B0. The following discussion applies to the frames of the bitstreams A1 and B1 which correspond to the frames of the bitstreams A0 and B0, with the result that the effect of recoding is the same as if the bitstream had not been recoded as intra frames (apart from some losses due to the additional recoding and decoding to and from intra frames).

Picture Type Decision

The splicing of bitstream B0 to bitstream A0 disrupts the GOP structure. Thus the following rules are applied.

The bitstream, in this example A0, before the splice, is recoded so that:

(1) the last 'I' or 'P' frame before the splice is converted to 'P';

(2) if the last frame before the splice is a 'B' frame, it is converted to 'P'.

The bitstream, in this example B0, after the splice is recoded so that:

(3) the first 'I' or 'P' frame after a splice is converted to 'I'; and

(4) if the first GOP after the splice and after the application of rule (3) contains fewer than three 'P' frames, the 'I' frame of the subsequent GOP is converted to 'P', thereby lengthening the GOP.

In effect, a new GOP begins with an intra frame at the splice, and the new GOP may be made longer than preceding (and succeeding) GOPs in the bitstream B0. The new GOP is in effect a prediction of where VBV-lock is to be achieved. The application of these rules is shown in Figure 7 at (1), (2), (3) and (4).
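The four picture type rules can be sketched on frame-type strings in display order. This is an illustrative reading only: the function name and the representation of a stream as a string of 'I', 'P' and 'B' characters are assumptions, and the "first GOP after the splice" is taken to run from the frame converted by rule (3) up to the next 'I' frame.

```python
def apply_picture_type_rules(before, after):
    """Apply rules (1) to (4) to frame types given in display order.

    before: frame types of the stream before the splice (e.g. stream A)
    after:  frame types of the stream after the splice (e.g. stream B)
    Returns the two converted strings.
    """
    a, b = list(before), list(after)

    # Rule (1): the last 'I' or 'P' frame before the splice becomes 'P'.
    for i in range(len(a) - 1, -1, -1):
        if a[i] in "IP":
            a[i] = "P"
            break

    # Rule (2): if the last frame before the splice is 'B', it becomes 'P'.
    if a and a[-1] == "B":
        a[-1] = "P"

    # Rule (3): the first 'I' or 'P' frame after the splice becomes 'I'.
    start = next((i for i, t in enumerate(b) if t in "IP"), None)
    if start is not None:
        b[start] = "I"
        # Rule (4): if the GOP that now starts at `start` has fewer than
        # three 'P' frames, the next 'I' frame becomes 'P' (longer GOP).
        nxt = next((i for i in range(start + 1, len(b)) if b[i] == "I"), None)
        if nxt is not None and b[start + 1:nxt].count("P") < 3:
            b[nxt] = "P"

    return "".join(a), "".join(b)


new_a, new_b = apply_picture_type_rules("BBIBBPBBPBBPBBIBB", "BBPBBPBBIBBPBBPBBP")
```

In the example the splice falls two B frames after an I frame of stream A and before a P frame of stream B, so all four rules fire.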

In Figure 7, A is bitstream A0, B is bitstream B0, C is the spliced bitstream at output C of encoder 4 as if the picture type decisions had not been made, and C' is the spliced bitstream at output C of encoder 4 with the picture type decisions applied to it.

By application of rule (2), the 'B' frame of A0 immediately before the splice is converted to 'P'. By application of rule (1) the intra frame of A0 before the splice is also converted to 'P'.

By application of rule (3), the first 'P' frame of stream B0 after the splice is converted to intra in stream C'.

By application of rule (4), the GOP of bitstream B0 after the splice has (after conversion of its first 'P' frame to intra) fewer than three 'P' frames. Therefore the next 'I' frame is converted to 'P'. Application of these rules gives a GOP which defines a predicted VBV-lock point, as will be discussed below.

Rule (4) may be changed to (4').

(4') If the first GOP after the splice contains only one 'P' frame, the frame types of the next GOP are altered from 'I' to 'P' and 'P' to 'I' to give two 'P' frames in a GOP. This results in two shorter GOPs between the splice point and VBV-lock.

Processing Order

Figure 7 shows the frames of the bitstreams in the order in which they are displayed or would be displayed. Figure 8 shows the order in which the frames are processed. For example, referring to Figure 7 (Display Order), frames 0, 1 and 2 of bitstream A are shown in that order. Even though the B frames 0 and 1 would be displayed before intra frame 2, they depend on intra frame 2 to be decoded. Thus to decode them intra frame 2 must precede the B frames, as shown in Figure 8. Likewise B frames 3 and 4 of Figure 7 depend on P frame 5 of Figure 7; thus in Figure 8 P frame 5 of Figure 7 becomes P frame 3, preceding the two B frames.
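The reordering from display order to processing order described above can be sketched as follows. The tuple representation of a frame and the function name are assumptions for illustration, and the handling of B frames left over at the end of the input (they are simply flushed) is likewise an assumption.

```python
def display_to_processing_order(frames):
    """Reorder (display_index, frame_type) pairs into processing order.

    Each I or P frame is moved ahead of the B frames that precede it in
    display order, because those B frames depend on it for decoding.
    """
    processing = []
    pending_b = []  # B frames waiting for their following I/P frame
    for frame in frames:
        if frame[1] == "B":
            pending_b.append(frame)
        else:
            processing.append(frame)      # the I or P frame goes first
            processing.extend(pending_b)  # then the B frames that needed it
            pending_b = []
    processing.extend(pending_b)  # assumption: trailing B frames are flushed
    return processing


display = [(0, "B"), (1, "B"), (2, "I"), (3, "B"), (4, "B"), (5, "P")]
processing = display_to_processing_order(display)
```

In the result, display frame 2 (the intra frame) is processed first and display frame 5 (the P frame) takes processing position 3, matching the description of Figures 7 and 8.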

Constant bit rate

The example of the processor P of Figure 3 or 6A or 6B has a constant bit rate. The bitstreams A0, B0 have a fixed bit rate and the encoder 4 produces at output C a constant bit rate.

Downstream Decoder and Buffer

The downstream decoder 2 has a buffer 8. The encoding which takes place in encoder 4 of the processor is arranged so that the buffer 8 of the downstream decoder 2 neither underflows nor overflows. Figures 9 and 11 show the operation of the downstream buffer 8 of the downstream decoder 2. (The encoder 4 has a corresponding buffer and it operates as the inverse of what is shown in Figures 9 and 11.)

The following are known MPEG rate control parameters.

VBV

VBV is virtual buffer verifier. It is a measure of the number of bits that would be in the downstream buffer 8.

Remain_bit_GOP

This is a target number for the total number of bits for the remainder of the current GOP. At the beginning of a GOP it is a target for the whole GOP. It reduces as the GOP progresses.

Complexity X, constants Kp, Kb and N, Np, Nb

N is the number of pictures in a GOP.

Np is the number of P frames remaining in the current GOP.

Nb is the number of B frames remaining in the current GOP.

Kp and Kb are 'universal' estimates dependent on quantisation matrices. They (indirectly) define the relative sizes of intra, P and B frames.

Xi, Xp, Xb are "complexity measures" for intra, P and B frames.

These parameters are used in a known manner to distribute the bits of a GOP amongst intra, P and B frames.

They are further explained in "Test Model 5", published by the "International Organisation for Standardisation, Organisation Internationale de Normalisation, Coded Representation of Picture and Audio Information", ISO/IEC JTC1/SC29/WG11/N0400.
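For orientation, the way these parameters distribute the remaining GOP bits can be sketched with the target-allocation formulas of Test Model 5 as commonly stated (default constants Kp = 1.0 and Kb = 1.4). The function name, argument names and the guard against a zero denominator are assumptions, and the sketch is not presented as the rate control used by the embodiments.

```python
def tm5_targets(r, xi, xp, xb, n_p, n_b, bit_rate, picture_rate, kp=1.0, kb=1.4):
    """Return (Ti, Tp, Tb): target bits for the next I, P or B picture.

    r          : bits remaining for the GOP (Remain_bit_GOP)
    xi, xp, xb : complexity measures for I, P and B pictures
    n_p, n_b   : P and B pictures remaining in the current GOP
    """
    floor = bit_rate / (8 * picture_rate)  # lower bound on any target

    def share(denominator):
        # Assumption: fall back to the floor if no pictures of that kind remain.
        return max(r / denominator, floor) if denominator > 0 else floor

    ti = share(1 + n_p * xp / (xi * kp) + n_b * xb / (xi * kb))
    tp = share(n_p + n_b * kp * xb / (kb * xp))
    tb = share(n_b + n_p * kb * xp / (kp * xb))
    return ti, tp, tb


# Equal complexities and Kp = Kb = 1 give an even split over the 12 pictures.
ti, tp, tb = tm5_targets(1_200_000, 1.0, 1.0, 1.0, n_p=3, n_b=8,
                         bit_rate=4_000_000, picture_rate=25, kp=1.0, kb=1.0)
```

With the default Kb of 1.4 the B picture target falls below the P picture target for equal complexities, which is how the constants indirectly set the relative picture sizes.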

Achieving VBV-Lock

As discussed above, the downstream buffer must neither underflow nor overflow. In MPEG-2 the buffer is normally kept approximately half-full. A discontinuity in the bitstream can make the buffer underflow or overflow. VBV is the measure of buffer occupancy.

Figure 9 shows, as an extreme case, buffer occupancy VBV for two bitstreams A0 and B0. A0 has a typical occupancy and B0 has unusually high occupancy.

In the situation where the bitstream begins with A with typical occupancy, and B with high occupancy is spliced onto A at the splice point, it is necessary to provide a transitional GOP which:

a) provides continuity, albeit changing, of VBV occupancy; and

b) changes the VBV occupancy from the value of stream A just before the splice to a target value which is the value of VBV for stream B.

As shown in Figure 9, the VBV of bitstream C begins identical to A, then changes progressively towards the VBV of B. The point at which the VBV occupancy of C becomes identical to that of stream B is the VBV_lock point.

As mentioned above, Figure 9 shows occupancy of the downstream buffer 8. To achieve VBV-lock the encoder 4 is controlled as follows.

Temporal Reference

For the system of Figure 6, it is sequences of intra frames which are spliced. Thus, it is necessary to make use of the "Temporal Reference" in MPEG to keep track of the frames. The temporal reference is a count incremented at each frame indicating the position of the frame relative to a GOP header. There is normally one GOP header per GOP. For an unchanging GOP structure, the temporal reference identifies the frame type, intra, P or B, or for the second generation intra frames the type from which the intra frame was derived. That information is needed to: apply the picture decision rules for constructing the transitional GOP; determine when the original GOPs end; and determine Remain_bit_GOP, which depends on the number of frames remaining in a GOP.

Illustrative Method of Achieving VBV Lock

The following describes one method of achieving VBV lock. Other methods are described in, for example, co-filed UK Patent Application P7374.GB (I-99-21) and in UK Patent Application No. 9908809.8 (P 99-3). Reference is made to these applications by way of information only.

a) Picture Decision Rules

The methods use the picture decision rules (1) to (4) above.

b) Complexity

For the system of Figure 3 stream A is decoded and re-encoded before the splice. Thus encoding occurs with complexity values appropriate to stream A.

However, these values are not appropriate for stream B. Thus before the splice, complexity values Xi, Xp, Xb of the intra, P and B frames of stream B immediately before the splice are calculated based on

X = S.Q

where
X = complexity value,
S = number of bits generated by encoding the picture,
Q = average quantisation parameter of all macroblocks in a picture.

X = S.Q is a standard equation for rate control in MPEG.

At the splice point these complexity values replace the existing values (of stream A). So after the splice complexity values appropriate to stream B are used.

The complexity values control the distribution of bits amongst intra, P and B frames. Achieving good subjective quality is dependent on the complexity values.

For the system of Figure 6, in which the bitstreams A0 and B0 (generation 1) are recoded as intra frame bitstreams A1, B1 (generation 2), such complexity values are not similarly available. Thus for the system of Figure 6 the complexity estimates for bitstream B of generation 3 are obtained as follows:

a) for P and B frames, from transcoding parameters of the first P and B frames of the first generation stream B after the splice point. The transcoding parameters are retained in association with the frames as discussed above.

b) for intra frames, from any second generation stream B intra frames after the splice point. Preferably the intra frame which is chosen is the frame which will be recoded as an intra frame at generation 3.

These frames are available because of the 3 frame delay involved in reordering frames.

c) Virtual Buffers Modification

The virtual buffers are used to calculate the reference Q scale for each macroblock. Improvement in quality can be gained by setting the virtual buffers to estimated stream B values at the splice point. This ensures that the resulting Q scales are similar to those used in stream B in the previous generation, instead of continuing with stream A Q scales.

For the system of Figure 3, stream B virtual buffer values are estimated for the last intra, P and B frames before the splice point, based on Q.

For the intra frame, the value is calculated as:

estimated_buf_i = (Q × bit_rate) / (31 × frame_rate).

Q is the average quantization parameter.

This value is then forced at the splice point.

This is also done for the P and B virtual buffers in the same way.
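As a worked example of the virtual buffer estimate given above, the following sketch evaluates the expression for illustrative values. The function and argument names are hypothetical, and the numbers are chosen only to make the arithmetic easy to follow.

```python
def estimated_virtual_buffer(avg_q, bit_rate, frame_rate):
    """Estimate a virtual buffer fullness from an average quantiser value.

    Implements estimated_buf = (Q * bit_rate) / (31 * frame_rate), where Q
    is the average quantisation parameter of the frame concerned.
    """
    return (avg_q * bit_rate) / (31 * frame_rate)


# Example: average Q of 15.5 at 5 Mbit/s and 25 frames per second.
buf_i = estimated_virtual_buffer(15.5, 5_000_000, 25)
```

The same function would be called once each for the intra, P and B virtual buffers, using the average Q of the corresponding frame type.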

For the system of Figure 6, such values are not similarly available. Instead, the buffer values are derived separately for intra, P and B frames:

a) for P and B frames, from the transcoding parameters of the first P and B frames of stream B after the splice point in the first generation; and

b) for intra frames, from the intra frames of the second generation stream B.

These frames are available because of the 3 frame delay included for reordering the frames.

For both systems, the encoding in encoder 4 is controlled in accordance with Remain_bit_GOP between splice and VBV-lock, so that the occupancy of the downstream buffer 8 follows a continuous but changing trajectory from before the splice at the bitstream A occupancy to VBV-lock at the bitstream B occupancy. (The control is also used to force the complexity and virtual buffers as described above.)

d) Adjust Remain_bit_GOP

In order to increase occupancy of the downstream buffer as shown in Figure 9, the buffer in the encoder is controlled in accordance with Remain_bit_GOP to output pictures with a smaller number of bits, so that its occupancy decreases. Pictures with smaller numbers of bits are produced by increased compression/coarser quantisation.

If the trajectory is from high occupancy of the downstream buffer to lower occupancy of the downstream buffer, the encoder is controlled in accordance with the higher Remain-bit-GOP to increase occupancy of its buffer, producing larger pictures by less compression/finer quantisation.

Remain-bit-GOP is the target for the number of bits remaining in the GOP.

The length of the transitional GOP is known from the result of adjusting the GOP length using the picture decision rules.

The value of Remain_bit_GOP is reset to zero at the splice point. The value of Remain_bit_GOP is recalculated in the normal manner for the new transitional GOP following the splice point. The length of the transitional GOP is determined by the picture decision rules. The value of Remain_bit_GOP at the splice is changed by a value VBV_diff where

VBV_diff = VBV_C_splice - VBV_B_next_I_or_P.

That is, Remain_bit_GOP is initially set to the sum of [the normal allocation of bits for the GOP] and [the difference between (the VBV value of stream C at the splice) and (the VBV value at the first I or P frame following the splice of the stream B)]. This is a prediction of the VBV of the bitstream B at the VBV_lock point. The value of Remain_bit_GOP is then reduced by a factor α < 1 for the transitional GOP.

The factor reduces Remain_bit_GOP for the transitional GOP by an empirically determined amount, e.g. 5% or less. It is assumed herein that α is fixed. It may be varied.
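The initial value of Remain_bit_GOP for the transitional GOP can be sketched as below. The sketch assumes that α is the fractional reduction (for example 0.05 for 5%), so that the adjusted target is multiplied by (1 - α); the function and argument names are hypothetical.

```python
def initial_remain_bit_gop(normal_allocation, vbv_c_splice,
                           vbv_b_next_i_or_p, alpha=0.05):
    """Initial bit target for the transitional GOP after the splice.

    normal_allocation  : normal allocation of bits for the GOP
    vbv_c_splice       : VBV value of stream C at the splice
    vbv_b_next_i_or_p  : VBV value of stream B at its first I or P frame
                         following the splice
    alpha              : assumed fractional reduction leaving spare bits
    """
    vbv_diff = vbv_c_splice - vbv_b_next_i_or_p
    return (normal_allocation + vbv_diff) * (1.0 - alpha)


# Stream C is at a lower occupancy than stream B, so VBV_diff is negative
# and the transitional GOP is given fewer bits.
target = initial_remain_bit_gop(6_000_000, 900_000, 1_200_000, alpha=0.05)
```

A negative VBV_diff shrinks the target, so the encoder produces smaller pictures and the occupancy of the downstream buffer rises towards that of stream B.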

Remain_bit_GOP is updated at every frame by the number of bits used to encode that frame. In addition the following updating occurs.

At the splice point VBV_diff is based on VBV_C_splice - VBV_B_next_I_or_P. Subsequently, at every succeeding I and P frame of the stream B,

VBV_diff = VBV_B_current_I_or_P - VBV_B_next_I_or_P.

If VBV_diff is positive the change is divided by the number of I and P frames in the remainder of the transitional GOP and the result added to Remain_bit_GOP at each subsequent update. If VBV_diff is negative Remain_bit_GOP is reduced by VBV_diff. In stream C of Figure 9 Remain_bit_GOP is thus updated every 3 frames.

This continues until VBV-lock is achieved. The differences VBV-diff can be calculated because the frames are stored.

VBV_Lock Point

The initial value of Remain_bit_GOP is reduced by the factor α. α is chosen to reduce the value of Remain_bit_GOP so that spare bits are available at VBV_lock. The spare bits allow additional bits ("stuffing bits") to be added to achieve exact lock. The need for this will be explained hereinbelow.

The period over which VBV-lock is predicted to be achieved is one GOP, albeit a GOP the length of which may have been changed by the picture type decision rules. In this example it is about 30 frames.

Referring to Figures 2, 5, 7 and 8, assume lock is achieved at I-frame 52 in display order (Figure 7). In fact the VBV lock is achieved in the processing order so that it occurs at reordered I frame 50 (Figure 8). The following B frames 51 and 52 in Figure 8 are fully recoded frames from prior to VBV-lock and disturb the lock. Thus the spare bits are used at the second B frame 52 to stuff the bitstream to achieve exact lock.

If the factor α is zero, then no spare bits are available, so the system attempts to achieve exact lock at the I frame immediately after the end of the transitional GOP. Rate control under- or over-steers, usually producing too many bits. Even if exact lock is achieved at the I frame, lock is disturbed at the B frames. So Remain_bit_GOP is reduced by the factor α so that rate control oversteers and spare bits are available at the end of the GOP. The spare bits are used to achieve exact lock at the second B frame.

The I-frame 50 is processed by reusing its parameters derived from the original bitstream B0. For both the systems of Figures 3 and 6A or 6B, after the fully recoded B frames 51 and 52, re-use of parameters resumes. In the case of the system of Figure 3, re-use of parameters continues for a short time until the original bitstream is directed directly to the output of the processor, bypassing the decoding and recoding.

Motion Vectors

Motion vectors are regenerated for the transitional GOP.

Alternatively motion vectors could be estimated from vectors in neighbouring frames.

The foregoing description describes splicing (and is the subject of cofiled UK Patent Applications P7372 1-99-19 and of 9908809.8), which are mentioned here only as a matter of information. However, referring for example to Figures 6A and 6B, a long GOP, e.g. A0, may be decoded by decoder 1 to baseband, recoded by recoder 8 to all I-frames, stored in an I frame store such as 14 and/or 12 without being subjected to processing such as splicing, and re-encoded as a long GOP by coder 4.

Whilst the processes described above make as much re-use as possible of the transcoding parameters of the original bitstreams A0, B0 to minimise any full recoding and to maximise picture quality of the re-encoded bitstream, it has been found that, even without splicing, VBV occupancy may drift. This will be explained with reference to Figure 10.

Figure 10 assumes, for simplicity, that a long GOP A0, also referred to herein as generation one or Gen 1, is recoded as all I frames (generation two, Gen 2) and Gen 2 is recoded as a long GOP (generation three, Gen 3), with re-use of the transcoding parameters of Gen 1. There is no splicing. Figure 10 shows Gen 1 superimposed on Gen 3. It is apparent that VBV occupancy (of the downstream buffer 8) for Gen 3 drifts away from that for Gen 1 even though, assuming fully transparent recoding, the VBVs should be identical.

The same effect occurs to a lesser extent in the case of the system of Figure 3 for the bitstreams A0 and B0 which are recoded with re-use of the transcoding parameters outside the transitional GOP and shown as "Reuse A" and "Reuse B" in Figure 1. For convenience, the following discussion will refer only to the system of Figures 6A and 6B, but it also applies to the system of Figure 3.

It is believed that the drift occurs because even though, for example, I frames are not fully recoded from Gen 1 to Gen 2 to Gen 3 but instead their transcoding parameters (I, Q, Q-matrix) are re-used, they are decoded to baseband and recoded, including DCT and inverse-DCT processes which introduce errors which tend to accumulate. In addition, P and B frames are recoded as I frames with different Q scale and then back to P and B frames, which involves loss of information and therefore different numbers of bits. The drift is likely to increase with successive generations: more than three generations of recoding may take place.

One solution to the drift is to monitor the drift, and if it becomes excessive, fully recode the bitstreams until occupancy is again within desired limits. However, that may reduce picture quality.

It will also be appreciated that Figure 10 shows occupancy of the downstream buffer 8. The occupancy of the upstream buffer in coder 4 is the opposite to that of buffer 8. Thus a downward drift as shown in Figure 10 corresponds to an increase in the number of bits in the frames of Gen 3.
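The relationship between frame size and downstream buffer occupancy can be illustrated with a simplified constant-bit-rate model. This sketch is an assumption-laden illustration: it ignores buffer size limits and start-up delay, and the function name and all numbers are hypothetical.

```python
def vbv_occupancy(frame_sizes, bits_per_frame_period, start=0):
    """Downstream (decoder) buffer occupancy after each frame is removed.

    Simplified model: the buffer fills by a constant number of bits per
    frame period and drains by the size of each decoded frame.
    """
    occupancy, trace = start, []
    for size in frame_sizes:
        occupancy += bits_per_frame_period - size
        trace.append(occupancy)
    return trace


gen1 = vbv_occupancy([400, 100, 100], bits_per_frame_period=200, start=500)
gen3 = vbv_occupancy([410, 105, 105], bits_per_frame_period=200, start=500)
drift = [g3 - g1 for g1, g3 in zip(gen1, gen3)]
```

Here the Gen 3 frames are slightly larger than the Gen 1 frames, so the occupancy trace for Gen 3 falls progressively below that for Gen 1, which is the downward drift described above.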

The following sets out four illustrative methods in accordance with the present invention for at least reducing the drift. These methods are applied to Gen 1 and Gen 3, i.e. where Gen 1 is the original long GOP A0 or B0 of Figures 1 to 3, or of Figures 4 to 6A and 6B, and Gen 3 is the recoded long GOP of these Figures.

Method 1, which is also used in Methods 2 to 4 subject to modification.

This method seeks to allow the numbers of bits of the frames of Gen 3 (and subsequent Generations) to substantially equal the numbers of bits of the corresponding frames of Gen 1.

1) At the encoding of Gen 1, at the beginning of each GOP, Remain_bit_GOP, which is the target for the number of bits in a GOP, is reduced by P%.

P% is in the range from about 2% to about 8%. Preferably P% is 5%. The number of bits is then made up to the original unreduced Remain_bit_GOP by adding stuffing bits. Reducing Remain_bit_GOP by P% reduces the picture quality slightly.

The stuffing bits are divided equally amongst the I, P and B frames; however, they could be divided unequally between the I, B and P frames.
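Step 1) can be sketched as below. The function name, the 12-frame GOP length and the handling of any remainder (given to the first frames) are assumptions made for illustration; only the P% reduction and the equal division of stuffing bits come from the text.

```python
def gen1_gop_budget(remain_bit_gop, p_percent=5.0, n_frames=12):
    """Split a Gen 1 GOP target into a data target and per-frame stuffing.

    Returns (data_target, stuffing_per_frame) so that
    data_target + sum(stuffing_per_frame) == remain_bit_gop.
    """
    # Reduce the target by P% for picture data.
    data_target = round(remain_bit_gop * (100.0 - p_percent) / 100.0)
    # Make the GOP back up to its unreduced size with stuffing bits.
    stuffing_total = remain_bit_gop - data_target
    per_frame, leftover = divmod(stuffing_total, n_frames)
    # Equal division; leftover bits go to the first frames (an assumption).
    stuffing = [per_frame + (1 if i < leftover else 0) for i in range(n_frames)]
    return data_target, stuffing


data_target, stuffing = gen1_gop_budget(1_200_000, p_percent=5.0, n_frames=12)
```

With a 1,200,000-bit GOP and P% = 5%, the data target is 1,140,000 bits and each of the 12 frames carries 5,000 stuffing bits.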

2) If there is a Gen 2 (comprising only I-frames), all the frames of the long GOP of Gen 1 are recoded (decoder 1, 3 and recoder 8, 10 as described above) to I-frames of Gen 2.

The I frames of Gen 1 are recoded as I-frames in Gen 2 with re-use of the original transcoding parameters. The P and B frames are fully recoded as I-frames. The original Gen 1 transcoding parameters of the Gen 1 frames are retained in association with the corresponding Gen 2 I-frames.

There is no additional bit stuffing in Gen 2.

3) Assuming there is a splicing operation, a) bitstreams A0 and B0 of Gen 1 are spliced as described above with respect to Figures 1 to 3, or b) bitstreams A1 or B1 of Gen 2 are spliced as described above with respect to Figures 4 to 6A or 6B.

In either case a) or b), the bitstreams are fully recoded in the transition region. That is, in the transitional GOP, as described with reference to Figures 1 to 3 or 4 to 6A/6B, Remain_bit_GOP is reduced by the factor α, and stuffing bits are used to allow the transitional GOP to achieve precise lock with the bitstream B. Remain_bit_GOP may or may not be reduced by P% and stuffing bits added as in Generation 1.

4) The bitstream A (either A0 or A1) before the transition region and the bitstream B (either B0 or B1) after the transition region are recoded in coder 4 with the re-use of the transcoding parameters of Gen 1.

The processor calculates the number of bits (including the stuffing bits) of the frames in Gen 1 from, for example, the value of VBV_delay, which is a measure of the occupancy of the downstream buffer. The VBV values are information known at the time of encoding Gen 1 and may be included as user data which is preserved throughout the generations in association with the frames.

The coder 4, in encoding Gen 3 from either A0/B0 or A1/B1, stuffs the encoded bitstreams to the same size as in Gen 1, so that the frames of Gen 3 have the same size as the corresponding frames of Gen 1.

If decoding and recoding between Gen 1 and Gen 3 (including Gen 2 if used) causes the number of data bits in the frames to increase, the increase in data bits uses up some of the stuffing bits added at Gen 1. If in subsequent generations the data bits increase further, more of the stuffing bits are used up. As a consequence, even though the data bits increase, buffer occupancy, i.e. VBV, remains constant.
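The way data growth consumes the Gen 1 stuffing can be sketched as follows. The function name and the sample sizes are hypothetical, and clamping at zero when the data outgrows the Gen 1 frame size is an assumption; the text does not say what happens in that case.

```python
def gen3_stuffing(gen1_frame_bits, gen3_data_bits):
    """Stuffing needed for each Gen 3 frame to reach its Gen 1 size.

    gen1_frame_bits: Gen 1 frame sizes, including Gen 1 stuffing bits.
    gen3_data_bits:  Gen 3 data bits for the corresponding frames.
    """
    # Clamp at zero: stuffing cannot be negative (assumed behaviour when the
    # data has grown beyond the whole Gen 1 frame size).
    return [max(0, g1 - g3) for g1, g3 in zip(gen1_frame_bits, gen3_data_bits)]


# Gen 1 frames of 100,000 + 5,000 and 40,000 + 2,000 bits (data + stuffing).
gen1_sizes = [105_000, 42_000]
# The data bits have grown slightly by Gen 3.
stuff3 = gen3_stuffing(gen1_sizes, [101_000, 40_500])
```

In this example the Gen 3 frames need only 4,000 and 1,500 stuffing bits, fewer than in Gen 1, so the total frame sizes and hence the buffer occupancy are unchanged.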

Methods 2 to 4

1. Step 1) of Method 1 is modified in that P%, the reduction in Remain_bit_GOP, is replaced by a small reduction Y%, where Y% is in the range from about 1% to about 5%, preferably 2.5%.

2. If there is a step 2) it is the same as in method 1.

3. If there is a step 3) it is the same as in method 1.

4. As in method 1, the bitstream A (either A0 or A1) before the transition region and the bitstream B (either B0 or B1) after the transition region are encoded in coder 4 with re-use of the transcoding parameters of Gen 1.

The coder 4 in producing Gen 3 bit stuffs the recoded long GOP with the stuffing bits.

Assuming there has been no growth in the number of data bits, the coder 4 stuffs Gen 3 with Y%, e.g. 2.5%, of stuffing bits. However, in these methods 2 to 4 the actual percentage of stuffing bits in a frame is varied in accordance with a number called "bit_accumulation".

"Bit_accumulation" is the sum of the changes in size of the frames from the beginning of encoding of the third generation Gen 3 up to the current frame, compared to the sum of the sizes of the corresponding frames of the corresponding GOP in Gen 1. The sizes are calculated in bits and include, for Gen 1, the data bits and the Gen 1 stuffing bits and, for Gen 3, the Gen 3 data bits and the Gen 3 stuffing bits.

Thus for a frame n in a GOP in Gen 3 called Fn3:

Frame_n_size_1 is the number of data bits in the corresponding frame Fn1 in Gen 1.

Stuff_n_1 is the number of stuffing bits added to Fn1 in Gen 1.

Frame_n_size_3 is the number of data bits in Fn3.

Stuff_n_3 is the number of stuffing bits added to Fn3 in Gen 3.

Then the change in size Δn is

$$\Delta_n = (\text{Frame\_n\_size\_1}) - (\text{Frame\_n\_size\_3}) + (\text{Stuff\_n\_1}) - (\text{Stuff\_n\_3}).$$

Bit_accumulation is the sum of Δn for n = 1 to C-1:

$$\text{bit\_accumulation} = \sum_{n=1}^{C-1} \Delta_n,$$

where C is the current frame number counting from the beginning of encoding. That is, bit_accumulation is cumulative throughout the encoding of generation 3. Bit_accumulation is positive when Gen 3 uses fewer bits than Gen 1.
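The definitions above translate directly into a running sum. The sketch below is illustrative only: the function name and the representation of each frame as a (data_bits, stuffing_bits) pair are assumptions, and the sample numbers are invented.

```python
def bit_accumulation(gen1_frames, gen3_frames):
    """Cumulative size difference between Gen 1 and Gen 3 frames.

    Each frame is a (data_bits, stuffing_bits) pair; the lists hold the
    frames already encoded (frames 1 to C-1 in the notation of the text).
    """
    total = 0
    for (data1, stuff1), (data3, stuff3) in zip(gen1_frames, gen3_frames):
        # Change in size for this frame: Gen 1 total minus Gen 3 total.
        total += (data1 - data3) + (stuff1 - stuff3)
    return total


gen1 = [(100_000, 5_000), (40_000, 2_000)]
gen3 = [(101_000, 3_500), (40_500, 1_800)]
acc = bit_accumulation(gen1, gen3)
```

The first frame contributes +500 bits and the second -300 bits, so the accumulation is +200 bits, meaning Gen 3 has so far used fewer bits than Gen 1.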

Let a number G be the number of bits allocated to a whole GOP.

Let Byte_stuff be the percentage of stuffing bits added to a GOP in Gen 3.

Byte_stuff is always greater than or equal to zero.

Method 2

A1) The coder 4 stuffs Gen 3 by a nominal 2.5% of G, equally divided between I, P and B frames.

A2) If bit_accumulation < -5% of G, then Byte_stuff reduces by 0.5% of G.

B2) If bit_accumulation > +5% of G, then Byte_stuff increases by 0.5% of G.

Method 3

A1) The coder 4 stuffs Gen 3 by a nominal 2.5% of G as in Method 2.

It uses B2) as in Method 2 but replaces A2) by A3), where A3) is: if bit_accumulation < 0% of G, then Byte_stuff reduces by 0.5% of G.

Method 4

A1) The coder 4 stuffs Gen 3 by a nominal 2.5% of G as in Method 2.

A4) If bit_accumulation < 0% of G, then Byte_stuff changes to 1/2x its current value.

B4) If bit_accumulation > +5% of G, then Byte_stuff changes to 2x its current value.

In methods 2 to 4 the given values of bit_accumulation and Byte_stuff are only examples. Other values may be used. They may be determined empirically.
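The three rule sets can be compared side by side in a short sketch. The function name and signature are hypothetical, the thresholds are the example values given above, and the text does not state how often the rule is evaluated, so the caller is assumed to invoke it once per update.

```python
def update_byte_stuff(byte_stuff, bit_accumulation, gop_bits, method):
    """Return the new Byte_stuff (percent of G) under Method 2, 3 or 4.

    byte_stuff:       current stuffing percentage, nominally 2.5
    bit_accumulation: cumulative Gen 1 minus Gen 3 size difference, in bits
    gop_bits:         G, the number of bits allocated to a whole GOP
    """
    if method not in (2, 3, 4):
        raise ValueError("method must be 2, 3 or 4")
    acc_percent = 100.0 * bit_accumulation / gop_bits
    # Method 2 reduces below -5% of G; Methods 3 and 4 reduce below 0%.
    lower = -5.0 if method == 2 else 0.0
    if acc_percent < lower:
        # Methods 2 and 3 step down by 0.5%; Method 4 halves.
        byte_stuff = byte_stuff / 2 if method == 4 else byte_stuff - 0.5
    elif acc_percent > 5.0:
        # Methods 2 and 3 step up by 0.5%; Method 4 doubles.
        byte_stuff = byte_stuff * 2 if method == 4 else byte_stuff + 0.5
    # Byte_stuff is always greater than or equal to zero.
    return max(0.0, byte_stuff)
```

For instance, with G = 1,000,000 bits and a bit_accumulation of -10,000 bits (-1% of G), Method 2 leaves Byte_stuff at 2.5%, Method 3 lowers it to 2.0% and Method 4 halves it to 1.25%.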

In methods 2 to 4, the stuffing bits are not used to cause Gen 3 to have in each frame exactly the same number of bits as in Gen 1. Instead, the numbers of stuffing bits are varied so that the numbers of bits in the frames in Gen 3 generally track the numbers of bits in the frames of Gen 1. That is, the numbers of bits in the frames of Gen 3 and Gen 1 may not be the same, but they are not on diverging trends as shown for example in Figure 10.

Figure 11 shows an example of occupancy of the downstream buffer 2 for Method 4. Figure 11 is for a splicing operation as carried out by a system according to Figures 4 to 6A or 6B including an I-frame generation Gen 2.

Line C is Gen 3, which tracks frames 0 to 23 of bitstream A (Gen 1) up to the splice point, and then tracks bitstream B (Gen 1) from frame 45 onwards, but on occasions diverges slightly from B, as in frames 59 to 83.

Methods 1 to 4 described above are examples only. The invention may use any suitable method of controlling bit stuffing in Generation 3 according to a predetermined algorithm dependent on bit accumulation or any other suitable parameter which indicates the amount of divergence in the number of bits between the frames of the first and third generations.

Claims

1. A signal processor for encoding a video signal to produce a compressed bitstream comprising a Group of Pictures (GOP), the processor being arranged to produce a GOP having a number of bits (Remain_bit_GOP) of which a predetermined percentage (P% of Remain_bit_GOP) are stuffing bits.
2. A processor according to claim 1, wherein the GOP comprises at least one frame.
3. A processor according to claim 1 or 2, wherein the GOP comprises intra (I) and inter frames.
4. A processor according to claim 3, wherein the stuffing bits are distributed equally amongst the frames of the GOP.
5. A processor according to claim 3, wherein the stuffing bits are unequally distributed amongst the intra and inter frames of the GOP.
6. A signal processing system comprising a first signal processor according to any preceding claim for producing a first generation GOP, a first intra-frame encoder for recoding all the frames of the first generation GOP as intra-frames of a second generation, and a further signal processor arranged to recode the second generation intra-frames as a recoded third generation GOP comprising intra and inter frames, and to add stuffing bits to the frames of the third generation GOP according to a predetermined algorithm so that the third generation frames at least tend towards having the same number of bits as the corresponding first generation frames.
7. A system according to claim 6 wherein the transcoding parameters of at least the first generation intra frames are reused in recoding those frames in the second and third generations.

8. A system according to claim 7, wherein the transcoding parameters of the first generation inter frames are retained in association with the corresponding second generation frames and reused when recoding those frames in the third generation.
9. A system according to claim 6, wherein the frames of the first generation GOP have transcoding parameters, the intra-frame encoder retains the transcoding parameters in association with the corresponding intra frames produced by the intra-frame encoder and the further signal processor recodes the intra-frames in accordance with the transcoding parameters to produce recoded third generation frames having the same transcoding parameters as the corresponding frames of the first generation.
10. A system according to claim 6, further comprising a second signal processor according to any one of claims 1 to 4 for producing another first generation GOP, a second intra-frame encoder for recoding all the frames of the GOP produced by the second processor as intra frames of another second generation GOP, a splicer for splicing the intra frames produced by the first and second intra-frame encoders at a splice point to produce a spliced intra-frame bitstream, wherein the further signal processor is arranged to recode the spliced intra-frames as a third generation, there being a transition region including the splice point, and in which the occupancy of the spliced bitstream varies from the occupancy of the bitstream produced by the first processor to the occupancy of the bitstream produced by the second processor, and wherein the further signal processor is arranged to add stuffing bits to the third generation recoded frames outside the transition region according to a predetermined algorithm so that the recoded frames outside the transition region at least tend towards having the same numbers of bits as the corresponding frames produced by the first and second signal processors.
11. A system according to claim 10, wherein the transcoding parameters of at least the first generation intra frames are reused in recoding those frames in the second generation and in the third generation outside the transition region.

12. A system according to claim 11, wherein the transcoding parameters of the first generation inter frames are retained in association with the corresponding second generation frames and reused when recoding those frames in the third generation outside the transition region.
13. A system according to claim 10, wherein the frames of the first generation GOP have transcoding parameters, the intra-frame encoder retains the transcoding parameters in association with the corresponding intra frames produced by the intra-frame encoders and the further processing means recodes the intra-frames outside the transition region in accordance with the transcoding parameters to produce third generation recoded frames having the same transcoding parameters as the corresponding frames of the first generation.
14. A signal processing system according to claim 6, 7, 8 or 9, comprising intra-frame storage means for storing the second generation intra frames.
15. A system according to claim 14, wherein the intra-frame storage means comprises: an intra-frame video tape recorder; an intra-frame video disk recorder; and/or an intra-frame server.
16. A signal processing system comprising a first signal processor according to any one of claims 1 to 4, a second signal processor according to any one of claims 1 to 4, a splicer for splicing a GOP produced by the second processor to a GOP produced by the first processor at a splice point to produce a spliced bitstream, and a further signal processor arranged to recode the spliced bitstream in a recoding region around the splice point, the recoding region including a transitional region including the splice point in which the occupancy of the spliced bitstream varies from the occupancy of the bitstream produced by the first processor to the occupancy of the bitstream produced by the second processor, and wherein the further signal processor is arranged to add stuffing bits to the recoded frames outside the transitional region according to a predetermined algorithm so that the recoded frames at least tend towards having the same number of bits as the corresponding frames produced by the first and second processors.
17. A signal processing system according to claim 16, wherein the frames of the GOPs produced by the first and second processors have transcoding parameters and the further processing means recodes the frames outside the transition region but in the recoding region in accordance with the transcoding parameters to produce recoded frames having the same transcoding parameters as the corresponding frames produced by the first and second processors.
18. A system comprising a signal processor according to any one of claims 1 to 4, for producing a first generation GOP having intra and inter frames, means for decoding the first generation GOP to a second generation of frames, and a further signal processor for recoding the second generation as a third generation GOP having intra and inter frames.
19. A system according to claim 18, wherein the frames of the first generation GOP have transcoding parameters and the parameters are retained in association with the frames in the second generation and the further signal processor reuses the transcoding parameters in recoding the third generation frames.
20. A signal processing system according to any one of claims 6 to 19, wherein stuffing bits are added to the said recoded third generation frames so that the numbers of bits of the third generation frames (including stuffing bits) substantially equal the numbers of bits (including stuffing bits) of the said corresponding first generation frames.
21. A signal processing system according to any one of claims 6 to 20, wherein the further processor comprises means for stuffing the third generation frames with bits according to a predetermined algorithm so that the number of bits thereof (including stuffing bits) at least tends towards the number of bits (including stuffing bits) of the corresponding first generation frames.
22. A system according to claim 21, wherein the algorithm is such that the number of bits stuffed in the third generation frames is dependent on the cumulative sum of the differences in bits (including stuffing bits) between the third generation frames and the corresponding first generation frames.
23. A system according to claim 21, wherein: if
v = percentage of stuffing bits added to the first generation GOPs,
Fn is a current frame of a current third generation GOP,
Δn is the difference in size (in bits including stuffing bits) between Fn and the corresponding first generation frame,
bit_accumulation is $\sum_{n=1}^{C-1} \Delta_n$, where C is the frame number of the current frame counting from the beginning of encoding of the third generation,
G is the number of bits allocated to the current GOP, and
Byte_stuff is the percentage of stuffing bits added to the current GOP and distributed amongst its frames;
the predetermined algorithm is:
A1) Byte_stuff = v unless
A2) if bit_accumulation < -X% of G then Byte_stuff reduces by Y% of G
B2) if bit_accumulation > +Z% of G then Byte_stuff increases by W% of G.
24. A system according to claim 23, wherein |X| = |Z| and |Y| = |W|, |X|, |Z|, |Y| and |W| being greater than zero.
25. A system according to claim 23, wherein X = 0; |Z|, |Y| and |W| are greater than zero, and |W| = |Z|.
26. A system according to claim 24 or 25, wherein |Z| = |2Y| and |W| = |2Y/10|.
27. A system according to claim 23, 24, 25 or 26, wherein v = 2.5%.
28. A system according to claim 21, wherein the predetermined algorithm is:
A1) Byte_stuff = v unless
A4) if bit_accumulation < V% of G then Byte_stuff reduces by a factor R
B4) if bit_accumulation > T% of G then Byte_stuff increases by a factor R
where |T| > |V| ≥ 0, R > 1 and Byte_stuff and bit_accumulation are as defined in claim 22.
29. A system according to claim 28, wherein R = 2, T = 5 and V = 0.