WO2014201428A1

WO2014201428A1 - Error resilient multicast video streaming over wireless networks

Info

Publication number: WO2014201428A1
Application number: PCT/US2014/042420
Authority: WO
Inventors: Daniel Perrine Mclane; Chunmei KANG; Brian NUTTER; Bian LI
Original assignee: Chirp, Inc.
Priority date: 2013-06-13
Filing date: 2014-06-13
Publication date: 2014-12-18

Abstract

Systems, devices, and methods for determining a reconstructed frame (990) from a set of source packets (910) and where a set of error-correcting packets may be determined, encoded (930), and interleaved (940) with the set of source packets (910) by a first device. The interleaved packets may then be transmitted and collected by a second device within a set frame period. In one embodiment, if a received set of input packets (945) by the second device does not comprise the set of source packets (910) and the total size of the received set of input packets (945) is equal to the size of the set of source packets (910), then the second device may regenerate lost source packets using error-correcting codes (960) from the received set of input packets (945). In addition, if the size of the received set of input packets is less than the size of the set of source packets, then the second device may drop the received set of input packets (945) or invoke a concealment process (980) on the received set of input packets (945). Accordingly, a set of schemes may be aggregated for packet erasure protection comprising packet level forward error correction (930, 960), packet interleaver and deinterleaver (940, 950), and an audio concealment scheme (980).

Description

NON-PROVISIONAL PATENT APPLICATION

TITLE: ERROR RESILIENT MULTICAST VIDEO STREAMING OVER

WIRELESS NETWORKS

INVENTORS: Daniel Perrine McLane, Chunmei Kang, Brian Nutter, and Bian Li

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to and the benefit of Provisional Patent Application No. 61/834,812 filed June 13, 2013, and Provisional Patent Application No. 62/002,839 filed May 24, 2014, the contents of which are hereby incorporated by reference herein in their entirety and for all purposes.

FIELD OF ENDEAVOR

The present invention relates to a multicast video streaming system, and more particularly, to compressing and decompressing videos, and delivering multicast video streams over wireless networks.

BACKGROUND

Demand for video streaming to a large number of users at crowded events, for example, sporting and music events, have increased dramatically in recent years with the proliferation of smartphones and Wi-Fi networks. Solutions have evolved for handling such video distribution, particularly using multicast. Multicast is a one-to- many distribution scheme, allowing for a data stream to be sent to a plurality of devices in a single transmission from the source. A disadvantage of multicast is that it does not support retransmission. Accordingly, the client devices must use other strategies to overcome dropped, damaged, or out-of-order multicast packets.

In current video streaming systems, video encoding and decoding systems, i.e., CODECs use significant inter-coding techniques to remove the similarities between frames to achieve low bit rate transmission. The disadvantage is that, once a packet is lost it may affect correct decoding of many subsequent frames that rely on the information contained in the lost packet. The error propagation time may be up to a few seconds before the video can be restored to normal display. As a result, the clients may experience frequent intermittence which degrades the quality of user experience. The current systems have to implement sophisticated schemes to repair or conceal the damaged frames, which are at the cost of computational power and system latency.

SUMMARY

Exemplary method embodiments may comprise the steps of (i) determining, by an encoder, a set of source packets based on partitioning a received encoded input frame, where the partitioning may be based on a determined size of the set of source packets; (ii) determining, by the encoder, a set of error-correcting packets based on encoding the set of source packets via a packet-level forward error correction encoding scheme; (iii) interleaving, by the encoder, the determined set of source packets and the determined set of error-correcting packets to generate a set of output packets for transmission, where the set of output packets may be transmitted based on a frame size; (iv) receiving, by a decoder, a set of input packets from the transmitted set of output packets within a frame period; (v) if size of the received set of input packets is less than the determined size of the set of source packets, then performing at least one of: dropping the received set of input packets, and invoking a concealment process on the received set of input packets; and (vi) if the received set of input packets does not comprise the entire set of source packets and the total size of the received set of input packets is equal to the determined size of the source packets, then regenerating lost source packets using error-correcting codes from the received set of input packets, and (vii) determining a reconstructed frame by performing forward error correction decoding on the received set of input packets.

In some embodiments the method may further comprise determining a set of video frames and a set of audio frames from the received encoded input frame.

Optionally, the step of performing at least one of: dropping the received set of input packets, and invoking a concealment process on the received set of input packets may be based on whether the received set of input packets are audio packets or video packets. In addition, the encoder may encode each video frame and audio frame independently.

In other embodiments, the determined size of the source packets may be the number of packets in the received input frame. The encoding and decoding may be via a backward coding of wavelet trees scheme or via a line-based implementation of backward coding of wavelet trees scheme. Optionally, the determined size of the source packets may be calculated based on a bit rate associated with the frame period. The set of output packets may be transmitted based on a frame size and the frame size may be fixed based on the partitioning of the received input frame. In another embodiment, the step of: determining, by an encoder, a set of source packets, may be further based on an equal partitioning of the received encoded input frame to determine the number of packets. Optionally, the receiving a set of input packets may further comprise where the receiving may be via a packet deinterleaver and further comprise putting the input packets in a chronological order. Additionally, the putting of the input packets in chronological order may further be based on the source packets.

Exemplary system embodiments may comprise (a) an encoding device comprising a processor and memory, the encoding device configured to: (i) encode a plurality of received input frames, the received input frames comprising video frames and audio frames; (ii) determine a set of source packets based on partitioning the received encoded input frame, where the partitioning may be based on a determined size of the set of source packets; (iii) determine a set of error-correcting packets based on encoding the set of source packets via a packet-level forward error correction encoding scheme; (iv) interleave the determined set of source packets and the determined set of error-correcting packets to generate a set of output packets for transmission, where the set of output packets may be transmitted based on a frame size; (b) a decoding device comprising a processor and memory, the decoding device configured to: (i) receive, from the encoder, a set of input packets from the

transmitted set of output packets within a frame period; (ii) if the received set of input packets does not comprise the entire set of source packets and the total size of the received set of input packets is equal to the determined size of the source packets, then (A) regenerate lost source packets using error-correcting codes from the received set of input packets, and (B) determine a reconstructed frame by performing forward error correction decoding on the received set of input packets; and (iii) if size of the received set of input packets is less than the determined size of the set of source packets, then (A) drop the received set of input packets if the received set of input packets comprise video packets; and (B) invoke a concealment process on the received set of input packets if the received set of input packets comprise audio packets. Other exemplary method embodiments may comprise the steps of: encoding, by an encoder, a plurality of received input frames, the received input frames comprising video frames and audio frames; determining, by the encoder, a set of source packets based on partitioning the received video frames having a determined size; determining, by the encoder, a set of error-correcting packets based on encoding the source packets via a packet- level forward error correction scheme; interleaving, by the encoder, the determined source packets and the determined error-correcting packets to produce an output frame to be transmitted, where the output frame size may be a fixed size based on the partitioning of the received video frame; collecting, by a decoder, packets within the received output frame until the determined size of the source packets is received; processing, by the decoder, the output frame based on whether the number of source packets received is less than the determined size of the set of source packets, where the processing may comprise at least one of: regenerating lost data using error-correcting codes from received frames, dropping the received output frame, and invoking a concealment process on the received output frame; and determining, by the decoder, whether to drop audio frames via an audio frame concealment if the received input stream is failed to be repaired using the forward error correction scheme.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which:

FIG. 1 depicts an exemplary structure of the proposed video streaming system; FIG. 2 depicts single level 2-D wavelet decomposition;

FIG. 3 depicts hierarchical trees in multi-level decomposition;

FIG. 4 depicts a 2-D Backward Coding of the Wavelet Tree (BCWT) coding unit;

FIG. 5 depicts an exemplary zero tree unit;

FIG. 6 depicts the memory assignments for line -based BCWT coding;

FIG. 7 depicts an exemplary line-based BCWT encoding process;

FIG. 8 depicts an output order of a bit stream with a 4-level Discrete Wavelet Transform (DWT) transform; and

FIG. 9 depicts an exemplary structure of packet erasure protection;

FIG. 10 depicts, in a top-level flowchart, an exemplary method of implementation of the video CODEC; and

FIG. 11 depicts an exemplary top level functional block diagram of a computing device embodiment.

DETAILED DESCRIPTION

The disclosed video streaming system may utilize a memory and

computationally efficient intra frame coding scheme. Embodiments disclose an efficient and cost-effective delivery of video streams to a densely packed audience through a lossy and noisy wireless channel where the video CODEC may encode each video frame independently. The compression occurs by removing the spatial redundancy within the same frame, whereby no temporal processing is performed. This arrangement of the video CODEC enables the error propagation time— due to dropped packets— to be restricted to within one frame. The scheme exploited in the video CODEC may in some embodiments be based on backward coding of wavelet trees (BCWT). The system in these embodiments may include a wavelet transform and a one pass backward coding algorithm.

In an embodiment using the BCWT coding scheme, a method may begin processing the video frames using a 2-dimensional wavelet transform, followed by a BCWT encoder, where the encoder comprises a processor and memory. In this embodiment, the implementation of both lossy and lossless BCWT is utilized. In another embodiment, a method describes a line -based implementation of BCWT (L- BCWT). Line-based implementation of BCWT tightly couples the BCWT with the line-based DWT (L-DWT), resulting in a memory-efficient video CODEC that is suitable for real-time applications such as the ones running on mobile devices.

The encoded video frames produced using the above embodiments, together with the audio frames, may be partitioned into packets for network delivery. The disclosed video streaming system may aggregate a set of schemes to protect the video streams from being damaged by network packet loss. The schemes may include, but are not limited to, the following aspects: First, redundant forward error correction (FEC) packets may be sent aside the source video/audio packets for the receiver to repair dropped or damaged packets due to the imperfection of network transmission. Second, the source packets and FEC packets may then be interleaved to combat the effect of bursty losses in a wireless channel. Lastly, audio frames that have failed to be repaired using FEC packets may be concealed using an audio concealment scheme.

BCWT Intra Frame CODEC

BCWT intra frame CODEC is a wavelet tree based coding technique. The method starts with a wavelet transform followed by a one pass backward coding of the wavelet tree. Wavelet-tree-based coding techniques may widely be used in image CODECs because of the excellent energy-compacting capability and high flexibility in terms of scalability in resolution and distortion. A common feature of such algorithms is to exploit the wavelet tree structure to build dependencies between wavelet coefficients so as to effectively encode the coefficients.

However, when implementing the wavelet-tree-based algorithms, large memory consumption from both the wavelet transform and the coding algorithms may be required. For example, the straightforward implementation of two dimensional (2- D) discrete wavelet transform (DWT) must hold the complete image in its buffer. Besides wavelet transform, most wavelet-tree-based coding algorithms themselves may require significant memory to store temporary data. The coding procedure requires multiple scans of the same wavelet tree as the algorithm passes through different quantization steps. As a result, these algorithms must retain all the data required to complete these repeated scans.

One exemplary implementation may be to use a line -based wavelet transform. Instead of buffering the entire image for the DWT to operate, the line-based DWT can start computing the wavelet coefficients with only a few lines of the image present in the buffer. This significantly alleviates the burden of memory usage due to the wavelet transform and makes the wavelet-based image compression algorithm more realistic for implementation.

In an exemplary implementation of the line-based forward wavelet transform, the algorithm outputs the lowest level wavelet coefficients first, while most wavelet tree-based algorithms start encoding the wavelet trees from the highest wavelet level. Therefore, the algorithm has to buffer the wavelet coefficients from lower levels until sufficient coefficients from the highest level are available in order to begin the encoding process. As a result, the system memory consumption remains high and the potential advantages of the line-based wavelet transform cannot be exploited.

The exemplary video CODEC system, using wavelet-tree-based coding algorithms, provides a memory efficient system while still supporting such features as: low complexity and resolution scalability.

In some embodiments, the exemplary video CODEC system may utilize a wavelet- tree-based coding algorithm where instead of "forward" coding of wavelet trees from the highest level, i.e., lowest resolution, this CODEC is based on backward coding of the wavelet tree (BCWT) allowing it to efficiently work with line-based wavelet transforms without the need for large memory to store the wavelet coefficients from lower levels. This backward coding feature makes significant reduction of the overall system memory possible compared to other wavelet-tree-based CODECs. BCWT algorithm is a one pass coding scheme, no repeated scan of the wavelet tree is required, and thus even lower computation complexity and faster encoding time may be achieved compared to other wavelet-tree-based algorithms. FIG. 1 depicts an exemplary embodiment of a video CODEC system 100 which comprises a compressor and a decompressor and where the figure illustrates an exemplary system structure of the video streaming CODEC system 100. The compressor may comprise a wavelet transform 110, a BCWT encoder 140, an output interface 150 with ROI control 152, resolution scalability control 154, and transport protection, i.e., packet erasure protection 155. The output interface 150 may transmit a data stream 156 based on the ROI control 152, scalability control 154, and packet erasure protection 155. The decompressor comprises an input interface 160 with bit stream recovery 166, ROI control 162 and resolution scalability control 164, a BCWT decoder 170, and an inverse wavelet transform 180.

In one embodiment, each component of an input frame 105, e.g., video frame, may be processed by the compressor independently by the subsequent wavelet transform 1 10 and the BCWT encoder 140. The output interface 150 may combine all encoded data into one bit stream, proper reorganization of which may be needed for ROI and scalability control, and may write the resulting bit stream 156 to the output file. If transport protection is enabled via the packet erasure protection 155, redundant information is added to the bit stream to protect it from transmission loss.

Every step in the decompressor may be in the reverse order of the compressor. First the input bit stream 156 may be parsed by the input interface 160. If transport protection is enabled, the data recovery process 166 is invoked. This will repair the bit stream from transmission loss. Then, depending on whether the ROI 162 and resolution scalability control 164 is enabled, full or partial, i.e., the entire bit stream or portions of the bit stream, may be processed by the BCWT decoder 170, and inverse wavelet transform 180. The output frame 195 may then be an output image comprising the reconstructed image pixels. In the exemplary image CODEC embodiment 100, an input signal 107, 185 may be used to select whether a lossy or lossless implementation of the image CODEC is to be implemented. For example, one embodiment of the video CODEC may be lossy compression, where Daubechies 9/7 wavelet may be used in the wavelet transform. These operations are irreversible because round-off errors due to fixed- point operation will be introduced. Another embodiment of the video CODEC is lossless compression, where Daubechies 5/3 wavelet may be used in the wavelet transform block. These operations may be reversible because all transform

coefficients are integers.

The below sections provide a detailed overview of the wavelet tree structure and the methodology to backward coding of the wavelet tree.

Wavelet tree structure

FIG. 2 illustrates, in a functional block diagram, a single level 2-D wavelet decomposition 200. As way of explanatory examples, the wavelet transform may implement a 2-D wavelet decomposition to the input image 205, including one horizontal direction wavelet decomposition followed by one vertical direction wavelet decomposition. The output wavelet coefficients may be split into four subbands as depicted in FIG. 2.

The LL subband 210 may comprise the low frequency wavelet coefficients, the LH subband 220 may comprise the vertical details, the HL subband 230 may comprise the horizontal details, and the HH subband 240 may comprise the diagonal details in the image. In one embodiment, a multi-level wavelet transform may be implemented by iteratively decomposing the LL band. The resulting subbands may form a wavelet tree structure, where the top (LL subband) of the wavelet tree may comprise the coarsest information of the image and the bottom of the wavelet tree may comprise the finest information of the image.

In the exemplary wavelet tree structure, each wavelet coefficient in a coarser subband may be the parent of four children arranged in the form of a two by two (2x2) block in the finer subband immediately below it. FIG. 3 depicts a wavelet tree structure 300 where each arrow points from a parent to its group of four children. In some embodiments, hierarchical trees may be formed in the decomposition with parent wavelet coefficients constituting nodes of the tree.

The mechanism of BCWT coding algorithm.

1. Building the map of maximum quantization level of wavelet tree

descendants

In an exemplary embodiment of the BCWT algorithm, each wavelet-tree node may store the maximum quantization levels of its descendants (MQD). The MQD map may contain all the MQD of the wavelet tree nodes. In one embodiment, if the wavelet transformed image is, for example, w x h in size, the MQD map is w/2 x h/2 in size. In one embodiment, the MQD may be mainly used to select the transmitted bits from the wavelet coefficients. That is, for each wavelet coefficient only the bits from the minimum quantization threshold, up to its MQD node value may be transmitted. Since the MQD tends to be smaller at the bottom of the wavelet tree and gradually increases until reaching the top, the wavelet coefficients may, for example, be encoded with the number of bits roughly proportional to their magnitude. In some embodiments, the MQD map may then be differentially coded and transmitted as side information. 2. BCWT coding unit

FIG. 4 illustrates an exemplary BCWT coding unit 400. The coding process comprises recursive coding of many small branches of the wavelet trees, where these branches and their corresponding MQD map may be denoted as BCWT coding units. FIG. 4 depicts an exemplary BCWT coding unit where each coding unit lies in two consecutive wavelet levels, which include four (4) wavelet coefficients and five (5) MQD map nodes. An exemplary encoding algorithm may comprise the following steps: (a) generate the MQD node in level N+l utilizing the 2x2 block MQD nodes and the 2x2 block wavelet coefficients in level N; (b) encode the 2x2 block wavelet coefficients utilizing all 5 MQD nodes; (c) encode the 2x2 block MQD nodes. There is a recursive coding mechanism for MQD nodes, meaning that MQD used in the current coding unit is generated from coding process of the previous coding unit. In one embodiment, the lowest level MQD nodes may be generated based on the level 1 wavelet coefficients. Since in each coding unit only two levels of wavelet trees are involved, the entire tree need not be stored. Once the coding is finished, most of the data may no longer be needed and may be released from memory. For example, after encoding level N coding unit, the level N MQD and wavelet coefficients are no longer needed, and thus may be discarded where the memory may be used to store new data. Accordingly, a further saving on the memory usage may be achieved via releasing the unneeded memory.

3. One pass backward coding

One embodiment of the BCWT algorithm may implement, for example, a one pass encoding algorithm. That is, the encoding process may be completed with only a one time scan of the wavelet tree. The algorithm starts encoding the coding units from the lowest wavelet level and moves up to the higher wavelet levels. The encoding procedure repeats until the top level is reached. The coefficients in the LL band may then be encoded using a uniform quantization. In the decompressor embodiment, every step in the decoding is in the exact reverse order of the encoding, therefore coefficients and MQD map nodes may be reconstructed from the highest wavelet level to the lowest.

4. BCWT encoding algorithm

The symbols used to describe the wavelet tree structure and the coding unit are defined as:

U : A BCWT coding unit with its root at coordinate^,;).

C_jj : The wavelet coefficient at coordinate

0(i,j) : A set of coordinates of all the offspring of

D(i,j) : A set of coordinates of all the descendants of

L(i,j) = D (i,j)— 0 (i,j) : A set of coordinates of all the leaves of

The quantization levels associated with the wavelet tree are defined as

\ ^ci,j \ — l^;The quantization level of the coefficient c_t .

q_mi_n : The minimum quantization threshold. q_mi_n ≥ 0. Any bits below q_mi_n will not be present at the encoder output.

The node in the MQD map is denoted as τη_ί -, representing the maximum quantization level of all the descendants of the wavelet coefficient c_t , where is in level 2 or higher subbands (this is because coefficients in level 1 subbands do not have descendants).

A few operations involved in the encoding algorithm are:

B(x) : binary code of L| x|J, e.g., fl (5.5) = 101

T(ri) binary code with a single one at the wth right-most bit (n≥ 0), e.g., T(0) = 00000001, 7(5) = 00100000

section of binary code b, starting from the wth and ending at the mth right-most- bit (m≥ n≥ 0), e.g., ΟΟΟΟΟΟΙθ β = 000; OOl lOl lO ^ = 1101

For an N level 2-D wavelet decomposition, there should be 3N+1 subbands, which are HL_t, LH_t, HH_t, HL₂, LH₂, HH₂ HL_N, LH_N, HH_N , and LL_N . LL_N is further divided into four small subbands HL_N+1, LH_N+1, HH_N+1 and LL_N+1. The

HL_N+1, LH_N+1, and HH_N+1 subbands in LL_N are used to encode the HL_N, LH_N, and HH_N subbands. In one embodiment, the notion S_N is made to represents the entirety of the three high frequency subbands at level N. Therefore, S₁ represents HL_t, LH_t, and HH_t subbands. S_N represents HL_N, LH_N, and /H/_wsubbands. S_N+1 represents HL_{N+ 1}, LH_{N+ 1}, and HH_{N+ 1} subbands. With all of the above notations and operations the encoding may be summarized below:

1. Encode level 1 high frequency subbands

1.1. V(iJ)€¾

1.1.1. V(r, s) G 0 (i,j . m _j = max_{( s)e0(iJ)}{q _s}

1.1.2. if ^mi,j — q-min-. V(r, s) £ 0 (i,j) \ 1.1.2.1. Output B(c_rs)

Qmin

1.1.2.2. If \c_{r s}\ > 0, output sign (c_{r s}) Encode level 2 to level N-1 high frequency subbands:

2.1. V(i,;) ES_n(n = 3,4 N-1)

2.1.1. q_{L{i )} = max_(rs)e0(iJ){m _s}

2.1.2. rrii = max{max_{( s)e0(iJ}){q_r,s}, q_L(i_j)}

2.1.3. lim _j≥ V(r,s) G 0(i ):

2.1.3.1. Output B c_TiS ) I

2.1.3.2. If |c _s| > 0, output sign (c _s)

"H.j

2.1.3.3. Output r(q_L(u)) max((?_i(i_J),i?_min)

2.1.3.4. If¾_(iJ)≥ Hmini

2.1.3.4.1. Output r(m _s)

max(m_rs,q_min)

2.2. n = n+ 1. If < N - 1, go to step 2.1.

Encode level Ν high frequency subbands:

3.1. qmax = max{max_(rs)eSjv{q_r,s),max₍₇._jS)eSiV{m_r)S) 3.1.1. Output B(q_max) \ \

3.2. V(i,;) G S_n+1

3.2.1. q_{L(i )} = max_{iriS)e0( n}{m_r>s}

3.2.2. m _j = max{max_{( s)e0(iJ)}{q_r,s},i7_L(ij)}

3.2.3. Output (mi Qmax

T ) max^ij^min)

3.2.4. If rrii ≥ q_min, V(r,s) G 0(i,j):

3.2.4.1. Output B{c_rs)

Qmin

3.2.4.2. If |c _s| > 0, output sign (c _s)

3.2.4.3. Output T(q_{L(i )}) 3.2.4.4. If ≥ Vrnin, V(r,s) G 0(i,;)

3.2.4.4.1. Output r(m _s)

max(m_rs,(7_min) 4. Encode LL_N subband:

4- 1 - max — ^max r,s)eLL_N{^clr,s}

4.1.1. Output B(q_max) \ ³ _Q

4.2. If q_max≥ q-min, V(i,_/) G LL_N+1

4.2.1. V r, s) G 0 (i,j) m_tij = max_{(r s)e0{i )}{q_r>s}

4.2.3. if m ≥ q_min, V(r, s) G 0 (i,;^'):

4.2.3.1. Output B cr_iS)

4.2.3.2. If |c_{r s}| > 0, output sign (c_{r s})

According to the coding steps, the algorithm starts processing the coding units at the lowest wavelet level. After all the coding units in that wavelet level are processed, the algorithm moves to the next higher wavelet level. Level 1 to N-l subbands may be encoded differently from level N high frequency subbands. A simple uniform quantization may be used to process the coefficients in the LL subband.

The decoding algorithm may be in the exact reverse order of the encoding algorithm. That is, the BCWT algorithm decodes the MQD map nodes and the wavelet coefficients from the highest wavelet level to the lowest wavelet level. For example, first, the LL subband is decoded. After that, level N high frequency subbands are decoded. Lastly, the rest of the high frequency subbands are decoded in the order of

5. Zero Tree Detection Scheme

FIG. 5 depicts an exemplary zero tree unit 500 showing normal nodes and those nodes with a flag_0. In one embodiment, a zero tree unit defines a wavelet-tree node that has all of its descendants having quantization level less than q_mi_n. If a zero tree unit is detected, all coding units within the zero tree structure are skipped without encoding. All corresponding MQDs are set to -1. Only the topmost MQD of the zero tree is encoded and transmitted. In the decoder, if m =— 1 ,which is decoded from the upper level wavelet tree— for the current coding unit— then all four wavelet coefficients are filled with zero and all four MQD are set to m_{r s} = — 1 without further processing. Line-based BCWT

Another embodiment of the video codec may use a line-based BCWT. In one embodiment, an exemplary line-based BCWT structure is proposed to achieve line- based wavelet tree coding. In an embodiment of the line-based wavelet transform, the image may be read into the buffer, line by line. For each image line in the buffer, a one-level 1 -D horizontal wavelet transform may be performed. When there are sufficient 1-D wavelet transformed lines in the buffer, a one-level 1-D vertical wavelet transform is performed on those lines. After this operation, the first line of each of the four level 1 wavelet subbands (LL_X, LH_X, UL_X, UH_X) is obtained. Then, the first line of each of the LH₁, HL₁, and HH₁ subbands are sent to the BCWT encoder. The first line of the LL₁ subband may be sent to the level 2 buffer to be held for further calculation.

Subsequent to the first line of each of the four subbands being sent, the first two lines in the level 0 buffer may be discarded and two new image lines are read into the buffer. One-level horizontal wavelet transform may again be performed on these two lines followed by a one-level vertical wavelet transform. This generates the second line of each of the four level 1 wavelet subbands. The second line of LH₁, UL_X, HH₁ subbands is sent to the BCWT encoder, and the second line of hh is sent to the level 2 buffer.

This operation is repeated until there are sufficient lines in the level 2 buffer. Similarly, one-level horizontal and vertical transforms are conducted to the level 2 buffer. This yields the first line of each of the four level 2 wavelet subbands (LL₂, LH₂ HL₂, HH₂). The first line of each of the LH₂, HL₂, and HH₂ subbands are sent to the BCWT encoder. The first line of the LL₂ subband is sent to the level 3 buffer. An N- level wavelet transform may then be achieved via repeating these steps. As described above, the BCWT encoder may receive the subbands data from the line-based wavelet transform line by line and in a non-consecutive fashion, meaning that the next line received after a line from level N may be from any level. Accordingly, an N-level buffer is to be used to store the inputs in the BCWT encoder if an N-level wavelet transform is performed.

FIG. 6 shows the memory assignments for an exemplary line -based BCWT coding embodiment 600. In some embodiments, the buffer may be used to store the wavelet coefficients and another buffer may be used to store the MQD map. In one embodiment, as soon as there are two lines in the buffer, the BCWT may start encoding the coefficients, since each level BCWT buffer only needs to hold 2 lines, thus the buffer height is 2. In this exemplary embodiment, the width of the level 1 BCWT buffer is 3w/2, where w is the wavelet transformed image width. The level N BCWT buffer has a width of 3w/2^N . The MQD map starts from level 2, since level 1 wavelet coefficients do not have descendants. The width and height of the MQD map buffer are the same as the MQD map's corresponding level BCWT buffer.

FIG. 7 depicts an exemplary line-based BCWT encoding process 700. In this figure, the gray blocks represent a coding unit, where the 2x2 block of the wavelet coefficients in the level N BCWT buffer and the 2x2 block of the MQD nodes in the level N MQD map buffer may be utilized to generate an MQD node in the level N+1 MQD map buffer and to encode the 2x2 block of wavelet coefficients. Once this process is completed, the encoding process may move to the next coding unit and encode the 2x2 block of wavelet coefficients to the right side of the previously encoded gray 2x2 block of wavelet coefficients in the level N BCWT buffer. After encoding all the coefficients in these two lines in the level N BCWT buffer, a new line for the level N+1 MQD map buffer is generated. After that, these two lines in the level N BCWT buffer may be discarded.

Resolution scalability control FIG. 8 depicts an exemplary output order of a bit stream 800 with 4 level discrete wavelet transform (DWT) transform. In one embodiment, because the bits may be generated in the order of lower level coding units to higher level units of the wavelet tree, the coding scheme is inherently resolution scalable. In this embodiment, after all the wavelet coefficients are encoded, the LL band outputs may be transmitted first, then the high frequency subband outputs may be transmitted from S_N subbands down to S_t subbands. In one embodiment, the encoded MQD map and wavelet coefficients may be interleaved in the output bit stream. The received bit stream may be progressive-of-resolution decodable. That is, the decoder may choose to stop decoding at a certain higher level unit and reconstruct a smaller version of the original image, and resume decoding to get a larger version, until the full resolution is reached. In one embodiment, the progressive-resolution decoder may decode, after a smaller part of the whole file has been received, at a lower quality of the final picture; as more data is received at the decoder and decoded, the quality of the picture may improve monotonically.

Region Of Interest (ROD Control

Two exemplary ROI control schemes may be implemented in the image CODEC system, one on the receiver side and one on the transmitter side. ROI control in the receiver side may allow the user to select any ROI from a low resolution thumbnail of the full image and decode that specific ROI to a desired higher resolution. ROI control in the transmitter side may encode the predefined one or more ROIs with layered resolution, where the resulting bits are much smaller than that of originally encoded high resolution version of the full image. That is, a layered resolution may provide different resolutions at different layers. This technique may be especially suitable for band limited transmission environments.

1. Receiver ROI control

In order to decode an ROI in an image, a wavelet-based CODEC must decode all the relevant wavelet coefficients. In BCWT, due to the wavelet-tree structure, the wavelet coefficients relevant to— or associated with— an ROI correspond to a wavelet-forest, e.g., a set of neighboring wavelet-trees. Encoded data bits representing the wavelet coefficients in a wavelet-forest may be grouped together in the bit stream and may be referred to as a BCWT forest. To decode an ROI, only the BCWT forests with relevant wavelet coefficients may be extracted from the bit stream and decoded. The receiver ROI capability may be obtained by reorganizing the encoded data bits in the bit stream, and the reorganization may happen after the encoding process, thus no tile-boundary artifacts— due to the discontinuity between adjacent BCWT forests— may appear. 2. Transmitter ROI control

Broadcasting live videos are receiving more and more attention due to the rapidly increased free wireless coverage for users. However, due to limited bandwidth, high resolution videos may not always be available. Intermittent videos or low resolution videos are certainly less entertaining. In most cases, the user may only be interested in a particular region of the video while care little or even nothing about the video outside of that region. For example, at a baseball game, fans will want to watch the pitches very closely around the plate to judge balls and strikes, while not caring much about other background details. In these exemplary environments, the proposed ROI control technique allows the user to stream the video with selective resolutions in different regions.

In an environment where streaming of the video with selective resolution is desired, the ROI control technique may be implemented. In one embodiment, after the wavelet transform, an ROI mask may be applied to the transformed image. The ROI mask follows the same wavelet-tree structure but comprises only binary data. Similar to the wavelet transformed image, the ROI mask may be partitioned into BCWT forests, within which, it is partitioned into different decomposition levels. In this exemplary embodiment, all BCWT forests relevant to an ROI in the ROI mask are set to 1. For BCWT forests not relevant to an ROI, only positions within the LL band and the upper decomposition levels are set to 1, all other positions are set to 0. A transmitted ROI embedded image may be generated by multiplying the ROI mask to the wavelet transformed image. Since in the non ROI, i.e., region of no interest, most of the high frequency wavelet coefficients are zeroed by the ROl mask, the associated zero-tree units may be skipped without encoding, and thereby resulting in reduced output bits. In this embodiment, the decoder may not need to take any additional steps in order to decode the image. The decoder therefore decodes the ROl embedded image just like normal images. In the reconstructed image, within the ROl, the image may be viewed with full resolution. Outside of ROl, the image is reconstructed with lower resolution. If desired, the ROl control may be adapted to mask the transformed image with multi-level resolution. For example, the BCWT forests may be partitioned into ROl, non ROl 1, and non ROl 2. The user may then assign different resolution levels to mask non ROl 1 and non ROl 2. Optionally, a few ROIs in the ROl mask, if more than one region is of interest to the users, may be assigned. Packet level erasure protection

FIG. 9 depicts an exemplary functional block structure of a packet erasure protection scheme, where each encoded video frame— received in a bit stream— may comprise a set of packets 910 and exhibit a variable bit stream size. This embodiment describes a method to manipulate such bit steams on the packet level, for example, Real-time Transport Protocol (RTP) packets 920. RTP packets may be formatted and used for delivering audio and video over IP networks, in particular, for

communication systems involving streaming media. In one embodiment, the packets may be encoded along with FEC packets, for example, via an FEC encoder 930, and interleaver 940, upon being transmitted. Interleaving packets of a frame allow for the shuffling of packets across the bit stream, thereby reducing packet loss that may occur in bursts and accordingly creating a more uniform distribution of errors. In the receiver, the system may determine whether a received frame, comprising a received set of input packets 945, may be recoverable or not recoverable depending on the number of erased packets in that frame. A decoder may initially, for example, via a packet deinterleaver 950, perform deinterleaving of the received frame.

Deinterleaving may be done on a temporal basis and function to put the received packets of a frame in the correct chronological order. If a frame comprising one or more erased or lost packets is detected and determined to be recoverable, then packet FEC decoder 960 may be used to repair the frame, thereby generating a reconstructed frame 990. Otherwise, the system may determine to either drop the frame or conceal, for example, via a frame concealment scheme, the frame with previously received frames. A video/audio decoder 970 may be used to decode the frames for either playback or display, via, for example, using a BCWT decoding scheme.

1. Packet level FEC

Error correction encoding, using error-correction codes to detect and correct multiple random symbol errors, may be used in one exemplary system embodiment. The error-correction codes may work on fixed-sized packets. That is, each encoded video frame may be equally partitioned into K source packets, regardless of the size of the bit stream. This plurality of K source packets are packet-level forward error correction encoded to form a plurality of N minus K (N - K) error-correcting packets, namely, FEC packets, where the K source packets and N - K FEC packets are to be transmitted to the receiver. The receiver collects packets within each frame period. In some embodiment, a frame period may be defined as a time interval equal to the reciprocal of the frame rate. If all K source packets are received, the receiver may stop receiving further packets within that frame period, and the FEC decoding process may be skipped. Otherwise, the receiver may continue collecting packets within that frame period until K packets are received, at which point, the receiver may stop receiving further packets within that frame period and the FEC decoding process is invoked. If at the end of the frame period there are less than K packets received, the receiver may determine to either drop the frame or invoke a concealment process for the damaged frame.

2. Packet interleaver

Packets may be interleaved 940 after the FEC encoder 930, where M number of frames make up an interleaver block. In some embodiments, since each frame encloses N number of packets, the interleaver block comprises N multiplied by M (N x M) packets in total. In one exemplary embodiment, a row-column interleaver may be applied to the N x M packets. According to the way the bit stream of each frame is partitioned, the interleaver block size may be fixed and the system latency from the interleaver may also be fixed.

3. Concealment of lost audio frames

Concealment of lost audio frames may be used to mask the effects of packet loss within that audio frame. An audio frame may be dropped if less than K packets are received in that frame. The previously received frame may be used to generate a best match to the lost frame. The last few milliseconds of data from the previous frame may be used as a template, after which point a correlation may be calculated between the template and the rest of the data in the previous frame, yielding a correlation score. The highest score identifies the best match. The data segment after the best match position and before the template may be used to fill the lost packet. If consecutive packets are lost, the data segment may be cyclically played with faded amplitude to each sample until the samples fade to zero. The fading factor may be determined by the system according to the source of the audio. The system may also determine to use silent data to conceal, via frame concealment 980, the lost frames if there is no apparent pattern in the audio data or the network packet loss rate is over a threshold and burst frames losses are over, for example, 64 milliseconds. Embodiments may include an exemplary method of implementation of a

CODEC system 1000, as illustrated in a top-level flowchart of FIG. 10. The exemplary method steps of the system and associated computing devices may comprise the following steps: (a) determining, by an encoder, a set of source packets based on partitioning a received encoded input frame, where the partitioning is based on a determined size of the set of source packets (step 1010); (b) determining, by the encoder, a set of error-correcting packets based on encoding the set of source packets via a packet-level forward error correction encoding scheme (step 1020); (c) interleaving, by the encoder, the determined set of source packets and the determined set of error-correcting packets to generate a set of output packets for transmission, where the set of output packets are transmitted based on a frame size (step 1030); (d) receiving, by a decoder, a set of received input packets from the received output packets within a frame period (step 1040); (e) if size of the set of input packets received is less than the determined size of the set of source packets, then either dropping the set of received input packets or invoking a concealment process on the received output frame, where the performing is based on whether the input packets are audio packets or video packets (step 1050); (f) if the set of received input packets does not comprise the entire set of source packets and the total size of the received input packets is equal to the determined size of the source packets, then regenerating lost source packets using error-correcting codes from the received set of input packets (step 1060); and (g) determining a reconstructed frame by performing forward error correction decoding on the set of input packets of the current frame (step 1070).

Other embodiments may include an exemplary method of implementation of a CODEC system where the exemplary steps of the system and associated computing devices may comprise the following steps: (a) determining, by an encoder having a processor and memory, a set of subbands associated with a received input image of the plurality of input images; (b) determining a set of wavelet coefficients associated with each subband of the plurality of subbands; (c) generating, by the processor, a wavelet tree, the wavelet tree comprising a set of nodes, where each node of the set of nodes of the wavelet tree is associated with a wavelet coefficient of the set of wavelet coefficients; (d) generating, by the processor, a maximum quantization level for a set of descendants of the set of nodes of the wavelet tree; (e) determining, by the processor, if the maximum quantization level is less than a threshold and then setting the wavelet coefficient to zero if it is not, otherwise encoding the maximum quantization level; (f) encoding the set of wavelet coefficients; and (g) transmitting a bit stream, where the bit stream comprises the encoded maximum quantization level and the encoded set of wavelet coefficients.

FIG. 11 illustrates an exemplary top level functional block diagram of a wavelet-based image CODEC system 1 100 comprising a plurality of computing devices. The exemplary operating environment is shown as a computing device 1120 comprising a processor 1124, such as a central processing unit (CPU); an addressable memory 1 127, such as a lookup table having an array; an external device interface 1126, such as an optional universal serial bus port and related processing, and/or an Ethernet port and related processing; an output device interface 1 123; an application processing kernel 1122; and an optional user interface 1 129, such as an array of status lights, and/or one or more toggle switches, and/or a display, and/or a keyboard and/or a pointer-mouse system and/or a touch screen. A user interface may also have at least one user interface element. Examples of user interface elements comprise: input devices including manual input such as buttons, dials, keyboards, touch pads, touch screens, mouse and wheel related devices, and voice and line-of-sight interpreters. Additional examples of user interface elements comprise output devices including displays, tactile feedback devices, and auditory devices. Optionally, the addressable memory may, for example, be: flash memory, Solid State Drive (SSD), EPROM, and/or a disk drive and/or another storage medium. These elements may be in communication with one another via a data bus 1128. An operating system 1 125, such as one supporting the execution of applications, may comprise a processor 1124 which may be configured to execute steps of determining a reconstructed frame from a set of received encoded source packets by using a set of error-correcting packets interleaved with the set of received encoded source packets. Additionally, receiving the interleaved packets within a set frame period and based if the set of received input packets does not comprise the entire set of source packets and the total size of the received input packets is equal to the size of the source packets, then regenerating lost source packets, using error-correcting codes, from the received set of input packets.

It is contemplated that various combinations and/or sub-combinations of the specific features and aspects of the above embodiments may be made and still fall within the scope of the invention. Accordingly, it should be understood that various features and aspects of the disclosed embodiments may be combined with or substituted for one another in order to form varying modes of the disclosed invention. Further it is intended that the scope of the present invention herein disclosed by way of examples should not be limited by the particular disclosed embodiments described above.

Claims

CLAIMS: What is claimed is:

1. A method comprising:

determining, by an encoder, a set of source packets based on partitioning a received encoded input frame, wherein the partitioning is based on a determined size of the set of source packets;

determining, by the encoder, a set of error-correcting packets based on

encoding the set of source packets via a packet-level forward error correction encoding scheme;

interleaving, by the encoder, the determined set of source packets and the determined set of error-correcting packets to generate a set of output packets for transmission, wherein the set of output packets are transmitted based on a frame size;

receiving, by a decoder, a set of input packets from the transmitted set of output packets within a frame period;

if size of the received set of input packets is less than the determined size of the set of source packets, then

performing at least one of: dropping the received set of input packets, and invoking a concealment process on the received set of input packets; and

if the received set of input packets does not comprise the entire set of source packets and the total size of the received set of input packets is equal to the determined size of the source packets, then

regenerating lost source packets using error-correcting codes from the received set of input packets, and

determining a reconstructed frame by performing forward error correction decoding on the received set of input packets.

2. The method of claim 1 further comprising:

determining a set of video frames and a set of audio frames from the received encoded input frame.

3. The method of claim 2 wherein the performing at least one of: dropping the received set of input packets, and invoking a concealment process on the received set of input packets is based on whether the received set of input packets are audio packets or video packets.

4. The method of claim 2 wherein the encoder encodes each video frame and each audio frame independently.

5. The method of claim 1 wherein the determined size of the source packets is the number of packets in the received input frame.

6. The method of claim 1 wherein the encoding and decoding is via a backward coding of wavelet trees scheme.

7. The method of claim 1 wherein the encoding and decoding is via a line-based implementation of backward coding of wavelet trees scheme.

8. The method of claim 1 wherein the determined size of the source packets is

calculated based on a bit rate associated with the frame period.

9. The method of claim 1 wherein the set of output packets are transmitted based on a frame size and the frame size is fixed based on the partitioning of the received input frame.

10. The method of claim 1 wherein the step of:

determining, by an encoder, a set of source packets is further based on an equal partitioning of the received encoded input frame to determine the number of packets.

1 1. The method of claim 1 wherein the receiving a set of input packets further

comprises wherein the receiving is via a packet deinterleaver and further comprises putting the input packets in a chronological order.

12. The method of claim 1 1 wherein the putting the input packets in chronological order is further based on the source packets.

13. A system comprising:

an encoding device comprising a processor and memory, the encoding device configured to:

encode a plurality of received input frames, the received input frames comprising video frames and audio frames;

determine a set of source packets based on partitioning the received

encoded input frame, wherein the partitioning is based on a determined size of the set of source packets;

determine a set of error-correcting packets based on encoding the set of source packets via a packet-level forward error correction encoding scheme;

interleave the determined set of source packets and the determined set of error-correcting packets to generate a set of output packets for transmission, wherein the set of output packets are transmitted based on a frame size;

a decoding device comprising a processor and memory, the decoding device configured to:

receive, from the encoder, a set of input packets from the transmitted set of output packets within a frame period;

if the received set of input packets does not comprise the entire set of source packets and the total size of the received set of input packets is equal to the determined size of the source packets, then regenerate lost source packets using error-correcting codes from the received set of input packets, and

determine a reconstructed frame by performing forward error correction decoding on the received set of input packets; and if size of the received set of input packets is less than the determined size of the set of source packets, then drop the received set of input packets if the received set of input packets comprise video packets; and

invoke a concealment process on the received set of input packets if the received set of input packets comprise audio packets.