WO2003100655A1 - Systems and methods for pile-processing parallel-processors - Google Patents
- Publication number
- WO2003100655A1 (PCT/US2003/016908)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- recited
- data
- processing
- exceptions
- decoder
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/32—Address formation of the next instruction, e.g. by incrementing the instruction counter
- G06F9/322—Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address
- G06F9/325—Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address for loops, e.g. loop detection or loop counter
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/148—Wavelet transforms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30072—Arrangements for executing specific machine instructions to perform conditional operations, e.g. using predicates or guards
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3861—Recovery, e.g. branch miss-prediction, exception handling
- G06F9/3865—Recovery, e.g. branch miss-prediction, exception handling using deferred exception handling, e.g. exception flags
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/13—Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/186—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/436—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/62—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding by frequency transforming in three dimensions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/63—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/91—Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
Definitions
- The present invention relates to data processing.
- Parallel processors are difficult to program for high throughput when the required algorithms have narrow data widths, serial data dependencies, or frequent control statements (e.g., "if", "for", and "while" statements). Three types of parallelism may be used to overcome such problems in processors.
- The first type of parallelism is supported by multiple functional units and allows processing to proceed simultaneously in each functional unit.
- Superscalar processor architectures and very long instruction word (VLIW) processor architectures allow instructions to be issued to each of several functional units on the same cycle.
- The latency, or time for completion, varies from one type of functional unit to another.
- The simplest functions (e.g., bitwise AND) typically complete in a single cycle, whereas a floating add function may take 3 or more cycles.
- The second type of parallel processing is supported by pipelining of individual functional units.
- For example, a floating ADD may take 3 cycles to complete and be implemented in three sequential sub-functions requiring 1 cycle each.
- A second floating ADD may be initiated into the first sub-function on the same cycle that the previous floating ADD is initiated into the second sub-function.
- In this way, a floating ADD may be initiated and completed every cycle even though any individual floating ADD requires 3 cycles to complete.
- The third type of parallel processing available is that of devoting different field-partitions of a word to different instances of the same calculation.
- For example, a 32-bit word on a 32-bit processor may be divided into 4 field-partitions of 8 bits. If the data items are small enough to fit in 8 bits, it may be possible to process all 4 values with the same single instruction.
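- As an illustration of field-partition parallelism, the following minimal sketch adds four 8-bit values packed into one 32-bit word using a single sequence of word-wide operations. The helper names (`pack`, `unpack`, `add_lanes`) and the masking trick are assumptions chosen for illustration; the source does not prescribe a particular method, and each lane here simply wraps modulo 256.

```python
LOW7 = 0x7F7F7F7F  # low 7 bits of each 8-bit field-partition
HIGH = 0x80808080  # top bit of each 8-bit field-partition


def pack(values):
    """Pack four 8-bit values into one 32-bit word (lane 0 in the low byte)."""
    word = 0
    for lane, value in enumerate(values):
        word |= (value & 0xFF) << (8 * lane)
    return word


def unpack(word):
    """Split a 32-bit word back into its four 8-bit field-partitions."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(4)]


def add_lanes(x, y):
    """Add the four lanes of x and y at once, each lane modulo 256.

    The low 7 bits of every lane are added first, so no carry can cross a
    lane boundary; the top bit of each lane is then restored with XOR.
    """
    low = (x & LOW7) + (y & LOW7)
    return (low ^ ((x ^ y) & HIGH)) & 0xFFFFFFFF


print(unpack(add_lanes(pack([1, 200, 255, 16]), pack([2, 100, 1, 16]))))
```

The point of the masking is that one add instruction serves all four data items, provided the items really do fit in 8 bits.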
- While loop unrolling is a generally applicable technique, a specific example is helpful in understanding its benefits.
- Given, for example, Program A below.
- Program B below is equivalent to Program A.
- for n = 0:4:255, { S(n); S(n+1); S(n+2); S(n+3); };
- Program C: for n = 0:4:255, { S1(n); S2(n); S3(n); S4(n); S5(n); S1(n+1); S2(n+1); S3(n+1); S4(n+1); S5(n+1); S1(n+2); S2(n+2); S3(n+2); S4(n+2); S5(n+2); S1(n+3); S2(n+3); S3(n+3); S4(n+3); S5(n+3); };
- for n = 0:4:255, { S1(n); S1(n+1); S1(n+2); S1(n+3); S2(n); S2(n+1); S2(n+2); S2(n+3); S3(n); S3(n+1); S3(n+2); S3(n+3); S4(n); S4(n+1); S4(n+2); S4(n+3);
- Guarded instructions are a facility available on many processors.
- A guarded instruction specifies a Boolean value as an additional operand, with the meaning that the instruction always occupies the expected functional unit but the retention of the result is suppressed if the guard is false.
- The guarded approach suffers a large penalty if, as in Program A', the guards are preponderantly "true" and the "else" clause is large. In that case, all instances pay the large "else" clause penalty even though only a few are affected by it. If one has an operation S to be guarded by a condition C, it may be programmed as guard(C, S);
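- The guard semantics can be imitated in software as a branch-free select: the operation is always computed, and the guard only decides whether its result is retained. The sketch below is an assumption-laden illustration in Python; the three-argument `guard` helper and the clipping example are hypothetical and are not the patent's guard(C, S) form.

```python
def guard(condition, new_value, old_value):
    """Return new_value if condition is true, otherwise old_value, without branching.

    The mask is -1 (all one bits) when the guard is true and 0 when it is false,
    so the result of the guarded operation is either kept or suppressed.
    """
    mask = -int(bool(condition))
    return (new_value & mask) | (old_value & ~mask)


def guarded_clip_negative(data):
    """Replace negative entries with 0; the replacement is computed for every entry."""
    return [guard(x < 0, 0, x) for x in data]


print(guarded_clip_negative([3, -2, 0, -7, 5]))
```

Every element pays for the guarded operation whether or not its guard is true, which is the cost the surrounding text describes.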
- Program A' may be unrolled to Program D' as follows:
- for n = 0:4:255, { S1(n); S1(n+1); S1(n+2); S1(n+3); S2(n); S2(n+1); S2(n+2); S2(n+3); S3(n); S3(n+1); S3(n+2); S3(n+3); S4(n); S4(n+1); S4(n+2); S4(n+3); S5(n); S5(n+1); S5(n+2); S5(n+3); if C(n) then T(I(n)); if C(n+1) then T(I(n+1)); if C(n+2) then T(I(n+2)); if C(n+3) then T(I(n+3)); };
- No T(I(n)) may be executed in 77% of the loop turns, exactly one T(I(n)) may be executed in 21% of the loop turns, and more than one T(I(n)) in only 2% of the loop turns.
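- The quoted percentages are consistent with a simple binomial model. The sketch below assumes that each of the four unrolled conditions is independently true with probability 1/16 (inferred from the later example of about 16 exception records in 256 loop turns); the function name and parameters are illustrative only.

```python
from math import comb


def turn_probabilities(p=1 / 16, unroll=4):
    """Probability that a loop turn runs no, exactly one, or more than one exception."""
    none = (1 - p) ** unroll
    one = comb(unroll, 1) * p * (1 - p) ** (unroll - 1)
    more = 1 - none - one
    return none, one, more


none, one, more = turn_probabilities()
print(f"none: {none:.1%}, one: {one:.1%}, more than one: {more:.1%}")
```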
- An encoder is a process which maps an input sequence of symbols into another, coded, sequence of symbols in such a way that another process, called a decoder, is able to reconstruct the input sequence of symbols from the coded sequence of symbols.
- The encoder and decoder pair together are referred to as a "codec."
- A finite sequence of symbols is often referred to as a string, so one can refer to the input string and the coded string.
- Each symbol of an input string is drawn from an associated, finite, alphabet of input symbols I.
- Each symbol of a coded string is drawn from an associated, finite alphabet of code symbols C.
- Each alphabet contains a distinguished symbol, called the <end> symbol.
- Each and every string terminates in the associated <end> symbol, and the <end> symbol may only appear at the terminal end of a string.
- The purpose of the <end> symbols is to bring the codec processes to an orderly halt. Any method of determining the end of an input or code string can be used to synthesize the effect of a real or virtual <end> symbol. For example, in many applications the length of the input and/or the coded string is known, and that information may be used in substitution for a literal <end> symbol.
- Codecs, as described so far, do not have a practical implementation because the number of input strings (and the number of code strings) is infinite. Without placing more structure and restrictions on a codec, it cannot be feasibly implemented in a finite machine, much less have a practical implementation.
- A finite state transducer (FST) is an automaton that sequentially processes a string from its initial symbol to its terminal symbol <end>, writing the symbols of the code string as it sequences. Information is sequentially obtained from the symbols of the input string and eventually represented in the code string. To bridge the delay between obtaining the information from the input string and representing it in the code string, the FST maintains and updates a state as it sequences.
- The state is chosen from a finite set of possible states called a state space.
- The state space contains two distinguished states called <start> and <finish>.
- The FST initiates its process in the <start> state and completes its process in the <finish> state.
- The <finish> state should not be reached until the <end> symbol has been read from the input string and an <end> symbol has been appended to the code string.
- Because the state space is finite, it is not possible to represent every encoder as an FST.
- The present description focuses on codecs where both the encoder and decoder can be described and implemented as FSTs.
- If the encoder can be implemented as an FST, it can be specified by means of an update function φ.
- The first input symbol a from the input string is combined with the current state s1 and produces the next state s2.
- The first symbol is conditionally removed from the beginning of the input string.
- The produced code symbol b is conditionally appended to the code string.
- φ is undefined if the current state is <finish>, and the FST terminates sequencing.
- φ_s(s1, a) is by definition the first component of φ(s1, a).
- φ_b(s1, a) is by definition the second component of φ(s1, a).
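- A toy encoder makes the update-function description concrete. Everything in the sketch below is hypothetical: the alphabets, the intermediate state `saw0`, and the table `PHI` are invented for illustration and are not the codec of this disclosure. Each table entry gives the next state, whether the input symbol is removed, and the code symbol to append (or None for no output).

```python
START, SAW_ZERO, FINISH = "<start>", "saw0", "<finish>"
END = "<end>"

# (current state, input symbol) -> (next state, remove input symbol?, code symbol or None)
PHI = {
    (START, "0"): (SAW_ZERO, True, None),  # remember the 0, emit nothing yet
    (START, "1"): (START, True, "c"),      # a lone 1 is coded as c
    (START, END): (FINISH, True, END),     # orderly halt
    (SAW_ZERO, "0"): (START, True, "a"),   # the pair 00 is coded as a
    (SAW_ZERO, "1"): (START, True, "b"),   # the pair 01 is coded as b
    (SAW_ZERO, END): (START, False, "d"),  # trailing 0: emit d, leave <end> unread
}


def encode(symbols):
    """Run the transducer from <start> until it reaches <finish>."""
    state, position, code = START, 0, []
    while state != FINISH:  # the update table has no entry for <finish>
        state, remove, out = PHI[(state, symbols[position])]
        if out is not None:
            code.append(out)   # conditional append to the code string
        if remove:
            position += 1      # conditional removal from the input string
    return code


print(encode(list("0010") + [END]))
```

The entry for (`saw0`, <end>) shows both conditional actions at once: a code symbol is appended while the input symbol is left in place for the next step.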
- For each input symbol a and current state s1 there is a probability Prob(a | s1).
- This probability may be stipulated, may be statically estimated from historical data, or may be dynamically estimated from the recent operation of the FST. In this latter case, the information on which the probability estimate is based may be encoded in the state space. From this, one can calculate Prob(s2 | s1).
- The asymptotic state probabilities P(s) can be calculated as the elements of the right eigenvector of the transition matrix M corresponding to the largest eigenvalue, 1.
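- One way to obtain such asymptotic probabilities numerically is power iteration. The sketch below assumes M is stored column-stochastic, with M[s2][s1] = Prob(s2 | s1), and that the chain is irreducible and aperiodic so the iteration converges; the two-state matrix is made-up example data.

```python
def stationary_distribution(M, iterations=200):
    """Approximate the right eigenvector of M for eigenvalue 1 by power iteration.

    M[i][j] is the probability of moving to state i given current state j,
    so every column of M sums to 1.
    """
    n = len(M)
    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        p = [sum(M[i][j] * p[j] for j in range(n)) for i in range(n)]
        total = sum(p)
        p = [x / total for x in p]  # renormalise against rounding drift
    return p


# Hypothetical two-state example: columns are the current state.
M = [[0.9, 0.5],
     [0.1, 0.5]]
print(stationary_distribution(M))
```

For this example the balance condition 0.1 P(0) = 0.5 P(1) gives P = (5/6, 1/6), which the iteration approaches.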
- Video “codecs” are used to reduce the data rate required for data communication streams by balancing between image quality, processor requirements (i.e. cost/power consumption), and compression ratio (i.e. resulting data rate).
- The currently available compression approaches offer a different range of trade-offs, and spawn a plurality of codec profiles, where each profile is optimized to meet the needs of a particular application.
- Lossy digital video compression systems operate on digitized video sequences to produce much smaller digital representations.
- The reconstructed visible result looks much like the original video but may not generally be a perfect match.
- A typical digital video compression system operates in a sequence of stages, comprising a transform stage, a quantization stage, and an entropy-coding stage.
- Some compression systems such as MPEG and other DCT-based codec algorithms add other stages, such as a motion compensation search, etc.
- 2D and 3D Wavelets are current alternatives to the DCT-based codec algorithms. Wavelets have been highly regarded due to their pleasing image quality and flexible compression ratios, prompting the JPEG committee to adopt a wavelet algorithm for its JPEG2000 still image standard.
- When using a wavelet transform as the transform stage in a video compressor, such algorithm operates as a sequence of filter pairs that split the data into high-pass and low-pass components or bands.
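- The simplest filter pair of this kind is the Haar pair, shown below in a reversible integer form. The choice of Haar is an assumption made for brevity; the source does not say which wavelet filters are used, and the function names are illustrative.

```python
def haar_forward(samples):
    """Split an even-length list of integers into low-pass and high-pass bands."""
    assert len(samples) % 2 == 0, "this sketch handles even lengths only"
    low, high = [], []
    for i in range(0, len(samples), 2):
        a, b = samples[i], samples[i + 1]
        d = a - b            # high-pass: difference of the pair
        s = b + (d >> 1)     # low-pass: floor of the pair's average
        low.append(s)
        high.append(d)
    return low, high


def haar_inverse(low, high):
    """Exactly undo haar_forward."""
    out = []
    for s, d in zip(low, high):
        b = s - (d >> 1)
        a = d + b
        out.extend([a, b])
    return out


low, high = haar_forward([10, 12, 7, 7, 0, 255, 3, 2])
print(low, high)
```

Applying the same split again to the low band yields the next level of the sequence of filter pairs.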
- Standard wavelet transforms operate on the spatial extent of a single image, in 2-dimensional fashion. The two dimensions are handled by combining filters that work horizontally with filters that work vertically. Typically, these alternate in sequence, H-V-H-V, though strict alternation is not necessary. It is known in the art to apply wavelet filters in the temporal direction as well, operating with samples from successive images in time. In addition, wavelet transforms can be applied separately to brightness or luminance (luma) and color-difference or chrominance (chroma) components of the video signal.
- This mixed 3-D transform serves the same purpose as a 3-D wavelet transform. It is also possible to use a short DCT in the temporal direction for a 3-D DCT transform.
- The temporal part of a 3-D wavelet transform typically differs from the spatial part in being much shorter.
- Typical sizes for the spatial transform are 720 pixels horizontally and 480 pixels vertically; typical sizes for the temporal transform are two, four, eight, or fifteen frames. These temporal lengths are smaller because handling many frames results in long delays in processing, which are undesirable, and requires storing frames while they are processed, which is expensive.
- A system, method and computer program product are provided for processing exceptions. Initially, computational operations are processed in a loop. Moreover, exceptions are identified and stored while processing the computational operations. Such exceptions are then processed separately from the loop.
- The computational operations may involve non-significant values.
- The computational operations may include counting a plurality of zeros.
- The computational operations may include clipping and/or saturating operations.
- The exceptions may include significant values.
- The exceptions may include non-zero data.
- The computational operations may be processed at least in part utilizing a transform module, quantize module and/or entropy code module of a data compression system, for example.
- The processing may be carried out to compress data.
- The data may be compressed utilizing wavelet transforms, discrete cosine transforms, and/or any other type of de-correlating transform.
- A coder and/or decoder system and method including a variable modulus are also provided.
- The modulus may reflect a steepness of a probability distribution curve associated with a compression algorithm.
- The modulus may include a negative exponential of the probability distribution.
- The probability distribution is associated with a codec.
- The modulus may depend on a context of a previous set of data. Moreover, the modulus may avoid increasing as a function of a run length (i.e., a plurality of identical bits in a sequence).
- The codec may be designed to utilize a minimal computational complexity given a predetermined, desired performance level.
- A system and method are provided for compressing data.
- Luminance data of a frame is updated at a first predetermined rate, while chrominance data of the frame is updated at a second predetermined rate that is less than the first predetermined rate.
- One or more frequency bands of the chrominance data may be omitted.
- The one or more frequency bands may be omitted utilizing a filter.
- Such filter may include a wavelet filter.
- Another system and method are provided for compressing data.
- Such system and method involve compressing video data, and inserting pause information with the compressed data.
- The pause information is used when the video data is paused during the playback thereof.
- The pause information may be used to improve a quality of the played back video data during a pause operation.
- The pause information may include a high-resolution frame.
- The pause information may include data capable of being used to construct a high-resolution frame.
- Figure 1 illustrates a framework for compressing/decompressing data, in accordance with one embodiment.
- Figure 2 illustrates a method for processing exceptions, in accordance with one embodiment.
- Figure 3 illustrates an exemplary operational sequence of the method of Figure 2.
- Figures 4-9 illustrate various graphs and tables associated with various operational features, in accordance with different embodiments.
- Figure 10 is a graph of computational complexity versus performance level, illustrating a relationship between the present dyadic-monotonic (DM) codec framework and other algorithms.
- Figure 11 shows a transition table illustrating an update function for both an encoder and decoder, in accordance with one embodiment.
- Figure 12 illustrates a method for compressing data with chrominance (chroma) temporal rate reduction, in accordance with one embodiment.
- Figure 12A illustrates a method for compressing data with a high-quality pause capability during playback, in accordance with one embodiment.
- Figure 13 illustrates a method for compressing/decompressing data, in accordance with one embodiment.
- Figure 14 shows a data structure on which the method of Figure 13 is carried out.
- Figure 15 illustrates a method for compressing/decompressing data, in accordance with one embodiment.
- Figure 1 illustrates a framework 100 for compressing/decompressing data, in accordance with one embodiment. Included in this framework 100 are a coder portion 101 and a decoder portion 103, which together form a "codec."
- The coder portion 101 includes a transform module 102, a quantizer 104, and an entropy encoder 106 for compressing data for storage in a file 108.
- The decoder portion 103 includes a reverse transform module 114, a de-quantizer 111, and an entropy decoder 110 for decompressing data for use (e.g., viewing in the case of video data).
- The transform module 102 carries out a reversible transform, often linear, of a plurality of pixels (e.g., in the case of video data) for the purpose of de-correlation.
- The quantizer 104 effects the quantization of the transform values, after which the entropy encoder 106 is responsible for entropy coding of the quantized transform coefficients.
- The various components of the decoder portion 103 essentially reverse such process.
- Figure 2 illustrates a method 200 for processing exceptions, in accordance with one embodiment.
- The present method 200 may be carried out in the context of the framework 100 of Figure 1. It should be noted, however, that the method 200 may be implemented in any desired context.
- The computational operations may involve non-significant values.
- The computational operations may include counting a plurality of zeros, which is often carried out during the course of data compression.
- The computational operations may include clipping and/or saturating in the context of data compression.
- The computational operations may include the processing of any values that are less significant than other values.
- Exceptions are identified and stored in operations 204-206.
- The storing may include storing any related data required to process the exceptions.
- The exceptions may include significant values.
- The exceptions may include non-zero data.
- The exceptions may include the processing of any values that are more significant than other values.
- The exceptions are processed separately from the loop. See operation 208.
- Because the exceptions are processed separately, they do not interrupt the "pile" processing of the loop; this enables the unrolling of loops and the consequent improved performance in the presence of branches.
- The present embodiment particularly enables the parallel execution of lengthy exception clauses. This may be accomplished by writing and rereading a modest amount of data to/from memory. More information regarding various options associated with such technique, and "pile" processing, will be set forth hereinafter in greater detail.
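- A small sketch of the method as applied to zero counting: the first loop does the same cheap work for every value and merely stores the non-zero values, and a second loop handles those stored exceptions. The function name, the saturation limit of 127, and the use of clipping as the exception work are assumptions for illustration. Python's `if` is used here only for readability and does not demonstrate the branch-free storing described elsewhere in the text.

```python
def count_zeros_with_exceptions(values, limit=127):
    """Count zeros in one loop; saturate the non-zero values in a separate loop."""
    zero_count = 0
    exceptions = []  # stored records: (position, value)

    # Loop 1: uniform work per value; exceptions are only identified and stored.
    for position, value in enumerate(values):
        is_zero = value == 0
        zero_count += int(is_zero)
        if not is_zero:
            exceptions.append((position, value))

    # Loop 2: the lengthy exception work, run separately from loop 1.
    processed = []
    for position, value in exceptions:
        processed.append((position, max(-limit, min(limit, value))))

    return zero_count, processed


print(count_zeros_with_exceptions([0, 0, 5, 0, -300, 0, 0, 200]))
```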
- The various operations 202-208 may be processed at least in part utilizing a transform module, quantize module and/or entropy code module of a data compression system. See, for example, the various modules of the framework 100 of Figure 1.
- The operations 202-208 may be carried out to compress/decompress data.
- The data may be compressed utilizing wavelet transforms, discrete cosine transforms (DCTs), and/or any other desired de-correlating transforms.
- Figure 3 illustrates an exemplary operation 300 of the method 200 of Figure 2. While the present illustration is described in the context of the method 200 of Figure 2, it should be noted that the exemplary operation 300 may be implemented in any desired context.
- A first stack 302 of operational computations 304 is provided for processing in a loop 306. While progressing through such first stack 302 of operational computations 304, various exceptions 308 may be identified. Upon being identified, such exceptions 308 are stored in a separate stack and may be processed separately. For example, the exceptions 308 may be processed in the context of a separate loop 310.
- A "pile" is a sequential memory object that may be stored in memory (e.g., RAM). Piles may be intended to be written sequentially and to be subsequently read sequentially from the beginning. A number of methods are defined on pile objects.
- Table 1 illustrates the various operations that may be performed to carry out pile processing, in accordance with one embodiment.
- The primary method for writing to a pile is Conditional_Append(pile, condition, record). This method appends the record to the pile if and only if the condition is true.
- Destroy_Pile(P) destroys the pile P by deallocating all of its state variables.
- Program D' (see Background section) may be transformed into Program E' below by means of a pile P.
- Program E' operates by saving the required information I for the exception computation T on the pile P.
- Only the I records corresponding to the exception condition C(n) are written, so that the number (e.g., 16) of I records in P is less than the number of loop turns (e.g., 256) in the original Program A (see Background section).
- The second loop may be more difficult than the first loop because the number of turns of the second loop, while 16 on the average in this example, is indeterminate. Therefore, a "while" loop rather than a "for" loop may be used, terminating when the end of file (EOF) method indicates that all records have been read from the pile.
- Conditional_Append method invocations can be implemented inline and without branches. This means that the first loop is still unrolled in an effective manner, with few unproductive issue opportunities.
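- The two-loop structure described for Program E' might look like the following sketch. It is not the patent's listing: S, C, I and T are passed in as hypothetical callables, the unrolled statements are stood in for by short inner loops, and the pile is a preallocated Python list in which every record is written but the index advances only when the condition holds.

```python
def program_e(n_turns, S, C, I, T):
    """Run the main work S for every n, pile up exception records, then process them."""
    pile = [None] * n_turns  # worst case: every turn is an exception
    index = 0

    # First loop: unrolled by four, with no exception work inside it.
    for n in range(0, n_turns, 4):
        for k in range(4):
            S(n + k)
        for k in range(4):
            pile[index] = I(n + k)        # always write the record
            index += int(bool(C(n + k)))  # keep it only if the condition is true

    # Second loop: a "while" loop, since the number of records is not known ahead.
    size, index = index, 0
    while index < size:                   # stop at end of file
        T(pile[index])
        index += 1
    return size


main_log, handled = [], []
records = program_e(256, main_log.append, lambda n: n % 16 == 0, lambda n: n, handled.append)
print(records, handled)
```

With a condition that is true once in 16 turns, 16 records are piled for 256 loop turns, matching the example figures in the text.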
- P1 = Create_Pile
- P2 = Create_Pile
- P3 = Create_Pile
- P4 = Create_Pile
- for n = 0:4:255, { S1(n); S1(n+1); S1(n+2); S1(n+3);
- Program F' is Program E' with the second loop unrolled.
- The unrolling is accomplished by dividing the single pile of Program E' into four piles, each of which can be processed independently of the others.
- Each turn of the second loop in Program F' processes one record from each of these four piles. Since each record is processed independently, the operations of each T can be interleaved with the operations of the 3 other T's.
- The control of the "while" loop may be modified to loop until all of the piles have been processed. Moreover, the T's in the "while" loop body may be guarded since, in general, all of the piles will not necessarily be completed on the same loop turn. There may be some inefficiency whenever the numbers of records in two piles differ greatly from each other, but the probabilities (i.e., the law of large numbers) are that the piles may contain similar numbers of records.
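- The four-pile arrangement can be sketched as follows. This is an illustrative reading of the description, not the patent's Program F': C, I and T are hypothetical callables, ordinary Python lists stand in for piles, and an `if` stands in for the guard on each T.

```python
def program_f(n_turns, C, I, T):
    """Distribute exception records over four piles, then drain them together."""
    piles = [[] for _ in range(4)]

    # First loop: unrolled position k always appends to pile k.
    for n in range(0, n_turns, 4):
        for k in range(4):
            if C(n + k):
                piles[k].append(I(n + k))

    # Second loop: one record from each pile per turn, until all piles are done.
    cursors = [0] * 4
    turns = 0
    while any(cursors[k] < len(piles[k]) for k in range(4)):
        for k in range(4):
            if cursors[k] < len(piles[k]):  # guard: this pile may already be finished
                T(piles[k][cursors[k]])
                cursors[k] += 1
        turns += 1
    return turns, [len(p) for p in piles]


handled = []
turns, sizes = program_f(256, lambda n: n % 6 == 0, lambda n: n, handled.append)
print(turns, sizes)
```

The example condition is deliberately unbalanced: only two piles receive records, so the loop runs as many turns as the longest pile holds, which is the inefficiency noted above.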
- If T itself contains a lengthy conditional clause T', one can split T' out of the second loop with some additional piles and unroll the third loop.
- Many practical applications have several such nested exception clauses.
- a pile may include an allocated linear array in memory (i.e. RAM) and a pointer, index, whose current value is the location of the next record to read or write.
- the written size of the array, sz, is a pointer whose value is the maximum value of index during the writing of the pile.
- the EOF method can be implemented as the inline conditional (sz ≤ index).
- the pointer base has a value which points to the first location to write in the pile. It may be set by the Create_Pile method.
- guard(condition, index = index + sz_record).
- the record may be copied to the pile without regard to condition. If the condition is false, this record may be overwritten by the very next record. If the condition is true, the very next record may be written following the current record. This next record may or may not be itself overwritten by the record thereafter. As a result, it is generally optimal to write as little as possible to the pile even if that means re-computing some (i.e. redundant) data when the record is read and processed.
- Destroy_Pile deallocates the storage for the pile. All of these techniques (except Create_Pile and Destroy_Pile) may be implemented in a few inline instructions and without branches.
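A pile along these lines might look like the sketch below. It is an illustration under stated assumptions, not the implementation of the description: the class name `Pile`, the method names, and the `rewind` step are hypothetical, each record occupies one list slot (so sz_record is 1), and one spare slot is allocated so that the unconditional write never runs past the array.

```python
class Pile:
    def __init__(self, capacity):
        # Create_Pile: allocate the linear array plus one spare slot (assumption)
        self.buf = [None] * (capacity + 1)
        self.index = 0  # location of the next record to read or write
        self.sz = 0     # written size, fixed when writing ends

    def conditional_append(self, record, condition):
        self.buf[self.index] = record       # copied without regard to condition
        self.index += int(bool(condition))  # guard(condition, index = index + 1)

    def rewind(self):
        # hypothetical helper: remember the written size, restart for reading
        self.sz = self.index
        self.index = 0

    def eof(self):
        return self.sz <= self.index        # the inline conditional (sz <= index)

    def read(self):
        record = self.buf[self.index]
        self.index += 1
        return record
```

A record written with a false condition is simply overwritten by the next write, so only records appended with a true condition are seen when the pile is read back.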
- an alternative to guarded processing is pile processing.
- the "else" clause transfers the input data to a pile in addressable memory (i.e. cache or RAM).
- the pile acts like a file being appended with the input data. This is accomplished by writing to memory at the address given by a pointer. In file processing, the pointer may then be incremented by the size of the data written so that the next write would be appended to the one just completed.
- the incrementing of the pointer may be made conditional on the guard. If the guard is true, the next write may be appended to the one just completed. If the guard is false, the pointer is not incremented and the next write overlays the one just completed.
- the pile may be short and the subsequent processing of the pile with the "else" operations may take a time proportional to just the number of true guards (i.e. false if conditions) rather than to the total number of instances.
- the trade-off is the savings in "else" operations vs. the extra overhead of writing and reading the pile.
- processors have special instructions which enable various arithmetic and logical operations to be performed independently and in parallel on disjoint field-partitions of a word.
- the current description involves methods for processing "bit-at-a-time" in each field-partition.
- the 8 bits of a field-partition are chosen to be contiguous within the word so the "adds" can be performed and "carries" propagate within a single field-partition.
- the commonly available arithmetic field-partition instructions inhibit the carry-up from the most significant bit (MSB) of one field-partition into the least significant bit (LSB) of the next most significant field-partition.
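The carry-inhibiting behavior can be emulated with ordinary word instructions. The sketch below is an illustration, not taken from the description: it assumes a 32-bit word of four 8-bit field-partitions, and `partitioned_add` is a hypothetical name for a well-known masking technique.

```python
WORD = 0xFFFFFFFF   # 32-bit word (assumed)
MSB = 0x80808080    # "1" in the MSB of each 8-bit field-partition
LOW = WORD ^ MSB    # the seven low bits of each field-partition


def partitioned_add(a, b):
    """Add four 8-bit fields independently; no carry crosses a field boundary."""
    low = (a & LOW) + (b & LOW)            # carries stop at each field's MSB
    return (low ^ ((a ^ b) & MSB)) & WORD  # fold the MSBs in without carry-out
```

For example, the field 0xFF plus 0x01 wraps to 0x00 inside its own field instead of incrementing the neighbouring field, as a plain word add would.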
- the array c may need an extra guard index at the end. The user knows whether or not to discard the last value in c by inspecting the final value of i.
- processors that have partitioned arithmetic often have ADD instructions that act on each field independently. Some of these processors have other kinds of field-by-field instructions (e.g., partitioned arithmetic right shift which shifts right, does not shift one field into another, and does copy the MSB of the field, the sign bit, into the just vacated MSB).
- Some of these processors have field-by-field comparison instructions, generating multiple condition bits. If not, the partitioned subtract instruction is often pressed into service for this function. In this case, a < b is computed as a − b, with a minus sign indicating true and a plus sign indicating false. The other bits of the field are not relevant. Such a result can be converted into a field mask of all 1's for true or all 0's for false, as used in the example in C) of Table 2, by means of a partitioned arithmetic right shift with a sufficiently long shift. This results in a multi-field comparison in two instructions.
- a field mask can be constructed from the sign bit by means of four instructions found on all contemporary processors. These are set forth in Table 3.
- a partitioned zero test on a positive field x can be performed by x + 0x7fff so that the sign bit is zero if and only if x is zero. If the field is signed, one may use x | (x + 0x7fff). The sign bit can be converted to a field mask as described above.
- condition that all fields are zero can be tested in a single instruction by comparing the total (un-partitioned) word of fields to zero.
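The comparison and sign-to-mask steps can be emulated without partitioned instructions, as in the sketch below. It assumes a 32-bit word holding two 16-bit fields. The function names are hypothetical, and the four-instruction sequence in `sign_to_mask` (and, shift, subtract, or) is one possible choice, not necessarily the one in Table 3. The comparison is only meaningful when the field difference does not overflow the signed field range.

```python
B = 16             # bits per field-partition (assumed)
WORD = 0xFFFFFFFF  # 32-bit word holding two fields (assumed)
MSB = 0x80008000   # "1" in the sign bit of each field
LOW = WORD ^ MSB   # all non-sign bits


def partitioned_sub(a, b):
    """Field-by-field a - b; no borrow crosses a field boundary."""
    diff = (a | MSB) - (b & LOW)
    return (diff ^ ((a ^ b ^ WORD) & MSB)) & WORD


def sign_to_mask(x):
    """Turn each field's sign bit into all 1's or all 0's."""
    s = x & MSB         # and: isolate the sign bits
    t = s >> (B - 1)    # shift: sign bit moved to the field's LSB
    return (s - t) | s  # subtract, or: 0x8000 - 1 = 0x7FFF, then 0xFFFF


def less_than_mask(a, b):
    """Field mask that is all 1's where a < b."""
    return sign_to_mask(partitioned_sub(a, b))
```

With fields (5, 9) compared against (7, 3), only the first field satisfies a < b, so only that field of the mask is all 1's.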
- a zero word except for a "1" in the MSB position of each field-partition is called MSB.
- a zero word except for a "1" in the LSB position of each field-partition is called LSB.
- the number of bits in a bit-partition is B. Unless otherwise stated, all words are unsigned (Uint) and all right shifts are logical with zero fill on the left.
- a single information bit in a multi-bit field-partition can be represented in many different ways.
- the mask representation has all of the bits of a given field-partition equal to each other and equal to the information bit.
- the information bits may vary from one field-partition to another within a word.
- Another useful representation is the MSB representation.
- the information bit is stored in the MSB position of the corresponding field-partition and the remainder of the field-partition bits are zero.
- the LSB representation has the information bit in the LSB position and all others zero.
- ZNZ representation where a zero information bit is represented by zeros in every bit of a field-partition and a "1" information bit otherwise. All of the mask, MSB, and LSB representations are ZNZ representations, but not necessarily vice versa.
- Conversions between representations may require one to a few word length instructions, but those instructions process all field-partitions simultaneously.
- the mask representation m can be converted to the MSB representation by clearing the non-MSB bits. On most processors, all field-partitions of a word can be converted from mask to MSB in a single "andnot" instruction that clears the non-MSB bits (equivalently, m ∧ MSB). Likewise, the mask representation can be converted to the LSB representation by a single "andnot" instruction that clears the non-LSB bits (m ∧ LSB).
- All of the field partitions of a word can be converted from ZNZ x to MSB y as follows.
- One may use the word add instruction to add to the ZNZ a word with zero bits in the MSB positions and "1" bits elsewhere. The result of this add has the proper bit in the MSB position, but the other bit positions may hold anything. This is remedied by applying an "andnot" instruction to clear the non-MSB bits: y = (x + ¬MSB) ∧ MSB.
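These conversions can be sketched for a 32-bit word of four 8-bit field-partitions. The function names are hypothetical. One assumption is added beyond the text: `znz_to_msb` masks off each field's MSB before the add and ORs the original word back in, so that a field whose MSB is already set cannot carry into its neighbour when a plain word add is used.

```python
WORD = 0xFFFFFFFF  # 32-bit word (assumed)
MSB = 0x80808080   # "1" in the MSB of each 8-bit field-partition
LSB = 0x01010101   # "1" in the LSB of each 8-bit field-partition
LOW = WORD ^ MSB   # zero in the MSB positions, "1" bits elsewhere


def mask_to_msb(m):
    return m & MSB  # clear the non-MSB bits


def mask_to_lsb(m):
    return m & LSB  # clear the non-LSB bits


def znz_to_msb(x):
    # (x & LOW) + LOW sets a field's MSB iff any low bit is non-zero;
    # OR-ing x back in covers fields whose only set bit is the MSB itself.
    return (x | ((x & LOW) + LOW)) & MSB
```

For fields whose MSB is known to be clear, the masking and the OR are unnecessary and the expression reduces to the add followed by the clearing step.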
- Bit Output: In some applications (e.g., entropy codecs)
- the current description will now indicate how to do this in a field-partition parallel way.
- the field partitions and associated bit strings may be independent of each other, each representing a parallel instance.
- the information bits are conditionally (i.e. conditioned on valid true) appended until a field-partition is filled. 3. When a field-partition is filled, it is appended to the end of a corresponding field-partition string.
- the lengths of the field-partitions are all equal and a divisor of the word-length.
- the not-yet-completely-filled independent field-partitions are held in a single word, called the accumulator.
- There is an associated bit-pointer word in which every field-partition of that word contains a single 1 bit (i.e. the rest zeros). That single 1 bit is in a bit position that corresponds to the bit position in the accumulator to receive the next appended bit for that field-partition. If the field-partition of the accumulator fills completely, the field-partition is appended to the corresponding field-partition string and the accumulator field-partition is reset to zero.
- Appending (conditionally) the incoming information bit may be feasible.
- the input bit mask, the valid mask, and the bit-pointer are wordwise “ANDed” together and then wordwise “ORed” with the accumulator. This takes 3 instruction executions per word on most processors.
- bit-pointer word may be updated by rotating each valid field-partition of the bit-pointer right one position. The method for doing this is as follows in Table 6.
- a field-partition is full if the corresponding field-partition of the bit-pointer p has its 1 in the LSB position.
- the probability of full is usually significantly less than 0.5 so that an application of piling is in order.
- Both the accumulator a and f are piled to pile A1, using full as the condition.
- the length of pile A1 may be significantly less than the number of bit append operations. Piling is designed so that processing does not necessarily involve control flow changes other than those involved in the overall processing loop.
- pile A1 is processed by looping through the items in A1.
- the field-partitions are scanned in sequence. The number of field-partitions per word is small, so this sequence can be performed by straight-line code with no control changes.
- pile A2 is processed by looping through the items of A2.
- the index I is used to select the bit-string array to which the corresponding a2 should be appended.
- the field-partition size in bits, B, is usually chosen to be a convenient power of two (e.g., 8 or 16 bits). Store instructions for 8 bit or 16 bit values make those lengths convenient. Control changes other than the basic loops are not necessarily required throughout the above processes.
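The append, pointer-rotate, and full-detection steps can be sketched as below for four 8-bit field-partitions in a 32-bit word. This is an illustration under assumptions: the function names are hypothetical, the rotate is one possible formulation rather than the one in Table 6, and `flush` uses an ordinary loop with branches where the text uses piles A1 and A2 to avoid them.

```python
B, FIELDS = 8, 4
WORD, MSB, LSB = 0xFFFFFFFF, 0x80808080, 0x01010101


def append_bits(acc, ptr, bits, valid):
    """bits and valid are in mask representation.

    Returns (acc, ptr, full), where full is in LSB representation.
    """
    acc |= bits & valid & ptr            # three word operations: and, and, or
    moving = ptr & valid                 # pointers of the fields that advance
    full = moving & LSB                  # pointer was at the LSB: field now full
    rotated = ((moving & ~LSB) >> 1) | (full << (B - 1))  # rotate right by one
    ptr = (ptr & ~valid & WORD) | rotated
    return acc, ptr, full


def flush(acc, full, strings):
    """Append each full field to its string and reset it (branchy stand-in)."""
    for f in range(FIELDS):
        shift = f * B
        if (full >> shift) & 1:
            strings[f].append((acc >> shift) & 0xFF)
            acc &= ~(0xFF << shift) & WORD
    return acc
```

Starting with `acc = 0` and `ptr = MSB`, eight valid appends to one field fill it from the MSB down, after which its pointer has rotated back to the MSB position.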
- a common operation required for codecs is the serial readout of bits in a field of a word.
- the bit to be extracted from a field x is designated by a bit_pointer, a field value of 0s except for a single "1" bit (e.g., 0x0200).
- the "1" bit is aligned with the bit to be extracted so that x & bit_pointer is zero or non-zero according to the value of the read out bit. This can be converted to a field mask as described above.
- Each instruction in this sequence may simultaneously process all of the fields in a word.
- the serial scanning is accomplished by shifting the bit_pointer in the proper direction and repeating until the proper terminating condition. Since not all fields may terminate at the same bit position, the above procedure may be modified so that terminated fields do not produce an output while unterminated fields do produce an output. This is accomplished by producing a valid field mask that is all "1"s if the field is unterminated or all "0"s if the field is terminated. This valid field mask is used as an output conditional. The actual scanning is continued until all fields are terminated, indicated by valid being a word of all zeros.
- the terminal condition is often the bit in the bit_pointer reaching a position indicated by a "1" bit in a field of terminal_bit_pointer. This may be indicated by a "1" bit in bit_pointer & terminal_bit_pointer. These fields may be converted to the valid field mask as described above.
- test in operation E) can be initiated as early as operation B) with the branch delayed to operation E) and operations B)-D) available to cover the branch pipeline delay. Also, since the sub-fields are congruent it is relatively easy to unroll the processing of several words to cover the sequential dependencies within the instructions for a single word of field-partitions.
- Step D) may need a condition where the field-partition value is false for completed field-partitions and true for not-yet-completed field-partitions. This is accomplished by appending to operation E) an operation which "andnot"s the cond word onto COND.
- COND = (COND ∧ ¬cond)
- The if condition in step E) needs to be modified to loop back to B) unless COND is all FALSE.
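The serial readout with a valid mask can be sketched as follows for four 8-bit fields in a 32-bit word, scanning from the MSB downward. The names `read_bits` and `znz_to_mask` are hypothetical. The sketch assumes that `terminal` holds one "1" bit per field marking the last position to be read, and it collects the output into Python lists with ordinary branches, which stands in for the conditional output of the text.

```python
B, FIELDS = 8, 4
WORD, MSB = 0xFFFFFFFF, 0x80808080
LOW = WORD ^ MSB


def znz_to_mask(z):
    """All 1's in every field of z that is non-zero, all 0's elsewhere."""
    m = (z | ((z & LOW) + LOW)) & MSB
    return (m - (m >> (B - 1))) | m


def read_bits(x, terminal):
    ptr = MSB                  # bit_pointer: start at the MSB of every field
    valid = WORD               # mask representation: all fields unterminated
    out = [[] for _ in range(FIELDS)]
    for _ in range(B):         # at most B positions per field
        if not valid:
            break
        bit_mask = znz_to_mask(x & ptr)
        for f in range(FIELDS):
            if (valid >> (f * B)) & 1:  # output only for unterminated fields
                out[f].append((bit_mask >> (f * B)) & 1)
        valid &= ~znz_to_mask(ptr & terminal)  # COND = COND andnot cond
        ptr >>= 1
    return out
```

Each pass handles all four fields with the same word operations, and the loop ends as soon as `valid` becomes a word of all zeros.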
- a common operation in entropy coding is that of converting a field from binary to unary - that is producing a string of n ones followed by a zero for a field whose value is n.
- the values of n are expected to have a negative exponential distribution with a mean of one so that, on the average, one may expect to have just one "1" in addition to the terminal zero in the output.
- the procedure is to count down (in parallel) the fields in question and at the same time carry up into the initially zero MSB position c. If the MSB position is a "1" after the subtraction, the previous value of the field was not zero and a "1" should be output. If the MSB position is a zero after the subtraction, the previous value of the field was zero and a zero should be output. In any case, the MSB position contains the bit to be output for the corresponding field-partition of the word X.
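The count-down-and-carry procedure can be sketched as below. The sketch assumes a 32-bit word of four 8-bit fields whose MSB is initially zero, so each field holds a value from 0 to 127. The name `to_unary` is hypothetical, and the per-field Python lists stand in for whatever output mechanism is actually used. Adding a word with ones in every non-MSB position both decrements the low seven bits and carries a "1" into the MSB exactly when the field was non-zero.

```python
B, FIELDS = 8, 4
WORD, MSB = 0xFFFFFFFF, 0x80808080
LOW = WORD ^ MSB


def to_unary(x):
    """Return, per field, n ones followed by a zero for a field value n."""
    out = [[] for _ in range(FIELDS)]
    valid = WORD                  # mask representation: fields still emitting
    while valid:
        t = x + LOW               # count down and carry up into the MSB
        bits = t & MSB            # MSB is 1 iff the field was non-zero
        for f in range(FIELDS):
            if (valid >> (f * B)) & 1:
                out[f].append((bits >> (f * B + B - 1)) & 1)
        valid &= (bits - (bits >> (B - 1))) | bits  # a zero output terminates
        x = t & LOW & valid       # keep the decremented count of live fields
    return out
```

Because every field value is at most 127, the add can never carry out of a field, so a plain word add is sufficient here.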
- Figure 4 shows a graph 400, in accordance with one embodiment.
- Figure 5 shows the corresponding graph 500, in accordance with one embodiment.
- output bits may have a 0.5 probability of being one and a 0.5 probability of being zero. They may also be independent. With these assumptions, one can make the following calculations.
- the coder portion 101 and/or decoder portion 103 of Figure 1 may include a variable modulus.
- the modulus may reflect a steepness of a probability distribution curve associated with a compression algorithm utilized by the codec framework 100.
- the modulus may include a negative exponential of the probability distribution.
- while the modulus may vary as a function of any desired parameter, the modulus may, in one embodiment, depend on a context of a previous set of data, where such set of data may refer to a set of bits being processed by the various modules of the codec framework 100. Moreover, the modulus may avoid increasing as a function of a run length (i.e. a plurality of identical bits in a sequence).
- a dyadic-monotonic (DM) codec framework may thus be provided. More information regarding optional ways in which the modulus may depend on a context of a previous set of data, the modulus may avoid increasing as a function of a run length, etc. will be set forth hereinafter in greater detail.
- Figure 10 is a computational complexity v. performance level graph 1000 illustrating a relationship of the present dyadic-monotonic (DM) codec framework and other algorithms (i.e. Huffman, Rice Golomb, arithmetic, etc.). As shown, the DM codec framework may be designed to utilize a minimal computational complexity given a predetermined performance level.
- DM dyadic-monotonic
- the DM codec may be specified by describing the state space and update function thereof (see Background section).
- Each state has five components, the position P, the context, the shift, the Aregister, and the Cregister.
- the modulus may vary based on a context of a previous set of data.
- the present context may include a bit string of length k over the input alphabet (total $2^k$ states).
- Each of the Aregister and Cregister may hold a non-negative multiple of $2^{-n}$ that is less than one (total $2^n$ states each). In both the <start> state and the <finish> state, both the Aregister and the Cregister have the value zero.
- the context value for the <start> state, initial, though arbitrary, may be necessarily the same for both the encoder and decoder.
- the P value in the <start> state is start and in the <finish> state is finish.
- the shift value is irrelevant in the <start> and <finish> states.
- the function mps maps each context value to a value in the input alphabet I.
- the intention is that mps is the symbol in I that is the more probable symbol given the context.
- the function delta maps each of the $2^k$ context values to $2^{-m}$, where m is an integer between 0 and n.
- the intention of the function delta is that it quantitatively captures the information about the probability of the value of the next symbol.
- the DM constraints are those set forth in Table 12.
- Figure 11 shows a transition table 1100 illustrating an update function for both an encoder and decoder, in accordance with one embodiment.
- Each line represents a set of states, where the states of the set are those that satisfy each of the conditions in the predicate columns.
- Each row forms a partition of the allowable states.
- the actions in the right hand part of the row are executed. All values utilized are values at the initial state, so action sequence within a row is not necessarily an issue. Each component of the new state may receive a unique value. Blank entries mean that that component of the state is unchanged.
- the update actions from the "common” group of columns and the update actions from the "encoder” group of columns are carried out.
- the decoder the actions are chosen from the "common” and "decoder" groups of columns.
- the effect of the DM conditions is that the Aregister is always a multiple of the last delta added (at F13).
- the dyadic condition ensures that the binary representation of delta has exactly one "1" bit.
- the monotonic condition ensures that delta does not become larger until a code symbol is produced, so that the bit in delta remains only in the same position or moves to the right. This situation remains until a code symbol is produced, at which point the Aregister becomes zero (precisely because only the Aregister bits to the right of the last delta are preserved).
- a) A is a multiple of delta; b) A is zero after writing a code symbol; c) renormalization after writing a code symbol is unnecessary
- the Cregister, required in general arithmetic coding, is not necessarily used in the DM encoder.
- the entire memory of the preceding state sequence is captured in the context. Since the context is the previous k input symbols, the DM codec has a short-term memory and consequently adapts quickly to the local statistics of the input string.
- image, video, and signal processing often involves a transform whose purpose is to "concentrate” the signal, i.e., produce a few large coefficients and many negligible coefficients (which are discarded by replacing them with zero).
- the identity (or location) of the non-negligible coefficients is usually as important as their values. This information is often captured in a "significance function" which maps non-negligible coefficients to "1" and negligible ones to "0.”
- a significance bit can be predicted with good accuracy from its immediate predecessors. If that order lists the coefficients in descending order by their expected magnitude, one may obtain a significance bit string that begins with predominantly 1's and ends with predominantly 0's. Such a string, whose statistics change as the string goes on, is called non-stationary. Effective entropy coding of such a string may require a memory of the immediately preceding context. This memory may be extensive enough to achieve good prediction accuracy and short-lived enough to allow sufficiently rapid adaptation.
- a run within the significance function may include a substring of bits where all but the last bit have one value and the last bit has the other value. The next run begins immediately after the last bit of the preceding run.
- the procedure is to collect sufficient empirical data, qualified by context, and for each context form a histogram. From this, the probability functions can be approximated.
- the function mps(context) can be calculated directly.
- the function delta(context) can be calculated by an iterative solution of 2) above in Table 14.
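The histogram step and the direct calculation of mps(context) can be sketched as follows. This is an illustration only: the name `estimate_mps`, the tie-breaking rule (ties go to 0), and the choice to skip the first k bits until a full context exists are assumptions. The iterative calculation of delta(context) is not attempted here.

```python
from collections import defaultdict


def estimate_mps(bits, k):
    """Histogram the next bit by its k-bit context; return (mps, counts)."""
    counts = defaultdict(lambda: [0, 0])  # context -> [count of 0s, count of 1s]
    context = 0
    for i, bit in enumerate(bits):
        if i >= k:                        # only once a full context is available
            counts[context][bit] += 1
        context = ((context << 1) | bit) & ((1 << k) - 1)
    mps = {c: (1 if n1 > n0 else 0) for c, (n0, n1) in counts.items()}
    return mps, dict(counts)
```

Dividing each pair of counts by its sum gives the empirical probability of the next bit given that context.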
- the DM codec thus maps input strings 1 : 1 into coded strings and a coded string, when decoded, yields the original input string. Not all output strings can necessarily be generated as the encode of some input string. Some input strings may encode to shorter coded strings - many may encode to longer coded strings. Regarding the length of an input string vis-a-vis the coded string to which it encodes, it may be helpful to describe the probabilities of occurrence of various possible input strings. If the codec has useful compression properties, it may be that the probability of occurrence of an input string which encodes to a short string is much larger than the probability of occurrence of an input string which encodes to a long string.
- Dynamic probabilities need not necessarily apply.
- the statistics of the significance bitstream can and do change often and precipitously. Such changes cannot necessarily be tracked by adaptive probability tables, which change only slowly even over many runs.
- the DM coder therefore does not necessarily use probability tables; but rather adapts within either the last few bits or within a single run.
- Empirical tests with significance bits data indicate that most of the benefit of the context is obtained with only the last few bits of the significance bit string. These last few bits of the significance string, as context, are used to condition the probability of the next bit.
- the important probability quantity ⁇ co t ext ⁇ Prob(next input bit LSB
- the entropy that may be added to minimally represent that next bit is as follows in Table 15.
- delta(context) = entropy/2 because the Aregister is scaled to output the $2^{-1}$ bit.
- If $p_{context}$ ⁇ 0.5, then the entropy that may be added for that next bit is approximately as follows in Table 16.
- Figure 12 illustrates a method 1200 for compressing data with chrominance temporal rate reduction, in accordance with one embodiment.
- the present method 1200 may be carried out in the context of the transform module 102 of Figure 1 and the manner in which it carries out a reversible transform. It should be noted, however, that the method 1200 may be implemented in any desired context.
- luminance (luma) data of a frame is updated at a first predetermined rate.
- chrominance (chroma) data of the frame is updated at a second predetermined rate that is less than the first predetermined rate.
- In a digital video compression system, it is thus possible to vary the effective rate of transmitting temporal detail for different components of the scene. For example, one may arrange the data stream so that some components of the transformed signal are sent more frequently than others. In one example of this, one may compute a three-dimensional (spatial + temporal) wavelet transform of a video sequence, and transmit the resulting luma coefficients at the full frame rate.
- chroma rate compression is as follows: for the chroma components, one may compute an average across two frames (four fields) of the spatially transformed chroma values. This may be accomplished by applying a double Haar wavelet filter pair and discarding all but the lowest frequency component. One may transmit only this average value. On reconstruction, one can hold the received value across two frames (four fields) of chroma. It has been found that viewers do not notice this, even when they are critically examining the compression method for flaws.
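The average-and-hold idea can be sketched at frame granularity as below. This is a simplification under assumptions: the function names are hypothetical, each frame is a flat list of spatially transformed chroma values, the frame count is even, and a plain two-frame average stands in for the lowest-frequency output of the double Haar filter over four fields.

```python
def average_chroma_pairs(frames):
    """Encoder side: one averaged chroma frame per pair of input frames."""
    return [
        [(a + b) / 2 for a, b in zip(first, second)]
        for first, second in zip(frames[0::2], frames[1::2])
    ]


def hold_chroma(averages):
    """Decoder side: hold each received average across two output frames."""
    held = []
    for avg in averages:
        held.extend([list(avg), list(avg)])
    return held
```

Only half as many chroma frames are transmitted, while the luma frames continue to be sent at the full frame rate.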
- the following stage of the video compression process discards information by grouping similar values together and transmitting only a representative value. This discards detail about exactly how bright an area is, or exactly what color it is.
- zero is chosen for the representative value (denoting no change at a particular scale).
- human visual sensitivity to levels is known to differ between luma and chroma.
- Figure 12A illustrates a method 1250 for compressing data with a high-quality pause capability during playback, in accordance with one embodiment.
- the present method 1250 may be carried out in the context of the framework of Figure 1. It should be noted, however, that the method 1250 may be implemented in any desired context.
- In operation 1252, video data is compressed.
- the data compression may be carried out in the context of the coder portion 101 of the framework of Figure 1. Of course, such compression may be implemented in any desired context.
- pause information is inserted with the compressed data.
- the pause information may be used to improve a quality of the played back video data.
- the pause information may include a high-resolution frame.
- the pause information may include data capable of being used to construct a high-resolution frame.
- the pause information may be used when the video data is paused during the playback thereof. In the present method, the compressed video is equipped with a set of extra information especially for use when the video is paused.
- This extra information may include a higher-quality frame, or differential information that when combined with a regular compressed frame results in a higher-quality frame.
- this extra information need not be included for every frame, but rather only for some frames.
- the extra information may be included for one frame of every 15 or so in the image, allowing a high-quality pause operation to occur at a time granularity of ½ second. This may be done in accord with observations of video pausing behavior.
- the extra information may include a whole frame of the video, compressed using a different parameter set (for example, quantizing away less information) or using a different compression method altogether (for example, using JPEG-2000 within an MPEG stream).
- These extra frames may be computed when the original video is compressed, and may be carried along with the regular compressed video frames in the transmitted or stored compressed video.
- the extra information may include extra information for the use of the regular decompression process rather than a complete extra frame.
- the extra information might consist of a filter band of data that is discarded in the normal compression but retained for extra visual sharpness when paused.
- the extra information might include extra low-order bits of information from the transformed coefficients, and additional coefficients, resulting from using a smaller quantization setting for the chosen pausable frames.
- the extra information may include data for the use of a decompression process that differs from the regular decompression process, and is not a complete frame. This information, after being decompressed, may be combined with one or more frames of video decompressed by the regular process to produce a more detailed still frame.
- Figure 13 illustrates a method 1300 for compressing/decompressing data, in accordance with one embodiment. In one embodiment, the present method 1300 may be carried out in the context of the transform module 102 of Figure 1 and the manner in which it carries out a reversible transform. It should be noted, however, that the method 1300 may be implemented in any desired context.
- an interpolation formula is received (i.e. identified, retrieved from memory, etc.) for compressing data.
- the data may refer to any data capable of being compressed.
- the interpolation formula may include any formula employing interpolation (i.e. a wavelet filter, etc.).
- At least one data value is required by the interpolation formula, where the required data value is unavailable.
- Such data value may include any subset of the aforementioned data. By being unavailable, the required data value may be non-existent, out of range, etc.
- the extrapolation formula may include any formula employing extrapolation. By this scheme, the compression of the data is enhanced.
- Figure 14 shows a data structure 1400 on which the method 1300 is carried out.
- a "best fit" 1401 may be achieved by an interpolation formula 1403 involving a plurality of data values 1402. Note operation 1302 of the method 1300 of Figure 13. If it is determined that one of the data values 1402 is unavailable (see 1404), an extrapolation formula may be used to generate such unavailable data value. More optional details regarding one exemplary implementation of the foregoing technique will be set forth in greater detail during reference to Figure 15.
- Figure 15 illustrates a method 1500 for compressing/decompressing data, in accordance with one embodiment.
- the present method 1500 may be carried out in the context of the transform module 102 of Figure 1 and the manner in which it carries out a reversible transform. It should be noted, however, that the method 1500 may be implemented in any desired context.
- the method 1500 provides a technique for generating edge filters for a wavelet filter pair.
- a wavelet scheme is analyzed to determine local derivatives that a wavelet filter approximates.
- a polynomial order is chosen to use for extrapolation based on characteristics of the wavelet filter and a number of available samples.
- extrapolation formulas are derived for each wavelet filter using the chosen polynomial order. See operation 1506.
- specific edge wavelet cases are derived utilizing the extrapolation formulas with the available samples in each case.
- One of the transforms specified in the JPEG 2000 standard 1) is the reversible 5-3 transform shown in Equations #1.1 and 1.2.
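For orientation, the interior lifting steps of the reversible 5-3 transform as defined in JPEG 2000 can be sketched as follows. The function names are hypothetical. At the edges this sketch uses the whole-sample symmetric extension of the standard, which is an assumption made for the sake of a runnable example, and not the extrapolating boundary filters that this description goes on to derive.

```python
def _ext(a, i):
    """Whole-sample symmetric extension at both ends (assumes len(a) >= 2)."""
    n = len(a)
    if i < 0:
        i = -i
    elif i >= n:
        i = 2 * (n - 1) - i
    return a[i]


def forward_53(x):
    """Odd outputs are high-pass (predict), even outputs are low-pass (update)."""
    y = list(x)
    for i in range(1, len(x), 2):  # Y[2n+1] = X[2n+1] - floor((X[2n] + X[2n+2]) / 2)
        y[i] = x[i] - ((_ext(x, i - 1) + _ext(x, i + 1)) >> 1)
    for i in range(0, len(x), 2):  # Y[2n] = X[2n] + floor((Y[2n-1] + Y[2n+1] + 2) / 4)
        y[i] = x[i] + ((_ext(y, i - 1) + _ext(y, i + 1) + 2) >> 2)
    return y


def inverse_53(y):
    """Undo the lifting steps in reverse order (back substitution)."""
    x = list(y)
    for i in range(0, len(y), 2):
        x[i] = y[i] - ((_ext(y, i - 1) + _ext(y, i + 1) + 2) >> 2)
    for i in range(1, len(y), 2):
        x[i] = y[i] + ((_ext(x, i - 1) + _ext(x, i + 1)) >> 1)
    return x
```

Python's `>>` on integers floors toward negative infinity, which matches the floor operations in the lifting steps, so the round trip is exact for integer input.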
- Equation #1.1.R.
- Equation # 1.1.R may be used in place of Equation #1.1 when point one is right-most.
- the apparent multiply by 3 can be accomplished with a shift and add.
- the division by 3 is trickier.
- the right-most index is 2N - 1
- there is no problem calculating $Y_{2N-2}$ by means of Equation #1.2.
- the index of the right-most point is even (say 2N)
- Equation #1.2 involves missing values.
- the object is to subtract an estimate of Y from the even X using just the previously calculated odd-indexed Ys, $Y_1$ and $Y_3$ in the case in point. This required estimate at index 2N can be obtained by linear extrapolation, as noted above.
- the appropriate formula is given by Equation #1.2.R.
- Equations #1.1.L and 1.2.L
- the reverse transform filters can be obtained for these extrapolating boundary filters as for the original ones, namely by back substitution.
- the inverse transform boundary filters may be used in place of the standard filters in exactly the same circumstances as the forward boundary filters are used.
- Such filters are represented by Equations #2.1.R.inv, 2.2.R.inv, 2.1.L.inv, and 2.2.L.inv.
- one embodiment may utilize a reformulation of the 5-3 filters that avoids the addition steps of the prior art while preserving the visual properties of the filter. See for example, Equations #3.1, 3.1 R, 3.2, 3.2L.
- Equations #3.1, 3.1R, 3.2, 3.2L: $Y_{2n+1} = (X_{2n+1} + 1/2) - \frac{(X_{2n} + 1/2) + (X_{2n+2} + 1/2)}{2}$ (eq 3.1)
- $Y_{2N+1} = (X_{2N+1} + 1/2) - (X_{2N} + 1/2)$ (eq 3.1R)
- JPEG-2000 inverse filters can be reformulated in the following Equations #4.2, 4.2L, 4.1, 4.1R.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Complex Calculations (AREA)
- Advance Control (AREA)
- Executing Machine-Instructions (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004508038A JP2005527911A (en) | 2002-05-28 | 2003-05-28 | System and method for pile processing and parallel processors |
EP03755529A EP1527396A4 (en) | 2002-05-28 | 2003-05-28 | Systems and methods for pile-processing parallel-processors |
AU2003232418A AU2003232418A1 (en) | 2002-05-28 | 2003-05-28 | Systems and methods for pile-processing parallel-processors |
Applications Claiming Priority (10)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US38525302P | 2002-05-28 | 2002-05-28 | |
US38525002P | 2002-05-28 | 2002-05-28 | |
US38525102P | 2002-05-28 | 2002-05-28 | |
US60/385,253 | 2002-05-28 | ||
US60/385,251 | 2002-05-28 | ||
US60/385,250 | 2002-05-28 | ||
US39034502P | 2002-06-21 | 2002-06-21 | |
US39049202P | 2002-06-21 | 2002-06-21 | |
US60/390,492 | 2002-06-21 | ||
US60/390,345 | 2002-06-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2003100655A1 (en) | 2003-12-04 |
Family
ID=29587921
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2003/016908 WO2003100655A1 (en) | 2002-05-28 | 2003-05-28 | Systems and methods for pile-processing parallel-processors |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1527396A4 (en) |
JP (1) | JP2005527911A (en) |
CN (1) | CN100390781C (en) |
AU (1) | AU2003232418A1 (en) |
WO (1) | WO2003100655A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104281559A (en) * | 2013-07-09 | 2015-01-14 | 罗伯特·博世有限公司 | Model calculation method and device used for performing function model based on data |
CN104965461A (en) * | 2015-07-03 | 2015-10-07 | 武汉华中数控股份有限公司 | Bus hand-held unit having touch function |
CN112995637A (en) * | 2021-03-10 | 2021-06-18 | 湘潭大学 | Multi-section medical image compression method based on three-dimensional discrete wavelet transform |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
RU2634171C1 (en) * | 2016-12-12 | 2017-10-24 | Акционерное общество "Лаборатория Касперского" | Method of code execution by interpreter |
KR102414583B1 (en) * | 2017-03-23 | 2022-06-29 | 삼성전자주식회사 | Electronic apparatus for operating machine learning and method for operating machine learning |
CN112106363A (en) * | 2018-05-10 | 2020-12-18 | 夏普株式会社 | System and method for performing binary arithmetic coding in video coding |
US20230024560A1 (en) * | 2018-05-10 | 2023-01-26 | Sharp Kabushiki Kaisha | Systems and methods for performing binary arithmetic coding in video coding |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5893145A (en) * | 1996-12-02 | 1999-04-06 | Compaq Computer Corp. | System and method for routing operands within partitions of a source register to partitions within a destination register |
US6141673A (en) * | 1996-12-02 | 2000-10-31 | Advanced Micro Devices, Inc. | Microprocessor modified to perform inverse discrete cosine transform operations on a one-dimensional matrix of numbers within a minimal number of instructions |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1158396A1 (en) * | 1999-10-29 | 2001-11-28 | V-Sync Co., Ltd. | Database system |
JP3613454B2 (en) * | 1999-11-15 | 2005-01-26 | 日本電気株式会社 | Compiler device and computer-readable recording medium storing compiler |
JP4267173B2 (en) * | 2000-05-01 | 2009-05-27 | トヨタ自動車株式会社 | Abnormality diagnosis system |
-
2003
- 2003-05-28 JP JP2004508038A patent/JP2005527911A/en active Pending
- 2003-05-28 CN CNB038177501A patent/CN100390781C/en not_active Expired - Fee Related
- 2003-05-28 WO PCT/US2003/016908 patent/WO2003100655A1/en active Application Filing
- 2003-05-28 EP EP03755529A patent/EP1527396A4/en not_active Withdrawn
- 2003-05-28 AU AU2003232418A patent/AU2003232418A1/en not_active Abandoned
Non-Patent Citations (1)
Title |
---|
See also references of EP1527396A4 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104281559A (en) * | 2013-07-09 | 2015-01-14 | 罗伯特·博世有限公司 | Model calculation method and device used for performing function model based on data |
CN104965461A (en) * | 2015-07-03 | 2015-10-07 | 武汉华中数控股份有限公司 | Bus hand-held unit having touch function |
CN112995637A (en) * | 2021-03-10 | 2021-06-18 | 湘潭大学 | Multi-section medical image compression method based on three-dimensional discrete wavelet transform |
CN112995637B (en) * | 2021-03-10 | 2023-02-28 | 湘潭大学 | Multi-section medical image compression method based on three-dimensional discrete wavelet transform |
Also Published As
Publication number | Publication date |
---|---|
JP2005527911A (en) | 2005-09-15 |
CN1672147A (en) | 2005-09-21 |
EP1527396A4 (en) | 2008-03-12 |
AU2003232418A1 (en) | 2003-12-12 |
CN100390781C (en) | 2008-05-28 |
EP1527396A1 (en) | 2005-05-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP3653183B2 (en) | Wavelet coefficient reconstruction processing method and apparatus, and recording medium | |
US7321695B2 (en) | Encoder rate control | |
US6167092A (en) | Method and device for variable complexity decoding of motion-compensated block-based compressed digital video | |
US6301392B1 (en) | Efficient methodology to select the quantization threshold parameters in a DWT-based image compression scheme in order to score a predefined minimum number of images into a fixed size secondary storage | |
US6229927B1 (en) | Reversible embedded wavelet system implementation | |
US7548583B2 (en) | Generation and use of masks in MPEG video encoding to indicate non-zero entries in transformed macroblocks | |
US6798833B2 (en) | Video frame compression/decompression hardware system | |
US7251375B2 (en) | Tile boundary artifact removal for arbitrary wavelet filters | |
WO1996033575A1 (en) | Video decoder using block oriented data structures | |
JP2007267384A (en) | Compression apparatus and compression method | |
US6198767B1 (en) | Apparatus for color component compression | |
KR20040005962A (en) | Apparatus and method for encoding and computing a discrete cosine transform using a butterfly processor | |
US6737993B2 (en) | Method and apparatus for run-length encoding data values | |
WO2003100655A1 (en) | Systems and methods for pile-processing parallel-processors | |
US20110072251A1 (en) | Pile processing system and method for parallel processors | |
WO2002056250A2 (en) | Method and system to encode a set of input values into a set of coefficients using a given algorithm | |
US20030198395A1 (en) | Wavelet transform system, method and computer program product | |
US6339614B1 (en) | Method and apparatus for quantizing and run length encoding transform coefficients in a video coder | |
US20020194175A1 (en) | Data processing method | |
KR20050023280A (en) | System and methods for pile-processing parallel-processors | |
Adiletta et al. | Architecture of a flexible real-time video encoder/decoder: the DECchip 21230 | |
Wu et al. | Additive vector decoding of transform coded images | |
Bhanja et al. | Hardware implementation of data compression | |
Linzer | Super efficient decoding of color JPEG images on RISC machines | |
Turri et al. | Integer division for quantization of wavelet-transformed images, using field programmable gate arrays |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG UZ VC VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
WWE | Wipo information: entry into national phase | Ref document number: 2004508038 Country of ref document: JP Ref document number: 1020047019345 Country of ref document: KR |
WWE | Wipo information: entry into national phase | Ref document number: 2003755529 Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 20038177501 Country of ref document: CN |
WWP | Wipo information: published in national office | Ref document number: 1020047019345 Country of ref document: KR |
WWP | Wipo information: published in national office | Ref document number: 2003755529 Country of ref document: EP |