WO2001061543A9 - Method for compression of small computer data files - Google Patents

Method for compression of small computer data files

Info

Publication number
WO2001061543A9
WO2001061543A9 PCT/US2001/004986 US0104986W WO0161543A9 WO 2001061543 A9 WO2001061543 A9 WO 2001061543A9 US 0104986 W US0104986 W US 0104986W WO 0161543 A9 WO0161543 A9 WO 0161543A9
Authority
WO
WIPO (PCT)
Prior art keywords
data
compression
entries
ofthe
storing
Prior art date
Application number
PCT/US2001/004986
Other languages
French (fr)
Other versions
WO2001061543A1 (en
Inventor
Wallace Walter
Original Assignee
Orbcomm Global L P
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Orbcomm Global L P filed Critical Orbcomm Global L P
Priority to AU2001238365A priority Critical patent/AU2001238365A1/en
Publication of WO2001061543A1 publication Critical patent/WO2001061543A1/en
Publication of WO2001061543A9 publication Critical patent/WO2001061543A9/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Definitions

  • the present invention generally relates to compression of binary data for ease of transmission between a source and receiving device, and more particularly to compression of binary data files by selecting an optimum compression method from a plurality of compression methods.
  • the definition ofthe codes is large relative to the data to be compressed, and second, when data changes slightly from record to record, few repetitive byte values can be assigned compact codes to represent them.
  • the present invention although useful for many applications, was originally designed for the purpose of reducing file sizes of data recorded at remote locations of natural
  • the present invention has reduced the size ofthe file needed to be transmitted to a
  • the present invention eliminates unnecessary bits and makes use ofthe principle that
  • Computers store and use numbers in binary format (base 2), instead ofthe more familiar base 10 format. Computers typically use a 16-bit or 32-bit segment of binary
  • Binary digits, or bits may have a value of
  • 1024 (base 10) is expressed as 00000100 00000000 (base 2 or binary). 1023 is expressed as 00000011 11111111, and uses one less meaningful bit than the value of 1024. Just as in base 10 notation, the zeros included in the 16-bit number before the first digit that is not zero do not have any numerical significance.
  • the present invention examines data arrays found in gas flow records and
  • each of five algorithms is applied to each array and the method is chosen for each array that allows maximum compression.
  • Each compression method makes
  • the present invention will use only four bits instead of the 16 bits or 32 bits used in the standard computer format. Many ofthe compression methods and techniques are in
  • present invention also includes novel methods for specifying time and summary data.
  • Some data arrays such as flow time, may include many repetitions ofthe same
  • the present invention searches for the most frequently repeated value in the data
  • the present invention will include the mode only one time in the compression record. If the array has variations from the mode, the variations are treated as outliers and treated individually. The present invention identifies the position of outlier
  • Mode compression Some data arrays, such as static pressure or C readings, may change by small
  • Some data arrays such as temperature readings or storage tank level readings, increase or decrease regularly by relatively small amounts.
  • the present invention will include
  • This compression method is called
  • Some data arrays are proportional to another array or a combination of other
  • volume is proportional to C, pressure
  • the present invention creates an array of estimated values and finds a ratio that when multiplied to the estimated values will minimize the range of deltas between the multiplied
  • Ratio compression This compression method is called Ratio compression.
  • 100,000 is expressed in binary form as 00000000 00000001 10000110 10100000.
  • the present invention can safely strip the first 15 bits from the binary representation, as they will be 0 for every value contained in the array.
  • Strip Zeros This compression method is called Strip Zeros. Each ofthe five compression methods also makes use of other compression techniques to further reduce the size ofthe compressed file. Strip Top Bit is routinely used to remove the first bit of compressed values when it is safe to assume that the value ofthe top
  • the outlier compression method allows the program to identify which elements of an array are “outliers" or expressed differently than the majority of other
  • an outlier bitmap is a series of bits that "maps" which values of an array must be treated as outliers.
  • bit of an array of data entries when that bit may be assumed to be there.
  • Fig. 1 is a top-level processing diagram, or flowchart, showing the process of the present invention
  • Fig. 2a is a table illustrating a sample data array
  • Fig. 2b is a compression record created for the data contained in Fig. 2a;
  • Fig. 3a is a table illustrating a sample data array
  • Fig. 3b is a compression record created for the data contained in Fig. 3a;
  • Fig. 4 is a compression record illustrating the Strip Top Bit compression
  • Fig. 5 is a top-level processing diagram, or flowchart, showing the process of
  • Fig. 6a is a table illustrating a sample data array
  • Fig. 6b is a compression record created for the data contained in Fig. 6a
  • Fig. 7a is a table illustrating a sample data array with binary values
  • Fig. 7b is a compression record created for the data contained in Fig. 7a;
  • Fig. 7c is another compression record for the data contained in Fig. 7a;
  • Fig. 7d is another compression record for the data contained in Fig. 7a;
  • Fig. 8 is a top-level processing diagram, or flowchart, showing the process of
  • Fig. 9 is a top-level processing diagram, or flowchart, showing the process of
  • Fig. 10a is a table illustrating a sample data array
  • Fig. 10b is a compression record for the data contained in Fig. 10a;
  • Fig. 1 la is a table illustrating a sample data array
  • Fig. 1 lb is a compression record for the data contained in Fig. 1 la
  • Fig. 12a is a table illustrating a sample data array
  • Fig. 12b is a compression record for the data contained in Fig. 12a;
  • Fig. 13a is a table illustrating a sample data array
  • Fig. 13b is a compression record for the data contained in Fig. 13a;
  • Fig. 14 is a top-level processing diagram, or flowchart, showing the process of the Difference From Last compression method;
  • Fig. 15a is a table illustrating a sample data array
  • Fig. 15b is a compression record for the data contained in Fig. 15a;
  • Fig. 16a is a table illustrating a sample data array
  • Fig. 16b is a compression record for the data contained in Fig. 16a;
  • Fig. 17a a table illustrating a sample data array
  • Fig. 17b a compression record for the data contained in Fig.17a;
  • Fig. 18 is a top-level processing diagram, or flowchart, showing the process of
  • Fig. 19 is a table illustrating a sample data array
  • Figs. 20a - 20d are tables showing iterations ofthe mantissa determination of
  • Fig. 21a is a table illustrating a sample data array
  • Fig. 21b is a table showing estimate data used by the Ratio compression
  • Fig. 21c is a table showing deltas determined by the Ratio compression
  • Fig. 21d is a compression record for the data contained in Fig. 21a;
  • Fig. 22a is a table illustrating a sample data array
  • Fig. 22b is a table showing estimates determined by the Ratio compression method
  • Fig. 22c is a compression record for the data contained in Fig. 22a;
  • Fig. 23a is a table illustrating a sample data array
  • Fig. 23b is a table illustrating an intermediate step ofthe Ratio compression
  • Fig. 23c is a compression record for the data contained in Fig. 23a.
  • the present invention was designed to compress the binary representation of numbers. It performs its compression by removing unnecessary bits from binary
  • 00000576 identifies the same value as 576.
  • 00000100 00000000 in binary format identifies the same value as 100 00000000.
  • N bits 2 N - 1.
  • 4 bits can hold a maximum value of 2 4 - 1 or 15.
  • 15 is represented in binary form as l l l l.
  • base 10 the maximum value of N
  • N digits 10 N - 1.
  • 4 digits in base 10 can hold a maximum value of 10 4 - 1 or 9999.
  • Fig. 1 a flowchart illustrates the basic steps included within the present invention.
  • Raw data is input.
  • the present invention immediately compresses time data (time of day that readings were taken) and removes "no
  • Time data readings are typically one hour apart at the top ofthe hour. Under
  • records where flow did not occur are identified when all readings are 0 except for temperature and static temperature. These records are removed from the data to be compressed, and indices for which records are no flow are included in the compressed file.
  • each of five algorithms are applied to each data array and the best method is selected. The five
  • the present invention then saves the data in data fields limited to that number of bits,
  • Mode Compression searches every parameter array for the value most frequently repeated, or the mode value. For example, when flow time is consistently 60 minutes, Mode Compression will store one value along with an instruction to repeat that value in each field ofthe array. Any deviations from the mode are identified with the Outlier Compression. If there are no exceptions, the present invention will only save the beginning time value and need not include the delta standard
  • the Delta From Base method finds the lowest value in the array and subtracts that
  • the Difference From Last method is similar to Delta From Base, but the deltas are the differences from the previous values instead of from a base number. This method works particularly well for parameters that change relatively uniformly over time, such as temperature and tank levels. Finally the Ratio Compression method is used when the
  • elements of an array are related proportionally to a data set that can be created from other arrays already included in the compression record.
  • the present invention finally performs a
  • Strip Top Bit Each technique minimizes the number of bits required to identify the data to
  • Outlier values are those in data arrays that require unique identification, allowing the remainder ofthe elements to be expressed in a more compact format.
  • Fig. 2a shows an array of 24 data entries having 21 identical entries and three
  • the varying entries being the first, third, and fourth locations.
  • bitmap indicates with a 1 in the first, third, and fourth locations ofthe 24-element bitmap that the first, third, and fourth data entries are outliers. If the present invention were to use this configuration, it would require 24 bits to specify which elements were outliers, plus an additional bit to specify that more than one outlier exists. The value of the 21 identical entries is included only once. Although the outlier values must be stored individually, the
  • Fig. 2b shows a compression record where that same identification is accomplished with only 13 bits. As shown in Fig. 2b, the first bit indicates
  • the number of outliers indicator is safely reduced by one because at least one outlier must
  • the next bit indicates whether the index is recorded in reverse order from the actual order ofthe data array. In situations where the outliers are towards the
  • the present invention examines both directions ofa data array and
  • the final index can only be one less than 2, so the final index
  • the outlier method can be used to determine whether outliers are congregated at one end of an array.
  • Fig. 3a shows a 24-element data array
  • elements 15, 17 and 20 - 24 are outliers. Because five outliers are clustered at the end ofthe array, the last four positions do not have
  • Fig. 3a The compression record for Fig. 3a is shown in Fig. 3b.
  • the invention must know the lengths ofthe fields.
  • the present invention will size the fields to match the largest size
  • the present invention may determine that the first bit of every data entry is 1. For example, in a Delta From Base array, the base number might be
  • Strip Top Bit must deal with 0 as an element. It does this by setting the field width value to maximum. For example, when compressing an array with a maximum value of 2 30 - 1, five bits are necessary to specify the field width. If the largest value is in fact 2 30 - 1, then the file size will be stored as 11110. To express 0, the field width will be set to the maximum size or 11111. In other words, 0 is specified as 2 N - 1 , where N is
  • the present invention in its optimal form, works only with positive integer values.
  • the upper limit is bound by Strip Top Bit, which uses the value 31 to specify
  • This number limits bit widths for data to 30, which has a maximum decimal amount of 1,073,741,823. To ensure error free execution, the present invention may adjust the raw data to remain below this maximum number.
  • the present invention adds an offset to all temperature data to make certain that the data consists only of positive integers.
  • all temperature data By default, all temperature
  • the present invention checks all the data
  • offset results in -1000, which is less than 0. 0 is substituted for -1000.
  • the present invention could also address negative and positive numbers by adding a sign entry where necessary, either as an inflection indicator or a sign indicator for each data entry.
  • Strip Zeros The first compression method analyzed by the present invention is Strip Zeros.
  • the present invention recognizes that 15 of these bytes are not necessary and only the last 17
  • Strip Zeros saves 15 bits for a single data entry less the bits necessary to
  • Strip Zeros evaluates four distinct methods of compressing arrays of numbers
  • the four methods are: (1) uniform field size; (2) single
  • Strip Zeros is selected as
  • the present invention determines that all array elements are exactly N bits in size. Only N - 1 bits are stored because the present
  • Fig. 6a shows a 6-index array
  • Fig. 6b shows the compression record for that array, demonstrating how the uniform size indicator ofthe
  • Strip Zeros compression method was used and how each element was identified with 6 bits.
  • each data entry was reduced by a size of 5 bits.
  • the single field size Strip Zeros compression method is used when the uniform field size does not apply. In other words, some values will require different size fields than others.
  • the bit width ofthe storage field is selected to be the bit width necessary to describe the largest element. Any preceding zeros are stripped from the values, but the top
  • the individual field sizes method can take advantage ofthe Strip Top Bit technique since each element is separately compressed. Before each element in the compression record is a field size
  • the maximum size ofthe field can be predetermined.
  • the method can save storage space.
  • the original record was 76 bits, while the compressed record
  • the two field sizes Strip Zeros compression method divides the array elements into two groups.
  • the present invention starts with the first group being limited to data values
  • the present invention may determine that the smallest data value requires 4 bits to identify. All data values that can be identified in 4 bits will belong in the
  • the size ofthe compressed file is calculated. Strip Zeros next increases the bit width ofthe smaller group by one, so that the small group includes the previous data entries as well as all entries identifiable with one more bit. The present invention then recalculates the results. This process is repeated until the bit width ofthe smaller group is
  • Fig. 7d shows a compression record
  • the result in this case is 66 bits. While that size causes the present invention to
  • the present invention may be able to save an additional bit by listing the larger field width before the smaller field width. For example, if
  • the Mode Compression method searches through an array for the value most
  • a reading may be taken every hour, resulting in precisely 60 minutes between readings.
  • the present invention identifies the value ofthe data, strips the top bit, and transmits the value one time instead of once for each element.
  • the number of array elements is either a fixed number or specified at the beginning
  • Mode Compression adds an outlier map and the values ofthe outlier elements
  • Mode Compression is capable of reducing the data file by the number of bits necessary to store all data elements minus one (the one that must be transmitted) less the bits necessary to define the compression. Decompression occurs simply by copying the saved
  • the Delta From Base compression method finds the lowest value in the array.
  • This value is called the base and is subtracted from all the array elements to create deltas.
  • Delta From Base yield particularly good compression when the base is significantly greater than the largest delta. For example, suppose an array had values of 6000, 6003, 6015, and
  • the present invention can send the
  • Delta From Base employs a number of strategies for adjusting the base, specifying the deltas and determining the outliers to minimize the overall compression record
  • Delta From Base begins with the consideration of one delta bit width and no outliers.
  • An example of an array that Delta From Base compresses efficiently is shown in
  • the base bit width, after Strip Top Bit is applied, is 8 bits.
  • the largest delta, 4022, requires 1 ' 2 bits to identify.
  • the base is transmitted one time in 8 bits and the
  • Bin size refers to the field width necessary to identify the data ofthe array
  • Delta From Base performs this analysis by starting with the largest
  • element value as the base. For example, examine the data array shown in Fig. 11a. The largest value, selected for the base, is 30803. The deltas range from 0 to 493. The largest delta thus requires 9 bits to identify. The greatest compression ratio occurs, however, when
  • Fig. 1 lb shows the compression record for this array and this method, indicating
  • the base width requirement is 14.
  • the top bit is stripped, and the larger delta of 493 is assigned its width of 9 bits.
  • the smaller delta size is assigned a width of 4 bits.
  • This example has small bin deltas clustered at the end ofthe array, so the compression record need not include indices for all of them.
  • An outlier is defined as a delta that is too
  • Delta From Base can express the outlier as a raw value or as a value outside the delta range.
  • the goal behind removing outliers is to shrink the number of bits required to identify the remaining records that vary little from the base number, and thereby reduce the overall number of bits in the compressed file. For example, suppose that in an array of 24
  • Delta From Base adjusts the base to the new value, saves the new upper and lower delta values, and determines the field width necessary to handle the base and the
  • maximum delta would be 730 and would require 10 bits to store. If three values were
  • Delta From Base can use a separate size field of six bits to hold 33 and 39. Another approach is to subtract the smaller delta range of 33, leaving 0 and 6, which fit in only 3 bits.
  • This approach has the advantage of using fewer bits for specifying outliers, but does take an
  • Fig. 13a shows a 24-element array with a delta range from 0 to
  • the outlier value is shifted down by 512 (2 9 ) to 31, which takes 5 bits to store.
  • the adjusted base is represented only once and removed from the remaining elements.
  • the outlier element has an additional
  • deltas are the differences from the previous values and not from a base value. This compression type works particularly well for parameters that steadily ramp up/down or cycle through a sampling period, such as a tank level or temperature sensor. Difference From Last
  • the values of 6000, 6003, and 6015 have binary values of: 00010111 01110000;
  • Difference From Last may report the second and third values as 0011 and l l l l, saving the first 12 bits of each entry. Decompression requires that the first delta be added to or
  • map It consists of a number of bits equivalent to the number of data entries in the data array.
  • the first method splits the differences into a large size bin and a small size bin.
  • Fig. 15b shows the compression record for the array.
  • Fig. 16a shows an array where the initial value is 4547 and the differences range from -1131 to 994. Difference From Last will create an inflection record to identify when the differences change between positive and negative values. The inflection record specifies where the differences
  • the record contains an inflections indicator, few outliers indicator, number of outliers, index direction backwards indicator, and the outlier
  • index records. It then indicates the initial size ofthe field to hold the initial value and the
  • Fig. 17a shows an
  • Difference From Last sets all the differences to positive. Difference From Last starts with 10 bits to store 739 and then shrinks the difference size from 10 to 0 bits. In this example, the optimal size occurs at 5 bits, leaving three outliers in index
  • Ratio compression is used when the elements of an array are related
  • Processing begins by creating an array of estimated values, and then
  • the present invention searches for the optimum way to describe the deltas.
  • ratio compression For compressing natural gas flow data, ratio compression is used for three types of parameters: pressure extension, volume and energy. For each, an estimated array is
  • the estimated arrays are created in the following ways:
  • the estimated pressure extension array is calculated by taking the square
  • the estimated volume array is created by multiplying flow time by pressure
  • the pressure extension array is not included in the gas flow data record, then the estimated pressure extension array is used. If either the flow time or C array
  • the estimated energy array is simply the actual volume array.
  • the estimated arrays are created in a similar manner from
  • the present invention checks to make sure that multiplying the differential and static pressures will not result in an overflow before performing the square root, i.e., that the result will not exceed the maximum storage of 32 bits.
  • the product size is estimated by
  • log 2 (product) ⁇ log 2 (differential pressure) + log 2 (static pressure).
  • Log 2 is defined as the number of bits required to store a value. If the product were to overflow, then log 2 (product) would exceed 32 by no more than N bits where
  • N log 2 (differential pressure) + log 2 (static pressure) - 32. If the present
  • the present invention shifts down the larger ofthe two elements first until it is the smaller, and then alternates shifting down the respective elements. This
  • the present invention prevents an estimated volume overflow with
  • estimated volume pressure extension x flow time x C.
  • the present invention performs the calculation in two steps: first multiplying the pressure extension by flow time and then multiplying the result by C. Either step can result in an overflow, defined in this case as exceeding the maximum Strip Top Bit representation of a number, meaning 30 bits.
  • the product is re-written in logarithmic form as previously described and Shrink Product is used to avoid overflows. In this case, the two equations are rewritten log 2 (product) ⁇
  • log 2 (multiplicand 1) + log 2 (multiplicand2). If the product were to overflow, then log 2 (product) would exceed 30 by N, where N log 2 (multiplicand 1) + log 2 (multiplicand2) - 30,
  • the present invention will correct this with an adjusting ratio.
  • the data arrays may be amenable for Ratio compression. If an array is proportionally related to another, a simple constant multiplied to one array will provide the values of another. In this situation, the original array may be used as the estimate array. In the situation where the second array is
  • ratios may include two offsets that vary over time. In this case, the best approach is to use the
  • Ratio compression assumes that the ideal ratio is the one that minimizes the sum of deltas when specified as:
  • the present invention uses a more compact representation by limiting the mantissa to 15 bits (14 bits after
  • the ratio record can range from 1 bit to 24 bits. Allowing the mantissa to become greater than 15 bits
  • the dynamic range ofthe ratio is from 4.66 x 10 "10 to 3.28 x 10 4 .
  • the present invention calculates the initial ratio in four steps:
  • the present invention aborts further ratios processing if either array sum is 0, the calculated
  • the present invention will also take steps to prevent an overflow in summing ratio arrays. Before summing either the estimate or actual arrays, the present invention will verify that the result will not exceed the maximum storage size of 32 bits. The sum is quickly estimated as:
  • the present invention increases the exponent by N when preventing an estimated array sum
  • the table presented in Fig. 19 is a sample gas flow record with 22 volume
  • the estimated volume was created by multiplying flow time, C and the square root ofthe product of differential and static pressures, as previously
  • the present invention checks for a potential overflow in summing the actual and estimated volume arrays.
  • an overflow may exist.
  • volume sum 15016.
  • the present invention adjusts the estimated and actual volume
  • the present invention divides the actual volume sum (3936354304) by the estimated volume sum (184651) to create the mantissa (21317).
  • the present invention recognizes that the exponent is greater than 31, and that it needs make no further adjustments.
  • the present invention must then determine the compression record size corresponding to this ratio. Next, it will shrink the mantissa by one
  • the present invention saves these values.
  • the present invention treats the new ratio as a better approximation ofthe optimum ratio.
  • Fig. 21a For example, consider the pressure extension array of Fig. 21a.
  • the present invention creates an estimate array by taking the square root ofthe product ofthe differential and static pressures as shown in Fig. 21b.
  • the present invention determines that the optimum ratio has a mantissa
  • the present invention picks the lowest delta as the base and adjusts the deltas upwards. Since
  • Ratio Compression next considers dividing the deltas into two size bins. Ratio searches every possible lower bin size, which can range from 0 to one less than the largest bin size. Ratio repeats this process by using the largest delta as the base and subtracts each delta from it. If any two-size combination from either search results in an overall smaller compression record, ratio compression will save the parameters. For example, see Fig. 22a.
  • the present invention determines that the optimum ratio has a mantissa of 3 and an exponent
  • ratio crates a range of deltas from -455 to 15 as seen in Fig. 22b.
  • the present invention picks
  • delta bin size or field bit width, one bit at a time and then calculates the value ofthe difference between the top ofthe data range and the outliers. Ratio repeats this process by using the largest delta as the base and subtracting each delta from it. If any combination
  • Fig. 23a shows an
  • the present invention uses the square root ofthe product ofthe differential and static pressures for the pressure extension estimate.
  • the present invention determines that the optimum ratio has a mantissa of 1 and an exponent of 0. Since the ratio is 1 , the present invention uses a single flag to indicate a ratio of one in the
  • FIG. 23b shows an array for the same example readings as Fig. 23a with
  • Fig. 23c shows the compression record for this example. The first 5 lines ofthe record indicate that the data is compressed by ratio compression with many outliers and indicate the positions of
  • the base is indicated in the next three lines as positive 68.
  • a flag indicating direction from base for the outliers is next, bit lengths are specified, and the outlier values are
  • the present invention applies Time Compression, No Flow Record Removal, Daily Summary Compression, and One
  • the present invention can compress the
  • the present invention creates an outlier bitmap for all the data entries.
  • the first entry is flagged as an outlier. Any other entry that is not exactly 60 minutes from the preceding entry is also flagged as an outlier.
  • the full value of all outliers is stored in the compressed file. During decompression, the outlier values are
  • the invention searches the data for records for "no flow" records. No flow records exist when all readings are zero with the exception of static pressure and temperature. The invention then removes the zero elements form the
  • present invention uses the outlier method to identify which records are no flow.
  • the present invention creates a summary value of each given parameter array after compression.
  • the summary value may be the daily average or it may be a total across
  • the present invention begins by calculating the average or total of a
  • the present invention first checks for an overflow, i.e., the sum exceeds the maximum storage space of 32 bits.
  • the upper limit ofthe sum is estimated by:
  • log 2 (sum) ⁇ log 2 (largest element) + log 2 (number of elements). If the sum were to overflow, then log 2 (sum) would exceed 32 by no more than N bits
  • N log 2 (largest element) + log 2 (number of elements) - 32.
  • the present invention determines that N is greater than 0, it can prevent an overflow by shifting down each element N bits before summing and shifting the average back up by N bits.
  • the present invention sets the total to the upper limit. Otherwise, it checks if the [( ⁇ elements) x 24] portion ofthe extrapolation will exceed 32 bits by calculating
  • N log 2 (largest element) + log 2 (24) - 32. If N is greater than 0, the present invention can prevent an overflow by first shifting down ⁇
  • total ( ⁇ [ ( £ elements) / 2 N ] x 24 ⁇ / number of elements) x 2 N .
  • the present invention determines the difference with the actual summary. Depending on the size ofthe difference,
  • the present invention will create on of three types of compression records. If the difference is not
  • the small size summary compression record is used.
  • the compression record includes a small size indicator (1 bit), a negative sign indicator (1 bit) and a value for
  • the present invention can assume the size ofthe summary value in the compression record.
  • the compression record will include a flag for the summary size being other than small size, a flag to indicate the value is within the calculated range, and the summary value.
  • the compression record must add a designator for the size ofthe field

Abstract

A method for compressing small computer files through the use of reduction of values and removal of unnecessary bits in the compressed computer file. The method examines data in arrays or groups that share certain characteristics and evaluates each of a plurality of compression methods for each array. Compression methods include, without limitation, Mode Compression, Strip Zeros compression, Delta From Base compression, Difference from Last compression, and Ratio Compression. It chooses the best compression method for each array. The method also applies certain compression techniques to all data and uses varying field widths throughout to minimize the compressed file. It results in substantially reduced file sizes compared to traditional compression methods for small computer files.

Description

METHOD FOR COMPRESSION OF SMALL COMPUTER DATA FILES
FIELD OF THE INVENTION
The present invention generally relates to compression of binary data for ease of transmission between a source and receiving device, and more particularly to compression of binary data files by selecting an optimum compression method from a plurality of compression methods.
BACKGROUND OF THE INVENTION
Many data compression methods are known in the art. Most available data compression methods examine the frequency of data character strings and assign variable
length codes to represent that data. More frequently appearing and longer strings receive
shorter codes and rare byte strings receive longer codes. When the resulting compressed file
is prepared for transmission, it must include a definition ofthe codes used in the compression.
These compression programs have two major faults when used to compress small files. First,
the definition ofthe codes is large relative to the data to be compressed, and second, when data changes slightly from record to record, few repetitive byte values can be assigned compact codes to represent them.
The present invention, although useful for many applications, was originally designed for the purpose of reducing file sizes of data recorded at remote locations of natural
gas flow, such as where the natural gas flowed from the well into a pipeline. Many natural
gas wells are located in places that are difficult to reach. Prior to the development ofthe present invention, data was recorded and stored on site. Approximately once a month, a
person would visit the site with the resulting expenses in time and travel costs and download the data into a laptop computer. One method to avoid the expense of such a process was to transmit the data from the remote site to a central location. However, standard telephone lines are not typically run to these sites, and the cost of installing a telephone line or similar communication
network line is prohibitive. The introduction of satellite communication systems provided
another alternative for data transmission.
However, even the satellite communication option proved to be prohibitively
expensive. The present invention has reduced the size ofthe file needed to be transmitted to a
point where it is more economical to use satellite communication than to visit each site.
Traditional compression methods, such as PKZip®, examine the frequency of individual bytes of data and assign variable length codes to represent that data, with
frequently appearing byte strings receiving shorter codes than more rare byte strings. In contrast, the present invention eliminates unnecessary bits and makes use ofthe principle that
smaller numbers are identified by fewer bits than larger numbers.
The prior art teaches few compression methods that are optimized for small
files composed of data arrays. In most applications, reducing a file of 900 bytes to 200 bytes
represents a substantial cost savings over other small file compression methods.
SUMMARY OF THE INVENTION
Computers store and use numbers in binary format (base 2), instead ofthe more familiar base 10 format. Computers typically use a 16-bit or 32-bit segment of binary
numbers, regardless ofthe number's actual value. Binary digits, or bits, may have a value of
either 0 or 1. In a 16-bit computer system, 1024 (base 10) is expressed as 00000100 00000000 (base 2 or binary). 1023 is expressed as 00000011 11111111, and uses one less meaningful bit than the value of 1024. Just as in base 10 notation, the zeros included in the 16-bit number before the first digit that is not zero do not have any numerical significance.
In other words, just as in base 10 format 000001023 is identical to 1023, in binary format, 00000011 11111111 is identical to 11 11111111. The present invention uses several methods
to compress the number of bits used to store data entries, either by removing unnecessary bits or by designating the number in a fashion that requires fewer bits.
The present invention examines data arrays found in gas flow records and
evaluates, for example, five different compression algorithms for each data array. The best algorithm choice depends upon the underlying characteristics ofthe array. A data sample
might include time, differential pressure, static pressure, temperature, flow time, C , pressure
extension, volume and energy data values. After these measurements are arranged in arrays
by the present invention, each of five algorithms is applied to each array and the method is chosen for each array that allows maximum compression. Each compression method makes
use of a few compression techniques that further reduce the size ofthe compressed file.
The present invention uses dynamic data field sizing throughout the
compressed file to identify data values. If four bits are sufficient to positively identify a
value, the present invention will use only four bits instead ofthe 16 bits or 32 bits used in the standard computer format. Many ofthe compression methods and techniques are in
themselves novel, such as Ratio Compression, Strip Top Bit, and Outlier Compression. The
present invention also includes novel methods for specifying time and summary data.
Some data arrays, such as flow time, may include many repetitions ofthe same
number. The present invention searches for the most frequently repeated value in the data
array, or the mode ofthe array. The present invention will include the mode only one time in the compression record. If the array has variations from the mode, the variations are treated as outliers and treated individually. The present invention identifies the position of outlier
values and their values in the compressed file. This compression method is called Mode compression. Some data arrays, such as static pressure or C readings, may change by small
increments in relation to the value ofthe reading. The present invention will search for the
smallest value in the array and include that value only once in the compressed file. It will then subtract that value from the remaining values and include the difference, or delta, in the compressed file. This compression method is called Delta From Base.
Some data arrays, such as temperature readings or storage tank level readings, increase or decrease regularly by relatively small amounts. The present invention will include
the first reading in the compressed file. It will then subtract that reading from the subsequent
reading and include that difference in the compressed file. This compression method is called
Difference From Last.
Some data arrays are proportional to another array or a combination of other
arrays in the data file. In the gas flow application, volume is proportional to C, pressure
extension, and flow time. Energy is proportional to differential pressure and static pressure.
The present invention creates an array of estimated values and finds a ratio that when multiplied to the estimated values will minimize the range of deltas between the multiplied
values and the actual values. The ratio and the deltas are included in the compressed file. This compression method is called Ratio compression.
Some data arrays are most compactly expressed simply by stripping zeros that
are not necessary to identify the values ofthe array. The present invention examines the array
and determines the maximum number of bits necessary to identify the values included. For example, 100,000 is expressed in binary form as 00000000 00000001 10000110 10100000.
If 100,000 is the largest value ofthe array, the present invention can safely strip the first 15 bits from the binary representation, as they will be 0 for every value contained in the array.
This compression method is called Strip Zeros. Each ofthe five compression methods also makes use of other compression techniques to further reduce the size ofthe compressed file. Strip Top Bit is routinely used to remove the first bit of compressed values when it is safe to assume that the value ofthe top
bit must be 1. The outlier compression method allows the program to identify which elements of an array are "outliers" or expressed differently than the majority of other
elements. It will create an outlier bitmap that has one bit for each element ofthe array. The bit ofthe bitmap that corresponds to each outlier will have a value of 1, while all non-outlier
values will be 0. In other words, an outlier bitmap is a series of bits that "maps" which values of an array must be treated as outliers.
It is an object ofthe present invention to provide a method of compression that
substantially improves the compression ratio for small computer data files.
It is a further object ofthe invention to optimize the compression of files by
choosing from a plurality of compression methods for different sections of data that are
transmitted. It is a further object of the invention to provide a method of compressing data
by identifying an array of data values as a ratio of other data values.
It is a further object ofthe invention to provide a methods of removing the first
bit of an array of data entries when that bit may be assumed to be there.
It is a further object ofthe invention to optimize the compression file for each compression method by identifying and compressing outliers separately, thereby reducing the
record size for the remaining elements.
It is a further object ofthe invention to provide a method for compressing the
average or sum value of an array. BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a top-level processing diagram, or flowchart, showing the process of the present invention;
Fig. 2a is a table illustrating a sample data array; Fig. 2b is a compression record created for the data contained in Fig. 2a;
Fig. 3a is a table illustrating a sample data array;
Fig. 3b is a compression record created for the data contained in Fig. 3a;
Fig. 4 is a compression record illustrating the Strip Top Bit compression
method; Fig. 5 is a top-level processing diagram, or flowchart, showing the process of
the Strip Zeros compression method;
Fig. 6a is a table illustrating a sample data array;
Fig. 6b is a compression record created for the data contained in Fig. 6a; Fig. 7a is a table illustrating a sample data array with binary values; Fig. 7b is a compression record created for the data contained in Fig. 7a;
Fig. 7c is another compression record for the data contained in Fig. 7a;
Fig. 7d is another compression record for the data contained in Fig. 7a;
Fig. 8 is a top-level processing diagram, or flowchart, showing the process of
the Mode compression method; Fig. 9 is a top-level processing diagram, or flowchart, showing the process of
the Delta From Base compression method;
Fig. 10a is a table illustrating a sample data array; Fig. 10b is a compression record for the data contained in Fig. 10a;
Fig. 1 la is a table illustrating a sample data array; Fig. 1 lb is a compression record for the data contained in Fig. 1 la; Fig. 12a is a table illustrating a sample data array;
Fig. 12b is a compression record for the data contained in Fig. 12a;
Fig. 13a is a table illustrating a sample data array;
Fig. 13b is a compression record for the data contained in Fig. 13a; Fig. 14 is a top-level processing diagram, or flowchart, showing the process of the Difference From Last compression method;
Fig. 15a is a table illustrating a sample data array;
Fig. 15b is a compression record for the data contained in Fig. 15a;
Fig. 16a is a table illustrating a sample data array; Fig. 16b is a compression record for the data contained in Fig. 16a;
Fig. 17a a table illustrating a sample data array;
Fig. 17b a compression record for the data contained in Fig.17a;
Fig. 18 is a top-level processing diagram, or flowchart, showing the process of
the Ratio compression method;
Fig. 19 is a table illustrating a sample data array;
Figs. 20a - 20d are tables showing iterations ofthe mantissa determination of
the Ratio compression method;
Fig. 21a is a table illustrating a sample data array;
Fig. 21b is a table showing estimate data used by the Ratio compression
method;
Fig. 21c is a table showing deltas determined by the Ratio compression
method;
Fig. 21d is a compression record for the data contained in Fig. 21a;
Fig. 22a is a table illustrating a sample data array; Fig. 22b is a table showing estimates determined by the Ratio compression method;
Fig. 22c is a compression record for the data contained in Fig. 22a;
Fig. 23a is a table illustrating a sample data array;
Fig. 23b is a table illustrating an intermediate step ofthe Ratio compression
method; and
Fig. 23c is a compression record for the data contained in Fig. 23a.
DETAILED DESCRIPTION OF THE INVENTION
The present invention was designed to compress the binary representation of numbers. It performs its compression by removing unnecessary bits from binary
representations and designating numbers in more compact fashion. Computer systems use
16-bit or 32-bit binary values to designate values, regardless of how many bits are actually
required. In a 16-bit system, 1024 is designated as 00000100 00000000. Under normal
circumstances, computer systems will always use all 16 bits regardless of whether they are necessary. One way the present invention will reduce the amount of bits required to store the
number is to remove the 0's preceding the first occurring value of 1. Although computers keep track of values in the 16-bit or 32-bit format, the preceding 0's are not necessary to
designate the value ofthe number in the same way that 0's preceding base 10 numbers don't
mean anything. For example, in base 10, 00000576 identifies the same value as 576. Likewise, 00000100 00000000 in binary format identifies the same value as 100 00000000.
The number of bits necessary to hold a particular base 10 number is identified
by the equation N bits = 2N - 1. In other words, 4 bits can hold a maximum value of 24 - 1 or 15. 15 is represented in binary form as l l l l. Similarly, in base 10, the maximum value of N
digits is expressed by the equation N digits = 10N - 1. 4 digits in base 10 can hold a maximum value of 104 - 1 or 9999. As discussed more fully below, the present invention
reduces the values of some data entries so that those entries may be designated by fewer bits. Referring to Fig. 1 , a flowchart illustrates the basic steps included within the present invention. Raw data is input. In the gas flow application, the present invention immediately compresses time data (time of day that readings were taken) and removes "no
flow" data. Time data readings are typically one hour apart at the top ofthe hour. Under
these circumstances, the present invention will only send the original time value. If variations
occur, the outlier compression method is used, as discussed below. In the gas flow data
application, records where flow did not occur are identified when all readings are 0 except for temperature and static temperature. These records are removed from the data to be compressed, and indices for which records are no flow are included in the compressed file.
After compressing the time records and removing the no flow records, each of five algorithms are applied to each data array and the best method is selected. The five
different compression methods are:(l) Strip Zeros; (2) Mode; (3) Delta From Base; (4)
Difference From Last; and (5) Ratio Compression. In simple terms, Strip Zeros compression
finds the largest value in an array and determines how many bits are required to hold that
value. The present invention then saves the data in data fields limited to that number of bits,
eliminating the string of zeros customarily in front ofthe actual data. Mode Compression
searches every parameter array for the value most frequently repeated, or the mode value. For example, when flow time is consistently 60 minutes, Mode Compression will store one value along with an instruction to repeat that value in each field ofthe array. Any deviations from the mode are identified with the Outlier Compression. If there are no exceptions, the present invention will only save the beginning time value and need not include the delta standard
value. The Delta From Base method finds the lowest value in the array and subtracts that
number from all array values. If the remainders are small relative to the base, then the compressed output containing the base and the deltas will be much smaller than the original
array. The Difference From Last method is similar to Delta From Base, but the deltas are the differences from the previous values instead of from a base number. This method works particularly well for parameters that change relatively uniformly over time, such as temperature and tank levels. Finally the Ratio Compression method is used when the
elements of an array are related proportionally to a data set that can be created from other arrays already included in the compression record. The present invention finally performs a
daily summary compression.
Most ofthe compression algorithms ofthe present invention use two
techniques to compress the data included in the compressed file: Outlier Compression and
Strip Top Bit. Each technique minimizes the number of bits required to identify the data to
be compressed.
Outlier Compression
Outlier values are those in data arrays that require unique identification, allowing the remainder ofthe elements to be expressed in a more compact format. For example, Fig. 2a shows an array of 24 data entries having 21 identical entries and three
varying entries, the varying entries being the first, third, and fourth locations. The three
varying entries are considered outliers. A bitmap indicating which entries are outliers,
including one bit for each data entry, appears as 10110000 00000000 00000000. The outlier
bitmap indicates with a 1 in the first, third, and fourth locations ofthe 24-element bitmap that the first, third, and fourth data entries are outliers. If the present invention were to use this configuration, it would require 24 bits to specify which elements were outliers, plus an additional bit to specify that more than one outlier exists. The value of the 21 identical entries is included only once. Although the outlier values must be stored individually, the
total amount of data stored is significantly reduced. When just a few data values are outliers and the array contains seven or more
elements, such as the array shown in Fig. 2a, the present invention will use a second outlier method for identification. See Fig. 2b. Fig. 2b shows a compression record where that same identification is accomplished with only 13 bits. As shown in Fig. 2b, the first bit indicates
that it is a "few outliers" type data array. The next four bits indicate the number of outliers.
The number of outliers indicator is safely reduced by one because at least one outlier must
exist to invoke this record. The next bit indicates whether the index is recorded in reverse order from the actual order ofthe data array. In situations where the outliers are towards the
front ofthe data array, subsequent indices are specified with fewer bits when expressed in the backwards direction. The present invention examines both directions ofa data array and
chooses the direction that results in the smallest resulting data file. If only one outlier exists,
the direction indicator is not used as it would be meaningless. The first outlier index in this
example is 3. Five bits are used to store this index because 5 bits are sufficient to indicate any number between 0 (000002) and the 23 indices of this example (101112). The next outlier must be at least one less than the first index, or 2 in this example. Therefore, the position can
be indicated in only two bits. The final index can only be one less than 2, so the final index
(0) is stored with one bit. Therefore, 13 bits identify the outliers in place ofthe 24-bit outlier
bitmap.
When outliers are congregated at one end of an array, the outlier method can
further compress the record. For example, Fig. 3a shows a 24-element data array where
elements 15, 17 and 20 - 24 (indicated by index numbers 14, 16 and 19 - 23) are outliers. Because five outliers are clustered at the end ofthe array, the last four positions do not have
to be indicated individually. During the decompression process, the present invention keeps track of how many outlier positions remain to be determined. When that number matches the remaining number of indices, the remaining indices must be outliers and do not need indicators. The compression record for Fig. 3a is shown in Fig. 3b. Strip Top Bit
The present invention uses varying length data fields to identify data entries
throughout its process. When the data files are decompressed, the invention must know the lengths ofthe fields. The present invention will size the fields to match the largest size
necessary to convey the required information. For example, assume that Delta From Base
was used to compress an array. If the deltas are 17, 0, 23, 45, 3, 8 and 29, the present
invention will set the data field to 6 bits - the number of bits necessary to identify the number 45 in binary form.
In some situations, the present invention may determine that the first bit of every data entry is 1. For example, in a Delta From Base array, the base number might be
2000. The binary representation of 2000 is 11111010000. The initial "1" of that binary string
represents a value of 1024. If the greatest Delta From Base value is less than 976, the present
invention knows that all values ofthe first bit for each data entry are "1" and will remove the
top bit from the compressed file. When the file is decompressed, that "1" is added back in. This method saves one bit for every entry after the base entry less bits needed to define the
modification. A compression record illustrating this example is shown in Fig. 4.
Sometimes, Strip Top Bit must deal with 0 as an element. It does this by setting the field width value to maximum. For example, when compressing an array with a maximum value of 230 - 1, five bits are necessary to specify the field width. If the largest value is in fact 230 - 1, then the file size will be stored as 11110. To express 0, the field width will be set to the maximum size or 11111. In other words, 0 is specified as 2N - 1 , where N is
the size ofthe filed width value. Data Input Limits
The present invention, in its optimal form, works only with positive integer values. The upper limit is bound by Strip Top Bit, which uses the value 31 to specify
subsequent data values of 0. This number limits bit widths for data to 30, which has a maximum decimal amount of 1,073,741,823. To ensure error free execution, the present invention may adjust the raw data to remain below this maximum number.
For example, the present invention adds an offset to all temperature data to make certain that the data consists only of positive integers. By default, all temperature
readings have 40 degrees added to them. Of course, the actual value ofthe increase will depend on the accuracy ofthe reading. If temperature is measured in lOOths ofa degree, the
present invention will multiply the readings by 100 to turn the fractions into integers. The 40-
degree offset becomes 4000 under these circumstances. This offset is subtracted when the data is decompressed.
After adding the temperature offset, the present invention checks all the data
values to verify that they fall within the acceptable data range. If they do not, the present
invention either raises them to 0 (if negative) or lowers them to 1,073,741,823 (the maximum
value the present invention can handle). For example, if temperature us -5000, adding the
offset results in -1000, which is less than 0. 0 is substituted for -1000.
It should be noted that the present invention is easily modified to
accommodate negative numbers or a mix of positive and negative numbers. The offset
technique applied to temperature reasons is easily expanded to address negative data entries. The present invention could also address negative and positive numbers by adding a sign entry where necessary, either as an inflection indicator or a sign indicator for each data entry.
Strip Zeros The first compression method analyzed by the present invention is Strip Zeros.
When a typical number is stored in binary format, many ofthe binary digits at the beginning ofthe number are zeros. For example, 100,000 is expressed in binary format as:
00000000000000011000011010100000.
The present invention recognizes that 15 of these bytes are not necessary and only the last 17
bits are required to identify the value that needs to be stored, resulting in:
1 10000110 10100000.
In this example, Strip Zeros saves 15 bits for a single data entry less the bits necessary to
define the data field.
Strip Zeros evaluates four distinct methods of compressing arrays of numbers
and selects the most compact method. The four methods are: (1) uniform field size; (2) single
field size; (3) individual field size; and (4) two field sizes. When Strip Zeros is selected as
the most efficient compression method by the present invention, only one ofthe four methods is used. A flow chart showing the logic behind the choice is demonstrated in Fig. 5.
In the uniform method field size, the present invention determines that all array elements are exactly N bits in size. Only N - 1 bits are stored because the present
invention knows the first bit is 1 and applies Strip Top Bit. Fig. 6a shows a 6-index array
including the elements, their binary value, and the stripped binary value element field size in
bits. In this example, a value of 64 (26) was stripped from the element values. Fig. 6b shows the compression record for that array, demonstrating how the uniform size indicator ofthe
Strip Zeros compression method was used and how each element was identified with 6 bits.
In this example, each data entry was reduced by a size of 5 bits.
The single field size Strip Zeros compression method is used when the uniform field size does not apply. In other words, some values will require different size fields than others. The bit width ofthe storage field is selected to be the bit width necessary to describe the largest element. Any preceding zeros are stripped from the values, but the top
bit is not stripped. Elements that require smaller field widths retain zeros in the beginning bit positions as shown in Fig. 7a.
The individual field sizes Strip Zeros compression method is most effective
when a large variation of field sizes exists. Storing the individual field sizes may be more compact than setting the width to accommodate the largest element value. The present
invention will evaluate both methods and choose the smallest result. The individual field sizes method can take advantage ofthe Strip Top Bit technique since each element is separately compressed. Before each element in the compression record is a field size
specifying how many bits are required to hold the element (without the top bit). Elsewhere throughout the present invention, this field will always be 5 bits wide. However, in the
individual field sizes method, the maximum size ofthe field can be predetermined. In the
example illustrated in Fig. 7c, only 4 bits are needed to identify the maximum field width of
10, which is the number of bits required to store 1025 after the top bit has been removed. The
present invention subtracts 2 from the number of bits to hold the field width of 10 (which is 4 bits) and includes it as the maximum size indicator (2 = 4 - 2). Two can safely be subtracted
from the maximum field size because the cases of 0 and 1 correspond to arrays whose largest elements are 0 and 1, respectively. In these two cases, other Strip Zeros methods are found to
be more compact. The example shown in Fig. 7a demonstrates that the individual field sizes
method can save storage space. The original record was 76 bits, while the compressed record
is 56 bits.
The two field sizes Strip Zeros compression method divides the array elements into two groups. The present invention starts with the first group being limited to data values
that require the smaller field widths to identify, while the larger bin holds all other data
entries. For example, the present invention may determine that the smallest data value requires 4 bits to identify. All data values that can be identified in 4 bits will belong in the
small size group. The size ofthe compressed file is calculated. Strip Zeros next increases the bit width ofthe smaller group by one, so that the small group includes the previous data entries as well as all entries identifiable with one more bit. The present invention then recalculates the results. This process is repeated until the bit width ofthe smaller group is
one less than the bit width ofthe larger group. If any result is the smallest overall
compression, the present invention chooses that method. Fig. 7d shows a compression record
using the two field sizes method for the data array shown in Fig. 7a in the individual field size
method. The result in this case is 66 bits. While that size causes the present invention to
choose the single field size method over the two field sizes method for this array, it shows a compression of 10 bits. In some circumstances, the present invention may be able to save an additional bit by listing the larger field width before the smaller field width. For example, if
the large group requires 11 bits, the smaller size must be a bit width of 10 or less, expressed 10102 and requiring only 4 bits to store (a savings of 1 bit). Simply reversing the process
decompresses data entries that are compressed by Strip Zeros compression.
Mode Compression
The Mode Compression method searches through an array for the value most
frequently repeated - the mode ofthe array. It stores this value in the compression record only once, and treats all other values as outliers. The Mode Compression method yields
excellent compression for parameters that are relatively constant such as the flow time
parameter in the gas flow data record. For example, a reading may be taken every hour, resulting in precisely 60 minutes between readings. The present invention identifies the value ofthe data, strips the top bit, and transmits the value one time instead of once for each element. The number of array elements is either a fixed number or specified at the beginning
ofthe compressed file, so the decompression is easily accomplished. If outliers exist in the array, Mode Compression adds an outlier map and the values ofthe outlier elements, and
applies the Strip Top Bit method. The logic behind Mode Compression is shown in Fig. 8. Mode compression is capable of reducing the data file by the number of bits necessary to store all data elements minus one (the one that must be transmitted) less the bits necessary to define the compression. Decompression occurs simply by copying the saved
data value in each entry ofthe array.
Delta From Base
The Delta From Base compression method finds the lowest value in the array.
This value is called the base and is subtracted from all the array elements to create deltas.
Delta From Base yield particularly good compression when the base is significantly greater than the largest delta. For example, suppose an array had values of 6000, 6003, 6015, and
6008. The binary value of those numbers is: 00010111 01110000;
00010111 01110011;
00010111 Ol l l l l l lj and
00010111 01111000.
The first 12 bits of each of these numbers is identical. The present invention can send the
base number one time, and follow it with the deltas from the base. In this example, the deltas
are all 4 designated in 4 bits, resulting in a 12 bit savings per data entry, less the bits
necessary to define the compression. Decompression occurs by translating the base number
(restoring the top bit, if necessary) and adding each delta to that base for individual data
entries.
Delta From Base employs a number of strategies for adjusting the base, specifying the deltas and determining the outliers to minimize the overall compression record
size, such as using a single delta size, two delta sizes, and removal of outliers.. The logic flow of Delta From Base can be seen in Fig. 9. Delta From Base considers several different
methods of compression, chooses the best one, and then compares it to the results of other compression methods.
Delta From Base begins with the consideration of one delta bit width and no outliers. An example of an array that Delta From Base compresses efficiently is shown in
Fig. 10a, with a resulting compression record shown in Fig. 10b. When applying the no
outliers Delta From Base method to the array produced in Fig. 10a, the base is 347 and the
largest delta is 4022. The base bit width, after Strip Top Bit is applied, is 8 bits. The largest delta, 4022, requires 1'2 bits to identify. The base is transmitted one time in 8 bits and the
remaining deltas are each transmitted in 12 bits.
Delta From Base next considers dividing the deltas into two separate groups,
or two size bins. Bin size refers to the field width necessary to identify the data ofthe array,
with the large bin size sufficient to identify the largest value stored. Delta From Base
examines every smaller bin size, ranging from 0 to one less than the largest bin size. If any
combination results in an overall smaller compression record size, Delta From Base saves those record parameters. Delta From Base performs this analysis by starting with the largest
element value as the base. For example, examine the data array shown in Fig. 11a. The largest value, selected for the base, is 30803. The deltas range from 0 to 493. The largest delta thus requires 9 bits to identify. The greatest compression ratio occurs, however, when
the deltas are split into two groups, with the smaller deltas assigned 4 bits, or a maximum value of 15. Fig. 1 lb shows the compression record for this array and this method, indicating
that the base width requirement is 14. The top bit is stripped, and the larger delta of 493 is assigned its width of 9 bits. The smaller delta size is assigned a width of 4 bits. Direction
and lower size indicators are presented, and the deltas follow. This example has small bin deltas clustered at the end ofthe array, so the compression record need not include indices for all of them.
After Delta From Base has searched for the best no outlier representation, it then searches for the best outlier representation. An outlier is defined as a delta that is too
large to be included compactly in the compression record. Delta From Base can express the outlier as a raw value or as a value outside the delta range.
The goal behind removing outliers is to shrink the number of bits required to identify the remaining records that vary little from the base number, and thereby reduce the overall number of bits in the compressed file. For example, suppose that in an array of 24
elements, 23 elements required 4 bits to store and one element required 5 bits to store. By keeping the delta size within 4 bits, 23 bits were saved. If the number of bits required to identify the outlier is less than 23, a net compression has occurred. Delta From Base
examines the elements in three ways: 1) it removes elements from largest to smallest; 2) it
removes elements from smallest to largest, and 3) removes elements from top and bottom in
an optimized removal process towards the center of the element values. If a more compact
record is found, Delta From Base adjusts the base to the new value, saves the new upper and lower delta values, and determines the field width necessary to handle the base and the
outliers.
Consider the array shown in Fig. 12a. If the base value were set to 28, the
maximum delta would be 730 and would require 10 bits to store. If three values were
excluded and the base was adjusted to 706, the maximum delta is 52, requiring only 6 bits to store. The number of saved bits is 4 per element, or 76 in this example. As the identification ofthe outliers will take 19 bits, the record is compressed by an additional 57 bits. The compression record is demonstrated in Fig. 12b. Delta From Base - Outliers/Delta From Range considers specifying outliers as the distance form the record's delta range. For example, consider an array containing the
elements 0, 23, 15, 4, 31, 19, 33, 39. The deltas range from 0 to 39, requiring 6 bits to store. If the delta fields were reduced by 1 to 5 bits each, all ofthe values would fit except for 33
and 39. Delta From Base can use a separate size field of six bits to hold 33 and 39. Another approach is to subtract the smaller delta range of 33, leaving 0 and 6, which fit in only 3 bits.
This approach has the advantage of using fewer bits for specifying outliers, but does take an
extra five bits to specify the outlier range as compared to using raw values described earlier. If the outlier deltas are in the negative direction, then the absolute values are stored. Delta
From Base uses a sign flag to indicate whether the outliers extend towards the top ofthe range or towards the bottom. Fig. 13a shows a 24-element array with a delta range from 0 to
543, taking a field of 10 bits to store. Delta From Base determines that removing the
maximum delta, or 543, allows the remaining deltas to fit in 9 bit fields. The outlier value is shifted down by 512 (29) to 31, which takes 5 bits to store. The adjusted base is represented only once and removed from the remaining elements. The outlier element has an additional
512 removed from it. The remaining values are the deltas from base. The compression record
for this example is shown in Fig. 13b.
Difference From Last
Difference From Last compression is similar to Delta From Base, but the
deltas are the differences from the previous values and not from a base value. This compression type works particularly well for parameters that steadily ramp up/down or cycle through a sampling period, such as a tank level or temperature sensor. Difference From Last
begins by subtracting each element from its subsequent element to determine the differences.
It then identifies if and where there are inflections, or changes in the signs ofthe differences. For example, the values of 6000, 6003, and 6015 have binary values of: 00010111 01110000;
00010111 01110011; and 00010111 01111111.
Difference From Last may report the second and third values as 0011 and l l l l, saving the first 12 bits of each entry. Decompression requires that the first delta be added to or
subtracted from the number before it, and the next delta be added to or subtracted from the result ofthe first delta calculation, and so on. If there is a change in sign ofthe deltas, then either an inflection map must be included in the compression record or else the values that
change inflection must be treated as outliers. An inflection map is similar to an outlier bit-
map. It consists of a number of bits equivalent to the number of data entries in the data array.
The bit that corresponds to each entry that changes inflection from the previous entry is stored
as 1 while other bits are stored as 0.
Difference From Last employs two methods for optimizing the compression
record size. The first method splits the differences into a large size bin and a small size bin.
It next finds the optimum lower bin size in a fashion similar to that previously discussed. The second method shrinks the difference size and adds outliers to those removed because of inflection. The logic flow ofthe Difference From Last method can be seen in Fig. 14.
Difference From Last begins by considering the simplest case of just one
difference size and no outliers. An inflection record is included if necessary. Fig. 15a shows
a 24-element array where the initial value is 116. The differences range from 10 to 19
without inflections. Fig. 15b shows the compression record for the array. The compression
record indicates no inflections, no outliers, change in the positive direction, a field size, a base
size, and the deltas in order.
Difference From Last next considers dividing the differences into two size
bins. Difference From Last searches every possible lower bin size, which can range from 0 to one less than the largest bin size. If any two-size combination results in an overall smaller
compression record size, then Difference From Last saves the record parameters. Fig. 16a shows an array where the initial value is 4547 and the differences range from -1131 to 994. Difference From Last will create an inflection record to identify when the differences change between positive and negative values. The inflection record specifies where the differences
change signs, i.e., index numbers 2, 3, 7 and 15. The difference array is then converted to all positive values. Next, Difference From Last determines that the larger bin size must be 11
bits wide. It calculates the resulting final output for smaller bin sizes of 0 to 10 bits. In this example, the most advantageous example occurs when the smaller bin size is 8 bits. Fig. 16b
shows the compression record for this example. The record contains an inflections indicator, few outliers indicator, number of outliers, index direction backwards indicator, and the outlier
index records. It then indicates the initial size ofthe field to hold the initial value and the
initial value itself. Two different size bins are then indicated with more than one outlier. The record then includes the larger and smaller difference field sizes, followed by the values of
the differences as well as a bit map which indicates which data entries are small bin entries
and which bin entries are large bin entries.
Finally, Difference From Last will consider designating specific differences as
outliers. Difference From Last splits the differences based upon sign ofthe value. Those
values whose sign (positive or negative) is occurs less frequently are considered outliers. The remaining differences are set to positive values and the difference size is based upon the largest non-outlier difference. Difference From Last then shrinks the difference size, designating the now excluded values as outliers. If a particular difference size results in a
smaller compression record, Difference From Last saves the parameters. Fig. 17a shows an
example array where the initial value is 1048 and the differences range from -1 to -739. There are no inflections. Difference From Last sets all the differences to positive. Difference From Last starts with 10 bits to store 739 and then shrinks the difference size from 10 to 0 bits. In this example, the optimal size occurs at 5 bits, leaving three outliers in index
positions 1, 3 and 4. The compression record is shown in Fig. 17b. Decompression of any Difference From Last compression method is achieved by reversing the compression steps
taken.
Ratio Compression
Ratio compression is used when the elements of an array are related
proportionally to a data set that can be created from other arrays already included in the
compressed record. Processing begins by creating an array of estimated values, and then
finding a ratio that minimizes the range of deltas between the estimated and actual values.
The present invention then searches for the optimum way to describe the deltas. The logic
flow of ratio compression is shown in Fig. 18.
For compressing natural gas flow data, ratio compression is used for three types of parameters: pressure extension, volume and energy. For each, an estimated array is
created from the arrays that have already been compressed, and the deltas from the actual arrays are then calculated. If the deltas are small, then a compression output record is created.
The estimated arrays are created in the following ways:
1. The estimated pressure extension array is calculated by taking the square
root ofthe product ofthe differential and static pressures. 2. The estimated volume array is created by multiplying flow time by pressure
extension and then by C. If the pressure extension array is not included in the gas flow data record, then the estimated pressure extension array is used. If either the flow time or C array
is not included, then they are assumed to be the value of 1.
3. The estimated energy array is simply the actual volume array. For non-gas flow applications, the estimated arrays are created in a similar manner from
arrays included in the compression record.
The present invention checks to make sure that multiplying the differential and static pressures will not result in an overflow before performing the square root, i.e., that the result will not exceed the maximum storage of 32 bits. The product size is estimated by
rewriting the product calculation in logarithms: log2 (product) ≤ log2 (differential pressure) + log2 (static pressure). Log2 is defined as the number of bits required to store a value. If the product were to overflow, then log2 (product) would exceed 32 by no more than N bits where
N is calculated as N = log2 (differential pressure) + log2 (static pressure) - 32. If the present
invention determines that N is greater than 0, it will prevent an overflow by first shifting
down both elements by a total of N bits before multiplying them, where N is first rounded up
to be an even number. The present invention shifts down the larger ofthe two elements first until it is the smaller, and then alternates shifting down the respective elements. This
technique is called "Shrink Product". Shrink Product then shifts the square root back up by N/2 bits. The equation for the process is:
estimate = [(differential pressure/2A) x (static pressure/2B)]° 5 x 2N/2, where N = A + B. A and B are the shift values and are increased one at a time while the larger of differential or static pressure is reduced by a factor of 2. When A + B equals N, then the shifting process
has completed.
Similarly, the present invention prevents an estimated volume overflow with
the following equation: estimated volume = pressure extension x flow time x C. The present invention performs the calculation in two steps: first multiplying the pressure extension by flow time and then multiplying the result by C. Either step can result in an overflow, defined in this case as exceeding the maximum Strip Top Bit representation of a number, meaning 30 bits. The product is re-written in logarithmic form as previously described and Shrink Product is used to avoid overflows. In this case, the two equations are rewritten log2 (product) <
log2 (multiplicand 1) + log2 (multiplicand2). If the product were to overflow, then log2 (product) would exceed 30 by N, where N = log2 (multiplicand 1) + log2 (multiplicand2) - 30,
and the result would be off by a factor of 2N1+N2. The present invention will correct this with an adjusting ratio.
For some applications ofthe present invention, the data arrays may be amenable for Ratio compression. If an array is proportionally related to another, a simple constant multiplied to one array will provide the values of another. In this situation, the original array may be used as the estimate array. In the situation where the second array is
proportional to the first with a constant offset added to the multiplication result, the estimate
is more difficult. If the offset is small compared to the results ofthe multiplication, it can be
ignored. If it is large, it may skew the ratio calculation. In this situation, the safest method is
to remove the offset before invoking ratio compression. However, this offset must be known, and would have to be included in the present invention build for the purpose intended. Some
ratios may include two offsets that vary over time. In this case, the best approach is to use the
lowest values of each array as separate offsets. The value ofthe offset must be included in the compression record.
Ratio compression assumes that the ideal ratio is the one that minimizes the sum of deltas when specified as:
∑ deltan = ∑ [actualn - (K x estimate,,)] = ∑ actual,, - K x ∑ estimate,,.
If ∑ deltan is set to equal 0, then K = ∑ actual„/∑ estimate,,. If K were a floating-point
number, then it would take 32 bits to specify in the compression record. The present invention uses a more compact representation by limiting the mantissa to 15 bits (14 bits after
Strip Top Bit) and the exponent to 5 bits. With a flag field and a mantissa size field, the ratio record can range from 1 bit to 24 bits. Allowing the mantissa to become greater than 15 bits
does not add any benefits. The product ofthe mantissa with the estimate is limited to 31 bits, so increasing the precision ofthe mantissa only reduces it for the estimate.
The exponent is essentially the number of places to shift the product to the right. Therefore, the product ofthe ratio and the estimate is the following equation:
K x estimate,, = (mantissa x estimate.) x 2"exponent.
The dynamic range ofthe ratio is from 4.66 x 10"10 to 3.28 x 104. The present invention calculates the initial ratio in four steps:
1. Calculate the sums ofthe estimate and actual arrays, while ensuring that neither sum will overflow. The exponent is initialized to 0 and adjusted to prevent an
overflow.
2. Adjust the actual array sum upward and the estimate array sum downward
to size the mantissa at 15 bits, and adjust the exponent accordingly.
3. Divide the estimate array sum into the actual array sum to create the mantissa.
4. If the exponent is greater than 31, then reduce the mantissa downward N bits to bring the exponent down to 31.
The present invention aborts further ratios processing if either array sum is 0, the calculated
mantissa is 0, or the calculated exponent is less than 0.
The present invention will also take steps to prevent an overflow in summing ratio arrays. Before summing either the estimate or actual arrays, the present invention will verify that the result will not exceed the maximum storage size of 32 bits. The sum is quickly estimated as:
Sum = ∑ elements < largest element x number of elements The equation can be written logarithmically as: log2(sum) < log2 (largest element) + log2 (number of elements)
If the sum were to overflow, than log2(sum) would exceed 32 by no more than N bits, where N is calculated as:
N = log2 (largest element) + log2 (number of elements) - 32. If the present invention determines that N is greater than 0, it can prevent an overflow by first shifting down each element N bits before summing pursuant to the equation: sum = ∑ (elements/2N).
The present invention increases the exponent by N when preventing an estimated array sum
overflow and reduces the exponent by N when preventing an actual array sun overflow.
The table presented in Fig. 19 is a sample gas flow record with 22 volume
values and estimated volume values. The estimated volume was created by multiplying flow time, C and the square root ofthe product of differential and static pressures, as previously
described. First, the present invention checks for a potential overflow in summing the actual and estimated volume arrays. In this example, an overflow may exist. The present invention
determines that adjusting all ofthe estimated volume elements downward by a factor of 22 is
necessary. After completing the calculations, the present invention determines that the actual
volume sum is 15016. Next, the present invention adjusts the estimated and actual volume
sums so that the calculated ratio will be 15 bits in size. For this example, the present
invention adjusts the estimated volume sum downward (378166712/211 = 184651) and the
actual volume sum upward (15016 x 218 = 3936354304). The exponent is increased by 1 1
and then by 18 to a new value of 31. Third, the present invention divides the actual volume sum (3936354304) by the estimated volume sum (184651) to create the mantissa (21317). Finally, the present invention recognizes that the exponent is greater than 31, and that it needs make no further adjustments. With a mantissa of 21317 and exponent of 31 , the initial ratio has the effective value of 21317 x 2"31 = 9.9265 x 10"6. After finding the initial ratio, the present invention must then determine the compression record size corresponding to this ratio. Next, it will shrink the mantissa by one
bit and reduce the exponent by one iteratively. With each iteration, the present invention
calculates what the compression record size would be for the new ratio along with 1 added to the ratio and 1 subtracted from the ratio. If a new mantissa and exponent result in a smaller
compression record, then the present invention saves these values. The present invention treats the new ratio as a better approximation ofthe optimum ratio. The present invention
then adjusts the new mantissa back up to 15 bits and the exponent upward, provided that the exponent does not exceed 31. The present invention repeats the mantissa shrinkage cycle repetitively until the compression record size no longer decreases. Tabular examples of this
process, through four iterations, are demonstrated in Figs. 20a - 20d.
When the present invention completes the mantissa shrinkage cycle, it has the
parameters necessary to create a no outlier compression record with one delta size. For
example, consider the pressure extension array of Fig. 21a. The present invention creates an estimate array by taking the square root ofthe product ofthe differential and static pressures as shown in Fig. 21b. The present invention determines that the optimum ratio has a mantissa
of 289 and an exponent of 8. This is the equivalent of 1.129. Multiplying the estimated pressure extensions by this ratio creates a range of deltas from -2 to 13, as seen in Fig. 21c.
The present invention picks the lowest delta as the base and adjusts the deltas upwards. Since
the largest delta is 15, the deltas can fit into 4 bits. The compression record for the pressure extension array is shown in Fig. 2 Id.
Ratio Compression - No Outliers/Two Delta Sizes
Ratio Compression next considers dividing the deltas into two size bins. Ratio searches every possible lower bin size, which can range from 0 to one less than the largest bin size. Ratio repeats this process by using the largest delta as the base and subtracts each delta from it. If any two-size combination from either search results in an overall smaller compression record, ratio compression will save the parameters. For example, see Fig. 22a.
Flow time, C and pressure extension values are not available, so the present invention uses
the square root ofthe product ofthe differential and static pressures for the volume estimate. The present invention determines that the optimum ratio has a mantissa of 3 and an exponent
of 2. This is the equivalent of 0.75 (3x2'2). Multiplying the estimated volume values by this
ratio crates a range of deltas from -455 to 15 as seen in Fig. 22b. The present invention picks
the highest delta of 15 and subtracts the deltas from it. The deltas now range from 0 to 470, where the larger bin is 9 bits wide. The greatest compression comes by splitting the deltas
into two bins, where the smaller bin holds deltas that are 61 or smaller (6 bits wide). The compression record is shown in Fig. 22c.
Ratio Compression - Outliers/Delta from Range
Ratio compression next considers specifying outliers as the value ofthe
record's delta range. The method is the same as that described for the Delta From Base -
Outliers/Delta From Range method previously discussed. Ratio compression shrinks the
delta bin size, or field bit width, one bit at a time and then calculates the value ofthe difference between the top ofthe data range and the outliers. Ratio repeats this process by using the largest delta as the base and subtracting each delta from it. If any combination
results in an smaller compression record, the parameters are saved. Fig. 23a shows an
example pressure extension array. The present invention uses the square root ofthe product ofthe differential and static pressures for the pressure extension estimate. The present invention determines that the optimum ratio has a mantissa of 1 and an exponent of 0. Since the ratio is 1 , the present invention uses a single flag to indicate a ratio of one in the
compression record. Fig. 23b shows an array for the same example readings as Fig. 23a with
volume, the ratio estimate, the delta, and a value for (base - delta). The present invention picked the highest delta of 187 and subtracted the other deltas from it. Using a delta size of 7 and treating all deltas below 68 as outliers provides the optimal size compression file. Outlier values range up to 993 outside the acceptable delta range, requiring 10 bits to store. Fig. 23c shows the compression record for this example. The first 5 lines ofthe record indicate that the data is compressed by ratio compression with many outliers and indicate the positions of
those outliers. The base is indicated in the next three lines as positive 68. A flag indicating direction from base for the outliers is next, bit lengths are specified, and the outlier values are
identified.
Decompression of records compressed by Ratio Compression applies the
selected ratio equation to the other decompressed values ofthe transmitted data, followed by
addition or subtraction ofthe deltas that designate the variation for each data entry from the
ratio. Other Compression Methods
In addition to the five core compression methods, the present invention applies Time Compression, No Flow Record Removal, Daily Summary Compression, and One
Record Compression to source files. Readings are typically taken once an hour for the
purposes that the present invention was designed. The present invention can compress the
time array very compactly if every element is taken at the top ofthe hour and every
subsequent reading is one hour later. Under these circumstances, the present invention
identifies that there are no outliers and specifies the first time reading by hour after midnight.
If either of these conditions is not met, an outlier record is created.
If any element in the time array violates either condition (i.e., not at the top of
the hour exactly one hour after the previous reading), the present invention creates an outlier bitmap for all the data entries. The first entry is flagged as an outlier. Any other entry that is not exactly 60 minutes from the preceding entry is also flagged as an outlier. The full value of all outliers is stored in the compressed file. During decompression, the outlier values are
placed in the time array, and all other values are determined by adding 60 minutes to the preceding data value.
After the time array is compressed, the invention searches the data for records for "no flow" records. No flow records exist when all readings are zero with the exception of static pressure and temperature. The invention then removes the zero elements form the
differential pressure, flow time, C, pressure extension, volume and energy arrays. The records remaining for compression are reduced by the number of no flow records. The
present invention uses the outlier method to identify which records are no flow.
The present invention creates a summary value of each given parameter array after compression. The summary value may be the daily average or it may be a total across
24 hours of records. The present invention begins by calculating the average or total of a
parameter array and then subtracts the estimated value from the actual value. A number of summation rules are used. To estimate the summary average, the present invention first checks for an overflow, i.e., the sum exceeds the maximum storage space of 32 bits. The upper limit ofthe sum is estimated by:
sum = ∑ elements < largest element x number of elements. The equation is rewritten as:
log2 (sum) < log2 (largest element) + log2 (number of elements). If the sum were to overflow, then log2 (sum) would exceed 32 by no more than N bits where
N = log2 (largest element) + log2 (number of elements) - 32.
If the present invention determines that N is greater than 0, it can prevent an overflow by shifting down each element N bits before summing and shifting the average back up by N bits. The equation employed is: average = {[ ∑ (elements / 2N) ] / number of elements} x 2N. To estimate the summary total, the present invention will check during the
summation whether the total exceeds the upper limit of 31 bits. If the limit is exceeded, the
present invention will set the total to the upper limit. Next, it extrapolates the total upwards if there are fewer than 24 records. The revised total is calculated by: total = [( ∑ elements) x 24] / number of elements.
The equation is rewritten as: log2 (total) < log2 ( ∑ elements) + log2 (24 / number of elements).
If the log2 (total) can equal or exceed 32, then the present invention sets the total to the upper limit. Otherwise, it checks if the [( ∑ elements) x 24] portion ofthe extrapolation will exceed 32 bits by calculating
N = log2 (largest element) + log2 (24) - 32. If N is greater than 0, the present invention can prevent an overflow by first shifting down ∑
(elements) by N bits before dividing by (number of elements), then shifting the total back up
by N bits. The equation used is: total = ({[ ( £ elements) / 2N] x 24} / number of elements) x 2N.
After creating the estimate ofthe average or total, the present invention determines the difference with the actual summary. Depending on the size ofthe difference,
the present invention will create on of three types of compression records. If the difference is
between -8 and +7, the small size summary compression record is used. The compression record includes a small size indicator (1 bit), a negative sign indicator (1 bit) and a value for
the difference (3 bits).
If the difference is outside the -8 to +7 range, the actual summary value must be included in the compression record. If an average summary value is less than or equal to
the size ofthe largest element in the parameter array, or if a total summary value is less than
or equal to the size ofthe total estimate, the present invention can assume the size ofthe summary value in the compression record. The compression record will include a flag for the summary size being other than small size, a flag to indicate the value is within the calculated range, and the summary value.
If the summary value cannot fit into the either ofthe two previous fields, the
present invention will create a large size compression record that includes the summary size
field. In this case, the compression record must add a designator for the size ofthe field
needed to hold the value (after Strip Top Bit is applied).
Finally, if each array in the input record has only one reading, the present
invention will put all ofthe parameters in one array instead of having an array for each type of
data reading. If daily summaries are set for that reading, they are interspersed with the readings from that hour. The composite array is then compressed by Strip Zeros.

Claims

I claim:
1. A method for compressing data files comprising the steps of: acquiring raw data entries and organizing the raw data entries into data arrays; predicting the size of a compressed file from each of a plurality of compression methods; selecting the optimum compression method for each data array;
applying the optimum compression method for each data array to that data array; and creating a computer compression record that includes identification ofthe manner in
which data was stored and the data itself of a significantly smaller size than the original data
file.
2. A method according to claim 1 wherein one ofthe plurality of compression
methods comprises identifying that each data entry is identical to every other data entry and storing the entry only one time.
3. A method according to claim 1 wherein one ofthe plurality of compression methods comprises removing unnecessary zeros from the beginning of a character string that is a binary representation of a data value and storing only the meaningful portion ofthe
character string.
4. A method according to claim 1 wherein one ofthe plurality of compression
methods comprises selecting one data entry as a base value of a data array, subtracting the
base value from remaining data entries ofthe data array, and storing the base value for one data entry and the calculated differences for the remaining data entries.
5. A method according to claim 1 wherein one ofthe plurality of compression methods comprises selecting and storing a first data entry ofa data array as a base, and determining the difference between each remaining data entry and its preceding data entry and
storing the differences for the remaining data arrays.
6. A method according to claim 1 wherein one ofthe plurality of compression methods comprises determining a mathematical equation that will approximately describe the data entries of one data array as a comparison of two or more other data arrays, determining
the variation ofthe actual data entries from the results ofthe mathematical equation applied to the corresponding data entries ofthe two or more other data arrays, and storing the mathematical equation and each variation.
7. A method according to claim 1 wherein the plurality of compression comprises:
identifying that each data entry is identical to every other data entry and storing the entry only one time; removing unnecessary zeros from the beginning ofa character string that is a binary
representation of a data value and storing only the meaningful portion ofthe character string;
selecting one data entry as a base value of a data array, subtracting the base value from
remaining data entries ofthe data array, and storing the base value for one data entry and the
calculated differences for the remaining data entries; selecting and storing a first data entry of a data array as a base, and determining the
difference between each remaining data entry and its preceding data entry and storing the difference; and
determining a mathematical equation that will approximately describe the data entries
of one data array as a comparison of two or more other data arrays, determining the variation ofthe actual data entries from the results ofthe mathematical equation applied to the
corresponding data entries ofthe two or more other data arrays, and storing the mathematical equation and each the variation.
8. A method for compressing data files comprising the steps of:
acquiring raw data and organizing the data into data arrays of similar type where the raw data represents readings from instruments, including, but not limited to, entries identifying the time of readings and amount of time between readings;
determining which entries representing the amount of time between each reading was
a predetermined uniform amount of time, and, if so, storing only the beginning reading time;
determining whether data entries indicate that no activity occurred between a time reading and a next time reading, and, if so, removing the record corresponding to the next time reading; predicting the size of a compressed file that would result from each of a plurality of
compression methods applied to each data array; selecting the optimum compression method for each data array;
storing instructions indicating data compression methods and data field sizes; storing data compressed by the plurality of compression methods;
storing data entry values that are excluded from the data stored by the plurality of
compression records;
compressing and storing summaries of a day's worth of data; and creating a computer compression record that includes identification ofthe manner in
which data was stored and the data itself, resulting in a compressed computer file of significantly smaller size than the original data file.
9. A method according to claim 8 wherein one ofthe plurality of compression
methods comprises identifying that each data entry is identical to every other data entry and
storing the entry one time.
10. A method according to claim 8 wherein one ofthe plurality of compression methods comprises removing unnecessary zeros from the beginning of a character string that is a binary representation of a data value and storing only the meaningful portion ofthe character string.
11. A method according to claim 8 wherein one ofthe plurality of . compression methods comprises selecting one data entry as a base value of a data array, subtracting the base value from remaining data entries ofthe data array, and storing the base value for one data entry and the calculated differences for the remaining data entries.
12. A method according to claim 8 wherein one ofthe plurality of
compression methods comprises selecting and storing a first data entry of a data array as a
base, and determining the difference between each remaining data entry and its preceding data entry and storing the difference.
13. A method according to claim 8 wherein one ofthe plurality of
compression methods comprises determining a mathematical equation that will approximately
describe the data entries of one data array as a comparison of two or more other data arrays, determining the variation ofthe actual data entries from the results ofthe mathematical equation applied to the corresponding data entries ofthe two or more other data arrays, and
storing the mathematical equation and each variation.
14. A method according to claim 8 wherein the plurality of compression
methods comprises
identifying that each data entry is identical to every other data entry and storing the
entry only one time; removing unnecessary zeros from the beginning of a character string that is a binary representation of a data value and storing only the meaningful portion ofthe character string; selecting one data entry as a base value of a data array, subtracting the base value from
remaining data entries ofthe data array, and storing the base value for one data entry and the calculated differences for the remaining data entries; selecting and storing a first data entry of a data array as a base, and determining the difference between each remaining data entry and its preceding data entry and storing the difference; and
determining a mathematical equation that will approximately describe the data entries of one data array as a comparison of two or more other data arrays, determining the variation ofthe actual data entries from the results ofthe mathematical equation applied to the
corresponding data entries ofthe two or more other data arrays, and storing the mathematical equation and each the variation.
15. A method for compressing data files comprising the steps of:
acquiring raw data entries and organizing the data entries into data arrays of similar
type;
determining an equation that closely describes each data entry of one data array as a comparison of a plurality of other corresponding data entries from other data arrays;
determining a variation of each actual data entry from the result ofthe equation; storing the equation; and storing the variations.
16. A method for compressing data files comprising the steps of: acquiring raw data entries and organizing the data entries into data arrays of similar
type;
determining the number of bits necessary to identify each data entry;
storing each data entry in a data field where the data field width equals the number of
bits necessary to store the data entry.
17. A method according to claim 1 further comprising the steps of: acquiring raw data entries and organizing the data entries into data arrays of similar
type; determining the number of bits necessary to identify each data entry; storing each data entry in a data field where the data field width equals the number of bits necessary to store the data entry.
18. A method according to claim 7 further comprising the steps of:
acquiring raw data entries and organizing the data entries into data arrays of similar
type; determining the number of bits necessary to identify each data entry;
storing each data entry in a data field where the data field width equals the number of bits necessary to store the data entry.
19. A method according to claim 8 further comprising the steps of: acquiring raw data entries and organizing the data entries into data arrays of similar
type; determining the number of bits necessary to identify each data entry;
storing each data entry in a data field where the data field width equals the number of
bits necessary to store the data entry.
20. A method according to claim 14 further comprising the steps of:
acquiring raw data entries and organizing the data entries into data arrays of similar type;
determining the number of bits necessary to identify each data entry; storing each data entry in a data field where the data field width equals the number of bits necessary to store the data entry.
21. A method for compressing data files comprising the steps of:
acquiring raw data entries and organizing the data entries into data arrays; applying a compression method to the data entries; identifying data entries that would allow more efficient compression if they were
excluded from the data entries to which the compression method was applied;
storing the excluded data entry separately from the data entries to which the compression method was applied.
22. A method according to claim 1 further comprising the steps of: identifying data entries that would allow more efficient compression by the plurality
of compression methods if the identified data entries were excluded from the data entries to which the plurality of compression methods was applied;
storing the excluded data entry separately from the data entries to which the compression method was applied.
23. A method according to claim 7 further comprising the steps of:
identifying data entries that would allow more efficient compression by the plurality
of compression methods if the identified data entries were excluded from the data entries to which the plurality of compression methods was applied;
storing the excluded data entry separately from the data entries to which the
compression method was applied.
24. A method according to claim 8 further comprising the steps of:
identifying data entries that would allow more efficient compression by the plurality
of compression methods if the identified data entries were excluded from the data entries to
which the plurality of compression methods was applied; storing the excluded data entry separately from the data entries to which the compression method was applied.
25. A method according to claim 14 further comprising the steps of: identifying data entries that would allow more efficient compression by the plurality
of compression methods if the identified data entries were excluded from the data entries to which the plurality of compression methods was applied; storing the excluded data entry separately from the data entries to which the compression method was applied.
26. A method according to claim 1 further comprising the steps of: reading the computer compression record, identifying how data entries were stored, and
reversing the compression process to recreate the original data array.
27. A method according to claim 7 further comprising the steps of:
reading the computer compression record,
identifying how data entries were stored, and reversing the compression process to recreate the original data array.
28. A method according to claim 8 further comprising the steps of: reading the computer compression record,
identifying how data entries were stored, and reversing the compression process to recreate the original data array.
29. A method according to claim 14 further comprising the steps of: reading the computer compression record,
identifying how data entries were stored, and
reversing the compression process to recreate the original data array.
30. A method according to claim 17 further comprising the steps of: reading the computer compression record, identifying how data entries were stored, and reversing the compression process to recreate the original data array.
31. A method according to claim 21 further comprising the steps of:
reading the computer compression record,
identifying how data entries were stored, and reversing the compression process to recreate the original data array.
32. A method according to claim 22 further comprising the steps of: reading the computer compression record, identifying how data entries were stored, and
reversing the compression process to recreate the original data array.
PCT/US2001/004986 2000-02-15 2001-02-15 Method for compression of small computer data files WO2001061543A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001238365A AU2001238365A1 (en) 2000-02-15 2001-02-15 Method for compression of small computer data files

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US50415300A 2000-02-15 2000-02-15
US09/504,153 2000-02-15

Publications (2)

Publication Number Publication Date
WO2001061543A1 WO2001061543A1 (en) 2001-08-23
WO2001061543A9 true WO2001061543A9 (en) 2002-10-17

Family

ID=24005072

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/004986 WO2001061543A1 (en) 2000-02-15 2001-02-15 Method for compression of small computer data files

Country Status (2)

Country Link
AU (1) AU2001238365A1 (en)
WO (1) WO2001061543A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1292036B1 (en) * 2001-08-23 2012-08-01 Nippon Telegraph And Telephone Corporation Digital signal decoding methods and apparatuses
US6577254B2 (en) * 2001-11-14 2003-06-10 Hewlett-Packard Development Company, L.P. Data compression/decompression system
US7672262B2 (en) 2005-04-22 2010-03-02 Baker Hughes Incorporated System, method, and apparatus for command and control of remote instrumentation
EP2153610A4 (en) * 2007-06-01 2010-09-15 Research In Motion Ltd Interactive compression with multiple units of compression state information
GB2471056B (en) * 2010-03-09 2011-02-16 Quantum Corp Controlling configurable variable data reduction
CN107409152B (en) * 2015-03-12 2021-10-15 英特尔公司 Method and apparatus for compressing data received over a network

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SE8307228D0 (en) * 1983-12-30 1983-12-30 Grundstenen 16808 Ab data compression
US4988998A (en) * 1989-09-05 1991-01-29 Storage Technology Corporation Data compression system for successively applying at least two data compression methods to an input data stream
US5649151A (en) * 1992-06-29 1997-07-15 Apple Computer, Inc. Efficient method and apparatus for access and storage of compressed data
US5546575A (en) * 1994-05-23 1996-08-13 Basil E. Potter & Associates, Inc. Encoding method for compressing a tabular database by selecting effective compression routines for each field and structure of partitions of equal sized records
US5678043A (en) * 1994-09-23 1997-10-14 The Regents Of The University Of Michigan Data compression and encryption system and method representing records as differences between sorted domain ordinals that represent field values

Also Published As

Publication number Publication date
WO2001061543A1 (en) 2001-08-23
AU2001238365A1 (en) 2001-08-27

Similar Documents

Publication Publication Date Title
CA2475186C (en) Method and apparatus for windowing in entropy encoding
US8037034B2 (en) Methods of creating a dictionary for data compression
Lewis Interleaved memory function interpolators with application to an accurate LNS arithmetic unit
CA2283591C (en) Data coding network
US8452825B2 (en) Sortable floating point numbers
CN101095284B (en) Device and data method for selective compression and decompression and data format for compressed data
EP1995974B1 (en) Method for realizing arithmetic coding
US20070050436A1 (en) Order-preserving encoding formats of floating-point decimal numbers for efficient value comparison
US7973680B2 (en) Method and system for creating an in-memory physical dictionary for data compression
CN104834539B (en) A kind of data increment update method
US20110234432A1 (en) Systems and methods for optimizing bit utilization in data encoding
US6253222B1 (en) Compression of limited range floating point numbers
US5995970A (en) Method and apparatus for geographic coordinate data storage
WO1996021200A1 (en) Method and apparatus for digital data compression
CN107291832B (en) Data storage method based on list storage structure
US7921144B2 (en) Fast correctly-rounding floating-point conversion
US5724253A (en) System and method for searching data vectors such as genomes for specified template vector
WO2001061543A9 (en) Method for compression of small computer data files
US6240431B1 (en) Decompression of limited range floating point numbers
CN104077272A (en) Method and device for compressing dictionary
KR102243119B1 (en) Quantization apparatus and method using numerical representation with dynamic precision
Chandrasekaran et al. Polynomial testing of the query “Is ab≥ cd?” with application to finding a minimal cost reliability ratio spanning tree
JP3018990B2 (en) Arithmetic coding device
US20240005213A1 (en) System and method for off-chip data compression and decompression for machine learning networks
JPH0137047B2 (en)

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: C2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: C2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

COP Corrected version of pamphlet

Free format text: PAGES 1/30-30/30, DRAWINGS, REPLACED BY NEW PAGES 1/33-33/33; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP