WO2001061543A9

WO2001061543A9 - Method for compression of small computer data files

Info

Publication number: WO2001061543A9
Application number: PCT/US2001/004986
Authority: WO
Inventors: Wallace Walter
Original assignee: Orbcomm Global L P
Priority date: 2000-02-15
Filing date: 2001-02-15
Publication date: 2002-10-17
Also published as: WO2001061543A1; AU2001238365A1

Abstract

A method for compressing small computer files through the use of reduction of values and removal of unnecessary bits in the compressed computer file. The method examines data in arrays or groups that share certain characteristics and evaluates each of a plurality of compression methods for each array. Compression methods include, without limitation, Mode Compression, Strip Zeros compression, Delta From Base compression, Difference from Last compression, and Ratio Compression. It chooses the best compression method for each array. The method also applies certain compression techniques to all data and uses varying field widths throughout to minimize the compressed file. It results in substantially reduced file sizes compared to traditional compression methods for small computer files.

Description

METHOD FOR COMPRESSION OF SMALL COMPUTER DATA FILES

FIELD OF THE INVENTION

The present invention generally relates to compression of binary data for ease of transmission between a source and receiving device, and more particularly to compression of binary data files by selecting an optimum compression method from a plurality of compression methods.

BACKGROUND OF THE INVENTION

Many data compression methods are known in the art. Most available data compression methods examine the frequency of data character strings and assign variable-length codes to represent that data. More frequently appearing and longer strings receive shorter codes, and rare byte strings receive longer codes. When the resulting compressed file is prepared for transmission, it must include a definition of the codes used in the compression. These compression programs have two major faults when used to compress small files. First, the definition of the codes is large relative to the data to be compressed. Second, when data changes slightly from record to record, few byte values repeat often enough to be assigned compact codes.

The present invention, although useful for many applications, was originally designed to reduce the size of files of natural gas flow data recorded at remote locations, such as where natural gas flows from a well into a pipeline. Many natural gas wells are located in places that are difficult to reach. Before the development of the present invention, data was recorded and stored on site. Approximately once a month, a person would visit the site, at considerable cost in time and travel, and download the data into a laptop computer. One way to avoid this expense is to transmit the data from the remote site to a central location. However, standard telephone lines are not typically run to these sites, and the cost of installing a telephone line or similar communication network line is prohibitive. The introduction of satellite communication systems provided another alternative for data transmission.

However, even the satellite communication option proved to be prohibitively expensive. The present invention has reduced the size of the file that must be transmitted to the point where satellite communication is more economical than visiting each site.

Traditional compression methods, such as PKZip®, examine the frequency of individual bytes of data and assign variable-length codes to represent that data, with frequently appearing byte strings receiving shorter codes than rarer byte strings. In contrast, the present invention eliminates unnecessary bits and makes use of the principle that smaller numbers are identified by fewer bits than larger numbers.

The prior art teaches few compression methods that are optimized for small files composed of data arrays. In most applications, reducing a 900-byte file to 200 bytes represents a substantial cost savings compared with other small-file compression methods.

SUMMARY OF THE INVENTION

Computers store and use numbers in binary format (base 2), instead of the more familiar base 10 format. Computers typically store each number in a fixed 16-bit or 32-bit field, regardless of the number's actual value. Binary digits, or bits, may have a value of either 0 or 1. In a 16-bit computer system, 1024 (base 10) is expressed as 00000100 00000000 (base 2, or binary). 1023 is expressed as 00000011 11111111 and uses one fewer significant bit than 1024. Just as in base 10 notation, the zeros that precede the first nonzero digit of the 16-bit number have no numerical significance. In other words, just as 000001023 is identical to 1023 in base 10 format, 00000011 11111111 is identical to 11 11111111 in binary format. The present invention uses several methods to reduce the number of bits used to store data entries, either by removing unnecessary bits or by designating the number in a fashion that requires fewer bits.
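As a quick illustration of the leading-zero idea, the Python sketch below counts the significant bits of a value and contrasts its fixed 16-bit form with its stripped form. The helper names `significant_bits`, `fixed_width` and `stripped` are hypothetical and do not come from the patent; the sketch assumes non-negative integers.

```python
def significant_bits(value: int) -> int:
    """Bits left once leading zeros are dropped (0 for value == 0)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return value.bit_length()


def fixed_width(value: int, width: int = 16) -> str:
    """Binary form padded with leading zeros to a fixed field width."""
    return format(value, f"0{width}b")


def stripped(value: int) -> str:
    """Binary form with the leading zeros removed."""
    return format(value, "b")


for v in (1023, 1024):
    # 1023 needs 10 significant bits, 1024 needs 11
    print(v, fixed_width(v), stripped(v), significant_bits(v))
```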

The present invention examines data arrays found in gas flow records and evaluates, for example, five different compression algorithms for each data array. The best algorithm choice depends on the underlying characteristics of the array. A data sample might include time, differential pressure, static pressure, temperature, flow time, C, pressure extension, volume and energy data values. After these measurements are arranged in arrays by the present invention, each of the five algorithms is applied to each array, and for each array the method that allows maximum compression is chosen. Each compression method makes use of a few compression techniques that further reduce the size of the compressed file.

The present invention uses dynamic data field sizing throughout the compressed file to identify data values. If four bits are sufficient to positively identify a value, the present invention uses only four bits instead of the 16 bits or 32 bits used in the standard computer format. Many of the compression methods and techniques are in themselves novel, such as Ratio Compression, Strip Top Bit, and Outlier Compression. The present invention also includes novel methods for specifying time and summary data.

Some data arrays, such as flow time, may include many repetitions of the same number. The present invention searches for the most frequently repeated value in the data array, or the mode of the array, and includes the mode only once in the compression record. If the array has variations from the mode, the variations are treated as outliers and handled individually: the present invention records the positions and the values of the outliers in the compressed file. This compression method is called Mode compression.

Some data arrays, such as static pressure or C readings, change by small increments relative to the value of the reading. The present invention searches for the smallest value in the array and includes that value only once in the compressed file. It then subtracts that value from the remaining values and includes the difference, or delta, in the compressed file. This compression method is called Delta From Base.

Some data arrays, such as temperature readings or storage tank level readings, increase or decrease regularly by relatively small amounts. The present invention includes the first reading in the compressed file. It then subtracts each reading from the reading that follows it and includes that difference in the compressed file. This compression method is called Difference From Last.
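A minimal Python sketch of the Difference From Last idea follows. The function names and the sample readings are hypothetical. Differences can be negative when readings fall, and this section does not say how signs are encoded, so the sketch simply keeps them as signed Python integers and leaves bit-level packing out.

```python
from typing import List, Tuple


def difference_from_last(values: List[int]) -> Tuple[int, List[int]]:
    """Keep the first reading; replace each later reading by its change from the previous one."""
    if not values:
        raise ValueError("empty array")
    diffs = [cur - prev for prev, cur in zip(values, values[1:])]
    return values[0], diffs


def undo_difference_from_last(first: int, diffs: List[int]) -> List[int]:
    """Rebuild the readings by accumulating the differences onto the first reading."""
    out = [first]
    for d in diffs:
        out.append(out[-1] + d)
    return out


temps = [4512, 4515, 4519, 4522, 4520]    # hypothetical offset temperature readings
first, diffs = difference_from_last(temps)
print(first, diffs)                        # 4512 [3, 4, 3, -2]
assert undo_difference_from_last(first, diffs) == temps
```

The small differences need far fewer bits than the readings themselves, which is why the method suits slowly drifting parameters.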

Some data arrays are proportional to another array or a combination of other arrays in the data file. In the gas flow application, volume is proportional to C, pressure extension, and flow time. Energy is proportional to differential pressure and static pressure. The present invention creates an array of estimated values and finds a ratio that, when multiplied by the estimated values, minimizes the range of deltas between the multiplied values and the actual values. The ratio and the deltas are included in the compressed file. This compression method is called Ratio compression.
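The Ratio idea can be sketched in Python as below, assuming the estimate array has already been built from other arrays (for example, a product of C, pressure extension and flow time). The search is a deliberately simple stand-in: it tries each ratio `actual[i] / estimate[i]` and keeps the one whose deltas span the smallest range. It is not the mantissa iteration the patent describes for Figs. 20a-20d, and the function names are hypothetical.

```python
from typing import List, Tuple


def ratio_compress(actual: List[int], estimate: List[int]) -> Tuple[float, List[int]]:
    """Return (r, deltas) with actual[i] == round(r * estimate[i]) + deltas[i].

    Candidate ratios are actual[i] / estimate[i]; the first candidate whose
    deltas have the smallest range (max - min) is kept.
    """
    best = None
    for a, e in zip(actual, estimate):
        if e == 0:
            continue                       # cannot form a ratio from a zero estimate
        r = a / e
        deltas = [x - round(r * y) for x, y in zip(actual, estimate)]
        spread = max(deltas) - min(deltas)
        if best is None or spread < best[0]:
            best = (spread, r, deltas)
    if best is None:
        raise ValueError("need at least one non-zero estimate")
    return best[1], best[2]


def ratio_decompress(r: float, estimate: List[int], deltas: List[int]) -> List[int]:
    """Recompute round(r * estimate) exactly as the compressor did and add the deltas back."""
    return [round(r * y) + d for y, d in zip(estimate, deltas)]


estimate = [10, 20, 30, 40]                # hypothetical estimated values
actual = [25, 50, 76, 99]                  # hypothetical measured values
r, deltas = ratio_compress(actual, estimate)
print(r, deltas)                           # 2.5 [0, 0, 1, -1]
assert ratio_decompress(r, estimate, deltas) == actual
```

Because the decompressor repeats the same floating-point multiplication with the same stored ratio, the reconstruction is exact even though the ratio itself is not an integer.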

Some data arrays are most compactly expressed simply by stripping zeros that are not necessary to identify the values of the array. The present invention examines the array and determines the maximum number of bits necessary to identify the values included. For example, 100,000 is expressed in binary form as 00000000 00000001 10000110 10100000. If 100,000 is the largest value of the array, the present invention can safely strip the first 15 bits from the binary representation, as they will be 0 for every value contained in the array. This compression method is called Strip Zeros.

Each of the five compression methods also makes use of other compression techniques to further reduce the size of the compressed file. Strip Top Bit is routinely used to remove the first bit of compressed values when it is safe to assume that the value of the top bit must be 1. The outlier compression method allows the program to identify which elements of an array are "outliers", that is, expressed differently than the majority of other elements. It creates an outlier bitmap that has one bit for each element of the array. The bit of the bitmap that corresponds to each outlier has a value of 1, while the bits for all non-outlier values are 0. In other words, an outlier bitmap is a series of bits that "maps" which values of an array must be treated as outliers.

It is an object of the present invention to provide a method of compression that substantially improves the compression ratio for small computer data files.

It is a further object of the invention to optimize the compression of files by choosing from a plurality of compression methods for different sections of data that are transmitted.

It is a further object of the invention to provide a method of compressing data by identifying an array of data values as a ratio of other data values.

It is a further object of the invention to provide a method of removing the first bit of an array of data entries when that bit may be assumed to be 1.

It is a further object of the invention to optimize the compression file for each compression method by identifying and compressing outliers separately, thereby reducing the record size for the remaining elements.

It is a further object of the invention to provide a method for compressing the average or sum value of an array.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a top-level processing diagram, or flowchart, showing the process of the present invention;

Fig. 2a is a table illustrating a sample data array;

Fig. 2b is a compression record created for the data contained in Fig. 2a;

Fig. 3a is a table illustrating a sample data array;

Fig. 3b is a compression record created for the data contained in Fig. 3a;

Fig. 4 is a compression record illustrating the Strip Top Bit compression method;

Fig. 5 is a top-level processing diagram, or flowchart, showing the process of the Strip Zeros compression method;

Fig. 6a is a table illustrating a sample data array;

Fig. 6b is a compression record created for the data contained in Fig. 6a;

Fig. 7a is a table illustrating a sample data array with binary values;

Fig. 7b is a compression record created for the data contained in Fig. 7a;

Fig. 7c is another compression record for the data contained in Fig. 7a;

Fig. 7d is another compression record for the data contained in Fig. 7a;

Fig. 8 is a top-level processing diagram, or flowchart, showing the process of the Mode compression method;

Fig. 9 is a top-level processing diagram, or flowchart, showing the process of the Delta From Base compression method;

Fig. 10a is a table illustrating a sample data array;

Fig. 10b is a compression record for the data contained in Fig. 10a;

Fig. 11a is a table illustrating a sample data array;

Fig. 11b is a compression record for the data contained in Fig. 11a;

Fig. 12a is a table illustrating a sample data array;

Fig. 12b is a compression record for the data contained in Fig. 12a;

Fig. 13a is a table illustrating a sample data array;

Fig. 13b is a compression record for the data contained in Fig. 13a;

Fig. 14 is a top-level processing diagram, or flowchart, showing the process of the Difference From Last compression method;

Fig. 15a is a table illustrating a sample data array;

Fig. 15b is a compression record for the data contained in Fig. 15a;

Fig. 16a is a table illustrating a sample data array;

Fig. 16b is a compression record for the data contained in Fig. 16a;

Fig. 17a is a table illustrating a sample data array;

Fig. 17b is a compression record for the data contained in Fig. 17a;

Fig. 18 is a top-level processing diagram, or flowchart, showing the process of the Ratio compression method;

Fig. 19 is a table illustrating a sample data array;

Figs. 20a-20d are tables showing iterations of the mantissa determination of the Ratio compression method;

Fig. 21a is a table illustrating a sample data array;

Fig. 21b is a table showing estimate data used by the Ratio compression method;

Fig. 21c is a table showing deltas determined by the Ratio compression method;

Fig. 21d is a compression record for the data contained in Fig. 21a;

Fig. 22a is a table illustrating a sample data array;

Fig. 22b is a table showing estimates determined by the Ratio compression method;

Fig. 22c is a compression record for the data contained in Fig. 22a;

Fig. 23a is a table illustrating a sample data array;

Fig. 23b is a table illustrating an intermediate step of the Ratio compression method; and

Fig. 23c is a compression record for the data contained in Fig. 23a.

DETAILED DESCRIPTION OF THE INVENTION

The present invention was designed to compress the binary representation of numbers. It performs its compression by removing unnecessary bits from binary representations and designating numbers in a more compact fashion. Computer systems use 16-bit or 32-bit binary values to store numbers, regardless of how many bits are actually required. In a 16-bit system, 1024 is designated as 00000100 00000000. Under normal circumstances, computer systems always use all 16 bits, whether or not they are necessary. One way the present invention reduces the number of bits required to store a number is to remove the 0's preceding the first 1. Although computers keep track of values in the 16-bit or 32-bit format, the preceding 0's are not necessary to designate the value of the number, in the same way that 0's preceding base 10 numbers carry no meaning. For example, in base 10, 00000576 identifies the same value as 576. Likewise, 00000100 00000000 in binary format identifies the same value as 100 00000000.

The number of bits necessary to hold a particular base 10 number follows from the fact that N bits can hold a maximum value of $2^N - 1$. For example, 4 bits can hold a maximum value of $2^4 - 1 = 15$, which is represented in binary form as 1111. Similarly, in base 10, N digits can hold a maximum value of $10^N - 1$; 4 digits can hold a maximum value of $10^4 - 1 = 9999$. As discussed more fully below, the present invention reduces the values of some data entries so that those entries may be designated by fewer bits.

Referring to Fig. 1, a flowchart illustrates the basic steps of the present invention. Raw data is input. In the gas flow application, the present invention immediately compresses time data (the time of day at which readings were taken) and removes "no flow" data. Time data readings are typically one hour apart, at the top of the hour. Under these circumstances, the present invention sends only the original time value. If variations occur, the outlier compression method is used, as discussed below. In the gas flow data application, records where flow did not occur are identified when all readings are 0 except for temperature and static temperature. These records are removed from the data to be compressed, and the indices of the no-flow records are included in the compressed file.
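The two preprocessing steps above can be sketched in Python as follows. Everything here is an illustrative assumption: timestamps are taken to be minutes, nominally 60 apart; records are dictionaries with hypothetical field names; and the caller supplies the set of fields that may stay nonzero in a no-flow record.

```python
from typing import Dict, List, Sequence, Set, Tuple

Record = Dict[str, int]   # hypothetical: one reading per named field


def compress_times(times: Sequence[int], step: int = 60) -> Tuple[int, Dict[int, int]]:
    """Keep the first timestamp; any reading off the regular grid becomes an outlier."""
    first = times[0]
    outliers = {i: t for i, t in enumerate(times) if t != first + i * step}
    return first, outliers


def is_no_flow(record: Record, exempt: Set[str]) -> bool:
    """True when every reading other than the exempt fields is 0."""
    return all(v == 0 for k, v in record.items() if k not in exempt)


def remove_no_flow(records: List[Record], exempt: Set[str]) -> Tuple[List[Record], List[int]]:
    """Split records into those kept for compression and the indices of no-flow records."""
    kept: List[Record] = []
    no_flow: List[int] = []
    for i, rec in enumerate(records):
        if is_no_flow(rec, exempt):
            no_flow.append(i)          # only the index goes into the compressed file
        else:
            kept.append(rec)
    return kept, no_flow


print(compress_times([0, 60, 120, 185, 240]))   # (0, {3: 185})

readings = [
    {"differential_pressure": 120, "static_pressure": 5100, "temperature": 4500},
    {"differential_pressure": 0, "static_pressure": 0, "temperature": 4480},
    {"differential_pressure": 95, "static_pressure": 5090, "temperature": 4470},
]
kept, no_flow = remove_no_flow(readings, exempt={"temperature", "static_temperature"})
print(len(kept), no_flow)                        # 2 [1]
```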

After compressing the time records and removing the no-flow records, each of the five algorithms is applied to each data array and the best method is selected. The five different compression methods are: (1) Strip Zeros; (2) Mode; (3) Delta From Base; (4) Difference From Last; and (5) Ratio Compression.

In simple terms, Strip Zeros compression finds the largest value in an array and determines how many bits are required to hold that value. The present invention then saves the data in data fields limited to that number of bits, eliminating the string of zeros customarily in front of the actual data. Mode Compression searches every parameter array for the most frequently repeated value, or the mode value. For example, when flow time is consistently 60 minutes, Mode Compression stores one value along with an instruction to repeat that value in each field of the array. Any deviations from the mode are identified with Outlier Compression. If there are no exceptions, the present invention saves only the beginning time value and need not include the delta standard value. The Delta From Base method finds the lowest value in the array and subtracts that number from all array values. If the remainders are small relative to the base, the compressed output containing the base and the deltas will be much smaller than the original array. The Difference From Last method is similar to Delta From Base, but the deltas are the differences from the previous values instead of from a base number. This method works particularly well for parameters that change relatively uniformly over time, such as temperature and tank levels. Finally, the Ratio Compression method is used when the elements of an array are related proportionally to a data set that can be created from other arrays already included in the compression record. Last, the present invention performs a daily summary compression.
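The "try every method, keep the smallest" step can be sketched in Python as below for three of the five methods. The bit costs are rough estimates under assumed record layouts (a 5-bit width field, a one-bit "any outliers" flag, a one-bit-per-element outlier bitmap); the actual record formats are those shown in the figures, and the names `choose_method`, `METHODS` and the cost functions are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

WIDTH_FIELD = 5   # assumed: every field width is announced in 5 bits


def nbits(v: int) -> int:
    return v.bit_length()


def strip_zeros_cost(a: List[int]) -> int:
    # one width header, then every element in the width of the largest element
    return WIDTH_FIELD + nbits(max(a)) * len(a)


def mode_cost(a: List[int]) -> int:
    mode, _ = Counter(a).most_common(1)[0]
    outliers = [v for v in a if v != mode]
    cost = WIDTH_FIELD + nbits(mode) + 1        # the mode plus an "any outliers?" flag
    if outliers:
        # outlier bitmap (one bit per element) plus outlier values at one shared width
        cost += len(a) + WIDTH_FIELD + nbits(max(outliers)) * len(outliers)
    return cost


def delta_from_base_cost(a: List[int]) -> int:
    base = min(a)
    deltas = [v - base for v in a]
    return WIDTH_FIELD + nbits(base) + WIDTH_FIELD + nbits(max(deltas)) * len(a)


METHODS: Dict[str, Callable[[List[int]], int]] = {
    "strip_zeros": strip_zeros_cost,
    "mode": mode_cost,
    "delta_from_base": delta_from_base_cost,
}


def choose_method(a: List[int]) -> str:
    """Name of the candidate with the smallest estimated size in bits."""
    return min(METHODS, key=lambda name: METHODS[name](a))


print(choose_method([60] * 24))                  # mode
print(choose_method([6000, 6003, 6015, 6008]))   # delta_from_base
```

Keeping each method behind a cost function makes the selection step independent of how many methods exist; Difference From Last and Ratio would plug in the same way.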

Most of the compression algorithms of the present invention use two techniques to compress the data included in the compressed file: Outlier Compression and Strip Top Bit. Each technique minimizes the number of bits required to identify the data to be compressed.

Outlier Compression

Outlier values are those in data arrays that require unique identification, allowing the remainder of the elements to be expressed in a more compact format. For example, Fig. 2a shows an array of 24 data entries having 21 identical entries and three varying entries, the varying entries being in the first, third, and fourth locations. The three varying entries are considered outliers. A bitmap indicating which entries are outliers, with one bit for each data entry, appears as 10110000 00000000 00000000. The 1's in the first, third, and fourth locations of the 24-element outlier bitmap indicate that the first, third, and fourth data entries are outliers. If the present invention were to use this configuration, it would require 24 bits to specify which elements were outliers, plus an additional bit to specify that more than one outlier exists. The value of the 21 identical entries is included only once. Although the outlier values must be stored individually, the total amount of data stored is significantly reduced.

When just a few data values are outliers and the array contains seven or more elements, such as the array shown in Fig. 2a, the present invention uses a second outlier method for identification. See Fig. 2b, which shows a compression record where the same identification is accomplished with only 13 bits. As shown in Fig. 2b, the first bit indicates that it is a "few outliers" type data array. The next four bits indicate the number of outliers. The number stored in this indicator is safely reduced by one because at least one outlier must exist to invoke this record. The next bit indicates whether the indices are recorded in reverse order from the actual order of the data array. When the outliers are toward the front of the data array, subsequent indices are specified with fewer bits when expressed in the backward direction. The present invention examines both directions of a data array and chooses the direction that results in the smallest data file. If only one outlier exists, the direction indicator is not used, as it would be meaningless. The first outlier index in this example is 3. Five bits are used to store this index because 5 bits are sufficient to indicate any index from 0 (00000₂) to 23 (10111₂), the last of the 24 indices in this example. The next outlier index can be at most one less than the first index, that is, at most 2 in this example, so its position can be indicated in only two bits. The final index can be at most one less than 2, so the final index (0) is stored with one bit. Therefore, 13 bits identify the outliers in place of the 24-bit outlier bitmap.
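The bit counting for the "few outliers" record can be checked with the Python sketch below. It is one reading of the text, not the patent's exact record layout: positions are 0-based and listed largest first, each stored in just enough bits for the largest value it could still take; the opposite direction is modeled by mirroring the positions; and it also applies the rule for outliers clustered at one end of the array (described in the next paragraph), where the remaining positions are implied once the outliers left equal the positions left. Under this reading the count field, direction flag and three indices of Fig. 2a take 4 + 1 + 5 + 2 + 1 = 13 bits, so the 13-bit figure appears not to include the one-bit record-type flag. The function names are hypothetical.

```python
from typing import Iterable, List


def index_bits(indices: Iterable[int], n: int) -> int:
    """Bits to list outlier positions, largest position first.

    Each position is stored in just enough bits for the largest value it could
    still take: n - 1 for the first, then one less than the previous position.
    When the outliers left equal the positions left, the rest are implied.
    """
    order = sorted(indices, reverse=True)
    total = 0
    limit = n - 1                        # largest position the next index could hold
    for k, idx in enumerate(order):
        remaining = len(order) - k       # outliers not yet written
        if remaining == limit + 1:       # positions 0..limit must all be outliers
            break
        total += limit.bit_length()
        limit = idx - 1
    return total


def few_outliers_bits(indices: List[int], n: int) -> int:
    """4-bit count field (stores count - 1), optional direction flag, then the indices."""
    if not 1 <= len(indices) <= 16:
        raise ValueError("the 4-bit count field covers 1 to 16 outliers")
    mirrored = [n - 1 - i for i in indices]       # listing from the other end of the array
    body = min(index_bits(indices, n), index_bits(mirrored, n))
    direction_flag = 1 if len(indices) > 1 else 0
    return 4 + direction_flag + body


print(few_outliers_bits([0, 2, 3], 24))                                # 13
print(index_bits([23 - i for i in (14, 16, 19, 20, 21, 22, 23)], 24))  # 12
```

In the second call (the Fig. 3a layout, mirrored), three positions cost 5 + 4 + 3 bits and the last four are implied, matching the text's statement that the last four positions need no indicators.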

When outliers are congregated at one end of an array, the outlier method can further compress the record. For example, Fig. 3a shows a 24-element data array where elements 15, 17 and 20 through 24 (index numbers 14, 16 and 19 through 23) are outliers. Because five outliers are clustered at the end of the array, the last four positions do not have to be indicated individually. During the decompression process, the present invention keeps track of how many outlier positions remain to be determined. When that number matches the number of remaining indices, the remaining indices must be outliers and do not need indicators. The compression record for Fig. 3a is shown in Fig. 3b.

Strip Top Bit

The present invention uses variable-length data fields to identify data entries throughout its process. When the data files are decompressed, the invention must know the lengths of the fields. The present invention sizes each field to match the largest size necessary to convey the required information. For example, assume that Delta From Base was used to compress an array. If the deltas are 17, 0, 23, 45, 3, 8 and 29, the present invention sets the data field to 6 bits, the number of bits necessary to identify the number 45 (101101) in binary form.

In some situations, the present invention may determine that the first bit of every data entry is 1. For example, in a Delta From Base array, the base number might be 2000. The binary representation of 2000 is 11111010000. The initial "1" of that binary string represents a value of 1024. If the greatest Delta From Base value is less than 48, so that every value in the array falls between 2000 and 2047, the present invention knows that the first bit of each data entry is "1" and removes that top bit from the compressed file. When the file is decompressed, that "1" is added back in. This saves one bit for every entry after the base entry, less the bits needed to define the modification. A compression record illustrating this example is shown in Fig. 4.
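The field-width and top-bit arithmetic can be sketched in Python as below. The sketch also adopts the zero convention from the following paragraph: a 5-bit width code states the value's full bit length (so 2³⁰ - 1 gets code 30), and the maximum code 31 is reserved to mean the value 0. The helper names are hypothetical, and payloads are shown as bit strings for readability rather than packed into a bit stream.

```python
from typing import List, Tuple

WIDTH_BITS = 5                          # field widths are written in 5 bits
ZERO_CODE = (1 << WIDTH_BITS) - 1       # 31: reserved to mean "the value is 0"


def field_width(values: List[int]) -> int:
    """Width of a shared field: the bit length of the largest value."""
    return max(values).bit_length()


def encode_value(v: int) -> Tuple[int, str]:
    """Return (width code, payload bits) for one value with its top bit stripped."""
    if v < 0:
        raise ValueError("only non-negative values are supported")
    if v == 0:
        return ZERO_CODE, ""
    n = v.bit_length()
    if n >= ZERO_CODE:
        raise ValueError("values are limited to 30 bits")
    return n, format(v, "b")[1:]        # the leading 1 is implied, so drop it


def decode_value(code: int, payload: str) -> int:
    """Reverse encode_value: restore the implied top bit, or return 0 for the reserved code."""
    if code == ZERO_CODE:
        return 0
    return int("1" + payload, 2)


print(field_width([17, 0, 23, 45, 3, 8, 29]))   # 6
print(encode_value(45))                          # (6, '01101')
print(encode_value(0))                           # (31, '')
```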

Sometimes, Strip Top Bit must deal with 0 as an element. It does this by setting the field width value to its maximum. For example, when compressing an array with a maximum value of 2³⁰ - 1, five bits are necessary to specify the field width. If the largest value is in fact 2³⁰ - 1, the field width is stored as 11110 (30). To express 0, the field width is set to the maximum, 11111 (31). In other words, 0 is specified by a field width value of $2^N - 1$, where N is the size of the field width value in bits.

Data Input Limits

The present invention, in its optimal form, works only with non-negative integer values. The upper limit is set by Strip Top Bit, which uses the field width value 31 to specify data values of 0. This limits data values to 30 bits, whose maximum decimal value is 1,073,741,823 (2³⁰ - 1). To ensure error-free execution, the present invention may adjust the raw data to remain below this maximum.

For example, the present invention adds an offset to all temperature data to make certain that the data consists only of non-negative integers. By default, all temperature readings have 40 degrees added to them. The actual size of the offset depends on the resolution of the reading: if temperature is measured in hundredths of a degree, the present invention multiplies the readings by 100 to turn the fractions into integers, and the 40-degree offset becomes 4000. This offset is subtracted when the data is decompressed.

After adding the temperature offset, the present invention checks all the data values to verify that they fall within the acceptable data range. If they do not, the present invention either raises them to 0 (if negative) or lowers them to 1,073,741,823 (the maximum value the present invention can handle). For example, if a temperature reading is -5000 (in hundredths of a degree), adding the 4000 offset results in -1000, which is less than 0, so 0 is substituted for -1000.
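A minimal Python sketch of the temperature preparation described above, using the default 40-degree offset and a scale of 100 for readings in hundredths of a degree. The function names are hypothetical. Clamped values cannot be restored exactly; restoring them simply yields the clamp limit minus the offset.

```python
from typing import List

MAX_VALUE = (1 << 30) - 1            # 1,073,741,823, the largest storable value


def prepare_temperatures(readings: List[float], scale: int = 100,
                         offset_degrees: int = 40) -> List[int]:
    """Scale to integers, add the offset, and clamp into [0, MAX_VALUE]."""
    offset = offset_degrees * scale  # 40 degrees becomes 4000 at scale 100
    prepared = []
    for r in readings:
        v = round(r * scale) + offset
        prepared.append(min(max(v, 0), MAX_VALUE))
    return prepared


def restore_temperatures(values: List[int], scale: int = 100,
                         offset_degrees: int = 40) -> List[float]:
    """Subtract the offset and undo the scaling after decompression."""
    return [(v - offset_degrees * scale) / scale for v in values]


print(prepare_temperatures([21.37, -50.0]))   # [6137, 0]  (-5000 + 4000 = -1000 -> 0)
print(restore_temperatures([6137]))           # [21.37]
```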

It should be noted that the present invention is easily modified to accommodate negative numbers or a mix of positive and negative numbers. The offset technique applied to temperature readings is easily extended to address negative data entries. The present invention could also handle negative and positive numbers by adding a sign entry where necessary, either as an inflection indicator or as a sign indicator for each data entry.

Strip Zeros

The first compression method analyzed by the present invention is Strip Zeros. When a typical number is stored in binary format, many of the binary digits at the beginning of the number are zeros. For example, 100,000 is expressed in binary format as:

00000000000000011000011010100000.

The present invention recognizes that 15 of these bits are not necessary and only the last 17 bits are required to identify the value that needs to be stored, resulting in:

1 10000110 10100000.

In this example, Strip Zeros saves 15 bits for a single data entry, less the bits necessary to define the data field.

Strip Zeros evaluates four distinct methods of compressing arrays of numbers and selects the most compact one. The four methods are: (1) uniform field size; (2) single field size; (3) individual field sizes; and (4) two field sizes. When Strip Zeros is selected as the most efficient compression method by the present invention, only one of the four methods is used. A flowchart showing the logic behind the choice is shown in Fig. 5.

In the uniform field size method, the present invention determines that all array elements are exactly N bits in size. Only N - 1 bits are stored, because the first bit is known to be 1 and Strip Top Bit is applied. Fig. 6a shows a 6-index array including the elements, their binary values, the stripped binary values, and the element field size in bits. In this example, a value of 64 (2⁶) was stripped from the element values. Fig. 6b shows the compression record for that array, demonstrating how the uniform size indicator of the Strip Zeros compression method was used and how each element was identified with 6 bits. In this example, each data entry was reduced in size by 5 bits.

The single field size Strip Zeros compression method is used when the uniform field size does not apply; that is, when some values require different size fields than others. The bit width of the storage field is the bit width necessary to describe the largest element. Any preceding zeros are stripped from the values, but the top bit is not stripped. Elements that require smaller field widths retain zeros in the leading bit positions, as shown in Fig. 7a.

The individual field sizes Strip Zeros compression method is most effective when a large variation of field sizes exists. Storing the individual field sizes may be more compact than setting the width to accommodate the largest element value; the present invention evaluates both methods and chooses the smaller result. The individual field sizes method can take advantage of the Strip Top Bit technique, since each element is separately compressed. Before each element in the compression record is a field size specifying how many bits are required to hold the element (without the top bit). Elsewhere throughout the present invention, this field is always 5 bits wide. In the individual field sizes method, however, the maximum size of the field can be predetermined. In the example illustrated in Fig. 7c, only 4 bits are needed to identify the maximum field width of 10, which is the number of bits required to store 1025 after the top bit has been removed. The present invention subtracts 2 from the number of bits needed to hold the field width of 10 (which is 4 bits) and includes the result as the maximum size indicator (2 = 4 - 2). Two can safely be subtracted from the maximum field size because the cases of 0 and 1 correspond to arrays whose largest elements are 0 and 1, respectively; in these two cases, other Strip Zeros methods are more compact. The example shown in Fig. 7a demonstrates that the individual field sizes method can save storage space: the original record was 76 bits, while the compressed record is 56 bits.

The two field sizes Strip Zeros compression method divides the array elements into two groups. The present invention starts with the small group limited to the data values that require the smallest field width to identify, while the large group holds all other data entries. For example, the present invention may determine that the smallest data value requires 4 bits to identify. All data values that can be identified in 4 bits belong in the small group. The size of the compressed file is calculated. Strip Zeros next increases the bit width of the small group by one, so that the small group includes the previous data entries as well as all entries identifiable with one more bit. The present invention then recalculates the results. This process is repeated until the bit width of the small group is one less than the bit width of the large group. If any result gives the smallest overall compression, the present invention chooses that method. Fig. 7d shows a compression record using the two field sizes method for the data array shown in Fig. 7a. The result in this case is 66 bits. While that size causes the present invention to choose the individual field sizes method over the two field sizes method for this array, it still shows a compression of 10 bits. In some circumstances, the present invention may be able to save an additional bit by listing the larger field width before the smaller field width. For example, if the large group requires 11 bits, the smaller size must be a bit width of 10 or less, expressed as 1010₂ and requiring only 4 bits to store (a savings of 1 bit).

Simply reversing the process decompresses data entries that are compressed by Strip Zeros compression.
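The four Strip Zeros variants can be compared with the Python sketch below, which estimates each variant's size in bits and keeps the smallest. The header sizes are assumptions for illustration: a 5-bit width header, a 2-bit indicator carrying (prefix width - 2) for the individual field sizes variant, and one group-membership bit per element for the two field sizes variant. The one-bit saving from listing the larger width first is not modeled, zero elements are skipped by the individual variant, and the function names are hypothetical. Because the data of Fig. 7a is not reproduced in this text, the sketch does not attempt to match the 76, 56 or 66-bit figures.

```python
from typing import Dict, List, Optional, Tuple

WIDTH_HEADER = 5     # assumed: a field width is announced in 5 bits
SIZE_INDICATOR = 2   # assumed: individual-sizes prefix width sent as (width - 2) in 2 bits


def nbits(v: int) -> int:
    return v.bit_length()


def uniform_cost(values: List[int]) -> Optional[int]:
    """Every element has the same bit length N: store N - 1 bits each (Strip Top Bit)."""
    widths = {nbits(v) for v in values}
    if len(widths) != 1 or 0 in widths:
        return None                    # variant does not apply
    n = widths.pop()
    return WIDTH_HEADER + (n - 1) * len(values)


def single_cost(values: List[int]) -> int:
    """One field wide enough for the largest element; top bits are kept."""
    return WIDTH_HEADER + nbits(max(values)) * len(values)


def individual_cost(values: List[int]) -> Optional[int]:
    """A size prefix before each element, which is then stored without its top bit."""
    if min(values) < 1:
        return None                    # this sketch skips the zero-element convention
    stripped = [nbits(v) - 1 for v in values]
    prefix = max(2, nbits(max(stripped)))
    return SIZE_INDICATOR + sum(prefix + w for w in stripped)


def two_sizes_cost(values: List[int]) -> Optional[int]:
    """Grow the small group's width from the minimum up to (large width - 1)."""
    widths = [nbits(v) for v in values]
    large = max(widths)
    best = None
    for small in range(min(widths), large):
        n_small = sum(1 for w in widths if w <= small)
        cost = (2 * WIDTH_HEADER + len(values)
                + small * n_small + large * (len(values) - n_small))
        best = cost if best is None else min(best, cost)
    return best


def best_strip_zeros(values: List[int]) -> Tuple[str, int]:
    """Return the name and estimated bit size of the most compact variant."""
    costs: Dict[str, Optional[int]] = {
        "uniform": uniform_cost(values),
        "single": single_cost(values),
        "individual": individual_cost(values),
        "two_sizes": two_sizes_cost(values),
    }
    usable = {k: c for k, c in costs.items() if c is not None}
    name = min(usable, key=lambda k: usable[k])
    return name, usable[name]


print(best_strip_zeros([5, 6, 7]))        # ('uniform', 11)
print(best_strip_zeros([1, 2, 1025]))     # ('individual', 25)
```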

Mode Compression

The Mode Compression method searches through an array for the most frequently repeated value, the mode of the array. It stores this value in the compression record only once and treats all other values as outliers. Mode Compression yields excellent compression for parameters that are relatively constant, such as the flow time parameter in the gas flow data record. For example, a reading may be taken every hour, resulting in precisely 60 minutes between readings. The present invention identifies the value of the data, strips the top bit, and transmits the value one time instead of once for each element. The number of array elements is either fixed or specified at the beginning of the compressed file, so decompression is easily accomplished. If outliers exist in the array, Mode Compression adds an outlier map and the values of the outlier elements, and applies the Strip Top Bit method. The logic behind Mode Compression is shown in Fig. 8. Mode Compression can reduce the data file by the number of bits needed to store all but one of the data elements (the one that must be transmitted), less the bits needed to define the compression. Decompression occurs simply by copying the saved data value into each entry of the array.
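A minimal Python sketch of Mode Compression and its decompression follows. It assumes the outlier map is one bit per element (1 marking an outlier), omits Strip Top Bit, and uses the hypothetical names `mode_compress` and `mode_decompress`.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def mode_compress(values: List[int]) -> Tuple[int, Optional[List[int]], List[int]]:
    """Return (mode, outlier_map, outlier_values); the map is None if there are no outliers."""
    mode = Counter(values).most_common(1)[0][0]
    if all(v == mode for v in values):
        return mode, None, []
    outlier_map = [0 if v == mode else 1 for v in values]
    outliers = [v for v in values if v != mode]
    return mode, outlier_map, outliers


def mode_decompress(mode: int, count: int,
                    outlier_map: Optional[List[int]] = None,
                    outliers: Sequence[int] = ()) -> List[int]:
    """Copy the mode into every entry, then restore any outliers in order."""
    if outlier_map is None:
        return [mode] * count
    remaining = iter(outliers)
    return [next(remaining) if flag else mode for flag in outlier_map]


flow_times = [60, 60, 60, 45, 60]
mode, outlier_map, outliers = mode_compress(flow_times)
print(mode, outlier_map, outliers)  # 60 [0, 0, 0, 1, 0] [45]
print(mode_decompress(mode, len(flow_times), outlier_map, outliers))
```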

Delta From Base

The Delta From Base compression method finds the lowest value in the array. This value is called the base and is subtracted from all the array elements to create deltas. Delta From Base yields particularly good compression when the base is significantly greater than the largest delta. For example, suppose an array had values of 6000, 6003, 6015, and 6008. The binary values of those numbers are:

00010111 01110000;
00010111 01110011;
00010111 01111111; and
00010111 01111000.

The first 12 bits of each of these numbers are identical. The present invention can send the base number one time and follow it with the deltas from the base. In this example, the deltas (0, 3, 15 and 8) can all be designated in 4 bits, resulting in a 12-bit savings per data entry, less the bits necessary to define the compression. Decompression occurs by translating the base number (restoring the top bit, if necessary) and adding each delta to that base for the individual data entries.
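The 6000/6003/6015/6008 example can be checked with a short Python sketch. It computes only the base, the deltas and the delta field width; the record layout, Strip Top Bit on the base and the header bits are left out, and `delta_from_base` and `restore_from_base` are hypothetical names.

```python
from typing import List, Tuple


def delta_from_base(values: List[int]) -> Tuple[int, List[int], int]:
    """Return (base, deltas, delta_width) using the array minimum as the base."""
    base = min(values)
    deltas = [v - base for v in values]
    width = max(deltas).bit_length()
    return base, deltas, width


def restore_from_base(base: int, deltas: List[int]) -> List[int]:
    """Decompression: add each delta back to the base."""
    return [base + d for d in deltas]


values = [6000, 6003, 6015, 6008]
base, deltas, width = delta_from_base(values)
print(base, deltas, width)  # 6000 [0, 3, 15, 8] 4
print(f"{base:016b}")       # 0001011101110000
print(16 - width)           # 12 bits saved per entry versus a 16-bit field
```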

Delta From Base employs a number of strategies for adjusting the base, specifying the deltas and determining the outliers to minimize the overall compression record size, such as using a single delta size, using two delta sizes, and removing outliers. The logic flow of Delta From Base can be seen in Fig. 9. Delta From Base considers several different methods of compression, chooses the best one, and then compares it to the results of the other compression methods.

Delta From Base begins by considering one delta bit width and no outliers. An example of an array that Delta From Base compresses efficiently is shown in Fig. 10a, with the resulting compression record shown in Fig. 10b. When the no outliers Delta From Base method is applied to the array of Fig. 10a, the base is 347 and the largest delta is 4022. The base bit width, after Strip Top Bit is applied, is 8 bits. The largest delta, 4022, requires 12 bits to identify. The base is transmitted one time in 8 bits, and the remaining deltas are each transmitted in 12 bits.

Delta From Base next considers dividing the deltas into two separate groups, or two size bins. Bin size refers to the field width necessary to identify the data of the array, with the large bin size sufficient to identify the largest value stored. Delta From Base examines every smaller bin size, ranging from 0 to one less than the largest bin size. If any combination results in an overall smaller compression record size, Delta From Base saves those record parameters. Delta From Base performs this analysis starting with the largest element value as the base. For example, examine the data array shown in Fig. 11a. The largest value, selected as the base, is 30803. The deltas range from 0 to 493, so the largest delta requires 9 bits to identify. The greatest compression ratio occurs, however, when the deltas are split into two groups, with the smaller deltas assigned 4 bits, or a maximum value of 15. Fig. 11b shows the compression record for this array and this method, indicating that the base width requirement is 14 bits. The top bit is stripped, and the larger delta of 493 is assigned its width of 9 bits. The smaller delta size is assigned a width of 4 bits. Direction and lower size indicators follow, and then the deltas. In this example the small bin deltas are clustered at the end of the array, so the compression record need not include indices for all of them.

After Delta From Base has searched for the best no outlier representation, it searches for the best outlier representation. An outlier is defined as a delta that is too large to be included compactly in the compression record. Delta From Base can express an outlier as a raw value or as a value outside the delta range.

The goal of removing outliers is to shrink the number of bits required to identify the remaining entries, which vary little from the base number, and thereby reduce the overall number of bits in the compressed file. For example, suppose that in an array of 24 elements, 23 elements require 4 bits to store and one element requires 5 bits. Keeping the delta size at 4 bits saves 23 bits. If the number of bits required to identify the outlier is less than 23, a net compression occurs. Delta From Base examines the elements in three ways: 1) it removes elements from largest to smallest; 2) it removes elements from smallest to largest; and 3) it removes elements from the top and bottom, in an optimized removal process, towards the center of the element values. If a more compact record is found, Delta From Base adjusts the base to the new value, saves the new upper and lower delta values, and determines the field width necessary to handle the base and the outliers.

Consider the array shown in Fig. 12a. If the base value were set to 28, the maximum delta would be 730, requiring 10 bits to store. If three values are excluded and the base is adjusted to 706, the maximum delta is 52, requiring only 6 bits to store. The number of saved bits is 4 per element, or 76 in this example. As identifying the outliers takes 19 bits, the record is compressed by an additional 57 bits. The compression record is shown in Fig. 12b.

Delta From Base - Outliers/Delta From Range

Delta From Base next considers specifying outliers as the distance from the record's delta range. For example, consider an array containing the elements 0, 23, 15, 4, 31, 19, 33, 39. The deltas range from 0 to 39, requiring 6 bits to store. If the delta fields were reduced by 1 to 5 bits each, all of the values would fit except 33 and 39. Delta From Base can use a separate six-bit field to hold 33 and 39. Another approach is to subtract the smaller delta range of 33, leaving 0 and 6, which fit in only 3 bits. This approach has the advantage of using fewer bits to specify the outliers, but it takes an extra five bits to specify the outlier range compared with the raw-value approach described earlier. If the outlier deltas are in the negative direction, their absolute values are stored. Delta From Base uses a sign flag to indicate whether the outliers extend towards the top of the range or towards the bottom. Fig. 13a shows a 24-element array with a delta range from 0 to 543, which takes a 10-bit field to store. Delta From Base determines that removing the maximum delta, 543, allows the remaining deltas to fit in 9-bit fields. The outlier value is shifted down by 512 (2⁹) to 31, which takes 5 bits to store. The adjusted base is represented only once and removed from the remaining elements; the outlier element has an additional 512 removed from it. The remaining values are the deltas from the base. The compression record for this example is shown in Fig. 13b.
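The delta-from-range idea can be sketched in Python. This sketch follows the Fig. 13 convention of subtracting 2^w (the first value that no longer fits in a w-bit field) from each outlier; the 0 to 39 example in the text subtracts 33 instead, so the exact offset used here is an assumption, and `split_range_outliers` is a hypothetical name.

```python
from typing import List, Tuple


def split_range_outliers(deltas: List[int], width: int) -> Tuple[List[int], List[int], int]:
    """Return (outlier_map, outlier_values_past_range, outlier_width).

    Deltas below 2**width fit the reduced field; larger ones become outliers
    stored as their distance past the top of that range.
    """
    limit = 1 << width
    outlier_map = [1 if d >= limit else 0 for d in deltas]
    past_range = [d - limit for d in deltas if d >= limit]
    outlier_width = max(past_range).bit_length() if past_range else 0
    return outlier_map, past_range, outlier_width


print(split_range_outliers([0, 23, 15, 4, 31, 19, 33, 39], 5))
# ([0, 0, 0, 0, 0, 0, 1, 1], [1, 7], 3)
print(split_range_outliers([543, 12, 300], 9))
# ([1, 0, 0], [31], 5)
```

The second call reproduces the Fig. 13 step: 543 is shifted down by 512 to 31, which needs 5 bits.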

Difference From Last

Difference From Last compression is similar to Delta From Base, but the deltas are the differences from the previous values rather than from a base value. This compression type works particularly well for parameters that steadily ramp up or down or cycle through a sampling period, such as a tank level or temperature sensor. Difference From Last begins by subtracting each element from the element that follows it to determine the differences. It then identifies whether and where there are inflections, that is, changes in the signs of the differences. For example, the values 6000, 6003, and 6015 have binary values of:

00010111 01110000;
00010111 01110011; and
00010111 01111111.

Difference From Last may report the second and third values as their differences, 0011 (3) and 1100 (12), saving the first 12 bits of each entry. Decompression requires that the first difference be added to or subtracted from the number before it, that the next difference be added to or subtracted from the result of the first calculation, and so on. If the sign of the differences changes, then either an inflection map must be included in the compression record or the values that change inflection must be treated as outliers. An inflection map is similar to an outlier bitmap. It consists of one bit per data entry in the data array. The bit corresponding to each entry that changes inflection from the previous entry is stored as 1, while all other bits are stored as 0.
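A minimal Python sketch of Difference From Last with an inflection map follows. It assumes the record carries the initial value, a flag for the sign of the first difference, the difference magnitudes, and one inflection bit per entry (set where the sign flips); a zero difference is treated as positive, and the function names are hypothetical.

```python
from typing import List, Tuple


def difference_from_last(values: List[int]) -> Tuple[int, bool, List[int], List[int]]:
    """Return (initial_value, first_is_negative, magnitudes, inflection_map)."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    inflections = [0] * len(values)
    for i in range(1, len(diffs)):
        # Entry i + 1 is reached by diffs[i]; flag it if the sign changed.
        if (diffs[i] < 0) != (diffs[i - 1] < 0):
            inflections[i + 1] = 1
    first_negative = bool(diffs) and diffs[0] < 0
    return values[0], first_negative, [abs(d) for d in diffs], inflections


def restore_from_last(initial: int, first_negative: bool,
                      magnitudes: List[int], inflections: List[int]) -> List[int]:
    """Add or subtract each magnitude, flipping direction at flagged entries."""
    out = [initial]
    negative = first_negative
    for i, m in enumerate(magnitudes):
        if inflections[i + 1]:
            negative = not negative
        out.append(out[-1] - m if negative else out[-1] + m)
    return out


record = difference_from_last([10, 12, 15, 11, 8, 9])
print(record)                      # (10, False, [2, 3, 4, 3, 1], [0, 0, 0, 1, 0, 1])
print(restore_from_last(*record))  # [10, 12, 15, 11, 8, 9]
```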

Difference From Last employs two methods for optimizing the compression record size. The first method splits the differences into a large size bin and a small size bin, and then finds the optimum lower bin size in a fashion similar to that previously discussed. The second method shrinks the difference size and adds outliers to those already removed because of inflection. The logic flow of the Difference From Last method can be seen in Fig. 14.

Difference From Last begins by considering the simplest case of just one difference size and no outliers. An inflection record is included if necessary. Fig. 15a shows a 24-element array where the initial value is 116. The differences range from 10 to 19 without inflections. Fig. 15b shows the compression record for the array. The compression record indicates no inflections, no outliers, change in the positive direction, a field size, a base size, and the deltas in order.

Difference From Last next considers dividing the differences into two size bins. Difference From Last searches every possible lower bin size, which can range from 0 to one less than the largest bin size. If any two-size combination results in an overall smaller compression record size, Difference From Last saves the record parameters. Fig. 16a shows an array where the initial value is 4547 and the differences range from -1131 to 994. Difference From Last creates an inflection record to identify where the differences change between positive and negative values. The inflection record specifies where the differences change signs, i.e., index numbers 2, 3, 7 and 15. The difference array is then converted to all positive values. Next, Difference From Last determines that the larger bin size must be 11 bits wide. It calculates the resulting final output for smaller bin sizes of 0 to 10 bits. In this example, the most advantageous result occurs when the smaller bin size is 8 bits. Fig. 16b shows the compression record for this example. The record contains an inflections indicator, a few outliers indicator, the number of outliers, an index direction backwards indicator, and the outlier index records. It then indicates the size of the field that holds the initial value, followed by the initial value itself. Next, it indicates two different size bins with more than one outlier. The record then includes the larger and smaller difference field sizes, followed by the values of the differences and a bitmap indicating which data entries are small bin entries and which are large bin entries.

Finally, Difference From Last considers designating specific differences as outliers. Difference From Last splits the differences based on the sign of the value. Differences whose sign (positive or negative) occurs less frequently are considered outliers. The remaining differences are set to positive values, and the difference size is based on the largest non-outlier difference. Difference From Last then shrinks the difference size, designating the newly excluded values as outliers. If a particular difference size results in a smaller compression record, Difference From Last saves the parameters. Fig. 17a shows an example array where the initial value is 1048 and the differences range from -1 to -739. There are no inflections. Difference From Last sets all the differences to positive values. It starts with 10 bits to store 739 and then shrinks the difference size from 10 to 0 bits. In this example, the optimal size occurs at 5 bits, leaving three outliers in index positions 1, 3 and 4. The compression record is shown in Fig. 17b. Decompression of any Difference From Last compression method is achieved by reversing the compression steps taken.

Ratio Compression

Ratio compression is used when the elements of an array are related proportionally to a data set that can be created from other arrays already included in the compressed record. Processing begins by creating an array of estimated values, and then finding a ratio that minimizes the range of deltas between the estimated and actual values. The present invention then searches for the optimum way to describe the deltas. The logic flow of ratio compression is shown in Fig. 18.

For compressing natural gas flow data, ratio compression is used for three types of parameters: pressure extension, volume and energy. For each, an estimated array is created from the arrays that have already been compressed, and the deltas from the actual arrays are then calculated. If the deltas are small, then a compression output record is created. The estimated arrays are created in the following ways:

1. The estimated pressure extension array is calculated by taking the square root of the product of the differential and static pressures.

2. The estimated volume array is created by multiplying flow time by pressure extension and then by C. If the pressure extension array is not included in the gas flow data record, then the estimated pressure extension array is used. If either the flow time or C array is not included, it is assumed to have the value 1.

3. The estimated energy array is simply the actual volume array.

For non-gas flow applications, the estimated arrays are created in a similar manner from arrays included in the compression record.
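The three estimate rules above can be written as a short Python sketch. Floating-point square roots are used for readability, whereas the text works with fixed-width integers and guards against overflow; the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def estimated_pressure_extension(differential: Sequence[float],
                                 static: Sequence[float]) -> List[float]:
    """Rule 1: square root of differential pressure times static pressure."""
    return [math.sqrt(dp * sp) for dp, sp in zip(differential, static)]


def estimated_volume(pressure_extension: Sequence[float],
                     flow_time: Optional[Sequence[float]] = None,
                     c: Optional[Sequence[float]] = None) -> List[float]:
    """Rule 2: flow time x pressure extension x C, with missing arrays taken as 1.

    Pass the estimated pressure extension when the actual array is absent.
    """
    n = len(pressure_extension)
    flow_time = flow_time if flow_time is not None else [1] * n
    c = c if c is not None else [1] * n
    return [ft * pe * k for pe, ft, k in zip(pressure_extension, flow_time, c)]


def estimated_energy(actual_volume: Sequence[float]) -> List[float]:
    """Rule 3: the estimated energy array is the actual volume array."""
    return list(actual_volume)


pe = estimated_pressure_extension([4, 9], [16, 4])
print(pe)                              # [8.0, 6.0]
print(estimated_volume(pe, [60, 60]))  # [480.0, 360.0]
```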

The present invention checks that multiplying the differential and static pressures will not result in an overflow before performing the square root, i.e., that the result will not exceed the maximum storage of 32 bits. The product size is estimated by rewriting the product calculation in logarithms, where log₂ is defined as the number of bits required to store a value:

$$\log_2(\text{product}) \le \log_2(\text{differential pressure}) + \log_2(\text{static pressure})$$

If the product were to overflow, then log₂ (product) would exceed 32 by no more than N bits, where:

$$N = \log_2(\text{differential pressure}) + \log_2(\text{static pressure}) - 32$$

If the present invention determines that N is greater than 0, it prevents an overflow by first rounding N up to an even number and then shifting down both elements by a total of N bits before multiplying them. The present invention shifts down the larger of the two elements first until it becomes the smaller, and then alternates shifting down the respective elements. This technique is called "Shrink Product". Shrink Product then shifts the square root back up by N/2 bits. The equation for the process is:

$$\text{estimate} = \left[\frac{\text{differential pressure}}{2^{A}} \times \frac{\text{static pressure}}{2^{B}}\right]^{1/2} \times 2^{N/2}, \qquad N = A + B$$

A and B are the shift values. Each is increased one at a time as the larger of the differential or static pressure is reduced by a factor of 2, and the shifting process is complete when A + B equals N.
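A minimal Python sketch of Shrink Product follows, taking log₂ as `int.bit_length()` (the number of bits needed to store a value) and using `math.isqrt` for integer square roots. Python integers never overflow, so the 32-bit limit is only simulated; the loop shifts whichever operand is currently larger, which shrinks the larger one first and then alternates. The function name and the integer rounding are assumptions.

```python
import math


def shrink_product_sqrt(dp: int, sp: int, word_bits: int = 32) -> int:
    """Approximate isqrt(dp * sp) while keeping the product within word_bits."""
    n = dp.bit_length() + sp.bit_length() - word_bits
    if n <= 0:
        return math.isqrt(dp * sp)       # no overflow possible
    n += n % 2                           # round N up to an even number
    a = b = 0                            # shift counts A (for dp) and B (for sp)
    while a + b < n:
        # Shift down whichever operand is currently larger.
        if (dp >> a) >= (sp >> b):
            a += 1
        else:
            b += 1
    product = (dp >> a) * (sp >> b)      # now fits in word_bits
    return math.isqrt(product) << (n // 2)  # shift the root back up by N/2


print(shrink_product_sqrt(100, 400))      # 200
print(shrink_product_sqrt(2**20, 2**20))  # 1048576
```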

Similarly, the present invention prevents an estimated volume overflow in the following equation:

$$\text{estimated volume} = \text{pressure extension} \times \text{flow time} \times C$$

The present invention performs the calculation in two steps: first multiplying the pressure extension by the flow time, and then multiplying the result by C. Either step can result in an overflow, defined in this case as exceeding the maximum Strip Top Bit representation of a number, meaning 30 bits. The product is rewritten in logarithmic form as previously described, and Shrink Product is used to avoid overflows. In this case, each of the two steps is rewritten as:

$$\log_2(\text{product}) \le \log_2(\text{multiplicand}_1) + \log_2(\text{multiplicand}_2)$$

If the product were to overflow, then log₂ (product) would exceed 30 by N, where:

$$N = \log_2(\text{multiplicand}_1) + \log_2(\text{multiplicand}_2) - 30$$

If the two steps require shifts of N₁ and N₂ bits respectively, the result is off by a factor of 2^(N₁ + N₂). The present invention corrects this with an adjusting ratio.

For some applications of the present invention, the data arrays may be amenable to Ratio compression. If one array is proportionally related to another, multiplying one array by a simple constant provides the values of the other. In this situation, the original array may be used as the estimate array. Where the second array is proportional to the first with a constant offset added to the multiplication result, the estimate is more difficult. If the offset is small compared to the results of the multiplication, it can be ignored. If it is large, it may skew the ratio calculation. In this situation, the safest method is to remove the offset before invoking ratio compression. However, this offset must be known, and would have to be built into the implementation of the present invention for the intended purpose. Some ratios may include two offsets that vary over time. In this case, the best approach is to use the lowest value of each array as a separate offset. The value of each offset must be included in the compression record.

Ratio compression assumes that the ideal ratio K is the one that minimizes the sum of the deltas, specified as:

$$\sum_n \text{delta}_n = \sum_n \left[\text{actual}_n - K \times \text{estimate}_n\right] = \sum_n \text{actual}_n - K \times \sum_n \text{estimate}_n$$

If the sum of the deltas is set equal to 0, then:

$$K = \frac{\sum_n \text{actual}_n}{\sum_n \text{estimate}_n}$$

If K were a floating-point number, it would take 32 bits to specify in the compression record. The present invention uses a more compact representation by limiting the mantissa to 15 bits (14 bits after Strip Top Bit) and the exponent to 5 bits. With a flag field and a mantissa size field, the ratio record can range from 1 bit to 24 bits. Allowing the mantissa to exceed 15 bits adds no benefit: the product of the mantissa and the estimate is limited to 31 bits, so increasing the precision of the mantissa only reduces the precision available for the estimate.

The exponent is essentially the number of places to shift the product to the right. Therefore, the product of the ratio and the estimate is given by:

$$K \times \text{estimate}_n = (\text{mantissa} \times \text{estimate}_n) \times 2^{-\text{exponent}}$$

The dynamic range of the ratio is from 4.66 × 10⁻¹⁰ (2⁻³¹) to 3.28 × 10⁴ (2¹⁵ − 1). The present invention calculates the initial ratio in four steps:

1. Calculate the sums of the estimate and actual arrays, ensuring that neither sum will overflow. The exponent is initialized to 0 and adjusted to prevent an overflow.

2. Adjust the actual array sum upward and the estimate array sum downward so that the mantissa will be 15 bits in size, and adjust the exponent accordingly.

3. Divide the actual array sum by the estimate array sum to create the mantissa.

4. If the exponent is greater than 31, shift the mantissa down by the N bits by which the exponent exceeds 31, bringing the exponent down to 31.

The present invention aborts further ratio processing if either array sum is 0, the calculated mantissa is 0, or the calculated exponent is less than 0.
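The four steps can be sketched in Python and checked against the Fig. 19 numbers. The sketch assumes both sums already fit in 32 bits (step 1) and that step 2 means filling the 32-bit word with the actual sum and bringing the estimate sum to 18 bits, so the quotient stays below 2¹⁵; that reading reproduces the shifts of 18 and 11 in the Fig. 19 example but is an inference, and `initial_ratio` is a hypothetical name.

```python
from typing import Optional, Tuple


def initial_ratio(actual_sum: int, estimate_sum: int, exponent: int = 0,
                  mantissa_bits: int = 15, word_bits: int = 32,
                  max_exponent: int = 31) -> Optional[Tuple[int, int]]:
    """Return (mantissa, exponent) with K ~= mantissa * 2**-exponent, or None to abort."""
    if actual_sum <= 0 or estimate_sum <= 0:
        return None
    if actual_sum.bit_length() > word_bits:
        raise ValueError("actual sum must already fit in word_bits (step 1)")
    # Step 2a: shift the actual sum up until it fills the word.
    up = word_bits - actual_sum.bit_length()
    actual = actual_sum << up
    exponent += up
    # Step 2b: bring the estimate sum to (word_bits - mantissa_bits + 1) bits,
    # so that the quotient stays below 2**mantissa_bits.
    shift = estimate_sum.bit_length() - (word_bits - mantissa_bits + 1)
    estimate = estimate_sum >> shift if shift >= 0 else estimate_sum << -shift
    exponent += shift
    # Step 3: divide to form the mantissa.
    mantissa = actual // estimate
    # Step 4: cap the exponent by shrinking the mantissa.
    if exponent > max_exponent:
        mantissa >>= exponent - max_exponent
        exponent = max_exponent
    if mantissa == 0 or exponent < 0:
        return None
    return mantissa, exponent


# Fig. 19: the estimate sum was already shifted down by 2**2, so the exponent starts at 2.
print(initial_ratio(15016, 378166712, exponent=2))  # (21317, 31)
```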

The present invention also takes steps to prevent an overflow in summing ratio arrays. Before summing either the estimate or actual array, the present invention verifies that the result will not exceed the maximum storage size of 32 bits. The sum is quickly estimated as:

$$\text{sum} = \sum \text{elements} \le \text{largest element} \times \text{number of elements}$$

The equation can be written logarithmically as:

$$\log_2(\text{sum}) \le \log_2(\text{largest element}) + \log_2(\text{number of elements})$$

If the sum were to overflow, then log₂ (sum) would exceed 32 by no more than N bits, where N is calculated as:

$$N = \log_2(\text{largest element}) + \log_2(\text{number of elements}) - 32$$

If the present invention determines that N is greater than 0, it can prevent an overflow by first shifting down each element N bits before summing:

$$\text{sum} = \sum \left(\frac{\text{elements}}{2^{N}}\right)$$

The present invention increases the exponent by N when preventing an estimate array sum overflow and reduces the exponent by N when preventing an actual array sum overflow.
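The overflow guard reduces to a few lines of Python. This sketch takes log₂ as `int.bit_length()` and returns the shift N alongside the sum so the caller can adjust the exponent as described (add N for the estimate sum, subtract N for the actual sum); `safe_sum` is a hypothetical name and expects a non-empty list.

```python
from typing import List, Tuple


def safe_sum(elements: List[int], word_bits: int = 32) -> Tuple[int, int]:
    """Return (sum_of_shifted_elements, n), shifting each element down n bits first.

    n = log2(largest element) + log2(number of elements) - word_bits, floored at 0,
    which guarantees the shifted sum fits in word_bits.
    """
    n = max(elements).bit_length() + len(elements).bit_length() - word_bits
    n = max(n, 0)
    return sum(e >> n for e in elements), n


print(safe_sum([1, 2, 3]))     # (6, 0)
print(safe_sum([2**31] * 24))  # (1610612736, 5)
```

For example, `estimate_total, n = safe_sum(estimates)` would be followed by `exponent += n`.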

The table presented in Fig. 19 is a sample gas flow record with 22 volume values and estimated volume values. The estimated volume was created by multiplying flow time, C and the square root of the product of the differential and static pressures, as previously described. First, the present invention checks for a potential overflow in summing the actual and estimated volume arrays. In this example, an overflow may occur, so the present invention determines that all of the estimated volume elements must be adjusted downward by a factor of 2², which raises the exponent to 2. After completing the calculations, the present invention determines that the actual volume sum is 15016. Next, the present invention adjusts the estimated and actual volume sums so that the calculated ratio will be 15 bits in size. For this example, the present invention adjusts the estimated volume sum downward (378166712 / 2¹¹ = 184651) and the actual volume sum upward (15016 × 2¹⁸ = 3936354304). The exponent is increased by 11 and then by 18, to a new value of 31. Third, the present invention divides the actual volume sum (3936354304) by the estimated volume sum (184651) to create the mantissa (21317). Finally, the present invention recognizes that the exponent is not greater than 31, so no further adjustment is needed. With a mantissa of 21317 and an exponent of 31, the initial ratio has the effective value 21317 × 2⁻³¹ = 9.9265 × 10⁻⁶.

After finding the initial ratio, the present invention determines the compression record size corresponding to this ratio. Next, it iteratively shrinks the mantissa by one bit and reduces the exponent by one. With each iteration, the present invention calculates what the compression record size would be for the new ratio, and for the new ratio with 1 added and with 1 subtracted. If a new mantissa and exponent result in a smaller compression record, the present invention saves these values and treats the new ratio as a better approximation of the optimum ratio. It then adjusts the new mantissa back up to 15 bits and the exponent upward, provided that the exponent does not exceed 31. The present invention repeats this mantissa shrinkage cycle until the compression record size no longer decreases. Tabular examples of this process, through four iterations, are shown in Figs. 20a - 20d.

When the present invention completes the mantissa shrinkage cycle, it has the parameters necessary to create a no outlier compression record with one delta size. For example, consider the pressure extension array of Fig. 21a. The present invention creates an estimate array by taking the square root of the product of the differential and static pressures, as shown in Fig. 21b. The present invention determines that the optimum ratio has a mantissa of 289 and an exponent of 8, the equivalent of 1.129 (289 × 2⁻⁸). Multiplying the estimated pressure extensions by this ratio creates a range of deltas from -2 to 13, as seen in Fig. 21c. The present invention picks the lowest delta as the base and adjusts the deltas upward. Since the largest adjusted delta is 15, the deltas fit into 4 bits. The compression record for the pressure extension array is shown in Fig. 21d.
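The steps above, applying the mantissa/exponent ratio to an estimate array and rebasing the deltas at the lowest delta, can be sketched in Python. How K × estimate is rounded is not stated, so the truncating right shift is an assumption; the sample arrays are hypothetical (not the Fig. 21 data), and `ratio_deltas` is an illustrative name.

```python
from typing import List, Tuple


def ratio_deltas(actual: List[int], estimate: List[int],
                 mantissa: int, exponent: int) -> Tuple[int, List[int], int]:
    """Return (base, rebased_deltas, delta_width) for a one-size, no-outlier record."""
    predicted = [(mantissa * e) >> exponent for e in estimate]  # K x estimate
    deltas = [a - p for a, p in zip(actual, predicted)]
    base = min(deltas)                     # lowest delta becomes the base
    rebased = [d - base for d in deltas]   # all deltas now >= 0
    return base, rebased, max(rebased).bit_length()


# Hypothetical arrays with the Fig. 21 ratio (mantissa 289, exponent 8, about 1.129).
print(ratio_deltas([115, 100, 130], [100, 90, 100], 289, 8))  # (-1, [4, 0, 19], 5)
```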

Ratio Compression - No Outliers/Two Delta Sizes

Ratio Compression next considers dividing the deltas into two size bins. Ratio compression searches every possible lower bin size, which can range from 0 to one less than the largest bin size. It then repeats this process using the largest delta as the base and subtracting each delta from it. If any two-size combination from either search results in an overall smaller compression record, ratio compression saves the parameters. For example, see Fig. 22a. Flow time, C and pressure extension values are not available, so the present invention uses the square root of the product of the differential and static pressures for the volume estimate. The present invention determines that the optimum ratio has a mantissa of 3 and an exponent of 2, the equivalent of 0.75 (3 × 2⁻²). Multiplying the estimated volume values by this ratio creates a range of deltas from -455 to 15, as seen in Fig. 22b. The present invention picks the highest delta of 15 and subtracts the deltas from it. The deltas now range from 0 to 470, so the larger bin is 9 bits wide. The greatest compression comes from splitting the deltas into two bins, where the smaller bin holds deltas of 61 or smaller (6 bits wide). The compression record is shown in Fig. 22c.

Ratio Compression - Outliers/Delta from Range

Ratio compression next considers specifying outliers as the distance from the record's delta range. The method is the same as that described for the Delta From Base - Outliers/Delta From Range method previously discussed. Ratio compression shrinks the delta bin size, or field bit width, one bit at a time and then calculates the difference between the top of the data range and the outliers. Ratio compression repeats this process using the largest delta as the base and subtracting each delta from it. If any combination results in a smaller compression record, the parameters are saved. Fig. 23a shows an example pressure extension array. The present invention uses the square root of the product of the differential and static pressures for the pressure extension estimate. The present invention determines that the optimum ratio has a mantissa of 1 and an exponent of 0. Since the ratio is 1, the present invention uses a single flag to indicate a ratio of one in the compression record. Fig. 23b shows, for the same example readings as Fig. 23a, the volume, the ratio estimate, the delta, and a value for (base - delta). The present invention picked the highest delta of 187 and subtracted the other deltas from it. Using a delta size of 7 bits and treating all deltas below 68 as outliers provides the optimal compression record size. Outlier values range up to 993 outside the acceptable delta range, requiring 10 bits to store. Fig. 23c shows the compression record for this example. The first 5 lines of the record indicate that the data is compressed by ratio compression with many outliers and indicate the positions of those outliers. The next three lines indicate the base as positive 68. A flag indicating the direction of the outliers from the base follows, then the bit lengths are specified and the outlier values are identified.

Decompression of records compressed by Ratio Compression applies the selected ratio equation to the other decompressed values of the transmitted data, followed by addition or subtraction of the deltas that designate the variation of each data entry from the ratio estimate.

Other Compression Methods

In addition to the five core compression methods, the present invention applies Time Compression, No Flow Record Removal, Daily Summary Compression, and One Record Compression to source files. For the purposes for which the present invention was designed, readings are typically taken once an hour. The present invention can compress the time array very compactly if every element is taken at the top of the hour and every subsequent reading is one hour later. Under these circumstances, the present invention indicates that there are no outliers and specifies the first time reading by its hour after midnight. If either of these conditions is not met, an outlier record is created.

If any element in the time array violates either condition (i.e., it is not at the top of the hour, or it is not exactly one hour after the previous reading), the present invention creates an outlier bitmap for all the data entries. The first entry is flagged as an outlier. Any other entry that is not exactly 60 minutes after the preceding entry is also flagged as an outlier. The full value of every outlier is stored in the compressed file. During decompression, the outlier values are placed in the time array, and all other values are determined by adding 60 minutes to the preceding data value.
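A minimal Python sketch of these time-array rules follows. It assumes times are integer minutes counted from midnight of the first reading's day (the stored time format is not specified), and the function names and tuple record format are hypothetical.

```python
from typing import List, Tuple, Union

Record = Union[Tuple[str, int], Tuple[str, List[int], List[int]]]


def compress_times(times: List[int]) -> Record:
    """Return ("regular", first_hour) or ("outliers", bitmap, outlier_values)."""
    steps_ok = all(b - a == 60 for a, b in zip(times, times[1:]))
    if steps_ok and all(t % 60 == 0 for t in times):
        return ("regular", times[0] // 60)  # first reading as hour after midnight
    # First entry is always an outlier; others only if not 60 minutes after the previous.
    bitmap = [1] + [0 if b - a == 60 else 1 for a, b in zip(times, times[1:])]
    outliers = [t for t, flag in zip(times, bitmap) if flag]
    return ("outliers", bitmap, outliers)


def decompress_times(record: Record, count: int) -> List[int]:
    if record[0] == "regular":
        start = record[1] * 60
        return [start + 60 * i for i in range(count)]
    _, bitmap, outliers = record
    remaining = iter(outliers)
    result: List[int] = []
    for flag in bitmap:
        result.append(next(remaining) if flag else result[-1] + 60)
    return result


print(compress_times([480, 540, 600]))       # ('regular', 8)
print(compress_times([480, 545, 605, 665]))  # ('outliers', [1, 1, 0, 0], [480, 545])
```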

After the time array is compressed, the invention searches the data records for "no flow" records. A no flow record exists when all readings are zero except static pressure and temperature. The invention then removes the zero elements from the differential pressure, flow time, C, pressure extension, volume and energy arrays, so the number of records remaining for compression is reduced by the number of no flow records. The present invention uses the outlier method to identify which records are no flow records.

The present invention creates a summary value for each given parameter array after compression. The summary value may be the daily average or a total across 24 hours of records. The present invention begins by estimating the average or total from the parameter array and then subtracts this estimated value from the actual summary value. A number of summation rules are used. To estimate the summary average, the present invention first checks for an overflow, i.e., whether the sum exceeds the maximum storage space of 32 bits. The upper limit of the sum is estimated by:

$$\text{sum} = \sum \text{elements} \le \text{largest element} \times \text{number of elements}$$

The equation is rewritten as:

$$\log_2(\text{sum}) \le \log_2(\text{largest element}) + \log_2(\text{number of elements})$$

If the sum were to overflow, then log₂ (sum) would exceed 32 by no more than N bits, where:

$$N = \log_2(\text{largest element}) + \log_2(\text{number of elements}) - 32$$

If the present invention determines that N is greater than 0, it can prevent an overflow by shifting down each element N bits before summing and shifting the average back up by N bits. The equation employed is:

$$\text{average} = \frac{\sum (\text{elements} / 2^{N})}{\text{number of elements}} \times 2^{N}$$

To estimate the summary total, the present invention checks during the summation whether the total exceeds the upper limit of 31 bits. If the limit is exceeded, the present invention sets the total to the upper limit. Next, it extrapolates the total upward if there are fewer than 24 records. The revised total is calculated by:

$$\text{total} = \frac{\left(\sum \text{elements}\right) \times 24}{\text{number of elements}}$$

The equation is rewritten as:

$$\log_2(\text{total}) \le \log_2\left(\sum \text{elements}\right) + \log_2\left(\frac{24}{\text{number of elements}}\right)$$

If log₂ (total) can equal or exceed 32, the present invention sets the total to the upper limit. Otherwise, it checks whether the (∑ elements) × 24 portion of the extrapolation will exceed 32 bits by calculating:

$$N = \log_2\left(\sum \text{elements}\right) + \log_2(24) - 32$$

If N is greater than 0, the present invention can prevent an overflow by first shifting ∑ elements down by N bits before multiplying by 24 and dividing by the number of elements, then shifting the total back up by N bits. The equation used is:

$$\text{total} = \frac{\left[\left(\sum \text{elements}\right) / 2^{N}\right] \times 24}{\text{number of elements}} \times 2^{N}$$
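The summary-total estimate can be sketched in Python. Several details are assumptions: the 31-bit upper limit is taken as 2**31 - 1, log₂(24 / number of elements) is rounded up with `math.ceil(math.log2(...))`, N is computed from the bit width of the running sum (the quantity that is shifted), and `summary_total` is a hypothetical name.

```python
import math
from typing import List


def summary_total(elements: List[int], hours: int = 24,
                  limit_bits: int = 31, word_bits: int = 32) -> int:
    """Estimate a daily total, extrapolating to `hours` records and clamping to the limit."""
    if not elements:
        raise ValueError("need at least one record")
    limit = (1 << limit_bits) - 1
    total = 0
    for e in elements:
        total += e
        if total > limit:               # checked during the summation
            return limit
    count = len(elements)
    if count >= hours:
        return total
    # log2(total) <= log2(sum) + log2(24 / count); clamp if it could reach word_bits.
    if total.bit_length() + math.ceil(math.log2(hours / count)) >= word_bits:
        return limit
    n = total.bit_length() + hours.bit_length() - word_bits
    if n > 0:
        return (((total >> n) * hours) // count) << n
    return (total * hours) // count


print(summary_total([10] * 12))     # 240
print(summary_total([2**24] * 12))  # 402653184
```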

After creating the estimate of the average or total, the present invention determines its difference from the actual summary. Depending on the size of the difference, the present invention creates one of three types of compression records. If the difference is between -8 and +7, the small size summary compression record is used. This compression record includes a small size indicator (1 bit), a negative sign indicator (1 bit) and a value for the difference (3 bits).

If the difference is outside the -8 to +7 range, the actual summary value must be included in the compression record. If an average summary value is less than or equal to the size of the largest element in the parameter array, or if a total summary value is less than or equal to the size of the total estimate, the present invention can assume the size of the summary value field in the compression record. The compression record then includes a flag indicating that the summary size is other than small, a flag indicating that the value is within the calculated range, and the summary value.

If the summary value cannot fit into either of the two previous fields, the present invention will create a large size compression record that includes the summary size field. In this case, the compression record must add a designator for the size of the field needed to hold the value (after Strip Top Bit is applied).
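The three-way choice of summary record can be sketched as a function that returns the record as a bit string. Only the decision order comes from the text above. The flag polarities, the 5-bit size designator, reading "less than or equal to the size of" as a comparison of bit widths, non-negative summary values, and treating Strip Top Bit as dropping the always-set most significant bit are assumptions, and `to_bits` and `summary_record` are hypothetical names.

```python
SIZE_FIELD_BITS = 5   # assumed width of the large record's size designator


def to_bits(value: int, width: int) -> str:
    """Fixed-width binary string; empty when the width is zero."""
    return format(value, "b").zfill(width) if width else ""


def summary_record(actual: int, estimate: int, range_bits: int) -> str:
    """Return the smallest applicable summary compression record as bits.

    range_bits is the assumed field width: the bit size of the largest
    element for an average summary, or of the total estimate for a total.
    actual is assumed to be non-negative.
    """
    diff = actual - estimate
    if -8 <= diff <= 7:
        # Small size: indicator, sign, low 3 bits of the two's-complement diff.
        return "1" + ("1" if diff < 0 else "0") + to_bits(diff & 0b111, 3)
    if actual.bit_length() <= range_bits:
        # Not small, within the calculated range: the field width is implied.
        return "0" + "1" + to_bits(actual, range_bits)
    # Large size: a size designator is added, and Strip Top Bit drops the
    # leading 1 because the designator already implies where it is.
    width = actual.bit_length() - 1
    return "00" + to_bits(width, SIZE_FIELD_BITS) + to_bits(actual - (1 << width), width)
```

With an estimate of 97 and a 7-bit range, an actual summary of 100 yields the 5-bit record `10011`, 50 yields a 9-bit in-range record, and 300 falls through to a 15-bit large record whose stripped top bit the decoder restores from the size designator.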

Finally, if each array in the input record has only one reading, the present invention will put all of the parameters in one array instead of having an array for each type of data reading. If daily summaries are set for that reading, they are interspersed with the readings from that hour. The composite array is then compressed by Strip Zeros.
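A minimal sketch of the single-reading case follows. The per-parameter readings are merged into one composite array, each daily summary (when one is set) is placed directly after its parameter's reading, and the array is then compressed with Strip Zeros. The exact placement of summaries, the dictionary input, and the single field width shared by the whole array are assumptions; `build_composite` and `strip_zeros` are hypothetical names.

```python
def build_composite(readings: dict[str, int], summaries: dict[str, int]) -> list[int]:
    """Merge one reading per parameter, plus any daily summaries, into one array."""
    composite: list[int] = []
    for name, value in readings.items():        # dicts keep insertion order
        composite.append(value)
        if name in summaries:                   # assumed: summary follows its reading
            composite.append(summaries[name])
    return composite


def strip_zeros(values: list[int]) -> tuple[int, str]:
    """Store every value at the width of the widest one, dropping leading zeros."""
    width = max((v.bit_length() for v in values), default=0)
    if width == 0:
        return 0, ""                            # all zeros: empty payload
    return width, "".join(format(v, "b").zfill(width) for v in values)


# Usage: three single readings, with a daily summary set for "flow".
composite = build_composite({"flow": 5, "temp": 20, "level": 3}, {"flow": 120})
width, payload = strip_zeros(composite)        # width 7, 28-bit payload
```

Keeping one array avoids storing a separate method and width descriptor for every parameter, which is where most of the overhead would go when each array holds a single value.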

Claims

I claim:

1. A method for compressing data files comprising the steps of:
acquiring raw data entries and organizing the raw data entries into data arrays;
predicting the size of a compressed file from each of a plurality of compression methods;
selecting the optimum compression method for each data array;
applying the optimum compression method for each data array to that data array; and
creating a computer compression record that includes identification of the manner in which data was stored and the data itself of a significantly smaller size than the original data file.

2. A method according to claim 1 wherein one of the plurality of compression methods comprises identifying that each data entry is identical to every other data entry and storing the entry only one time.

3. A method according to claim 1 wherein one of the plurality of compression methods comprises removing unnecessary zeros from the beginning of a character string that is a binary representation of a data value and storing only the meaningful portion of the character string.

4. A method according to claim 1 wherein one of the plurality of compression methods comprises selecting one data entry as a base value of a data array, subtracting the base value from remaining data entries of the data array, and storing the base value for one data entry and the calculated differences for the remaining data entries.

5. A method according to claim 1 wherein one of the plurality of compression methods comprises selecting and storing a first data entry of a data array as a base, and determining the difference between each remaining data entry and its preceding data entry and storing the differences for the remaining data arrays.

6. A method according to claim 1 wherein one of the plurality of compression methods comprises determining a mathematical equation that will approximately describe the data entries of one data array as a comparison of two or more other data arrays, determining the variation of the actual data entries from the results of the mathematical equation applied to the corresponding data entries of the two or more other data arrays, and storing the mathematical equation and each variation.

7. A method according to claim 1 wherein the plurality of compression methods comprises:
identifying that each data entry is identical to every other data entry and storing the entry only one time;
removing unnecessary zeros from the beginning of a character string that is a binary representation of a data value and storing only the meaningful portion of the character string;
selecting one data entry as a base value of a data array, subtracting the base value from remaining data entries of the data array, and storing the base value for one data entry and the calculated differences for the remaining data entries;
selecting and storing a first data entry of a data array as a base, and determining the difference between each remaining data entry and its preceding data entry and storing the difference; and
determining a mathematical equation that will approximately describe the data entries of one data array as a comparison of two or more other data arrays, determining the variation of the actual data entries from the results of the mathematical equation applied to the corresponding data entries of the two or more other data arrays, and storing the mathematical equation and each variation.

8. A method for compressing data files comprising the steps of:
acquiring raw data and organizing the data into data arrays of similar type, where the raw data represents readings from instruments, including, but not limited to, entries identifying the time of readings and the amount of time between readings;
determining whether the entries representing the amount of time between each reading indicate a predetermined uniform amount of time, and, if so, storing only the beginning reading time;
determining whether data entries indicate that no activity occurred between a time reading and a next time reading, and, if so, removing the record corresponding to the next time reading;
predicting the size of a compressed file that would result from each of a plurality of compression methods applied to each data array;
selecting the optimum compression method for each data array;
storing instructions indicating data compression methods and data field sizes;
storing data compressed by the plurality of compression methods;
storing data entry values that are excluded from the data stored by the plurality of compression records;
compressing and storing summaries of a day's worth of data; and
creating a computer compression record that includes identification of the manner in which data was stored and the data itself, resulting in a compressed computer file of significantly smaller size than the original data file.

9. A method according to claim 8 wherein one of the plurality of compression methods comprises identifying that each data entry is identical to every other data entry and storing the entry one time.

10. A method according to claim 8 wherein one of the plurality of compression methods comprises removing unnecessary zeros from the beginning of a character string that is a binary representation of a data value and storing only the meaningful portion of the character string.

11. A method according to claim 8 wherein one of the plurality of compression methods comprises selecting one data entry as a base value of a data array, subtracting the base value from remaining data entries of the data array, and storing the base value for one data entry and the calculated differences for the remaining data entries.

12. A method according to claim 8 wherein one of the plurality of compression methods comprises selecting and storing a first data entry of a data array as a base, and determining the difference between each remaining data entry and its preceding data entry and storing the difference.

13. A method according to claim 8 wherein one of the plurality of compression methods comprises determining a mathematical equation that will approximately describe the data entries of one data array as a comparison of two or more other data arrays, determining the variation of the actual data entries from the results of the mathematical equation applied to the corresponding data entries of the two or more other data arrays, and storing the mathematical equation and each variation.

14. A method according to claim 8 wherein the plurality of compression methods comprises:
identifying that each data entry is identical to every other data entry and storing the entry only one time;
removing unnecessary zeros from the beginning of a character string that is a binary representation of a data value and storing only the meaningful portion of the character string;
selecting one data entry as a base value of a data array, subtracting the base value from remaining data entries of the data array, and storing the base value for one data entry and the calculated differences for the remaining data entries;
selecting and storing a first data entry of a data array as a base, and determining the difference between each remaining data entry and its preceding data entry and storing the difference; and
determining a mathematical equation that will approximately describe the data entries of one data array as a comparison of two or more other data arrays, determining the variation of the actual data entries from the results of the mathematical equation applied to the corresponding data entries of the two or more other data arrays, and storing the mathematical equation and each variation.

15. A method for compressing data files comprising the steps of:
acquiring raw data entries and organizing the data entries into data arrays of similar type;
determining an equation that closely describes each data entry of one data array as a comparison of a plurality of other corresponding data entries from other data arrays;
determining a variation of each actual data entry from the result of the equation;
storing the equation; and
storing the variations.

16. A method for compressing data files comprising the steps of:
acquiring raw data entries and organizing the data entries into data arrays of similar type;
determining the number of bits necessary to identify each data entry;
storing each data entry in a data field where the data field width equals the number of bits necessary to store the data entry.

17. A method according to claim 1 further comprising the steps of:
acquiring raw data entries and organizing the data entries into data arrays of similar type;
determining the number of bits necessary to identify each data entry;
storing each data entry in a data field where the data field width equals the number of bits necessary to store the data entry.

18. A method according to claim 7 further comprising the steps of:
acquiring raw data entries and organizing the data entries into data arrays of similar type;
determining the number of bits necessary to identify each data entry;
storing each data entry in a data field where the data field width equals the number of bits necessary to store the data entry.

19. A method according to claim 8 further comprising the steps of:
acquiring raw data entries and organizing the data entries into data arrays of similar type;
determining the number of bits necessary to identify each data entry;
storing each data entry in a data field where the data field width equals the number of bits necessary to store the data entry.

20. A method according to claim 14 further comprising the steps of:
acquiring raw data entries and organizing the data entries into data arrays of similar type;
determining the number of bits necessary to identify each data entry;
storing each data entry in a data field where the data field width equals the number of bits necessary to store the data entry.

21. A method for compressing data files comprising the steps of:
acquiring raw data entries and organizing the data entries into data arrays;
applying a compression method to the data entries;
identifying data entries that would allow more efficient compression if they were excluded from the data entries to which the compression method was applied;
storing the excluded data entry separately from the data entries to which the compression method was applied.

22. A method according to claim 1 further comprising the steps of:
identifying data entries that would allow more efficient compression by the plurality of compression methods if the identified data entries were excluded from the data entries to which the plurality of compression methods was applied;
storing the excluded data entry separately from the data entries to which the compression method was applied.

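23. A method according to claim 7 further comprising the steps of:
identifying data entries that would allow more efficient compression by the plurality of compression methods if the identified data entries were excluded from the data entries to which the plurality of compression methods was applied;
storing the excluded data entry separately from the data entries to which the compression method was applied.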

24. A method according to claim 8 further comprising the steps of:
identifying data entries that would allow more efficient compression by the plurality of compression methods if the identified data entries were excluded from the data entries to which the plurality of compression methods was applied;
storing the excluded data entry separately from the data entries to which the compression method was applied.

25. A method according to claim 14 further comprising the steps of:
identifying data entries that would allow more efficient compression by the plurality of compression methods if the identified data entries were excluded from the data entries to which the plurality of compression methods was applied;
storing the excluded data entry separately from the data entries to which the compression method was applied.

26. A method according to claim 1 further comprising the steps of:
reading the computer compression record,
identifying how data entries were stored, and
reversing the compression process to recreate the original data array.

27. A method according to claim 7 further comprising the steps of:
reading the computer compression record,
identifying how data entries were stored, and
reversing the compression process to recreate the original data array.

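28. A method according to claim 8 further comprising the steps of:
reading the computer compression record,
identifying how data entries were stored, and
reversing the compression process to recreate the original data array.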

29. A method according to claim 14 further comprising the steps of:
reading the computer compression record,
identifying how data entries were stored, and
reversing the compression process to recreate the original data array.

30. A method according to claim 17 further comprising the steps of:
reading the computer compression record,
identifying how data entries were stored, and
reversing the compression process to recreate the original data array.

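31. A method according to claim 21 further comprising the steps of:
reading the computer compression record,
identifying how data entries were stored, and
reversing the compression process to recreate the original data array.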

32. A method according to claim 22 further comprising the steps of:
reading the computer compression record,
identifying how data entries were stored, and
reversing the compression process to recreate the original data array.
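The predict-and-select step of claim 1 can be illustrated with the first four compression methods recited in claims 2 through 5 (Mode, Strip Zeros, Delta From Base, and Difference From Last); Ratio Compression (claims 6 and 15) is omitted for brevity. This is a hedged sketch, not the patented encoder. The size accounting is assumed: a 5-bit descriptor for each stored field width, one shared width per array, the smallest entry as the Delta From Base base with a zero offset kept at its position, and sign-plus-magnitude steps for Difference From Last. Arrays are assumed to be non-empty and non-negative, and all function names are hypothetical.

```python
from typing import Callable, Optional

WIDTH_FIELD_BITS = 5  # assumed size of each stored field-width descriptor


def width(values: list[int]) -> int:
    """Bits needed for the largest non-negative value (leading zeros stripped)."""
    return max((v.bit_length() for v in values), default=0)


def mode_size(a: list[int]) -> Optional[int]:
    """Mode (claim 2): every entry is identical, so it is stored once."""
    if len(set(a)) != 1:
        return None                              # method not applicable
    return WIDTH_FIELD_BITS + width(a)


def strip_zeros_size(a: list[int]) -> int:
    """Strip Zeros (claim 3): all entries at the width of the widest one."""
    return WIDTH_FIELD_BITS + len(a) * width(a)


def delta_from_base_size(a: list[int]) -> int:
    """Delta From Base (claim 4): base value plus a non-negative offset per entry."""
    base = min(a)                                # assumed choice of base entry
    offsets = [v - base for v in a]
    return 2 * WIDTH_FIELD_BITS + base.bit_length() + len(a) * width(offsets)


def diff_from_last_size(a: list[int]) -> int:
    """Difference From Last (claim 5): first entry, then a signed step per entry."""
    steps = [cur - prev for prev, cur in zip(a, a[1:])]
    step_bits = 1 + width([abs(s) for s in steps])   # sign bit + magnitude
    return 2 * WIDTH_FIELD_BITS + a[0].bit_length() + len(steps) * step_bits


METHODS: dict[str, Callable[[list[int]], Optional[int]]] = {
    "Mode": mode_size,
    "Strip Zeros": strip_zeros_size,
    "Delta From Base": delta_from_base_size,
    "Difference From Last": diff_from_last_size,
}


def choose_method(array: list[int]) -> tuple[str, int]:
    """Predict each method's size in bits and return the smallest applicable one."""
    sizes = {name: predict(array) for name, predict in METHODS.items()}
    feasible = {name: bits for name, bits in sizes.items() if bits is not None}
    best = min(feasible, key=lambda name: feasible[name])
    return best, feasible[best]
```

Under these assumptions, a slowly varying array such as [1000, 1002, 1001, 1003] is predicted at 28 bits with Delta From Base against 45 bits with Strip Zeros, so Delta From Base is selected, while a steadily rising array such as [10, 20, 30, 40, 50] favors Difference From Last. Because only sizes are predicted, every candidate is cheap to evaluate before the chosen method is actually applied.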