EP1813022A1 - Lossless compression of data, in particular grib1 files - Google Patents
- Publication number
- EP1813022A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- data
- differences
- field
- compression
- grib
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01W—METEOROLOGY
- G01W1/00—Meteorology
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/62—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding by frequency transforming in three dimensions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/63—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/649—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding the transform being applied to non rectangular image segments
Definitions
- the invention is a method for lossless and fast compression of numerical data stored as a file in a computer.
- the format of the data to be compressed is arbitrary.
- the invention in particular is applicable to the compression of data stored in the GRIB1 format for gridded data in binary form. Nevertheless, the following refers to GRIB1-formatted data in order to explain the prior art and the invention.
- Compression means that the data requires less storage capacity in the computer after the invention has been applied.
- the GRIB1 format is an international standard for the exchange and archiving of meteorological data: "FM 92-K Ext. GRIB" (gridded binary), World Meteorological Organization (WMO) Manual on Codes, Publication 306, Volume 1, Part B, 1988 Edition, Supplement No. 3 (VUI 1991).
- WMO World Meteorological Organization
- the GRIB1 format takes a two-dimensional field of floating point numbers and stores it as a block of nonnegative binary numbers together with a reference value and two scaling factors. In addition, there is a header with metadata concerning the type and origin of the data stored. This is called a GRIB file.
- the GRIB1 format allows the bit length and the precision of the stored data to be specified by means of parameters (see the section "The GRIB format"). The invention compresses such files without loss. This is in contrast to lossy compression methods such as MPEG.
- the intended application of the invention is the rapid and lossless compression of all the GRIB data generated daily in a meteorological institute or weather service.
- the advantages of such a compression are:
- the first group, N. Wang, R. Brummer and C. Steffen of NOAA (National Oceanic and Atmospheric Administration), used lossy two-dimensional (2D) wavelet methods. The lost data was stored in an additional "difference" file. They also tried the method of 2D differences followed by arithmetic encoding, but rejected it.
- N. Wang, R. Brummer and C. Steffen of the NOAA National Oceanic and Atmospheric Administration
- each meteorological institute has its own set of tested parameters and is not prepared to change them.
- the Deutscher Wetterdienst prescribes the bit length to be 16 and the decimal scaling to be 0.
- JPEG2000 (ISO/IEC 15444-1, 2000, JPEG2000 Image Coding System) has a lossless mode which yields good compression factors for pictures. It is wavelet-based.
- JPEG-LS (ISO/IEC JTC1/SC29/WG1, FDIS14495-1) is also intended for the compression of pictures. This means that the input is a rectangular array of integers. It is based on a linear prediction method, yields good compression factors and is extremely fast. As a linear method, it is best suited for data of medium to low smoothness.
- the compression factor is the factor by which the uncompressed file is larger than the compressed file; for a useful compression it is larger than 1.
- 2D data is data which is given on a two dimensional grid. In our case, these would be the values of some meteorological variable as measured or calculated on a rectangular 2D grid. Such a set of values is put into one GRIB file.
- by 3D data we mean a collection of 2D data fields which are all values of one and the same meteorological variable on the same rectangular grid but at different heights. An example of this is temperature. Temperature is measured or calculated on a rectangular grid on the surface of the earth as well as at various heights above the surface. The values at each of these heights form a 2D field of values which would be put into a separate GRIB file.
- the NOAA group uses lossy 2D wavelet methods.
- the error due to the lossy method is calculated and compressed with some entropy encoding.
- although the number of values in the error file is as large as in the original data file, the error file can be compressed very well, since the errors are small integers.
- the disadvantages of this approach are
- the invention compresses files of data (in particular GRIB-type files) so that they require less storage capacity on a computer.
- the invention can also decompress the files in which case the decompressed files are identical with the original files (lossless compression).
- the invention can be used in either the manual or the automatic mode to
- the invention yields better compression factors at faster execution speeds than any other lossless compression method for GRIB files known today.
- the first assumption is based on the results of compressing the GRIB files resulting from a typical weather forecast. This is a suite of approximately 370 GRIB files having a total size of about 320 MB (megabytes).
- the invention is a method for compressing 3D data, in particular 3D meteorological data and, more specifically, 3D data in GRIB1 format, wherein the 3D data comprises several layers of 2D data and wherein the method comprises the steps of
- step (b) determining the differences between the differential data of respective adjacent 2D differential data layers, and (c) compressing the 2D differential data layers obtained in step (b) using a 2D data compression such as JPEG-LS or JPEG 2000.
- a method for preprocessing 3D data, in particular 3D meteorological data and, more specifically, 3D data in GRIB1 format, for compression purposes, wherein the 3D data comprises several layers of 2D data and wherein the method comprises the steps of
- the major goal of data compression is to reduce the storage space required for the data. This can be achieved by describing the data sets by means of the differences between the individual data values. As long as the data to be compressed varies only slightly, storing the differences between the values requires less storage space than storing the values themselves. However, in some applications, such as files of meteorological data, the values can vary significantly. This is, for example, true of meteorological specific humidity or pressure data, both of which can vary significantly with height above ground.
- One of the key aspects of the invention is that determining the differences between adjacent 2D data layers of a 3D data set makes it possible to describe the data with only a small number of digits (provided that scaling factors and standardization are taken into account so as to make the 2D data layers comparable to each other). Even if the data values vary significantly in the Z-direction across the whole stack of 2D layers, they vary little from one layer to the next. Accordingly, preprocessing the 3D data by taking differences between adjacent layers as described above yields a data format requiring little space. This data format can furthermore be compressed using well-known 2D methods such as JPEG-LS or JPEG 2000. In contrast, existing 3D wavelet methods would take differences of values from the bottom layer and the top layer. These differences could be numbers with more digits than the original data, nullifying any compression gain.
- the compression consists of four parts, namely
- the numbers will be processed exclusively in integer arithmetic.
- the preprocessing is based on differences of various orders. For 2D fields, differences are used in the standard way. For 3D fields, they must be modified. For 2D fields, preprocessing with differences effectively increases the extrapolation order of JPEG-LS. For example, applying differences once increases the extrapolation order of JPEG-LS from linear to quadratic. Applying 3D differences to 3D data results in both increasing the extrapolation order of JPEG-LS and in exploiting the 3D correlation for better compression factors.
- the preprocessing involving differences can be carried out so fast that its contribution to the total computational cost is negligible.
- test data from the German Weather Service.
- the data contains seven 3D variables as well as many 2D variables. It was determined that 3D differences yield good compression factors for five of them and that the other two should be preprocessed with 2D differences. Having determined this once, one can "hard wire" this into the method.
- a compressed file contains two headers and the compressed data.
- the first header, header 1 consists of
- header2 contains the information needed to reconstruct the original data. It includes the separately stored data and the minimum values of the 2D fields. Header2 also contains other bookkeeping information which will not be discussed. Just as one example, one needs to know where header2 ends and where the compressed data starts.
- the GRIB header is written as header1. Then steps 0), 1) and 2), as described above for 2D data, are carried out depending on the number of differences to be taken.
- the compressed data are written to the compressed file.
- the first value and the minimum value of the field obtained after taking differences are written to header2. Then the compressed data is written to the file.
- the first row, the first column and the minimum value of the field obtained after taking differences twice are written to header2. Then the compressed data is written to the file.
- the first step of the decompression is just applying JPEG-LS decompression to all of the 2D fields involved. Then the minimum values of all of the 2D fields are added back. These minimum values are in header2.
- 2D differences are formed by first applying 1D differences to the rows, then 1D differences to the columns. 2D decompression is then 1D decompression applied to the columns followed by 1D decompression applied to the rows. The first values required for the decompression are found in header2.
- the method is at least as fast as any lossless compression method for GRIB files known today.
- the files compressed on one computer can be decompressed without loss on any other computer: the method is computer independent.
- GRIB files which were generated in a weather forecast for Northern and Central Europe (local model Europe, LME) by the German Weather Service.
- This suite consists of 368 GRIB files with a total size of 321 MB. It includes five variables which were processed as 3D fields (with 40 layers each). Two other variables could also have been compressed as 3D fields, but it turned out to be more favorable to compress each of the layers as 2D fields.
- the GRIB encoding uses 16 bits for each of the values.
- the results presented were calculated on a 2.3 GHz Linux PC.
- the preprocessing depends on whether 2D differences or 3D differences are taken.
- GRIB encoding also involves three other values which will not be used here and one which is sometimes used. They are the binary and decimal scaling factors and the reference value. The bit length B is sometimes used.
- $K_{i,j}$: the field $K_{i,j}$.
- $K_{i,j}$: a field of nonnegative values; this field and its bit length are determined and passed to JPEG-LS for compression.
- the compression of 3D fields requires detailed knowledge of the GRIB format.
- the GRIB format is applied to 2D fields $X_{i,j}$ of floating point numbers.
- the process includes scaling the numbers according to the needs of the user. First, each number is multiplied by a power of 10, namely $10^D$. This decimal scaling factor $D$ is a positive or negative integer. Then the reference value $R$ is determined. It is the smallest number in the field (after the decimal scaling) and is stored as a single-precision floating point number. $R$ is subtracted from all the numbers in the field to obtain a field of nonnegative floating point numbers. This field is multiplied by $2^{-E}$. This is the binary scaling, and the binary scaling factor $E$ is a positive or negative integer. Finally, the integer part of this field is taken to obtain the integers $F_{i,j}$, which are the binary numbers stored in the GRIB file. The bit length of this field is determined and stored in the header.
- $D$: the decimal scaling factor; values are multiplied by $10^D$
- $R$: the reference value
- the GRIB format allows one to format the values of a 3D field belonging to different layers with different precisions. This is what causes difficulties for 3D wavelet methods, since they depend on the fact that all values occurring have the same precision. As a result, the compression factors are worse.
- the compressed file consists of two headers and the compressed data.
- the first header, header 1 is
- Header2 contains the auxiliary information necessary to decompress the data. It contains the data that was stored separately from the compressed fields and the field minima. The bit lengths are not stored. Header2 also contains additional bookkeeping details which will not be explained here. For example, one must know where Header2 ends and where the data compressed by JPEG-LS starts.
- the difference between manual and automatic determination of the number of differences to be taken is only that in the automatic mode, differences are taken until a termination criterion is met, while in the manual mode the number of differences to be taken is fixed for each variable once and for all. Therefore only the manual mode is described.
- in the method for 2D data, the GRIB header is written as header1. Then steps 0), 1) and 2), as described above for 2D data, are carried out depending on the number of differences to be taken.
- the first value $F_{1,1}$ and the minimum value of the field obtained after taking differences are written to header2.
- the compressed data is written to the file.
- the first row $K_{1,j}$, the first column $K_{i,1}$ and the minimum value of the field obtained after taking differences twice are written to header2.
- the compressed data is written to the file.
- in the method for 3D data, all GRIB headers are written to header1.
- This example comes from data of the German Weather Service.
- the quantity measured is specific humidity on a 325 × 325 grid with 35 layers.
- the values stored in the GRIB file are
- decimal scaling factors are both 0.
- E 18 -22 .
- the binary form of 25 is 11001, which is 5 bits long instead of the original 16 bits.
- This example is taken from a GRIB file of NOAA (National Oceanic and Atmospheric Administration). The quantity measured is also specific humidity, but in kg/kg (the units in Example 1 are kg/m³). The grid size is 147 × 110.
- the bit length for layer 28 is 9 while the bit length for layer 29 is 6.
- the binary form of -11 is -1011, which is 4 bits (5 bits including the sign).
- the decimal scaling factors are both 0 and will not be needed.
- the decimal points are included just to make it easier to count the number of bits which is 18.
- the original number of bits was 16. Thus, instead of getting smaller numbers, we obtain larger ones.
- Starting point: a GRIB file which contains 2D grids of floating point numbers formatted in a particular way.
- the main content is a 2D field of integer numbers. In order to extract the floating point numbers they represent, two scaling factors, decimal and binary, as well as a reference value are used.
- there is a header which describes the data contained in the file.
- 3D compression involves a new method for using 3D correlation of GRIB data for compression. It is used when one has a sequence of GRIB files containing the values of some quantity on grids at different heights above the surface of the earth. In this case all of these files are compressed together. Then the 2D fields of integers as well as the scaling factors are used.
- the main components of the method are a) 2D or 3D differences as preprocessing. These are extremely fast and invertible methods, i.e., the original data can be recovered from the data obtained by differencing. b) A 2D lossless compression method for integer fields which uses integer arithmetic. We have chosen JPEG-LS, but any other such method would also work. The choice is a question of how fast the method is and how well it compresses.
- in the 3D case, the numbers contained in different integer fields may not be comparable. For this reason, scaling factors are used in forming 3D differences to make them comparable. a) First, 2D differences are applied to each of the 2D fields a certain number of times. b) Then 1D differences are applied in the third dimension (height), that is, between neighboring 2D fields, using a special formula incorporating the scaling factors. c) Then each of the resulting 2D fields is compressed separately by JPEG-LS.
- a 2D field of floating point numbers is called scaled if it is represented as a 2D field of integers all having the same scaling factors. More exactly, it is a field of numbers $X_{i,j}$ of the form
- $N_1$ and $N_2$ are nonnegative integers
- $D_1$ and $D_2$ are integers
- $F_{i,j}$ is a field of integers
- R is a real number, either a floating point number or an integer.
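The compression factor defined above is simply the ratio of the two file sizes. A minimal Python helper, assuming both sizes are given in bytes (the function name is hypothetical):

```python
def compression_factor(uncompressed_size: int, compressed_size: int) -> float:
    """How many times larger the uncompressed file is than the compressed one.

    A result greater than 1 means the compression actually saved space.
    """
    if compressed_size <= 0:
        raise ValueError("compressed size must be positive")
    return uncompressed_size / compressed_size


print(compression_factor(1000, 250))  # 4.0
```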
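The NOAA approach described above pairs a lossy coder with a file of integer errors so that the original can be rebuilt exactly. The sketch below only illustrates that bookkeeping: a coarse integer quantizer stands in for the 2D wavelet coder (an assumption for illustration, not NOAA's actual method), and the small residuals are what would then be entropy-coded. All function names are hypothetical.

```python
def lossy_approximation(values: list[int], step: int) -> list[int]:
    """Stand-in lossy coder: round each value to the nearest multiple of `step`."""
    return [step * ((v + step // 2) // step) for v in values]


def split_lossy_and_error(values: list[int], step: int) -> tuple[list[int], list[int]]:
    """Return the lossy approximation and the integer error file."""
    approx = lossy_approximation(values, step)
    errors = [v - a for v, a in zip(values, approx)]
    return approx, errors


def reconstruct(approx: list[int], errors: list[int]) -> list[int]:
    """Adding the errors back recovers the original values exactly."""
    return [a + e for a, e in zip(approx, errors)]


values = [1000, 1013, 1027, 1040]
approx, errors = split_lossy_and_error(values, 16)
print(errors)  # [-8, 5, 3, 0]
```

The errors stay within half a quantization step, which is why the error file compresses well even though it has as many entries as the original.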
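The central idea above, replacing each 2D layer by its difference from the layer beneath it, is invertible with a running sum. A minimal sketch on plain nested lists, assuming all layers already share one integer scale (the scaling issue is ignored here); the function names are hypothetical.

```python
Layer = list[list[int]]


def layer_differences(layers: list[Layer]) -> list[Layer]:
    """Keep the bottom layer; replace every other layer by (layer - layer below)."""
    out = [[row[:] for row in layers[0]]]
    for below, above in zip(layers, layers[1:]):
        out.append([[a - b for a, b in zip(ra, rb)] for ra, rb in zip(above, below)])
    return out


def undo_layer_differences(diffs: list[Layer]) -> list[Layer]:
    """Rebuild the layers bottom-up by cumulative summation."""
    layers = [[row[:] for row in diffs[0]]]
    for d in diffs[1:]:
        prev = layers[-1]
        layers.append([[x + p for x, p in zip(rd, rp)] for rd, rp in zip(d, prev)])
    return layers


# Three layers whose values drop by about 20 per layer.
stack = [[[500, 502], [501, 503]],
         [[480, 483], [481, 484]],
         [[461, 463], [462, 465]]]
print(layer_differences(stack)[1:])  # [[[-20, -19], [-20, -19]], [[-19, -20], [-19, -19]]]
```

The upper layers shrink to small numbers even though the stack as a whole spans a much wider range, which is the effect the text relies on.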
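The claim that one round of differences raises the extrapolation order can be checked on a 1D analogue. The sketch below uses the linear extrapolator 2·x[n-1] - x[n-2] as a simplified stand-in for a linear predictor (JPEG-LS itself uses a 2D median edge detector, not this formula). On quadratic data its residuals are constant but nonzero; after differencing once, they vanish, so the linear predictor now behaves like a quadratic one on the original data.

```python
def differences(x: list[int]) -> list[int]:
    """First-order differences; the first value is kept so the step is invertible."""
    return [x[0]] + [b - a for a, b in zip(x, x[1:])]


def linear_residuals(x: list[int]) -> list[int]:
    """Residuals of predicting x[n] by linear extrapolation 2*x[n-1] - x[n-2]."""
    return [x[n] - (2 * x[n - 1] - x[n - 2]) for n in range(2, len(x))]


quadratic = [3 * n * n + 5 * n + 7 for n in range(8)]
print(linear_residuals(quadratic))                    # [6, 6, 6, 6, 6, 6]
# Skip the kept first value: it is stored separately, not predicted.
print(linear_residuals(differences(quadratic)[1:]))   # [0, 0, 0, 0, 0]
```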
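The compressed-file layout above (header1, header2, compressed data) needs bookkeeping so that a reader knows where each part ends. The text does not specify a byte layout; the sketch below assumes a hypothetical one in which each part is preceded by a 4-byte big-endian length or count, and header2 holds signed 32-bit integers (first values and minima).

```python
import struct


def write_container(header1: bytes, header2_values: list[int], payload: bytes) -> bytes:
    """Serialize header1, header2 and the compressed payload (hypothetical layout)."""
    return b"".join([
        struct.pack(">I", len(header1)), header1,
        struct.pack(">I", len(header2_values)),
        struct.pack(f">{len(header2_values)}i", *header2_values),
        struct.pack(">I", len(payload)), payload,
    ])


def read_container(blob: bytes) -> tuple[bytes, list[int], bytes]:
    """Inverse of write_container: recover the three parts in order."""
    pos = 0
    (n1,) = struct.unpack_from(">I", blob, pos)
    pos += 4
    header1 = blob[pos:pos + n1]
    pos += n1
    (count,) = struct.unpack_from(">I", blob, pos)
    pos += 4
    header2_values = list(struct.unpack_from(f">{count}i", blob, pos))
    pos += 4 * count
    (n3,) = struct.unpack_from(">I", blob, pos)
    pos += 4
    return header1, header2_values, blob[pos:pos + n3]


blob = write_container(b"GRIB", [1000, -20], b"\x01\x02\x03")
print(read_container(blob))
```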
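The 2D differencing and its inverse described above (1D differences along the rows, then down the columns; undone in the reverse order) can be sketched on plain nested lists. The first row and first column of the result are the values the text writes to header2; this sketch simply leaves them in place. Subtracting the field minimum then makes the field nonnegative for the 2D coder. All names are hypothetical.

```python
Field = list[list[int]]


def diff_rows(f: Field) -> Field:
    """1D differences along each row, keeping each row's first entry."""
    return [[row[0]] + [b - a for a, b in zip(row, row[1:])] for row in f]


def diff_cols(f: Field) -> Field:
    """1D differences down each column, keeping the first row."""
    return [f[0][:]] + [[b - a for a, b in zip(prev, cur)] for prev, cur in zip(f, f[1:])]


def undiff_rows(f: Field) -> Field:
    """Running sum along each row."""
    out = []
    for row in f:
        acc, rebuilt = 0, []
        for v in row:
            acc += v
            rebuilt.append(acc)
        out.append(rebuilt)
    return out


def undiff_cols(f: Field) -> Field:
    """Running sum down each column."""
    out = [f[0][:]]
    for row in f[1:]:
        out.append([p + v for p, v in zip(out[-1], row)])
    return out


def diff2d(f: Field) -> Field:
    return diff_cols(diff_rows(f))      # rows first, then columns


def undiff2d(g: Field) -> Field:
    return undiff_rows(undiff_cols(g))  # columns first, then rows


def to_nonnegative(g: Field) -> tuple[int, Field]:
    """Subtract the field minimum (the minimum would go to header2)."""
    m = min(min(row) for row in g)
    return m, [[v - m for v in row] for row in g]


field = [[10, 12, 15], [11, 12, 16], [13, 16, 20]]
print(diff2d(field))  # [[10, 2, 3], [1, -1, 1], [2, 2, 0]]
```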
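In the automatic mode, differences are taken until a termination criterion is met, but the text does not state the criterion. The sketch below assumes a hypothetical one: keep taking (1D, for brevity) differences while the number of bits needed for the value range strictly shrinks, up to a cap. In the manual mode, the returned order would simply be fixed per variable instead.

```python
def difference_once(x: list[int]) -> list[int]:
    """One round of 1D differences (the dropped first value would go to header2)."""
    return [b - a for a, b in zip(x, x[1:])]


def range_bits(x: list[int]) -> int:
    """Bits needed to store x after subtracting its minimum."""
    return (max(x) - min(x)).bit_length()


def choose_order(x: list[int], max_order: int = 4) -> int:
    """Hypothetical criterion: difference again only while the range bit length drops."""
    order, current = 0, x
    while order < max_order and len(current) > 2:
        nxt = difference_once(current)
        if range_bits(nxt) >= range_bits(current):
            break
        order, current = order + 1, nxt
    return order


smooth = [3 * n * n + 5 * n + 7 for n in range(8)]
print(choose_order(smooth))              # 2
print(choose_order([5, -3, 7, -6, 4]))   # 0
```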
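Before a field is handed to JPEG-LS, it must be nonnegative and its bit length must be known. A minimal sketch, assuming the field minimum is subtracted (the minimum being what header2 stores) and that the bit length is that of the largest shifted value; the function name and the one-bit floor for constant fields are hypothetical choices.

```python
def prepare_for_2d_coder(field: list[list[int]]) -> tuple[int, list[list[int]], int]:
    """Return (minimum, shifted nonnegative field, bit length) for a 2D integer field."""
    minimum = min(min(row) for row in field)
    shifted = [[v - minimum for v in row] for row in field]
    largest = max(max(row) for row in shifted)
    bits = max(1, largest.bit_length())  # use at least one bit, even for a constant field
    return minimum, shifted, bits


print(prepare_for_2d_coder([[-3, 4], [10, 22]]))  # (-3, [[0, 7], [13, 25]], 5)
```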
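The GRIB1 packing steps above (decimal scaling by $10^D$, subtracting the reference value $R$, binary scaling by $2^{-E}$, taking the integer part) and their inverse can be sketched as below. This follows the text's description only; real GRIB1 encoders store $R$ in IBM single-precision format and may round rather than truncate, which this sketch ignores. The function names are hypothetical.

```python
def grib1_pack(values: list[float], D: int, E: int) -> tuple[float, list[int], int]:
    """Return (R, integer field F, bit length) following the description above."""
    scaled = [x * 10.0 ** D for x in values]      # decimal scaling
    R = min(scaled)                               # reference value
    F = [int((y - R) * 2.0 ** (-E)) for y in scaled]  # binary scaling, integer part
    bits = max(F).bit_length()
    return R, F, bits


def grib1_unpack(R: float, F: list[int], D: int, E: int) -> list[float]:
    """Approximate inverse: X = (R + F * 2**E) / 10**D."""
    return [(R + f * 2.0 ** E) / 10.0 ** D for f in F]


R, F, bits = grib1_pack([0.5, 1.25, 2.0, 3.75], D=0, E=-2)
print(R, F, bits)                    # 0.5 [0, 3, 6, 13] 4
print(grib1_unpack(R, F, 0, -2))     # [0.5, 1.25, 2.0, 3.75]
```

The float-to-integer step is where precision is chosen (and, in general, lost); the lossless compression described in this document operates on the integers $F_{i,j}$ afterwards.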
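The bit counts quoted in the examples (25 needs 5 bits; -11 needs 4 bits of magnitude, 5 including a sign bit) can be checked with Python's built-in `int.bit_length`, which counts the bits of the magnitude. The helper names are hypothetical.

```python
def magnitude_bits(n: int) -> int:
    """Bits of |n| (bit_length already ignores the sign; abs() makes that explicit)."""
    return abs(n).bit_length()


def signed_bits(n: int) -> int:
    """Magnitude bits plus one sign bit (sign-magnitude convention)."""
    return magnitude_bits(n) + 1


print(bin(25), magnitude_bits(25))                        # 0b11001 5
print(bin(-11), magnitude_bits(-11), signed_bits(-11))    # -0b1011 4 5
```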
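The two main components listed above, invertible differencing followed by a lossless integer coder, can be chained end to end. JPEG-LS is not available in the Python standard library, so this sketch substitutes `zlib` on 16-bit big-endian samples as a stand-in coder (an assumption made only to keep the example runnable; it is not the coder the text uses). To stay short, it applies 1D differences along each row only. The function names are hypothetical.

```python
import struct
import zlib


def encode_rows(field: list[list[int]]) -> tuple[int, int, int, bytes]:
    """Row differences, shift by the minimum, pack as uint16, then compress."""
    diffs = [[row[0]] + [b - a for a, b in zip(row, row[1:])] for row in field]
    minimum = min(min(r) for r in diffs)
    flat = [v - minimum for r in diffs for v in r]
    if max(flat) > 0xFFFF:
        raise ValueError("shifted values do not fit into 16 bits")
    payload = zlib.compress(struct.pack(f">{len(flat)}H", *flat))
    return minimum, len(field), len(field[0]), payload


def decode_rows(minimum: int, n_rows: int, n_cols: int, payload: bytes) -> list[list[int]]:
    """Decompress, add the minimum back and undo the row differences."""
    flat = struct.unpack(f">{n_rows * n_cols}H", zlib.decompress(payload))
    field = []
    for i in range(n_rows):
        acc, row = 0, []
        for v in flat[i * n_cols:(i + 1) * n_cols]:
            acc += v + minimum
            row.append(acc)
        field.append(row)
    return field


field = [[1000 + 3 * i + 2 * j for j in range(50)] for i in range(40)]
minimum, rows, cols, payload = encode_rows(field)
assert decode_rows(minimum, rows, cols, payload) == field  # lossless round trip
```

Swapping in a real 2D integer coder would only change the two lines that call `zlib`; the invertible preprocessing stays the same.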
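Step b) of the 3D case uses a special formula with the scaling factors that the text does not reproduce. The sketch below is a hypothetical substitute, assuming all layers have decimal scaling 0 (as in the examples) and differ only in the binary scaling factor $E$. Each integer field is multiplied by $2^{E_k - E_{min}}$ so that one integer unit means the same physical amount in every layer; then adjacent layers are subtracted. Differing reference values $R$ only add a constant offset per layer, which the later subtraction of the field minimum absorbs. The rescaling is exactly invertible because the $E$ values remain available in the GRIB headers (header1).

```python
Field = list[list[int]]


def rescale(field: Field, E: int, E_min: int) -> Field:
    """Express a nonnegative integer field in units of 2**E_min (exact since E >= E_min)."""
    shift = E - E_min
    return [[v << shift for v in row] for row in field]


def scaled_layer_differences(fields: list[Field], exponents: list[int]) -> list[Field]:
    """Bring all layers to a common binary scale, then difference adjacent layers."""
    E_min = min(exponents)
    scaled = [rescale(f, E, E_min) for f, E in zip(fields, exponents)]
    out = [scaled[0]]
    for below, above in zip(scaled, scaled[1:]):
        out.append([[a - b for a, b in zip(ra, rb)] for ra, rb in zip(above, below)])
    return out


def undo_scaled_layer_differences(diffs: list[Field], exponents: list[int]) -> list[Field]:
    """Cumulative sums rebuild the rescaled layers; right shifts undo the rescaling."""
    E_min = min(exponents)
    scaled = [diffs[0]]
    for d in diffs[1:]:
        scaled.append([[x + p for x, p in zip(rd, rp)] for rd, rp in zip(d, scaled[-1])])
    return [[[v >> (E - E_min) for v in row] for row in s] for s, E in zip(scaled, exponents)]


# Same physical values stored at two precisions: E = -3 and E = -1.
fields = [[[8, 16], [24, 32]], [[2, 4], [6, 8]]]
exponents = [-3, -1]
print(scaled_layer_differences(fields, exponents)[1])  # [[0, 0], [0, 0]]
```

Without the rescaling, the raw difference of these two layers would be nonzero even though they describe the same physical values.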
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Environmental & Geological Engineering (AREA)
- Life Sciences & Earth Sciences (AREA)
- Atmospheric Sciences (AREA)
- Biodiversity & Conservation Biology (AREA)
- Ecology (AREA)
- Environmental Sciences (AREA)
- Theoretical Computer Science (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Compression Of Band Width Or Redundancy In Fax (AREA)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2004/013109 WO2006053582A1 (en) | 2004-11-18 | 2004-11-18 | Lossless compression of data, in particular grib1 files |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1813022A1 (en) | 2007-08-01 |
EP1813022B1 (en) | 2011-07-27 |
Family
ID=34959407
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP04822654A, Expired - Fee Related, EP1813022B1 (en) | Lossless compression of data, in particular grib1 files | 2004-11-18 | 2004-11-18 |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP1813022B1 (en) |
WO (1) | WO2006053582A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116683915A (en) * | 2023-06-14 | 2023-09-01 | 上海海洋中心气象台 | Meteorological data compression method, system and medium |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
IT1397591B1 (en) * | 2009-12-21 | 2013-01-16 | Sisvel Technology Srl | METHOD FOR THE GENERATION, TRANSMISSION AND RECEPTION OF STEREOSCOPIC IMAGES AND RELATIVE DEVICES. |
-
2004
- 2004-11-18 EP EP04822654A patent/EP1813022B1/en not_active Expired - Fee Related
- 2004-11-18 WO PCT/EP2004/013109 patent/WO2006053582A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
See references of WO2006053582A1 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116683915A (en) * | 2023-06-14 | 2023-09-01 | 上海海洋中心气象台 | Meteorological data compression method, system and medium |
CN116683915B (en) * | 2023-06-14 | 2024-02-13 | 上海海洋中心气象台 | Meteorological data compression method, system and medium |
Also Published As
Publication number | Publication date |
---|---|
EP1813022B1 (en) | 2011-07-27 |
WO2006053582A1 (en) | 2006-05-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7003168B1 (en) | Image compression and decompression based on an integer wavelet transform using a lifting scheme and a correction method | |
WO2019111006A1 (en) | Methods and apparatuses for hierarchically encoding and decoding a bytestream | |
US11074723B2 (en) | Lossless compression of fragmented image data | |
US6456744B1 (en) | Method and apparatus for video compression using sequential frame cellular automata transforms | |
EP2512137A2 (en) | A method and system for data compression | |
WO1993014600A1 (en) | Method and apparatus for compression and decompression of color image data | |
KR101687865B1 (en) | Encoder, decoder and method | |
EP0739570A1 (en) | Boundary-spline-wavelet compression for video images | |
US20160021396A1 (en) | Systems and methods for digital media compression and recompression | |
CN105120293A (en) | Image cooperative decoding method and apparatus based on CPU and GPU | |
EP1796397A1 (en) | Stepwise reversible video encoding method, stepwise reversible video decoding method, stepwise reversible video encoding device, stepwise reversible video decoding device, program therefore, and recording medium for the program | |
US20160309190A1 (en) | Method and apparatus to perform correlation-based entropy removal from quantized still images or quantized time-varying video sequences in transform | |
Kadhim | Image compression using discrete cosine transform method | |
Dubey et al. | 3D medical image compression using Huffman encoding technique | |
CN1315023A (en) | Circuit and method for performing bidimentional transform during processing of an image | |
US6330283B1 (en) | Method and apparatus for video compression using multi-state dynamical predictive systems | |
EP1813022A1 (en) | Lossless compression of data, in particular grib1 files | |
Martel | Compressed Matrix Computations | |
De Silva et al. | Exploring the Implementation of JPEG Compression on FPGA | |
US6400766B1 (en) | Method and apparatus for digital video compression using three-dimensional cellular automata transforms | |
Jancy et al. | Various lossless compression techniques surveyed | |
Usevitch | JPEG2000 compatible lossless coding of floating-point data | |
WO2000055757A1 (en) | A fast multiplierless transform | |
CA2253145C (en) | System for interactive visualization and analysis of imaging spectrometry datasets over a wide-area network | |
Bracamonte et al. | A multiplierless implementation scheme for the JPEG image coding algorithm |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20070329 |
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): DE FR GB |
RIN1 | Information on inventor provided before grant (corrected) | Inventor name: IZA-TERAN, RODRIGO; Inventor name: LORENTZ, RUDOLPH |
17Q | First examination report despatched | Effective date: 20071019 |
DAX | Request for extension of the european patent (deleted) | |
RBV | Designated contracting states (corrected) | Designated state(s): DE FR GB |
GRAC | Information related to communication of intention to grant a patent modified | Free format text: ORIGINAL CODE: EPIDOSCIGR1 |
GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): DE FR GB |
REG | Reference to a national code | Ref country code: GB; Ref legal event code: FG4D |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R096; Ref document number: 602004033722; Country of ref document: DE; Effective date: 20110922 |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
26N | No opposition filed | Effective date: 20120502 |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R097; Ref document number: 602004033722; Country of ref document: DE; Effective date: 20120502 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: GB; Payment date: 20141120; Year of fee payment: 11. Ref country code: DE; Payment date: 20141120; Year of fee payment: 11 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FR; Payment date: 20141118; Year of fee payment: 11 |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R119; Ref document number: 602004033722; Country of ref document: DE |
GBPC | Gb: european patent ceased through non-payment of renewal fee | Effective date: 20151118 |
REG | Reference to a national code | Ref country code: FR; Ref legal event code: ST; Effective date: 20160729 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: DE; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20160601. Ref country code: GB; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20151118 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: FR; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20151130 |