CN109478893B - Data compression encoding method, apparatus and storage medium - Google Patents


Info

Publication number
CN109478893B
CN109478893B (application CN201780045701.9A)
Authority
CN
China
Prior art keywords
data
column
record
encoding
encoded
Prior art date
Legal status
Active
Application number
CN201780045701.9A
Other languages
Chinese (zh)
Other versions
CN109478893A (en)
Inventor
铃木隆之
柴田薰
Current Assignee
Expressway Co ltd
Denso Corp
Original Assignee
Expressway Co ltd
Denso Corp
Priority date
Filing date
Publication date
Application filed by Expressway Co ltd, Denso Corp
Publication of CN109478893A
Application granted
Publication of CN109478893B

Classifications

    • H — ELECTRICITY
    • H03 — ELECTRONIC CIRCUITRY
    • H03M — CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 — Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 — Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3068 — Precoding preceding compression, e.g. Burrows-Wheeler transformation
    • H03M7/3077 — Sorting
    • H03M7/3084 — Compression using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/40 — Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/4006 — Conversion to or from arithmetic code
    • H03M7/60 — General implementation details not specific to a particular type of compression
    • H03M7/6035 — Handling of unknown probabilities
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 — Adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 — Incoming video signal characteristics or properties
    • H04N19/137 — Motion inside a coding unit, e.g. average field, frame or block difference

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Provided are a compression encoding method, device, and program suitable for continuously encoding fixed-length data. The compression encoding method comprises the following steps: dividing a record into columns of a predetermined bit width, the record being composed of a fixed-length bit string including one or more fields, in which the same kind of data is described at the same field position in every record; and determining, across a plurality of records, the occurrence probability of the bit values at the same position in each column, and encoding the plurality of records by an entropy encoding method according to those occurrence probabilities.

Description

Data compression encoding method, apparatus and storage medium
Technical Field
The present embodiment described below relates to a data compression encoding method, an apparatus thereof, and a program thereof.
Background
In recent years, sensor networks have been conceived in which a plurality of sensor-equipped wireless terminals are dispersed in a space and cooperate to acquire environmental and physical conditions. In addition, with the advance of electronic control in automobiles, various in-vehicle sensor networks have been put into practical use.
Fig. 1 is a schematic diagram of these sensor networks. For example, in the sensor network 1, data detected by the sensor 2a and the like is transmitted to the processing device 4 via the sensor node 5 and the gateway 3. When the data acquired by the sensors 2a, 2b, and 2c is transmitted to the processing device 4, the transmitted data tends to have a fixed size. In the example of fig. 1, the data compression device is located at the sensor node.
A data sequence in which data of predetermined size, such as the environmental states detected by each sensor, is arranged in a specific order is referred to as a record. In this case, one record is fixed-length data composed of a bit string of fixed length. In a sensor network, data such as the environmental state detected by each sensor at each point in time is continuously output as records. Here, the sensors include a temperature sensor, a humidity sensor, a pressure sensor, a rotational speed sensor, a wind speed sensor, a flow rate sensor, an acceleration sensor, a speed sensor, a position sensor, a sensor for detecting on/off information of a switch, and the like.
Fig. 2 is a diagram illustrating an example of the fixed-length data.
In the example shown in fig. 2, the detection information of the sensor 2a is a rotation-pulse count, and the detection information of the sensors 2b and 2c is the on/off information of the corresponding switches.
The bit length of the fixed-length data transmitted and received through the sensor network 1 is set to a fixed value. The fixed-length bit data may be internally divided into fields, in which the type of data to be described is specified for each predetermined number of bits. For example, fig. 2 (a) shows the fixed-length data represented in decimal. In the example of fig. 2 (a), a 26-bit time is described in the initial field of the fixed-length data, and a 14-bit rotation-pulse count, the output of the rotation-pulse sensor 2a, is described in the next field. The following field describes 1-bit data indicating whether the detection information of the sensor 2b is on or off, and the field after that describes 1-bit data indicating whether the detection information of the sensor 2c is on or off. The overall data bit length is a fixed value. The examples of figs. 1 and 2 show three sensors provided on one sensor node 5 of the sensor network 1. However, the types and number of sensors provided on one sensor node are not limited to this; any number (one or more) of sensors of arbitrary types may be provided.
In fig. 2 (b), the fixed-length data represented in decimal in fig. 2 (a) is represented in binary. In this case, the 26-bit time, the 14-bit rotation-pulse count, the 1-bit on/off state of sensor 1, and the 1-bit on/off state of sensor 2 are described from the start. Fig. 2 (c) shows the binary fixed-length data of fig. 2 (b) as a consecutive bit string. Since the meaning of each bit position from the start is determined in advance, a device that receives the fixed-length data can recognize the data described in it by reading the bits sequentially from the start.
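As an illustration (not code from the patent), the field layout of fig. 2 can be sketched in Python; the function names and sample values are hypothetical, but the field widths (26-bit time, 14-bit pulse count, two 1-bit switch states, 42 bits total) follow the example above:

```python
def pack_record(time, pulses, sw1, sw2):
    # Pack the four fields into one fixed-length 42-bit string,
    # in the order described for fig. 2 (b).
    assert 0 <= time < 2**26 and 0 <= pulses < 2**14
    assert sw1 in (0, 1) and sw2 in (0, 1)
    return f"{time:026b}{pulses:014b}{sw1:01b}{sw2:01b}"

def unpack_record(bits):
    # The receiver knows the field layout in advance, so it can
    # recover the fields by reading bits sequentially from the start.
    assert len(bits) == 42
    return (int(bits[:26], 2), int(bits[26:40], 2),
            int(bits[40], 2), int(bits[41], 2))
```

Because every record has the same layout, the same slicing recovers the same kind of data from every record.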
In the examples of figs. 1 and 2, the detection information of the sensors is a rotation-pulse count and switch on/off information, but the sensors of the present embodiment are not limited to these; they may, for example, detect various quantities such as temperature, humidity, position, speed, acceleration, wind speed, flow rate, and pressure.
Furthermore, the transmitted and received data need not be limited to sensor detection information. The present invention can be applied to any data transmitted sequentially from a transmission source.
When such fixed-length records are transmitted continuously, the following method is sometimes used: a certain amount of data is accumulated, its size is reduced by an existing compression technique, it is transmitted, and it is decompressed at the receiving side.
In this case, unless the accumulated amount is fairly large, the compression efficiency is not high; if compression efficiency is prioritized, a delay arises from the accumulation time. Therefore, when timeliness is required, transmission may be performed without compression. Without compression, however, the amount of data transferred is large compared to the compressed case.
As conventional techniques for data compression, there are the techniques disclosed in patent documents 1 to 8 and non-patent document 1, but none of them describes a compression encoding method suitable for encoding fixed-length data.
Prior art literature
Patent literature
Patent document 1: japanese patent laid-open No. 2007-214998
Patent document 2: U.S. patent publication No. 2011/0200104
Patent document 3: japanese patent application laid-open No. 2014-502827
Patent document 4: japanese patent laid-open No. 2010-26884
Patent document 5: japanese patent laid-open No. 2007-214813
Patent document 6: international publication No. 2013/175909
Patent document 7: japanese patent laid-open No. 2007-221280
Patent document 8: japanese patent laid-open No. 2011-481514
Non-patent literature
Non-patent document 1: Lossless Compression Handbook, Academic Press, 2002/8/15, ISBN-10: 01620811, ISBN-13: 978-0126208610
Disclosure of Invention
Problems to be solved by the invention
Accordingly, an object of an embodiment according to one aspect of the present invention is to provide a data compression encoding method, an apparatus thereof, and a program thereof, suitable for encoding fixed-length data and decoding the result.
Means for solving the problems
The data compression encoding of one aspect of the present invention comprises the following steps: dividing a record into columns of a predetermined bit width without regard to field boundaries, the record being composed of a fixed-length bit string including one or more fields in which the same kind of data is described at the same field position in every record; and determining, across a plurality of records, the occurrence probability of the bit values at the same position in each column, and entropy-encoding the plurality of records according to those occurrence probabilities.
In the data compression encoding according to another aspect of the present invention, the sensor data input from one or more sensors are combined into a record composed of a fixed-length bit string, and the record is compression-encoded and output. The following steps are repeated for a predetermined number of records: the record is divided into columns of a predetermined bit width; for each column, the occurrence probability of the bit values at the same position is obtained from the plurality of records input up to that time; the columns constituting the record are encoded by entropy encoding based on those occurrence probabilities; and the encoded columns are combined and output.
That is, a predetermined number of sensor data items, input sequentially from one or more sensors, are combined into fixed-length bit strings, which are regarded as virtual table data, and the virtual table data is compressed in the column direction.
Entropy encoding is an encoding scheme that compresses data by assigning short code lengths to symbols with a high occurrence probability and long code lengths to symbols with a low occurrence probability. Huffman coding, arithmetic coding, and the like are known as typical entropy coding schemes.
In Huffman coding there are variants such as adaptive Huffman coding and canonical Huffman codes, and in arithmetic coding various schemes are known, such as adaptive arithmetic coding, the Q-coder, and the range coder.
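To make the principle concrete, here is a minimal Python sketch (not from the patent; function name hypothetical) of how Huffman coding assigns code lengths from a symbol-frequency table: the two least frequent entries are merged repeatedly, so frequent symbols end up near the root with short codes:

```python
import heapq

def huffman_code_lengths(freq):
    # Build a Huffman tree from a {symbol: frequency} table and return
    # {symbol: code length}. Each heap item carries the depths of the
    # symbols in its subtree; merging two subtrees deepens all of them
    # by one. The integer tiebreaker keeps tuple comparison well-defined.
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, n, merged))
        n += 1
    return heap[0][2]
```

For frequencies {a: 5, b: 2, c: 1, d: 1}, the most frequent symbol a receives a 1-bit code and the rare symbols c and d receive 3-bit codes, which is the code-length assignment the text describes.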
Effects of the invention
According to the embodiments according to one aspect of the present invention, it is possible to provide a data compression encoding method, an apparatus thereof, and a program thereof, which are suitable for use in encoding fixed-length data.
Drawings
Fig. 1 is a schematic diagram schematically illustrating a sensor network.
Fig. 2 is a diagram illustrating an example of fixed-length data.
Fig. 3 is a diagram illustrating column division by the encoding method according to the present embodiment.
Fig. 4A is a diagram showing an example of a functional block configuration of the data compression encoding device according to the present embodiment.
Fig. 4B is a diagram showing another example of the functional block configuration of the data compression encoding device according to the present embodiment.
Fig. 5A is a diagram showing an example of the functional block configuration of a decoding apparatus corresponding to the data compression encoding apparatus shown in fig. 4A.
Fig. 5B is a diagram showing an example of the functional block configuration of a decoding apparatus corresponding to the data compression encoding apparatus shown in fig. 4B.
Fig. 6 is a flowchart for explaining a data compression encoding method according to the present embodiment using an adaptive entropy encoding method.
Fig. 7 is a flowchart for explaining a data compression encoding method according to the present embodiment using a general accumulation-type entropy encoding method.
Fig. 8 is a flowchart illustrating the cumulative Huffman coding method.
Fig. 9 is a flowchart illustrating the cumulative huffman decoding method.
Fig. 10 is a flowchart illustrating an adaptive huffman coding method.
Fig. 11 is a flowchart illustrating an adaptive huffman decoding method.
Fig. 12 is a flowchart illustrating an adaptive arithmetic coding method.
Fig. 13 is a flowchart illustrating an adaptive arithmetic decoding method.
Fig. 14A is a diagram illustrating a record group for explaining the cumulative huffman coding method according to the present embodiment by way of specific example.
Fig. 14B is a diagram illustrating a coding dictionary for explaining the cumulative huffman coding method according to the present embodiment by way of specific example.
Fig. 14C is a diagram for explaining encoded data by the accumulated huffman encoding method according to the present embodiment by way of specific example.
Fig. 15A is a diagram illustrating the adaptive Huffman coding method according to the present embodiment by way of specific example (part 1).
Fig. 15B is a diagram illustrating the adaptive Huffman coding method according to the present embodiment by way of specific example (part 2).
Fig. 16A is a diagram illustrating the adaptive Huffman coding method according to the present embodiment by way of specific example (part 3).
Fig. 16B is a diagram illustrating the adaptive Huffman coding method according to the present embodiment by way of specific example (part 4).
Fig. 17A is a diagram illustrating the adaptive Huffman coding method according to the present embodiment by way of specific example (part 5).
Fig. 17B is a diagram illustrating the adaptive Huffman coding method according to the present embodiment by way of specific example (part 6).
Fig. 18A is a diagram (1) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 18B is a diagram (2) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 19A is a diagram (3) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 19B is a diagram (4) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 20A is a diagram (5) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 20B is a diagram (6) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 21A is a diagram (7) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 21B is a diagram (8) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 22A is a diagram (9) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 22B is a diagram (10) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 23A is a diagram for explaining, by way of specific example, the creation of an encoding dictionary based on a decoding method for decoding encoded data encoded by the cumulative huffman encoding method according to the present embodiment.
Fig. 23B is a diagram illustrating, by way of specific example, decoding of encoded data encoded by the accumulated huffman encoding method of the present embodiment.
Fig. 24A is a diagram (1) for explaining a decoding method for decoding encoded data encoded by the adaptive huffman encoding method according to the present embodiment, by way of specific example.
Fig. 24B is a diagram (2) for explaining a decoding method for decoding encoded data encoded by the adaptive huffman encoding method according to the present embodiment, by way of specific example.
Fig. 25A is a diagram (3) for explaining a decoding method for decoding encoded data encoded by the adaptive huffman encoding method according to the present embodiment, by way of specific example.
Fig. 25B is a diagram (4) illustrating a decoding method for decoding encoded data encoded by the adaptive huffman encoding method according to the present embodiment, by way of specific example.
Fig. 26A is a diagram (5) illustrating a decoding method for decoding encoded data encoded by the adaptive huffman encoding method according to the present embodiment, by way of specific example.
Fig. 26B is a diagram (6) for explaining a decoding method for decoding encoded data encoded by the adaptive huffman encoding method according to the present embodiment, by way of specific example.
Fig. 27A is a diagram (1) illustrating a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 27B is a diagram (2) for explaining a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 28A is a diagram (3) for explaining a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 28B is a diagram (4) for explaining a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 29A is a diagram (5) illustrating a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 29B is a diagram (6) for explaining a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 30A is a diagram (7) illustrating a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 30B is a diagram (8) illustrating a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 31A is a diagram (9) illustrating a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 31B is a diagram (10) illustrating a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 32 is a diagram of the hardware environment of an exemplary computer that executes a program when the present embodiment is implemented as a program.
Detailed Description
Fig. 3 is a diagram illustrating column division according to the present embodiment.
Fig. 3 shows an example of one record of fixed-length data constituted by a fixed-length bit string. The record is composed of fields whose bit positions and bit widths are predetermined, and data is described in fields 1 to n. In the present embodiment, records are divided into columns of a predetermined bit width. For example, in fig. 3, column 1 consists of bits 1 to a1, column 2 of bits a1+1 to a2, column 3 of bits a2+1 to a3, and so on; column m consists of bits am-1+1 to am. a1 to am may all be the same value or may differ. Columns may be divided according to the positions and bit widths of the fields, or irrespective of the widths and positions of the fields. The bit width of a column may be, for example, 1 bit, 2 bits, 4 bits, 8 bits, 16 bits, or the like.
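The column division just described can be sketched in a few lines of Python (an illustration with hypothetical names, not code from the patent); note that the widths need not align with field boundaries:

```python
def split_into_columns(record_bits, widths):
    # Divide a fixed-length bit string into columns of the given
    # widths, irrespective of field boundaries. The same widths are
    # applied to every record, so the same column always covers the
    # same bit positions.
    assert sum(widths) == len(record_bits)
    columns, pos = [], 0
    for w in widths:
        columns.append(record_bits[pos:pos + w])
        pos += w
    return columns
```

Applying the same `widths` list to every record of a stream yields, for each column index, a sequence of same-position bit patterns that can then be encoded independently.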
Fixed-length data may also be constructed from variable-length data by padding: the fields store the same data, and when no data is recorded in the trailing portion, "0"s are appended to adjust the data length to a fixed value. The method of the present embodiment can be applied in this case as well. As described above, in the present embodiment, a record constituted by a fixed-length bit string consists of data with different meanings described in a plurality of specified fields, and the data described in the field at the same position in each record is of the same kind. By dividing the record into blocks of an arbitrary number of bits, i.e. columns, and encoding the columns sequentially in the column direction independently of each other, compression encoding more effective than conventional encoding methods is achieved. That is, in the present embodiment, one record is encoded by successively encoding each column, for each column at the same position across a plurality of records.
Here, encoding columns independently of each other means that the encoding process does not depend on the data of other columns. A field is a data storage location within the fixed-length data in which one piece of data is stored, and the meaning of the piece of data stored in each field is predetermined. The fixed-length data consists of the data stored in one or more fields. Columns are partitioned over the fixed-length data, but the data stored in a column need not be a meaningful piece of data by itself: if a column is divided so as to span a field boundary, one field may be split across multiple columns, simply cutting the data into pieces. However, the column division is the same across all the fixed-length data, so the same column always indicates the same portion of the data.
Fig. 4A is a diagram showing an example of a functional block configuration of the data compression encoding device according to the present embodiment. As shown in fig. 4A, after the input record is divided into columns by the dividing unit 10, the data of each column is temporarily stored in the column registers 11-1 to 11-m and then compression-encoded individually, column by column, by the column encoding units 12-1 to 12-m. The compression-encoded data of the columns is combined into one data stream by the mixing unit 13 and output as the encoded data of one record.
Although individual encoding units 12-1 to 12-m are provided for each column here, the present invention is not limited to this; the compression encoding may be performed in a time-division manner, with one encoding unit compressing each column in turn. As in the example of fig. 1, the data compression encoding device of the present embodiment is provided, for example, in a sensor node.
The compression encoding method used by the data compression encoding device of fig. 4A may be, for example, an entropy encoding method such as Huffman coding. When an entropy coding method is used, each of the column coding units 12-1 to 12-m stores a frequency table and a coding table, as shown in fig. 4A.
The compression encoding method according to this embodiment is particularly effective when the fixed-length bit string is composed of a plurality of pieces of mutually independent information. Even if the boundaries of the fields containing this independent information are disregarded when dividing the columns, thereby ignoring the correlation between columns, the average data amount after compression encoding can still be reduced.
Fig. 4B is a diagram showing another example of the functional block configuration of the data compression encoding device according to the present embodiment. The example shown in fig. 4B is a case where arithmetic coding is used.
As shown in fig. 4B, in the case of arithmetic encoding, the division unit 10a divides the input record into columns, and the data of each column is stored in the column registers 11a-1 to 11a-m. Then, the column division range determination units 12a-1 to 12a-m calculate the occurrence probability from the frequency of the data values read in each column, and determine, for each column, the values that divide the current interval corresponding to that column. An interval dividing unit then calculates the interval corresponding to the next column from those values and the column value.
That is, when the column division range determination unit 12a-1 of column 1 completes its processing, the interval dividing unit 18-1 divides the interval corresponding to column 2 according to the arithmetic coding method, based on the data of column 1 and the result of processing it. Next, the column division range determination unit 12a-2 of column 2 determines the values dividing the interval of column 2 based on the occurrence probabilities of the data of column 2, and the interval dividing unit 18-2 divides the interval required for the next column 3 based on that result and the data of column 2. The same process is repeated up to column m. Finally, the encoding unit 19 selects from the interval produced by the interval dividing unit 18-m a value with a short binary representation, encodes the input record based on it, and outputs the encoded data.
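The interval narrowing performed column by column can be sketched with exact rationals (an illustrative Python sketch with hypothetical names; a real arithmetic coder would use scaled integers and emit bits incrementally):

```python
from fractions import Fraction

def arithmetic_encode(columns, models):
    # Successively narrow the interval [low, low + width): for column i,
    # models[i][value] gives (cumulative probability of the values
    # ordered below it, probability of the value itself).
    low, width = Fraction(0), Fraction(1)
    for value, model in zip(columns, models):
        cum, p = model[value]
        low = low + width * cum
        width = width * p
    # Any number inside [low, low + width) identifies the column
    # sequence; the encoder picks one with a short binary expansion.
    return low, low + width
```

With a uniform two-symbol model per column, encoding the columns "0" then "1" narrows [0, 1) to [0, 1/2) and then to [1/4, 1/2), matching the step-by-step subdivision described above.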
Fig. 5A is a diagram showing an example of the functional block configuration of a decoding apparatus corresponding to the data compression encoding apparatus shown in fig. 4A.
When encoded data produced by the data compression encoding apparatus of fig. 4A is input, the dividing unit 16 divides the encoded data into columns. The decoding units 14-1 to 14-m then decode the encoded data of each column. In doing so, the decoding units 14-1 to 14-m refer to the frequency tables and encoding tables 15-1 to 15-m provided per column of the pre-encoding data, according to the specific encoding method. For example, when the encoding method is Huffman coding, the encoded data is read sequentially, and the symbols of the decoded data are generated by referring to the frequency table and encoding table provided for each of columns 1 to m.
Then, the decoded data of each column are combined by the mixing unit 17, and the decoded record is output.
Fig. 5B is a diagram showing an example of the functional block configuration of a decoding apparatus corresponding to the data compression encoding apparatus shown in fig. 4B.
In the decoding of arithmetic coding shown in fig. 5B, the encoded record is input to the column division range determination unit 20a-1 of column 1. The column division range determination units 20a-1 to 20a-m calculate the occurrence probability from the frequency of the decoded data values in each column, and calculate the value that divides the current section corresponding to the column. Then, the column 1 to column m decoding units 14a-1 to 14a-m compare the value dividing the current section of each column with the value of the encoded data to obtain the decoded data of the column. Further, the section corresponding to the next column is obtained by the section dividing units from the decoded data and the previously obtained value dividing the current section. The mixing section 17a combines the decoded data of the decoding units 14a-1 to 14a-m and outputs the decoded record.
Fig. 6 is a flowchart for explaining a data compression encoding method according to the present embodiment using an adaptive entropy encoding method. In the adaptive encoding method, compression encoding is performed sequentially as the data is input.
First, in step S10, the frequency table used for entropy encoding is initialized. A frequency table counts how many times each symbol appears in the data to be encoded, and is in itself the same kind of table conventionally used for entropy encoding. The characteristic feature of the present embodiment is that symbols are counted per column, across the columns at the same position of the plurality of records. As initialization, for example, all entries are set to 0.
Next, in the loop of step S11, the process of step S12 is repeated once per column of a record. In step S12, an encoding table is created from the frequency table. In the case of huffman coding, the encoding table is a huffman coding dictionary; in the case of arithmetic coding, it is a table of occurrence probabilities. In either case it is the table actually used to replace the original data with coded information.
When the loop over the columns in step S11 is completed, the flow advances to step S13. In the first pass of step S11, the encoding table is created from the frequency table initialized in step S10.
In step S13, one record is read as a fixed-length bit string. Next, in step S14, the record is divided into columns according to a predetermined method. In step S14a, each column is encoded, and in step S15, the encoded data of the columns are mixed to obtain the compressed encoded data of one record. In step S16, the compressed encoded data of that one record is output. When this output has been completed for all records, the compression encoding of the input data is finished.
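The column splitting of step S14 and the mixing of step S15 can be sketched as follows (a minimal sketch; the two 4-bit columns match the example of fig. 14A, and the function names are illustrative only, not from the patent):

```python
def split_record(record, widths):
    """Divide a fixed-length bit string into columns of the given widths (step S14)."""
    cols, pos = [], 0
    for w in widths:
        cols.append(record[pos:pos + w])
        pos += w
    return cols

def mix_columns(encoded_cols):
    """Concatenate the per-column encoded data into one record's worth (step S15)."""
    return ''.join(encoded_cols)
```

For example, split_record("00101000", [4, 4]) yields the two columns ["0010", "1000"] of the record group in fig. 14A.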
Next, after step S16, the flow proceeds to step S17, and the process of step S18 is repeated once per column. In step S18, the frequency table is updated. The frequency tables are independent per column, so there are as many tables as there are columns. A frequency table is updated without using the data of other columns: as the records are encoded in sequence, the table for a given column is updated based only on the data in the corresponding column of the preceding records.
When the loop of step S17 is completed, the flow returns to step S11, where the encoding tables are created from the per-column frequency tables updated in step S17, and then proceeds to step S13 to encode the next record. When no record remains to be processed, the compression encoding is complete.
In the following, several specific examples corresponding to the respective entropy encoding modes will be described in more detail.
Fig. 7 is a flowchart for explaining a data compression encoding method according to the present embodiment using a general accumulation-type entropy encoding method. In the accumulation-type encoding method, the data to be compression-encoded is first read in its entirety, and only then compression-encoded. That is, all of the data is read once to complete the frequency table, and then the data is read again and encoded.
First, in step S19, the frequency tables are initialized. In the loop of step S20, the processing is repeated once per record for all records of the data to be encoded. In step S21, one record is read, and in step S22, the record is divided into columns by a predetermined method. In the loop of step S23, step S24 is repeated once per column. In step S24, the frequency table provided for the column is updated. When the loop over the columns in step S23 is completed, it is determined whether the loop over the records in step S20 is completed; if not, the loop continues, and if so, the flow proceeds to step S25. On reaching step S25, the update of the frequency tables has been completed for all data to be encoded, so the frequency tables are output and the flow proceeds to step S26.
In step S26, the process of step S27 is repeated once per column. In step S27, an encoding table is created from the frequency table. As before, the encoding table is a huffman coding dictionary in the case of huffman coding, and a table of occurrence probabilities in the case of arithmetic coding; it is the table actually used to replace the original data with coded information. When the loop over the columns in step S26 is completed, the flow advances to step S28.
In step S28, the processing is repeated once per record of the data to be encoded. In step S29, one record is read, and in step S30, the record is divided into columns according to a predetermined method. In step S31, each column is compression-encoded, and in step S32, the compression-encoded data are mixed to obtain the compressed encoded data of one record. In step S33, that one record's worth of data is output. When the loop of step S28 has been repeated for all records, the processing ends.
Here, for example, in the case where the data to be compression-encoded is fixed-length data received from a sensor or the like, the number of records to be compression-encoded depends on how many records are batched together before compression. The amount of data to be batched depends on, among other things, the memory capacity of the encoding apparatus, and should be determined appropriately by those skilled in the art when using the present embodiment. Further, when the data is sent sequentially from a transmission source, the collection and compression encoding of the data are performed repeatedly.
Fig. 8 and 9 are flowcharts illustrating the accumulation-type huffman encoding and decoding methods in more detail.
In the accumulation-type huffman encoding method shown in fig. 8, the frequency tables are initialized in step S40. In the loop of step S41, the processing within step S41 is repeated once per record. In step S42, one record is read, and in step S43, the record is divided into columns according to a predetermined method. In the loop of step S44, step S45 is repeated once per column. In step S45, the frequency table of the column is updated. When the frequency tables of all columns have been updated, the frequency tables are output in step S46, and the flow advances to the loop of step S47.
In the loop of step S47, the process of step S48 is repeated once per column. In step S48, an encoding table is created from the frequency table.
Next, in the loop of step S49, the processing within step S49 is repeated once per record. In step S50, one record is read. In step S51, the record is divided into columns according to a prescribed method. In the loop of step S52, the process of step S53 is repeated once per column. In step S53, the column data is encoded. Next, in step S54, the encoded data obtained in the loop of step S52 are mixed into one record's worth. In step S55, that one record's worth of data is output. When the processing has been repeated for all records, the processing ends.
In the accumulation-type huffman decoding method shown in fig. 9, the frequency tables are read in step S60. In the loop of step S61, step S62 is repeated once per column. In step S62, an encoding table is created from the frequency table. In the loop of step S63, the processing within step S63 is repeated once per record. In step S64, one record's worth of encoded data is read. In the loop of step S65, step S66 is repeated once per column. In step S66, the column data is decoded according to the encoding table created in step S62. In step S67, the decoded data of the columns are mixed into one record. In step S68, that one record's worth of data is output. When the processing has been repeated for all records, the processing ends.
Fig. 10 and 11 are flowcharts for explaining the adaptive huffman coding and decoding method.
In the adaptive huffman encoding method shown in fig. 10, the frequency tables are initialized in step S70. In the loop of step S71, the process of step S72 is repeated once per column. In step S72, an encoding table is created from the frequency table initialized in step S70 on the first pass, and thereafter from the frequency table updated in step S80. In step S73, one record is read. In step S74, the record is divided into columns according to a prescribed method. In the loop of step S75, the process of step S76 is repeated once per column. In step S76, the column data is encoded according to the encoding table created in step S72. In step S77, the encoded data of the columns are mixed into one record's worth. In step S78, that one record's worth of data is output. In the loop of step S79, the process of step S80 is repeated once per column. In step S80, the frequency table of each column is updated. When the loop over the columns is completed, the flow returns to step S71 to create the encoding tables, and the processing of the subsequent record from step S73 onward is repeated.
The adaptive huffman decoding method shown in fig. 11 decodes data encoded by the adaptive huffman encoding method shown in fig. 10. The encoded data is decoded by looking up the encoding table used for encoding in reverse, obtaining the original column data from the encoded data. Accordingly, in the flow shown in fig. 11, the step of encoding column data and the step of mixing encoded data in the flow of fig. 10 are replaced with a step of decoding column data and a step of mixing decoded data, the step of reading one record is replaced with a step of reading one record's worth of encoded data, and the step of outputting encoded data is replaced with a step of outputting the decoded record.
As shown in fig. 11, in step S85, the frequency tables are initialized. In the loop of step S86, the process of step S87 is repeated once per column. In step S87, an encoding table is created from the frequency table initialized in step S85 on the first pass, and thereafter from the frequency table updated in step S94. In step S88, one record's worth of encoded data is read. In the loop of step S89, the process of step S90 is repeated once per column. In step S90, the column data is decoded based on the encoding table created in step S87. In step S91, the decoded data of the columns are mixed into one record's worth. In step S92, that one record's worth of data is output. In the loop of step S93, the process of step S94 is repeated once per column. In step S94, the frequency table of each column is updated. When the loop over the columns is completed, the flow returns to step S86, the encoding tables are created, and the processing of the subsequent record from step S88 onward is repeated.
Fig. 12 and 13 are flowcharts for explaining the adaptive arithmetic encoding and decoding methods. Corresponding to the functional block configurations described previously with reference to fig. 4B and 5B, the adaptive arithmetic encoding device and decoding device can be realized on a computer by a program that executes the algorithms shown in these flowcharts.
In the adaptive arithmetic encoding method shown in fig. 12, the frequency tables are initialized in step S95. In the loop of step S96, the process of step S97 is repeated once per column. In step S97, an occurrence probability table is created from the frequency table initialized in step S95 on the first pass, and thereafter from the frequency table updated in step S106. In step S98, one record is read. In step S99, the record is divided into columns according to a prescribed method. In step S100, the section is initialized. In the loop of step S101, the process of step S102 is repeated once per column. In step S102, the section is divided according to the arithmetic coding method. In step S103, encoded data is generated from the section finally obtained in the loop of step S101. In step S104, the encoded data is output as one record's worth of encoded data. In the loop of step S105, the process of step S106 is repeated once per column. In step S106, the frequency table is updated. When the loop over the columns is completed, the flow returns to step S96, the occurrence probability tables are created, and the processing of the subsequent record from step S98 onward is repeated.
The adaptive arithmetic decoding method shown in fig. 13 decodes data encoded by the adaptive arithmetic encoding method shown in fig. 12.
As shown in fig. 13, in step S110, the frequency tables are initialized. In the loop of step S111, the process of step S112 is repeated once per column. In step S112, an occurrence probability table is created from the frequency table. In step S113, one record's worth of encoded data is read. In step S114, the section is initialized. In the loop of step S115, the processing of steps S116a, S116 and S117 is repeated once per column. In step S116a, the occurrence probability is calculated from the frequency of the decoded data values in the column, and the value dividing the current section corresponding to the column is obtained. In step S116, that value is compared with the value of the encoded data to obtain the decoded data of the column. In step S117, the section corresponding to the next column is obtained from the decoded data obtained in step S116 and the value obtained in step S116a for dividing the current section. In step S118, the column decoded data obtained in step S116 are mixed into one record's worth. In step S119, that one record's worth of data is output. In the loop of step S120, the process of step S121 is repeated once per column. In step S121, the frequency table of each column is updated. When the loop over the columns is completed, the flow returns to step S111, the occurrence probability tables are created, and the processing of the subsequent record from step S113 onward is repeated.
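For the bit-unit case worked through later with reference to fig. 18A to 20A, steps S113 to S117 can be sketched as follows. This is a sketch under stated assumptions, not the patent's literal implementation: "0" is assumed to occupy the lower sub-interval of each section, the per-bit frequency tables start at 1 with a total record count of 2 (laplace smoothing), and the five input codes are the ones derived in that worked example.

```python
from fractions import Fraction

def decode_record(code, freq0, total, n_bits=8):
    """Adaptive binary arithmetic decoding of one record (steps S113-S117),
    with one 1-bit column per bit position."""
    v = Fraction(int(code, 2), 2 ** len(code))   # code read as a binary fraction
    low, width = Fraction(0), Fraction(1)
    bits = []
    for i in range(n_bits):
        p0 = Fraction(freq0[i], total)           # step S116a: value dividing the section
        threshold = low + width * p0
        if v < threshold:                        # step S116: compare with the coded value
            bits.append('0')
            width *= p0                          # step S117: section for the next column
        else:
            bits.append('1')
            low = threshold
            width *= 1 - p0
    for i, b in enumerate(bits):                 # steps S120/S121: update frequency tables
        if b == '0':
            freq0[i] += 1
    return ''.join(bits)

freq0, total = [1] * 8, 2
records = []
for code in ["00101", "01", "01", "1", "111"]:   # codes from the worked example
    records.append(decode_record(code, freq0, total))
    total += 1
```

Exact rational arithmetic (Fraction) is used so the interval comparisons are free of floating-point rounding; a production decoder would instead use the usual fixed-precision renormalization.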
While the data compression encoding methods and decoding methods of the present embodiment have been described above with reference to fig. 6 to 13, the data compression encoding apparatus and decoding apparatus of the present embodiment can be implemented on a computer by a program that executes the algorithms shown in the flowcharts of these drawings.
Next, data compression encoding/decoding according to the present embodiment will be described with reference to specific examples of records.
Fig. 14A to 22B show processing examples of the data compression encoding method according to the present embodiment.
Fig. 14A to 14C are diagrams for explaining the accumulation-type huffman encoding method according to the present embodiment by way of a specific example. In the example shown in fig. 14A to 14C, 10 records are accumulated and then compression-encoded together.
Fig. 14A illustrates a record group 20 made up of 10 records of fixed length 8 bits. Each record is divided into, for example, column 1 and column 2, each with a bit width of 4 bits. The record group 20 is also used as the record group to be encoded in the descriptions of the other encoding examples below.
Fig. 14B illustrates an example of the encoding dictionary 25 in the case of using huffman coding. For the conventional huffman coding method, non-patent document 1 can be referred to. In the present embodiment, an encoding dictionary 25 is provided separately for each column, and the same dictionary is used within the same column. In the case of fig. 14A to 14C, one record is divided into 2 columns, and therefore 2 encoding dictionaries are provided.
In fig. 14B, reference code 21 shows the data values that may appear in each column. Since one column consists of 4 bits, there are 16 possible arrangements of 0 and 1. To cover all of these bit combinations, the encoding dictionary 25 consists of 16 rows.
The data shown by reference code 22 is obtained by counting the number of occurrences of each bit pattern in the record group 20. The occurrence probability of each value, computed from the number of occurrences, is shown by reference code 23, and reference code 24 shows the self-information entropy. The occurrence probability 23 is obtained by dividing the number of occurrences 22 by the number of records. For example, in the left-hand one of the encoding dictionaries shown by reference code 25, the number of occurrences of "0010" is 7 and the total number of records is 10, so the occurrence probability 23 is 7/10 = 0.7. Further, writing S for the self-information entropy 24 and p for the occurrence probability 23, S = -log2(p) (logarithm base 2, so S is in bits). Encoding is performed according to the occurrence probability 23 or the self-information entropy 24.
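The two quantities can be computed as follows (a minimal sketch; the base-2 logarithm is assumed so that the self-information comes out in bits, and the function names are illustrative only):

```python
import math

def occurrence_probability(count, n_records):
    """Occurrence probability 23: number of occurrences divided by the number of records."""
    return count / n_records

def self_information(p):
    """Self-information entropy 24: S = -log2(p), in bits."""
    return -math.log2(p)

p = occurrence_probability(7, 10)   # "0010" appears 7 times in 10 records
s = self_information(p)             # roughly 0.51 bits
```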
The data indicated by reference code 27 is the encoded data of each column obtained by the above encoding. By concatenating the huffman codes, the coded data obtained by compression-encoding the record is obtained. The data shown by reference code 26 in fig. 14C is the encoded data corresponding to each record of the record group 20. Comparing the record group 20 with the encoded data 26, the data amount is clearly reduced; however, in this method the encoding dictionary used for compression must be referred to at decoding time, so the frequency table of reference code 22 (or the encoding dictionary of reference code 25) must be transmitted and received separately. The accumulation type illustrated in fig. 14A to 14C is therefore suited to compression-encoding records batched together to some extent.
In the descriptions of fig. 6 and 7, the frequency table and the encoding table are independent tables, but the example of fig. 14A to 14C adopts a structure in which the frequency table is included in the encoding table.
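A per-column huffman dictionary of the kind shown in fig. 14B can be built as follows. This is a sketch, not the patent's exact dictionary: huffman tie-breaking is implementation-dependent, so only structural properties are checked (the code is prefix-free and the dominant value "0010" receives the shortest code word), and the example counts are a hypothetical column whose dominant value matches the figure.

```python
import heapq
from itertools import count

def huffman_code(freq):
    """Build a huffman coding dictionary {symbol: bit string} from a
    per-column frequency table (one dictionary per column)."""
    tick = count()  # tie-breaker so heap entries are always comparable
    heap = [(f, next(tick), {s: ''}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)
        f1, _, c1 = heapq.heappop(heap)
        merged = {s: '0' + b for s, b in c0.items()}
        merged.update({s: '1' + b for s, b in c1.items()})
        heapq.heappush(heap, (f0 + f1, next(tick), merged))
    return heap[0][2]

# Hypothetical column-1 frequency table: "0010" dominates, as in fig. 14B.
freq = {format(v, '04b'): 1 for v in range(16)}
freq['0010'] = 7
code = huffman_code(freq)
```

Because the dictionary covers all 16 rows of reference code 21, every possible 4-bit column value receives a code word even if it never occurred.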
Fig. 15A to 17B are diagrams for explaining the adaptive huffman encoding method according to the present embodiment by way of a specific example. In the adaptive encoding/decoding method, the occurrence probability or frequency need not be obtained in advance, and encoding can be performed immediately as the record data is generated. Likewise, the encoded information can be decoded immediately.
Fig. 15A shows the encoding table 25 in its initial state, indicated by reference code 30-1, the record group 20, and the encoded data 31-1 of the first record. The input record group 20 is the same as the record group 20 shown in fig. 14A, and the structure of the encoding table 25 is the same as that of the encoding dictionary 25 shown in fig. 14B; identical items are labeled with the same reference codes only in fig. 15A. Laplace smoothing is applied to the frequency table 22 included in the encoding table 25, so that in the initial state all entries have the same value "1". The occurrence probability, the self-information entropy, and the huffman code are computed from these frequencies, and the first record is encoded using that code. As the encoded data 31-1 indicates, the encoding result has the same value as the input record: in the initial state all frequencies are equal, so no compression effect is obtained.
Next, the frequency table is updated based on the first record: the frequency of each entry corresponding to data that appeared is increased by a predetermined value. As shown in fig. 15B, the frequency of "0010" increases by 1 in the left column, and the frequency of "1000" increases by 1 in the right column. Recomputing the occurrence probability and the self-information entropy from this frequency table gives the encoding table 25 shown as 30-2, and the resulting huffman codes are shown in bold in the encoded data 31-2. The encoded data 31-2 shows that a compression effect appears, in contrast to the first record for which no compression effect was obtained.
Next, as shown in fig. 16A, "0010" appears again in the left column and "1000" again in the right column of the 3rd record; therefore, in the encoding table shown as 30-3, the entry for "0010" in the left frequency table and the entry for "1000" in the right frequency table are updated to 3. The result of huffman encoding based on this frequency table is shown in the encoded data 31-3.
In fig. 16B, "0010" appears in the left column of the 4th record and "1100" in the right column; therefore, in the frequency tables of the encoding table shown as 30-4, the entry for "0010" in the left table is updated to 4. In the right table, "1100" appears here for the first time and has not occurred in the preceding records, so its entry remains at the initial value of 1. The result of huffman encoding based on this frequency table is shown in the encoded data 31-4.
In fig. 17A, "1010" appears in the left column of the 5th record and "1000" in the right column; in the frequency tables of the encoding table shown as 30-5, the entry for "1010" in the left table remains at its initial value of 1, while in the right table the entry for "1000" is updated to 4. The result of huffman encoding based on this frequency table is shown in the encoded data 31-5.
In fig. 17B, "0010" appears in the left column of the 6th record and "1000" in the right column; in the frequency tables of the encoding table shown as 30-6, the entry for "0010" in the left table is updated to 5, and the entry for "1000" in the right table is updated to 5. The result of huffman encoding based on this frequency table is shown in the encoded data 31-6.
Encoding proceeds sequentially by repeating this process. Fig. 15A to 17B describe the encoding tables up to the 6th record, but all records can be encoded by updating the frequency tables in the same way and repeatedly recomputing the occurrence probability, the self-information entropy, and the huffman code.
In this way, when the adaptive encoding method is used, no encoding dictionary needs to be transmitted or received, so a compression effect can be obtained even for data with a small number of records.
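The adaptive per-column huffman scheme of fig. 15A to 17B can be sketched as below. The sketch uses the laplace-smoothed frequency tables (all 16 entries start at 1) and updates them after each record, as in the figures; since huffman tie-breaking is implementation-dependent, the exact code words may differ from the figures, so the test checks the encode/decode round trip and the appearance of a compression effect rather than specific bit patterns.

```python
import heapq
from itertools import count

def huffman_code(freq):
    """Huffman dictionary {symbol: bit string} from a frequency table."""
    tick = count()
    heap = [(f, next(tick), {s: ''}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)
        f1, _, c1 = heapq.heappop(heap)
        merged = {s: '0' + b for s, b in c0.items()}
        merged.update({s: '1' + b for s, b in c1.items()})
        heapq.heappush(heap, (f0 + f1, next(tick), merged))
    return heap[0][2]

def adaptive_encode(records, n_cols=2):
    """Encode records column by column, updating each column's
    laplace-smoothed frequency table after every record (fig. 10)."""
    freqs = [{format(v, '04b'): 1 for v in range(16)} for _ in range(n_cols)]
    out = []
    for cols in records:
        tables = [huffman_code(f) for f in freqs]              # step S72
        out.append([tables[i][c] for i, c in enumerate(cols)])  # step S76
        for i, c in enumerate(cols):                            # step S80
            freqs[i][c] += 1
    return out

def adaptive_decode(encoded, n_cols=2):
    """Mirror of adaptive_encode: same tables, reverse lookup (fig. 11)."""
    freqs = [{format(v, '04b'): 1 for v in range(16)} for _ in range(n_cols)]
    out = []
    for codes in encoded:
        cols = []
        for i, c in enumerate(codes):
            inv = {b: s for s, b in huffman_code(freqs[i]).items()}
            cols.append(inv[c])
        out.append(cols)
        for i, c in enumerate(cols):
            freqs[i][c] += 1
    return out

# The first six records of record group 20, already split into columns.
records = [["0010", "1000"], ["0010", "1000"], ["0010", "1000"],
           ["0010", "1100"], ["1010", "1000"], ["0010", "1000"]]
encoded = adaptive_encode(records)
decoded = adaptive_decode(encoded)
```

Because the encoder and decoder rebuild identical tables from the same frequencies, no dictionary is transmitted, which is exactly the advantage noted above.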
Fig. 18A to 22B are diagrams for explaining, by way of a specific example, the data compression encoding method according to the present embodiment with columns of 1-bit units.
With this method, the memory capacity needed to hold the frequency tables during encoding and decoding can be reduced.
When the record is divided into bit units, the encoding can be performed by an arithmetic coding method. Further, since the frequencies are updated while encoding proceeds sequentially in the column direction, the method is an adaptive binary arithmetic coding method. The arithmetic coding method itself can be a conventionally known one; non-patent document 1 can be referred to as necessary.
The input record group 20 is the same data as the record group 20 shown in fig. 14A, but here the columns are 1-bit units.
In table 40-1 shown in fig. 18A, the upper part gives the frequencies and the lower part the corresponding occurrence probabilities; the same applies to fig. 18A to 22B below. Table 40-1 shows the initial state. In principle the frequencies of both data "0" and data "1" are required, but table 40-1 records only the frequency of "0". Instead of recording the frequency of "1", a field for the total number of records 41-1 is provided, and the frequency of "1" can be obtained by subtracting the frequency of "0" from the total number of records. For the initial values, laplace smoothing is again used: the frequency of "0" is set to 1 and the total number of records to 2. The occurrence probability of "0" obtained from these frequencies is shown in the lower part of table 40-1; it is computed from the frequency and the total number of records, and the occurrence probability of "1" is 1 - (occurrence probability of "0").
Arithmetic coding is performed based on these occurrence probabilities. In the present embodiment, an occurrence probability (frequency) that is independent for each column (here, for each bit) is used. The arithmetic coding result of the 1st record is shown in the encoded data 42-1, and the section obtained by the arithmetic coding is written to its right. The result of the arithmetic coding is the fractional part of the binary representation of the value in the section that can be expressed with the fewest digits. In this example, 0.00101 (binary) = 0.15625 (decimal), so the result is "00101". In arithmetic coding generally, decoding is possible even if trailing "0"s of the coding result are omitted, so trailing "0"s are omitted here as usual. Since the column division is in units of bits, the frequency of a bit is unrelated to the other bits within the same record; rather, the frequency of occurrence of the bit value at a given bit position is counted across different records, e.g. the frequency at bit 1 across the records' bit 1, and the frequency at bit 2 across the records' bit 2. The occurrence probability of "0" is therefore obtained by dividing the number of "0"s appearing at the given bit position by the number of records processed, and the occurrence probability of "1" by subtracting that from 1.
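This per-bit adaptive arithmetic coding can be sketched as follows (a sketch under stated assumptions: "0" is taken to occupy the lower sub-interval of the current section, and the shortest binary fraction in the final section is emitted, which inherently has no trailing zeros; the resulting codes for the first five records reproduce those given in the worked example continued below):

```python
import math
from fractions import Fraction

def encode_record(bits, freq0, total):
    """Adaptive binary arithmetic coding of one record, with an independent
    laplace-smoothed frequency of "0" per bit position (one 1-bit column per bit)."""
    low, width = Fraction(0), Fraction(1)
    for i, b in enumerate(bits):
        p0 = Fraction(freq0[i], total)   # occurrence probability of "0" at this position
        if b == '0':
            width *= p0                  # "0" takes the lower sub-interval
        else:
            low += width * p0            # "1" takes the upper sub-interval
            width *= 1 - p0
    high = low + width
    # Shortest binary fraction contained in the section [low, high); a code
    # with a trailing zero cannot occur, since the shorter representation
    # would already have been found at a smaller k.
    k = 1
    while True:
        m = math.ceil(low * 2 ** k)
        if Fraction(m, 2 ** k) < high:
            break
        k += 1
    for i, b in enumerate(bits):         # update the per-position frequency tables
        if b == '0':
            freq0[i] += 1
    return format(m, f'0{k}b')

freq0, total = [1] * 8, 2                # laplace smoothing: freq("0") = 1, total = 2
codes = []
for rec in ["00101000", "00101000", "00101000", "00101100", "10101000"]:
    codes.append(encode_record(rec, freq0, total))
    total += 1
```

Exact Fraction arithmetic keeps the sections free of rounding; a production encoder would use fixed-precision renormalization instead.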
Table 40-2 of fig. 18B shows the frequencies and occurrence probabilities for the 2nd record, updated after the encoding of the 1st record. Since only the frequency of "0" is kept, in table 40-2 the frequency of "0" is increased by 1 only at the positions where "0" appears in the 1st record; the frequencies at the 3rd and 5th bit positions, where "1" appears, keep their initial value. Further, the total number of records 41-2 is increased to 3. The occurrence probabilities obtained from the frequencies and the total number of records are given in the lower part of table 40-2. The result of arithmetic coding based on these probabilities is shown in the 2nd line of the encoded data 42-2, corresponding to the 2nd record; the value of the arithmetically coded section has changed. The binary value with the fewest digits contained in this section is 0.01 (binary) = 0.25 (decimal), so the result of encoding is "01".
Table 40-3 of fig. 19A shows the frequencies and occurrence probabilities for the 3rd record, updated after the encoding of the 2nd record. In table 40-3, the frequency of "0" is increased by 1 only at the positions where "0" appears in the 2nd record, each becoming 3; the frequencies at the 3rd and 5th bit positions, where "1" appears, keep their initial value. Further, the total number of records 41-3 is increased to 4. The occurrence probabilities obtained from the frequencies and the total number of records are shown in the lower part of table 40-3. The result of arithmetic coding based on these probabilities is shown in the 3rd line of the encoded data 42-3, corresponding to the 3rd record; the value of the arithmetically coded section has changed. Since 0.01 (binary) = 0.25 (decimal), the result of encoding is "01".
Table 40-4 of fig. 19B shows the frequencies and occurrence probabilities for the 4th record, updated after the encoding of the 3rd record. In table 40-4, the frequency of "0" is increased by 1 only at the positions where "0" appears in the 3rd record, each becoming 4; the frequencies at the 3rd and 5th bit positions, where "1" appears, keep their initial value. Further, the total number of records 41-4 is increased to 5. The occurrence probabilities obtained from the frequencies and the total number of records are shown in the lower part of table 40-4. The result of arithmetic coding based on these probabilities is shown in the 4th line of the encoded data 42-4, corresponding to the 4th record; the value of the arithmetically coded section has changed. Since 0.1 (binary) = 0.5 (decimal), the result of encoding is "1".
Table 40-5 of fig. 20A shows the frequencies and occurrence probabilities for the 5th record, updated after the encoding of the 4th record. In table 40-5, the frequency of "0" is increased by 1 only at the positions where "0" appears in the 4th record, each becoming 5; the frequencies at the 3rd, 5th and 6th bit positions, where "1" appears in the 4th record, keep their previous values. In addition, the total number of records 41-5 is increased to 6. The occurrence probabilities obtained from the frequencies and the total number of records are shown in the lower part of table 40-5. The result of arithmetic coding based on these probabilities is shown in the 5th line of the encoded data 42-5, corresponding to the 5th record; the value of the arithmetically coded section has changed. Since 0.111 (binary) = 0.875 (decimal), the result of encoding is "111".
The frequency of occurrence and probability of occurrence of the 6 th record updated after the encoding of the 5 th record are shown in table 40-6 of fig. 20B. In Table 40-6, in record 5, the frequency of occurrence of "0" is increased by 1 only at the position. The frequency of the positions of the 1 st, 3 rd, and 5 th bits where "1" appears in the 5 th record is kept at the previous value. In addition, the total recorded number 41-6 is increased to 7. The occurrence probability obtained from the frequency and the total number of records is shown in the lower part of table 40-6. The result of arithmetic coding based on the occurrence probability is shown in association with the 6 th line and 6 th record of the coded data 42-6. It is known that the value of the arithmetic-coded section changes. Since 0.01 (binary) =0.25 (decimal), the result of encoding is "01".
The frequency of occurrence and probability of occurrence of the 7 th record updated after the encoding of the 6 th record are shown in table 40-7 of fig. 21A. In Table 40-7, in record 6, the frequency value is incremented by 1 only at the position where "0" occurs. The frequency of the 3 rd and 5 th bit positions where "1" appears in the 6 th record is kept at the previous value. Further, the total recorded number 41-7 is increased to 8. The occurrence probability obtained from the frequency and the total number of records is shown in the lower part of table 40-7. The result of arithmetic coding based on the occurrence probability is shown in the 7 th line of the coded data 42-7 corresponding to the 7 th record. It is known that the value of the arithmetic-coded section changes. Since 0.01 (binary) =0.25 (decimal), the result of encoding is "01".
The frequency of occurrence and probability of occurrence of the 8 th record updated after the encoding of the 7 th record are shown in table 40-8 of fig. 21B. In Table 40-8, in record 7, the frequency of occurrence of "0" is increased by 1 only at the position. The frequency of the 3 rd and 5 th bit positions where "1" appears in the 7 th record is kept at the previous value. Further, the total recorded number 41-8 is increased to 9. The occurrence probability obtained from the frequency and the total number of records is shown in the lower part of table 40-8. The result of arithmetic coding based on the occurrence probability is shown in association with the 8 th line and 8 th record of the coded data 42-8. It is known that the value of the arithmetic-coded section changes. Since 0.01 (binary) =0.25 (decimal), the result of encoding is "01".
The frequency of occurrence and probability of occurrence of the 9 th record updated after the encoding of the 8 th record are shown in the table 40-9 of fig. 22A. In Table 40-9, in record 8, the frequency value is incremented by 1 only at the position where "0" occurs. The frequency of the positions of the 3 rd bit and the 5 th bit where "1" appears in the 8 th record is kept at the previous value. Further, the total recorded number 41-9 is increased to 10. The occurrence probability obtained from the frequency and the total number of records is shown in the lower part of tables 40 to 9. The result of arithmetic coding based on the occurrence probability is shown in association with the 9 th line and 9 th record of the coded data 42-9. It is known that the value of the arithmetic-coded section changes. Since 0.10101 (binary) = 0.65625 (decimal), the result of encoding is "10101".
The frequency of occurrence and probability of occurrence of the 10 th record updated after the encoding of the 9 th record are shown in the table 40-10 of fig. 22B. In Table 40-10, in record 9, the frequency of occurrence of "0" is increased by 1 only at the position. The frequency of the 3 rd and 4 th bit positions where "1" appears in the 9 th record is kept at the previous value. Further, the total recorded number 41-10 is increased to 11. The occurrence probability obtained from the frequency and the total number of records is shown in the lower part of the table 40-10. The result of arithmetic coding based on the occurrence probability is shown in association with the 10 th record on the 10 th line of the coded data 42-10. It is known that the value of the arithmetic-coded section changes. Since 0.101111 (binary) = 0.734375 (decimal), the result of encoding is "101111".
In this way, encoding proceeds by repeating the frequency update and the arithmetic coding for each record.
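The per-column adaptive arithmetic coding described above can be sketched as follows. This is a minimal Python sketch written for this description, not the patent's implementation; the function and variable names are ours. Each column is 1 bit wide, the frequency of "0" for every column starts at 1 and the record count at 2 (Laplace smoothing, i.e. "0" and "1" each counted once), and the shortest binary fraction lying inside the final interval is emitted:

```python
import math
from fractions import Fraction

def encode_record(bits, freq0, total):
    """Encode one record (a string of '0'/'1') column by column, then
    update the per-column frequencies of "0" and the record count."""
    lo, hi = Fraction(0), Fraction(1)
    for j, b in enumerate(bits):
        p0 = Fraction(freq0[j], total)        # occurrence probability of "0"
        split = lo + (hi - lo) * p0           # division value of the interval
        if b == "0":
            hi = split                        # keep the sub-interval below the split
        else:
            lo = split                        # keep the sub-interval at/above the split
    k = 1                                     # shortest binary fraction in [lo, hi)
    while Fraction(math.ceil(lo * 2**k), 2**k) >= hi:
        k += 1
    code = format(math.ceil(lo * 2**k), "0{}b".format(k))
    for j, b in enumerate(bits):              # adaptive frequency update
        if b == "0":
            freq0[j] += 1
    return code, total + 1

# the ten records of the worked example
records = ["00101000"] * 3 + ["00101100", "10101000"] + \
          ["00101000"] * 3 + ["00110000", "00111100"]
freq0, total = [1] * 8, 2                     # Laplace smoothing: "0" and "1" once each
codes = []
for r in records:
    c, total = encode_record(r, freq0, total)
    codes.append(c)
print(codes)
```

Running this reproduces the per-record codes of the worked example, "00101", "01", "01", "1", "111", "01", "01", "01", "10101", "101111".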
Using arithmetic coding on the bit-by-bit division described above has the following effects.
That is, if the entire record is treated as a single column, the compression is the same as in the prior art; in the example of the present embodiment, however, when the 8-bit record is divided bit by bit, the required frequency table has a size of only 8 + 1 = 9, whereas the prior art requires 256. In addition, since the occurrence probabilities can be calculated from the frequency table, they do not need to be stored separately.
If the record length is assumed to be 32 bits (33 bits in the example of the present embodiment), the prior art would require 2 to the 32nd power = 4,294,967,296 entries; for records of even greater length, treating the record as a single column is impossible in practice. When the record is divided, the method of the example of the present embodiment can obtain a higher compression effect than a conventional compression technique that uses a single dictionary for the whole record.
In addition, dividing in units of 1 bit and compression-encoding in the column direction has the following effects. For example, when the record is divided into multi-bit columns, the substitution information used for encoding must be stored for every bit pattern of the division unit, but with 1-bit columns it suffices to store in advance whether that single bit is "1", so the working memory required at the time of compression encoding is small. Furthermore, with multi-bit columns a symbol must be substituted for each division unit and compression-encoded one record at a time, whereas with 1-bit columns compression encoding can be performed simply from the number of bits in one record and the number of "1" (or "0") bits, so the logic for performing compression encoding is also simplified.
Fig. 23A to 31B show processing examples of a decoding method corresponding to the data compression encoding method of the present embodiment.
Fig. 23A and 23B are diagrams illustrating a decoding method for decoding encoded data encoded by the accumulated Huffman encoding method shown in fig. 14A to 14C.
It is determined in advance that encoded data encoded by the accumulated Huffman encoding method shown in fig. 14A to 14C is to be decoded; that is, it is determined in advance that 8-bit records each composed of two 4-bit columns are to be processed. In addition, the method of constructing the Huffman code is also predetermined.
On the decoding side, the area of the decoding dictionary 50-1 shown in fig. 23A is prepared in advance. Based on the above determination, a table consisting of 2 blocks of 16 (2 to the 4th power) rows is created. The columns of the table other than column a are initially left blank.
Next, the occurrence frequencies of the symbols generated by the encoding are read into column b. In this case, 32 integer values are read in. From the occurrence frequencies, the occurrence probabilities of column c are calculated, a Huffman tree is created, and the Huffman codes are derived in column e, thereby completing the decoding dictionary 50-1. The Huffman-code construction step must follow the same procedure as the encoding. The decoding dictionary 50-1 is then identical to the encoding dictionary 25 shown in fig. 14B.
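The construction of a Huffman code table from received frequencies can be sketched as follows (an illustrative Python sketch, not the patent's implementation; the symbols and frequencies are hypothetical). Note that the tie-breaking order fixes the exact bit patterns, which is precisely why the construction procedure must be agreed on in advance:

```python
import heapq

def build_huffman(freq):
    """Derive a Huffman code table from {symbol: occurrence frequency}.
    The two least-frequent subtrees are merged repeatedly; the unique
    tie counter makes the construction deterministic."""
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)       # least frequent subtree
        f2, _, c2 = heapq.heappop(heap)       # second least frequent
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# hypothetical column-b frequencies for four of the sixteen 4-bit symbols
codes = build_huffman({"0010": 5, "1000": 3, "1100": 1, "1010": 1})
```

As expected of a Huffman code, the most frequent symbol receives the shortest codeword and the resulting table is prefix-free.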
There is also a method of transmitting and receiving the occurrence probabilities of column c instead of the occurrence frequencies of column b. Alternatively, the Huffman code table of column e may be transmitted and received; in that case, the method of constructing the Huffman code need not be determined in advance.
Next, the encoded bit string is read, and the decoded data is obtained using the decoding dictionary 50-1. Since a Huffman code is a prefix code, the encoded bit string can be decoded sequentially from the beginning, and no special separator is required.
Fig. 23B shows the decoded records 51-2 obtained by decoding the encoded data 51-1 using the decoding dictionary 50-1. Looking at the 1st line of the encoded data 51-1, the encoded data is "00". Looking at columns a and e of the decoding dictionary 50-1, in the left block the code "0" corresponds to the symbol column "0010", and in the right block the code "0" corresponds to "1000". Accordingly, the encoded data "00" decodes to "00101000". The same applies up to the 3rd line of the encoded data 51-1.
Line 4 of the encoded data 51-1 is "010". According to the decoding dictionary 50-1, no code such as "01" exists in the left block, so the code for the left column is "0", which decodes to "0010". The code for the right column is then "10", which, per the decoding dictionary 50-1, decodes to "1100". Therefore, the decoded symbol sequence is "00101100". The rest of the encoded data 51-1 can be decoded in the same manner.
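The separator-free decoding just described can be sketched as follows (an illustrative Python sketch; the code table below is hypothetical and merely in the spirit of the decoding dictionary 50-1). Because the table is prefix-free, the first completed match while scanning left to right is always the correct codeword:

```python
def decode_stream(bits, table):
    """Decode a concatenation of Huffman codewords from left to right.
    A prefix-free table guarantees that a completed match can never be
    the start of a longer codeword, so no separator is needed."""
    inv = {code: sym for sym, code in table.items()}   # codeword -> symbol
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inv:                                 # codeword completed
            out.append(inv[cur])
            cur = ""
    if cur:
        raise ValueError("bit string ended in the middle of a codeword")
    return out

# hypothetical prefix-free table: symbol column -> codeword
table = {"0010": "0", "1000": "10", "1100": "110", "1010": "111"}
symbols = decode_stream("010110", table)
```

Here the stream "010110" splits unambiguously into "0", "10", "110", yielding the symbol columns "0010", "1000", "1100".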
Fig. 24A to 26B are diagrams illustrating a decoding method of decoding encoded data encoded by the adaptive huffman encoding method illustrated in fig. 15A to 17B.
It is determined in advance that encoded data encoded by the adaptive Huffman encoding method shown in fig. 15A to 17B is to be decoded; that is, it is determined in advance that 8-bit records each composed of two 4-bit columns are to be processed. In addition, the method of constructing the Huffman code is also predetermined.
On the decoding side, the table 50-2 shown in fig. 24A is prepared in advance. Based on the above determination, a table consisting of 2 blocks of 16 (2 to the 4th power) rows is created. In this method, since no frequency table is transmitted or received in advance, the initial occurrence frequencies are all set to "1" using Laplace smoothing, as in the encoding. As a result, the same table as the initial-state encoding table 30-1 shown in fig. 15A is created.
Here, when the first encoded data "00101000" is read into the area 51-2, the corresponding code is looked up in column e of the table 50-2, and column a gives the decoded data. By performing this processing on the left and right blocks and combining the 2 pieces of decoded data in the table 51-3, the record before encoding can be recovered. Since a Huffman code is a prefix code, the encoded bit string can be decoded sequentially from the beginning, so no special delimiter is required.
Since the decoded data of the left column is "0010" and that of the right column is "1000", 1 is added to the frequency of the corresponding entry in each block of table 50-2. The Huffman codes of the table 50-3 shown in fig. 24B are obtained from the updated frequencies.
Here, the data "010101" of the 2nd record is read in. First, the 1st column is decoded using column e of the left block: "010" is found from the beginning of the encoded data and, per table 50-3, corresponds to the decoded data "0010". Next, the 2nd column is decoded using column e of the right block: the remaining part of the encoded data is "101", which, per table 50-3, corresponds to "1000". The decoded symbol sequences of the left and right columns are then combined to obtain "00101000", and table 50-3 is updated. Since a Huffman code is a prefix code, no delimiter is required. Decoding proceeds by repeating this process.
In fig. 25A, the 3rd encoded data is "001001"; per the left block of table 50-4, "001" corresponds to "0010", and per the right block, "001" corresponds to "1000". Thus, the 3rd decoded symbol column is "00101000".
Further, as shown in fig. 25B, the 4th encoded data is "00100010"; from table 50-5, "001" in the left block corresponds to "0010", and "00010" in the right block corresponds to "1100". Therefore, the 4th decoded symbol column is "00101100".
As shown in fig. 26A, the 5th encoded data is "0000011"; per the left block of table 50-6, "00000" corresponds to "1010", and per the right block, "11" corresponds to "1000". Thus, the 5th decoded symbol column is "10101000".
Since the 6th encoded data is "0101", it can be seen from table 50-7 that "01" in the left block corresponds to "0010" and "01" in the right block corresponds to "1000". Therefore, the 6th decoded symbol sequence is "00101000". By repeating the above processing, all records can be decoded.
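The adaptive loop above — decode each record with the current per-column tables, then update the frequencies and rebuild — can be sketched as follows. This is an illustrative Python sketch with invented names; because the exact codewords depend on the agreed tie-breaking, the sketch demonstrates the encoder/decoder symmetry with a round trip rather than reproducing the bit patterns of the figures:

```python
import heapq

def build_codes(freq):
    """Huffman codes from {symbol: frequency} with deterministic tie-breaking."""
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

def new_tables():
    # one frequency block per 4-bit column, all counts 1 (Laplace smoothing)
    return [{format(v, "04b"): 1 for v in range(16)} for _ in range(2)]

def adaptive_encode(records):
    freqs, out = new_tables(), ""
    for r in records:
        cols = (r[:4], r[4:])
        for k in (0, 1):                        # encode with the current tables
            out += build_codes(freqs[k])[cols[k]]
        for k in (0, 1):                        # then update the frequencies
            freqs[k][cols[k]] += 1
    return out

def adaptive_decode(stream, n_records):
    freqs, records, pos = new_tables(), [], 0
    for _ in range(n_records):
        cols = []
        for k in (0, 1):                        # same tables as the encoder
            inv = {c: s for s, c in build_codes(freqs[k]).items()}
            cur = ""
            while cur not in inv:               # prefix code: first match wins
                cur += stream[pos]
                pos += 1
            cols.append(inv[cur])
        for k in (0, 1):
            freqs[k][cols[k]] += 1
        records.append("".join(cols))
    return records

records = ["00101000"] * 3 + ["00101100", "10101000"] + ["00101000"]
encoded = adaptive_encode(records)
decoded = adaptive_decode(encoded, len(records))
```

As long as the encoder and decoder share the construction procedure and the update order, the decoder reproduces the original records exactly, which is the point of predetermining the Huffman construction.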
Fig. 27A to 31B are diagrams illustrating a decoding method of decoding encoded data encoded by the adaptive arithmetic encoding method shown in fig. 18A to 22B.
It is determined in advance that encoded data encoded by the adaptive arithmetic encoding method shown in fig. 18A to 22B is to be decoded; that is, it is determined in advance that 8-bit records each composed of 8 columns of 1 bit are to be processed. In addition, the method of arithmetic coding is also predetermined.
On the decoding side, the table 60-1 shown in fig. 27A is prepared in advance. Based on the above determination, a table consisting of 8 blocks is created. Although each block would need entries for the column data "0" and "1", only the data for "0" is stored, as in the encoding, and a column for the total of the encoded data 61-1 is provided. In the adaptive method, since no frequency table is transmitted or received in advance, the initial occurrence frequencies are all set to "1" using Laplace smoothing, as in the encoding, and the occurrence probabilities are calculated from them; the resulting table is the table 60-1 shown in fig. 27A.
Here, the data "00101" of the 1st record is read into the area 61-2. Since an arithmetic code is not a prefix code, a protocol that can determine the record separator is required.
When the received data "00101" is interpreted as a binary fraction, the encoded value 0.15625 is obtained. By determining the column values from this value and dividing the interval by the same method as the arithmetic coding, the decoded data "00101000", which is the record before encoding, is obtained as shown in table 61-3.
Then, 1 is added to the record count, 1 is added to the frequency of each column in which the decoded data "00101000" contains "0", and the occurrence probabilities are recalculated, giving the table 60-2 shown in fig. 27B.
Here, the data "01" of the 2 nd record is read in.
When the received data "01" is interpreted as a binary fraction, the encoded value 0.25 is obtained. The column values are determined from this value and the interval is divided by the same method as the arithmetic coding, yielding the decoded data "00101000". For clarity, the processing for obtaining the 2nd decoded data is described in detail below.
Table 60-2 holds the frequencies shown in fig. 27B, which reflect the input of the 1st encoded data. The 2nd input code value "01" is the fractional part of the binary fraction 0.01, which is 0.25 in decimal. Using the occurrence probability of "0" for each column (bit), obtained from this decimal value "0.25" and the per-column frequencies of "0" updated after each decode, the decoded data is determined bit by bit. The initial interval when decoding the first bit is [0, 1). The division of the interval is repeated according to the occurrence probability of "0" for each column. The division value is calculated by the formula "(maximum of interval - minimum of interval) × probability of "0" + minimum of interval".
First, from the frequency "2" of "0" and the record count "3" given in the first column of table 60-2, the occurrence probability "0.667" listed in table 60-2 is obtained, and the division value of the current interval is computed by the above formula. Since the current interval is the initial value [0, 1), the calculated division value is "0.667". This computation of the division value corresponds to the processing of the column division range determination unit 20-1 shown in fig. 5B. (The occurrence probability may also be calculated in advance at the time of the frequency update.)
The value of each column of the decoded record is "0" when the code value is less than the division value, and "1" when the code value is greater than or equal to the division value (matching the half-open intervals used in the encoding). In the present case, the division value is "0.667" and the code value is "0.25", so the decoded bit value of the first column is "0". This process corresponds to the processing of the column 1 decoding unit 14a-1 of fig. 5B. Since the bit value of the first column is "0" and the division value is "0.667", the next interval is set to [0, 0.667), the range below the division value. This process corresponds to the processing of the section dividing unit 21-1 of fig. 5B.
Next, the division value "0.444" of the current interval [0, 0.667) is obtained from the occurrence frequency of "0" in the 2nd column of table 60-2, and the decoded bit value of the 2nd column is "0" from the magnitude relation between the division value and the code value "0.25". Based on the decoded bit value, the next interval is [0, 0.444). This processing for the 2nd column corresponds, like that for the first column, to the processing performed by the column division range determination unit 20-2, the column 2 decoding unit 14a-2, and the section dividing unit 21-2 described in fig. 5B.
Similarly, the division value "0.148" of the current interval [0, 0.444) is obtained from the occurrence frequency of "0" in the 3rd column of table 60-2, and the decoded bit value of the 3rd column is "1" from the magnitude relation between the division value and the code value "0.25". Based on the decoded bit value, the next interval is [0.148, 0.444).
The above processing is repeated for each column, whereby 1 record is decoded.
The column data decoded in sequence in this way are combined by the mixing unit 17a of fig. 5B into the decoded data of 1 record.
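The per-column interval division just described can be sketched as follows (an illustrative Python sketch with invented names; exact rationals are used instead of floating point to avoid rounding drift). Fed the codes of the encoding example, it reproduces the decoded records of fig. 27A to 31B:

```python
from fractions import Fraction

def decode_record(code, n_cols, freq0, total):
    """Decode one arithmetic code word by re-dividing the interval with
    the same per-column probabilities of "0" the encoder used, then
    update the frequencies exactly as the encoder did."""
    value = Fraction(int(code, 2), 2 ** len(code))   # binary fraction 0.code
    lo, hi = Fraction(0), Fraction(1)
    bits = ""
    for j in range(n_cols):
        p0 = Fraction(freq0[j], total)
        split = lo + (hi - lo) * p0                  # division value
        if value < split:                            # below the split -> "0"
            bits += "0"
            hi = split
        else:                                        # at/above the split -> "1"
            bits += "1"
            lo = split
    for j, b in enumerate(bits):                     # adaptive frequency update
        if b == "0":
            freq0[j] += 1
    return bits, total + 1

codes = ["00101", "01", "01", "1", "111", "01", "01", "01", "10101", "101111"]
freq0, total = [1] * 8, 2                            # same initial state as encoding
decoded = []
for c in codes:
    r, total = decode_record(c, 8, freq0, total)
    decoded.append(r)
print(decoded)
```

For the 2nd code "01" (value 0.25) the successive division values are 0.667, 0.444, 0.148, ..., matching the walkthrough above, and the output is the ten original records.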
Next, 1 is added to the record count in table 60-2 shown in fig. 27B, 1 is added to the frequency of each column in which the 2nd decoded data "00101000" contains "0", and the occurrence probabilities are recalculated; the result is the table 60-3 shown in fig. 28A.
Here, the data "01" of the 3rd record is read in. When the received data "01" is interpreted as a binary fraction, the encoded value 0.25 is obtained. The column values are determined from this value and the interval is divided by the same method as described for the 2nd record, yielding the decoded data "00101000".
Then, 1 is added to the record count, 1 is added to the frequency of each column in which the decoded data "00101000" contains "0", and the occurrence probabilities are recalculated; the result is the table 60-4 shown in fig. 28B.
Here, the 4th data "1" is read in. When the received data "1" is interpreted as a binary fraction, the encoded value 0.5 is obtained. The column values are determined from this value and the interval is divided by the same method as described above, yielding the decoded data "00101100".
Then, 1 is added to the record count, 1 is added to the frequency of each column in which the decoded data "00101100" contains "0", and the occurrence probabilities are recalculated; the result is the table 60-5 shown in fig. 29A.
Here, the data "111" of the 5th record is read in. When the received data "111" is interpreted as a binary fraction, the encoded value 0.875 is obtained. The column values are determined from this value and the interval is divided by the same method as described above, yielding the decoded data "10101000".
Then, 1 is added to the record count, 1 is added to the frequency of each column in which the decoded data "10101000" contains "0", and the occurrence probabilities are recalculated; the result is the table 60-6 shown in fig. 29B.
Here, the data "01" of the 6th record is read in. When the received data "01" is interpreted as a binary fraction, the encoded value 0.25 is obtained. The column values are determined from this value and the interval is divided by the same method as described above, yielding the decoded data "00101000".
Then, 1 is added to the record count, 1 is added to the frequency of each column in which the decoded data "00101000" contains "0", and the occurrence probabilities are recalculated; the result is the table 60-7 shown in fig. 30A.
Here, the data "01" of the 7th record is read in. When the received data "01" is interpreted as a binary fraction, the encoded value 0.25 is obtained. The column values are determined from this value and the interval is divided by the same method as described above, yielding the decoded data "00101000".
Then, 1 is added to the record count, 1 is added to the frequency of each column in which the decoded data "00101000" contains "0", and the occurrence probabilities are recalculated; the result is the table 60-8 shown in fig. 30B.
Here, the data "01" of the 8th record is read in. When the received data "01" is interpreted as a binary fraction, the encoded value 0.25 is obtained. The column values are determined from this value and the interval is divided by the same method as described above, yielding the decoded data "00101000".
Then, 1 is added to the record count, 1 is added to the frequency of each column in which the decoded data "00101000" contains "0", and the occurrence probabilities are recalculated; the result is the table 60-9 shown in fig. 31A.
Here, the data "10101" of the 9th record is read in. When the received data "10101" is interpreted as a binary fraction, the encoded value 0.65625 is obtained. The column values are determined from this value and the interval is divided by the same method as described above, yielding the decoded data "00110000".
Then, 1 is added to the record count, 1 is added to the frequency of each column in which the decoded data "00110000" contains "0", and the occurrence probabilities are recalculated; the result is the table 60-10 shown in fig. 31B.
Here, the data "101111" of the 10th record is read in. When the received data "101111" is interpreted as a binary fraction, the encoded value 0.734375 is obtained. The column values are determined from this value and the interval is divided by the same method as described above, yielding the decoded data "00111100".
Fig. 32 shows the hardware environment of an exemplary computer that executes a program implementing the present embodiment.
The illustrated computer 60 includes, for example, a CPU 50, a ROM 51, a RAM 52, a network interface 53, a storage device 56, a read/write drive 57, and an input/output device 59, which are interconnected by a bus 55.
The CPU 50 executes the program implementing the present embodiment. The program is recorded in the storage device 56 or on the portable recording medium 58, and is executed by the CPU 50 after being loaded from these media into the RAM 52.
The storage device 56 is, for example, a hard disk. The portable recording medium 58 includes a magnetic disk such as a floppy disk, an optical disk such as a CD-ROM, DVD, or Blu-ray disc, a semiconductor memory such as an IC memory, and the like; the portable recording medium 58 is inserted into the read/write drive 57, which reads from and writes to it. In the present embodiment, not only may the program implementing the present embodiment be recorded in the storage device 56 or on the portable recording medium 58, but the input fixed-length data to be encoded may also be temporarily recorded on these media and then read out to the RAM 52 and encoded.
The ROM 51 stores basic programs, such as a BIOS, for operating the bus 55, the network interface 53, and the input/output device 59. The CPU 50 executes these basic programs to realize the basic functions of the illustrated computer 60.
The input/output device 59 receives information input from a user using the illustrated computer 60, and outputs information to the user. The input-output device 59 includes, for example, a keyboard, a mouse, a touch panel, a display, a printer, and the like.
The network interface 53 is used by the illustrated computer 60 to communicate with other computers, network devices, and the like via the network 54. In the present embodiment, the program implementing the present embodiment may be recorded in the storage device 56 or on the portable recording medium 58 via the network 54. The program of the present embodiment may also be executed on another computer or network device connected to the network 54, with the input/output data transmitted and received via the network 54. Furthermore, the fixed-length data to be encoded may be transmitted from a terminal having a sensor connected to the network 54.
The network 54 may be any network, wired or wireless, that enables communication between computers or between a computer and a network device. In one example, the network 54 may include the Internet, a LAN (Local Area Network), a WAN (Wide Area Network), a fixed telephone network, a mobile telephone network, an ad hoc network, a VPN (Virtual Private Network), a sensor network, and the like.
As described above, in the present embodiment according to one aspect of the present invention, when the fixed-length bit string of each item of fixed-length data is composed of data with different meanings described in a plurality of specified fields, and the data described in the field at the same position of each item of fixed-length data is of the same type, the fixed-length bit string is divided into columns of an arbitrary number of bits and the columns are successively encoded in the column direction independently of one another, whereby compression encoding with a higher compression rate than conventional encoding methods can be achieved.
As an example of the improved compression ratio, the present inventors were able to compress original data of 70,016 bytes (560,128 bits) to 13,532 bytes (94,000 bits excluding the padding bits) using the compression encoding device of the present embodiment. gzip compresses the same data to 14,464 bytes (115,712 bits) and bzip2 to 12,985 bytes (103,880 bits), from which the effectiveness of the compression encoding method of the present embodiment can be understood.
The encoding device of the present embodiment may also be implemented in hardware such as an FPGA (Field Programmable Gate Array).
For example, the encoding device of the present embodiment may be implemented partly in hardware and partly in software, combining the two.
The above embodiments can be realized independently of each other or in combination with each other.
Among the above-described embodiments, in those using the adaptive coding method, compression coding can be performed successively without temporarily accumulating data, so encoding can be performed in real time. When the above embodiment is applied to real-time encoding, the sequentially input records are virtually treated as table data of a predetermined number of records and compressed in the column direction.
Description of the reference numerals
1: sensor network
2: sensor
3: gateway (GW)
4: processing device
10, 10a, 16: dividing unit
11-1 to 11-m, 11a-1 to 11a-m: column 1-m registers
12-1 to 12-m: coding units of columns 1-m
12a-1 to 12a-m, 20-1 to 20-m: column 1-m column division range determination unit
13, 17, 17a: mixing unit
14-1 to 14-m, 14a-1 to 14a-m: column 1-m decoding unit
15-1 to 15-m: list 1-m frequency table, coding table
18-1 to 18-m, 21-1 to 21-m: section dividing unit
19: coding unit
50:CPU
51:ROM
52:RAM
53: network interface
54: network system
55: bus line
56: storage device
57: read-write driver
58: portable recording medium
59: input/output device

Claims (9)

1. A data compression encoding method in which records are temporarily accumulated up to a predetermined number of 2 or more and the accumulated records are compression-encoded, each record being composed of a fixed-length bit string including 1 or more fields in which data of the same attribute among a plurality of data sequentially transmitted from a transmission source is described, the data compression encoding method comprising the steps of:
a dividing step of dividing the predetermined number, 2 or more, of records into columns of a predetermined bit width without regard to the boundaries of the fields;
a code table generation step of obtaining, for each column, the occurrence probability of a bit value in a column at the same position among the predetermined number of records of 2 or more, and creating a code table for an entropy coding method for each column based on the occurrence probability;
an encoding step of encoding each column constituting each record of the predetermined number of records of the 2 or more records by using an encoding table created for each column; and
an output step of outputting encoded data obtained by combining the encoded columns for each record,
and repeating the encoding step and the outputting step according to the predetermined number of records.
2. A storage medium storing a program, wherein,
the program causes a computer to execute the data compression encoding method according to claim 1.
3. A data compression encoding device that temporarily accumulates records up to a predetermined number of 2 or more and compression-encodes the accumulated records, each record being composed of a fixed-length bit string including 1 or more fields in which data of the same attribute among a plurality of data sequentially transmitted from a transmission source is described,
the data compression encoding device is characterized by comprising:
a dividing unit that divides the predetermined number, 2 or more, of records into columns of a predetermined bit width without regard to the boundaries of the fields;
a code table generation unit that obtains, for each column, the occurrence probability of a bit value in a column at the same position among the 2 or more predetermined number of records, and creates a code table for an entropy encoding method for each column based on the occurrence probability;
an encoding unit that encodes each column constituting each record of the predetermined number, 2 or more, of records by the encoding table created for each column; and
an output unit that outputs encoded data obtained by combining the encoded columns for each record,
and repeating the processing based on the encoding means and the output means in accordance with the predetermined number of records.
4. A data compression encoding method for compression-encoding and outputting a record composed of a fixed-length bit string made up of 1 or more fields, each field describing data of the same attribute among data sequentially transmitted from a transmission source,
the data compression encoding method is characterized by comprising the following steps:
a dividing step of dividing the record into columns of a predetermined bit width, without regard to field boundaries;
an encoding step of obtaining, for a record input at the current time, for each column, the occurrence probability of bit values in the columns at the same position in the records input before that time, and encoding each column constituting the record by an adaptive entropy encoding method based on the occurrence probability; and
an output step of outputting, in real time, encoded data obtained by combining the encoded columns.
5. The data compression encoding method according to claim 4, wherein,
the bit width of the column is 1, and in the encoding step, the record is encoded by an arithmetic encoding method based on the occurrence probability.
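The adaptive variant of claims 4 and 5 (column width of 1 bit, with the probability of each bit position estimated from the records already input, driving an arithmetic coder) might be sketched as below. This is a toy illustration under stated assumptions: it uses floating-point interval narrowing rather than the integer renormalization a practical arithmetic coder would use, updates the model once per record, and all names are hypothetical.

```python
def adaptive_encode(records):
    """records: equal-length bit strings arriving in sequence.
    Maintains per-column counts of '1' bits (Laplace-smoothed) and narrows
    an arithmetic-coding interval bit by bit; returns a number inside the
    final interval, which identifies the encoded sequence."""
    nbits = len(records[0])
    ones = [1] * nbits          # smoothed count of '1' per column position
    total = 2                   # per-column observation count (starts as 1 zero + 1 one)
    low, high = 0.0, 1.0
    for rec in records:
        for i, bit in enumerate(rec):
            p1 = ones[i] / total            # adaptive P(bit == '1') for column i
            span = high - low
            if bit == "1":
                low = high - span * p1      # '1' takes the upper sub-interval
            else:
                high = low + span * (1 - p1)
        # Update the model only after encoding the record, so a decoder
        # that mirrors these updates stays in sync.
        for i, bit in enumerate(rec):
            ones[i] += bit == "1"
        total += 1
    return (low + high) / 2     # any value in [low, high) would do
```

Each record is encoded against statistics gathered only from earlier records, so the coder can emit output in real time without first accumulating a block; the decoder reproduces the same probability updates as it decodes.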
6. A storage medium storing a program, wherein,
the program causes a computer to execute the data compression encoding method according to claim 4.
7. A data compression encoding device for compression-encoding and outputting a record composed of a fixed-length bit string made up of 1 or more fields, each field describing data of the same attribute among data sequentially transmitted from a transmission source,
the data compression encoding device is characterized by comprising:
a dividing unit that divides the record into columns of a predetermined bit width, without regard to field boundaries;
an encoding unit that obtains, for a record input at the current time, for each column, the occurrence probability of bit values in the columns at the same position in the records input before that time, and encodes each column constituting the record by an adaptive entropy encoding method based on the occurrence probability; and
an output unit that outputs, in real time, encoded data obtained by combining the encoded columns.
8. The data compression encoding apparatus according to claim 7, wherein,
the bit width of the column is 1, and the encoding unit encodes the record by an arithmetic encoding method based on the occurrence probability.
9. A storage medium storing a program that causes a computer to execute the data compression encoding method according to claim 5.
CN201780045701.9A 2016-07-25 2017-07-18 Data compression encoding method, apparatus and storage medium Active CN109478893B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2016145397A JP6336524B2 (en) 2016-07-25 2016-07-25 Data compression encoding method, apparatus thereof, and program thereof
JP2016-145397 2016-07-25
PCT/JP2017/025955 WO2018021094A1 (en) 2016-07-25 2017-07-18 Data compression coding method, decoding method, device therefor, and program therefor

Publications (2)

Publication Number Publication Date
CN109478893A CN109478893A (en) 2019-03-15
CN109478893B true CN109478893B (en) 2023-05-09

Family ID=61015968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780045701.9A Active CN109478893B (en) 2016-07-25 2017-07-18 Data compression encoding method, apparatus and storage medium

Country Status (5)

Country Link
US (1) US10547324B2 (en)
EP (2) EP3490153B1 (en)
JP (1) JP6336524B2 (en)
CN (1) CN109478893B (en)
WO (1) WO2018021094A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102552833B1 (en) * 2018-05-28 2023-07-06 삼성에스디에스 주식회사 Data processing method based-on entropy value of data
JP7047651B2 (en) * 2018-07-30 2022-04-05 富士通株式会社 Information processing equipment, distributed processing systems, and distributed processing programs
EP3817236A1 (en) * 2019-11-04 2021-05-05 Samsung Electronics Co., Ltd. Neural network data processing method and apparatus
CN111181568A (en) * 2020-01-10 2020-05-19 深圳花果公社商业服务有限公司 Data compression device and method, data decompression device and method
US20230214367A1 (en) * 2022-01-05 2023-07-06 AVAST Software s.r.o. System and method for data compression and decompression
CN115441878A (en) * 2022-08-05 2022-12-06 海飞科(南京)信息技术有限公司 FSE code table rapid establishing method for text compression
CN115078892B (en) * 2022-08-19 2022-11-01 深圳天川电气技术有限公司 State remote monitoring system for single-machine large-transmission frequency converter
DE102022003682A1 (en) * 2022-10-05 2024-04-11 Mercedes-Benz Group AG Method for compressing and decompressing log files and information technology system
CN115658628B (en) * 2022-12-19 2023-03-21 武汉惠强新能源材料科技有限公司 Production data intelligent management method for MES system

Citations (2)

Publication number Priority date Publication date Assignee Title
JP2004349939A (en) * 2003-05-21 2004-12-09 Canon Inc Method and device for image encoding and recording device
US20090254521A1 (en) * 2008-04-04 2009-10-08 International Business Machines Corporation Frequency partitioning: entropy compression with fixed size fields

Family Cites Families (26)

Publication number Priority date Publication date Assignee Title
JP2587134B2 (en) * 1990-12-17 1997-03-05 日本電信電話株式会社 Subband coding method
JP2000305822A (en) * 1999-04-26 2000-11-02 Denso Corp Device for database management and device for database extraction, and method for database management and method for database extraction
US6748520B1 (en) * 2000-05-02 2004-06-08 3Com Corporation System and method for compressing and decompressing a binary code image
JP2006100973A (en) * 2004-09-28 2006-04-13 Nomura Research Institute Ltd Data compression apparatus and data expansion apparatus
JP4434155B2 (en) * 2006-02-08 2010-03-17 ソニー株式会社 Encoding method, encoding program, and encoding apparatus
JP4846381B2 (en) 2006-02-08 2011-12-28 富士通セミコンダクター株式会社 BAND ALLOCATION METHOD, COMMUNICATION CONTROL DEVICE, AND COMMUNICATION DEVICE
JP2007214998A (en) 2006-02-10 2007-08-23 Fuji Xerox Co Ltd Coding apparatus, decoding apparatus, coding method, decoding method, and program
JP4688690B2 (en) * 2006-02-15 2011-05-25 日立造船株式会社 State change detection method and state change detection apparatus in plant equipment
US20090006399A1 (en) * 2007-06-29 2009-01-01 International Business Machines Corporation Compression method for relational tables based on combined column and row coding
US7609179B2 (en) * 2008-01-08 2009-10-27 International Business Machines Corporation Method for compressed data with reduced dictionary sizes by coding value prefixes
US7683809B2 (en) * 2008-04-11 2010-03-23 Aceurity, Inc. Advanced lossless bit coding
JP5169495B2 (en) * 2008-05-30 2013-03-27 東洋製罐株式会社 Compression molding die and compression molding apparatus
JP5303213B2 (en) * 2008-07-23 2013-10-02 株式会社日立製作所 Data management method with data compression processing
US8108361B2 (en) * 2008-07-31 2012-01-31 Microsoft Corporation Efficient column based data encoding for large-scale data storage
JP5180782B2 (en) * 2008-11-11 2013-04-10 日本電信電話株式会社 Parallel distributed information source encoding system and parallel distributed information source encoding / decoding method
JP2011048514A (en) * 2009-08-26 2011-03-10 Panasonic Electric Works Co Ltd Data management device and authentication system
US8487791B2 (en) 2010-02-18 2013-07-16 Research In Motion Limited Parallel entropy coding and decoding methods and devices
EP2362657B1 (en) * 2010-02-18 2013-04-24 Research In Motion Limited Parallel entropy coding and decoding methods and devices
WO2012040857A1 (en) * 2010-10-01 2012-04-05 Research In Motion Limited Methods and devices for parallel encoding and decoding using a bitstream structured for reduced delay
ES2607982T3 (en) 2011-01-14 2017-04-05 Ge Video Compression, Llc Entropic encoding and decoding scheme
US10816579B2 (en) * 2012-03-13 2020-10-27 Informetis Corporation Sensor, sensor signal processor, and power line signal encoder
JP5826114B2 (en) * 2012-05-25 2015-12-02 クラリオン株式会社 Data decompression device, data compression device, data decompression program, data compression program, and compressed data distribution system
US8933829B2 (en) * 2013-09-23 2015-01-13 International Business Machines Corporation Data compression using dictionary encoding
US10235377B2 (en) * 2013-12-23 2019-03-19 Sap Se Adaptive dictionary compression/decompression for column-store databases
CN104156990B (en) * 2014-07-03 2018-02-27 华南理工大学 A kind of lossless compression-encoding method and system for supporting super-huge data window
CN104462524A (en) * 2014-12-24 2015-03-25 福建江夏学院 Data compression storage method for Internet of Things


Also Published As

Publication number Publication date
JP6336524B2 (en) 2018-06-06
EP3490153A1 (en) 2019-05-29
US20190140657A1 (en) 2019-05-09
EP3490153B1 (en) 2023-11-01
WO2018021094A1 (en) 2018-02-01
EP3490153A4 (en) 2020-03-11
CN109478893A (en) 2019-03-15
US10547324B2 (en) 2020-01-28
JP2018022933A (en) 2018-02-08
EP3771104A1 (en) 2021-01-27

Similar Documents

Publication Publication Date Title
CN109478893B (en) Data compression encoding method, apparatus and storage medium
US6061398A (en) Method of and apparatus for compressing and restoring data
KR100808664B1 (en) Parity check matrix storing method, block ldpc coding method and the apparatus using parity check matrix storing method
US8094048B2 (en) Method of decoding syntax element in context-based adaptive binary arithmetic coding decoder and decoding device therefor
US7786907B2 (en) Combinatorial coding/decoding with specified occurrences for electrical computers and digital data processing systems
CN102098508A (en) Multimedia signature coding and decoding
JP5656593B2 (en) Apparatus and method for decoding encoded data
CN104468044A (en) Data compression method and device applied to network transmission
US7786903B2 (en) Combinatorial coding/decoding with specified occurrences for electrical computers and digital data processing systems
KR101617965B1 (en) Encoder of systematic polar codes
CN112332854A (en) Hardware implementation method and device of Huffman coding and storage medium
JPH11340838A (en) Coder and decoder
CN112804029A (en) Transmission method, device and equipment of BATS code based on LDPC code and readable storage medium
JP2018074604A (en) Data compression encoding method, decoding method, device thereof, and program thereof
JP6336636B2 (en) Data compression encoding method, apparatus thereof, and program thereof
CN115765756A (en) Lossless data compression method, system and device for high-speed transparent transmission
JP2010258532A (en) Circuit and method for converting bit length into code
KR20050010918A (en) A method and a system for variable-length decoding, and a device for the localization of codewords
JP2014220713A (en) Coding apparatus, decoding apparatus, method, and program
JP2007336056A (en) Encoding device, encoding method and program
CN113659992B (en) Data compression method and device and storage medium
Wei et al. Efficient VLSI Huffman encoder implementation and its application in high rate serial data encoding
Fong et al. Using a tree algorithm to determine the average synchronisation delay of self-synchronising T-codes
WO2020075277A1 (en) Decoding device, decoding method, and non-transitory computer-readable medium storing program
Pannirselvam et al. A Comparative Analysis on Different Techniques in Text Compression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant