CN109478893B - Data compression encoding method, apparatus and storage medium - Google Patents


Info

Publication number
CN109478893B
CN109478893B (application CN201780045701.9A)
Authority
CN
China
Prior art keywords
data
column
record
encoding
encoded
Prior art date
Legal status
Active
Application number
CN201780045701.9A
Other languages
Chinese (zh)
Other versions
CN109478893A (en)
Inventor
铃木隆之
柴田薰
Current Assignee
Expressway Co ltd
Denso Corp
Original Assignee
Expressway Co ltd
Denso Corp
Priority date
Filing date
Publication date
Application filed by Expressway Co ltd, Denso Corp
Publication of CN109478893A
Application granted
Publication of CN109478893B

Classifications

    • H — ELECTRICITY
    • H03 — ELECTRONIC CIRCUITRY
    • H03M — CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 — Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 — Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3068 — Precoding preceding compression, e.g. Burrows-Wheeler transformation
    • H03M7/3077 — Sorting
    • H03M7/3084 — Compression using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/40 — Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/4006 — Conversion to or from arithmetic code
    • H03M7/60 — General implementation details not specific to a particular type of compression
    • H03M7/6035 — Handling of unknown probabilities
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 — Adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 — Incoming video signal characteristics or properties
    • H04N19/137 — Motion inside a coding unit, e.g. average field, frame or block difference

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Provided are a compression encoding method, device, and program suitable for continuously encoding fixed-length data. The compression encoding method comprises the following steps: dividing a record into columns of a predetermined bit width, the record being composed of a fixed-length bit string including one or more fields, in which the same kind of data is described at the same field position in every record; and determining, across a plurality of records, the occurrence probability of the bit values at the same position in each column, and encoding the plurality of records by an entropy encoding method according to those occurrence probabilities.

Description

Data compression encoding method, apparatus and storage medium
Technical Field
The present embodiment described below relates to a data compression encoding method, an apparatus thereof, and a program thereof.
Background
In recent years, sensor networks have been conceived in which a plurality of sensor-equipped wireless terminals are dispersed in a space and cooperate to acquire environmental and physical conditions. In addition, with the advance of electronic control in automobiles, various in-vehicle sensor networks have been put into practical use.
Fig. 1 is a schematic diagram of these sensor networks. For example, in the sensor network 1, data detected by the sensor 2a and the like is transmitted to the processing device 4 via the sensor node 5 and the gateway 3. When the data acquired by the sensors 2a, 2b, and 2c is transmitted to the processing device 4, the transmitted data tends to have a fixed size. In the example of fig. 1, the data compression device is located at the sensor node.
A data sequence in which data of predetermined size, such as the environmental states detected by each sensor, is arranged in a specific order is referred to as a record. In this case, one record is fixed-length data composed of a bit string of fixed length. In a sensor network, data such as the environmental state detected by each sensor at each point in time is continuously output as records. Here, the sensors include a temperature sensor, a humidity sensor, a pressure sensor, a rotational speed sensor, a wind speed sensor, a flow rate sensor, an acceleration sensor, a speed sensor, a position sensor, a sensor for detecting on/off information of a switch, and the like.
Fig. 2 is a diagram illustrating an example of the fixed-length data.
In the example shown in fig. 2, the detection information of the sensor 2a is a rotation-pulse count, and the detection information of the sensors 2b and 2c is the on/off information of the corresponding switches.
The bit length of the fixed-length data transmitted and received through the sensor network 1 is set to a fixed value. The fixed-length bit data may be internally divided into fields, in which the type of data to be described is specified for each predetermined number of bits. For example, fig. 2 (a) shows the fixed-length data represented in decimal. In the example of fig. 2 (a), a 26-bit time is described in the initial field of the fixed-length data, and a 14-bit rotation-pulse count, the output of the rotation-pulse sensor 2a, is described in the next field. The following field describes 1-bit data indicating whether the detection information of the sensor 2b is on or off, and the field after that describes 1-bit data indicating whether the detection information of the sensor 2c is on or off. The overall data bit length is a fixed value. The examples of figs. 1 and 2 show three sensors provided on one sensor node 5 of the sensor network 1. However, the types and number of sensors provided on one sensor node are not limited to this; any number (one or more) of sensors of arbitrary types may be provided.
In fig. 2 (b), the fixed-length data represented in decimal in fig. 2 (a) is represented in binary. In this case, the 26-bit time, the 14-bit rotation-pulse count, the 1-bit on/off state of sensor 1, and the 1-bit on/off state of sensor 2 are described from the start. Fig. 2 (c) shows the binary fixed-length data of fig. 2 (b) as a consecutive bit string. Since the meaning of each bit position from the start is determined in advance, a device that receives the fixed-length data can recognize the data described in it by reading the bits sequentially from the start.
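As an illustration (not code from the patent), the field layout of fig. 2 can be sketched in Python; the function names and sample values are hypothetical, but the field widths (26-bit time, 14-bit pulse count, two 1-bit switch states, 42 bits total) follow the example above:

```python
def pack_record(time, pulses, sw1, sw2):
    # Pack the four fields into one fixed-length 42-bit string,
    # in the order described for fig. 2 (b).
    assert 0 <= time < 2**26 and 0 <= pulses < 2**14
    assert sw1 in (0, 1) and sw2 in (0, 1)
    return f"{time:026b}{pulses:014b}{sw1:01b}{sw2:01b}"

def unpack_record(bits):
    # The receiver knows the field layout in advance, so it can
    # recover the fields by reading bits sequentially from the start.
    assert len(bits) == 42
    return (int(bits[:26], 2), int(bits[26:40], 2),
            int(bits[40], 2), int(bits[41], 2))
```

Because every record has the same layout, the same slicing recovers the same kind of data from every record.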
In the examples of figs. 1 and 2, the detection information of the sensors is a rotation-pulse count and switch on/off information, but the sensors of the present embodiment are not limited to these; they may, for example, detect various quantities such as temperature, humidity, position, speed, acceleration, wind speed, flow rate, and pressure.
Furthermore, the transmitted and received data need not be limited to sensor detection information. The present invention can be applied to any data transmitted sequentially from a transmission source.
When such fixed-length records are transmitted continuously, the following method is sometimes used: a certain amount of data is accumulated, its size is reduced by an existing compression technique, it is transmitted, and it is decompressed at the receiving side.
In this case, unless the accumulated amount is fairly large, the compression efficiency is not high; if compression efficiency is prioritized, a delay arises from the accumulation time. Therefore, when timeliness is required, transmission may be performed without compression. Without compression, however, the amount of data transferred is large compared to the compressed case.
As conventional techniques for data compression, there are the techniques disclosed in patent documents 1 to 8 and non-patent document 1, but none of them describes a compression encoding method suitable for encoding fixed-length data.
Prior art literature
Patent literature
Patent document 1: japanese patent laid-open No. 2007-214998
Patent document 2: U.S. patent publication No. 2011/0200104
Patent document 3: japanese patent application laid-open No. 2014-502827
Patent document 4: japanese patent laid-open No. 2010-26884
Patent document 5: japanese patent laid-open No. 2007-214813
Patent document 6: international publication No. 2013/175909
Patent document 7: japanese patent laid-open No. 2007-221280
Patent document 8: japanese patent laid-open No. 2011-481514
Non-patent literature
Non-patent document 1: Lossless Compression Handbook, Academic Press, 2002/8/15, ISBN-10: 01620811, ISBN-13: 978-0126208610
Disclosure of Invention
Problems to be solved by the invention
Accordingly, an object of an embodiment according to one aspect of the present invention is to provide a data compression encoding method, an apparatus thereof, and a program thereof, suitable for encoding fixed-length data and decoding the result.
Means for solving the problems
The data compression encoding of one aspect of the present invention comprises the following steps: dividing a record into columns of a predetermined bit width without regard to field boundaries, the record being composed of a fixed-length bit string including one or more fields in which the same kind of data is described at the same field position in every record; and determining, across a plurality of records, the occurrence probability of the bit values at the same position in each column, and entropy-encoding the plurality of records according to those occurrence probabilities.
In the data compression encoding according to another aspect of the present invention, the sensor data input from one or more sensors are combined into a record composed of a fixed-length bit string, and the record is compression-encoded and output. The following steps are repeated for a predetermined number of records: the record is divided into columns of a predetermined bit width; for each column, the occurrence probability of the bit values at the same position is obtained from the plurality of records input up to that time; the columns constituting the record are encoded by entropy encoding based on those occurrence probabilities; and the encoded columns are combined and output.
That is, a predetermined number of sensor data items, input sequentially from one or more sensors, are combined into fixed-length bit strings, which are regarded as virtual table data, and the virtual table data is compressed in the column direction.
Entropy encoding is an encoding scheme that compresses data by assigning short code lengths to symbols with a high occurrence probability and long code lengths to symbols with a low occurrence probability. Huffman coding, arithmetic coding, and the like are known as typical entropy coding schemes.
In Huffman coding there are variants such as adaptive Huffman coding and canonical Huffman codes, and in arithmetic coding various schemes are known, such as adaptive arithmetic coding, the Q-coder, and the range coder.
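To make the principle concrete, here is a minimal Python sketch (not from the patent; function name hypothetical) of how Huffman coding assigns code lengths from a symbol-frequency table: the two least frequent entries are merged repeatedly, so frequent symbols end up near the root with short codes:

```python
import heapq

def huffman_code_lengths(freq):
    # Build a Huffman tree from a {symbol: frequency} table and return
    # {symbol: code length}. Each heap item carries the depths of the
    # symbols in its subtree; merging two subtrees deepens all of them
    # by one. The integer tiebreaker keeps tuple comparison well-defined.
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, n, merged))
        n += 1
    return heap[0][2]
```

For frequencies {a: 5, b: 2, c: 1, d: 1}, the most frequent symbol a receives a 1-bit code and the rare symbols c and d receive 3-bit codes, which is the code-length assignment the text describes.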
Effects of the invention
According to the embodiments according to one aspect of the present invention, it is possible to provide a data compression encoding method, an apparatus thereof, and a program thereof, which are suitable for use in encoding fixed-length data.
Drawings
Fig. 1 is a schematic diagram schematically illustrating a sensor network.
Fig. 2 is a diagram illustrating an example of fixed-length data.
Fig. 3 is a diagram illustrating column division by the encoding method according to the present embodiment.
Fig. 4A is a diagram showing an example of a functional block configuration of the data compression encoding device according to the present embodiment.
Fig. 4B is a diagram showing another example of the functional block configuration of the data compression encoding device according to the present embodiment.
Fig. 5A is a diagram showing an example of the functional block configuration of a decoding apparatus corresponding to the data compression encoding apparatus shown in fig. 4A.
Fig. 5B is a diagram showing an example of the functional block configuration of a decoding apparatus corresponding to the data compression encoding apparatus shown in fig. 4B.
Fig. 6 is a flowchart for explaining a data compression encoding method according to the present embodiment using an adaptive entropy encoding method.
Fig. 7 is a flowchart for explaining a data compression encoding method according to the present embodiment using a general accumulation-type entropy encoding method.
Fig. 8 is a flowchart illustrating the cumulative Huffman coding method.
Fig. 9 is a flowchart illustrating the cumulative huffman decoding method.
Fig. 10 is a flowchart illustrating an adaptive huffman coding method.
Fig. 11 is a flowchart illustrating an adaptive huffman decoding method.
Fig. 12 is a flowchart illustrating an adaptive arithmetic coding method.
Fig. 13 is a flowchart illustrating an adaptive arithmetic decoding method.
Fig. 14A is a diagram illustrating a record group for explaining the cumulative huffman coding method according to the present embodiment by way of specific example.
Fig. 14B is a diagram illustrating a coding dictionary for explaining the cumulative huffman coding method according to the present embodiment by way of specific example.
Fig. 14C is a diagram for explaining encoded data by the accumulated huffman encoding method according to the present embodiment by way of specific example.
Fig. 15A is a diagram illustrating the adaptive Huffman coding method according to the present embodiment by way of specific example (part 1).
Fig. 15B is a diagram illustrating the adaptive Huffman coding method according to the present embodiment by way of specific example (part 2).
Fig. 16A is a diagram illustrating the adaptive Huffman coding method according to the present embodiment by way of specific example (part 3).
Fig. 16B is a diagram illustrating the adaptive Huffman coding method according to the present embodiment by way of specific example (part 4).
Fig. 17A is a diagram illustrating the adaptive Huffman coding method according to the present embodiment by way of specific example (part 5).
Fig. 17B is a diagram illustrating the adaptive Huffman coding method according to the present embodiment by way of specific example (part 6).
Fig. 18A is a diagram (1) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 18B is a diagram (2) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 19A is a diagram (3) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 19B is a diagram (4) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 20A is a diagram (5) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 20B is a diagram (6) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 21A is a diagram (7) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 21B is a diagram (8) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 22A is a diagram (9) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 22B is a diagram (10) illustrating a specific example of the data compression encoding method according to the present embodiment in which columns are divided in units of 1 bit.
Fig. 23A is a diagram for explaining, by way of specific example, the creation of an encoding dictionary based on a decoding method for decoding encoded data encoded by the cumulative huffman encoding method according to the present embodiment.
Fig. 23B is a diagram illustrating, by way of specific example, decoding of encoded data encoded by the accumulated huffman encoding method of the present embodiment.
Fig. 24A is a diagram (1) for explaining a decoding method for decoding encoded data encoded by the adaptive huffman encoding method according to the present embodiment, by way of specific example.
Fig. 24B is a diagram (2) for explaining a decoding method for decoding encoded data encoded by the adaptive huffman encoding method according to the present embodiment, by way of specific example.
Fig. 25A is a diagram (3) for explaining a decoding method for decoding encoded data encoded by the adaptive huffman encoding method according to the present embodiment, by way of specific example.
Fig. 25B is a diagram (4) illustrating a decoding method for decoding encoded data encoded by the adaptive huffman encoding method according to the present embodiment, by way of specific example.
Fig. 26A is a diagram (5) illustrating a decoding method for decoding encoded data encoded by the adaptive huffman encoding method according to the present embodiment, by way of specific example.
Fig. 26B is a diagram (6) for explaining a decoding method for decoding encoded data encoded by the adaptive huffman encoding method according to the present embodiment, by way of specific example.
Fig. 27A is a diagram (1) illustrating a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 27B is a diagram (2) for explaining a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 28A is a diagram (3) for explaining a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 28B is a diagram (4) for explaining a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 29A is a diagram (5) illustrating a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 29B is a diagram (6) for explaining a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 30A is a diagram (7) illustrating a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 30B is a diagram (8) illustrating a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 31A is a diagram (9) illustrating a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 31B is a diagram (10) illustrating a decoding method for decoding encoded data encoded by the adaptive arithmetic coding method according to the present embodiment, by way of specific example.
Fig. 32 is a diagram of the hardware environment of an exemplary computer that executes a program when the present embodiment is implemented as a program.
Detailed Description
Fig. 3 is a diagram illustrating column division according to the present embodiment.
Fig. 3 shows an example of one record of fixed-length data constituted by a fixed-length bit string. The record is composed of fields whose bit positions and bit widths are predetermined, and data is described in fields 1 to n. In the present embodiment, records are divided into columns of a predetermined bit width. For example, in fig. 3, column 1 consists of bits 1 to a1, column 2 of bits a1+1 to a2, column 3 of bits a2+1 to a3, and so on; column m consists of bits am-1+1 to am. a1 to am may all be the same value or may differ. Columns may be divided according to the positions and bit widths of the fields, or irrespective of the widths and positions of the fields. The bit width of a column may be, for example, 1 bit, 2 bits, 4 bits, 8 bits, 16 bits, or the like.
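The column division just described can be sketched in a few lines of Python (an illustration with hypothetical names, not code from the patent); note that the widths need not align with field boundaries:

```python
def split_into_columns(record_bits, widths):
    # Divide a fixed-length bit string into columns of the given
    # widths, irrespective of field boundaries. The same widths are
    # applied to every record, so the same column always covers the
    # same bit positions.
    assert sum(widths) == len(record_bits)
    columns, pos = [], 0
    for w in widths:
        columns.append(record_bits[pos:pos + w])
        pos += w
    return columns
```

Applying the same `widths` list to every record of a stream yields, for each column index, a sequence of same-position bit patterns that can then be encoded independently.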
Fixed-length data may also be constructed from variable-length data by padding: the fields store the same data, and when no data is recorded in the trailing portion, "0"s are appended to adjust the data length to a fixed value. The method of the present embodiment can be applied in this case as well. As described above, in the present embodiment, a record constituted by a fixed-length bit string consists of data with different meanings described in a plurality of specified fields, and the data described in the field at the same position in each record is of the same kind. By dividing the record into blocks of an arbitrary number of bits, i.e. columns, and encoding the columns sequentially in the column direction independently of each other, compression encoding more effective than conventional encoding methods is achieved. That is, in the present embodiment, one record is encoded by successively encoding each column, for each column at the same position across a plurality of records.
Here, encoding columns independently of each other means that the encoding process does not depend on the data of other columns. A field is a data storage location within the fixed-length data in which one piece of data is stored, and the meaning of the piece of data stored in each field is predetermined. The fixed-length data consists of the data stored in one or more fields. Columns are partitioned over the fixed-length data, but the data stored in a column need not be a meaningful piece of data by itself: if a column is divided so as to span a field boundary, one field may be split across multiple columns, simply cutting the data into pieces. However, the column division is the same across all the fixed-length data, so the same column always indicates the same portion of the data.
Fig. 4A is a diagram showing an example of a functional block configuration of the data compression encoding device according to the present embodiment. As shown in fig. 4A, after the input record is divided into columns by the dividing unit 10, the data of each column is temporarily stored in the column registers 11-1 to 11-m and then compression-encoded individually, column by column, by the column encoding units 12-1 to 12-m. The compression-encoded data of the columns is combined into one data stream by the mixing unit 13 and output as the encoded data of one record.
Although individual encoding units 12-1 to 12-m are provided for each column here, the present invention is not limited to this; the compression encoding may be performed in a time-division manner, with one encoding unit compressing each column in turn. As in the example of fig. 1, the data compression encoding device of the present embodiment is provided, for example, in a sensor node.
The compression encoding method used by the data compression encoding device of fig. 4A may be, for example, an entropy encoding method such as Huffman coding. When an entropy coding method is used, each of the column coding units 12-1 to 12-m stores a frequency table and a coding table, as shown in fig. 4A.
The compression encoding method according to this embodiment is particularly effective when the fixed-length bit string is composed of a plurality of pieces of mutually independent information. Even if the boundaries of the fields containing this independent information are disregarded when dividing the columns, thereby ignoring the correlation between columns, the average data amount after compression encoding can still be reduced.
Fig. 4B is a diagram showing another example of the functional block configuration of the data compression encoding device according to the present embodiment. The example shown in fig. 4B is a case where arithmetic coding is used.
As shown in fig. 4B, in the case of arithmetic encoding, the division unit 10a divides the input record into columns, and the data of each column is stored in the column registers 11a-1 to 11a-m. Then, the column division range determination units 12a-1 to 12a-m calculate the occurrence probability from the frequency of the data values read in each column, and determine, for each column, the values that divide the current interval corresponding to that column. An interval dividing unit then calculates the interval corresponding to the next column from those values and the column value.
That is, when the column division range determination unit 12a-1 of column 1 completes its processing, the interval dividing unit 18-1 divides the interval corresponding to column 2 according to the arithmetic coding method, based on the data of column 1 and the result of processing it. Next, the column division range determination unit 12a-2 of column 2 determines the values dividing the interval of column 2 based on the occurrence probabilities of the data of column 2, and the interval dividing unit 18-2 divides the interval required for the next column 3 based on that result and the data of column 2. The same process is repeated up to column m. Finally, the encoding unit 19 selects from the interval produced by the interval dividing unit 18-m a value with a short binary representation, encodes the input record based on it, and outputs the encoded data.
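The interval narrowing performed column by column can be sketched with exact rationals (an illustrative Python sketch with hypothetical names; a real arithmetic coder would use scaled integers and emit bits incrementally):

```python
from fractions import Fraction

def arithmetic_encode(columns, models):
    # Successively narrow the interval [low, low + width): for column i,
    # models[i][value] gives (cumulative probability of the values
    # ordered below it, probability of the value itself).
    low, width = Fraction(0), Fraction(1)
    for value, model in zip(columns, models):
        cum, p = model[value]
        low = low + width * cum
        width = width * p
    # Any number inside [low, low + width) identifies the column
    # sequence; the encoder picks one with a short binary expansion.
    return low, low + width
```

With a uniform two-symbol model per column, encoding the columns "0" then "1" narrows [0, 1) to [0, 1/2) and then to [1/4, 1/2), matching the step-by-step subdivision described above.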
Fig. 5A is a diagram showing an example of the functional block configuration of a decoding apparatus corresponding to the data compression encoding apparatus shown in fig. 4A.
When encoded data produced by the data compression encoding apparatus of fig. 4A is input, the dividing unit 16 divides the encoded data into columns. The decoding units 14-1 to 14-m then decode the encoded data of each column. In doing so, the decoding units 14-1 to 14-m refer to the frequency tables and encoding tables 15-1 to 15-m provided per column of the pre-encoding data, according to the specific encoding method. For example, when the encoding method is Huffman coding, the encoded data is read sequentially, and the symbols of the decoded data are generated by referring to the frequency table and encoding table provided for each of columns 1 to m.
Then, the decoded data of each column are combined by the mixing unit 17, and the decoded record is output.
Fig. 5B is a diagram showing an example of the functional block configuration of a decoding apparatus corresponding to the data compression encoding apparatus shown in fig. 4B.
In the decoding of arithmetic coding shown in fig. 5B, the encoded record is input to the column division range determination unit 20a-1 of column 1. The column division range determination units 20a-1 to 20a-m calculate the occurrence probability from the frequency of the decoded data values in each column, and calculate the value that divides the current section corresponding to the column. Then, the column 1 to column m decoding units 14a-1 to 14a-m compare the value dividing the current section of each column with the value of the encoded data to obtain the decoded data of the column. Further, the section corresponding to the next column is obtained by the section dividing units from the decoded data and the previously obtained value dividing the current section. The mixing section 17a combines the decoded data of the decoding units 14a-1 to 14a-m and outputs the decoded record.
Fig. 6 is a flowchart for explaining a data compression encoding method according to the present embodiment using an adaptive entropy encoding method. In the adaptive encoding method, compression encoding is performed sequentially as the data is input.
First, in step S10, the frequency table used for entropy encoding is initialized. A frequency table counts how many times each symbol appears in the data to be encoded, and is in itself the same kind of table conventionally used for entropy encoding. The characteristic feature of the present embodiment is that symbols are counted per column, across the columns at the same position of the plurality of records. As initialization, for example, all entries are set to 0.
Next, in the loop of step S11, the process of step S12 is repeated once per column of a record. In step S12, an encoding table is created from the frequency table. In the case of huffman coding, the encoding table is a huffman coding dictionary; in the case of arithmetic coding, it is a table of occurrence probabilities. In either case it is the table actually used to replace the original data with coded information.
When the loop over the columns in step S11 is completed, the flow advances to step S13. In the first pass of step S11, the encoding table is created from the frequency table initialized in step S10.
In step S13, one record is read as a fixed-length bit string. Next, in step S14, the record is divided into columns according to a predetermined method. In step S14a, each column is encoded, and in step S15, the encoded data of the columns are mixed to obtain the compressed encoded data of one record. In step S16, the compressed encoded data of that one record is output. When this output has been completed for all records, the compression encoding of the input data is finished.
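The column splitting of step S14 and the mixing of step S15 can be sketched as follows (a minimal sketch; the two 4-bit columns match the example of fig. 14A, and the function names are illustrative only, not from the patent):

```python
def split_record(record, widths):
    """Divide a fixed-length bit string into columns of the given widths (step S14)."""
    cols, pos = [], 0
    for w in widths:
        cols.append(record[pos:pos + w])
        pos += w
    return cols

def mix_columns(encoded_cols):
    """Concatenate the per-column encoded data into one record's worth (step S15)."""
    return ''.join(encoded_cols)
```

For example, split_record("00101000", [4, 4]) yields the two columns ["0010", "1000"] of the record group in fig. 14A.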
Next, after step S16, the flow proceeds to step S17, and the process of step S18 is repeated once per column. In step S18, the frequency table is updated. The frequency tables are independent per column, so there are as many tables as there are columns. A frequency table is updated without using the data of other columns: as the records are encoded in sequence, the table for a given column is updated based only on the data in the corresponding column of the preceding records.
When the loop of step S17 is completed, the flow returns to step S11, where the encoding tables are created from the per-column frequency tables updated in step S17, and then proceeds to step S13 to encode the next record. When no record remains to be processed, the compression encoding is complete.
In the following, several specific examples corresponding to the respective entropy encoding modes will be described in more detail.
Fig. 7 is a flowchart for explaining a data compression encoding method according to the present embodiment using a general accumulation-type entropy encoding method. In the accumulation-type encoding method, the data to be compression-encoded is first read in its entirety, and only then compression-encoded. That is, all of the data is read once to complete the frequency table, and then the data is read again and encoded.
First, in step S19, the frequency tables are initialized. In the loop of step S20, the processing is repeated once per record for all records of the data to be encoded. In step S21, one record is read, and in step S22, the record is divided into columns by a predetermined method. In the loop of step S23, step S24 is repeated once per column. In step S24, the frequency table provided for the column is updated. When the loop over the columns in step S23 is completed, it is determined whether the loop over the records in step S20 is completed; if not, the loop continues, and if so, the flow proceeds to step S25. On reaching step S25, the update of the frequency tables has been completed for all data to be encoded, so the frequency tables are output and the flow proceeds to step S26.
In step S26, the process of step S27 is repeated once per column. In step S27, an encoding table is created from the frequency table. As before, the encoding table is a huffman coding dictionary in the case of huffman coding, and a table of occurrence probabilities in the case of arithmetic coding; it is the table actually used to replace the original data with coded information. When the loop over the columns in step S26 is completed, the flow advances to step S28.
In step S28, the processing is repeated once per record of the data to be encoded. In step S29, one record is read, and in step S30, the record is divided into columns according to a predetermined method. In step S31, each column is compression-encoded, and in step S32, the compression-encoded data are mixed to obtain the compressed encoded data of one record. In step S33, that one record's worth of data is output. When the loop of step S28 has been repeated for all records, the processing ends.
Here, for example, in the case where the data to be compression-encoded is fixed-length data received from a sensor or the like, the number of records to be compression-encoded depends on how many records are batched together before compression. The amount of data to be batched depends on, among other things, the memory capacity of the encoding apparatus, and should be determined appropriately by those skilled in the art when using the present embodiment. Further, when the data is sent sequentially from a transmission source, the collection and compression encoding of the data are performed repeatedly.
Fig. 8 and 9 are flowcharts illustrating the accumulation-type huffman encoding and decoding methods in more detail.
In the accumulation-type huffman encoding method shown in fig. 8, the frequency tables are initialized in step S40. In the loop of step S41, the processing within step S41 is repeated once per record. In step S42, one record is read, and in step S43, the record is divided into columns according to a predetermined method. In the loop of step S44, step S45 is repeated once per column. In step S45, the frequency table of the column is updated. When the frequency tables of all columns have been updated, the frequency tables are output in step S46, and the flow advances to the loop of step S47.
In the loop of step S47, the process of step S48 is repeated once per column. In step S48, an encoding table is created from the frequency table.
Next, in the loop of step S49, the processing within step S49 is repeated once per record. In step S50, one record is read. In step S51, the record is divided into columns according to a prescribed method. In the loop of step S52, the process of step S53 is repeated once per column. In step S53, the column data is encoded. Next, in step S54, the encoded data obtained in the loop of step S52 are mixed into one record's worth. In step S55, that one record's worth of data is output. When the processing has been repeated for all records, the processing ends.
In the accumulation-type huffman decoding method shown in fig. 9, the frequency tables are read in step S60. In the loop of step S61, step S62 is repeated once per column. In step S62, an encoding table is created from the frequency table. In the loop of step S63, the processing within step S63 is repeated once per record. In step S64, one record's worth of encoded data is read. In the loop of step S65, step S66 is repeated once per column. In step S66, the column data is decoded according to the encoding table created in step S62. In step S67, the decoded data of the columns are mixed into one record. In step S68, that one record's worth of data is output. When the processing has been repeated for all records, the processing ends.
Fig. 10 and 11 are flowcharts for explaining the adaptive huffman coding and decoding method.
In the adaptive huffman encoding method shown in fig. 10, the frequency tables are initialized in step S70. In the loop of step S71, the process of step S72 is repeated once per column. In step S72, an encoding table is created from the frequency table initialized in step S70 on the first pass, and thereafter from the frequency table updated in step S80. In step S73, one record is read. In step S74, the record is divided into columns according to a prescribed method. In the loop of step S75, the process of step S76 is repeated once per column. In step S76, the column data is encoded according to the encoding table created in step S72. In step S77, the encoded data of the columns are mixed into one record's worth. In step S78, that one record's worth of data is output. In the loop of step S79, the process of step S80 is repeated once per column. In step S80, the frequency table of each column is updated. When the loop over the columns is completed, the flow returns to step S71 to create the encoding tables, and the processing of the subsequent record from step S73 onward is repeated.
The adaptive huffman decoding method shown in fig. 11 decodes data encoded by the adaptive huffman encoding method shown in fig. 10. The encoded data is decoded by looking up the encoding table used for encoding in reverse, obtaining the original column data from the encoded data. Accordingly, in the flow shown in fig. 11, the step of encoding column data and the step of mixing encoded data in the flow of fig. 10 are replaced with a step of decoding column data and a step of mixing decoded data, the step of reading one record is replaced with a step of reading one record's worth of encoded data, and the step of outputting encoded data is replaced with a step of outputting the decoded record.
As shown in fig. 11, in step S85, the frequency tables are initialized. In the loop of step S86, the process of step S87 is repeated once per column. In step S87, an encoding table is created from the frequency table initialized in step S85 on the first pass, and thereafter from the frequency table updated in step S94. In step S88, one record's worth of encoded data is read. In the loop of step S89, the process of step S90 is repeated once per column. In step S90, the column data is decoded based on the encoding table created in step S87. In step S91, the decoded data of the columns are mixed into one record's worth. In step S92, that one record's worth of data is output. In the loop of step S93, the process of step S94 is repeated once per column. In step S94, the frequency table of each column is updated. When the loop over the columns is completed, the flow returns to step S86, the encoding tables are created, and the processing of the subsequent record from step S88 onward is repeated.
Fig. 12 and 13 are flowcharts for explaining the adaptive arithmetic encoding and decoding methods. Corresponding to the functional block configurations described previously with reference to fig. 4B and 5B, the adaptive arithmetic encoding device and decoding device can be realized on a computer by a program that executes the algorithms shown in these flowcharts.
In the adaptive arithmetic encoding method shown in fig. 12, the frequency tables are initialized in step S95. In the loop of step S96, the process of step S97 is repeated once per column. In step S97, an occurrence probability table is created from the frequency table initialized in step S95 on the first pass, and thereafter from the frequency table updated in step S106. In step S98, one record is read. In step S99, the record is divided into columns according to a prescribed method. In step S100, the section is initialized. In the loop of step S101, the process of step S102 is repeated once per column. In step S102, the section is divided according to the arithmetic coding method. In step S103, encoded data is generated from the section finally obtained in the loop of step S101. In step S104, the encoded data is output as one record's worth of encoded data. In the loop of step S105, the process of step S106 is repeated once per column. In step S106, the frequency table is updated. When the loop over the columns is completed, the flow returns to step S96, the occurrence probability tables are created, and the processing of the subsequent record from step S98 onward is repeated.
The adaptive arithmetic decoding method shown in fig. 13 decodes data encoded by the adaptive arithmetic encoding method shown in fig. 12.
As shown in fig. 13, in step S110, the frequency tables are initialized. In the loop of step S111, the process of step S112 is repeated once per column. In step S112, an occurrence probability table is created from the frequency table. In step S113, one record's worth of encoded data is read. In step S114, the section is initialized. In the loop of step S115, the processing of steps S116a, S116 and S117 is repeated once per column. In step S116a, the occurrence probability is calculated from the frequency of the decoded data values in the column, and the value dividing the current section corresponding to the column is obtained. In step S116, that value is compared with the value of the encoded data to obtain the decoded data of the column. In step S117, the section corresponding to the next column is obtained from the decoded data obtained in step S116 and the value obtained in step S116a for dividing the current section. In step S118, the column decoded data obtained in step S116 are mixed into one record's worth. In step S119, that one record's worth of data is output. In the loop of step S120, the process of step S121 is repeated once per column. In step S121, the frequency table of each column is updated. When the loop over the columns is completed, the flow returns to step S111, the occurrence probability tables are created, and the processing of the subsequent record from step S113 onward is repeated.
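For the bit-unit case worked through later with reference to fig. 18A to 20A, steps S113 to S117 can be sketched as follows. This is a sketch under stated assumptions, not the patent's literal implementation: "0" is assumed to occupy the lower sub-interval of each section, the per-bit frequency tables start at 1 with a total record count of 2 (laplace smoothing), and the five input codes are the ones derived in that worked example.

```python
from fractions import Fraction

def decode_record(code, freq0, total, n_bits=8):
    """Adaptive binary arithmetic decoding of one record (steps S113-S117),
    with one 1-bit column per bit position."""
    v = Fraction(int(code, 2), 2 ** len(code))   # code read as a binary fraction
    low, width = Fraction(0), Fraction(1)
    bits = []
    for i in range(n_bits):
        p0 = Fraction(freq0[i], total)           # step S116a: value dividing the section
        threshold = low + width * p0
        if v < threshold:                        # step S116: compare with the coded value
            bits.append('0')
            width *= p0                          # step S117: section for the next column
        else:
            bits.append('1')
            low = threshold
            width *= 1 - p0
    for i, b in enumerate(bits):                 # steps S120/S121: update frequency tables
        if b == '0':
            freq0[i] += 1
    return ''.join(bits)

freq0, total = [1] * 8, 2
records = []
for code in ["00101", "01", "01", "1", "111"]:   # codes from the worked example
    records.append(decode_record(code, freq0, total))
    total += 1
```

Exact rational arithmetic (Fraction) is used so the interval comparisons are free of floating-point rounding; a production decoder would instead use the usual fixed-precision renormalization.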
While the data compression encoding methods and decoding methods of the present embodiment have been described above with reference to fig. 6 to 13, the data compression encoding apparatus and decoding apparatus of the present embodiment can be implemented on a computer by a program that executes the algorithms shown in the flowcharts of these drawings.
Next, data compression encoding/decoding according to the present embodiment will be described with reference to specific examples of records.
Fig. 14A to 22B show processing examples of the data compression encoding method according to the present embodiment.
Fig. 14A to 14C are diagrams for explaining the accumulation-type huffman encoding method according to the present embodiment by way of a specific example. In the example shown in fig. 14A to 14C, 10 records are accumulated and then compression-encoded together.
Fig. 14A illustrates a record group 20 made up of 10 records of fixed length 8 bits. Each record is divided into, for example, column 1 and column 2, each with a bit width of 4 bits. The record group 20 is also used as the record group to be encoded in the descriptions of the other encoding examples below.
Fig. 14B illustrates an example of the encoding dictionary 25 in the case of using huffman coding. For the conventional huffman coding method, non-patent document 1 can be referred to. In the present embodiment, an encoding dictionary 25 is provided separately for each column, and the same dictionary is used within the same column. In the case of fig. 14A to 14C, one record is divided into 2 columns, and therefore 2 encoding dictionaries are provided.
In fig. 14B, reference code 21 shows the data values that may appear in each column. Since one column consists of 4 bits, there are 16 possible arrangements of 0 and 1. To cover all of these bit combinations, the encoding dictionary 25 consists of 16 rows.
The data shown by reference code 22 is obtained by counting the number of occurrences of each bit pattern in the record group 20. The occurrence probability of each value, computed from the number of occurrences, is shown by reference code 23, and reference code 24 shows the self-information entropy. The occurrence probability 23 is obtained by dividing the number of occurrences 22 by the number of records. For example, in the left-hand one of the encoding dictionaries shown by reference code 25, the number of occurrences of "0010" is 7 and the total number of records is 10, so the occurrence probability 23 is 7/10 = 0.7. Further, writing S for the self-information entropy 24 and p for the occurrence probability 23, S = -log2(p) (logarithm base 2, so S is in bits). Encoding is performed according to the occurrence probability 23 or the self-information entropy 24.
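The two quantities can be computed as follows (a minimal sketch; the base-2 logarithm is assumed so that the self-information comes out in bits, and the function names are illustrative only):

```python
import math

def occurrence_probability(count, n_records):
    """Occurrence probability 23: number of occurrences divided by the number of records."""
    return count / n_records

def self_information(p):
    """Self-information entropy 24: S = -log2(p), in bits."""
    return -math.log2(p)

p = occurrence_probability(7, 10)   # "0010" appears 7 times in 10 records
s = self_information(p)             # roughly 0.51 bits
```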
The data indicated by reference code 27 is the encoded data of each column obtained by the above encoding. By concatenating the huffman codes, the coded data obtained by compression-encoding the record is obtained. The data shown by reference code 26 in fig. 14C is the encoded data corresponding to each record of the record group 20. Comparing the record group 20 with the encoded data 26, the data amount is clearly reduced; however, in this method the encoding dictionary used for compression must be referred to at decoding time, so the frequency table of reference code 22 (or the encoding dictionary of reference code 25) must be transmitted and received separately. The accumulation type illustrated in fig. 14A to 14C is therefore suited to compression-encoding records batched together to some extent.
In the descriptions of fig. 6 and 7, the frequency table and the encoding table are independent tables, but the example of fig. 14A to 14C adopts a structure in which the frequency table is included in the encoding table.
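A per-column huffman dictionary of the kind shown in fig. 14B can be built as follows. This is a sketch, not the patent's exact dictionary: huffman tie-breaking is implementation-dependent, so only structural properties are checked (the code is prefix-free and the dominant value "0010" receives the shortest code word), and the example counts are a hypothetical column whose dominant value matches the figure.

```python
import heapq
from itertools import count

def huffman_code(freq):
    """Build a huffman coding dictionary {symbol: bit string} from a
    per-column frequency table (one dictionary per column)."""
    tick = count()  # tie-breaker so heap entries are always comparable
    heap = [(f, next(tick), {s: ''}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)
        f1, _, c1 = heapq.heappop(heap)
        merged = {s: '0' + b for s, b in c0.items()}
        merged.update({s: '1' + b for s, b in c1.items()})
        heapq.heappush(heap, (f0 + f1, next(tick), merged))
    return heap[0][2]

# Hypothetical column-1 frequency table: "0010" dominates, as in fig. 14B.
freq = {format(v, '04b'): 1 for v in range(16)}
freq['0010'] = 7
code = huffman_code(freq)
```

Because the dictionary covers all 16 rows of reference code 21, every possible 4-bit column value receives a code word even if it never occurred.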
Fig. 15A to 17B are diagrams for explaining the adaptive huffman encoding method according to the present embodiment by way of a specific example. In the adaptive encoding/decoding method, the occurrence probability or frequency need not be obtained in advance, and encoding can be performed immediately as the record data is generated. Likewise, the encoded information can be decoded immediately.
Fig. 15A shows the encoding table 25 in its initial state, indicated by reference code 30-1, the record group 20, and the encoded data 31-1 of the first record. The input record group 20 is the same as the record group 20 shown in fig. 14A, and the structure of the encoding table 25 is the same as that of the encoding dictionary 25 shown in fig. 14B; identical items are labeled with the same reference codes only in fig. 15A. Laplace smoothing is applied to the frequency table 22 included in the encoding table 25, so that in the initial state all entries have the same value "1". The occurrence probability, the self-information entropy, and the huffman code are computed from these frequencies, and the first record is encoded using that code. As the encoded data 31-1 indicates, the encoding result has the same value as the input record: in the initial state all frequencies are equal, so no compression effect is obtained.
Next, the frequency table is updated based on the first record: the frequency of each entry corresponding to data that appeared is increased by a predetermined value. As shown in fig. 15B, the frequency of "0010" increases by 1 in the left column, and the frequency of "1000" increases by 1 in the right column. Recomputing the occurrence probability and the self-information entropy from this frequency table gives the encoding table 25 shown as 30-2, and the resulting huffman codes are shown in bold in the encoded data 31-2. The encoded data 31-2 shows that a compression effect appears, in contrast to the first record for which no compression effect was obtained.
Next, as shown in fig. 16A, "0010" appears again in the left column and "1000" again in the right column of the 3rd record; therefore, in the encoding table shown as 30-3, the entry for "0010" in the left frequency table and the entry for "1000" in the right frequency table are updated to 3. The result of huffman encoding based on this frequency table is shown in the encoded data 31-3.
In fig. 16B, "0010" appears in the left column of the 4th record and "1100" in the right column; therefore, in the frequency tables of the encoding table shown as 30-4, the entry for "0010" in the left table is updated to 4. In the right table, "1100" appears here for the first time and has not occurred in the preceding records, so its entry remains at the initial value of 1. The result of huffman encoding based on this frequency table is shown in the encoded data 31-4.
In fig. 17A, "1010" appears in the left column of the 5th record and "1000" in the right column; in the frequency tables of the encoding table shown as 30-5, the entry for "1010" in the left table remains at its initial value of 1, while in the right table the entry for "1000" is updated to 4. The result of huffman encoding based on this frequency table is shown in the encoded data 31-5.
In fig. 17B, "0010" appears in the left column of the 6th record and "1000" in the right column; in the frequency tables of the encoding table shown as 30-6, the entry for "0010" in the left table is updated to 5, and the entry for "1000" in the right table is updated to 5. The result of huffman encoding based on this frequency table is shown in the encoded data 31-6.
Encoding proceeds sequentially by repeating this process. Fig. 15A to 17B describe the encoding tables up to the 6th record, but all records can be encoded by updating the frequency tables in the same way and repeatedly recomputing the occurrence probability, the self-information entropy, and the huffman code.
In this way, when the adaptive encoding method is used, no encoding dictionary needs to be transmitted or received, so a compression effect can be obtained even for data with a small number of records.
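The adaptive per-column huffman scheme of fig. 15A to 17B can be sketched as below. The sketch uses the laplace-smoothed frequency tables (all 16 entries start at 1) and updates them after each record, as in the figures; since huffman tie-breaking is implementation-dependent, the exact code words may differ from the figures, so the test checks the encode/decode round trip and the appearance of a compression effect rather than specific bit patterns.

```python
import heapq
from itertools import count

def huffman_code(freq):
    """Huffman dictionary {symbol: bit string} from a frequency table."""
    tick = count()
    heap = [(f, next(tick), {s: ''}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)
        f1, _, c1 = heapq.heappop(heap)
        merged = {s: '0' + b for s, b in c0.items()}
        merged.update({s: '1' + b for s, b in c1.items()})
        heapq.heappush(heap, (f0 + f1, next(tick), merged))
    return heap[0][2]

def adaptive_encode(records, n_cols=2):
    """Encode records column by column, updating each column's
    laplace-smoothed frequency table after every record (fig. 10)."""
    freqs = [{format(v, '04b'): 1 for v in range(16)} for _ in range(n_cols)]
    out = []
    for cols in records:
        tables = [huffman_code(f) for f in freqs]              # step S72
        out.append([tables[i][c] for i, c in enumerate(cols)])  # step S76
        for i, c in enumerate(cols):                            # step S80
            freqs[i][c] += 1
    return out

def adaptive_decode(encoded, n_cols=2):
    """Mirror of adaptive_encode: same tables, reverse lookup (fig. 11)."""
    freqs = [{format(v, '04b'): 1 for v in range(16)} for _ in range(n_cols)]
    out = []
    for codes in encoded:
        cols = []
        for i, c in enumerate(codes):
            inv = {b: s for s, b in huffman_code(freqs[i]).items()}
            cols.append(inv[c])
        out.append(cols)
        for i, c in enumerate(cols):
            freqs[i][c] += 1
    return out

# The first six records of record group 20, already split into columns.
records = [["0010", "1000"], ["0010", "1000"], ["0010", "1000"],
           ["0010", "1100"], ["1010", "1000"], ["0010", "1000"]]
encoded = adaptive_encode(records)
decoded = adaptive_decode(encoded)
```

Because the encoder and decoder rebuild identical tables from the same frequencies, no dictionary is transmitted, which is exactly the advantage noted above.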
Fig. 18A to 22B are diagrams for explaining, by way of a specific example, the data compression encoding method according to the present embodiment with columns of 1-bit units.
With this method, the memory capacity needed to hold the frequency tables during encoding and decoding can be reduced.
When the record is divided into bit units, the encoding can be performed by an arithmetic coding method. Further, since the frequencies are updated while encoding proceeds sequentially in the column direction, the method is an adaptive binary arithmetic coding method. The arithmetic coding method itself can be a conventionally known one; non-patent document 1 can be referred to as necessary.
The input record group 20 is the same data as the record group 20 shown in fig. 14A, but here the columns are 1-bit units.
In table 40-1 shown in fig. 18A, the upper part gives the frequencies and the lower part the corresponding occurrence probabilities; the same applies to fig. 18A to 22B below. Table 40-1 shows the initial state. In principle the frequencies of both data "0" and data "1" are required, but table 40-1 records only the frequency of "0". Instead of recording the frequency of "1", a field for the total number of records 41-1 is provided, and the frequency of "1" can be obtained by subtracting the frequency of "0" from the total number of records. For the initial values, laplace smoothing is again used: the frequency of "0" is set to 1 and the total number of records to 2. The occurrence probability of "0" obtained from these frequencies is shown in the lower part of table 40-1; it is computed from the frequency and the total number of records, and the occurrence probability of "1" is 1 - (occurrence probability of "0").
Arithmetic coding is performed based on these occurrence probabilities. In the present embodiment, an occurrence probability (frequency) that is independent for each column (here, for each bit) is used. The arithmetic coding result of the 1st record is shown in the encoded data 42-1, and the section obtained by the arithmetic coding is written to its right. The result of the arithmetic coding is the fractional part of the binary representation of the value in the section that can be expressed with the fewest digits. In this example, 0.00101 (binary) = 0.15625 (decimal), so the result is "00101". In arithmetic coding generally, decoding is possible even if trailing "0"s of the coding result are omitted, so trailing "0"s are omitted here as usual. Since the column division is in units of bits, the frequency of a bit is unrelated to the other bits within the same record; rather, the frequency of occurrence of the bit value at a given bit position is counted across different records, e.g. the frequency at bit 1 across the records' bit 1, and the frequency at bit 2 across the records' bit 2. The occurrence probability of "0" is therefore obtained by dividing the number of "0"s appearing at the given bit position by the number of records processed, and the occurrence probability of "1" by subtracting that from 1.
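This per-bit adaptive arithmetic coding can be sketched as follows (a sketch under stated assumptions: "0" is taken to occupy the lower sub-interval of the current section, and the shortest binary fraction in the final section is emitted, which inherently has no trailing zeros; the resulting codes for the first five records reproduce those given in the worked example continued below):

```python
import math
from fractions import Fraction

def encode_record(bits, freq0, total):
    """Adaptive binary arithmetic coding of one record, with an independent
    laplace-smoothed frequency of "0" per bit position (one 1-bit column per bit)."""
    low, width = Fraction(0), Fraction(1)
    for i, b in enumerate(bits):
        p0 = Fraction(freq0[i], total)   # occurrence probability of "0" at this position
        if b == '0':
            width *= p0                  # "0" takes the lower sub-interval
        else:
            low += width * p0            # "1" takes the upper sub-interval
            width *= 1 - p0
    high = low + width
    # Shortest binary fraction contained in the section [low, high); a code
    # with a trailing zero cannot occur, since the shorter representation
    # would already have been found at a smaller k.
    k = 1
    while True:
        m = math.ceil(low * 2 ** k)
        if Fraction(m, 2 ** k) < high:
            break
        k += 1
    for i, b in enumerate(bits):         # update the per-position frequency tables
        if b == '0':
            freq0[i] += 1
    return format(m, f'0{k}b')

freq0, total = [1] * 8, 2                # laplace smoothing: freq("0") = 1, total = 2
codes = []
for rec in ["00101000", "00101000", "00101000", "00101100", "10101000"]:
    codes.append(encode_record(rec, freq0, total))
    total += 1
```

Exact Fraction arithmetic keeps the sections free of rounding; a production encoder would use fixed-precision renormalization instead.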
Table 40-2 of fig. 18B shows the frequencies and occurrence probabilities for the 2nd record, updated after the encoding of the 1st record. Since only the frequency of "0" is kept, in table 40-2 the frequency of "0" is increased by 1 only at the positions where "0" appears in the 1st record; the frequencies at the 3rd and 5th bit positions, where "1" appears, keep their initial value. Further, the total number of records 41-2 is increased to 3. The occurrence probabilities obtained from the frequencies and the total number of records are given in the lower part of table 40-2. The result of arithmetic coding based on these probabilities is shown in the 2nd line of the encoded data 42-2, corresponding to the 2nd record; the value of the arithmetically coded section has changed. The binary value with the fewest digits contained in this section is 0.01 (binary) = 0.25 (decimal), so the result of encoding is "01".
Table 40-3 of fig. 19A shows the frequencies and occurrence probabilities for the 3rd record, updated after the encoding of the 2nd record. In table 40-3, the frequency of "0" is increased by 1 only at the positions where "0" appears in the 2nd record, each becoming 3; the frequencies at the 3rd and 5th bit positions, where "1" appears, keep their initial value. Further, the total number of records 41-3 is increased to 4. The occurrence probabilities obtained from the frequencies and the total number of records are shown in the lower part of table 40-3. The result of arithmetic coding based on these probabilities is shown in the 3rd line of the encoded data 42-3, corresponding to the 3rd record; the value of the arithmetically coded section has changed. Since 0.01 (binary) = 0.25 (decimal), the result of encoding is "01".
Table 40-4 of fig. 19B shows the frequencies and occurrence probabilities for the 4th record, updated after the encoding of the 3rd record. In table 40-4, the frequency of "0" is increased by 1 only at the positions where "0" appears in the 3rd record, each becoming 4; the frequencies at the 3rd and 5th bit positions, where "1" appears, keep their initial value. Further, the total number of records 41-4 is increased to 5. The occurrence probabilities obtained from the frequencies and the total number of records are shown in the lower part of table 40-4. The result of arithmetic coding based on these probabilities is shown in the 4th line of the encoded data 42-4, corresponding to the 4th record; the value of the arithmetically coded section has changed. Since 0.1 (binary) = 0.5 (decimal), the result of encoding is "1".
Table 40-5 of fig. 20A shows the frequencies and occurrence probabilities for the 5th record, updated after the encoding of the 4th record. In table 40-5, the frequency of "0" is increased by 1 only at the positions where "0" appears in the 4th record, each becoming 5; the frequencies at the 3rd, 5th and 6th bit positions, where "1" appears in the 4th record, keep their previous values. In addition, the total number of records 41-5 is increased to 6. The occurrence probabilities obtained from the frequencies and the total number of records are shown in the lower part of table 40-5. The result of arithmetic coding based on these probabilities is shown in the 5th line of the encoded data 42-5, corresponding to the 5th record; the value of the arithmetically coded section has changed. Since 0.111 (binary) = 0.875 (decimal), the result of encoding is "111".
The frequency of occurrence and probability of occurrence of the 6 th record updated after the encoding of the 5 th record are shown in table 40-6 of fig. 20B. In Table 40-6, in record 5, the frequency of occurrence of "0" is increased by 1 only at the position. The frequency of the positions of the 1 st, 3 rd, and 5 th bits where "1" appears in the 5 th record is kept at the previous value. In addition, the total recorded number 41-6 is increased to 7. The occurrence probability obtained from the frequency and the total number of records is shown in the lower part of table 40-6. The result of arithmetic coding based on the occurrence probability is shown in association with the 6 th line and 6 th record of the coded data 42-6. It is known that the value of the arithmetic-coded section changes. Since 0.01 (binary) =0.25 (decimal), the result of encoding is "01".
The frequency of occurrence and probability of occurrence of the 7 th record updated after the encoding of the 6 th record are shown in table 40-7 of fig. 21A. In Table 40-7, in record 6, the frequency value is incremented by 1 only at the position where "0" occurs. The frequency of the 3 rd and 5 th bit positions where "1" appears in the 6 th record is kept at the previous value. Further, the total recorded number 41-7 is increased to 8. The occurrence probability obtained from the frequency and the total number of records is shown in the lower part of table 40-7. The result of arithmetic coding based on the occurrence probability is shown in the 7 th line of the coded data 42-7 corresponding to the 7 th record. It is known that the value of the arithmetic-coded section changes. Since 0.01 (binary) =0.25 (decimal), the result of encoding is "01".
The frequency of occurrence and probability of occurrence of the 8 th record updated after the encoding of the 7 th record are shown in table 40-8 of fig. 21B. In Table 40-8, in record 7, the frequency of occurrence of "0" is increased by 1 only at the position. The frequency of the 3 rd and 5 th bit positions where "1" appears in the 7 th record is kept at the previous value. Further, the total recorded number 41-8 is increased to 9. The occurrence probability obtained from the frequency and the total number of records is shown in the lower part of table 40-8. The result of arithmetic coding based on the occurrence probability is shown in association with the 8 th line and 8 th record of the coded data 42-8. It is known that the value of the arithmetic-coded section changes. Since 0.01 (binary) =0.25 (decimal), the result of encoding is "01".
The frequency of occurrence and probability of occurrence of the 9 th record updated after the encoding of the 8 th record are shown in the table 40-9 of fig. 22A. In Table 40-9, in record 8, the frequency value is incremented by 1 only at the position where "0" occurs. The frequency of the positions of the 3 rd bit and the 5 th bit where "1" appears in the 8 th record is kept at the previous value. Further, the total recorded number 41-9 is increased to 10. The occurrence probability obtained from the frequency and the total number of records is shown in the lower part of tables 40 to 9. The result of arithmetic coding based on the occurrence probability is shown in association with the 9 th line and 9 th record of the coded data 42-9. It is known that the value of the arithmetic-coded section changes. Since 0.10101 (binary) = 0.65625 (decimal), the result of encoding is "10101".
The frequency of occurrence and probability of occurrence of the 10 th record updated after the encoding of the 9 th record are shown in the table 40-10 of fig. 22B. In Table 40-10, in record 9, the frequency of occurrence of "0" is increased by 1 only at the position. The frequency of the 3 rd and 4 th bit positions where "1" appears in the 9 th record is kept at the previous value. Further, the total recorded number 41-10 is increased to 11. The occurrence probability obtained from the frequency and the total number of records is shown in the lower part of the table 40-10. The result of arithmetic coding based on the occurrence probability is shown in association with the 10 th record on the 10 th line of the coded data 42-10. It is known that the value of the arithmetic-coded section changes. Since 0.101111 (binary) = 0.734375 (decimal), the result of encoding is "101111".
In this way, encoding proceeds by repeating the frequency update and the arithmetic coding for each record.
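The per-column adaptive arithmetic coding described above can be sketched as follows. This is a minimal Python sketch written for this description, not the patent's implementation; the function and variable names are ours. Each column is 1 bit wide, the frequency of "0" for every column starts at 1 and the record count at 2 (Laplace smoothing, i.e. "0" and "1" each counted once), and the shortest binary fraction lying inside the final interval is emitted:

```python
import math
from fractions import Fraction

def encode_record(bits, freq0, total):
    """Encode one record (a string of '0'/'1') column by column, then
    update the per-column frequencies of "0" and the record count."""
    lo, hi = Fraction(0), Fraction(1)
    for j, b in enumerate(bits):
        p0 = Fraction(freq0[j], total)        # occurrence probability of "0"
        split = lo + (hi - lo) * p0           # division value of the interval
        if b == "0":
            hi = split                        # keep the sub-interval below the split
        else:
            lo = split                        # keep the sub-interval at/above the split
    k = 1                                     # shortest binary fraction in [lo, hi)
    while Fraction(math.ceil(lo * 2**k), 2**k) >= hi:
        k += 1
    code = format(math.ceil(lo * 2**k), "0{}b".format(k))
    for j, b in enumerate(bits):              # adaptive frequency update
        if b == "0":
            freq0[j] += 1
    return code, total + 1

# the ten records of the worked example
records = ["00101000"] * 3 + ["00101100", "10101000"] + \
          ["00101000"] * 3 + ["00110000", "00111100"]
freq0, total = [1] * 8, 2                     # Laplace smoothing: "0" and "1" once each
codes = []
for r in records:
    c, total = encode_record(r, freq0, total)
    codes.append(c)
print(codes)
```

Running this reproduces the per-record codes of the worked example, "00101", "01", "01", "1", "111", "01", "01", "01", "10101", "101111".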
Using arithmetic coding on the bit-by-bit division described above has the following effects.
That is, if the entire record is treated as a single column, the compression is the same as in the prior art; in the example of the present embodiment, however, when the 8-bit record is divided bit by bit, the required frequency table has a size of only 8 + 1 = 9, whereas the prior art requires 256. In addition, since the occurrence probabilities can be calculated from the frequency table, they do not need to be stored separately.
If the record length is assumed to be 32 bits (33 bits in the example of the present embodiment), the prior art would require 2 to the 32nd power = 4,294,967,296 entries; for records of even greater length, treating the record as a single column is impossible in practice. When the record is divided, the method of the example of the present embodiment can obtain a higher compression effect than a conventional compression technique that uses a single dictionary for the whole record.
In addition, dividing in units of 1 bit and compression-encoding in the column direction has the following effects. For example, when the record is divided into multi-bit columns, the substitution information used for encoding must be stored for every bit pattern of the division unit, but with 1-bit columns it suffices to store in advance whether that single bit is "1", so the working memory required at the time of compression encoding is small. Furthermore, with multi-bit columns a symbol must be substituted for each division unit and compression-encoded one record at a time, whereas with 1-bit columns compression encoding can be performed simply from the number of bits in one record and the number of "1" (or "0") bits, so the logic for performing compression encoding is also simplified.
Fig. 23A to 31B show processing examples of a decoding method corresponding to the data compression encoding method of the present embodiment.
Fig. 23A and 23B are diagrams illustrating a decoding method for decoding encoded data encoded by the accumulated Huffman encoding method shown in fig. 14A to 14C.
It is determined in advance that encoded data encoded by the accumulated Huffman encoding method shown in fig. 14A to 14C is to be decoded; that is, it is determined in advance that 8-bit records each composed of two 4-bit columns are to be processed. In addition, the method of constructing the Huffman code is also predetermined.
On the decoding side, the area of the decoding dictionary 50-1 shown in fig. 23A is prepared in advance. Based on the above determination, a table consisting of 2 blocks of 16 (2 to the 4th power) rows is created. The columns of the table other than column a are initially left blank.
Next, the occurrence frequencies of the symbols generated by the encoding are read into column b. In this case, 32 integer values are read in. From the occurrence frequencies, the occurrence probabilities of column c are calculated, a Huffman tree is created, and the Huffman codes are derived in column e, thereby completing the decoding dictionary 50-1. The Huffman-code construction step must follow the same procedure as the encoding. The decoding dictionary 50-1 is then identical to the encoding dictionary 25 shown in fig. 14B.
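The construction of a Huffman code table from received frequencies can be sketched as follows (an illustrative Python sketch, not the patent's implementation; the symbols and frequencies are hypothetical). Note that the tie-breaking order fixes the exact bit patterns, which is precisely why the construction procedure must be agreed on in advance:

```python
import heapq

def build_huffman(freq):
    """Derive a Huffman code table from {symbol: occurrence frequency}.
    The two least-frequent subtrees are merged repeatedly; the unique
    tie counter makes the construction deterministic."""
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)       # least frequent subtree
        f2, _, c2 = heapq.heappop(heap)       # second least frequent
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# hypothetical column-b frequencies for four of the sixteen 4-bit symbols
codes = build_huffman({"0010": 5, "1000": 3, "1100": 1, "1010": 1})
```

As expected of a Huffman code, the most frequent symbol receives the shortest codeword and the resulting table is prefix-free.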
There is also a method of transmitting and receiving the occurrence probabilities of column c instead of the occurrence frequencies of column b. Alternatively, the Huffman code table of column e may be transmitted and received; in that case, the method of constructing the Huffman code need not be determined in advance.
Next, the encoded bit string is read, and the decoded data is obtained using the decoding dictionary 50-1. Since a Huffman code is a prefix code, the encoded bit string can be decoded sequentially from the beginning, and no special separator is required.
Fig. 23B shows the decoded records 51-2 obtained by decoding the encoded data 51-1 using the decoding dictionary 50-1. Looking at the 1st line of the encoded data 51-1, the encoded data is "00". Looking at columns a and e of the decoding dictionary 50-1, in the left block the code "0" corresponds to the symbol column "0010", and in the right block the code "0" corresponds to "1000". Accordingly, the encoded data "00" decodes to "00101000". The same applies up to the 3rd line of the encoded data 51-1.
Line 4 of the encoded data 51-1 is "010". According to the decoding dictionary 50-1, no code such as "01" exists in the left block, so the code for the left column is "0", which decodes to "0010". The code for the right column is then "10", which, per the decoding dictionary 50-1, decodes to "1100". Therefore, the decoded symbol sequence is "00101100". The rest of the encoded data 51-1 can be decoded in the same manner.
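The separator-free decoding just described can be sketched as follows (an illustrative Python sketch; the code table below is hypothetical and merely in the spirit of the decoding dictionary 50-1). Because the table is prefix-free, the first completed match while scanning left to right is always the correct codeword:

```python
def decode_stream(bits, table):
    """Decode a concatenation of Huffman codewords from left to right.
    A prefix-free table guarantees that a completed match can never be
    the start of a longer codeword, so no separator is needed."""
    inv = {code: sym for sym, code in table.items()}   # codeword -> symbol
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inv:                                 # codeword completed
            out.append(inv[cur])
            cur = ""
    if cur:
        raise ValueError("bit string ended in the middle of a codeword")
    return out

# hypothetical prefix-free table: symbol column -> codeword
table = {"0010": "0", "1000": "10", "1100": "110", "1010": "111"}
symbols = decode_stream("010110", table)
```

Here the stream "010110" splits unambiguously into "0", "10", "110", yielding the symbol columns "0010", "1000", "1100".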
Fig. 24A to 26B are diagrams illustrating a decoding method of decoding encoded data encoded by the adaptive huffman encoding method illustrated in fig. 15A to 17B.
It is determined in advance that encoded data encoded by the adaptive Huffman encoding method shown in fig. 15A to 17B is to be decoded; that is, it is determined in advance that 8-bit records each composed of two 4-bit columns are to be processed. In addition, the method of constructing the Huffman code is also predetermined.
On the decoding side, the table 50-2 shown in fig. 24A is prepared in advance. Based on the above determination, a table consisting of 2 blocks of 16 (2 to the 4th power) rows is created. In this method, since no frequency table is transmitted or received in advance, the initial occurrence frequencies are all set to "1" using Laplace smoothing, as in the encoding. As a result, the same table as the initial-state encoding table 30-1 shown in fig. 15A is created.
Here, when the first encoded data "00101000" is read into the area 51-2, the corresponding code is looked up in column e of the table 50-2, and column a gives the decoded data. By performing this processing on the left and right blocks and combining the 2 pieces of decoded data in the table 51-3, the record before encoding can be recovered. Since a Huffman code is a prefix code, the encoded bit string can be decoded sequentially from the beginning, so no special delimiter is required.
Since the decoded data of the left column is "0010" and that of the right column is "1000", 1 is added to the frequency of the corresponding entry in each block of table 50-2. The Huffman codes of the table 50-3 shown in fig. 24B are obtained from the updated frequencies.
Here, the data "010101" of the 2nd record is read in. First, the 1st column is decoded using column e of the left block: "010" is found from the beginning of the encoded data and, per table 50-3, corresponds to the decoded data "0010". Next, the 2nd column is decoded using column e of the right block: the remaining part of the encoded data is "101", which, per table 50-3, corresponds to "1000". The decoded symbol sequences of the left and right columns are then combined to obtain "00101000", and table 50-3 is updated. Since a Huffman code is a prefix code, no delimiter is required. Decoding proceeds by repeating this process.
In fig. 25A, the 3rd encoded data is "001001"; per the left block of table 50-4, "001" corresponds to "0010", and per the right block, "001" corresponds to "1000". Thus, the 3rd decoded symbol column is "00101000".
Further, as shown in fig. 25B, the 4th encoded data is "00100010"; from table 50-5, "001" in the left block corresponds to "0010", and "00010" in the right block corresponds to "1100". Therefore, the 4th decoded symbol column is "00101100".
As shown in fig. 26A, the 5th encoded data is "0000011"; per the left block of table 50-6, "00000" corresponds to "1010", and per the right block, "11" corresponds to "1000". Thus, the 5th decoded symbol column is "10101000".
Since the 6th encoded data is "0101", it can be seen from table 50-7 that "01" in the left block corresponds to "0010" and "01" in the right block corresponds to "1000". Therefore, the 6th decoded symbol sequence is "00101000". By repeating the above processing, all records can be decoded.
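The adaptive loop above — decode each record with the current per-column tables, then update the frequencies and rebuild — can be sketched as follows. This is an illustrative Python sketch with invented names; because the exact codewords depend on the agreed tie-breaking, the sketch demonstrates the encoder/decoder symmetry with a round trip rather than reproducing the bit patterns of the figures:

```python
import heapq

def build_codes(freq):
    """Huffman codes from {symbol: frequency} with deterministic tie-breaking."""
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

def new_tables():
    # one frequency block per 4-bit column, all counts 1 (Laplace smoothing)
    return [{format(v, "04b"): 1 for v in range(16)} for _ in range(2)]

def adaptive_encode(records):
    freqs, out = new_tables(), ""
    for r in records:
        cols = (r[:4], r[4:])
        for k in (0, 1):                        # encode with the current tables
            out += build_codes(freqs[k])[cols[k]]
        for k in (0, 1):                        # then update the frequencies
            freqs[k][cols[k]] += 1
    return out

def adaptive_decode(stream, n_records):
    freqs, records, pos = new_tables(), [], 0
    for _ in range(n_records):
        cols = []
        for k in (0, 1):                        # same tables as the encoder
            inv = {c: s for s, c in build_codes(freqs[k]).items()}
            cur = ""
            while cur not in inv:               # prefix code: first match wins
                cur += stream[pos]
                pos += 1
            cols.append(inv[cur])
        for k in (0, 1):
            freqs[k][cols[k]] += 1
        records.append("".join(cols))
    return records

records = ["00101000"] * 3 + ["00101100", "10101000"] + ["00101000"]
encoded = adaptive_encode(records)
decoded = adaptive_decode(encoded, len(records))
```

As long as the encoder and decoder share the construction procedure and the update order, the decoder reproduces the original records exactly, which is the point of predetermining the Huffman construction.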
Fig. 27A to 31B are diagrams illustrating a decoding method of decoding encoded data encoded by the adaptive arithmetic encoding method shown in fig. 18A to 22B.
It is determined in advance that encoded data encoded by the adaptive arithmetic encoding method shown in fig. 18A to 22B is to be decoded; that is, it is determined in advance that 8-bit records each composed of 8 columns of 1 bit are to be processed. In addition, the method of arithmetic coding is also predetermined.
On the decoding side, the table 60-1 shown in fig. 27A is prepared in advance. Based on the above determination, a table consisting of 8 blocks is created. Although each block would need entries for the column data "0" and "1", only the data for "0" is stored, as in the encoding, and a column for the total of the encoded data 61-1 is provided. In the adaptive method, since no frequency table is transmitted or received in advance, the initial occurrence frequencies are all set to "1" using Laplace smoothing, as in the encoding, and the occurrence probabilities are calculated from them; the resulting table is the table 60-1 shown in fig. 27A.
Here, the data "00101" of the 1st record is read into the area 61-2. Since an arithmetic code is not a prefix code, a protocol that can determine the record separator is required.
When the received data "00101" is interpreted as a binary fraction, the encoded value 0.15625 is obtained. By determining the column values from this value and dividing the interval by the same method as the arithmetic coding, the decoded data "00101000", which is the record before encoding, is obtained as shown in table 61-3.
Then, 1 is added to the record count, 1 is added to the frequency of each column in which the decoded data "00101000" contains "0", and the occurrence probabilities are recalculated, giving the table 60-2 shown in fig. 27B.
Here, the data "01" of the 2 nd record is read in.
When the received data "01" is interpreted as a binary fraction, the encoded value 0.25 is obtained. The column values are determined from this value and the interval is divided by the same method as the arithmetic coding, yielding the decoded data "00101000". For clarity, the processing for obtaining the 2nd decoded data is described in detail below.
Table 60-2 holds the frequencies shown in fig. 27B, which reflect the input of the 1st encoded data. The 2nd input code value "01" is the fractional part of the binary fraction 0.01, which is 0.25 in decimal. Using the occurrence probability of "0" for each column (bit), obtained from this decimal value "0.25" and the per-column frequencies of "0" updated after each decode, the decoded data is determined bit by bit. The initial interval when decoding the first bit is [0, 1). The division of the interval is repeated according to the occurrence probability of "0" for each column. The division value is calculated by the formula "(maximum of interval - minimum of interval) × probability of "0" + minimum of interval".
First, from the frequency "2" of "0" and the record count "3" given in the first column of table 60-2, the occurrence probability "0.667" listed in table 60-2 is obtained, and the division value of the current interval is computed by the above formula. Since the current interval is the initial value [0, 1), the calculated division value is "0.667". This computation of the division value corresponds to the processing of the column division range determination unit 20-1 shown in fig. 5B. (The occurrence probability may also be calculated in advance at the time of the frequency update.)
The value of each column of the decoded record is "0" when the code value is less than the division value, and "1" when the code value is greater than or equal to the division value (matching the half-open intervals used in the encoding). In the present case, the division value is "0.667" and the code value is "0.25", so the decoded bit value of the first column is "0". This process corresponds to the processing of the column 1 decoding unit 14a-1 of fig. 5B. Since the bit value of the first column is "0" and the division value is "0.667", the next interval is set to [0, 0.667), the range below the division value. This process corresponds to the processing of the section dividing unit 21-1 of fig. 5B.
Next, the division value "0.444" of the current interval [0, 0.667) is obtained from the occurrence frequency of "0" in the 2nd column of table 60-2, and the decoded bit value of the 2nd column is "0" from the magnitude relation between the division value and the code value "0.25". Based on the decoded bit value, the next interval is [0, 0.444). This processing for the 2nd column corresponds, like that for the first column, to the processing performed by the column division range determination unit 20-2, the column 2 decoding unit 14a-2, and the section dividing unit 21-2 described in fig. 5B.
Similarly, the division value "0.148" of the current interval [0, 0.444) is obtained from the occurrence frequency of "0" in the 3rd column of table 60-2, and the decoded bit value of the 3rd column is "1" from the magnitude relation between the division value and the code value "0.25". Based on the decoded bit value, the next interval is [0.148, 0.444).
The above processing is repeated for each column, whereby 1 record is decoded.
The column data decoded in sequence in this way are combined by the mixing unit 17a of fig. 5B into the decoded data of 1 record.
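The per-column interval division just described can be sketched as follows (an illustrative Python sketch with invented names; exact rationals are used instead of floating point to avoid rounding drift). Fed the codes of the encoding example, it reproduces the decoded records of fig. 27A to 31B:

```python
from fractions import Fraction

def decode_record(code, n_cols, freq0, total):
    """Decode one arithmetic code word by re-dividing the interval with
    the same per-column probabilities of "0" the encoder used, then
    update the frequencies exactly as the encoder did."""
    value = Fraction(int(code, 2), 2 ** len(code))   # binary fraction 0.code
    lo, hi = Fraction(0), Fraction(1)
    bits = ""
    for j in range(n_cols):
        p0 = Fraction(freq0[j], total)
        split = lo + (hi - lo) * p0                  # division value
        if value < split:                            # below the split -> "0"
            bits += "0"
            hi = split
        else:                                        # at/above the split -> "1"
            bits += "1"
            lo = split
    for j, b in enumerate(bits):                     # adaptive frequency update
        if b == "0":
            freq0[j] += 1
    return bits, total + 1

codes = ["00101", "01", "01", "1", "111", "01", "01", "01", "10101", "101111"]
freq0, total = [1] * 8, 2                            # same initial state as encoding
decoded = []
for c in codes:
    r, total = decode_record(c, 8, freq0, total)
    decoded.append(r)
print(decoded)
```

For the 2nd code "01" (value 0.25) the successive division values are 0.667, 0.444, 0.148, ..., matching the walkthrough above, and the output is the ten original records.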
Next, 1 is added to the record count in table 60-2 shown in fig. 27B, 1 is added to the frequency of each column in which the 2nd decoded data "00101000" contains "0", and the occurrence probabilities are recalculated; the result is the table 60-3 shown in fig. 28A.
Here, the data "01" of the 3rd record is read in. When the received data "01" is interpreted as a binary fraction, the encoded value 0.25 is obtained. The column values are determined from this value and the interval is divided by the same method as described for the 2nd record, yielding the decoded data "00101000".
Then, 1 is added to the record count, 1 is added to the frequency of each column in which the decoded data "00101000" contains "0", and the occurrence probabilities are recalculated; the result is the table 60-4 shown in fig. 28B.
Here, the 4th data "1" is read in. When the received data "1" is interpreted as a binary fraction, the encoded value 0.5 is obtained. The column values are determined from this value and the interval is divided by the same method as described above, yielding the decoded data "00101100".
Then, 1 is added to the record count, 1 is added to the frequency of each column in which the decoded data "00101100" contains "0", and the occurrence probabilities are recalculated; the result is the table 60-5 shown in fig. 29A.
Here, the data "111" of the 5th record is read in. When the received data "111" is interpreted as a binary fraction, the encoded value 0.875 is obtained. The column values are determined from this value and the interval is divided by the same method as described above, yielding the decoded data "10101000".
Then, 1 is added to the record count, 1 is added to the frequency of each column in which the decoded data "10101000" contains "0", and the occurrence probabilities are recalculated; the result is the table 60-6 shown in fig. 29B.
Here, the data "01" of the 6th record is read in. When the received data "01" is interpreted as a binary fraction, the encoded value 0.25 is obtained. The column values are determined from this value and the interval is divided by the same method as described above, yielding the decoded data "00101000".
Then, 1 is added to the record count, 1 is added to the frequency of each column in which the decoded data "00101000" contains "0", and the occurrence probabilities are recalculated; the result is the table 60-7 shown in fig. 30A.
Here, the data "01" of the 7th record is read in. When the received data "01" is interpreted as a binary fraction, the encoded value 0.25 is obtained. The column values are determined from this value and the interval is divided by the same method as described above, yielding the decoded data "00101000".
Then, 1 is added to the record count, 1 is added to the frequency of each column in which the decoded data "00101000" contains "0", and the occurrence probabilities are recalculated; the result is the table 60-8 shown in fig. 30B.
Here, the data "01" of the 8th record is read in. When the received data "01" is interpreted as a binary fraction, the encoded value 0.25 is obtained. The column values are determined from this value and the interval is divided by the same method as described above, yielding the decoded data "00101000".
Then, 1 is added to the record count, 1 is added to the frequency of each column in which the decoded data "00101000" contains "0", and the occurrence probabilities are recalculated; the result is the table 60-9 shown in fig. 31A.
Here, the data "10101" of the 9th record is read in. When the received data "10101" is interpreted as a binary fraction, the encoded value 0.65625 is obtained. The column values are determined from this value and the interval is divided by the same method as described above, yielding the decoded data "00110000".
Then, 1 is added to the record count, 1 is added to the frequency of each column in which the decoded data "00110000" contains "0", and the occurrence probabilities are recalculated; the result is the table 60-10 shown in fig. 31B.
Here, the data "101111" of the 10th record is read in. When the received data "101111" is interpreted as a binary fraction, the encoded value 0.734375 is obtained. The column values are determined from this value and the interval is divided by the same method as described above, yielding the decoded data "00111100".
Fig. 32 shows the hardware environment of an exemplary computer that executes a program implementing the present embodiment.
The illustrated computer 60 includes, for example, a CPU 50, a ROM 51, a RAM 52, a network interface 53, a storage device 56, a read/write drive 57, and an input/output device 59, which are interconnected by a bus 55.
The CPU 50 executes the program implementing the present embodiment. The program is recorded in the storage device 56 or on the portable recording medium 58, and is executed by the CPU 50 after being loaded from these media into the RAM 52.
The storage device 56 is, for example, a hard disk. The portable recording medium 58 includes a magnetic disk such as a floppy disk, an optical disk such as a CD-ROM, DVD, or Blu-ray disc, a semiconductor memory such as an IC memory, and the like; the portable recording medium 58 is inserted into the read/write drive 57, which reads from and writes to it. In the present embodiment, not only may the program implementing the present embodiment be recorded in the storage device 56 or on the portable recording medium 58, but the input fixed-length data to be encoded may also be temporarily recorded on these media and then read out to the RAM 52 and encoded.
The ROM 51 stores basic programs, such as a BIOS, for operating the bus 55, the network interface 53, and the input/output device 59. The CPU 50 executes these basic programs to realize the basic functions of the illustrated computer 60.
The input/output device 59 receives information input from a user using the illustrated computer 60, and outputs information to the user. The input-output device 59 includes, for example, a keyboard, a mouse, a touch panel, a display, a printer, and the like.
The network interface 53 is used by the illustrated computer 60 to communicate with other computers, network devices, and the like via the network 54. In the present embodiment, the program implementing the present embodiment may be recorded in the storage device 56 or on the portable recording medium 58 via the network 54. The program of the present embodiment may also be executed on another computer or network device connected to the network 54, with the input/output data transmitted and received via the network 54. Furthermore, the fixed-length data to be encoded may be transmitted from a terminal having a sensor connected to the network 54.
The network 54 may be any network, wired or wireless, that enables communication between computers or between a computer and a network device. In one example, the network 54 may include the Internet, a LAN (Local Area Network), a WAN (Wide Area Network), a fixed telephone network, a mobile telephone network, an ad hoc network, a VPN (Virtual Private Network), a sensor network, and the like.
As described above, in the present embodiment according to one aspect of the present invention, when the fixed-length bit string of each item of fixed-length data is composed of data with different meanings described in a plurality of specified fields, and the data described in the field at the same position of each item of fixed-length data is of the same type, the fixed-length bit string is divided into columns of an arbitrary number of bits and the columns are successively encoded in the column direction independently of one another, whereby compression encoding with a higher compression rate than conventional encoding methods can be achieved.
As an example of the improved compression ratio, the present inventors were able to compress original data of 70,016 bytes (560,128 bits) to 13,532 bytes (94,000 bits excluding the padding bits) using the compression encoding device of the present embodiment. gzip compresses the same data to 14,464 bytes (115,712 bits) and bzip2 to 12,985 bytes (103,880 bits), from which the effectiveness of the compression encoding method of the present embodiment can be understood.
The encoding device of the present embodiment may also be implemented in hardware such as an FPGA (Field Programmable Gate Array).
For example, the encoding device of the present embodiment may be implemented partly in hardware and partly in software, combining the two.
The above embodiments can be realized independently of each other or in combination with each other.
Among the above-described embodiments, in those using the adaptive coding method, compression coding can be performed successively without temporarily accumulating data, so encoding can be performed in real time. When the above embodiment is applied to real-time encoding, the sequentially input records are virtually treated as table data of a predetermined number of records and compressed in the column direction.
Description of the reference numerals
1: sensor network
2: sensor
3: gateway (GW)
4: processing device
10, 10a, 16: dividing unit
11-1 to 11-m, 11a-1 to 11a-m: column 1-m registers
12-1 to 12-m: coding units of columns 1-m
12a-1 to 12a-m, 20-1 to 20-m: column 1-m column division range determination unit
13, 17, 17a: mixing unit
14-1 to 14-m, 14a-1 to 14a-m: column 1-m decoding unit
15-1 to 15-m: list 1-m frequency table, coding table
18-1 to 18-m, 21-1 to 21-m: section dividing unit
19: coding unit
50:CPU
51:ROM
52:RAM
53: network interface
54: network system
55: bus line
56: storage device
57: read-write driver
58: portable recording medium
59: input/output device

Claims (9)

1. A data compression encoding method in which records are temporarily accumulated up to a predetermined number of 2 or more and the accumulated records are compression-encoded, each record being composed of a fixed-length bit string including 1 or more fields in which data of the same attribute among a plurality of data sequentially transmitted from a transmission source is described, the data compression encoding method comprising the steps of:
a dividing step of dividing the predetermined number, 2 or more, of records into columns of a predetermined bit width without regard to the boundaries of the fields;
a code table generation step of obtaining, for each column, the occurrence probability of a bit value in a column at the same position among the predetermined number of records of 2 or more, and creating a code table for an entropy coding method for each column based on the occurrence probability;
an encoding step of encoding each column constituting each record of the predetermined number of records of the 2 or more records by using an encoding table created for each column; and
an output step of outputting encoded data obtained by combining the encoded columns for each record,
and repeating the encoding step and the outputting step according to the predetermined number of records.
2. A storage medium storing a program, wherein,
the program causes a computer to execute the data compression encoding method according to claim 1.
3. A data compression encoding device that temporarily accumulates records up to a predetermined number of 2 or more and compression-encodes the accumulated records, each record being composed of a fixed-length bit string including 1 or more fields in which data of the same attribute among a plurality of data sequentially transmitted from a transmission source is described,
the data compression encoding device is characterized by comprising:
a dividing unit that divides the predetermined number, 2 or more, of records into columns of a predetermined bit width without regard to the boundaries of the fields;
a code table generation unit that obtains, for each column, the occurrence probability of a bit value in a column at the same position among the 2 or more predetermined number of records, and creates a code table for an entropy encoding method for each column based on the occurrence probability;
an encoding unit that encodes each column constituting each record of the predetermined number, 2 or more, of records by the encoding table created for each column; and
an output unit that outputs encoded data obtained by combining the encoded columns for each record,
and repeating the processing based on the encoding means and the output means in accordance with the predetermined number of records.
4. A data compression encoding method for compression-encoding and outputting a record composed of a fixed-length bit string made up of 1 or more fields, each field describing data of the same attribute among data sequentially transmitted from a transmission source,
the data compression encoding method is characterized by comprising the following steps:
a dividing step of dividing the record into columns of a predetermined bit width, without regard to field boundaries;
an encoding step of obtaining, for a record input at the current time, for each column, the occurrence probability of bit values in the columns at the same position in the records input before that time, and encoding each column constituting the record by an adaptive entropy encoding method based on the occurrence probability; and
an output step of outputting, in real time, encoded data obtained by combining the encoded columns.
5. The data compression encoding method according to claim 4, wherein,
the bit width of the column is 1, and in the encoding step, the record is encoded by an arithmetic encoding method based on the occurrence probability.
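The adaptive variant of claims 4 and 5 (column width of 1 bit, with the probability of each bit position estimated from the records already input, driving an arithmetic coder) might be sketched as below. This is a toy illustration under stated assumptions: it uses floating-point interval narrowing rather than the integer renormalization a practical arithmetic coder would use, updates the model once per record, and all names are hypothetical.

```python
def adaptive_encode(records):
    """records: equal-length bit strings arriving in sequence.
    Maintains per-column counts of '1' bits (Laplace-smoothed) and narrows
    an arithmetic-coding interval bit by bit; returns a number inside the
    final interval, which identifies the encoded sequence."""
    nbits = len(records[0])
    ones = [1] * nbits          # smoothed count of '1' per column position
    total = 2                   # per-column observation count (starts as 1 zero + 1 one)
    low, high = 0.0, 1.0
    for rec in records:
        for i, bit in enumerate(rec):
            p1 = ones[i] / total            # adaptive P(bit == '1') for column i
            span = high - low
            if bit == "1":
                low = high - span * p1      # '1' takes the upper sub-interval
            else:
                high = low + span * (1 - p1)
        # Update the model only after encoding the record, so a decoder
        # that mirrors these updates stays in sync.
        for i, bit in enumerate(rec):
            ones[i] += bit == "1"
        total += 1
    return (low + high) / 2     # any value in [low, high) would do
```

Each record is encoded against statistics gathered only from earlier records, so the coder can emit output in real time without first accumulating a block; the decoder reproduces the same probability updates as it decodes.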
6. A storage medium storing a program, wherein,
the program causes a computer to execute the data compression encoding method according to claim 4.
7. A data compression encoding device for compression-encoding and outputting a record composed of a fixed-length bit string made up of 1 or more fields, each field describing data of the same attribute among data sequentially transmitted from a transmission source,
the data compression encoding device is characterized by comprising:
a dividing unit that divides the record into columns of a predetermined bit width, without regard to field boundaries;
an encoding unit that obtains, for a record input at the current time, for each column, the occurrence probability of bit values in the columns at the same position in the records input before that time, and encodes each column constituting the record by an adaptive entropy encoding method based on the occurrence probability; and
an output unit that outputs, in real time, encoded data obtained by combining the encoded columns.
8. The data compression encoding apparatus according to claim 7, wherein,
the bit width of the column is 1, and the encoding unit encodes the record by an arithmetic encoding method based on the occurrence probability.
9. A storage medium storing a program that causes a computer to execute the data compression encoding method according to claim 5.
CN201780045701.9A 2016-07-25 2017-07-18 Data compression encoding method, apparatus and storage medium Active CN109478893B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2016145397A JP6336524B2 (en) 2016-07-25 2016-07-25 Data compression encoding method, apparatus thereof, and program thereof
JP2016-145397 2016-07-25
PCT/JP2017/025955 WO2018021094A1 (en) 2016-07-25 2017-07-18 Data compression coding method, decoding method, device therefor, and program therefor

Publications (2)

Publication Number Publication Date
CN109478893A CN109478893A (en) 2019-03-15
CN109478893B true CN109478893B (en) 2023-05-09

Family ID=61015968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780045701.9A Active CN109478893B (en) 2016-07-25 2017-07-18 Data compression encoding method, apparatus and storage medium

Country Status (5)

Country Link
US (1) US10547324B2 (en)
EP (2) EP3490153B1 (en)
JP (1) JP6336524B2 (en)
CN (1) CN109478893B (en)
WO (1) WO2018021094A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102552833B1 (en) * 2018-05-28 2023-07-06 삼성에스디에스 주식회사 Data processing method based-on entropy value of data
JP7047651B2 (en) * 2018-07-30 2022-04-05 富士通株式会社 Information processing equipment, distributed processing systems, and distributed processing programs
EP3817236A1 (en) * 2019-11-04 2021-05-05 Samsung Electronics Co., Ltd. Neural network data processing method and apparatus
CN111181568A (en) * 2020-01-10 2020-05-19 深圳花果公社商业服务有限公司 Data compression device and method, data decompression device and method
US20230214367A1 (en) * 2022-01-05 2023-07-06 AVAST Software s.r.o. System and method for data compression and decompression
CN115441878A (en) * 2022-08-05 2022-12-06 海飞科(南京)信息技术有限公司 FSE code table rapid establishing method for text compression
CN115078892B (en) * 2022-08-19 2022-11-01 深圳天川电气技术有限公司 State remote monitoring system for single-machine large-transmission frequency converter
DE102022003682A1 (en) * 2022-10-05 2024-04-11 Mercedes-Benz Group AG Method for compressing and decompressing log files and information technology system
CN115658628B (en) * 2022-12-19 2023-03-21 武汉惠强新能源材料科技有限公司 Production data intelligent management method for MES system

Citations (2)

Publication number Priority date Publication date Assignee Title
JP2004349939A (en) * 2003-05-21 2004-12-09 Canon Inc Method and device for image encoding and recording device
US20090254521A1 (en) * 2008-04-04 2009-10-08 International Business Machines Corporation Frequency partitioning: entropy compression with fixed size fields

Family Cites Families (26)

Publication number Priority date Publication date Assignee Title
JP2587134B2 (en) * 1990-12-17 1997-03-05 日本電信電話株式会社 Subband coding method
JP2000305822A (en) * 1999-04-26 2000-11-02 Denso Corp Device for database management and device for database extraction, and method for database management and method for database extraction
US6748520B1 (en) * 2000-05-02 2004-06-08 3Com Corporation System and method for compressing and decompressing a binary code image
JP2006100973A (en) * 2004-09-28 2006-04-13 Nomura Research Institute Ltd Data compression apparatus and data expansion apparatus
JP4434155B2 (en) * 2006-02-08 2010-03-17 ソニー株式会社 Encoding method, encoding program, and encoding apparatus
JP4846381B2 (en) 2006-02-08 2011-12-28 富士通セミコンダクター株式会社 BAND ALLOCATION METHOD, COMMUNICATION CONTROL DEVICE, AND COMMUNICATION DEVICE
JP2007214998A (en) 2006-02-10 2007-08-23 Fuji Xerox Co Ltd Coding apparatus, decoding apparatus, coding method, decoding method, and program
JP4688690B2 (en) * 2006-02-15 2011-05-25 日立造船株式会社 State change detection method and state change detection apparatus in plant equipment
US20090006399A1 (en) * 2007-06-29 2009-01-01 International Business Machines Corporation Compression method for relational tables based on combined column and row coding
US7609179B2 (en) * 2008-01-08 2009-10-27 International Business Machines Corporation Method for compressed data with reduced dictionary sizes by coding value prefixes
US7683809B2 (en) * 2008-04-11 2010-03-23 Aceurity, Inc. Advanced lossless bit coding
JP5169495B2 (en) * 2008-05-30 2013-03-27 東洋製罐株式会社 Compression molding die and compression molding apparatus
JP5303213B2 (en) * 2008-07-23 2013-10-02 株式会社日立製作所 Data management method with data compression processing
US8108361B2 (en) * 2008-07-31 2012-01-31 Microsoft Corporation Efficient column based data encoding for large-scale data storage
JP5180782B2 (en) * 2008-11-11 2013-04-10 日本電信電話株式会社 Parallel distributed information source encoding system and parallel distributed information source encoding / decoding method
JP2011048514A (en) * 2009-08-26 2011-03-10 Panasonic Electric Works Co Ltd Data management device and authentication system
US8487791B2 (en) 2010-02-18 2013-07-16 Research In Motion Limited Parallel entropy coding and decoding methods and devices
EP2362657B1 (en) * 2010-02-18 2013-04-24 Research In Motion Limited Parallel entropy coding and decoding methods and devices
WO2012040857A1 (en) * 2010-10-01 2012-04-05 Research In Motion Limited Methods and devices for parallel encoding and decoding using a bitstream structured for reduced delay
ES2607982T3 (en) 2011-01-14 2017-04-05 Ge Video Compression, Llc Entropic encoding and decoding scheme
US10816579B2 (en) * 2012-03-13 2020-10-27 Informetis Corporation Sensor, sensor signal processor, and power line signal encoder
JP5826114B2 (en) * 2012-05-25 2015-12-02 クラリオン株式会社 Data decompression device, data compression device, data decompression program, data compression program, and compressed data distribution system
US8933829B2 (en) * 2013-09-23 2015-01-13 International Business Machines Corporation Data compression using dictionary encoding
US10235377B2 (en) * 2013-12-23 2019-03-19 Sap Se Adaptive dictionary compression/decompression for column-store databases
CN104156990B (en) * 2014-07-03 2018-02-27 华南理工大学 A kind of lossless compression-encoding method and system for supporting super-huge data window
CN104462524A (en) * 2014-12-24 2015-03-25 福建江夏学院 Data compression storage method for Internet of Things


Also Published As

Publication number Publication date
JP6336524B2 (en) 2018-06-06
EP3490153A1 (en) 2019-05-29
US20190140657A1 (en) 2019-05-09
EP3490153B1 (en) 2023-11-01
WO2018021094A1 (en) 2018-02-01
EP3490153A4 (en) 2020-03-11
CN109478893A (en) 2019-03-15
US10547324B2 (en) 2020-01-28
JP2018022933A (en) 2018-02-08
EP3771104A1 (en) 2021-01-27

Similar Documents

Publication Publication Date Title
CN109478893B (en) Data compression encoding method, apparatus and storage medium
US6061398A (en) Method of and apparatus for compressing and restoring data
KR100808664B1 (en) Parity check matrix storing method, block ldpc coding method and the apparatus using parity check matrix storing method
US8094048B2 (en) Method of decoding syntax element in context-based adaptive binary arithmetic coding decoder and decoding device therefor
US7786907B2 (en) Combinatorial coding/decoding with specified occurrences for electrical computers and digital data processing systems
CN102098508A (en) Multimedia signature coding and decoding
JP5656593B2 (en) Apparatus and method for decoding encoded data
CN104468044A (en) Data compression method and device applied to network transmission
US7786903B2 (en) Combinatorial coding/decoding with specified occurrences for electrical computers and digital data processing systems
KR101617965B1 (en) Encoder of systematic polar codes
CN112332854A (en) Hardware implementation method and device of Huffman coding and storage medium
JPH11340838A (en) Coder and decoder
CN112804029A (en) Transmission method, device and equipment of BATS code based on LDPC code and readable storage medium
JP2018074604A (en) Data compression encoding method, decoding method, device thereof, and program thereof
JP6336636B2 (en) Data compression encoding method, apparatus thereof, and program thereof
CN115765756A (en) Lossless data compression method, system and device for high-speed transparent transmission
JP2010258532A (en) Circuit and method for converting bit length into code
KR20050010918A (en) A method and a system for variable-length decoding, and a device for the localization of codewords
JP2014220713A (en) Coding apparatus, decoding apparatus, method, and program
JP2007336056A (en) Encoding device, encoding method and program
CN113659992B (en) Data compression method and device and storage medium
Wei et al. Efficient VLSI Huffman encoder implementation and its application in high rate serial data encoding
Fong et al. Using a tree algorithm to determine the average synchronisation delay of self-synchronising T-codes
WO2020075277A1 (en) Decoding device, decoding method, and non-transitory computer-readable medium storing program
Pannirselvam et al. A Comparative Analysis on Different Techniques in Text Compression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant