CN1790918A - Lossless data compression method based on virtual information source and neural network - Google Patents

Lossless data compression method based on virtual information source and neural network

Info

Publication number
CN1790918A
CN1790918A (Application CN200410098954)
Authority
CN
China
Prior art keywords
information source
virtual information
model
character string
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200410098954
Other languages
Chinese (zh)
Inventor
杨国为 (Yang Guowei)
涂序彦 (Tu Xuyan)
王守觉 (Wang Shoujue)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Semiconductors of CAS
Original Assignee
Institute of Semiconductors of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Semiconductors of CAS filed Critical Institute of Semiconductors of CAS
Priority to CN 200410098954 priority Critical patent/CN1790918A/en
Publication of CN1790918A publication Critical patent/CN1790918A/en
Pending legal-status Critical Current


Abstract

The invention provides a lossless data compression method based on a virtual information source (VIS) and a neural network (NN), in the field of computer technology. The method comprises: a. BP compression: (1) treat all data to be processed as a character string produced by the VIS; (2) build a BP NN model of the VIS; (3) encode the string produced by the VIS with the BP model parameters; b. BP decompression: (1) recover the BP model parameters; (2) recover the model of the VIS; (3) combine the VIS model with a rounding transform to construct the recovery mapping of the string; (4) completely recover the string produced by the VIS. The invention has wide application.

Description

Lossless data compression method based on virtual information source and neural network
Technical field
The invention belongs to the field of computer technology and provides a lossless data compression method based on a virtual information source and a neural network.
Background art
With the wide adoption of optical networking, compact-disc devices, and other high-capacity storage media, channel bandwidth and storage capacity keep growing, but the growth of data volume is far faster than the growth of bandwidth and storage. Efficient data compression therefore remains indispensable. In particular, efficient lossless compression is still necessary in applications with strict data-quality requirements, such as computer system files, medical data, and secure communication.
No highly efficient lossless compression method has existed so far. Although many lossless compression methods are in use, such as Huffman coding, arithmetic coding, dictionary coding, run-length coding, and predictive coding, they share three defects:
1. The compression idea of these lossless methods is solely to eliminate statistical redundancy in the data; no other kind of redundancy is removed. In fact, much real data has large non-statistical redundancy. For example, the set of points on an oblique line segment in the plane is produced by a "virtual information source" of the form y = kx + y_0, x ∈ (a, b). This point set has almost no "statistical" redundancy, yet it can be encoded by {k, y_0, a, b} and thereby compressed by a factor of hundreds of thousands, which shows that it has very high non-statistical redundancy.
2. Their compression ratio is small. For typical images and text it is only about 2:1, while the compressed images and text clearly still contain high redundancy.
3. They cannot recompress data that has already been entropy-coded. All entropy coders, such as Huffman coding, arithmetic coding, dictionary coding, run-length coding, and predictive coding, are built on the idea of "eliminating statistical redundancy in the data". Data already compressed by an entropy coder therefore retains almost no statistical redundancy and cannot be compressed further by these methods.
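The line-segment example in defect 1 can be sketched as follows. This is an illustrative sketch, not part of the patent: the function names, the sampling scheme, and the concrete parameter values are all assumptions; it only demonstrates that the entire point set is recoverable from the four numbers {k, y_0, a, b}.

```python
# Illustrative sketch: points on an oblique segment y = k*x + y0, x in (a, b),
# have essentially no statistical redundancy, yet the whole set collapses
# to just four parameters -- high NON-statistical redundancy.

def generate_segment(k, y0, a, b, n):
    """Sample n points on the segment: the 'data' to be compressed."""
    step = (b - a) / n
    return [(a + i * step, k * (a + i * step) + y0) for i in range(n)]

def encode_segment(k, y0, a, b):
    """The entire point set is encoded by only four numbers."""
    return (k, y0, a, b)

def decode_segment(code, n):
    """Regenerate the full point set from the four-number code."""
    k, y0, a, b = code
    return generate_segment(k, y0, a, b, n)

points = generate_segment(2.0, 1.0, 0.0, 10.0, 100000)
code = encode_segment(2.0, 1.0, 0.0, 10.0)
assert decode_segment(code, 100000) == points  # lossless recovery
```

A hundred thousand points are recovered exactly from four numbers, which no entropy coder operating on the point list could achieve.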
Attempts have been made to construct lossless compression methods from highly fault-tolerant tools such as neural networks, but without success: all existing neural-network-based data compression methods are lossy.
Based on this analysis of traditional lossless compression, the present invention proposes a lossless data compression method based on a virtual information source and a neural network. Instead of following the single traditional path of "only eliminating statistical redundancy in the data", it proposes a new model-based compression idea, "modeling with a virtual information source and a neural network", and improves the lossless compression ratio.
Summary of the invention
The present invention proposes a lossless data compression method based on a virtual information source and a neural network. It departs in its essential idea from the "only eliminate statistical redundancy" approach, proposing instead a model-based compression method built on a virtual information source and a neural network, which improves the lossless compression ratio. Moreover, this method can compress data that has already been entropy-coded (from which only part of the statistical redundancy has been removed), so combining it with existing entropy coders yields a lossless compression method with an even higher ratio. The method is realized with an artificial neural network, and is the first lossless data compression scheme, at home or abroad, to be realized with a neural network.
The invention first proposes the concept of a virtual information source and the idea of constructing a data compression method from a virtual information source and an artificial neural network. A general virtual information source Y for long strings of 0s and 1s is established, a neural-network model of Y is then built, and from this model together with a rounding function a lossless data compression method based on the virtual information source and the neural network is constructed. Experiments show that the compression ratio of this method is generally 3:1. Its steps are as follows:
a. BP compression
1. Treat all data to be processed as a character string produced by a virtual information source;
2. Build a BP neural-network model of the virtual information source;
3. Encode the string produced by the virtual information source with the BP model parameters (whose data volume is smaller than that of the original string).
b. BP decompression
1. Recover the BP model parameters;
2. Recover the model of the virtual information source;
3. Combine the virtual-information-source model with a rounding transform to construct the recovery mapping of the string;
4. Completely recover the string produced by the virtual information source.
Description of drawings
Fig. 1 is a flow chart of the compression and decompression principles of the invention.
Embodiment
The invention provides a lossless data compression method based on a virtual information source and a neural network; its flow chart is shown in Figure 1. The implementation steps are as follows:
1. Establish the virtual information source of the raw data to be processed. Let the long string to be compressed be γ = c_1 c_2 … c_49152, c_j = 0 or 1, so that γ has length 49152. Take as preimage set the vertices of the 12-dimensional unit hypercube, and define on them the mapping Y below. Y is called the virtual information source of γ = c_1 c_2 … c_49152.

Mapping Y: let A = {(x_1, …, x_12) | x_i = 0 or 1}, with the elements (x_1, …, x_12) arranged in descending order of the binary number x_1 x_2 … x_12, i.e. in order a_1 = (1, 1, …, 1), a_2 = (1, …, 1, 0), a_3 = (1, …, 1, 0, 1), …, a_4095 = (0, …, 0, 1), a_4096 = (0, …, 0, 0). Given the 0/1 string γ = c_1 c_2 … c_49152 of length 49152, write B = {b_j | b_j = (c_{12(j-1)+1}, …, c_{12j}), 1 ≤ j ≤ 4096}; each b_j is a block of twelve 0/1 digits of γ. Define Y as the mapping from A to B:

Y: A → B
Y: a_i → b_i, i = 1, …, 4096, a_i ∈ A, b_i ∈ B

i.e. b_i = Y(a_i), i = 1, …, 4096.
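The construction of A, B, and Y above can be sketched in a few lines. The patent specifies no implementation; this Python rendering, including the example bit string, is an illustrative assumption.

```python
# Build A: the 4096 binary 12-tuples, ordered by descending value of the
# binary number x1 x2 ... x12, so A[0] = (1,...,1) and A[-1] = (0,...,0).
A = [tuple((v >> (11 - i)) & 1 for i in range(12)) for v in range(4095, -1, -1)]
assert A[0] == (1,) * 12 and A[1] == (1,) * 11 + (0,) and A[-1] == (0,) * 12

# Build B from a 49152-bit string gamma: b_j = (c_{12(j-1)+1}, ..., c_{12j}).
gamma = "011011001010" * 4096          # any 0/1 string of length 49152
B = [tuple(int(c) for c in gamma[12 * j:12 * (j + 1)]) for j in range(4096)]

# The virtual information source Y maps a_i -> b_i.
Y = dict(zip(A, B))
assert Y[A[0]] == B[0] and len(Y) == 4096
```

Since A is a fixed ordered set, the mapping Y carries exactly the information content of γ itself, which is what lets a model of Y serve as a code for γ.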
2. Build the neural-network model of the virtual information source. Since the virtual information source Y is a bounded mapping from the finite point set A to the finite point set B, it can be extended to a bounded continuous mapping on the cube A' = {(x_1, …, x_12) | 0 ≤ x_i ≤ 1, i = 1, …, 12}. By known approximation results, there exists a 3-layer BP network that approximates Y to arbitrary precision; such a network naturally approximates Y well enough for our purposes.
Based on practical experience with the 4096 samples to be handled, a 3-layer BP model with 960 + 64 = 1024 degrees of freedom is chosen to approximate Y. A network state satisfying the conditions below is selected as the model of the virtual information source Y, and the model parameters encode the string γ = c_1 c_2 … c_49152, c_j = 0 or 1, thereby compressing the long string γ. Concretely, the invention uses a 3-layer, 64-neuron BP network (12 input neurons, 40 hidden neurons, 12 output neurons) with Sigmoid hidden-layer activation as the approximating mapping for Y. Its parameters are 960 connection weights w_ij^(1), i = 1, …, 12, j = 1, …, 40; w_ij^(2), i = 1, …, 40, j = 1, …, 12, and 64 thresholds (one per neuron) w_j^(k), j = 1, …, 12, k = 1, 2, 3, together with w_j^(2), j = 13, …, 40.
3. BP compression: as above, let γ = c_1 c_2 … c_49152, c_j = 0 or 1, be the string to be compressed, of length 49152, with a_k, b_k, k = 1, …, 4096 and the mapping Y. (i) Train the selected BP network with the ordered samples (a_k, b_k), k = 1, …, 4096; during training, bound the variables by |w_ij^(k)| < 128 and |w_j^(k)| < 128, and keep all weights w_ij^(k) and thresholds w_j^(k) accurate to 2 decimal places. (ii) Select a state of the BP network (called the stable state) in which every component of the output b_k' differs from the desired output b_k by less than 0.49, i.e. |b_ki' − b_ki| < 0.49, i = 1, …, 12. (iii) In the stable state, take in order the 960 connection weights w_ij^(1), i = 1, …, 12, j = 1, …, 40; w_ij^(2), i = 1, …, 40, j = 1, …, 12, and the 64 thresholds w_j^(k) as the encoding of the long string γ. (iv) Binary-encode the 960 connection weights and 64 thresholds in order, using 16 bits (2 bytes) for each w_ij^(k) and w_j^(k): 1 sign bit, 1 decimal-point flag bit, 7 bits for the 2 digits after the decimal point, and 7 bits for the digits before it. This encoding is exact whenever |w_ij^(k)| < 128, |w_j^(k)| < 128, and the values are accurate to 2 decimal places; although training introduces approximation error, γ can still be recovered completely by the decompression method below.
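The 16-bit encoding of step (iv) can be sketched as follows. The bit widths (1 sign bit, 1 flag bit, 7 + 7 magnitude bits) are from the text, but the ordering of the fields within the 2 bytes, and treating the flag bit as constant, are assumptions for illustration.

```python
def encode_weight(w):
    """Pack a weight |w| < 128, accurate to 2 decimal places, into 16 bits:
    1 sign bit, 1 decimal-point flag bit (fixed to 0 here -- an assumption),
    7 bits for the integer part, 7 bits for the two fractional digits."""
    assert abs(w) < 128
    sign = 1 if w < 0 else 0
    hundredths = round(abs(w) * 100)
    int_part, frac_part = divmod(hundredths, 100)   # each fits in 7 bits
    return (sign << 15) | (int_part << 7) | frac_part

def decode_weight(code):
    """Invert encode_weight: recover the signed two-decimal value."""
    sign = -1 if (code >> 15) & 1 else 1
    int_part = (code >> 7) & 0x7F
    frac_part = code & 0x7F
    return sign * (int_part + frac_part / 100)

assert decode_weight(encode_weight(-37.25)) == -37.25
assert decode_weight(encode_weight(0.49)) == 0.49
```

With 2 bytes per parameter, the 960 + 64 = 1024 parameters occupy exactly 2048 bytes, the figure used in the compression-ratio computation below.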
From the above principle, if steps (i) and (ii) can be carried out successfully, the compression ratio of this method is
(6 × 1024 × 8) / ((960 + 64) × 16) = 49152 / 16384 = 3 : 1
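The arithmetic of the ratio can be checked directly: the source string is 6 × 1024 × 8 = 49152 bits, while the code is 960 + 64 = 1024 parameters at 16 bits each.

```python
original_bits = 6 * 1024 * 8           # |gamma| = 49152 bits
compressed_bits = (960 + 64) * 16      # 1024 parameters, 16 bits each
assert original_bits == 49152
assert compressed_bits == 16384
assert original_bits / compressed_bits == 3.0   # the claimed 3 : 1 ratio
```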
4. BP rounding-function decompression: the decompression model is the composition (outer wrapping) of the mapping realized by the selected BP network with a rounding function. (i) Recover the stable-state weights w_ij^(1), i = 1, …, 12, j = 1, …, 40; w_ij^(2), i = 1, …, 40, j = 1, …, 12, and the stable-state thresholds w_j^(k), restoring the stable state of the BP model. (ii) Treat the recovered network as a fixed mapping Y_1; Y_1 approximates Y with component error less than 0.49 (note: it differs from the theoretical degree of approximation by at most 0.01; see the rigorous mathematical proof in the attached sheet). (iii) Compose Y_1 with the rounding function [y_i + 0.5], i = 1, …, 12, obtaining the composite [y_i(X) + 0.5], i = 1, …, 12, X = (x_1, …, x_12). (iv) Feed a_k, k = 1, …, 4096 in order into the composite function [y_i(X) + 0.5], i = 1, …, 12, obtaining b_k, k = 1, …, 4096 in order. (v) From b_k, k = 1, …, 4096, recover the long string γ = c_1 c_2 … c_49152 completely.
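The rounding step works because every network output lies within 0.49 of the true bit, so the bracket function [y + 0.5], i.e. floor(y + 0.5), restores the bit exactly. A sketch, using an illustrative stand-in for the recovered network Y_1 (the stand-in and its names are assumptions, not the patent's trained network):

```python
import math

def round_bit(y):
    """The bracket function [y + 0.5]: exact bit recovery whenever
    |y - bit| < 0.49 (the stable-state condition)."""
    return math.floor(y + 0.5)

# Any output within 0.49 of the true bit rounds back to it exactly.
assert round_bit(0.51) == 1 and round_bit(1.48) == 1
assert round_bit(-0.2) == 0 and round_bit(0.48) == 0

def decompress_block(network, a):
    """Step (iii)/(iv): compose the recovered mapping Y1 with rounding."""
    return tuple(round_bit(y) for y in network(a))

# Stand-in Y1 that perturbs each true bit by less than 0.49 per component:
true_block = (1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1)
noisy = lambda a: [b + (-0.3 if b else 0.4) for b in true_block]
assert decompress_block(noisy, None) == true_block
```

This is exactly why training only has to reach the 0.49 tolerance of step (ii) rather than exact outputs: the rounding absorbs the approximation error losslessly.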
Embodiment:
Take any long string γ = c_1 c_2 … c_49152 to be compressed, c_j = 0 or 1, of length 49152 (if the string is longer than 49152, split it into segments of length 49152; if it is shorter than 49152, pad the end with 0s until its length is 49152).
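The segmentation-and-padding rule can be sketched as follows (the function name and string representation are illustrative assumptions):

```python
def segment(bits, block_len=49152):
    """Split a 0/1 string into blocks of block_len characters,
    zero-padding the end of the last block, per the embodiment's rule."""
    if len(bits) % block_len:
        bits = bits + "0" * (block_len - len(bits) % block_len)
    return [bits[i:i + block_len] for i in range(0, len(bits), block_len)]

segs = segment("1" * 50000)
assert len(segs) == 2
assert len(segs[0]) == len(segs[1]) == 49152
assert segs[1].endswith("0" * 48304)   # 2 * 49152 - 50000 padding zeros
```

Note that lossless recovery of the original string also requires the original length to be stored alongside the code, so that the padding zeros can be stripped; the patent text leaves this bookkeeping implicit.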
1. Establish the virtual information source Y of γ = c_1 c_2 … c_49152, c_j = 0 or 1. Mapping Y: let A = {(x_1, …, x_12) | x_i = 0 or 1}, with the elements (x_1, …, x_12) arranged in descending order of the binary number x_1 x_2 … x_12, i.e. in order a_1 = (1, 1, …, 1), a_2 = (1, …, 1, 0), a_3 = (1, …, 1, 0, 1), …, a_4095 = (0, …, 0, 1), a_4096 = (0, …, 0, 0). For the 0/1 string γ = c_1 c_2 … c_49152 of length 49152, write B = {b_j | b_j = (c_{12(j-1)+1}, …, c_{12j}), 1 ≤ j ≤ 4096}; each b_j is a block of twelve 0/1 digits of γ. Define Y as the mapping from A to B:

Y: A → B
Y: a_i → b_i, i = 1, …, 4096, a_i ∈ A, b_i ∈ B

i.e. b_i = Y(a_i), i = 1, …, 4096.
2. Build the neural-network model of the virtual information source Y. Concretely, a 3-layer, 64-neuron BP network (12 input neurons, 40 hidden neurons, 12 output neurons) with Sigmoid hidden-layer activation is used as the approximating mapping for Y. Its parameters are 960 connection weights w_ij^(1), i = 1, …, 12, j = 1, …, 40; w_ij^(2), i = 1, …, 40, j = 1, …, 12, and 64 thresholds (one per neuron) w_j^(k), j = 1, …, 12, k = 1, 2, 3, together with w_j^(2), j = 13, …, 40.
3. BP compression of γ: as above, let γ = c_1 c_2 … c_49152, c_j = 0 or 1, be the string to be compressed, of length 49152, with a_k, b_k, k = 1, …, 4096 and the mapping Y. (i) Train the selected BP network with the ordered samples (a_k, b_k), k = 1, …, 4096; during training, bound the variables by |w_ij^(k)| < 128 and |w_j^(k)| < 128, and keep all weights w_ij^(k) and thresholds w_j^(k) accurate to 2 decimal places. (ii) Select a state of the BP network (called the stable state) in which every component of the output b_k' differs from the desired output b_k by less than 0.49, i.e. |b_ki' − b_ki| < 0.49, i = 1, …, 12. (iii) In the stable state, take in order the 960 connection weights w_ij^(1), i = 1, …, 12, j = 1, …, 40; w_ij^(2), i = 1, …, 40, j = 1, …, 12, and the 64 thresholds w_j^(k) as the encoding of the long string γ. (iv) Binary-encode the 960 connection weights and 64 thresholds in order, using 16 bits (2 bytes) for each w_ij^(k) and w_j^(k): 1 sign bit, 1 decimal-point flag bit, 7 bits for the 2 digits after the decimal point, and 7 bits for the digits before it.
4. BP rounding-function decompression of γ: the decompression model is the composition (outer wrapping) of the mapping realized by the selected BP network with a rounding function. (i) Recover the stable-state weights w_ij^(1), i = 1, …, 12, j = 1, …, 40; w_ij^(2), i = 1, …, 40, j = 1, …, 12, and the stable-state thresholds w_j^(k), restoring the stable state of the BP model. (ii) Treat the recovered network as a fixed mapping Y_1; Y_1 approximates Y with component error less than 0.49. (iii) Compose Y_1 with the rounding function [y_i + 0.5], i = 1, …, 12, obtaining the composite [y_i(X) + 0.5], i = 1, …, 12, X = (x_1, …, x_12). (iv) Feed a_k, k = 1, …, 4096 in order into the composite function [y_i(X) + 0.5], i = 1, …, 12, obtaining b_k, k = 1, …, 4096 in order. (v) From b_k, k = 1, …, 4096, recover the long string γ = c_1 c_2 … c_49152 completely.

Claims (4)

1. A lossless data compression method based on a virtual information source and a neural network, characterized in that its steps are as follows:
a. BP compression
(1). treat all data to be processed as a character string produced by a virtual information source;
(2). build a BP neural-network model of the virtual information source;
(3). encode the string produced by the virtual information source with the BP model parameters;
b. BP decompression
(1). recover the BP model parameters;
(2). recover the model of the virtual information source;
(3). combine the virtual-information-source model with a rounding transform to construct the recovery mapping of the string;
(4). completely recover the string produced by the virtual information source.
2. The lossless data compression method based on a virtual information source and a neural network according to claim 1, characterized in that the virtual information source of the data is a mapping:
Mapping Y: let A = {(x_1, …, x_12) | x_i = 0 or 1}, with the elements (x_1, …, x_12) arranged in descending order of the binary number x_1 x_2 … x_12, i.e. in order a_1 = (1, 1, …, 1), a_2 = (1, …, 1, 0), a_3 = (1, …, 1, 0, 1), …, a_4095 = (0, …, 0, 1), a_4096 = (0, …, 0, 0). Suppose the 0/1 string to be compressed is γ = c_1 c_2 … c_49152, c_j = 0 or 1, of length 49152, and write B = {b_j | b_j = (c_{12(j-1)+1}, …, c_{12j}), 1 ≤ j ≤ 4096}; each b_j is a block of twelve 0/1 digits of γ. Define Y as the mapping from A to B:

Y: A → B
Y: a_i → b_i, i = 1, …, 4096, a_i ∈ A, b_i ∈ B

i.e. b_i = Y(a_i), i = 1, …, 4096.
3. The lossless data compression method based on a virtual information source and a neural network according to claim 1, characterized in that the BP neural-network model is a 3-layer, 64-neuron BP model whose hidden-layer activation function is the Sigmoid function.
4. The lossless data compression method based on a virtual information source and a neural network according to claim 1, characterized in that the data-recovery mapping is the BP model composed with the rounding function [y_i(X) + 0.5].
CN 200410098954 2004-12-17 2004-12-17 Lossless data compression method based on virtual information source and neural network Pending CN1790918A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200410098954 CN1790918A (en) 2004-12-17 2004-12-17 Lossless data compression method based on virtual information source and neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200410098954 CN1790918A (en) 2004-12-17 2004-12-17 Lossless data compression method based on virtual information source and neural network

Publications (1)

Publication Number Publication Date
CN1790918A true CN1790918A (en) 2006-06-21

Family

ID=36788477

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200410098954 Pending CN1790918A (en) 2004-12-17 2004-12-17 Lossless data compression method based on virtual information source and neural network

Country Status (1)

Country Link
CN (1) CN1790918A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101183873B (en) * 2007-12-11 2011-09-28 广州中珩电子科技有限公司 BP neural network based embedded system data compression/decompression method
CN110520909A (en) * 2017-04-17 2019-11-29 微软技术许可有限责任公司 Neural network processor using compression and decompression of activation data to reduce memory bandwidth utilization
US11182667B2 2017-04-17 2021-11-23 Microsoft Technology Licensing, Llc Minimizing memory reads and increasing performance by leveraging aligned blob data in a processing unit of a neural network environment
US11528033B2 2017-04-17 2022-12-13 Microsoft Technology Licensing, Llc Neural network processor using compression and decompression of activation data to reduce memory bandwidth utilization

Similar Documents

Publication Publication Date Title
CN101243611B (en) Efficient coding and decoding of transform blocks
CN103814396B (en) The method and apparatus of coding/decoding bit stream
CN100517979C (en) Data compression and decompression method
CN107481295B (en) Image compression system of convolutional neural network based on dynamic byte length distribution
CN101183873B (en) BP neural network based embedded system data compression/decompression method
CN1316828A (en) Data compaction, transmission, storage and program transmission
CN111312356B (en) Traditional Chinese medicine prescription generation method based on BERT and integration efficacy information
CN1369970A (en) Position adaptive coding method using prefix prediction
US7583849B2 (en) Lossless image compression with tree coding of magnitude levels
CN110059822A (en) One kind compressing quantization method based on channel packet low bit neural network parameter
US7302106B2 (en) System and method for ink or handwriting compression
CN1405735A (en) Colour-picture damage-free compression method based on perceptron
CN111276187B (en) Gene expression profile feature learning method based on self-encoder
CN1112674C (en) Predictive split-matrix quantization of spectral parameters for efficient coding of speech
CN1628466A (en) Context-sensitive encoding and decoding of a video data stream
CN1186766C (en) Bidirectional pitch enhancement in speech coding systems
KR100511719B1 (en) 3-dimension normal mesh data compression apparatus by using rate-distortion optimization
US6606416B1 (en) Encoding method and apparatus for representing a digital image
CN1790918A (en) Lossless data compression method based on virtual information source and neural network
CN111343458B (en) Sparse gray image coding and decoding method and system based on reconstructed residual
CN1140996C (en) Image compression method using wavelet transform
CN113949880B (en) Extremely-low-bit-rate man-machine collaborative image coding training method and coding and decoding method
CN101754021B (en) Method for realizing mobile phone mobile portal technology based on improved wavelet-transform image compression method
CN101094402A (en) Method for encoding image based on neural network and SVM
Apostolico et al. Compression and the wheel of fortune

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication