CN1790918A - Lossless data compression method based on virtual information source and neural network - Google Patents

Lossless data compression method based on virtual information source and neural network

Info

Publication number
CN1790918A
CN1790918A (Application CN200410098954)
Authority
CN
China
Prior art keywords
information source
virtual information
model
character string
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200410098954
Other languages
Chinese (zh)
Inventor
杨国为 (Yang Guowei)
涂序彦 (Tu Xuyan)
王守觉 (Wang Shoujue)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Semiconductors of CAS
Original Assignee
Institute of Semiconductors of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Semiconductors of CAS filed Critical Institute of Semiconductors of CAS
Priority to CN 200410098954 priority Critical patent/CN1790918A/en
Publication of CN1790918A publication Critical patent/CN1790918A/en
Pending legal-status Critical Current


Abstract

The invention provides a lossless data compression method based on a virtual information source (VIS) and a neural network (NN), in the field of computer technology. The method comprises: a. BP compression: (1) treat all data to be processed as a character string produced by the VIS; (2) build a BP NN model of the VIS; (3) encode the string produced by the VIS with the BP model parameters; b. BP decompression: (1) recover the BP model parameters; (2) recover the model of the VIS; (3) combine the VIS model with a rounding transform to construct the recovery mapping of the string; (4) completely recover the string produced by the VIS. The invention has wide application.

Description

Lossless data compression method based on virtual information source and neural network
Technical field
The invention belongs to the field of computer technology and provides a lossless data compression method based on a virtual information source and a neural network.
Background art
With the wide adoption of optical networking, compact-disc devices, and other high-capacity storage media, channel bandwidth and storage capacity keep growing, but the growth of data volume is far faster than the growth of bandwidth and storage. Efficient data compression therefore remains indispensable. In particular, efficient lossless compression is still necessary in applications with strict data-quality requirements, such as computer system files, medical data, and secure communication.
No highly efficient lossless compression method has existed so far. Although many lossless compression methods are in use, such as Huffman coding, arithmetic coding, dictionary coding, run-length coding, and predictive coding, they share three defects:
1. The compression idea of these lossless methods is solely to eliminate statistical redundancy in the data; no other kind of redundancy is removed. In fact, much real data has large non-statistical redundancy. For example, the set of points on an oblique line segment in the plane is produced by a "virtual information source" of the form y = kx + y_0, x ∈ (a, b). This point set has almost no "statistical" redundancy, yet it can be encoded by {k, y_0, a, b} and thereby compressed by a factor of hundreds of thousands, which shows that it has very high non-statistical redundancy.
2. Their compression ratio is small. For typical images and text it is only about 2:1, while the compressed images and text clearly still contain high redundancy.
3. They cannot recompress data that has already been entropy-coded. All entropy coders, such as Huffman coding, arithmetic coding, dictionary coding, run-length coding, and predictive coding, are built on the idea of "eliminating statistical redundancy in the data". Data already compressed by an entropy coder therefore retains almost no statistical redundancy and cannot be compressed further by these methods.
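The line-segment example in defect 1 can be sketched as follows. This is an illustrative sketch, not part of the patent: the function names, the sampling scheme, and the concrete parameter values are all assumptions; it only demonstrates that the entire point set is recoverable from the four numbers {k, y_0, a, b}.

```python
# Illustrative sketch: points on an oblique segment y = k*x + y0, x in (a, b),
# have essentially no statistical redundancy, yet the whole set collapses
# to just four parameters -- high NON-statistical redundancy.

def generate_segment(k, y0, a, b, n):
    """Sample n points on the segment: the 'data' to be compressed."""
    step = (b - a) / n
    return [(a + i * step, k * (a + i * step) + y0) for i in range(n)]

def encode_segment(k, y0, a, b):
    """The entire point set is encoded by only four numbers."""
    return (k, y0, a, b)

def decode_segment(code, n):
    """Regenerate the full point set from the four-number code."""
    k, y0, a, b = code
    return generate_segment(k, y0, a, b, n)

points = generate_segment(2.0, 1.0, 0.0, 10.0, 100000)
code = encode_segment(2.0, 1.0, 0.0, 10.0)
assert decode_segment(code, 100000) == points  # lossless recovery
```

A hundred thousand points are recovered exactly from four numbers, which no entropy coder operating on the point list could achieve.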
Attempts have been made to construct lossless compression methods from highly fault-tolerant tools such as neural networks, but without success: all existing neural-network-based data compression methods are lossy.
Based on this analysis of traditional lossless compression, the present invention proposes a lossless data compression method based on a virtual information source and a neural network. Instead of following the single traditional path of "only eliminating statistical redundancy in the data", it proposes a new model-based compression idea, "modeling with a virtual information source and a neural network", and improves the lossless compression ratio.
Summary of the invention
The present invention proposes a lossless data compression method based on a virtual information source and a neural network. It departs in its essential idea from the "only eliminate statistical redundancy" approach, proposing instead a model-based compression method built on a virtual information source and a neural network, which improves the lossless compression ratio. Moreover, this method can compress data that has already been entropy-coded (from which only part of the statistical redundancy has been removed), so combining it with existing entropy coders yields a lossless compression method with an even higher ratio. The method is realized with an artificial neural network, and is the first lossless data compression scheme, at home or abroad, to be realized with a neural network.
The invention first proposes the concept of a virtual information source and the idea of constructing a data compression method from a virtual information source and an artificial neural network. A general virtual information source Y for long strings of 0s and 1s is established, a neural-network model of Y is then built, and from this model together with a rounding function a lossless data compression method based on the virtual information source and the neural network is constructed. Experiments show that the compression ratio of this method is generally 3:1. Its steps are as follows:
a. BP compression
1. Treat all data to be processed as a character string produced by a virtual information source;
2. Build a BP neural-network model of the virtual information source;
3. Encode the string produced by the virtual information source with the BP model parameters (whose data volume is smaller than that of the original string).
b. BP decompression
1. Recover the BP model parameters;
2. Recover the model of the virtual information source;
3. Combine the virtual-information-source model with a rounding transform to construct the recovery mapping of the string;
4. Completely recover the string produced by the virtual information source.
Description of drawings
Fig. 1 is a flow chart of the compression and decompression principles of the invention.
Embodiment
The invention provides a lossless data compression method based on a virtual information source and a neural network; its flow chart is shown in Figure 1. The implementation steps are as follows:
1. Establish the virtual information source of the raw data to be processed. Let the long string to be compressed be γ = c_1 c_2 … c_49152, c_j = 0 or 1, so that γ has length 49152. Take as preimage set the vertices of the 12-dimensional unit hypercube, and define on them the mapping Y below. Y is called the virtual information source of γ = c_1 c_2 … c_49152.

Mapping Y: let A = {(x_1, …, x_12) | x_i = 0 or 1}, with the elements (x_1, …, x_12) arranged in descending order of the binary number x_1 x_2 … x_12, i.e. in order a_1 = (1, 1, …, 1), a_2 = (1, …, 1, 0), a_3 = (1, …, 1, 0, 1), …, a_4095 = (0, …, 0, 1), a_4096 = (0, …, 0, 0). Given the 0/1 string γ = c_1 c_2 … c_49152 of length 49152, write B = {b_j | b_j = (c_{12(j-1)+1}, …, c_{12j}), 1 ≤ j ≤ 4096}; each b_j is a block of twelve 0/1 digits of γ. Define Y as the mapping from A to B:

Y: A → B
Y: a_i → b_i, i = 1, …, 4096, a_i ∈ A, b_i ∈ B

i.e. b_i = Y(a_i), i = 1, …, 4096.
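The construction of A, B, and Y above can be sketched in a few lines. The patent specifies no implementation; this Python rendering, including the example bit string, is an illustrative assumption.

```python
# Build A: the 4096 binary 12-tuples, ordered by descending value of the
# binary number x1 x2 ... x12, so A[0] = (1,...,1) and A[-1] = (0,...,0).
A = [tuple((v >> (11 - i)) & 1 for i in range(12)) for v in range(4095, -1, -1)]
assert A[0] == (1,) * 12 and A[1] == (1,) * 11 + (0,) and A[-1] == (0,) * 12

# Build B from a 49152-bit string gamma: b_j = (c_{12(j-1)+1}, ..., c_{12j}).
gamma = "011011001010" * 4096          # any 0/1 string of length 49152
B = [tuple(int(c) for c in gamma[12 * j:12 * (j + 1)]) for j in range(4096)]

# The virtual information source Y maps a_i -> b_i.
Y = dict(zip(A, B))
assert Y[A[0]] == B[0] and len(Y) == 4096
```

Since A is a fixed ordered set, the mapping Y carries exactly the information content of γ itself, which is what lets a model of Y serve as a code for γ.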
2. Build the neural-network model of the virtual information source. Since the virtual information source Y is a bounded mapping from the finite point set A to the finite point set B, it can be extended to a bounded continuous mapping on the cube A' = {(x_1, …, x_12) | 0 ≤ x_i ≤ 1, i = 1, …, 12}. By known approximation results, there exists a 3-layer BP network that approximates Y to arbitrary precision; such a network naturally approximates Y well enough for our purposes.
Based on practical experience with the 4096 samples to be handled, a 3-layer BP model with 960 + 64 = 1024 degrees of freedom is chosen to approximate Y. A network state satisfying the conditions below is selected as the model of the virtual information source Y, and the model parameters encode the string γ = c_1 c_2 … c_49152, c_j = 0 or 1, thereby compressing the long string γ. Concretely, the invention uses a 3-layer, 64-neuron BP network (12 input neurons, 40 hidden neurons, 12 output neurons) with Sigmoid hidden-layer activation as the approximating mapping for Y. Its parameters are 960 connection weights w_ij^(1), i = 1, …, 12, j = 1, …, 40; w_ij^(2), i = 1, …, 40, j = 1, …, 12, and 64 thresholds (one per neuron) w_j^(k), j = 1, …, 12, k = 1, 2, 3, together with w_j^(2), j = 13, …, 40.
3. BP compression: as above, let γ = c_1 c_2 … c_49152, c_j = 0 or 1, be the string to be compressed, of length 49152, with a_k, b_k, k = 1, …, 4096 and the mapping Y. (i) Train the selected BP network with the ordered samples (a_k, b_k), k = 1, …, 4096; during training, bound the variables by |w_ij^(k)| < 128 and |w_j^(k)| < 128, and keep all weights w_ij^(k) and thresholds w_j^(k) accurate to 2 decimal places. (ii) Select a state of the BP network (called the stable state) in which every component of the output b_k' differs from the desired output b_k by less than 0.49, i.e. |b_ki' − b_ki| < 0.49, i = 1, …, 12. (iii) In the stable state, take in order the 960 connection weights w_ij^(1), i = 1, …, 12, j = 1, …, 40; w_ij^(2), i = 1, …, 40, j = 1, …, 12, and the 64 thresholds w_j^(k) as the encoding of the long string γ. (iv) Binary-encode the 960 connection weights and 64 thresholds in order, using 16 bits (2 bytes) for each w_ij^(k) and w_j^(k): 1 sign bit, 1 decimal-point flag bit, 7 bits for the 2 digits after the decimal point, and 7 bits for the digits before it. This encoding is exact whenever |w_ij^(k)| < 128, |w_j^(k)| < 128, and the values are accurate to 2 decimal places; although training introduces approximation error, γ can still be recovered completely by the decompression method below.
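The 16-bit encoding of step (iv) can be sketched as follows. The bit widths (1 sign bit, 1 flag bit, 7 + 7 magnitude bits) are from the text, but the ordering of the fields within the 2 bytes, and treating the flag bit as constant, are assumptions for illustration.

```python
def encode_weight(w):
    """Pack a weight |w| < 128, accurate to 2 decimal places, into 16 bits:
    1 sign bit, 1 decimal-point flag bit (fixed to 0 here -- an assumption),
    7 bits for the integer part, 7 bits for the two fractional digits."""
    assert abs(w) < 128
    sign = 1 if w < 0 else 0
    hundredths = round(abs(w) * 100)
    int_part, frac_part = divmod(hundredths, 100)   # each fits in 7 bits
    return (sign << 15) | (int_part << 7) | frac_part

def decode_weight(code):
    """Invert encode_weight: recover the signed two-decimal value."""
    sign = -1 if (code >> 15) & 1 else 1
    int_part = (code >> 7) & 0x7F
    frac_part = code & 0x7F
    return sign * (int_part + frac_part / 100)

assert decode_weight(encode_weight(-37.25)) == -37.25
assert decode_weight(encode_weight(0.49)) == 0.49
```

With 2 bytes per parameter, the 960 + 64 = 1024 parameters occupy exactly 2048 bytes, the figure used in the compression-ratio computation below.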
From the above principle, if steps (i) and (ii) can be carried out successfully, the compression ratio of this method is
(6 × 1024 × 8) / ((960 + 64) × 16) = 49152 / 16384 = 3 : 1
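The arithmetic of the ratio can be checked directly: the source string is 6 × 1024 × 8 = 49152 bits, while the code is 960 + 64 = 1024 parameters at 16 bits each.

```python
original_bits = 6 * 1024 * 8           # |gamma| = 49152 bits
compressed_bits = (960 + 64) * 16      # 1024 parameters, 16 bits each
assert original_bits == 49152
assert compressed_bits == 16384
assert original_bits / compressed_bits == 3.0   # the claimed 3 : 1 ratio
```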
4. BP rounding-function decompression: the decompression model is the composition (outer wrapping) of the mapping realized by the selected BP network with a rounding function. (i) Recover the stable-state weights w_ij^(1), i = 1, …, 12, j = 1, …, 40; w_ij^(2), i = 1, …, 40, j = 1, …, 12, and the stable-state thresholds w_j^(k), restoring the stable state of the BP model. (ii) Treat the recovered network as a fixed mapping Y_1; Y_1 approximates Y with component error less than 0.49 (note: it differs from the theoretical degree of approximation by at most 0.01; see the rigorous mathematical proof in the attached sheet). (iii) Compose Y_1 with the rounding function [y_i + 0.5], i = 1, …, 12, obtaining the composite [y_i(X) + 0.5], i = 1, …, 12, X = (x_1, …, x_12). (iv) Feed a_k, k = 1, …, 4096 in order into the composite function [y_i(X) + 0.5], i = 1, …, 12, obtaining b_k, k = 1, …, 4096 in order. (v) From b_k, k = 1, …, 4096, recover the long string γ = c_1 c_2 … c_49152 completely.
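The rounding step works because every network output lies within 0.49 of the true bit, so the bracket function [y + 0.5], i.e. floor(y + 0.5), restores the bit exactly. A sketch, using an illustrative stand-in for the recovered network Y_1 (the stand-in and its names are assumptions, not the patent's trained network):

```python
import math

def round_bit(y):
    """The bracket function [y + 0.5]: exact bit recovery whenever
    |y - bit| < 0.49 (the stable-state condition)."""
    return math.floor(y + 0.5)

# Any output within 0.49 of the true bit rounds back to it exactly.
assert round_bit(0.51) == 1 and round_bit(1.48) == 1
assert round_bit(-0.2) == 0 and round_bit(0.48) == 0

def decompress_block(network, a):
    """Step (iii)/(iv): compose the recovered mapping Y1 with rounding."""
    return tuple(round_bit(y) for y in network(a))

# Stand-in Y1 that perturbs each true bit by less than 0.49 per component:
true_block = (1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1)
noisy = lambda a: [b + (-0.3 if b else 0.4) for b in true_block]
assert decompress_block(noisy, None) == true_block
```

This is exactly why training only has to reach the 0.49 tolerance of step (ii) rather than exact outputs: the rounding absorbs the approximation error losslessly.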
Embodiment:
Take any long string γ = c_1 c_2 … c_49152 to be compressed, c_j = 0 or 1, of length 49152 (if the string is longer than 49152, split it into segments of length 49152; if it is shorter than 49152, pad the end with 0s until its length is 49152).
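The segmentation-and-padding rule can be sketched as follows (the function name and string representation are illustrative assumptions):

```python
def segment(bits, block_len=49152):
    """Split a 0/1 string into blocks of block_len characters,
    zero-padding the end of the last block, per the embodiment's rule."""
    if len(bits) % block_len:
        bits = bits + "0" * (block_len - len(bits) % block_len)
    return [bits[i:i + block_len] for i in range(0, len(bits), block_len)]

segs = segment("1" * 50000)
assert len(segs) == 2
assert len(segs[0]) == len(segs[1]) == 49152
assert segs[1].endswith("0" * 48304)   # 2 * 49152 - 50000 padding zeros
```

Note that lossless recovery of the original string also requires the original length to be stored alongside the code, so that the padding zeros can be stripped; the patent text leaves this bookkeeping implicit.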
1. Establish the virtual information source Y of γ = c_1 c_2 … c_49152, c_j = 0 or 1. Mapping Y: let A = {(x_1, …, x_12) | x_i = 0 or 1}, with the elements (x_1, …, x_12) arranged in descending order of the binary number x_1 x_2 … x_12, i.e. in order a_1 = (1, 1, …, 1), a_2 = (1, …, 1, 0), a_3 = (1, …, 1, 0, 1), …, a_4095 = (0, …, 0, 1), a_4096 = (0, …, 0, 0). For the 0/1 string γ = c_1 c_2 … c_49152 of length 49152, write B = {b_j | b_j = (c_{12(j-1)+1}, …, c_{12j}), 1 ≤ j ≤ 4096}; each b_j is a block of twelve 0/1 digits of γ. Define Y as the mapping from A to B:

Y: A → B
Y: a_i → b_i, i = 1, …, 4096, a_i ∈ A, b_i ∈ B

i.e. b_i = Y(a_i), i = 1, …, 4096.
2. Build the neural-network model of the virtual information source Y. Concretely, a 3-layer, 64-neuron BP network (12 input neurons, 40 hidden neurons, 12 output neurons) with Sigmoid hidden-layer activation is used as the approximating mapping for Y. Its parameters are 960 connection weights w_ij^(1), i = 1, …, 12, j = 1, …, 40; w_ij^(2), i = 1, …, 40, j = 1, …, 12, and 64 thresholds (one per neuron) w_j^(k), j = 1, …, 12, k = 1, 2, 3, together with w_j^(2), j = 13, …, 40.
3. BP compression of γ: as above, let γ = c_1 c_2 … c_49152, c_j = 0 or 1, be the string to be compressed, of length 49152, with a_k, b_k, k = 1, …, 4096 and the mapping Y. (i) Train the selected BP network with the ordered samples (a_k, b_k), k = 1, …, 4096; during training, bound the variables by |w_ij^(k)| < 128 and |w_j^(k)| < 128, and keep all weights w_ij^(k) and thresholds w_j^(k) accurate to 2 decimal places. (ii) Select a state of the BP network (called the stable state) in which every component of the output b_k' differs from the desired output b_k by less than 0.49, i.e. |b_ki' − b_ki| < 0.49, i = 1, …, 12. (iii) In the stable state, take in order the 960 connection weights w_ij^(1), i = 1, …, 12, j = 1, …, 40; w_ij^(2), i = 1, …, 40, j = 1, …, 12, and the 64 thresholds w_j^(k) as the encoding of the long string γ. (iv) Binary-encode the 960 connection weights and 64 thresholds in order, using 16 bits (2 bytes) for each w_ij^(k) and w_j^(k): 1 sign bit, 1 decimal-point flag bit, 7 bits for the 2 digits after the decimal point, and 7 bits for the digits before it.
4. BP rounding-function decompression of γ: the decompression model is the composition (outer wrapping) of the mapping realized by the selected BP network with a rounding function. (i) Recover the stable-state weights w_ij^(1), i = 1, …, 12, j = 1, …, 40; w_ij^(2), i = 1, …, 40, j = 1, …, 12, and the stable-state thresholds w_j^(k), restoring the stable state of the BP model. (ii) Treat the recovered network as a fixed mapping Y_1; Y_1 approximates Y with component error less than 0.49. (iii) Compose Y_1 with the rounding function [y_i + 0.5], i = 1, …, 12, obtaining the composite [y_i(X) + 0.5], i = 1, …, 12, X = (x_1, …, x_12). (iv) Feed a_k, k = 1, …, 4096 in order into the composite function [y_i(X) + 0.5], i = 1, …, 12, obtaining b_k, k = 1, …, 4096 in order. (v) From b_k, k = 1, …, 4096, recover the long string γ = c_1 c_2 … c_49152 completely.

Claims (4)

1. A lossless data compression method based on a virtual information source and a neural network, characterized in that its steps are as follows:
a. BP compression
(1). treat all data to be processed as a character string produced by a virtual information source;
(2). build a BP neural-network model of the virtual information source;
(3). encode the string produced by the virtual information source with the BP model parameters;
b. BP decompression
(1). recover the BP model parameters;
(2). recover the model of the virtual information source;
(3). combine the virtual-information-source model with a rounding transform to construct the recovery mapping of the string;
(4). completely recover the string produced by the virtual information source.
2. The lossless data compression method based on a virtual information source and a neural network according to claim 1, characterized in that the virtual information source of the data is a mapping:
Mapping Y: let A = {(x_1, …, x_12) | x_i = 0 or 1}, with the elements (x_1, …, x_12) arranged in descending order of the binary number x_1 x_2 … x_12, i.e. in order a_1 = (1, 1, …, 1), a_2 = (1, …, 1, 0), a_3 = (1, …, 1, 0, 1), …, a_4095 = (0, …, 0, 1), a_4096 = (0, …, 0, 0). Suppose the 0/1 string to be compressed is γ = c_1 c_2 … c_49152, c_j = 0 or 1, of length 49152, and write B = {b_j | b_j = (c_{12(j-1)+1}, …, c_{12j}), 1 ≤ j ≤ 4096}; each b_j is a block of twelve 0/1 digits of γ. Define Y as the mapping from A to B:

Y: A → B
Y: a_i → b_i, i = 1, …, 4096, a_i ∈ A, b_i ∈ B

i.e. b_i = Y(a_i), i = 1, …, 4096.
3. The lossless data compression method based on a virtual information source and a neural network according to claim 1, characterized in that the BP neural-network model is a 3-layer, 64-neuron BP model whose hidden-layer activation function is the Sigmoid function.
4. The lossless data compression method based on a virtual information source and a neural network according to claim 1, characterized in that the data-recovery mapping is the BP model composed with the rounding function [y_i(X) + 0.5].
CN 200410098954 2004-12-17 2004-12-17 Lossless data compression method based on virtual information source and neural network Pending CN1790918A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200410098954 CN1790918A (en) 2004-12-17 2004-12-17 Lossless data compression method based on virtual information source and neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200410098954 CN1790918A (en) 2004-12-17 2004-12-17 Lossless data compression method based on virtual information source and neural network

Publications (1)

Publication Number Publication Date
CN1790918A true CN1790918A (en) 2006-06-21

Family

ID=36788477

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200410098954 Pending CN1790918A (en) 2004-12-17 2004-12-17 Lossless data compression method based on virtual information source and neural network

Country Status (1)

Country Link
CN (1) CN1790918A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101183873B (en) * 2007-12-11 2011-09-28 广州中珩电子科技有限公司 BP neural network based embedded system data compression/decompression method
CN110520909A (en) * 2017-04-17 2019-11-29 微软技术许可有限责任公司 Neural network processor using compression and decompression of activation data to reduce memory bandwidth utilization
US11182667B2 2017-04-17 2021-11-23 Microsoft Technology Licensing, Llc Minimizing memory reads and increasing performance by leveraging aligned blob data in a processing unit of a neural network environment
US11528033B2 2017-04-17 2022-12-13 Microsoft Technology Licensing, Llc Neural network processor using compression and decompression of activation data to reduce memory bandwidth utilization

Similar Documents

Publication Publication Date Title
CN101243611B (en) Efficient coding and decoding of transform blocks
CN103814396B (en) The method and apparatus of coding/decoding bit stream
CN100517979C (en) Data compression and decompression method
CN107481295B (en) Image compression system of convolutional neural network based on dynamic byte length distribution
CN101183873B (en) BP neural network based embedded system data compression/decompression method
CN1316828A (en) Data compaction, transmission, storage and program transmission
CN111312356B (en) Traditional Chinese medicine prescription generation method based on BERT and integration efficacy information
CN1369970A (en) Position adaptive coding method using prefix prediction
US7583849B2 (en) Lossless image compression with tree coding of magnitude levels
CN110059822A (en) One kind compressing quantization method based on channel packet low bit neural network parameter
US7302106B2 (en) System and method for ink or handwriting compression
CN1405735A (en) Colour-picture damage-free compression method based on perceptron
CN111276187B (en) Gene expression profile feature learning method based on self-encoder
CN1112674C (en) Predictive split-matrix quantization of spectral parameters for efficient coding of speech
CN1628466A (en) Context-sensitive encoding and decoding of a video data stream
CN1186766C (en) Bidirectional pitch enhancement in speech coding systems
KR100511719B1 (en) 3-dimension normal mesh data compression apparatus by using rate-distortion optimization
US6606416B1 (en) Encoding method and apparatus for representing a digital image
CN1790918A (en) Lossless data compression method based on virtual information source and neural network
CN111343458B (en) Sparse gray image coding and decoding method and system based on reconstructed residual
CN1140996C (en) Image compression method using wavelet transform
CN113949880B (en) Extremely-low-bit-rate man-machine collaborative image coding training method and coding and decoding method
CN101754021B (en) Method for realizing mobile phone mobile portal technology based on improved wavelet-transform image compression method
CN101094402A (en) Method for encoding image based on neural network and SVM
Apostolico et al. Compression and the wheel of fortune

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication