CN112667583A - Method for recovering damaged ZIP compressed file - Google Patents
- Publication number
- CN112667583A
- Authority
- CN
- China
- Prior art keywords
- stream
- executing
- bit
- data
- dist
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a method for recovering a damaged ZIP compressed file, comprising the following steps: S100: constructing a first Huffman code table; S200: constructing a second Huffman code table; S300: constructing a third Huffman code table; S400: judging whether the decoding position is the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step S500; S500: acquiring decoded data; S600: judging whether the data was decoded successfully; if so, executing step S800, otherwise executing step S700; S700: moving the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400; S800: judging whether the decoded data is an end mark; if so, executing step SA00, otherwise executing step S900; S900: moving the LIT encoded stream/DIST encoded stream from the low bit toward the high bit by N bits; SA00: judging whether the data is located at the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step SB00; SB00: moving the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400; SC00: acquiring a final decoded stream.
Description
Technical Field
The invention belongs to the field of data recovery and electronic evidence collection, and relates to a method for recovering a damaged ZIP compressed file.
Background
The ZIP file format is a file format for data compression and document storage, and is one of several mainstream compression formats. The Microsoft Windows operating system provides built-in support for the ZIP format, so compressed files in this format can be opened and created even if no decompression software is installed on the user's computer. For this reason the format is commonly used for file transmission and storage across many industries. A compressed file must be decompressed before its contents can be processed. In practice, the most common problem is that a ZIP compressed file is damaged and cannot be decompressed or opened, so that the data is lost. A method for recovering damaged ZIP compressed files is therefore very important.
In general, the main part of the Deflate compressed data stream of a ZIP compressed file is the LIT encoded stream/DIST encoded stream. The more data is compressed, the larger the share of the stream occupied by the LIT/DIST encoded streams; in the limit, this share can approach 99.9%. Damage to the LIT encoded stream/DIST encoded stream is therefore the more common case.
The problems of the prior art are as follows: when the main part of the data storage (i.e., the LIT encoded stream/DIST encoded stream) is damaged, existing recovery and decompression methods for damaged ZIP compressed files often fail to decompress the original data, or can decompress only the first intact section of the data. The recovery ratio is therefore low, data is lost, and data recovery and electronic evidence collection may fail altogether.
Disclosure of Invention
To address the defects of the prior art, the invention provides a method for recovering a damaged ZIP compressed file, which recovers the damaged file by constructing three Huffman code tables and decoding with them, and which comprises the following steps:
S100: constructing a first Huffman code table according to a first code length sequence in the Deflate compressed data stream;
S200: for the SQ1 encoded stream: constructing a second Huffman code table, and executing step S400;
S300: for the SQ2 encoded stream: constructing a third Huffman code table, and executing step S400;
S400: judging whether the current decoding position is the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step S500;
S500: taking data from the LIT encoded stream/DIST encoded stream bit by bit as a code word, and decoding it according to the second Huffman code table and the third Huffman code table to obtain decoded data;
S600: judging whether the data was decoded successfully; if so, executing step S800, otherwise executing step S700;
S700: moving the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
S800: judging whether the current decoded data is an end mark; if so, executing step SA00, otherwise executing step S900;
S900: writing the current decoded data into an intermediate decoded stream, and moving the LIT encoded stream/DIST encoded stream from the low bit toward the high bit by N bits, where N is the number of bits in the code word;
SA00: judging whether the code word is located at the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step SB00;
SB00: moving the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
SC00: decoding the intermediate decoded stream with a lossless compression algorithm, acquiring a final decoded stream, and confirming and checking the decompressed data.
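The control flow of steps S400 to SB00 can be sketched as a resynchronizing loop. The following is a minimal illustration only, not the patented implementation: it assumes a single code table (rather than separate second and third tables), represents the stream as a string of '0'/'1' characters in stream order, and uses the hypothetical names `tolerant_decode`, `table`, and `end_symbol`.

```python
def tolerant_decode(bits, table, end_symbol):
    """Decode a '0'/'1' string, skipping one bit whenever no code word matches.

    table maps code word strings to symbols (an assumed representation).
    """
    max_len = max(len(code) for code in table)
    out = []
    pos = 0
    while pos < len(bits):                      # S400: stop at the end of the stream
        match = None
        for n in range(1, max_len + 1):         # S500: grow the code word bit by bit
            if pos + n > len(bits):
                break
            candidate = bits[pos:pos + n]
            if candidate in table:
                match = (table[candidate], n)
                break
        if match is None:                       # S600: decoding failed
            pos += 1                            # S700: move forward by one bit
            continue
        symbol, n = match
        if symbol == end_symbol:                # S800: end mark found
            if pos + n >= len(bits):            # SA00: code word sits at the stream end
                break
            pos += 1                            # SB00: move forward by one bit
            continue
        out.append(symbol)                      # S900: write to the intermediate stream
        pos += n                                # S900: move forward by N bits
    return out


# "11" is not a code word in this toy table, so the loop has to resynchronize.
toy_table = {"00": "a", "01": "b", "10": "END"}
print(tolerant_decode("11000110", toy_table, "END"))
```

Because resynchronization is heuristic, a shifted position can match a code word that was never written by the compressor, so the recovered symbols after a damaged region are best treated as candidates to be checked in step SC00.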
Preferably, the step S100 includes the steps of:
Constructing a first Huffman code table according to the data structure and the first code length sequence of the Deflate compressed data stream, wherein the data structure of the Deflate compressed data stream is shown in Table 1 and comprises the first code length sequence, an SQ1 encoded stream, an SQ2 encoded stream and an LIT encoded stream/DIST encoded stream.
Table 1: data structure of Deflate compressed data stream
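The text does not spell out how a code table is derived from a code length sequence. One standard construction, defined for Deflate in RFC 1951 section 3.2.2, is the canonical Huffman code. The sketch below assumes that construction applies here; the function name `build_code_table` and the choice to store code words as bit strings are illustrative assumptions, and the permuted storage order that Deflate uses for the first code length sequence is not handled.

```python
from collections import Counter


def build_code_table(code_lengths):
    """Map code word bit strings (most significant bit first) to symbol numbers."""
    max_len = max(code_lengths, default=0)
    # Count how many symbols use each code length; length 0 means "unused".
    bl_count = Counter(length for length in code_lengths if length > 0)
    # Compute the numerically smallest code for each code length.
    next_code = [0] * (max_len + 2)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count.get(bits - 1, 0)) << 1
        next_code[bits] = code
    # Assign consecutive codes to symbols of the same length, in symbol order.
    table = {}
    for symbol, length in enumerate(code_lengths):
        if length == 0:
            continue
        table[format(next_code[length], "0{}b".format(length))] = symbol
        next_code[length] += 1
    return table


# Code lengths from the worked example in RFC 1951 (symbols A..H as 0..7).
print(build_code_table([3, 3, 3, 3, 3, 2, 4, 4]))
```

The same routine would serve for the second and third code tables in steps S203 and S303, since only the input code length sequence differs.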
Preferably, the step S200 includes the steps of:
S201: decoding the SQ1 encoded stream according to the first Huffman code table to acquire an SQ1 sequence;
S202: run-length decoding the SQ1 sequence to obtain a second code length sequence;
S203: constructing a second Huffman code table according to the second code length sequence, and executing step S600.
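The run-length decoding in step S202 is not detailed in the text. A minimal sketch follows, assuming the SQ1 sequence uses the code-length alphabet of RFC 1951 section 3.2.7 and that each decoded symbol has already been paired with its extra-bits value; the function name `expand_code_lengths` and the `(symbol, extra)` pair representation are assumptions made for illustration.

```python
def expand_code_lengths(symbols):
    """Expand (symbol, extra) pairs into a flat list of code lengths.

    Assumed convention (RFC 1951 section 3.2.7):
      0-15: a literal code length
      16:   repeat the previous length 3 + extra times
      17:   emit 3 + extra zero lengths
      18:   emit 11 + extra zero lengths
    """
    lengths = []
    for symbol, extra in symbols:
        if 0 <= symbol <= 15:
            lengths.append(symbol)
        elif symbol == 16:
            if not lengths:
                raise ValueError("repeat symbol 16 with no previous length")
            lengths.extend([lengths[-1]] * (3 + extra))
        elif symbol == 17:
            lengths.extend([0] * (3 + extra))
        elif symbol == 18:
            lengths.extend([0] * (11 + extra))
        else:
            raise ValueError("unknown code-length symbol: %d" % symbol)
    return lengths


# One length of 8, repeated 4 more times, then 3 zeros, then 13 zeros.
print(expand_code_lengths([(8, 0), (16, 1), (17, 0), (18, 2)]))
```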
Preferably, the step S300 includes the steps of:
S301: run-length decoding the SQ2 encoded stream according to the first Huffman code table to acquire an SQ2 sequence;
S302: run-length decoding the SQ2 sequence to obtain a third code length sequence;
S303: constructing a third Huffman code table according to the third code length sequence, and executing step S600.
Preferably, in step S400, a decoding result of -1 for the encoded stream indicates that the end of the LIT/DIST encoded stream has been reached.
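One plausible reading of the -1 convention is a bit reader that returns -1 once the stream is exhausted. The sketch below illustrates that reading under stated assumptions: the class name `BitReader` is hypothetical, and bits are taken from each byte starting at the least significant bit, which matches how Deflate packs its bit stream.

```python
class BitReader:
    """Reads single bits from a bytes object, low-order bit of each byte first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # absolute bit position in the stream

    def read_bit(self):
        # Assumed sentinel: return -1 once every bit has been consumed.
        if self.pos >= 8 * len(self.data):
            return -1
        byte = self.data[self.pos // 8]
        bit = (byte >> (self.pos % 8)) & 1
        self.pos += 1
        return bit


reader = BitReader(bytes([0b00000101]))
print([reader.read_bit() for _ in range(9)])
# [1, 0, 1, 0, 0, 0, 0, 0, -1]
```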
Preferably, the step S500 includes the steps of:
S501: acquiring the content of 1 bit, proceeding from the low-order bit to the high-order bit of the LIT encoded stream/DIST encoded stream, and appending it to the end of the code word;
S502: judging whether the type of the previously decoded data is length; if so, executing step S503, otherwise executing step S506;
S503: searching for the code word in the third Huffman code table;
S504: judging whether the code word is found in the third Huffman code table; if so, executing step S505, otherwise executing step S509;
S505: copying the corresponding decoded data, marking the type of the decoded data as distance, increasing the number of acquired bits by 1, and executing step S501;
S506: searching for the code word in the second Huffman code table;
S507: judging whether the code word is found in the second Huffman code table; if so, executing step S508, otherwise executing step S509;
S508: copying the corresponding decoded data, marking the type of the decoded data as literal or length, increasing the number of acquired bits by 1, and executing step S501; if so, executing step S600, otherwise, executing step S509;
S509: judging whether the current decoded data is empty; if so, executing step S700, otherwise executing step S600.
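The table selection and bit-by-bit lookup in steps S501 to S509 can be sketched as follows. This is a simplified illustration, not the claimed procedure: it returns on the first matching code word, ignores Deflate's extra bits, and assumes tables that map '0'/'1' strings to Deflate symbol numbers (0-255 literal, 256 end of block, 257 and above length). The names `decode_one`, `lit_table`, `dist_table`, and `prev_type` are hypothetical.

```python
def decode_one(bits, pos, lit_table, dist_table, prev_type):
    """Try to decode one symbol starting at bit offset pos.

    Returns (symbol, kind, n_bits) on success, or None if no code word matches.
    """
    # S502: a length symbol must be followed by a distance symbol.
    table = dist_table if prev_type == "length" else lit_table
    if not table:
        return None
    max_len = max(len(code) for code in table)
    codeword = ""
    while pos + len(codeword) < len(bits) and len(codeword) < max_len:
        codeword += bits[pos + len(codeword)]   # S501: append one more bit
        if codeword in table:                   # S503/S506: look up the code word
            symbol = table[codeword]
            if prev_type == "length":
                kind = "distance"               # S505
            elif symbol < 256:
                kind = "literal"                # S508
            elif symbol == 256:
                kind = "end"
            else:
                kind = "length"                 # S508
            return symbol, kind, len(codeword)
    return None  # S509 with nothing decoded: the caller shifts by one bit (S700)


lit = {"0": 65, "10": 256, "11": 257}
dist = {"0": 0, "1": 1}
print(decode_one("110", 0, lit, dist, None))      # (257, 'length', 2)
print(decode_one("110", 2, lit, dist, "length"))  # (0, 'distance', 1)
```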
Preferably, shifting the LIT encoded stream/DIST encoded stream forward by one bit means that the LIT encoded stream/DIST encoded stream is shifted from a low bit to a high bit.
The invention has the beneficial effects that: the method solves the technical problem that no method for recovering the damaged ZIP compressed file exists in the prior art.
Drawings
FIG. 1 is a general flow diagram of a method provided by the present invention;
FIG. 2 is a specific flowchart of decoding and obtaining decoded data according to the second Huffman code table and the third Huffman code table in the method provided by the present invention.
Detailed Description
Fig. 1 shows a general flow chart of the method provided by the present invention. As shown in fig. 1, the method provided by the present invention comprises the following steps:
S100: constructing a first Huffman code table according to a first code length sequence in the Deflate compressed data stream;
Step S100 includes the following steps:
Constructing a first Huffman code table according to the data structure and the first code length sequence of the Deflate compressed data stream, wherein the data structure of the Deflate compressed data stream is shown in Table 1 and comprises the first code length sequence, an SQ1 encoded stream, an SQ2 encoded stream and an LIT encoded stream/DIST encoded stream.
Table 1: data structure of Deflate compressed data stream
S200: for the SQ1 encoded stream: constructing a second Huffman code table, and executing step S400;
Step S200 includes the following steps:
S201: decoding the SQ1 encoded stream according to the first Huffman code table to acquire an SQ1 sequence;
S202: run-length decoding the SQ1 sequence to obtain a second code length sequence;
S203: constructing a second Huffman code table according to the second code length sequence, and executing step S600.
S300: for the SQ2 encoded stream: constructing a third Huffman code table, and executing step S400;
Step S300 includes the following steps:
S301: run-length decoding the SQ2 encoded stream according to the first Huffman code table to acquire an SQ2 sequence;
S302: run-length decoding the SQ2 sequence to obtain a third code length sequence;
S303: constructing a third Huffman code table according to the third code length sequence, and executing step S600.
S400: judging whether the current decoding position is the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step S500. Specifically, a decoding result of -1 for the encoded stream indicates that the end of the LIT/DIST encoded stream has been reached.
S500: taking data from the LIT encoded stream/DIST encoded stream bit by bit as a code word, and decoding it according to the second Huffman code table and the third Huffman code table to obtain decoded data;
Step S500 includes the following steps:
S501: acquiring the content of 1 bit, proceeding from the low-order bit to the high-order bit of the LIT encoded stream/DIST encoded stream, and appending it to the end of the code word;
S502: judging whether the type of the previously decoded data is length; if so, executing step S503, otherwise executing step S506;
S503: searching for the code word in the third Huffman code table;
S504: judging whether the code word is found in the third Huffman code table; if so, executing step S505, otherwise executing step S509;
S505: copying the corresponding decoded data, marking the type of the decoded data as distance, increasing the number of acquired bits by 1, and executing step S501;
S506: searching for the code word in the second Huffman code table;
S507: judging whether the code word is found in the second Huffman code table; if so, executing step S508, otherwise executing step S509;
S508: copying the corresponding decoded data, marking the type of the decoded data as literal or length, increasing the number of acquired bits by 1, and executing step S501; if so, executing step S600, otherwise, executing step S509;
S509: judging whether the current decoded data is empty; if so, executing step S700, otherwise executing step S600.
The following is an example of the method for acquiring bits in steps S501, S505, and S508:
Assume that the bits of the LIT encoded stream/DIST encoded stream, written from low order to high order (left to right), are 11110010011. The first pass through step S501 takes one bit at the low-order (left-most) end, giving the code word "1". On the second pass through step S501, the number of acquired bits is increased by 1 and one more bit is taken in the same low-to-high order, giving "11". In the same way, the third pass gives "111", the fourth pass gives "1111", and the fifth pass gives "11110".
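The growth of the code word in this example can be reproduced with a few lines of code. This is only an illustration of the example; the stream is modeled as a string whose left-most character is the low-order bit, and the helper name `growing_prefixes` is hypothetical.

```python
def growing_prefixes(bits, count):
    """Return the candidate code words after 1, 2, ..., count bits are taken."""
    return [bits[:n] for n in range(1, count + 1)]


stream = "11110010011"  # written low-order bit first, as in the example
print(growing_prefixes(stream, 5))
# ['1', '11', '111', '1111', '11110']
```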
S600: judging whether the data was decoded successfully; if so, executing step S800, otherwise executing step S700;
S700: moving the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
Specifically, moving the LIT encoded stream/DIST encoded stream forward by one bit means shifting it by one bit from the low bit toward the high bit. An example follows:
Still assume that the bits of the LIT encoded stream/DIST encoded stream, from low order to high order (left to right), are 11110010011. Moving the stream forward by one bit for the first time shifts it one bit from the low bit toward the high bit, leaving the value "1110010011". The value after the second shift is "110010011", after the third shift "10010011", after the fourth shift "0010011", after the fifth shift "010011", and so on.
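The one-bit shift can be checked the same way, dropping the low-order (left-most) bit on each step. The helper name `shift_forward` is an assumption for illustration; step S900 would correspond to calling it with N greater than 1.

```python
def shift_forward(bits, n=1):
    """Drop the n lowest-order (left-most) bits of the remaining stream."""
    return bits[n:]


stream = "11110010011"
for _ in range(5):
    stream = shift_forward(stream)
    print(stream)
# 1110010011
# 110010011
# 10010011
# 0010011
# 010011
```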
S800: judging whether the current decoded data is an end mark; if so, executing step SA00, otherwise executing step S900;
S900: writing the current decoded data into an intermediate decoded stream, and moving the LIT encoded stream/DIST encoded stream from the low bit toward the high bit by N bits, where N is the number of bits in the code word;
SA00: judging whether the code word is located at the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step SB00;
SB00: moving the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
SC00: decoding the intermediate decoded stream with a lossless compression algorithm (e.g., LZ77), acquiring a final decoded stream, and confirming and checking the decompressed data.
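For step SC00, the LZ77 stage can be sketched as expanding literals and (length, distance) back-references. This is a minimal illustration under assumptions: the intermediate decoded stream is modeled as a Python list of tokens, the name `lz77_expand` is hypothetical, and skipping unresolvable back-references is an assumed policy for damaged input that the text does not specify.

```python
def lz77_expand(tokens):
    """Expand a token list into bytes.

    Each token is either an int (a literal byte value) or a
    (length, distance) pair referring back into the output so far.
    """
    out = bytearray()
    for token in tokens:
        if isinstance(token, int):
            out.append(token)
            continue
        length, distance = token
        if distance <= 0 or distance > len(out):
            continue  # assumed policy: skip a reference that cannot be resolved
        for _ in range(length):
            out.append(out[-distance])  # byte-wise copy allows overlapping runs
    return bytes(out)


print(lz77_expand([97, 98, (4, 2), 99]))
# b'abababc'
```

Copying one byte at a time matters when the length exceeds the distance, because the copy then reads bytes that the same back-reference has just written.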
By the method provided by the invention, the damaged ZIP compressed file can be recovered.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations are possible to those skilled in the art in light of the above teachings, and that all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.
Claims (7)
1. A method of recovering a damaged ZIP compressed file, comprising the steps of:
S100: constructing a first Huffman code table according to a first code length sequence in the Deflate compressed data stream;
S200: for the SQ1 encoded stream: constructing a second Huffman code table, and executing step S400;
S300: for the SQ2 encoded stream: constructing a third Huffman code table, and executing step S400;
S400: judging whether the current decoding position is the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step S500;
S500: taking data from the LIT encoded stream/DIST encoded stream bit by bit as a code word, and decoding it according to the second Huffman code table and the third Huffman code table to obtain decoded data;
S600: judging whether the data was decoded successfully; if so, executing step S800, otherwise executing step S700;
S700: moving the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
S800: judging whether the current decoded data is an end mark; if so, executing step SA00, otherwise executing step S900;
S900: writing the current decoded data into an intermediate decoded stream, and moving the LIT encoded stream/DIST encoded stream from the low bit toward the high bit by N bits, where N is the number of bits in the code word;
SA00: judging whether the code word is located at the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step SB00;
SB00: moving the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
SC00: decoding the intermediate decoded stream with a lossless compression algorithm, acquiring a final decoded stream, and confirming and checking the decompressed data.
2. A method of recovering a damaged ZIP compressed file according to claim 1, wherein said step S100 comprises the steps of:
constructing a first Huffman code table according to the data structure and the first code length sequence of the Deflate compressed data stream, wherein the data structure of the Deflate compressed data stream is shown in Table 1 and comprises the first code length sequence, an SQ1 encoded stream, an SQ2 encoded stream and an LIT encoded stream/DIST encoded stream.
Table 1: data structure of Deflate compressed data stream
3. A method of recovering a damaged ZIP compressed file according to claim 1, wherein the step S200 comprises the steps of:
S201: decoding the SQ1 encoded stream according to the first Huffman code table to acquire an SQ1 sequence;
S202: run-length decoding the SQ1 sequence to obtain a second code length sequence;
S203: constructing a second Huffman code table according to the second code length sequence, and executing step S600.
4. A method of recovering a damaged ZIP compressed file according to claim 1, wherein the step S300 comprises the steps of:
S301: run-length decoding the SQ2 encoded stream according to the first Huffman code table to acquire an SQ2 sequence;
S302: run-length decoding the SQ2 sequence to obtain a third code length sequence;
S303: constructing a third Huffman code table according to the third code length sequence, and executing step S600.
5. The method of recovering a damaged ZIP compressed file according to claim 1, wherein in step S400, a decoding result of -1 for the encoded stream indicates that the end of the LIT/DIST encoded stream has been reached.
6. A method of recovering a damaged ZIP compressed file according to claim 1, wherein said step S500 comprises the steps of:
S501: acquiring the content of 1 bit, proceeding from the low-order bit to the high-order bit of the LIT encoded stream/DIST encoded stream, and appending it to the end of the code word;
S502: judging whether the type of the previously decoded data is length; if so, executing step S503, otherwise executing step S506;
S503: searching for the code word in the third Huffman code table;
S504: judging whether the code word is found in the third Huffman code table; if so, executing step S505, otherwise executing step S509;
S505: copying the corresponding decoded data, marking the type of the decoded data as distance, increasing the number of acquired bits by 1, and executing step S501;
S506: searching for the code word in the second Huffman code table;
S507: judging whether the code word is found in the second Huffman code table; if so, executing step S508, otherwise executing step S509;
S508: copying the corresponding decoded data, marking the type of the decoded data as literal or length, increasing the number of acquired bits by 1, and executing step S501; if so, executing step S600, otherwise, executing step S509;
S509: judging whether the current decoded data is empty; if so, executing step S700, otherwise executing step S600.
7. The method of recovering a damaged ZIP compressed file according to claim 1, wherein moving the LIT/DIST encoded stream forward by one bit means that the LIT/DIST encoded stream is shifted by one bit from a low bit to a high bit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011599777.6A CN112667583B (en) | 2020-12-30 | 2020-12-30 | Method for recovering damaged ZIP compressed file |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112667583A | 2021-04-16 |
CN112667583B | 2022-11-04 |
Family
ID=75410436
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011599777.6A (Active; CN112667583B) | Method for recovering damaged ZIP compressed file | 2020-12-30 | 2020-12-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112667583B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020160378A1 (en) * | 2000-08-24 | 2002-10-31 | Harper Jeffrey F. | Stress-regulated genes of plants, transgenic plants containing same, and methods of use |
CN1656802A (en) * | 2002-04-09 | 2005-08-17 | 高通股份有限公司 | Apparatus and method for detecting error in a digital image |
CN102438145A (en) * | 2011-11-22 | 2012-05-02 | 广州中大电讯科技有限公司 | Image lossless compression method on basis of Huffman code |
CN103886883A (en) * | 2014-03-20 | 2014-06-25 | 公安部物证鉴定中心 | Method and system for recovering lossy video monitoring data |
CN105068895A (en) * | 2015-09-18 | 2015-11-18 | 四川效率源信息安全技术股份有限公司 | Data recovery method aiming at Android equipment |
CN107592117A (en) * | 2017-08-15 | 2018-01-16 | 深圳前海信息技术有限公司 | Deflate-based compressed data block output method and device |
CN110620637A (en) * | 2019-09-26 | 2019-12-27 | 上海仪电(集团)有限公司中央研究院 | Data decompression device and method based on FPGA |
Non-Patent Citations (2)
Title |
---|
TAO BAN ET AL.: "Efficient Malware Packer Identification Using Support Vector Machines with Spectrum Kernel", 《2013 EIGHTH ASIA JOINT CONFERENCE ON INFORMATION SECURITY》 * |
杨原: "两种常用压缩文件口令恢复技术的研究与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Also Published As
Publication number | Publication date |
---|---|
CN112667583B (en) | 2022-11-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||