CN112667583B - Method for recovering damaged ZIP compressed file - Google Patents
- Publication number
- CN112667583B
- Authority
- CN
- China
- Prior art keywords
- executing
- stream
- bit
- data
- dist
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a method for recovering a damaged ZIP compressed file, comprising the following steps: S100: constructing a first Huffman code table; S200: constructing a second Huffman code table; S300: constructing a third Huffman code table; S400: judging whether the decoding position is the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step S500; S500: acquiring decoded data; S600: judging whether the data decoding succeeded; if so, executing step S800, otherwise executing step S700; S700: shifting the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400; S800: judging whether the decoded data is an end mark; if so, executing step SA00, otherwise executing step S900; S900: shifting the LIT encoded stream/DIST encoded stream from the low bit toward the high bit by N bits; SA00: judging whether the data is located at the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step SB00; SB00: shifting the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400; SC00: acquiring the final decoded stream.
Description
Technical Field
The invention belongs to the field of data recovery and electronic evidence collection, and relates to a method for recovering a damaged ZIP compressed file.
Background
The ZIP file format is a file format for data compression and document storage, and is one of several mainstream compression formats. The Microsoft Windows operating system provides built-in support for the ZIP format, so ZIP files can be opened and created even when no decompression software is installed on the user's computer; the format is therefore commonly used for file transmission and storage across industries. A ZIP file must be decompressed before its contents can be processed. In practice, the most common problem is that a ZIP compressed file is damaged and cannot be decompressed or opened, so the data is lost. A method for recovering a damaged ZIP compressed file is therefore very important.
In general, the main part of the Deflate compressed data stream of a ZIP compressed file is the LIT encoded stream/DIST encoded stream. The more data is compressed, the larger the share of the stream occupied by the LIT/DIST encoded streams; in the limit, this share can approach 99.9%. Damage to the LIT encoded stream/DIST encoded stream is therefore the more common case.
The problems of the prior art are as follows: when the main part of the stored data (i.e. the LIT encoded stream/DIST encoded stream) is damaged, existing methods for recovering and decompressing a damaged ZIP compressed file often fail to decompress the original data at all, or can decompress only the first intact section. The recovery ratio is therefore low, data is lost, and data recovery and electronic evidence collection may fail altogether.
Disclosure of Invention
To address the defects of the prior art, the invention provides a method for recovering a damaged ZIP compressed file, which recovers the damaged ZIP compressed file by constructing three Huffman code tables and decoding with them, and which comprises the following steps:
S100: constructing a first Huffman code table according to a first code length sequence in the Deflate compressed data stream;
S200: for the SQ1 encoded stream: constructing a second Huffman code table, and executing step S400;
S300: for the SQ2 encoded stream: constructing a third Huffman code table, and executing step S400;
S400: judging whether the current decoding position is the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step S500;
S500: taking data of the LIT encoded stream/DIST encoded stream bit by bit as a code word, and decoding it according to the second Huffman code table and the third Huffman code table to obtain decoded data;
S600: judging whether the data decoding succeeded; if so, executing step S800, otherwise executing step S700;
S700: shifting the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
S800: judging whether the current decoded data is an end mark; if so, executing step SA00, otherwise executing step S900;
S900: writing the current decoded data into an intermediate decoded stream, and shifting the LIT encoded stream/DIST encoded stream from the low bit toward the high bit by N bits, where N is the number of bits in the code word;
SA00: judging whether the code word is located at the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step SB00;
SB00: shifting the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
SC00: decoding the intermediate decoded stream with a lossless compression algorithm to obtain the final decoded stream, and confirming and checking the decompressed data.
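The control flow of steps S400 to SC00 can be sketched as follows. This is a minimal illustration, not the patented implementation: the bit stream is modelled as a Python string of "0"/"1" characters, and `decode_one`, `recover`, `TOY_TABLE` and the `END` marker are hypothetical names introduced here. A single toy code table stands in for the second and third Huffman code tables.

```python
END = "END"  # hypothetical end mark (assumption for this sketch)

# Toy, deliberately incomplete prefix code: "111..." never decodes.
TOY_TABLE = {"0": "a", "10": "b", "110": END}

def decode_one(bits, pos, table):
    """S500: grow a code word bit by bit from pos; return (symbol, n) or None."""
    word = ""
    for i in range(pos, len(bits)):
        word += bits[i]
        if word in table:
            return table[word], len(word)
    return None

def recover(bits, table):
    """Decode what can be decoded, skipping one bit whenever decoding fails."""
    out, pos = [], 0
    while pos < len(bits):                  # S400: end of stream reached?
        hit = decode_one(bits, pos, table)  # S500
        if hit is None:                     # S600: decoding failed
            pos += 1                        # S700: shift forward by one bit
            continue
        symbol, n = hit
        if symbol == END:                   # S800: end mark?
            if pos + n >= len(bits):        # SA00: code word at end of stream
                break                       # SC00
            pos += 1                        # SB00: shift forward by one bit
            continue
        out.append(symbol)                  # S900: write to intermediate stream
        pos += n                            # S900: shift by N bits
    return out

# "0" "10" followed by damaged bits "111", then "0" and the end mark "110".
print(recover("0101110110", TOY_TABLE))
```

On this toy input the sketch returns `['a', 'b', 'b']`: the two symbols before the damage are recovered exactly, while the symbol after it is a best-effort result of resynchronising inside the damaged region.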
Preferably, the step S100 includes the steps of:
constructing a first Huffman code table according to the data structure and the first code length sequence of the Deflate compressed data stream, wherein the data structure of the Deflate compressed data stream is shown in Table 1 and comprises the first code length sequence, an SQ1 encoded stream, an SQ2 encoded stream, and an LIT encoded stream/DIST encoded stream.
Table 1: data structure of Deflate compressed data stream
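Building a Huffman code table from a code length sequence can be sketched as below. The sketch assumes the canonical code assignment defined by the Deflate specification (RFC 1951), in which shorter codes come first and codes of equal length are assigned in symbol order. The function name `build_code_table` and the representation of code words as bit strings are choices made for this illustration only.

```python
def build_code_table(lengths):
    """Map code word bit strings to symbols, given one code length per symbol.

    A length of 0 means the symbol is unused and receives no code word.
    """
    max_len = max(lengths, default=0)

    # Count how many symbols use each code length.
    bl_count = [0] * (max_len + 1)
    for n in lengths:
        if n:
            bl_count[n] += 1

    # Compute the numerically smallest code for each length.
    next_code = [0] * (max_len + 2)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code

    # Hand out consecutive codes to symbols in symbol order.
    table = {}
    for symbol, n in enumerate(lengths):
        if n:
            table[format(next_code[n], "0{}b".format(n))] = symbol
            next_code[n] += 1
    return table

# Code lengths (3, 3, 3, 3, 3, 2, 4, 4) for symbols 0..7.
table = build_code_table([3, 3, 3, 3, 3, 2, 4, 4])
print(table)
```

With these lengths, symbol 5 receives the two-bit code `00`, symbols 0 to 4 receive `010` through `110`, and symbols 6 and 7 receive `1110` and `1111`.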
Preferably, the step S200 includes the steps of:
S201: decoding the SQ1 encoded stream according to the first Huffman code table to obtain an SQ1 sequence;
S202: run-length decoding the SQ1 sequence to obtain a second code length sequence;
S203: constructing a second Huffman code table according to the second code length sequence, and executing step S600.
Preferably, the step S300 includes the steps of:
S301: run-length decoding the SQ2 encoded stream according to the first Huffman code table to obtain an SQ2 sequence;
S302: run-length decoding the SQ2 sequence to obtain a third code length sequence;
S303: constructing a third Huffman code table according to the third code length sequence, and executing step S600.
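The run-length decoding in steps S202 and S302 can be illustrated with the repeat codes of the Deflate specification (RFC 1951): symbols 0 to 15 are literal code lengths, 16 repeats the previous length 3 to 6 times, 17 emits 3 to 10 zeros, and 18 emits 11 to 138 zeros. The sketch below assumes, for simplicity, that the sequence has already been split into `(symbol, extra)` pairs, where `extra` is the value of the extra bits that follow a repeat code. The function name `expand_code_lengths` is hypothetical.

```python
def expand_code_lengths(pairs):
    """Expand run-length coded (symbol, extra) pairs into a code length list."""
    lengths = []
    for symbol, extra in pairs:
        if 0 <= symbol <= 15:
            lengths.append(symbol)                       # literal code length
        elif symbol == 16:
            if not lengths:
                raise ValueError("repeat code with no previous length")
            lengths.extend([lengths[-1]] * (3 + extra))  # repeat previous 3..6 times
        elif symbol == 17:
            lengths.extend([0] * (3 + extra))            # 3..10 zeros
        elif symbol == 18:
            lengths.extend([0] * (11 + extra))           # 11..138 zeros
        else:
            raise ValueError("invalid code length symbol: %r" % symbol)
    return lengths

# 8, then "repeat previous 4 times", then 3 zeros, then 13 zeros, then 5.
print(expand_code_lengths([(8, 0), (16, 1), (17, 0), (18, 2), (5, 0)]))
```

The resulting list can be passed to a table-building routine to obtain the second or third Huffman code table.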
Preferably, in step S400, a decoding result of -1 for the encoded stream indicates that the end of the LIT/DIST encoded stream has been reached.
Preferably, the step S500 includes the steps of:
S501: acquiring 1 bit of content from the LIT encoded stream/DIST encoded stream, proceeding from the low bit to the high bit, and appending it to the tail of the code word;
S502: judging whether the type of the previous decoded data is length; if so, executing step S503, otherwise executing step S506;
S503: searching for the code word in the third Huffman code table;
S504: judging whether the code word is found in the third Huffman code table; if so, executing step S505, otherwise executing step S509;
S505: copying the corresponding decoded data, marking the type of the decoded data as distance, incrementing the number of acquired bits by 1, and executing step S501;
S506: searching for the code word in the second Huffman code table;
S507: judging whether the code word is found in the second Huffman code table; if so, executing step S508, otherwise executing step S509;
S508: copying the corresponding decoded data, marking the type of the decoded data as literal or length, incrementing the number of acquired bits by 1, and executing step S501; if so, executing step S600, otherwise executing step S509;
S509: judging whether the current decoded data is empty; if so, executing step S700, otherwise executing step S600.
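One possible reading of the table selection in steps S501 to S509 is sketched below. It is a simplification under stated assumptions: the function name `decode_symbol`, the dictionary-based tables, and the string bit stream are hypothetical; symbol values of 257 and above are treated as lengths, following the Deflate literal/length alphabet; the 15-bit limit is the maximum Deflate code length; and the extra bits that follow length and distance codes in real Deflate data are ignored.

```python
def decode_symbol(bits, pos, prev_type, lit_len_table, dist_table, max_len=15):
    """Decode one symbol at pos; return (value, type, bits_used) or None."""
    use_dist = prev_type == "length"           # S502: a distance follows a length
    table = dist_table if use_dist else lit_len_table
    word = ""
    while pos + len(word) < len(bits) and len(word) < max_len:
        word += bits[pos + len(word)]          # S501: append the next bit
        if word in table:                      # S503/S506: look the code word up
            value = table[word]
            if use_dist:
                kind = "distance"              # S505
            else:
                # S508: literal or length; the end mark (256) is left for S800.
                kind = "length" if value >= 257 else "literal"
            return value, kind, len(word)
    return None                                # S509: nothing decoded

# Toy tables, for illustration only.
lit_len = {"0": 65, "10": 257, "11": 256}
dist = {"0": 0, "1": 1}
print(decode_symbol("0101", 0, "literal", lit_len, dist))
print(decode_symbol("0101", 1, "literal", lit_len, dist))
print(decode_symbol("0101", 3, "length", lit_len, dist))
```

Returning `None` corresponds to the failure path that leads to the one-bit shift of step S700.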
Preferably, shifting the LIT encoded stream/DIST encoded stream forward by one bit means that the LIT encoded stream/DIST encoded stream is shifted from a low bit to a high bit.
The beneficial effects of the invention are: the method solves the technical problem that no method for recovering the damaged ZIP compressed file exists in the prior art.
Drawings
FIG. 1 is a general flow diagram of a method provided by the present invention;
FIG. 2 is a detailed flowchart of decoding according to the second Huffman code table and the third Huffman code table to obtain decoded data in the method provided by the present invention.
Detailed Description
Fig. 1 shows a general flow chart of the method provided by the present invention. As shown in fig. 1, the method provided by the present invention comprises the following steps:
S100: constructing a first Huffman code table according to a first code length sequence in the Deflate compressed data stream;
Step S100 includes the following step:
constructing a first Huffman code table according to the data structure and the first code length sequence of the Deflate compressed data stream, wherein the data structure of the Deflate compressed data stream is shown in Table 1 and comprises the first code length sequence, an SQ1 encoded stream, an SQ2 encoded stream, and an LIT encoded stream/DIST encoded stream.
Table 1: data structure of Deflate compressed data stream
S200: for the SQ1 encoded stream: constructing a second Huffman code table, and executing step S400;
Step S200 includes the following steps:
S201: decoding the SQ1 encoded stream according to the first Huffman code table to obtain an SQ1 sequence;
S202: run-length decoding the SQ1 sequence to obtain a second code length sequence;
S203: constructing a second Huffman code table according to the second code length sequence, and executing step S600.
S300: for the SQ2 encoded stream: constructing a third Huffman code table, and executing step S400;
Step S300 includes the following steps:
S301: run-length decoding the SQ2 encoded stream according to the first Huffman code table to obtain an SQ2 sequence;
S302: run-length decoding the SQ2 sequence to obtain a third code length sequence;
S303: constructing a third Huffman code table according to the third code length sequence, and executing step S600.
S400: judging whether the current decoding position is the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step S500; specifically, a decoding result of -1 for the encoded stream indicates that the end of the LIT/DIST encoded stream has been reached.
S500: taking data of the LIT encoded stream/DIST encoded stream bit by bit as a code word, and decoding it according to the second Huffman code table and the third Huffman code table to obtain decoded data;
Step S500 includes the following steps:
S501: acquiring 1 bit of content from the LIT encoded stream/DIST encoded stream, proceeding from the low bit to the high bit, and appending it to the tail of the code word;
S502: judging whether the type of the previous decoded data is length; if so, executing step S503, otherwise executing step S506;
S503: searching for the code word in the third Huffman code table;
S504: judging whether the code word is found in the third Huffman code table; if so, executing step S505, otherwise executing step S509;
S505: copying the corresponding decoded data, marking the type of the decoded data as distance, incrementing the number of acquired bits by 1, and executing step S501;
S506: searching for the code word in the second Huffman code table;
S507: judging whether the code word is found in the second Huffman code table; if so, executing step S508, otherwise executing step S509;
S508: copying the corresponding decoded data, marking the type of the decoded data as literal or length, incrementing the number of acquired bits by 1, and executing step S501; if so, executing step S600, otherwise executing step S509;
S509: judging whether the current decoded data is empty; if so, executing step S700, otherwise executing step S600.
The following is an example of the method for acquiring bits in steps S501, S505, and S508:
Assume that the bits of the LIT encoded stream/DIST encoded stream, from low to high (left to right), are 11110010011. The first execution of step S501 takes the lowest (left-most) bit, giving the code word "1". Each return to step S501 increases the number of acquired bits by 1 and appends the next bit of the stream, so the second execution gives "11", the third gives "111", the fourth gives "1111", and the fifth gives "11110".
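The bit acquisition described above can be reproduced with a few lines of code. The sketch models the stream as a string whose left-most character is the lowest bit; the function name `growing_code_words` is hypothetical.

```python
def growing_code_words(bits, rounds):
    """Return the code word after each of `rounds` executions of step S501."""
    word, seen = "", []
    for bit in bits[:rounds]:
        word += bit          # append the newly acquired bit to the tail
        seen.append(word)
    return seen

print(growing_code_words("11110010011", 5))
# prints ['1', '11', '111', '1111', '11110']
```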
S600: judging whether the data decoding succeeded; if so, executing step S800, otherwise executing step S700;
S700: shifting the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
Specifically, shifting the LIT encoded stream/DIST encoded stream forward by one bit means shifting the stream by one bit from the low bit toward the high bit. For example:
Again assume that the bits of the LIT encoded stream/DIST encoded stream, from low to high (left to right), are 11110010011. After the first one-bit shift, the remaining value is "1110010011"; after the second, "110010011"; after the third, "10010011"; after the fourth, "0010011"; and after the fifth, "010011".
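The one-bit shift amounts to discarding the lowest (left-most) remaining bit. A minimal sketch, with the stream again modelled as a string and `shift_forward` a hypothetical helper name:

```python
def shift_forward(bits, count=1):
    """Drop `count` bits from the low (left) end of the stream."""
    return bits[count:]

stream = "11110010011"
for k in range(1, 6):
    print(k, shift_forward(stream, k))
# 1 1110010011
# 2 110010011
# 3 10010011
# 4 0010011
# 5 010011
```

The same helper with `count=N` corresponds to the N-bit shift that follows a successfully decoded code word.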
S800: judging whether the current decoded data is an end mark; if so, executing step SA00, otherwise executing step S900;
S900: writing the current decoded data into an intermediate decoded stream, and shifting the LIT encoded stream/DIST encoded stream from the low bit toward the high bit by N bits, where N is the number of bits in the code word;
SA00: judging whether the code word is located at the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step SB00;
SB00: shifting the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
SC00: decoding the intermediate decoded stream with a lossless compression algorithm (e.g., LZ77) to obtain the final decoded stream, and confirming and checking the decompressed data.
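The LZ77 expansion in step SC00 can be sketched as follows. The token format is an assumption made for this illustration: the intermediate decoded stream is taken to be a list in which an integer is a literal byte and a `(length, distance)` tuple is a back-reference. The function name `lz77_expand` is hypothetical, and rejecting an out-of-range distance with an exception is only one possible policy for damaged input.

```python
def lz77_expand(tokens):
    """Expand literals and (length, distance) back-references into bytes."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)                      # literal byte
            continue
        length, distance = tok
        if distance <= 0 or distance > len(out):
            raise ValueError("distance reaches before the start of the output")
        for _ in range(length):
            out.append(out[-distance])           # byte-wise copy allows overlap
    return bytes(out)

print(lz77_expand([ord("a"), ord("b"), (4, 2), ord("c")]))
# prints b'abababc'
```

Copying one byte at a time matters because a back-reference may overlap the bytes it is producing, as in the example where a distance of 2 yields four copied bytes.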
By the method provided by the invention, the damaged ZIP compressed file can be recovered.
It will be understood that the invention is not limited to the examples described above, but that modifications and variations are possible to those skilled in the art in light of the above teachings, and that all such modifications and variations are within the scope of the invention as defined in the appended claims.
Claims (7)
1. A method of recovering a damaged ZIP compressed file, comprising the steps of:
S100: constructing a first Huffman code table according to a first code length sequence in the Deflate compressed data stream;
S200: for the SQ1 encoded stream: constructing a second Huffman code table, and executing step S400;
S300: for the SQ2 encoded stream: constructing a third Huffman code table, and executing step S400;
S400: judging whether the current decoding position is the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step S500;
S500: taking data of the LIT encoded stream/DIST encoded stream bit by bit as a code word, and decoding it according to the second Huffman code table and the third Huffman code table to obtain decoded data;
S600: judging whether the data decoding succeeded; if so, executing step S800, otherwise executing step S700;
S700: shifting the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
S800: judging whether the current decoded data is an end mark; if so, executing step SA00, otherwise executing step S900;
S900: writing the current decoded data into an intermediate decoded stream, and shifting the LIT encoded stream/DIST encoded stream from the low bit toward the high bit by N bits, where N is the number of bits in the code word;
SA00: judging whether the code word is located at the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step SB00;
SB00: shifting the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
SC00: decoding the intermediate decoded stream with a lossless compression algorithm to obtain the final decoded stream, and confirming and checking the decompressed data.
2. A method of recovering a damaged ZIP compressed file according to claim 1, wherein said step S100 comprises the steps of:
constructing a first Huffman code table according to the data structure and the first code length sequence of the Deflate compressed data stream, wherein the data structure of the Deflate compressed data stream is shown in Table 1 and comprises the first code length sequence, an SQ1 encoded stream, an SQ2 encoded stream, and an LIT encoded stream/DIST encoded stream.
Table 1: data structure of Deflate compressed data stream
3. A method of recovering a damaged ZIP compressed file according to claim 1, wherein the step S200 comprises the steps of:
S201: decoding the SQ1 encoded stream according to the first Huffman code table to obtain an SQ1 sequence;
S202: run-length decoding the SQ1 sequence to obtain a second code length sequence;
S203: constructing a second Huffman code table according to the second code length sequence, and executing step S600.
4. A method of recovering a damaged ZIP compressed file according to claim 1, wherein the step S300 comprises the steps of:
S301: run-length decoding the SQ2 encoded stream according to the first Huffman code table to obtain an SQ2 sequence;
S302: run-length decoding the SQ2 sequence to obtain a third code length sequence;
S303: constructing a third Huffman code table according to the third code length sequence, and executing step S600.
5. The method of recovering a damaged ZIP compressed file according to claim 1, wherein in step S400, a decoding result of -1 for the encoded stream indicates that the end of the LIT/DIST encoded stream has been reached.
6. A method for recovering a damaged ZIP compressed file according to claim 1, wherein the step S500 comprises the steps of:
S501: acquiring 1 bit of content from the LIT encoded stream/DIST encoded stream, proceeding from the low bit to the high bit, and appending it to the tail of the code word;
S502: judging whether the type of the previous decoded data is length; if so, executing step S503, otherwise executing step S506;
S503: searching for the code word in the third Huffman code table;
S504: judging whether the code word is found in the third Huffman code table; if so, executing step S505, otherwise executing step S509;
S505: copying the corresponding decoded data, marking the type of the decoded data as distance, incrementing the number of acquired bits by 1, and executing step S501;
S506: searching for the code word in the second Huffman code table;
S507: judging whether the code word is found in the second Huffman code table; if so, executing step S508, otherwise executing step S509;
S508: copying the corresponding decoded data, marking the type of the decoded data as literal or length, incrementing the number of acquired bits by 1, and executing step S501; if so, executing step S600, otherwise executing step S509;
S509: judging whether the current decoded data is empty; if so, executing step S700, otherwise executing step S600.
7. The method of recovering a damaged ZIP compressed file according to claim 1, wherein shifting the LIT/DIST encoded stream forward by one bit means that the LIT/DIST encoded stream is shifted by one bit from the low bit toward the high bit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011599777.6A CN112667583B (en) | 2020-12-30 | 2020-12-30 | Method for recovering damaged ZIP compressed file |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011599777.6A CN112667583B (en) | 2020-12-30 | 2020-12-30 | Method for recovering damaged ZIP compressed file |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112667583A (en) | 2021-04-16 |
CN112667583B (en) | 2022-11-04 |
Family
ID=75410436
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011599777.6A Active CN112667583B (en) | 2020-12-30 | 2020-12-30 | Method for recovering damaged ZIP compressed file |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112667583B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN112667583A (en) | 2021-04-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||