CN112667583A - Method for recovering damaged ZIP compressed file - Google Patents


Publication number
CN112667583A
Authority
CN
China
Prior art keywords
stream
executing
bit
data
dist
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011599777.6A
Other languages
Chinese (zh)
Other versions
CN112667583B (en)
Inventor
梁效宁
朱星海
陆宇轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xly Salvationdata Technology Inc
Original Assignee
Xly Salvationdata Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-12-30
Filing date
2020-12-30
Publication date
2021-04-16
Application filed by Xly Salvationdata Technology Inc
Priority to CN202011599777.6A
Publication of CN112667583A
Application granted
Publication of CN112667583B
Legal status: Active

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a method for recovering a damaged ZIP compressed file, comprising the following steps: S100: constructing a first Huffman code table; S200: constructing a second Huffman code table; S300: constructing a third Huffman code table; S400: judging whether the decoding position is the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step S500; S500: acquiring decoded data; S600: judging whether the data was decoded successfully; if so, executing step S800, otherwise executing step S700; S700: moving the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400; S800: judging whether the decoded data is an end mark; if so, executing step SA00, otherwise executing step S900; S900: moving the LIT encoded stream/DIST encoded stream from the low bit toward the high bit by N bits; SA00: judging whether the data is located at the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step SB00; SB00: moving the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400; SC00: acquiring a final decoded stream.

Description

Method for recovering damaged ZIP compressed file
Technical Field
The invention belongs to the field of data recovery and electronic evidence collection, and relates to a method for recovering a damaged ZIP compressed file.
Background
The ZIP file format is a file format for data compression and document archiving, and is one of the mainstream compression formats. The Microsoft Windows operating system provides built-in support for the ZIP format, so ZIP files can be opened and created even when no decompression software is installed on the user's computer. The format is therefore widely used for file transmission and storage across industries. A compressed file must be decompressed before its contents can be processed. In practice, the most common problem is that a ZIP compressed file is damaged and cannot be decompressed or opened, so that the data is lost. A method for recovering a damaged ZIP compressed file is therefore very important.
In general, the main part of the Deflate compressed data stream of a ZIP compressed file is the LIT encoded stream/DIST encoded stream. The more data is compressed, the larger the share of the stream occupied by the LIT encoded stream/DIST encoded stream; in the limit, this share can approach 99.9%. Damage therefore most often falls within the LIT encoded stream/DIST encoded stream.
The problems of the prior art are as follows: when the main part of the data storage (i.e. the LIT encoded stream/DIST encoded stream) is damaged, existing methods for recovering and decompressing a damaged ZIP compressed file often fail to decompress the intact original data, or can decompress only the first intact section. The recovery ratio is therefore low and data is lost, which can even cause data recovery and electronic evidence collection to fail.
Disclosure of Invention
To address the defects of the prior art, the invention provides a method for recovering a damaged ZIP compressed file, which recovers the file by constructing three Huffman code tables and decoding with them, and comprises the following steps:
S100: constructing a first Huffman code table according to a first code length sequence in the Deflate compressed data stream;
S200: for the SQ1 encoded stream, constructing a second Huffman code table, and executing step S400;
S300: for the SQ2 encoded stream, constructing a third Huffman code table, and executing step S400;
S400: judging whether the current decoding position is the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step S500;
S500: taking data from the LIT encoded stream/DIST encoded stream bit by bit as a code word, and decoding it according to the second Huffman code table and the third Huffman code table to obtain decoded data;
S600: judging whether the data was decoded successfully; if so, executing step S800, otherwise executing step S700;
S700: moving the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
S800: judging whether the current decoded data is an end mark; if so, executing step SA00, otherwise executing step S900;
S900: writing the current decoded data into an intermediate decoded stream, and moving the LIT encoded stream/DIST encoded stream from the low bit toward the high bit by N bits, where N is the number of bits in the code word;
SA00: judging whether the code word is located at the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step SB00;
SB00: moving the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
SC00: decoding the intermediate decoded stream with a lossless compression algorithm to obtain a final decoded stream, and confirming and checking the decompressed data.
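The control flow of steps S400 to SC00 can be sketched as a resynchronizing loop. The following is a minimal Python sketch under simplifying assumptions that are not in the patent text: the encoded stream is given as a string of '0' and '1' characters, a single hypothetical table maps code words to symbols (the patent alternates between the second and third tables), and the names `recover_symbols` and `END` are illustrative only.

```python
END = "END"  # hypothetical end-mark symbol used only in this sketch


def recover_symbols(bits, table, end_symbol=END):
    """Decode a bit string, skipping one bit whenever no code word matches."""
    max_len = max(len(code) for code in table)
    out = []          # plays the role of the intermediate decoded stream
    pos = 0
    while pos < len(bits):                         # S400: end of stream?
        match = None
        limit = min(max_len, len(bits) - pos)
        for n in range(1, limit + 1):              # S500: grow the code word
            word = bits[pos:pos + n]
            if word in table:
                match = (table[word], n)
                break
        if match is None:                          # S600 failed
            pos += 1                               # S700: move forward one bit
            continue
        symbol, n = match
        if symbol == end_symbol:                   # S800: end mark?
            if pos + n >= len(bits):               # SA00: at end of stream
                break                              # go to SC00
            pos += 1                               # SB00: move forward one bit
            continue
        out.append(symbol)                         # S900: write decoded data
        pos += n                                   # S900: move forward N bits
    return out


# A toy, deliberately incomplete prefix code: words starting with "11" are invalid.
table = {"00": "A", "01": "B", "100": "C", "101": END}
damaged = "00" + "01" + "1" + "100" + "101"   # one stray bit after "B"
print(recover_symbols(damaged, table))
```

In this toy run the stray bit causes one failed lookup, the loop advances by a single bit, and decoding resumes at the next valid code word, which is the behaviour the steps describe for a damaged region.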
Preferably, the step S100 includes the steps of:
Constructing a first Huffman code table according to the data structure and the first code length sequence of the Deflate compressed data stream, wherein the data structure of the Deflate compressed data stream is shown in Table 1 and comprises the first code length sequence, an SQ1 encoded stream, an SQ2 encoded stream and an LIT encoded stream/DIST encoded stream.
Table 1: data structure of Deflate compressed data stream
Figure BDA0002870712630000031
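Building a Huffman code table from a code length sequence can be illustrated with the canonical code assignment defined for Deflate in RFC 1951. The sketch below assumes that the patent's tables are built this way, which the text does not state explicitly; the function name `build_code_table` and the string representation of code words are illustrative choices.

```python
from collections import Counter


def build_code_table(code_lengths):
    """Return {symbol: code word as a '0'/'1' string} for the given code lengths.

    A code length of 0 means the symbol is unused and gets no code word.
    """
    max_len = max(code_lengths, default=0)
    # Count how many symbols use each non-zero code length.
    bl_count = Counter(length for length in code_lengths if length > 0)
    # Compute the numerically smallest code for each code length.
    next_code = {}
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Assign consecutive codes to symbols of the same length, in symbol order.
    table = {}
    for symbol, length in enumerate(code_lengths):
        if length:
            table[symbol] = format(next_code[length], "0{}b".format(length))
            next_code[length] += 1
    return table


# Code lengths for eight symbols, as in the worked example of RFC 1951.
print(build_code_table([3, 3, 3, 3, 3, 2, 4, 4]))
```

For decoding, the mapping would typically be inverted so that a code word looks up its symbol.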
Preferably, the step S200 includes the steps of:
S201: decoding the SQ1 encoded stream according to the first Huffman code table to obtain an SQ1 sequence;
S202: run-length decoding the SQ1 sequence to obtain a second code length sequence;
S203: constructing a second Huffman code table according to the second code length sequence, and executing step S600.
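The run-length decoding in step S202 can be sketched as follows, assuming the SQ1 sequence uses the code length alphabet of RFC 1951 (symbols 0 to 15 are literal code lengths, 16 repeats the previous length, 17 and 18 emit runs of zeros). That correspondence is an assumption; the patent only says the sequence is run-length decoded. The input format, a list of `(symbol, extra)` pairs where `extra` is the already-read extra-bits value, and the name `expand_code_lengths` are hypothetical.

```python
def expand_code_lengths(items):
    """Expand (symbol, extra) pairs into a flat list of code lengths."""
    lengths = []
    for symbol, extra in items:
        if 0 <= symbol <= 15:
            # A code length stored directly.
            lengths.append(symbol)
        elif symbol == 16:
            # Repeat the previous code length 3 to 6 times.
            if not lengths:
                raise ValueError("repeat symbol with no previous code length")
            lengths.extend([lengths[-1]] * (3 + extra))
        elif symbol == 17:
            # A run of 3 to 10 zero lengths.
            lengths.extend([0] * (3 + extra))
        elif symbol == 18:
            # A run of 11 to 138 zero lengths.
            lengths.extend([0] * (11 + extra))
        else:
            raise ValueError("unknown code length symbol: {}".format(symbol))
    return lengths


print(expand_code_lengths([(8, 0), (16, 0), (17, 1), (5, 0)]))
```

The resulting list is what a table-building routine would consume as the second code length sequence.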
Preferably, the step S300 includes the steps of:
S301: run-length decoding the SQ2 encoded stream according to the first Huffman code table to obtain an SQ2 sequence;
S302: run-length decoding the SQ2 sequence to obtain a third code length sequence;
S303: constructing a third Huffman code table according to the third code length sequence, and executing step S600.
Preferably, in step S400, a decoding result of -1 for the encoded stream indicates that the end of the LIT encoded stream/DIST encoded stream has been reached.
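One way to realize the -1 end indicator is a bit reader that returns -1 once the stream is exhausted. The sketch below is an assumption about how that sentinel could arise, not something the patent specifies; the class name `BitReader` is hypothetical, and bits are read from the low bit to the high bit of each byte, as Deflate packs them.

```python
class BitReader:
    """Read single bits from a byte string, lowest bit of each byte first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # absolute bit position in the stream

    def read_bit(self):
        """Return the next bit (0 or 1), or -1 at the end of the stream."""
        if self.pos >= len(self.data) * 8:
            return -1
        byte = self.data[self.pos >> 3]
        bit = (byte >> (self.pos & 7)) & 1
        self.pos += 1
        return bit


reader = BitReader(bytes([0b00000101]))
print([reader.read_bit() for _ in range(9)])
```

Because `pos` is a plain bit index, moving the stream forward by one bit or by N bits amounts to adding 1 or N to it.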
Preferably, the step S500 includes the steps of:
S501: acquiring 1 bit from the LIT encoded stream/DIST encoded stream, reading from the low bit toward the high bit, and appending it to the tail of the code word;
S502: judging whether the type of the previously decoded data is length; if so, executing step S503, otherwise executing step S506;
S503: searching for the code word in the third Huffman code table;
S504: judging whether the code word was found in the third Huffman code table; if so, executing step S505, otherwise executing step S509;
S505: copying the corresponding decoded data, marking the type of the decoded data as distance, increasing the number of bits acquired by 1, and executing step S501;
S506: searching for the code word in the second Huffman code table;
S507: judging whether the code word was found in the second Huffman code table; if so, executing step S508, otherwise executing step S509;
S508: copying the corresponding decoded data, marking the type of the decoded data as literal or length, increasing the number of bits acquired by 1, and executing step S501; if so, executing step S600, otherwise, executing step S509;
S509: judging whether the current decoded data is empty; if so, executing step S700, otherwise executing step S600.
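The table selection in steps S502 to S508 can be sketched as a single lookup routine: the third table is consulted when the previously decoded item was a length, otherwise the second table. This is one reading of the steps, not the patent's exact flow. The function name `decode_one`, the dictionary tables keyed by code word strings, and the classification of values (below 256 literal, 256 end mark, above 256 length, as in RFC 1951) are assumptions made for illustration.

```python
def decode_one(bits, pos, lit_table, dist_table, prev_type):
    """Try to decode one code word starting at bit index pos.

    Returns (value, kind, bit_count), or None when no code word matches.
    """
    use_dist = prev_type == "length"               # S502
    table = dist_table if use_dist else lit_table  # S503 / S506
    max_len = max(len(code) for code in table)
    code = ""
    for bit in bits[pos:pos + max_len]:
        code += bit                                # S501: append one bit
        if code in table:                          # S504 / S507
            value = table[code]
            if use_dist:
                kind = "distance"
            elif value < 256:
                kind = "literal"
            elif value == 256:
                kind = "end"
            else:
                kind = "length"
            return value, kind, len(code)
    return None                                    # nothing decoded


lit_table = {"0": 65, "10": 257, "11": 256}   # toy literal/length table
dist_table = {"0": 0, "1": 1}                 # toy distance table
bits = "0101"
print(decode_one(bits, 0, lit_table, dist_table, None))
print(decode_one(bits, 1, lit_table, dist_table, "literal"))
print(decode_one(bits, 3, lit_table, dist_table, "length"))
```

A `None` result corresponds to the failure branch, after which the caller would move the stream forward by one bit.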
Preferably, shifting the LIT encoded stream/DIST encoded stream forward by one bit means that the LIT encoded stream/DIST encoded stream is shifted from a low bit to a high bit.
The invention has the beneficial effects that: the method solves the technical problem that no method for recovering the damaged ZIP compressed file exists in the prior art.
Drawings
FIG. 1 is a general flow diagram of a method provided by the present invention;
FIG. 2 is a detailed flowchart of the decoding performed with the second Huffman code table and the third Huffman code table to obtain decoded data in the method provided by the present invention.
Detailed Description
Fig. 1 shows a general flow chart of the method provided by the present invention. As shown in fig. 1, the method provided by the present invention comprises the following steps:
S100: constructing a first Huffman code table according to a first code length sequence in the Deflate compressed data stream;
Step S100 includes the following step:
Constructing a first Huffman code table according to the data structure and the first code length sequence of the Deflate compressed data stream, wherein the data structure of the Deflate compressed data stream is shown in Table 1 and comprises the first code length sequence, an SQ1 encoded stream, an SQ2 encoded stream and an LIT encoded stream/DIST encoded stream.
Table 1: data structure of Deflate compressed data stream
Figure BDA0002870712630000051
S200: for the SQ1 encoded stream, constructing a second Huffman code table, and executing step S400;
Step S200 includes the following steps:
S201: decoding the SQ1 encoded stream according to the first Huffman code table to obtain an SQ1 sequence;
S202: run-length decoding the SQ1 sequence to obtain a second code length sequence;
S203: constructing a second Huffman code table according to the second code length sequence, and executing step S600.
S300: for the SQ2 encoded stream, constructing a third Huffman code table, and executing step S400;
Step S300 includes the following steps:
S301: run-length decoding the SQ2 encoded stream according to the first Huffman code table to obtain an SQ2 sequence;
S302: run-length decoding the SQ2 sequence to obtain a third code length sequence;
S303: constructing a third Huffman code table according to the third code length sequence, and executing step S600.
S400: judging whether the current decoding position is the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step S500. Specifically, a decoding result of -1 for the encoded stream indicates that the end of the LIT encoded stream/DIST encoded stream has been reached.
S500: taking data from the LIT encoded stream/DIST encoded stream bit by bit as a code word, and decoding it according to the second Huffman code table and the third Huffman code table to obtain decoded data;
Step S500 includes the following steps:
S501: acquiring 1 bit from the LIT encoded stream/DIST encoded stream, reading from the low bit toward the high bit, and appending it to the tail of the code word;
S502: judging whether the type of the previously decoded data is length; if so, executing step S503, otherwise executing step S506;
S503: searching for the code word in the third Huffman code table;
S504: judging whether the code word was found in the third Huffman code table; if so, executing step S505, otherwise executing step S509;
S505: copying the corresponding decoded data, marking the type of the decoded data as distance, increasing the number of bits acquired by 1, and executing step S501;
S506: searching for the code word in the second Huffman code table;
S507: judging whether the code word was found in the second Huffman code table; if so, executing step S508, otherwise executing step S509;
S508: copying the corresponding decoded data, marking the type of the decoded data as literal or length, increasing the number of bits acquired by 1, and executing step S501; if so, executing step S600, otherwise, executing step S509;
S509: judging whether the current decoded data is empty; if so, executing step S700, otherwise executing step S600.
The following is an example of the method for acquiring bits in steps S501, S505, and S508:
Assume that the bits of the LIT encoded stream/DIST encoded stream, from low order to high order (left to right), are 11110010011. The first execution of step S501 takes the single lowest-order (leftmost) bit, giving the value "1". On the second pass through step S501 the number of bits acquired has been increased by 1, so one more bit is taken from low order to high order, giving the value "11". In the same way, the third pass gives "111", the fourth pass gives "1111", and the fifth pass gives "11110".
S600: judging whether the data was decoded successfully; if so, executing step S800, otherwise executing step S700;
S700: moving the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
Specifically, moving the LIT encoded stream/DIST encoded stream forward by one bit means shifting it by one bit from the low bit toward the high bit. An example is as follows:
Still assuming that the bits of the LIT encoded stream/DIST encoded stream, from low order to high order (left to right), are 11110010011, moving the stream forward by one bit for the first time gives the value "1110010011". Moving it forward by one bit a second time gives "110010011", a third time gives "10010011", a fourth time gives "0010011", a fifth time gives "010011", and so on.
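The shifting example can be checked mechanically. In this small Python sketch the stream is a string written from low bit to high bit, as in the example, and the helper name `shifted_streams` is hypothetical.

```python
def shifted_streams(bits, count):
    """Return the stream after moving forward by 1, 2, ..., count bits."""
    return [bits[n:] for n in range(1, count + 1)]


for value in shifted_streams("11110010011", 5):
    print(value)
```

Each forward move drops exactly one bit from the low-order end, so every value is one bit shorter than the one before it.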
S800: judging whether the current decoded data is an end mark; if so, executing step SA00, otherwise executing step S900;
S900: writing the current decoded data into an intermediate decoded stream, and moving the LIT encoded stream/DIST encoded stream from the low bit toward the high bit by N bits, where N is the number of bits in the code word;
SA00: judging whether the code word is located at the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step SB00;
SB00: moving the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
SC00: decoding the intermediate decoded stream with a lossless compression algorithm (e.g., LZ77) to obtain a final decoded stream, and confirming and checking the decompressed data.
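The LZ77 decoding in step SC00 can be sketched as resolving back-references against the output produced so far. The token format below is hypothetical: an integer is a literal byte, and a `(length, distance)` pair is a back-reference whose extra bits are assumed to have been applied already. How a recovery tool should treat a reference that points before the start of the output is not described in the source, so this sketch simply raises an error.

```python
def resolve_lz77(tokens):
    """Expand literal bytes and (length, distance) pairs into output bytes."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, int):
            out.append(token)                  # literal byte
            continue
        length, distance = token
        if distance < 1 or distance > len(out):
            raise ValueError("back-reference points before the start of the output")
        start = len(out) - distance
        # Copy byte by byte so that overlapping references repeat data correctly.
        for i in range(length):
            out.append(out[start + i])
    return bytes(out)


print(resolve_lz77([97, 98, (4, 2)]))
```

Copying one byte at a time matters when the length exceeds the distance, as in the example, where two bytes are repeated to produce four more.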
By the method provided by the invention, the damaged ZIP compressed file can be recovered.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations are possible to those skilled in the art in light of the above teachings, and that all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.

Claims (7)

1. A method of recovering a damaged ZIP compressed file, comprising the steps of:
S100: constructing a first Huffman code table according to a first code length sequence in the Deflate compressed data stream;
S200: for the SQ1 encoded stream, constructing a second Huffman code table, and executing step S400;
S300: for the SQ2 encoded stream, constructing a third Huffman code table, and executing step S400;
S400: judging whether the current decoding position is the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step S500;
S500: taking data from the LIT encoded stream/DIST encoded stream bit by bit as a code word, and decoding it according to the second Huffman code table and the third Huffman code table to obtain decoded data;
S600: judging whether the data was decoded successfully; if so, executing step S800, otherwise executing step S700;
S700: moving the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
S800: judging whether the current decoded data is an end mark; if so, executing step SA00, otherwise executing step S900;
S900: writing the current decoded data into an intermediate decoded stream, and moving the LIT encoded stream/DIST encoded stream from the low bit toward the high bit by N bits, where N is the number of bits in the code word;
SA00: judging whether the code word is located at the end of the LIT encoded stream/DIST encoded stream; if so, executing step SC00, otherwise executing step SB00;
SB00: moving the LIT encoded stream/DIST encoded stream forward by one bit, and executing step S400;
SC00: decoding the intermediate decoded stream with a lossless compression algorithm to obtain a final decoded stream, and confirming and checking the decompressed data.
2. A method of recovering a damaged ZIP compressed file according to claim 1, wherein said step S100 comprises the steps of:
constructing a first Huffman code table according to the data structure and the first code length sequence of the Deflate compressed data stream, wherein the data structure of the Deflate compressed data stream is shown in Table 1 and comprises the first code length sequence, an SQ1 encoded stream, an SQ2 encoded stream and an LIT encoded stream/DIST encoded stream.
Table 1: data structure of Deflate compressed data stream
Figure FDA0002870712620000021
3. A method of recovering a damaged ZIP compressed file according to claim 1, wherein the step S200 comprises the steps of:
S201: decoding the SQ1 encoded stream according to the first Huffman code table to obtain an SQ1 sequence;
S202: run-length decoding the SQ1 sequence to obtain a second code length sequence;
S203: constructing a second Huffman code table according to the second code length sequence, and executing step S600.
4. A method of recovering a damaged ZIP compressed file according to claim 1, wherein the step S300 comprises the steps of:
S301: run-length decoding the SQ2 encoded stream according to the first Huffman code table to obtain an SQ2 sequence;
S302: run-length decoding the SQ2 sequence to obtain a third code length sequence;
S303: constructing a third Huffman code table according to the third code length sequence, and executing step S600.
5. The method of recovering a damaged ZIP compressed file according to claim 1, wherein, in step S400, a decoding result of -1 for the encoded stream indicates that the end of the LIT encoded stream/DIST encoded stream has been reached.
6. A method of recovering a damaged ZIP compressed file according to claim 1, wherein said step S500 comprises the steps of:
S501: acquiring 1 bit from the LIT encoded stream/DIST encoded stream, reading from the low bit toward the high bit, and appending it to the tail of the code word;
S502: judging whether the type of the previously decoded data is length; if so, executing step S503, otherwise executing step S506;
S503: searching for the code word in the third Huffman code table;
S504: judging whether the code word was found in the third Huffman code table; if so, executing step S505, otherwise executing step S509;
S505: copying the corresponding decoded data, marking the type of the decoded data as distance, increasing the number of bits acquired by 1, and executing step S501;
S506: searching for the code word in the second Huffman code table;
S507: judging whether the code word was found in the second Huffman code table; if so, executing step S508, otherwise executing step S509;
S508: copying the corresponding decoded data, marking the type of the decoded data as literal or length, increasing the number of bits acquired by 1, and executing step S501; if so, executing step S600, otherwise, executing step S509;
S509: judging whether the current decoded data is empty; if so, executing step S700, otherwise executing step S600.
7. The method of recovering a damaged ZIP compressed file according to claim 1, wherein moving the LIT encoded stream/DIST encoded stream forward by one bit means shifting the LIT encoded stream/DIST encoded stream by one bit from the low bit toward the high bit.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011599777.6A CN112667583B (en) 2020-12-30 2020-12-30 Method for recovering damaged ZIP compressed file

Publications (2)

Publication Number Publication Date
CN112667583A (en) 2021-04-16
CN112667583B (en) 2022-11-04

Family

ID=75410436

Country Status (1)

Country Link
CN (1) CN112667583B (en)

Also Published As

Publication number Publication date
CN112667583B (en) 2022-11-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant