CN111597155B - File linearization method suitable for ZIP file - Google Patents
- Publication number
- CN111597155B (application CN202010431709.2A; also published as CN111597155A)
- Authority
- CN
- China
- Prior art keywords
- file
- zip
- offset table
- entry
- tail
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Abstract
The invention relates to a file linearization method suitable for ZIP files. When a source file is compressed, a header entry offset table is constructed at the head of the compressed file, and a program can address the whole compressed file through this header entry offset table, thereby linearizing the compressed file. By placing the offset table at the front of a ZIP-format electronic document, the method meets the linearization requirement for ZIP-format electronic documents.
Description
Technical Field
The present invention relates to a file compression method, and in particular, to a file linearization method suitable for ZIP files.
Background
Electronic files used over a network often need to be linearized. Linearization means that content that comes logically first in a file is also placed at the front of the file's binary stream, so that a network application can display part of the file before the whole file has been downloaded. This gives the user the impression that the file has already been downloaded and improves the experience of viewing files over the network. For ZIP files, however, no such linearization is currently available. ZIP is a file format for data compression and file storage; FIG. 1 shows the principle of the format. A ZIP file arranges the file information and compressed content data of each component from the front of the file, but the Entry table, which holds the entry offset value for each component's file information and content, is placed at the tail of the file. Details of each part of the Entry table can be found in the ZIP file specification. As a result, a ZIP file must be fully downloaded before arbitrary addressing over its full range is possible: because the offset table sits at the end, a program cannot jump to arbitrary positions in a partially downloaded file, which is incompatible with the linearization requirement.
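The dependence on the tail can be illustrated with a minimal Python sketch. It assumes the standard-library `zipfile` and `struct` modules; the helper names `read_tail_record` and `can_open` and the sample entry names are hypothetical and chosen only for illustration. The sketch reads the end-of-central-directory record at the tail of a small archive and shows that a prefix lacking the tail table is rejected by an ordinary reader.

```python
import io
import struct
import zipfile


def read_tail_record(data: bytes):
    """Return (entry_count, directory_size, directory_offset) from the tail record."""
    pos = data.rfind(b"PK\x05\x06")  # end-of-central-directory signature
    if pos < 0:
        raise ValueError("end-of-central-directory record not found")
    # Fields at +10: total entries (u16), directory size (u32), directory offset (u32).
    return struct.unpack_from("<HLL", data, pos + 10)


def can_open(blob: bytes) -> bool:
    """Report whether an ordinary ZIP reader accepts the given bytes."""
    try:
        zipfile.ZipFile(io.BytesIO(blob)).close()
        return True
    except zipfile.BadZipFile:
        return False


# Build a small two-entry archive in memory (sample names are made up).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("page1.xml", "<page>1</page>")
    zf.writestr("page2.xml", "<page>2</page>")
data = buf.getvalue()

total, cd_size, cd_offset = read_tail_record(data)
prefix = data[:cd_offset]  # all entry data, but without the tail table

print(total, cd_offset, can_open(data), can_open(prefix))
```

Even though `prefix` contains every byte of both entries, the reader cannot locate them without the table that arrives last.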
In view of the above drawbacks, the inventors have studied the problem and devised a new file linearization method suitable for ZIP files that is of greater practical value in industry.
Disclosure of Invention
To solve the above technical problem, the invention provides a file linearization method suitable for ZIP files, in which the offset table is placed at the front of the file to meet the linearization requirement for ZIP-format electronic files.
The file linearization method suitable for ZIP files of the invention comprises the following steps:
S1: compressing a source file data area to obtain a compressed file data stream;
S2: constructing a tail file entry offset table and appending it to the tail of the compressed file data stream;
The method further comprises the following steps:
S3: constructing a header entry offset table at the head of the compressed file data stream;
S4: modifying the offset values in the tail file entry offset table at the tail of the file, the length of the header entry offset table being added to each offset value in the tail file entry offset table.
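Step S4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function name `shift_tail_offsets` is hypothetical, the 64 null bytes merely stand in for a header entry offset table, and the sketch also shifts the recorded start of the tail table itself, which the text does not mention but which the ZIP format needs once bytes are placed in front of the archive. ZIP64 archives are not handled.

```python
import io
import struct
import zipfile


def shift_tail_offsets(zip_bytes: bytes, header_len: int) -> bytes:
    """Add header_len to every entry offset stored in the tail table."""
    data = bytearray(zip_bytes)
    eocd = data.rfind(b"PK\x05\x06")  # end-of-central-directory record
    total, _cd_size, cd_offset = struct.unpack_from("<HLL", data, eocd + 10)
    pos = cd_offset
    for _ in range(total):
        if data[pos:pos + 4] != b"PK\x01\x02":
            raise ValueError("central directory record expected")
        name_len, extra_len, comment_len = struct.unpack_from("<3H", data, pos + 28)
        (old,) = struct.unpack_from("<L", data, pos + 42)  # local header offset
        struct.pack_into("<L", data, pos + 42, old + header_len)
        pos += 46 + name_len + extra_len + comment_len
    # The tail table itself also moves back by header_len bytes.
    struct.pack_into("<L", data, eocd + 16, cd_offset + header_len)
    return bytes(data)


# Demo archive with two entries (names are made up).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("page1.xml", "<page>1</page>" * 20)
    zf.writestr("page2.xml", "<page>2</page>" * 20)
original = buf.getvalue()
with zipfile.ZipFile(io.BytesIO(original)) as zf:
    before = {i.filename: i.header_offset for i in zf.infolist()}

header = b"\0" * 64  # stand-in for the header entry offset table
linearized = header + shift_tail_offsets(original, len(header))
with zipfile.ZipFile(io.BytesIO(linearized)) as zf:
    after = {i.filename: i.header_offset for i in zf.infolist()}
    intact = zf.testzip() is None  # every entry still reads back correctly
```

After the shift, an unmodified reader resolves every entry at its new position, which is why the tail-first parsing path keeps working.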
Further, the name of the header entry offset table contains an identification characteristic value.
Further, the identification characteristic value is @ linear.
Further, the header entry offset table includes at least one file name and offset value pair, each pair comprising the file name of a compressed source file and its position offset value within the compressed file data stream.
With this scheme, the invention has at least the following advantages. The method linearizes the compressed file by forming a header entry offset table at its head. The resulting file is a ZIP file that conforms to the specification: its original parsing logic is unaffected, parsers implemented according to the ZIP specification need no modification, and the ability to jump to any position in the file by parsing from the tail is retained.
In summary, placing the offset table at the front meets the linearization requirement for ZIP-format electronic documents. An implementer can also rework the files in the package while keeping the ZIP format, so that the in-package file offsets are placed ahead of the Entry table, which meets the linearization requirement.
The foregoing is only an overview of the technical solution of the present invention. To make it easier to understand and to implement according to this description, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a schematic diagram of a prior ZIP file structure and addressing thereof;
FIG. 2 is a schematic diagram of a modified ZIP file and its addressing.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
Embodiment one:
Referring to FIGS. 1 and 2, a file linearization method suitable for ZIP files according to a preferred embodiment of the present invention comprises the following steps:
S1: compressing a source file data area to obtain a compressed file data stream;
This step compresses the files with an ordinary compression program to obtain the compressed file data stream of each Entry. An Entry is the compressed form of one source file and comprises file header information, file content data and an additional description.
S2: constructing a tail file entry offset table and appending it to the tail of the compressed file data stream;
The tail file entry offset table includes a source file directory area and a source file directory end flag. A program addresses each file in the compressed file data stream through the tail file entry offset table.
The method further comprises the following steps:
S3: constructing a header entry offset table at the head of the compressed file data stream;
The header Entry offset table is used to address each compressed file (i.e. Entry) in the compressed file data stream that follows it, so that the whole file can be addressed and the linearization requirement is met. It may have the same structure as the tail file entry offset table described above; alternatively, to reduce redundancy, a dedicated structure may be used, such as a document structure consisting of a number of file name and offset value pairs.
S4: modifying the offset values in the tail file entry offset table at the tail of the file, the length of the header entry offset table being added to each offset value in the tail file entry offset table.
The file formed by the above process is a ZIP file that conforms to the specification. Its original parsing logic is unaffected, parsers implemented according to the ZIP specification need no modification, and the ability to jump to any position in the file by parsing from the tail is retained (see the dashed lines on the left of FIG. 2). Because the file also has an Entry offset table at its head, namely the header Entry offset table, an implementer can slightly modify an existing parser to form a new ZIP parser that, before addressing the tail file Entry offset table, tries to read the first Entry. If the first file name carries a certain identification feature, such as @ linear., the parser treats that Entry as the header Entry offset table. The modified parser can thereby address the files arranged at the front of the file stream before the complete file has been obtained, which is exactly what linearization of electronic files in network applications requires.
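The check performed by such a modified parser might look like the following Python sketch. It is an illustration only: the function `read_front_table`, the spelling `@linear.` (written without the space that appears in the translated text) and the placeholder payload `b"demo-table"` are assumptions, and the layout of the table payload is left undecoded here.

```python
import io
import struct
import zipfile

LOCAL_HEADER = struct.Struct("<4s5H3L2H")  # fixed 30-byte local file header
MARKER_PREFIX = "@linear."  # assumed spelling of the identification value


def read_front_table(prefix: bytes):
    """Return the payload of a leading marker entry, or None if absent or incomplete."""
    if len(prefix) < LOCAL_HEADER.size:
        return None
    (sig, _ver, _flags, method, _time, _date, _crc,
     comp_size, _size, name_len, extra_len) = LOCAL_HEADER.unpack_from(prefix, 0)
    if sig != b"PK\x03\x04" or method != 0:  # must be a stored (uncompressed) entry
        return None
    name_end = LOCAL_HEADER.size + name_len
    name = prefix[LOCAL_HEADER.size:name_end].decode("utf-8", "replace")
    if not name.startswith(MARKER_PREFIX):
        return None  # ordinary ZIP: fall back to parsing from the tail
    start = name_end + extra_len
    if len(prefix) < start + comp_size:
        return None  # table not fully downloaded yet
    return prefix[start:start + comp_size]


def make_zip(first_name: str) -> bytes:
    """Build a demo archive whose first entry has the given name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(first_name, b"demo-table", compress_type=zipfile.ZIP_STORED)
        zf.writestr("page1.xml", "<page>1</page>" * 100,
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


marked = make_zip("@linear.entry")
plain = make_zip("mimetype")

# Only the first 64 bytes of each download are inspected.
found = read_front_table(marked[:64])
missing = read_front_table(plain[:64])
```

Returning `None` for an unmarked file lets the caller fall back to the ordinary tail-first path, so the same parser handles both kinds of archive.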
In summary, the file linearization method suitable for ZIP files of the present invention has the following advantages:
(1) The method does not break any characteristic of the ZIP file and does not affect the parsing of other file formats based on this compression format, such as OFD and DOCX. OFD or DOCX files processed by the method can still be recognized normally by their original programs.
(2) On this basis, the linearization effect is achieved. In a network application, a program that uses the front entry offset data obtains a better user experience than an ordinary program: the first page of a document opens online more than 50% faster than in a comparison group, and the larger the file (the more data in the ZIP file), the more pronounced the advantage.
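The benefit described in (2) can be sketched in Python: once an entry's offset is known from the front of the file, the entry can be decoded from a partial download. The function `extract_at` and the sample names are hypothetical, and the offsets here are taken from `zipfile` as a stand-in for values a header entry offset table would supply.

```python
import io
import struct
import zipfile
import zlib

LOCAL_HEADER = struct.Struct("<4s5H3L2H")  # fixed 30-byte local file header


def extract_at(prefix: bytes, offset: int) -> bytes:
    """Decode one entry starting at `offset`, using only the bytes in `prefix`."""
    (sig, _ver, flags, method, _time, _date, _crc,
     comp_size, _size, name_len, extra_len) = LOCAL_HEADER.unpack_from(prefix, offset)
    if sig != b"PK\x03\x04":
        raise ValueError("no local file header at this offset")
    if flags & 0x08:
        raise ValueError("sizes are in a trailing data descriptor; not handled here")
    start = offset + LOCAL_HEADER.size + name_len + extra_len
    raw = prefix[start:start + comp_size]
    if len(raw) < comp_size:
        raise ValueError("entry not fully downloaded yet")
    if method == 0:  # stored
        return raw
    if method == 8:  # deflate
        return zlib.decompress(raw, -15)  # negative wbits selects a raw deflate stream
    raise ValueError("unsupported compression method")


page1 = b"<page>first page</page>" * 50
page2 = b"<page>second page</page>" * 5000

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("page1.xml", page1)
    zf.writestr("page2.xml", page2)
    # Stand-in for the offsets a header entry offset table would carry.
    offsets = {i.filename: i.header_offset for i in zf.infolist()}
data = buf.getvalue()

# Simulate a download that has only reached the start of the second entry.
prefix = data[:offsets["page2.xml"]]
first = extract_at(prefix, offsets["page1.xml"])
```

The first page is recovered even though `prefix` contains neither the second entry nor the tail table.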
Preferably, the name of the header entry offset table contains an identification characteristic value.
Preferably, the identification characteristic value is @ linear.
Embodiment two:
The file linearization method suitable for ZIP files in this embodiment is substantially the same as that of Embodiment one. The header entry offset table includes at least one file name and offset value pair, each pair comprising the file name of a compressed source file and its position offset value within the compressed file data stream.
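The text does not fix a byte layout for these pairs. As one possible illustration, the following Python sketch assumes a simple binary layout (a 2-byte name length, the UTF-8 name, then an 8-byte little-endian offset); the layout, the function names `pack_pairs` and `unpack_pairs`, and the sample names and offsets are all hypothetical.

```python
import struct


def pack_pairs(pairs):
    """Serialize (file name, offset) pairs: u16 name length, UTF-8 name, u64 offset."""
    out = bytearray()
    for name, offset in pairs:
        raw = name.encode("utf-8")
        out += struct.pack("<H", len(raw)) + raw + struct.pack("<Q", offset)
    return bytes(out)


def unpack_pairs(blob):
    """Inverse of pack_pairs."""
    pairs, pos = [], 0
    while pos < len(blob):
        (name_len,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        name = blob[pos:pos + name_len].decode("utf-8")
        pos += name_len
        (offset,) = struct.unpack_from("<Q", blob, pos)
        pos += 8
        pairs.append((name, offset))
    return pairs


# Sample pairs; the names and offsets are made up for illustration.
pairs = [("OFD.xml", 120), ("Doc_0/Pages/Page_0/Content.xml", 4096)]
blob = pack_pairs(pairs)
restored = unpack_pairs(blob)
```

Fixed-width offsets make the table size depend only on the file names, so the size can be computed before any offset is known.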
Because the header Entry offset table placed at the front is used only to obtain the name of each in-package file (Entry) and its corresponding offset value, the Entry offset table structure defined by the ZIP file specification is somewhat redundant and complex for this purpose. This design therefore simplifies the structure of the header Entry offset table and reduces the size of the compressed file. The specific generation sequence is as follows:
(1) Analyze the names of the files to be compressed and calculate the size of the header entry offset table that they will form;
(2) Write an uncompressed Entry with a special name (such as @ linear.entry) at the head of the file, the content of this Entry being null bytes whose count equals the calculated size;
(3) Compress the files in the conventional way, recording each file name and entry offset value in an in-memory structure;
(4) Complete the tail file entry offset table at the tail of the file according to the ZIP specification;
(5) Write the entry data held in memory into the header entry offset table at the head of the file;
(6) Close the ZIP file.
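The sequence above can be approximated with the Python standard library. The sketch below is a variant, not the sequence as written: instead of overwriting the placeholder in place (step 5), it builds the archive twice, once with null bytes to learn the offsets and once with the real table, so that the CRC recorded for the marker entry matches its final content; the text does not say how that checksum is handled. The marker name `@linear.entry` (without the space shown in the translated text), the table layout and all function names are assumptions.

```python
import io
import struct
import zipfile

MARKER = "@linear.entry"  # assumed marker name


def table_size(names):
    """Step (1): size of the table, known from the file names alone."""
    return sum(2 + len(n.encode("utf-8")) + 8 for n in names)


def pack_table(offsets):
    """Assumed layout per pair: u16 name length, UTF-8 name, u64 offset."""
    out = bytearray()
    for name, off in offsets.items():
        raw = name.encode("utf-8")
        out += struct.pack("<H", len(raw)) + raw + struct.pack("<Q", off)
    return bytes(out)


def build(files, table):
    """Steps (2)-(4) and (6): marker entry first, then the compressed files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(zipfile.ZipInfo(MARKER), table,
                    compress_type=zipfile.ZIP_STORED)
        for name, content in files.items():
            zf.writestr(zipfile.ZipInfo(name), content,
                        compress_type=zipfile.ZIP_DEFLATED)
        offsets = {i.filename: i.header_offset
                   for i in zf.infolist() if i.filename != MARKER}
    return buf.getvalue(), offsets


def write_linearized(files):
    placeholder = b"\0" * table_size(files)           # null bytes of the counted size
    _, offsets = build(files, placeholder)            # pass 1: learn the offsets
    data, final = build(files, pack_table(offsets))   # pass 2: real table content
    if final != offsets:
        raise RuntimeError("offsets changed between passes")
    return data, offsets


files = {"a.xml": b"<a/>" * 50, "b.xml": b"<b/>" * 50}
data, offsets = write_linearized(files)
```

Because the marker entry is written first, the offsets that `zipfile` records in the tail table already account for its length, so no separate adjustment step is needed in this variant.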
The compressed file formed in this way has the same characteristics and advantages as that of Embodiment one; the only difference is that the ZIP parser must be adapted slightly to handle the structure of the header entry offset table.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description.
Furthermore, the foregoing is only a preferred embodiment of the present invention, and it should be noted that it is possible for those skilled in the art to make several improvements and modifications without departing from the technical principle of the present invention, and these improvements and modifications should also be regarded as the protection scope of the present invention. Meanwhile, it should be understood that although the present disclosure describes embodiments, not every embodiment contains only one independent technical solution, and the description is merely for clarity, and those skilled in the art should consider the disclosure as a whole, and the technical solutions in the examples may be combined appropriately to form other embodiments that can be understood by those skilled in the art.
Claims (4)
1. A file linearization method suitable for ZIP files, comprising the following steps:
S1: compressing a source file data area to obtain a compressed file data stream;
S2: constructing a tail file entry offset table and appending it to the tail of the compressed file data stream;
characterized in that the method further comprises the following steps:
S3: constructing a header entry offset table at the head of said compressed file data stream;
S4: modifying the offset values in the tail file entry offset table at the tail of the file, the length of the header entry offset table being added to each offset value in the tail file entry offset table.
2. The file linearization method suitable for ZIP files as recited in claim 1, wherein the name of the header entry offset table contains an identification characteristic value.
3. The file linearization method suitable for ZIP files as recited in claim 2, wherein the identification characteristic value is @ linear.
4. The file linearization method suitable for ZIP files as recited in claim 1, wherein the header entry offset table includes at least one file name and offset value pair, each pair comprising the file name of a compressed source file and its position offset value within the compressed file data stream.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010431709.2A CN111597155B (en) | 2020-05-20 | 2020-05-20 | File linearization method suitable for ZIP file |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010431709.2A CN111597155B (en) | 2020-05-20 | 2020-05-20 | File linearization method suitable for ZIP file |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111597155A (en) | 2020-08-28 |
CN111597155B (en) | 2023-07-14 |
Family
ID=72192391
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010431709.2A Active CN111597155B (en) | 2020-05-20 | 2020-05-20 | File linearization method suitable for ZIP file |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111597155B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112131183A (en) * | 2020-09-07 | 2020-12-25 | 百望股份有限公司 | Linear access method of OFD electronic file |
US11860951B2 (en) | 2021-10-01 | 2024-01-02 | Micro Focus Llc | Optimization of a file format |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103761095A (en) * | 2014-01-23 | 2014-04-30 | 上海斐讯数据通信技术有限公司 | Method for generating universal header data information of upgraded file |
US9311317B1 (en) * | 2012-05-14 | 2016-04-12 | Symantec Corporation | Injecting custom data into files in zip format containing apps, without unzipping, resigning or re-zipping the files |
CN105718538A (en) * | 2016-01-18 | 2016-06-29 | 中国科学院计算技术研究所 | Adaptive compression method and system for distributed file system |
CN109495271A (en) * | 2018-10-19 | 2019-03-19 | 北京梆梆安全科技有限公司 | Compare APK file method, apparatus, server and its storage medium |
CN109582653A (en) * | 2018-11-14 | 2019-04-05 | 网易(杭州)网络有限公司 | Compression, decompression method and the equipment of file |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7594169B2 (en) * | 2005-08-18 | 2009-09-22 | Adobe Systems Incorporated | Compressing, and extracting a value from, a page descriptor format file |
Also Published As
Publication number | Publication date |
---|---|
CN111597155A (en) | 2020-08-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||