CN111597155B - File linearization method suitable for ZIP file - Google Patents

Info

Publication number
CN111597155B
Authority
CN
China
Prior art keywords
file
zip
offset table
entry
tail
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010431709.2A
Other languages
Chinese (zh)
Other versions
CN111597155A (en)
Inventor
刘丹
陈亚军
王少康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shuke Wangwei Technology Co ltd
Original Assignee
Beijing Shuke Wangwei Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shuke Wangwei Technology Co ltd
Priority to CN202010431709.2A
Publication of CN111597155A
Application granted
Publication of CN111597155B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention relates to a file linearization method suitable for ZIP files. When the source files are compressed, a header entry offset table is constructed at the head of the compressed file, and a program can address the whole compressed file through this header entry offset table, thereby linearizing the compressed file. By placing the offset table of a ZIP-format electronic document at the front, the method meets the linearization requirement of ZIP-format electronic documents.

Description

File linearization method suitable for ZIP file
Technical Field
The present invention relates to a file compression method, and in particular, to a file linearization method suitable for ZIP files.
Background
Electronic files often need to be linearized when they are used over a network. Linearization means arranging the logically earlier content of a file at the front of its binary stream, so that a network application can display part of the content before the whole file has been downloaded. This gives the user the impression that the file has already been downloaded and improves the experience of reading files online. For ZIP files, however, linearization is currently not applicable. ZIP is a file format for data compression and file archiving; FIG. 1 shows its principle. A ZIP file arranges the file information and compressed content data of each component starting from the front of the file, but the Entry table that holds the entry offset values for that information and content is placed at the tail of the file. Details of each part of the Entry table can be found in the ZIP file specification. Because of this design, a ZIP file must be downloaded in full before arbitrary addressing over its whole range is possible; the trailing position of the offset table prevents a program from performing arbitrary jump addressing within a partially downloaded file, which is incompatible with the linearization requirement.
In view of the above drawbacks, the inventors have, through active research and innovation, devised a new file linearization method suitable for ZIP files, so as to make it more useful in industry.
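To make the tail placement concrete, the following is a minimal Python sketch that builds a small archive in memory and reads the location of the entry table (the central directory, in ZIP terminology) from the record at the very end of the file. The entry names are hypothetical, and the sketch assumes an archive without a trailing comment and without ZIP64 extensions.

```python
import io
import struct
import zipfile

def central_directory_location(zip_bytes):
    """Return (offset, size) of the central directory, read from the tail record."""
    # The end-of-central-directory record starts with PK\x05\x06 and closes the file.
    eocd = zip_bytes.rfind(b"PK\x05\x06")
    if eocd < 0:
        raise ValueError("not a ZIP archive")
    # Within that record, the directory size is at +12 and its offset at +16.
    cd_size, cd_offset = struct.unpack_from("<II", zip_bytes, eocd + 12)
    return cd_offset, cd_size

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("page1.xml", "first page " * 50)   # hypothetical entries
    zf.writestr("page2.xml", "second page " * 50)

data = buf.getvalue()
cd_offset, cd_size = central_directory_location(data)
print(f"archive: {len(data)} bytes, entry table at {cd_offset}, size {cd_size}")
```

A reader that has received only the first bytes of `data` sees the first entry's header but has no way to learn where `page2.xml` starts, because that offset is stored only in the table near the end.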
Disclosure of Invention
To solve the above technical problem, the invention aims to provide a file linearization method suitable for ZIP files, which places an offset table at the front of the file to meet the linearization requirement of ZIP-format electronic files.
The file linearization method suitable for ZIP files of the invention comprises the following steps:
S1: compressing a source file data area to obtain a compressed file data stream;
S2: constructing a tail file entry offset table and concatenating it to the tail of the compressed file data stream;
the method further comprises the following steps:
S3: constructing a header entry offset table at the head of the compressed file data stream;
S4: modifying the offset values in the tail file entry offset table at the tail of the file, each offset value in the tail file entry offset table being increased by the length of the header entry offset table.
Furthermore, the name of the header entry offset table contains an identification characteristic value.
Furthermore, the identification characteristic value is @ linear.
Further, the header entry offset table includes at least one file name and offset value pair, each pair comprising the file name of a compressed source file and its position offset value within the compressed file data stream.
With this scheme, the invention has at least the following advantages. The method linearizes the compressed file by forming a header entry offset table at its head. The resulting file is a ZIP file that conforms to the specification and its original parsing logic is unaffected: any parser implemented according to the ZIP specification works without modification and retains the ability to obtain arbitrary jump addressing within the file by parsing from the tail.
In summary, placing the offset table at the front meets the linearization requirement of ZIP-format electronic documents. An operator can restructure the package while keeping the ZIP format, so that the entry offset table for the files in the package is moved to the front and the linearization requirement is met.
The foregoing is only an overview of the technical solution of the present invention. To make it easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a schematic diagram of the existing ZIP file structure and its addressing;
FIG. 2 is a schematic diagram of a modified ZIP file and its addressing.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
Embodiment one:
Referring to FIGS. 1 and 2, a file linearization method suitable for ZIP files according to a preferred embodiment of the present invention includes the following steps:
S1: compressing a source file data area to obtain a compressed file data stream;
This step compresses the files with an ordinary compression program to obtain the compressed file data stream of each Entry. An Entry is the compressed form of one source file and comprises file header information, file content data and an additional description.
S2: constructing a tail file entry offset table and connecting the tail file entry offset table to the tail of the compressed file data stream;
The tail file entry offset table includes a source file directory area and a source file directory end flag. A program addresses each file in the compressed file data stream through the tail file entry offset table.
The method further comprises the following steps:
S3: constructing a header entry offset table at the head of the compressed file data stream;
The header entry offset table is used to address each compressed file (i.e. Entry) in the compressed file data stream that follows it, which makes whole-file addressing possible and thereby meets the linearization requirement of the compressed file. It may have the same structure as the tail file entry offset table described above; alternatively, to reduce redundancy, a dedicated structure may be provided for it, such as a document structure holding a number of file name and offset value pairs.
S4: modifying the offset values in the tail file entry offset table at the tail of the file, each offset value in the tail file entry offset table being increased by the length of the header entry offset table.
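Steps S3 and S4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the header table is treated as an opaque byte string (`HYPOTHETICAL-HEADER-TABLE` is a stand-in, not a format from the source), the archive has no ZIP64 extensions or trailing comment, and the function name is invented for the example. The sketch prepends the table and then adds its length to every offset stored in the tail table, including the pointer to the tail table itself.

```python
import io
import struct
import zipfile

def prepend_and_fix_offsets(zip_bytes, header_table):
    """Put header_table before the archive (S3) and shift every tail offset by its length (S4)."""
    shift = len(header_table)
    out = bytearray(header_table + zip_bytes)
    eocd = out.rfind(b"PK\x05\x06")
    # Tail record: total entry count at +10, directory size at +12, directory offset at +16.
    total, _cd_size, cd_offset = struct.unpack_from("<HII", out, eocd + 10)
    struct.pack_into("<I", out, eocd + 16, cd_offset + shift)
    pos = cd_offset + shift
    for _ in range(total):
        if out[pos:pos + 4] != b"PK\x01\x02":
            raise ValueError("corrupt central directory")
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", out, pos + 28)
        # Each directory record stores the offset of its local header at +42.
        (local_offset,) = struct.unpack_from("<I", out, pos + 42)
        struct.pack_into("<I", out, pos + 42, local_offset + shift)
        pos += 46 + name_len + extra_len + comment_len
    return bytes(out)

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "alpha " * 40)   # hypothetical entries
    zf.writestr("b.txt", "beta " * 40)

header = b"HYPOTHETICAL-HEADER-TABLE"
patched = prepend_and_fix_offsets(buf.getvalue(), header)
with zipfile.ZipFile(io.BytesIO(patched)) as zf:
    offsets = {info.filename: info.header_offset for info in zf.infolist()}
print(offsets)
```

After patching, every stored offset points at a local file header in the new file, so a reader that starts from the tail still finds each Entry.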
The file formed by the above process is a ZIP file that conforms to the specification. Its original parsing logic is unaffected: any parser implemented according to the ZIP specification works without modification and retains the ability to obtain arbitrary jump addressing within the file by parsing from the tail (see the dashed-line part on the left of FIG. 2). However, because the file also carries an Entry offset table at its head, namely the header entry offset table, an operator can slightly modify an existing parser to form a new ZIP parser that exploits this feature. Before addressing the tail file entry offset table at the tail of the file, the new parser tries to read the first Entry; if the first file name carries a certain identification feature, such as @ linear, the parser treats that Entry as the header entry offset table and addresses through it. The modified parser can thus address the files arranged at the front of the file stream before the complete file has been obtained, which is exactly what linearization of electronic files in network applications requires.
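The check performed by such a modified parser might look like the Python sketch below. It assumes the identification value is spelled `@linear` (without the space that appears in the translated text), that the first Entry is stored uncompressed, and that the function names are free to choose; the table payload is returned as opaque bytes because the source does not fix its layout.

```python
import io
import struct
import zipfile

MARKER = "@linear"  # assumed spelling of the identification characteristic value

def read_first_entry(prefix):
    """Parse the first local file header from a partial download; return (name, payload) or None."""
    if len(prefix) < 30 or prefix[:4] != b"PK\x03\x04":
        return None
    (method,) = struct.unpack_from("<H", prefix, 8)        # 0 means stored (uncompressed)
    (comp_size,) = struct.unpack_from("<I", prefix, 18)
    name_len, extra_len = struct.unpack_from("<HH", prefix, 26)
    start = 30 + name_len + extra_len
    if method != 0 or len(prefix) < start + comp_size:
        return None  # compressed, or not enough bytes have arrived yet
    name = prefix[30:30 + name_len].decode("utf-8")
    return name, prefix[start:start + comp_size]

def try_linear_table(prefix):
    """Return the header table payload if the first Entry carries the marker, else None."""
    first = read_first_entry(prefix)
    if first is None or not first[0].startswith(MARKER):
        return None  # fall back to the tail table once the download completes
    return first[1]

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr(zipfile.ZipInfo("@linear.entry"), b"table-bytes",
                compress_type=zipfile.ZIP_STORED)
    zf.writestr("page1.xml", "first page " * 50)  # hypothetical entry

partial = buf.getvalue()[:64]  # pretend only the first 64 bytes have arrived
table = try_linear_table(partial)
print(table)
```

An archive without the marker makes `try_linear_table` return `None`, in which case the parser behaves exactly like an unmodified one.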
In summary, the file linearization method suitable for ZIP files of the present invention has the following advantages:
(1) The method does not destroy any characteristic of the ZIP file and does not affect the parsing of other file formats built on this compression format, such as OFD and DOCX; OFD or DOCX files processed by the method can still be recognized normally by their original programs.
(2) On this basis, the linearization effect is achieved. In a network application, a program that uses the front entry offset data offers a better user experience than an ordinary program: the first page of a document opens online more than 50% faster than in the comparison group, and the larger the file (the more data in the ZIP file), the more pronounced the advantage.
Preferably, in the file linearization method suitable for ZIP files of the present invention, the name of the header entry offset table contains an identification characteristic value.
Preferably, in the file linearization method suitable for ZIP files of the present invention, the identification characteristic value is @ linear.
Embodiment two:
The file linearization method suitable for ZIP files in this embodiment is substantially the same as that in Embodiment one, except that the header entry offset table includes at least one file name and offset value pair, each pair comprising the file name of a compressed source file and its position offset value within the compressed file data stream.
Because the front header entry offset table is used only to obtain the name of each in-package file (Entry) and its corresponding offset value, the Entry offset table structure defined by the ZIP file specification is somewhat redundant and complex for this purpose. This design therefore simplifies the structure of the header entry offset table and reduces the size of the compressed file. The specific generation sequence is as follows:
(1) Analyze the names of the files to be compressed and compute the size that the header entry offset table will occupy;
(2) Write an uncompressed Entry with a special name (such as @ linear.entry) at the head of the file; the content of this Entry is null bytes whose count equals the computed size;
(3) Compress the files in the conventional manner, recording each file name and entry offset value in an in-memory structure;
(4) Complete the tail file entry offset table at the tail of the file according to the ZIP specification;
(5) Write the entry data held in memory into the header entry offset table at the head of the file;
(6) Close the ZIP file.
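The six steps above can be sketched with Python's `zipfile` module as follows. Several details are assumptions made for illustration rather than anything specified in the source: the entry name spelling `@linear.entry`, the binary layout of the table (a 2-byte count, then for each file a 2-byte name length, the UTF-8 name and an 8-byte offset), and the helper names. Because `zipfile` offers no in-place rewrite, the sketch fills in the placeholder after the archive has been closed, which swaps the order of steps (5) and (6). It also refreshes the CRC-32 of the placeholder Entry in both the local header and the tail table, a detail the source does not mention but which is needed for the checksum to match the new content.

```python
import io
import struct
import zipfile
import zlib

MARKER_NAME = "@linear.entry"  # assumed spelling of the special entry name

def table_size(names):
    # Hypothetical layout: 2-byte count, then per file: 2-byte name length, name, 8-byte offset.
    return 2 + sum(2 + len(n.encode("utf-8")) + 8 for n in names)

def encode_table(pairs):
    out = struct.pack("<H", len(pairs))
    for name, offset in pairs:
        raw = name.encode("utf-8")
        out += struct.pack("<H", len(raw)) + raw + struct.pack("<Q", offset)
    return out

def build_linearized(files):
    size = table_size(files)                                      # step (1)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(zipfile.ZipInfo(MARKER_NAME), b"\0" * size,
                    compress_type=zipfile.ZIP_STORED)             # step (2)
        for name, data in files.items():                          # step (3)
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
        pairs = [(i.filename, i.header_offset) for i in zf.infolist()[1:]]
    # Leaving the with-block writes the tail table and closes the file: steps (4) and (6).
    out = bytearray(buf.getvalue())
    table = encode_table(pairs)                                   # step (5)
    assert len(table) == size
    name_len, extra_len = struct.unpack_from("<HH", out, 26)
    start = 30 + name_len + extra_len
    out[start:start + size] = table
    # Assumption beyond the source: update the placeholder's CRC-32 in both headers.
    crc = zlib.crc32(table)
    struct.pack_into("<I", out, 14, crc)                          # local header CRC field
    eocd = out.rfind(b"PK\x05\x06")
    (cd_offset,) = struct.unpack_from("<I", out, eocd + 16)
    struct.pack_into("<I", out, cd_offset + 16, crc)              # first tail-table record
    return bytes(out), pairs

archive, pairs = build_linearized({
    "doc.xml": b"<doc/>" * 100,            # hypothetical package contents
    "res/img.bin": bytes(range(256)) * 4,
})
print(pairs)
```

Because the placeholder is written with its final size up front, filling it in later does not move any other Entry, so the offsets recorded in step (3) and the tail table written in step (4) stay valid.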
The characteristics and advantages of the compressed file formed by this method are fully consistent with those of the compressed file in Embodiment one; the only difference is that the ZIP parser's handling of the header entry offset table must be adapted slightly to its structure.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description.
Furthermore, the foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the technical principle of the present invention, and these improvements and modifications should also be regarded as falling within the scope of protection of the present invention. It should also be understood that, although this specification is organized by embodiments, not every embodiment contains only one independent technical solution; the description is arranged this way merely for clarity. Those skilled in the art should treat the specification as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.

Claims (4)

1. A file linearization method suitable for ZIP files, comprising the following steps:
S1: compressing a source file data area to obtain a compressed file data stream;
S2: constructing a tail file entry offset table and concatenating it to the tail of the compressed file data stream;
characterized in that the method further comprises the following steps:
S3: constructing a header entry offset table at the head of said compressed file data stream;
S4: modifying the offset values in the tail file entry offset table at the tail of the file, each offset value in the tail file entry offset table being increased by the length of the header entry offset table.
2. The method for linearizing a file suitable for use in a ZIP file as recited in claim 1, wherein: the header entry offset table contains an identification feature value in its name.
3. The method for linearizing a file suitable for use in a ZIP file as recited in claim 2, wherein: the identification characteristic value is @ linear.
4. The method for linearizing a file suitable for use in a ZIP file as recited in claim 1, wherein: the header entry offset table includes at least one file name and offset value pair, each pair comprising the file name of a compressed source file and its position offset value within the compressed file data stream.
CN202010431709.2A 2020-05-20 2020-05-20 File linearization method suitable for ZIP file Active CN111597155B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010431709.2A CN111597155B (en) 2020-05-20 2020-05-20 File linearization method suitable for ZIP file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010431709.2A CN111597155B (en) 2020-05-20 2020-05-20 File linearization method suitable for ZIP file

Publications (2)

Publication Number Publication Date
CN111597155A (en) 2020-08-28
CN111597155B (en) 2023-07-14

Family

ID=72192391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010431709.2A Active CN111597155B (en) 2020-05-20 2020-05-20 File linearization method suitable for ZIP file

Country Status (1)

Country Link
CN (1) CN111597155B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112131183A (en) * 2020-09-07 2020-12-25 百望股份有限公司 Linear access method of OFD electronic file
US11860951B2 (en) 2021-10-01 2024-01-02 Micro Focus Llc Optimization of a file format

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761095A (en) * 2014-01-23 2014-04-30 上海斐讯数据通信技术有限公司 Method for generating universal header data information of upgraded file
US9311317B1 (en) * 2012-05-14 2016-04-12 Symantec Corporation Injecting custom data into files in zip format containing apps, without unzipping, resigning or re-zipping the files
CN105718538A (en) * 2016-01-18 2016-06-29 中国科学院计算技术研究所 Adaptive compression method and system for distributed file system
CN109495271A (en) * 2018-10-19 2019-03-19 北京梆梆安全科技有限公司 Compare APK file method, apparatus, server and its storage medium
CN109582653A (en) * 2018-11-14 2019-04-05 网易(杭州)网络有限公司 Compression, decompression method and the equipment of file

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7594169B2 (en) * 2005-08-18 2009-09-22 Adobe Systems Incorporated Compressing, and extracting a value from, a page descriptor format file

Also Published As

Publication number Publication date
CN111597155A (en) 2020-08-28

Similar Documents

Publication Publication Date Title
CN111597155B (en) File linearization method suitable for ZIP file
KR100614677B1 (en) Method for compressing/decompressing a structured document
US8078640B1 (en) High efficiency binary encoding
US8988257B2 (en) Data compression utilizing variable and limited length codes
US8245203B2 (en) Logging system and method for computer software
JP4373721B2 (en) Method and system for encoding markup language documents
KR101074010B1 (en) Block unit data compression and decompression method and apparatus thereof
EP1676210B1 (en) Method and apparatus for handling text and binary mark up languages in a computing device
CN108108394B (en) Compressed file recovery method and storage medium of APFS file system
CN102867039B (en) Adding and reading method and device of multimedia annotations
US8688621B2 (en) Systems and methods for information compression
JP2002529849A (en) Data compression method for intermediate object code program executable in embedded system supplied with data processing resources, and embedded system corresponding to this method and having multiple applications
US20070273564A1 (en) Rapidly Queryable Data Compression Format For Xml Files
US7814408B1 (en) Pre-computing and encoding techniques for an electronic document to improve run-time processing
US7594169B2 (en) Compressing, and extracting a value from, a page descriptor format file
CN103366000B (en) A kind of analytic method of large volume XML message
CN112765112A (en) Installation package packing and unpacking method
CN101739391A (en) Method for generating electronic book with binary file format and electronic book generated by same
CN104320454B (en) A kind of method and system that self-defined output is realized in http protocol reduction
US8463759B2 (en) Method and system for compressing data
US20090132564A1 (en) Information processing apparatus, control method, and storage medium
KR100938277B1 (en) Method and apparatus for file compression and restoration of compression format
CN115334169B (en) Communication protocol coding method capable of saving network bandwidth
CN112487249B (en) XML document compression and decompression method and device
CN115952133A (en) Rich text data processing method, system and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant